String Manipulation in Python

Strings are everywhere—usernames, emails, URLs, logs, configs, chat messages. This guide goes beyond basics and shows how to search, slice, clean, format, transform, compare, and validate strings like a pro. We’ll also cover Unicode gotchas and performance tips.

1) What exactly is a Python string?

  • A string is an immutable sequence of Unicode characters.
  • “Immutable” means you can’t change a string in place—every change creates a new string.
  • Create with quotes: '...', "...", triple quotes for multi-line, and raw strings for patterns: r"\d+\s\w+".
s = "SmartTejas\nPython"
raw = r"SmartTejas\nPython"   # backslash not interpreted
print(s)   # newline
print(raw) # literally \n

2) Immutability & building strings efficiently

Avoid repeated += in loops—create a list and "".join(...) at the end.

words = ["clean", "fast", "robust"]
# Bad: creates many temporary strings
t = ""
for w in words:
    t += w + " "
# Good:
t = " ".join(words)

When concatenating thousands of small pieces (e.g., logs), join (or io.StringIO) avoids building a new intermediate string at every step, which usually makes it faster and lighter on memory than repeated +=.
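
A minimal sketch of the io.StringIO approach mentioned above. The pieces list is a made-up stand-in for real log fragments:

```python
import io

# Hypothetical log fragments, used only for illustration
pieces = [f"event {i}\n" for i in range(3)]

buf = io.StringIO()
for piece in pieces:
    buf.write(piece)        # appends to an in-memory text buffer
log_text = buf.getvalue()   # one final string

print(log_text == "".join(pieces))  # True
```

Both approaches give the same result; StringIO tends to read better when the pieces arrive one at a time rather than as a ready-made list.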


3) Indexing & slicing (master these!)

Strings are sequences, so indexing and slicing work like lists.

s = "automation"
print(s[0])     # 'a'
print(s[-1])    # 'n'
print(s[1:5])   # 'utom'
print(s[:4])    # 'auto'
print(s[4:])    # 'mation'
print(s[::2])   # 'atmto' (step 2)
print(s[::-1])  # 'noitamotua' (reverse)

Remember: strings support neither item nor slice assignment (they are immutable), so s[0] = 'A' raises a TypeError.
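
To make that concrete, here is a small sketch of the error and the usual workaround, which is to build a new string from slices:

```python
s = "automation"

try:
    s[0] = "A"              # item assignment is not supported on str
except TypeError as exc:
    error_message = str(exc)

# Build a new string from slices instead
capitalized = "A" + s[1:]
print(capitalized)          # Automation
```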


4) Searching & membership

Quick membership:

"url" in "open_url?next=/home"      # True
"login" not in "open_url?next=/home"  # True

Precise methods:

text = "Order #123 arrived. Order #124 shipped."
text.find("Order")      # 0 (first index) or -1 if missing
text.rfind("Order")     # 20 (last index)
text.count("Order")     # 2

# Safer case-insensitive search:
needle = "order"
text_lower = text.casefold()
needle_lower = needle.casefold()
idx = text_lower.find(needle_lower)

index()/rindex() behave like find()/rfind() but raise ValueError if not found.
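
A short sketch of the difference, using a substring ("refund") that is deliberately absent from the sample text:

```python
text = "Order #123 arrived."

position = text.find("refund")     # -1, no exception
try:
    text.index("refund")
    found = True
except ValueError:
    found = False                  # index() signals a miss by raising

print(position, found)             # -1 False
```

find() suits cases where a miss is normal; index() suits cases where a miss is a bug you want to surface.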

startswith() / endswith() accept tuples for multiple options:

filename = "report.csv"
filename.endswith((".csv", ".txt"))  # True

5) Cleaning strings (strip, split, replace, join)

Trimming:

s = "  hello\n"
print(s.strip())   # "hello"
print(s.lstrip())  # "hello\n"
print(s.rstrip())  # "  hello"
# Strip a set of characters:
"---data--".strip("-")  # "data"

Splitting & joining:

csv = "a,b,c,,d"
print(csv.split(","))         # ['a','b','c','','d']
print("path/to/file".split("/", 1))  # ['path', 'to/file']

parts = ["id", "name", "email"]
print(",".join(parts))        # 'id,name,email'

# Normalize all whitespace to single spaces:
msg = "Too   many\tspaces\nhere"
clean = " ".join(msg.split())   # "Too many spaces here"

Replacing:

"2025-08-17".replace("-", "/")  # "2025/08/17"


6) Case operations (and the right way to compare)

s = "PyThOn"
s.lower()       # 'python'
s.upper()       # 'PYTHON'
s.capitalize()  # 'Python'
s.title()       # 'Python'
s.swapcase()    # 'pYtHoN'

For international, case-insensitive comparisons, prefer casefold() over lower():

a = "straße"
b = "STRASSE"
a.casefold() == b.casefold()   # True

7) Validation helpers (is… methods)

"abc".isalpha()      # True
"123".isdigit()      # True
"१२३".isdigit()      # True (Devanagari digits)
"123".isdecimal()    # True (decimal digits in any script, so "१२३" passes too)
"123".isnumeric()    # True (also accepts fractions and other numeric chars)
"abc123".isalnum()   # True
"   ".isspace()      # True
"Hello".isascii()    # True

Choose the checker that matches your input. None of the three digit checks is limited to 0–9; to accept only ASCII digits, combine s.isascii() with s.isdecimal().
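
The three digit checks differ mostly on unusual characters. The sketch below compares them on a few samples; is_ascii_digits is a hypothetical helper name, not a built-in:

```python
samples = ["123", "१२३", "²", "½"]
results = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric())
    for s in samples
}
# "123" -> (True, True, True)
# "१२३" -> (True, True, True)     Devanagari decimal digits
# "²"   -> (False, True, True)    superscript two
# "½"   -> (False, False, True)   vulgar fraction

def is_ascii_digits(s: str) -> bool:
    # Hypothetical helper: accept only the characters 0-9
    return s.isascii() and s.isdecimal()

print(is_ascii_digits("123"), is_ascii_digits("१२३"))  # True False
```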


8) Alignment & padding

"42".zfill(5)         # '00042'
"done".rjust(10, '.') # '......done'
"done".ljust(10, '-') # 'done------'
"done".center(10, '*')# '***done***'

Great for console tables or fixed-width exports.
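
As an illustration, a tiny fixed-width table built with ljust and rjust. The rows are made-up sample data:

```python
rows = [("apples", 3, 1.5), ("kiwis", 12, 0.25)]

lines = []
for name, qty, price in rows:
    # Left-align the name, right-align the numbers
    line = name.ljust(10) + str(qty).rjust(4) + f"{price:.2f}".rjust(8)
    lines.append(line)

print("\n".join(lines))
```

Every line comes out exactly 22 characters wide (10 + 4 + 8), so the columns line up in a monospaced terminal.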


9) Formatting strings (f-strings, format specs)

f-strings (recommended)

name = "Tejas"
score = 91.789
print(f"Hello {name}, score: {score:.2f}")  # Hello Tejas, score: 91.79

Format specifiers you’ll use a lot:

  • Numbers: :.2f (2 decimal places), :, (thousands), :06d (zero padding)
  • Alignment: :<20, :>20, :^20
  • Percent: {pct:.1%}
n = 1234567
print(f"{n:,}")         # 1,234,567
print(f"{42:06d}")      # 000042
print(f"[{'center':^10}]")  # [  center  ] (no space before the closing brace; it would become part of the format spec and raise ValueError)

Legacy (still seen): str.format() and % formatting.
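
For reading older code, here is the same message written in all three styles:

```python
name = "Tejas"
score = 91.789

with_fstring = f"Hello {name}, score: {score:.2f}"
with_format = "Hello {}, score: {:.2f}".format(name, score)
with_percent = "Hello %s, score: %.2f" % (name, score)

print(with_fstring == with_format == with_percent)  # True
```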


10) Advanced transformations (translate, maketrans)

translate() is blazing fast for character-level mapping or removal:

import string

s = "Hello, World!"
# Remove punctuation
tbl = str.maketrans("", "", string.punctuation)
print(s.translate(tbl))   # 'Hello World'

# Map characters
tbl = str.maketrans({"H": "J", "W": "V"})
print(s.translate(tbl))   # 'Jello, Vorld!'

This is often faster and cleaner than multiple replace() calls.


11) Unicode pitfalls & normalization (important!)

Visually identical text can be encoded differently. é may be one code point (\u00E9) or e + combining accent.

Use unicodedata.normalize to compare reliably:

import unicodedata as ud

a = "é"              # single code point
b = "e\u0301"        # 'e' + combining acute
print(a == b)        # False without normalization

def norm(s): return ud.normalize("NFC", s)
print(norm(a) == norm(b))  # True

Tip: For user-input deduplication, normalize + casefold.
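
One way to apply that tip, as a sketch: dedup_key is a hypothetical helper, and the names are made-up sample input.

```python
import unicodedata as ud

def dedup_key(s: str) -> str:
    # Hypothetical helper: normalize, casefold, then normalize again,
    # because casefolding can produce text that is no longer in NFC
    return ud.normalize("NFC", ud.normalize("NFC", s).casefold())

names = ["Jos\u00e9", "JOSE\u0301", "jose\u0301", "Ana"]
unique = {}
for n in names:
    unique.setdefault(dedup_key(n), n)   # keep the first spelling seen

print(list(unique.values()))             # ['José', 'Ana']
```

The key is used only for comparison; the stored value keeps the spelling the user originally typed.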


12) Bytes vs str (encoding/decoding)

  • str = text (Unicode)
  • bytes = raw bytes (e.g., network, files)
text = "नमस्ते"
data = text.encode("utf-8")           # bytes
restored = data.decode("utf-8")       # back to str
safe = text.encode("ascii", "ignore") # silently drops non-ASCII; here that leaves b''

Use the right encoding for APIs/files (UTF-8 is the safe default today).
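
When the target encoding cannot represent a character, the errors argument decides what happens. A short sketch of the common handlers, using "café" as sample text:

```python
text = "caf\u00e9"   # "café" with a single-code-point é

strict_failed = False
try:
    text.encode("ascii")                 # default is errors="strict"
except UnicodeEncodeError:
    strict_failed = True

dropped = text.encode("ascii", "ignore")             # b'caf'
replaced = text.encode("ascii", "replace")           # b'caf?'
escaped = text.encode("ascii", "backslashreplace")   # b'caf\\xe9'
```

"ignore" and "replace" lose information, so they are best kept for display or logging rather than for data you need to read back.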


13) Split variants you should know

s = "line1\nline2\r\nline3"
s.splitlines()         # ['line1', 'line2', 'line3']
s.splitlines(keepends=True)  # keeps '\n', '\r\n'

path = "C:\\work\\docs"
path.rsplit("\\", 1)   # split from the right: ['C:\\work', 'docs']

"key=value=more".partition("=")   # ('key', '=', 'value=more')
"key=value=more".rpartition("=")  # ('key=value', '=', 'more')

partition/rpartition are safer than split("=", 1) because they always return a 3-tuple.
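
The difference shows up when the separator is missing. A sketch with a sample line that contains no "=":

```python
line = "no_separator_here"

key, sep, value = line.partition("=")   # ('no_separator_here', '', '')
has_value = bool(sep)                   # False: separator was absent

parts = line.split("=", 1)              # ['no_separator_here']
try:
    k, v = parts                        # unpacking one item into two names
    unpack_ok = True
except ValueError:
    unpack_ok = False

print(has_value, unpack_ok)             # False False
```

With partition, the three-name unpacking always succeeds, and an empty sep tells you the separator was not there.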


14) Quick regex toolbox (when plain methods aren’t enough)

Regular expressions power complex matching/replacements. Use them when simple methods won’t do.

import re

email = "user.name+tag@gmail.com"
bool(re.fullmatch(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", email))  # True

text = "IDs: A-12, B-7, C-999"
print(re.findall(r"[A-Z]-\d+", text))  # ['A-12', 'B-7', 'C-999']

masked = re.sub(r"\b(\d{6})(\d{4})\b", r"\1****", "Card: 1234567890")  # Card: 123456****

Tips

  • Precompile for reuse: pat = re.compile(...)
  • Prefer non-regex methods for simple tasks (faster, clearer).
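
A sketch of the precompile tip, reusing the ID pattern from above. ID_PATTERN and the sample lines are illustrative names and data:

```python
import re

# Compile once, reuse for every line
ID_PATTERN = re.compile(r"[A-Z]-\d+")

lines = ["IDs: A-12, B-7", "none here", "C-999 only"]
found = [ID_PATTERN.findall(line) for line in lines]
print(found)   # [['A-12', 'B-7'], [], ['C-999']]
```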

15) Performance tips & best practices

  • Use join to concatenate many pieces.
  • For heavy appends, consider io.StringIO.
  • Prefer startswith/endswith over regex for prefix/suffix checks.
  • Use casefold() + normalize() for robust comparisons.
  • Use translate() for character-level removals such as punctuation; to strip accents, decompose with NFD first so the combining marks become separate characters.
  • Avoid is for string equality—use ==.
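
Accent removal needs one extra step compared with punctuation. The sketch below uses a generator with unicodedata.combining rather than a translate() table; strip_accents is a hypothetical helper name:

```python
import unicodedata as ud

def strip_accents(s: str) -> str:
    # Hypothetical helper: decompose, drop combining marks, recompose
    decomposed = ud.normalize("NFD", s)
    kept = "".join(ch for ch in decomposed if not ud.combining(ch))
    return ud.normalize("NFC", kept)

print(strip_accents("Cr\u00e8me br\u00fbl\u00e9e"))   # Creme brulee
```

This only affects letters that decompose into a base letter plus a mark; characters such as "ß" or "ø" pass through unchanged.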

16) Mini tasks (practical, copy-paste ready)

A) URL slug generator (no regex)

import string

def slugify(title: str) -> str:
    title = title.strip().casefold()
    # Keep letters, digits, and spaces; remove punctuation
    tbl = str.maketrans("", "", string.punctuation)
    title = title.translate(tbl)
    # Collapse whitespace and replace with '-'
    parts = title.split()
    return "-".join(parts)

print(slugify("  Intro to Python: Strings & Unicode!  "))  # intro-to-python-strings-unicode

B) Normalize and compare user names (Unicode-safe)

import unicodedata as ud

def norm_text(s: str) -> str:
    return ud.normalize("NFC", s).casefold().strip()

print(norm_text("Straße") == norm_text("STRASSE"))  # True

C) Clean CSV-ish line into list

def parse_line(line: str):
    # remove surrounding spaces, collapse inner whitespace, split by commas
    fields = [ " ".join(part.split()) for part in line.strip().split(",") ]
    return fields

print(parse_line("  Alice ,  91  ,  Delhi "))  # ['Alice', '91', 'Delhi']

D) Mask email (regex + simple groups)

import re

def mask_email(email: str) -> str:
    # keep first char, mask rest before @
    return re.sub(r"(^.)(.*)(@.*$)", r"\1***\3", email)

print(mask_email("tejas.smart@example.com"))  # t***@example.com

17) Common pitfalls

  • s.replace('a','b') doesn’t change s—strings are immutable. Do: s = s.replace('a','b').
  • split() without args splits on any whitespace and collapses multiples.
  • title() is not linguistically perfect: it capitalizes any letter that follows a non-letter, so "they're".title() gives "They'Re".
  • Don’t use is to compare string values: a == b, not a is b.
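
For the title() pitfall, string.capwords from the standard library is one possible alternative, since it splits on whitespace only. A small comparison:

```python
import string

phrase = "don't stop"
print(phrase.title())            # Don'T Stop
print(string.capwords(phrase))   # Don't Stop
```

Note that capwords also collapses runs of whitespace into single spaces, which may or may not be what you want.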

18) Quick reference (methods you’ll use most)

  • Trim/clean: strip, lstrip, rstrip, replace, split, rsplit, splitlines, " ".join(...)
  • Search: in, find, rfind, count, startswith, endswith
  • Case: lower, upper, capitalize, title, swapcase, casefold
  • Test: isalpha, isdigit, isdecimal, isnumeric, isalnum, isspace, isascii
  • Align/pad: zfill, ljust, rjust, center
  • Advanced: translate, maketrans
  • Formatting: f-strings and format specs

FAQs – String Manipulation in Python

What is the difference between find() and index()?

  • find() returns -1 if the substring is not found.

  • index() raises ValueError if the substring is not found.

How do I remove leading and trailing whitespace?

Use the strip() method:

text = " hello "
print(text.strip()) # Output: hello

How do I insert variables into a string?

F-strings allow easy string formatting:

name = "Tejas"
age = 25
print(f"My name is {name} and I am {age} years old.")

How do I reverse a string?

You can reverse using slicing:

text = "Python"
print(text[::-1]) # nohtyP

What’s Next?

In the next post, we’ll cover File Handling in Python.
