How to Count Lines in Python: 7 Methods with Performance Benchmarks
Complete Python guide to counting lines in files, strings, directories, and code with benchmark-backed methods, large-file strategies, and a production-ready CLI.
Counting lines in Python sounds easy until the requirement becomes specific. Do you need every line in a file, only non-empty lines, only lines that match a pattern, or actual lines of code with comments removed? Do you need the answer for one file, a whole directory tree, or a 10GB log that should not be loaded into memory? Those cases all look similar at first, but the right implementation changes quickly.
This guide covers the full range. You will see seven different ways to count file lines, the tradeoffs between readability and raw speed, the safest way to count lines in strings, how to build a practical code line counter, and how to package everything into a reusable CLI. The benchmark table is meant as a directional comparison for the test setup described below, not as a promise that your machine will show the exact same timings.
If you only need a quick answer, start with sum(1 for _ in f) for files and splitlines() for strings. If you are working with GB-scale files or want a wc -l equivalent, skip straight to the large-file and CLI sections. If you do not want to write any Python at all, use the browser-based Line Counter instead.
Quick Answer
Count lines in a file: count = sum(1 for _ in f)
Count lines in a string: count = len(text.splitlines())
Count non-empty lines: count = sum(1 for line in f if line.strip())
No-code option
Paste your text or upload a file to the free Line Counter.
Just need a quick line count without writing code? Paste your text or upload a file to the free online Line Counter for instant results. If you want to convert comma-separated values into one-item-per-line text before counting, use the Text to Lines tool.
Python Line Counting Methods - Comparison and Benchmarks
Not all line-counting methods are equal. Some are easy to read but memory-hungry. Others are much faster on large files because they avoid Python text decoding and count raw newline bytes directly. The table below compares the common approaches using the benchmark profile in this guide: a 2024 MacBook Pro M3, averaged over 10 runs. Treat the timings as directional rather than universal.
Benchmark file sizes used in the comparison:
- Small file: 1MB, about 50,000 lines
- Medium file: 100MB, about 5,000,000 lines
- Large file: 1GB, about 50,000,000 lines
- Extra-large file: 10GB, about 500,000,000 lines
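To reproduce numbers like these on your own machine, a minimal timing harness is enough. The sketch below uses time.perf_counter and averages over several runs; average_time is a name invented for this example, not part of any library.

```python
import time

def average_time(fn, path: str, runs: int = 10) -> float:
    """Average wall-clock seconds of fn(path) over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(path)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Pass it any of the counting functions from this guide, for example average_time(count_lines, "big.log"), and compare the averages rather than single runs, since disk caching makes the first run slower.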
| Method | 1MB | 100MB | 1GB | 10GB | Peak memory | Best use case |
|---|---|---|---|---|---|---|
| readlines() | 8ms | 380ms | 3.8s | OOM | about file size | Small files when you need line content |
| sum(1 for _ in f) | 12ms | 420ms | 4.2s | 42s | about 1MB | Best general-purpose choice |
| read().splitlines() | 9ms | 360ms | 3.6s | OOM | about file size | Need a list of all lines |
| Binary chunk reading | 4ms | 180ms | 1.8s | 18s | about 1MB | Fastest pure Python method |
| mmap | 5ms | 210ms | 2.1s | 21s | about 1MB | Large files plus random access workflows |
| fileinput | 15ms | 480ms | 4.8s | 48s | about 1MB | Multiple files and stdin |
| subprocess wc -l | 2ms | 80ms | 0.8s | 8s | about 1MB | Fastest on Linux and macOS |
Recommended methods by use case
- General-purpose scripting: sum(1 for _ in f)
- Very large files: binary chunk reading
- Linux and macOS automation: subprocess with wc -l
- Many files at once: fileinput or pathlib.Path.rglob()
- Need the actual lines too: readlines() or splitlines() on smaller inputs
How to Count Lines in a File in Python - 7 Methods
Method 1 - readlines() - Simple
If you want the shortest possible way to count lines in a file in Python, readlines() is the most literal answer.
with open("file.txt", "r", encoding="utf-8") as f:
lines = f.readlines()
count = len(lines)
print(f"Line count: {count}")
One-line version (fine in a REPL, but it relies on garbage collection to close the file, so prefer the with block in real code):
count = len(open("file.txt", encoding="utf-8").readlines())
Why it works:
- readlines() loads the full file into a list
- len() returns the list length
- the code is easy to read even for beginners
Use it when:
- the file is small
- you also need the line content
- clarity matters more than scalability
Avoid it when:
- the file may be hundreds of MB or larger
- you are writing a reusable production utility
- peak memory matters
Method 2 - sum(1 for _ in f) - Recommended
For most scripts, this is the best default.
with open("file.txt", "r", encoding="utf-8") as f:
count = sum(1 for _ in f)
print(f"Line count: {count}")
With explicit error handling:
def count_lines(filepath: str) -> int:
try:
with open(
filepath,
"r",
encoding="utf-8",
errors="replace",
) as f:
return sum(1 for _ in f)
except FileNotFoundError as exc:
raise FileNotFoundError(f"File not found: {filepath}") from exc
except PermissionError as exc:
raise PermissionError(f"Permission denied: {filepath}") from exc
This approach is strong because:
- the file object streams one line at a time
- memory usage stays effectively constant
- the intent is obvious
- the result matches Python's notion of logical lines, including a final line without a trailing newline
Why use _ as the variable name?
In Python, _ is a convention for "I do not need this value." The expression sum(1 for _ in f) means "read each line, add one, and do not store the line content."
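You can see the logical-line behavior without touching the filesystem by iterating an io.StringIO, which behaves like an open text file:

```python
import io

# a file-like object whose last line has no trailing newline
buf = io.StringIO("alpha\nbeta\ngamma")
print(sum(1 for _ in buf))  # 3: iteration still counts the final line
```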
Method 3 - read().splitlines() - String-based
This method is convenient when you also want the file as a single string in memory.
with open("file.txt", "r", encoding="utf-8") as f:
count = len(f.read().splitlines())
splitlines() is stronger than split("\n"):
text = "line1\nline2\nline3\n"
print(text.split("\n"))
# ['line1', 'line2', 'line3', '']
print(text.splitlines())
# ['line1', 'line2', 'line3']
It also handles mixed newline styles:
text = "line1\r\nline2\nline3\r"
print(text.splitlines())
# ['line1', 'line2', 'line3']
Use this when you need:
- an in-memory string for later parsing
- robust newline handling
- a clean answer for strings and small files alike
Avoid it when memory is tight.
Method 4 - Binary chunk reading - Fastest
When you need pure Python speed for large files, counting newline bytes in binary chunks is usually the fastest approach.
def count_lines_fast(filepath: str, chunk_size: int = 1 << 20) -> int:
"""
Count lines by reading binary chunks.
- Constant memory usage
- No text decoding overhead
- Handles files without a trailing newline
"""
count = 0
last_byte = b""
with open(filepath, "rb") as f:
for chunk in iter(lambda: f.read(chunk_size), b""):
count += chunk.count(b"\n")
last_byte = chunk[-1:]
if last_byte and last_byte != b"\n":
count += 1
return count
This is fast because:
- binary mode skips text decoding
- bytes.count() is implemented in C
- chunk size keeps memory usage predictable
The tradeoff is semantic: this method counts newline bytes and then adjusts for the last line if the file does not end with \n. That makes it accurate for line counting, but it is no longer identical to wc -l. The wc -l section below explains that distinction in detail.
Method 5 - mmap - Memory-mapped
mmap is useful when the file is large and you also care about random access patterns elsewhere in your workflow.
import mmap
def count_lines_mmap(filepath: str) -> int:
with open(filepath, "rb") as f:
f.seek(0, 2)
size = f.tell()
if size == 0:
return 0
f.seek(0)
with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
return sum(1 for _ in iter(mm.readline, b""))
Why people use it:
- the OS pages data in on demand
- you can mix counting with other random-access operations
- it performs well on large files
Caveats:
- Windows cannot memory-map an empty file, so always guard for zero size
- 32-bit Python environments have practical size limits
- mmap is a power tool, not a default choice
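If you want mmap closer to the speed of binary chunk reading, you can count newline bytes in fixed-size slices of the mapping instead of iterating readline. This is a sketch, not a drop-in standard; count_lines_mmap_fast is a name invented for the example.

```python
import mmap
import os

def count_lines_mmap_fast(filepath: str, chunk_size: int = 1 << 23) -> int:
    """Count newline bytes by scanning the mapping in fixed-size slices."""
    size = os.path.getsize(filepath)
    if size == 0:  # guard: Windows cannot memory-map an empty file
        return 0
    with open(filepath, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        count = 0
        for offset in range(0, size, chunk_size):
            count += mm[offset:offset + chunk_size].count(b"\n")
        if mm[size - 1:size] != b"\n":  # final line without trailing newline
            count += 1
        return count
```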
Method 6 - fileinput module
The fileinput module is tailor-made for line-oriented processing across one or many files.
import fileinput
with fileinput.input(files=["file1.txt", "file2.txt", "file3.txt"]) as f:
count = sum(1 for _ in f)
print(f"Total lines across all files: {count}")
It also works with stdin:
import fileinput
with fileinput.input() as f:
count = sum(1 for _ in f)
print(count)
Per-file counts are easy too:
from collections import defaultdict
import fileinput
file_counts = defaultdict(int)
with fileinput.input(files=["a.txt", "b.txt", "c.txt"]) as f:
for _line in f:
file_counts[f.filename()] += 1
for filename, count in file_counts.items():
print(f"{count:>8,} {filename}")
Use it when:
- your script naturally accepts multiple files
- you want stdin support
- you need access to the current filename while iterating
Method 7 - subprocess wc -l - System call
On Unix-like systems, wc -l is hard to beat for raw speed.
import subprocess
import sys
def count_lines_wc(filepath: str) -> int:
if sys.platform == "win32":
raise OSError("wc -l is not available on Windows")
result = subprocess.run(
["wc", "-l", filepath],
capture_output=True,
text=True,
check=True,
)
return int(result.stdout.strip().split()[0])
Safe fallback wrapper:
def count_lines_cross_platform(filepath: str) -> int:
if sys.platform != "win32":
try:
return count_lines_wc(filepath)
except (subprocess.SubprocessError, OSError, ValueError):
pass
return count_lines_fast(filepath)
Important note: wc -l counts newline characters, not logical lines. If the last line does not end with \n, wc -l will be one lower than Python line iteration. That is not a bug. It is a different definition.
How to Count Lines in a String in Python
Sometimes the data is already in memory. In that case, the best answer is usually splitlines(), not file iteration and not a raw newline counter.
The safest answer is splitlines()
def count_lines_in_string(text: str) -> int:
return len(text.splitlines())
assert count_lines_in_string("a\nb\nc") == 3
assert count_lines_in_string("a\nb\nc\n") == 3
assert count_lines_in_string("") == 0
assert count_lines_in_string("\n") == 1
count("\n") + 1 versus splitlines()
text.count("\n") + 1 is common, but it breaks on empty strings and trailing newlines.
| Method | "a\nb\nc\n" | "a\nb\nc" | "" |
|---|---|---|---|
| text.count("\n") + 1 | 4 | 3 | 1 |
| len(text.split("\n")) | 4 | 3 | 1 |
| len(text.splitlines()) | 3 | 3 | 0 |
If you specifically want the number of newline characters, use text.count("\n"). If you want the number of lines a human would see, use splitlines().
Multiline strings and triple-quoted text
Triple-quoted strings often include leading and trailing blank lines that are easy to forget.
text = """
First line
Second line
Third line
"""
print(len(text.splitlines()))
# 4
print(len(text.strip().splitlines()))
# 3
print(sum(1 for line in text.splitlines() if line.strip()))
# 3
Mixed newline styles
splitlines() is robust across newline formats.
mixed = "line1\nline2\r\nline3\rline4"
print(len(mixed.splitlines()))
# 4
It handles \n, \r\n, \r, and several Unicode line separators without extra logic.
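A quick demonstration of those Unicode boundaries, and of why split("\n") misses them entirely:

```python
# splitlines recognizes Unicode separators such as LINE SEPARATOR (\u2028),
# PARAGRAPH SEPARATOR (\u2029), and NEL (\x85)
text = "line1\u2028line2\u2029line3\x85line4"
print(len(text.splitlines()))  # 4
print(len(text.split("\n")))   # 1: split("\n") sees no separator at all
```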
How to Count Lines of Code in Python
Counting lines of code is not the same as counting all lines. In code metrics, you usually want to exclude blank lines and comment-only lines. Sometimes you also want to treat docstrings as comments. The right rule depends on why you are measuring.
Basic LOC counting
This version counts total lines, code lines, blank lines, and comment lines.
def count_loc(filepath: str) -> dict[str, int]:
total = 0
code = 0
blank = 0
comment = 0
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
for line in f:
total += 1
stripped = line.strip()
if not stripped:
blank += 1
elif stripped.startswith("#"):
comment += 1
else:
code += 1
return {
"total": total,
"code": code,
"blank": blank,
"comment": comment,
}
Use this for:
- simple project reports
- single-language scripts
- quick CI summaries
Handling docstrings and multiline strings
If you want a deeper Python LOC counter, you need to decide how multiline strings should count.
def count_loc_advanced(filepath: str) -> dict[str, int | float]:
total = 0
code = 0
blank = 0
comment = 0
in_multiline_string = False
multiline_delimiter = ""
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
for line in f:
total += 1
stripped = line.strip()
if not stripped:
blank += 1
continue
if in_multiline_string:
comment += 1
if multiline_delimiter in stripped:
in_multiline_string = False
continue
if stripped.startswith("#"):
comment += 1
continue
started_multiline = False
for quote in ('"""', "'''"):
if stripped.startswith(quote):
if stripped.count(quote) == 1:
in_multiline_string = True
multiline_delimiter = quote
comment += 1
started_multiline = True
else:
code += 1
started_multiline = True
break
if not started_multiline:
code += 1
return {
"total": total,
"code": code,
"blank": blank,
"comment": comment,
"comment_ratio": round(comment / total * 100, 1) if total else 0.0,
}
This still is not a perfect parser. It is a pragmatic heuristic. If you need precise multi-language code metrics, use a dedicated tool.
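For Python source specifically, the standard library's tokenize module can classify lines exactly without a third-party tool. The sketch below (count_loc_tokenize is a name invented here) marks every line touched by a real token as code; note that it treats docstrings as code, because to the tokenizer they are ordinary STRING tokens.

```python
import tokenize
from io import BytesIO

def count_loc_tokenize(source: bytes) -> dict[str, int]:
    """Classify lines using Python's own tokenizer (docstrings count as code)."""
    comment_lines: set[int] = set()
    code_lines: set[int] = set()
    skip = {tokenize.NL, tokenize.NEWLINE, tokenize.ENCODING,
            tokenize.ENDMARKER, tokenize.INDENT, tokenize.DEDENT}
    for tok in tokenize.tokenize(BytesIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])
        elif tok.type not in skip:
            # multi-line tokens (e.g. triple-quoted strings) span several lines
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    total = len(source.splitlines())
    return {
        "total": total,
        "code": len(code_lines),
        "comment": len(comment_lines - code_lines),
        "blank": total - len(code_lines | comment_lines),
    }
```

The tradeoff is scope: this only works for syntactically valid Python, whereas the heuristic above tolerates any text file.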
Multi-language comment rules
If your directory contains more than Python, centralize comment styles.
from pathlib import Path
COMMENT_STYLES = {
".py": {"single": "#", "multi": ('"""', "'''")},
".js": {"single": "//", "multi": ("/*", "*/")},
".ts": {"single": "//", "multi": ("/*", "*/")},
".java": {"single": "//", "multi": ("/*", "*/")},
".c": {"single": "//", "multi": ("/*", "*/")},
".cpp": {"single": "//", "multi": ("/*", "*/")},
".go": {"single": "//", "multi": ("/*", "*/")},
".rb": {"single": "#", "multi": ("=begin", "=end")},
".sh": {"single": "#", "multi": None},
".sql": {"single": "--", "multi": ("/*", "*/")},
}
def get_comment_style(filepath: str) -> dict[str, object]:
ext = Path(filepath).suffix.lower()
return COMMENT_STYLES.get(ext, {"single": "#", "multi": None})
When to use cloc instead
If this is for real reporting, prefer cloc.
import json
import subprocess
def count_loc_cloc(path: str) -> dict[str, int]:
result = subprocess.run(
["cloc", "--json", path],
capture_output=True,
text=True,
check=True,
)
data = json.loads(result.stdout)
return data.get("SUM", {})
Use cloc when accuracy matters
- Multi-language repositories
- Formal engineering metrics or reporting
- Docstrings, block comments, and generated files need consistent handling
Count Lines in Multiple Files and Directories
Count a fixed list of files
def count_lines_multiple(filepaths: list[str]) -> dict[str, int | str]:
results: dict[str, int | str] = {}
total = 0
for filepath in filepaths:
try:
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
count = sum(1 for _ in f)
results[filepath] = count
total += count
except (FileNotFoundError, PermissionError, OSError) as exc:
results[filepath] = f"Error: {exc}"
results["__total__"] = total
return results
Recursively count files in a directory
from pathlib import Path
def count_lines_directory(
directory: str,
pattern: str = "*.py",
recursive: bool = True,
exclude_dirs: list[str] | None = None,
) -> dict[str, object]:
exclude_dirs = exclude_dirs or [
".git",
"__pycache__",
"node_modules",
".venv",
"venv",
"dist",
"build",
]
root = Path(directory)
glob_fn = root.rglob if recursive else root.glob
files: dict[str, int | str] = {}
total_lines = 0
total_files = 0
errors = 0
for filepath in sorted(glob_fn(pattern)):
if any(part in exclude_dirs for part in filepath.parts):
continue
if not filepath.is_file():
continue
try:
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
count = sum(1 for _ in f)
files[str(filepath)] = count
total_lines += count
total_files += 1
except OSError as exc:
files[str(filepath)] = f"Error: {exc}"
errors += 1
return {
"files": files,
"summary": {
"total_files": total_files,
"total_lines": total_lines,
"errors": errors,
"average_lines": round(total_lines / total_files, 1) if total_files else 0,
},
}
This pattern is ideal for counting lines across multiple files or measuring lines of code in a whole directory. It is also a good base for CI checks.
Parallel counting for large file sets
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
def count_file_lines(filepath: Path) -> tuple[str, int]:
with filepath.open("r", encoding="utf-8", errors="replace") as f:
return str(filepath), sum(1 for _ in f)
def count_lines_parallel(
directory: str,
pattern: str = "*.py",
max_workers: int = 8,
) -> dict[str, int]:
root = Path(directory)
filepaths = [path for path in root.rglob(pattern) if path.is_file()]
results: dict[str, int] = {}
with ThreadPoolExecutor(max_workers=max_workers) as executor:
for filepath, count in executor.map(count_file_lines, filepaths):
results[filepath] = count
return results
Because line counting is I/O-heavy, threads work well here. If your bottleneck becomes parsing rather than file reads, revisit the strategy.
Counting Lines in Large Files (GB-Scale) in Python
Large-file workflows break the naive methods first. readlines() and read().splitlines() both scale memory usage with file size, which is exactly what you do not want on log archives, data exports, or multi-GB dumps.
A solid default for GB-scale files
import os
def count_lines_optimal(filepath: str, chunk_size: int = 1 << 23) -> int:
"""
8MB binary chunk counter.
- Constant memory
- Fast on SSDs
- Counts the final line if no trailing newline exists
"""
if os.path.getsize(filepath) == 0:
return 0
count = 0
last_byte = b""
with open(filepath, "rb") as f:
for chunk in iter(lambda: f.read(chunk_size), b""):
count += chunk.count(b"\n")
last_byte = chunk[-1:]
if last_byte != b"\n":
count += 1
return count
On fast local storage, this pattern is usually the best cross-platform way to count lines in a large file quickly.
Progress reporting for long-running counts
import os
def count_lines_with_progress(filepath: str, chunk_size: int = 1 << 23) -> int:
file_size = os.path.getsize(filepath)
processed = 0
count = 0
last_byte = b""
with open(filepath, "rb") as f:
for chunk in iter(lambda: f.read(chunk_size), b""):
count += chunk.count(b"\n")
processed += len(chunk)
last_byte = chunk[-1:]
progress = processed / file_size * 100 if file_size else 100
print(
f"\rProgress: {progress:5.1f}% "
f"({processed / 1e9:.2f}GB / {file_size / 1e9:.2f}GB)",
end="",
flush=True,
)
if last_byte and last_byte != b"\n":
count += 1
print()
return count
If your team already uses tqdm, wrap the same chunk loop in a progress bar instead of printing manually.
Multi-core counting for extreme cases
import multiprocessing as mp
import os
from functools import partial
def count_chunk(filepath: str, start: int, size: int) -> int:
with open(filepath, "rb") as f:
f.seek(start)
return f.read(size).count(b"\n")
def count_lines_multicore(filepath: str, num_workers: int | None = None) -> int:
if num_workers is None:
num_workers = mp.cpu_count()
file_size = os.path.getsize(filepath)
if file_size == 0:
return 0
chunk_size = file_size // num_workers
chunks = [
(
i * chunk_size,
chunk_size if i < num_workers - 1 else file_size - i * chunk_size,
)
for i in range(num_workers)
]
worker = partial(count_chunk, filepath)
with mp.Pool(num_workers) as pool:
counts = pool.starmap(worker, chunks)
total = sum(counts)
with open(filepath, "rb") as f:
f.seek(-1, os.SEEK_END)
if f.read(1) != b"\n":
total += 1
return total
Multi-core counting only pays off when:
- the file is extremely large
- storage is fast enough to keep multiple workers busy
- CPU overhead is worth the operational complexity
For many real systems, wc -l or single-process binary chunk reading is the better engineering tradeoff.
Large-file benchmark summary
| Method | Time on 10GB | CPU cores | Peak memory |
|---|---|---|---|
| sum(1 for _ in f) | 42s | 1 | about 1MB |
| Binary chunks (8MB) | 18s | 1 | about 8MB |
| mmap | 21s | 1 | about 1MB |
| subprocess wc -l | 8s | 1 | about 1MB |
| Multicore | 5s | 8 | about 8MB |
In practice:
- Linux and macOS: wc -l is usually the fastest simple option
- Cross-platform Python: binary chunks are the best default
- Peak performance with engineering overhead: multicore counting
Count Lines by Condition in Python
Count non-empty lines
with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
non_empty = sum(1 for line in f if line.strip())
with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
blank = sum(1 for line in f if not line.strip())
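The two snippets above read the file twice. For anything large, you can gather both numbers in a single pass; count_line_types is a name made up for this sketch.

```python
def count_line_types(filepath: str) -> tuple[int, int]:
    """Return (non_empty, blank) line counts in a single pass."""
    non_empty = blank = 0
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.strip():
                non_empty += 1
            else:
                blank += 1
    return non_empty, blank
```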
Count lines containing a specific word
def count_lines_containing(
filepath: str,
keyword: str,
case_sensitive: bool = True,
) -> int:
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
if case_sensitive:
return sum(1 for line in f if keyword in line)
keyword_lower = keyword.lower()
return sum(1 for line in f if keyword_lower in line.lower())
Regex version:
import re
def count_lines_matching(filepath: str, pattern: str, flags: int = 0) -> int:
regex = re.compile(pattern, flags)
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
return sum(1 for line in f if regex.search(line))
Count by line length
def count_lines_by_length(
    filepath: str,
    min_len: int = 0,
    max_len: int | None = None,
) -> int:
    count = 0
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            length = len(line.rstrip("\n"))
            if min_len <= length and (max_len is None or length <= max_len):
                count += 1
    return count
Count CSV rows matching a condition
import csv
def count_csv_rows_by_condition(
filepath: str,
column: str,
value: str,
has_header: bool = True,
) -> int:
count = 0
with open(filepath, "r", encoding="utf-8", newline="") as f:
reader = csv.DictReader(f) if has_header else csv.reader(f)
for row in reader:
if has_header:
if row.get(column) == value:
count += 1
else:
column_index = int(column)
if len(row) > column_index and row[column_index] == value:
count += 1
return count
For very large CSV files with column logic, a chunked pandas.read_csv() workflow can be more maintainable than hand-written loops.
Build a Python Line Counter CLI Tool
If you want a reusable python line counter utility, package the common patterns into one CLI. The script below behaves like a friendly cross-platform wc -l with extra controls.
Complete CLI script
#!/usr/bin/env python3
"""
linecounter.py
Examples:
python linecounter.py file.txt
python linecounter.py *.py
python linecounter.py -r --pattern "*.py" ./src
python linecounter.py --no-blank --details file.py
cat file.txt | python linecounter.py -
"""
from __future__ import annotations
import argparse
import sys
from pathlib import Path
def count_lines_file(
filepath: str,
count_blank: bool = True,
count_comment: bool = True,
comment_char: str = "#",
) -> dict[str, int] | dict[str, str]:
total = blank = comment = code = 0
try:
with open(filepath, "r", encoding="utf-8", errors="replace") as f:
for line in f:
total += 1
stripped = line.strip()
if not stripped:
blank += 1
elif stripped.startswith(comment_char):
comment += 1
else:
code += 1
except OSError as exc:
return {"error": str(exc)}
effective = total
if not count_blank:
effective -= blank
if not count_comment:
effective -= comment
return {
"total": total,
"code": code,
"blank": blank,
"comment": comment,
"effective": effective,
}
def count_lines_stdin() -> int:
return sum(1 for _ in sys.stdin)
def format_number(value: int) -> str:
return f"{value:,}"
def print_results(
results: dict[str, dict[str, int] | dict[str, str]],
show_details: bool = False,
sort_by: str = "name",
) -> None:
files = {key: value for key, value in results.items() if key != "__summary__"}
summary = results.get("__summary__", {})
if not files:
print("No files found.")
return
if sort_by == "lines":
sorted_files = sorted(
files.items(),
key=lambda item: item[1].get("effective", 0) if isinstance(item[1], dict) else 0,
reverse=True,
)
else:
sorted_files = sorted(files.items())
max_path_len = min(max(len(path) for path in files), 60)
if show_details:
print(
f"{'File':<{max_path_len}} "
f"{'Total':>8} {'Code':>8} {'Blank':>8} {'Comment':>8}"
)
print("-" * (max_path_len + 36))
else:
print(f"{'Lines':>10} File")
print("-" * (max_path_len + 12))
for filepath, data in sorted_files:
display_path = filepath if len(filepath) <= max_path_len else "..." + filepath[-(max_path_len - 3):]
if "error" in data:
print(f"{'ERROR':>10} {display_path} ({data['error']})")
continue
if show_details:
print(
f"{display_path:<{max_path_len}} "
f"{format_number(int(data['total'])):>8} "
f"{format_number(int(data['code'])):>8} "
f"{format_number(int(data['blank'])):>8} "
f"{format_number(int(data['comment'])):>8}"
)
else:
print(f"{format_number(int(data['effective'])):>10} {display_path}")
if len(files) > 1 and isinstance(summary, dict) and summary:
print("-" * (max_path_len + 12))
if show_details:
print(
f"{'TOTAL':<{max_path_len}} "
f"{format_number(int(summary['total'])):>8} "
f"{format_number(int(summary['code'])):>8} "
f"{format_number(int(summary['blank'])):>8} "
f"{format_number(int(summary['comment'])):>8}"
)
else:
print(
f"{format_number(int(summary['effective'])):>10} "
f"total ({summary['file_count']} files)"
)
def iter_input_files(
args: argparse.Namespace,
) -> list[Path]:
discovered: list[Path] = []
for file_arg in args.files:
path = Path(file_arg)
if path.is_dir() and args.recursive:
for candidate in sorted(path.rglob(args.pattern)):
if candidate.is_file() and not any(excluded in candidate.parts for excluded in args.exclude):
discovered.append(candidate)
elif path.is_file():
discovered.append(path)
return discovered
def main() -> None:
parser = argparse.ArgumentParser(description="Count lines in files")
parser.add_argument("files", nargs="*", help="Files to count, or - for stdin")
parser.add_argument("-r", "--recursive", action="store_true", help="Recursively scan directories")
parser.add_argument("-p", "--pattern", default="*", help="File pattern for recursive mode")
parser.add_argument("--no-blank", action="store_true", help="Exclude blank lines")
parser.add_argument("--no-comment", action="store_true", help="Exclude comment lines")
parser.add_argument("--details", action="store_true", help="Show total, code, blank, and comment columns")
parser.add_argument("--sort", choices=["name", "lines"], default="name", help="Sort output by name or line count")
parser.add_argument(
"--exclude",
nargs="+",
default=[".git", "__pycache__", "node_modules", ".venv"],
help="Directories to skip in recursive mode",
)
args = parser.parse_args()
if not args.files or args.files == ["-"]:
print(count_lines_stdin())
return
results: dict[str, dict[str, int] | dict[str, str]] = {}
summary = {
"total": 0,
"code": 0,
"blank": 0,
"comment": 0,
"effective": 0,
"file_count": 0,
}
for filepath in iter_input_files(args):
data = count_lines_file(
str(filepath),
count_blank=not args.no_blank,
count_comment=not args.no_comment,
)
results[str(filepath)] = data
if "error" in data:
continue
for key in ("total", "code", "blank", "comment", "effective"):
summary[key] += int(data[key])
summary["file_count"] += 1
results["__summary__"] = summary
print_results(results, show_details=args.details, sort_by=args.sort)
if __name__ == "__main__":
main()
This is enough for:
- single-file counts
- recursive directory scans
- stdin pipelines
- excluding blank lines and comments
- sorted output for reports
Edge Cases and Common Issues
Empty files and empty strings
These cases create off-by-one bugs when you use newline counting instead of line counting.
assert len("".splitlines()) == 0
assert "".count("\n") == 0
If you use text.count("\n") + 1, the empty string incorrectly becomes 1.
Files without a trailing newline
This is the biggest source of disagreement between Python and wc -l.
text = "alpha\nbeta\ngamma"
print(text.count("\n"))
# 2
print(len(text.splitlines()))
# 3
A file can contain three logical lines and only two newline characters. Python iteration counts logical lines. wc -l counts newline characters.
Encodings, CRLF, and Unicode
If you open text files, always make the encoding decision explicit when the source is not tightly controlled.
with open("data.txt", "r", encoding="utf-8", errors="replace") as f:
count = sum(1 for _ in f)
Use text mode when:
- you care about Python's logical line semantics
- newline normalization should be handled for you
- the input is mostly valid text
Use binary mode when:
- raw speed matters
- you are counting newline bytes
- decoding is unnecessary overhead
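The difference between the two modes is easy to see on CRLF data that lacks a trailing newline:

```python
data = b"one\r\ntwo\r\nthree"
# binary mode: raw newline bytes only
print(data.count(b"\n"))                       # 2
# text semantics: decode, then count logical lines
print(len(data.decode("utf-8").splitlines()))  # 3
```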
CSV files can hide multiline fields
If a CSV column contains embedded newlines inside quoted values, raw line counting and row counting are no longer the same problem. Use the csv module or pandas if you need actual record counts.
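A small example of the divergence, using an in-memory buffer in place of a real file:

```python
import csv
import io

raw = 'id,note\n1,"first line\nsecond line"\n2,plain\n'
print(sum(1 for _ in io.StringIO(raw)))              # 4 raw lines
print(sum(1 for _ in csv.reader(io.StringIO(raw))))  # 3 records, header included
```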
Recursive scans need exclusions
Without exclusions, your totals will be polluted by:
- .git
- node_modules
- .venv
- build output
- caches
- generated artifacts
That is why the directory examples always carry an exclude list.
Python vs wc -l: Why Results Differ
This confusion is common enough that it deserves its own section.
The core difference
- wc -l counts newline characters
- Python file iteration counts logical lines
- splitlines() counts logical lines in strings
Example:
text = "line1\nline2\nline3"
print(text.count("\n"))
# 2
print(len(text.splitlines()))
# 3
If the same content is saved without a trailing newline, wc -l prints 2, while Python iteration over the file returns 3.
Comparison table
| Input | wc -l | sum(1 for _ in f) | len(text.splitlines()) |
|---|---|---|---|
| "" | 0 | 0 | 0 |
| "a\n" | 1 | 1 | 1 |
| "a\nb\n" | 2 | 2 | 2 |
| "a\nb" | 1 | 2 | 2 |
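The disagreement in the last row can be reproduced directly, using io.StringIO to stand in for a file:

```python
import io

data = "a\nb"                             # no trailing newline
print(data.count("\n"))                   # 1, what wc -l reports
print(len(data.splitlines()))             # 2 logical lines
print(sum(1 for _ in io.StringIO(data)))  # 2, matching file iteration
```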
Which definition should you use?
Use wc -l when:
- you want Unix-compatible behavior
- you are matching shell scripts
- raw speed matters on macOS or Linux
Use Python logical line counts when:
- users think in visible lines, not newline bytes
- strings and files should behave consistently
- missing trailing newlines should not undercount the last line
If you are also working across shell environments, see the related guide on how to count lines in a file on Linux, Mac, and Windows.
Count Lines Without Writing Code
Sometimes the fastest solution is not another Python script. If you already copied data out of a terminal, editor, spreadsheet, or CSV export, a browser tool is simpler.
| Scenario | Best option |
|---|---|
| Script, CI, or automation | Python |
| One-off pasted text | Online Line Counter |
| Uploading a CSV or text file | Online Line Counter |
| Converting delimited data before counting | Text to Lines tool |
| Counting code files in a repository | Python or cloc |
Free Online Line Counter
Paste text, upload a file, or inspect copied output instantly. It is useful when you need a quick answer without opening Python, creating a virtual environment, or explaining a script to a non-technical teammate.
- Works with pasted terminal output and plain text
- Useful for CSV and TXT uploads
- No setup, no imports, no shell commands
Open the free Line Counter, or convert delimited text first with Text to Lines.
Which Python Line Counting Method Should You Use?
Use this decision tree when you need the answer fast:
Need to count lines in Python
|
+- Counting a normal text file?
| +- Need the safest default? -> sum(1 for _ in f)
| +- Need the full line list too? -> readlines() or read().splitlines()
| +- Need raw speed on huge files? -> binary chunk reading
| +- On macOS/Linux and want Unix behavior? -> subprocess wc -l
|
+- Counting a string?
| +- Want visible lines? -> len(text.splitlines())
| +- Want raw newline characters? -> text.count("\n")
|
+- Counting only non-empty or matching lines?
| +- Add filters inside the generator expression
|
+- Counting lines of code?
| +- Quick heuristic? -> custom Python loop
| +- Accurate multi-language metrics? -> cloc
|
+- Counting many files or whole directories?
| +- pathlib.Path.rglob()
| +- fileinput for stdin or multiple file arguments
|
+- Do not want to write code?
+- Use linecounter.org
Python Line Count Quick Reference
Frequently Asked Questions
What is the best way to count lines in a file in Python?
For most scripts, use with open(path, 'r', encoding='utf-8', errors='replace') as f: count = sum(1 for _ in f). It is accurate, readable, and memory-efficient even for large files.
How do I count non-empty lines in Python?
Filter the iterator with line.strip(), for example: sum(1 for line in f if line.strip()). That excludes blank lines and lines that contain only whitespace.
What is the fastest Python method for counting lines in a huge file?
For a pure Python solution, binary chunk reading is usually the fastest choice. On macOS and Linux, subprocess wc -l is often faster still because the heavy work runs in optimized system code.
How do I count lines in a Python string?
Use len(text.splitlines()) for the most consistent answer. It handles empty strings, trailing newlines, and mixed newline styles better than text.count('\n') + 1.
Why does wc -l give a different result from Python?
wc -l counts newline characters, not logical lines. If a file does not end with a trailing newline, wc -l reports one fewer than Python line iteration or splitlines().
How do I count lines of code in Python without comments and blanks?
Loop through the file, ignore lines where stripped text is empty, and skip stripped.startswith('#'). For production-grade code metrics across many languages, use cloc instead of a hand-rolled parser.
Can Python count lines without reading the entire file into memory?
Yes. Iterating over the file object, using fileinput, binary chunk reading, and mmap-based approaches all avoid loading the full file into RAM at once.
How do I count lines in multiple files or a whole directory?
Use pathlib.Path.rglob() to walk matching files recursively, then sum the count for each file. Add an exclude list for folders such as .git, __pycache__, node_modules, and virtual environments.
How do I count lines in CSV files in Python?
Use the csv module when you need row-aware counting, filtering, or header handling. For plain raw line counts, streaming the file directly is faster, but it does not understand quoted multiline CSV fields.
Is readlines() bad for line counting?
It is fine for small files and when you also need the line content as a list. It becomes a bad default once files get large because it allocates memory roughly proportional to file size.
How do I build a wc -l equivalent in Python?
Use argparse for CLI parsing, read from files or stdin, and stream line by line with sum(1 for _ in stream). Add flags for recursive scanning, excluding blank lines, comment detection, and sorted output.
Can I count lines without writing Python code?
Yes. If you just need a quick result, paste text or upload a file to the free online Line Counter on linecounter.org instead of writing a script.
Related Guides
How to Count Lines in a File (Linux/Mac/Windows)
Compare the fastest ways to count lines in a file on Linux, macOS, and Windows using built-in tools and scripts.
How to Count Lines in Excel: Every Method Explained
Complete guide to counting rows and lines in Excel with ROWS, COUNTA, COUNTIF, COUNTIFS, SUBTOTAL, VBA, Power Query, and dynamic arrays.