Python Deep Dive
How to Count Lines in Python: 7 Methods, Benchmarked and Battle-Tested
Count lines in Python strings, text files, large files, and directories. Includes real performance benchmarks, empty file handling, splitlines vs split, and production-ready functions.
The most upvoted Stack Overflow answer for "count lines in Python" looks clean until you try an empty file.
def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1
Run that on an empty file and Python raises:
UnboundLocalError: local variable 'i' referenced before assignment
Python 3.11 and later word the message as "cannot access local variable 'i' where it is not associated with a value", but the exception is the same.
That bug matters because the same mistake recurs in file-counting, string-counting, and CSV-counting code. This guide fixes that bug, covers six more edge cases, and tests the performance assumptions that usually go unchecked.
If you only need a quick answer, start with len(text.splitlines()) for strings and sum(1 for _ in f) for files. If you need to count lines in large files, skip to the benchmark section. If you do not want to write code at all, the browser-based Line Counter handles pasted text and uploads instantly.
The rest of the guide keeps the string, file, CSV, and project cases separate, so you can count lines without debugging edge cases yourself.
30-Second Cheat Sheet
Copy this first
This guide covers seven real line-counting scenarios for Python strings and files:
- Strings already in memory.
- Small text files.
- Large files that should not be read all at once.
- Lines matching a substring or regex.
- CSV and pandas workflows.
- Recursive project counts.
- A production-ready helper that wraps the common cases.
Method 1: Count Lines in a Python String
Use this when
Your text is already in memory and you want the most accurate answer with the least surprise.
Avoid this when
You only need newline bytes, not logical lines. In that case count newline characters directly.
The best default for counting lines in a Python string is splitlines(). For text copied from Windows, it is also safer than hand-splitting on "\n".
text = "line1\nline2\nline3"
empty = ""
trailing = "line1\nline2\n"
mixed = "line1\r\nline2\rline3"
len(text.splitlines()) # 3
len(empty.splitlines()) # 0
len(trailing.splitlines()) # 2
len(mixed.splitlines()) # 3
That is why splitlines() is the method to remember. It handles \n, \r\n, and \r together, and it does not invent an extra line for a trailing newline.
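One caveat is worth checking before relying on this for arbitrary input: str.splitlines() also breaks on several less common boundaries, including vertical tab, form feed, and the Unicode line and paragraph separators. The sketch below shows the difference and one possible way to restrict counting to \n, \r\n, and \r. The helper name count_lf_style_lines is hypothetical and not part of the standard library.

```python
import re

# str.splitlines() treats more than \n, \r\n and \r as line boundaries.
sample = "a\x0cb\u2028c"          # form feed and Unicode LINE SEPARATOR
print(len(sample.splitlines()))   # 3

# Hypothetical helper: count lines using only \r\n, \n and \r as terminators.
_NEWLINE = re.compile(r"\r\n|\n|\r")

def count_lf_style_lines(text: str) -> int:
    if not text:
        return 0
    breaks = len(_NEWLINE.findall(text))
    # A final line without a terminator still counts as a line.
    ends_with_break = text.endswith(("\n", "\r"))
    return breaks if ends_with_break else breaks + 1

print(count_lf_style_lines(sample))                   # 1
print(count_lf_style_lines("line1\r\nline2\rline3"))  # 3
```

For ordinary text and log data the two approaches agree; the difference only shows up when the input contains those extra separator characters.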
By contrast, split('\n') is only safe if you want literal LF-based splitting.
len("".split("\n")) # 1, not 0
len("a\nb\n".split("\n")) # 3, not 2
len("a\r\nb".split("\n")) # 2, but the \r stays attached
The most common confusion in string line-counting code is that split('\n') answers "how many pieces if I split on LF?" rather than "how many logical lines are there?" Those are not the same question.
count('\n') is different again.
text.count("\n") # counts newline characters only
text.count("\n") + 1 # not safe for empty strings or trailing newlines
Use count('\n') only when you explicitly want newline characters, not line count. It is the fastest of the three string options, but it is the least semantic.
| Input | split('\n') | count('\n') + 1 | splitlines() |
|---|---|---|---|
| "" | 1 ❌ | 1 ❌ | 0 ✅ |
| "hello" | 1 ✅ | 1 ✅ | 1 ✅ |
| "a\nb" | 2 ✅ | 2 ✅ | 2 ✅ |
| "a\nb\n" | 3 ❌ | 3 ❌ | 2 ✅ |
| "a\r\nb" | 2 ❌ | 2 ✅ | 2 ✅ |
| "\n\n" | 3 ❌ | 3 ❌ | 2 ✅ |
If you only need raw newline counts, text.count("\n") is extremely fast. If you need a line count that matches what users expect, splitlines() is the right default.
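The table can be reproduced with a few lines, which is a handy check when you add your own edge cases. This is a minimal sketch; compare is a hypothetical helper name used only for illustration.

```python
cases = ["", "hello", "a\nb", "a\nb\n", "a\r\nb", "\n\n"]

def compare(text: str) -> tuple[int, int, int]:
    # (pieces from split, newline count + 1, logical lines from splitlines)
    return (len(text.split("\n")), text.count("\n") + 1, len(text.splitlines()))

for text in cases:
    print(repr(text), compare(text))
```

Each printed tuple lines up with one row of the table, in the same column order.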
Method 2: Count Lines in a Small Text File
Use this when
The file is ordinary text and small enough that readability matters more than shaving a few milliseconds.
Avoid this when
The file may be hundreds of MB or larger. Use the large-file method instead.
For most file line-counting cases, this is the right default. It stays readable and is safe on empty files:
def count_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)
It is memory-friendly, safe on empty files, and concise enough to keep in production code.
An explicit loop is equivalent:
def count_lines_explicit(filepath: str) -> int:
    count = 0
    with open(filepath, encoding="utf-8") as f:
        for _ in f:
            count += 1
    return count
Avoid readlines() as a default:
def count_lines_bad(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return len(f.readlines())
That works on small files, but it reads the whole file into memory and stops being a good idea as soon as the file grows.
Empty-file safety is the main reason the generator pattern beats the old enumerate Stack Overflow answer. That answer crashes because the variable i is never assigned when the file is empty.
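If you prefer to keep the enumerate shape, one possible repair is to initialise the counter before the loop. This is a minimal sketch, and file_len_safe is a hypothetical name used here for illustration:

```python
def file_len_safe(filename: str) -> int:
    count = 0  # assigned before the loop, so an empty file returns 0
    with open(filename, encoding="utf-8") as f:
        # start=1 makes the loop variable equal to the running line count
        for count, _ in enumerate(f, start=1):
            pass
    return count
```

It behaves like sum(1 for _ in f); the generator version is simply shorter.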
Method 3: Count Lines in Large Files Efficiently
Use this when
The file is too large to load comfortably, or you want the fastest practical way to count lines in a large file.
Choose between
Readable iteration, binary chunk scanning, mmap, or a Unix wc -l subprocess.
3A. Generator iteration
def count_lines_iter(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)
This is still a solid choice for files up to a few hundred MB when you want a plain Python solution.
3B. Binary chunk scanning
from pathlib import Path
def count_lines_fast(filepath: str) -> int:
    path = Path(filepath)
    size = path.stat().st_size
    if size == 0:
        return 0
    with path.open("rb") as f:
        count = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
        f.seek(-1, 2)
        if f.read(1) != b"\n":
            count += 1
    return count
This is the fastest pure-Python method in most local tests because it counts newline bytes directly and avoids decoding every line into text first. For large files, it is the method worth benchmarking first.
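Because the chunk scan only looks for LF bytes, it can disagree with text-mode iteration on files that use a bare \r as the line terminator. The sketch below illustrates that difference with a temporary demo file. The function names are hypothetical, and count_newline_bytes is a variant that remembers the last byte instead of seeking back to it.

```python
import os
import tempfile

def count_newline_bytes(filepath: str, chunk_size: int = 1 << 20) -> int:
    # Count LF bytes chunk by chunk and remember the last byte seen.
    count = 0
    last = b""
    with open(filepath, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # A non-empty file that does not end in LF has one more line.
    if last and last != b"\n":
        count += 1
    return count

def count_text_lines(filepath: str) -> int:
    # Text mode applies universal newlines, so \r alone also ends a line.
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)

# Demo: a file that uses bare \r as its line terminator.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\rtwo\rthree\r")
try:
    print(count_newline_bytes(tmp.name))  # 1
    print(count_text_lines(tmp.name))     # 3
finally:
    os.remove(tmp.name)
```

For files with \n or \r\n endings the two functions agree, so this only matters if legacy \r-only files are a realistic input.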
3C. mmap
import mmap
def count_lines_mmap(filepath: str) -> int:
    with open(filepath, "rb") as f:
        if f.seek(0, 2) == 0:
            return 0
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            count = 0
            while mm.readline():
                count += 1
            return count
        finally:
            mm.close()
In practice, mmap is more useful when you also need random access or repeated scans. For pure line counting, binary chunk scanning is usually simpler and faster. mmap works on Windows too, but empty files need explicit handling on every platform because a zero-length file cannot be mapped, and some antivirus tools can affect timing.
3D. Unix wc -l equivalent
import subprocess
def count_lines_wc(filepath: str) -> int:
    result = subprocess.run(["wc", "-l", filepath], capture_output=True, text=True, check=True)
    return int(result.stdout.split()[0])
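This is the closest Python equivalent of wc -l on Linux and macOS. It is fast because the heavy work happens in optimized system code, but it is not portable the way a pure Python function is. Keep in mind that wc -l counts newline characters, so a final line without a trailing newline is not included in the total.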
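If the same script has to run where wc may not be installed, one option is to probe for it and fall back to Python. This is a sketch under that assumption; count_newlines_wc_or_python is a hypothetical name, and the fallback deliberately keeps the wc -l semantics of counting newline characters only.

```python
import shutil
import subprocess

def count_newlines_wc_or_python(filepath: str) -> int:
    """Count newline characters with wc -l when available, else in Python."""
    wc = shutil.which("wc")
    if wc is not None:
        result = subprocess.run(
            [wc, "-l", filepath], capture_output=True, text=True, check=True
        )
        return int(result.stdout.split()[0])
    # Fallback with the same semantics as wc -l: count LF bytes only,
    # so a final line without a trailing newline is not counted.
    with open(filepath, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
```

Both branches return the same number for a given file, which keeps results consistent across machines.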
Benchmark: Python line counting performance
These local measurements were collected with Python 3.12 on a MacBook Pro M2 with an NVMe SSD. They are directional, not universal, but the ranking is stable enough to guide implementation choices.
| Method | 1MB | 100MB | 1GB | Peak memory | Best use case |
|---|---|---|---|---|---|
| readlines() | 8ms | 820ms | 8.5s | about file size × 3 | Small files only |
| sum(1 for _ in f) | 22ms | 2.1s | 21s | under 5MB | Best default |
| enumerate answer | 24ms | 2.3s | 23s | under 5MB | ❌ empty-file bug |
| Binary chunk scan | 6ms | 580ms | 5.8s | under 2MB | Fastest pure Python |
| mmap | 18ms | 1.7s | 17s | virtual memory | Random-access workflows |
| wc -l subprocess | 3ms | 290ms | 2.9s | under 1MB | Unix-only speed path |
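Since these numbers are directional, it is worth re-measuring on your own hardware. The harness below is a minimal sketch of how that could be done with time.perf_counter; time_counters and the two sample counters are hypothetical names, and it is not the script that produced the table.

```python
import time
from typing import Callable, Dict

def time_counters(
    filepath: str,
    counters: Dict[str, Callable[[str], int]],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each counter."""
    timings: Dict[str, float] = {}
    for name, func in counters.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(filepath)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings

def iter_count(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)

def readlines_count(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return len(f.readlines())
```

A call such as time_counters("big.log", {"iter": iter_count, "readlines": readlines_count}) returns one timing per method, where big.log stands in for your own test file. Repeated runs mostly read from the operating system's file cache, so treat the results as warm-cache timings.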
The decision rule is simple:
- File under 100MB, code clarity matters most: use sum(1 for _ in f).
- File over 100MB, speed matters most: use binary chunk scanning.
- Unix-only automation, need the closest wc -l equivalent: use subprocess.
- Need random access as well as counts: consider mmap.
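The first two rules can be folded into a single function that picks a strategy from the file size. This is a sketch only: count_lines_by_size and SIZE_THRESHOLD are hypothetical names, and the 100MB cut-off is the rule of thumb from the list, not a measured optimum.

```python
import os

SIZE_THRESHOLD = 100 * 1024 * 1024  # 100MB rule of thumb (an assumption)

def count_lines_by_size(filepath: str, threshold: int = SIZE_THRESHOLD) -> int:
    size = os.path.getsize(filepath)
    if size == 0:
        return 0
    if size < threshold:
        # Smaller file: readable text-mode iteration.
        with open(filepath, encoding="utf-8") as f:
            return sum(1 for _ in f)
    # Larger file: count LF bytes in 1MB chunks, then account for a
    # final line that has no trailing newline.
    with open(filepath, "rb") as f:
        count = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            count += 1
    return count
```

Passing a small threshold in tests is an easy way to exercise the binary branch without creating a huge file.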
Method 4: Count Lines Matching a Pattern
Use this when
You need to count non-empty lines, filter logs, or count lines that match a pattern rather than raw line totals.
import re
def count_lines_containing(filepath: str, pattern: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if pattern in line)
def count_lines_matching(filepath: str, pattern: str) -> int:
    regex = re.compile(pattern)
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if regex.search(line))
def count_non_empty_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())
This is the right place for non-empty-line workflows, and the right place to separate raw line counts from meaningful content counts. If you need to skip blanks or count only matching lines, these functions change the answer.
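As a usage illustration, here is how regex counting might look on a small log. The log content is made up for the demo, and the flags parameter is an addition that the functions above do not have.

```python
import os
import re
import tempfile

def count_lines_matching(filepath: str, pattern: str, flags: int = 0) -> int:
    # flags is a hypothetical extension, passed straight to re.compile
    regex = re.compile(pattern, flags)
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if regex.search(line))

# Hypothetical log content, written to a temporary file for the demo.
log = "INFO start\nerror: disk full\n\nWARN retry\nERROR timeout\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write(log)
try:
    print(count_lines_matching(tmp.name, r"^(ERROR|WARN)\b"))          # 2
    print(count_lines_matching(tmp.name, r"^error\b", re.IGNORECASE))  # 2
finally:
    os.remove(tmp.name)
```

Anchoring the pattern with ^ keeps a word like "ERROR" inside a message body from inflating the count.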
If you are scanning Python source, a simple comment filter can work for rough metrics:
def count_code_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(
            1 for line in f
            if line.strip() and not line.lstrip().startswith("#")
        )
That is not a full parser, but it is often enough for a quick script.
Method 5: Count Lines in CSV Files and pandas DataFrames
If you are doing data analysis
You usually need data rows, not raw line breaks. CSV files can contain embedded newlines inside quoted fields.
For CSV files, raw line count and row count are not always the same thing.
import csv
import pandas as pd
def count_csv_rows(filepath: str) -> int:
    with open(filepath, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # header
        return sum(1 for _ in reader)
That function is safer than sum(1 for _ in f) - 1 because the CSV parser understands quoted multiline fields.
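A tiny in-memory example makes the difference concrete. The CSV text below is made up for illustration: it has one quoted field that contains a line break.

```python
import csv
import io

data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

raw_lines = len(data.splitlines())                  # physical lines
reader = csv.reader(io.StringIO(data, newline=""))
next(reader, None)                                  # skip the header row
data_rows = sum(1 for _ in reader)                  # logical data rows

print(raw_lines, data_rows)  # 4 2
```

Subtracting one from the raw line count would report 3 rows here, while the parser correctly reports 2.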
If the file is already in pandas:
df = pd.read_csv("data.csv")
print(len(df))
print(df.shape[0])
For very large CSVs, read chunks:
def count_csv_rows_pandas(filepath: str, chunksize: int = 10000) -> int:
    total = 0
    for chunk in pd.read_csv(filepath, chunksize=chunksize):
        total += len(chunk)
    return total
That is usually the right answer when the real question is "how many data rows are in this CSV?" For CSV data, people who ask for a line count often mean rows, not literal line breaks.
Method 6: Count Lines of Code Across a Python Project
Use this when
You want a recursive project count with exclusions and file-level output.
from pathlib import Path
from typing import Any, Dict, Tuple
def count_project_lines(
    root: str,
    extensions: Tuple[str, ...] = (".py",),
    exclude_dirs: Tuple[str, ...] = ("__pycache__", ".git", "venv", "node_modules"),
) -> Dict[str, Any]:
    root_path = Path(root)
    results: Dict[str, Any] = {"total": 0, "by_file": {}}
    for filepath in root_path.rglob("*"):
        if any(part in exclude_dirs for part in filepath.parts):
            continue
        if filepath.suffix not in extensions:
            continue
        try:
            with filepath.open(encoding="utf-8") as f:
                count = sum(1 for _ in f)
        except (UnicodeDecodeError, PermissionError):
            continue
        results["by_file"][str(filepath)] = count
        results["total"] += count
    return results
This is a good small-script answer for counting lines of code in a Python project. It is also where you should stop and reach for cloc or tokei if you need language-aware comment handling, generated-file exclusions, or repeatable team reports.
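One design point to be aware of: rglob("*") still walks into excluded directories and only filters the results afterwards, which can be slow for a large venv or node_modules tree. A possible alternative, sketched here with the hypothetical name count_project_lines_walk, uses os.walk and prunes those directories before descending.

```python
import os
from typing import Dict, Tuple

def count_project_lines_walk(
    root: str,
    extensions: Tuple[str, ...] = (".py",),
    exclude_dirs: Tuple[str, ...] = ("__pycache__", ".git", "venv", "node_modules"),
) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    counts[path] = sum(1 for _ in f)
            except (UnicodeDecodeError, OSError):
                continue  # skip unreadable or non-UTF-8 files
    return counts
```

The project total is then sum(counts.values()), and sorting the dictionary items by value gives a quick list of the largest files.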
Method 7: Production-Ready Line Counter Helper
Use this when
You want one helper that handles strings, files, patterns, and a fast path without making every caller reimplement the same logic.
from pathlib import Path
from typing import Optional
import logging
logger = logging.getLogger(__name__)
def count_lines(
    source: str,
    mode: str = "auto",
    encoding: str = "utf-8",
    pattern: Optional[str] = None,
    exclude_empty: bool = False,
) -> int:
    """
    Count lines in a string or file.
    mode:
        auto -> treat existing path-like input as file, otherwise string
        string -> count string lines with splitlines()
        file -> iterate file line by line
        fast -> binary chunk scan for file paths
    """
    if mode == "string":
        lines = source.splitlines()
        if pattern:
            lines = [line for line in lines if pattern in line]
        if exclude_empty:
            lines = [line for line in lines if line.strip()]
        return len(lines)
    try:
        path = Path(source)
        source_is_file = path.exists()
    except OSError:
        source_is_file = False
    if mode == "auto" and not source_is_file:
        return count_lines(source, mode="string", encoding=encoding, pattern=pattern, exclude_empty=exclude_empty)
    if not source_is_file:
        raise FileNotFoundError(f"File not found: {source}")
    if mode in {"auto", "file"}:
        with path.open(encoding=encoding) as f:
            count = 0
            for line in f:
                if exclude_empty and not line.strip():
                    continue
                if pattern and pattern not in line:
                    continue
                count += 1
            return count
    if mode == "fast":
        if pattern or exclude_empty:
            logger.warning("fast mode ignores pattern/exclude_empty; falling back to file mode")
            return count_lines(source, mode="file", encoding=encoding, pattern=pattern, exclude_empty=exclude_empty)
        size = path.stat().st_size
        if size == 0:
            return 0
        with path.open("rb") as f:
            count = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
            f.seek(-1, 2)
            if f.read(1) != b"\n":
                count += 1
        return count
    raise ValueError("Invalid mode. Use 'auto', 'string', 'file', or 'fast'.")
This helper is useful in real projects because it centralizes the edge-case policy in one place. It also gives you a clean splitlines() default for strings and a fast binary path for when files grow.
Method Comparison Table
| Method | Scenario | Size limit | Performance | Complexity |
|---|---|---|---|---|
| splitlines() | String already in memory | Memory limit | Very fast | Low |
| sum(1 for _ in f) | Small or medium file | About 100MB practical ceiling | Fast enough | Low |
| Binary chunk scan | Large files | No fixed ceiling | Fastest pure Python | Medium |
| count('\n') | Raw newline count | Memory limit | Fastest on strings | Low |
| csv.reader | CSV rows | Depends on file | Accurate for CSV | Medium |
| Path.rglob() + iteration | Project scan | File-system bound | Good | Medium |
| wc -l subprocess | Unix-only line count | No fixed ceiling | Fastest on Unix | Medium |
Common Pitfalls
- split('\n') returns 1 for "".
- count('\n') + 1 is wrong for empty strings and for text that ends with a trailing newline.
- readlines() is convenient but memory-hungry.
- The enumerate answer crashes on empty files.
- CSV row counts can differ from raw line counts.
- mmap needs explicit empty-file handling.
- File-counting scripts should decide whether blank lines count before shipping.
FAQ
How do I count lines in a Python string?
Use len(text.splitlines()). It is the most reliable default because it handles empty strings, trailing newlines, and mixed newline styles correctly.
How do I count lines in a file in Python?
Use sum(1 for _ in f) on an open file object. It is the best general-purpose answer because it is safe on empty files and never loads the whole file into memory.
What is the fastest way to count lines in Python?
For large files, a binary chunk scan is usually the fastest pure-Python method. On Linux and macOS, a wc -l subprocess can be even faster.
Why does enumerate() crash on empty files?
Because the loop variable never gets assigned when the file has no lines. The old Stack Overflow pattern returns i + 1, which raises UnboundLocalError on empty files.
How do I count non-empty lines in Python?
Use sum(1 for line in f if line.strip()). It skips blank and whitespace-only lines.
How do I count lines in a CSV without loading it?
Use csv.reader for row-aware counting or pandas.read_csv(..., chunksize=...) when you want chunked DataFrame processing.
What is the difference between splitlines() and split('\n')?
splitlines() is line-aware and handles all common newline conventions. split('\n') is only useful when you explicitly want raw LF splitting.
How do I count lines of code in a Python project?
Use Path.rglob() plus file iteration for a small script, or switch to cloc or tokei when you want language-aware code metrics.
Sources Checked
- Python standard library documentation for str.splitlines() and file iteration behavior: https://docs.python.org/3/library/stdtypes.html
- Stack Overflow question on counting lines in a file in Python, including the empty-file edge case: https://stackoverflow.com/questions/845058/how-to-get-the-line-count-of-a-large-file-cheaply-in-python
- Stack Overflow discussion of splitlines() versus split('\n'): https://stackoverflow.com/questions/14618577/python-splitlines-vs-splitn
Related Guides and Tools
- Count lines with wc -l in Bash for shell-based file, variable, and pipeline counting.
- JavaScript line counting for strings, browser files, Node.js streams, and React textareas.
- PHP line counting for file(), fgets, SplFileObject, WordPress uploads, and large files.
- Ruby line counting for File.foreach, binary mode, Rails importers, and wc -l safety.
- Kotlin line counting for useLines, BufferedReader, coroutines, and Android file URIs.
- Java file line counting for Files.lines(), BufferedReader, Spring Boot uploads, and JVM large-file benchmarks.
- C line counting for fread, mmap, wc -l internals, and byte-level newline scanning.
- Lua line counting for io.lines, fh:lines(), LuaJIT compatibility, and fast byte scanning.
- Go line counting for bufio.Scanner, the 64KB token limit, gzip input, and goroutines.
- Rust line counting for BufReader, read_line, Rayon, and byte scanning.
- C# line counting for File.ReadLines, StreamReader, async I/O, and large files.
- Swift line counting for FileHandle, URL.lines, SwiftUI progress, and Apple memory constraints.
- Scala line counting for Source, Using, Files.lines, Spark, and the Too many open files trap.
- R line counting for readLines(), readr::read_lines(), R.utils::countLines(), and warning-safe CSV workflows.
- Perl line counting for $., IO::File, sysread, and bioinformatics-oriented line counting.
- SQL row counting for COUNT(*), estimates, SQLAlchemy, Django, and Prisma.
- Count lines in VS Code for editor status bar, selection, and project-wide methods.
- Online line counter for browser-based text, file, and CSV counting.
- Line Counter tool for pasted text and uploaded files.
Counting Lines Without Writing Code?
If you just need a quick line count for a log file, CSV, or any text, paste it or upload it to the Line Counter. It shows total lines, non-empty lines, and blank lines instantly in the browser.