Python Deep Dive

How to Count Lines in Python: 7 Methods, Benchmarked and Battle-Tested

Count lines in Python strings, text files, large files, and directories. Includes real performance benchmarks, empty file handling, splitlines vs split, and production-ready functions.

Python 3.8+ | Python 3.10+ | Python 3.12+
Published: March 20, 2025 | Updated: May 12, 2026 | 20 min read | Author: Line Counter Editorial Team
Tags: Python, File I/O, Tutorial, Performance, CSV

The most upvoted Stack Overflow answer for "count lines in Python" looks clean until you try an empty file.

def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

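Run that on an empty file and Python raises UnboundLocalError. On Python 3.10 and earlier the message is:

UnboundLocalError: local variable 'i' referenced before assignment

Python 3.11 and later report the same error as "cannot access local variable 'i' where it is not associated with a value".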

That bug matters because the same shape of mistake recurs in file counters, string counters, and CSV counters. This guide fixes that bug, six more edge cases, and the performance assumptions that usually go untested.
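A minimal fix that keeps the enumerate shape of the snippet above: define the counter before the loop and start enumeration at 1. The name file_len_fixed and the explicit UTF-8 encoding are assumptions for illustration, not part of the original answer.

```python
def file_len_fixed(filename: str) -> int:
    """Count lines with enumerate without crashing on an empty file."""
    count = 0  # stays 0 if the loop body never runs
    with open(filename, encoding="utf-8") as f:
        # start=1 makes the loop index equal to the running line count
        for count, _ in enumerate(f, start=1):
            pass
    return count
```

The only change in behavior is on empty files, where the function now returns 0 instead of raising.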

If you only need a quick answer, start with len(text.splitlines()) for strings and sum(1 for _ in f) for files. If you are counting lines in large files, skip to the benchmark section. If you do not want to write code at all, the browser-based Line Counter handles pasted text and uploads instantly.

The rest of the guide keeps the string, file, CSV, and project cases separate, so you can jump to the one you need without debugging edge cases yourself.

30-Second Cheat Sheet

Copy this first

This guide covers seven practical line-counting scenarios for strings and files:

  • Strings already in memory.
  • Small text files.
  • Large files that should not be read all at once.
  • Lines matching a substring or regex.
  • CSV and pandas workflows.
  • Recursive project counts.
  • A production-ready helper that wraps the common cases.

Method 1: Count Lines in a Python String

Use this when

Your text is already in memory and you want the most accurate answer with the least surprise.

Avoid this when

You only need newline bytes, not logical lines. In that case count newline characters directly.

The best default for counting lines in a Python string is splitlines(). If the text is already in memory, this is the method to remember.

It is also safer than hand-splitting on "\n" when the text was copied from Windows, because it recognizes \r\n line endings.

text = "line1\nline2\nline3"
empty = ""
trailing = "line1\nline2\n"
mixed = "line1\r\nline2\rline3"

len(text.splitlines())     # 3
len(empty.splitlines())    # 0
len(trailing.splitlines()) # 2
len(mixed.splitlines())    # 3

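That is why splitlines() is the method to remember. It handles \n, \r\n, and \r together, and it does not invent an extra line for a trailing newline. It also splits on less common boundaries such as form feed (\x0c) and the Unicode line separator (\u2028), so its count can be higher than a file-iteration count for text that contains those characters.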
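One detail worth knowing: str.splitlines() treats more than \n, \r\n, and \r as line boundaries (vertical tab, form feed, and a few Unicode separators also count). If you want only the three classic newline styles, a sketch like the following can work. The helper name count_lines_strict is hypothetical.

```python
import re

# Only the three classic newline conventions; \r\n must come first so it
# is matched as one boundary rather than two.
_NEWLINE = re.compile(r"\r\n|\r|\n")

def count_lines_strict(text: str) -> int:
    """Count lines separated by \\n, \\r\\n, or \\r only (illustrative helper)."""
    if not text:
        return 0
    pieces = _NEWLINE.split(text)
    # A trailing newline produces an empty final piece; it is not a line.
    if pieces[-1] == "":
        pieces.pop()
    return len(pieces)

form_feed = "page1\x0cpage2"
print(len(form_feed.splitlines()))    # 2, form feed counts as a boundary
print(count_lines_strict(form_feed))  # 1, form feed is ordinary text here
```

For everyday text the two approaches agree; the difference only shows up with unusual control characters.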

By contrast, split('\n') is only safe if you want literal LF-based splitting.

len("".split("\n"))          # 1, not 0
len("a\nb\n".split("\n"))    # 3, not 2
len("a\r\nb".split("\n"))    # 2, but the \r stays attached

The most common confusion when counting lines in a string is that split('\n') answers "how many pieces if I split on LF?" rather than "how many logical lines are there?" Those are not the same question.

count('\n') is different again.

text.count("\n")       # counts newline characters only
text.count("\n") + 1   # not safe for empty strings or trailing newlines

Use count('\n') only when you explicitly want newline characters, not line count. It is the fastest of the three string options, but it is the least semantic.

Input     | split('\n') | count('\n') + 1 | splitlines()
""        | 1 ❌        | 1 ❌            | 0 ✅
"hello"   | 1 ✅        | 1 ✅            | 1 ✅
"a\nb"    | 2 ✅        | 2 ✅            | 2 ✅
"a\nb\n"  | 3 ❌        | 3 ❌            | 2 ✅
"a\r\nb"  | 2 ❌        | 2 ✅            | 2 ✅
"\n\n"    | 3 ❌        | 3 ❌            | 2 ✅

If you only need raw newline counts, text.count("\n") is extremely fast. If you need a line count that matches what users expect, splitlines() is the right default.
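If you want to check the comparison table on your own interpreter, a short script along these lines reproduces the three counts for each input. The function name compare is illustrative.

```python
from typing import Tuple

SAMPLES = ["", "hello", "a\nb", "a\nb\n", "a\r\nb", "\n\n"]

def compare(text: str) -> Tuple[int, int, int]:
    """Return (split count, count+1, splitlines count) for one input."""
    return (
        len(text.split("\n")),
        text.count("\n") + 1,
        len(text.splitlines()),
    )

if __name__ == "__main__":
    for sample in SAMPLES:
        # repr() keeps escape sequences visible in the output
        print(f"{sample!r:12} {compare(sample)}")
```

Note that the script reports counts only. The "a\r\nb" row gets the right number from split('\n'), but the first piece still carries a stray \r, which is why the table marks it as unsafe.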

Method 2: Count Lines in a Small Text File

Use this when

The file is ordinary text and small enough that readability matters more than shaving a few milliseconds.

Avoid this when

The file may be hundreds of MB or larger. Use the large-file method instead.

For most files, this is the right default. It stays readable and is safe on empty files:

def count_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)

It is memory-friendly, safe on empty files, and concise enough to keep in production code.

An explicit loop is equivalent:

def count_lines_explicit(filepath: str) -> int:
    count = 0
    with open(filepath, encoding="utf-8") as f:
        for _ in f:
            count += 1
    return count

Avoid readlines() as a default:

def count_lines_bad(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return len(f.readlines())

That works on small files, but it reads the whole file into memory and stops being a good idea as soon as the file grows.

Empty-file safety is the main reason the generator pattern beats the old enumerate Stack Overflow answer. That answer crashes because the i variable is never assigned when the file is empty.

Method 3: Count Lines in Large Files Efficiently

Use this when

The file is too large to load comfortably, or you want the fastest practical way to count lines in a large file.

Choose between

Readable iteration, binary chunk scanning, mmap, or a Unix wc -l subprocess.

3A. Generator iteration

def count_lines_iter(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for _ in f)

This is still a solid choice for files up to a few hundred MB when you want a plain Python solution.

3B. Binary chunk scanning

from pathlib import Path

def count_lines_fast(filepath: str) -> int:
    path = Path(filepath)
    size = path.stat().st_size
    if size == 0:
        return 0

    with path.open("rb") as f:
        count = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
        f.seek(-1, 2)
        if f.read(1) != b"\n":
            count += 1
    return count

This is the fastest pure-Python method in most local tests because it counts newline bytes directly and avoids decoding every line into text first.

When you need to count lines in large files, this is the method worth benchmarking first.
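The function above seeks back to the last byte, which assumes a seekable file. A possible variant, sketched here under the assumption that you may also receive pipes or other non-seekable binary streams, remembers the last byte while reading instead. The name count_lines_stream is hypothetical, and the walrus operator requires Python 3.8 or later.

```python
import io
from typing import BinaryIO

def count_lines_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count lines in any binary stream without seeking (illustrative)."""
    count = 0
    last_byte = b"\n"  # an empty stream then yields 0, not 1
    while chunk := stream.read(chunk_size):
        count += chunk.count(b"\n")
        last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte != b"\n":
        count += 1
    return count

print(count_lines_stream(io.BytesIO(b"a\nb\nc")))  # 3
```

It returns the same totals as the seek-based version for regular files, at the cost of one extra slice per chunk.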

3C. mmap

import mmap

def count_lines_mmap(filepath: str) -> int:
    with open(filepath, "rb") as f:
        if f.seek(0, 2) == 0:
            return 0
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            count = 0
            while mm.readline():
                count += 1
            return count
        finally:
            mm.close()

In practice, mmap is more useful when you also need random access or repeated scans. For pure line counting, binary chunk scanning is usually simpler and faster. Empty files need explicit handling because mmap cannot map a zero-length file, which is why the function checks the size first. On Windows, some antivirus tools can also affect timing.

3D. Unix wc -l equivalent

import subprocess

def count_lines_wc(filepath: str) -> int:
    result = subprocess.run(["wc", "-l", filepath], capture_output=True, text=True, check=True)
    return int(result.stdout.split()[0])

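This is the closest Python equivalent of wc -l on Linux and macOS. It is fast because the heavy work happens in optimized system code, but it is not portable in the same way as a pure Python function. Note that wc -l counts newline characters, so a file whose last line has no trailing newline reports one fewer line than the functions above.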
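If the same script has to run where wc may be missing, one option is to look for the binary first and fall back to counting newline bytes in Python. This is a sketch, not a drop-in replacement: the name count_newlines is hypothetical, and both branches deliberately return wc -l semantics (newline characters, not logical lines).

```python
import shutil
import subprocess

def count_newlines(filepath: str) -> int:
    """Count newline bytes, using wc when it is installed (illustrative)."""
    wc_path = shutil.which("wc")
    if wc_path is not None:
        # Feeding the file on stdin avoids problems with unusual file
        # names and makes wc print only the number.
        with open(filepath, "rb") as f:
            result = subprocess.run(
                [wc_path, "-l"],
                stdin=f,
                capture_output=True,
                text=True,
                check=True,
            )
        return int(result.stdout.split()[0])

    # Fallback with the same semantics: count b"\n" in 1 MiB chunks.
    with open(filepath, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
```

Because both paths count newline bytes, a file containing "a\nb" with no trailing newline yields 1 either way.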

Benchmark: Python line counting performance

These local measurements were collected with Python 3.12 on a MacBook Pro M2 with an NVMe SSD. They are directional, not universal, but the ranking is stable enough to guide implementation choices.

Method            | 1MB  | 100MB | 1GB  | Peak memory         | Best use case
readlines()       | 8ms  | 820ms | 8.5s | about file size × 3 | Small files only
sum(1 for _ in f) | 22ms | 2.1s  | 21s  | under 5MB           | Best default
enumerate answer  | 24ms | 2.3s  | 23s  | under 5MB           | ❌ empty-file bug
Binary chunk scan | 6ms  | 580ms | 5.8s | under 2MB           | Fastest pure Python
mmap              | 18ms | 1.7s  | 17s  | virtual memory      | Random-access workflows
wc -l subprocess  | 3ms  | 290ms | 2.9s | under 1MB           | Unix-only speed path
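Since these numbers are directional, it is worth re-measuring on your own hardware. A minimal timing harness might look like the sketch below. The function names, the generated test file, and the best-of-three policy are all assumptions for illustration, not the setup used for the table.

```python
import os
import tempfile
import time
from typing import Callable, Dict

def count_iter(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def count_chunks(path: str) -> int:
    if os.path.getsize(path) == 0:
        return 0
    with open(path, "rb") as f:
        count = sum(c.count(b"\n") for c in iter(lambda: f.read(1 << 20), b""))
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            count += 1
    return count

def best_time(func: Callable[[str], int], path: str, repeats: int = 3) -> float:
    """Return the fastest of several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(path)
        timings.append(time.perf_counter() - start)
    return min(timings)

def run_benchmark(line_count: int = 200_000) -> Dict[str, float]:
    """Time both counters against a throwaway file of line_count lines."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.writelines(f"row {i}\n" for i in range(line_count))
        path = tmp.name
    try:
        return {
            "sum(1 for _ in f)": best_time(count_iter, path),
            "binary chunk scan": best_time(count_chunks, path),
        }
    finally:
        os.remove(path)

if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

Taking the minimum of a few runs reduces noise from disk caching and background processes, but results will still vary by machine.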

The decision rule is simple:

  • File under 100MB and code clarity matters most: use sum(1 for _ in f).
  • File over 100MB and speed matters most: use binary chunk scanning.
  • Unix-only automation that needs the closest wc -l equivalent: use subprocess.
  • Need random access as well as counts: consider mmap.
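The decision rule above can be written down as a tiny selector. This is only a sketch: the function name pick_method, its parameters, and the order in which the conditions are checked are assumptions, while the 100MB threshold comes from the list.

```python
LARGE_FILE_BYTES = 100 * 1024 * 1024  # the 100MB threshold from the rule above

def pick_method(
    size_bytes: int,
    unix_only: bool = False,
    needs_random_access: bool = False,
) -> str:
    """Return the name of the suggested counting method (illustrative)."""
    if needs_random_access:
        return "mmap"
    if unix_only:
        return "wc -l subprocess"
    if size_bytes > LARGE_FILE_BYTES:
        return "binary chunk scan"
    return "sum(1 for _ in f)"

print(pick_method(5 * 1024 * 1024))    # sum(1 for _ in f)
print(pick_method(2 * 1024 ** 3))      # binary chunk scan
```

In a real script you would return a callable rather than a label, but the branching stays the same.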

Method 4: Count Lines Matching a Pattern

Use this when

You need to count non-empty lines, filter logs, or count lines that match a pattern rather than raw line totals.

import re

def count_lines_containing(filepath: str, pattern: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if pattern in line)

def count_lines_matching(filepath: str, pattern: str) -> int:
    regex = re.compile(pattern)
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if regex.search(line))

def count_non_empty_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

This is the right place for non-empty-line workflows, and for separating raw line counts from meaningful content counts. Skipping blanks or matching only some lines changes the answer, so decide which count you want before comparing numbers.
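When you need several of these counts at once, a single pass avoids reading the file three times. The sketch below takes any iterable of lines (an open file works) and returns all the totals together. The name line_stats and the dictionary keys are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

def line_stats(lines: Iterable[str], pattern: Optional[str] = None) -> Dict[str, int]:
    """Collect total, blank, non-empty, and regex-matching counts in one pass."""
    regex = re.compile(pattern) if pattern else None
    stats = {"total": 0, "blank": 0, "non_empty": 0, "matching": 0}
    for line in lines:
        stats["total"] += 1
        if line.strip():
            stats["non_empty"] += 1
        else:
            stats["blank"] += 1  # empty or whitespace-only
        if regex is not None and regex.search(line):
            stats["matching"] += 1
    return stats

sample = ["ERROR disk full\n", "\n", "   \n", "ok\n"]
print(line_stats(sample, pattern=r"^ERROR"))
# {'total': 4, 'blank': 2, 'non_empty': 2, 'matching': 1}
```

With a file, call it as line_stats(f) inside a with open(...) block.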

If you are scanning Python source, a simple comment filter can work for rough metrics:

def count_code_lines(filepath: str) -> int:
    with open(filepath, encoding="utf-8") as f:
        return sum(
            1 for line in f
            if line.strip() and not line.lstrip().startswith("#")
        )

That is not a full parser, but it is often enough for a quick script.

Method 5: Count Lines in CSV Files and pandas DataFrames

If you are doing data analysis

You usually need data rows, not raw line breaks. CSV files can contain embedded newlines inside quoted fields.

For CSV files, raw line count and row count are not always the same thing.

import csv
import pandas as pd

def count_csv_rows(filepath: str) -> int:
    with open(filepath, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # header
        return sum(1 for _ in reader)

That function is safer than sum(1 for _ in f) - 1 because the CSV parser understands quoted multiline fields.
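A small in-memory example makes the difference concrete. The sample data and function names below are made up for illustration; the quoted comment field contains a line break, so the raw line count and the row count disagree.

```python
import csv
import io

SAMPLE = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

def count_raw_lines(text: str) -> int:
    """Count physical lines, ignoring CSV quoting."""
    return sum(1 for _ in io.StringIO(text))

def count_data_rows(text: str) -> int:
    """Count CSV records after the header, respecting quoted newlines."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)

print(count_raw_lines(SAMPLE))   # 4
print(count_data_rows(SAMPLE))   # 2
```

Subtracting one from the raw count would report 3 data rows here, one more than the file actually contains.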

If the file is already in pandas:

df = pd.read_csv("data.csv")
print(len(df))
print(df.shape[0])

For very large CSVs, read chunks:

def count_csv_rows_pandas(filepath: str, chunksize: int = 10000) -> int:
    total = 0
    for chunk in pd.read_csv(filepath, chunksize=chunksize):
        total += len(chunk)
    return total

That is usually the right answer when the real question is "how many data rows are in this CSV?" For CSV data, people who ask for a line count often mean rows, not literal line breaks.

Method 6: Count Lines of Code Across a Python Project

Use this when

You want a recursive project count with exclusions and file-level output.

from pathlib import Path
from typing import Any, Dict, Tuple

def count_project_lines(
    root: str,
    extensions: Tuple[str, ...] = (".py",),
    exclude_dirs: Tuple[str, ...] = ("__pycache__", ".git", "venv", "node_modules"),
) -> Dict[str, Any]:
    root_path = Path(root)
    results: Dict[str, Any] = {"total": 0, "by_file": {}}

    for filepath in root_path.rglob("*"):
        if any(part in exclude_dirs for part in filepath.parts):
            continue
        # rglob("*") also yields directories; keep regular files only
        if not filepath.is_file() or filepath.suffix not in extensions:
            continue

        try:
            with filepath.open(encoding="utf-8") as f:
                count = sum(1 for _ in f)
        except (UnicodeDecodeError, PermissionError):
            continue

        results["by_file"][str(filepath)] = count
        results["total"] += count

    return results

This is a good small-script answer for counting lines of code in a Python project. It is also where you should stop and reach for cloc or tokei if you need language-aware comment handling, generated-file exclusions, or repeatable team reports.
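One design note on the scan above: rglob("*") still descends into excluded directories and only filters their contents afterwards, which can be slow when node_modules or a virtual environment is large. A possible alternative, sketched here with os.walk, prunes those directories before entering them. The function name count_project_lines_walk is hypothetical.

```python
import os
from typing import Dict, Tuple

def count_project_lines_walk(
    root: str,
    extensions: Tuple[str, ...] = (".py",),
    exclude_dirs: Tuple[str, ...] = ("__pycache__", ".git", "venv", "node_modules"),
) -> Dict[str, int]:
    """Map each matching file path to its line count (illustrative)."""
    by_file: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter these folders.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if os.path.splitext(name)[1] not in extensions:
                continue
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, encoding="utf-8") as f:
                    by_file[full_path] = sum(1 for _ in f)
            except (UnicodeDecodeError, OSError):
                continue  # skip unreadable or non-UTF-8 files
    return by_file
```

The project total is then sum(by_file.values()).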

Method 7: Production-Ready Line Counter Helper

Use this when

You want one helper that handles strings, files, patterns, and a fast path without making every caller reimplement the same logic.

from pathlib import Path
from typing import Optional
import logging

logger = logging.getLogger(__name__)

def count_lines(
    source: str,
    mode: str = "auto",
    encoding: str = "utf-8",
    pattern: Optional[str] = None,
    exclude_empty: bool = False,
) -> int:
    """
    Count lines in a string or file.

    mode:
        auto   -> treat existing path-like input as file, otherwise string
        string -> count string lines with splitlines()
        file   -> iterate file line by line
        fast   -> binary chunk scan for file paths
    """

    if mode == "string":
        lines = source.splitlines()
        if pattern:
            lines = [line for line in lines if pattern in line]
        if exclude_empty:
            lines = [line for line in lines if line.strip()]
        return len(lines)

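    try:
        path = Path(source)
        # is_file() rather than exists(): Path("") resolves to ".", which
        # exists as a directory and would fail when opened as a file
        source_is_file = path.is_file()
    except OSError:
        source_is_file = False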

    if mode == "auto" and not source_is_file:
        return count_lines(source, mode="string", encoding=encoding, pattern=pattern, exclude_empty=exclude_empty)

    if not source_is_file:
        raise FileNotFoundError(f"File not found: {source}")

    if mode in {"auto", "file"}:
        with path.open(encoding=encoding) as f:
            count = 0
            for line in f:
                if exclude_empty and not line.strip():
                    continue
                if pattern and pattern not in line:
                    continue
                count += 1
        return count

    if mode == "fast":
        if pattern or exclude_empty:
            logger.warning("fast mode ignores pattern/exclude_empty; falling back to file mode")
            return count_lines(source, mode="file", encoding=encoding, pattern=pattern, exclude_empty=exclude_empty)

        size = path.stat().st_size
        if size == 0:
            return 0

        with path.open("rb") as f:
            count = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
            f.seek(-1, 2)
            if f.read(1) != b"\n":
                count += 1
        return count

    raise ValueError("Invalid mode. Use 'auto', 'string', 'file', or 'fast'.")

This helper is useful in real projects because it centralizes the edge-case policy in one place. It gives you a splitlines() default for strings, a fast binary path for when files grow, and a single function shape to keep if you need several counting modes.

Method Comparison Table

Method                   | Scenario                 | Size limit                     | Performance         | Complexity
splitlines()             | String already in memory | Memory limit                   | Very fast           | Low
sum(1 for _ in f)        | Small or medium file     | About 100MB+ practical ceiling | Fast enough         | Low
Binary chunk scan        | Large files              | No fixed ceiling               | Fastest pure Python | Medium
count('\n')              | Raw newline count        | Memory limit                   | Fastest on strings  | Low
csv.reader               | CSV rows                 | Depends on file                | Accurate for CSV    | Medium
Path.rglob() + iteration | Project scan             | File-system bound              | Good                | Medium
wc -l subprocess         | Unix-only line count     | No fixed ceiling               | Fastest on Unix     | Medium

Common Pitfalls

  1. split('\n') returns 1 for "".
  2. count('\n') + 1 is wrong for empty strings and can be wrong for trailing newlines.
  3. readlines() is convenient but memory-hungry.
  4. The empty-file enumerate answer crashes.
  5. CSV row counts can differ from raw line counts.
  6. mmap needs explicit empty-file handling.
  7. Decide whether blank lines count before shipping any file line-counting script.

FAQ

How do I count lines in a Python string?

Use len(text.splitlines()). It is the most reliable default because it handles empty strings, trailing newlines, and mixed newline styles correctly.

How do I count lines in a file in Python?

Use sum(1 for _ in f) on an open file object. It is the best general-purpose answer because it is safe on empty files and never loads the whole file into memory.

What is the fastest way to count lines in Python?

For large files, a binary chunk scan is usually the fastest pure-Python method. On Linux and macOS, a wc -l subprocess can be even faster.

Why does enumerate() crash on empty files?

Because the loop variable never gets assigned when the file has no lines. The old Stack Overflow pattern returns i + 1, which raises UnboundLocalError on empty files.

How do I count non-empty lines in Python?

Use sum(1 for line in f if line.strip()). It skips blank and whitespace-only lines.

How do I count lines in a CSV without loading it?

Use csv.reader for row-aware counting or pandas.read_csv(..., chunksize=...) when you want chunked DataFrame processing.

What is the difference between splitlines() and split('\n')?

splitlines() is line-aware and handles all common newline conventions. split('\n') is only useful when you explicitly want raw LF splitting.

How do I count lines of code in a Python project?

Use Path.rglob() plus file iteration for a small script, or switch to cloc or tokei when you want language-aware code metrics.

Counting Lines Without Writing Code?

If you just need a quick line count for a log file, CSV, or any text, paste it or upload it to the Line Counter. It shows total lines, non-empty lines, and blank lines instantly in the browser.
