
Python Deep Dive

How to Count Lines in Python: 7 Methods with Performance Benchmarks

Complete Python guide to counting lines in files, strings, directories, and code with benchmark-backed methods, large-file strategies, and a production-ready CLI.

Python 3.8+ · Python 3.10+ · Python 3.12+
Published: March 20, 2025 · Updated: March 20, 2026 · 16 min read · Author: Line Counter Editorial Team
Python · File I/O · Tutorial · Performance · CLI

Counting lines in Python sounds easy until the requirement becomes specific. Do you need every line in a file, only non-empty lines, only lines that match a pattern, or actual lines of code with comments removed? Do you need the answer for one file, a whole directory tree, or a 10GB log that should not be loaded into memory? Those cases all look similar at first, but the right implementation changes quickly.

This guide covers the full range. You will see seven different ways to count file lines, the tradeoffs between readability and raw speed, the safest way to count lines in strings, how to build a practical code line counter, and how to package everything into a reusable CLI. The benchmark table is meant as a directional comparison for the test setup described below, not as a promise that your machine will show the exact same timings.

If you only need a quick answer, start with sum(1 for _ in f) for files and splitlines() for strings. If you are working with GB-scale files or want a wc -l equivalent, skip straight to the large-file and CLI sections. If you do not want to write any Python at all, use the browser-based Line Counter instead.

Quick Answer

  • Count lines in a file: sum(1 for _ in f)
  • Count lines in a string: len(text.splitlines())
  • Count non-empty lines: sum(1 for line in f if line.strip())
  • No-code option: paste your text or upload a file to the free Line Counter.
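The quick answers above can be exercised end to end with a small self-contained sketch (the sample file is created with tempfile purely for demonstration):

```python
import tempfile

# Create a small sample file so the snippet runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("alpha\n\nbeta\n")
    path = tmp.name

# Count lines in a file (streams one line at a time)
with open(path, "r", encoding="utf-8") as f:
    file_lines = sum(1 for _ in f)

# Count lines in a string
text = "alpha\n\nbeta\n"
string_lines = len(text.splitlines())

# Count non-empty lines
with open(path, "r", encoding="utf-8") as f:
    non_empty = sum(1 for line in f if line.strip())

print(file_lines, string_lines, non_empty)  # 3 3 2
```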


Just need a quick line count without writing code? Paste your text or upload a file to the free online Line Counter for instant results. If you want to convert comma-separated values into one-item-per-line text before counting, use the Text to Lines tool.

Python Line Counting Methods - Comparison and Benchmarks

Not all line-counting methods are equal. Some are easy to read but memory-hungry. Others are much faster on large files because they avoid Python text decoding and count raw newline bytes directly. The table below compares the common approaches using the benchmark profile in this guide: a 2024 MacBook Pro M3, averaged over 10 runs. Treat the timings as directional rather than universal.

Benchmark file sizes used in the comparison:

  • Small file: 1MB, about 50,000 lines
  • Medium file: 100MB, about 5,000,000 lines
  • Large file: 1GB, about 50,000,000 lines
  • Extra-large file: 10GB, about 500,000,000 lines
Method | 1MB | 100MB | 1GB | 10GB | Peak memory | Best use case
readlines() | 8ms | 380ms | 3.8s | OOM | about file size | Small files when you need line content
sum(1 for _ in f) | 12ms | 420ms | 4.2s | 42s | about 1MB | Best general-purpose choice
read().splitlines() | 9ms | 360ms | 3.6s | OOM | about file size | Need a list of all lines
Binary chunk reading | 4ms | 180ms | 1.8s | 18s | about 1MB | Fastest pure Python method
mmap | 5ms | 210ms | 2.1s | 21s | about 1MB | Large files plus random access workflows
fileinput | 15ms | 480ms | 4.8s | 48s | about 1MB | Multiple files and stdin
subprocess wc -l | 2ms | 80ms | 0.8s | 8s | about 1MB | Fastest on Linux and macOS

Recommended methods by use case

  • General-purpose scripting: sum(1 for _ in f)
  • Very large files: binary chunk reading
  • Linux and macOS automation: subprocess with wc -l
  • Many files at once: fileinput or pathlib.Path.rglob()
  • Need the actual lines too: readlines() or splitlines() on smaller inputs

How to Count Lines in a File in Python - 7 Methods

Method 1 - readlines() - Simple

If you want the shortest possible introduction to counting lines in a file in Python, readlines() is the most literal answer.

with open("file.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

count = len(lines)
print(f"Line count: {count}")

One-line version (this relies on garbage collection to close the file, so prefer the with-statement form outside quick throwaway scripts):

count = len(open("file.txt", encoding="utf-8").readlines())

Why it works:

  • readlines() loads the full file into a list
  • len() returns the list length
  • the code is easy to read even for beginners

Use it when:

  • the file is small
  • you also need the line content
  • clarity matters more than scalability

Avoid it when:

  • the file may be hundreds of MB or larger
  • you are writing a reusable production utility
  • peak memory matters

Method 2 - sum(1 for _ in f) - Recommended

For most scripts, this is the best default.

with open("file.txt", "r", encoding="utf-8") as f:
    count = sum(1 for _ in f)

print(f"Line count: {count}")

With explicit error handling:

def count_lines(filepath: str) -> int:
    try:
        with open(
            filepath,
            "r",
            encoding="utf-8",
            errors="replace",
        ) as f:
            return sum(1 for _ in f)
    except FileNotFoundError as exc:
        raise FileNotFoundError(f"File not found: {filepath}") from exc
    except PermissionError as exc:
        raise PermissionError(f"Permission denied: {filepath}") from exc

This approach is strong because:

  • the file object streams one line at a time
  • memory usage stays effectively constant
  • the intent is obvious
  • the result matches Python's notion of logical lines, including a final line without a trailing newline
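The logical-line behavior is easy to verify without touching the filesystem, since io.StringIO iterates exactly like a text file:

```python
import io

# A final line with no trailing newline still counts as a line
assert sum(1 for _ in io.StringIO("a\nb\nc")) == 3

# A trailing newline does not create an extra empty line
assert sum(1 for _ in io.StringIO("a\nb\nc\n")) == 3

# An empty stream has zero lines
assert sum(1 for _ in io.StringIO("")) == 0
```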

Why use _ as the variable name?

In Python, _ is a convention for "I do not need this value." The expression sum(1 for _ in f) means "read each line, add one, and do not store the line content."

Method 3 - read().splitlines() - String-based

This method is convenient when you also want the file as a single string in memory.

with open("file.txt", "r", encoding="utf-8") as f:
    count = len(f.read().splitlines())

splitlines() is more robust than split("\n"):

text = "line1\nline2\nline3\n"
print(text.split("\n"))
# ['line1', 'line2', 'line3', '']

print(text.splitlines())
# ['line1', 'line2', 'line3']

It also handles mixed newline styles:

text = "line1\r\nline2\nline3\r"
print(text.splitlines())
# ['line1', 'line2', 'line3']

Use this when you need:

  • an in-memory string for later parsing
  • robust newline handling
  • a clean answer for strings and small files alike

Avoid it when memory is tight.

Method 4 - Binary chunk reading - Fastest

When you need pure Python speed for large files, counting newline bytes in binary chunks is usually the fastest approach.

def count_lines_fast(filepath: str, chunk_size: int = 1 << 20) -> int:
    """
    Count lines by reading binary chunks.

    - Constant memory usage
    - No text decoding overhead
    - Handles files without a trailing newline
    """
    count = 0
    last_byte = b""

    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]

    if last_byte and last_byte != b"\n":
        count += 1

    return count

This is fast because:

  • binary mode skips text decoding
  • bytes.count() is implemented in C
  • chunk size keeps memory usage predictable

The tradeoff is semantic: this method counts newline bytes and then adjusts for the last line if the file does not end with \n. That makes it accurate for line counting, but it is no longer identical to wc -l. The wc -l section below explains that distinction in detail.
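The two definitions are easy to see side by side with a throwaway file whose last line has no trailing newline (a minimal sketch of the adjustment the function above performs):

```python
import tempfile

# Three logical lines, but only two newline bytes
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma")
    path = tmp.name

with open(path, "rb") as f:
    data = f.read()

newline_bytes = data.count(b"\n")  # what wc -l would report
logical_lines = newline_bytes + (1 if data and not data.endswith(b"\n") else 0)

print(newline_bytes, logical_lines)  # 2 3
```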

Method 5 - mmap - Memory-mapped

mmap is useful when the file is large and you also care about random access patterns elsewhere in your workflow.

import mmap


def count_lines_mmap(filepath: str) -> int:
    with open(filepath, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        if size == 0:
            return 0
        f.seek(0)

        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return sum(1 for _ in iter(mm.readline, b""))

Why people use it:

  • the OS pages data in on demand
  • you can mix counting with other random-access operations
  • it performs well on large files

Caveats:

  • an empty file cannot be memory-mapped (mmap raises ValueError), so always guard for zero size
  • 32-bit Python environments have practical size limits
  • mmap is a power tool, not a default choice

Method 6 - fileinput module

The fileinput module is tailor-made for line-oriented processing across one or many files.

import fileinput


with fileinput.input(files=["file1.txt", "file2.txt", "file3.txt"]) as f:
    count = sum(1 for _ in f)

print(f"Total lines across all files: {count}")

It also works with stdin:

import fileinput


with fileinput.input() as f:
    count = sum(1 for _ in f)

print(count)

Per-file counts are easy too:

from collections import defaultdict
import fileinput


file_counts = defaultdict(int)

with fileinput.input(files=["a.txt", "b.txt", "c.txt"]) as f:
    for _line in f:
        file_counts[f.filename()] += 1

for filename, count in file_counts.items():
    print(f"{count:>8,} {filename}")

Use it when:

  • your script naturally accepts multiple files
  • you want stdin support
  • you need access to the current filename while iterating

Method 7 - subprocess wc -l - System call

On Unix-like systems, wc -l is hard to beat for raw speed.

import subprocess
import sys


def count_lines_wc(filepath: str) -> int:
    if sys.platform == "win32":
        raise OSError("wc -l is not available on Windows")

    result = subprocess.run(
        ["wc", "-l", filepath],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip().split()[0])

Safe fallback wrapper:

def count_lines_cross_platform(filepath: str) -> int:
    if sys.platform != "win32":
        try:
            return count_lines_wc(filepath)
        except (subprocess.SubprocessError, OSError, ValueError):
            pass
    return count_lines_fast(filepath)

Important note: wc -l counts newline characters, not logical lines. If the last line does not end with \n, wc -l will be one lower than Python line iteration. That is not a bug. It is a different definition.

How to Count Lines in a String in Python

Sometimes the data is already in memory. In that case, the best answer is usually splitlines(), not file iteration and not a raw newline counter.

The safest answer is splitlines()

def count_lines_in_string(text: str) -> int:
    return len(text.splitlines())


assert count_lines_in_string("a\nb\nc") == 3
assert count_lines_in_string("a\nb\nc\n") == 3
assert count_lines_in_string("") == 0
assert count_lines_in_string("\n") == 1

count("\n") + 1 versus splitlines()

text.count("\n") + 1 is common, but it breaks on empty strings and trailing newlines.

Method | "a\nb\nc\n" | "a\nb\nc" | ""
text.count("\n") + 1 | 4 | 3 | 1
len(text.split("\n")) | 4 | 3 | 1
len(text.splitlines()) | 3 | 3 | 0

If you specifically want the number of newline characters, use text.count("\n"). If you want the number of lines a human would see, use splitlines().

Multiline strings and triple-quoted text

Triple-quoted strings often start with a blank line (the newline right after the opening quotes), which is easy to forget.

text = """
First line
Second line
Third line
"""

print(len(text.splitlines()))
# 4

print(len(text.strip().splitlines()))
# 3

print(sum(1 for line in text.splitlines() if line.strip()))
# 3

Mixed newline styles

splitlines() is robust across newline formats.

mixed = "line1\nline2\r\nline3\rline4"
print(len(mixed.splitlines()))
# 4

It handles \n, \r\n, \r, and several Unicode line separators without extra logic.
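For example, splitlines() also recognizes the Unicode LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), which split("\n") cannot see:

```python
text = "a\u2028b\u2029c\nd"

# splitlines() treats U+2028, U+2029, and \n as line breaks
unicode_aware = len(text.splitlines())
print(unicode_aware)  # 4

# split("\n") only sees the literal \n
naive = len(text.split("\n"))
print(naive)  # 2
```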

How to Count Lines of Code in Python

Counting lines of code is not the same as counting all lines. In code metrics, you usually want to exclude blank lines and comment-only lines. Sometimes you also want to treat docstrings as comments. The right rule depends on why you are measuring.

Basic LOC counting

This version counts total lines, code lines, blank lines, and comment lines.

def count_loc(filepath: str) -> dict[str, int]:
    total = 0
    code = 0
    blank = 0
    comment = 0

    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            stripped = line.strip()

            if not stripped:
                blank += 1
            elif stripped.startswith("#"):
                comment += 1
            else:
                code += 1

    return {
        "total": total,
        "code": code,
        "blank": blank,
        "comment": comment,
    }

Use this for:

  • simple project reports
  • single-language scripts
  • quick CI summaries

Handling docstrings and multiline strings

If you want a deeper Python LOC counter, you need to decide how multiline strings should count.

def count_loc_advanced(filepath: str) -> dict[str, int | float]:
    total = 0
    code = 0
    blank = 0
    comment = 0
    in_multiline_string = False
    multiline_delimiter = ""

    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            stripped = line.strip()

            if not stripped:
                blank += 1
                continue

            if in_multiline_string:
                comment += 1
                if multiline_delimiter in stripped:
                    in_multiline_string = False
                continue

            if stripped.startswith("#"):
                comment += 1
                continue

            started_multiline = False
            for quote in ('"""', "'''"):
                if stripped.startswith(quote):
                    if stripped.count(quote) == 1:
                        in_multiline_string = True
                        multiline_delimiter = quote
                        comment += 1
                        started_multiline = True
                    else:
                        code += 1
                        started_multiline = True
                    break

            if not started_multiline:
                code += 1

    return {
        "total": total,
        "code": code,
        "blank": blank,
        "comment": comment,
        "comment_ratio": round(comment / total * 100, 1) if total else 0.0,
    }

This still is not a perfect parser. It is a pragmatic heuristic. If you need precise multi-language code metrics, use a dedicated tool.

Multi-language comment rules

If your directory contains more than Python, centralize comment styles.

from pathlib import Path


COMMENT_STYLES = {
    ".py": {"single": "#", "multi": ('"""', "'''")},
    ".js": {"single": "//", "multi": ("/*", "*/")},
    ".ts": {"single": "//", "multi": ("/*", "*/")},
    ".java": {"single": "//", "multi": ("/*", "*/")},
    ".c": {"single": "//", "multi": ("/*", "*/")},
    ".cpp": {"single": "//", "multi": ("/*", "*/")},
    ".go": {"single": "//", "multi": ("/*", "*/")},
    ".rb": {"single": "#", "multi": ("=begin", "=end")},
    ".sh": {"single": "#", "multi": None},
    ".sql": {"single": "--", "multi": ("/*", "*/")},
}


def get_comment_style(filepath: str) -> dict[str, object]:
    ext = Path(filepath).suffix.lower()
    return COMMENT_STYLES.get(ext, {"single": "#", "multi": None})
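As a sketch of how the map plugs into a counter, here is a minimal single-line-comment counter. count_comment_lines is an illustrative helper, STYLES is a trimmed copy of the map above, and block comments are deliberately ignored:

```python
import tempfile
from pathlib import Path

# Trimmed copy of the comment-style map, single-line markers only
STYLES = {".py": "#", ".js": "//", ".sql": "--"}


def count_comment_lines(filepath: str) -> int:
    """Count lines that start with the language's single-line comment marker."""
    marker = STYLES.get(Path(filepath).suffix.lower(), "#")
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip().startswith(marker))


# Demo file with two comment lines
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False, encoding="utf-8") as tmp:
    tmp.write("# header\nx = 1\n# note\n\n")
    demo_path = tmp.name

print(count_comment_lines(demo_path))  # 2
```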

When to use cloc instead

If this is for real reporting, prefer cloc.

import json
import subprocess


def count_loc_cloc(path: str) -> dict[str, int]:
    result = subprocess.run(
        ["cloc", "--json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    data = json.loads(result.stdout)
    return data.get("SUM", {})

Use cloc when accuracy matters

  • Multi-language repositories
  • Formal engineering metrics or reporting
  • Docstrings, block comments, and generated files need consistent handling

Count Lines in Multiple Files and Directories

Count a fixed list of files

def count_lines_multiple(filepaths: list[str]) -> dict[str, int | str]:
    results: dict[str, int | str] = {}
    total = 0

    for filepath in filepaths:
        try:
            with open(filepath, "r", encoding="utf-8", errors="replace") as f:
                count = sum(1 for _ in f)
            results[filepath] = count
            total += count
        except (FileNotFoundError, PermissionError, OSError) as exc:
            results[filepath] = f"Error: {exc}"

    results["__total__"] = total
    return results

Recursively count files in a directory

from pathlib import Path


def count_lines_directory(
    directory: str,
    pattern: str = "*.py",
    recursive: bool = True,
    exclude_dirs: list[str] | None = None,
) -> dict[str, object]:
    exclude_dirs = exclude_dirs or [
        ".git",
        "__pycache__",
        "node_modules",
        ".venv",
        "venv",
        "dist",
        "build",
    ]

    root = Path(directory)
    glob_fn = root.rglob if recursive else root.glob

    files: dict[str, int | str] = {}
    total_lines = 0
    total_files = 0
    errors = 0

    for filepath in sorted(glob_fn(pattern)):
        if any(part in exclude_dirs for part in filepath.parts):
            continue
        if not filepath.is_file():
            continue

        try:
            with open(filepath, "r", encoding="utf-8", errors="replace") as f:
                count = sum(1 for _ in f)
            files[str(filepath)] = count
            total_lines += count
            total_files += 1
        except OSError as exc:
            files[str(filepath)] = f"Error: {exc}"
            errors += 1

    return {
        "files": files,
        "summary": {
            "total_files": total_files,
            "total_lines": total_lines,
            "errors": errors,
            "average_lines": round(total_lines / total_files, 1) if total_files else 0,
        },
    }

This pattern covers the common "count lines in multiple files" and "count lines of code in a directory" tasks. It is also a good base for CI checks.

Parallel counting for large file sets

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_file_lines(filepath: Path) -> tuple[str, int]:
    with filepath.open("r", encoding="utf-8", errors="replace") as f:
        return str(filepath), sum(1 for _ in f)


def count_lines_parallel(
    directory: str,
    pattern: str = "*.py",
    max_workers: int = 8,
) -> dict[str, int]:
    root = Path(directory)
    filepaths = [path for path in root.rglob(pattern) if path.is_file()]

    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for filepath, count in executor.map(count_file_lines, filepaths):
            results[filepath] = count

    return results

Because line counting is I/O-heavy, threads work well here. If your bottleneck becomes parsing rather than file reads, revisit the strategy.

Counting Lines in Large Files (GB-Scale) in Python

Large-file workflows break the naive methods first. readlines() and read().splitlines() both scale memory usage with file size, which is exactly what you do not want on log archives, data exports, or multi-GB dumps.

A solid default for GB-scale files

import os


def count_lines_optimal(filepath: str, chunk_size: int = 1 << 23) -> int:
    """
    8MB binary chunk counter.

    - Constant memory
    - Fast on SSDs
    - Counts the final line if no trailing newline exists
    """
    if os.path.getsize(filepath) == 0:
        return 0

    count = 0
    last_byte = b""

    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]

    if last_byte != b"\n":
        count += 1

    return count

On fast local storage, this pattern is usually the best cross-platform way to count lines in a large file quickly.

Progress reporting for long-running counts

import os


def count_lines_with_progress(filepath: str, chunk_size: int = 1 << 23) -> int:
    file_size = os.path.getsize(filepath)
    processed = 0
    count = 0
    last_byte = b""

    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            processed += len(chunk)
            last_byte = chunk[-1:]

            progress = processed / file_size * 100 if file_size else 100
            print(
                f"\rProgress: {progress:5.1f}% "
                f"({processed / 1e9:.2f}GB / {file_size / 1e9:.2f}GB)",
                end="",
                flush=True,
            )

    if last_byte and last_byte != b"\n":
        count += 1

    print()
    return count

If your team already uses tqdm, wrap the same chunk loop in a progress bar instead of printing manually.

Multi-core counting for extreme cases

import multiprocessing as mp
import os
from functools import partial


def count_chunk(filepath: str, start: int, size: int) -> int:
    with open(filepath, "rb") as f:
        f.seek(start)
        return f.read(size).count(b"\n")


def count_lines_multicore(filepath: str, num_workers: int | None = None) -> int:
    if num_workers is None:
        num_workers = mp.cpu_count()

    file_size = os.path.getsize(filepath)
    if file_size == 0:
        return 0

    chunk_size = file_size // num_workers
    chunks = [
        (
            i * chunk_size,
            chunk_size if i < num_workers - 1 else file_size - i * chunk_size,
        )
        for i in range(num_workers)
    ]

    worker = partial(count_chunk, filepath)

    with mp.Pool(num_workers) as pool:
        counts = pool.starmap(worker, chunks)

    total = sum(counts)

    with open(filepath, "rb") as f:
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            total += 1

    return total

Multi-core counting only pays off when:

  • the file is extremely large
  • storage is fast enough to keep multiple workers busy
  • CPU overhead is worth the operational complexity

For many real systems, wc -l or single-process binary chunk reading is the better engineering tradeoff.

Large-file benchmark summary

Method | Time on 10GB | CPU cores | Peak memory
sum(1 for _ in f) | 42s | 1 | about 1MB
Binary chunks (8MB) | 18s | 1 | about 8MB
mmap | 21s | 1 | about 1MB
subprocess wc -l | 8s | 1 | about 1MB
Multicore | 5s | 8 | about 8MB

In practice:

  • Linux and macOS: wc -l is usually the fastest simple option
  • Cross-platform Python: binary chunks are the best default
  • Peak performance with engineering overhead: multicore counting

Count Lines by Condition in Python

Count non-empty lines

with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
    non_empty = sum(1 for line in f if line.strip())

with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
    blank = sum(1 for line in f if not line.strip())

Count lines containing a specific word

def count_lines_containing(
    filepath: str,
    keyword: str,
    case_sensitive: bool = True,
) -> int:
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        if case_sensitive:
            return sum(1 for line in f if keyword in line)

        keyword_lower = keyword.lower()
        return sum(1 for line in f if keyword_lower in line.lower())

Regex version:

import re


def count_lines_matching(filepath: str, pattern: str, flags: int = 0) -> int:
    regex = re.compile(pattern, flags)

    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if regex.search(line))

Count by line length

def count_lines_by_length(
    filepath: str,
    min_len: int = 0,
    max_len: int | None = None,
) -> int:
    with open(filepath, "r", encoding="utf-8", errors="replace") as f:
        return sum(
            1
            for line in f
            if min_len <= len(line.rstrip("\n"))
            and (max_len is None or len(line.rstrip("\n")) <= max_len)
        )

Count CSV rows matching a condition

import csv


def count_csv_rows_by_condition(
    filepath: str,
    column: str,
    value: str,
    has_header: bool = True,
) -> int:
    count = 0

    with open(filepath, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f) if has_header else csv.reader(f)

        for row in reader:
            if has_header:
                if row.get(column) == value:
                    count += 1
            else:
                column_index = int(column)
                if len(row) > column_index and row[column_index] == value:
                    count += 1

    return count

For very large CSV files with column logic, a chunked pandas.read_csv() workflow can be more maintainable than hand-written loops.

Build a Python Line Counter CLI Tool

If you want a reusable Python line-counting utility, package the common patterns into one CLI. The script below behaves like a friendly cross-platform wc -l with extra controls.

Complete CLI script

#!/usr/bin/env python3
"""
linecounter.py

Examples:
    python linecounter.py file.txt
    python linecounter.py *.py
    python linecounter.py -r --pattern "*.py" ./src
    python linecounter.py --no-blank --details file.py
    cat file.txt | python linecounter.py -
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path


def count_lines_file(
    filepath: str,
    count_blank: bool = True,
    count_comment: bool = True,
    comment_char: str = "#",
) -> dict[str, int] | dict[str, str]:
    total = blank = comment = code = 0

    try:
        with open(filepath, "r", encoding="utf-8", errors="replace") as f:
            for line in f:
                total += 1
                stripped = line.strip()

                if not stripped:
                    blank += 1
                elif stripped.startswith(comment_char):
                    comment += 1
                else:
                    code += 1
    except OSError as exc:
        return {"error": str(exc)}

    effective = total
    if not count_blank:
        effective -= blank
    if not count_comment:
        effective -= comment

    return {
        "total": total,
        "code": code,
        "blank": blank,
        "comment": comment,
        "effective": effective,
    }


def count_lines_stdin() -> int:
    return sum(1 for _ in sys.stdin)


def format_number(value: int) -> str:
    return f"{value:,}"


def print_results(
    results: dict[str, dict[str, int] | dict[str, str]],
    show_details: bool = False,
    sort_by: str = "name",
) -> None:
    files = {key: value for key, value in results.items() if key != "__summary__"}
    summary = results.get("__summary__", {})

    if not files:
        print("No files found.")
        return

    if sort_by == "lines":
        sorted_files = sorted(
            files.items(),
            key=lambda item: item[1].get("effective", 0) if isinstance(item[1], dict) else 0,
            reverse=True,
        )
    else:
        sorted_files = sorted(files.items())

    max_path_len = min(max(len(path) for path in files), 60)

    if show_details:
        print(
            f"{'File':<{max_path_len}} "
            f"{'Total':>8} {'Code':>8} {'Blank':>8} {'Comment':>8}"
        )
        print("-" * (max_path_len + 36))
    else:
        print(f"{'Lines':>10}  File")
        print("-" * (max_path_len + 12))

    for filepath, data in sorted_files:
        display_path = filepath if len(filepath) <= max_path_len else "..." + filepath[-(max_path_len - 3):]

        if "error" in data:
            print(f"{'ERROR':>10}  {display_path} ({data['error']})")
            continue

        if show_details:
            print(
                f"{display_path:<{max_path_len}} "
                f"{format_number(int(data['total'])):>8} "
                f"{format_number(int(data['code'])):>8} "
                f"{format_number(int(data['blank'])):>8} "
                f"{format_number(int(data['comment'])):>8}"
            )
        else:
            print(f"{format_number(int(data['effective'])):>10}  {display_path}")

    if len(files) > 1 and isinstance(summary, dict) and summary:
        print("-" * (max_path_len + 12))
        if show_details:
            print(
                f"{'TOTAL':<{max_path_len}} "
                f"{format_number(int(summary['total'])):>8} "
                f"{format_number(int(summary['code'])):>8} "
                f"{format_number(int(summary['blank'])):>8} "
                f"{format_number(int(summary['comment'])):>8}"
            )
        else:
            print(
                f"{format_number(int(summary['effective'])):>10}  "
                f"total ({summary['file_count']} files)"
            )


def iter_input_files(
    args: argparse.Namespace,
) -> list[Path]:
    discovered: list[Path] = []

    for file_arg in args.files:
        path = Path(file_arg)

        if path.is_dir() and args.recursive:
            for candidate in sorted(path.rglob(args.pattern)):
                if candidate.is_file() and not any(excluded in candidate.parts for excluded in args.exclude):
                    discovered.append(candidate)
        elif path.is_file():
            discovered.append(path)

    return discovered


def main() -> None:
    parser = argparse.ArgumentParser(description="Count lines in files")
    parser.add_argument("files", nargs="*", help="Files to count, or - for stdin")
    parser.add_argument("-r", "--recursive", action="store_true", help="Recursively scan directories")
    parser.add_argument("-p", "--pattern", default="*", help="File pattern for recursive mode")
    parser.add_argument("--no-blank", action="store_true", help="Exclude blank lines")
    parser.add_argument("--no-comment", action="store_true", help="Exclude comment lines")
    parser.add_argument("--details", action="store_true", help="Show total, code, blank, and comment columns")
    parser.add_argument("--sort", choices=["name", "lines"], default="name", help="Sort output by name or line count")
    parser.add_argument(
        "--exclude",
        nargs="+",
        default=[".git", "__pycache__", "node_modules", ".venv"],
        help="Directories to skip in recursive mode",
    )

    args = parser.parse_args()

    if not args.files or args.files == ["-"]:
        print(count_lines_stdin())
        return

    results: dict[str, dict[str, int] | dict[str, str]] = {}
    summary = {
        "total": 0,
        "code": 0,
        "blank": 0,
        "comment": 0,
        "effective": 0,
        "file_count": 0,
    }

    for filepath in iter_input_files(args):
        data = count_lines_file(
            str(filepath),
            count_blank=not args.no_blank,
            count_comment=not args.no_comment,
        )
        results[str(filepath)] = data

        if "error" in data:
            continue

        for key in ("total", "code", "blank", "comment", "effective"):
            summary[key] += int(data[key])
        summary["file_count"] += 1

    results["__summary__"] = summary
    print_results(results, show_details=args.details, sort_by=args.sort)


if __name__ == "__main__":
    main()

This is enough for:

  • single-file counts
  • recursive directory scans
  • stdin pipelines
  • excluding blank lines and comments
  • sorted output for reports

Edge Cases and Common Issues

Empty files and empty strings

These cases create off-by-one bugs when you use newline counting instead of line counting.

assert len("".splitlines()) == 0
assert "".count("\n") == 0

If you use text.count("\n") + 1, the empty string incorrectly becomes 1.

Files without a trailing newline

This is the biggest source of disagreement between Python and wc -l.

text = "alpha\nbeta\ngamma"

print(text.count("\n"))
# 2

print(len(text.splitlines()))
# 3

A file can contain three logical lines and only two newline characters. Python iteration counts logical lines. wc -l counts newline characters.

Encodings, CRLF, and Unicode

If you open text files, always make the encoding decision explicit when the source is not tightly controlled.

with open("data.txt", "r", encoding="utf-8", errors="replace") as f:
    count = sum(1 for _ in f)

Use text mode when:

  • you care about Python's logical line semantics
  • newline normalization should be handled for you
  • the input is mostly valid text

Use binary mode when:

  • raw speed matters
  • you are counting newline bytes
  • decoding is unnecessary overhead
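As a sketch of the binary approach, here is a chunked newline-byte counter. The 1 MB chunk size is an arbitrary choice, and note that this matches wc -l semantics (newline bytes) rather than logical lines:

```python
def count_newlines_binary(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes without decoding; matches wc -l semantics."""
    count = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks so memory use stays flat for huge files.
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
    return count
```

Because no decoding happens, a file whose last line lacks a trailing newline will report one fewer than logical-line iteration, exactly like wc -l.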

CSV files can hide multiline fields

If a CSV column contains embedded newlines inside quoted values, raw line counting and row counting are no longer the same problem. Use the csv module or pandas if you need actual record counts.
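A small, self-contained illustration of the difference, using the csv module on an in-memory sample with one quoted multiline field:

```python
import csv
import io

data = 'name,note\nalice,"first line\nsecond line"\nbob,ok\n'

# Physical lines: every "\n" starts a new line, even inside quotes.
raw_lines = sum(1 for _ in io.StringIO(data))

# Logical records: csv.reader keeps the quoted multiline field together.
rows = sum(1 for _ in csv.reader(io.StringIO(data)))

print(raw_lines)  # 4
print(rows)       # 3 (header plus two records)
```

If you need data rows excluding the header, subtract one or skip the first row with next(reader, None).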

Recursive scans need exclusions

Without exclusions, your totals will be polluted by:

  • .git
  • node_modules
  • .venv
  • build output
  • caches
  • generated artifacts

That is why the directory examples always carry an exclude list.
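As a minimal sketch, here is a recursive counter that skips those directories. The EXCLUDE set and the count_lines_in_tree name are illustrative choices, not a standard API:

```python
from pathlib import Path

EXCLUDE = {".git", "node_modules", ".venv", "__pycache__", "build", "dist"}

def count_lines_in_tree(root: str, pattern: str = "*.py") -> int:
    """Sum line counts for matching files, skipping excluded directories."""
    total = 0
    for path in Path(root).rglob(pattern):
        # Skip any file whose path passes through an excluded directory.
        if any(part in EXCLUDE for part in path.parts):
            continue
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total
```

Checking path.parts is simpler than pruning the walk, at the cost of still visiting excluded directories; for very large trees like node_modules, pruning with os.walk is faster.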

Python vs wc -l: Why Results Differ

This confusion is common enough that it deserves its own section.

The core difference

  • wc -l counts newline characters
  • Python file iteration counts logical lines
  • splitlines() counts logical lines in strings

Example:

text = "line1\nline2\nline3"

print(text.count("\n"))
# 2

print(len(text.splitlines()))
# 3

If the same content is saved without a trailing newline, wc -l prints 2, while Python iteration over the file returns 3.

Comparison table

Input      | wc -l | sum(1 for _ in f) | len(text.splitlines())
""         | 0     | 0                 | 0
"a\n"      | 1     | 1                 | 1
"a\nb\n"   | 2     | 2                 | 2
"a\nb"     | 1     | 2                 | 2
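The table can be reproduced in Python by simulating file iteration with io.StringIO and using text.count("\n") as a stand-in for wc -l:

```python
import io

cases = ["", "a\n", "a\nb\n", "a\nb"]

for text in cases:
    wc = text.count("\n")                   # wc -l equivalent
    it = sum(1 for _ in io.StringIO(text))  # Python file iteration
    sl = len(text.splitlines())             # logical lines in a string
    print(repr(text), wc, it, sl)
```

Only the last case disagrees: without a trailing newline, wc -l reports one fewer than the two logical-line methods.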

Which definition should you use?

Use wc -l when:

  • you want Unix-compatible behavior
  • you are matching shell scripts
  • raw speed matters on macOS or Linux

Use Python logical line counts when:

  • users think in visible lines, not newline bytes
  • strings and files should behave consistently
  • missing trailing newlines should not undercount the last line

If you are also working across shell environments, see the related guide on how to count lines in a file on Linux, Mac, and Windows.

Count Lines Without Writing Code

Sometimes the fastest solution is not another Python script. If you already copied data out of a terminal, editor, spreadsheet, or CSV export, a browser tool is simpler.

Scenario                                  | Best option
Script, CI, or automation                 | Python
One-off pasted text                       | Online Line Counter
Uploading a CSV or text file              | Online Line Counter
Converting delimited data before counting | Text to Lines tool
Counting code files in a repository       | Python or cloc

Free Online Line Counter

Paste text, upload a file, or inspect copied output instantly. It is useful when you need a quick answer without opening Python, creating a virtual environment, or explaining a script to a non-technical teammate.

  • Works with pasted terminal output and plain text
  • Useful for CSV and TXT uploads
  • No setup, no imports, no shell commands

Open the free Line Counter, or convert delimited text first with Text to Lines.

Which Python Line Counting Method Should You Use?

Use this decision tree when you need the answer fast:

Need to count lines in Python
|
+- Counting a normal text file?
|  +- Need the safest default? -> sum(1 for _ in f)
|  +- Need the full line list too? -> readlines() or read().splitlines()
|  +- Need raw speed on huge files? -> binary chunk reading
|  +- On macOS/Linux and want Unix behavior? -> subprocess wc -l
|
+- Counting a string?
|  +- Want visible lines? -> len(text.splitlines())
|  +- Want raw newline characters? -> text.count("\n")
|
+- Counting only non-empty or matching lines?
|  +- Add filters inside the generator expression
|
+- Counting lines of code?
|  +- Quick heuristic? -> custom Python loop
|  +- Accurate multi-language metrics? -> cloc
|
+- Counting many files or whole directories?
|  +- pathlib.Path.rglob()
|  +- fileinput for stdin or multiple file arguments
|
+- Do not want to write code?
   +- Use linecounter.org
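For the filtered-counting branch, here is a hedged sketch that counts non-empty lines and, optionally, only those matching a regex. The count_matching name is illustrative, not a standard function:

```python
import re
from typing import Optional

def count_matching(path: str, pattern: Optional[str] = None) -> int:
    """Count non-empty lines, optionally only those matching a regex."""
    rx = re.compile(pattern) if pattern else None
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return sum(
            1
            for line in f
            if line.strip() and (rx is None or rx.search(line))
        )
```

Because the filter lives inside the generator expression, the file is still streamed line by line and never loaded into memory at once.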


Frequently Asked Questions

What is the best way to count lines in a file in Python?

For most scripts, use with open(path, 'r', encoding='utf-8', errors='replace') as f: count = sum(1 for _ in f). It is accurate, readable, and memory-efficient even for large files.

How do I count non-empty lines in Python?

Filter the iterator with line.strip(), for example: sum(1 for line in f if line.strip()). That excludes blank lines and lines that contain only whitespace.

What is the fastest Python method for counting lines in a huge file?

For a pure Python solution, binary chunk reading is usually the fastest choice. On macOS and Linux, subprocess wc -l is often faster still because the heavy work runs in optimized system code.

How do I count lines in a Python string?

Use len(text.splitlines()) for the most consistent answer. It handles empty strings, trailing newlines, and mixed newline styles better than text.count('\n') + 1.

Why does wc -l give a different result from Python?

wc -l counts newline characters, not logical lines. If a file does not end with a trailing newline, wc -l reports one fewer than Python line iteration or splitlines().

How do I count lines of code in Python without comments and blanks?

Loop through the file, ignore lines where stripped text is empty, and skip stripped.startswith('#'). For production-grade code metrics across many languages, use cloc instead of a hand-rolled parser.

Can Python count lines without reading the entire file into memory?

Yes. Iterating over the file object, using fileinput, binary chunk reading, and mmap-based approaches all avoid loading the full file into RAM at once.

How do I count lines in multiple files or a whole directory?

Use pathlib.Path.rglob() to walk matching files recursively, then sum the count for each file. Add an exclude list for folders such as .git, __pycache__, node_modules, and virtual environments.

How do I count lines in CSV files in Python?

Use the csv module when you need row-aware counting, filtering, or header handling. For plain raw line counts, streaming the file directly is faster, but it does not understand quoted multiline CSV fields.

Is readlines() bad for line counting?

It is fine for small files and when you also need the line content as a list. It becomes a bad default once files get large because it allocates memory roughly proportional to file size.

How do I build a wc -l equivalent in Python?

Use argparse for CLI parsing, read from files or stdin, and stream line by line with sum(1 for _ in stream). Add flags for recursive scanning, excluding blank lines, comment detection, and sorted output.

Can I count lines without writing Python code?

Yes. If you just need a quick result, paste text or upload a file to the free online Line Counter on linecounter.org instead of writing a script.

Related Guides