C Deep Dive
How to Count Lines in a File in C (And Why `fgetc` Is 9x Slower Than `fread`)
Count lines in a file in C — fgets, fread, mmap, and the large performance gap between them. Covers `wc -l` internals, Windows vs Linux portability, long-line traps, and production-ready counting patterns for large files.
Most C tutorials teach line counting with fgetc:
int c;
long count = 0;
while ((c = fgetc(fp)) != EOF) {
    if (c == '\n')
        count++;
}
This code is correct for counting newline bytes.
It is also one of the slowest practical ways to count lines in a file in C.
In the left404.com disk-I/O benchmark, a 150 MB read/write pass using fgetc/fputc took 5.90 seconds. A chunked fread/fwrite pass at 65536 bytes took 0.63 seconds. That is about a 9x difference in the same basic workload.
There is a second problem too: most tutorial code only counts '\n', which means it undercounts a non-empty file that does not end with a newline.
So the real line-counting question in C is not "how do I increment a counter?"
It is:
- how much overhead do I pay per byte?
- do I need portable code or platform-specific speed?
- am I counting logical lines or only newline bytes?
- how close do I want to get to how `wc -l` really works?
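The difference between newline bytes and logical lines in the list above can be sketched on an in-memory buffer. This is a minimal illustration, not code from the benchmark; the function names `count_newline_bytes` and `count_logical_lines` are made up for this example.

```c
#include <stddef.h>

/* Number of '\n' bytes in buf[0..len). This matches what wc -l reports. */
size_t count_newline_bytes(const unsigned char *buf, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            n++;
    }
    return n;
}

/* Logical lines: newline bytes, plus one more if the data is non-empty
 * and does not end with '\n'. */
size_t count_logical_lines(const unsigned char *buf, size_t len) {
    size_t n = count_newline_bytes(buf, len);
    if (len > 0 && buf[len - 1] != '\n')
        n++;
    return n;
}
```

For the three bytes `a`, `\n`, `b`, the first function returns 1 and the second returns 2. Every file-based counter later in this guide makes the same final adjustment.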
This guide walks up the line-counting ladder in C:
- `fgetc` for teaching and tiny memory budgets
- `fgets` for portable line-oriented code
- `fread` for the best cross-platform performance default
- `mmap` for POSIX-only mapped scanning
- the low-level `read()` model that explains how `wc -l` works
If you only want the short answer:
- small file and readability first: `fgets`
- large file and portable speed: `fread` with a `64 KiB` buffer
- Linux or macOS only and you are willing to benchmark: `mmap`
That is the practical rule of thumb for both Windows and Linux: use fread as the production baseline, then specialize only when the platform and workload justify it.
Quick Method Guide
| I want to... | Use this | Main warning |
|---|---|---|
| Learn the problem with minimal code | fgetc | slowest tier |
| Stay portable across Windows and Linux | fgets or fread | naive fgets counting breaks on long lines |
| Get the best cross-platform speed | fread + byte scan | must handle a missing final newline |
| Match wc -l more closely | read() + byte scan | POSIX-only API surface |
| Try zero-copy mapped access | mmap | POSIX-only and not always faster than tuned fread |
| Add a Windows-specific fast path | CreateFileMapping + MapViewOfFile | separate implementation branch |
For most line-counting code in C, fread is the practical sweet spot.
Method 1: fgetc - Simple, Correct, and Slow
If you want the smallest possible line-counting example in C, this is it:
#include <stdio.h>

long count_lines_fgetc(const char *filename) {
    FILE *fp = fopen(filename, "rb");
    if (!fp) {
        perror("fopen");
        return -1;
    }
    long count = 0;
    int c;
    int saw_any = 0;
    int last = '\n';
    while ((c = fgetc(fp)) != EOF) {
        saw_any = 1;
        if (c == '\n')
            count++;
        last = c;
    }
    if (ferror(fp)) {
        perror("fgetc");
        fclose(fp);
        return -1;
    }
    if (saw_any && last != '\n')
        count++;
    fclose(fp);
    return count;
}
This version fixes the first common line-counting bug in C: a non-empty file without a trailing newline still counts as one more logical line.
Why fgetc loses
The left404.com benchmark compared fgetc/fputc, fgets/fputs, and multiple fread/fwrite chunk sizes on a 150 MB file.
The ranking was brutal:
| Method | Chunk size | Time |
|---|---|---|
| fgetc/fputc | 1 byte | 5.90s |
| fgets/fputs | 64 bytes | 1.71s |
| fread/fwrite | 65536 bytes | 0.63s |
Important nuance:
- that benchmark is a read/write pass, not a pure newline-count microbenchmark
- but it is still a strong proxy for the same I/O hierarchy
- the lesson survives intact: the performance gap between `fgetc` and `fread` is not close on large files
Use fgetc when:
- the file is small
- you are teaching stdio basics
- code size matters more than throughput
Do not use it when performance is one of the reasons you chose C.
Method 2: fgets - The Portable Line-Oriented Standard
If your first instinct for counting lines with fgets is this:
while (fgets(buf, sizeof buf, fp))
    count++;
stop there.
That is only correct if you know every line fits into your buffer.
The long-line trap
The cppreference fgets page says fgets reads at most count - 1 characters and stops when it finds a newline or reaches end-of-file.
That means one logical line can arrive in multiple fgets calls if the line is longer than your buffer.
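To make the trap concrete, the sketch below records how many `fgets` calls succeed and how many of the returned pieces actually end in a newline. The helper `measure_fgets` and the `fgets_stats` struct are hypothetical names for this illustration only, and the buffer is deliberately capped at 64 bytes so the effect shows up with short test input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration: not part of any library. */
typedef struct {
    long calls;               /* successful fgets calls */
    long newline_terminated;  /* pieces that ended in '\n' */
} fgets_stats;

/* bufsize must be between 2 and 64; anything else yields zeroed stats. */
fgets_stats measure_fgets(FILE *fp, int bufsize) {
    fgets_stats s = {0, 0};
    char buf[64];
    if (bufsize < 2 || bufsize > (int)sizeof buf)
        return s;
    while (fgets(buf, bufsize, fp)) {
        size_t len = strlen(buf);
        s.calls++;
        if (len > 0 && buf[len - 1] == '\n')
            s.newline_terminated++;
    }
    return s;
}
```

With a 4-byte buffer, the single line `abcdefghij` followed by a newline arrives as four pieces (`abc`, `def`, `ghi`, and `j` plus the newline), so a naive "one call equals one line" counter would report 4 instead of 1.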
So correct fgets-based line counting looks like this:
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096

long count_lines_fgets(const char *filename) {
    FILE *fp = fopen(filename, "rb");
    if (!fp) {
        perror("fopen");
        return -1;
    }
    long count = 0;
    char buf[BUF_SIZE];
    int saw_any = 0;
    char last = '\n';
    while (fgets(buf, sizeof buf, fp)) {
        size_t len = strlen(buf);
        if (len > 0) {
            saw_any = 1;
            last = buf[len - 1];
            /* A piece without a trailing '\n' is only part of a long
             * line, so count completed lines only. */
            if (last == '\n')
                count++;
        }
    }
    if (ferror(fp)) {
        perror("fgets");
        fclose(fp);
        return -1;
    }
    /* Final line with no terminating newline. */
    if (saw_any && last != '\n')
        count++;
    fclose(fp);
    return count;
}
This is the portable solution for both Windows and Linux when you want line-oriented code without depending on POSIX extensions.
fgets versus getline
getline(3) is documented by POSIX.1-2008, not ISO C.
That means:
- it is normal on Linux and macOS
- it is not a safe assumption on native Windows toolchains like MSVC
The man page also makes the allocation model explicit:
- `getline` can allocate the buffer for you
- it can `realloc` it as needed
- it returns the number of characters read, including the delimiter
If you are on POSIX and want arbitrarily long lines without manual fixed buffers, getline is excellent:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

long count_lines_getline(const char *filename) {
    FILE *fp = fopen(filename, "r");
    if (!fp) {
        perror("fopen");
        return -1;
    }
    long count = 0;
    char *line = NULL;
    size_t cap = 0;
    while (getline(&line, &cap, fp) != -1)
        count++;
    free(line);
    if (ferror(fp)) {
        perror("getline");
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return count;
}
But for maximum portability, fgets still wins.
If you are coming from C++, this exact portability question is one reason std::getline feels simpler than the C world.
Method 3: fread - The Best Cross-Platform Performance Default
This is the workhorse pattern for counting lines with fread:
#include <stdio.h>

#define CHUNK_SIZE 65536

long count_lines_fread(const char *filename) {
    FILE *fp = fopen(filename, "rb");
    if (!fp) {
        perror("fopen");
        return -1;
    }
    long count = 0;
    unsigned char buf[CHUNK_SIZE];
    size_t bytes_read;
    int saw_any = 0;
    unsigned char last = '\n';
    while ((bytes_read = fread(buf, 1, sizeof buf, fp)) > 0) {
        saw_any = 1;
        for (size_t i = 0; i < bytes_read; i++) {
            if (buf[i] == '\n')
                count++;
        }
        /* Remember the last byte so the final chunk decides the fix-up. */
        last = buf[bytes_read - 1];
    }
    if (ferror(fp)) {
        perror("fread");
        fclose(fp);
        return -1;
    }
    if (saw_any && last != '\n')
        count++;
    fclose(fp);
    return count;
}
This is usually the strongest choice for production line counting in C because it combines:
- standard C file APIs
- explicit buffer sizing
- no per-character function call overhead
- clean Windows and Linux behavior in binary mode
Why the 64 KiB buffer keeps showing up
The left404.com benchmark found a flattening curve:
| fread/fwrite chunk (bytes) | Time |
|---|---|
| 4096 | 0.71s |
| 16384 | 0.64s |
| 65536 | 0.63s |
| 262144 | 0.66s |
So the practical lesson is not "64 KiB is universally magic."
It is:
- tiny chunks are bad
- medium-to-large chunks are much better
- the returns flatten somewhere around 16 KiB to 64 KiB in that workload
That is a strong starting point for fread-based line counting.
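Because the best chunk size depends on the machine, it can help to make it a parameter while experimenting. The following is a sketch under that assumption: `count_lines_chunked` is an illustrative name, it takes an already-open stream, and it allocates the buffer on the heap so large sizes do not strain the stack.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts logical lines from an open stream using a caller-chosen chunk
 * size. Returns -1 on a zero size, allocation failure, or read error. */
long count_lines_chunked(FILE *fp, size_t chunk_size) {
    if (chunk_size == 0)
        return -1;
    unsigned char *buf = malloc(chunk_size);
    if (!buf)
        return -1;

    long count = 0;
    size_t n;
    int saw_any = 0;
    unsigned char last = '\n';
    while ((n = fread(buf, 1, chunk_size, fp)) > 0) {
        saw_any = 1;
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                count++;
        }
        last = buf[n - 1];  /* carried across chunk boundaries */
    }
    free(buf);
    if (ferror(fp))
        return -1;
    if (saw_any && last != '\n')
        count++;
    return count;
}
```

The result should be identical for every chunk size, including a size of 1; only the running time changes. That makes the function a convenient way to check the flattening curve on your own hardware.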
Why "rb" matters
When you count bytes directly, use binary mode.
On Windows, text mode may translate "\r\n" into "\n" for stdio text input. That is fine for logical lines, but it means your byte-level semantics are no longer raw file bytes.
For portable byte scanning, open with:
fopen(filename, "rb");
and count '\n'.
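A small sketch of why counting '\n' in raw bytes works for both line-ending styles: a CRLF line still contains exactly one '\n'. The `scan_line_endings` function and `eol_stats` struct are illustrative names, and the sketch only looks at a single buffer, so a '\r' that ends one chunk and a '\n' that starts the next would need extra state in a chunked reader.

```c
#include <stddef.h>

typedef struct {
    size_t lines;       /* '\n' bytes seen */
    size_t crlf_lines;  /* how many of those were preceded by '\r' */
} eol_stats;

/* Scans one in-memory buffer of raw file bytes. */
eol_stats scan_line_endings(const unsigned char *buf, size_t len) {
    eol_stats s = {0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            s.lines++;
            if (i > 0 && buf[i - 1] == '\r')
                s.crlf_lines++;
        }
    }
    return s;
}
```

Two LF-terminated lines and two CRLF-terminated lines both report `lines == 2`; only `crlf_lines` differs. The line count is therefore stable across platforms as long as the file is read in binary mode.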
Method 4: mmap - The POSIX Mapped-File Fast Path
The Linux mmap(2) man page describes mmap as mapping file contents into the process virtual address space. It also says the file descriptor can be closed immediately after mmap() returns without invalidating the mapping.
That is what makes mmap-based line counting attractive:
- no `fread` loop
- no explicit user-space copy buffer
- the kernel manages paging and read-ahead
Here is the standard skeleton:
#include <fcntl.h>
#include <stdio.h>      /* perror */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

long count_lines_mmap(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    struct stat st;
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        close(fd);
        return -1;
    }
    /* mmap rejects a zero-length mapping, so handle empty files first. */
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }
    const unsigned char *data =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return -1;
    }
#ifdef MADV_SEQUENTIAL
    madvise((void *)data, (size_t)st.st_size, MADV_SEQUENTIAL);
#endif
    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    long count = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (data[i] == '\n')
            count++;
    }
    if (data[st.st_size - 1] != '\n')
        count++;
    munmap((void *)data, (size_t)st.st_size);
    return count;
}
The Linux madvise(2) page says MADV_SEQUENTIAL tells the kernel to expect sequential page references, so pages may be read ahead aggressively and freed soon after access.
That fits a single front-to-back newline scan well.
Important caveat: mmap is not a free win
mmap is often presented as the final boss of line counting in C.
That is too simple.
What mmap really gives you is a different cost model:
- page faults instead of explicit read calls
- kernel-managed readahead
- fewer explicit copies into your own buffer
It does not beat the fundamental lower bound.
You still have to inspect every byte. The lower bound is still O(n).
So mmap-based line counting can be very good, but tuned fread is already close enough on many systems that you should benchmark before adding a platform-specific branch.
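One rough way to run that comparison is a tiny timing harness. The sketch below is a starting point under stated assumptions, not a rigorous benchmark: `time_counter`, `line_counter_fn`, and `count_newlines_fread` are illustrative names, and it uses C11 `timespec_get`, which reports wall-clock time rather than a monotonic clock.

```c
#include <stdio.h>
#include <time.h>

typedef long (*line_counter_fn)(const char *filename);

/* Baseline candidate for the harness: chunked fread, newline bytes only. */
long count_newlines_fread(const char *filename) {
    FILE *fp = fopen(filename, "rb");
    if (!fp)
        return -1;
    unsigned char buf[65536];
    size_t n;
    long count = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                count++;
        }
    }
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : count;
}

/* Runs fn once, stores elapsed wall-clock seconds in *seconds, and
 * returns whatever fn returned. */
long time_counter(line_counter_fn fn, const char *filename, double *seconds) {
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    long result = fn(filename);
    timespec_get(&t1, TIME_UTC);
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return result;
}
```

Any counter with the same signature, such as a mapped implementation, can be passed as `fn`. Run each candidate several times on the same file: the first pass may read from disk while later passes are usually served from the page cache, and the two numbers answer different questions.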
If you want the Rust version of this same tradeoff, the Rust line counting guide covers the same "buffered reader versus byte scan" decision from a systems-language angle.
Part 5: How wc -l Really Works
If you have ever asked how wc -l works, the answer is closer to this:
#include <fcntl.h>
#include <unistd.h>

#define BUFFER_SIZE (16 * 1024)

/* Like wc -l, this counts newline bytes only, so a final line without
 * a trailing '\n' is not counted. */
long count_lines_read(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1)
        return -1;
    long count = 0;
    char buf[BUFFER_SIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                count++;
        }
    }
    close(fd);
    /* n < 0 means the last read() failed. */
    return n < 0 ? -1 : count;
}
This is not a copy of GNU wc.c. It is the simplified shape.
The real GNU coreutils source confirms the important points:
- `BUFFER_SIZE` is `16 * 1024`
- it uses `safe_read`
- it calls `fdadvise(..., FADVISE_SEQUENTIAL)`
- for the lines-only fast path, it counts `'\n'` bytes directly
- for longer lines, it may switch to `memchr`-based scanning
That is why wc -l usually embarrasses tutorial-style fgetc code.
It is doing fewer expensive things per byte.
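The `memchr`-based scanning mentioned in the list above could look roughly like this. It is a simplified sketch of the idea, not the code from GNU `wc.c`, and `count_newlines_memchr` is a name invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Counts '\n' bytes by jumping from one newline to the next with memchr
 * instead of testing every byte in a hand-written loop. */
size_t count_newlines_memchr(const void *buf, size_t len) {
    if (len == 0)
        return 0;
    const unsigned char *p = buf;
    const unsigned char *end = p + len;
    size_t count = 0;
    while (p < end) {
        const unsigned char *hit = memchr(p, '\n', (size_t)(end - p));
        if (!hit)
            break;          /* no more newlines in the remaining bytes */
        count++;
        p = hit + 1;        /* resume just past the newline we found */
    }
    return count;
}
```

The approach tends to pay off when lines are long, because library `memchr` implementations are typically optimized to skip many bytes per step. On files full of very short lines the per-call overhead can cancel the gain, which is consistent with wc switching strategies based on line length.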
If your interest is mostly shell-side usage rather than C internals, the Bash wc -l guide covers the command-line behavior, including the missing-final-newline trap.
Part 6: Windows and Cross-Platform Patterns
If you need line counting that works on both Windows and Linux, there are really two sane strategies.
Strategy 1: one portable code path
Use fread with "rb" mode everywhere.
This is the simplest answer when:
- you want one implementation
- you want solid performance
- you do not want separate POSIX and Win32 mapping code
That is why the production helper later defaults to fread.
Strategy 2: portable baseline plus platform-specific fast path
On Windows, the mapped-file equivalent of mmap uses CreateFileMapping and MapViewOfFile.
Microsoft's docs describe them this way:
- `CreateFileMappingA` creates or opens a file mapping object for a file
- `MapViewOfFile` maps a view of that object into the process address space
The skeleton looks like this:
#ifdef _WIN32
#include <windows.h>

/* Note: long is 32 bits on Windows, so switch the counter to long long
 * if a file could hold more than 2^31 - 1 lines. */
long count_lines_win32_map(const char *filename) {
    HANDLE hFile = CreateFileA(
        filename,
        GENERIC_READ,
        FILE_SHARE_READ,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL
    );
    if (hFile == INVALID_HANDLE_VALUE)
        return -1;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(hFile, &size)) {
        CloseHandle(hFile);
        return -1;
    }
    /* An empty file cannot be mapped, so handle it before mapping. */
    if (size.QuadPart == 0) {
        CloseHandle(hFile);
        return 0;
    }
    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!hMap) {
        CloseHandle(hFile);
        return -1;
    }
    const unsigned char *data = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    if (!data) {
        CloseHandle(hMap);
        CloseHandle(hFile);
        return -1;
    }
    long count = 0;
    for (LONGLONG i = 0; i < size.QuadPart; i++) {
        if (data[i] == '\n')
            count++;
    }
    if (data[size.QuadPart - 1] != '\n')
        count++;
    UnmapViewOfFile(data);
    CloseHandle(hMap);
    CloseHandle(hFile);
    return count;
}
#endif
This is the right answer if you truly need a Windows-specific mapped fast path.
But it is not the right default answer for most teams.
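If a team does adopt Strategy 2, the glue between the fast path and the portable baseline can stay small. The sketch below is one possible shape, with every name hypothetical: `lc_count_with_fallback` tries an optional fast counter and falls back to the portable one, and the two `lc_stub_*` functions are placeholders that exist only so the sketch compiles on its own.

```c
#include <stddef.h>

typedef long (*lc_counter_fn)(const char *filename);

/* Which mapped fast path this platform could offer, decided at compile time. */
const char *lc_fast_path_name(void) {
#if defined(_WIN32)
    return "CreateFileMapping";
#elif defined(__unix__) || defined(__APPLE__)
    return "mmap";
#else
    return "none";
#endif
}

/* Try the optional fast path first; use the portable counter if the fast
 * path is absent (NULL) or reports failure (negative result). */
long lc_count_with_fallback(const char *filename,
                            lc_counter_fn fast,
                            lc_counter_fn portable) {
    if (fast) {
        long n = fast(filename);
        if (n >= 0)
            return n;
    }
    return portable(filename);
}

/* Placeholders for illustration only. Real code would pass a mapped
 * counter and an fread-based counter instead. */
long lc_stub_fast_unavailable(const char *filename) {
    (void)filename;
    return -1;
}

long lc_stub_portable(const char *filename) {
    (void)filename;
    return 42;
}
```

The design choice here is that the portable path is always available and the fast path is strictly optional, so a failed mapping degrades to the baseline instead of failing the whole call.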
Part 7: A Production-Ready C Line Counter
This helper keeps the production default simple:
- `fread` for the fast portable path
- one result struct
- explicit handling for empty files and missing final newline
/* line_counter.h */
#ifndef LINE_COUNTER_H
#define LINE_COUNTER_H
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    long line_count;
    size_t file_size;
    int error;              /* 0 on success, otherwise an errno-style code */
    char error_msg[256];
} lc_result_t;

lc_result_t lc_count(const char *filename);
#endif

/* line_counter.c */
#include "line_counter.h"

#define LC_CHUNK_SIZE 65536

lc_result_t lc_count(const char *filename) {
    lc_result_t r = {0};
    FILE *fp = fopen(filename, "rb");
    if (!fp) {
        r.error = errno;
        snprintf(r.error_msg, sizeof r.error_msg,
                 "Cannot open '%s': %s", filename, strerror(errno));
        return r;
    }
    unsigned char buf[LC_CHUNK_SIZE];
    size_t n;
    int saw_any = 0;
    unsigned char last = '\n';
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        saw_any = 1;
        r.file_size += n;
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                r.line_count++;
        }
        last = buf[n - 1];
    }
    if (ferror(fp)) {
        r.error = errno ? errno : 1;
        snprintf(r.error_msg, sizeof r.error_msg,
                 "Read error on '%s'", filename);
        fclose(fp);
        return r;
    }
    /* Count a final line that has no terminating newline. */
    if (saw_any && last != '\n')
        r.line_count++;
    fclose(fp);
    return r;
}
That is the version I would actually ship first for counting lines in a file in C.
Benchmark: What the Performance Ladder Really Means
The cleanest benchmark data I could verify for this article comes from the left404.com 150 MB disk-I/O test.
It is not a pure line-count benchmark, so treat it as a directional systems-I/O comparison, not a universal promise.
| Method | Buffer or unit | Time shape | What it tells you |
|---|---|---|---|
| fgetc | 1 byte | slowest | avoid per-character overhead on large files |
| fgets | line / fixed buffer | much better | portable and reasonable |
| fread | 4 KiB to 64 KiB | best cross-platform tier | chunked byte scans are the sweet spot |
| read() / wc style | 16 KiB | same general tier as tuned fread | lower-level loop, fewer abstractions |
| mmap | mapped file | benchmark it | can be excellent, but not automatically dominant |
The important production conclusion is simple:
- use `fread` first
- use `mmap` only after measuring
- do not treat `fgets` as "one call equals one line" unless line length is bounded
- do not let newline-byte counting forget the final unterminated line
Quick FAQ
How do I count lines in C?
For most code, use fread in "rb" mode and count '\n' bytes in a medium-sized buffer, then add one more line if the non-empty file does not end with '\n'.
Why is fgetc slow for large files?
Because it does too little work per call. The fgetc vs fread performance gap comes from repeated function-call overhead and tiny effective units of work.
How do I count lines with mmap in C?
Map the file, scan the mapped bytes for '\n', and then fix the missing-final-newline case.
How does wc -l work?
The GNU source uses a low-level buffered read loop and counts '\n' directly. That is the core of how wc -l works.
Should I use getline or fgets?
Use getline on POSIX when you need unbounded line length. Use fgets when you need standard-C portability.
What is the best fread buffer size?
Start around 16 KiB to 64 KiB. The exact sweet spot depends on your filesystem, kernel, CPU cache behavior, and workload.
What should I use on Windows and Linux?
For code that must run on both Windows and Linux, use fread as the common denominator. Add mapping-based fast paths only if you really need them.
Sources Checked
- left404.com benchmark comparing `fgetc`/`fputc`, `fgets`/`fputs`, and multiple `fread`/`fwrite` chunk sizes on a 150 MB file: https://left404.com/2011/03/17/disk-io-in-c-avoid-fgetcfputc/
- GNU coreutils `wc.c` source showing `BUFFER_SIZE (16 * 1024)`, `safe_read`, `fdadvise(..., FADVISE_SEQUENTIAL)`, and direct newline counting: https://sources.debian.org/src/coreutils/8.32-4/src/wc.c
- `getline(3)` Linux man page confirming POSIX.1-2008 status and dynamic buffer semantics: https://man7.org/linux/man-pages/man3/getline.3.html
- `mmap(2)` Linux man page confirming file mapping behavior and that the file descriptor may be closed after mapping: https://man7.org/linux/man-pages/man2/mmap.2.html
- `madvise(2)` Linux man page for `MADV_SEQUENTIAL`: https://man7.org/linux/man-pages/man2/madvise.2.html
- `fgets` reference describing `count - 1` reads, newline retention, and stop conditions: https://en.cppreference.com/c/io/fgets
- Microsoft docs for `CreateFileMappingA` and `MapViewOfFile`: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfilemappinga and https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile
Related Guides and Tools
- C++ `ifstream` and iterator patterns
- Rust BufReader vs byte scanning
- Bash `wc -l` command
- Python file I/O
- Java file line counting
- Line Counter tool
Need to count lines in a file right now, without writing a single line of C?
Paste the file into the Line Counter. No fgetc bottlenecks. No buffer-size tuning. Just the number.
Frequently Asked Questions
How do I count lines in C?
For portable code, use fgets or fread. For production performance across Windows and Linux, fread with a 64 KiB buffer is the strongest default.
Why is fgetc slow for large files?
It processes one character at a time and pays per-call overhead repeatedly. Even though stdio buffers internally, fgetc still loses badly to block-based scanning on large files.
How do I count lines with mmap in C?
Open the file, fstat it, map it with mmap, scan the mapped bytes for '\n', then add one more line if the non-empty file does not end with '\n'.
How does wc -l work internally?
GNU wc uses a low-level safe_read/read-style loop with a fixed buffer and counts '\n' bytes directly. It does not do fgets-style line parsing.
Should I use getline or fgets in C?
Use getline when you are on POSIX and want arbitrarily long lines without manual buffer management. Use fgets when you need standard-C portability, especially on native Windows toolchains.
What is the best buffer size for fread line counting?
A 16 KiB to 64 KiB region is a good starting point. External benchmarks often show the returns flattening after that range.
How do I count lines in C on Windows and Linux?
Use fread in rb mode for the portable fast path. Add a POSIX mmap fast path or a Windows file-mapping fast path only if the extra complexity is worth it.