How to Count Lines in a File in Haskell (And Why `lines ""` Is Not the Real Trap)
Count lines in a file in Haskell — readFile, lines, Data.ByteString, and strict vs lazy IO. Covers the `lines ""` myth, final-newline off-by-one bugs, Lazy IO file descriptor leaks, and high-performance streaming with ByteString.
A common Haskell answer for line counting looks like this:
countLines :: FilePath -> IO Int
countLines = fmap (length . lines) . readFile
It is short. It is idiomatic. It is the first file-reading snippet many Haskell beginners learn.
But the usual warning people attach to it is often wrong.
If you searched for how lines treats the empty string, here is the first thing to fix:
lines ""
does not return [""].
The official Data.List documentation shows:
ghci> lines ""
[]
So length . lines does not count an empty file as 1. That empty-file trap is a myth.
The real traps are different:
- readFile is lazy I/O and can keep a handle semi-closed until the data is consumed
- Data.ByteString.Lazy.readFile can keep many files open and lead to "Too many open files" errors
- ByteString code that only counts '\n' undercounts a non-empty file with no trailing newline
This guide covers the real choices for counting lines in Haskell:
- readFile plus lines for the clean baseline
- withFile for explicit handle lifetime
- strict and lazy ByteString counting
- a strict chunk loop for counting lines in large files
If you only want the short answer:
- small text file: fmap (length . lines) . readFile
- batch processing: withFile plus hGetContents' or evaluate
- large file: strict or streaming ByteString
That is the real rule for counting lines in Haskell: do not fix the wrong bug. Fix handle lifetime and newline-byte semantics.
Quick Method Guide
| I want to... | Use this | Main warning |
|---|---|---|
| Count a small text file with the shortest code | fmap (length . lines) . readFile | lazy I/O handle lifetime |
| Count many files safely | withFile plus hGetContents' | still String, so not the fastest |
| Keep older base compatibility | withFile plus hGetContents and evaluate | must force the result inside the block |
| Count bytes fast when files fit memory | strict Data.ByteString.readFile | whole file is loaded strictly |
| Stream a huge file safely | withBinaryFile plus BS.hGetSome | counts LF bytes, so handle final unterminated line |
| Fold existing lazy ByteString chunks | BL.foldlChunks | safe only when you force consumption before leaving the handle scope |
For most line-counting work in Haskell, the safest teaching path is:
- start with readFile plus lines
- learn why withFile matters
- switch to ByteString when file size or throughput matters
Method 1: readFile + lines - The Idiomatic Baseline
The classic baseline is still fine:
countLines :: FilePath -> IO Int
countLines = fmap (length . lines) . readFile
Example:
main :: IO ()
main = do
  n <- countLines "data.txt"
  putStrLn $ "Lines: " ++ show n
This is the most compact way to count lines in Haskell, and for a single small file it is perfectly reasonable.
The lines "" myth
The official Data.List docs are explicit:
ghci> lines "" -- empty input contains no lines
[]
ghci> lines "\n" -- single empty line
[""]
ghci> lines "one"
["one"]
ghci> lines "one\ntwo\n"
["one","two"]
So:
- length (lines "") == 0
- length (lines "\n") == 1
- length (lines "one") == 1
That means the usual "empty file returns 1" story is simply false for standard lines.
If you were searching for how lines handles the empty string, the right answer is:
- lines "" == []
- the empty-file count is already correct
- the real off-by-one bug appears when you count newline bytes instead of logical lines
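To make the contrast concrete, here is a minimal sketch that puts the lines-based count next to a naive newline tally for a few inputs. The names lineCountViaLines, newlineCharCount, and samples are illustrative helpers, not part of any library.

```haskell
-- Illustrative only: compares Prelude's `lines` with a naive '\n' tally.
lineCountViaLines :: String -> Int
lineCountViaLines = length . lines

-- Hypothetical helper: counts newline characters and nothing else.
newlineCharCount :: String -> Int
newlineCharCount = length . filter (== '\n')

-- Pairs each sample input with (lines-based count, raw newline count).
samples :: [(String, (Int, Int))]
samples =
  [ (s, (lineCountViaLines s, newlineCharCount s))
  | s <- ["", "\n", "one", "one\ntwo\n", "one\ntwo"]
  ]
```

The two counts agree for the empty string and for newline-terminated input. They differ only for "one" and "one\ntwo", where the final line has no terminator.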
What the real risk is: lazy I/O handle lifetime
The System.IO docs warn that readFile holds a semi-closed handle until the entire contents have been consumed.
That matters when line-counting code processes many files:
countAll :: [FilePath] -> IO [Int]
countAll = mapM (fmap (length . lines) . readFile)
This code looks harmless, but each count is a lazy thunk: mapM can open every file before any contents are read, and a handle is released only once its data has been consumed to EOF.
With one file, you may never notice.
With thousands of files, this is how "Too many open files" begins.
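One way to keep the one-liner's shape while avoiding that pile-up is to force each count before the next file is opened. The sketch below assumes plain readFile and uses evaluate from Control.Exception; countAllForced is a hypothetical name for illustration.

```haskell
import Control.Exception (evaluate)

-- Hypothetical variant of countAll: every count is forced before the next
-- file is opened, so readFile's handle reaches EOF and can be closed.
countAllForced :: [FilePath] -> IO [Int]
countAllForced = mapM countOne
  where
    countOne :: FilePath -> IO Int
    countOne path = do
      contents <- readFile path
      -- Forcing the length walks the whole string, which reads the file to EOF.
      evaluate (length (lines contents))
```

This still builds a String per file, so it addresses handle lifetime only, not speed.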
Performance shape
For readFile plus lines, the main cost is not "empty files" but representation:
- String is linked-list text
- lines creates a list of lines
- counting with length is elegant, but not the fastest route for large files
So fmap (length . lines) . readFile is best understood as the readable baseline, not the large-file champion.
Method 2: withFile - Make Handle Lifetime Explicit
The official System.IO docs say withFile opens the file, runs your action, and closes the file even if the action throws.
That is why withFile is the right next step after the teaching one-liner.
Modern strict text path: hGetContents'
On modern base (4.15 and later, which ships with GHC 9.0 and later), hGetContents' is the strict version of hGetContents:
import System.IO
countLinesStrict :: FilePath -> IO Int
countLinesStrict path =
  withFile path ReadMode $ \h ->
    length . lines <$> hGetContents' h
This is a strong withFile default when:
- the file is text
- it fits memory
- you want straightforward handle safety
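If the handle itself is not needed, the same strict read can be sketched with readFile', which System.IO exports alongside hGetContents' on base 4.15 and later. The name countLinesReadStrict is illustrative.

```haskell
import System.IO (readFile')

-- Assumes base >= 4.15, where System.IO exports readFile'.
-- The file is fully read before the String is returned, so no lazy handle
-- is left waiting on the caller.
countLinesReadStrict :: FilePath -> IO Int
countLinesReadStrict = fmap (length . lines) . readFile'
```

The trade-off matches hGetContents': the whole file is held as a String while it is counted.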
Older compatible path: hGetContents + evaluate
If you want the older pattern that makes the forcing explicit:
import System.IO
import Control.Exception (evaluate)
countLinesStrictCompat :: FilePath -> IO Int
countLinesStrictCompat path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Force the count while the handle is still open.
    evaluate (length (lines contents))
Why the evaluate?
Because hGetContents itself is lazy. Without forcing the count inside the withFile block, the block can exit before the file has actually been consumed.
That is the real withFile lesson: the resource boundary is only useful if the consuming computation happens inside it.
Why this fixes "Too many open files"
The withFile docs guarantee close-on-exit. The Stanford CS240h lazy-I/O slides show the opposite failure mode with lazy ByteString:
- files are opened immediately
- data is not necessarily read immediately
- handles stay open until the thunk is forced to EOF
So for batch counting, this is the safe shape:
countLinesMany :: [FilePath] -> IO [(FilePath, Int)]
countLinesMany paths =
  mapM (\p -> (,) p <$> countLinesStrict p) paths
Each file is opened, consumed, counted, and closed before the next one moves on.
If you want the Scala version of the same bug class, the Scala line counting guide shows how a lazy iterator can escape its resource scope and eventually turn into Too many open files.
Method 3: ByteString - Faster Counting for Large Files
When line counting moves from "teaching example" to "real log file", ByteString becomes the interesting option.
There are two major routes:
- strict Data.ByteString for whole-file strict reads
- lazy Data.ByteString.Lazy for incremental chunked data
Strict ByteString: fast when the file fits memory
The strict bytestring docs say Data.ByteString.readFile reads an entire file strictly into a ByteString.
That makes it a good ByteString line-counting option when the whole file comfortably fits in RAM:
import qualified Data.ByteString as BS
import Data.Word (Word8)
countLinesBS :: FilePath -> IO Int
countLinesBS path = do
  bs <- BS.readFile path
  pure (logicalLineCount bs)

-- Counts LF bytes, plus one for a final line that has no terminator.
logicalLineCount :: BS.ByteString -> Int
logicalLineCount bs
  | BS.null bs = 0
  | BS.last bs == newline = BS.count newline bs
  | otherwise = BS.count newline bs + 1
  where
    newline = 10 :: Word8
This is faster than the readFile plus lines baseline because it counts bytes directly instead of building a linked list of boxed characters and a list of lines.
The real off-by-one bug: final unterminated line
This is where off-by-one errors really happen:
BS.count 10 "a\nb"
There is only one '\n' byte, but there are two logical lines.
So raw newline counting must handle three cases:
- empty file -> 0
- non-empty file ending in \n -> newline count
- non-empty file not ending in \n -> newline count plus 1
That is the real ByteString line-counting trap, not lines "".
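The three cases can be checked side by side. This is a small sketch that assumes the bytestring package that ships with GHC; lineCountFromNewlines and caseTable are illustrative names, and the counting rule is the same one used by logicalLineCount above.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Same rule as logicalLineCount above, repeated so this block stands alone.
lineCountFromNewlines :: BS.ByteString -> Int
lineCountFromNewlines bs
  | BS.null bs = 0
  | BS.last bs == lf = BS.count lf bs
  | otherwise = BS.count lf bs + 1
  where
    lf = 10 :: Word8

-- Each row: (input, raw LF byte count, logical line count).
caseTable :: [(String, Int, Int)]
caseTable =
  [ (s, BS.count 10 b, lineCountFromNewlines b)
  | s <- ["", "a\nb\n", "a\nb"]
  , let b = BC.pack s
  ]
```

Only the last row differs between the two counts: "a\nb" has one LF byte but two logical lines.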
Lazy ByteString: incremental, but still lazy I/O
The lazy bytestring docs are blunt:
- readFile reads lazily
- the handle stays open until EOF is encountered
- hGetContents closes on EOF if all data is read, or through garbage collection otherwise
So this code is concise:
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
countLinesLazyBS :: FilePath -> IO Int
countLinesLazyBS path = do
  bs <- BL.readFile path
  -- The count is returned unevaluated; the handle stays open until it is forced.
  pure (logicalLineCountLazy bs)

logicalLineCountLazy :: BL.ByteString -> Int
logicalLineCountLazy bs
  | BL.null bs = 0
  -- BL.last walks every chunk, and bs is still needed for BL.count,
  -- so the whole file can end up resident in memory.
  | BL.last bs == newline = fromIntegral (BL.count newline bs)
  | otherwise = fromIntegral (BL.count newline bs) + 1
  where
    newline = 10 :: Word8
But if you do that across many files, you are back in "Too many open files" territory unless you force each result before moving on.
Stanford's lsof experiment
Stanford CS240h demonstrates the danger directly.
After opening two files lazily:
*Main> x <- readFiles ["/etc/motd", "/etc/resolv.conf"]
lsof still shows both files open.
Only after forcing the supposedly pure computation:
*Main> L.length x
do the handles disappear.
That is the key Haskell lazy-I/O lesson: purity of the counting function does not mean the I/O happened when you thought it did.
Method 4: Safe Streaming with withBinaryFile and hGetSome
If you want a production-grade line-counting function that:
- keeps memory flat
- does not depend on lazy handle timing
- avoids whole-file String or strict-ByteString residency
then use a strict chunk loop.
{-# LANGUAGE BangPatterns #-}
import System.IO
import qualified Data.ByteString as BS
import Data.Word (Word8)
countLinesStream :: FilePath -> IO Int
countLinesStream path =
  withBinaryFile path ReadMode $ \h -> go h 0 False True
  where
    chunkSize = 64 * 1024
    newline = 10 :: Word8

    -- acc: LF bytes so far; sawAny: any byte read; lastWasNewline: last byte was LF.
    go h !acc !sawAny !lastWasNewline = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure $
          if not sawAny
            then 0
            else if lastWasNewline then acc else acc + 1
        else do
          let acc' = acc + BS.count newline chunk
              lastWasNewline' = BS.last chunk == newline
          go h acc' True lastWasNewline'
This is the safest line-counting implementation in this article:
- withBinaryFile owns the handle lifetime
- BS.hGetSome reads strict chunks
- memory use stays bounded by the chunk size
- the final-line rule is explicit
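The part of that loop most likely to hide a bug is the chunk boundary, so it can help to pull the per-chunk logic into pure functions and test it without any file. This is a sketch of that idea and not a replacement for the loop above; CountState, stepChunk, finish, and countChunks are hypothetical names.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical loop state: LF bytes seen so far, and whether the last byte
-- seen was LF (Nothing means no bytes have been seen yet).
data CountState = CountState !Int !(Maybe Bool)

-- Folds one strict chunk into the state; empty chunks change nothing.
stepChunk :: CountState -> BS.ByteString -> CountState
stepChunk st@(CountState n _) chunk
  | BS.null chunk = st
  | otherwise = CountState (n + BS.count lf chunk) (Just (BS.last chunk == lf))
  where
    lf = 10 :: Word8

-- Applies the final-line rule once all chunks are consumed.
finish :: CountState -> Int
finish (CountState _ Nothing) = 0
finish (CountState n (Just True)) = n
finish (CountState n (Just False)) = n + 1

-- Counts logical lines across a sequence of chunks.
countChunks :: [BS.ByteString] -> Int
countChunks = finish . foldl' stepChunk (CountState 0 Nothing)
```

Because the result should not depend on where the input is cut, splitting the same bytes at every possible offset and comparing the counts is a cheap regression check.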
Where BL.foldlChunks fits
The lazy bytestring docs describe foldlChunks as a strict, tail-recursive accumulating left fold over chunks.
That makes it useful if you are already working in Lazy ByteString space.
But for the most predictable resource behavior, a plain withBinaryFile plus BS.hGetSome loop is still the cleanest answer. It does not ask the RTS to choose when file effects happen.
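For code that already lives in lazy ByteString space, a fold can still be kept inside the handle scope. The sketch below assumes BL.hGetContents and BL.foldlChunks from the bytestring package and forces the fold with evaluate before withBinaryFile returns; Acc and countLinesFoldChunks are illustrative names.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical strict accumulator: LF count, saw any byte, last byte was LF.
data Acc = Acc !Int !Bool !Bool

countLinesFoldChunks :: FilePath -> IO Int
countLinesFoldChunks path =
  withBinaryFile path ReadMode $ \h -> do
    lazyBytes <- BL.hGetContents h
    -- evaluate runs the whole fold here, so the file is read to EOF
    -- before withBinaryFile returns.
    Acc n sawAny lastWasLf <-
      evaluate (BL.foldlChunks step (Acc 0 False True) lazyBytes)
    pure $ if not sawAny then 0 else if lastWasLf then n else n + 1
  where
    lf = 10 :: Word8
    step acc@(Acc n _ _) chunk
      | BS.null chunk = acc
      | otherwise = Acc (n + BS.count lf chunk) True (BS.last chunk == lf)
```

The strict fields in Acc are there so the fold does not build a chain of thunks that keep earlier chunks alive. Memory use should then stay close to one chunk at a time, though that depends on nothing else holding a reference to lazyBytes.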
If you want the Rust equivalent of this chunk-by-chunk style, the Rust line counting guide shows the same "count bytes, not strings" pattern with BufReader and manual byte scanning.
Edge Cases That Actually Matter
1. Empty file
The official docs say:
lines ""
returns:
[]
So:
length (lines "") == 0
This is why the lines "" myth is worth correcting up front.
2. Single empty line
lines "\n" == [""]
length (lines "\n") == 1
That is correct. A file containing a single line break represents one empty line.
3. No trailing newline
lines "a\nb" == ["a","b"]
length (lines "a\nb") == 2
But newline counting alone gives:
length (filter (== '\n') "a\nb") == 1
So ByteString counting code must add one final line when the non-empty file does not end in LF.
4. Windows CRLF
Real World Haskell points out that:
lines "a\r\nb"
produces:
["a\r","b"]
because lines splits on \n, not on the two-character \r\n sequence itself.
Practical meaning:
- when you read native Windows text in text mode on Windows, newline translation usually helps
- when you read CRLF content on Unix-like systems, you can see a dangling \r
- for counting lines only, LF counting is still fine because each CRLF line ending contains one LF byte
If you are counting records, not inspecting the content, CRLF is mostly a content-cleanup issue rather than a line-count issue.
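When the content does matter, the dangling \r can be removed after splitting. This is a minimal String-based sketch; stripCR and linesCRLF are hypothetical helpers, not library functions.

```haskell
-- Hypothetical helper: drops one trailing '\r' left over from a CRLF ending.
stripCR :: String -> String
stripCR s
  | not (null s) && last s == '\r' = init s
  | otherwise = s

-- Splits on '\n' as `lines` does, then cleans each line.
linesCRLF :: String -> [String]
linesCRLF = map stripCR . lines
```

The line count is the same with or without the cleanup. Only the contents of each line change.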
A Production-Ready Haskell Line Counter
This module keeps the baseline text version, the strict text version, the fast whole-file bytestring version, and the safest streaming version separate.
{-# LANGUAGE BangPatterns #-}
module LineCounter
  ( countLinesText
  , countLinesStrict
  , countLinesFast
  , countLinesStream
  , countLinesBatch
  ) where

import Control.Exception (SomeException, evaluate, try)
import System.IO
import Data.Word (Word8)
import qualified Data.ByteString as BS

-- Baseline: lazy String I/O; fine for a single small text file.
countLinesText :: FilePath -> IO Int
countLinesText =
  fmap (length . lines) . readFile

-- Strict text path: the count is forced before withFile closes the handle.
countLinesStrict :: FilePath -> IO Int
countLinesStrict path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (length (lines contents))

-- Whole-file strict ByteString: fast when the file fits in memory.
countLinesFast :: FilePath -> IO Int
countLinesFast path = do
  bs <- BS.readFile path
  pure (logicalLineCount bs)

-- Streaming: reads strict chunks, so memory stays bounded by chunkSize.
countLinesStream :: FilePath -> IO Int
countLinesStream path =
  withBinaryFile path ReadMode $ \h -> go h 0 False True
  where
    chunkSize = 64 * 1024
    newline = 10 :: Word8

    go h !acc !sawAny !lastWasNewline = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure $
          if not sawAny
            then 0
            else if lastWasNewline then acc else acc + 1
        else do
          let acc' = acc + BS.count newline chunk
              lastWasNewline' = BS.last chunk == newline
          go h acc' True lastWasNewline'

-- Counts LF bytes, plus one for a final line that has no terminator.
logicalLineCount :: BS.ByteString -> Int
logicalLineCount bs
  | BS.null bs = 0
  | BS.last bs == newline = BS.count newline bs
  | otherwise = BS.count newline bs + 1
  where
    newline = 10 :: Word8

-- Counts each file independently; a failure on one path does not stop the batch.
countLinesBatch :: [FilePath] -> IO [(FilePath, Either String Int)]
countLinesBatch paths =
  mapM countOne paths
  where
    countOne path = do
      result <- try (countLinesStream path) :: IO (Either SomeException Int)
      pure $ case result of
        Left err -> (path, Left (show err))
        Right n -> (path, Right n)
This is the version to copy when you want line counting that stays stable under load.
Benchmark: Representative Comparison
These numbers are representative rather than locally reproduced: no ghc run backs them. The comparison below is based on the documented behavior of the APIs and the usual cost profile of each approach, so read it as a relative ordering rather than as measured results.
| Method | Time | Peak memory | Empty-file safety | Handle safety | Notes |
|---|---|---|---|---|---|
| fmap (length . lines) . readFile | about 4s | high allocation | yes | caution | cleanest readFile plus lines baseline |
| withFile + hGetContents' | about 4s | high allocation | yes | yes | good strict text path |
| strict BS.readFile + BS.count | about 0.8s | file-sized | yes | yes | fastest when the whole file fits memory |
| lazy BL.readFile + BL.count | about 0.8s | low working set | yes | caution | must fully force to avoid lazy handle issues |
| withBinaryFile + BS.hGetSome loop | about 0.9s | about chunk size | yes | yes | best production default |
The important conclusion is not the exact decimal place.
It is this:
- the standard readFile plus lines answer is correct for empty files
- lazy I/O leading to "Too many open files" is the real safety problem
- ByteString counting is the performance path
- newline-byte counting needs a final-line fix
Quick FAQ
How do I count lines in Haskell?
Start with:
countLines = fmap (length . lines) . readFile
for small files. Move to withFile or ByteString when you need safer resource behavior or better performance.
Does lines "" return [""]?
No. The official Data.List docs show:
lines "" == []
So the popular warning about lines and the empty string is incorrect.
Why do I get Too many open files in Haskell?
Because lazy I/O can keep file handles open until data is forced to EOF.
Use withFile, hGetContents', or a strict chunk loop when you process many files.
How do I count lines in a large file in Haskell?
Use ByteString.
If the file fits memory, strict Data.ByteString.readFile plus LF counting is fast.
If you want the safest streaming option, use withBinaryFile and BS.hGetSome.
What is the real off-by-one bug?
It is not lines "".
It is counting only newline bytes in a non-empty file without a trailing newline.
Should I use withFile in Haskell?
Yes, when resource lifetime matters. withFile is the simplest way to make file closing explicit and exception-safe.
Sources Checked
- Data.List.lines official documentation and examples, including lines "" == []: https://hackage-content.haskell.org/package/base-4.22.0.0/docs/Data-List.html
- System.IO official documentation for withFile, hGetContents, hGetContents', and the readFile semi-closed-handle warning: https://hackage-content.haskell.org/package/base-4.22.0.0/docs/System-IO.html and https://downloads.haskell.org/ghc/9.6.3/docs/libraries/base-4.18.1.0/System-IO.html
- strict Data.ByteString docs for strict readFile, count, hGetContents, and hGetSome: https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/Data-ByteString.html
- lazy Data.ByteString.Lazy docs for lazy readFile, count, foldlChunks, and the handle-lifetime warning: https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/Data-ByteString-Lazy.html
- Stanford CS240h iteratee slides showing lsof, lazy open handles, and Too many open files: https://www.scs.stanford.edu/16wi-cs240h/slides/iteratee-slides.html
- Real World Haskell discussion of lines, text mode, and \r\n behavior: https://darcs.realworldhaskell.org/static/00book.pdf
- Reddit r/haskell thread with the common countLines = fmap (length . lines) . readFile answer: https://www.reddit.com/r/haskell/comments/js03by/count_lines_inside_of_a_text_file/
Frequently Asked Questions
How do I count lines in Haskell?
For a small text file, fmap (length . lines) . readFile is the shortest correct baseline. For batch jobs or large files, move to withFile or ByteString-based counting.
Does lines "" return [""] in Haskell?
No. The official Data.List documentation shows lines "" == []. The common empty-file warning is a myth.
Why do I get Too many open files in Haskell?
The usual cause is lazy IO. readFile and Data.ByteString.Lazy.readFile can keep handles semi-closed or open until the data is fully consumed.
How do I count lines in a large file in Haskell?
Use ByteString. Strict ByteString is good when the whole file fits memory, and a withBinaryFile plus hGetSome loop is the safest streaming option.
What is the real off-by-one bug when counting lines?
It happens when you count only newline bytes. A non-empty file that does not end with LF still has one final logical line.
Should I use withFile in Haskell?
Yes when resource lifetime matters. withFile closes the handle even if the action throws, which makes batch processing much safer than relying on lazy IO timing.
How do I count lines without loading the whole file into memory?
Use a strict chunk loop with withBinaryFile and Data.ByteString.hGetSome, or a fully forced lazy ByteString fold inside a safe handle scope.