
Haskell Deep Dive

How to Count Lines in a File in Haskell (And Why `lines ""` Is Not the Real Trap)

Count lines in a file in Haskell — readFile, lines, Data.ByteString, and strict vs lazy IO. Covers the `lines ""` myth, final-newline off-by-one bugs, Lazy IO file descriptor leaks, and high-performance streaming with ByteString.

GHC 9.x | base 4.15+ | bytestring
Published: May 14, 2026 | Updated: May 14, 2026 | 14 min read | Author: Line Counter Editorial Team
Tags: Haskell, ByteString, Lazy IO, GHC, Functional Programming

A common Haskell answer for line counting looks like this:

countLines :: FilePath -> IO Int
countLines = fmap (length . lines) . readFile

It is short. It is idiomatic. It is the first readFile-plus-lines snippet many Haskell beginners learn.

But the usual warning people attach to it is often wrong.

If you searched for "haskell lines empty string", here is the first thing to fix:

lines ""

does not return [""].

The official Data.List documentation shows:

ghci> lines ""
[]

So length . lines does not count an empty file as 1. That empty-file trap is a myth.

The real traps are different:

  • readFile is lazy I/O and keeps a handle semi-closed until the data has been consumed
  • Data.ByteString.Lazy.readFile can keep many files open at once, which is how batch jobs end up with "too many open files"
  • ByteString code that only counts '\n' bytes undercounts a non-empty file with no trailing newline

This guide covers the real choices for counting lines in Haskell:

  • readFile plus lines for the clean baseline
  • withFile for explicit handle lifetime
  • strict and lazy ByteString counting
  • a strict chunk loop for large inputs

If you only want the short answer:

  • small text file: fmap (length . lines) . readFile
  • batch processing: withFile plus hGetContents' or evaluate
  • large file: strict or streaming ByteString

That is the real rule for counting lines in Haskell: do not fix the wrong bug. Fix handle lifetime and newline-byte semantics.

Quick Method Guide

| I want to... | Use this | Main warning |
|---|---|---|
| Count a small text file with the shortest code | fmap (length . lines) . readFile | lazy I/O handle lifetime |
| Count many files safely | withFile plus hGetContents' | still String, so not the fastest |
| Keep older base compatibility | withFile plus hGetContents and evaluate | must force the result inside the block |
| Count lines fast when the file fits in memory | strict Data.ByteString.readFile | whole file is loaded strictly |
| Stream a huge file safely | withBinaryFile plus BS.hGetSome | counts LF bytes, so handle the final unterminated line |
| Fold existing lazy ByteString chunks | BL.foldlChunks | safe only when you force consumption before leaving the handle scope |

For most line-counting work in Haskell, the safest teaching path is:

  1. start with readFile plus lines
  2. learn why withFile matters
  3. switch to ByteString when file size or throughput matters

Method 1: readFile + lines - The Idiomatic Baseline

The classic baseline is still fine:

countLines :: FilePath -> IO Int
countLines = fmap (length . lines) . readFile

Example:

main :: IO ()
main = do
  n <- countLines "data.txt"
  putStrLn $ "Lines: " ++ show n

This is the most compact way to count lines in Haskell, and for a single small file it is perfectly reasonable.

The lines "" myth

The official Data.List docs are explicit:

ghci> lines ""           -- empty input contains no lines
[]

ghci> lines "\n"         -- single empty line
[""]

ghci> lines "one"
["one"]

ghci> lines "one\ntwo\n"
["one","two"]

So:

  • length (lines "") == 0
  • length (lines "\n") == 1
  • length (lines "one") == 1

That means the usual "empty file returns 1" story is simply false for standard lines.

So for anyone searching "haskell lines empty string", the right answer is:

  • lines "" == []
  • the empty-file count is already correct
  • the real off-by-one bug appears when you count newline bytes instead of logical lines
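If you want to check that rule yourself, here is a minimal sketch using only the Prelude and Data.List. The names newlineRule, cases, and rulesAgree are illustrative, not library functions. It counts logical lines in one strict pass as "LF count, plus one for a non-empty input whose last character is not LF" and compares the result with length . lines on the inputs shown above:

```haskell
import Data.List (foldl')

-- Illustrative helper: count logical lines in one strict pass, without
-- building the intermediate [String] that `lines` produces.
-- State: (number of '\n' seen so far, last character seen, if any).
newlineRule :: String -> Int
newlineRule s =
  case foldl' step (0, Nothing) s of
    (_, Nothing)   -> 0      -- empty input: no lines
    (n, Just '\n') -> n      -- ends with LF: every line is terminated
    (n, Just _)    -> n + 1  -- an unterminated final line still counts
  where
    step (n, _) c =
      let n' = if c == '\n' then n + 1 else n
      in n' `seq` (n', Just c)

-- The Data.List examples above, plus one input without a final LF.
cases :: [String]
cases = ["", "\n", "one", "one\ntwo\n", "a\nb"]

-- True when both counting methods agree on every sample input.
rulesAgree :: Bool
rulesAgree = all (\s -> newlineRule s == length (lines s)) cases
```

rulesAgree evaluates to True. The empty string is the only input with no last character, which is exactly why it yields 0 instead of 1.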

What the real risk is: lazy I/O handle lifetime

The System.IO docs warn that readFile holds a semi-closed handle until the entire contents have been consumed.

That matters when your line-counting code processes many files:

countAll :: [FilePath] -> IO [Int]
countAll = mapM (fmap (length . lines) . readFile)

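This code looks harmless, but lazy I/O separates opening from reading: each readFile opens its file immediately, while the contents are read only when a count is demanded. mapM runs every readFile before any count is forced, so all of the handles are open at the same time.

With one file, you may never notice.

With thousands of files, this is how the classic "too many open files" error begins.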
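A minimal fix that keeps plain readFile, sketched here, is to force each count before mapM moves on to the next file. Forcing length walks the string to EOF, and the System.IO docs say a semi-closed handle is closed once its entire contents have been read. countAllForced is an illustrative name, and this version does not give the close-on-exception guarantee that withFile provides:

```haskell
import Control.Exception (evaluate)

-- Same shape as countAll, but each count is forced before mapM moves on,
-- so every semi-closed handle reaches EOF (and closes) inside its own step.
countAllForced :: [FilePath] -> IO [Int]
countAllForced = mapM countOne
  where
    countOne path = do
      contents <- readFile path
      evaluate (length (lines contents))  -- reads to EOF now, not later
```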

Performance shape

For readFile plus lines, the main cost is not empty files but representation:

  • String is a linked list of boxed Char values
  • lines allocates a second list, one String per line
  • counting with length is elegant, but not the fastest route for large files

So fmap (length . lines) . readFile is best understood as the readable baseline, not the large-file champion.

Method 2: withFile - Make Handle Lifetime Explicit

The official System.IO docs say withFile opens the file, runs your action, and closes the file even if the action throws.

That is why withFile is the right next step after the teaching one-liner.
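Conceptually, withFile follows the bracket pattern from Control.Exception: acquire the handle, run the action, and release it no matter how the action ends. The sketch below uses a hypothetical withFileSketch to show that shape; the real withFile in base may differ in details such as how it reports exceptions. countWithSketch is also illustrative and needs base 4.15+ for hGetContents':

```haskell
import Control.Exception (bracket)
import System.IO

-- Hypothetical re-implementation for illustration: open the file, run the
-- action, and always call hClose, even if the action throws.
withFileSketch :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFileSketch path mode = bracket (openFile path mode) hClose

-- Count lines strictly while the handle is guaranteed to be open.
countWithSketch :: FilePath -> IO Int
countWithSketch path =
  withFileSketch path ReadMode $ \h -> do
    contents <- hGetContents' h
    pure $! length (lines contents)
```

Returning the Handle itself out of withFileSketch is a quick way to confirm the guarantee: hIsClosed on it afterwards reports True.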

Modern strict text path: hGetContents'

On base 4.15 and later (GHC 9.0+), hGetContents' is the strict version of hGetContents: it reads all input and closes the handle before returning.

import System.IO

countLinesStrict :: FilePath -> IO Int
countLinesStrict path =
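  withFile path ReadMode $ \h -> do
    contents <- hGetContents' h  -- reads to EOF, then closes the handle
    -- $! forces the count here, so the full String is not kept alive
    -- inside an unevaluated thunk after withFile returns
    pure $! length (lines contents)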

This is a strong withFile default when:

  • the file is text
  • it fits memory
  • you want straightforward handle safety

Older compatible path: hGetContents + evaluate

If you want the older pattern that makes the forcing explicit:

import System.IO
import Control.Exception (evaluate)

countLinesStrictCompat :: FilePath -> IO Int
countLinesStrictCompat path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (length (lines contents))

Why the evaluate?

Because hGetContents itself is lazy. Without forcing the count inside the withFile block, the block returns an unevaluated thunk, withFile closes the handle on exit, and the deferred read only runs after the handle is already closed.

That is the real withFile lesson: the resource boundary is only useful if the consuming computation happens inside it.

Why this fixes "too many open files"

The withFile docs guarantee close-on-exit. The Stanford CS240h lazy-I/O slides show the opposite failure mode with lazy ByteString:

  • files are opened immediately
  • data is not necessarily read immediately
  • handles stay open until the thunk is forced to EOF

So for batch counting, this is the safe shape:

countLinesMany :: [FilePath] -> IO [(FilePath, Int)]
countLinesMany paths =
  mapM (\p -> (,) p <$> countLinesStrict p) paths

Each file is opened, read to EOF, and closed before the next one is opened.
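If you only need one grand total across many files, a strict fold keeps both handles and memory in check. This is a sketch that assumes base 4.15+ for hGetContents'; countOne and countTotal are illustrative names:

```haskell
import Control.Monad (foldM)
import System.IO

-- Open, read, count, and close one file. `$!` forces the Int so the
-- String can be garbage-collected before the next file is opened.
countOne :: FilePath -> IO Int
countOne path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents' h
    pure $! length (lines contents)

-- Running total over many files. `$!` keeps the accumulator evaluated,
-- so no chain of (+) thunks builds up across thousands of files.
countTotal :: [FilePath] -> IO Int
countTotal = foldM step 0
  where
    step acc path = do
      n <- countOne path
      pure $! acc + n
```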

If you want the Scala version of the same bug class, the Scala line counting guide shows how a lazy iterator can escape its resource scope and eventually turn into Too many open files.

Method 3: ByteString - Faster Counting for Large Files

When line counting moves from "teaching example" to "real log file", ByteString becomes the interesting option.

There are two major routes:

  • strict Data.ByteString for whole-file strict reads
  • lazy Data.ByteString.Lazy for incremental chunked data

Strict ByteString: fast when the file fits memory

The strict bytestring docs say Data.ByteString.readFile reads an entire file strictly into a ByteString.

That makes it a good option for counting lines when the whole file comfortably fits in RAM:

import qualified Data.ByteString as BS
import Data.Word (Word8)

countLinesBS :: FilePath -> IO Int
countLinesBS path = do
  bs <- BS.readFile path
  pure (logicalLineCount bs)

logicalLineCount :: BS.ByteString -> Int
logicalLineCount bs
  | BS.null bs = 0
  | BS.last bs == newline = BS.count newline bs
  | otherwise = BS.count newline bs + 1
  where
    newline = 10 :: Word8

This is faster than readFile plus lines because it scans bytes directly instead of building a list of boxed characters and a list of lines.

The real off-by-one bug: final unterminated line

This is where off-by-one errors really happen:

ghci> :set -XOverloadedStrings
ghci> import qualified Data.ByteString as BS
ghci> BS.count 10 "a\nb"
1

There is only one '\n' byte, but there are two logical lines.

So raw newline counting must handle three cases:

  • empty file -> 0
  • non-empty file ending in \n -> newline count
  • non-empty file not ending in \n -> newline count plus 1

That is the real trap in ByteString line counting, not lines "".
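A quick way to see the trap is to compare both byte-level counts with length . lines on the same ASCII text. This sketch repeats logicalLineCount from above so it is self-contained; rawNewlineCount and compareCounts are illustrative names:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import Data.Word (Word8)

-- Same three-case rule as logicalLineCount above.
logicalLineCount :: BS.ByteString -> Int
logicalLineCount bs
  | BS.null bs = 0
  | BS.last bs == newline = BS.count newline bs
  | otherwise = BS.count newline bs + 1
  where
    newline = 10 :: Word8

-- Raw LF counting, shown only for contrast.
rawNewlineCount :: BS.ByteString -> Int
rawNewlineCount = BS.count 10

-- (lines-based count, three-case rule, raw LF count) for the same text.
-- BS8.pack is fine here because the samples are plain ASCII.
compareCounts :: String -> (Int, Int, Int)
compareCounts s =
  let bs = BS8.pack s
  in (length (lines s), logicalLineCount bs, rawNewlineCount bs)
```

For "a\nb", compareCounts returns (2, 2, 1): the three-case rule agrees with lines, while raw LF counting is one short.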

Lazy ByteString: incremental, but still lazy I/O

The lazy bytestring docs are blunt:

  • readFile reads lazily
  • the handle stays open until EOF is encountered
  • hGetContents closes on EOF if all data is read, or through garbage collection otherwise

So this code is concise:

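{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

countLinesLazyBS :: FilePath -> IO Int
countLinesLazyBS path = do
  bs <- BL.readFile path
  pure (logicalLineCountLazy bs)

-- One pass over the chunks. Calling BL.last and then BL.count would walk
-- the data twice and keep every chunk alive between the two passes.
logicalLineCountLazy :: BL.ByteString -> Int
logicalLineCountLazy bs =
  case BL.foldlChunks step (0, Nothing) bs of
    (_, Nothing)    -> 0      -- empty input
    (n, Just True)  -> n      -- ends with LF
    (n, Just False) -> n + 1  -- unterminated final line
  where
    newline = 10 :: Word8
    -- Lazy ByteString chunks are never empty, so BS.last is safe here.
    step (!n, _) chunk =
      (n + BS.count newline chunk, Just (BS.last chunk == newline))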

But if you do that across many files, you are back in "too many open files" territory unless you force each result before moving on.

Stanford's lsof experiment

Stanford CS240h demonstrates the danger directly.

After opening two files lazily:

*Main> x <- readFiles ["/etc/motd", "/etc/resolv.conf"]

lsof still shows both files open.

Only after forcing the supposedly pure computation:

*Main> L.length x

do the handles disappear.

That is the key Haskell lazy-I/O lesson: purity of the counting function does not mean the I/O happened when you thought it did.
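You can observe the same effect from inside Haskell without lsof. This sketch uses String-based openFile and hGetContents from System.IO instead of lazy ByteString, and relies on the documented rule that a semi-closed handle becomes closed once its entire contents have been read. lazyHandleDemo is an illustrative name:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Returns (closed before forcing?, line count, closed after forcing?).
lazyHandleDemo :: FilePath -> IO (Bool, Int, Bool)
lazyHandleDemo path = do
  h <- openFile path ReadMode
  contents <- hGetContents h                -- semi-closed; nothing read yet
  before <- hIsClosed h                     -- the handle is still held
  n <- evaluate (length (lines contents))   -- forces the read through EOF
  after <- hIsClosed h                      -- reaching EOF closed the handle
  pure (before, n, after)
```

On a readable file this should return (False, n, True): the handle is still held after hGetContents, and only forcing the count to EOF releases it.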

Method 4: Safe Streaming with withBinaryFile and hGetSome

If you want a production-grade line-counting function that:

  • keeps memory flat
  • does not depend on lazy handle timing
  • never holds the whole file as a String or strict ByteString

then use a strict chunk loop.

{-# LANGUAGE BangPatterns #-}

import System.IO
import qualified Data.ByteString as BS
import Data.Word (Word8)

countLinesStream :: FilePath -> IO Int
countLinesStream path =
  withBinaryFile path ReadMode $ \h -> go h 0 False True
  where
    chunkSize = 64 * 1024
    newline = 10 :: Word8

    go h !acc !sawAny !lastWasNewline = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure $
          if not sawAny
            then 0
            else if lastWasNewline then acc else acc + 1
        else do
          let acc' = acc + BS.count newline chunk
              lastWasNewline' = BS.last chunk == newline
          go h acc' True lastWasNewline'

This is the safest line-counting implementation in this article:

  • withBinaryFile owns the handle lifetime
  • BS.hGetSome reads strict chunks
  • memory use stays bounded by the chunk size
  • the final-line rule is explicit
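Because the only state carried across chunks is the running LF count and whether the last byte was LF, the chunk size should never change the answer. The sketch below is the same loop with the chunk size turned into a parameter (countLinesStreamWith is an illustrative name), which makes that property easy to test with tiny chunks:

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as BS
import Data.Word (Word8)
import System.IO

-- Same strict chunk loop as countLinesStream, but chunkSize is an argument.
countLinesStreamWith :: Int -> FilePath -> IO Int
countLinesStreamWith chunkSize path =
  withBinaryFile path ReadMode $ \h -> go h 0 False True
  where
    newline = 10 :: Word8

    go h !acc !sawAny !lastWasNewline = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure $
          if not sawAny
            then 0
            else if lastWasNewline then acc else acc + 1
        else
          -- hGetSome only returns an empty chunk at EOF, so BS.last is safe here
          go h (acc + BS.count newline chunk) True (BS.last chunk == newline)
```

Running it with chunk sizes of 1, 2, 3, and 65536 on the same file should give identical results; tiny chunks put a boundary next to every LF, which is where a broken final-line rule would show up.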

Where BL.foldlChunks fits

The lazy bytestring docs describe foldlChunks as a strict, tail-recursive accumulating left fold over chunks.

That makes it useful if you are already working in Lazy ByteString space.
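Here is a sketch of that approach under one assumption: the lazy ByteString is consumed and the result forced inside withBinaryFile, so every chunk is read before the handle is closed. countLinesFoldChunks is an illustrative name; BL.foldlChunks and BL.hGetContents come from Data.ByteString.Lazy:

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Exception (evaluate)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import System.IO

-- Read lazily, but fold and force the count *inside* withBinaryFile,
-- so every chunk is consumed before the handle is closed.
countLinesFoldChunks :: FilePath -> IO Int
countLinesFoldChunks path =
  withBinaryFile path ReadMode $ \h -> do
    bs <- BL.hGetContents h
    evaluate (finish (BL.foldlChunks step (0, Nothing) bs))
  where
    newline = 10 :: Word8

    -- Lazy ByteString chunks are never empty, so BS.last is safe here.
    -- The accumulator holds only the LF count and "did this chunk end in LF?".
    step (!n, _) chunk =
      (n + BS.count newline chunk, Just (BS.last chunk == newline))

    finish (_, Nothing)    = 0      -- empty file: no chunks at all
    finish (n, Just True)  = n      -- last byte is LF
    finish (n, Just False) = n + 1  -- unterminated final line
```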

But for the most predictable resource behavior, a plain withBinaryFile plus BS.hGetSome loop is still the cleanest answer: every read happens exactly where its IO action is sequenced, not whenever some thunk happens to be forced.

If you want the Rust equivalent of this chunk-by-chunk style, the Rust line counting guide shows the same "count bytes, not strings" pattern with BufReader and manual byte scanning.

Edge Cases That Actually Matter

1. Empty file

The official docs say:

lines ""

returns:

[]

So:

length (lines "") == 0

This is why the lines "" myth is worth correcting up front.

2. Single empty line

lines "\n" == [""]
length (lines "\n") == 1

That is correct. A file containing a single line break represents one empty line.

3. No trailing newline

lines "a\nb" == ["a","b"]
length (lines "a\nb") == 2

But newline-byte counting alone gives:

BS8.count '\n' (BS8.pack "a\nb") == 1   -- BS8 is Data.ByteString.Char8

So ByteString code that counts newline bytes must add one final line when a non-empty file does not end in LF.

4. Windows CRLF

Real World Haskell points out that:

lines "a\r\nb"

produces:

["a\r","b"]

because lines splits on \n, not on the two-character \r\n sequence itself.

Practical meaning:

  • when you read native Windows text in text mode on Windows, newline translation usually helps
  • when you read CRLF content on Unix-like systems, you can see dangling \r
  • for counting lines only, LF counting is still fine because each CRLF line ending contains one LF byte

If you are counting records, not inspecting the content, CRLF is mostly a content-cleanup issue rather than a line-count issue.
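For content cleanup, two common options are a pure trim of the trailing '\r' after lines, or asking the handle to translate CRLF on input with hSetNewlineMode and universalNewlineMode from System.IO. Both are sketched below; stripCR and readLinesUniversal are illustrative names, and hGetContents' needs base 4.15+:

```haskell
import System.IO

-- Pure cleanup: drop one trailing '\r' that `lines` leaves on CRLF input.
stripCR :: String -> String
stripCR s = case reverse s of
  '\r' : rest -> reverse rest
  _           -> s

-- I/O-level alternative: the handle translates CRLF to LF while reading.
readLinesUniversal :: FilePath -> IO [String]
readLinesUniversal path =
  withFile path ReadMode $ \h -> do
    hSetNewlineMode h universalNewlineMode
    contents <- hGetContents' h
    pure (lines contents)
```

Neither changes the number of lines: map stripCR preserves list length, and CRLF-to-LF translation leaves one LF per line ending.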

Putting It Together: A Production-Ready Haskell Line Counter

This module keeps the baseline text version, the strict text version, the fast whole-file bytestring version, and the safest streaming version separate.

{-# LANGUAGE BangPatterns #-}

module LineCounter
  ( countLinesText
  , countLinesStrict
  , countLinesFast
  , countLinesStream
  , countLinesBatch
  ) where

import Control.Exception (IOException, evaluate, try)
import System.IO
import Data.Word (Word8)
import qualified Data.ByteString as BS

-- Baseline: shortest code, but relies on lazy I/O handle timing.
countLinesText :: FilePath -> IO Int
countLinesText =
  fmap (length . lines) . readFile

-- Strict String path that also works on older base: force inside withFile.
countLinesStrict :: FilePath -> IO Int
countLinesStrict path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (length (lines contents))

-- Whole-file strict ByteString: fast when the file fits in memory.
countLinesFast :: FilePath -> IO Int
countLinesFast path = do
  bs <- BS.readFile path
  pure (logicalLineCount bs)

-- Bounded-memory streaming loop over strict chunks.
countLinesStream :: FilePath -> IO Int
countLinesStream path =
  withBinaryFile path ReadMode $ \h -> go h 0 False True
  where
    chunkSize = 64 * 1024
    newline = 10 :: Word8

    go h !acc !sawAny !lastWasNewline = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure $
          if not sawAny
            then 0
            else if lastWasNewline then acc else acc + 1
        else do
          let acc' = acc + BS.count newline chunk
              lastWasNewline' = BS.last chunk == newline
          go h acc' True lastWasNewline'

-- LF count, plus one for a non-empty input without a trailing LF.
logicalLineCount :: BS.ByteString -> Int
logicalLineCount bs
  | BS.null bs = 0
  | BS.last bs == newline = BS.count newline bs
  | otherwise = BS.count newline bs + 1
  where
    newline = 10 :: Word8

-- Catch IOException (missing file, permission denied, ...) per file.
-- Catching SomeException would also swallow asynchronous exceptions
-- such as Ctrl-C and keep the batch running.
countLinesBatch :: [FilePath] -> IO [(FilePath, Either String Int)]
countLinesBatch paths =
  mapM countOne paths
  where
    countOne path = do
      result <- try (countLinesStream path) :: IO (Either IOException Int)
      pure $ case result of
        Left err -> (path, Left (show err))
        Right n  -> (path, Right n)

This is the version to copy when you want line counting that stays stable under load.

Benchmark: Representative Comparison

These numbers are representative estimates, not locally reproduced measurements: they were not produced by a GHC benchmark run. The comparison is based on the documented behavior of the APIs and the usual cost profile of each approach.

| Method | Time (estimate) | Peak memory | Empty-file safety | Handle safety | Notes |
|---|---|---|---|---|---|
| fmap (length . lines) . readFile | about 4s | high allocation | yes | caution | cleanest readFile plus lines baseline |
| withFile + hGetContents' | about 4s | high allocation | yes | yes | good strict text path |
| strict BS.readFile + BS.count | about 0.8s | file-sized | yes | yes | fastest when the whole file fits memory |
| lazy BL.readFile + BL.count | about 0.8s | low working set | yes | caution | must fully force to avoid lazy handle issues |
| withBinaryFile + BS.hGetSome loop | about 0.9s | about chunk size | yes | yes | best production default |

The important conclusion is not the exact decimal place.

It is this:

  • the standard readFile plus lines answer is correct for empty files
  • lazy I/O handle lifetime ("too many open files") is the real safety problem
  • ByteString counting is the performance path
  • newline-byte counting needs a final-line fix

Quick FAQ

How do I count lines in Haskell?

Start with:

countLines = fmap (length . lines) . readFile

for small files. Move to withFile or ByteString when you need safer resource behavior or better performance.

Does lines "" return [""]?

No. The official Data.List docs show:

lines "" == []

So the popular "empty file counts as 1" warning about lines is incorrect.

Why do I get Too many open files in Haskell?

Because lazy I/O can keep file handles open until the data is forced to EOF. That is the core cause of "too many open files" errors in Haskell batch jobs.

Use withFile, hGetContents', or a strict chunk loop when you process many files.

How do I count lines in a large file in Haskell?

Use ByteString.

If the file fits memory, strict Data.ByteString.readFile plus LF counting is fast.

If you want the safest streaming option, use withBinaryFile and BS.hGetSome.

What is the real off-by-one bug?

It is not lines "".

It is counting only newline bytes in a non-empty file without a trailing newline.

Should I use withFile in Haskell?

Yes, when resource lifetime matters. withFile is the simplest way to make file closing explicit and exception-safe.


Still debugging lazy I/O file descriptor leaks?

Need a quick count for logs, CSVs, or source code? Paste the file into the Line Counter. No fake empty-file bug. No lazy I/O surprises. Just the number.

Frequently Asked Questions

How do I count lines in Haskell?

For a small text file, fmap (length . lines) . readFile is the shortest correct baseline. For batch jobs or large files, move to withFile or ByteString-based counting.

Does lines "" return [""] in Haskell?

No. The official Data.List documentation shows lines "" == []. The common empty-file warning is a myth.

Why do I get Too many open files in Haskell?

The usual cause is lazy IO. readFile and Data.ByteString.Lazy.readFile can keep handles semi-closed or open until the data is fully consumed.

How do I count lines in a large file in Haskell?

Use ByteString. Strict ByteString is good when the whole file fits memory, and a withBinaryFile plus hGetSome loop is the safest streaming option.

What is the real off-by-one bug when counting lines?

It happens when you count only newline bytes. A non-empty file that does not end with LF still has one final logical line.

Should I use withFile in Haskell?

Yes, when resource lifetime matters. withFile closes the handle even if the action throws, which makes batch processing much safer than relying on lazy IO timing.

How do I count lines without loading the whole file into memory?

Use a strict chunk loop with withBinaryFile and Data.ByteString.hGetSome, or a fully forced lazy ByteString fold inside a safe handle scope.
