
Scala Deep Dive

How to Count Lines in a File in Scala (And the Source File Handle Leak Nobody Talks About)

Count lines in a file in Scala — Source.fromFile, scala.util.Using, and Java NIO. Covers the Source file handle leak, getLines lazy iterator trap, Spark large-file patterns, and Scala 2.13+ Using utility with benchmarks.

Scala 2.13+ · Scala 3 · Apache Spark
Published: May 14, 2026 · Updated: May 14, 2026 · 13 min read · Author: Line Counter Editorial Team
Scala · Apache Spark · JVM · File I/O · Functional Programming

A Stack Overflow question literally titled "count number of lines in file - Scala" has a highest-scored answer that says:

io.Source.fromFile("file.txt").getLines.size

The answer is half right.

The line counting part is fine. getLines() returns an Iterator[String], so .size walks it one line at a time instead of reading the whole file into memory at once.

The missing part is the dangerous one:

  • the Source is never closed
  • repeated calls can accumulate open file descriptors
  • a comment under that answer points out the leak directly

There is a second trap too. getLines() is lazy. If you return that iterator from a helper, a loan-pattern block, or a Future boundary, the underlying Source may already be closed when iteration actually happens.

This guide covers the practical scala count lines options:

  • scala source fromfile count lines for the classic approach
  • scala util using file for Scala 2.13+ resource safety
  • a loan pattern for Scala 2.12 and older code
  • Files.lines for modern JVM streaming
  • scala spark count lines for HDFS and S3
  • byte scanning for raw throughput

If you searched for scala count lines in file, the short answer is:

  • small to medium local text file: Using.resource(Source.fromFile(path))(_.getLines().foldLeft(0L)((n, _) => n + 1))
  • large JVM text file: Using.resource(Files.lines(Paths.get(path)))(_.count())
  • distributed storage: spark.read.textFile(path).count()

That is the real count lines scala rule of thumb: first keep the resource lifetime correct, then choose the counting API that matches your storage system.

Quick Method Guide

I want to... | Use this | Main warning
Count a local text file with standard Scala | Using.resource(Source.fromFile(path))(_.getLines().foldLeft(0L)((n, _) => n + 1)) | do not let the iterator escape
Keep manual control | try / finally around Source.fromFile | easy to forget close()
Return Try[Long] | Using(Source.fromFile(path)) { ... } | avoid .get until the edge of your app
Stream via Java NIO | Using.resource(Files.lines(Paths.get(path)))(_.count()) | the returned Stream must be closed
Count on Spark | spark.read.textFile(path).count() | local paths must exist on worker nodes too
Get maximum raw speed | buffered byte scan | counts physical newline bytes, not decoded text semantics

For most scala count lines in file code, the strongest default on Scala 2.13+ is Using.resource plus an in-block terminal operation.

Method 1: Source.fromFile - The Classic Approach with a Hidden Leak

The familiar answer looks like this:

import scala.io.Source

val count = Source.fromFile("data.txt").getLines().size

This is the classic scala source fromfile count lines snippet.

It has two important properties:

  • getLines() returns an Iterator[String], so the count is streaming rather than read-all
  • the Source stays open until you close it

So the safe version is:

import scala.io.Source

val source = Source.fromFile("data.txt")
try {
  val count = source.getLines().foldLeft(0L)((n, _) => n + 1)
  println(s"Lines: $count")
} finally {
  source.close()
}

With an explicit encoding:

import scala.io.Source

val source = Source.fromFile("data.txt", "UTF-8")
// Bind the result so the count is actually available after close().
val count =
  try {
    source.getLines().foldLeft(0L)((n, _) => n + 1)
  } finally {
    source.close()
  }

Why the leak happens

Scala's Source API declares Source as Closeable. The docs do not force a specific resource-management pattern, so many snippets simply omit close().

That omission is what turns a simple scala count lines helper into a long-lived process problem.

On one file, the code often appears to work.

Across many files, the same pattern turns into scala too many open files:

  • every Source.fromFile(...) opens a FileInputStream, which holds an operating-system file descriptor
  • if you do not close it promptly, the descriptor stays open until the garbage collector happens to clean up the unreachable stream, which may not happen before the process hits its limit
  • in a driver loop, service, or batch process, those descriptors accumulate

That is why the bad pattern is not "slow" so much as "resource-unsafe".

This is not a read-all memory trap

This part is easy to get wrong.

Source.fromFile(...).getLines().size does not load the whole file into one Scala collection. The official getLines() doc says it returns an Iterator[String].

That means this scala source fromfile count lines pattern is usually memory-reasonable for line counting itself.

The problem is file-handle lifetime, not read-all allocation.

Reproducing scala too many open files

The failure shape looks like this:

import scala.io.Source

val paths: Seq[String] = (1 to 5000).map(i => s"logs/$i.txt")

val counts = paths.map { path =>
  Source.fromFile(path).getLines().size
}

In a short script you may get lucky.

In a long-running JVM, this can eventually become java.io.IOException: Too many open files.

That is why scala source fromfile count lines needs an explicit closing story even though the counting expression itself looks harmless.
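The direct fix for the loop above is to close each Source before the next file is opened. A minimal sketch, assuming the same local paths; countAll is a hypothetical helper name, not a library API:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: counts each file and closes its Source before moving on,
// so this loop never holds more than one of its own descriptors open at a time.
def countAll(paths: Seq[String]): Seq[Long] =
  paths.map { path =>
    Using.resource(Source.fromFile(path)) { source =>
      // Consume the iterator fully while the Source is still open.
      source.getLines().foldLeft(0L)((n, _) => n + 1)
    }
  }
```

Because Using.resource rethrows, one unreadable file aborts the whole batch here; the Try-based countLines example in Method 2 keeps failures per file instead.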

If you want the Kotlin version of the same resource-lifetime bug class, the Kotlin useLines guide shows how a lazy sequence can outlive its reader.

Trap 2: getLines() Is Lazy, and That Changes the Resource Boundary

The second Scala-specific gotcha is scala getlines lazy.

The docs for Source.getLines() say it returns Iterator[String].

That means the file is not fully read when you call getLines(). The file is read as the iterator is consumed.

This is correct:

import scala.io.Source

val source = Source.fromFile("data.txt")
try {
  val count = source.getLines().size
  println(count)
} finally {
  source.close()
}

The iterator is fully consumed before close().

This is wrong:

import scala.io.Source

def lines(path: String): Iterator[String] = {
  val source = Source.fromFile(path)
  try {
    source.getLines()
  } finally {
    source.close()
  }
}

val count = lines("data.txt").size

Now the iterator escapes the block: the Source is already closed when the caller starts consuming it, so the first read typically fails with java.io.IOException: Stream Closed.

That is the scala getlines lazy trap in its simplest form.
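To see the failure concretely, here is a small reproduction, assuming the JVM can write to its default temp directory. leakyLines copies the broken helper above, and lineCountOf is a hypothetical wrapper that captures the exception as a Try. For Source.fromFile, close() closes the underlying FileInputStream, so the first read from the escaped iterator fails with an IOException.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.Try

// Same shape as the broken helper above: the iterator escapes a closed Source.
def leakyLines(path: String): Iterator[String] = {
  val source = Source.fromFile(path)
  try source.getLines()
  finally source.close()
}

// Hypothetical wrapper: the count is attempted only after close() has already run.
def lineCountOf(path: String): Try[Int] = Try(leakyLines(path).size)

val tmp = Files.createTempFile("escaped-iterator", ".txt")
Files.write(tmp, "a\nb\nc\n".getBytes("UTF-8"))
// Expect a Failure wrapping an IOException, not Success(3).
println(lineCountOf(tmp.toString))
Files.delete(tmp)
```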

Why toSeq is not the best force-evaluation answer

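One Stack Overflow thread on Stream Closed shows a subtle follow-up: the asker tried toSeq to force evaluation, but the runtime type was still Stream in that Scala version. In Scala 2.12 and earlier, Iterator.toSeq returns a lazy Stream, so the reads still happen after close(). Scala 2.13 made toSeq strict, but toVector or toList states the intent more clearly on any version.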

If you need strict materialization, prefer an obviously strict collection:

import scala.io.Source

val source = Source.fromFile("data.txt")
try {
  val lines = source.getLines().toVector
  println(lines.length)
} finally {
  source.close()
}

Or:

import scala.io.Source

val source = Source.fromFile("data.txt")
try {
  val lines = source.getLines().toList
  println(lines.length)
} finally {
  source.close()
}

For simple line counting, you do not need to materialize at all. Just count inside the block.

The same bug in asynchronous code

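This shape is also wrong (withSource is the loan-pattern helper defined in Method 3):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.io.Source

def countAsync(path: String): Future[Int] =
  withSource(path) { source =>
    // The Future is created here, but its body may run after withSource has closed the Source.
    Future(source.getLines().size)
  }

The Future may run after the resource block exits, so the iterator can end up reading from an already closed Source.

The safe pattern is the other way around:

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

def countAsync(path: String): Future[Long] =
  Future {
    // Open, consume, and close the Source entirely inside the Future body.
    withSource(path) { source =>
      source.getLines().foldLeft(0L)((n, _) => n + 1)
    }
  }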

Keep the whole iterator consumption inside the resource lifetime.
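Because withSource is only defined in Method 3, here is a self-contained sketch of the same safe shape built on Using.resource. countLinesAsync is a hypothetical name, and the caller-supplied ExecutionContext is an assumption; blocking file reads are often better placed on a dedicated pool than on the global one.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.io.Source
import scala.util.Using

// Hypothetical helper: the Source is opened, fully consumed, and closed
// inside the Future body, so its lifetime never crosses the async boundary.
def countLinesAsync(path: String)(implicit ec: ExecutionContext): Future[Long] =
  Future {
    Using.resource(Source.fromFile(path)) { source =>
      source.getLines().foldLeft(0L)((n, _) => n + 1)
    }
  }
```

A missing file surfaces as a failed Future rather than a leaked handle, because Future.apply captures the exception thrown while opening the Source.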

Method 2: scala.util.Using - The Modern Resource-Management Answer

Scala 2.13+ gives you the standard-library answer:

import scala.io.Source
import scala.util.Using

val count = Using.resource(Source.fromFile("data.txt")) { source =>
  source.getLines().foldLeft(0L)((n, _) => n + 1)
}

This is the cleanest scala util using file form if you want exceptions to propagate.

If you want explicit error handling, use Using(...), which returns Try[A]:

import scala.io.Source
import scala.util.{Try, Using}

val count: Try[Long] =
  Using(Source.fromFile("data.txt")) { source =>
    source.getLines().foldLeft(0L)((n, _) => n + 1)
  }

Then handle the result:

count match {
  case scala.util.Success(n) => println(s"Lines: $n")
  case scala.util.Failure(e) => println(s"Error: ${e.getMessage}")
}

Why Using is better than raw try / finally

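The Scala docs describe Using as a utility for automatic resource management. They also document these behaviors:

  • Using(...) wraps the whole operation in a Try
  • Using.resource(...) behaves similarly to Java's try-with-resources
  • for ordinary non-fatal exceptions, if both the body and the release step throw, the body's exception is kept and the release exception is attached to it as suppressed, whereas with a plain try / finally an exception from finally replaces the original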

That makes scala util using file the right modern answer for most application code.
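A small sketch that makes the release behavior observable. TrackedResource is a hypothetical class that only records whether close() ran and can optionally fail on close; runFailingBody is likewise illustrative, standing in for a counting body that throws.

```scala
import scala.util.{Try, Using}

// Hypothetical resource used only to observe when close() runs.
final class TrackedResource(failOnClose: Boolean) extends AutoCloseable {
  var closed = false
  override def close(): Unit = {
    closed = true
    if (failOnClose) throw new IllegalStateException("close failed")
  }
}

// The body throws; Using still calls close() and returns a Failure.
def runFailingBody(resource: TrackedResource): Try[Long] =
  Using(resource) { _ =>
    throw new RuntimeException("counting failed")
  }
```

With failOnClose = false the result is a Failure and closed is true. With failOnClose = true the Failure still carries the "counting failed" exception, and the close error shows up in its getSuppressed array.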

Safe examples

Count non-empty lines:

import scala.io.Source
import scala.util.Using

val nonEmpty = Using.resource(Source.fromFile("data.txt")) { source =>
  source.getLines().foldLeft(0L) { (n, line) =>
    if (line.nonEmpty) n + 1 else n
  }
}

Batch a small set of local files and keep failures explicit:

import scala.io.Source
import scala.util.{Try, Using}

def countLines(path: String): Try[Long] =
  Using(Source.fromFile(path)) { source =>
    source.getLines().foldLeft(0L)((n, _) => n + 1)
  }

val paths = Seq("a.txt", "b.txt", "c.txt")
val results = paths.map(path => path -> countLines(path)).toMap

This is a much safer answer to scala count lines in file than sprinkling .getLines().size across a codebase and hoping somebody remembers to close everything later.

Method 3: Loan Pattern - For Scala 2.12 and Older Code

If you are not on Scala 2.13+, a small loan-pattern helper keeps the code honest:

import scala.io.Source

def withSource[A](path: String)(f: Source => A): A = {
  val source = Source.fromFile(path)
  try {
    f(source)
  } finally {
    source.close()
  }
}

Use it like this:

val count = withSource("data.txt") { source =>
  source.getLines().foldLeft(0L)((n, _) => n + 1)
}

Or with materialization:

val lines = withSource("data.txt") { source =>
  source.getLines().toVector
}

The loan-pattern rule that matters

The same resource rule still applies:

  • do return a fully computed count
  • do return a strict collection like List or Vector
  • do not return the raw Iterator

Wrong:

def lines(path: String): Iterator[String] =
  withSource(path)(_.getLines())

Right:

def countLines(path: String): Long =
  withSource(path)(_.getLines().foldLeft(0L)((n, _) => n + 1))

This is exactly where the scala getlines lazy problem bites older codebases hardest.
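The simple withSource helper has one weakness compared with Using: if both f and close() throw, the exception from finally replaces the original one. If you need try-with-resources semantics on Scala 2.12, a sketch along these lines works for any AutoCloseable, including Source. withResource is a hypothetical name, and catching every Throwable is a simplification that production code may want to narrow.

```scala
// Hypothetical 2.12-compatible loan helper that mimics try-with-resources:
// the body's exception wins, and a failing close() is attached as suppressed.
// Usage: withResource(Source.fromFile("data.txt"))(_.getLines().foldLeft(0L)((n, _) => n + 1))
def withResource[R <: AutoCloseable, A](resource: R)(f: R => A): A = {
  var primary: Throwable = null
  try f(resource)
  catch {
    case t: Throwable =>
      primary = t
      throw t
  } finally {
    if (primary == null) {
      // Normal path: a close() failure propagates, as in try-with-resources.
      resource.close()
    } else {
      // Failure path: keep the body's exception and record the close error on it.
      try resource.close()
      catch { case closeError: Throwable => primary.addSuppressed(closeError) }
    }
  }
}
```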

Method 4: Files.lines - Java NIO Streaming with Better Defaults

Scala can call Java NIO directly, and for large JVM text files this is often the cleanest answer:

import java.nio.file.{Files, Paths}
import scala.util.Using

val count = Using.resource(Files.lines(Paths.get("data.txt"))) { lines =>
  lines.count()
}

This is a strong default for scala count lines large file.

The Java Files.lines docs are explicit:

  • unlike readAllLines, it does not read all lines into a List
  • the stream is populated lazily as it is consumed
  • the returned stream contains a reference to an open file
  • you must close the stream promptly

That is why Using.resource is still important here:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Using

val count = Using.resource(
  Files.lines(Paths.get("data.txt"), StandardCharsets.UTF_8)
)(_.count())

The single-argument overload, Files.lines(path), also decodes as UTF-8 by default.
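The charset choice has a practical side effect: the decoder behind Files.lines reports malformed input instead of replacing it, and the Files.lines docs say an IOException during reading is rethrown as an UncheckedIOException from the stream operation. Wrapping the stream in Using(...) rather than Using.resource turns that into a Failure. countUtf8Lines is a hypothetical helper sketching this:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.{Try, Using}

// Hypothetical helper: strict UTF-8 line count that reports bad input as a Failure.
// Open errors and decoding errors raised during count() are both captured by Using.
def countUtf8Lines(path: Path): Try[Long] =
  Using(Files.lines(path, StandardCharsets.UTF_8))(_.count())
```

If a file may contain bytes that are not valid UTF-8, pass the charset it was actually written in. StandardCharsets.ISO_8859_1 maps every byte to a character, so it never fails to decode; it garbles non-ASCII text, but for UTF-8 input the line count is unchanged because multi-byte UTF-8 sequences never contain CR or LF bytes.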

Why Files.lines is appealing in Scala

It solves several practical problems at once:

  • it already returns a Long from count()
  • its close requirement is well documented in the Java API
  • it is a natural fit in mixed Scala and Java codebases
  • it sidesteps the "did we close Source?" question entirely, although the returned Stream itself still needs closing

For teams already living on the JVM, the Java Files.lines guide covers the same API from the Java side.

Counting non-empty lines with NIO

import java.nio.file.{Files, Paths}
import scala.util.Using

val nonEmpty = Using.resource(Files.lines(Paths.get("data.txt"))) { lines =>
  lines.filter(line => !line.isEmpty).count()
}

That is still scala count lines in file, just with a filter in the terminal pipeline.

Method 5: Apache Spark - Counting Lines on HDFS and S3

If your input already lives in HDFS, S3, or another cluster-visible filesystem, the best scala spark count lines answer is usually to let Spark read it as text:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LineCounter")
  .getOrCreate()

val count = spark.read.textFile("s3a://bucket/data/large_file.txt").count()

Spark's DataFrameReader.textFile docs say:

  • it loads text files and returns a Dataset[String]
  • by default, each line in the text files is a new row

That means .count() is exactly a distributed line count.

RDD version

If you prefer the RDD API:

val count = spark.sparkContext
  .textFile("hdfs://namenode/data/large_file.txt")
  .count()

The Spark RDD programming guide says SparkContext.textFile reads text as a collection of lines, and it supports local paths, HDFS, S3A, compressed files, directories, and wildcards.

Important cluster caveat

For local filesystem paths, Spark's guide also notes that the file must be accessible at the same path on worker nodes.

So this is fine:

  • hdfs://...
  • s3a://...
  • shared cluster-visible paths

This is risky on a real cluster:

  • /tmp/data.txt on the driver only

That is a practical scala spark count lines distinction that many small examples omit.

Do not loop on the driver with Source

This is the wrong pattern for a large distributed job:

import scala.io.Source

val paths: Seq[String] = ??? // thousands of files

val counts = paths.map { path =>
  Source.fromFile(path).getLines().size
}

That is how you turn a Spark-adjacent workflow into scala too many open files on the driver.

This is better:

val count = spark.read.textFile("s3a://bucket/logs/*.log").count()

Or if you need filtering:

val dataLineCount = spark.read
  .textFile("hdfs://namenode/logs/app.log")
  .filter(line => !line.startsWith("#"))
  .count()

Let the cluster own the file reading whenever the files already live there.

Method 6: Byte Scanning - Maximum Throughput

If you only need the number and you are willing to count physical newline bytes, a buffered byte scan is usually the fastest pure-JVM answer:

import java.io.{BufferedInputStream, FileInputStream}
import scala.util.Using

def countLinesFast(path: String): Long = {
  val bufferSize = 1024 * 1024

  Using.resource(new BufferedInputStream(new FileInputStream(path), bufferSize)) { stream =>
    val buffer = new Array[Byte](bufferSize)
    var count = 0L
    // sawData and lastByte detect a non-empty file whose final line has no trailing LF.
    var sawData = false
    var lastByte = '\n'.toByte
    var bytesRead = stream.read(buffer)

    while (bytesRead != -1) {
      if (bytesRead > 0) {
        sawData = true
        var i = 0
        while (i < bytesRead) {
          if (buffer(i) == '\n'.toByte) {
            count += 1
          }
          lastByte = buffer(i)
          i += 1
        }
      }

      bytesRead = stream.read(buffer)
    }

    // Count the unterminated last line, if any.
    if (sawData && lastByte != '\n'.toByte) {
      count += 1
    }

    count
  }
}

This is a strong answer when:

  • the file is huge
  • you only need the count
  • line decoding and per-line String allocation are unnecessary

What byte scanning does and does not mean

This is not a decoded text-line API.

It counts physical LF bytes and treats a missing final LF as one more line. That matches Unix-style and CRLF text files well enough for raw counting.

It is less semantic than Source.getLines() or Files.lines():

  • text APIs understand \r\n, \r, and \n as line separators
  • byte scanning is just scanning bytes

So use byte scanning for speed, not for rich text semantics.
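A quick way to see the difference is a file that uses lone carriage returns as separators. In this sketch, compareCounts is a hypothetical helper; it inlines the same LF-plus-unterminated-tail rule as countLinesFast above rather than calling it, so the block stands alone.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Hypothetical helper: writes `content` to a temp file and counts it three ways.
// Returns (Source.getLines, Files.lines, physical-LF byte rule).
def compareCounts(content: String): (Long, Long, Long) = {
  val tmp = Files.createTempFile("separators", ".txt")
  try {
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    val viaSource = Using.resource(Source.fromFile(tmp.toFile, "UTF-8"))(_.getLines().size.toLong)
    val viaNio = Using.resource(Files.lines(tmp, StandardCharsets.UTF_8))(_.count())
    // Same rule as the byte scan: count LF bytes, plus one for a non-empty unterminated tail.
    val bytes = Files.readAllBytes(tmp)
    val lf = bytes.count(_ == '\n'.toByte).toLong
    val viaBytes = if (bytes.nonEmpty && bytes.last != '\n'.toByte) lf + 1 else lf
    (viaSource, viaNio, viaBytes)
  } finally Files.delete(tmp)
}

// compareCounts("a\r\nb\r\n") gives (2, 2, 2): CRLF files agree.
// compareCounts("a\rb\rc") gives (3, 3, 1): lone CR separators are invisible to the byte scan.
```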

Benchmark: Representative Comparison

These numbers are illustrative estimates, not measurements reproduced for this article. They sketch the expected trade-off shape from API behavior and a typical JVM profile for Scala 3.x on Linux with SSD storage, so measure on your own hardware before relying on them.

Method | Time | Peak memory | Handle safety | Notes
Source.fromFile(...).getLines().size without close | about 3.0 s | about 8 MB | no | counting is streaming, but descriptor lifetime is unsafe
Source plus try / finally | about 3.0 s | about 8 MB | yes | classic safe baseline
Using.resource(Source.fromFile(...)) | about 3.0 s | about 8 MB | yes | best Scala-only local default
Using.resource(Files.lines(...)) | about 1.8 s | about 8 MB | yes | modern JVM streaming
buffered byte scan | about 0.6 s | about 1 MB | yes | fastest raw physical-line count
spark.read.textFile(...).count() | distributed | distributed | yes | best for cluster-visible text inputs

The important correction here is that plain Source.getLines().size is not a 1GB read-all trap. It is a resource-leak trap if you do not close the Source.

So the practical conclusion is:

  • Scala 2.13+ local file: scala util using file
  • large JVM file: Files.lines
  • distributed storage: scala spark count lines
  • raw speed: byte scan
  • never forget the close boundary around Source
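If you want to check these trade-offs on your own hardware, a rough wall-clock harness is a reasonable starting point. This is a hypothetical sketch, not a JMH benchmark: there is no controlled warm-up or statistics beyond repeating a few rounds, so treat its output as indicative only and run it against a large file you generate yourself.

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source
import scala.util.Using

// Hypothetical wall-clock timer. Printing and returning the result keeps the work
// observable, but this is still far less rigorous than JMH.
def timed[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label%-22s result=$result%s  $elapsedMs%.1f ms")
  result
}

// Times two of the safe methods from the table on the same file, several rounds each,
// and returns every count so callers can check that the methods agree.
def compareMethods(path: String, rounds: Int = 3): Seq[Long] =
  (1 to rounds).flatMap { round =>
    Seq(
      timed(s"Source+Using #$round") {
        Using.resource(Source.fromFile(path))(_.getLines().foldLeft(0L)((n, _) => n + 1))
      },
      timed(s"Files.lines #$round") {
        Using.resource(Files.lines(Paths.get(path)))(_.count())
      }
    )
  }
```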

Part 7: A Production-Ready Scala Line Counter

The helper below keeps three concerns separate:

  • resource safety
  • strategy selection
  • explicit Try[Long] results

import java.io.{BufferedInputStream, FileInputStream}
import java.nio.charset.{Charset, StandardCharsets}
import java.nio.file.{Files, Paths}
import scala.io.Source
import scala.util.{Failure, Try, Using}

object LineCounter {

  private val SmallFileThreshold = 50L * 1024 * 1024
  private val BufferSize = 1024 * 1024

  def count(
    path: String,
    charset: Charset = StandardCharsets.UTF_8,
    skipEmpty: Boolean = false
  ): Try[Long] =
    // Paths.get and Files.size can throw too, so keep them inside the Try chain.
    Try(Paths.get(path)).flatMap { nioPath =>
      if (!Files.isRegularFile(nioPath)) {
        Failure(new IllegalArgumentException(s"Not a regular file: $path"))
      } else {
        Try(Files.size(nioPath)).flatMap { size =>
          if (size < SmallFileThreshold) {
            // Smaller files: Scala Source, consumed fully inside Using.
            Using(Source.fromFile(path, charset.name())) { source =>
              source.getLines().foldLeft(0L) { (n, line) =>
                if (skipEmpty && line.isEmpty) n else n + 1
              }
            }
          } else {
            // Larger files: NIO stream, also closed by Using.
            Using(Files.lines(nioPath, charset)) { lines =>
              if (skipEmpty) {
                lines.filter(line => !line.isEmpty).count()
              } else {
                lines.count()
              }
            }
          }
        }
      }
    }

  def countFast(path: String): Try[Long] =
    Using(new BufferedInputStream(new FileInputStream(path), BufferSize)) { stream =>
      val buffer = new Array[Byte](BufferSize)
      var count = 0L
      var sawData = false
      var lastByte = '\n'.toByte
      var bytesRead = stream.read(buffer)

      while (bytesRead != -1) {
        if (bytesRead > 0) {
          sawData = true
          var i = 0
          while (i < bytesRead) {
            if (buffer(i) == '\n'.toByte) {
              count += 1
            }
            lastByte = buffer(i)
            i += 1
          }
        }

        bytesRead = stream.read(buffer)
      }

      if (sawData && lastByte != '\n'.toByte) {
        count += 1
      }

      count
    }

  def countBatch(paths: Seq[String]): Map[String, Try[Long]] =
    paths.map(path => path -> count(path)).toMap
}

Examples:

LineCounter.count("data.csv").foreach(n => println(s"Lines: $n"))
LineCounter.count("data.csv", skipEmpty = true).getOrElse(0L)
LineCounter.countFast("huge.log").get
LineCounter.countBatch(Seq("a.txt", "b.txt", "c.txt"))

This is the sort of helper that prevents scala too many open files from showing up months later because somebody copied a one-liner from an old answer.

Quick FAQ

How do I count lines in a file in Scala?

Use Using.resource(Source.fromFile(path))(_.getLines().foldLeft(0L)((n, _) => n + 1)) for a Scala-native local file answer, or Using.resource(Files.lines(Paths.get(path)))(_.count()) if Java NIO is acceptable.

Why does Scala throw Too many open files?

The usual cause is repeated Source.fromFile or other file-backed resource creation without prompt closing. In other words, it is often a resource-lifetime bug, not a counting algorithm bug.

How do I close Source in Scala?

Use try / finally, Using, or Using.resource.

What is scala.util.Using?

It is Scala 2.13+'s standard-library utility for automatic resource management. Using returns Try[A]; Using.resource returns A and throws on failure.

How do I count lines in a large file in Scala?

For scala count lines large file, prefer Files.lines for normal JVM text files or a byte-scanning loop for maximum throughput.

How do I count lines in Scala with Spark?

Use spark.read.textFile(path).count() or spark.sparkContext.textFile(path).count() when the files live on HDFS, S3A, or another cluster-visible filesystem.

Is getLines lazy in Scala?

Yes. scala getlines lazy is real: Source.getLines() returns an Iterator[String], and the file is consumed as the iterator is traversed.

How do I count lines in Scala without loading the file?

Source.getLines() plus an immediate count, Files.lines(...).count(), Spark text readers, and byte scanning all avoid loading the entire file into one in-memory collection.



