
Ruby Deep Dive

How to Count Lines in a File Using Ruby (And the Encoding Trap Nobody Warns You About)

Count lines in a file using Ruby — File.foreach, readlines, IO.read, and wc -l. Covers the invalid byte sequence trap, memory issues, and Rails-safe patterns with benchmarks.

Ruby 2.7+ | Ruby 3.3 | Rails 5.2+
Published: May 14, 2026 | Updated: May 14, 2026 | 13 min read | Author: Line Counter Editorial Team
Tags: Ruby, Rails, ActiveStorage, File I/O, Performance

You are building a CSV importer in Rails. Before the job starts, you want a progress bar. That means you need the total line count first.

The obvious version is:

total = File.readlines("data.csv").count

That works on a 10MB file. On a 500MB file, it can blow up memory because the whole file becomes an array of Ruby strings.

The streaming fix is:

total = File.foreach("data.csv").count

That fixes the memory problem. It can still fail on encoding. On modern UTF-8-oriented systems, a file with invalid bytes can raise an ArgumentError with the message invalid byte sequence in UTF-8, typically the moment a line is handled as text (a regex match, a split, or a similar string operation inside the loop). In transcoding paths, you may see Encoding::InvalidByteSequenceError instead.

This guide covers the practical choices for counting lines in Ruby:

  • File.readlines.count for small known files.
  • File.foreach.count as the streaming default.
  • File.foreach(..., encoding: "binary").count for unknown encodings.
  • chunked binary reads for large files.
  • wc -l for Unix fast paths, without the shell bug from the common Stack Overflow snippet.
  • ActiveStorage patterns for Rails uploads and background jobs.

If you only need a number now, use the Line Counter tool. If you are wiring this into a Ruby script, a Rails importer, or a data job, the details below are where the traps live.

If your search was literally ruby count lines in file, the shortest honest answer is File.foreach(path, encoding: "binary").count.

Quick Method Guide

| I want to... | Use this | Main warning |
| --- | --- | --- |
| Count a small file | File.readlines(path).count | loads the whole file |
| Stream a normal file | File.foreach(path).count | can fail on invalid bytes |
| Count a mixed-encoding file | File.foreach(path, encoding: "binary").count | line content stays binary |
| Count a large file fast in pure Ruby | chunked read plus count("\n") | more code |
| Use Unix wc -l | Open3.capture2("wc", "-l", path) | not portable to Windows |
| Count a Rails upload | attachment.open plus binary counting | attachment must exist after commit |

For most line-counting code, the safest default is File.foreach(path, encoding: "binary").count. It is streaming, cross-platform, and does not assume the file is valid UTF-8 text.

When the file size is unknown, that is a better default than trying to guess whether readlines will fit in memory.

Method 1: File.readlines.count - Simple but Memory-Hungry

The shortest answer is:

count = File.readlines("data.txt").count

Equivalent forms:

lines = File.readlines("data.txt")
count = lines.length

count = File.readlines("data.txt").count { |line| line.strip != "" }

With basic validation:

def count_lines_small_file(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  File.readlines(file_path).count
end

Ruby's IO documentation says readlines returns an array of all lines read from the stream. That is the whole trade-off. Simplicity is high. Memory use is high too.

| File size | Use File.readlines? | Why |
| --- | --- | --- |
| Under 50MB | yes | simplest code |
| 50MB to 200MB | maybe | memory grows fast |
| Over 200MB | avoid | array and string overhead becomes expensive |
| Around 1GB | no | likely memory pressure or OOM |

File.readlines.count is fine for tiny fixtures, config files, and developer scripts. It is not a production answer for user uploads or multi-hundred-MB logs.

You can see the object pressure directly:

require "objspace"

before = ObjectSpace.memsize_of_all
lines = File.readlines("large_file.txt")
after = ObjectSpace.memsize_of_all

puts "Memory used: #{(after - before) / 1024.0 / 1024.0} MB"

That is the Ruby version of the same read-all trap seen in PHP and Python guides: a simple API that quietly turns a file into an in-memory collection.

Method 2: File.foreach.count - Streaming by Default

The streaming version is the first serious answer for production code:

count = File.foreach("data.txt").count

Equivalent forms:

count = 0
File.foreach("data.txt") { count += 1 }

count = File.foreach("data.txt").inject(0) { |c, _line| c + 1 }

count = File.foreach("data.txt").count { |line| !line.chomp.empty? }

The inject style is common in older answers, but modern Ruby makes the intent clearer with count. It is the better default because the reader immediately knows the block is either a predicate or absent, in which case every line is tallied.

Why foreach behaves differently

Ruby's IO documentation says foreach(...) yields each successive line and returns an enumerator when no block is given.

enum = File.foreach("data.txt")
count = enum.count

That means the file is read lazily instead of becoming one array:

File.open("data.txt") do |file|
  file.each_line do |line|
    # only the current line is in play
  end
end

This is why File.foreach is the normal production answer when you want low memory use without writing a custom loop.

For jobs that run in Rails workers, cron scripts, or import pipelines, File.foreach also keeps GC pressure predictable.
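A minimal sketch of that lazy behavior, assuming a throwaway temp file (the helper name foreach_demo and the sample text are made up for illustration). Without a block, File.foreach only hands back an Enumerator, and count then pulls lines through it one at a time.

```ruby
require "tempfile"

# Returns [enumerator_class, total_lines, non_empty_lines] for the given text.
# The helper name and sample text are illustrative assumptions.
def foreach_demo(text)
  Tempfile.create("foreach_demo") do |file|
    file.write(text)
    file.flush

    enum = File.foreach(file.path) # no block given: nothing is read yet
    total = enum.count             # drives the enumerator line by line
    non_empty = File.foreach(file.path).count { |line| !line.chomp.empty? }

    [enum.class, total, non_empty]
  end
end

# Four lines: one is blank, and the last has no trailing newline.
p foreach_demo("alpha\n\nbeta\ngamma")
```

The last line is counted even though it has no trailing newline, which is one way foreach differs from a raw newline count.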

The encoding trap

This is the Ruby-specific trap most tutorials skip.

On a typical UTF-8-oriented Ruby setup, a file with invalid text bytes can fail as soon as a line is processed as text, for example in a predicate block that runs a regex:

File.foreach("data.txt").count { |line| line.match?(/\S/) }
# ArgumentError: invalid byte sequence in UTF-8

Depending on how data is being converted, you may also see Encoding::InvalidByteSequenceError in code paths that transcode. The real problem is the same in both cases: Ruby is treating bytes as text in an encoding they do not actually match.

Common triggers:

  • Windows-generated CSV or log files in a legacy encoding.
  • mixed text and binary bytes in application logs.
  • Latin-1 or Shift_JIS exports from older systems.
  • user uploads where the file encoding is unknown.

The safest fix for line counting is binary mode:

count = File.foreach("data.txt", encoding: "binary").count

This is the practical pattern for counting lines in files of unknown encoding. Ruby's IO docs describe binary mode as ASCII-8BIT, which skips text transcoding and still treats the \n byte as the line separator.

If you keep one thing from this article, keep this fix: count in binary mode unless you truly need decoded text.
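To make the difference concrete, here is a small sketch under stated assumptions: the helper name encoding_demo and the sample bytes are invented, and the file is a temp file. It reads the same bytes twice, once tagged as UTF-8 and once as binary, and reports what each mode sees.

```ruby
require "tempfile"

# Writes raw bytes to a temp file and reports how each read mode sees them.
# The helper name and the sample bytes are assumptions for illustration.
def encoding_demo(bytes)
  Tempfile.create("encoding_demo") do |file|
    file.binmode
    file.write(bytes)
    file.flush

    utf8_validity = File.foreach(file.path, encoding: "UTF-8").map(&:valid_encoding?)
    binary_lines = File.foreach(file.path, encoding: "binary").to_a

    {
      utf8_validity: utf8_validity,                 # which lines decode cleanly as UTF-8
      binary_count: binary_lines.count,             # newline-delimited lines, no decoding
      binary_encoding: binary_lines.first.encoding  # ASCII-8BIT, also known as BINARY
    }
  end
end

# "caf\xE9" is "café" in Latin-1; the lone 0xE9 byte is not valid UTF-8.
result = encoding_demo("ok\ncaf\xE9\n".b)
p result
```

The second line comes back flagged as invalid when read as UTF-8, while the binary read still counts two lines without trying to interpret the bytes.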

If you know the external encoding, you can be explicit instead:

count = File.foreach("data.txt", encoding: "ISO-8859-1").count
count = File.foreach("data.txt", encoding: "Windows-31J").count

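Or you can replace invalid bytes. The invalid: and undef: options only take effect when Ruby transcodes between two different encodings, so for a file that is read as UTF-8 and stays UTF-8, scrub each line instead:

count = 0
File.foreach("data.txt", encoding: "UTF-8") do |line|
  text = line.scrub("?") # invalid bytes become "?" before any text processing
  # ... work with text here ...
  count += 1
end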

For raw line counts, the default recommendation is still binary:

def count_lines(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  File.foreach(file_path, encoding: "binary").count
end

That is the cleanest fix when you only care about \n, not decoded characters.

It is also the most repeatable pattern across macOS, Linux, Rails uploads, and older exported data files.

Method 3: IO.read and Chunked Reads - Good Primitive, Bad Default

A common idea is:

count = IO.read("data.txt").count("\n")

This is fast on small files because String#count("\n") is efficient. But IO.read opens the file, reads its content, and returns a string. That means you are back to read-all memory behavior.

For a small file, that is fine. For a large file, it is the same class of problem as File.readlines.

The better version: chunked binary reads

def count_lines_chunked(file_path, chunk_size: 64 * 1024)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  count = 0
  last_chunk = nil

  File.open(file_path, "rb") do |file|
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
      last_chunk = chunk
    end
  end

  if last_chunk && !last_chunk.empty? && last_chunk[-1] != "\n"
    count += 1
  end

  count
end

Why this is good:

  • fixed memory footprint
  • no line-by-line Ruby string allocation
  • no UTF-8 assumptions
  • correct for files without a trailing newline

This is the fastest pure-Ruby answer in many cases. It is especially good when you only need a number and not the line contents.
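One way to gain confidence in a chunked counter is to cross-check it against File.foreach on edge cases and with very small chunk sizes, so that newlines land on chunk boundaries. The sketch below uses a compact variant of the counter; the helper names count_lines_by_chunk and chunk_counts_agree? and the sample strings are assumptions, not part of any library.

```ruby
require "tempfile"

# Compact variant of the chunked counter: count "\n" bytes per chunk and
# remember the last byte seen to detect a missing trailing newline.
def count_lines_by_chunk(path, chunk_size)
  newlines = 0
  last_byte = nil

  File.open(path, "rb") do |file|
    while (chunk = file.read(chunk_size))
      newlines += chunk.count("\n")
      last_byte = chunk.getbyte(-1)
    end
  end

  # 10 is the byte value of "\n"; nil means the file was empty.
  (last_byte.nil? || last_byte == 10) ? newlines : newlines + 1
end

# Compares the chunked result with File.foreach for several chunk sizes.
def chunk_counts_agree?(text)
  Tempfile.create("chunk_demo") do |file|
    file.binmode
    file.write(text)
    file.flush

    expected = File.foreach(file.path, encoding: "binary").count
    [1, 3, 64 * 1024].all? { |size| count_lines_by_chunk(file.path, size) == expected }
  end
end

samples = ["", "one", "one\n", "one\ntwo", "one\n\n", "\n"]
p samples.map { |text| chunk_counts_agree?(text) }
```

A chunk size of 1 is deliberately extreme: it exercises the boundary handling that a 64KB chunk would rarely hit in a small test file.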

Why count("\n") beats split("\n").length

String#count("\n") scans bytes. split("\n") allocates an array and many strings.

The semantic trap is also different: Ruby split drops trailing empty fields by default, so it is not a reliable logical line counter for strings with blank trailing lines.

Recommended string counting:

def count_lines_in_string(text)
  return 0 if text.empty?

  count = text.count("\n")
  text.end_with?("\n") ? count : count + 1
end

Recommended text-line counting when you want Ruby's line semantics:

count = text.lines.count
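The three string approaches can be compared side by side. This is an illustrative sketch; the helper name line_count_comparison is hypothetical.

```ruby
# Compares three ways of counting lines in a string.
# The helper name is an assumption for illustration.
def line_count_comparison(text)
  newlines = text.count("\n")
  corrected = if text.empty?
                0
              elsif text.end_with?("\n")
                newlines
              else
                newlines + 1
              end

  {
    split: text.split("\n").length, # trailing empty fields are dropped
    lines: text.lines.count,        # Ruby's own line semantics
    corrected: corrected            # newline count plus an unterminated last line
  }
end

p line_count_comparison("a\nb\n\n\n")
p line_count_comparison("a\nb")
```

For the first string, split reports 2 while the other two report 4, because the two blank trailing lines are discarded. For the second string, all three agree on 2.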

Method 4: wc -l via Shell - Fastest on Unix, Dangerous if Written Carelessly

The common Stack Overflow shape looks like this:

count = %x{wc -l #{filename}}.split.first.to_i

This has three problems:

  • a path with spaces breaks unless it is escaped correctly
  • a path with shell metacharacters can become a command injection bug
  • when you pass the filename directly, the output includes the filename too

Safer shell string version with Shellwords

require "shellwords"

def count_lines_wc(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  escaped = Shellwords.escape(file_path)
  `wc -l < #{escaped}`.strip.to_i
end

Ruby's Shellwords.escape documentation says it escapes a string so it can be safely used in a Bourne shell command line, and that the returned string should be used unquoted. That is exactly why wc -l < #{escaped} is the right shell-string form.

The input redirection also keeps the filename out of wc's output, so the result is just the number.
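To see what the escaping does without running anything, the sketch below builds the command string for a deliberately hostile, made-up file name and checks that the name survives as a single shell word. Nothing is executed; Shellwords.split is used only to tokenize the escaped string the way a Bourne shell would.

```ruby
require "shellwords"

# A deliberately hostile, made-up file name: spaces plus shell metacharacters.
hostile_name = "report 2024; echo pwned $(whoami).csv"

escaped = Shellwords.escape(hostile_name)
command = "wc -l < #{escaped}"

# A clean round trip shows the name is parsed back as exactly one argument.
round_trip = Shellwords.split(escaped)

puts command
p round_trip == [hostile_name]
```

If the name were interpolated unescaped, the semicolon would end the wc command and the rest would run as a second command.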

Safer no-shell version with Open3

If you do not need redirection, avoid the shell entirely:

require "open3"

def count_lines_wc_safest(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  stdout, status = Open3.capture2("wc", "-l", file_path)
  raise "wc failed" unless status.success?

  stdout.split.first.to_i
end

Ruby's Open3.capture2 docs say that when you pass an executable and arguments separately, Ruby invokes the executable directly, with no shell and no shell expansion. That is why this form is safer than backticks.

The missing trailing newline problem

wc -l counts newline characters, not logical lines. GNU and POSIX wc documentation describe line counts in terms of newline characters.

That means a file like this is undercounted:

line1
line2
line3

If the final line does not end with \n, raw wc -l returns 2, not 3.

Fix it like this:

require "open3"

def count_lines_wc_fixed(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)

  return 0 if File.zero?(file_path)

  stdout, status = Open3.capture2("wc", "-l", file_path)
  raise "wc failed" unless status.success?

  count = stdout.split.first.to_i

  File.open(file_path, "rb") do |file|
    file.seek(-1, IO::SEEK_END)
    last_byte = file.read(1)
    last_byte == "\n" ? count : count + 1
  end
end

This keeps the fast Unix path while fixing the newline edge case.

When to use wc

Use wc -l when all of these are true:

  • you are on Linux, macOS, or another Unix-like system
  • wc is available
  • you want raw speed
  • you can tolerate a Unix-only fast path in your codebase

Do not make it the only method in a cross-platform Ruby library.

Benchmark: Representative Method Comparison

These numbers are representative for a Ruby 3.3, macOS, SSD setup with a 500MB text file of roughly 5 million lines. Treat them as benchmark shape, not a universal promise.

| Method | Time | Peak memory | Encoding-safe | Cross-platform |
| --- | --- | --- | --- | --- |
| File.readlines.count | about 4.2s | about 900MB | risky | yes |
| File.foreach.count | about 2.8s | about 1MB | risky on bad bytes | yes |
| File.foreach(..., encoding: "binary").count | about 2.8s | about 1MB | yes | yes |
| File.foreach.inject(0) | about 2.9s | about 1MB | depends on encoding | yes |
| chunked binary count("\n") | about 1.2s | about 64KB | yes | yes |
| wc -l fixed | about 0.4s | very low | yes | Unix only |
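Because numbers like these depend heavily on hardware and Ruby version, it can help to rerun the comparison locally. The following is a rough harness sketch, not the setup behind the table: the helper name benchmark_line_counters, the generated sample rows, and the default line count are all assumptions, and it measures wall-clock time only, not memory.

```ruby
require "benchmark"
require "tempfile"

# Times a few counting strategies on a generated file. The helper name,
# the line count, and the sample row are assumptions for illustration.
def benchmark_line_counters(line_count: 200_000)
  Tempfile.create("bench_demo") do |file|
    file.binmode
    line_count.times { |i| file.write("row #{i},some,csv,payload\n") }
    file.flush
    path = file.path

    strategies = {
      "readlines" => -> { File.readlines(path, encoding: "binary").count },
      "foreach"   => -> { File.foreach(path, encoding: "binary").count },
      # Raw newline count; the generated file always ends with "\n".
      "chunked"   => lambda do
        total = 0
        File.open(path, "rb") do |io|
          while (chunk = io.read(64 * 1024))
            total += chunk.count("\n")
          end
        end
        total
      end
    }

    strategies.map do |name, counter|
      result = nil
      seconds = Benchmark.realtime { result = counter.call }
      [name, { lines: result, seconds: seconds }]
    end.to_h
  end
end

benchmark_line_counters.each do |name, stats|
  puts format("%-10s %8d lines  %.4fs", name, stats[:lines], stats[:seconds])
end
```

Raise line_count to approach the file sizes discussed above; a single run is noisy, so repeat it a few times before drawing conclusions.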

The practical takeaway:

  • general default: File.foreach(path, encoding: "binary").count
  • large files, pure Ruby: chunked binary reads
  • Unix internal tooling: fixed wc -l
  • tiny files: File.readlines.count

That summary is the modern answer: stream first, force binary when the bytes are untrusted, and only reach for shell tools when the platform and deployment model allow it.

Part 6: Counting Lines in Rails with ActiveStorage

Rails upload flows have a different constraint: the file may live in local storage, S3, or another service. You need a tempfile first.

Rails guides recommend the attachment's open method when you want a blob downloaded to disk for external processing:

# app/services/file_line_counter.rb

class FileLineCounter
  def self.count(attachment)
    raise ArgumentError, "Attachment is missing" unless attachment&.attached?

    attachment.open do |tempfile|
      return count_path(tempfile.path)
    end
  end

  def self.count_path(file_path)
    File.foreach(file_path, encoding: "binary").count
  end

  private_class_method :count_path
end

Controller usage:

class ImportsController < ApplicationController
  def create
    @import = Import.new(import_params)

    if @import.save
      total_lines = FileLineCounter.count(@import.file)
      @import.update!(total_lines: total_lines)

      ImportJob.perform_later(@import.id)

      render json: {
        import_id: @import.id,
        total_lines: total_lines,
        status: "queued"
      }
    else
      render json: { errors: @import.errors.full_messages }, status: :unprocessable_entity
    end
  end
end

Background job usage:

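class ImportJob < ApplicationJob
  def perform(import_id)
    import = Import.find(import_id)
    processed = 0

    import.file.open do |tempfile|
      File.foreach(tempfile.path, encoding: "binary") do |line|
        process_line(line) # application-specific; not defined here
        processed += 1

        # Persist progress every 1,000 lines to keep write volume low.
        import.update!(processed_lines: processed) if (processed % 1000).zero?
      end
    end

    # Record the final total so the progress bar can reach 100%.
    import.update!(processed_lines: processed)
  end
end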

Two Rails-specific details matter:

  • attachment.open downloads to a tempfile on disk, which is exactly what you want for line counting and external tooling
  • Rails guides note that the file is available in after_create_commit, not after_create

That is the safe baseline for counting lines in ActiveStorage uploads.

Part 7: A Production-Ready Ruby Line Counter

require "open3"

module LineCounter
  SMALL_FILE_THRESHOLD = 50 * 1024 * 1024
  CHUNK_SIZE = 64 * 1024

  module_function

  def count(file_path, skip_empty: false)
    raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
    return 0 if File.zero?(file_path)

    if skip_empty
      return File.foreach(file_path, encoding: "binary")
                 .count { |line| !line.chomp.empty? }
    end

    file_size = File.size(file_path)

    if file_size < SMALL_FILE_THRESHOLD
      return File.readlines(file_path, encoding: "binary").count
    end

    if unix?
      begin
        return count_with_wc(file_path)
      rescue Errno::ENOENT
        # wc is not available; fall through to the pure Ruby path.
      end
    end

    count_with_chunks(file_path)
  end

  def count_non_empty(file_path)
    count(file_path, skip_empty: true)
  end

  def count_with_wc(file_path)
    stdout, status = Open3.capture2("wc", "-l", file_path)
    raise "wc failed" unless status.success?

    count = stdout.split.first.to_i

    File.open(file_path, "rb") do |file|
      file.seek(-1, IO::SEEK_END)
      last_byte = file.read(1)
      last_byte == "\n" ? count : count + 1
    end
  end

  def count_with_chunks(file_path)
    count = 0
    last_chunk = nil

    File.open(file_path, "rb") do |file|
      while (chunk = file.read(CHUNK_SIZE))
        count += chunk.count("\n")
        last_chunk = chunk
      end
    end

    if last_chunk && !last_chunk.empty? && last_chunk[-1] != "\n"
      count + 1
    else
      count
    end
  end

  def unix?
    /linux|darwin|bsd/ === RUBY_PLATFORM
  end
end

Usage:

total = LineCounter.count("data.csv")
non_empty = LineCounter.count_non_empty("config.txt")

This design keeps the user-facing method simple:

  • tiny files: readlines for clarity
  • skip-empty counts: line-aware foreach
  • large files on Unix: wc -l fast path
  • large files everywhere else: chunked pure Ruby fallback

Special Scenarios

Count lines in a string

Recommended logical line count:

count = text.lines.count

Fast raw newline count with correction:

def count_lines_in_string_fast(text)
  return 0 if text.empty?

  count = text.count("\n")
  text.end_with?("\n") ? count : count + 1
end

Avoid using text.split("\n").count as a general line counter. Ruby split drops trailing empty fields by default, so strings that end with blank lines can be undercounted.

Count STDIN

# each_line has no encoding: option; switch the stream to binary mode first.
puts $stdin.binmode.each_line.count

Count lines in a Rake task

namespace :data do
  desc "Count lines in a file"
  task count_lines: :environment do
    file_path = ENV["FILE"] || "data/import.csv"

    unless File.file?(file_path)
      warn "File not found: #{file_path}"
      exit 1
    end

    puts LineCounter.count(file_path)
  end
end

Decision Tree

How large is the file?
|
+-- Under 50MB
|   +-- File.readlines(path).count if simplicity matters most
|
+-- Size unknown or over 50MB
|   +-- File.foreach(path, encoding: "binary").count
|
+-- Over 500MB and on Unix
|   +-- chunked binary reads or fixed wc -l

Does the file contain unknown bytes?
|
+-- Yes: use encoding: "binary"
+-- No, and the external encoding is known: you can set it explicitly

Do you need Rails upload support?
|
+-- Yes: attachment.open, then count the tempfile path

Ruby Version Compatibility

| Feature | Version support | Notes |
| --- | --- | --- |
| File.foreach | Ruby 1.8+ | streaming line iteration |
| File.readlines | Ruby 1.8+ | reads all lines into an array |
| encoding: "binary" | Ruby 1.9+ | binary or ASCII-8BIT semantics |
| Shellwords.escape | Ruby 1.9+ | safe shell escaping helper |
| Open3.capture2 | Ruby 1.9.3+ | no-shell direct process invocation |
| IO::SEEK_END | Ruby 1.8+ | used to inspect the last byte |
| ActiveStorage attachment.open | Rails 5.2+ | downloads to a tempfile |

The article targets Ruby 2.7+ projects, with Ruby 3.3 used for the benchmark environment description.

Production Checklist

  • Use File.foreach, not File.readlines, for unknown or large files.
  • Add encoding: "binary" when file bytes may not be valid UTF-8.
  • Prefer count over inject(0) { |c, _| c + 1 } when tallying lines from File.foreach.
  • If you call wc, escape the path with Shellwords.escape or skip the shell with Open3.capture2.
  • Fix the missing trailing newline case when you rely on wc -l.
  • Use block forms of File.open so file descriptors close automatically.
  • In Rails, use attachment.open and count the tempfile path.
  • Wait until after_create_commit if you need the uploaded file immediately after creation.


Building a CSV importer in Rails?

Check the line count before you queue the job. Paste the file into the Line Counter. No Ruby runtime, no encoding surprises, no shell escaping required.

Frequently Asked Questions

How do I count lines in a file in Ruby?

For a safe default, use File.foreach(path, encoding: 'binary').count. It streams the file and avoids text-decoding failures on bad bytes.

What is the difference between readlines and foreach in Ruby?

readlines returns an array of all lines, while foreach yields one line at a time. The first is simpler for small files; the second is the production default.

How do I fix invalid byte sequence errors in Ruby?

Use binary mode or an explicit external encoding. For line counting, File.foreach(path, encoding: 'binary').count is usually the safest answer.

How do I count lines in a large file in Ruby?

Use File.foreach(path, encoding: 'binary').count for the simplest streaming version, or read binary chunks and count '\n' bytes for higher throughput.

How do I use wc -l safely in Ruby?

Either escape the path with Shellwords.escape when using a shell string, or avoid the shell entirely with Open3.capture2('wc', '-l', path).

How do I count lines in Ruby on Rails?

For ActiveStorage attachments, use attachment.open to get a tempfile path and count lines from that file in binary mode.

How do I count non-empty lines in Ruby?

Use File.foreach(path, encoding: 'binary').count { |line| !line.chomp.empty? }.

How do I count lines in a string in Ruby?

Use text.lines.count for logical text lines, or count '\n' and add one when the non-empty string does not end with a newline.
