How to Count Lines in a File Using Ruby (And the Encoding Trap Nobody Warns You About)
Count lines in a file using Ruby — File.foreach, readlines, IO.read, and wc -l. Covers the invalid byte sequence trap, memory issues, and Rails-safe patterns with benchmarks.
You are building a CSV importer in Rails. Before the job starts, you want a progress bar. That means you need the total line count first.
The obvious version is:
total = File.readlines("data.csv").count
That works on a 10MB file. On a 500MB file, it can blow up memory because the whole file becomes an array of Ruby strings.
The streaming fix is:
total = File.foreach("data.csv").count
That solves the memory problem. It can still fail on encoding. On modern UTF-8-oriented systems, a file with invalid bytes can raise an ArgumentError with the message invalid byte sequence in UTF-8, typically at the first string operation that has to interpret a line as UTF-8 text. In transcoding paths, you may also see Encoding::InvalidByteSequenceError.
This guide covers the practical choices for counting lines in Ruby:
- File.readlines.count for small known files.
- File.foreach.count for a streaming default.
- File.foreach(..., encoding: "binary").count for unknown encodings.
- chunked binary reads for large files.
- wc -l for Unix fast paths, without the shell bug from the common Stack Overflow snippet.
- ActiveStorage patterns for Rails uploads and background jobs.
If you only need a number now, use the Line Counter tool. If you are wiring this into a Ruby script, a Rails importer, or a data job, the details below are where the traps live.
If you searched for how to count lines in a file with Ruby, the shortest honest answer is File.foreach(path, encoding: "binary").count.
Quick Method Guide
| I want to... | Use this | Main warning |
|---|---|---|
| Count a small file | File.readlines(path).count | loads the whole file |
| Stream a normal file | File.foreach(path).count | can fail on invalid bytes |
| Count a mixed-encoding file | File.foreach(path, encoding: "binary").count | line content stays binary |
| Count a large file fast in pure Ruby | chunked read plus count("\n") | more code |
| Use Unix wc -l | Open3.capture2("wc", "-l", path) | not portable to Windows |
| Count a Rails upload | attachment.open plus binary counting | attachment must exist after commit |
For most line-counting code, the safest default is File.foreach(path, encoding: "binary").count. It is streaming, cross-platform, and does not assume the file is valid UTF-8 text.
When the file size is unknown, that is a better default than trying to guess whether readlines will fit in memory.
Method 1: File.readlines.count - Simple but Memory-Hungry
The shortest answer is:
count = File.readlines("data.txt").count
An equivalent form:
lines = File.readlines("data.txt")
count = lines.length
A variant that counts only non-blank lines (not equivalent to the raw count):
count = File.readlines("data.txt").count { |line| line.strip != "" }
With basic validation:
def count_lines_small_file(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  File.readlines(file_path).count
end
Ruby's IO documentation says readlines returns an array of all lines read from the stream. That is the whole trade-off. Simplicity is high. Memory use is high too.
| File size | Use File.readlines? | Why |
|---|---|---|
| Under 50MB | yes | simplest code |
| 50MB to 200MB | maybe | memory grows fast |
| Over 200MB | avoid | array and string overhead becomes expensive |
| Around 1GB | no | likely memory pressure or OOM |
File.readlines.count is fine for tiny fixtures, config files, and developer scripts. It is not a production answer for user uploads or multi-hundred-MB logs.
You can see the object pressure directly:
require "objspace"
before = ObjectSpace.memsize_of_all
lines = File.readlines("large_file.txt")
after = ObjectSpace.memsize_of_all
puts "Memory used: #{(after - before) / 1024.0 / 1024.0} MB"
That is the Ruby version of the same read-all trap seen in PHP and Python guides: a simple API that quietly turns a file into an in-memory collection.
Method 2: File.foreach.count - Streaming and Usually Recommended
The streaming version is the first serious answer:
count = File.foreach("data.txt").count
Equivalent forms (the last one counts only non-empty lines):
count = 0
File.foreach("data.txt") { count += 1 }
count = File.foreach("data.txt").inject(0) { |c, _line| c + 1 }
count = File.foreach("data.txt").count { |line| !line.chomp.empty? }
The inject style is common in older answers, but count states the intent more clearly. With count, the reader immediately knows that a block is just a predicate and that no block means a raw tally.
Why foreach behaves differently
Ruby's IO documentation says foreach(...) yields each successive line and returns an enumerator when no block is given.
enum = File.foreach("data.txt")
count = enum.count
That means the file is read lazily instead of becoming one array:
File.open("data.txt") do |file|
  file.each_line do |line|
    # only the current line is in play
  end
end
This is why File.foreach(path).count is the normal production answer when you want low memory use without writing a custom loop.
For jobs that run in Rails workers, cron scripts, or import pipelines, it also keeps GC pressure predictable.
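To make the laziness concrete, here is a minimal sketch that uses a throwaway Tempfile (the file name and contents are illustrative, not taken from a real dataset). It shows that File.foreach without a block returns an Enumerator, and that count and first consume it without building an array of every line.

```ruby
require "tempfile"

# Build a small throwaway file; the contents are purely illustrative.
demo = Tempfile.new("foreach_demo")
demo.write("alpha\nbeta\n\ngamma\n")
demo.close

# No block given: nothing is read yet, we only hold an Enumerator.
enum = File.foreach(demo.path)
puts enum.class

# Each call below streams the file line by line.
total     = File.foreach(demo.path).count
non_empty = File.foreach(demo.path).count { |line| !line.chomp.empty? }
first_two = File.foreach(demo.path).first(2) # stops after two lines

puts total
puts non_empty
p first_two

demo.unlink
```

Because first(2) stops early, it is also a cheap way to peek at a header row before deciding how to count the rest.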
The encoding trap
This is the Ruby-specific trap most tutorials skip.
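On a typical UTF-8-oriented Ruby setup, a file with invalid text bytes can fail as soon as a line is handled as text, for example when a counting block runs a regex over it:
File.foreach("data.txt").count { |line| line.match?(/\S/) }
# ArgumentError: invalid byte sequence in UTF-8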
Depending on how data is being converted, you may also see Encoding::InvalidByteSequenceError in related code paths. The real problem is the same in both cases: Ruby is treating bytes as text in an encoding they do not actually match.
Common triggers:
- Windows-generated CSV or log files in a legacy encoding.
- mixed text and binary bytes in application logs.
- Latin-1 or Shift_JIS exports from older systems.
- user uploads where the file encoding is unknown.
The safest fix for line counting is binary mode:
count = File.foreach("data.txt", encoding: "binary").count
This is the practical binary-mode pattern for line counting. Ruby's IO docs show that binary mode uses ASCII-8BIT semantics, which avoids text transcoding and still lets \n act as the line separator byte.
If you keep one idea from this article, keep this one: count bytes in binary mode unless you truly need decoded text.
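A minimal sketch of the failure and the fix, using a throwaway Tempfile that contains one Latin-1 byte (the file name and contents are illustrative assumptions). On current CRuby the error tends to surface when a string operation such as a regex match has to interpret the bytes, which is what the first block provokes; the binary reads do not decode anything and count cleanly.

```ruby
require "tempfile"

# "caf\xE9" is "café" in Latin-1; the lone 0xE9 byte is not valid UTF-8.
sample = Tempfile.new("latin1_demo")
sample.binmode
sample.write("caf\xE9\nplain\n".b)
sample.close

# Read as UTF-8 text: the regex must interpret the bytes, so it can raise.
text_error = begin
  File.foreach(sample.path, encoding: "UTF-8").count { |line| line.match?(/\S/) }
  nil
rescue ArgumentError => e
  e.message
end

# Read as binary: lines are plain byte strings, so nothing is decoded.
binary_count     = File.foreach(sample.path, encoding: "binary").count
binary_non_blank = File.foreach(sample.path, encoding: "binary").count { |line| line.match?(/\S/) }
line_encoding    = File.foreach(sample.path, encoding: "binary").first.encoding

puts "text mode error: #{text_error.inspect}"
puts "binary count: #{binary_count}, non-blank: #{binary_non_blank}"
puts "line encoding: #{line_encoding}"

sample.unlink
```

The same predicate block works in both modes; only the encoding attached to each line changes.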
If you know the external encoding, you can be explicit instead:
count = File.foreach("data.txt", encoding: "ISO-8859-1").count
count = File.foreach("data.txt", encoding: "Windows-31J").count
Or you can replace invalid bytes:
count = File.foreach(
"data.txt",
encoding: "UTF-8",
invalid: :replace,
undef: :replace
).count
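As far as the IO docs describe them, invalid: and undef: are transcoding options, so they may not change anything when a file is read as UTF-8 with no conversion taking place. When you do need decoded text rather than a raw count, String#scrub is an explicit alternative. The sketch below assumes an illustrative Tempfile with one bad byte and uses "?" as an arbitrary replacement character.

```ruby
require "tempfile"

# Illustrative file with one Latin-1 byte (0xE9) that is invalid as UTF-8.
sample = Tempfile.new("scrub_demo")
sample.binmode
sample.write("caf\xE9,1\n\nok,2\n".b)
sample.close

# Scrub each line before any text operation touches it.
non_blank = File.foreach(sample.path, encoding: "UTF-8").count do |line|
  line.scrub("?").match?(/\S/)
end

# Peek at what scrub produces for the damaged first line.
first_clean = File.foreach(sample.path, encoding: "UTF-8").first.scrub("?")

puts non_blank
p first_clean

sample.unlink
```

Scrubbing changes the data, so it suits display or filtering; for a raw line count, binary mode remains the simpler choice.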
For raw line counts, the default recommendation is still binary:
def count_lines(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  File.foreach(file_path, encoding: "binary").count
end
That is the cleanest fix when you only care about \n, not decoded characters.
It is also the most repeatable pattern across macOS, Linux, Rails uploads, and older exported data files.
Method 3: IO.read and Chunked Reads - Good Primitive, Bad Default
A common idea is:
count = IO.read("data.txt").count("\n")
This is fast on small files because String#count("\n") is efficient. But IO.read opens the file, reads its content, and returns a string. That means you are back to read-all memory behavior.
For a small file, that is fine. For large files, it is the same class of problem as File.readlines.
The better version: chunked binary reads
def count_lines_chunked(file_path, chunk_size: 64 * 1024)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  count = 0
  last_chunk = nil
  File.open(file_path, "rb") do |file|
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
      last_chunk = chunk
    end
  end
  if last_chunk && !last_chunk.empty? && last_chunk[-1] != "\n"
    count += 1
  end
  count
end
Why this is good:
- fixed memory footprint
- no line-by-line Ruby string allocation
- no UTF-8 assumptions
- correct for files without a trailing newline
This is the fastest pure-Ruby answer in many cases. It is especially good when you only need a number and not the line contents.
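One way to gain confidence in a chunked counter is to run it with deliberately tiny chunk sizes, so that chunk boundaries fall in the middle of lines. The sketch below is a compact variant written for this check; count_lines_by_chunks and write_temp are hypothetical helper names, and the file contents are illustrative.

```ruby
require "tempfile"

# Hypothetical helper for this sketch: counts lines by reading fixed-size
# binary chunks and remembering only the last byte seen.
def count_lines_by_chunks(path, chunk_size)
  newlines = 0
  last_byte = nil
  File.open(path, "rb") do |file|
    while (chunk = file.read(chunk_size))
      newlines += chunk.count("\n")
      last_byte = chunk[-1]
    end
  end
  # A non-empty file whose last byte is not "\n" has one unterminated line.
  (last_byte.nil? || last_byte == "\n") ? newlines : newlines + 1
end

# Hypothetical helper: writes content to a closed Tempfile and returns it.
def write_temp(content)
  file = Tempfile.new("chunk_demo")
  file.binmode
  file.write(content)
  file.close
  file
end

with_newline    = write_temp("a\nbb\nccc\n")
without_newline = write_temp("a\nbb\nccc")
empty           = write_temp("")

# Chunk sizes of 1 to 3 bytes force boundaries inside lines.
results = [1, 2, 3, 64 * 1024].map do |size|
  [
    count_lines_by_chunks(with_newline.path, size),
    count_lines_by_chunks(without_newline.path, size),
    count_lines_by_chunks(empty.path, size)
  ]
end

p results

[with_newline, without_newline, empty].each(&:unlink)
```

Every chunk size should report the same counts, because the newline tally and the last-byte check do not depend on where a chunk ends.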
Why count("\n") beats split("\n").length
String#count("\n") scans bytes. split("\n") allocates an array and many strings.
The semantic trap is also different: Ruby split drops trailing empty fields by default, so it is not a reliable logical line counter for strings with blank trailing lines.
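The difference is easiest to see on a string that ends with blank lines. This short sketch uses made-up sample strings to compare the three approaches side by side.

```ruby
# Two content lines followed by two blank lines.
text = "a\nb\n\n\n"

split_count      = text.split("\n").length      # trailing empty fields are dropped
split_keep_count = text.split("\n", -1).length  # -1 keeps them, plus one final empty field
newline_count    = text.count("\n")             # raw newline bytes
lines_count      = text.lines.count             # Ruby's own line semantics

puts "split: #{split_count}, split(-1): #{split_keep_count}"
puts "count: #{newline_count}, lines: #{lines_count}"

# A string without a trailing newline shows the opposite edge case.
no_trailing = "a\nb"
puts "count: #{no_trailing.count("\n")}, lines: #{no_trailing.lines.count}"
```

Only lines.count, or count("\n") with a correction for a missing trailing newline, agrees with the intuitive answer in both cases.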
Recommended string counting:
def count_lines_in_string(text)
return 0 if text.empty?
count = text.count("\n")
text.end_with?("\n") ? count : count + 1
end
Recommended text-line counting when you want Ruby's line semantics:
count = text.lines.count
Method 4: wc -l via Shell - Fastest on Unix, Dangerous if Written Carelessly
The common Stack Overflow shape looks like this:
count = %x{wc -l #{filename}}.split.first.to_i
This has three problems:
- a path with spaces breaks unless it is escaped correctly
- a path with shell metacharacters can become a command injection bug
- when you pass the filename directly, the output includes the filename too
Safer shell string version with Shellwords
require "shellwords"
def count_lines_wc(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  escaped = Shellwords.escape(file_path)
  `wc -l < #{escaped}`.strip.to_i
end
Ruby's Shellwords.escape documentation says it escapes a string so it can be safely used in a Bourne shell command line, and that the returned string should be used unquoted. That is exactly why wc -l < #{escaped} is the right shell-string form.
That covers the Shellwords approach for cases where you do want a shell string.
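To see what the escaping actually does, the sketch below runs Shellwords.escape on a hostile-looking path without executing anything. The filename is a hypothetical example chosen to include spaces and a semicolon.

```ruby
require "shellwords"

# Hypothetical filename with spaces and a shell metacharacter.
risky_path = "my report; echo pwned.csv"

escaped = Shellwords.escape(risky_path)
command = "wc -l < #{escaped}" # built for inspection only, never executed here

# Shellwords.split parses a string the way a Bourne shell would split words,
# so a round trip shows the shell would see exactly one argument.
parsed = Shellwords.split(escaped)

puts escaped
puts command
p parsed
```

The round trip through Shellwords.split returning a single word equal to the original path is the property that keeps the semicolon from starting a second command.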
Safer no-shell version with Open3
If you do not need redirection, avoid the shell entirely:
require "open3"
def count_lines_wc_safest(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  stdout, status = Open3.capture2("wc", "-l", file_path)
  raise "wc failed" unless status.success?
  stdout.split.first.to_i
end
Ruby's Open3.capture2 docs say that when you pass an executable and arguments separately, Ruby invokes the executable directly, with no shell and no shell expansion. That is why this form is safer than backticks.
The missing trailing newline problem
wc -l counts newline characters, not logical lines. GNU and POSIX wc documentation describe line counts in terms of newline characters.
That means a file like this is undercounted:
line1
line2
line3
If the final line does not end with \n, raw wc -l returns 2, not 3.
Fix it like this:
require "open3"
def count_lines_wc_fixed(file_path)
  raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
  return 0 if File.zero?(file_path)
  stdout, status = Open3.capture2("wc", "-l", file_path)
  raise "wc failed" unless status.success?
  count = stdout.split.first.to_i
  File.open(file_path, "rb") do |file|
    file.seek(-1, IO::SEEK_END)
    last_byte = file.read(1)
    last_byte == "\n" ? count : count + 1
  end
end
This keeps the fast Unix path while fixing the newline edge case.
When to use wc
Use wc -l when all of these are true:
- you are on Linux, macOS, or another Unix-like system
- wc is available
- you want raw speed
- you can tolerate a Unix-only fast path in your codebase
Do not make it the only method in a cross-platform Ruby library.
Benchmark: Representative Method Comparison
These numbers are representative for a Ruby 3.3, macOS, SSD setup with a 500MB text file of roughly 5 million lines. Treat them as benchmark shape, not a universal promise.
| Method | Time | Peak memory | Encoding-safe | Cross-platform |
|---|---|---|---|---|
| File.readlines.count | about 4.2s | about 900MB | risky | yes |
| File.foreach.count | about 2.8s | about 1MB | risky on bad bytes | yes |
| File.foreach(..., encoding: "binary").count | about 2.8s | about 1MB | yes | yes |
| File.foreach.inject(0) | about 2.9s | about 1MB | depends on encoding | yes |
| chunked binary count("\n") | about 1.2s | about 64KB | yes | yes |
| wc -l fixed | about 0.4s | very low | yes | Unix only |
The practical takeaway:
- general default: File.foreach(path, encoding: "binary").count
- large files, pure Ruby: chunked binary reads
- Unix internal tooling: fixed wc -l
- tiny files: File.readlines.count
That summary is the modern answer for counting lines in Ruby: stream first, force binary when the bytes are untrusted, and only reach for shell tools when the platform and deployment model allow it.
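Since the numbers above are only representative, a small harness helps you measure the same methods on your own hardware. This is a minimal sketch, not the script behind the table: the generated file, the line_total value, and the labels are assumptions, and the file is far smaller than 500MB so it runs quickly. It measures time only, not peak memory.

```ruby
require "benchmark"
require "tempfile"

# Generate a modest test file; raise line_total to approximate a real workload.
line_total = 200_000
sample = Tempfile.new("bench_demo")
sample.binmode
line_total.times { |i| sample.write("row #{i},some,payload\n") }
sample.close
path = sample.path

candidates = {
  "readlines"        => -> { File.readlines(path).count },
  "foreach"          => -> { File.foreach(path).count },
  "foreach (binary)" => -> { File.foreach(path, encoding: "binary").count },
  "chunked count"    => lambda {
    total = 0
    File.open(path, "rb") do |file|
      while (chunk = file.read(64 * 1024))
        total += chunk.count("\n")
      end
    end
    total # the generated file ends with "\n", so no correction is needed
  }
}

# Time each candidate once and keep its result for a consistency check.
results = candidates.map do |name, counter|
  count = nil
  seconds = Benchmark.realtime { count = counter.call }
  puts format("%-18s %8d lines  %.4fs", name, count, seconds)
  [name, count]
end.to_h

sample.unlink
```

A single run per method is noisy; repeating each measurement a few times and comparing the relative ordering is more informative than any one absolute figure.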
Part 6: Counting Lines in Rails with ActiveStorage
Rails upload flows have a different constraint: the file may live in local storage, S3, or another service. You need a tempfile first.
Rails guides recommend the attachment's open method when you want a blob downloaded to disk for external processing:
# app/services/file_line_counter.rb
class FileLineCounter
  def self.count(attachment)
    raise ArgumentError, "Attachment is missing" unless attachment&.attached?
    attachment.open do |tempfile|
      return count_path(tempfile.path)
    end
  end
  def self.count_path(file_path)
    File.foreach(file_path, encoding: "binary").count
  end
  private_class_method :count_path
end
Controller usage:
class ImportsController < ApplicationController
  def create
    @import = Import.new(import_params)
    if @import.save
      total_lines = FileLineCounter.count(@import.file)
      @import.update!(total_lines: total_lines)
      ImportJob.perform_later(@import.id)
      render json: {
        import_id: @import.id,
        total_lines: total_lines,
        status: "queued"
      }
    else
      render json: { errors: @import.errors.full_messages }, status: :unprocessable_entity
    end
  end
end
Background job usage:
class ImportJob < ApplicationJob
  def perform(import_id)
    import = Import.find(import_id)
    processed = 0
    import.file.open do |tempfile|
      File.foreach(tempfile.path, encoding: "binary") do |line|
        process_line(line)
        processed += 1
        if (processed % 1000).zero?
          import.update!(processed_lines: processed)
        end
      end
    end
  end
end
Two Rails-specific details matter:
- attachment.open downloads to a tempfile on disk, which is exactly what you want for line counting and external tooling
- Rails guides note that the file is available in after_create_commit, not after_create
That is the safe baseline for counting lines in ActiveStorage uploads.
Part 7: A Production-Ready Ruby Line Counter
require "open3"
require "shellwords"
module LineCounter
  SMALL_FILE_THRESHOLD = 50 * 1024 * 1024
  CHUNK_SIZE = 64 * 1024
  module_function
  def count(file_path, skip_empty: false)
    raise ArgumentError, "File not found: #{file_path}" unless File.file?(file_path)
    return 0 if File.zero?(file_path)
    if skip_empty
      return File.foreach(file_path, encoding: "binary")
                 .count { |line| !line.chomp.empty? }
    end
    file_size = File.size(file_path)
    if file_size < SMALL_FILE_THRESHOLD
      return File.readlines(file_path, encoding: "binary").count
    end
    if unix?
      begin
        return count_with_wc(file_path)
      rescue Errno::ENOENT
        # wc is not available; fall through to the pure Ruby path.
      end
    end
    count_with_chunks(file_path)
  end
  def count_non_empty(file_path)
    count(file_path, skip_empty: true)
  end
  def count_with_wc(file_path)
    stdout, status = Open3.capture2("wc", "-l", file_path)
    raise "wc failed" unless status.success?
    count = stdout.split.first.to_i
    File.open(file_path, "rb") do |file|
      file.seek(-1, IO::SEEK_END)
      last_byte = file.read(1)
      last_byte == "\n" ? count : count + 1
    end
  end
  def count_with_chunks(file_path)
    count = 0
    last_chunk = nil
    File.open(file_path, "rb") do |file|
      while (chunk = file.read(CHUNK_SIZE))
        count += chunk.count("\n")
        last_chunk = chunk
      end
    end
    if last_chunk && !last_chunk.empty? && last_chunk[-1] != "\n"
      count + 1
    else
      count
    end
  end
  def unix?
    /linux|darwin|bsd/ === RUBY_PLATFORM
  end
end
Usage:
total = LineCounter.count("data.csv")
non_empty = LineCounter.count_non_empty("config.txt")
This design keeps the user-facing method simple:
- tiny files: readlines for clarity
- skip-empty counts: line-aware foreach
- large files on Unix: wc -l fast path
- large files everywhere else: chunked pure Ruby fallback
Special Scenarios
Count lines in a string
Recommended logical line count:
count = text.lines.count
Fast raw newline count with correction:
def count_lines_in_string_fast(text)
  return 0 if text.empty?
  count = text.count("\n")
  text.end_with?("\n") ? count : count + 1
end
Avoid using text.split("\n").count as a general line counter. Ruby split drops trailing empty fields by default, so strings that end with blank lines can be undercounted.
Count STDIN
puts $stdin.binmode.each_line.count
Count lines in a Rake task
namespace :data do
  desc "Count lines in a file"
  task count_lines: :environment do
    file_path = ENV["FILE"] || "data/import.csv"
    unless File.file?(file_path)
      warn "File not found: #{file_path}"
      exit 1
    end
    puts LineCounter.count(file_path)
  end
end
Decision Tree
How large is the file?
|
+-- Under 50MB
| +-- File.readlines(path).count if simplicity matters most
|
+-- Size unknown or over 50MB
| +-- File.foreach(path, encoding: "binary").count
|
+-- Over 500MB and on Unix
| +-- chunked binary reads or fixed wc -l
Does the file contain unknown bytes?
|
+-- Yes: use encoding: "binary"
+-- No, and the external encoding is known: you can set it explicitly
Do you need Rails upload support?
|
+-- Yes: attachment.open, then count the tempfile path
Ruby Version Compatibility
| Feature | Version support | Notes |
|---|---|---|
| File.foreach | Ruby 1.8+ | streaming line iteration |
| File.readlines | Ruby 1.8+ | reads all lines into an array |
| encoding: "binary" | Ruby 1.9+ | binary or ASCII-8BIT semantics |
| Shellwords.escape | Ruby 1.9+ | safe shell escaping helper |
| Open3.capture2 | Ruby 1.9.3+ | no-shell direct process invocation |
| IO::SEEK_END | Ruby 1.8+ | used to inspect the last byte |
| ActiveStorage attachment.open | Rails 5.2+ | downloads to a tempfile |
The article targets Ruby 2.7+ projects, with Ruby 3.3 used for the benchmark environment description.
Production Checklist
- Use File.foreach, not File.readlines, for unknown or large files.
- Add encoding: "binary" when file bytes may not be valid UTF-8.
- Prefer count over inject(0) { |c, _| c + 1 } for line-counting code.
- If you call wc, escape the path with Shellwords.escape or skip the shell with Open3.capture2.
- Fix the missing trailing newline case when you rely on wc -l.
- Use block forms of File.open so file descriptors close automatically.
- In Rails, use attachment.open and count the tempfile path.
- Wait until after_create_commit if you need the uploaded file immediately after creation.
Sources Checked
- Ruby IO documentation for foreach, read, readlines, open options, binmode, and each_line: https://docs.ruby-lang.org/en/3.3/IO.html
- Ruby Open3 documentation for capture2 direct execution without shell expansion: https://docs.ruby-lang.org/en/3.3/Open3.html
- Ruby Shellwords documentation for shellescape: https://docs.ruby-lang.org/en/3.3/Shellwords.html
- Rails Active Storage guide for attachment.open and after_create_commit timing: https://guides.rubyonrails.org/active_storage_overview.html
- GNU wc documentation for newline-based line counts and missing trailing newline behavior: https://www.gnu.org/software/coreutils/wc
- Stack Overflow context on Ruby line counting patterns: https://stackoverflow.com/questions/2650517/count-the-number-of-lines-in-a-file-without-reading-entire-file-into-memory
Frequently Asked Questions
How do I count lines in a file in Ruby?
For a safe default, use File.foreach(path, encoding: 'binary').count. It streams the file and avoids text-decoding failures on bad bytes.
What is the difference between readlines and foreach in Ruby?
readlines returns an array of all lines, while foreach yields one line at a time. The first is simpler for small files; the second is the production default.
How do I fix invalid byte sequence errors in Ruby?
Use binary mode or an explicit external encoding. For line counting, File.foreach(path, encoding: 'binary').count is usually the safest answer.
How do I count lines in a large file in Ruby?
Use File.foreach(path, encoding: 'binary').count for the simplest streaming version, or read binary chunks and count '\n' bytes for higher throughput.
How do I use wc -l safely in Ruby?
Either escape the path with Shellwords.escape when using a shell string, or avoid the shell entirely with Open3.capture2('wc', '-l', path).
How do I count lines in Ruby on Rails?
For ActiveStorage attachments, use attachment.open to get a tempfile path and count lines from that file in binary mode.
How do I count non-empty lines in Ruby?
Use File.foreach(path, encoding: 'binary').count { |line| !line.chomp.empty? }.
How do I count lines in a string in Ruby?
Use text.lines.count for logical text lines, or count '\n' and add one when the non-empty string does not end with a newline.