Split CSV Files into Smaller Chunks in Python

Splits a large CSV file into multiple smaller chunk files, preserving the header row in each chunk.

Easy Python 3.9+ Jun 28, 2026 Files & data

csv file-splitting batch-processing stdlib

Python code

import csv
import os

def split_csv(input_file, chunk_size=1000, output_prefix="chunk"):
    """Split a large CSV file into smaller chunks, repeating the header in each."""
    file_count = 0
    outfile = None
    writer = None
    try:
        with open(input_file, 'r', newline='') as infile:
            reader = csv.reader(infile)
            # Read the header once; an empty input file yields no chunks.
            header = next(reader, None)
            if header is not None:
                for row_count, row in enumerate(reader):
                    # Start a new chunk file every chunk_size data rows.
                    if row_count % chunk_size == 0:
                        if outfile:
                            outfile.close()
                        file_count += 1
                        output_file = f"{output_prefix}_{file_count}.csv"
                        outfile = open(output_file, 'w', newline='')
                        writer = csv.writer(outfile)
                        writer.writerow(header)
                    writer.writerow(row)
    finally:
        # Close the last chunk even if reading or writing raised an error.
        if outfile:
            outfile.close()

    print(f"Split '{input_file}' into {file_count} chunks.")

if __name__ == "__main__":
    # Create a sample large CSV for demonstration
    sample_file = "large_data.csv"
    with open(sample_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "value"])
        for i in range(2500):
            writer.writerow([i, f"item_{i}", i * 1.5])
    
    # Split into chunks of 1000 rows each
    split_csv(sample_file, chunk_size=1000, output_prefix="split_chunk")
    
    # Cleanup sample files
    os.remove(sample_file)
    for i in range(1, 4):
        os.remove(f"split_chunk_{i}.csv")

Output

stdout

Split 'large_data.csv' into 3 chunks.

How it works

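The script reads the header once from the original CSV, then writes it as the first row of every chunk file. It counts data rows and opens a new output file whenever `row_count % chunk_size == 0`, so each chunk holds up to `chunk_size` data rows (default 1000) and the last chunk holds the remainder; the 2,500-row sample produces chunks of 1,000, 1,000, and 500 rows. Each chunk file is named with an incrementing suffix (e.g., chunk_1.csv) so the parts are easy to identify and keep in order. Because rows pass through the csv module rather than being split on raw line breaks, quoted fields that contain commas or embedded newlines stay intact.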
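To make the rollover arithmetic concrete, here is a minimal sketch that applies the same modulo test without touching any files. `chunk_plan` is a hypothetical helper written for illustration, not part of the script above; it only reports which file each block of rows would land in.

```python
def chunk_plan(total_rows, chunk_size=1000, output_prefix="chunk"):
    """Return (file name, first row index, row count) for each chunk.

    Hypothetical helper: it mirrors the rollover test used by split_csv
    but writes nothing to disk.
    """
    plan = []
    file_count = 1
    for row_count in range(total_rows):
        if row_count % chunk_size == 0:  # a new chunk starts here
            plan.append([f"{output_prefix}_{file_count}.csv", row_count, 0])
            file_count += 1
        plan[-1][2] += 1  # count this row toward the current chunk
    return [tuple(entry) for entry in plan]


for entry in chunk_plan(2500, 1000, "split_chunk"):
    print(entry)
# ('split_chunk_1.csv', 0, 1000)
# ('split_chunk_2.csv', 1000, 1000)
# ('split_chunk_3.csv', 2000, 500)
```

The last entry shows why the final chunk is usually smaller: it holds whatever rows remain after the full chunks.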

Common mistakes

Forgetting to write the header in every chunk file, causing data to lose column names.
Not closing the previous output file before opening a new one, which leaks file handles and can leave buffered rows unwritten.
Assuming every CSV file has a header row; given a headerless file, the script treats the first data row as the header and repeats it in every chunk.
Using `'w'` mode without `newline=''`, which can add extra blank lines on Windows.
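The `newline=''` point can be checked on any platform. The sketch below is an illustration under one assumption: passing `newline='\r\n'` to `open` is used here to imitate what default text mode does on Windows, where every `\n` written is translated to `\r\n`. `written_bytes` is a hypothetical helper name.

```python
import csv
import os
import tempfile


def written_bytes(newline):
    """Write one CSV row with the given newline setting and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerow(["id", "name"])
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# csv.writer already ends each row with \r\n, so no translation is wanted.
print(written_bytes(""))      # b'id,name\r\n'
# With translation on, the \n inside \r\n is expanded again.
print(written_bytes("\r\n"))  # b'id,name\r\r\n'
```

The doubled `\r` in the second result is what many readers display as a blank line between rows.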

Variations

Use `pandas.read_csv` with the `chunksize` parameter and write each chunk with `to_csv` for memory-efficient splitting of huge files.
If the original CSV has no header, drop the header handling and split every row as data.
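The headerless variation could look roughly like the sketch below. It is one possible shape, not the page's own code: `split_csv_no_header` is a hypothetical name, and returning the list of written paths is an added convenience so callers do not have to guess the chunk count.

```python
import csv


def split_csv_no_header(input_file, chunk_size=1000, output_prefix="chunk"):
    """Split a headerless CSV: every line is data and no header is copied."""
    paths = []
    outfile = None
    writer = None
    try:
        with open(input_file, "r", newline="") as infile:
            for row_count, row in enumerate(csv.reader(infile)):
                # Roll over to a new chunk file every chunk_size rows.
                if row_count % chunk_size == 0:
                    if outfile:
                        outfile.close()
                    path = f"{output_prefix}_{len(paths) + 1}.csv"
                    outfile = open(path, "w", newline="")
                    writer = csv.writer(outfile)
                    paths.append(path)
                writer.writerow(row)
    finally:
        # Close the last chunk even if an error interrupted the loop.
        if outfile:
            outfile.close()
    return paths
```

Calling `split_csv_no_header("raw.csv", chunk_size=10, output_prefix="part")` on a 25-row file would write `part_1.csv` through `part_3.csv`; the file names here are placeholders.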

Real-world use cases

Breaking a multi-gigabyte log export into 10 MB chunks for uploading to cloud storage with file size limits.
Distributing a customer database across parallel batch processing jobs where each job handles one chunk.
Splitting a monthly sales report into daily partitions so analysts can load one day at a time.
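The upload use case calls for a size limit rather than a row count, which the script above does not handle. The following is a hedged sketch of one way to do it: `split_csv_by_size`, its `max_bytes` parameter, and the UTF-8 default are all assumptions made for illustration. It measures each serialized row in bytes and starts a new chunk before the limit would be exceeded.

```python
import csv
import io


def split_csv_by_size(input_file, max_bytes, output_prefix="chunk", encoding="utf-8"):
    """Split a CSV into chunks of at most max_bytes where possible.

    Every chunk holds the header plus at least one row, so a single row
    larger than the limit can still produce an oversized chunk.
    """
    def encode_row(row):
        # Serialize one row the way csv.writer would, then measure it in bytes.
        buffer = io.StringIO(newline="")
        csv.writer(buffer).writerow(row)
        return buffer.getvalue().encode(encoding)

    paths = []
    outfile = None
    written = 0
    try:
        with open(input_file, "r", newline="", encoding=encoding) as infile:
            reader = csv.reader(infile)
            header = next(reader, None)
            if header is None:
                return paths  # empty input, nothing to split
            header_bytes = encode_row(header)
            for row in reader:
                row_bytes = encode_row(row)
                # Start a new chunk if this row would push the file past the limit.
                if outfile is None or written + len(row_bytes) > max_bytes:
                    if outfile:
                        outfile.close()
                    path = f"{output_prefix}_{len(paths) + 1}.csv"
                    outfile = open(path, "wb")
                    outfile.write(header_bytes)
                    written = len(header_bytes)
                    paths.append(path)
                outfile.write(row_bytes)
                written += len(row_bytes)
    finally:
        if outfile:
            outfile.close()
    return paths
```

For a 10 MB limit, `max_bytes` would be `10 * 1024 * 1024`. Output files are written in binary mode so the measured byte counts match what lands on disk.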
