Split CSV Files into Smaller Chunks in Python

Splits a large CSV file into multiple smaller chunk files, preserving the header row in each chunk.

Easy · Python 3.9+ · Jun 28, 2026 · Files & data

Python code

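import csv
import os

def split_csv(input_file, chunk_size=1000, output_prefix="chunk"):
    """Split a CSV file into chunks of at most chunk_size data rows.

    The header row is repeated at the top of every chunk.
    Returns the list of chunk file paths that were written.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    output_files = []
    with open(input_file, 'r', newline='') as infile:
        reader = csv.reader(infile)
        header = next(reader, None)  # None when the input file is empty
        if header is None:
            return output_files

        outfile = None
        writer = None
        try:
            for row_count, row in enumerate(reader):
                # Start a new chunk at rows 0, chunk_size, 2 * chunk_size, ...
                if row_count % chunk_size == 0:
                    if outfile is not None:
                        outfile.close()  # flush and close the previous chunk
                    output_file = f"{output_prefix}_{len(output_files) + 1}.csv"
                    outfile = open(output_file, 'w', newline='')
                    writer = csv.writer(outfile)
                    writer.writerow(header)  # repeat the header in every chunk
                    output_files.append(output_file)
                writer.writerow(row)
        finally:
            # Close the last chunk even if reading or writing fails partway
            if outfile is not None:
                outfile.close()

    print(f"Split '{input_file}' into {len(output_files)} chunks.")
    return output_files

if __name__ == "__main__":
    # Create a sample large CSV for demonstration
    sample_file = "large_data.csv"
    with open(sample_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "value"])
        for i in range(2500):
            writer.writerow([i, f"item_{i}", i * 1.5])

    # Split into chunks of 1000 rows each (1000 + 1000 + 500)
    chunk_files = split_csv(sample_file, chunk_size=1000, output_prefix="split_chunk")

    # Clean up using the paths split_csv returned instead of hard-coding them
    os.remove(sample_file)
    for path in chunk_files:
        os.remove(path)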

Output

stdout
Split 'large_data.csv' into 3 chunks.

How it works

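The script reads the header row once from the original CSV and writes it at the top of every chunk file. It then streams the remaining rows one at a time, and whenever the running row count is a multiple of `chunk_size` (default 1000), it closes the current output file and opens a new one, so memory use stays roughly constant no matter how large the input is. Chunk files are named with an incrementing suffix (`chunk_1.csv`, `chunk_2.csv`, and so on; the demo uses the prefix `split_chunk`), so the parts are easy to identify and reassemble in order. Because `csv.reader` parses whole records rather than physical lines, quoted fields that contain commas or line breaks stay intact, which a naive line-count split would get wrong. In the demo, 2,500 data rows with `chunk_size=1000` produce two 1,000-row chunks and one 500-row chunk.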
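To see the chunking rule and the quoted-field handling without touching the filesystem, here is a minimal in-memory sketch. `split_csv_text` is a hypothetical helper written for this illustration; it mirrors the modulo loop in `split_csv` but collects each chunk as a string. The sample has five physical lines but only three data records, because the first record contains a quoted line break.

```python
import csv
import io

def split_csv_text(text, chunk_size):
    """Hypothetical in-memory version of the chunking loop: one CSV string per chunk."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    chunks = []
    buf = writer = None
    for i, row in enumerate(reader):
        if i % chunk_size == 0:          # same modulo test as split_csv
            if buf is not None:
                chunks.append(buf.getvalue())
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(header)      # header repeated at the top of every chunk
        writer.writerow(row)
    if buf is not None:
        chunks.append(buf.getvalue())
    return chunks


# Record 1 spans two physical lines; record 3 contains a comma inside quotes
sample = 'id,note\n1,"first line\nsecond line"\n2,plain\n3,"has, comma"\n'
chunks = split_csv_text(sample, chunk_size=2)
parsed = [list(csv.reader(io.StringIO(c, newline=""))) for c in chunks]
```

Here `parsed[0]` holds the header plus records 1 and 2, and `parsed[1]` holds the header plus record 3; splitting the raw text every two lines instead would have cut record 1 in half.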

Common mistakes

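  • Forgetting to write the header in every chunk file, so every chunk after the first has no column names.
  • Not closing the previous output file before opening a new one, which leaks file handles and can leave buffered rows unwritten, producing truncated chunks.
  • Assuming every CSV file has a header row. With a headerless file the script treats the first data row as the header and repeats it at the top of every chunk, without raising an error.
  • Opening files without `newline=''`. The csv module writes its own `\r\n` line endings, and text mode on Windows translates the `\n` again, which shows up as blank lines between rows.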
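One way to catch the first two mistakes is to read the chunks back and compare them with the source. This is a sketch with a hypothetical `verify_chunks` helper, not part of the original script: it streams the original file and the chunks side by side, requires each chunk to start with the header, and raises `ValueError` at the first mismatch.

```python
import csv
import itertools
import os
import tempfile

def _chunk_rows(chunk_files, header):
    """Yield the data rows of each chunk, requiring the header as its first row."""
    for path in chunk_files:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            if next(reader, None) != header:
                raise ValueError(f"{path}: missing or wrong header row")
            yield from reader

def verify_chunks(input_file, chunk_files):
    """Hypothetical check that chunks carry the header and every data row, in order.

    Streams both sides, so memory use does not grow with file size.
    Returns the number of data rows verified; raises ValueError on a mismatch.
    """
    with open(input_file, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        count = 0
        for original, copied in itertools.zip_longest(reader, _chunk_rows(chunk_files, header)):
            if original != copied:
                raise ValueError(f"row {count + 1} differs: {original!r} vs {copied!r}")
            count += 1
    return count


def _write(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Usage: one correct set of chunks and one where the second chunk lost its header
with tempfile.TemporaryDirectory() as tmp:
    header = ["id", "name"]
    rows = [["1", "a"], ["2", "b"], ["3", "c"]]
    src = os.path.join(tmp, "data.csv")
    good = [os.path.join(tmp, "good_1.csv"), os.path.join(tmp, "good_2.csv")]
    bad = os.path.join(tmp, "bad_2.csv")
    _write(src, [header] + rows)
    _write(good[0], [header] + rows[:2])
    _write(good[1], [header] + rows[2:])
    _write(bad, rows[2:])  # second chunk written without its header
    verified = verify_chunks(src, good)
    try:
        verify_chunks(src, [good[0], bad])
        bad_detected = False
    except ValueError:
        bad_detected = True
```

`itertools.zip_longest` pads the shorter side with `None`, so missing or extra rows in the chunks also surface as a mismatch rather than being silently ignored.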

Variations

  1. Use `pandas.read_csv` with the `chunksize` parameter, which yields one DataFrame per chunk, and write each chunk with `DataFrame.to_csv`. This also keeps memory use bounded for very large files.
  2. For a CSV with no header row, skip the `next(reader)` call and the per-chunk header write, so every row, including the first, is treated as data.
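The header-free variation can be sketched by making the header handling optional. `split_csv_rows` and its `has_header` flag are hypothetical names chosen for this example; apart from that, it uses the same row-count loop as `split_csv`.

```python
import csv
import os
import tempfile

def split_csv_rows(input_file, chunk_size=1000, output_prefix="chunk", has_header=True):
    """Row-count split like split_csv, with a hypothetical has_header flag.

    With has_header=False every row, including the first, is treated as data
    and no header row is written to the chunks. Returns the chunk paths.
    """
    output_files = []
    with open(input_file, "r", newline="") as infile:
        reader = csv.reader(infile)
        # Only consume a header row when the file is declared to have one
        header = next(reader, None) if has_header else None
        outfile = None
        writer = None
        try:
            for row_count, row in enumerate(reader):
                if row_count % chunk_size == 0:
                    if outfile is not None:
                        outfile.close()
                    path = f"{output_prefix}_{len(output_files) + 1}.csv"
                    outfile = open(path, "w", newline="")
                    writer = csv.writer(outfile)
                    if header is not None:
                        writer.writerow(header)
                    output_files.append(path)
                writer.writerow(row)
        finally:
            if outfile is not None:
                outfile.close()
    return output_files


# Usage: split a five-row headerless file into chunks of two rows
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([[i, i * 2] for i in range(5)])
    parts = split_csv_rows(src, chunk_size=2,
                           output_prefix=os.path.join(tmp, "part"),
                           has_header=False)
    # Read the chunks back before the temporary directory is deleted
    contents = []
    for p in parts:
        with open(p, newline="") as f:
            contents.append(list(csv.reader(f)))
```

The five rows land in three chunks of two, two, and one row, and none of them gains a spurious header.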

Real-world use cases

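  • Breaking a multi-gigabyte log export into chunks that fit a cloud storage per-file size limit such as 10 MB. The script splits by row count, not bytes, so pick a `chunk_size` that keeps each chunk under the limit with some headroom for longer rows.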
  • Distributing a customer database across parallel batch processing jobs where each job handles one chunk.
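  • Splitting a monthly sales report into daily partitions so analysts can load one day at a time. Because this groups rows by date rather than by a fixed count, it calls for a key-based split on the date column instead of `chunk_size`.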
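A key-based split for the daily-partition case might look like the following sketch. `split_csv_by_column` is a hypothetical helper, not part of the original script; it assumes the partition column's values are safe to use in file names (ISO dates are) and that there are few enough distinct values to keep one output file open per value.

```python
import csv
import os
import tempfile

def split_csv_by_column(input_file, column, output_prefix="part"):
    """Hypothetical key-based variant: one output file per distinct value in `column`.

    Assumes the values are file-name safe and few enough to keep one file
    open per value. Returns the sorted list of distinct values found.
    """
    files = {}    # value -> open output file
    writers = {}  # value -> csv.writer for that file
    try:
        with open(input_file, "r", newline="") as infile:
            reader = csv.reader(infile)
            header = next(reader, None)
            if header is None:
                return []
            key_index = header.index(column)  # raises ValueError if the column is missing
            for row in reader:
                key = row[key_index]
                if key not in writers:
                    # First row for this value: open its file and write the header
                    files[key] = open(f"{output_prefix}_{key}.csv", "w", newline="")
                    writers[key] = csv.writer(files[key])
                    writers[key].writerow(header)
                writers[key].writerow(row)
    finally:
        # Close every partition file, even if an error interrupted the loop
        for f in files.values():
            f.close()
    return sorted(files)


# Usage: partition a tiny sales file by its "date" column
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sales.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([
            ["date", "sku", "amount"],
            ["2026-06-01", "A", "10"],
            ["2026-06-02", "B", "5"],
            ["2026-06-01", "C", "7"],
        ])
    days = split_csv_by_column(src, "date", output_prefix=os.path.join(tmp, "sales"))
    with open(os.path.join(tmp, "sales_2026-06-01.csv"), newline="") as f:
        first_day = list(csv.reader(f))
```

Unlike the row-count split, the input does not need to be sorted by date: rows for the same day are routed to the same file wherever they appear.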
