Benchmark File Read and Write Speed in Python
Measures file write and read throughput in MB/s by writing and reading a temporary file of a given size.
Python code
import os
import time
import tempfile


def benchmark_write(file_path, size_mb=100):
    data = b'x' * (1024 * 1024)  # 1 MB block
    start = time.perf_counter()
    with open(file_path, 'wb') as f:
        for _ in range(size_mb):
            f.write(data)
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


def benchmark_read(file_path):
    file_size = os.path.getsize(file_path) / (1024 * 1024)  # MB
    start = time.perf_counter()
    with open(file_path, 'rb') as f:
        while f.read(1024 * 1024):
            pass
    elapsed = time.perf_counter() - start
    return file_size / elapsed


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp_path = tmp.name
    try:
        write_speed = benchmark_write(tmp_path, size_mb=50)
        read_speed = benchmark_read(tmp_path)
        print(f"Write speed: {write_speed:.2f} MB/s")
        print(f"Read speed: {read_speed:.2f} MB/s")
    finally:
        os.unlink(tmp_path)
Output
Write speed: 450.12 MB/s
Read speed: 520.34 MB/s
How it works
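The benchmark_write function writes a 1 MB block size_mb times and divides the megabytes written by the elapsed time. The benchmark_read function gets the file size from os.path.getsize, reads the file in 1 MB blocks until read returns an empty bytes object, and divides the size by the elapsed time. Both functions time the work with time.perf_counter, a high-resolution monotonic clock. The temporary file is created with delete=False so it can be reopened by path, and it is removed in the finally block even if a benchmark raises. The results depend on disk type, filesystem cache, and system load: without an fsync, the write figure largely reflects how fast the OS accepts data into its cache, and a read that directly follows the write is usually served from that cache rather than from the disk.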
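Because a single run is sensitive to cache state and system load, one way to get a steadier figure is to repeat the measurement and report the median alongside the spread. The following is a minimal sketch, not part of the original snippet: measure_throughput, operation, and runs are hypothetical names, and the in-memory copy in the demo is only a stand-in workload so the example runs on its own.

```python
import statistics
import time


def measure_throughput(operation, size_mb, runs=5):
    """Run `operation` several times and return (median, min, max) in MB/s.

    `operation` is any zero-argument callable that transfers `size_mb`
    megabytes. These names are illustrative assumptions.
    """
    speeds = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        elapsed = time.perf_counter() - start
        speeds.append(size_mb / elapsed)
    return statistics.median(speeds), min(speeds), max(speeds)


if __name__ == "__main__":
    # Stand-in workload: copy 8 MB in memory instead of touching the disk.
    payload = bytes(8 * 1024 * 1024)
    median, low, high = measure_throughput(lambda: bytearray(payload), size_mb=8)
    print(f"median {median:.0f} MB/s (min {low:.0f}, max {high:.0f})")
```

To apply it to the benchmark, pass a callable that performs one write or one read of a known size, for example a lambda wrapping a single file read.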
Common mistakes
- Reading the file right after writing it without clearing the filesystem cache, so the read is served from memory and the result is inflated.
- Using a block size that is too small, increasing overhead and reducing measured throughput.
- Forgetting to delete the temporary file, leaving artifacts behind.
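For the caching mistake above, one possible mitigation on Linux and some other Unix systems is to ask the kernel to evict the file's cached pages before the read benchmark. This is a hedged sketch: drop_file_cache is a hypothetical helper, os.posix_fadvise is not available on every platform (the function returns False when it is missing), and the advice is only a hint that the kernel may not fully honor.

```python
import os
import tempfile


def drop_file_cache(path):
    """Ask the OS to evict cached pages of `path`; return True if requested.

    Relies on os.posix_fadvise, which exists only on some Unix platforms.
    The call is advisory, so some pages may stay cached anyway.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # platform without posix_fadvise: nothing is done
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be evicted, so flush them first
        # offset=0 and length=0 together mean "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (1024 * 1024))
        tmp_path = tmp.name
    try:
        print("Eviction requested:", drop_file_cache(tmp_path))
    finally:
        os.unlink(tmp_path)
```

Calling the helper between the write and read benchmarks should bring the read figure closer to what the storage device delivers, although it does not guarantee a fully cold cache.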
Variations
- Call `f.flush()` and then `os.fsync(f.fileno())` before stopping the timer, so the write benchmark includes the time for data to reach the disk rather than only the OS cache.
- Benchmark with different block sizes to find optimal transfer size for your storage.
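Both variations can be combined in one function. The sketch below is an illustration rather than the original snippet's code: benchmark_write_synced and its block_size parameter are assumed names, and the block sizes in the demo loop are arbitrary choices.

```python
import os
import tempfile
import time


def benchmark_write_synced(file_path, size_mb=16, block_size=1024 * 1024):
    """Write size_mb megabytes in block_size chunks; return MB/s including fsync."""
    total_bytes = size_mb * 1024 * 1024
    block = b"x" * block_size
    written = 0
    start = time.perf_counter()
    with open(file_path, "wb") as f:
        while written < total_bytes:
            # Trim the final chunk so exactly total_bytes are written.
            chunk = block[: total_bytes - written]
            f.write(chunk)
            written += len(chunk)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp_path = tmp.name
    try:
        # Compare a few block sizes; the values are illustrative.
        for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
            speed = benchmark_write_synced(tmp_path, block_size=block_size)
            print(f"{block_size // 1024:>5} KB blocks: {speed:.2f} MB/s")
    finally:
        os.unlink(tmp_path)
```

Keeping the flush and fsync inside the timed region is the point of the design: the figure then covers the full path to the device, so it is typically lower than the unsynced number.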
Real-world use cases
- Compare disk performance across different storage backends like SSD vs HDD in a server deployment.
- Validate that a cloud instance's attached EBS or persistent disk meets advertised I/O throughput.
- Tune block sizes in data pipeline scripts that write large files to minimize runtime.