IO Bench

IO Bench is a library for benchmarking the performance of different file formats and partitioning schemes for large in-memory datasets. It lets users generate sample data, convert source data to various formats, and run benchmarks that measure how each format performs.

Features

  • Generate sample data for benchmarking.
  • Convert CSV data to various partitioned formats (Avro, Parquet, Feather).
  • Benchmark read performance of different file formats using Polars, PyArrow, and fastparquet.
  • Generate comprehensive reports of benchmark results.
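
The workflow these features describe (generate sample data, read it back, time the read) can be sketched with the Python standard library alone. This is a minimal illustration and not IO Bench's actual API: the function names `generate_sample_csv`, `read_csv_rows`, and `benchmark`, the three-column schema, and the choice of summary statistics are all assumptions made for the example.

```python
import csv
import random
import statistics
import tempfile
import time
from pathlib import Path


def generate_sample_csv(path, rows=1_000, seed=0):
    """Write a small synthetic CSV (hypothetical schema: id, category, value)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "category", "value"])
        for i in range(rows):
            writer.writerow([i, rng.choice("abc"), rng.random()])


def read_csv_rows(path):
    """Reader under test: load every data row into memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return list(reader)


def benchmark(reader, path, repeats=5):
    """Call reader(path) `repeats` times and summarize wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "runs": repeats,
    }


# Generate data in a temporary directory, then time the reader against it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    generate_sample_csv(sample, rows=1_000)
    row_count = len(read_csv_rows(sample))
    result = benchmark(read_csv_rows, sample)
    print(row_count, result)
```

Any callable that takes a path can be passed as `reader`, so a Parquet or Feather reader from Polars or PyArrow could be timed with the same harness and compared on the same summary.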