Benchmarks

First, generate the data the benchmarks need: python src/tabmat/benchmark/generate_matrices.py. Then run all the benchmarks: python src/tabmat/benchmark/main.py. To produce or update the performance figures, open src/tabmat/benchmark/visualize_benchmarks.py as a notebook via jupytext.

For the options the benchmark CLI accepts, run: python src/tabmat/benchmark/main.py --help.

Performance

Dense matrix, 4M x 10.

One-hot encoded categorical variable, 1M x 100k.

Sparse matrix, 1M x 1k.

Two categorical matrices, 1M x 2k.

Dense matrix plus two categorical matrices, 3M x (dense=5, cat1=10, cat2=1000).
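
To give a feel for the scale of these shapes, the following is a minimal sketch of the memory arithmetic behind them. It is not taken from the benchmark code: it assumes 8-byte float64 values for dense storage and one 4-byte integer code per row for each categorical variable, and the helper names (dense_bytes, categorical_bytes, human) are hypothetical. The actual storage used by the library may differ. The sparse case is left out because its density is not stated here.

```python
# Rough memory arithmetic for the benchmark shapes listed above.
# Assumptions (not from the benchmark code): float64 values (8 bytes each)
# and one int32 category code (4 bytes) per row per categorical variable.

FLOAT_BYTES = 8
CODE_BYTES = 4


def dense_bytes(n_rows, n_cols):
    """Bytes needed to store an n_rows x n_cols array of float64 values."""
    return n_rows * n_cols * FLOAT_BYTES


def categorical_bytes(n_rows, n_variables=1):
    """Bytes needed to store one integer code per row for each variable."""
    return n_rows * n_variables * CODE_BYTES


def human(n_bytes):
    """Format a byte count using decimal units (1 GB = 1e9 bytes)."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


# Each entry: (label, fully dense size, compact size under the assumptions).
cases = [
    ("Dense, 4M x 10",
     dense_bytes(4_000_000, 10),
     dense_bytes(4_000_000, 10)),
    ("One-hot categorical, 1M x 100k",
     dense_bytes(1_000_000, 100_000),
     categorical_bytes(1_000_000)),
    ("Two categoricals, 1M x 2k",
     dense_bytes(1_000_000, 2_000),
     categorical_bytes(1_000_000, n_variables=2)),
    ("Dense plus two categoricals, 3M x (5 + 10 + 1000)",
     dense_bytes(3_000_000, 5 + 10 + 1000),
     dense_bytes(3_000_000, 5) + categorical_bytes(3_000_000, n_variables=2)),
]

for label, as_dense, compact in cases:
    print(f"{label}: dense {human(as_dense)}, compact {human(compact)}")
```

Under these assumptions, the one-hot encoded 1M x 100k matrix would take about 800 GB if materialized densely but only about 4 MB as one code per row, which is why the categorical cases are benchmarked separately from the dense one.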