1. Masked Loads: Efficient data processing by selectively loading data.
2. Half-Precision Floating Point Math: Accelerated computations with reduced memory footprint.
These features proved invaluable for the latest SimSIMD release. The software now processes vector similarities up to 300x faster using NEON, SVE, AVX2, and AVX-512 extensions across Inner Product, Euclidean, Angular, Hamming, and Jaccard distances. It outstrips commonly used libraries like NumPy and SciPy, famously built on BLAS and LAPACK.
Check out the post for some cool tricks, clarifications on AVX-512 and SVE advantages, and benchmark numbers :)
Here is the repo: https://github.com/ashvardanian/simsimd
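For readers who want a reference point, the five distance functions named above can be sketched in plain NumPy. This is only a baseline for comparison, not SimSIMD's implementation; the function names are hypothetical, and treating "Angular" as cosine distance (1 minus cosine similarity) is an assumption.

```python
import numpy as np

def inner_product(a, b):
    # Plain dot product of two float vectors.
    return float(np.dot(a, b))

def euclidean(a, b):
    # L2 norm of the difference.
    return float(np.linalg.norm(a - b))

def angular(a, b):
    # Assumption: "Angular" means cosine distance, 1 - cos(a, b).
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming(a, b):
    # Number of positions where two boolean vectors differ.
    return int(np.count_nonzero(a != b))

def jaccard(a, b):
    # 1 - |intersection| / |union| for two boolean vectors.
    union = np.count_nonzero(a | b)
    if union == 0:
        return 0.0
    return 1.0 - np.count_nonzero(a & b) / union

if __name__ == "__main__":
    x = np.array([1.0, 0.0], dtype=np.float32)
    y = np.array([0.0, 1.0], dtype=np.float32)
    p = np.array([True, True, False, False])
    q = np.array([True, False, True, False])
    print(inner_product(x, y), euclidean(x, y), angular(x, y))
    print(hamming(p, q), jaccard(p, q))
```

A SIMD library would compute the same quantities, just with hand-written kernels per instruction set instead of generic NumPy calls.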
FWIW, sympy.physics.vector so far supports only X, Y, Z vectors and doesn't mention e.g. "jaccard": https://docs.sympy.org/latest/modules/physics/vector/vectors... https://docs.sympy.org/latest/modules/physics/vector/fields.... #cfd
include/simsimd/simsimd.h: https://github.com/ashvardanian/SimSIMD/blob/main/include/si...
conda-forge maintainer docs > Switching BLAS implementation: https://conda-forge.org/docs/maintainer/knowledge_base.html#... :
conda install "libblas=*=*mkl"
conda install "libblas=*=*openblas"
conda install "libblas=*=*blis"
conda install "libblas=*=*accelerate"
conda install "libblas=*=*netlib"
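After switching, one way to check which BLAS NumPy was built against is to capture the output of `numpy.show_config()`. A minimal sketch; the helper name `numpy_build_config` is hypothetical, and the exact text printed depends on the NumPy version and build.

```python
import contextlib
import io

import numpy as np

def numpy_build_config() -> str:
    # np.show_config() prints to stdout, so capture it as a string.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()

if __name__ == "__main__":
    # Look for names such as mkl, openblas, blis, or accelerate in the output.
    print(numpy_build_config())
```

Note that this reports build-time configuration; with conda-forge's switchable libblas, the library actually loaded at runtime may differ from what the build info lists.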
numpy-feedstock: https://github.com/conda-forge/numpy-feedstock/blob/main/rec...
scipy-feedstock: https://github.com/conda-forge/scipy-feedstock/blob/main/rec...
pysimdjson-feedstock: https://github.com/conda-forge/pysimdjson-feedstock/blob/mai...
simdjson-feedstock: https://github.com/conda-forge/simdjson-feedstock/blob/main/...
mkl_random-feedstock: https://github.com/conda-forge/mkl_random-feedstock https://github.com/google/paranoid_crypto/tree/main/paranoid... :
> NumPy-based implementation of random number generation sampling using Intel (R) Math Kernel Library, mirroring numpy.random, but exposing all choices of sampling algorithms available in MKL
blas: https://github.com/conda-forge/blas-feedstock/blob/main/reci...
xtensor-blas-feedstock: https://github.com/conda-forge/xtensor-blas-feedstock
xtensor-fftw (FFT with xtensor (C++)) could probably be AVX-512- and SVE-optimized as well? https://github.com/xtensor-stack/xtensor-fftw
ggml_cpu_has_avx512() https://github.com/search?q=repo%3Aggerganov%2Fggml%20AVX&ty... https://github.com/search?q=repo%3Aggerganov%2Fllama.cpp%20a...
CuPy would also be an impactful place to merge and defend these optimizations; though no GPUs have AVX-512 or SVE? cupyx.scipy.spatial.distance: https://docs.cupy.dev/en/stable/reference/scipy_spatial_dist... https://docs.cupy.dev/en/stable/reference/comparison.html
From "PostgresML is 8-40x faster than Python HTTP microservices" (2023) >>33270638 :
> Apache Ballista and Polars do Apache Arrow and SIMD.
> The Polars homepage links to the "Database-like ops benchmark" of {Polars, data.table, DataFrames.jl, ClickHouse, cuDF, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin} but not yet PostgresML? https://h2oai.github.io/db-benchmark/
LLM -> Vector database: https://en.wikipedia.org/wiki/Vector_database
/? inurl:awesome site:github.com "vector database" https://www.google.com/search?q=inurl%253Aawesome+site%253Ag... : https://github.com/dangkhoasdc/awesome-vector-database , https://github.com/mileszim/awesome-vector-database , https://github.com/currentslab/awesome-vector-search
/? "vector database" "duckdb" https://www.google.com/search?q=+%22vector+database%22+%22du... ... pgvector
pgvector/pgvector/src/vector.c: vector_spherical_distance https://github.com/pgvector/pgvector/blob/master/src/vector....
postgresml/postgresml: /? distance https://github.com/search?q=repo%3Apostgresml%2Fpostgresml%2...
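For context on the `vector_spherical_distance` link above, here is a rough sketch of what a spherical distance usually means: the angle between two vectors, scaled into [0, 1]. This is an assumption about the definition (arccos of cosine similarity divided by pi), not a copy of pgvector's code, and the function name below is only illustrative.

```python
import math

def spherical_distance(a, b):
    # Cosine similarity from the dot product and the two norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / norm))
    # Angle in radians divided by pi: 0 for same direction, 1 for opposite.
    return math.acos(cos) / math.pi

if __name__ == "__main__":
    print(spherical_distance([1.0, 0.0], [0.0, 1.0]))
```

The sketch normalizes its inputs; an implementation that assumes unit vectors could skip the norm computation and work from the inner product alone, which is where a SIMD dot-product kernel would plug in.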