--- categories: - docs - develop - stack - rs - rc - oss - kubernetes - clients description: Learn how Redis vector sets behave under load and how to optimize for speed and recall linkTitle: Performance title: Performance weight: 15 --- ## Query performance Vector similarity queries using the [`VSIM`]({{< relref "/commands/vsim" >}}) are threaded by default. Redis uses up to 32 threads to process these queries in parallel. - `VSIM` performance scales nearly linearly with available CPU cores. - Expect ~50,000 similarity queries per second for a 3M-item set with 300-dim vectors using int8 quantization. - Performance depends heavily on the `EF` parameter: - Higher `EF` improves recall, but slows down search. - Lower `EF` returns faster results with reduced accuracy. ## Insertion performance Inserting vectors with the [`VADD`]({{< relref "/commands/vadd" >}}) command is more computationally expensive than querying: - Insertion is single-threaded by default. - Use the `CAS` option to offload candidate graph search to a background thread. - Expect a few thousand insertions per second on a single node. ## Quantization effects Quantization greatly impacts both speed and memory: - `Q8` (default): 4x smaller than `FP32`, high recall, high speed - `BIN` (binary): 32x smaller than `FP32`, lower recall, fastest search - `NOQUANT` (`FP32`): Full precision, slower performance, highest memory use Use the quantization mode that best fits your tradeoff between precision and efficiency. The examples below show how the different modes affect a simple vector. Note that even with `NOQUANT` mode, the values change slightly, due to floating point rounding. {{< clients-example set="vecset_tutorial" step="add_quant" description="Quantization modes: Compare Q8, NOQUANT, and BIN quantization modes using VADD and VEMB to understand the tradeoffs between precision, memory, and performance" difficulty="advanced" >}} > VADD quantSetQ8 VALUES 2 1.262185 1.958231 quantElement Q8 (integer) 1 > VEMB quantSetQ8 quantElement 1) "1.2643694877624512" 2) "1.958230972290039" > VADD quantSetNoQ VALUES 2 1.262185 1.958231 quantElement NOQUANT (integer) 1 > VEMB quantSetNoQ quantElement 1) "1.262184977531433" 2) "1.958230972290039" > VADD quantSetBin VALUES 2 1.262185 1.958231 quantElement BIN (integer) 1 > VEMB quantSetBin quantElement 1) "1" 2) "1" {{< /clients-example >}} ## Deletion performance Deleting large vector sets using the [`DEL`]({{< relref "/commands/del" >}}) can cause latency spikes: - Redis must unlink and restructure many graph nodes. - Latency is most noticeable when deleting millions of elements. ## Save and load performance Vector sets save and load the full HNSW graph structure: - When reloading from disk is fast and there's no need to rebuild the graph. Example: A 3M vector set with 300 components loads in ~15 seconds. ## Summary of tuning tips | Factor | Effect on performance | Tip | |------------|-------------------------------------|------------------------------------------------| | `EF` | Slower queries but higher recall | Start low (for example, 200) and tune upward | | `M` | More memory per node, better recall | Use defaults unless recall is too low | | Quant type | Binary is fastest, `FP32` is slowest| Use `Q8` or `BIN` unless full precision needed | | `CAS` | Faster insertions with threading | Use when high write throughput is needed | ## See also - [Memory usage]({{< relref "/develop/data-types/vector-sets/memory" >}}) - [Scalability]({{< relref "/develop/data-types/vector-sets/scalability" >}}) - [Filtered search]({{< relref "/develop/data-types/vector-sets/filtered-search" >}})