---
categories:
- docs
- develop
- stack
- rs
- rc
- oss
- kubernetes
- clients
description: Scale Redis vector sets to handle larger data sets and workloads
linkTitle: Scalability
title: Scalability
weight: 20
---

## Multi-instance scalability

Vector sets can scale horizontally by sharding your data across multiple Redis instances. This is done by partitioning the dataset manually across keys and nodes.

### Example strategy

You can shard data using a consistent hash:

```python
key_index = crc32(item) % 3
key = f"vset:{key_index}"
```

Then add elements into different keys:

```bash
VADD vset:0 VALUES 3 0.1 0.2 0.3 item1
VADD vset:1 VALUES 3 0.4 0.5 0.6 item2
```

To run a similarity search across all shards, send [`VSIM`]({{< relref "/commands/vsim" >}}) commands to each key and then merge the results client-side:

```bash
VSIM vset:0 VALUES ... WITHSCORES
VSIM vset:1 VALUES ... WITHSCORES
VSIM vset:2 VALUES ... WITHSCORES
```

Then combine and sort the results by score.

## Key properties

- Write operations ([`VADD`]({{< relref "/commands/vadd" >}}), [`VREM`]({{< relref "/commands/vrem" >}})) scale linearly—you can insert in parallel across instances.
- Read operations ([`VSIM`]({{< relref "/commands/vsim" >}})) do not scale linearly—you must query all shards for a full result set.
- Smaller vector sets yield faster queries, so distributing them helps reduce query time per node.
- Merging results client-side keeps logic simple and doesn't add server-side overhead.

## Availability benefits

This sharding model also improves fault tolerance:

- If one instance is down, you can still retrieve partial results from others.
- Use timeouts and partial fallbacks to increase resilience.

## Latency considerations

To avoid additive latency across N instances:

- Send queries to all shards in parallel.
- Wait for the slowest response.

This makes total latency close to the worst-case shard time, not the sum of all times.

## Summary

| Goal                      | Approach                                          |
|---------------------------|---------------------------------------------------|
| Scale inserts             | Split data across keys and instances              |
| Scale reads               | Query all shards and merge results                |
| High availability         | Accept partial results when some shards fail      |
| Maintain performance      | Use smaller shards for faster per-node traversal  |

## See also

- [Performance]({{< relref "/develop/data-types/vector-sets/performance" >}})
- [Filtered search]({{< relref "/develop/data-types/vector-sets/filtered-search" >}})
- [Memory usage]({{< relref "/develop/data-types/vector-sets/memory" >}})