Over the past few years, our engineering team has been working hard on improving the scalability of our search index. We’re thrilled to be officially releasing the Voyager “Flex” Index: the next generation of our search index built for scalability and reliability.
Traditionally, our search index has been deployed onto a single machine where it runs alongside the other parts of the Voyager software stack. While this deployment strategy is great for its simplicity, it has some obvious limitations, the most glaring of which is that the size of the search index is limited by the capacity of a single machine.
Historically, Voyager Search customers have solved this problem with our Federation Extension. Federation allows the search index to be broken up into multiple parts while still providing a single point of search. Although this solution works well for some scenarios, it’s not ideal for others as it forces customers to deal with multiple disparate search indices and requires configuration to be duplicated across multiple Voyager deployments. Our new Flex Index solves these issues by providing a single logical index that, behind the scenes, is physically split up and deployed across a cluster of machines.
Because a Flex Index is not constrained by a single machine, it can store a virtually unlimited number of documents. Additionally, it replicates index data across the cluster, providing the redundancy and fault tolerance needed to keep the search index available and in a healthy state.
The Voyager Flex Index builds on the built-in capabilities of Apache Solr to provide a highly scalable search index. This set of capabilities, dubbed “SolrCloud” by the Solr community, includes “sharding”, which allows Voyager to partition a large search index into multiple smaller ones. Sharding is a widely used practice in the database and search world. The following diagram is a simple depiction of the sharding process.
A single Voyager index being split into multiple shards.
In the above diagram, a single search index is partitioned into two shards. Two shards are shown here to keep the concept simple, but any number of shards could be used. The number of shards is generally dictated by the number of documents that need to be stored in the search index.
Sharding is kept hidden from the end user performing a search. To them, it’s still just a single logical index composed of documents. Behind the scenes, Solr breaks a single query into one query per shard and then transparently merges the results for us.
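To make the partitioning idea concrete, here is a minimal sketch of hash-based document routing. It is an illustration only: the function names and the use of an MD5 hash are assumptions made for this example, and Solr's own document routing differs in detail.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number using a stable hash.

    Hypothetical helper for illustration; not Solr's actual router.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(doc_ids, num_shards):
    """Split a flat list of document IDs into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Ten made-up document IDs spread across two shards, as in the diagram.
doc_ids = [f"doc-{i}" for i in range(10)]
shards = partition(doc_ids, num_shards=2)
```

Because the hash is stable, the same document ID always lands on the same shard, which is what lets the cluster find a document again later.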
A user searching against a classic index vs searching against the Flex Index.
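The way a single query fans out to every shard and comes back as one result list can be sketched roughly as follows. This is a simplified model rather than Solr's implementation: the shards are plain dictionaries, the score is a naive term count, and `search_shard` and `search_index` are hypothetical names.

```python
import heapq
from itertools import islice


def search_shard(shard, term):
    """Return (score, doc_id) hits from one shard, best score first."""
    hits = [
        (text.count(term), doc_id)
        for doc_id, text in shard.items()
        if term in text
    ]
    return sorted(hits, reverse=True)


def search_index(shards, term, limit=10):
    """Query every shard, then merge the ranked hits into one list."""
    per_shard = [search_shard(shard, term) for shard in shards]
    # Each per-shard list is already sorted best-first, so a k-way
    # merge yields a globally ranked stream without re-sorting.
    merged = heapq.merge(*per_shard, reverse=True)
    return [doc_id for _, doc_id in islice(merged, limit)]


# Two toy shards holding made-up documents.
shards = [
    {"a": "solr search search", "b": "maps"},
    {"c": "search", "d": "solr search search search"},
]
results = search_index(shards, "search")
```

The caller only ever sees `results`, a single ranked list, which mirrors how the Flex Index presents itself as one logical index.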
As mentioned previously, SolrCloud also offers redundancy and fault tolerance. These are achieved with another concept called a “replica”. A replica is essentially a copy of a shard stored in a different physical location. Let’s revisit the previous diagram with replicas added.
A Flex Index comprised of two shards, each with a replica.
Replicas add a number of key benefits, the first of which is fault tolerance. Replicas live on separate servers, so if one of the servers in our index cluster goes down, we are still able to serve the contents of that shard from another server.
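The failover behavior described above can be sketched as a simple lookup that skips servers that are down. The node names, the cluster layout, and the `pick_live_replica` helper are all assumptions made for illustration; SolrCloud tracks cluster state with its own machinery.

```python
def pick_live_replica(replica_nodes, live_nodes):
    """Return the first node hosting a copy of the shard that is still up."""
    for node in replica_nodes:
        if node in live_nodes:
            return node
    raise RuntimeError("no live replica available for this shard")


# Hypothetical layout: two shards, each with copies on two servers.
cluster = {
    "shard1": ["node-a", "node-b"],
    "shard2": ["node-b", "node-a"],
}

# Simulate node-a going down: only node-b remains live.
live_nodes = {"node-b"}
serving = {
    shard: pick_live_replica(nodes, live_nodes)
    for shard, nodes in cluster.items()
}
```

With `node-a` offline, both shards are still served from `node-b`, so searches continue to see the whole index.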
A second benefit of replicas is that they can speed up search queries. Solr can distribute query load across a shard and its replicas rather than sending every request to a single copy of the shard.
A query being distributed across shard replicas to improve overall query time.
Distributing queries in this manner can often lead to faster search results for the end user.
The Flex Index is an exciting new capability of the Voyager Search family of products that customers are already benefiting from. Our team continues to work hard on improvements and tools to make deploying and managing a Flex Index as easy as possible. Stay tuned!
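As a rough illustration of spreading work across copies of a shard, the sketch below rotates requests through the replicas in round-robin order. The `ReplicaBalancer` class and node names are hypothetical, and round-robin is only one possible policy; Solr's actual replica selection may differ.

```python
import itertools


class ReplicaBalancer:
    """Hand out the replicas of one shard in round-robin order."""

    def __init__(self, replica_nodes):
        self._cycle = itertools.cycle(replica_nodes)

    def next_replica(self):
        """Return the node that should serve the next request."""
        return next(self._cycle)


# Hypothetical shard with copies on two servers.
balancer = ReplicaBalancer(["node-a", "node-b"])
assignments = [balancer.next_replica() for _ in range(4)]
```

Four consecutive requests alternate between the two servers, so neither copy has to handle the full query load on its own.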