Ben Isherwood

HCI: Scaling a search index

Blog post created by Ben Isherwood on Apr 6, 2017

Today's search-based applications quickly outgrow a single machine, in both index size and query volume. A well-configured, distributed, multi-instance index service can provide sub-second search response times across *billions* of indexed documents.

The standard procedure for scaling HCI indexes is:

  • First, maximize single instance index performance by configuration and schema tuning.
  • Next, address availability concerns and absorb high query volume by replicating the index to multiple machines.
  • If the index becomes too large for a single machine, split the index across multiple machines (i.e., "shard" the index).
  • Finally, for high query volume and large index size, replicate each index shard to separate server instances.
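
The procedure above can be read as a small decision rule. The following is only an illustrative sketch: the function name and the two boolean inputs are hypothetical and are not part of HCI.

```python
def scaling_steps(fits_on_one_machine: bool, query_volume_high: bool) -> list[str]:
    """Return which of the scaling steps listed above apply to a situation.

    Hypothetical helper for illustration only; HCI exposes no such function.
    """
    steps = ["tune"]  # always start by tuning the single-instance index
    if not fits_on_one_machine:
        steps.append("shard")  # index too large: split it across machines
    if query_volume_high:
        # With shards, this means replicating each shard;
        # without shards, it means replicating the whole index.
        steps.append("replicate")
    return steps


print(scaling_steps(fits_on_one_machine=False, query_volume_high=True))
# prints ['tune', 'shard', 'replicate']
```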


Summary of index scaling options:

Let's look at how each of these tasks is accomplished using the HCI index service.



Maximizing Index Performance


HCI aims to provide great out-of-the-box performance for typical index and query use cases, but proper tuning for your specific environment can bring significant performance improvements.

There are two major aspects of index performance: indexing and query. Both can be improved by a properly configured index schema, so configuration is the key to maximizing index performance. It’s important to set up your index from the start with performance in mind.



  • Be sure to choose the proper index and schema configuration up front to avoid a complete re-indexing of your content, which can be time-consuming for large indexes.
  • "Schemaless" indexing can be great when starting out to see what fields can be generated, but ideally your index should contain no unused fields and no duplicate fields.
  • Use the HCI "Basic" index schema template, and add optimized index fields automatically using the pipeline test and workflow discoveries tools. Your index size will be substantially smaller and indexing/query will be more efficient.
  • Use the correct field types AND field attributes for the fields you need. Use "omitNorms" whenever you can. Don't mark fields as "stored" when you don't need them returned in results.
  • Minimize fields with full content tokenization (text_*) and limit the number of "dynamic" and "copy" fields in use to the minimum required. Avoid dynamic fields completely for a fully predictable schema.
  • Configure stopwords.txt in the "Index > Advanced" UI to eliminate overly common terms such as "a", "the", or "is" from index and query processing. These terms are just noise and overhead that can be easily avoided. Example configurations for each language are available in the "lang/stopwords_*.txt" section.
  • Keep the Index service's memory "Heap size" below a maximum of 31.5 GB (default is 1.8 GB). A setting of 32 GB or greater defeats JVM optimizations (at around that size the JVM can no longer use compressed object pointers) and can result in out-of-memory errors and an increased risk of long GC pauses. Scaling out with more instances is therefore preferred to scaling up a single instance beyond these levels.
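
To make the stopword tip above concrete, here is a minimal sketch of what stopword filtering does to a stream of terms. It runs outside HCI and is not how the index service is implemented. The file format (one term per line, "#" starting a comment) and both function names are assumptions made for illustration.

```python
def load_stopwords(text: str) -> set[str]:
    """Parse stopword file contents.

    Assumed format: one term per line, '#' starts a comment.
    """
    words = set()
    for line in text.splitlines():
        term = line.split("#", 1)[0].strip().lower()
        if term:
            words.add(term)
    return words


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Drop tokens that appear in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


stopwords = load_stopwords("# common English terms\na\nthe\nis\n")
print(remove_stopwords("The index is a shard".split(), stopwords))
# prints ['index', 'shard']
```

Fewer terms per document means a smaller index and less work at query time, which is the effect the tip is after.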


When in doubt, run a workflow (or pipeline/workflow test) and use the "Add Fields to Index" feature on each discovered fields page to automatically configure your indexes for production use.



Tuning HCI index fields and their impact by use case (only select what you need):





Availability and High Query Volume


Eventually, your index will grow to a point where a single machine can’t keep up with a given query load. The proper way to handle this situation is to replicate the index to other servers. HCI will then automatically load balance query requests across the index service instances, each of which contains a "copy" of the index. Copies are updated over time as the "master" version of the index changes.




Index replication is accomplished in HCI by editing the "Index Protection Level" of the index from 1 copy (the default) to multiple copies. This count determines the number of index copies to store across all HCI index service instances. HCI automatically balances these copies across the index services running on each HCI instance. You may modify and apply this setting (increasing or decreasing it dynamically) at any time. Additional index copies improve the availability of each search index and distribute the query load, allowing for more concurrent queries without performance degradation.
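
HCI does this load balancing for you. Purely as an illustration of the idea, the sketch below rotates queries across index copies and skips any copy that is down. The class name, the instance names, and the `is_up` callback are hypothetical, and round-robin is an assumed policy, not necessarily the one HCI uses.

```python
class ReplicaBalancer:
    """Round-robin over index copies, skipping unavailable ones (illustrative only)."""

    def __init__(self, copies: list[str]):
        if not copies:
            raise ValueError("at least one index copy is required")
        self._copies = list(copies)
        self._next = 0

    def pick(self, is_up=lambda name: True) -> str:
        """Return the next available copy, or raise if every copy is down."""
        for _ in range(len(self._copies)):
            name = self._copies[self._next]
            self._next = (self._next + 1) % len(self._copies)
            if is_up(name):
                return name
        raise RuntimeError("no index copy is available")


balancer = ReplicaBalancer(["instance-1", "instance-2", "instance-3"])
print([balancer.pick() for _ in range(4)])
# prints ['instance-1', 'instance-2', 'instance-3', 'instance-1']
```

With a protection level of 3, each copy sees roughly a third of the queries, and losing one instance leaves two copies still serving.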




Scale by Sharding


Eventually, some indexes will get so large that a single machine cannot contain them. You will likely run into this once an index reaches the millions of documents range. The general solution is to break up the index into pieces that are kept on multiple servers.

A single search can then be issued to each server and the results can be pulled (likely in parallel), and then combined into a single result set for the user.
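
This "scatter-gather" pattern can be sketched in a few lines. The example below is a toy model, not HCI code: each shard is an in-memory dictionary, the score is a simple term count, and both function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard: dict[str, str], query: str, limit: int) -> list[tuple[int, str]]:
    """Toy per-shard search: score = number of times the term occurs in the text."""
    term = query.lower()
    hits = [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(limit, [hit for hit in hits if hit[0] > 0])


def distributed_search(shards: list[dict[str, str]], query: str, limit: int = 10):
    """Query every shard in parallel, then merge the partial results into one top list."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: search_shard(shard, query, limit), shards)
        merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(limit, merged)


shards = [
    {"d1": "index shard index", "d2": "query"},
    {"d3": "index"},
    {"d4": "shard replica"},
]
print(distributed_search(shards, "index", limit=2))
# prints [(2, 'd1'), (1, 'd3')]
```

Each shard only needs to return its own top `limit` hits, because no document outside a shard's local top list can appear in the global top list.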


Building a distributed index starts by running the HCI "index" service on multiple HCI instances. You then define the shard count for each individual index you create.


Index shard count can be configured when the index is initially created. If you expect your index to get very large, increase the shard count from the default of 1 to a number large enough to balance across the number of index service instances you expect to use for this index. Initially "over-sharding" is a great strategy when you know your index will grow very large.

Keep in mind that you will need enough HCI instances running the index service to satisfy the target shard count. For example, a shard count of 3 would likely require 3 HCI instances running the index service for peak performance. Keeping multiple shards on a single instance works fine and lets you grow later by adding instances and balancing shards onto them. However, multiple shards on a single instance carry a performance trade-off that you can avoid by balancing across more instances up front.

HCI also supports the automatic balancing of shards across all available instances of the index service. Additional instances of the index service may be run on new HCI instances, and a service configuration task may be executed to balance the shards across those instances.
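
To see why over-sharding leaves room to grow, consider this simplified sketch of shard balancing. It is not HCI's balancing algorithm: the function name and instance names are hypothetical, and a naive round-robin placement is assumed.

```python
def balance_shards(shard_count: int, instances: list[str]) -> dict[str, list[int]]:
    """Spread shard numbers 0..shard_count-1 evenly across instances (round-robin)."""
    if not instances:
        raise ValueError("at least one index service instance is required")
    placement = {name: [] for name in instances}
    for shard in range(shard_count):
        placement[instances[shard % len(instances)]].append(shard)
    return placement


# An index created with 6 shards while only 2 instances run the index service:
print(balance_shards(6, ["instance-1", "instance-2"]))
# prints {'instance-1': [0, 2, 4], 'instance-2': [1, 3, 5]}

# After a third instance is added and the shards are balanced again:
print(balance_shards(6, ["instance-1", "instance-2", "instance-3"]))
# prints {'instance-1': [0, 3], 'instance-2': [1, 4], 'instance-3': [2, 5]}
```

An index created with a single shard could not be spread this way, which is the argument for choosing a larger shard count at creation time. Note that this naive scheme moves more shards than strictly necessary when an instance is added; a real balancer would try to minimize data movement.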




Availability and Scale


When your index is too large for a single machine and you have a query volume that single shards cannot keep up with, it's time to replicate each shard in your distributed search setup. Using multiple index shards for scale-out plus replication for availability and high query load gives you the best of both worlds. In the image below, there is a "master" index service for each shard and then 1-n "slave" shards that are replicated from each master to other index services in the HCI system. This allows the master to handle updates and optimizations without adversely affecting query handling performance. Query requests are automatically load balanced across the shard slaves, providing both increased query handling capacity and seamless failover to backup copies if a server goes down.
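
The combination of shards and copies can be sketched as a placement problem: every copy of a given shard should live on a different instance, so that losing one server never takes a shard offline. The code below is an assumed, simplified placement scheme for illustration, with hypothetical function names. HCI performs the actual placement automatically.

```python
def place_copies(shard_count: int, copies_per_shard: int, instances: list[str]) -> dict[int, list[str]]:
    """Place each shard's copies on distinct instances.

    The first instance listed for a shard plays the "master" role,
    the remaining ones hold the replicated copies.
    """
    n = len(instances)
    if copies_per_shard < 1 or copies_per_shard > n:
        raise ValueError("copies per shard must be between 1 and the instance count")
    return {
        shard: [instances[(shard + k) % n] for k in range(copies_per_shard)]
        for shard in range(shard_count)
    }


def survives_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every shard still has at least one copy when `failed` goes down."""
    return all(any(inst != failed for inst in copies) for copies in placement.values())


placement = place_copies(shard_count=2, copies_per_shard=2, instances=["a", "b", "c"])
print(placement)
# prints {0: ['a', 'b'], 1: ['b', 'c']}
print(survives_failure(placement, "b"))
# prints True
```

With a single copy per shard, the same check fails as soon as the instance holding a shard goes down, which is the availability gap that replication closes.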









There are a number of options and configurations available to scale HCI indexes for improved availability, to satisfy query load requirements, and/or to grow to store billions of indexed documents. As always, the optimal configuration will depend on your specific use case and requirements for your applications. HCI provides the tools to help make index management at large scale as simple and flexible as possible.


Thanks for reading,