Elasticsearch Reindex

If you use Elasticsearch as the search engine for yuuvis® Momentum, this page describes an example reindex procedure.

Introduction

When new requirements are introduced to a yuuvis® Momentum system during production, either in terms of load capacity or functionality (e.g., newly supported languages), it may become necessary to overhaul the backend Elasticsearch cluster by performing a reindex operation. The operation is slow and resource-intensive because it essentially creates a copy of the original Elasticsearch index within the same Elasticsearch cluster. Before starting, make sure that enough storage space is available. Optionally, add more data nodes for the duration of the operation and shut them down afterwards.
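Free disk space per data node and the size of the existing index can be read from the Elasticsearch _cat APIs before starting. The following is a minimal sketch, assuming the API is reachable at localhost:9200 and the index names start with yuuvis. The ES_URL variable and the function name check_storage are illustrative and not part of yuuvis® Momentum.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print disk usage per data node, followed by the size of all yuuvis* indices.
# The available disk space should comfortably exceed the store.size of the
# index that is going to be copied.
check_storage() {
  curl -s "${ES_URL}/_cat/allocation?v&h=node,disk.used,disk.avail,disk.percent"
  curl -s "${ES_URL}/_cat/indices/yuuvis*?v&h=index,pri,rep,docs.count,store.size"
}
```

Calling check_storage prints two small tables. Compare the disk.avail column of the first with the store.size column of the second.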

An Elasticsearch reindex entails the creation of a new Momentum-capable index, the migration of Elasticsearch data from the original index into said new index, and finally the removal of the old index. These steps are achieved by interaction with the Elasticsearch API, which is exposed by Elasticsearch through port 9200 on selected master nodes. It is highly recommended to create an Elasticsearch snapshot using the same API before attempting the reindex.
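A snapshot of the original index could be taken along the following lines. This is a sketch under assumptions: a snapshot repository must already be registered in the cluster, and the repository name my_backup, the snapshot name pre_reindex and the helper function names are hypothetical placeholders.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# snapshot_body <index>
# Build the request body for a snapshot that is restricted to one index.
snapshot_body() {
  printf '{"indices": "%s", "include_global_state": false}' "$1"
}

# take_snapshot <repository> <snapshot name> <index>
# Assumption: <repository> has already been registered in the cluster.
take_snapshot() {
  curl -s -X PUT "${ES_URL}/_snapshot/$1/$2?wait_for_completion=true&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(snapshot_body "$3")"
}

# Example call (hypothetical names):
#   take_snapshot my_backup pre_reindex yuuvis_1
```

With wait_for_completion=true, the call only returns once the snapshot has finished, so the reindex is not started against a half-written backup.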

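Below, you can find a detailed overview of the CURL commands needed to successfully perform a reindex in yuuvis® Momentum. Note that all commands assume you have port-forwarded the Elasticsearch API to http://localhost:9200. The examples further assume that the original index is named yuuvis_1 and carries the yuuvis alias, and that the new index is named yuuvis_2.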
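Before running any of the commands, it can be worth confirming that the forwarded port actually answers. A minimal sketch using the cluster health API follows. The ES_URL variable and the function name cluster_health are illustrative.

```shell
# Assumption: the Elasticsearch API has been port-forwarded to this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the cluster health document (status, number of nodes, shard counts).
cluster_health() {
  curl -s "${ES_URL}/_cluster/health?pretty"
}
```

A status of green or yellow in the output of cluster_health indicates that all primary shards are assigned.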

Creating a Momentum-Capable Elasticsearch Index

Two steps are required to create an Elasticsearch index that can interact with yuuvis® Momentum:

  1. Creation of a new Elasticsearch index, and
  2. Applying yuuvis® Momentum Elasticsearch index mapping and settings

Creating a New Elasticsearch Index

A new index needs to be created to match the specifications of the new requirements. 

Create a new index with a unique name.

Creating a new Elasticsearch index lets you optimize the working index for the current storage requirements. Shards work best when storing around 10 GB each, and should contain no more than 50 GB of data.

For a cluster that contains 50 GB of Elasticsearch data, a high-performance index might look something like this:

CURL command
curl -X PUT "localhost:9200/yuuvis_2?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 5, 
      "number_of_replicas": 1 
    }
  }
}
'

Make sure to change the index parameters to suit your storage and reliability requirements.
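The shard count in the example follows from dividing the data volume by a target size per shard. As a sketch of that arithmetic, reading the sizing guidance as roughly 10 GB per shard (which is what 5 shards for 50 GB implies), a small helper could look like this. The function name recommended_shards is hypothetical.

```shell
# recommended_shards <index size in GB> [target GB per shard, default 10]
# Prints the number of primary shards, rounded up, with a minimum of 1.
recommended_shards() {
  local size_gb="$1" per_shard="${2:-10}"
  local shards=$(( (size_gb + per_shard - 1) / per_shard ))
  if [ "$shards" -lt 1 ]; then
    shards=1
  fi
  echo "$shards"
}

recommended_shards 50   # prints 5, the value used in the example above
```

The result is only a starting point. The number of data nodes and the expected growth of the index also influence a sensible shard count.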

Applying yuuvis® Momentum Elasticsearch Index Mapping and Settings

The yuuvis® Momentum services require Elasticsearch to use a custom mapping. Using Elasticsearch's automatic mapping algorithm for reindexing renders the new index unusable for the yuuvis® Momentum system.

To get a compatible mapping, retrieve the original Elasticsearch index's mapping by using the Get mapping API:

CURL command
curl -X GET "localhost:9200/yuuvis/_mapping" 

Then apply the extracted mapping to the new index:

CURL command mapping
curl -X PUT "localhost:9200/yuuvis_2/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
   
    "dynamic_templates" : [
        {
            "keyword" : {
            "match" : "key_*",
            "mapping" : {
                "type" : "keyword"
            }
            }
        },
        {
            "text" : {
            "match" : "txt_*",
            "mapping" : {
                "type" : "text"
            }
            }
        },
        {
            "string" : {
            "match" : "str_*",
            "mapping" : {
                "fields" : {
                "raw" : {
                    "type" : "keyword"
                }
                },
                "type" : "text"
            }
            }
        },
        {
            "number" : {
            "match" : "num_*",
            "mapping" : {
                "type" : "long"
            }
            }
        },
        {
            "double" : {
            "match" : "dbl_*",
            "match_mapping_type" : "double",
            "mapping" : {
                "type" : "double"
            }
            }
        },
        {
            "object" : {
            "match" : "obj_*",
            "match_mapping_type" : "object",
            "mapping" : {
                "type" : "object"
            }
            }
        },
        {
            "date" : {
            "match" : "dte_*",
            "match_mapping_type" : "date",
            "mapping" : {
                "format" : "date_optional_time",
                "type" : "date"
            }
            }
        },
        {
            "boolean" : {
            "match" : "bol_*",
            "match_mapping_type" : "boolean",
            "mapping" : {
                "type" : "boolean"
            }
            }
        },
        {
            "table" : {
            "match" : "tab_*",
            "mapping" : {
                "type" : "nested"
            }
            }
        },
        {
            "rawtable" : {
            "match" : "rtb_*",
            "mapping" : {
                "fields" : {
                "raw" : {
                    "type" : "keyword"
                }
                },
                "type" : "text"
            }
            }
        },
        {
            "locationpath" : {
            "match" : "locationpath",
            "match_mapping_type" : "string",
            "match_pattern" : "regex",
            "mapping" : {
                "analyzer" : "paths",
                "type" : "string"
            }
            }
        },
        {
            "typepath" : {
            "match" : "typepath",
            "match_mapping_type" : "string",
            "match_pattern" : "regex",
            "mapping" : {
                "analyzer" : "paths",
                "type" : "string"
            }
            }
        },
        {
            "contentidx" : {
            "match" : "contentidx",
            "match_mapping_type" : "string",
            "mapping" : {
                "term_vector" : "no",
                "type" : "text"
            }
            }
        },
        {
            "contentfile" : {
            "match" : "contentfile",
            "match_mapping_type" : "string",
            "mapping" : {
                "term_vector" : "no",
                "type" : "text"
            }
            }
        }
        ],
    "properties" : {
        "contentfile" : {
            "type" : "text"
        },
        "contentidx" : {
            "type" : "text"
        },
        "dte_date" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "dte_system:creationdate" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "dte_system:lastmodificationdate" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "num_appbillion:index" : {
            "type" : "long"
        },
        "num_system:contentstreamlength" : {
            "type" : "long"
        },
        "num_system:versionnumber" : {
            "type" : "long"
        },
        "str_appbillion:bmstring1" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_appbillion:bmstring2" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_name" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:basetypeid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamfilename" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreammimetype" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreammimetypegroup" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamrepositoryid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:createdby" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:digest" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:lastmodifiedby" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:objecttype" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:objecttypeid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:secondaryobjecttypeids" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:tenant" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:traceid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        }
    }

}
'
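Instead of pasting the mapping by hand, the response of the Get mapping API can be unwrapped and sent to the new index directly. The following sketch assumes Elasticsearch 7 or later, where the response has the form {"<index name>": {"mappings": {...}}}, and assumes that the jq command-line tool is installed. The function names and the file name mapping.json are illustrative.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read a Get mapping response on stdin and print only the mapping itself,
# i.e. the content below "<index name>" -> "mappings".
extract_mapping() {
  jq 'to_entries[0].value.mappings'
}

# copy_mapping <source index or alias> <destination index>
# Writes the extracted mapping to mapping.json in the current directory
# and applies it to the destination index.
copy_mapping() {
  curl -s "${ES_URL}/$1/_mapping" | extract_mapping > mapping.json
  curl -s -X PUT "${ES_URL}/$2/_mapping?pretty" \
    -H 'Content-Type: application/json' \
    -d @mapping.json
}

# Example call:
#   copy_mapping yuuvis yuuvis_2
```

Keeping mapping.json on disk also leaves a record of exactly which mapping was applied to the new index.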

Optionally, you can also base the new index's settings on the original configuration. The codec and analysis settings can only be changed while the index is closed, so the index is closed before the update and reopened afterwards. The number_of_shards setting can only be set when the index is created and is therefore not part of this request.

CURL command settings
curl -X POST "localhost:9200/yuuvis_2/_close?pretty"

curl -X PUT "localhost:9200/yuuvis_2/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index": {
        "codec": "best_compression",
        "max_result_window": "2147483647",
        "number_of_replicas": "1",
        "analysis": {
            "filter": [],
            "analyzer": {
                "default_search": {
                    "useExactTerms": "false",
                    "prefix_length": "0",
                    "languages": ["de","en"],
                    "type": "intrafind_search",
                    "excessiveSplitting": "false",
                    "stopwords": ["",""]
                },
                "default": {
                    "useExactTerms": "false",
                    "prefix_length": "0",
                    "languages": ["de","en"],
                    "type": "intrafind_index",
                    "excessiveSplitting": "false",
                    "stopwords": ["",""]
                },
                "paths": {
                    "prefix_length": "0",
                    "tokenizer": "path_hierarchy"
                }
            }
        }
    }
}
'

curl -X POST "localhost:9200/yuuvis_2/_open?pretty"

Migrating the Data to the New Index

Once a compatible index has been created, the reindex operation can be triggered through the Elasticsearch API.

The reindex operation provides several configuration options. Especially for large data volumes, parameters that increase the stability and performance of the operation are recommended, such as the slices parameter for automatic parallelization of the reindex. With wait_for_completion=false, the call returns a task ID immediately instead of blocking until the reindex has finished.

CURL command reindex operation
curl -X POST "localhost:9200/_reindex?pretty&slices=20&wait_for_completion=false&refresh" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "yuuvis_1"
  },
  "dest": {
    "index": "yuuvis_2"
  },
  "conflicts": "proceed"
}
'
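Because the reindex runs in the background, its progress has to be queried separately. A minimal sketch using the Elasticsearch task management API follows. The function names are illustrative, and the task ID is the "task" value from the response of the _reindex call.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# List all running reindex tasks, including the sub-tasks created by slicing.
list_reindex_tasks() {
  curl -s "${ES_URL}/_tasks?detailed=true&actions=*reindex&pretty"
}

# reindex_status <task id>
# The task id has the form <node id>:<task number>.
reindex_status() {
  curl -s "${ES_URL}/_tasks/$1?pretty"
}
```

The status section of the response reports how many documents have been created so far out of the total, and "completed": true marks the end of the operation.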

After the reindex is completed, the new index must be activated by reassigning the yuuvis alias.  

The new index needs to inherit the yuuvis alias from the original index, meaning that the yuuvis alias must be deleted from the original index beforehand.

CURL command alias reassignment
curl -X DELETE "localhost:9200/yuuvis_1/_alias/yuuvis?pretty"

curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
  "actions": [
    {
      "add": {
        "index": "yuuvis_2",
        "alias": "yuuvis"
      }
    }
  ]
}
'
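As an alternative to two separate calls, the aliases API accepts a remove and an add action in one request and applies them atomically, so there is no moment in which the yuuvis alias points to no index. The following is a sketch of that variant, not the procedure prescribed above. The function names are hypothetical.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# alias_swap_body <alias> <old index> <new index>
# Build a request body that moves the alias in a single atomic operation.
alias_swap_body() {
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "$2", "alias": "$1" } },
    { "add":    { "index": "$3", "alias": "$1" } }
  ]
}
EOF
}

# swap_alias <alias> <old index> <new index>
swap_alias() {
  curl -s -X POST "${ES_URL}/_aliases?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(alias_swap_body "$1" "$2" "$3")"
}

# Example call:
#   swap_alias yuuvis yuuvis_1 yuuvis_2
```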

Deleting the Original Index

To free up space in the Elasticsearch cluster, it is sensible to remove the original index after verifying the yuuvis® Momentum system has accepted the new index.

Make sure that the yuuvis® Momentum system still functions properly and contains all expected data before proceeding with the deletion of the original index.
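One simple plausibility check is to compare the document counts of both indices, since the reindex was started with "conflicts": "proceed" and does not stop on version conflicts. The following is a sketch using the _cat/count API. The function names are illustrative, and equal counts do not replace a functional test of the yuuvis® Momentum system.

```shell
# Assumption: the Elasticsearch API is reachable at this address.
ES_URL="${ES_URL:-http://localhost:9200}"

# doc_count <index>
# Print the number of documents in the index as a bare integer.
doc_count() {
  curl -s "${ES_URL}/_cat/count/$1?h=count" | tr -d '[:space:]'
}

# counts_match <count a> <count b>
# Succeeds only if both values are non-empty integers and equal.
counts_match() {
  case "$1$2" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" -eq "$2" ]
}

# Example call:
#   counts_match "$(doc_count yuuvis_1)" "$(doc_count yuuvis_2)" && echo "document counts match"
```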

CURL command
curl -X DELETE "localhost:9200/yuuvis_1"

Read on

COMMANDER Service for System Maintenance

Perform low-level maintenance on your core system, access the database and carry out Elasticsearch queries.

yuuvis® Postman Collections

Postman is a free API development tool with a multitude of useful functions for automated testing, documentation, and more. Our yuuvis® Postman Collections kick-start you right into the yuuvis® API world.

Service Monitoring and Maintenance

Use monitoring and maintenance endpoints of a service running in your yuuvis® Momentum cluster.