If you use Elasticsearch as the search engine for yuuvis® Momentum, this page describes an example reindex procedure.
Introduction
If new requirements, in terms of either load capacity or functionality (e.g., newly supported languages), arise for a yuuvis® Momentum system during production, it may become necessary to overhaul the backend Elasticsearch cluster by performing a Reindex operation. The operation itself is slow and resource-intensive because it essentially creates a copy of the original Elasticsearch Index within the same Elasticsearch cluster. Make sure enough storage space is available, and consider temporarily adding data nodes that can be shut down after the operation.
An Elasticsearch Reindex entails the creation of a new Momentum-capable Index, the migration of Elasticsearch data from the original Index into the new Index, and finally the removal of the old Index. These steps are performed through the Elasticsearch API, which Elasticsearch exposes on port 9200 by default. It is highly recommended to create an Elasticsearch snapshot using the same API before attempting the Reindex.
Below you can find a detailed overview of the curl commands needed to perform a Reindex in yuuvis® Momentum. Note that all commands assume you have port-forwarded the Elasticsearch API to http://localhost:9200.
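The snapshot recommended above can be taken with the Elasticsearch Snapshot API. The following is a minimal sketch, assuming a snapshot repository has already been registered in the cluster. The repository name, the snapshot name, the ES_URL variable, and the helper function names are illustrative assumptions and not part of yuuvis® Momentum.

```shell
# Assumed endpoint: the port-forwarded API used throughout this page.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints the Snapshot API URL for repository $1 and snapshot name $2.
snapshot_url() {
  echo "$ES_URL/_snapshot/$1/$2?wait_for_completion=true"
}

# Creates a snapshot that contains only index $3 and waits until it is done.
# $1: name of an already registered repository (hypothetical),
# $2: snapshot name (must be lowercase), $3: index to back up.
create_snapshot() {
  curl -s -X PUT "$(snapshot_url "$1" "$2")" \
    -H 'Content-Type: application/json' \
    -d "{ \"indices\": \"$3\", \"include_global_state\": false }"
}
```

A call such as create_snapshot my_backup pre-reindex yuuvis_1 would then back up the original Index into the hypothetical repository my_backup.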
Creating a Momentum-Capable Elasticsearch Index
Two steps are required to create an Elasticsearch Index that can interact with yuuvis® Momentum:
Creating a new Elasticsearch Index
A new index needs to be created to fit the specifications of the new requirements.
Create a new Index with a unique name.
Creating a new Elasticsearch Index is an opportunity to tune it for the current storage requirements. Shards work best when storing around 10 GB each and should contain no more than 50 GB of data.
For a cluster that contains 50 GB of Elasticsearch data, a high-performance Index might look something like this:
curl -X PUT "localhost:9200/yuuvis_2?pretty" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"number_of_shards": 5,
"number_of_replicas": 1
}
}
}
'
Make sure to change the Index parameters to suit your storage and reliability requirements.
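The shard count in the example follows from the sizing guidance above: the expected data volume divided by a target size per shard, rounded up. The following is a small sketch of that arithmetic. The function name suggest_shards and the default target of 10 GB are illustrative assumptions, so treat the result as a starting point rather than a rule.

```shell
# Hypothetical helper: estimate a primary shard count.
# $1: expected data volume in GB, $2: target size per shard in GB (default 10).
suggest_shards() {
  total_gb=$1
  target_gb=${2:-10}
  # Integer ceiling division: (a + b - 1) / b
  shards=$(( (total_gb + target_gb - 1) / target_gb ))
  # An index always needs at least one primary shard.
  if [ "$shards" -lt 1 ]; then
    shards=1
  fi
  echo "$shards"
}
```

For the 50 GB cluster above, suggest_shards 50 prints 5, which matches the number_of_shards value used in the example.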
Apply yuuvis® Momentum Elasticsearch Index Mapping and Settings
The yuuvis® Momentum services require Elasticsearch to use a custom mapping. Using Elasticsearch's automatic mapping algorithm for reindexing renders the new index unusable for the yuuvis® Momentum system.
To get a compatible mapping, one can retrieve the original Elasticsearch Index's mapping through use of the Get mapping API:
curl -X GET "localhost:9200/yuuvis/_mapping"
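Then apply the extracted Mapping to the new Index. The response of the Get mapping API nests the mapping under the index name and a mappings key; only the inner object, starting with dynamic_templates and properties, belongs in the request body: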
curl -X PUT "localhost:9200/yuuvis_2/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
"dynamic_templates" : [
{
"keyword" : {
"match" : "key_*",
"mapping" : {
"type" : "keyword"
}
}
},
{
"text" : {
"match" : "txt_*",
"mapping" : {
"type" : "text"
}
}
},
{
"string" : {
"match" : "str_*",
"mapping" : {
"fields" : {
"raw" : {
"type" : "keyword"
}
},
"type" : "text"
}
}
},
{
"number" : {
"match" : "num_*",
"mapping" : {
"type" : "long"
}
}
},
{
"double" : {
"match" : "dbl_*",
"match_mapping_type" : "double",
"mapping" : {
"type" : "double"
}
}
},
{
"object" : {
"match" : "obj_*",
"match_mapping_type" : "object",
"mapping" : {
"type" : "object"
}
}
},
{
"date" : {
"match" : "dte_*",
"match_mapping_type" : "date",
"mapping" : {
"format" : "date_optional_time",
"type" : "date"
}
}
},
{
"boolean" : {
"match" : "bol_*",
"match_mapping_type" : "boolean",
"mapping" : {
"type" : "boolean"
}
}
},
{
"table" : {
"match" : "tab_*",
"mapping" : {
"type" : "nested"
}
}
},
{
"rawtable" : {
"match" : "rtb_*",
"mapping" : {
"fields" : {
"raw" : {
"type" : "keyword"
}
},
"type" : "text"
}
}
},
{
"locationpath" : {
"match" : "locationpath",
"match_mapping_type" : "string",
"match_pattern" : "regex",
"mapping" : {
"analyzer" : "paths",
"type" : "text"
}
}
},
{
"typepath" : {
"match" : "typepath",
"match_mapping_type" : "string",
"match_pattern" : "regex",
"mapping" : {
"analyzer" : "paths",
"type" : "text"
}
}
},
{
"contentidx" : {
"match" : "contentidx",
"match_mapping_type" : "string",
"mapping" : {
"term_vector" : "no",
"type" : "text"
}
}
},
{
"contentfile" : {
"match" : "contentfile",
"match_mapping_type" : "string",
"mapping" : {
"term_vector" : "no",
"type" : "text"
}
}
}
],
"properties" : {
"contentfile" : {
"type" : "text"
},
"contentidx" : {
"type" : "text"
},
"dte_date" : {
"type" : "date",
"format" : "date_optional_time"
},
"dte_system:creationdate" : {
"type" : "date",
"format" : "date_optional_time"
},
"dte_system:lastmodificationdate" : {
"type" : "date",
"format" : "date_optional_time"
},
"num_appbillion:index" : {
"type" : "long"
},
"num_system:contentstreamlength" : {
"type" : "long"
},
"num_system:versionnumber" : {
"type" : "long"
},
"str_appbillion:bmstring1" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_appbillion:bmstring2" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_name" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:basetypeid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentstreamfilename" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentstreamid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentstreammimetype" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentstreammimetypegroup" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:contentstreamrepositoryid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:createdby" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:digest" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:lastmodifiedby" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:objecttype" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:objecttypeid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:secondaryobjecttypeids" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:tenant" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
},
"str_system:traceid" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
}
}
}
}
}
'
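You can also optionally base the new Index's settings on the original configuration. Note that number_of_shards can only be set when an Index is created, so it is not repeated here; together with number_of_replicas, it was already defined in the creation step. The codec and analysis settings are static and can only be changed while the Index is closed. Because the mapping refers to the paths analyzer defined here, it is safest to apply these settings before the mapping:
curl -X POST "localhost:9200/yuuvis_2/_close?pretty"
curl -X PUT "localhost:9200/yuuvis_2/_settings?pretty" -H 'Content-Type: application/json' -d'
{
"index": {
"codec": "best_compression",
"max_result_window": "2147483647",
"analysis": {
"filter": [],
"analyzer": {
"default_search": {
"useExactTerms": "false",
"prefix_length": "0",
"languages": ["de","en"],
"type": "intrafind_search",
"excessiveSplitting": "false",
"stopwords": ["",""]
},
"default": {
"useExactTerms": "false",
"prefix_length": "0",
"languages": ["de","en"],
"type": "intrafind_index",
"excessiveSplitting": "false",
"stopwords": ["",""]
},
"paths": {
"prefix_length": "0",
"tokenizer": "path_hierarchy"
}
}
}
}
}
'
curl -X POST "localhost:9200/yuuvis_2/_open?pretty"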
Migrating the Data to the new Index
Once a compatible Index has been created, the Reindex operation can be triggered through the Elasticsearch API.
The Reindex operation itself provides a few options for configuration. For large data volumes especially, we employ parameters that increase the stability and performance of the operation, such as the 'slices' parameter for automatic parallelization of the Reindex.
curl -X POST "localhost:9200/_reindex?pretty&slices=20&wait_for_completion=false&refresh" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "yuuvis_1"
},
"dest": {
"index": "yuuvis_2"
},
"conflicts": "proceed"
}
'
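Because the request above sets wait_for_completion=false, Elasticsearch returns a task ID right away instead of blocking until the Reindex has finished. Progress can then be checked through the Tasks API (GET _tasks/<task_id>). The following is a minimal polling sketch. The function names, the ES_URL variable, and the 30-second default interval are illustrative assumptions, and the loop does not handle connection errors.

```shell
# Assumed endpoint: the port-forwarded API used throughout this page.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeeds if the Tasks API response read from stdin reports "completed": true.
task_completed() {
  grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'
}

# Polls GET _tasks/<task_id> until the task is reported as completed.
# $1: task ID returned by the _reindex call, $2: poll interval in seconds.
wait_for_task() {
  until curl -s "$ES_URL/_tasks/$1" | task_completed; do
    sleep "${2:-30}"
  done
  echo "task $1 completed"
}
```

Calling wait_for_task with the task ID from the _reindex response blocks until the Reindex is done. It is still worth inspecting the final task response for failures, since "conflicts": "proceed" lets the operation continue past version conflicts.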
After the Reindex has completed, the new Index must be activated by reassigning the yuuvis alias.
The new index needs to inherit the 'yuuvis' alias from the original index, meaning that the 'yuuvis' alias must be deleted from the original index beforehand.
curl -X DELETE "localhost:9200/yuuvis_1/_alias/yuuvis?pretty"
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
"actions": [
{
"add": {
"index": "yuuvis_2",
"alias": "yuuvis"
}
}
]
}
'
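As an alternative to the two separate calls above, the _aliases endpoint accepts several actions in one request and applies them atomically, so the 'yuuvis' alias is never missing in between. The following is a sketch of that variant. The function names and the ES_URL variable are illustrative assumptions.

```shell
# Assumed endpoint: the port-forwarded API used throughout this page.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints an _aliases request body that moves alias $3 from index $1 to index $2.
alias_swap_body() {
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "$1", "alias": "$3" } },
    { "add": { "index": "$2", "alias": "$3" } }
  ]
}
EOF
}

# Sends both actions in a single request so they take effect together.
swap_alias() {
  alias_swap_body "$1" "$2" "$3" |
    curl -s -X POST "$ES_URL/_aliases?pretty" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

With the index names used on this page, the call would be swap_alias yuuvis_1 yuuvis_2 yuuvis.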
Deleting the original Index
To free up space in the Elasticsearch Cluster, it's sensible to remove the original Index after verifying the Momentum system has accepted the new Index.
Make sure to verify the Momentum system still works correctly and contains all expected data before proceeding with the deletion of the original index.
curl -X DELETE "localhost:9200/yuuvis_1"
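Before running the deletion above, one simple sanity check is to compare the document counts of both indices through the _count API. The following is a minimal sketch. The function names and the ES_URL variable are illustrative assumptions, and a matching count is only an indication, not proof, that all data has been migrated.

```shell
# Assumed endpoint: the port-forwarded API used throughout this page.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extracts the numeric "count" field from a _count API response on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Prints the number of documents in index $1.
doc_count() {
  curl -s "$ES_URL/$1/_count" | extract_count
}

# Succeeds only if both indices report the same, non-empty document count.
same_doc_count() {
  old_count=$(doc_count "$1")
  new_count=$(doc_count "$2")
  [ -n "$old_count" ] && [ "$old_count" = "$new_count" ]
}
```

For example, same_doc_count yuuvis_1 yuuvis_2 && echo "counts match" prints the message only when both indices hold the same number of documents. Documents written to the original Index while the Reindex was running are not copied automatically, so a mismatch may point to such late writes.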