Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

If you use Elasticsearch as search engine for yuuvis® Momentum, find here a reindex example procedure.

In case new requirements, either in terms of load capacity or functionality (i.E. new supported languages), are introduced to a Momentum System during production, it may become necessary to overhaul the backend Elasticsearch cluster by performing a Reindex operation. The operation itself is slow and resource-intensive, as it essentially creates a copy of the orignal Elasticsearch Index within the same Elasticsearch cluster, so make sure enough storage space is available and optionally create more data nodes which can be shut down after the operation

An Elasticsearch Reindex entails the creation of a new Momentum-capable Index, the migration of Elasticsearch data from the original Index into said new Index, and finally the removal of the old Index. These steps are achieved by interaction with the Elasticsearch API, which is exposed by Elasticsearch through Port 9200 on elected Master Nodes. It's highly recommended to create an Elasticsearch snapshot using the same API before attempting the Reindex.

Below you can find a detailed overview of the CURL commands needed to successfully perform a Reindex in yuuvis® Momentum. Note that all commands assume you have port-forwarded the ElasticSearch API to http:\\localhost:9200.

1. Creating a Momentum-Capable Elasticsearch Index

Two steps are required to create an Elasticsearch Index that can interact with yuuvis® Momentum:

1. Creating a new Elasticsearch Index

A new index needs to be created to fit the specifications of the new requirements. 

 CURL command & more information

Create a new Index with a unique name.

The creation of the new elasticsearch index allows for the optimization of the working Index for the current storage requirements. Indices work best when storing around 10 GB, and should contain no more than 50 GB of data. 

For a cluster than contains 50GB of Elasticsearch data, a high-performance Index might look something like this:

CURL command
curl -X PUT "localhost:9200/yuuvis_2?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 5, 
      "number_of_replicas": 1 
    }
  }
}
'

Make sure to change the Index parameters to suit your storage and reliability requirements.



2. Apply yuuvis® Momentum Elasticsearch Index Mapping and Settings

The yuuvis® Momentum services require Elasticsearch to use a custom mapping. Using Elasticsearchs' automatic mapping algorithm for reindexing renders the new index unusable for the Momentum System.

 CURL command & more information

To get a compatible mapping, one can retrieve the original Elasticsearch Indexs' mapping through use of the Get mapping API:

CURL command
curl -X GET "localhost:9200/yuuvis/_mapping" 

Then apply the extracted Mapping the new Index

CURL Command apply Mapping
curl -X PUT "localhost:9200/yuuvis_2/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
   
    "dynamic_templates" : [
        {
            "keyword" : {
            "match" : "key_*",
            "mapping" : {
                "type" : "keyword"
            }
            }
        },
        {
            "text" : {
            "match" : "txt_*",
            "mapping" : {
                "type" : "text"
            }
            }
        },
        {
            "string" : {
            "match" : "str_*",
            "mapping" : {
                "fields" : {
                "raw" : {
                    "type" : "keyword"
                }
                },
                "type" : "text"
            }
            }
        },
        {
            "number" : {
            "match" : "num_*",
            "mapping" : {
                "type" : "long"
            }
            }
        },
        {
            "double" : {
            "match" : "dbl_*",
            "match_mapping_type" : "double",
            "mapping" : {
                "type" : "double"
            }
            }
        },
        {
            "object" : {
            "match" : "obj_*",
            "match_mapping_type" : "object",
            "mapping" : {
                "type" : "object"
            }
            }
        },
        {
            "date" : {
            "match" : "dte_*",
            "match_mapping_type" : "date",
            "mapping" : {
                "format" : "date_optional_time",
                "type" : "date"
            }
            }
        },
        {
            "boolean" : {
            "match" : "bol_*",
            "match_mapping_type" : "boolean",
            "mapping" : {
                "type" : "boolean"
            }
            }
        },
        {
            "table" : {
            "match" : "tab_*",
            "mapping" : {
                "type" : "nested"
            }
            }
        },
        {
            "rawtable" : {
            "match" : "rtb_*",
            "mapping" : {
                "fields" : {
                "raw" : {
                    "type" : "keyword"
                }
                },
                "type" : "text"
            }
            }
        },
        {
            "locationpath" : {
            "match" : "locationpath",
            "match_mapping_type" : "string",
            "match_pattern" : "regex",
            "mapping" : {
                "analyzer" : "paths",
                "type" : "string"
            }
            }
        },
        {
            "typepath" : {
            "match" : "typepath",
            "match_mapping_type" : "string",
            "match_pattern" : "regex",
            "mapping" : {
                "analyzer" : "paths",
                "type" : "string"
            }
            }
        },
        {
            "contentidx" : {
            "match" : "contentidx",
            "match_mapping_type" : "string",
            "mapping" : {
                "term_vector" : "no",
                "type" : "text"
            }
            }
        },
        {
            "contentfile" : {
            "match" : "contentfile",
            "match_mapping_type" : "string",
            "mapping" : {
                "term_vector" : "no",
                "type" : "text"
            }
            }
        }
        ],
    "properties" : {
        "contentfile" : {
            "type" : "text"
        },
        "contentidx" : {
            "type" : "text"
        },
        "dte_date" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "dte_system:creationdate" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "dte_system:lastmodificationdate" : {
            "type" : "date",
            "format" : "date_optional_time"
        },
        "num_appbillion:index" : {
            "type" : "long"
        },
        "num_system:contentstreamlength" : {
            "type" : "long"
        },
        "num_system:versionnumber" : {
            "type" : "long"
        },
        "str_appbillion:bmstring1" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_appbillion:bmstring2" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_name" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:basetypeid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamfilename" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreammimetype" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreammimetypegroup" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:contentstreamrepositoryid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:createdby" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:digest" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:lastmodifiedby" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:objecttype" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:objecttypeid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:secondaryobjecttypeids" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:tenant" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        },
        "str_system:traceid" : {
            "type" : "text",
            "fields" : {
            "raw" : {
                "type" : "keyword"
            }
            }
        }
    }

}
'

You can also optionally base the new Index' settings on the orignal configuration:

CURL Command Apply Settings
curl -X PUT "localhost:9200/yuuvis_2/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index": {
        "codec": "best_compression",
        "number_of_shards": "80",
        "max_result_window": "2147483647",
        "analysis": {
            "filter": [],
            "analyzer": {
                "default_search": {
                    "useExactTerms": "false",
                    "prefix_length": "0",
                    "languages": ["de","en"],
                    "type": "intrafind_search",
                    "excessiveSplitting": "false",
                    "stopwords": ["",""]
                },
                "default": {
                    "useExactTerms": "false",
                    "prefix_length": "0",
                    "languages": ["de","en"],
                    "type": "intrafind_index",
                    "excessiveSplitting": "false",
                    "stopwords": ["",""]
                },
                "paths": {
                    "prefix_length": "0",
                    "tokenizer": "path_hierarchy"
                }
                
            },
            "number_of_replicas": "1"
        }
        
    }

}
'

2. Migrating the Data to the new Index

Once a compatible Index has been created, the Reindex operation can be triggered through the Elasticsearch API.

 CURL command & more information
CURL command
curl -X POST "localhost:9200/_reindex?pretty&slices=20&wait_for_completion=false&refresh" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "yuuvis_1"
},
"dest": {
"index": "yuuvis_2"
},
"conflicts": "proceed"
}
'

After the reindex has completed, the new Index must be activated by reassigning the 'yuuvis' alias from the old Index to the new.  

 CURL command & more information
CURL command
curl -X DELETE "localhost:9200/yuuvis_2/_alias/yuuvis?pretty"

curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
"actions": [
{
"add": {
"index": "yuuvis_4",
"alias": "yuuvis"
}
}
]
}
'

3. Deleting the original Index

To free up space in the Elasticsearch Cluster, it's sensible to remove the original Index after verifying the Momentum system has accepted the new Index.

 CURL command & more information

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html

CURL command
curl -X DELETE "localhost:9200/yuuvis_1"

  • No labels