...
Service Name | contentanalyzer |
---|---|
Port Range | 7430-7439 |
Profiles | prod, docker, kubernetes, metrics |
Helm Chart | yuuvis |
Function
In the default configuration, each binary content file imported into yuuvis® Momentum passes through the CONTENTANALYZER service. The service determines the MIME type of the file and, for the most common file types, extracts the contained text.
>> Basic Use Case Flows
...
Text extraction is available for the following file types:
Types | Extension |
---|---|
MS Office Word 97-2016 | doc, docx |
Rich Text Format | rtf |
MS Office PowerPoint 97-2016 | ppt, pptx |
Plain Text | txt |
Comma Separated Values | csv |
MS Office Excel 97-2016 | xls, xlsx |
OpenDocument Text | odt |
OpenDocument Presentation | odp |
OpenDocument Spreadsheet | ods |
HyperText Markup Language (HTML) | html |
MS Outlook | msg |
XML | xml |
JavaScript Object Notation | json |
Encapsulated Portable Document Format | epdf |
Portable Document Format | pdf |
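
The table above can be turned into a simple client-side lookup, for example to estimate before an import whether extracted text can be expected for a file. The following is a minimal sketch, not part of the CONTENTANALYZER service: the function name `is_text_extraction_expected` is hypothetical, and `pdf` is assumed as the extension for Portable Document Format. The service itself analyzes the binary content, so a file name check like this is only a rough indication.

```python
from pathlib import PurePath

# Extensions from the table of supported file types.
# "pdf" is assumed as the extension for Portable Document Format.
EXTRACTABLE_EXTENSIONS = frozenset({
    "doc", "docx", "rtf", "ppt", "pptx", "txt", "csv", "xls", "xlsx",
    "odt", "odp", "ods", "html", "msg", "xml", "json", "epdf", "pdf",
})


def is_text_extraction_expected(file_name: str) -> bool:
    """Return True if the file extension is listed as supported.

    Hypothetical helper: it only inspects the file name and does not
    reproduce the content-based analysis done by the service.
    """
    suffix = PurePath(file_name).suffix  # ".DOCX", or "" if there is none
    return suffix[1:].lower() in EXTRACTABLE_EXTENSIONS
```

Only the last suffix is considered, so a name such as `archive.tar.gz` is checked as `gz` and is reported as not supported.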
...
Configuration
The default behavior of the CONTENTANALYZER service can be changed via the serviceConfiguration.json configuration file. The file defines conditions under which the content and/or the MIME type are analyzed. If a condition matches during an import process, the corresponding analysis is performed.
>> serviceConfiguration.json
Note: Within each import request body, this configuration can be overridden by setting the options parameters accordingly. The analysis of the content and/or the MIME type can be requested or suppressed even if the opposite behavior is configured in the serviceConfiguration.json file.
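
The precedence described in the note can be sketched as follows. This is an illustration of the rule only, under the assumption that each analysis step is represented as a boolean and that a missing request-level option is represented as `None`. The function name `effective_setting` is hypothetical and does not correspond to any yuuvis® Momentum API.

```python
from typing import Optional


def effective_setting(configured: bool, requested: Optional[bool]) -> bool:
    """Decide whether an analysis step runs for one import.

    configured: result of evaluating the conditions in serviceConfiguration.json
    requested:  value given in the import request, or None if not specified

    A value given in the request wins in both directions. Without one,
    the configured behavior applies.
    """
    return configured if requested is None else requested


# Configuration asks for content analysis, the request suppresses it.
analyze_content = effective_setting(configured=True, requested=False)

# Configuration skips MIME type analysis, the request asks for it.
analyze_mime_type = effective_setting(configured=False, requested=True)
```

In this sketch `analyze_content` ends up `False` and `analyze_mime_type` ends up `True`, which mirrors the statement that the request can both suppress and request an analysis against the configured behavior.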
Furthermore, the following parameters can be set in a service-specific configuration file for the CONTENTANALYZER service:
Parameter | Type | Description | Default |
---|---|---|---|
extraction.exclusiveOfficeLock | boolean | If you need text extraction for large binary content files of Microsoft Office file types, the CONTENTANALYZER service might need its full memory for each single file to be processed. If Note: Nevertheless, sufficient RAM is required for the CONTENTANALYZER service. | false |
mimetype.extension.redetection | comma-separated list of MIME types | By default, the MIME type is determined by analyzing the binary content itself. If this calculation yields a wrong MIME type, the file can be reanalyzed in a second step that takes the file extension into account. This parameter lists the MIME types for which the second analysis step is triggered. | 'image/x-portable-greymap' |
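
The two-step logic behind `mimetype.extension.redetection` can be sketched as follows. This is a simplified illustration and not the service's implementation: the functions `parse_mime_type_list` and `resolve_mime_type` and the extension map `EXTENSION_MIME_TYPES` are hypothetical, and the actual mapping used by the service is not documented here.

```python
from pathlib import PurePath
from typing import Set

# Hypothetical extension-to-MIME-type map, for illustration only.
EXTENSION_MIME_TYPES = {
    "txt": "text/plain",
    "csv": "text/csv",
    "json": "application/json",
}


def parse_mime_type_list(raw: str) -> Set[str]:
    """Split a comma-separated parameter value into a set of MIME types."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def resolve_mime_type(content_based: str, file_name: str, redetect: Set[str]) -> str:
    """Apply the second, extension-based step only for listed MIME types."""
    if content_based not in redetect:
        # The content-based result is trusted and returned unchanged.
        return content_based
    extension = PurePath(file_name).suffix[1:].lower()
    # Fall back to the content-based result if the extension is unknown.
    return EXTENSION_MIME_TYPES.get(extension, content_based)


# Default value of the parameter, taken from the table.
redetect = parse_mime_type_list("image/x-portable-greymap")
```

With this default, a result of `image/x-portable-greymap` for a file named `notes.txt` would be replaced by the extension-based type, while any MIME type that is not in the list is left as calculated from the content.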
...