
This core service analyzes binary content files during their import into yuuvis® Momentum.

Characteristics


Service Name: contentanalyzer
Port Range: 7430-7439
Profiles: prod, docker, kubernetes, metrics
Helm Chart: yuuvis

Function

In the default configuration, each binary content file imported into yuuvis® Momentum passes through the CONTENTANALYZER service. The service calculates the file's mime type and, for the most common file types, extracts the contained text.
>> Basic Use Case Flows

Text Extraction

The CONTENTANALYZER service can extract the text contained in an imported binary content file. The extracted plain text is stored as a text rendition in the search index and is used for full-text search together with the values of all string properties in the metadata.

Text extraction is available for the following file types:

| Type | Extension |
| --- | --- |
| MS Office Word 97-2016 | doc, docx |
| Rich Text Format | rtf |
| MS Office PowerPoint 97-2016 | ppt, pptx |
| Plain Text | txt |
| Comma Separated Values | csv |
| MS Office Excel 97-2016 | xls, xlsx |
| OpenDocument Text | odt |
| OpenDocument Presentation | odp |
| OpenDocument Spreadsheet | ods |
| HyperText Markup Language (HTML) | html |
| MS Outlook | msg |
| XML | xml |
| JavaScript Object Notation | json |
| Encapsulated Portable Document Format | epdf |
| Portable Document Format | pdf |

The text rendition can be retrieved via the following endpoint:
>> GET /api/dms/objects/{objectId}/contents/renditions/text
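
As an illustration, the request to this endpoint could be sketched in Python as follows. Only the endpoint path is taken from this page. The base URL, the object ID, and the request headers (for example, authentication or tenant headers) depend on the installation, and the function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def text_rendition_url(base_url: str, object_id: str) -> str:
    """Build the URL of the text rendition endpoint for one DMS object."""
    # quote() escapes characters that are not valid inside a path segment.
    object_segment = quote(object_id, safe="")
    return f"{base_url.rstrip('/')}/api/dms/objects/{object_segment}/contents/renditions/text"


def fetch_text_rendition(base_url: str, object_id: str, headers: dict) -> str:
    """Send the GET request and return the extracted plain text.

    The headers argument is an assumption: pass whatever authentication
    headers your installation requires.
    """
    request = Request(text_rendition_url(base_url, object_id), headers=headers, method="GET")
    with urlopen(request) as response:
        # Fall back to UTF-8 if the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

A call such as `fetch_text_rendition("https://momentum.example.org", object_id, headers)` would then return the stored text rendition, provided that text extraction ran for the object during import.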

Mime Type Calculation

The CONTENTANALYZER service calculates the mime type of binary content files that are imported into yuuvis® Momentum. The calculated mime type is stored in the content stream properties section of the corresponding DMS object.

Configuration

The default behavior of the CONTENTANALYZER service can be changed via the serviceConfiguration.json configuration file. There, the analysis of the content and/or the mime type can be made dependent on defined conditions. If a condition matches during an import process, the content and/or mime type is analyzed.
>> serviceConfiguration.json

Note: This configuration can be overridden in each import request body by specifying the options parameters accordingly. The analysis of content and/or mime type can thus be requested or suppressed for a single import, even if the opposite behavior is configured in serviceConfiguration.json.

Furthermore, the following parameters can be set in a service-specific configuration file for the CONTENTANALYZER service:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| extraction.exclusiveOfficeLock | boolean | Text extraction for large binary content files of Microsoft Office file types might require the full memory of the CONTENTANALYZER service for a single file. If true, all other text extraction processes wait until the processing of the Microsoft Office file is completed. Note: Even with this setting, the CONTENTANALYZER service requires sufficient RAM. | false |
| mimetype.extension.redetection | list of mime types | The standard calculation is based on the analysis of the binary content itself. If a calculated mime type is wrong, the file can be reanalyzed taking the file extension into account. The mime types for which this second analysis step is triggered are listed here. | 'image/x-portable-greymap' |
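
The two-step calculation controlled by mimetype.extension.redetection can be pictured with the following Python sketch. It is not the service's implementation: `detect_from_content` is a hypothetical stand-in that only checks a few magic bytes, and the second step is assumed here to be a plain file-extension lookup via Python's standard `mimetypes` module.

```python
import mimetypes

# Mirrors the documented default of mimetype.extension.redetection.
REDETECTION_LIST = frozenset({"image/x-portable-greymap"})


def detect_from_content(data: bytes) -> str:
    """Hypothetical stand-in for the content-based analysis (magic bytes only)."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data[:2] in (b"P2", b"P5"):  # PGM files start with "P2" or "P5"
        return "image/x-portable-greymap"
    return "application/octet-stream"


def calculate_mime_type(data: bytes, filename: str, redetection=REDETECTION_LIST) -> str:
    """Analyze the content first; consult the file extension only for listed types."""
    detected = detect_from_content(data)
    if detected not in redetection:
        return detected
    # Second analysis step (assumed behavior): look at the file extension.
    by_extension, _encoding = mimetypes.guess_type(filename)
    # Keep the content-based result if the extension is unknown.
    return by_extension or detected
```

In this sketch, a plain text file whose content happens to begin with `P2` would first be classified as `image/x-portable-greymap`. Because that type is on the redetection list, the file name `notes.txt` is consulted in the second step.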

Read on

SYSTEM Service

This core service manages schemata, role sets and app sets. Find characteristics and configuration options here. Keep reading

AUTHENTICATION Service

This core service manages the authentication of users and applications for the access to services within the yuuvis® Momentum cluster. Keep reading

Basic Use Case Flows

Graphical overviews describing the interaction of the yuuvis® Momentum core services in exemplary basic use case flows.  Keep reading
