TEXTEXTRACTOR Service

The textextractor service extracts text from the content of a DMS object. It is not concerned with the object itself and does its job without knowing the ID of the corresponding object.

The textextractor service reads messages from a queue. Each message contains a sourceLink and a targetLink. The service retrieves the content via the sourceLink, extracts the text, and posts the extracted text to the targetLink. Finally, it writes a message to a success queue or an error queue, depending on whether the job succeeded.

Characteristics

port range: 7420-7429

service name: textextractor

profiles: cloud,lc,mq

Function

Reading Messages

The queue from which the textextractor-service reads messages (5.) is configured with the parameter textextraction.job-queue, which defaults to lc.textextraction.job.

application-lc.yml
textextraction.job-queue: lc.textextraction.job

These messages contain sourceLink, targetLink and properties. The properties tell the textextractor-service whether it must resolve the links at the Discovery-Service before calling them. In addition, the properties are copied into the success or error message, where they are used to identify which job the textextractor-service has processed and whether processing succeeded. An example message can be seen in the Controller-Service description.
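
As an illustration of the message structure described above, the following Python sketch parses a job message. It assumes that the message body is JSON with the top-level keys sourceLink, targetLink and properties; the actual format is shown in the Controller-Service description and may differ. JobMessage and parse_job are hypothetical names used only for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class JobMessage:
    """Hypothetical in-memory form of a text extraction job."""
    source_link: str
    target_link: str
    properties: Dict[str, Any] = field(default_factory=dict)

    @property
    def use_discovery(self) -> bool:
        # Assumption: the flag may arrive as a boolean or as the string "true".
        return str(self.properties.get("useDiscovery", "false")).lower() == "true"


def parse_job(body: str) -> JobMessage:
    """Parse a job message, assuming a JSON body (not confirmed by the source)."""
    data = json.loads(body)
    return JobMessage(
        source_link=data["sourceLink"],
        target_link=data["targetLink"],
        # The properties are kept unchanged so they can be echoed back later.
        properties=dict(data.get("properties") or {}),
    )
```

Keeping the properties as an opaque map reflects that the service only interprets useDiscovery and passes everything else through to the success or error message.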

Getting Content

The content is retrieved via the sourceLink (6.). If the property useDiscovery in the properties map is set to true, the textextractor-service must first resolve the sourceLink at the discovery-service and then call the resolved link. Otherwise, it calls the sourceLink directly.
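
The decision described above can be sketched as follows. This is a minimal illustration, not the service's implementation: it assumes that the host part of the link is a service name and that resolving means replacing it with the address of a concrete instance. The lookup callable is a hypothetical stand-in for a discovery-service client.

```python
from typing import Callable, Mapping
from urllib.parse import urlsplit, urlunsplit


def resolve_link(
    link: str,
    properties: Mapping[str, object],
    lookup: Callable[[str], str],
) -> str:
    """Return the URL to call for `link`.

    `lookup` is hypothetical: it maps a service name to the "host:port"
    string of one concrete instance.
    """
    use_discovery = str(properties.get("useDiscovery", "false")).lower() == "true"
    if not use_discovery:
        # Without discovery the link is called exactly as given.
        return link
    parts = urlsplit(link)
    # Assumption: the host part of the link is the service name to resolve.
    instance = lookup(parts.hostname or "")
    return urlunsplit(parts._replace(netloc=instance))
```

The same function would apply to the targetLink, since both links are handled according to the same useDiscovery property.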

Extracting Text

The text is extracted from the content (8.). The logic for extracting text is the same as that of the contentanalyzer-service.

Forwarding Extracted Text

The extracted text is passed on to the targetLink (9.). As with the sourceLink, the handling depends on the property useDiscovery: the textextractor-service either calls the targetLink directly or first has it resolved at the Discovery-Service to a specific controller-service instance.

Writing Success Message

At the end, a success message is written (13.). The message is written to a queue whose name is the name of the job queue plus the suffix .success. By default, this queue is called lc.textextraction.job.success. The message contains the initial properties from the job message.

Writing Error Message

If a text extraction request cannot be executed without errors, a message is written to an error queue (13.). The name of this queue consists of the name of the job queue and the suffix .error, and is therefore lc.textextraction.job.error by default. In addition to the initial properties, the message contains a property reason whose value is an error message.
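
The naming rule for the two outcome queues and the content of the outcome message can be summarized in a short Python sketch. The function name outcome_message is hypothetical; the suffixes, the default queue name and the reason property are taken from the description above.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

DEFAULT_JOB_QUEUE = "lc.textextraction.job"


def outcome_message(
    properties: Mapping[str, Any],
    error: Optional[Exception] = None,
    job_queue: str = DEFAULT_JOB_QUEUE,
) -> Tuple[str, Dict[str, Any]]:
    """Return (queue name, message properties) for a finished job."""
    # The outcome message always carries the initial job properties.
    message = dict(properties)
    if error is None:
        return job_queue + ".success", message
    # On failure, the error text is added under the property "reason".
    message["reason"] = str(error)
    return job_queue + ".error", message
```

Because both queue names are derived from textextraction.job-queue, changing that parameter renames the success and error queues as well.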