...

The controller service generates job messages for asynchronous full-text indexing, delivers the required binary content, and stores the extracted text in Elasticsearch.


Characteristics

port range: 7332-7335

service name: controller

profiles: cloud,es,oauth2,lc,mq,prod


Function

Asynchronous Full-Text Indexing

[Figure: Asynchronous Full Text Indexing]

For compound documents, it makes sense to perform full-text analysis and indexing asynchronously, decoupled from the import. For this purpose, the API gateway can generate a message during the import that contains the metadata of a certain number of single documents of the compound document (1.). The controller-service consumes these messages (2.) and generates another message (4.) for the textextractor-service, into which it writes a source link and a target link (3.). With a GET request to the sourceLink, the textextractor-service retrieves the content of the object the message belongs to (6. + 7.). After the text has been extracted, it is saved or updated in Elasticsearch via a POST request to the targetLink (9.-12.).


Reading Messages

The queue from which the controller-service reads the messages (2.) is configured by the parameter textextraction.in-queue and defaults to lc.textextraction (lc stands for lifecycle).

...

These messages contain the metadata of a DmsApiObject for which the asynchronous full-text indexing is to be executed.

The controller-service then generates the links for the textextractor-service that correspond to the DmsApiObject contained in the message (3.). The aim is for the textextractor-service to remain unaware of the rest of the system: the links carry all the information required to retrieve the content or to save the extracted text as query parameters. By default, the controller-service generates links that the textextractor-service must resolve at the discovery-service (how this works can be read here). This allows the controller-service to be scaled sensibly, assuming that the textextractor-service is integrated into the services landscape.
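The idea of self-contained links can be illustrated with a minimal Python sketch. It is not the controller-service's implementation: the base address, the paths /source and /target, and the query parameter names are assumptions chosen for illustration. Only the kinds of values carried (object ID, version number and tenant for the sourceLink; object ID, content stream ID and content stream range for the targetLink) are taken from this page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base address and paths; the real ones are not given on this page.
BASE = "http://controller"


def build_links(object_id, version, tenant, content_stream_id, content_stream_range):
    """Return (sourceLink, targetLink) that carry all required data as query parameters."""
    # Parameter names below are illustrative assumptions.
    source_query = urlencode({"objectId": object_id, "version": version, "tenant": tenant})
    target_query = urlencode({
        "objectId": object_id,
        "contentStreamId": content_stream_id,
        "contentStreamRange": content_stream_range,
    })
    return f"{BASE}/source?{source_query}", f"{BASE}/target?{target_query}"


source_link, target_link = build_links("4711", 2, "acme", "cs-1", "0-1023")
print(source_link)
print(target_link)

# The receiver of a link can recover every value from the link alone.
target_params = parse_qs(urlsplit(target_link).query)
print(target_params["contentStreamRange"])
```

Because every value travels in the link itself, the caller needs no further knowledge about the object or the system that produced the link.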

Creating Messages

The controller-service generates job messages for the textextractor-service (4.). These messages contain two links and a map of additional properties.

...

application-lc.yml:

    textextraction.job-queue: lc.textextraction.job

The content of a DMS object can be retrieved using a GET request to the sourceLink (6.). The controller-service receives the object ID, version number and tenant via the query parameters of the sourceLink and can use this information to retrieve the content of the object from the API gateway and return it to the caller (7.).

The extracted text of a DMS object can be saved using a POST request to the targetLink (9.). To do this, the text must be contained in the body of the request. From the query parameters of the targetLink, the controller-service receives the object ID, content stream ID, and content stream range of the corresponding DMS object. To ensure that the content of the object has not changed between the creation of the job message and the time of the request, the controller-service retrieves the current metadata for the object ID from Elasticsearch (10.) and compares the content stream ID and content stream range from the targetLink with those from the current metadata (11.). If at least one of the two properties does not match, the controller-service terminates the update process and returns HTTP status 409 CONFLICT.

...

If the comparison of the content stream ID and the content stream range shows that the content has not changed in the meantime, the text sent in the body is written to Elasticsearch, into the field contentfile of the object with the corresponding object ID (12.).
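The comparison and the conditional write can be sketched as follows. This is a simplified illustration, not the service's code: a plain dictionary stands in for Elasticsearch, the names ContentStream and save_extracted_text are hypothetical, and the status 200 for the successful case is an assumption (this page only names 409 CONFLICT).

```python
from dataclasses import dataclass

HTTP_OK = 200        # assumed success status; not stated on this page
HTTP_CONFLICT = 409  # returned when the content has changed in the meantime


@dataclass
class ContentStream:
    """The two properties that are compared before the text is written."""
    content_stream_id: str
    content_stream_range: str


def save_extracted_text(index, object_id, link_stream, text):
    """Write text to 'contentfile' only if the content stream is unchanged.

    index is a dict standing in for Elasticsearch: object ID -> document.
    link_stream holds the values taken from the targetLink query parameters.
    """
    document = index[object_id]
    current = document["contentStream"]  # current metadata (step 10.)
    if current != link_stream:           # comparison (step 11.)
        return HTTP_CONFLICT             # abort, nothing is written
    document["contentfile"] = text       # write the text (step 12.)
    return HTTP_OK


index = {"4711": {"contentStream": ContentStream("cs-1", "0-1023")}}

status_ok = save_extracted_text(index, "4711", ContentStream("cs-1", "0-1023"), "hello")
status_stale = save_extracted_text(index, "4711", ContentStream("cs-0", "0-1023"), "old")
print(status_ok, status_stale, index["4711"]["contentfile"])
```

The second call uses an outdated content stream ID, so it is rejected and the text stored by the first call stays in place.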

Processing Error/Success Messages

The textextractor-service writes a success or error message for each executed full-text extraction. These messages are read by the controller-service (14.), which logs the reports they contain.

...

  • error queue: "<textextraction.job-queue>.error"
  • success queue: "<textextraction.job-queue>.success"
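The naming scheme above can be expressed as a small Python helper. The function name report_queues is hypothetical; the job queue value used in the example is the one shown for application-lc.yml on this page.

```python
def report_queues(job_queue):
    """Derive the error and success queue names from the job queue name."""
    return {
        "error": f"{job_queue}.error",
        "success": f"{job_queue}.success",
    }


queues = report_queues("lc.textextraction.job")
print(queues["error"])
print(queues["success"])
```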

Configuration

There are two modes for creating the links: either the textextractor-service must resolve the links at the discovery-service against a controller-service instance before calling them, or it calls them directly. The mode is configured with the parameter controller.links.useDiscovery, whose default value is true. To change the default behavior, create a configuration file called controller-prod.yml that contains the following parameters:

...

If the controller.links.useDiscovery parameter is set to false, the controller.links.host parameter must also be set, because the controller-service uses this property to create the links. When the textextractor-service then calls one of the links, it explicitly addresses the controller-service instance configured by controller.links.host instead of being assigned one by the discovery-service. In this case, the controller-service instance configured by controller.links.host must remain reachable for the entire duration of the asynchronous full-text indexing so that the textextractor-service can process its jobs successfully.
For this configuration to take effect, the controller-service must be started with the profile prod if this is not already the case (if necessary, adjust the entry in servicewatcher-sw.yml, place the configuration file in the /config directory, and refresh or restart ARGUS and CONTROLLER).
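The effect of the two modes on link creation can be sketched in Python. This is an assumption-laden illustration: the function link_base, the http scheme, the use of the service name as the link authority in discovery mode, and the example host are all hypothetical. Only the two parameters and the rule that the host must be set when discovery is disabled come from this page.

```python
def link_base(use_discovery=True, host=None, service_name="controller"):
    """Return the base address used for generated links.

    use_discovery mirrors controller.links.useDiscovery (default true).
    host mirrors controller.links.host.
    """
    if use_discovery:
        # Logical address; the caller resolves it at the discovery-service.
        # Using the service name here is an illustrative assumption.
        return f"http://{service_name}"
    if not host:
        raise ValueError(
            "controller.links.host must be set when "
            "controller.links.useDiscovery is false"
        )
    # Fixed instance; it must stay reachable while jobs are being processed.
    return f"http://{host}"


discovery_base = link_base()
fixed_base = link_base(use_discovery=False, host="controller-1.example.internal:7332")
print(discovery_base)
print(fixed_base)
```

With discovery enabled, any running instance can serve a link, which is what makes scaling straightforward. With a fixed host, every link is tied to one instance.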

Profiles

Profile   Meaning
cloud     central configuration for all cloud services
es        configuration parameters for connecting to the Elasticsearch cluster
oauth2    configuration parameters of the tenants of the configured authentication provider of the system
lc        lifecycle configuration that contains the queue names for the asynchronous text extraction
mq        messaging configuration for the connection to the messaging system
prod      productive configuration; properties from application-prod.yml and controller-prod.yml are considered

...