...
The controller service generates job messages for asynchronous full-text indexing, delivers the required binary content, and stores the extracted text in Elasticsearch.
Function
Asynchronous Full-Text Indexing
Reading Messages
The queue from which the controller-service reads messages is configured by the parameter textextraction.in-queue and has the default value lc.textextraction (lc stands for lifecycle) (2.).
...
These messages contain the metadata of a DmsApiObject for which the asynchronous full-text indexing is to be executed.
Creating Links
The controller-service then generates links for the textextractor-service that correspond to the DmsApiObject contained in the message (3.). The aim is for the textextractor-service to remain unaware of the rest of the system: the links carry, as query parameters, all the information required to retrieve the content or to save the extracted text. By default, the controller-service generates links that the textextractor-service must resolve at the discovery-service (how this works can be read here). This allows the controller-service to be scaled sensibly, provided that the textextractor-service is integrated into the services landscape.
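As a rough illustration of how such self-describing links could be built, the following Python sketch encodes the object details as query parameters. The base URL, the paths, and the parameter names (objectId, version, tenant, contentStreamId, contentStreamRange) are assumptions made for this example; the actual link format of the controller-service is not shown on this page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_link(base_url: str, path: str, params: dict) -> str:
    """Return base_url + path with params encoded as a query string."""
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Hypothetical base URL and paths, chosen only for illustration.
BASE = "http://controller"

source_link = build_link(
    BASE, "/source", {"objectId": "4711", "version": 3, "tenant": "acme"}
)
target_link = build_link(
    BASE,
    "/target",
    {"objectId": "4711", "contentStreamId": "cs-1", "contentStreamRange": "0-1023"},
)

# The receiver of a call can recover everything it needs from the link alone.
query = parse_qs(urlsplit(source_link).query)
print(query["objectId"][0], query["version"][0], query["tenant"][0])
```

Because all state travels in the link, the caller needs no knowledge of the DMS object model; it only issues HTTP requests against the URLs it was given.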
Creating Messages
The Controller-Service generates job messages for the textextractor-service (4.). These messages contain two links and additional properties in a map.
...
```yaml
textextraction.job-queue: lc.textextraction.job
```
Calling the sourceLink
The content of a DMS object can be retrieved with a GET request to the sourceLink (6.). The controller-service reads the object ID, version number, and tenant from the query parameters of the sourceLink and uses this information to retrieve the content of the object from the API gateway and return it to the caller (7.).
Calling the targetLink
The extracted text of a DMS object can be saved with a POST request to the targetLink (9.). The text must be contained in the body of the request. From the query parameters of the targetLink, the controller-service reads the object ID, content stream ID, and content stream range of the corresponding DMS object. To ensure that the content of the object has not changed between the creation of the job message and the time of the request, the controller-service retrieves the current metadata for the object ID from Elasticsearch (10.) and compares the content stream ID and content stream range from the targetLink with those in the current metadata (11.). If at least one of the two properties does not match, the controller-service aborts the update and returns HTTP status 409 CONFLICT.
...
If the comparison of the content stream ID and the content stream range shows that the content has not changed in the meantime, the text sent in the body is written to Elasticsearch in the field contentfile of the object with the corresponding object ID (12.).
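The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the ContentStream type, the function name, the plain dict standing in for the Elasticsearch document, and the success status 200 are all assumptions; only the 409 conflict case and the contentfile field come from the description.

```python
from dataclasses import dataclass

HTTP_OK = 200        # assumed success status; only the conflict case is documented
HTTP_CONFLICT = 409


@dataclass(frozen=True)
class ContentStream:
    """Content stream ID and range, as carried in the targetLink or the metadata."""
    stream_id: str
    stream_range: str


def save_extracted_text(from_link: ContentStream, current: ContentStream,
                        text: str, document: dict) -> int:
    """Write text to the document only if the content stream is unchanged."""
    if from_link != current:
        # ID or range differs: the content changed after the job was created.
        return HTTP_CONFLICT
    document["contentfile"] = text
    return HTTP_OK


doc = {"objectId": "4711"}  # stand-in for the stored object
link = ContentStream("cs-1", "0-1023")
print(save_extracted_text(link, ContentStream("cs-1", "0-1023"), "hello", doc))
print(save_extracted_text(link, ContentStream("cs-2", "0-1023"), "stale", doc))
print(doc["contentfile"])
```

The second call is rejected because the content stream ID no longer matches, so the previously written text stays in place.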
Processing Error/Success Messages
The textextractor-service writes a success or error message for each executed full-text extraction. The controller-service reads these messages (14.) and logs the reports they contain.
...
- error queue: "<textextraction.job-queue>.error"
- success queue: "<textextraction.job-queue>.success"
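The naming rule for the two result queues can be expressed as a small helper. The function name is hypothetical; the example value lc.textextraction.job is the job queue name shown in the configuration snippet earlier on this page.

```python
def result_queues(job_queue: str) -> dict:
    """Derive the error and success queue names from the job queue name."""
    return {
        "error": f"{job_queue}.error",
        "success": f"{job_queue}.success",
    }


queues = result_queues("lc.textextraction.job")
print(queues["error"])
print(queues["success"])
```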
Configuration
There are two modes for creating the links: either the textextractor-service must resolve the links to a controller-service instance at the discovery-service before calling them, or it calls them directly. This is configured with the parameter controller.links.useDiscovery, whose default value is true. To change the default behavior, create a configuration file called controller-prod.yml that contains the following parameters:
...
If the controller.links.useDiscovery parameter is set to false, the controller.links.host parameter must also be set, because the controller-service uses this property to create the links. When the textextractor-service then calls one of the links, it explicitly addresses the controller-service instance configured by controller.links.host instead of being assigned one by the discovery-service. In this case, it is essential that the controller-service instance configured by controller.links.host remains reachable in the system for the entire duration of the asynchronous full-text indexing so that the textextractor-service can process its jobs successfully.
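The dependency between the two parameters can be illustrated with a short Python sketch. It is not the service's code: the function name, the http scheme, and the logical service name "controller" are assumptions; only the two parameters and the rule that the host is required when discovery is disabled come from the text.

```python
from typing import Optional


def link_base(use_discovery: bool = True, host: Optional[str] = None,
              service_name: str = "controller") -> str:
    """Return the base URL that generated links would start with."""
    if use_discovery:
        # Logical name only; the caller resolves it at the discovery-service.
        return f"http://{service_name}"
    if not host:
        raise ValueError(
            "controller.links.host must be set when "
            "controller.links.useDiscovery is false"
        )
    # Fixed instance: it must stay reachable while jobs are being processed.
    return f"http://{host}"


print(link_base())
print(link_base(use_discovery=False, host="controller-01:8080"))
```

Failing fast when the host is missing mirrors the requirement stated above: without discovery there is no other way to address a controller-service instance.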
For the above configuration to take effect, the controller-service must also be started with the profile prod if this is not already the case (if necessary, adjust the entry in servicewatcher-sw.yml, place the configuration file in the /config directory, and refresh or restart ARGUS and CONTROLLER).
Profiles
Profile | Meaning |
---|---|
cloud | central configuration for all cloud services |
es | configuration parameters for connecting to the Elasticsearch cluster |
oauth2 | configuration parameters for the tenants of the system's configured authentication provider |
lc | lifecycle configuration that contains the queue names for the asynchronous text extraction |
mq | messaging configuration for the connection to the messaging system |
prod | production configuration; properties from application-prod.yml and controller-prod.yml are applied |
...