CONTROLLER Service
The controller service generates job messages for asynchronous full-text indexing, delivers the required binary content, and stores the extracted text in Elasticsearch.
Characteristics
port range: 7332-7335
service name: controller
profiles: cloud,es,oauth2,lc,mq,prod
Function
Asynchronous Full-Text Indexing
For compound documents, it makes sense to perform full-text analysis and indexing asynchronously to the import. For this purpose, the API gateway can generate a message during the import that contains the metadata of a certain number of single documents of the compound document (1.). The controller-service consumes these messages (2.) and generates another message (4.) for the textextractor-service, into which it writes a source link and a target link (3.). A GET request to the sourceLink enables the textextractor-service to retrieve the content of the object belonging to the message (6. + 7.). After the text has been extracted, it can be saved or updated in Elasticsearch via a POST request to the targetLink (9.-12.).
Reading Messages
The queue from which the controller-service reads the messages is configured by the parameter textextraction.in-queue and has the default value lc.textextraction (lc - lifecycle) (2.).
textextraction.in-queue: lc.textextraction
These messages contain the metadata of a DmsApiObject for which the asynchronous full-text indexing is to be executed.
Creating Links
The controller-service then generates the links for the textextractor-service that correspond to the DmsApiObject contained in the message (3.). The aim is for the textextractor-service to remain unaware of the rest of the system: the links carry all the information required to retrieve the content or save the extracted text as query parameters. By default, the controller-service generates links that the textextractor-service must resolve at the discovery-service (how this works can be read here). This allows the controller-service to be scaled sensibly, assuming that the textextractor-service is integrated into the services landscape.
Creating Messages
The Controller-Service generates job messages for the textextractor-service (4.). These messages contain two links and additional properties in a map.
    {
      "sourceLink": "http://controller/contents/file?objectId={objectID}&versionNr={versionNr}&tenant={tenant}",
      "targetLink": "http://controller/contents/renditions/text?objectId={objectID}&contentstreamId={contentstreamId}&contentstreamRange={contentstreamRange}",
      "properties": {
        "objectId": "{objectId}",
        "useDiscovery": "true"
      }
    }

The objectId entry in properties is optional.
The sourceLink retrieves the content associated with the DMS object, and the targetLink stores the extracted text in Elasticsearch. Additional properties can be specified in a properties map. The property useDiscovery is required. It specifies whether or not the textextractor-service must resolve its links at the discovery-service before it can call them. By default, the value of this property is true. The parameter textextraction.job-queue is used to configure the name of the queue to which the controller-service writes the job messages. By default, the value is lc.textextraction.job.
textextraction.job-queue: lc.textextraction.job
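The structure of a job message described above can be sketched as follows. This is a minimal illustration, not the controller-service implementation: the function name build_job_message, its parameters, and the sample values are assumptions; only the link paths, query parameter names, and property names are taken from the message format shown above.

```python
import json
from urllib.parse import urlencode


def build_job_message(object_id, version_nr, tenant,
                      contentstream_id, contentstream_range,
                      base="http://controller", use_discovery=True):
    """Build a job message dict with sourceLink, targetLink and properties.

    Hypothetical helper: 'base' is the link prefix; with discovery enabled
    the documented messages use the service name 'controller' as host.
    """
    source_query = urlencode({
        "objectId": object_id,
        "versionNr": version_nr,
        "tenant": tenant,
    })
    target_query = urlencode({
        "objectId": object_id,
        "contentstreamId": contentstream_id,
        "contentstreamRange": contentstream_range,
    })
    return {
        "sourceLink": f"{base}/contents/file?{source_query}",
        "targetLink": f"{base}/contents/renditions/text?{target_query}",
        "properties": {
            "objectId": object_id,  # optional
            "useDiscovery": "true" if use_discovery else "false",  # required
        },
    }


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    message = build_job_message("4711", 1, "acme", "cs-1", "0-1023")
    print(json.dumps(message, indent=2))
```

Because all identifying information travels in the query parameters, the consumer of the message needs no knowledge of the system beyond the two links.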
Calling the sourceLink
The content of a DMS object can be retrieved using a GET request to the sourceLink (6.). The controller-service receives the object ID, version number, and tenant via the query parameters of the sourceLink and can use this information to retrieve the content of the object from the API gateway and return it to the caller (7.).
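How the three values can be read back from a sourceLink is sketched below. The helper parse_source_link and the sample link values are assumptions made for illustration; the query parameter names come from the job message format.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters carried by a sourceLink, as documented.
REQUIRED_SOURCE_PARAMS = ("objectId", "versionNr", "tenant")


def parse_source_link(link):
    """Extract object ID, version number and tenant from a sourceLink.

    Hypothetical helper; raises ValueError if a parameter is missing.
    """
    query = parse_qs(urlsplit(link).query)
    missing = [name for name in REQUIRED_SOURCE_PARAMS if name not in query]
    if missing:
        raise ValueError(f"sourceLink lacks parameters: {missing}")
    # parse_qs returns a list per key; take the first value of each.
    return {name: query[name][0] for name in REQUIRED_SOURCE_PARAMS}


if __name__ == "__main__":
    link = "http://controller/contents/file?objectId=4711&versionNr=1&tenant=acme"
    print(parse_source_link(link))
```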
Calling the targetLink
The extracted text of a DMS object can be saved using a POST request to the targetLink (9.). To do this, the text must be contained in the body of the request. From the query parameters of the targetLink, the controller-service receives the object ID, content stream ID, and content stream range of the corresponding DMS object. To ensure that the content of the object has not changed between the creation of the job message and the current point in time, the controller-service retrieves the current metadata for the object ID from Elasticsearch (10.) and compares the content stream ID and content stream range from the targetLink with those from the current metadata (11.). If at least one of the two properties does not match, the controller-service terminates the update process and returns HTTP status 409 CONFLICT.
If the object ID contained in the targetLink cannot be found in Elasticsearch, the controller-service terminates the update process and returns HTTP status 404 NOT FOUND.
If the comparison of the content stream ID and the content stream range shows that the content has not changed in the meantime, the text sent in the body is written to Elasticsearch in the field contentfile of the object with the corresponding object ID (12.).
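The decision logic described above can be sketched as a small function. This is an illustration under assumptions: the function name, the metadata keys contentstreamId and contentstreamRange, and the use of 200 as the success status are not taken from the source, which only specifies the 404 and 409 cases.

```python
NOT_FOUND = 404
CONFLICT = 409
OK = 200  # assumed success status; the source does not name one


def check_target_update(current_metadata, contentstream_id, contentstream_range):
    """Decide whether extracted text may be written for an object.

    current_metadata: dict with the object's current metadata as read
    from the index, or None if the object ID was not found.
    Returns the HTTP status the update would end with.
    """
    if current_metadata is None:
        # Object ID from the targetLink is unknown.
        return NOT_FOUND
    unchanged = (
        current_metadata.get("contentstreamId") == contentstream_id
        and current_metadata.get("contentstreamRange") == contentstream_range
    )
    if not unchanged:
        # Content changed after the job message was created.
        return CONFLICT
    # Safe to write the text into the field contentfile.
    return OK


if __name__ == "__main__":
    stored = {"contentstreamId": "cs-1", "contentstreamRange": "0-1023"}
    print(check_target_update(stored, "cs-1", "0-1023"))
    print(check_target_update(stored, "cs-2", "0-1023"))
    print(check_target_update(None, "cs-1", "0-1023"))
```

Checking for the missing object first keeps the two failure cases distinct: a stale job yields a conflict, while an unknown object ID yields not found.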
Processing Error/Success Messages
The textextractor-service writes a success or error message for each executed full-text extraction. These are read by the controller-service (14.) which logs the contained reports.
The queue names have the following format:
- error queue: "<textextraction.job-queue>.error"
- success queue: "<textextraction.job-queue>.success"
Configuration
Links can be created in two modes: either the textextractor-service must resolve the links against a controller-service instance at the discovery-service before calling them, or it calls them directly. The mode is configured with the parameter controller.links.useDiscovery, whose default value is true. To change the default behavior, create a configuration file called controller-prod.yml that contains the following parameters:
    # controller configuration
    controller:
      links:
        # specifies <ip>:<port> to use for link creation, if useDiscovery=false
        host: <ip>:<port>
        # indicates whether the links should be resolved on discovery or not
        useDiscovery: false
If the controller.links.useDiscovery parameter is set to false, the controller.links.host parameter must be set, because the controller-service uses this property to create the links. When the textextractor-service then calls one of the links, it explicitly addresses the controller-service instance configured by controller.links.host instead of being given one by the discovery-service. In this case, it is very important that the controller-service instance configured by controller.links.host can be reached in the system for the entire duration of the asynchronous full-text indexing so that the textextractor-service can successfully process its jobs.
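The effect of the two parameters on link creation can be sketched as follows. The function link_base, the http scheme for the explicit host, and the sample address are assumptions for illustration; the service name controller and the parameter names come from the configuration described above.

```python
def link_base(use_discovery=True, host=None):
    """Return the prefix used when creating sourceLink and targetLink.

    use_discovery mirrors controller.links.useDiscovery (default true),
    host mirrors controller.links.host in the form <ip>:<port>.
    """
    if use_discovery:
        # The service name is resolved at the discovery-service by the caller.
        return "http://controller"
    if not host:
        raise ValueError(
            "controller.links.host must be set when useDiscovery is false"
        )
    # Links address one fixed controller-service instance directly.
    return f"http://{host}"


if __name__ == "__main__":
    print(link_base())
    # Sample address invented for illustration only.
    print(link_base(use_discovery=False, host="10.0.0.5:7332"))
```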
For the above configuration to take effect, the controller-service must also be started with the profile prod, if this is not already the case. If necessary, adjust the entry in servicewatcher-sw.yml, place the configuration file in the /config directory, and refresh or restart ARGUS and CONTROLLER.
Profiles
Profile | Meaning |
---|---|
cloud | central configuration for all cloud services |
es | contains configuration parameters for connecting to the Elasticsearch cluster. |
oauth2 | contains configuration parameters of the tenants of the configured authentication provider of the system. |
lc | lifecycle configuration that contains the queue names for the asynchronous text extraction. |
mq | messaging configuration, for the connection to the messaging system |
prod | production configuration; properties from application-prod.yml and controller-prod.yml are considered |