This service of the Auto ML Platform is responsible for training models as well as for determining predictions based on those models.
Characteristics
Beta Version
The ML Pipeline is a component of the Auto ML Platform. This platform is not included in yuuvis® Momentum installations and is available as a beta version only on request.
Function
The Machine Learning (ML) Pipeline is the heart of the Auto ML Platform and as such is responsible for data ingestion, data validation, transformation, machine learning training, model evaluation, and model serving. The pipeline is based on MLflow, but other providers such as Google TFX or Kubeflow can be used instead.
Requests to the ML Pipeline are managed via separate services, each providing its own API:
- The PREDICT-API service provides prediction and status endpoints that can be called by client applications.
>> PREDICT-API Endpoints
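A request to one of these endpoints could look like the following sketch. The host name, endpoint paths, and payload fields used here are assumptions for illustration only; the actual routes are documented on the PREDICT-API Endpoints page.

```python
# Minimal sketch of a client calling the PREDICT-API.
# NOTE: host, endpoint paths, and payload fields are hypothetical placeholders.
import requests

PREDICT_API = "http://predict-api.mytenant.example.com"  # assumed host

# Request a classification prediction for an object stored in yuuvis(R) Momentum.
response = requests.post(
    f"{PREDICT_API}/api/prediction/classification",  # assumed path
    json={"objectId": "0815-4711-abcd"},              # assumed payload
    timeout=10,
)
response.raise_for_status()
prediction = response.json()

# A status endpoint could be polled in a similar way.
status = requests.get(f"{PREDICT_API}/api/status", timeout=10).json()  # assumed path
print(prediction, status)
```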
The management of the ML Pipeline is done via the command line application Kairos CLI.
Machine Learning Training
The ML Pipeline needs to be trained by means of reference objects that are stored in a document management system, e.g., yuuvis® Momentum, and for which users have manually defined the individual object type. The data exported from yuuvis® Momentum is stored in a format suitable for data ingestion on local storage or S3. This data is used to train the models for the determination of predictions.
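The concrete ingestion format is determined by the pipeline configuration. The following sketch only illustrates one possible staging layout, assuming a labels file plus one full-text file per reference object; the folder structure, file names, and field names are assumptions.

```python
# Hypothetical staging layout for exported training data: one full-text file
# per reference object plus a labels.csv mapping objectId to the internal
# class label assigned via the manually defined object type.
import csv
from pathlib import Path

export_dir = Path("/data/automl-export")  # assumed local staging folder
export_dir.mkdir(parents=True, exist_ok=True)

reference_objects = [
    {"objectId": "0815-4711", "aiObjectTypeId": "INVOICE", "fulltext": "Rechnung Nr. 42 ..."},
    {"objectId": "0815-4712", "aiObjectTypeId": "DOCUMENT_TYPE_2", "fulltext": "Angebot ..."},
]

with open(export_dir / "labels.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["objectId", "aiObjectTypeId"])
    for obj in reference_objects:
        writer.writerow([obj["objectId"], obj["aiObjectTypeId"]])
        (export_dir / f"{obj['objectId']}.txt").write_text(obj["fulltext"], encoding="utf-8")

# The staging folder could then be synced to S3 instead of local storage,
# e.g., with the AWS CLI: aws s3 sync /data/automl-export s3://my-training-bucket/
```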
Document Classification
In the context of the AI platform, classification means the determination of suitable typification classes fitting an object based on its full-text rendition. For one object, one prediction is provided that contains mappings of classes and their corresponding relevance probability as well as a reference to the object in yuuvis® Momentum via objectId.
Instead of the class names used internally in the ML Pipeline, the prediction response bodies provide the object types as referenced in the inference schema described below.
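As an illustration, a classification prediction could be structured as in the following sketch; the field names objectId, predictions, objectTypeId, and probability are assumptions based on the description above and the inference schema example below, not the exact response format.

```python
# Hypothetical classification prediction for a single object (field names are
# assumptions): each predicted class carries a relevance probability, and the
# internal class INVOICE has already been translated to the objectTypeId
# defined in the inference schema.
classification_prediction = {
    "objectId": "0815-4711-abcd",
    "predictions": [
        {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Rechnung", "probability": 0.93},
        {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Angebot",  "probability": 0.05},
    ],
}
```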
Metadata Extraction
The ML Pipeline can analyze the PDF rendition of binary content files assigned to objects in yuuvis® Momentum in order to extract specific metadata. Based on the trained models, predictions for the values of specific object properties can be determined. The object properties have to be listed in the inference schema, where conditions for the values and settings for the prediction responses are also specified.
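A metadata extraction prediction could, for example, be structured as in the following sketch; again, the field names are assumptions chosen for illustration and not the exact response format.

```python
# Hypothetical metadata extraction prediction (field names are assumptions):
# for each property listed in the inference schema, the response can contain
# several candidate values with probabilities, up to maxNumberOfPredictions
# per property.
extraction_prediction = {
    "objectId": "0815-4711-abcd",
    "properties": [
        {
            "propertyId": "companyName",
            "predictions": [
                {"value": "Company1", "probability": 0.88},
                {"value": "Company2", "probability": 0.07},
            ],
        },
        {
            "propertyId": "totalAmount",
            "predictions": [{"value": "119.00", "probability": 0.91}],
        },
    ],
}
```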
Inference Schema
The inference schema is a JSON configuration file defining the object types that will be available for classification as well as the properties for which the metadata extraction should determine suitable values. At the same time, each internal aiObjectTypeId (aiPropertyId) gets a corresponding objectTypeId (propertyId) that will be used in the response bodies of the classification (extraction) endpoints to be compatible with the connected client application.
The inference schema is defined for a specific tenant. It is also possible to further limit the usage of the inference schema to an app by specifying appName (e.g., to distinguish between a client app for single uploads and batch processing apps).
{ "tenant" : "mytenant", "appName" : "AIInvoiceClient", "classification" : { "enabled" : true, "timeout" : 2, "aiClassifierId" : "DOCUMENT_CLASSIFICATION", "objectTypes": [ { "objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Rechnung", "aiObjectTypeId" : "INVOICE" }, { "objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Angebot", "aiObjectTypeId" : "DOCUMENT_TYPE_2" }, { "objectTypeId" : "appImpulse:hrsot|appImpulse:receiptType|Bewerbung", "aiObjectTypeId" : "DOCUMENT_TYPE_3" } ] }, "extraction" : { "enabled" : true, "timeout" : 5, "objects" : [ { "objectTypeId" : "invoice", "enabled" : true, "timeout" : 10, "propertyReference" : [ { "propertyId" : "companyName", "aiPropertyId" : "INVOICE_COMPANY_NAME", "allowedValues" : ["Company1", "Company2", "Company3"], "pattern" : "/^[a-z]|\\d?[a-zA-Z0-9]?[a-zA-Z0-9\\s&@.]+$", "validationService" : "my_company_name_validation_service", "maxNumberOfPredictions" : 5 }, { "propertyId" : "totalAmount", "aiPropertyId" : "INVOICE_TOTAL_AMOUNT", "pattern" : "^[0-9]*[.][0-9]*$", "validationService" : "my_amounts_validation_service", "maxNumberOfPredictions" : 1 } ] } ] } }
The following parameters are available in the inference schema:
Parameter | Description
---|---
tenant | Tenant for which the inference schema will be applied.
appName | Optional parameter: name of the app that uses the inference schema. Other apps within the tenant cannot use this inference schema but only their own app-specific inference schema or the tenant-wide inference schema.
classification | Section of parameters for classification processes.
enabled | Boolean value specifying whether the document classification is activated (true) or deactivated (false).
timeout | Time limit for the determination of classification predictions in seconds. An error will be thrown if the calculation process could not be finished before the timeout expires.
aiClassifierId | ID in the AI platform dictionary defining the model that will be used for the classification process.
objectTypes | A list of mappings, each of them containing the following keys. This list defines the object types that are available for the classification prediction.
objectTypeId | The identification of an object type as it will appear in prediction response bodies. You can define a concatenation of several secondary object type IDs, catalog values, etc. that can be interpreted by your client application to show the prediction results in a proper format.
aiObjectTypeId | ID of the internal class used within the Auto ML Platform, especially in its dictionary.
extraction | Section of parameters for metadata extraction processes.
enabled | Boolean value specifying whether the metadata extraction is activated (true) or deactivated (false).
timeout | Time limit for the determination of extraction predictions in seconds. The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.
objects | List of mappings for the individual object types containing the following keys. This list defines the object types for which metadata extraction will be available.
objectTypeId | The ID of the object type as it will be referenced within each object's metadata in the property system:objectTypeId. This property has to be set already during the object creation in yuuvis® Momentum and is thus always assigned to any object to be processed. The available object types are defined in the yuuvis® Momentum schema.
enabled | Boolean value specifying whether the metadata extraction is activated (true) or deactivated (false) for this object type. Ignored if the metadata extraction is deactivated via the enabled parameter of the extraction section.
timeout | Optional time limit in seconds overwriting the timeout of the extraction section for this object type. The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.
propertyReference | A list of mappings, each of them containing the following keys. This list defines the properties for which metadata should be extracted for an object of type objectTypeId.
propertyId | The identification of a property as it will appear in prediction response bodies. You can define a concatenation of several secondary object type IDs, catalog values, etc. that can be interpreted by your client application to show the prediction results in a proper format.
aiPropertyId | ID of the internal property used within the Auto ML Platform, especially in its dictionary.
allowedValues | Optional limitation of the prediction response: list of values for the property specified by propertyId. Only values specified in this list are allowed as prediction results of the metadata extraction.
pattern | Optional limitation of the prediction response: condition for values of the property specified by propertyId. Only values matching the condition are allowed as prediction results of the metadata extraction.
validationService | Optional parameter: URL of an endpoint for further validation of the value determined for the property specified by propertyId. Note: Not available in the beta version, where the connection of an additional validation service needs more configuration steps.
maxNumberOfPredictions | Optional parameter: integer value defining the maximum number of values included in the prediction response for the property specified by propertyId. If not specified, the default value is applied.
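The allowedValues, pattern, and maxNumberOfPredictions parameters all narrow down which candidate values may appear in an extraction prediction. The following sketch illustrates how such a filter could be applied to candidate values; it is a simplification and not the actual implementation inside the ML Pipeline.

```python
# Sketch of applying the inference schema constraints to candidate extraction
# values (simplified; the actual filtering inside the ML Pipeline may differ).
import re

def filter_predictions(candidates, allowed_values=None, pattern=None, max_predictions=1):
    """candidates: list of (value, probability) tuples, best candidate first."""
    result = []
    for value, probability in candidates:
        if allowed_values is not None and value not in allowed_values:
            continue                                    # allowedValues restriction
        if pattern is not None and not re.match(pattern, value):
            continue                                    # pattern restriction
        result.append({"value": value, "probability": probability})
        if len(result) == max_predictions:              # maxNumberOfPredictions limit
            break
    return result

# Example with the totalAmount property from the schema above
print(filter_predictions(
    [("119.00", 0.91), ("119,00", 0.06)],
    pattern=r"^[0-9]*[.][0-9]*$",
    max_predictions=1,
))  # -> [{'value': '119.00', 'probability': 0.91}]
```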
In order to combine the AI platform with yuuvis® client as reference implementation, the following inference schema is required:
Requirements
The Auto ML Pipeline is a part of the Auto ML Platform and can run only in combination with the other included services.
The ML Pipeline furthermore requires:
- S3 or local storage
If you want to use the ML Pipeline for the AI integration in yuuvis® client as a reference implementation, the requirements of the CLIENT Service also have to be considered.
Installation
The Auto ML Platform services including the ML Pipeline are not yet included in yuuvis® Momentum installations but only available on request.
Configuration
The ML Pipeline is managed, configured and maintained via the command line application Kairos CLI.