The service of the Auto ML Platform is responsible for the training of models as well as for the determination of predictions based on those models.
Responsible for preparing data for training, training machine learning models, evaluating trained models, and preparing for deployment to production.
Characteristics
Note: Beta Version
The ML Pipeline is a component of the Auto ML Platform. This platform is not included in yuuvis® Momentum installations and is available as a beta version only on request.
Function
The Machine Learning (ML) Training Pipeline is the core of the Auto ML Platform and as such is responsible for data ingestion, data validation, data transformation, machine learning training, model evaluation, and model serving. The pipeline is based on MLflow, but other providers such as Google TFX or Kubeflow can be used alternatively.
Requests for the ML Pipeline are managed via separate services, each providing its own API:
- The PREDICT-API service provides prediction and status endpoints that can be called by client applications.
>> PREDICT-API Endpoints
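As a rough sketch of how a client application might address the PREDICT-API service, the snippet below builds a prediction request. The base URL, the endpoint path `/predict`, and the payload shape (an `objectId` referencing the document in yuuvis® Momentum) are assumptions for illustration; consult the PREDICT-API Endpoints page for the actual contract.

```python
import json
from urllib import request

# Hypothetical base URL of the PREDICT-API service; the real host depends
# on your deployment.
PREDICT_API = "http://predict-api.example.local"

def build_prediction_request(object_id: str) -> request.Request:
    # The endpoint path and payload shape are assumptions, not the
    # authoritative PREDICT-API schema.
    body = json.dumps({"objectId": object_id}).encode("utf-8")
    return request.Request(
        f"{PREDICT_API}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The request is only built here, not sent.
req = build_prediction_request("0815-4711")
print(req.full_url, req.get_method())
```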
The management of the ML Pipeline is done via the command line application Kairos CLI.
Machine Learning Training
The ML Pipeline needs to be trained by means of reference objects and is managed via MLflow, an open-source platform for managing ML lifecycles.
Data Export
The source of the data for machine learning is a document management system, e.g., yuuvis® Momentum, in which users have manually defined the individual object types. The data exported from yuuvis® Momentum is stored on local storage or S3 in a format suitable for data ingestion. This data is used to train the models for the determination of predictions. The data has to be exported in a predefined format and made available to the provided training pipelines.
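To make the export step concrete, here is a minimal sketch that writes one exported object as a metadata file plus its PDF rendition. The one-folder-per-object layout and the file names are assumptions for illustration only; the actual predefined export format is defined by the Auto ML Platform.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def export_object(target: Path, object_id: str, metadata: dict, rendition: bytes) -> Path:
    """Write one exported object: metadata as JSON plus its PDF rendition.

    Layout and file names are hypothetical, not the platform's actual format.
    """
    folder = target / object_id
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "metadata.json").write_text(json.dumps(metadata, ensure_ascii=False))
    (folder / "rendition.pdf").write_bytes(rendition)
    return folder

with TemporaryDirectory() as tmp:
    folder = export_object(Path(tmp), "0815", {"objectTypeId": "invoice"}, b"%PDF-1.4")
    print(sorted(p.name for p in folder.iterdir()))  # ['metadata.json', 'rendition.pdf']
```

The same layout could be mirrored to an S3 bucket instead of local storage.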
Machine Learning Pipelines
The machine learning pipelines are components developed and shipped by OPTIMAL SYSTEMS GmbH. They contain all necessary procedures and algorithms to train machine learning models for different purposes (e.g., document classification and metadata extraction).
At the moment, pipelines can be used for document classification (for instance it can determine whether a document is an invoice, a contract, a sick-leave or something else) and for metadata extraction (for instance, extract the issuing date, total amount and invoice number from an invoice).
Document Classification
In the context of the AI platform, classification means the determination of suitable typification classes for an object based on its metadata and full-text rendition. For each object, one prediction is provided that contains mappings of classes and their corresponding relevance probabilities, as well as a reference to the object in yuuvis® Momentum via objectId.
Instead of the class names used internally in the ML Pipeline, the prediction response bodies provide the object types as referenced in the inference schema described below.
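The description above can be sketched in code: a classification prediction maps classes to relevance probabilities and references the object via its objectId. The field names in this sample body are assumptions shaped after the description, not the authoritative response schema.

```python
# Hypothetical classification prediction body (field names are illustrative).
prediction = {
    "objectId": "0815-4711",
    "predictions": [
        {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Rechnung",
         "probability": 0.87},
        {"objectTypeId": "appImpulse:contractsot",
         "probability": 0.09},
    ],
}

# A client would typically pick the class with the highest relevance probability.
best = max(prediction["predictions"], key=lambda p: p["probability"])
print(best["objectTypeId"], best["probability"])
```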
Metadata Extraction
The ML Pipeline can analyze the PDF renditions of binary content files assigned to document objects in yuuvis® Momentum in order to extract specific metadata. Based on the trained models, predictions for the values of specific object properties can be determined. The object properties have to be listed in the inference schema, where conditions for the values and settings for the prediction responses are also specified.
Inference Schema
The inference schema is a JSON configuration file defining the object types that will be available for the classification as well as the properties for which the metadata extraction should determine suitable values. At the same time, each internal aiObjectTypeId (aiPropertyId) gets a corresponding objectTypeId (propertyId) that will be used in the response bodies of the classification (extraction) endpoints in order to be compatible with the connected client application.
The inference schema is defined for a specific tenant. It is also possible to further limit the usage of the inference schema to an app by specifying appName (e.g., to distinguish between a client app for single uploads and batch processing apps).
{
"tenant" : "mytenant",
"appName" : "AIInvoiceClient",
"classification" : {
"enabled" : true,
"timeout" : 2,
"aiClassifierId" : "DOCUMENT_CLASSIFICATION",
"objectTypes": [
{
"objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Rechnung",
"aiObjectTypeId" : "INVOICE"
},
{
"objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Angebot",
"aiObjectTypeId" : "DOCUMENT_TYPE_2"
},
{
"objectTypeId" : "appImpulse:hrsot|appImpulse:receiptType|Bewerbung",
"aiObjectTypeId" : "DOCUMENT_TYPE_3"
}
]
},
"extraction" : {
"enabled" : true,
"timeout" : 5,
"objects" : [
{
"objectTypeId" : "invoice",
"enabled" : true,
"timeout" : 10,
"propertyReference" : [
{
"propertyId" : "companyName",
"aiPropertyId" : "INVOICE_COMPANY_NAME",
"allowedValues" : ["Company1", "Company2", "Company3"],
"pattern" : "/^[a-z]|\\d?[a-zA-Z0-9]?[a-zA-Z0-9\\s&@.]+$",
"validationService" : "my_company_name_validation_service",
"maxNumberOfPredictions" : 5
},
{
"propertyId" : "totalAmount",
"aiPropertyId" : "INVOICE_TOTAL_AMOUNT",
"pattern" : "^[0-9]*[.][0-9]*$",
"validationService" : "my_amounts_validation_service",
"maxNumberOfPredictions" : 1
}
]
}
]
}
}
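The mapping mechanics of the schema above can be illustrated with a short sketch: the internal class name returned by a model is translated to the client-facing object type ID before it appears in a response body. The snippet uses a trimmed copy of the example schema; the lookup logic is illustrative, not the pipeline's actual code.

```python
# Trimmed copy of the classification part of the example inference schema.
schema = {
    "classification": {
        "objectTypes": [
            {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Rechnung",
             "aiObjectTypeId": "INVOICE"},
            {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Angebot",
             "aiObjectTypeId": "DOCUMENT_TYPE_2"},
        ]
    }
}

# Build the internal-to-client translation table.
id_map = {m["aiObjectTypeId"]: m["objectTypeId"]
          for m in schema["classification"]["objectTypes"]}

# An internal prediction "INVOICE" is reported to the client as:
print(id_map["INVOICE"])
```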
The following parameters are available in the inference schema:
...
Name of the app that uses the inference schema.
If not specified for an app, the tenant schema will be used for that app.
...
Time limit for the determination of classification predictions in seconds.
The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.
...
A list of mappings defining the object types that are available for the classification prediction. Each mapping contains the following keys.
...
Time limit for the determination of extraction predictions in seconds.
The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.
...
Boolean value specifying whether the metadata extraction is activated (true) or deactivated (false) for the specific object type.
Ignored if extraction.enabled is set to false.
...
Optional time limit in seconds overwriting extraction.timeout for the determination of extraction predictions for properties belonging to the object type specified by objectTypeId.
The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.
...
Optional parameter: an integer value defining the maximum number of values included in the prediction response for the property propertyId.
If not specified, the default value 1 will be used.
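The per-property conditions (pattern, allowedValues, maxNumberOfPredictions) can be pictured as a filter over raw candidate values. The sketch below shows that behavior under stated assumptions; it is an illustration of the described semantics, not the pipeline's actual implementation.

```python
import re

def filter_predictions(candidates, pattern=None, allowed_values=None, max_n=1):
    """Keep only candidate values that match `pattern` and are contained in
    `allowed_values`, then cut off after `max_n` entries.
    A sketch of the described schema semantics, not the pipeline's code.
    """
    result = []
    for value in candidates:
        if pattern is not None and not re.fullmatch(pattern, value):
            continue
        if allowed_values is not None and value not in allowed_values:
            continue
        result.append(value)
    return result[:max_n]

# Candidates for totalAmount, checked against the pattern from the example schema.
print(filter_predictions(["129.90", "EUR 129,90", "0.00"],
                         pattern="^[0-9]*[.][0-9]*$", max_n=1))  # ['129.90']
```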
In order to combine the AI platform with yuuvis® client as reference implementation, the following inference schema is required:
{
"tenant" : "os__papi",
"appName" : "AIInvoiceClient",
"classification" : {
"enabled" : true,
"timeout" : 10,
"aiClassifierId" : "DOCUMENT_CLASSIFICATION",
"objectTypes": [
{
"objectTypeId" : "appImpulse:hrdocsot|appImpulse:hrDocumentType|Bescheinigung",
"aiObjectTypeId" : "appImpulse:contractsot"
},
{
"objectTypeId" : "appImpulse:receiptsot",
"aiObjectTypeId" : "appImpulse:hrdocsot"
},
{
"objectTypeId" : "appImpulse:contractsot|appImpulse:contractType|Arbeitsvertrag",
"aiObjectTypeId" : "appImpulse:receiptsot"
},
{
"objectTypeId" : "appImpulse:hrdocsot|appImpulse:hrDocumentType|Arbeitsvertrag",
"aiObjectTypeId" : "appImpulse:basedocumentsot"
}
]
}
}
Requirements
The Auto ML Pipeline is a part of the Auto ML Platform and can run only in combination with the other included services.
The ML Pipeline furthermore requires:
- S3 or local storage
If you want to use the ML Pipeline for the AI integration in yuuvis® client as reference implementation, the requirements of the CLIENT service also have to be considered.
Installation
The Auto ML Platform services including the ML Pipeline are not yet included in yuuvis® Momentum installations but only available on request.
Configuration
...
Model Evaluation
After the machine learning training is done, the model is evaluated. By examining the training results, the user decides whether the model is suitable for use or needs longer training, a larger data set, etc.
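The decision step can be sketched as a comparison of evaluation metrics against acceptance thresholds. The metric names and threshold values below are assumptions for illustration; in practice the user inspects the training results (e.g., in the MLflow UI) before promoting a model.

```python
# Hypothetical acceptance thresholds; real criteria depend on the use case.
THRESHOLDS = {"accuracy": 0.90, "f1": 0.85}

def is_suitable(metrics: dict) -> bool:
    """Return True if every tracked metric meets its acceptance threshold."""
    return all(metrics.get(name, 0.0) >= limit for name, limit in THRESHOLDS.items())

print(is_suitable({"accuracy": 0.93, "f1": 0.88}))  # meets both thresholds
print(is_suitable({"accuracy": 0.93, "f1": 0.80}))  # f1 below threshold
```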
Model Registry
Models that are suitable for further use are stored in the Model Registry component. From the Model Registry component, models can be dockerized and deployed to the serving infrastructure (typically to the same Kubernetes cluster in which yuuvis® Momentum is running).