ML Training Pipeline

Responsible for preparing training data, training machine learning models, evaluating the trained models, and preparing them for deployment to production.

Function

The ML Training Pipeline is the part of the Artificial Intelligence Platform responsible for data ingestion, data validation, data transformation, machine learning training, and model evaluation. The pipeline is based on MLflow, an open-source platform for managing the machine learning lifecycle.
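
The stages listed above run in sequence, with each stage consuming the output of the previous one. The following framework-free sketch illustrates only that ordering. It assumes simple in-memory records with hypothetical "text" and "label" fields, and it uses a trivial majority-class baseline in place of a real model. None of the function names come from the actual MLflow-based pipeline.

```python
from collections import Counter
from typing import Callable

Record = dict[str, str]  # hypothetical record shape: {"text": ..., "label": ...}


def ingest(raw: list[Record]) -> list[Record]:
    """Data ingestion: take exported records as-is (a real pipeline would read files)."""
    return list(raw)


def validate(records: list[Record]) -> list[Record]:
    """Data validation: drop records without full text or without a label."""
    return [r for r in records if r.get("text") and r.get("label")]


def transform(records: list[Record]) -> list[Record]:
    """Transformation: normalize the full text (lowercase, collapse whitespace)."""
    return [{**r, "text": " ".join(r["text"].lower().split())} for r in records]


def train(records: list[Record]) -> Callable[[str], str]:
    """Training: a majority-class baseline standing in for a real model."""
    majority = Counter(r["label"] for r in records).most_common(1)[0][0]
    return lambda text: majority


def evaluate_model(model: Callable[[str], str], records: list[Record]) -> float:
    """Evaluation: fraction of records whose label the model predicts correctly."""
    hits = sum(model(r["text"]) == r["label"] for r in records)
    return hits / len(records)


def run_pipeline(raw: list[Record]) -> float:
    data = transform(validate(ingest(raw)))
    model = train(data)
    # For brevity this evaluates on the training data; a real pipeline
    # would evaluate on a held-out split.
    return evaluate_model(model, data)
```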

Data Export

The source of data for machine learning is a document management system, e.g., yuuvis® Momentum. The data shall be exported in a predefined format and made available to the provided training pipelines.
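
The predefined export format is not specified here. The following sketch shows how a pipeline could check exported records before training. It assumes a hypothetical JSON Lines export in which each line is one object with "objectId", "objectType", and "text" fields. The actual format and field names may differ.

```python
import json

# Hypothetical export format: one JSON object per line (JSON Lines).
# These field names are assumptions for illustration only.
REQUIRED_FIELDS = ("objectId", "objectType", "text")


def load_export(lines: list[str]) -> tuple[list[dict], list[int]]:
    """Parse exported records; return the valid records and the 1-based numbers of rejected lines."""
    records, rejected = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(number)  # not valid JSON
            continue
        # Accept only JSON objects that carry a non-empty value for every required field.
        if isinstance(record, dict) and all(record.get(f) for f in REQUIRED_FIELDS):
            records.append(record)
        else:
            rejected.append(number)
    return records, rejected
```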

Machine Learning Pipelines

The machine learning pipelines are components developed and shipped by OPTIMAL SYSTEMS GmbH. They contain all necessary procedures and algorithms to train machine learning models for different purposes (e.g., document classification and metadata extraction).

At the moment, the pipelines can be used for document classification (for instance, determining whether a document is an invoice, a contract, a sick-leave certificate, or something else) and for metadata extraction (for instance, extracting the issue date, total amount, and invoice number from an invoice).

Document Classification

In the context of the AI platform, classification means determining the typification classes that fit an object, based on its full-text rendition. One prediction is provided per object. It contains the classes with their corresponding relevance probabilities and a reference to the object in yuuvis® Momentum via its objectId.
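
The following sketch shows the shape of one such prediction. It assumes that the model emits one raw score per class and that a softmax converts these scores into relevance probabilities that sum to 1. The softmax step and the field names "objectId" and "classes" are illustrative assumptions, not the platform's documented response format.

```python
import math


def build_prediction(object_id: str, scores: dict[str, float]) -> dict:
    """Build one classification prediction for one object (hypothetical field names)."""
    top = max(scores.values())  # subtract the max score for numerical stability
    exp = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exp.values())
    probabilities = {cls: v / total for cls, v in exp.items()}
    # List classes in order of descending relevance probability.
    ranked = dict(sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True))
    return {"objectId": object_id, "classes": ranked}
```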

Instead of the class names used internally in the ML Pipeline, the prediction response bodies provide the object types as referenced in the Inference Schema.
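
The following sketch illustrates the renaming step. The mapping passed in stands for a hypothetical excerpt of the Inference Schema, and the object type names in the usage example are made up. Two behaviors are assumptions of this sketch only: classes without a mapping are omitted, and probabilities of classes that map to the same object type are summed.

```python
def to_object_types(probabilities: dict[str, float],
                    class_to_object_type: dict[str, str]) -> dict[str, float]:
    """Replace internal class names with object types taken from the inference schema."""
    result: dict[str, float] = {}
    for cls, p in probabilities.items():
        object_type = class_to_object_type.get(cls)
        if object_type is None:
            continue  # assumption: unmapped classes are left out of the response
        # Assumption: several classes mapping to one object type are summed.
        result[object_type] = result.get(object_type, 0.0) + p
    return result
```

For example, with the hypothetical mapping {"invoice": "demo:Invoice", "contract": "demo:Contract"}, the internal probabilities {"invoice": 0.7, "contract": 0.2, "other": 0.1} become {"demo:Invoice": 0.7, "demo:Contract": 0.2}.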

Metadata Extraction

The ML Pipeline can analyze the PDF rendition of binary content files assigned to objects in yuuvis® Momentum in order to extract specific metadata. Based on the trained models, it predicts values for specific object properties. These object properties have to be listed in the Inference Schema, which also specifies conditions for the values and settings for the prediction responses.
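
The following sketch illustrates how conditions from the schema could constrain extracted values. It assumes that the model returns scored candidate values per property and that each property's condition consists of a regular expression ("pattern") and a confidence threshold ("minConfidence"). Both condition keys are hypothetical. Only properties listed in the conditions are considered, and for each one the highest-confidence candidate that satisfies both conditions is kept.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    value: str
    confidence: float


def select_values(candidates: dict[str, list[Candidate]],
                  conditions: dict[str, dict]) -> dict[str, str]:
    """Pick at most one value per property listed in the (hypothetical) schema conditions."""
    selected = {}
    for prop, rule in conditions.items():
        valid = [
            c for c in candidates.get(prop, [])
            if re.fullmatch(rule["pattern"], c.value) and c.confidence >= rule["minConfidence"]
        ]
        if valid:
            # Keep the most confident candidate that satisfies the conditions.
            selected[prop] = max(valid, key=lambda c: c.confidence).value
    return selected
```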

Model Evaluation

After training is complete, the model is evaluated. By examining the training results, the user decides whether the model is suitable for use or needs further work, such as longer training or a larger data set.
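
The following sketch computes the kind of figures that could inform this decision for a classification model: overall accuracy and per-class recall on a held-out test set. The acceptance rule and its thresholds are hypothetical placeholders, not product defaults. In practice, the user makes the decision.

```python
from collections import Counter


def evaluation_report(y_true: list[str], y_pred: list[str]) -> dict:
    """Compute overall accuracy and per-class recall on a held-out test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)  # number of test documents per true class
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "recall": {cls: correct[cls] / n for cls, n in support.items()},
    }


def looks_deployable(report: dict, min_accuracy: float = 0.9, min_recall: float = 0.8) -> bool:
    """Hypothetical acceptance rule: every class must also reach the recall threshold."""
    return report["accuracy"] >= min_accuracy and min(report["recall"].values()) >= min_recall
```

Checking the weakest class as well as overall accuracy matters because a model can score well on average while performing poorly on a rare document type.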

Model Registry

Models that are suitable for further use are stored in the Model Registry component. From there, models can be dockerized and deployed to the serving infrastructure (typically the same Kubernetes cluster on which yuuvis® Momentum is running).
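
The following toy, in-memory sketch illustrates the registry idea: numbered versions per model name, with one version marked as the production version. It does not reproduce MLflow's own Model Registry API, and the artifact paths in the usage example are hypothetical. Dockerizing and deployment are outside its scope.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str  # hypothetical location of the stored model files
    stage: str = "None"


@dataclass
class ModelRegistry:
    """Toy registry: numbered versions per model name, at most one in 'Production'."""
    models: dict[str, list[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, artifact_uri=artifact_uri)
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        """Archive the current production version and mark the given version as production."""
        for mv in self.models[name]:
            if mv.stage == "Production":
                mv.stage = "Archived"
        self.models[name][version - 1].stage = "Production"

    def production(self, name: str) -> Optional[ModelVersion]:
        return next((mv for mv in self.models.get(name, []) if mv.stage == "Production"), None)
```

For example, after registering two versions of "classifier" and promoting version 1 and then version 2, production("classifier") returns version 2, and version 1 is marked "Archived".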

Read on

Inference Schema

The inference schema is a structure that defines which actions are performed when making predictions and how they are performed.

KAIROS-API Service

This service of the AI platform provides the API for configuring the AI platform, for example, the schema that binds the ML extractors to fields in the object schema.

PREDICT-API Service

This service of the AI platform provides the API for retrieving the typification predictions determined by the Machine Learning (ML) Pipeline.