...

The ML Pipeline needs to be trained with reference objects that are stored in a document management system, e.g., yuuvis® Momentum, and for which users have manually defined the individual object type. The data exported from yuuvis® Momentum is stored on local storage or S3 in a format suitable for data ingestion. This data is used to train the models that determine the predictions.

...

In the context of the AI platform, classification means the determination of suitable typification classes for an object based on its full-text rendition. For each object, one prediction is provided that contains mappings of classes to their corresponding relevance probabilities as well as a reference to the object in yuuvis® Momentum via objectId.

Instead of the class names used internally in the ML Pipeline, the prediction response bodies provide the object types as referenced in the inference schema described below.
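
The translation from internal class names to object types can be pictured as a simple lookup. The following Python sketch is illustrative only: the shape of the raw result (a dictionary of internal class names and probabilities), the function name translate_prediction and the handling of unmapped classes are assumptions. Only the keys objectTypeId and aiObjectTypeId are taken from the inference schema.

```python
from typing import Dict, List


def translate_prediction(
    object_types: List[Dict[str, str]],
    raw_scores: Dict[str, float],
) -> Dict[str, float]:
    """Replace internal class names by the objectTypeId values of the schema.

    object_types: the "objectTypes" list of an inference schema.
    raw_scores:   hypothetical internal result, class name -> probability.
    Classes without a mapping in the schema are dropped (an assumption).
    """
    lookup = {m["aiObjectTypeId"]: m["objectTypeId"] for m in object_types}
    return {
        lookup[ai_id]: probability
        for ai_id, probability in raw_scores.items()
        if ai_id in lookup
    }


object_types = [
    {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Rechnung",
     "aiObjectTypeId": "INVOICE"},
    {"objectTypeId": "appImpulse:receiptsot|appImpulse:receiptType|Angebot",
     "aiObjectTypeId": "DOCUMENT_TYPE_2"},
]
translated = translate_prediction(
    object_types, {"INVOICE": 0.9, "DOCUMENT_TYPE_2": 0.1, "UNMAPPED": 0.0}
)
```

The actual prediction response body of the AI platform may be structured differently; the sketch only shows the direction of the mapping.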

Metadata Extraction

The ML Pipeline can analyze the PDF rendition of binary content files assigned to document objects in yuuvis® Momentum in order to extract specific metadata. Based on the trained models, predictions for the values of specific object properties can be determined. The object properties have to be listed in the inference schema, where conditions for the values and settings for the prediction responses are also specified.

...

Example for an inference schema:
{
    "tenant" : "mytenant",
    "appName" : "AIInvoiceClient",
    "classification" : {
        "enabled" : true,
        "timeout" : 2,
        "aiClassifierId" : "DOCUMENT_CLASSIFICATION",
        "objectTypes": [
            {
                "objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Rechnung",
                "aiObjectTypeId" : "INVOICE"
            },
            {
                "objectTypeId" : "appImpulse:receiptsot|appImpulse:receiptType|Angebot",
                "aiObjectTypeId" : "DOCUMENT_TYPE_2"
            },
            {
                "objectTypeId" : "appImpulse:hrsot|appImpulse:receiptType|Bewerbung",
                "aiObjectTypeId" : "DOCUMENT_TYPE_3"
            }
        ]
    },
    "extraction" : {
        "enabled" : true,
        "timeout" : 5,
        "objects" : [
            {
                "objectTypeId" : "invoice",
                "enabled" : true,
                "timeout" : 10,
                "propertyReference" : [
                    {
                        "propertyId" : "companyName",
                        "aiPropertyId" : "INVOICE_COMPANY_NAME",
                        "allowedValues" : ["Company1", "Company2", "Company3"],
                        "pattern" : "/^[a-z]|\\d?[a-zA-Z0-9]?[a-zA-Z0-9\\s&@.]+$",
                        "validationService" : "my_company_name_validation_service",
                        "maxNumberOfPredictions" : 5
                    },
                    {
                        "propertyId" : "totalAmount",
                        "aiPropertyId" : "INVOICE_TOTAL_AMOUNT",
                        "pattern" : "^[0-9]*[.][0-9]*$",
                        "validationService" : "my_amounts_validation_service",
                        "maxNumberOfPredictions" : 1
                    }
                ]
            }
        ]
    }
}

...

Parameter and description

tenant

Tenant for which the inference schema will be applied.

appName

Optional parameter: name of the app that uses the inference schema. If not specified for an app, the tenant schema will be used for that app. Other apps within the tenant cannot use this inference schema but only their own app-specific inference schema or the tenant-wide inference schema.

classification

Section of parameters for classification processes.

enabled

Boolean value specifying whether the document classification is activated (true) or deactivated (false).

timeout

Time limit in seconds for the determination of classification predictions.

An error will be thrown if the calculation process could not be finished before the timeout was reached.

aiClassifierId

ID in the AI platform dictionary defining the model that will be used for the classification process.

objectTypes

A list of mappings, each of them containing the following keys. This list defines the object types that are available for the classification prediction.

objectTypeId

The identification of an object type as it will appear in prediction response bodies. You can define a concatenation of several secondary object type IDs, catalog values etc. that can be interpreted by your client application to show the prediction results in a proper format.

aiObjectTypeId

ID of the internal class used within the Auto ML platform, especially in its dictionary.

extraction

Section of parameters for metadata extraction processes.

enabled

Boolean value specifying whether the metadata extraction is activated (true) or deactivated (false).

timeout

Time limit for the determination of extraction predictions in seconds.

The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.

objects

List of mappings for the individual object types, each of them containing the following keys. This list defines the object types for which metadata extraction will be available.

objectTypeId

The ID of the object type as it is referenced within each object's metadata in the property system:objectTypeId. This property has to be set during object creation in yuuvis® Momentum and is thus always assigned to any object to be processed. The available object types are defined in the yuuvis® Momentum schema.

enabled

Boolean value specifying whether the metadata extraction is activated (true) or deactivated (false) for the specific object type.

Ignored if extraction.enabled is set to false.

timeout

Optional time limit in seconds overwriting extraction.timeout for the determination of extraction predictions for properties belonging to the object type specified by objectTypeId.

The result will be returned even if the calculation process is still running for some models. Those models will be excluded from the response.

propertyReference

A list of mappings, each of them containing the following keys. This list defines the properties for which metadata should be extracted for an object of type objectTypeId.

propertyId

The ID of a property available for an object's metadata if it has the object type objectTypeId. If the property is defined as optional, it is not necessarily already assigned to the corresponding object. The available properties have to be defined in a property definition and referenced in the object type definition in the yuuvis® Momentum schema.

aiPropertyId

ID of the internal property used within the Auto ML platform, especially in its dictionary.

allowedValues

Optional limitation of the prediction response: List of values for the property specified by propertyId. Only values specified in this list are allowed as prediction results of the metadata extraction.

pattern

Optional limitation of the prediction response: Condition for values of the property specified by propertyId. Only values matching the condition are allowed as prediction results of the metadata extraction.

validationService

Optional parameter: URL of an endpoint for further validation of the value determined for the property specified by propertyId.

Note: Not available in the beta version where the connection of an additional validation service needs more configuration steps.

maxNumberOfPredictions

Optional parameter: An integer value defining the maximum number of values included in the prediction response for the property propertyId.

If not specified, the default value 1 will be used.
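
How the extraction parameters described above interact can be sketched in Python. This is a minimal illustration and not the platform's implementation: the function names effective_timeout and filter_candidates are hypothetical, the candidate list is assumed to be ordered by descending relevance, the enabled keys are assumed to be present as in the example schema, and pattern is assumed to be evaluated like a regular expression with Python's re.search, which may differ from the engine actually used.

```python
import re
from typing import Any, Dict, List, Optional


def effective_timeout(extraction: Dict[str, Any], object_type_id: str) -> Optional[int]:
    """Return the timeout in seconds for an object type, or None if extraction is off."""
    if not extraction["enabled"]:
        return None  # the object-level "enabled" is ignored in this case
    for obj in extraction["objects"]:
        if obj["objectTypeId"] == object_type_id:
            if not obj["enabled"]:
                return None
            # The optional object-level timeout overwrites extraction.timeout.
            return obj.get("timeout", extraction["timeout"])
    return None  # object type is not listed in the schema


def filter_candidates(prop: Dict[str, Any], candidates: List[str]) -> List[str]:
    """Apply allowedValues, pattern and maxNumberOfPredictions to candidate values."""
    allowed = prop.get("allowedValues")
    pattern = prop.get("pattern")
    limit = prop.get("maxNumberOfPredictions", 1)  # documented default: 1
    kept = []
    for value in candidates:
        if allowed is not None and value not in allowed:
            continue
        if pattern is not None and re.search(pattern, value) is None:
            continue
        kept.append(value)
    return kept[:limit]


total_amount = {
    "propertyId": "totalAmount",
    "aiPropertyId": "INVOICE_TOTAL_AMOUNT",
    "pattern": "^[0-9]*[.][0-9]*$",
    "maxNumberOfPredictions": 1,
}
extraction = {
    "enabled": True,
    "timeout": 5,
    "objects": [
        {"objectTypeId": "invoice", "enabled": True, "timeout": 10,
         "propertyReference": [total_amount]},
    ],
}
timeout = effective_timeout(extraction, "invoice")
amounts = filter_candidates(total_amount, ["12,50", "12.50", "13.00"])
```

With the values from the example schema, the object-level timeout of 10 seconds takes precedence over extraction.timeout, and only the first candidate matching the pattern is kept because maxNumberOfPredictions is 1.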

...

Inference schema for the combination with yuuvis® Momentum CLIENT service:
{
    "tenant" : "os__papi",
    "appName" : "AIInvoiceClient",
    "classification" : {
        "enabled" : true,
        "timeout" : 10,
        "aiClassifierId" : "DOCUMENT_CLASSIFICATION",
        "objectTypes": [
            {
                "objectTypeId" : "appImpulse:hrdocsot|appImpulse:hrDocumentType|Bescheinigung",
                "aiObjectTypeId" : "appImpulse:contractsot"
            },
            {
                "objectTypeId" : "appImpulse:receiptsot",
                "aiObjectTypeId" : "appImpulse:hrdocsot"
            },
            {
                "objectTypeId" : "appImpulse:contractsot|appImpulse:contractType|Arbeitsvertrag",
                "aiObjectTypeId" : "appImpulse:receiptsot"
            },
            {
                "objectTypeId" : "appImpulse:hrdocsot|appImpulse:hrDocumentType|Arbeitsvertrag",
                "aiObjectTypeId" : "appImpulse:basedocumentsot"
            }
        ]
    }
}
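
A client application has to interpret the concatenated objectTypeId values itself. The following Python sketch shows one possible reading, assuming that the parts are separated by "|" and stand for a secondary object type ID, a property ID and a catalog value, as the example values suggest. The names TypeHint and parse_object_type_id are hypothetical, and how the yuuvis® Momentum CLIENT service actually evaluates these strings is not described here.

```python
from typing import NamedTuple, Optional


class TypeHint(NamedTuple):
    secondary_object_type: str
    property_id: Optional[str]  # None if only an object type ID is given
    value: Optional[str]        # None if only an object type ID is given


def parse_object_type_id(object_type_id: str) -> TypeHint:
    """Split an objectTypeId into its assumed parts (separator "|")."""
    parts = object_type_id.split("|")
    if len(parts) == 1:
        return TypeHint(parts[0], None, None)
    if len(parts) == 3:
        return TypeHint(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected objectTypeId format: {object_type_id!r}")


plain = parse_object_type_id("appImpulse:receiptsot")
contract = parse_object_type_id(
    "appImpulse:contractsot|appImpulse:contractType|Arbeitsvertrag"
)
```

Rejecting any other number of parts is a design choice of this sketch; a real client might support further concatenation formats.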

...