application-storage.yml

Configuration file containing the parameters and archive profiles that the ARCHIVE and REPOSITORY services use to connect to external archives.

Characteristics

Configuration File Name: application-storage.yml
Referenced by Services: archive, repository
Storage Location: Git root directory

File Structure

The file application-storage.yml is structured as follows:

storage:
  repositories:
    <repositoryId>:
      profiles: ['<archiveProfile>']
      <parameter>: <value>
  profiles:
    <archiveDriverName>:
      <archiveProfile>:
        <parameter>: <value>

Define your repositories via repositoryId. For each of them, specify an archive profile and thus define the type of archive driver for the corresponding repository. Set the parameters url and useDiscovery as described below.

For each type of archive driver identified by archiveDriverName, define one or more archive profiles. The name of each archive profile must be unique within the configuration file. Depending on the archive type, specific configuration parameters have to be set, as described below.
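As a minimal sketch of how the two sections fit together, the following fragment fills in the placeholders with the example values used elsewhere on this page (repositoryId s3, archive driver s3, archive profile s3profile1). The credentials, URL, and bucket name are placeholders, not recommended values.

```yaml
storage:
  repositories:
    # repositoryId
    s3:
      # references the archive profile defined under storage.profiles.s3
      profiles: ['s3profile1']
      default: true
  profiles:
    # archiveDriverName
    s3:
      # archiveProfile (name must be unique within this file)
      s3profile1:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'dmscloudrepodocker'
```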

Parameters

Parameter | Description | Example

storage.default-rendition-repository (as of 2021 Winter)


Specify the ID of a default rendition repository. It is used to store renditions if no repository is specified in the corresponding request header. As of version 2021 Winter, only S3 storage is supported. The repository has to be defined in the storage.repositories section.

It is recommended to configure a separate repository for the storage of renditions because the requirements for storing renditions differ from those for binary content files. For example, retention is not supported for renditions.

renrepo

storage.repositories

Configuration of the repositories used by the REPOSITORY service. The repositoryId of each repository is defined via the key (e.g., s3 or netapp). For each repository, the following parameters have to be specified.

s3:
  profiles: ['s3profile1']
  default: true
netapp:
  profiles: ['netapplike']
  url:  http://archive/api/profiles/{profile}/dms/objects
  useDiscovery: true

profiles

A list of archive profile names, each of them referencing an archive profile defined in the storage.profiles section. The type of archive must be the same in the archive profile definition and in the repository configuration.

For any repository that will be used as rendition repository, an S3 profile has to be specified.

Note: Only the first list entry is considered. Any further list entries are ignored.

url

Address of the repository accessible via the ARCHIVE service. The archive profile can be referenced as a dynamic component in curly brackets (e.g., http://archive/api/profiles/{profile}/dms/objects).

Not required for a repository of type s3 (e.g., a rendition repository) since S3 storage access is implemented in the REPOSITORY service itself.

useDiscovery

Boolean value specifying whether the given url contains a reference that is resolved via discovery (true) or is an absolute address (false).

Not required for a repository of type s3 (e.g., a rendition repository) since S3 storage access is implemented in the REPOSITORY service itself.

predicate

Available as of 2023 Autumn. Specifies a condition on any properties of the target DMS objects before normalization. If an imported DMS object matches the condition, its binary content is stored in the corresponding repository instead of the default repository.

If a repositoryId is explicitly specified in a content-related request, that value takes precedence and the predicate is not evaluated.

Default value: 'spel:false'

default

Optional: Boolean value specifying whether the repository is the default repository (true) or not (false). Only one repository can be the default repository.

The default repository will be used for each content-related request where no repositoryId is explicitly specified.

The default repository is NOT used as default rendition repository if storage.default-rendition-repository is missing.
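The interplay of default and storage.default-rendition-repository can be sketched as follows. The repositoryId renrepo is taken from the example above. The profile name s3renditions and its bucket name are hypothetical, and the credentials are placeholders.

```yaml
storage:
  # used for renditions if the request header specifies no repository
  default-rendition-repository: renrepo
  repositories:
    s3:
      profiles: ['s3profile1']
      default: true                 # default repository for content-related requests
    renrepo:
      profiles: ['s3renditions']    # hypothetical profile name; has to be an S3 profile
  profiles:
    s3:
      s3profile1:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'dmscloudrepodocker'
      s3renditions:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'renditions'        # hypothetical bucket name
```

Neither s3 repository needs url or useDiscovery. Without the default-rendition-repository entry, the repository marked as default would not take over the storage of renditions.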

storage.profiles

Archive profiles for each archive driver. Multiple profiles can be defined for each archive driver. The name of each profile is defined via the corresponding key (e.g., s3profile1 or netapplike). Each archive profile name has to be unique within the configuration file and can belong to only one archive driver type.

The parameters for the configuration of an archive profile depend on the archive driver indicated by the corresponding archiveDriverName (e.g., s3 or netapp). For the supported archive drivers, the configuration parameters are listed below.

Note: An S3 archive profile configuration is required for the usage of a rendition repository.

s3:
  s3profile1:
    access-key: 'MGMWCOYTDUSLNCFE'
    secret-key: 'changeme'
    url: 'http://minio.infrastructure:9000'
    bucket: 'dmscloudrepodocker'
netapp:
  netapplike:
    # use mountpoint of the persistent volume claim that provides the netapp storage
    # must be the same in the deployment of the archive app
    volume: '/var/lib/netapp/data'
    defaultRetentionInDays: 10

The values for the parameters can be modified as described in Configuring Services using Profiles.

Parameters in Archive Profiles

Depending on the archive driver, the archive profiles have to be configured with specific parameters.

Drivers for the following archives are available:

  • S3 (s3)
  • NetApp (netapp)
  • iCAS (iternity)
  • Hitachi Content Platform (hcp_s3)
  • Cloudian HyperStore (cloudian_s3)
  • DELL S3 (ecs-s3)
  • File System (filesystem)
  • Azure:
    • Driver for Usage of Blob Store (azure_blobstore)
    • Driver for Usage of Object Retention (azure_objectretention)

General Parameters

Retention

The defaultRetentionInDays parameter can be set in all of the following archive profiles and is only relevant for the ARCHIVE service. Its value is used as the retention time for objects that have no retention time specified in their metadata during import. If defaultRetentionInDays is set to 0, which is also the default value, no retention is set for those objects. If the profile is used for a rendition repository, the parameter is ignored because renditions cannot be under retention.

Path Templates

Many of the archive profiles listed below allow you to configure a pathTemplate that specifies a path structure within the storage.

The following variables can be used. They are referenced with curly brackets within the pathTemplate value (see example below).

  • tenant, system:tenant
  • contentStreamId, system:contentStreamId
  • objectTypeId, system:objectTypeId

With the substring operation, a specific part of a string can be referenced. For example, to reference the first two characters of the contentStreamId, use {contentStreamId.substring(0,2)}. The second index is excluded. If only one index is specified, the characters beginning from that index are referenced. For example, using 'test'.substring(2) would result in st.

In some cases, the DATE operation can be useful to reference the current date stamp. A parameter passed to DATE is interpreted as a format string as defined by SimpleDateFormat. Without a parameter, the format yyyyMMdd is used.

Example:

  • Consider the pathTemplate configuration {tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}
  • The currently logged-in user belongs to the yuuvistest tenant.
  • The user stores a binary content file on 21 DEC 2022 at 4:11 pm.
  • The binary content file gets 7850BB1A-F749-11E8-A21E-49DDF2475266 as value for the contentStreamId.
  • The resulting archive path is yuuvistest/2022/12/21/78/.
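Within the configuration file, the template from the example above would be set on an archive profile. The sketch below assumes the S3 profile s3profile1 with placeholder credentials. The value is quoted because it starts with a curly bracket, which YAML would otherwise read as the start of a flow mapping.

```yaml
storage:
  profiles:
    s3:
      s3profile1:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'dmscloudrepodocker'
        # tenant yuuvistest, stored on 21 DEC 2022,
        # contentStreamId 7850BB1A-F749-11E8-A21E-49DDF2475266
        # -> yuuvistest/2022/12/21/78/
        pathTemplate: '{tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}'
```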

Parameters for S3 Archive Profiles

Use s3 as value for archiveDriverName.

Parameter | Description | Default Value

access-key

Access key

-

secret-key

Password

-

url

URL for S3

-

bucket

Name of the bucket in the archive system for filing.

-

defaultRetentionInDays

Retention time in days.

Note: Objects under retention can still be deleted by administrators having direct access to the S3 storage.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

If the profile is used for a repository used as rendition repository, the parameter is ignored for the storage of renditions as they cannot be under retention. However, if binary content files are stored in the same repository beside the renditions, the defaultRetentionInDays value is NOT ignored for the storage of binary content files.

0

region
(as of 2022 Autumn)

Specifies the location of the data center in which new buckets are created.
If not specified, it is determined automatically via the S3 method GetBucketLocation.

-

objectLock
(as of 2022 Autumn)

Boolean value that decides if the archive-internal retention (objectLock) is required (true) or optional (false) for the corresponding bucket.

If true is set:

  • Binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention can only be stored in buckets with activated ObjectLocking.
  • Buckets automatically created by the ARCHIVE service will have activated objectLock.

If false is set:

  • For binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention stored in buckets with activated ObjectLocking, an archive-internal retention will be set.
  • For binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention stored in buckets with deactivated ObjectLocking, no archive-internal retention will be set. Thus, binary content files can only be protected from manipulation via yuuvis® Momentum endpoints but not from manipulation by direct storage access.
  • Buckets automatically created by the ARCHIVE service will have deactivated objectLock.

In both cases: For objects without retention specified via secondary object type (SOT) system:rmDestructionRetention, no archive-internal retention is set. 

true

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.

If the profile is used for a rendition repository, use only the contentStreamId as a placeholder in the specified path. Do not include a DATE reference: the path would change with each rendition update, so instead of the rendition being updated, a new rendition would be created and the previous rendition would have to be deleted.

-

retentionMode

Optional parameter to select a retention mode for S3 Object Lock.

Available values:

  • COMPLIANCE - objects under retention cannot be modified/deleted even by a storage administrator.
  • GOVERNANCE - storage users with specific permissions can modify/delete objects under retention.

COMPLIANCE
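An S3 archive profile that sets all parameters from the table above might look like the following sketch. The region value is an assumed example, the credentials are placeholders, and the pathTemplate uses only the contentStreamId so that the profile would also be suitable for a rendition repository.

```yaml
storage:
  profiles:
    s3:
      s3profile1:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'dmscloudrepodocker'
        region: 'eu-central-1'        # assumed example value
        defaultRetentionInDays: 0     # default: no retention for objects without own retention time
        objectLock: true              # default: archive-internal retention is required
        retentionMode: GOVERNANCE     # overrides the default COMPLIANCE
        pathTemplate: '{contentStreamId.substring(0,2)}'
```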

Parameters for NetApp Archive Profiles

Use netapp as value for archiveDriverName.

Parameter | Description | Default Value

volume

Archive location for the data to be saved.

-

defaultRetentionInDays

Optional: Retention time in days.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0
enableReadOnly

Optional: Defines whether the archive driver marks the stored data as read-only.

true

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.

-
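As a sketch, a NetApp repository and its profile could be configured as follows. The repository and volume values are taken from the examples earlier on this page. The pathTemplate value is an assumption chosen for illustration.

```yaml
storage:
  repositories:
    netapp:
      profiles: ['netapplike']
      url: http://archive/api/profiles/{profile}/dms/objects
      useDiscovery: true
  profiles:
    netapp:
      netapplike:
        # mount point of the persistent volume claim that provides the NetApp storage
        volume: '/var/lib/netapp/data'
        defaultRetentionInDays: 10
        enableReadOnly: true          # default
        pathTemplate: '{tenant}/{DATE(yyyy)}/{contentStreamId.substring(0,2)}'   # assumed example
```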

Parameters for iCAS Archive Profiles

Use iternity as value for archiveDriverName.

Parameter | Description | Default Value

userName

Name of the user with the appropriate rights for the archive.

-

userPassword

The user's password.

-

endpoint

URL of the iCAS web service.

-

cscMode

Storage mode for objects and metadata.

The storage mode is specified as a sequence of four characters, one per position:

Position | Applies to | Setting | Available values
1 | Binary content file | Compression | S (without), L (with)
2 | Binary content file | Encryption | N (without), S (standard), A (AES 256)
3 | Metadata | Compression | S (without), L (with)
4 | Metadata | Encryption | N (without), S (standard), A (AES 256)

Example:

LSSN

The binary content file is compressed (L) and encrypted with the standard method (S); the metadata is not compressed (S) and not encrypted (N).

-

maxCreateCscSize

Determines the maximum size of containers.

The information is specified in bytes.

10000000
maxCreateCscFile

Determines the maximum number of objects for the containers.

1000
maxCreateCscSingleFileLimit

Determines the individual size limit of an object.

4000000
maxWorkChunkSize

Determines the maximum size of a single chunk.

The information is specified in bytes.

5000000

clientSslTrustStore

Optional: Resource path to a certificate trust store for encrypted communication with the web service.

-

clientSslTrustStorePassword

Optional: Password for the certificate trust store.

-

defaultRetentionInDays

Retention time in days.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0
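The iCAS parameters above could be combined as in the following sketch. The profile name icasprofile1, the user name, the password, and the endpoint URL are hypothetical placeholders. The numeric values are the documented defaults.

```yaml
storage:
  profiles:
    iternity:
      icasprofile1:                            # hypothetical profile name
        userName: 'archiveuser'                # placeholder
        userPassword: 'changeme'               # placeholder
        endpoint: 'https://icas.example.com'   # placeholder URL of the iCAS web service
        # binary content: compressed (L), standard encryption (S)
        # metadata: not compressed (S), not encrypted (N)
        cscMode: 'LSSN'
        maxCreateCscSize: 10000000             # bytes (default)
        maxCreateCscFile: 1000                 # default
        maxCreateCscSingleFileLimit: 4000000   # default
        maxWorkChunkSize: 5000000              # bytes (default)
        defaultRetentionInDays: 0              # default: no retention
```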

Parameters for Hitachi Content Platform Archive Profiles

Use hcp_s3 as value for archiveDriverName.

Parameter | Description | Default Value

access-key

Access key

-

secret-key

Password

-

url

URL for HCP

-

bucket

Name of the bucket in the archive system for filing.

-

defaultRetentionInDays

Retention time in days.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.

-
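A Hitachi Content Platform profile follows the same pattern as an S3 profile, but under the hcp_s3 driver key. In this sketch, the profile name hcpprofile1 as well as all credential, URL, and bucket values are hypothetical placeholders.

```yaml
storage:
  profiles:
    hcp_s3:
      hcpprofile1:                          # hypothetical profile name
        access-key: 'changeme'              # placeholder
        secret-key: 'changeme'              # placeholder
        url: 'https://hcp.example.com'      # placeholder
        bucket: 'archivebucket'             # placeholder
        defaultRetentionInDays: 0           # default: no retention
        pathTemplate: '{tenant}/{contentStreamId.substring(0,2)}'
```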


Parameters for Cloudian HyperStore Archive Profiles

Cloudian HyperStore provides an AWS-S3-compatible REST API with a few extensions that are used by the ARCHIVE service. To configure a Cloudian HyperStore archive, set the following (S3-relevant) parameters.

Use cloudian_s3 as value for archiveDriverName.

Parameter | Description | Default Value

access-key

Access key

-

secret-key

Password

-

url

URL for Cloudian HyperStore

-

bucket

Name of the bucket in the archive system for filing.

-

defaultRetentionInDays

Retention time in days.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.

-

Parameters for DELL S3 Archive Profiles

Available as of 2022 Autumn. Use ecs-s3 as value for archiveDriverName.

Parameter | Description | Default Value

access-key

Access key

-

secret-key

Password

-

url

URL for S3

-

bucket

Name of the bucket in the archive system for filing.

-

defaultRetentionInDays

Retention time in days.

Note: Objects under retention can still be deleted by administrators having direct access to the S3 storage.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

If the profile is used for a repository used as rendition repository, the parameter is ignored for the storage of renditions as they cannot be under retention. However, if binary content files are stored in the same repository beside the renditions, the defaultRetentionInDays value is NOT ignored for the storage of binary content files.

0

region

Specifies the location of the data center in which new buckets are created.
If not specified, it is determined automatically via the S3 method GetBucketLocation.

-

objectLock

Boolean value that decides if the archive-internal retention (objectLock) is required (true) or optional (false) for the corresponding bucket.

If true is set:

  • Binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention can only be stored in buckets with activated ObjectLocking.
  • Buckets automatically created by the ARCHIVE service will have activated objectLock.

If false is set:

  • For binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention stored in buckets with activated ObjectLocking, an archive-internal retention will be set.
  • For binary content files under retention specified via secondary object type (SOT) system:rmDestructionRetention stored in buckets with deactivated ObjectLocking, no archive-internal retention will be set. Thus, binary content files can only be protected from manipulation via yuuvis® Momentum endpoints but not from manipulation by direct storage access.
  • Buckets automatically created by the ARCHIVE service will have deactivated objectLock.

In both cases: For objects without retention specified via secondary object type (SOT) system:rmDestructionRetention, no archive-internal retention is set. 

true

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.

If the profile is used for a rendition repository, use only the contentStreamId as a placeholder in the specified path. Do not include a DATE reference: the path would change with each rendition update, so instead of the rendition being updated, a new rendition would be created and the previous rendition would have to be deleted.

-

retentionMode

Optional parameter to select a retention mode for S3 Object Lock.

Available values:

  • COMPLIANCE - objects under retention cannot be modified/deleted even by a storage administrator.
  • GOVERNANCE - storage users with specific permissions can modify/delete objects under retention.

COMPLIANCE

Parameters for File System Profiles

Use filesystem as value for archiveDriverName.

Parameter | Description | Default Value

volume

Storage container location.

-
pathTemplate

This parameter is essential: it defines the directories (paths) within the volume in which objects are stored.

{tenant}/DOCUMENT/{contentStreamId.substring(1,3)}/{contentStreamId.substring(3,5)}/{contentStreamId.substring(5,6)}

Each binary content file stored in a file system repository is split into chunks. Each chunk is named <contentStreamId>_<count>. All chunks belonging to the same binary content file are packed into one ZIP file. Additionally, each ZIP file contains a meta.xml file with processing information for the stored binary content file:

  • file name
  • version
  • mime type
  • unpacked file size
  • SHA-256 hash calculated by the REPOSITORY service
  • MD5 hash calculated by the REPOSITORY service

Example meta.xml

<?xml version="1.0" encoding="UTF-8"?>
<xmlmeta>
    <filename>content.eml</filename>
    <version>1.0</version>
    <mime>message/rfc822</mime>
    <size>156836</size>
    <sha256>0BA8B60F0A1D16DEB44D9EE427A75FADE6ADF751312BE9D22D7D4ED2CC3B2DA6</sha256>
    <md5>BF8782D1AD01CDF211AD09038AB64349</md5>
    <chunksize>156836</chunksize>
</xmlmeta>

The number of chunks is determined by the file size of the binary content file to be stored:

Size of binary content file | Number of chunks | Size of each chunk
<= 1 MB | 1 | = size of binary content file
> 1 MB to <= 10 MB | size of binary content file / 1 MB | = 1 MB (minimum chunk size)
> 10 MB to < 1280 MB (128 MB is the maximum chunk size) | 10 | = size of binary content file / number of chunks + 1 Byte
> (x - 1) • 1280 MB to <= x • 1280 MB, where x = 2 ... 2n, n ∈ ℕ | 10 • x | = size of binary content file / number of chunks + 1 Byte
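A file system profile needs only the two parameters listed at the beginning of this section. In the following sketch, the profile name fsprofile1 and the volume path are hypothetical. The pathTemplate shown is the documented default value, written out explicitly.

```yaml
storage:
  profiles:
    filesystem:
      fsprofile1:                            # hypothetical profile name
        volume: '/var/lib/filesystem/data'   # hypothetical storage container location
        pathTemplate: '{tenant}/DOCUMENT/{contentStreamId.substring(1,3)}/{contentStreamId.substring(3,5)}/{contentStreamId.substring(5,6)}'
```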

Parameters for Azure Blob Storage Profiles

Available as of 2022 Winter. Use azure_blobstore as value for archiveDriverName.

Parameter | Description | Default Value

connection

Connection string containing all required information for the access to Azure.

-
defaultRetentionInDays

Retention time in days.

Note: Objects under retention can still be deleted by administrators having direct access to the storage.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0
pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.


buckets

A map containing one or more configurations.

Specify a unique configuration name as configuration key. For each key, the name and retentionInYears parameters are required.

Whenever the content file of an object is stored in Azure, the ARCHIVE service selects the container with the lowest retentionInYears value that still ensures the retention for that concrete object (either scheduled via system:rmExpirationDate object property or via defaultRetentionInDays). The name of the selected bucket is used as prefix for the contentStreamId that is generated for the object content file.

-

name

Name of the container in Azure.

Note: The container has to exist.



retentionInYears

Retention time that is configured in Azure for the container specified by name.

If retentionInYears is 0, the corresponding container does not have archive-internal retention. It is used to store objects without retention (neither scheduled via system:rmExpirationDate object property nor via defaultRetentionInDays).

Note: The specified retention time has to match the Azure container configuration.


Example: Consider the configuration below. If the content file of an object with a scheduled retention time of 2 years is stored, the blob20years container is used because it has the lowest retentionInYears value that still covers 2 years.

Example Azure Blob Storage Profile Configuration
storage:
  repositories:
    ...
  profiles:
    ...
    azure_blobstore:
      azureImmutable1:
        connection: 'DefaultEndpointsProtocol=https;AccountName=immutablestorageos;AccountKey=AAAAAAAAAA==;EndpointSuffix=core.windows.net'
        defaultRetentionInDays: 0
        buckets:
          'blob1':
            name: 'blobNoRetention'
            retentionInYears: 0
          'blob2':
            name: 'blob1year'
            retentionInYears: 1
          'blob3':
            name: 'blob20years'
            retentionInYears: 20

Parameters for Azure Profiles with Object Retention

Available as of 2022 Winter. Use azure_objectretention as value for archiveDriverName.

Parameter | Description | Default Value

connection

Connection string containing all required information for the access to Azure.

-
defaultRetentionInDays

Retention time in days.

Note: Objects under retention can still be deleted by administrators having direct access to the storage.

To transmit no retention time, set the value 0.

Scheduled retention time (system:rmExpirationDate) has priority over any times specified here.

0
retentionMode

Optional parameter to select an Azure immutability policy lock similar to the retention mode for S3 Object Lock.

Available values:

  • COMPLIANCE - objects under retention cannot be modified/deleted even by a storage administrator. Corresponds to Azure immutability policy LOCKED.
  • GOVERNANCE - storage users with specific permissions can modify/delete objects under retention. Corresponds to Azure immutability policy UNLOCKED.

COMPLIANCE

pathTemplate

Optional: This parameter can be used to store objects in specific directories (paths) within a bucket.


bucket

Name of the bucket in the archive system for filing.

-

Example Azure Profile Configuration with Object Retention

azure_objectretention:
  azureProfile1:
    connection: 'DefaultEndpointsProtocol=https;AccountName=XXXXX;AccountKey=xxxx....'
    bucket: test1
    pathTemplate: '{tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}'
    defaultRetentionInDays: 2