application-storage.yml
Configuration file containing the parameters and archive profiles used by the ARCHIVE and REPOSITORY services to connect to external archives.
Characteristics
Configuration File Name | application-storage.yml |
---|---|
Referenced by Services | archive,repository |
Storage Location | Git root directory |
File Structure
The file application-storage.yml is structured as follows:
    storage:
      repositories:
        <repositoryId>:
          profiles: ['<archiveProfile>']
          <parameter>: <value>
      profiles:
        <archiveDriverName>:
          <archiveProfile>:
            <parameter>: <value>
Define your repositories via repositoryId. For each of them, specify an archive profile and thus define the type of archive driver for the corresponding repository. Set the parameters url and useDiscovery as described below.
For each type of archive driver identified by archiveDriverName, define one or more archive profiles. The name of each archive profile must be unique within the configuration file. Depending on the archive type, specific configuration parameters have to be set, as also described below.
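As an illustration of this structure, the following sketch combines the sample values from the Example column of the parameter table on this page into one file: two repositories (s3 and netapp), each referencing one archive profile. The repository IDs, profile names, keys, and URLs are sample values only and have to be adapted to the actual installation.

```yaml
storage:
  repositories:
    # repositoryId "s3": references the archive profile "s3profile1"
    # and is marked as the default repository
    s3:
      profiles: ['s3profile1']
      default: true
    # repositoryId "netapp": addressed via the ARCHIVE service
    netapp:
      profiles: ['netapplike']
      # {profile} is a dynamic reference to the archive profile
      url: http://archive/api/profiles/{profile}/dms/objects
      useDiscovery: true
  profiles:
    # archiveDriverName "s3"
    s3:
      s3profile1:
        access-key: 'MGMWCOYTDUSLNCFE'
        secret-key: 'changeme'
        url: 'http://minio.infrastructure:9000'
        bucket: 'dmscloudrepodocker'
    # archiveDriverName "netapp"
    netapp:
      netapplike:
        # use mountpoint of the persistent volume claim that provides the netapp storage
        # must be the same in the deployment of the archive app
        volume: '/var/lib/netapp/data'
        defaultRetentionInDays: 10
```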
Parameters
Parameter | Description | Example | |
---|---|---|---|
| Specify the ID of a default rendition repository. It will be used for the storage of renditions if no repository is specified in the corresponding request header. As of version 2021 Winter, only an S3 storage can be supported. It has to be defined in the It is recommended to configure a separate repository for the storage of renditions. The demands regarding storage of renditions differs from the storage of binary content files. E.g., retention is not supported. | renrepo | |
storage.repositories | Configurations of repositories for by the REPOSITORY service. The | s3: profiles: ['s3profile1'] default: true netapp: profiles: ['netapplike'] url: http://archive/api/profiles/{profile}/dms/objects useDiscovery: true | |
profiles | A list of archive profile names, each of them referencing an archive profile defined in the For any repository that will be used as rendition repository, an S3 profile has to be specified. Note: Only the first list entry will be considered whereas further list entries will be ignored. | ||
url | Address of the repository accessible via the ARCHIVE service. The archive profile can be referenced as dynamic component in curly brackets (e.g., Not required for a repository of type | ||
useDiscovery | Boolean value specifying if the given Not required for a repository of type | ||
predicate | Available as of 2023 Autumn. Specifies a condition using any properties of the target DMS objects before normalization. If the condition is matched by an imported DMS object, its binary content is stored in the corresponding repository instead of the default repository. If a Default value: | ||
default | Optional: Boolean value specifying if the repository should be the default repository ( The default repository will be used for each content-related request where no The default repository is NOT used as default rendition repository if | ||
storage.profiles | Archive profiles for each archive driver. Multiple profiles can be defined for each archive driver. The name of each profile is defined via the corresponding key (e.g., The parameters for the configuration of an archive profile depend on the archive driver indicated by the corresponding Note: An S3 archive profile configuration is required for the usage of a rendition repository. | s3: s3profile1: access-key: 'MGMWCOYTDUSLNCFE' secret-key: 'changeme' url: 'http://minio.infrastructure:9000' bucket: 'dmscloudrepodocker' netapp: netapplike: # use mountpoint of the persistent volume claim that provides the netapp storage # must be the same in the deployment of the archive app volume: '/var/lib/netapp/data' defaultRetentionInDays: 10 |
The values for the parameters can be modified as described in Configuring Services using Profiles.
Parameters in Archive Profiles
Depending on the archive driver, the archive profiles have to be configured with specific parameters.
Drivers for the following archives are available:
- S3 (s3)
- NetApp (netapp)
- iCAS (iternity)
- Hitachi Content Platform (hcp_s3)
- Cloudian HyperStore (cloudian_s3)
- DELL S3 (ecs-s3)
- File System (filesystem)
- Azure:
  - Driver for Usage of Blob Store (azure_blobstore)
  - Driver for Usage of Object Retention (azure_objectretention)
General Parameters
Retention
The defaultRetentionInDays parameter can be set in all of the following archive profiles and is only relevant for the ARCHIVE service. Its value is used as the retention time for objects that do not have a retention time specified in their metadata during import. If defaultRetentionInDays is set to 0, which is also the default value, no retention is set for those objects. If the profile is used for a rendition repository, the parameter is ignored because renditions cannot be under retention.
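A minimal sketch of both cases, assuming two hypothetical NetApp profiles named netappNoRetention and netapp10Days; the profile names and the first volume path are placeholders that do not come from a real installation.

```yaml
storage:
  profiles:
    netapp:
      netappNoRetention:                       # hypothetical profile name
        volume: '/var/lib/netapp/noretention'  # placeholder path
        # 0 is the default: objects imported without a retention time
        # in their metadata get no retention
        defaultRetentionInDays: 0
      netapp10Days:                            # hypothetical profile name
        volume: '/var/lib/netapp/data'
        # objects imported without a retention time in their metadata
        # are stored with a retention time of 10 days
        defaultRetentionInDays: 10
```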
Path Templates
Many of the archive profiles listed below allow you to configure a pathTemplate that specifies a path structure within the storage.
The following variables can be used. They are referenced in curly brackets within the pathTemplate value (see the example below).
- tenant, system:tenant
- contentStreamId, system:contentStreamId
- objectTypeId, system:objectTypeId
With the substring operation, a specific part of a string can be referenced. For example, to reference the first two characters of the contentStreamId, use {contentStreamId.substring(0,2)}. The character at the second index is excluded. If only one index is specified, all characters starting from that index are referenced. For example, 'test'.substring(2) results in st.
The DATE operation references the current date. A format string as defined by SimpleDateFormat can be passed to DATE as a parameter. Without a parameter, the format yyyyMMdd is used.
Example:
- Consider the pathTemplate configuration {tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}.
- The currently logged-in user belongs to the yuuvistest tenant.
- The user stores a binary content file on 21 December 2022 at 4:11 pm.
- The binary content file gets 7850BB1A-F749-11E8-A21E-49DDF2475266 as the value for the contentStreamId.
- The resulting archive path is yuuvistest/2022/12/21/78/.
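The following sketch shows where such a template could be placed, assuming an S3 archive profile with the hypothetical name s3WithPaths. All values in angle brackets are placeholders. The template is written with one pair of curly brackets per variable, as in the Azure object retention example at the end of this page.

```yaml
storage:
  profiles:
    s3:
      s3WithPaths:                   # hypothetical profile name
        access-key: '<access-key>'   # placeholder
        secret-key: '<secret-key>'   # placeholder
        url: '<s3-url>'              # placeholder
        bucket: '<bucket>'           # placeholder
        # For tenant yuuvistest, a file stored on 21 December 2022 and the
        # contentStreamId 7850BB1A-F749-11E8-A21E-49DDF2475266, this
        # template yields the path yuuvistest/2022/12/21/78/
        pathTemplate: '{tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}'
```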
Parameters for S3 Archive Profiles
Use s3 as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
access-key | Access key | - |
secret-key | Password | - |
url | URL for S3 | - |
bucket | Name of the bucket in the archive system for filing. | - |
| Retention time in days. Note: Objects under retention can still be deleted by administrators having direct access to the S3 storage. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. If the profile is used for a repository used as rendition repository, the parameter is ignored for the storage of renditions as they cannot be under retention. However, if binary content files are stored in the same repository beside the renditions, the | 0 |
| Specifies the location of the data center where to create new buckets. | - |
| Boolean value that decides if the archive-internal retention ( If
If
In both cases: For objects without retention specified via secondary object type (SOT) | true |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. If the profile is used for a rendition repository, use only the | - |
retentionMode | Optional parameter to select a retention mode for S3 Object Lock. Available values:
| COMPLIANCE |
Parameters for NetApp Archive Profiles
Use netapp as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
volume | Archive location for the data to be saved. | - |
| Optional: Retention time in days. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 |
enableReadOnly | Optional: Defines whether the archive driver marks the stored data as read-only. | true |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. | - |
Parameters for iCAS Archive Profiles
Use iternity as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
userName | Name of the user with the appropriate rights for the archive. | - |
userPassword | The user's password. | - |
endpoint | URL of the iCAS web service. | - |
cscMode | Storage mode for objects and meta data. The storage mode is specified with a sequence of four parameters:
Example:
The binary content file is compressed (L) and encrypted with the standard method (S); the metadata is not compressed (S) and not encrypted (N). | - |
maxCreateCscSize | Determines the maximum size of containers. The information is specified in bytes. | 10000000 |
maxCreateCscFile | Determines the maximum number of objects for the containers. | 1000 |
maxCreateCscSingleFileLimit | Determines the individual size limit of an object. | 4000000 |
maxWorkChunkSize | Determines the maximum size of a single chunk. The information is specified in bytes. | 5000000 |
| Optional: Resource path to a certificate trust store for encrypted communication with the web service. | - |
| Optional: Password for the certificate trust store. | - |
| Retention time in days. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 |
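A sketch of an iCAS archive profile built from the named parameters in the table above. The profile name icasProfile1 and all values in angle brackets are placeholders, not values from a real installation. The numeric values repeat the documented defaults and are shown only for orientation.

```yaml
storage:
  profiles:
    iternity:
      icasProfile1:                           # hypothetical profile name
        userName: '<user>'                    # placeholder
        userPassword: '<password>'            # placeholder
        endpoint: '<icas-web-service-url>'    # placeholder
        cscMode: '<cscMode>'                  # sequence of four parameters, see table
        maxCreateCscSize: 10000000            # bytes (documented default)
        maxCreateCscFile: 1000                # objects per container (documented default)
        maxCreateCscSingleFileLimit: 4000000  # documented default
        maxWorkChunkSize: 5000000             # bytes (documented default)
```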
Parameters for Hitachi Content Platform Archive Profiles
Use hcp_s3 as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
access-key | Access key | - |
secret-key | Password | - |
url | URL for HCP | - |
bucket | Name of the bucket in the archive system for filing. | - |
| Retention time in days. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. | - |
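A sketch of a Hitachi Content Platform profile using the named parameters from the table above. The profile name hcpProfile1 and the values in angle brackets are placeholders. The path template is only one possible choice, built from the variables described under Path Templates.

```yaml
storage:
  profiles:
    hcp_s3:
      hcpProfile1:                   # hypothetical profile name
        access-key: '<access-key>'   # placeholder
        secret-key: '<secret-key>'   # placeholder
        url: '<hcp-url>'             # placeholder
        bucket: '<bucket>'           # placeholder
        # optional: one directory level per tenant and object type
        pathTemplate: '{tenant}/{objectTypeId}'
```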
Parameters for Cloudian HyperStore Archive Profiles
The Cloudian HyperStore Content Platform provides an AWS-S3-compatible REST API with a few extensions that are used by the ARCHIVE service. To configure the Cloudian HyperStore archive, set the following (S3-relevant) parameters.
Use cloudian_s3 as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
access-key | Access key | - |
secret-key | Password | - |
url | URL for Cloudian HyperStore | - |
bucket | Name of the bucket in the archive system for filing. | - |
| Retention time in days. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. | - |
Parameters for DELL S3 Archive Profiles
Available as of 2022 Autumn. Use ecs-s3 as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
access-key | Access key | - |
secret-key | Password | - |
url | URL for S3 | - |
bucket | Name of the bucket in the archive system for filing. | - |
| Retention time in days. Note: Objects under retention can still be deleted by administrators having direct access to the S3 storage. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. If the profile is used for a repository used as rendition repository, the parameter is ignored for the storage of renditions as they cannot be under retention. However, if binary content files are stored in the same repository beside the renditions, the | 0 |
| Specifies the location of the data center where to create new buckets. | - |
| Boolean value that decides if the archive-internal retention ( If
If
In both cases: For objects without retention specified via secondary object type (SOT) | true |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. If the profile is used for a rendition repository, use only the | - |
retentionMode | Optional parameter to select a retention mode for S3 Object Lock. Available values:
| COMPLIANCE |
Parameters for File System Profiles
Use filesystem as the value for archiveDriverName.
Parameter | Description | Default Value |
---|---|---|
volume | Storage container location. | - |
pathTemplate | This parameter is essential because it determines the directories (paths) within the volume in which objects are stored. | {tenant}/DOCUMENT/{contentStreamId.substring(1,3)}/{contentStreamId.substring(3,5)}/{contentStreamId.substring(5,6)} |
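A sketch of a file system profile that sets the documented default path template explicitly. The profile name fsProfile1 and the volume path are placeholders. The resulting path in the comment assumes the substring semantics described under Path Templates (second index excluded) and reuses the tenant and contentStreamId from the example given there.

```yaml
storage:
  profiles:
    filesystem:
      fsProfile1:                            # hypothetical profile name
        volume: '/var/lib/filesystem/data'   # placeholder path
        # For tenant yuuvistest and contentStreamId 7850BB1A-F749-11E8-A21E-49DDF2475266:
        #   substring(1,3) -> 85, substring(3,5) -> 0B, substring(5,6) -> B
        #   resulting path: yuuvistest/DOCUMENT/85/0B/B
        pathTemplate: '{tenant}/DOCUMENT/{contentStreamId.substring(1,3)}/{contentStreamId.substring(3,5)}/{contentStreamId.substring(5,6)}'
```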
Each binary content file stored in a file system repository is split into chunks. Each chunk is named <contentStreamId>_<count>. All chunks belonging to the same binary content file are packed into one ZIP file. Additionally, each ZIP file contains a meta.xml file with processing information for the stored binary content file:
- file name
- version
- mime type
- unpacked file size
- SHA-256 hash calculated by the REPOSITORY service
- MD5 hash calculated by the REPOSITORY service
    <?xml version="1.0" encoding="UTF-8"?>
    <xmlmeta>
      <filename>content.eml</filename>
      <version>1.0</version>
      <mime>message/rfc822</mime>
      <size>156836</size>
      <sha256>0BA8B60F0A1D16DEB44D9EE427A75FADE6ADF751312BE9D22D7D4ED2CC3B2DA6</sha256>
      <md5>BF8782D1AD01CDF211AD09038AB64349</md5>
      <chunksize>156836</chunksize>
    </xmlmeta>
The number of chunks is determined by the file size of the binary content file to be stored:
size of binary content file | number of chunks | size of each chunk |
---|---|---|
<= 1 MB | 1 | = size of binary content file |
> 1 MB to <= 10 MB | size of binary content file / 1 MB | = 1 MB (minimum chunk size) |
> 10 MB to <= 1280 MB (128 MB is the maximum chunk size) | 10 | = size of binary content file / number of chunks + 1 Byte |
> (x - 1) • 1280 MB to <= x • 1280 MB where (x = 2 ... 2n, n ∈ ℕ) | 10 • x | = size of binary content file / number of chunks + 1 Byte |
Parameters for Azure Blob Storage Profiles
Available as of 2022 Winter. Use azure_blobstore as the value for archiveDriverName.
Parameter | Description | Default Value | |
---|---|---|---|
connection | Connection string containing all required information for the access to Azure. | - | |
defaultRetentionInDays | Retention time in days. Note: Objects under retention can still be deleted by administrators having direct access to the storage. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 | |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. | ||
buckets | A map containing one or more configurations. Specify a unique configuration name as configuration key. For each key, the Whenever the content file of an object is stored in Azure, the ARCHIVE service selects the container with the lowest | - | |
name | Name of the container in Azure. Note: The container has to exist. | ||
retentionInYears | Retention time that is configured in Azure for the container specified by If Note: The specified retention time has to match the Azure container configuration. |
Example: Consider the configuration below. If the content file of an object with a scheduled retention time of 2 years is stored, the blob20years container is used.
    storage:
      repositories:
        ...
      profiles:
        ...
        azure_blobstore:
          azureImmutable1:
            connection: 'DefaultEndpointsProtocol=https;AccountName=immutablestorageos;AccountKey=AAAAAAAAAA==;EndpointSuffix=core.windows.net'
            defaultRetentionInDays: 0
            buckets:
              'blob1':
                name: 'blobNoRetention'
                retentionInYears: 0
              'blob2':
                retentionInYears: 1
                name: 'blob1year'
              'blob3':
                retentionInYears: 20
                name: 'blob20years'
Parameters for Azure Profiles with Object Retention
Available as of 2022 Winter. Use azure_objectretention as the value for archiveDriverName.
Parameter | Description | Default Value | |
---|---|---|---|
connection | Connection string containing all required information for the access to Azure. | - | |
defaultRetentionInDays | Retention time in days. Note: Objects under retention can still be deleted by administrators having direct access to the storage. To transmit no retention time, set the value Scheduled retention time (system:rmExpirationDate) has priority over any times specified here. | 0 | |
retentionMode | Optional parameter to select an Azure immutability policy lock similar to the retention mode for S3 Object Lock. Available values:
| COMPLIANCE | |
pathTemplate | Optional: This parameter can be used to store objects in specific directories (paths) within a bucket. | ||
buckets | Name of the bucket in the archive system for filing | - |
    azure_objectretention:
      azureProfile1:
        connection: 'DefaultEndpointsProtocol=https;AccountName=XXXXX;AccountKey=xxxx....'
        bucket: test1
        pathTemplate: '{tenant}/{DATE(yyyy)}/{DATE(MM)}/{DATE(dd)}/{contentStreamId.substring(0,2)}'
        defaultRetentionInDays: 2