When saving thousands or millions of files to a hard disk or an external storage device like the NetApp, the files need to be stored in a structured way. Otherwise, accessing them eventually becomes very slow because directory listings grow large and take a long time to process. In our experience, a folder should not exceed 10,000 entries to prevent this performance degradation. Our depositories (media sets) therefore contain a subdirectory structure that keeps folders below this limit. While the subdirectory structure solves this problem, it uses several times more filesystem units or inodes (in the case of Linux-based devices like the NetApp) than the files alone would. When millions of files are saved into the same media set with inappropriate settings, the filesystem might run out of units / inodes although free space is still available. Take this and the following information into account when creating and configuring the depositories.
- In yuuvis on-premises all media sets are treated equally. The following points apply to WORK, CACHE and ARCHIVE media sets alike.
- The subdirectory structure is configurable in the number of layers and the character length of the directory names on each layer. By default, this is set to 4 layers of directories where each directory has 3 (hexadecimal) characters as its name. For example, the file example.pdf is saved in the location: ...\WORK\jas\0BC\A65\734\C79\example.pdf
If you reconfigure this to 3 layers of 2 characters, it looks like this: ...\WORK\jas\0B\CA\65\example.pdf
- The hexadecimal characters used for the subdirectory names are taken from the 32-character GUID that the dms-service assigns to the content-file (from left to right). So when the file example.pdf is saved, a GUID is first generated and assigned (in the example it is 0BCA65734C7947DF905BDACB4B4AC782), and the content-file is then put into the subdirectories corresponding to this value. This also implies that the sum of all character lengths must be less than or equal to 32.
- Because of the limit of 10,000 entries per folder, the folder names should not be longer than 3 characters. With 3 characters, there are at most 4096 possible folders per layer (16 possibilities per character, to the power of 3, yields 16*16*16 = 4096). With 4 characters, this value increases to 16^4 = 65,536, which is already far beyond 10,000.
- The content-file is not put directly under the last of the structural subdirectories but in a container-folder named after the entire GUID. For technical reasons, this container-folder contains further files and folders, so that the actual structure looks like this:
"...\WORK\jas\0BC\A65\734\C79\0BCA65734C7947DF905BDACB4B4AC782\D-1_1\DATAFILES\example.pdf" +
"...\WORK\jas\0BC\A65\734\C79\0BCA65734C7947DF905BDACB4B4AC782\D-1_1\ok"
- Using this default configuration means that for each content-file you will have another 7 directories (4 structural, 1 container, 2 technical) and 1 more file (technical "ok" file).
This ratio will only get "better" once there are more container-folders than leaf-directories (i.e. structural folders on the last layer). With 4 layers of 3-character subdirectories, the number of possible leaf-directories is (16^3)^4 = 281,474,976,710,656. Since the GUID function distributes its values very evenly, the probability of two container-folders ending up in the same leaf-directory is very low. Thus you need to calculate with 8 additional filesystem units / inodes per content-file.
- Also take into account that content-files might be edited during their yuuvis lifetime and that, because of versioning, the files in the depositories are not replaced; instead, another version is saved. To determine the maximum number of filesystem units / inodes needed, multiply the maximum number of files by 9 (the content-file plus its 8 additional units) and by the average number of versions that each file is expected to have.
- You can decrease this number significantly in the following two ways:
- Compress the container-folder to a single .zip file. This replaces the container-folder and all of its contents with a <GUID>.zip file, thus taking 5 of the 8 additional units/inodes away.
To do this, check the "compress" checkbox of the media set settings (turn details on).
- Reduce the number of layers of the structural folders. Each removed layer reduces the number of additionally needed units/inodes by 1.
To do this, define a custom "template for subdirectories" value in the media set settings. The syntax is "<number of characters for layer 1 subdirectories>,<number of characters for layer 2 subdirectories>,<number of characters for layer 3 subdirectories>" and so on. The default value is "3,3,3,3", meaning 4 layers with 3 characters each.
Keep in mind that each removed layer reduces the number of leaf-directories and thus the number of content-files that can be stored before exceeding 10,000 entries per folder. As an example: 2 layers of 3 characters yield ~16 million leaf-directories, which means that the limit is exceeded after ~16 million * 10,000 = ~160 billion content-files. With 1 layer of 3 characters, there are 4096 leaf-directories, and the limit is exceeded after ~41 million content-files.
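The mapping from GUID to structural subdirectories, and the effect of the template on the number of leaf-directories, can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the dms-service implementation; the function names (`parse_template`, `structural_dirs`, `leaf_directory_count`, `files_before_limit`) are hypothetical, and the 10,000-entry limit is the rule of thumb from this article rather than a filesystem constant.

```python
GUID_LENGTH = 32          # number of hexadecimal characters in the GUID
ENTRIES_PER_FOLDER = 10_000  # rule-of-thumb limit used in this article


def parse_template(template: str) -> list[int]:
    """Parse a "template for subdirectories" value such as "3,3,3,3"."""
    lengths = [int(part) for part in template.split(",")]
    if any(n < 1 for n in lengths):
        raise ValueError("each layer needs at least 1 character")
    if sum(lengths) > GUID_LENGTH:
        raise ValueError("the sum of all character lengths must be <= 32")
    return lengths


def structural_dirs(guid: str, template: str = "3,3,3,3") -> list[str]:
    """Cut the GUID from left to right into one directory name per layer."""
    dirs = []
    pos = 0
    for n in parse_template(template):
        dirs.append(guid[pos:pos + n])
        pos += n
    return dirs


def leaf_directory_count(template: str = "3,3,3,3") -> int:
    """Number of possible structural folders on the last layer."""
    return 16 ** sum(parse_template(template))


def files_before_limit(template: str = "3,3,3,3") -> int:
    """Content-files that fit before leaf-directories exceed the limit,
    assuming an even spread of GUIDs across all leaf-directories."""
    return leaf_directory_count(template) * ENTRIES_PER_FOLDER


guid = "0BCA65734C7947DF905BDACB4B4AC782"
print("\\".join(structural_dirs(guid)))           # default template
print("\\".join(structural_dirs(guid, "2,2,2")))  # 3 layers of 2 characters
print(files_before_limit("3,3"), files_before_limit("3"))
```

For the example GUID, the default template gives the directories 0BC, A65, 734 and C79, and the template "2,2,2" gives 0B, CA and 65, matching the paths shown earlier.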
For NetApp users: The number of available inodes can be reconfigured even after the volume has been created and is in use, but this leaves less available storage space than configuring it at creation time. So if you can, avoid reconfiguration during runtime.
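When sizing a volume, the inode calculation described above can be written down as a small helper. The following is a rough sketch under the assumptions stated in this article: uncompressed media sets, and no two container-folders sharing a structural directory, which makes the result an upper bound. The names `inodes_per_version` and `estimate_inodes` are hypothetical, and the sample numbers are made up for illustration.

```python
import math


def inodes_per_version(layers: int = 4) -> int:
    """Filesystem units / inodes used by one stored version of a content-file.

    Counts the structural directories (one per layer), 1 container-folder,
    2 technical directories, the technical "ok" file and the content-file
    itself. With the default of 4 layers this is 9.
    """
    return layers + 1 + 2 + 1 + 1


def estimate_inodes(max_files: int, avg_versions: float, layers: int = 4) -> int:
    """Upper-bound estimate: files * average versions * inodes per version."""
    return math.ceil(max_files * avg_versions * inodes_per_version(layers))


# Illustrative numbers only: 5 million files with 1.5 versions on average.
print(estimate_inodes(5_000_000, 1.5))            # default "3,3,3,3" template
print(estimate_inodes(5_000_000, 1.5, layers=2))  # e.g. template "3,3"
```

With fewer layers, many container-folders end up in the same structural directories, so the real consumption will be lower than this estimate.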