When saving thousands or millions of files to a hard disk or an external storage device such as a NetApp, the files need to be stored in a structured way. Otherwise, accessing them eventually becomes very slow because directory listings grow large and take a long time to process. In our experience, a folder should not contain more than 10,000 entries if this performance degradation is to be avoided. Our depositories (media sets) therefore use a subdirectory structure that keeps folders below this limit. While the subdirectory structure solves this problem, it uses several times more filesystem units, or inodes on Linux-based devices such as the NetApp, than the files alone would. When millions of files are saved into the same media set with inappropriate settings, the filesystem might run out of units/inodes although free space is still available. This and the following information needs to be taken into account when creating and configuring depositories.
- In yuuvis® RAD, all media sets are treated equally. The following points apply to WORK, CACHE and ARCHIVE media sets alike.
- The subdirectory structure is configurable in the number of layers and in the character length of the directory names on each layer. By default, it consists of 4 layers of directories, each with a 3-character (hexadecimal) name. For example, the file example.pdf is saved to the location: ...\WORK\jas\0BC\A65\734\C79\example.pdf
If you reconfigured this to 3 layers of 2 characters, the location would look like this: ...\WORK\jas\0B\CA\65\example.pdf
- The hexadecimal characters used for the subdirectory names are taken, from left to right, from the 32-character GUID that the dms-service assigns to the content file. So when the file example.pdf is saved, a GUID is first generated and assigned (in the example, 0BCA65734C7947DF905BDACB4B4AC782). The content file is then put into the subdirectories corresponding to this value. This also implies that the sum of all character lengths must be less than or equal to 32.
- Since we have a limit of 10,000 entries per folder, the folder names should not be longer than 3 characters. With 3 characters, there are at most 16^3 = 16*16*16 = 4096 possible folders per layer (16 possibilities per character). With 4 characters, this value increases to 16^4 = 65,536, which is already far beyond 10,000.
- The content file is not put directly under the last structural subdirectory but into a container folder named after the entire GUID. For technical reasons, this container holds further files and folders, so that the actual structure looks like this:
...\WORK\jas\0BC\A65\734\C79\0BCA65734C7947DF905BDACB4B4AC782\D-1_1\DATAFILES\example.pdf
...\WORK\jas\0BC\A65\734\C79\0BCA65734C7947DF905BDACB4B4AC782\D-1_1\ok
- With this default configuration, each content file comes with another 7 directories (4 structural, 1 container, 2 technical) and 1 more file (the technical "ok" file).
This ratio only gets "better" once there are more container folders than leaf directories (i.e., structural folders on the last layer). With 4 layers of 3-character subdirectories, the number of possible leaf directories is (16^3)^4 = 281,474,976,710,656. Since the GUID function distributes its values very evenly, the probability of two container folders ending up in the same leaf directory is very low. Thus, you need to calculate with 8 additional filesystem units/inodes per content file.
- Also take into account that content files might be edited during their yuuvis® RAD lifetime. Because of versioning, the files in the depositories are not replaced; another version is saved instead. Thus, to determine the maximum number of filesystem units/inodes you need, multiply the maximum number of files by 9 (the content file plus its 8 additional units) and then by the average number of versions that each file is expected to have.
- You can decrease this number significantly in the following two ways:
  - Compress the container folder to a single .zip file. This replaces the container folder and all of its contents with a <GUID>.zip file, thus taking 5 of the 8 additional units/inodes away.
    To do this, check the "compress" check box in the media set settings (show details).
  - Reduce the number of layers of the structural folders. Each removed layer reduces the number of additionally needed units/inodes by 1.
    To do this, define a custom "template for subdirectories" value in the media set settings. The syntax is "<number of characters for layer 1 subdirectories>,<number of characters for layer 2 subdirectories>,<number of characters for layer 3 subdirectories>" and so on. The default value is "3,3,3,3", meaning 4 layers with 3 characters each.
    Keep in mind that each removed layer reduces the number of leaf directories and thus the number of content files that can be stored before a folder exceeds 10,000 entries. As an example: 2 layers of 3 characters yield ~16.8 million leaf directories, so the limit is exceeded after ~16.8 million * 10,000 = ~168 billion content files. With 1 layer of 3 characters, there are 4096 leaf directories and the limit is exceeded after ~41 million content files.
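The mapping from GUID to storage location described above can be illustrated with a short Python sketch. It is not the dms-service implementation. The function names are hypothetical, and the technical folder names "D-1_1" and "DATAFILES" are copied from the single example path in the text, so treating them as fixed is an assumption. The sketch only builds the part of the path below the media set folder.

```python
from pathlib import PureWindowsPath


def parse_template(template: str) -> list[int]:
    """Parse a "template for subdirectories" value such as "3,3,3,3"."""
    lengths = [int(part) for part in template.split(",")]
    if any(n < 1 for n in lengths):
        raise ValueError("every layer needs at least one character")
    if sum(lengths) > 32:
        raise ValueError("the layers may use at most 32 GUID characters in total")
    return lengths


def structural_dirs(guid: str, template: str = "3,3,3,3") -> list[str]:
    """Cut the GUID from left to right into one folder name per layer."""
    names = []
    position = 0
    for length in parse_template(template):
        names.append(guid[position:position + length])
        position += length
    return names


def relative_content_path(guid: str, filename: str,
                          template: str = "3,3,3,3") -> PureWindowsPath:
    """Path of a content file relative to the media set folder."""
    # "D-1_1" and "DATAFILES" come from the example in the text; whether
    # they are identical for every file is an assumption of this sketch.
    return PureWindowsPath(*structural_dirs(guid, template),
                           guid, "D-1_1", "DATAFILES", filename)


guid = "0BCA65734C7947DF905BDACB4B4AC782"
print(structural_dirs(guid))           # ['0BC', 'A65', '734', 'C79']
print(structural_dirs(guid, "2,2,2"))  # ['0B', 'CA', '65']
print(relative_content_path(guid, "example.pdf"))
```

The two template values reproduce the default layout and the reconfigured 3-layer example from the text.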
For NetApp users: The number of available inodes can be reconfigured even after the volume has been created and is in use. However, this leaves less available storage space than configuring the number at creation time. So if you can, avoid reconfiguring it during runtime.
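To size a volume before creating it, the arithmetic from this page can be put into a small Python sketch. It covers only the default, uncompressed layout (9 units per saved version) and assumes that files spread evenly across the leaf directories. All function names are hypothetical, and the file count and version average in the usage lines are made-up inputs, not recommendations.

```python
import math

ENTRY_LIMIT = 10_000   # recommended maximum number of entries per folder
UNITS_PER_VERSION = 9  # default layout: content file + 8 additional units


def layer_sizes(template: str) -> list[int]:
    """Number of possible folders on each layer (16 values per hex character)."""
    return [16 ** int(part) for part in template.split(",")]


def layers_within_limit(template: str) -> bool:
    """True if no layer can grow beyond the per-folder entry limit."""
    return all(size <= ENTRY_LIMIT for size in layer_sizes(template))


def leaf_directories(template: str) -> int:
    """Number of possible structural folders on the last layer."""
    return math.prod(layer_sizes(template))


def files_before_limit(template: str) -> int:
    """Content files that fit before leaf folders exceed the limit,
    assuming the GUIDs spread the files evenly."""
    return leaf_directories(template) * ENTRY_LIMIT


def estimated_inodes(max_files: int, avg_versions: float) -> int:
    """Worst-case units/inodes for the default, uncompressed layout."""
    return math.ceil(max_files * UNITS_PER_VERSION * avg_versions)


print(leaf_directories("3,3,3,3"))    # 281474976710656
print(files_before_limit("3"))        # 40960000
print(layers_within_limit("4,4"))     # False
# Hypothetical sizing: 2 million files with 1.5 versions on average.
print(estimated_inodes(2_000_000, 1.5))  # 27000000
```

The estimate deliberately ignores that structural folders are shared between files, which matches the worst-case reasoning above for sparsely filled leaf directories.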