What search API to use?
This article addresses programmers who implement components that need to search for data. Four services are available, each suited to different requirements. The following matrix helps you select the best-fitting service.
Search Feature Matrix
Topic | Search Service | Structure Service | core-service query | core-service raw esql |
---|---|---|---|---|
Backend storage used | Elasticsearch | Elasticsearch | Relational DB | Relational DB |
Short description | For search result lists and aggregations. | For search and for building nested virtual trees using aggregations. | Simple AND conditions on documents and folders. Returns full DMS objects as JSON/XML. | Provides simple (raw) database values by using a feature-rich, SQL-like query. |
In production usage | Client main search and client dashlets. | Client folder view to build the virtual child objects tree. | API only usage. | API only usage. |
Service name | search-service | structure-service | rest-ws - ResultService - query | rest-ws - ResultService - raw |
API Documentation link | | | Live Swagger-UI for rest-ws / ResultService / query | Live Swagger-UI for rest-ws / ResultService / raw. See also Using eSQL. |
Supports SQL-like query? | No | Yes, but a very limited set. | No | Yes, full eSQL functionality. |
Supports fulltext search conditions? | Yes | Yes | No, only wildcard search values on single fields. | No, but the SQL LIKE operator can be used. |
Supports search for document text content? | Yes, by fulltext search. | Yes, by fulltext search. | No | No |
Supports aggregations? (COUNT,SUM) | Yes, partially | Yes, full, with nested aggregations and arbitrary aggregations in one call. | No | Yes, limited, by using SUM() and COUNT() as part of the eSQL functionality. |
Supports access to inactive versions? | No. Inactive versions are not permanently stored in Elasticsearch and cannot be accessed with the API. | Same as Search Service. | No | Yes, by using the "(all versions)" eSQL qualifier. |
Streaming capable? (1) | No | No | Yes | Yes |
Data consistency | The Elasticsearch data is updated lazily: every change to indexdata or content text is processed asynchronously via a message queue. An immediate update can be requested by using the synchronous storage mechanism, but even then the Elasticsearch data is not guaranteed to be up to date. Under heavy system load, or while a reindexing is running, it may take several minutes until the data is updated. Even when the indexdata of an object is up to date, it may take additional time until the rendition service has extracted the content text and it has been updated in the Elasticsearch index. | Same as Search Service. | The data is always up to date. Changes are committed and confirmed in the relational database before any message about the change is sent. | Same as core-service query. |
Access control | The Elasticsearch index stores an access control list (ACL) for each entry, which is checked against the user's access list on every query. The query-time cost is therefore constant, regardless of the complexity of the rights system. The disadvantage: changes to the security system, such as new or removed roles or changed visibility clauses, must be propagated to the Elasticsearch index. Depending on the amount of data and the complexity of the visibility clauses, it may take a long time until the ACLs in the index are updated. | Same as Search Service. | User access is checked at query time: besides the conditions given by the user, the generated SQL also applies visibility clauses for the folder and document objects. This adds a query cost that depends on the complexity of the request and of the rights system. The advantage: changes to the rights system take effect immediately. | Same as core-service query. |
Basic performance considerations | Elasticsearch is built for very good fulltext search performance; indexdata conditions and aggregations also perform well. It is not well suited to retrieving large amounts of data. | Same as Search Service. | Relational databases are good at joining tables and at evaluating WHERE conditions on indexed columns. For good query performance, it is very important to use indexed columns in WHERE conditions. If LIKE is used with a leading wildcard (for example LIKE '%report'), an index cannot be used. | Same as core-service query. |
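The matrix can also be read as a lookup from required features to candidate services. The following Python sketch transcribes the Yes/No cells into feature sets and filters the services by requirement. The feature keys (for example `content_text`, `streaming`) and the function `candidate_services` are hypothetical names chosen for illustration, "partially" and "limited" support is counted as support, and none of this is part of the services' APIs.

```python
# Sketch: the feature matrix as data. Feature keys are hypothetical labels for
# the matrix rows; "partially"/"limited" support counts as supported.
MATRIX: dict[str, frozenset[str]] = {
    "search-service": frozenset(
        {"fulltext", "content_text", "aggregations"}
    ),
    "structure-service": frozenset(
        {"sql_like_query", "fulltext", "content_text", "aggregations",
         "nested_aggregations"}
    ),
    # No SQL-like query here, so the eSQL "(all versions)" qualifier is not available.
    "core-service query": frozenset(
        {"streaming", "immediate_consistency", "instant_acl_changes"}
    ),
    "core-service raw": frozenset(
        {"sql_like_query", "aggregations", "inactive_versions", "streaming",
         "immediate_consistency", "instant_acl_changes"}
    ),
}

# Union of all feature keys, used to reject typos in requirement names.
KNOWN_FEATURES: frozenset[str] = frozenset().union(*MATRIX.values())


def candidate_services(*required: str) -> list[str]:
    """Return the services (in matrix column order) supporting every required feature."""
    unknown = set(required) - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [name for name, features in MATRIX.items() if set(required) <= features]


if __name__ == "__main__":
    print(candidate_services("content_text"))              # ['search-service', 'structure-service']
    print(candidate_services("streaming", "aggregations"))  # ['core-service raw']
    print(candidate_services("fulltext", "streaming"))      # []
```

An empty result means no single service covers the combination; in that case the requirements have to be split, for example fulltext search via the search service and bulk retrieval via a core-service endpoint.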
(1) What does "Streaming capable" mean?
A service is streaming capable if the result from the backend storage is delivered to the calling client without holding the complete result in memory. When the core-service is used, the client's request is transformed into an SQL statement that queries the database. The rows returned by the database are transformed one at a time and sent back to the client, so memory consumption stays minimal. If the receiving client can also process the result as a stream, even large datasets can be handled.
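To illustrate the row-by-row behavior described above, the following Python sketch uses an in-memory SQLite database as a stand-in for the relational database behind the core-service, and a generator that turns each row into one JSON line. The JSON-lines output format and the function name `stream_rows_as_json_lines` are assumptions for illustration; this is not how the core-service is actually implemented.

```python
import json
import sqlite3
from typing import Iterator


def stream_rows_as_json_lines(
    conn: sqlite3.Connection, sql: str, params: tuple = ()
) -> Iterator[str]:
    """Yield one JSON line per result row instead of building the whole result first.

    Iterating a sqlite3 cursor steps through the result one row at a time, so
    this generator holds only the current row; the caller can write each line
    to the client as soon as it is produced.
    """
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    try:
        for row in cursor:
            yield json.dumps(dict(zip(columns, row))) + "\n"
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO document (name) VALUES (?)",
        ((f"doc-{i}",) for i in range(10_000)),
    )
    # A real server would write each line to the response; here we only count them.
    line_count = sum(1 for _ in stream_rows_as_json_lines(conn, "SELECT id, name FROM document"))
    print(line_count)  # 10000
```

The benefit only holds end to end: if the client collects all lines into a list before processing them, memory use grows with the result size again.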
The search and structure services do not support streaming. The complete result set from the backend storage (Elasticsearch) is held and transformed in memory. Therefore, it is mandatory to set an upper limit for the result size; if no size is given, the service enforces a default size.
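Because the non-streaming services hold the full result in memory, the requested size has to be bounded. A minimal Python sketch of such a rule, assuming hypothetical values `DEFAULT_SIZE = 100` and `MAX_SIZE = 10_000` (the actual defaults of search-service and structure-service are not stated here) and assuming oversized requests are capped rather than rejected:

```python
from typing import Optional

# Hypothetical limits for illustration only; the real service defaults are not stated here.
DEFAULT_SIZE = 100
MAX_SIZE = 10_000


def effective_size(
    requested: Optional[int], default: int = DEFAULT_SIZE, maximum: int = MAX_SIZE
) -> int:
    """Return the result size a non-streaming search would actually use.

    No size given -> the default; otherwise the request is capped at `maximum`
    so that the in-memory result set stays bounded.
    """
    if requested is None:
        return min(default, maximum)
    if requested < 0:
        raise ValueError("size must not be negative")
    return min(requested, maximum)
```

Whether the real services cap or reject an oversized request is an assumption of this sketch. A client that needs more rows than such a limit allows is better served by one of the streaming-capable core-service endpoints.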