What search API to use?

This article is for programmers who implement search components. Four services are available, each suited to different requirements. The following matrix helps you select the best-fitting service.

Search Feature Matrix

Topic	Search Service	Structure Service	core-service query	core-service raw esql
Backend storage used	Elasticsearch	Elasticsearch	Relational DB	Relational DB
Short description	For search result lists and aggregations.	For search and for building nested virtual trees using aggregations.	Simple AND conditions on documents and folders. Returns full DMS objects as JSON/XML.	Provides simple (raw) database values by using a feature-rich, SQL-like query.
In production usage	Client main search and client dashlets.	Client folder view to build the virtual child objects tree.	API only usage.	API only usage.
Service name	search-service	structure-service	rest-ws - ResultService - query	rest-ws - ResultService - raw
API Documentation link	Search API	Structure Service API	Live Swagger-UI for rest-ws / ResultService / query	Live Swagger-UI for rest-ws / ResultService / raw. See also Using eSQL.
Supports SQL-like query?	No	Yes, but a very limited set.	No	Yes, full eSQL functionality.
Supports fulltext search conditions?	Yes	Yes	No, only wildcard search values on single fields.	No, but the SQL LIKE operator can be used.
Supports search for document text content?	Yes, by fulltext search.	Yes, by fulltext search.	No	No
Supports aggregations? (COUNT, SUM)	Yes, partially.	Yes, fully, with nested aggregations and arbitrary aggregations in one call.	No	Yes, limited, by using SUM() and COUNT() as part of the eSQL functionality.
Supports access to inactive versions?	No. Inactive versions are not permanently stored in Elasticsearch and cannot be accessed with the API.		No	Yes, by using the "(all versions)" eSQL qualifier.
Streaming capable? (1)	No	No	Yes	Yes
Data consistency	The Elasticsearch data is updated lazily. Any change to indexdata or content text is processed asynchronously using a message queue. An immediate update can be forced by using the synchronous storage mechanism, but even then the data in Elasticsearch may not be up to date. Under heavy system load, or while a reindexing is running, it may take several minutes before the data is updated. Even if the indexdata of an object is up to date, it may take more time until the content text is extracted by the rendition service and updated in the Elasticsearch index.		The data is always up to date. Changes are committed and confirmed on the relational database before any message about the change is sent.
Access control	The Elasticsearch index holds an access control list (ACL) for each entry. This list is checked against the user access list on each query. The performance cost at query time is constant, regardless of the complexity of the rights system. The disadvantage: changes in the security system, such as new or removed roles or changes to visibility clauses, must be propagated to the Elasticsearch index. Depending on the amount of data and the complexity of the visibility clauses, it may take a long time before the ACL is updated in the index.		User access is checked at query time. The SQL statement contains not only the conditions given by the user but also the visibility clauses for the folder and document objects. This adds a query performance cost that depends on the complexity of the request and of the rights system. The advantage: changes to the rights system take effect immediately.
Basic performance considerations	Elasticsearch is built for very good performance on fulltext searches. Indexdata conditions and aggregations also perform well. It is not well suited to retrieving large amounts of data.		Relational databases are good at joining tables and at evaluating WHERE conditions on indexed columns. For good query performance, it is very important to use indexed columns in the WHERE conditions. If LIKE is used with a leading wildcard, an index cannot be used.
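
The matrix can also be read as a simple lookup from required features to candidate services. The following Python sketch illustrates that reading; it is not part of any yuuvis® RAD API. The capability names (fulltext, sql_like, and so on) and the function candidate_services are hypothetical, "partial" and "limited" entries are counted as supported, and the empty structure-service cell in the inactive-versions row is assumed to mean "No".

```python
from typing import Dict, List, Set

# Capabilities transcribed from the matrix above (hypothetical key names).
# Simplification: "partial"/"limited" support is counted as supported.
# Assumption: the structure service has no access to inactive versions.
CAPABILITIES: Dict[str, Set[str]] = {
    "search-service": {"fulltext", "content_text", "aggregations"},
    "structure-service": {"fulltext", "content_text", "aggregations", "sql_like"},
    "core-service query": {"streaming", "always_consistent"},
    "core-service raw eSQL": {
        "sql_like",
        "aggregations",
        "inactive_versions",
        "streaming",
        "always_consistent",
    },
}


def candidate_services(required: Set[str]) -> List[str]:
    """Return the services whose capabilities cover all required features."""
    known = set().union(*CAPABILITIES.values())
    unknown = required - known
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return [name for name, caps in CAPABILITIES.items() if required <= caps]


if __name__ == "__main__":
    print(candidate_services({"fulltext"}))
    print(candidate_services({"inactive_versions"}))
    print(candidate_services({"fulltext", "streaming"}))
```

An empty result, as for the combination of fulltext search and streaming, means that no single service covers the requirement and the request has to be split or the requirement relaxed.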

(1) What is "Streaming capable?"

A service is streaming capable if the result from the backend storage is delivered to the calling client without storing the complete result in memory. When the core-service is used, the client's request is transformed into an SQL statement that queries the database. The data returned by the database is transformed row by row and sent back to the client, so memory consumption is minimal. If the receiving client can also process the result as a stream, even large datasets can be handled.
The search and structure services do not support stream handling. The complete data set from the backend storage (Elasticsearch) is kept and transformed in memory. It is therefore mandatory to set an upper limit for the result size. If no size is given, the service enforces a default size.
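
The difference between the two behaviors can be sketched in a few lines of Python. This is an illustration of the principle only, not the implementation of any of the services: an in-memory SQLite table stands in for the relational backend, and the names make_demo_db, stream_result, and buffered_result as well as the default size of 50 are assumptions made for this example.

```python
import json
import sqlite3
from typing import Dict, Iterator, List


def make_demo_db(rows: int) -> sqlite3.Connection:
    """Create an in-memory table that stands in for the relational backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        ((i, f"doc-{i}") for i in range(rows)),
    )
    return conn


def stream_result(conn: sqlite3.Connection, sql: str) -> Iterator[str]:
    """Streaming: transform and hand over one row at a time."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    for row in cursor:  # rows are fetched lazily from the cursor
        yield json.dumps(dict(zip(columns, row)))


def buffered_result(
    conn: sqlite3.Connection, sql: str, size: int = 50
) -> List[Dict[str, object]]:
    """Non-streaming: the whole result is held in memory, so it is capped.

    The default of 50 is a hypothetical value for this sketch.
    """
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(size)]


if __name__ == "__main__":
    conn = make_demo_db(10_000)
    query = "SELECT id, title FROM documents ORDER BY id"
    # The generator never holds more than one transformed row at a time.
    print(sum(1 for _ in stream_result(conn, query)))
    # The buffered variant returns at most `size` rows.
    print(len(buffered_result(conn, query)))
```

A caller that consumes stream_result line by line keeps its own memory use flat as well, which corresponds to the case of a stream-capable client described above.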
