This article is for programmers who implement search components. Four services are available, each suited to different requirements. The following matrix is meant to help you select the best-fitting service.

Search Feature Matrix

Topic

Search Service

Structure Service

core-service query

core-service raw esql

Backend storage used

Elasticsearch

Elasticsearch

Relational DB 

Relational DB

Short description

For search result lists and aggregations.

For search and building nested virtual trees using aggregations.

Simple AND conditions on documents and folders. Returns full DMS objects as JSON/XML.

Provides simple (raw) database values by using a feature-rich, SQL-like query.

In production usage

Client main search and client dashlets.

Client folder view to build the virtual child objects tree.

API only usage.

API only usage.

Service name

search-service

structure-service

rest-ws - ResultService - query

rest-ws - ResultService - raw

API Documentation link

Search API

Structure Service API

Live Swagger-UI for rest-ws / ResultService / query

Live Swagger-UI for rest-ws / ResultService / raw

See also Using eSQL

Supports SQL-like query?

No

Yes, but a very limited set.

No

Yes, full eSQL functionality.

Supports fulltext search conditions?

Yes

Yes

No, only wildcard search values on single fields.

No, but the SQL LIKE operator can be used.

Supports search for document text content?

Yes, by fulltext search.

Yes, by fulltext search.

No

No

Supports aggregations (COUNT, SUM)?

Yes, partially

Yes, fully, including nested aggregations and retrieving arbitrary aggregations in one call.

No

Yes, limited, by using SUM() and COUNT() as part of the eSQL functionality.

Supports access to inactive versions?

No. Inactive versions are not permanently stored in Elasticsearch and cannot be accessed with the API.

No

Yes, by using the "(all versions)" eSQL qualifier.

Streaming capable? (1)

No

No

Yes

Yes

Data consistency

The Elasticsearch data is updated lazily. Any change to indexdata or content text is processed asynchronously via a message queue. An immediate update can be forced by using the synchronous storage mechanism, but even then the data in Elasticsearch may not be up to date. Under heavy system load, or while a reindexing is running, it may take several minutes before the data is updated. Even when the indexdata of an object is up to date, it may take additional time until the content text is extracted by the rendition service and updated in the Elasticsearch index.

The data is always up to date. The changes are committed and confirmed on the relational database before any message about the change is sent.

Access control

The Elasticsearch index holds an access control list (ACL) for each entry. This list is checked against the user access list on each query. The performance cost at query time is constant, regardless of the complexity of the rights system. The disadvantage is that changes in the security system, such as new or removed roles or changes to visibility clauses, must be propagated to the Elasticsearch index. Depending on the amount of data and the complexity of the visibility clauses, it may take a long time before the ACL is updated in the index.

User access is checked at query time. The SQL contains not only the conditions given by the user but also the visibility clauses for the folder and document objects. This incurs a query performance cost that depends on the complexity of the request and the rights system. The advantage is that changes to the rights system take effect immediately.

Basic performance considerations

Elasticsearch is built to deliver very good performance for fulltext searches. Indexdata conditions and aggregations also perform well. It is not well suited to retrieving large amounts of data.

Relational databases are good at joining tables and evaluating WHERE conditions on indexed columns. For good query performance, it is very important to use indexed columns in the WHERE conditions. If LIKE is used with a leading wildcard, an index cannot be used.
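
The effect of a leading wildcard on index usage can be made visible with a query plan. The following is a minimal sketch only: it uses SQLite through Python's sqlite3 module as a stand-in, not the relational database or eSQL dialect behind the core-service. The table docs, the column title, the index idx_docs_title and the helper plan_for are hypothetical names chosen for illustration. Other database systems have their own rules for when LIKE can use an index, so treat this as a demonstration of the principle.

```python
import sqlite3

# SQLite is only a stand-in; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
# SQLite's LIKE is case-insensitive by default, so the index needs the
# NOCASE collation before the planner will consider it for LIKE.
conn.execute("CREATE INDEX idx_docs_title ON docs (title COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO docs (title) VALUES (?)",
    [("invoice-%04d" % i,) for i in range(1000)],
)


def plan_for(pattern: str) -> str:
    """Return the plan text SQLite reports for a LIKE query with this pattern."""
    # The pattern is inlined as a literal so the planner can inspect it.
    escaped = pattern.replace("'", "''")
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id FROM docs WHERE title LIKE '{escaped}'"
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


prefix_plan = plan_for("invoice-00%")   # wildcard at the end
leading_plan = plan_for("%-0042")       # wildcard in front

print("trailing wildcard:", prefix_plan)
print("leading wildcard: ", leading_plan)
```

With the trailing wildcard, the plan should start with SEARCH and name the index, because the fixed prefix can be turned into a range lookup. With the leading wildcard, the plan should start with SCAN, meaning every row is visited. Depending on the SQLite version, the SCAN line may still mention the index, but it is then read from start to end rather than used for a lookup.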

(1) What is "Streaming capable?"

...