May 5, 2023
Configuring the local Kylin cache to optimize query execution time
Open source technologies have become increasingly popular and widely used. Our company is no exception: we have begun adopting the Apache technology stack. After several successful projects, we would like to share some of our development experience and a few features we found useful.
After deploying the Apache Kylin Docker image published by the developers (apachekylin/apache-kylin-standalone:4.0.0), we ran into delays and request limits imposed by Kylin's default settings when working with analytical solutions in Excel. This article describes one way to configure Apache Kylin for a single-node cluster with 150 GB of RAM, 981.46 GB of disk storage, and a 10-core processor with a 2.40 GHz base clock frequency.
MDX for Kylin lets you create and use datasets for analysis in Excel. Its configuration is managed in the insight.properties file. To allow complex queries to complete, increase the MDX query timeout to 3600 seconds and raise the initial and maximum JVM memory sizes used when running MDX for Kylin. The properties to modify are shown in Listing 1.
Listing 1 — MDX for Kylin configuration options
In addition, to reduce query execution time in MDX for Kylin, change the following parameters in the insight.properties file (Listing 2):
1) insight.mdx.mondrian.rolap.maxQueryThreads: the maximum number of MDX query threads. The default value is 50; set it to 500.
2) insight.mdx.mondrian.rolap.optimize-tuple-size-in-aggregate.enable: whether to enable tuple size optimization during MDX aggregation. The default value is false; set it to true.
3) insight.mdx.mondrian.rolap.star.disableLocalSegmentCache: controls the local cache for segments requested by MDX in MDX for Kylin, including generating the specific cache and looking up the block cache SPI. The default value is false; set it to true.
Listing 2 — MDX for Kylin configuration options
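The three settings described above could be written in insight.properties roughly as follows. This is a minimal sketch built only from the property names and values given in this article. The comments are ours, and the location of the file depends on your MDX for Kylin installation.

```properties
# insight.properties (MDX for Kylin)
# Sketch based on the values named in this article; verify the property
# names against the documentation for your MDX for Kylin version.

# Maximum number of MDX query threads (default: 50)
insight.mdx.mondrian.rolap.maxQueryThreads=500

# Tuple size optimization during MDX aggregation (default: false)
insight.mdx.mondrian.rolap.optimize-tuple-size-in-aggregate.enable=true

# Local segment cache switch, set as described in this article (default: false)
insight.mdx.mondrian.rolap.star.disableLocalSegmentCache=true
```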
For Kylin queries, you should also configure the query cache in the kylin.properties file by changing the following properties (Listing 3):
1) kylin.query.cache-enabled: whether to enable query caching. Set it to true.
2) kylin.query.cache-threshold-duration: queries whose duration exceeds this threshold are stored in the cache. The default value is 2000 (ms); set it to 0.
3) kylin.query.cache-threshold-scan-bytes: queries that scan more bytes than this threshold are stored in the cache. The default value is 1048576 (bytes); set it to 0.
Listing 3 — Kylin Configuration Settings
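A minimal sketch of the corresponding kylin.properties fragment, using only the property names and values given above. We assume the file is the usual one under $KYLIN_HOME/conf, which may differ in your deployment, and that Kylin has to be restarted before the changes take effect.

```properties
# kylin.properties (Apache Kylin)
# Sketch based on the values named in this article; the file location
# ($KYLIN_HOME/conf) is an assumption about your deployment.

# Enable the query cache
kylin.query.cache-enabled=true

# Cache queries that run longer than this many milliseconds (default: 2000)
kylin.query.cache-threshold-duration=0

# Cache queries that scan more than this many bytes (default: 1048576)
kylin.query.cache-threshold-scan-bytes=0
```

With both thresholds at 0, practically every query should become eligible for caching, no matter how quickly it ran or how little data it scanned. This costs extra cache memory, which the 150 GB node described above can afford.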
After applying a configuration similar to the one described in this article, the execution time of queries from Excel to the MDX for Kylin dataset decreased by 40%.