April 21, 2023
Specifics of changing Kylin objects
Open-source technologies have recently become increasingly popular. Our company is no exception: we have begun adopting the Apache technology stack, and after several successful projects we want to share our development experience and some of the specifics we encountered.
As a company grows, its analytics solutions evolve with it: the set of reference fields, the underlying data for analytics, and more can change. Apache Kylin imposes a number of limitations on changing objects.
Apache Kylin has a hierarchy of dependencies between objects of different types (project, table, model, cube, dataset). The data model is built on a set of Hive tables and serves as the source for the cube. Datasets in MDX for Kylin take their data from cubes, that is, from pre-calculated and aggregated data designed and built in Apache Kylin. The object dependencies are shown in Figure 1.
Figure 1 — Dependency diagram of Kylin objects
A change to one object in the solution requires reworking the related objects, following the arrows in the diagram. Note that the fact table in a model and the model in a cube cannot be replaced, while replacing the cube in a dataset is possible but risky (it can cause an unrecoverable error).
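To make the propagation rule concrete, here is a minimal Python sketch that walks a dependency graph and lists the object types to review after a change. The edges are an assumption based on the description above (Hive table, then model, then cube, then dataset), not a copy of Figure 1, and the names DEPENDENTS and affected_objects are hypothetical.

```python
from collections import deque

# Assumed edges: each key lists the object types built directly on top of it.
# This is our reading of the text, not an export of Kylin metadata.
DEPENDENTS = {
    "hive_table": ["model"],
    "model": ["cube"],
    "cube": ["dataset"],
    "dataset": [],
}


def affected_objects(changed):
    """Return the downstream object types to review, nearest first."""
    seen = []
    queue = deque(DEPENDENTS[changed])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(DEPENDENTS[node])
    return seen


print(affected_objects("hive_table"))  # ['model', 'cube', 'dataset']
```

A change to a Hive table therefore touches every level above it, while a change to a dataset affects nothing downstream.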
In our project we use external Hive tables stored as TEXTFILE. The data structure often changes so drastically that it is easier to create new tables than to alter existing ones. To avoid additional rework in Kylin, it is therefore more practical to use Hive views in models as the fact table and as frequently changing dimensions. With the same set of columns, a view can pull data from different tables, filter it, or aggregate it to the desired level. Because Kylin treats the top-level view as an ordinary table, changing the view's query while keeping the set of columns unchanged only requires refreshing the cube segments, which avoids having to develop a new analytical solution.
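The view approach can be sketched as a small Python helper that builds a HiveQL ALTER VIEW ... AS SELECT statement and checks that the column list is unchanged before the view is repointed. All table, view, and column names here are hypothetical, and the check compares column names only, not types, so treat it as an illustration rather than a complete safeguard.

```python
def alter_view_sql(view, columns, source_table, where=None):
    """Build a HiveQL statement that repoints a view at another source table."""
    sql = f"ALTER VIEW {view} AS SELECT {', '.join(columns)} FROM {source_table}"
    if where:
        sql += f" WHERE {where}"
    return sql


def same_column_set(old_columns, new_columns):
    """True when the view keeps the same column names in the same order."""
    return list(old_columns) == list(new_columns)


# Hypothetical example: the fact view moves from an old table to a new one.
OLD_COLUMNS = ["order_id", "order_date", "amount"]
NEW_COLUMNS = ["order_id", "order_date", "amount"]

if same_column_set(OLD_COLUMNS, NEW_COLUMNS):
    # Columns unchanged: the model and cube stay as they are,
    # and only the cube segments need to be refreshed afterwards.
    print(alter_view_sql("sales_fact_v", NEW_COLUMNS, "sales_v2", "amount > 0"))
else:
    print("Column set changed: the model and cube must be reworked.")
```

If the column check fails, the change is no longer transparent to Kylin and the dependent model and cube have to be revisited.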