March 30, 2023
Apache Kylin distributed database
Not long ago our company began adopting the Apache technology stack. After several successful projects, we want to share our development experience and a few practical details.
After deploying the Apache Kylin Docker image published by the developers (apachekylin/apache-kylin-standalone:4.0.0), we ran into the default resource limits of Kylin's bundled Hadoop while working on analytics solutions. In this article we discuss how to change the Hadoop configuration for a single-node cluster with the following specifications: 150 GB RAM, 981.46 GB of disk storage, and a 10-core processor with a 2.40 GHz base clock.
To change the Hadoop configuration, connect to the server, open a shell inside the Kylin container, locate the required file, and edit it with vi (the commands are shown in Figure 1).
Figure 1 — Commands for entering the Kylin container and searching for a file by name
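For reference, a minimal sketch of what those commands look like; the container name kylin matches Listing 3, while the find invocation and the file path placeholder are our assumptions:

sudo docker exec -it kylin bash
# inside the container: locate the YARN configuration file by name
find / -name yarn-site.xml 2>/dev/null
# open the path reported by find
vi /path/to/yarn-site.xml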
Set the following additional properties in the yarn-site.xml file (Listing 1):
1) yarn.nodemanager.vmem-check-enabled determines whether virtual memory limits are enforced for containers. We set it to false.
2) yarn.nodemanager.resource.memory-mb is the amount of physical memory, in MB, that can be allocated to containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is set to true, it is calculated automatically (on Windows and Linux); otherwise the default is 8192 MB. We set it to 120680 MB.
3) yarn.nodemanager.disk-health-checker.min-healthy-disks is the minimum fraction of disks that must be healthy for the NodeManager to consider the node operational (default 0.25), and yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage is the maximum percentage of disk space utilization after which a disk is marked as bad (default 90.0). We set them to 0.0 and 100.0 respectively, which effectively disables the disk health check.
4) yarn.scheduler.maximum-allocation-mb is the maximum amount of memory, in MB, that the ResourceManager will grant to a single container request. We set it to 32768 MB.
5) yarn.scheduler.maximum-allocation-vcores is the maximum number of virtual cores that the ResourceManager will grant to a single container request. We set it to 8.
Listing 1 — Additional properties in yarn-site.xml
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>120680</value>
</property>
<property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.0</value>
</property>
<property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>100.0</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>32768</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value>
</property>
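Note that these <property> blocks must be placed inside the root <configuration> element of yarn-site.xml. As a quick sanity check before restarting, the edited file can be validated with xmllint, assuming it is available in the container (substitute the real path reported by find):

# no output means the file is well-formed XML
xmllint --noout /path/to/yarn-site.xml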
In the capacity-scheduler.xml file (Listing 2), add a property that defines the maximum percentage of cluster resources that can be used to run Application Masters; this effectively controls the number of concurrently active applications. The limit for each queue is proportional to the queue's capacity and its user limits. The value is a floating-point number, e.g. 0.8 = 80%; the default is 0.1 (10%). With our yarn.nodemanager.resource.memory-mb of 120680 MB, a value of 0.8 allows roughly 96544 MB to be occupied by Application Master containers.
The container must be restarted for the changes to take effect (Listing 3).
Listing 2 — Additional property in capacity-scheduler.xml
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>
Listing 3 — Restarting the container
sudo docker restart kylin
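After the restart, it is worth verifying that YARN picked up the new limits. A minimal check, assuming the ResourceManager web UI is published on its default port 8088:

# totalMB and totalVirtualCores in the response should reflect the new settings
curl -s http://localhost:8088/ws/v1/cluster/metrics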