april 15, 2023
Provisional filtering of the data in the subquery by filter values in Apache Superset.
The issue of query optimisation for Apache Superset virtual datasets is quite relevant, as it greatly affects the visualisation speed of the report data. From the dashboard user’s point of view, this is critical. In some cases it is not possible to transfer the calculation logic to the data warehouse, so it is necessary to add data pre-filtering to the virtual dataset.
The visualisation uses samples of data from the original set, by filter values, which means that the dataset needs to be changed dynamically. Jinja templates can help us with this. In the standard installation package they are disabled, to enable them you need to set the ENABLE_TEMPLATE_PROCESSING function flag in superset_config.eng. You can then embed Python code into the virtual datasets.
1. Let’s create a test query, a chart with a table, a reference for the filter and the filter itself:
Query for table
select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value
union all select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value
Query for filter directory
select ‘col1_param1’ as column_for_filter
union all select ‘col1_param2’ as column_for_filter
Chart and filter view in the dashboard
The filter will not work in this form, because the columns in the dataset and the table have different names.
If you look at the resultant query (click on chart properties => View query), the filter in the original configuration will be inserted in position 1, we need to insert the filter in position 2.
SELECT «column» AS «column»,
value AS value
FROM
(select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value
union all select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value
where <Position 2>) AS virtual_table
WHERE <Position 1>
LIMIT 1000;
2. Let’s edit the query for the virtual dataset.
The Jinja templates have a function called filter_values() which returns an array of selected filter values. To make this function work in a query, we need to enclose it in double curly braces and pass the column name used in the filter as an argument. Let’s limit ourselves to one value for filtering, because the output will be an array, we will specify its first element. The template for filtering will look as follows:
{{ filter_values(‘column_for_filter’,)[0] }}, insert it into the original dataset.
select column, value
from
(select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value
union all select ‘col1_param1’ as column, 1 as value
union all select ‘col1_param1’ as column, 2 as value
union all select ‘col1_param2’ as column, 3 as value) as t
where column in (‘{{ filter_values(‘column_for_filter’,)[0] }}’)
Final result
All possibilities for using the Jinja templates can be found in the SuperSet documentation through this link:
https://superset.apache.org/docs/installation/sql-templating/