November 30, 2023
Cleaning HTML formatting code out of CSV data exported from Apache Superset
By using HTML for formatting, you can achieve colourful visualisations in Apache Superset. We covered this topic in detail in an earlier blog post, «Interactive visualisation using HTML in Apache Superset». The drawback of this approach is that exporting chart data to CSV gives incorrect results: the formatting is added at the database query level, so the HTML markup becomes part of the data itself. We have developed an add-on embedded in Superset that solves this problem, and this post shows how it works.
Consider the example in the figure below. It clearly shows how HTML turns the plain visualisation of «Chart2» into the formatted visualisation of «Chart1».
Suppose an analyst needs the chart data for a more detailed analysis. Downloading it in CSV format is easy: click the three-dot menu in the upper right corner of the screen, then «Download» and «Export to csv»:
Let’s save the chart data as CSV and open it in Excel. The result is shown in the figure below:
The data from «Chart1» is not exported correctly, and there are a couple of other issues:
- the HTML markup used to colour the table cells ends up in the exported file;
- when the CSV is opened in Excel:
a) the encoding is incorrect;
b) the delimiter is a comma by default, whereas Excel, depending on the regional settings, expects a semicolon.
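The first problem is easy to reproduce outside Superset. The sketch below assumes the formatted chart data reaches the CSV writer as a pandas DataFrame whose text values contain the HTML produced by the query; the column names and markup are made up for illustration. With pandas' default `to_csv` settings, the output is comma-separated, has no byte order mark and keeps the markup verbatim.

```python
import io

import pandas as pd

# Hypothetical query result: the SQL wraps values in HTML so that the
# table chart renders coloured cells.
df = pd.DataFrame(
    {
        "region": ["North", "South"],
        "sales": [
            '<span style="color: green;">1200</span>',
            '<span style="color: red;">300</span>',
        ],
    }
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # pandas defaults: sep=",", no byte order mark
csv_text = buffer.getvalue()
print(csv_text)  # the <span ...> markup is written to the file as-is
```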
The data can be cleaned up in Excel, but if the processing has to be repeated or the data set is large, this significantly increases labour costs, and a business user may find this option unsuitable.
To solve these problems, we changed two files: one in the Superset source code and one in the Superset configuration.
File superset/utils/csv.py:
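As an illustration only, here is a minimal sketch of the kind of cleanup such a change can perform, assuming the chart data is a pandas DataFrame at the point where it is serialised to CSV: every string value that contains markup is reduced to its visible text. The helpers `strip_html` and `strip_html_from_df` are hypothetical names, and the sketch uses the standard-library `html.parser`, which is not necessarily what the original patch uses.

```python
from html.parser import HTMLParser

import pandas as pd


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def strip_html(value: object) -> object:
    """Hypothetical helper: return the visible text of an HTML string.

    Non-strings and strings without '<' are returned unchanged.
    """
    if not isinstance(value, str) or "<" not in value:
        return value
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flushes any text the parser is still buffering
    return parser.text().strip()


def strip_html_from_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean every column before CSV serialisation."""
    cleaned = df.copy()  # leave the caller's DataFrame untouched
    for column in cleaned.columns:
        cleaned[column] = cleaned[column].map(strip_html)
    return cleaned
```

Applying the helper to the whole DataFrame keeps the change in one place: numbers and plain strings pass through `strip_html` unchanged, so only the formatted cells are affected.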
File superset/config.py:
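Superset's configuration defines `CSV_EXPORT`, a dictionary of keyword arguments that Superset passes on to pandas `DataFrame.to_csv` (the default is `{"encoding": "utf-8"}`); it can be overridden in `superset_config.py`. A setting along the following lines, a sketch rather than the exact configuration from the original change, targets both Excel issues: `utf-8-sig` prepends a UTF-8 byte order mark that Excel uses to detect the encoding, and `sep=";"` switches the delimiter. The rest of the snippet checks the effect with pandas alone, outside Superset.

```python
import os
import tempfile

import pandas as pd

# Assumed override for superset_config.py; Superset forwards these
# keyword arguments to DataFrame.to_csv.
CSV_EXPORT = {
    "encoding": "utf-8-sig",  # writes a UTF-8 byte order mark that Excel recognises
    "sep": ";",               # semicolon-delimited output
}

# Local check of what these options produce.
df = pd.DataFrame({"region": ["Zürich"], "sales": [1200]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")
    df.to_csv(path, index=False, **CSV_EXPORT)
    with open(path, "rb") as fh:
        raw = fh.read()

print(raw[:3])  # the UTF-8 byte order mark
print(raw.decode("utf-8-sig").splitlines())
```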
With this solution in place, the exported file looks as follows in Excel:
The solution is currently packaged in a Docker container; it can easily be moved from project to project and adapted to different versions of Apache Superset.