info@lasmart.biz

february 28, 2024

Working with eMondrian

Introduction

eMondrian is a free and open source OLAP server based on the Mondrian project. The OLAP server represents the database as a multidimensional space with dimensions, which allows you to hide the complexity of the underlying tables and their relationships, and allows you to interactively analyse data from different perspectives. The eMondrian server can run on Windows and Linux operating systems.
eMondrian supports XML standard for analysis (XMLA) and OLE DB for OLAP, with which you can connect to client tools: Microsoft Excel, Power BI, Tableau. Any database that has a JDBC driver can be a data source for eMondrian.
This paper will describe the use of eMondrian, which was used as an intermediary to work analysing data in Excel from SQL Server data warehouse.

Installing eMondrian

In this paper, eMondrian was installed locally on Windows 10 with Java 1.8.0_311 and Apache Tomcat / 9.0.56.

Installation progress:

  1. Install Java.
  2. Install Tomcat.
  3. Configure/start Tomcat.
  4. Install eMondrian (download and place the war file in the webapps directory of the java server).
 

As a result, the connection string will look like this: http://localhost:8080/emondrian/xmla. More details about the installation can be found in the eMondrian documentation.

To configure the instance, you need to modify three files in the eMondrian web application:

  • /WEB-INF/datasources.xml                      (Contains a description of the data source and the path to the schema file).
  • /WEB-INF/schema/schema.xml               (Schema file)
  • /WEB-INF/classes/mondrian.properties

Database connection

Initially, the data source settings are in the /WEB-INF/datasources.xml file. It contains the data source connection string and a link to the file where the eMondrian schema is located. The default folder for schema files is /WEB-INF/schema/.

To connect to the database you need to:

  1. Install the JDBC driver of the appropriate Java version, if it is not present in eMondrian.
  2. Specify in the mondrian.properties file the necessary values: connection string, driver and driver path.
  3. Modify the datasources.xml file, basic XML elements to modify <DataSourceInfo> and <Catalogs>

 

The <DataSourceInfo> records the connection string, and <Catalogs> defines the path to the schema.

Scheme writing

A schema is an XML file that defines a multidimensional database. It contains a logical model consisting of cubes, hierarchies and elements, and a mapping of this model to a physical model. The physical model is the data source that is represented through the logical model, usually a set of tables in a relational database. The computable elements are defined in a multidimensional expression language.


The order of the XML elements is important, the main elements of the schema are summarised below in Table 1.

Scheme element Purpose Location

<Schema>

A collection of cubes, virtual cubes, common dimensions and roles
The main element, all others are located inside it, comes right after the XML declaration

<Cube>

A set of measurements centred on a fact table

Located inside <Schema>

<VirtualCube>

A cube defined by combining the measurements of one or more cubes

Located inside <Schema> after the cubes

<Dimension>

Measurement
  • The local dimension is located in <Cube> and will be available for this cube
  • The global dimension is located in <Schema> and can be used in multiple cubes

<Hierarchy>

Hierarchy

Located in <Dimension>, the name of the first hierarchy must match the name of the dimension

<Level>

Hierarchy Level
Located in ; hierarchy levels should be placed sequentially after each other

<Measure>

Measure

Located in <Cube>

<CalculatedMember>

A calculated element whose value is output using a formula

Located in <Cube>

<Table>

A table of facts or measurements

Located inside <Cube> or <Dimension> before other elements

 

Table 1. — Main elements of the circuit

The scheme can be changed in a text editor or with a special scheme editor. After large changes in the schema it is desirable to restart the server, if the changes are uncommon for the work, it is possible not to restart the server, this is also related to the cache operation.


As mentioned earlier: the computable elements are defined in the multidimensional expression language, but Mondrian has its own MDX dialect. More details about schema elements and the MDX dialect can be found in the Mondrian documentation.

Working/connecting in Excel

To connect eMondrian you need:

  1. Select the Data tab in Excel, then Get Data -> From Database -> From Analysis Services.
  2. Insert the address of the Mondrian XMLA service as the server name and click Next.
  3. Select the cube and click Finish.


This sequence is summarised below in Figure 1.

Figure 1 (a, b). — Connection in Excel

The work of eMondrian

The eMondrian server is a relational OLAP (ROLAP) server, which means that it always displays real-time data from the source, works as a proxy to represent data as cubes and performs multidimensional queries. This server executes queries written in a multidimensional expression language, reads data from the source database, and presents the results in a multidimensional format (Figure 2).


Internally, eMondrian uses a cache consisting of key and value maps. The cache is cleared of unlinked values by a rubbish collector, and the full cache is cleared within an hour, as an hour is the default maximum session latency. The cache pool consists of a single instance written using the Singleton pattern.


The cache is used to calculate values and store them for the next queries, in addition to values the cache also stores totals and query elements, this is intended to generate a faster response to a query as eMondrian sends a large number of queries to the database in a short period of time for its calculations.

Figure 2. — Operation of eMondrian

Sequence of actions when working in Excel after connecting to eMondrian:

  1. The user works with a PivotTable panel, for each user action, Excel sends MDX queries using the XMLA protocol.
  2. eMondrian parses the received query, parts of which it writes/compares with the cache.
  3. Based on the schema and cache, eMondrian generates the necessary SQL queries to the database.
  4. Over JDBC, eMondrian sends queries to the repository, then accepts the query results and writes the results to the cache.
  5. Calculates the required values, generates the result for the MDX query and sends it to Excel via XMLA protocol.
  6. Excel displays the accepted result to the user.
 

On the first Excel queries, eMondrian sends the approximate sequence of queries to the data warehouse:

  1. For each measurement used:
    — number of rows in the directory
    — directory values for all used hierarchy levels
  2. For each measure:
    — calculation of all totals for all used levels of measurement hierarchies
    — calculation of all values
 

In the case of filters, queries to directories are duplicated twice (as an assumption perhaps one query for cache comparisons, the other for calculations). To generate a filter, eMondrian sends two types of queries:      

  • directory values
  • queries for selected values from the directory as filters

 

Sometimes such duplication of queries occurs for directories that are used in a query as rows or columns of a summary table.

A query for the number of rows in a directory is rarely sent to the server: either when the dimension has not been used for a long time, or when it has been removed from the cache for some other reason.

eMondrian is designed to use different databases and the database queries it generates adhere to the same paradigm:

To avoid grouping by dimension level, you can use the attribute uniqueMembers=»true» for a given dimension level, if you know in advance that there are unique values in a given dimension/level, but as practice has shown, eMondrian will still group dimensions and calculate a unique number of rows for filters.


Conditions for filtering eMondrian forms conditions to the columns of the directory tables in the query, depending on the given dimension on which the filtering is performed; conditions are formed in different ways, this is affected by whether the dimension for filtering has or does not have a hierarchy. If the dimension is selected in a hierarchy, the query will include the directory columns that represent the levels higher in the hierarchy.

Filter Type of condition

When one value in the filter

column = value

When multiple values in the filter

column in (value1, value2)

When one value is selected in the hierarchy

column above the level = value for the level above and column = value of

When multiple values are selected in the hierarchy

((column above level = value for level above and column = value1) or (column above level = value for level above and column = value2)))

Table 2. — Formation of filtration condition

Logging in eMondrian

eMondrian maintains three types of logging:

  • Main log, logs everything that happens in eMondrian (including all requests)
  • Separate log of incoming MDX queries (not to search the main log)
  • Separate log of SQL queries sent to the database (not to search the main log)

 

All logs can be viewed through a browser by following the link to your deployed eMondrian (in this case http://localhost:8080/emondrian/), and the main page will provide links to navigate to the log pages at the bottom. All links are provided below (left is the view on the main page, right is the log links).

View on the main page Log links

Mondrian.log

http://localhost:8080/emondrian/logs/FILE

Mondrian-mdx.log

http://localhost:8080/emondrian/logs/MDXLOG

Mondrian-sql.log

http://localhost:8080/emondrian/logs/SQLLOG

Progress of work

In SQL Server, storage has been developed:

Facts:

  • Sales (10000000 pcs)
  • Deliveries (10926713 pcs)


Directories

  • Calendar (for 3 years)
  • Shops (100 pcs)
  • Manufacturers (1000 pcs)
  • Nomenclature (10000 pcs)
  • Nomenclature groups (500 pcs)

 

A circuit was created to test the performance of eMondrian:

General Measurements:

  • Calendar (hierarchy G-K-M-D)
  • Shops
  • Manufacturers
  • Nomenclature(hierarchies Nomenclature and Nomenclature Groups-Nomenclature)

 

Cubes:

  • Sales
  • Deliveries
  • Commodity turnover (cube containing a query of sales and deliveries merge representation)
  • Sales and deliveries (virtual cube combining sales and deliveries cubes)


SQL Server is remotely on the server, eMondrian and Excel are directly on the same PC.

Work example

1. Counting the number of values in the measurements.


Having selected the cube in Excel, the first thing that is sent to the database are queries for the number of values in dimensions/levels, a picture from Profiler with a description is shown below (left explanation, right view in Profiler)

 In this case, 9 requests were sent:

  • Number of shops
  • Number of manufacturers
  • Number of nomenclature
  • Number of nomenclature groups
  • Number of nomenclature — nomenclature group pairs
  • Number of years
  • Number of years-quarters
  • Number of years-quarter-months
  • Number of years-quarters-months-months-days

 

 Request for the number of years-quarter-months-   months-days

2. Selecting a value (measure) in the summary table.       

If you select a turnover value in pcs in Excel, which is the responsibility of the Movement quantity calculation                element, calculated from the difference between arrival and sales, an MDX request will be sent to eMondrian:

And eMondrian will send an SQL query to the database to retrieve the values of arrival and sales measures in pcs.

After retrieving the values from the repository, eMondrian will cache them and calculate the turnover values, then send the result to Excel.

3. Adding columns and rows to a summary table

Next, to analyse the turnover values, we will add columns and terms (calendar and shop) to the summary table to get the turnover values in pcs for each shop and for each year with 21 years disclosed by quarter.

Excel will send the following query:

Once received, eMondrian will generate queries to the database to retrieve the values it lacks to calculate the result, a picture from Profiler with the description is shown below (right explanation, left view in Profiler)

In this case, 16 requests were sent:

  • Retrieving year values from the calendar(twice)
  • Number of unique years
  • Sales and receipts in pcs by year
  • Retrieving years-quarter values from calendar
  • Retrieving quarter values from the calendar
  • Retrieving shop values(twice)
  • Number of unique shops
  • Sales and arrivals in pcs by shops
  • Sales and arrivals in pcs by shop and by year
  • Number of unique values of light-cart-month-months
  • Number of unique values light-cartal-months-days
  • Number of unique quarters
  • Sales and arrivals in units for 21 years by quarter
  • Sales and arrivals in units over 21 years by quarter and by shop

In this example, the queries by dimension were duplicated and eMondrian also decided to pre-calculate the number of unique values in the hierarchy below for the calendar (to save them in the cache in advance).


The results of eMondrian’s calculation and Excel display of the turnover values in pcs for each shop and for each year with the 21 year disclosure by quarter are shown in Figure 3.

Figure 3. — Excel calculation result

Testing

Measurement calendar has a hierarchy Year-Quarter-Month-Day, initially it had several hierarchies, but in the presence of several hierarchies, the calculated balance was calculated correctly only for the first hierarchy, so it was decided to leave only one.


It was decided to implement the directory «nomenclature groups» as one of the parts of the hierarchy «nomenclatures». Since nomenclature groups have a connection with facts only through nomenclature. Initially in the schema it was undertaken to merge nomenclatures with nomenclature groups using XML element <Join>, but during connection to Excel an error started to pop up that two tables are used in the dimension, so it was decided to create independently a query to merge two directories using XML elements <View> and <SQL>.


The sales and shipments cubes are simple cubes linked to their respective fact tables, which were designed to validate aggregates for counting measures. The turnover cube and the virtual «sales and deliveries» cube have the same purpose: to calculate the measures (sales, deliveries and turnover) and the computational elements of the settlement balance and settlement account.


In the course of the work, each cube in the circuit was separately tested for:

  • Correctness of the obtained values
  • Computing speed
  • Server load
  • Computer load capacity
  • Efficiency of sent requests

 

The test results, for each criterion for all the above cubes, are summarised in Table 3 below.

Criterion/Name of cube Sales Deliveries Turnover Sales and supplies

Correctness of the obtained values

Relevant
Relevant

Relevant

Incorrect with multiple filtering

Calculation time

Acceptable, but grows from the complexity and number of queries sent to the database

 

Acceptable, but grows from the complexity and number of queries sent to the database
Acceptable, but grows from the complexity and number of queries sent to the database
Acceptable, but coordinately grows due to the formation of incorrect queries
Server load capacity
From 10% to 70%, depending on the number and complexity of the query calculations
From 10% to 70%, depending on the number and complexity of the query calculations
From 10% to 80%, depending on the number and complexity of the query calculations
From 10% to 100% if it generates incorrect queries
Computer load capacity
CPU up to 30% for complex calculations
CPU up to 30% for complex calculations
CPU up to 30% for complex calculations
CPU up to 30% for complex calculations
Effectiveness of the requests sent
Effectiveness of the requests sent
Composes correct queries but suboptimal queries
Composes correct queries but suboptimal queries
Sends incorrect queries when multiple filtering is performed

Table 3 — Comparison of cubes according to given criteria

Computer load is mainly caused by receiving and calculating values, writing them to the cache, and generating the Excel response.

Server load is related to the number and complexity of sent queries, as well as parallel calculations from other accesses to the database.

The main cubes for testing are the «turnover» cube and the virtual «sales and deliveries» cube. The virtual cube did not justify its capabilities: when measures are calculated, eMondrian sends queries in the form of cross-connection of fact tables with directories, but when it is necessary to calculate several measures from different cubes, in such cases it makes a query that should cross-connect all necessary fact tables with directories, but these queries are made incorrectly:

  1. A cross join tries to join facts with facts and does not specify rules for joining fact tables with each other.
  2. Because of the presence of cache, cross joins join directories to facts that have been interacted with in past queries.

 

The turnover cube was implemented using xml schema elements <View> and <SQL> with the help of which a query was created to merge the sales and delivery fact tables, and the turnover values were calculated elements, also its analogue was created, where a storage view was created, in which the turnover values were calculated and with which the work was performed as with the fact table, but during testing this analogue gave the same results, but the calculation time was slightly longer.


The calculated balance was calculated using the cumulative total algorithm, three different variations of the algorithm were tested and their results are presented in Table 4.


The cumulative total algorithm in MDX can be described as follows: it returns the sum of the value of an expression (in this case a measure) calculated over a set (in this case a date range from the beginning to the current value).

Calculation approach Result

NULL - as the beginning of time for the date range

 

Error because there is no NULL in this MDX dialect

PeriodsToDate - calculates the date range, from the beginning to the current value

 

Worked correctly, but stopped on complex queries with multiple filtering

Openinig Period - as the start for the date range
It works correctly

By default, a measure is of type Numeric, but you can set one of three types for each measure: String, Integer, Numeric; and you can also set the display format. The «,» is used as a thousandths separator, examples of built-in formats, for the value 12342345.25:

  • formatString=»#,###» (values will be displayed as 12,342,345)
  • formatString=»#,###.00″ (values will be displayed as 12,342,345.00)
  • formatString=»#,###.##» (values will be displayed as 12,342,345.25)

 

It is possible to use your own formats created through plug-ins, more details about this can be found in the corresponding section of the documentation.


The same applies to languages: values in Russian stored in the database are displayed correctly in Excel, but when the scheme contains names of hierarchies, measurements, measures, cubes, whose names use Cyrillic characters, they are displayed incorrectly. The documentation on this topic says that you can create and connect your own localisation driver.

Conclusion

In this paper the eMondrian product has been analysed, the main part of the functionality has been tested, as a result the advantages and disadvantages of this software product can be highlighted.

Benefits:

  • Allows storage to be treated as a cube
  • Not difficult to install
  • Simple functionality to understand and use
 

Disadvantages:

  • Not all drivers are available initially and you have to look for a suitable driver to connect to the database
  • Not all functionality works correctly and some elements do not work
  • Composes unoptimised queries to the database
  • Only old Mondrian documentation is available

The developers of the original Mondrian advise to transfer all computational logic to the base, use aggregated tables, and the cube logic should correspond to the asterisk scheme for fast and correct operation.


Opinion about the product: an interesting tool for treating the base as a cube, you can use it if you need to view/analyse data for simple model structures and if you need simple MDX queries for analysis.

Personal advice, tested in my work:

  • In the snowflake scheme it is better to combine directories in advance either with the help of xml-element <SQL>, or with the help of views in the database (because xml-element <JOIN> did not work), and convert to the star scheme
  • Virtual cube is disappointing, for example for the same turnover, it is better to create a table, or view, or independently write a query to merge using xml-element <SQL>, and use one of these options in a regular cube on the scheme
  • Do not use «all but a few» when selecting filter values, as it will generate a giant conditional query that will go through every filter element and all hierarchies except the unselected ones
  • Use uniqueMembers=»true» for measurement levels for those directories where we know that the values in it are unique, so that we don’t have to group by unique elements unnecessarily
  • Do not use complex queries with multiple filters when calculating computational elements, because in such cases there will be long calculations: either on the server with the database, or in eMondrian itself.

For more information, please get in touch here:

Contact Form main