CDC - Distributed Cache
CDC provides a high-performance, scalable, distributed in-memory cache, based on Hazelcast, for both CDA and Mondrian. Its main features are:
- CDA distributed cache support;
- selectively clear the Mondrian cache for specific schemas, cubes or dimensions;
- ability to switch between default and CDC cache for CDA and Mondrian;
- gracefully handles adding and removing cache nodes;
- Mondrian distributed cache support;
- provides an API to clean the cache from the outside (e.g. after running ETL);
- provides a view over cluster status;
- supports several memory configuration options.
Credits: Webdetails Team, Lead - Pedro Alves
Available in: Marketplace
Performance is a key point, not only in Business Intelligence software but in user interfaces in general. The goal of CDC is to give Pentaho, through Mondrian and CDA, a distributed caching layer that keeps queries from hitting the database as much as possible.
One of the added features is the ability to clear only specific Mondrian cubes from the cache. Even though Mondrian has a very complete API to control the member cache, Pentaho only provides a way to clear the entire cache, which ends up being very limiting in production environments.
The cache's ability to survive server restarts is a design bonus, and CDA supports it out of the box. Mondrian supports it as well since the CDC 13.02.07 update.
- Mondrian 3.4 or later (in Pentaho 4.5);
- CDA 12.05.15.
- Install CDC using either the cdc-installer or ctools-installer. If you do a manual install, be sure to copy the contents of solution/system/cdc/pentaho/lib to the server's WEB-INF/lib;
- Download the standalone cache node;
- Run the standalone cache node (launch-hazelcast.sh) on the same machine as Pentaho or on the same internal network, optionally editing the file first to change the memory settings (the default is 1 GB, increase at will). You can launch as many nodes as you want;
- Launch Pentaho and click on the CDC button;
- Enable cache usage on CDA and Mondrian;
- Restart Pentaho Server;
- Check that the options on the settings screen are satisfactory. Usually the defaults work fine;
- Open Analyzer, JPivot or a CDE Dashboard that uses CDA and you should see the cache being populated.
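For the manual install case mentioned in the first step, the copy can be scripted. The following is a minimal sketch, not part of CDC itself: the function name `copy_cdc_libs` and both directory arguments are hypothetical, and it assumes the source folder sits at `system/cdc/pentaho/lib` under your solution folder, as the step describes.

```python
import shutil
from pathlib import Path


def copy_cdc_libs(solution_dir, webapp_dir):
    """Copy every file from CDC's pentaho/lib folder into WEB-INF/lib.

    solution_dir: path to the Pentaho solution folder (assumed layout).
    webapp_dir:   path to the Pentaho web application folder (assumed layout).
    Returns the sorted list of file names that were copied.
    """
    source = Path(solution_dir) / "system" / "cdc" / "pentaho" / "lib"
    target = Path(webapp_dir) / "WEB-INF" / "lib"
    if not source.is_dir():
        raise FileNotFoundError(f"CDC lib folder not found: {source}")
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            # copy2 keeps timestamps, which helps when comparing versions later
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

Restart the server after copying so the new libraries are picked up.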
CDC also provides a simple cluster information Dashboard that gives an overview of the state of the nodes.
With CDC you can selectively control the contents of the cache, allowing you to clean either specific Dashboards or cubes.
The business case around this is simple: we need to clear the cache after new data becomes available (usually as a result of an ETL job).
CDC allows you not only to do that but also to do it from within the ETL process.
CDC offers a solution navigator, so we can select a Dashboard. When we select the Dashboard, the cache of all the CDA queries used by that Dashboard will be cleared.
Clicking the URL button gives us a URL that we can call externally (for example, from an ETL job). Be aware that you need to add the user credentials when calling it from the outside (e.g. &userid=joe&password=password).
Clearing the Mondrian cache works much like clearing a Dashboard, but the navigator lists the available cubes instead.
You can then clean the entire schema, a specific cube, or even the individual cell cache for a specific dimension (use this last option with care).
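The external call described above can be scripted as a final step of an ETL job. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the URL is whatever the URL button gives you (its exact shape is not shown here, so the example uses a placeholder). Only the `userid` and `password` parameters come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_credentials(clear_url, userid, password):
    """Append userid/password query parameters to a URL copied from CDC."""
    parts = urlsplit(clear_url)
    # Keep the parameters already present in the copied URL, then add ours
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("userid", userid), ("password", password)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def clear_cache(clear_url, userid, password, timeout=30):
    """Call the cache-clearing URL and return the HTTP status code."""
    url = with_credentials(clear_url, userid, password)
    with urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL: replace it with the one copied from the URL button
    copied_url = "http://localhost:8080/pentaho/example?item=demo"
    print(with_credentials(copied_url, "joe", "password"))
```

Building the query with `urlencode` rather than string concatenation means credentials containing characters such as `&` or spaces are escaped correctly.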