Skip navigation

Introduction

 

 

Pentaho Report Designer can be internationalized and localized. To accomplish that, translation files used by the report need to have a specific format and their names need to obey certain rules.

 

 

How to configure translation files?

 

 

By default, each report that you create contains an empty fall-back/default translation file: translations.properties, editable. You can access it by clicking on File Tab, then Resources option and finally selecting translations.properties and selecting Edit option.

 

The other translation files can be created or imported by accessing File Tab/Resources option. These files need to contain key-value pairs, separated by an equal symbol (=):

 

  • dashboardTitle = Internationalization dashboard

 

Additionally, translation files need to have the following format name:

 

 

    • translations_en.properties
    • translations_ja.properties

 

 

    • translations_en_US.properties

 

Note: Special characters such as Japanese characters or "Ç" for instance or even accents need to be Ascii encoded (Native2ascii Online), otherwise the characters are not displayed correctly in the report.

 

Resource-labels are the only elements that perform translations in PRD. You need to set its value property with a key, defined in the translations files. For instance:

 

  • value property = dashboardTitle

 

value property is under resource-label Atributes Panel.

 

How is the hierarchical structure in the translations.properties ?

 

 

Translations or messages can be created in multiple translation files. Whenever that happens, keep in mind the following hierarchy:

 

     + translations.properties

     ++ translations_en.properties

     ++++ translations_en_US.properties

     ++++ translations_en_UK.properties

 

In conclusion, translations_<language>_<COUNTRY>.properties overrides  translations_<language>.properties file, which overrides translations.properties file.

 

 

Pentaho Report Designer Example

 

 

Imagine that you want to create a PRD for English and Japanese languages. You need to configure/create

translations.properties, translations_en.properties and translations_ja.properties with the keys and values that you want to translate and add resource-labels with the keys that you just defined as values properties.  Then, to test the translation functionality, you need to go to File Tab, click on Configuration option and click and set ~.environment.designtime.Locale with the language code that you want to see: ja or en to see a Japanese or English translation, respectively.

 

In case you want to publish this PRD in the repository, PUC, and test the translation functionality,  you need to click on View Tab, choose Languages option and then select English or Japanese language. Afterwards, open the PRD report or reload it if it was already opened and see that the report shows the translation for the language chosen in PUC.

 

Note: In the event of having to translate two reports that share some translation, you only want to maintain one file for each language instead of several. So, you should import the translation file from your file system in File Tab/Resources option.  You can edit the imported translation file by clicking on edit option from File Tab/Resources option, or by editing the translation file in your file system. In these situations there are some considerations your should take into consideration. Whenever you edit the translation file inside the File Tab/Resources option you will need to export it. Otherwise, the translation file in your file system won't have the changes that you have performed. The same way, if you change the translation file in your file system, you will need to remove the file that you have imported to File Tab/Resources option and imported it again.

This example was tested in Pentaho 8.0.


Introduction

 

 

Dashboards can be internationalized and localized. Several files need to be created in order to perform the dashboard translation, all with .properties extension and their location needs to be the same as the dashboard.

 

 

How to configure?

 

 

As mentioned before, some files need to be created and configured properly:

 

    • en_US=English
    • ja=\u65E5\u672C\u8A9E

     en_US and ja correspond to the language and country codes and English and \u65E5\u672C\u8A9E are the languages names.

 

          Note: Special characters such as Japanese characters or "Ç" for instance or even accents need to be Ascii encoded (Native2ascii Online), otherwise the characters are not displayed correctly in the dashboard.

 

  • The translation file name should have this format
    • messages_<language>.properties - language corresponds to the language code, specified in the  messages_supported_languages.properties. File name examples:

 

      • messages_en.properties
      • messages_ja.properties

    

    • messages_<language>_<COUNTRY>.properties - COUNTRY corresponds to the country code, uppercase code ISO ALPHA-2. Whenever, you add this type of file, do not forget to add <language>_<COUNTRY> to messages_supported_languages.properties file. Otherwise, the  configured translation file will never be used.

 

      • messages_en_US.properties

 

   The latter files should contain their translation information organised as key value pairs, separated by an equal symbol (=):

    • dashboardTitle = Internationalization dashboard

    

  • messages.properties - is the translation fallback file, the default translation, in case of something not being configured well in translation functionality. The file should have the same structure as described for messages_<language>_<COUNTRY>.properties and messages_<language>.properties.

 

 

How is messages.properties structure hierarchical?

 

 

Translations or messages can be created in multiple translation files. Whenever that happens, keep in mind the following hierarchy:

 

     + messages.properties

     ++ messages_en.properties

     ++++ messages_en_US.properties

     ++++ messages_en_UK.properties

 

In conclusion, messages_<language>_<COUNTRY>.properties overrides messages_<language>.properties file, which overrides messages.properties file.

 

 

CDE dashboard Example

 

 

Imagine that you want to build a CDE dashboard with only a title for English and Japanese languages. You construct the CDE layout and then you add the Text Component element. In the Expression of the latter, place the following code that calls a CDF API- prop function from i18nSupport, that uses the translation file according to the PUC language:

     function f(){

         return this.dashboard.i18nSupport.prop('dashboardTitle');

     }

 

Additionally, message.properties and message_en.properties should have:

 

  • dashboardTitle = Internationalization dashboard

 

and  message_ja.properties should have:

 

  • dashboardTitle = \u56fd\u969b\u5316\u30c0\u30c3\u30b7\u30e5\u30dc\u30fc\u30c9

 

When you want to test the translation configuration, you need to go to View Tab, choose Languages option and then select English or Japanese language. After that, you need to reload the dashboard, if it was already opened, or open the dashboard and you will see that the translation showed respects the PUC language chosen.

 

Note: This example was tested in Pentaho 8.0.

 

 

Using a language that it is not available in PUC

 

 

To add new languages, you need to instal them from the marketplace. Please follow these instructions: pentahoLanguagePacks/README.md at master · webdetails/pentahoLanguagePacks · GitHub

After your language pack is installed, please add the language code to messages_supported_languages.properties file and create the translations file using the format name previously specified.

Hi everyone,

 

with 100 participants from Austria, Germany and Switzerland, Pentaho User Meeting has been a great success. I´m happy to share with you the live blog covering all presentations and speakers:

 

  • Migrating from Business Objects to Pentaho (CERN, Gabriele Thiede)
  • Pentaho 8 (Pedro Alves)
  • Best Practices for Data Integration Architectures (Matt Casters)
  • Operating Pentaho at Scale (Jens Bleuel)
  • Running Pentaho in Kubernetes (Nis Christian Carstensen, Netfonds)
  • Data handling with Pentaho (Marco Menzel, Hansainvest)
  • IoT and Predictive Analytics (Jonathan Doering, Hitachi Vantara)
  • Adding Pentaho Dashboards to Angular 5 applications (Francesco Corti, Alfresco)
  • Predictive Analytics with PDI and R (Dr. David James, it-novum)
  • Integrating and analyzing SAP data with SAP/Pentaho Connector (Stefan Müller, it-novum)
  • Analyzing IT service management data with openLighthouse (Dirk Rönsch, it-novum)

 

See the full agenda at

Pentaho User Meeting 2018 in Frankfurt - it-novum

 

Thanks to all who have contributed to this event!

Introducing Plug-in Machine Intelligence

by Mark Hall and Ken Wood

 

 

HVLabsLogo.pngToday, the need to swiftly operationalize machine learning based solutions to meet the challenges of businesses is more pressing than ever. The ability to create, deploy and scale a company’s business logic to quickly take advantage of opportunities or react to changes is exceeding the capabilities of people and legacy thinking. Better and more machine learning is vital going forward but, more importantly, easier machine learning is essential. Leveraging an organization’s existing staff levels, business domain knowledge, and skillsets by lowering the entry into the realm of data science can dramatically expand business opportunities.

 

Everytime I am in PMI, I am seeing more and more of its value!!! Great stuff!!!”

Carl Speshock - Hitachi Vantara Product Manager, Hitachi Vantara Analytics Group

 

The world of Machine Learning is empowering an ever-increasing breadth of applications and services from IoT to Healthcare to Manufacturing to Energy to Telecom, and everything in between. Yet the skills gap between business domain knowledge and the analytic tools used to solve these challenges needs to be bridged. People are doing their part through education, training and experimentation in order to become data scientists, but that’s only half of the equation. Making the analytic tools easier to use can help bridge this gap quickly. Throw in the ability to access and blend different data sources, cleanse, format and engineer features into these datasets, and you have a unique and powerful tool. In fact, the combination of PDI and PMI is an evolution of the PDI tool suite for deeper analytics and data integration capabilities

 

"While exploring solutions with a major healthcare provider that was using predictive
analytics to reduce the costs and negative patient care incurred from
complications
from surgery
, we were given internal access to PMI. Working with python
and using
the Scikit-Learn library required
2 weeks of coding and prototyping to perform
just the
machine
learning model selection and training. With PDI and PMI, I was able to
prep
the data, engineer in the features and train the models
in about 3 hours. And, I could
include other machine learning engines from R and Weka and evaluate the results. The
combination of PDI and PMI makes machine learning solutions easier to use and maintain."

Dave Huh - Data Scientist - Hitachi Vantara Analytics Services

 

Hitachi Vantara Labs is excited to introduce a new PDI capability, Plug-in Machine Intelligence (PMI) to the PDI Marketplace. PMI is a series of steps for Pentaho Data Integration (PDI) that provides direct access to various supervised machine learning algorithms as full PDI steps that can be designed directly into your PDI data flow transformations. Users can download the PMI plugin from the Hitachi Vantara Marketplace or directly from the Marketplace feature in PDI (automatic download and install). Installation Guides for your platform, the Developer's Document, and the sample transformation, and datasets are available here. The motivation for PMI is:

 

  1. To make machine learning easier to use by combining it with our data integration tool as a suite of easy to consume steps, and ensuring these steps guide the developer through its usage. These supervised machine learning steps work “out-of-the-box” by applying a number of “under-the-cover” pre-processing operations and algorithm specific "last-mile data prep" to the incoming dataset. Default settings work well for many applications, and advanced settings are still available for the power user and data scientist.
  2. To combine machine learning and data integration together in one tool/platform. This powerful coupling between machine learning and data integration allows the PMI steps to receive row data as seamlessly as any other step in PDI. AND! No more jumping between multiple tools with inconsistent data passing methods or, complex and tedious performance evaluation manipulation.
  3. To be extensible. PMI provides access to 12 supervised Classifiers and Regressors “out-of-the-box”. The majority of these 12 algorithms are available in each of the four underlying execution engines that PMI currently implements: WEKA, python scikit-learn, R MLR and Spark MLlib. New algorithms and execution engines can be easily added to the PMI framework with its dynamic PDI step generation feature.

 

PMI also incorporates revamped versions of the Weka steps that have been originally part of the Pentaho Data Science pack. Essentially, this could be looked at as version 2 of the Data Science Pack. These include:PMIVantaraLogo2.png

 

  • PMI Forecasting for deploying time-series models learned in WEKA’s time series forecasting environment.
  • PMI Scoring for deploying trained supervised and unsupervised ML models. This includes new features to support evaluation/monitoring of existing supervised models on fresh data (when class labels are available).
  • PMI Flow Executor for executing arbitrary WEKA Knowledge Flow processes. This revamped step supports WEKA’s new Knowledge Flow execution engine and UI.

 

PMI tightly integrates, into the PDI “Data Mining” category, four popular machine learning “engines” via their machine learning libraries. These four engines are, Weka, Python, R and Spark. This first phase of PMI incorporates the supervised machine learning algorithms from these four engines from their associates machine learning libraries - Weka, Scikit-Learn, MLR and MLlib, respectively. Not all of the engines support all of the same algorithms evenly. Essentially, there are 12 new PMI algorithms added to the Data Mining category that executes across the four different engines;

 

PMIList.png

  1. Decision Tree Classifier – Weka, Python, Spark & R
  2. Decision Tree Regressor – Weka, Python, Spark & R
  3. Gradient Boosted Trees – Weka, Python, Spark & R
  4. Linear Regression – Weka, Python, Spark & R
  5. Logistic Regression – Weka, Python, Spark & R
  6. Naive Bayes – Weka, Python, Spark & R
  7. Naive Bayes Multinomial – Weka, Python & Spark
  8. Random Forest Classifier – Weka, Python, Spark & R
  9. Random Forest Regressor – Weka, Python & Spark
  10. 1Support Vector Classifier – Weka, Python, Spark & R
  11. Support Vector Regressor – Weka, Python, & R
  12. Naive Bayes Incremental – Weka

 

As such, eventually the existing “Weka Scoring” step will be deprecated and replaced with the new “PMI Scoring” step. This step can consume (and evaluate for model management monitoring processes) any model produced by PMI, regardless of which underlying engine is employed.

 

I know what you’re thinking, “why implement machine learning across four engines?”. Good question. Believe it or not, data scientists are picky and set in their ways, and… not all engines and algorithms perform (think accuracy and speed) the same or yield the same accuracies for any given dataset. Many analysts, data scientists, data engineers and others that look to these tools to solve their challenges, tend to use their favorite tool/engine. With PMI, you can compare up to four different engines and up to 12 different algorithms against each other to determine the best fit for your requirement.

 

 

What Happens When Data Patterns Change?

 

An important benefit to PMI is the evaluation metrics used to measure accuracy is uniform and unified. Since all steps are now built into the same PMI framework - unified, the resulting metrics are all calculated uniformly and can be used MLMMDiagram3.pngto easily compare performance even across the different engine's algorithms. This unique characteristic has resulted in a whole new use case in the form of model management. Concepts around model management with the PMI framework has enabled the ability to Auto-Retrain models, Auto-Re-evaluate, Dynamic-Deploy models and so on. A concept that we have recently proven with demonstration is the Champion / Challenger model management strategy. This Champion / Challenger strategy easily allows currently active model(s) to be re-evaluated and compared with other candidate models' performance and "hot-swap' deploy the new Champion model. A more detailed discussion on Machine Learning Model Management can be found with this accompanying blog called "4-Steps to Machine Learning Model Management".

 

 

BA-Champ.png

HotSwap-Models.png

 

The Fail Fast Approach

 

Thomas Edison is quoted as saying "I have not failed. I've just found 10,000 ways that won't work.". And back in the days leading up to the invention of the light bulb, this 10,000 ways that won't work took years to iterate through. What if you could eliminate candidates in days or hours? PMI allows a “Fail Fast” approach to achieving results. With the ease of using PMI on datasets, many combinations of algorithms and configurations can be tried and testing very fast, weeding out the approaches that won’t work and narrowing down to promising candidates quickly. The days of churning on code until it finally works, then finding out the results aren’t good enough and a new approach is needed, are coming to an end.

 

Over the next few month, Hitachi Vantara Labs will continue to provide blogs and videos to demonstrate how to use PMI, how to extend the PMI framework and how to add additional algorithms to PMI.

 

  PMIMOREktr.png

 

It is important to point out that this initiative is not formally supported by Hitachi Vantara, and there are no current plans on the Enterprise Edition roadmap to support PMI at this time.  It is recommended that this experimental feature be used for testing only and not used in production environments. PMI is supported by Hitachi Vantara Labs and the community. Hitachi Vantara Labs was created to formally test out new ideas, explore emerging technologies and as much as possible, share our prototypes with the community and users through the Hitachi Vantara Marketplace. We like to refer to this as "providing early access to advanced capabilities". Our hope is that the community and users of these advanced capabilities will help us improve and recommend additional use cases. Hitachi Vantara has forward thinking customers and users, so we hope you will download, install and test this plugin. We would appreciate any and all of your comments, ideas and opinions.

Eliminating Machine Learning Model Management Complexity

By Mark Hall and Ken Wood

 

MLMMDiagram3.png

 

HVLabsLogo.pngLast year in 4-Steps to Machine Learning with Pentaho, we looked at how the Pentaho Data Integration (PDI) product provides the ideal platform for operationalizing machine learning pipelines – i.e. processes that, typically, ingest raw source data and take it all the way through a series of transformations that culminate in actionable predictions from predictive machine learning models. The enterprise-grade features in PDI provide a robust and maintainable way to encode tedious data preparation and feature engineering tasks that data scientists often write (and re-write) code for, accelerating the process of deploying machine learning processes and models.

 

 

“According to our research, two-thirds of organizations do not have an automated
process to update their predictive analytics models seamlessly. As a result, less than
one-quarter of machine learning models are updated daily, approximately one third
are updated weekly, and just over half are updated monthly. Out of
date models can
create a significant risk to organizations.”

- David Menninger, SVP  & Research Director, Ventana Research

 

 

It is well known that, once operationalized, machine learning models need to be updated periodically in order to take into account changes in the underlying distribution of the data for which they are being used to predict. That is, model predictions can become less accurate over time as the nature of the data changes. The frequency that models get updated is application dependent, and itself can be dynamic. This necessitates an ability to automatically monitor the performance of models and, if necessary, swap the current best model for a better performing alternative one. There should be facilities for the application of business rules that can trigger re-building of all models or manual intervention if performance drops dramatically across the board. These sorts of activities fall under the umbrella of what is referred to as model management. In the original diagram for the 4-Steps to Machine Learning with Pentaho blog, the last step was entitled “Update Models.” We could expand the original "Update Models" step and detail the underlying steps that are necessary to automatically manage the models. Then relabel this step to "Machine Learning Model Management" (MLMM). The MLMM step includes the 4-Steps to Machine Learning Model Management, “Monitor, Evaluate, Compare, and Rebuild all Models” in order to cover what we are describing here. This concept now looks like this diagram.

 

MLMMDiagramOldNew3.png

 

 

The 4-Steps to Machine Learning Model Management, as highlighted, include Monitor, Evaluate, Compare and Rebuild. Each of these steps implements a phase of a concept called a "Champion / Challenger" strategy. In a Champion / Challenger strategy applied to machine learning, the idea is to compare two or more models against each other in order to promote the one model that performs the best. There can be only one Champion model, in our case the model that is currently deployed, and there can be one or more Challengers, in our case other models that are trained differently, use different algorithms and so forth, but all running against the same dataset. The implementation of the Champion / Challenger strategy for MLMM goes like this,

 

  1. Monitor - constant monitoring of all of the models is needed to determine the performance accuracy of the models in the Champion / Challenger strategy. Detecting a degraded model's performance should be viewed as a positive result to your business strategy in that the characteristic of the underlying data has changed. This can be viewed as the behaviors you are striving for are being achieved, resulting in different external behaviors to overcome your current model strategy. In the case of our retail fraud prediction scenario, the degradation of our current Champion model's performance is due to a change in the nature of the initial data. The predictions worked and is preventing further fraudulent transactions, therefore new fraud techniques are being leveraged which the current Champion model wasn't trained to predict.
  2. Evaluate - an evaluation of the current Champion model needs to be performed to provide evaluation metrics of the model's current accuracy. This evaluation results in performance metrics on the current 4-stepsMLMM2.pngsituation and can provide both a detailed set visual and programmatic data to use to determine what is happening. Based on business rules, if the accuracy level has dropped to a determined threshold level, then this event can trigger notifications of the degraded performance or initiate automated mechanisms. In our retail fraud prediction scenario, since the characteristic of the data has changed, the Champion model's accuracy has degraded. Evaluation metrics from the evaluation can be used to determine that model retraining, tuning and/or a new algorithm is needed. Simultaneously, all models in the Champion / Challenger strategy could be evaluated against the data to ensure an even evaluation on the same data.
  3. Compare - by comparing the performance accuracy of all the models against each other from the evaluation step, the Champion and the Challenger models can be compared against each other to determine which model performs best, at this time. Since the most likely case is that the current Champion and all the Challenger models were built and trained against the initial state of the data, these models will need to be rebuilt.
  4. Rebuild - by rebuilding (retraining) all the models against the current state of the data, the best performing model on the current state of the data, is promoted to Champion. The new Champion can be hot-swapped and deployed or redeployed into the environment by using a PDI transformation to orchestrate this action.

 

This 4-Steps to Machine Learning Model Management is a continuous process, usually scheduled to run on a periodic basis. This blogs describes how to implement a Champion / Challenger strategy using PDI as both the machine learning and the model management orchestration.

 

The new functionality that provides a new set of supervised machine learning capabilities and the model management enablers to PDI is called Plug-in Machine Intelligence (PMI). PMI provides a suite of steps to PDI that gives direct access to various supervised machine learning algorithms as full PDI steps that can be designed directly into your PDI data flow transformations with no coding. Users can download the PMI plugin from the Hitachi Vantara Marketplace or directly from the Marketplace feature in PDI (automatic download and install). The motivation for PMI is:

 

  • To make machine learning easier to use by combining it with our data integration tool as a suite of easy toconsume steps that do not require writing code, and ensuring these steps guide the developer through its usage. These supervised machine learning steps work “out-of-the-box” by applying a number of “under-the-cover” pre-processing operations and algorithm specific "last-mile data prep" to the incoming dataset. Default settings work well for many applications, and advanced settings are still available for the power user and data scientist. PMIVantaraLogo2.png
  • To combine machine learning and data integration together in one tool/platform. This powerful coupling between machine learning and data integration allows the PMI steps to receive row data as seamlessly as any other step in PDI. No more jumping between multiple tools with inconsistent data passing methods or, complex and tedious performance evaluation manipulation.
  • To be extensible. PMI provides access to 12 supervised Classifiers and Regressors “out-of-the-box”. The majority of these 12 algorithms are available in each of the four underlying execution engines that PMI currently supports: WEKA, python scikit-learn, R MLR and Spark MLlib. New algorithms and execution engines can be easily added to the PMI framework with its dynamic step generation feature.

 

A more detailed introduction of the Plug-in Machine Intelligence plug-in can be found in this accompanying blog.

 

PMIList.pngPMI also provides a unified evaluation framework. That is, the ability to output a comprehensive set of performance evaluation metrics that can be used to facilitate model management. We call this unified because data shuffling, splitting and the computation of evaluation metrics is performed in the same way regardless of which of the underlying execution engines is used. Again, no coding is required which, in turn, translates into significant savings in time and effort for the practitioner. Evaluation metrics computed by PMI include (for supervised learning): percent correct, root mean squared error (RMSE) and mean absolute error (MAE) of the class probability estimates in the case of classification problems, F-measure, and area under the ROC (AUC) and precision-recall curves (AUPRC). Such metrics provide the input to model management mechanisms that can decide whether a given “challenger” model (maintained in parallel to the current “champion”) should be deployed, or whether champion and all challengers should be re-built on current historical data, or whether something fundamental has been altered in the system and manual intervention is needed to determine data processing problems or to investigate new models/parameter settings. It is this unified evaluation framework that enables PDI to do model management.

 

 

Implementing MLMM in PDI

The PDI transformations below are also included in the PMI plugin download complete with the sample datasets.

 

The following figure shows a PDI transformation for (re)building models and evaluating their performance on the retail fraud application introduced in the 4-Steps to Machine Learning with Pentaho blog. It also shows some of the evaluation metrics produced under a 2/3rd training - 1/3rd test split of the data. These stats can be easily visualized within PDI via DET (Data Exploration Tool), or the transformation can be used as a data service for driving reports and dashboards in the Business Analytics (BA) server.

 

PDIRebuildModels.png

DETChartResults.png

 

The following figure shows a PDI transformation that implements champion/challenger monitoring of model performance. In this example, an evaluation metric of interest (area under the ROC curve) is computed for three static models: the current champion, and two challengers. Models are arranged on the file system such that the current champion always resides in one directory and challenger models in a separate directory. If the best challenger achieves a higher AUC score than the current champion, then it is copied to the champion directory. In this way, hot-swapping of models can be made on-the-fly in the environment.

 

ChampChallengerDiagram.png

 

PMI provides the ability to build processes for model management very easily. This, along with its no-coding access to heterogeneous algorithms, automation of “last mile” algorithm-specific data transformations, and when combined with enterprise-grade features in PDI – such as data blending, governance, lineage and versioning – results in a robust platform for addressing the needs of citizen data scientists and modern MI deployments.

 

Installation documentation for your specific platform and a developer's guide, as well as, the sample transformations and datasets used in this blog can be found at here. The sample transformations and sample datasets are for demonstration and educational purposes.

 

It is important to point out that this initiative is not formally supported by Hitachi Vantara, and there are no current plans on the Enterprise Edition roadmap to support PMI at this time.  It is recommended that this experimental feature be used for testing only and not used in production environments. PMI is supported by Hitachi Vantara Labs and the community. Hitachi Vantara Labs was created to formally test out new ideas, explore emerging technologies and as much as possible, share our prototypes with the community and users through the Hitachi Vantara Marketplace. We like to refer to this as "providing early access to advanced capabilities". Our hope is that the community and users of these advanced capabilities will help us improve and recommend additional use cases. Hitachi Vantara has forward thinking customers and users, so we hope you will download, install and test this plugin. We would appreciate any and all comments, ideas and opinions.

Introduction

 

 

PRD is a powerful report tool. However, from my personal experience, some of the most cool stuff is hidden under all the available settings and the combinations that PRD allows. Due to this, only through your experience and others, you will start to understand what and how you can do certain implementations.

I would like to share with you the knowledge gathered during a project using PRD tool.

 

This blog post is divided in sections, where features from particular PRD components are described.

The sections are below, and if you click in one of them you will navigate to the correspondent section:

Tricks only to emphasise the details that certain components features have.

Bear in mind that all the examples in this post use dummy data.

 

General Tricks

 

· How do you format a column retrieved by the query?

  • You need to have that field selected and then click on attributes, format

 

     general1.png

 

· How do you define a name for your sub-report?

  • Select your sub-report

 

general2.png

 

  • Go the attributes and then name

 

general3.png

 

· There are a lot of functions that can be used in the reports:

  • Page - adds page number, you need to put this function in Page Footer
  • Row Banding – defines the color for the even and odd rows in a table

 

Query Tricks

 

· How many queries can be selected in your report?

  • You can only have one query selected per report/sub-report. This have an impact on how you structure your report.

 

· What happens when you change the query name?

  • Whenever you change the query name, the query that was previously selected, will become deselected, forcing the report/sub-report to produce empty reports because it doesn’t know where to fetch data from. The selected query layout should be something like:

 

query1.png

 

 

Parameter tricks

 

· How do you create a date parameter?

  • You need to keep in mind the formatting that you use. Taking as example this date: 2017-01-01 00:00, you should
    • Date Format - yyyy-MM-dd HH:mm. So, both year, day and minutes are with no capital letters. Month needs to be with capital letters, otherwise it will be seen as minutes instead of a month.
    • Default Value - choose a default value to prevent you to type the same value every time you want to generate a report.
    • Value Type - in this example it is a timestamp, but you can choose a date type as well

 

parameter1.png

 

· What are the formats used to display the parameter value?

  • On Message Field component - $(nameOfTheParameter)
  • On Query script - ${nameOfTheParameter)
  • On Functions[nameOfTheParameter]

 

· How do you get the current date and format it in a Message Field Component?

  • Report.date – current date
  • $(Report.date, date, yyyy-MM-dd)

 

· How do you do a cascading filtering in your report?

See the following example:

 

parameter2.png

 

  • Customer prompt will influence the arrayTypeParam prompt and the later will influence the arraySerialParam prompt
    • arrayTypeParam prompt query needs to be filtered by customer value and arraySerialParam prompt query needs to be filtered by arrayTypeParam value
    • The parameters associated with each prompt needs to be in the following order

 

parameter3.png

 

Having the parameters with the order that you want to perform the cascading enables you to activate the cascading filtering that PRD gives you OOB.

 

· How are the parameters passed through the reports?

 

  • To have access to the parameters values between the Master Report and the sub-reports, the Parameters option under Data in the sub-reports, needs to have the following options selected

 

parameter4.pngparameter5.png

 

 

Table tricks

 

· In which sections, should a table be placed?

  • The columns/data retrieved by the query on the Details Body
  • The labels, header of those columns, in the Details Header
    • If you want to repeat the header in each page, you need to set, in the Details Header, Repeat-header property to true.

The Details Body run for each row of your query.

 

· Table Example

Imagine that you have a query with the following result set:

 

Port

Host Storage

Avg Read

Max Read

CL1-A, CL2-A

TOTWSCPPRDHDS03

5

20

CL1-A, CL2-A

TOTWSCTDEVRPT04

12

16

CL1-B, CL2-B

TOTVH10001

6

10

CL1-B, CL2-B

TOTVH10002

12

15

 

and you want to group the Host storages information by Port, see the below image:

 

table1.png

 

            We created a sub-report, where we defined

      • in the Details Header, the port field
      • in the Details Body, the remaining fields: Host Storage Domain,LDEV
      • in the Page Header, the table headers

 

The Page Header section is repeated in every page. As this table is big, occupies more than one page, the table header will be repeated at the top of every page.

 

To iterate the ports, we added in the Group section, the Port column. This way, a new Details Header will be shown every time a different port appears.

 

 

Generate Reports in excel tricks

 

· How do you set the sheet name?

  • To add a sheet name in an excel file you need to select the Report Header of your Master or sub-report and then set the sheetname property

 

excel1.png

 

· How do you create different sheet names?

  • To create different sheets in your PRD report, you need to set, in each sub-report, in the Report Header, the pagebreak-before to true

 

excel2.png

 

· How do you set dynamically the sheet name?

    • Imagine that you have information for different sites in one single query. But you want to display in each sheet the information for each site
      • You need to define a sub-report that
        • has a query that retrieves all the sites, query layout - siteQuery

 

site

EMEA

Korea

India

          • has in Group section the column site selected – Group. It will repeat the sections that it contains (Details section) for each iteration that it makes, in this case, for each site.

 

excel3.png

 

          • in the Details section, you need to add another sub-report (child) that will contain the information for each site
            • As reports inherit the queries from their report father, the site iterated is passed to the report child
            • The query from the report child needs to be filtered by site
            • The Report Header needs to have
              • the pagebreak-before to true
              • the sheetname property needs contain something like

 

excel4.png

 

 

Charts tricks

 

· Which java chart library PRD charts use?

 

· In which section, should a chart be added?

  • In contrary from what happens with tables, you need to add a chart, not in the Details section, but in the Report Header. Otherwise, you will have as many charts as the number of rows retrieved by your /report/subreport query.

· How to configure a time series chart with two plots (Area and Line charts)?

In this case, the Used and Provisioned are in one( primary) plot and the warning in another one (secondary)

  • Choose XY Area Line Chart

 

chart1.png

 

  • Choose Time Series Collector both in Primary and Secondary DataSources
  • Your result set needs to be something like the table below, otherwise, it will only show the series with greater value, in this case Provisioned.

 

Date_time Column (Date Column)

Label Column

Value Column

Threshold Label

Threshold Value

2017/01/01

Used

5

warning

45

2017/01/01

Provisioned

12

warning

45

2017/01/02

Used

6

warning

45

2017/01/02

Provisioned

12

warning

45

2017/01/03

Used

7

warning

45

2017/01/03

Provisioned

12

warning

45

 

chart2.pngchart3.png

 

  • How to format x tick axis labels? The date_time column is in day granularity. But our chart is only showing months. So, if you look at the images above, we need to set
    • time-period-type - Month
    • x-tick-period - Month
    • x-tick-fmt-str (corresponds to the tick label format) – MMM YYYY
    • x-auto-range - True, otherwise we need to specify the x-min and x-max properties
    • x-tick-interval (corresponds to the interval that you want between ticks) - 1 (we wanted to see one tick per month) It is essential to set this property, otherwise it will not apply the format that you specified.
    • x-vtick-label True (ticks will be displayed vertically instead of horizontally)

 

· How to hide the x and y gridlines and the x or y axis?

 

chart4.png

 

  • There is a scripting section that enables you to customise the chart furthermore. You can use several languages: javascript or java. The code is below, using javascript:
    • chart - points to the chart itself
    • chart.getPlot() – will grab some chart properties.
    • chart.getDomainAxis() - picks the x axis properties
    • domain.setVisible(false) – hides the x axis
    • chart.setRangeGridlinesVisible(false) and chart.setDomainGridlinesVisible(false) – will hide gridlines for both axis

 

var chart = chart.getPlot();

var domain = chart.getDomainAxis();

 

domain.setVisible(false);

chart.setRangeGridlinesVisible(false);

chart.setDomainGridlinesVisible(false);

 

· How to set dynamically the x axis ticks of a time series chart depending on a start and end date parameters?

  • Using the scripting section, choosing the java language
    • Include the necessary libraries
    • int difInDays = (int) ((endDate.getTime() - startDate.getTime())/(1000*60*60*24)) - from the startDate and endDate parameters difference, I get the number of days that I want to show. Once that number is converted to ms, when I call DateTickUnit unit = new DateTickUnit(DateTickUnit.HOUR, difInDays, new SimpleDateFormat("dd-MM-yyyy HH:mm")), the java function already knows how many points will show for hour.
    • xAxis.setTickUnit(unit) - will set the tick unit that we set previously

 

import org.jfree.chart.axis.ValueAxis;

import org.jfree.chart.axis.DateTickUnit;

import org.jfree.chart.axis.DateAxis;

import org.jfree.data.Range;

import java.text.SimpleDateFormat;

import java.util.*;

import java.math.*;

import java.util.Date;

 

import org.jfree.chart.plot.Plot;

 

// Get the chart

Plot chartPlot = chart.getPlot();

 

ValueAxis xAxis= chartPlot.getDomainAxis();

 

 

Date startDate = dataRow.get("startDate");

Date endDate = dataRow.get("endDate");

int difInDays = (int) ((endDate.getTime() - startDate.getTime())/(1000*60*60*24));

 

DateTickUnit unit = new DateTickUnit(DateTickUnit.HOUR, difInDays, new SimpleDateFormat("dd-MM-yyyy HH:mm"));

 

xAxis.setTickUnit(unit);

 

· How to define dashed lines and specific colours in certain series in a line chart?

  • Using the scripting section, choosing the java language and include the following code:
    • In the query, the series that I wanted to paint and add dashed lines were at the top
    • chartPlot.getSeriesCount() – access the result set of the query
    • renderer.setSeriesPaint(i,  Color.green) – set the color, for a given row i
    • renderer.setSeriesStroke(i, new BasicStroke(1.0f,BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10.0f, new float[] {2.0f}, 0.0f)) - set the shape style (BasicStroke.CAP_BUTT) and new float[] {2.0f} argument corresponds to the width of the dashed line
    • renderer.setSeriesShapesVisible(i,false) – will disable the line shapes on the remaining series
    • chartPlot.setRenderer ( renderer ) – updates the chart configurations with the ones specified in the scripting section

 

import java.awt.Color;

import java.awt.BasicStroke;

import java.awt.Stroke;

 

import org.jfree.chart.plot.XYPlot;

import org.jfree.chart.renderer.xy.XYLineAndShapeRenderer;

      import java.util.*;

import java.math.*;

import java.util.Date;

 

// Get the chart

XYPlot chartPlot = chart.getXYPlot();

 

 

XYLineAndShapeRenderer renderer = new XYLineAndShapeRenderer( );

for (int i = 0; i < chartPlot.getSeriesCount(); i++) {

      if(i <3){

                  if(i <1){

                              renderer.setSeriesPaint(i,  Color.orange);

                  }else if(i < 2){

                              renderer.setSeriesPaint(i,  Color.red);

                  }else{

                              renderer.setSeriesPaint(i,  Color.green);

                  }

 

                  renderer.setSeriesStroke(i, new BasicStroke(1.0f,BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10.0f, new float[] {2.0f}, 0.0f));

      }

 

      renderer.setSeriesShapesVisible(i,false);

 

}

 

chartPlot.setRenderer( renderer );

 

 

Css tricks

 

· How can you stylise items, without using the style properties from the left panel?

      • It is possible to use an external stylesheet or the internal stylesheet - consult here how you specify the css style. However, there are some css properties that take precedence in relation to others. Imagine that you have the following example:
        • In the report-header, under a band element, there are two elements: text-field and a message.
        • The following Rules were defined:
          • report-header
            • font-size:16
            • background:blue
          • report-header text-field
            • font-size:16
            • background:pink
          • band text-field
            • font-size:10
            • background:purple
        • It was expected that
          • text-field should have the font-size:10 and background:purple, because band text-field Rule is more precise than report-header text-field Rule, as the band is inside the report-header
          • message should have font-size:16 and background:blue, because the other Rules refer to text-field
        • The output was
          • text-field had font-size:16 and background:pink, applied  report-header text-field Rule
          • message had the expected Rule applied
        • Even, adding the report-header to band text-fieldRule does not make any difference. The solution is to not mixture bands elements with other elements such as report-header,group-header, text-field. Use only bands with id and style-class properties under attributes panel. For instance, if you add a style-class, to band: reportHeader and a style-class to text-field: textField:
          • band.reportHeader
            • font-size:16
            • background:blue
          • band.reportHeader .textField
            • font-size:10
            • background:purple
        • The expected and output are the same
          • text-field have the font-size:10 and background:purple
          • message have font-size:16 and background:blue.

 

· When should not you use bands?

      • Imagine two sub-reports, one at the bottom of the other, and each one
        • is inside a band
        • contains a table

 

css1.png

 

      • As bands do not have the “overlap notion”, in other words, if you place one on top of the other, you won’t get the red alarm rectangle, meaning they are overlapping. So, if the first sub-report has a lot of rows to display, they will get overlapped with the rows returned by the second sub-report.
      • Due to this, do not use sub-reports inside bands.