Pentaho

 Barchart - empty space for missing time series data point

Glupe Registracije posted 03-09-2018 10:01

Hi,

Suppose I have several data points like these (note that there are no data points for the 20th and 25th minutes):

2017-12-31 00:00:00  1

2017-12-31 00:05:00  2

2017-12-31 00:10:00  3

2017-12-31 00:15:00  4

2017-12-31 00:30:00  5

When I draw a bar chart for these values, with time on the x-axis and the value (1-5) on the y-axis, the chart is drawn correctly as five bars, but there is no empty space between the 4th and 5th bars.

I would have expected four successive bars with values 1-4, then an empty space two bars wide (for the missing 20th- and 25th-minute data points), and then the last bar with its value of 5.

For now we are using SQL to insert the missing data points into the data source. In this case we insert the following two missing data points:

2017-12-31 00:20:00  0

2017-12-31 00:25:00  0

in order to get a data source like this:

2017-12-31 00:00:00  1

2017-12-31 00:05:00  2

2017-12-31 00:10:00  3

2017-12-31 00:15:00  4

2017-12-31 00:20:00  0

2017-12-31 00:25:00  0

2017-12-31 00:30:00  5

so that we finally get a gap between the 15th and 30th minutes.
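The same gap filling can be done outside SQL. The following is only a minimal Python sketch, assuming a fixed 5-minute step and timestamps that already sit on that step; the function name `fill_missing_slots` and the fill value of 0 are illustrative and not part of Pentaho.

```python
from datetime import datetime, timedelta


def fill_missing_slots(rows, step=timedelta(minutes=5), fill_value=0):
    """Return (timestamp, value) rows with every missing slot between the
    first and last timestamp added, using fill_value for the new rows.

    Assumes all timestamps are already aligned to multiples of `step`.
    """
    if not rows:
        return []
    known = dict(rows)
    current, last = min(known), max(known)
    filled = []
    while current <= last:
        filled.append((current, known.get(current, fill_value)))
        current += step
    return filled


fmt = "%Y-%m-%d %H:%M:%S"
raw = [
    ("2017-12-31 00:00:00", 1),
    ("2017-12-31 00:05:00", 2),
    ("2017-12-31 00:10:00", 3),
    ("2017-12-31 00:15:00", 4),
    ("2017-12-31 00:30:00", 5),
]
rows = [(datetime.strptime(ts, fmt), value) for ts, value in raw]
filled = fill_missing_slots(rows)
```

With the five sample rows, `filled` contains seven rows, with zeros at 00:20 and 00:25.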

Can I somehow tell the Pentaho chart to use a time scale on the x-axis, plot only the values that exist in the data source, and leave empty space for any missing values?

Regards.


#Ctools
#Pentaho
Duarte Cunha Leao

Hi Glupe,

that's not supported.

You could use a Line chart and specify timeSeries as true, and you'd get a continuous scale on the X-axis, which naturally "fills-in" missing data points.

Bar charts, on the other hand, use a categorical scale on the X-axis — each distinct value is a category and each category gets a fixed-width space to draw a bar on.

In the spirit of trying to justify a new feature to support your use case:

  • Are you performing binning on the server, and is that why you want to show bars for underlying continuous data?
  • Could we trust that the data is equally spaced (always 5 minutes) and "aligned to 0" (no points at the 10th second)?
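The second question (equal spacing and alignment) can be checked on the data before charting. This is a rough Python sketch, not anything the charting library does itself; `is_on_grid` and the choice of 1970-01-01 as the grid origin are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_on_grid(timestamps, step=timedelta(minutes=5), origin=datetime(1970, 1, 1)):
    """True if every timestamp falls exactly on a multiple of `step`
    counted from `origin` (a hypothetical grid origin)."""
    zero = timedelta(0)
    return all((ts - origin) % step == zero for ts in timestamps)


aligned = [datetime(2017, 12, 31, 0, 0), datetime(2017, 12, 31, 0, 30)]
misaligned = aligned + [datetime(2017, 12, 31, 0, 0, 10)]  # point at the 10th second

aligned_ok = is_on_grid(aligned)
misaligned_ok = is_on_grid(misaligned)
```

Here `aligned_ok` is `True` and `misaligned_ok` is `False`, because the extra point at the 10th second falls off the 5-minute grid.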

If this were the case, we could think of extending the existing interpolation mechanism (nullInterpolationMode). Currently, it allows filling-in blank measure values across series (one category exists in one series but not in another). For continuous categories, whether date or numeric, and the definition of a regular interval and alignment (a grid), we could create missing data points with either null, 0 or interpolated measure data.
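To make the idea concrete, here is a hedged Python sketch of what filling a regular grid with null, 0, or linearly interpolated values could look like. It is not how nullInterpolationMode is implemented; the function `fill_grid` and its `mode` names are hypothetical, and the input is assumed to be sorted and aligned to the grid.

```python
from datetime import datetime, timedelta


def fill_grid(points, step=timedelta(minutes=5), mode="null"):
    """Insert the grid slots missing between consecutive points.

    points: list of (datetime, number), sorted by time and aligned to `step`.
    mode:   "null" -> None, "zero" -> 0, "linear" -> value interpolated
            between the two neighbouring known points.
    """
    filled = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        filled.append((t0, v0))
        missing = (t1 - t0) // step - 1  # number of empty slots between t0 and t1
        for i in range(1, missing + 1):
            if mode == "zero":
                value = 0
            elif mode == "linear":
                value = v0 + (v1 - v0) * i / (missing + 1)
            else:
                value = None
            filled.append((t0 + i * step, value))
    if points:
        filled.append(points[-1])
    return filled


base = datetime(2017, 12, 31)
points = [(base + timedelta(minutes=m), v)
          for m, v in [(0, 1), (5, 2), (10, 3), (15, 4), (30, 5)]]

with_nulls = fill_grid(points, mode="null")
with_zeros = fill_grid(points, mode="zero")
with_linear = fill_grid(points, mode="linear")
```

For the sample data, each result has seven rows; the two inserted slots hold `None`, `0`, or values one third and two thirds of the way from 4 to 5, depending on the mode.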

What do you think?

Cheers!

Glupe Registracije

Duarte,

thank you very much for your explanation. I tried a line chart as you suggested, but the problem is that some data points are missing from our dataset, for example 2017-12-31 00:20:00 and 2017-12-31 00:25:00 in the example above.

In our particular case the line is drawn through the first four data points and then straight from the 4th data point to the 5th, with no visual clue that two data points are missing in between.

The only solution for us at the moment is to deliberately put null values into the dataset as described above (you should see the SQL query that handles this).

If we do so, the line chart has a very nice feature, "Null Interpolation Mode", which, when set to Linear, shows null values as a dotted line.

New feature justification:

We are performing binning on the server, so it is not a problem to have all data points in 5-minute time slots aligned to the zero second. If the data is in that format, it should be quite easy to just plot bars (or lines) on a timeline.

Furthermore, it would be very nice to have a horizontal scrollbar that could be used for shifting the visible time range.

For example, if I have a data set covering 7 days, I could show only two days of data in the line chart, and then use the horizontal scrollbar to see a different two-day span of the 7-day period.
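On the data side, the scrolling idea amounts to selecting a moving window from the full series. The following is just a small Python sketch of that selection, assuming hourly sample data; `visible_window` is a hypothetical helper and not an existing chart option.

```python
from datetime import datetime, timedelta


def visible_window(points, start, span=timedelta(days=2)):
    """Return the points whose timestamp lies in [start, start + span)."""
    end = start + span
    return [(t, v) for t, v in points if start <= t < end]


first_day = datetime(2017, 12, 25)
# Hypothetical data: one point per hour over 7 days.
week = [(first_day + timedelta(hours=h), h) for h in range(7 * 24)]

# "Scrolling" one day to the right shows days 2 and 3 of the 7-day period.
page = visible_window(week, first_day + timedelta(days=1))
```

A scrollbar position would then map to the `start` argument, with the chart redrawn from the returned slice.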

Duarte, thanks again. Best regards.