Pentaho

 View Only

 How can I calculate the weighted averages based on multiple groupings?

  • Pentaho
  • Ctools
  • Pentaho
James Antony's profile image
James Antony posted 04-01-2019 13:31

I asked a similar question a few days ago (https://community.hitachivantara.com/message/43497), which I was able to resolve. However, that was merely to check whether the calculations were possible and was a smaller part of a larger problem

In order to calculate one of the weighted averages, the sum of a particular group is required, there can be many different groups.

An example of one of the groupings is by counterparty and term type. Suppose there is a choice of 7 counterparties and 7 term types, this can result in 49 different grouping permutations records.

 

I then need to calculate the sum of a particular value of each group, (as well as maintain the group count) before calculating the weighted average.

 

My approach could be all wrong, but performing the calculation for every group is proving to be quite difficult, I’m not sure if this can be done as a single transformation.

 

I apologise if I have provided a vague explanation, and will provide as much as I can if you require more information.

 

Here's an image of my (incomplete) transformation, with some notes of what I'm trying to achieve; each path contains its own set of groups.

grouping

Any help with this will be immensely appreciated.

 

Update:

Here's an example of input data and groupings:

 

example


#Ctools
#Pentaho
Ana Gonzalez's profile image
Ana Gonzalez

I don't know if others will be able to understand what you are asking, but I'm unable to guess what you are trying to achieve. I know you are trying to get different subtotals of a calculation grouping by different columns while at the same time keeping the original rows, but in your sample data I can't see counterparty or term type, I suppose instead of those columns, in your sample data you are using FinanceType and Industry.

So with this input data:

BalanceFinanceType (col1)Collateralisation (col2)BusinessSize (col3)Industry (col4)1000AssetsFully securedSFarming2000AssetsFully securedSFishing5000Off-balancePartially securedSMining2000Off-balanceunsecuredLMining

Do you want to get something like this at the end?:

BalanceFinanceType (col1)Collateralisation (col2)BusinessSize (col3)Industry (col4)WeightCol1CountCol1WeightCol2CountCol2WeightCol3CountCol3WeightCol4CountCol4TotalBalanceSum1000AssetsFully securedSFarming0.320.320.1311100002000AssetsFully securedSFishing0.720.720.3311100005000Off-balancePartially securedSMining0.72110.630.72100002000Off-balanceunsecuredLMining0.3211110.3210000

Regards

James Antony's profile image
James Antony

Thank you again, for getting back to me.

Counterparty and term are another grouping, and not part of the sample data, I should have made that more clear.

The sample data provided is for two (different) groupings, one by finance type and the other by industry.

What you said is correct, I want to calculate the weighted average based on the total of a group.

In the data above the total sum of assets is 3000 and the total of off-balance is 7000, so I want to calculate the each row by the respective total.

1000 / 3000 (sum of assets)

2000 / 3000 (sum of assets)

5000 / 7000 (sum of off-balance)

2000 / 7000 (sum of off-balance)

I should note that the output for the 2 groupings I provided above, don't have to be part of the same output, they can be their own separate tables, so the output of the financetype group can be one output and the output of the industry group can be another.

I hope this makes it more clear.

Thanks again.

Ana Gonzalez's profile image
Ana Gonzalez

I'm still not sure if I understand what you are trying to achieve. So you have input data with this structure:

Column1Column2Column3Column4Column5Column6Column7Balance

And you want this kind of output:

Column1Column2Column3Column4Column5Column6Column7BalanceWeightedColumnNNumRowsColumnN

So you'll have 7 diferent tables with the output, each table for each column you use in your grouping? Is this what you are trying to achieve?

If this is it, with so much repeating the same operation you'll better use Metadata Injection to create a generic transformation to use the group by operations and you call that transformation N times telling each time the metadata of which column to use for the group by and the table to insert the result.

Regards

James Antony's profile image
James Antony

That’s not quite what I’m after, let me try another example, as I think I’ve made a right hash of the one above.

Suppose I have the following data grid.

grid

I then perform the following grouping

pastedimage_1

Which gives me the sum of each fruit

pastedimage_2

Now that I have the sum of each, I want to place those summations back into the original data, so for every row containing an apple, I want to have the sum of all apples,

For every row containing a banana, I want to have the sum of all bananas, etc.

Something like this

pastedimage_3

If I can achieve this; I can then calculate the weighted average, because it’s performed on a row by row basis – I can then say the

price / sumOfFruit

Which will then give me something like this

pastedimage_4

I hope that makes it more clear.

Again, I appreciate the time you've taken out to assist me with this.

James Antony's profile image
James Antony

Update: Managed to get what I want in terms of the fruit analogy.

Time to try the solution with my real data.

Ana Gonzalez's profile image
Ana Gonzalez

OK, so you could use the same as in my example in your other question, but instead of not putting any condition in the Merge Join step, you put the condition Fruit = Fruit, and you use the Group by  step grouping by Fruit.

Or you use another solution, there are more than one way to do it.

Regards