
Join rows "startsWith" stop on first match - udjc?

Question asked by Thomas Lanz on Apr 9, 2019
Latest reply on Apr 10, 2019 by Sparkles Sparkles

Dear Experts

 

We have a special data integration task: joining "known hostnames" onto given input rows, where each hostname can carry many different postfixes. Example:

Stream1:         server-abc
Stream2 (csv):   server-abc-cool-ext-2    ip:10.8.2.2
                 server-abc-cool          ip:10.8.2.3

There is no rule that defines valid postfixes or well-known hostnames, so the only approach is some kind of search from the left (startsWith).

The step "join rows" has such a feature but does not stop at the first match. It continues to search all other data from stream2, which we do not want.

 

What we wanted to do:
If the input data is sorted by the length of the input string, the longest could be checked first. On a match, we remove the row from lookupStream2.

The planned approach is to use a "user defined java class" to accomplish that. I am thinking of using a HashMap/ArrayList to store all occurrences of data from stream2, then looping through all hosts from stream1 and checking whether there is a match. On a match, we remove the entry from the ArrayList/HashMap.
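
To make the idea concrete, here is a rough sketch of the matching logic in plain Java. It is not wired into the step API; how the rows of the two streams are read inside the "user defined java class" is left out. The names PrefixJoin, LookupRow and joinFirstMatch are made up for illustration, and the sketch assumes that "first match" means the first stream2 row, in its given order, whose hostname starts with the stream1 host.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Pentaho API.
final class PrefixJoin {

    /** One row of the lookup stream (stream2): full hostname plus its IP. */
    static final class LookupRow {
        final String hostname;
        final String ip;

        LookupRow(String hostname, String ip) {
            this.hostname = hostname;
            this.ip = ip;
        }
    }

    /**
     * For every known host (stream1), longest first, take the first lookup
     * row whose hostname starts with it, then remove that row so it cannot
     * be matched again. The inputs themselves are not modified.
     */
    static Map<String, LookupRow> joinFirstMatch(List<String> knownHosts,
                                                 List<LookupRow> lookup) {
        // Longest known host first, as described above.
        List<String> sorted = new ArrayList<>(knownHosts);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Working copy of stream2 that shrinks on every match.
        List<LookupRow> remaining = new ArrayList<>(lookup);
        Map<String, LookupRow> result = new LinkedHashMap<>();

        for (String host : sorted) {
            Iterator<LookupRow> it = remaining.iterator();
            while (it.hasNext()) {
                LookupRow row = it.next();
                if (row.hostname.startsWith(host)) {
                    result.put(host, row);
                    it.remove(); // drop the matched row from the lookup data
                    break;       // stop at the first match
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<LookupRow> lookup = Arrays.asList(
                new LookupRow("server-abc-cool-ext-2", "10.8.2.2"),
                new LookupRow("server-abc-cool", "10.8.2.3"));
        Map<String, LookupRow> matches =
                joinFirstMatch(Arrays.asList("server-abc"), lookup);
        for (Map.Entry<String, LookupRow> e : matches.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().hostname
                    + " " + e.getValue().ip);
        }
    }
}
```

An ArrayList is used here rather than a HashMap because a prefix search has to scan the entries anyway, and the list keeps the order of stream2, which decides which row counts as the first match.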

 

I am struggling with how to address the two input streams in the Java class. Do you have an example for me?

I would really appreciate any other suggestion if you can think of a better approach!

 

Many thanks in advance!

Thomas
