Join rows "startsWith": stop on first match - UDJC?

Thomas Lanz Apr 9, 2019
Apr 10, 2019

Dear Experts


We have a special data integration task: joining "known hostnames" to incoming rows whose hostnames can carry many different postfixes. Example:

Stream1:        server-abc
Stream2 (csv):  server-abc-cool-ext-2    ip:
                server-abc-cool          ip:

There is no rule for valid postfixes or well-known hostnames, so the only approach is some kind of search from the left (startsWith).

The "Join rows" step has such a feature but does not stop at the first match. It continues to search all the other data from Stream2, which we do not want.


What we wanted to do:
If the input data is sorted by the length of the input string, the longest can be checked first. On a match we remove the row from the lookup Stream2.

The planned approach is to use a "User Defined Java Class" step to accomplish that. I am thinking of using a HashMap/ArrayList to store all occurrences of data from Stream2, then looping through all hosts from Stream1 and checking whether there is a match. On a match we remove the entry from the ArrayList/HashMap.
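
To make the idea concrete, here is a rough sketch of the matching logic in plain Java, outside of PDI. It is not a UDJC step yet and does not read any streams. The class name PrefixJoinSketch and the method joinFirstMatch are made up for illustration, and I am assuming that the Stream1 hosts are the ones sorted longest first and that each Stream2 entry may be matched only once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrefixJoinSketch {

    // Hypothetical helper: for each known host (Stream1), take the first
    // lookup host (Stream2) that starts with it, then remove that lookup
    // entry so it cannot be matched a second time.
    static Map<String, String> joinFirstMatch(List<String> knownHosts,
                                              List<String> lookupHosts) {
        // Assumption: the longest known hostnames are checked first,
        // so the most specific prefix gets its match before shorter ones.
        List<String> sortedKnown = new ArrayList<>(knownHosts);
        sortedKnown.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Work on a copy so the caller's list is left untouched.
        List<String> remaining = new ArrayList<>(lookupHosts);
        Map<String, String> matches = new LinkedHashMap<>();

        for (String known : sortedKnown) {
            Iterator<String> it = remaining.iterator();
            while (it.hasNext()) {
                String candidate = it.next();
                if (candidate.startsWith(known)) {
                    matches.put(known, candidate);
                    it.remove(); // consume the lookup row
                    break;       // stop at the first match
                }
            }
        }
        return matches;
    }
}
```

With server-abc and server-abc-cool as known hosts and the two Stream2 hosts from the example, server-abc-cool would take server-abc-cool-ext-2 and server-abc would be left with server-abc-cool. What is still missing is how to feed the two streams into this from inside the Java class.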


I am struggling with how to address the two input streams in the Java class. Do you have an example for me?

I would really appreciate any other suggestion if you can think of a better approach!


Many thanks in advance!