We have a special data integration task: joining "known hostnames" to given input rows in which the hostname carries arbitrary postfixes. Example:
Stream1: server-abc
Stream2 (csv): server-abc-cool-ext-2 ip:10.8.2.2
There is no rule that defines valid postfixes or well-known hostnames, so the only approach is a kind of search from the left (startsWith).
The "join rows" step has such a feature, but it does not stop at the first match. It continues to search all the other data from stream2, which we do not want.
What we want to do:
If the input data is sorted by the length of the input string, the longest can be checked first. On a match, we remove the row from the lookup stream (stream2).
The planned approach is to use a "user defined java class" step to accomplish that. I am thinking of using a HashMap or ArrayList to store all the rows from stream2, then looping through all hosts from stream1 and checking whether there is a match. On a match, we remove the entry from the ArrayList/HashMap.
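To make the idea concrete, here is a rough sketch of the matching logic in plain Java, outside of the step. The class name PrefixJoin and the String[] row layout (hostname, ip) are placeholders of mine and not part of any Kettle API. The sketch also does not cover reading the two streams inside the step, which is the part I am stuck on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class PrefixJoin {

    /**
     * Matches each known hostname (stream1) to at most one lookup row (stream2)
     * whose hostname starts with it. A matched row is removed from lookupRows,
     * so lookupRows must be a mutable list (for example an ArrayList).
     * Assumed row layout: row[0] = hostname, row[1] = ip.
     */
    static Map<String, String[]> match(List<String> knownHosts, List<String[]> lookupRows) {
        // Check the longest known hostnames first so the most specific prefix wins.
        List<String> sorted = new ArrayList<>(knownHosts);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        Map<String, String[]> result = new LinkedHashMap<>();
        for (String host : sorted) {
            Iterator<String[]> it = lookupRows.iterator();
            while (it.hasNext()) {
                String[] row = it.next();
                if (row[0].startsWith(host)) {
                    result.put(host, row);
                    it.remove(); // drop the matched row from the lookup data
                    break;       // stop at the first match
                }
            }
        }
        return result;
    }
}
```

With made-up sample data, known hosts "server-abc" and "server-abc-cool" and lookup rows "server-abc-cool-ext-2" and "server-abc-ext-1", checking the longest host first means "server-abc-cool" takes "server-abc-cool-ext-2" and "server-abc" is left with "server-abc-ext-1". Without the sort, "server-abc" could grab the wrong row first.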
I am struggling with how to address the two input streams in the Java class. Do you have an example for me?
I would really appreciate any other suggestion if you can think of a better approach!
Many thanks in advance!