
Parsing unstructured street address

Question asked by khaoula ajbal on Jan 23, 2018
Latest reply on Jan 25, 2018 by Bill Moore

Hi, I'm using PDI 7.1.0.0-12 as the ETL tool for my data warehouse. One dataset contains addresses that were entered manually, so I have an unstructured string field with every possible kind of entry.

Examples:

    12 , rue ibn rochd, avenue moulay smail, Casablanca
    BOULEVARD MASSIRA, OULFA
    barnoussi, bloc 12 imm 5 app 6.
    12 rue ibn koutaiba, casa.

(Note: the addresses are in Casablanca, Morocco.)

Is there any way to extract the street and the district from an unstructured address, perhaps using NLP?

I was thinking of creating a lookup table (from an open-source dataset) with all the street and district names of Casablanca, including their short forms. Then, if any word or phrase in an address matches a street or district name or one of its short forms, the matched name is written to a new column in the target table.
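To make the idea concrete, here is a minimal Python sketch of that lookup, meant as a prototype outside PDI rather than a finished solution. The `STREETS` and `DISTRICTS` dictionaries are hypothetical stand-ins for the real reference table, and the function names (`normalize`, `build_index`, `find_first`, `parse_address`) are made up for illustration. It assumes exact matching on normalized text, so misspellings not listed as variants will not match.

```python
import re
import unicodedata

# Hypothetical reference data: canonical name -> known variants (short forms,
# common misspellings). In practice this would come from the lookup table.
STREETS = {
    "Rue Ibn Rochd": ["rue ibn rochd"],
    "Avenue Moulay Smail": ["avenue moulay smail", "av moulay smail"],
    "Boulevard Massira": ["boulevard massira", "bd massira"],
    "Rue Ibn Koutaiba": ["rue ibn koutaiba"],
}
DISTRICTS = {
    "Oulfa": ["oulfa"],
    "Barnoussi": ["barnoussi", "bernoussi"],
}


def normalize(text):
    """Lowercase, strip accents, and collapse punctuation/whitespace to single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_index(gazetteer):
    """Flatten the gazetteer into (normalized variant, canonical name) pairs, longest first."""
    index = [
        (normalize(variant), canonical)
        for canonical, variants in gazetteer.items()
        for variant in variants
    ]
    index.sort(key=lambda pair: len(pair[0]), reverse=True)
    return index


def find_first(address, index):
    """Return the canonical name whose variant appears earliest in the address, or None."""
    padded = f" {normalize(address)} "  # padding enforces whole-word matches
    best = None
    for variant, canonical in index:
        pos = padded.find(f" {variant} ")
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, canonical)
    return best[1] if best else None


STREET_INDEX = build_index(STREETS)
DISTRICT_INDEX = build_index(DISTRICTS)


def parse_address(address):
    """Fill the two target columns; a column stays None when nothing matches."""
    return {
        "street": find_first(address, STREET_INDEX),
        "district": find_first(address, DISTRICT_INDEX),
    }


if __name__ == "__main__":
    samples = [
        "12 , rue ibn rochd, avenue moulay smail, Casablanca",
        "BOULEVARD MASSIRA, OULFA",
        "barnoussi, bloc 12 imm 5 app 6.",
        "12 rue ibn koutaiba, casa.",
    ]
    for sample in samples:
        print(sample, "->", parse_address(sample))
```

When an address mentions two streets, as in the first sample, this sketch keeps only the one that appears first; returning every match instead would be an equally reasonable choice.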

Best regards,
