Ben Isherwood

HCI Plugins: Geocoding

Blog Post created by Ben Isherwood on Apr 14, 2017

Many of us are aware of the photo geotagging capabilities of our smartphones. It's how social networks can tell everyone where we were when we posted our latest vacation slideshows. Geotagging simply records your location at the time a photo is taken, using the GPS receiver built into the device, and attaches those coordinates to the image file as metadata for later use.

So how can our systems take advantage of this data for search and analysis? Rather than manually sifting through millions of documents, we can identify on demand all documents related to a specific building, city, state, or country. Those results can then be further categorized by other metadata in each document to help you find the exact vacation photo you were looking for.

To support the use of geotagged information, HCI provides the Geocoding stage.

Geocoding here is really "reverse geotagging" (more commonly known as reverse geocoding): you take the latitude and longitude coordinates attached to Documents as metadata and convert them into the corresponding city, state, country, or even time zone values that make sense for your use case.

The stage reads the input coordinates, in decimal degrees, from the following fields:

geo_lat : 42.482119 
geo_long : -71.186761

Note that these fields are extracted automatically by the "Text and Metadata Extraction" stage, so any geotagged metadata in a Document is typically available for further processing and analysis right away.
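As a rough illustration of what the stage receives, here is a minimal Python sketch that reads these two fields from a Document modeled as a plain dictionary and checks that they are valid decimal degrees. The dictionary model and the `read_coordinates` helper are assumptions for illustration, not HCI's API.

```python
from typing import Mapping, Optional, Tuple


def read_coordinates(doc: Mapping[str, object]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) in decimal degrees, or None if missing or invalid."""
    try:
        lat = float(doc["geo_lat"])
        lon = float(doc["geo_long"])
    except (KeyError, TypeError, ValueError):
        # Field absent, or value not convertible to a number.
        return None
    # Latitude must lie in [-90, 90] and longitude in [-180, 180].
    # (NaN fails both comparisons, so it is rejected here too.)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


doc = {"geo_lat": "42.482119", "geo_long": "-71.186761"}
print(read_coordinates(doc))  # (42.482119, -71.186761)
```

Returning `None` rather than raising lets a pipeline simply skip Documents that carry no usable geotag.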



So, what can you do with this geotagged metadata?

The HCI Geocoding stage uses public location data collected and published by the GeoNames project to map each latitude and longitude pair to the nearest known location on Earth.
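The post doesn't say how the stage finds the nearest location, so the following is only a sketch of one common approach: a brute-force nearest-neighbor search using great-circle (haversine) distance. The `Place` records are a hypothetical, hand-entered stand-in for GeoNames data with approximate coordinates, and `nearest_place` is an illustrative helper, not part of HCI.

```python
import math
from typing import NamedTuple, Sequence


class Place(NamedTuple):
    city: str
    state: str
    country: str
    time_zone: str
    lat: float
    lon: float


# Hypothetical stand-in for GeoNames-derived records (approximate coordinates).
PLACES = [
    Place("Burlington", "MA", "US", "America/New_York", 42.5047, -71.1956),
    Place("Boston", "MA", "US", "America/New_York", 42.3601, -71.0589),
    Place("Castries", "Castries", "LC", "America/St_Lucia", 14.0101, -60.9875),
]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so rounding error can't push asin's argument out of range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_place(lat: float, lon: float, places: Sequence[Place] = PLACES) -> Place:
    """Return the record closest to (lat, lon) by checking every candidate."""
    return min(places, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))


print(nearest_place(42.482119, -71.186761).city)  # Burlington
```

A linear scan is easy to follow but checks every record; against the full GeoNames data set, a spatial index (for example a k-d tree) would likely be the more practical choice.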

The stage supports the following output configuration values (any combination may be specified):

  • cityField - The name of the output field that should contain the city (defaults to "city", optional)
  • stateProvinceField - The name of the output field that should contain the state/province (defaults to "state", optional)
  • countryField - The name of the output field that should contain the country (defaults to "country", optional)
  • timeZoneField - The name of the output field that should contain the time zone (defaults to "timeZone", optional)
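To make the configuration semantics concrete, here is a minimal sketch of how the four settings might map resolved place values onto output fields, assuming any setting left unset falls back to its documented default name. The `add_geocode_fields` helper and the shape of the `place` dictionary are hypothetical.

```python
from typing import Dict, Mapping

# Default output field names, as listed above; any subset may be overridden.
DEFAULT_FIELD_NAMES = {
    "cityField": "city",
    "stateProvinceField": "state",
    "countryField": "country",
    "timeZoneField": "timeZone",
}

# Which key of the (hypothetical) resolved place each setting refers to.
PLACE_KEYS = {
    "cityField": "city",
    "stateProvinceField": "state",
    "countryField": "country",
    "timeZoneField": "time_zone",
}


def add_geocode_fields(doc: Mapping[str, object], place: Mapping[str, str],
                       config: Mapping[str, str]) -> Dict[str, object]:
    """Return a copy of doc with place values written to the configured field names."""
    out = dict(doc)
    for setting, default_name in DEFAULT_FIELD_NAMES.items():
        field_name = config.get(setting, default_name)  # fall back to the default
        out[field_name] = place[PLACE_KEYS[setting]]
    return out


doc = {"geo_lat": 42.482119, "geo_long": -71.186761}
place = {"city": "Burlington", "state": "MA", "country": "US",
         "time_zone": "America/New_York"}
print(add_geocode_fields(doc, place, {"cityField": "City", "stateProvinceField": "State"}))
```

With those settings, the copy gains `City`, `State`, `country`, and `timeZone` fields alongside the original coordinates.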




HCI pipelines can be configured to derive these additional fields automatically, given only the geotagged input fields that the camera added to each image. The resulting Documents carry extra metadata that later pipeline stages or query requests can use for faceting and categorization.




Once we have the metadata, we can index it and use it in queries. Instead of matching keywords anywhere, we can look for keywords only in documents matching "City:Burlington" and "State:MA".
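A toy in-memory version of that query can show the idea: restrict a keyword match to documents whose geocoded fields equal given values, then count matches per city as a facet would. The sample Documents, the `content` field, and the `search`/`facet` helpers are made up for illustration; this does not model HCI's index or query syntax.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def matches(doc: Mapping[str, object], filters: Mapping[str, str]) -> bool:
    """True if every field:value filter matches the document exactly."""
    return all(doc.get(field) == value for field, value in filters.items())


def search(docs: Iterable[Dict[str, object]], keyword: str,
           filters: Mapping[str, str]) -> List[Dict[str, object]]:
    """Case-insensitive keyword match on 'content', restricted by field filters."""
    kw = keyword.lower()
    return [d for d in docs
            if matches(d, filters) and kw in str(d.get("content", "")).lower()]


def facet(docs: Iterable[Mapping[str, object]], field: str) -> Counter:
    """Count documents per value of a field, as a facet listing would."""
    return Counter(d[field] for d in docs if field in d)


# Made-up sample Documents that already carry geocoded fields.
docs = [
    {"content": "Office skyline at dusk", "City": "Burlington", "State": "MA"},
    {"content": "Harbor skyline", "City": "Boston", "State": "MA"},
    {"content": "Skyline from the ferry", "City": "Castries", "State": "Castries"},
]

hits = search(docs, "skyline", {"City": "Burlington", "State": "MA"})
print([d["content"] for d in hits])  # ['Office skyline at dusk']
print(facet(search(docs, "skyline", {"State": "MA"}), "City"))
```

A real search index would precompute these field values so filters and facet counts don't require scanning every Document.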

Now you know how to easily locate that ONE great St. Lucia skyline photo in an image repository of billions. You may even run into hundreds just like it along the way!

Thanks for reading,