Object Storage

 Extract Metadata (Tika) from GDAL Files

  • HV Object Storage
  • Hitachi Content Intelligence HCI
Data Conversion posted 12-08-2018 18:16

GDAL is an open source project that extracts information from geospatial files. Once installed, its command-line tools, such as gdalinfo, extract a lot of information from those files.

Tika appears to include a GDAL module that seems to be a wrapper around the gdalinfo command-line tool. Tika does recognize the file in question and claims it is using the GDAL Tika module, but I don't get any useful information back. The following is all I get back from HCI.

[Screenshot: metadata returned by HCI]

While trying to fix this, I thought the reason might be that the gdalinfo command is not installed by default on the HCI host OS. So I installed it, and the CLI tool works great. However, HCI (even after a restart) still doesn't return any useful information.

I also installed Java on the HCI host OS and downloaded the Tika app jar file (tika-app-1.19.1.jar). When I run it from the command line on the HCI server:

 

java -jar tika-app-1.19.1.jar -m ./17NOV15081801-S3DMR03C01.NTF

it does return all the image information.
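
For anyone wanting to compare how much metadata each path returns, here is a rough sketch that counts the metadata entries printed by the tika-app command above. The helper name count_metadata_fields is made up for illustration, and it assumes each entry is printed on its own line in "name: value" form, which may not hold for every Tika version or output mode.

```shell
# Hypothetical helper: counts lines on stdin that look like "name: value".
# Assumes one metadata entry per line; names may themselves contain colons.
count_metadata_fields() {
  # grep -c prints the number of matching lines; "|| true" keeps the
  # exit status at 0 when there are no matches (grep still prints 0).
  grep -c -E '^.+: ' || true
}

# Example usage (not run here, needs Java and the jar from the post):
#   java -jar tika-app-1.19.1.jar -m ./17NOV15081801-S3DMR03C01.NTF | count_metadata_fields
```

A count that is much higher on the command line than what HCI returns would be consistent with the GDAL parser only doing real work in one of the two environments.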

Any help getting HCI (patching is fine) to return more useful information via Metadata Extraction (Tika) would be greatly appreciated.


Data Conversion

For HCI to natively extract metadata from these image files via Tika, the product needs to be enhanced by adding the gdal package to the workflow container. I did hack the system to make it work, but I don't recommend that for a production system.
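
If the Tika GDAL module does shell out to gdalinfo as the question suggests, then the tool has to be resolvable on the PATH of whatever environment runs Tika, and installing it on the host OS does not make it visible inside a container. A minimal sketch of a check that can be run in each environment is below. The helper name check_tool is hypothetical, and how to get a shell inside the workflow container is deployment-specific and not shown here.

```shell
# Hypothetical helper: reports whether a command is resolvable on PATH
# in the environment where this function runs.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Example usage (run once on the host and once inside the container):
#   check_tool gdalinfo
```

Getting "found" on the host but "missing" inside the container would match the behavior described in the question.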