Hitachi Content Intelligence​

  • 1.  HCI: Logging and Monitoring at the Document/Object level

    Posted 05-20-2020 16:34

    Based on a customer requirement for logging and monitoring capabilities at the document level, I have created a stage that sends log messages.

     

    The stage can and should be used to monitor documents at document or field granularity, producing logs for processed or failed documents at different stages of a workflow. We can use the stage to send information about the document.

     

    The stage uses the Fluentd API (https://www.fluentd.org/) to log messages in each of the critical phases of a workflow.
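
    To make the idea concrete, here is a minimal sketch of what a document-level log event could look like. It is not the stage's code, which is not shown in this post. The field names (doc_id, stage, status), the function names, and the host and port defaults are all assumptions for illustration. To stay within the standard library, it sends newline-delimited JSON over TCP rather than using the Fluentd forward protocol, so it assumes a listener that accepts JSON over TCP (for example, a Fluent Bit tcp input).

```python
import json
import socket
import time


def build_event(tag, doc_id, stage, status, fields=None, now=None):
    """Build one log event as a (tag, timestamp, record) tuple."""
    # Hypothetical record layout: one entry per document per workflow phase.
    record = {"doc_id": doc_id, "stage": stage, "status": status}
    record.update(fields or {})  # optional extra document fields
    timestamp = int(time.time() if now is None else now)
    return tag, timestamp, record


def encode_event(event):
    """Serialize an event as one newline-terminated JSON object."""
    tag, timestamp, record = event
    payload = {**record, "tag": tag, "time": timestamp}
    return (json.dumps(payload, sort_keys=True) + "\n").encode("utf-8")


def send_event(event, host="127.0.0.1", port=5170, timeout=2.0):
    """Send one event over TCP. Host and port are assumptions."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_event(event))
```

    A failed document could then be reported with something like send_event(build_event("hci.workflow", "doc-1", "parse", "failed", {"error": "bad pdf"})), provided an agent is listening on that address.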

     

    For good performance, the deployment involves a Fluent Bit (https://fluentbit.io/) agent installed on each node of the HCI cluster. This could even be done using containers run on Mesos and scheduled with Marathon, like other components of HCI.

     

     [Image: Logging.drawio diagram]

    As shown in the image above, the stage sends the logs to Fluent Bit, which forwards them to Fluentd or another destination, with load balancing and fail-over for high availability. Once a message arrives at the Fluentd servers, it is filtered and parsed, then sent on to other systems such as Splunk or any other logging solution or document-oriented datastore.

     

    The system can handle backpressure, which makes it robust enough to be part of highly scalable and reliable solutions.
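
    As an illustration of what handling backpressure means here, the sketch below shows a bounded buffer that refuses new events when it is full and keeps unsent events when the downstream is unavailable. This is only a toy model of the idea, not how Fluent Bit or Fluentd implement buffering. The class and method names are hypothetical.

```python
from collections import deque


class BoundedBuffer:
    """Toy in-memory buffer with a fixed capacity (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.dropped = 0
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def offer(self, item):
        """Queue an item; return False and count a drop if the buffer is full."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def drain(self, send):
        """Send queued items in order; stop at the first failure and keep the rest."""
        sent = 0
        while self._items:
            if not send(self._items[0]):
                break  # downstream is not accepting; retry on a later drain
            self._items.popleft()
            sent += 1
        return sent
```

    The caller decides what to do when offer returns False (slow down, retry, or drop), which is the essence of propagating backpressure upstream.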

     

    Features provided by the integration with Fluentd and Fluent Bit:

    • High Performance
    • Data Parsing and filtering
    • Reliability and data integrity: handles backpressure and buffers data
    • Security: built-in TLS/SSL support
    • Extensibility
    • Monitoring and stream processing
    • Portability
    • Out-of-the-box integration with multiple 3rd party solutions such as Splunk, Elasticsearch, SumoLogic, Datadog, New Relic, Sentry, Email, Slack, Twilio, HBase, BigQuery, Kafka, databases, … among many others

     

    The stage was built for a POC and will require some improvements, such as:

    • Selection of fields to be included in logging
    • Class separation for easier integration with other stages

     

    Already included:

    • Ability to select the server and port to send the logs to
    • Ability to set the tag and prefix used for search purposes
    • Stage compatible with versions 1.6.X and 1.7 of HCI (it should work with earlier versions, but this has not been tested)
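
    As a small illustration of the tag and prefix options listed above: Fluentd tags are dot-separated strings, and Fluentd client libraries typically emit the prefix and the tag joined with a dot. The sketch below assumes the stage does something similar, which the post does not state. The function name is hypothetical.

```python
def full_tag(prefix, tag):
    """Join a prefix and a tag with a dot, skipping empty parts."""
    parts = [p.strip(".") for p in (prefix, tag) if p and p.strip(".")]
    return ".".join(parts)
```

    For example, full_tag("hci", "workflow.failed") gives "hci.workflow.failed", which a Fluentd match pattern such as hci.** would select.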

     

    Please reach out to @Miguel Gaspar (miguel.ferreira.gaspar@gmail.com) for more information or if you have a use case where the stage makes sense.

     

    Improvements can be included as per customer requirements.

     

    If there is any interest, the next post can give more details and show how the stage can be used with Elasticsearch.

     

     

     https://www.linkedin.com/in/mfgaspar/

     

     

     


    #HitachiContentIntelligenceHCI


  • 2.  RE: HCI: Logging and Monitoring at the Document/Object level

    Posted 06-16-2020 12:16

    Nice work, @Miguel Gaspar​ ! Would you be interested in publishing the plugin here so others can try it out (myself included)?



  • 3.  RE: HCI: Logging and Monitoring at the Document/Object level

    Posted 06-16-2020 12:52

    Sure @Jonathan Chinitz​, what's the best way to publish it? Attach it to the post? Thanks.



  • 4.  RE: HCI: Logging and Monitoring at the Document/Object level

    Posted 06-16-2020 13:03

    The plugin itself can be posted here as a jar, or if you have a GitLab repo you can publish a link to that here. If you have any supplementary docs that accompany the plugin, my suggestion would be to wrap everything up in a zip and post that here.

     



  • 5.  RE: HCI: Logging and Monitoring at the Document/Object level

    Posted 07-10-2020 22:38

    I have attached the plugin. Remember that you need to have Fluentd up and running.