Pentaho


Kafka Consumer step in Pentaho Data Integration is not streaming events from Kafka and the transformation stops on its own immediately

Rajkumar Venkatasamy posted 04-23-2020 09:58

Hi Community members,

 

We are exploring PDI for Kafka streaming / ETL purposes. As part of this, I tried using the Kafka Consumer step to consume the event stream from our Kafka server. The same Kafka server works fine with other consumer applications (written in Java), but with Pentaho (version 9) the streaming does not work. The screenshot below, also attached (PDI Consumer.png), shows that the consumer stops on its own without throwing any errors. I tried launching the transformation with the log level set to Row Level, but there is still not much information in the Spoon / Pentaho logs.

[Screenshot: PDI Consumer.png]
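For comparison, the kind of standalone Java consumer that does work against the same broker looks roughly like this (bootstrap address and topic name are placeholders for my environment, not the real values):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PlainConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-host:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "n4snapshotcg1");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            // Poll in a loop; this keeps the consumer alive in its group,
            // unlike the PDI step, which leaves the group almost immediately.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```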

 

From the Kafka server log, I can see that the consumer group "n4snapshotcg1", which I configured in the Kafka Consumer step, gets registered in Kafka. The broker then rebalances the group and assigns partitions to the consumer. But right after that, the Pentaho consumer leaves the group on its own and terminates. The attached Kafka server log snippet (PDI Consumer with Kafka.png) shows this:

[Screenshot: PDI Consumer with Kafka.png]
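For anyone trying to reproduce this, the group state can also be checked from the client side. A minimal sketch using the Kafka AdminClient (bootstrap address is a placeholder; the group id is the one from my setup):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class DescribeGroup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-host:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConsumerGroupDescription description = admin
                    .describeConsumerGroups(Collections.singletonList("n4snapshotcg1"))
                    .describedGroups()
                    .get("n4snapshotcg1")
                    .get();
            // While the PDI transformation runs, the group should stay Stable
            // with one member; in my case the member disappears almost at once.
            System.out.println("state=" + description.state());
            System.out.println("members=" + description.members());
        }
    }
}
```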

 

I can see that network connectivity exists both from Pentaho to Kafka and from Kafka back to Pentaho, but I am still not sure why the Pentaho transformation completes instead of continuing to stream messages.

 

The consumer option "auto.offset.reset" is set to earliest, and a sub-transformation exists with the "Get records from stream" step. The batch configuration options in the Kafka Consumer step are as given below and attached as well (PDI Consumer Batch Options.PNG):

[Screenshot: PDI Consumer Batch Options.PNG]
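For clarity, here is how I read those step options in plain Kafka client terms. The mapping of the PDI batch fields to client properties is my assumption, not something I have confirmed in the PDI source:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StepOptionsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Values below mirror what I configured in the PDI Kafka Consumer step.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "n4snapshotcg1");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // My assumption: PDI's batch "Number of records" roughly corresponds to
        // a cap on the records handed to the sub-transformation per batch.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500); // hypothetical value
        System.out.println(props);
    }
}
```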

 

I am stuck at this stage and unable to proceed further. Any help on this is highly appreciated.

 

Thanks!!!


#Kettle
#PentahoDataIntegrationPDI
#Pentaho
Rajkumar Venkatasamy

Hi Pentaho technical team / community members,

 

It seems that the Pentaho consumer does not function properly when messages are compressed by the producer. That appears to be the scenario in my case, where the consumer did not work as detailed above: our Kafka producer published messages in the lz4 compression format.

 

When I removed the compression property from the producer, the consumer worked properly and consumed the messages.
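To illustrate, the only producer-side change was dropping the compression setting. A sketch of the relevant producer configuration (bootstrap address and topic are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressionToggleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-host:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // With this line present, the PDI Kafka Consumer stopped on its own;
        // removing it (i.e. defaulting to no compression) made consumption work.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
            producer.flush();
        }
    }
}
```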

 

Is this a known limitation of the PDI Kafka Consumer step?

Thanks and Regards