Pentaho


Kafka consumer: java.lang.OutOfMemoryError: GC overhead limit exceeded

Andry RAKOTONDRASOA posted 04-29-2019 09:23

This content was either too long or contained formatting that did not work with our migration. A PDF document is attached that contains the original representation.

 

We are trying to run a Kafka consumer transformation on Pentaho Server 8.1. The transformation subscribes to a single topic that receives about 500 messages per second, but after running for a few hours (about 18 hours) the transformation stops with a memory error. Any idea for solving this problem is welcome. Please see below the Java options settings and the error log:

CATALINA_OPTS="-Xms8g -Xmx10g -XX:MaxPermSize=512m"

2019/04/25 07:34:59 - PurRepository - Creating repository meta store interface
2019/04/25 07:34:59 - PurRepository - Connected to the enterprise repository
2019/04/25 07:34:59 - PurRepository - Creating repository meta store interface
2019/04/25 07:34:59 - PurRepository - Connected to the enterprise repository
2019/04/25 07:34:59 - t_kafka_consumer_sub - Distribution démarrée pour la tranformation [t_kafka_consumer_sub]
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RxCachedThreadScheduler-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
25-Apr-2019 07:36:53.311 GRAVE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run Unexpected death of background thread [ContainerBackgroundProcessor[StandardEngine[Catalina]]] java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "http-nio-8080-exec-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
25-Apr-2019 09:53:16.767 GRAVE [http-nio-8080-exec-1] com.sun.xml.ws.server.sei.TieHandler.createResponse java.lang.OutOfMemoryError: GC overhead limit exceeded
com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3934)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821)
        at org.pentaho.platform.repository2.unified.jcr.sejcr.GuavaCachePoolPentahoJcrSessionFactory.getSession(GuavaCachePoolPentahoJcrSessionFactory.java:120)
        at org.pentaho.platform.repository2.unified.jcr.sejcr.CredentialsStrategySessionFactory.getSession(CredentialsStrategySessionFactory.java:364)
        at org.springframework.extensions.jcr.SessionFactoryUtils.doGetSession(SessionFactoryUtils.java:87)
        at org.springframework.extensions.jcr.SessionFactoryUtils.getSession(SessionFactoryUtils.java:119)
        at org.pentaho.platform.repository2.unified.jcr.sejcr.PentahoJcrTemplate.getSession(PentahoJcrTemplate.java:87)
        at org.pentaho.platform.repository2.unified.jcr.sejcr.PentahoJcrTemplate.execute(PentahoJcrTemplate.java:60)
        at org.springframework.extensions.jcr.JcrTemplate.execute(JcrTemplate.java:115)
        at org.pentaho.platform.security.policy.rolebased.JcrRoleAuthorizationPolicyRoleBindingDao.getBoundLogicalRoleNames(JcrRoleAuthorizationPolicyRoleBindingDao.java:153)
        at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:280)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy88.getBoundLogicalRoleNames(Unknown Source)
        at org.pentaho.platform.security.policy.rolebased.RoleAuthorizationPolicy.isAllowed(RoleAuthorizationPolicy.java:84)
        at org.pentaho.platform.security.policy.rolebased.ws.DefaultAuthorizationPolicyWebService.isAllowed(DefaultAuthorizationPolicyWebService.java:77)
        at sun.reflect.GeneratedMethodAccessor236.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
        at sun.reflect.GeneratedMethodAccessor182.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
        at sun.reflect.GeneratedMethodAccessor207.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.xml.ws.api.server.MethodUtil.invoke(MethodUtil.java:83)
        at com.sun.xml.ws.api.server.InstanceResolver$1.invoke(InstanceResolver.java:250)
        at com.sun.xml.ws.server.InvokerTube$2.invoke(InvokerTube.java:149)
        at com.sun.xml.ws.server.sei.SEIInvokerTube.processRequest(SEIInvokerTube.java:88)
        at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:1136)
        at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:1050)
        at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:1019)
        at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:877)
        at com.sun.xml.ws.server.WSEndpointImpl$2.process(WSEndpointImpl.java:419)
        at com.sun.xml.ws.transport.http.HttpAdapter$HttpToolkit.handle(HttpAdapter.java:868)
        at com.sun.xml.ws.transport.http.HttpAdapter.handle(HttpAdapter.java:422)
        at com.sun.xml.ws.transport.http.servlet.ServletAdapter.invokeAsync(ServletAdapter.java:225)
        at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doGet(WSServletDelegate.java:161)
        at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doPost(WSServletDelegate.java:197)
        at org.pentaho.platform.web.servlet.PentahoWSSpringServlet.doPost(PentahoWSSpringServlet.java:98)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.pentaho.platform.web.http.filters.PentahoWebContextFilter.doFilter(PentahoWebContextFilter.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.pentaho.platform.web.http.filters.PentahoRequestContextFilter.doFilter(PentahoRequestContextFilter.java:90)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:115)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:215)
        at org.pentaho.platform.web.http.security.PentahoBasicProcessingFilter.doFilterInternal(PentahoBasicProcessingFilter.java:128)
        at org.springfr...
#Kettle #PentahoDataIntegrationPDI #Pentaho
Steven Brown

Hi RAKOTONDRASOA Andry,

Might want to review and/or implement this:

What is java.lang.OutOfMemoryError: GC overhead limit exceeded?
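(For context: the JVM raises "GC overhead limit exceeded" when it has spent more than roughly 98% of its time in garbage collection while recovering less than about 2% of the heap. The safety check itself can be switched off with a standard HotSpot flag; that is not a fix, but it can help confirm whether you are looking at a real leak or just GC thrash. A minimal sketch, assuming it is appended to the existing CATALINA_OPTS:)

    # diagnostic only: turns the early "GC overhead" failure into a plain heap OOM later
    CATALINA_OPTS="$CATALINA_OPTS -XX:-UseGCOverheadLimit"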

Best,

Steven

Andry RAKOTONDRASOA

Hi Steven Brown,

Thanks for your reply.

We have already tested this configuration, but we still get the same OutOfMemoryError.

With a LAG of 5,000,000 messages in the Kafka cluster, the ETL reaches 10 GB of memory usage in 45 seconds.

So, if this is related to the Pentaho environment, is the Kafka consumer step recommended for consuming a large volume of data from a Kafka topic?

Regards,

Andry

Ricardo Miguel Díaz Razo

Hi RAKOTONDRASOA Andry

Can you attach a screenshot of your transformation?

Andry RAKOTONDRASOA

Hi Ricardo Diaz,

Please find attached the screenshots of the ETL.

Both transformations (screenshot_1 and screenshot_2) hit the same memory problem: after a few hours on a topic without LAG in Kafka, and after 2 minutes on a topic with LAG (LAG > 4,500,000).

Regards,

Andry

Virgilio Pierini

Hi RAKOTONDRASOA Andry

I don't have a specific answer for you, but I'm working on Kafka consumers as well and I'm having some slightly different but relevant problems.

I had out-of-memory issues too. I'm on the CE edition, so I'm using plain Carte to run the transformation, and I haven't found the reason yet. But I noticed I was on OpenJDK and have now switched to the Oracle JVM. Also, I'm on AWS machines, which usually have no swap space, and that leads to the machine stalling. So I'm currently testing a more supported scenario.

As for your transformation, I don't see any particular problem, AFAIK. I mean that, for example, if you run jobs "for every row", then PDI has to collect a lot of "result information" from every run. But you're only using transformations, am I right?

Another thing, maybe it's silly, maybe not :-) The Kafka consumer step gets all the data and then calls the sub-transformation according to the batch size (or time frame). Which configuration are you using? Maybe you're getting too much data inside the component at once. Some things to try might be:

- start with "latest" messages and a new consumer group, to see if things go weird anyway regardless of LAG and history (see the sketch after this list);

- and, if you can, increase memory a lot and see if there is some size at which it fits. So far you have never managed to keep the job running for long, while I managed with a much more complex transformation, so I would first try to establish a baseline.
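In plain Kafka terms, that first experiment might look like the consumer settings below (a sketch: the property names are standard Kafka consumer options, "pdi_oom_test" is only a placeholder group name, and in the PDI Kafka Consumer step these would map to the step's offset and options configuration rather than a properties file):

    group.id=pdi_oom_test        # a brand-new consumer group, so no committed offsets exist
    auto.offset.reset=latest     # start from the newest messages, ignoring LAG and history
    max.poll.records=500         # optionally cap how many records a single poll can return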

My regards

Virgilio

Andry RAKOTONDRASOA

Hi Virgilio Pierini,

At the moment, we're only testing Kafka consumers with transformations whose batch configuration is 3,000 ms or 1,000 rows.

As I said in my previous comment, the transformations hit the memory problem after a few hours on a topic without LAG in Kafka, and after 2 minutes on a topic with LAG (LAG > 4,500,000). Starting with the "latest" messages just delays the inevitable memory problem.

Out of curiosity, we developed the same transformation in another ETL platform (not Pentaho) and the job runs without any memory problem (less than 150 MB for a LAG of 30 million, and less than 110 MB without LAG, starting from the "latest" message), with the same JVM configuration that Pentaho uses.

Regards,

Andry 

Jens Bleuel

You might investigate in two other directions:

1) It might be log buffer related

2) It might be related to the repository (from what I see in the error message)

To isolate the issue, I propose the following:

For 1) Minimize the logging to "Error Only" and see if it stays alive longer. If this is the reason, there are ways to limit and clean up the log buffers (see the kettle.properties sketch after this list).

For 2) Try running it without the repository (local .ktr files).
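For 1), a minimal kettle.properties sketch that caps the in-memory log buffer (the variable names are standard Kettle options; the values are only examples to experiment with):

    KETTLE_MAX_LOG_SIZE_IN_LINES=5000      # keep at most 5000 log lines in memory (0 = unlimited, the default)
    KETTLE_MAX_LOG_TIMEOUT_IN_MINUTES=60   # drop buffered log lines older than 60 minutes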

To really nail it down, you might want to use a Java memory profiler and see what types of objects are polluting the system.
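Two JDK-standard ways to feed such a profiler, as a sketch (the dump path and the Tomcat PID are placeholders):

    # let the JVM write a heap dump automatically at the next OutOfMemoryError
    CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/pentaho.hprof"

    # or capture one on demand from the running server with jmap (ships with the JDK)
    jmap -dump:live,format=b,file=/tmp/pentaho.hprof <tomcat-pid>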

Hope that helps,

Jens

Andry RAKOTONDRASOA

Hi Jens Bleuel,

We have already tried your first proposal, because we had to reduce the size of the log file (catalina.out).

And running the ETLs in file execution mode returns the same errors:

2019/05/13 16:03:14 - grfs-message.0 - Fin exécution étape (Entrées=0, Sorties=0, Lues=10000, Ecrites=10000, Maj=0, Erreurs=0)
2019/05/13 16:03:17 - jsoni-message.0 - Fin exécution étape (Entrées=10000, Sorties=0, Lues=10000, Ecrites=10000, Maj=0, Erreurs=0)
2019/05/13 16:03:17 - calc-periode.0 - Fin exécution étape (Entrées=0, Sorties=0, Lues=10000, Ecrites=10000, Maj=0, Erreurs=0)
2019/05/13 16:03:17 - swcs-type-message.0 - Fin exécution étape (Entrées=0, Sorties=0, Lues=10000, Ecrites=10000, Maj=0, Erreurs=0)
2019/05/13 16:03:22 - jsoni-message-service.0 - Fin exécution étape (Entrées=10000, Sorties=0, Lues=10000, Ecrites=20000, Maj=0, Erreurs=0)
Exception in thread "Timer-2" java.lang.OutOfMemoryError: Java heap space
        at java.io.BufferedReader.<init>(BufferedReader.java:105)
        at java.io.BufferedReader.<init>(BufferedReader.java:116)
        at java.io.LineNumberReader.<init>(LineNumberReader.java:72)
        at org.apache.felix.utils.properties.Properties$PropertiesReader.<init>(Properties.java:748)
        at org.apache.felix.utils.properties.Properties.loadLayout(Properties.java:352)
        at org.apache.felix.utils.properties.Properties.load(Properties.java:142)
        at org.apache.felix.utils.properties.Properties.load(Properties.java:138)
        at org.apache.felix.utils.properties.Properties.load(Properties.java:122)
        at org.apache.felix.utils.properties.Properties.<init>(Properties.java:107)
        at org.apache.felix.utils.properties.Properties.<init>(Properties.java:96)
        at org.apache.karaf.jaas.modules.properties.AutoEncryptionSupport$1.run(AutoEncryptionSupport.java:63)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

 

Best regards

 

Andry

Attachment: 70082.pdf (615 KB)
Andry RAKOTONDRASOA

Hi All,

After a few weeks of testing, I still have the same error in my Pentaho Kafka consumer job.

We must put the jobs into production in mid-November, and I have two weeks to find a solution for the Kafka consumer job.

If you have questions or need something clarified, please don't hesitate to comment.

Regards

Andry

Steven Brown

@Andry RAKOTONDRASOA​ , Java can throw an OutOfMemoryError even when there is plenty of memory; this sounds more like the GC itself having trouble keeping up. If you're using Oracle Java 8, you might want to try adding these settings to start-pentaho.sh:

-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts

as described in:

https://support.pentaho.com/hc/article_attachments/360023371491/Tuning_JVM_Garbage_Collection_for_Pentaho.pdf
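Combined with the heap settings from the original post, the relevant line might look like the sketch below (the -Xms/-Xmx/-XX:MaxPermSize values simply mirror what was already posted; depending on the installation, CATALINA_OPTS may be set in start-pentaho.sh itself or in the setenv script it calls):

    export CATALINA_OPTS="-Xms8g -Xmx10g -XX:MaxPermSize=512m \
      -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
      -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts"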
