PDI CE 8.1 AEL spark job hangs

Question asked by Rajendra Patki on Aug 17, 2018

I am trying to use AEL Spark with PDI 8.1 CE and Cloudera CDH 5.12.

 

My application properties are as follows:

 

hadoopConfDir=/etc/hadoop/conf

sparkHome=/home/<username>/spark-1.6.3-bin-hadoop2.6

sparkMaster=yarn

sparkDeployMode=client

sparkApp=/home/<username>/pdi81spark/data-integration/

assemblyZip=hdfs:/user/<username>rajendrapa/pdi8-executor/pdi-spark-executor.zip

 

When I run the job through Spoon, I first get the following error:

 

2018-08-17 09:09:16.952  INFO 20755 --- [launcher-proc-1] o.apache.spark.launcher.app.TestSpark2   : 09:09:16,951 ERROR [KarafCapability] Unknown error installing feature

2018-08-17 09:09:16.952  INFO 20755 --- [launcher-proc-1] o.apache.spark.launcher.app.TestSpark2   : java.io.IOException: Error resolving artifact pentaho:pentaho-osgi-config:cfg:pentaho-kerberos:8.1.0.0-365: Could not find artifact pentaho:pentaho-osgi-config:cfg:pentaho-kerberos:8.1.0.0-365 in karaf-system (file:/home/rajendrapa/pdi81spark/data-integration/system/karaf/system/)

 

According to this link (Karaf errors when trying to run a Spark transformation), this error can be ignored and the job should start. There is also a bug reported for it ([PDI-17312] Error resolving artifact pentaho:pentaho-osgi-config:cfg:pentaho-kerberos:8.1.0.0-365 - Pentaho Platform Tra…).

 

In my case, after some time the job starts, meaning I can see it in the YARN applications UI (port 8088) and in the Spark history server (port 18088), but it never progresses. The Spark history server shows no jobs or tasks starting, so the job is submitted but never actually runs. No error is reported in any of the logs.
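
To narrow down a symptom like this, it may help to check whether YARN reports the application as RUNNING or only as ACCEPTED (accepted but still waiting for resources). Below is a minimal Python sketch, assuming the ResourceManager REST endpoint `/ws/v1/cluster/apps` is reachable on the same port 8088. The host is a placeholder and the helper names are illustrative, not part of PDI or AEL.

```python
import json
from urllib.request import urlopen


def summarize_apps(payload):
    """Return (id, name, state) tuples from a /ws/v1/cluster/apps response."""
    # The ResourceManager returns {"apps": null} when nothing matches.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("state")) for a in apps]


def waiting_apps(payload):
    """Applications YARN has accepted but not yet started running."""
    return [a for a in summarize_apps(payload) if a[2] == "ACCEPTED"]


def fetch_apps(rm_url):
    """Fetch accepted and running applications from the ResourceManager.

    rm_url is a placeholder, e.g. "http://<resourcemanager-host>:8088".
    """
    url = rm_url + "/ws/v1/cluster/apps?states=ACCEPTED,RUNNING"
    with urlopen(url) as resp:
        return json.load(resp)
```

Calling `waiting_apps(fetch_apps(...))` with the real ResourceManager address lists any application still in the ACCEPTED state. If the AEL application appears there, the hang would more likely be a resource or queue issue on the YARN side than a problem in the transformation itself.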

 

Any pointers will be appreciated.

 

Thanks

Rajendra.
