Error parsing email documents

Question asked by Hedde van der Hoeven on Apr 4, 2017
Latest reply on Apr 19, 2017 by Hedde van der Hoeven

Hello there,

We are running a workflow on more than 2.5 million email objects stored in HCP. So far, 2.1 million objects in, we have encountered 783 document failures with the message below. Is this something you have observed before?

com.hds.ensemble.sdk.exception.PluginOperationFailedException: Failed to parse an email message

at com.hds.ensemble.plugins.extract.ExtractStagePlugin.process(ExtractStagePlugin.java:380)

at com.hds.ensemble.pipeline.plugins.StageSslHandlerWrapper.lambda$process$0(StageSslHandlerWrapper.java:50)

at com.hds.ensemble.plugins.PluginSslHandlerWrapper$PluginAction.run(PluginSslHandlerWrapper.java:124)

at com.hds.ensemble.plugins.PluginSslHandlerWrapper.wrapSSLCheck(PluginSslHandlerWrapper.java:96)

at com.hds.ensemble.pipeline.plugins.StageSslHandlerWrapper.process(StageSslHandlerWrapper.java:49)

at com.hds.ensemble.workflow.WorkflowModule.lambda$processElements$0(WorkflowModule.java:519)

at com.hds.ensemble.workflow.WorkflowModule.wrapPluginRetry(WorkflowModule.java:778)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:519)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:505)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.processElements(WorkflowModule.java:613)

at com.hds.ensemble.workflow.WorkflowModule.runWorkflowInt(WorkflowModule.java:440)

at com.hds.ensemble.workflow.WorkflowModule.runWorkflow(WorkflowModule.java:281)

at com.hds.ensemble.job.EnsembleSparkJob.lambda$null$0(EnsembleSparkJob.java:222)

at java.util.Iterator.forEachRemaining(Iterator.java:116)

at com.hds.ensemble.job.EnsembleSparkJob.lambda$runJob$98a696ac$1(EnsembleSparkJob.java:200)

at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)

at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)

at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)

at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)

at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)

at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)

at org.apache.spark.scheduler.Task.run(Task.scala:86)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.tika.exception.TikaException: Failed to parse an email message

at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:90)

at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)

at com.hds.ensemble.plugins.extract.ExtractStagePlugin.process(ExtractStagePlugin.java:321)

... 32 more

Caused by: org.apache.james.mime4j.io.MaxHeaderLengthLimitException: Maximum header length limit exceeded

at org.apache.james.mime4j.stream.DefaultFieldBuilder.append(DefaultFieldBuilder.java:63)

at org.apache.james.mime4j.stream.MimeEntity.readRawField(MimeEntity
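
The root cause in the trace is mime4j rejecting a header field that is too long, so one way to triage the 783 failures is to measure the longest header field in each raw message. The following is only a rough sketch: the function names are hypothetical, the 10,000-character limit is an assumption about the mime4j default that should be checked against the version in use, and mime4j may count the length slightly differently than this does.

```python
# Assumed default for mime4j's maximum header length; verify for your version.
ASSUMED_MAX_HEADER_LEN = 10_000


def longest_header_field(raw):
    """Return (field_name, length_in_bytes) of the longest header field.

    Folded continuation lines (starting with a space or tab) are counted
    as part of the field they continue. Scanning stops at the first blank
    line, which ends the header block.
    """
    fields = []  # each entry is [name, accumulated_length]
    for line in raw.splitlines(keepends=True):
        if not line.strip(b"\r\n"):
            break  # blank line: end of headers
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1][1] += len(line)  # continuation of the previous field
        else:
            name = line.split(b":", 1)[0].decode("ascii", "replace")
            fields.append([name, len(line)])
    if not fields:
        return ("", 0)
    name, length = max(fields, key=lambda f: f[1])
    return (name, length)


def exceeds_limit(raw, limit=ASSUMED_MAX_HEADER_LEN):
    """True if any single header field is longer than the given limit."""
    return longest_header_field(raw)[1] > limit
```

Running `longest_header_field` over the raw bytes of a failed object should show which field (for example a very long `To` or `Received` list) is the likely trigger.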

 

--Hedde--
