Pentaho


 Problems reading CSV file

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Luis Suarez posted 09-13-2018 21:05

Hi, I'm working with PDI version 7.1 and trying to read a CSV file with the CSV File Input step, but I'm getting an error that I cannot resolve.

My CSV file is delimited by the | character (ASCII code 124), and each line ends with CRLF, which I entered as the enclosure character. When I configure the step with the parameters shown in the image below and try to get the fields, it returns the following error:

View Attached # 1

View Attached # 2 to see my CSV step configuration

I'm going crazy with this issue.

Thanks


Ana Gonzalez

Uncheck lazy conversion. Unless you need it to solve a very specific problem, it only adds complexity to the data.

Have you checked whether there is something wrong with the second line or with the content of a specific column? Try importing a minimal CSV and adding columns/content until you find the problematic column or cell.
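One way to narrow this down outside PDI is to count the fields on each physical line and flag the lines that differ from the most common count. The following is only a rough sketch in Python: the function name `find_suspect_lines` and the sample data are hypothetical, and it assumes the | delimiter never appears inside a field value.

```python
from collections import Counter


def find_suspect_lines(text, delimiter="|"):
    """Return (line_number, field_count) pairs for physical lines whose
    field count differs from the most common field count in the text."""
    counts = []
    # Split on LF only, so a bare LF inside a field shows up as its own line.
    for number, line in enumerate(text.split("\n"), start=1):
        if line.strip("\r") == "":
            continue  # skip empty lines, such as the one after the last CRLF
        counts.append((number, line.count(delimiter) + 1))
    expected = Counter(count for _, count in counts).most_common(1)[0][0]
    return [(number, count) for number, count in counts if count != expected]


# Hypothetical sample: the second record has an LF inside its last field.
sample = "1|a|x\r\n2|b|first part\nsecond part\r\n3|c|z\r\n"
print(find_suspect_lines(sample))
```

In this sample the flagged line is the continuation of the broken record, so the record that actually contains the line break is the one just before it.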

Regards

Luis Suarez

Hello Ana, thanks for your reply. I'm going to try it and I'll let you know how it goes.

Luis Suarez

Hello Ana, I unchecked lazy conversion but get the same error.

"

98765653321|2018-09-06|09:08:22|854521454|5|ENTREGADO|26||PEDRO PEREZ||2517193|LUIS GOMEZ|1|ENTREGADO||||||MARCELO T|JAVIER ROMERO 802 |2018-09-28|12:00 A 15:00|98765653321|854521454|104971725|CCH|REAGENDAMIENTO|LONGITUDINAL -  PARQUE PATRICIO

VILLA SAN IGNACIO.

LLAMAR ANTES

"

Above you can see the row that is causing the error.

This is one of the lines that fails. Note that one field contains line breaks (LF), but in the step configuration I am specifying that each line ends with a carriage return (CR) and a line feed (LF):

Enclosure: $[0D,0A]

but the step does not seem to be honoring it.

What do you think about it?

Luis Suarez

Hi, I think I have an idea of the problem: I was using the Enclosure parameter incorrectly. I have now corrected it and set it to ". The real problem is that some text fields in the CSV file contain line breaks. Is there any way to specify that each line ends with CRLF? I get the impression that the step treats a bare LF as the end of a line, so when a text field contains line breaks, the step takes them as the end of the row.
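To illustrate the behavior being asked for (CRLF ends a record, a bare LF stays inside the field), here is a minimal Python sketch. It is a preprocessing idea outside PDI, not a PDI feature; the function name `split_records` and the sample bytes are hypothetical, and it assumes CRLF appears only at the end of a record and that | never appears inside a field.

```python
def split_records(raw, delimiter="|", encoding="utf-8"):
    """Split raw file bytes into records on CRLF only, so a bare LF
    inside a field is kept as part of that field."""
    text = raw.decode(encoding)
    # Drop empty chunks, such as the one after the final CRLF.
    records = [record for record in text.split("\r\n") if record]
    return [record.split(delimiter) for record in records]


# Hypothetical sample: the second record has an LF inside its last field.
raw = b"1|a|x\r\n2|b|first part\nsecond part\r\n3|c|z\r\n"
rows = split_records(raw)
for row in rows:
    print(row)
```

When reading from disk, the file would need to be opened in binary mode (`open(path, "rb")`), because Python's default text mode translates CRLF to LF and the distinction would be lost.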

Thanks

Ana Gonzalez

Have you tried using the Text file input step instead of the CSV step? It has more configuration options, and one of them, in the Content tab, is Format, where you can define a Mixed format.

Regards

Ricardo Miguel Díaz Razo

Yes, don't use CSV File Input; try the Text file input step instead.

Alain Debecker

The enclosure $[0D,0A] (= CR LF) seems weird to me.

The enclosure is for putting things between quotes; it is usually " or '. Basically, you are telling PDI to treat the whole line as one single field.

If the issue persists after you change the enclosure, because of the CR in the line separator, then use the Text file input step and set the Format (on the Content tab) to mixed.
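As a small illustration of what an enclosure does in general, the sketch below uses Python's standard `csv` module rather than PDI: a field wrapped in " may contain line breaks without ending the record. The sample data is hypothetical and assumes the file actually wraps multi-line fields in quotes, which the row posted above does not appear to do.

```python
import csv
import io

# Hypothetical data: the third field of the first record is enclosed in
# double quotes and contains an LF; each record ends with CRLF.
data = '1|a|"first part\nsecond part"\r\n2|b|plain\r\n'

# newline="" leaves line endings untranslated so the csv module handles them.
reader = csv.reader(io.StringIO(data, newline=""), delimiter="|", quotechar='"')
rows = list(reader)
for row in rows:
    print(row)
```

Here the quoted field is read as a single value that includes the line break, which is the role the enclosure character plays.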

Attachments:
attach_2.JPG 67 KB
attach_1.JPG 133 KB