Pentaho


 file api response is zip

Peter Boogert posted 04-04-2023 05:58

Good morning all,

I have built a transformation in Pentaho 9.2.
I started in Postman, where I am able to communicate with the API and download a zip file from the file API.
I am able to retrieve the token, get a list of files from the API, and use this to download the actual files. But I have to download this in parts, as the file size is too big.
I am rebuilding this in Pentaho.

This is an example of the response I get in Postman / Pentaho

How can I get this output into a zip file using Pentaho?
In Pentaho I am using the Text file output step.
I changed the extension to zip and enabled "Include stepnr in filename". This way all parts get added into 1 big file.

But I am unable to read this file with WinZip.
Even if I save it as .txt and change the extension manually to .zip, I am unable to open it.
How can I get a zip from a file API using Pentaho?
John Craig

The only really simple way I know to get a zip file as the output of a transformation is to output the files in whatever their native type is and then use the Utility > Zip file step to create a zip file. But that will save the zipped data to a file, not send the output to an HTTP client (which it sounds like you're simulating using Postman).

I believe you'll find that the Text File output step you're using cannot change the type of the file by simply specifying a zip extension. That has no effect on the nature of the content of the file--it'll still be a text file. That is, you can set the extension to be any arbitrary value, but that won't make the output match the type of file traditionally associated with the specified extension: that step produces a text file as output.

An option that I've used when I want to zip up (or unzip) the data stream entering a transformation step is to use a User-defined Java Class and use the java.util.zip classes (perhaps ZipOutputStream) to zip up the data stream flowing into the step. If you want the zip file to be sent as a data stream to an API call to invoke the transformation, rather than saved in a known file on the server or client, you'd also need to (I assume) use the java.net classes to set up an HTTP output stream of the zipped bytes. I assume you could do the same with the Python Executor step and perhaps with the Modified Java Script Value step, but I don't have any experience using either of those. In any case, with the setup you've got in the transformation illustrated above, it looks like you're writing a file to disk, not returning it to the caller (via an API) of the transformation.
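
As a rough illustration of the java.util.zip idea, here is a minimal standalone sketch, not a complete User-defined Java Class step. The class name ZipStreamSketch, the method zipLines, and the entry name rows.txt are made up for the example; inside a real step you would feed it the rows arriving from the previous step instead of a fixed list.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: names are illustrative, not part of any Pentaho API.
class ZipStreamSketch {

    // Writes the given lines into one zip entry and returns the archive as bytes.
    static byte[] zipLines(String entryName, List<String> lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            for (String line : lines) {
                zip.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            zip.closeEntry();
        } // closing the ZipOutputStream writes the central directory
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipLines("rows.txt", Arrays.asList("id;name", "1;alpha", "2;beta"));
        System.out.println("archive size in bytes: " + archive.length);
    }
}
```

The archive is held as a byte array here so it can be written to a file or to an HTTP output stream; for large data you would wrap the target stream directly instead of buffering in memory.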

Someone else might have a different way to do this, but if the Zip file step won't do what you need (since it works with existing files rather than the data stream from a prior step and writes to a file whose name is specified or constructed), I'm guessing creating your own step in Python or Java (or perhaps JavaScript) is the way to go.

John

Petr Prochazka

Hi Peter,
if I understand correctly, you select files on a remote server over the API and can download these files to the client side.

IMHO the problem is that the content of the file is stored in a field of type String, and this can damage bytes in the final stream. A better way is to create a job which downloads the file directly to a local file (see the HTTP job entry). Then, in the transformation, use the Job Executor step to iterate over the job and download each file.
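
To show why a String field can damage binary content, here is a small sketch in plain Java (the class name BinaryAsStringDemo and the sample bytes are made up for illustration). It decodes a few raw bytes as UTF-8 text and encodes them again; bytes that are not valid UTF-8 do not survive the round trip, which is the kind of damage that makes a zip unreadable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: not part of Pentaho, only illustrates the byte damage.
class BinaryAsStringDemo {

    // Decodes raw bytes into a String and encodes them back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Zip signature "PK\3\4" followed by three bytes that are not valid UTF-8.
        byte[] raw = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0xFE, (byte) 0x80};
        byte[] viaUtf8 = roundTrip(raw, StandardCharsets.UTF_8);

        System.out.println("original : " + Arrays.toString(raw));
        System.out.println("via UTF-8: " + Arrays.toString(viaUtf8));
        System.out.println("identical: " + Arrays.equals(raw, viaUtf8));
    }
}
```

The invalid bytes are replaced during decoding, so the re-encoded array is longer than the original and no longer matches it. Writing the response straight to a file as bytes avoids this.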


John Craig

Hi,
Sorry, I completely misunderstood the issue. If you want to retrieve a zip file via an API call and write it out (as noted by Petr Prochazka), you'll need to retrieve the file as an octet-stream. I was able to create a User-defined Java Class step to do this when I wanted to download the prpti files from the Pentaho repository (these files define "Interactive" type reports; they're actually just zip files). I wasn't writing them out, just processing them inside the transformation, but the idea is pretty much the same.

What you have to do is basically:

Authenticate with the HTTP service (if needed)

Create the HTTPConnection

Request a file via the API call (be sure to request the response as octet stream)

Loop

  read a chunk of data into an array of bytes (use whatever size makes sense--say 100K)
  when nothing is read drop out of loop
  write bytes to the output file
End of loop
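
The steps above could be sketched in plain Java roughly as follows. This is only an illustration under assumptions, not my actual step: the class name BinaryDownloadSketch, the bearer-token header, and the command-line arguments are all made up, and your API may expect a different form of authentication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: names and the auth scheme are assumptions for illustration.
class BinaryDownloadSketch {

    // Copies raw bytes in 100K chunks until the stream ends; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[100 * 1024];
        long total = 0;
        int read;
        while ((read = in.read(chunk)) != -1) { // -1 means nothing more to read
            out.write(chunk, 0, read);
            total += read;
        }
        return total;
    }

    // Requests the URL as an octet stream and writes the response body to target.
    static long download(String fileUrl, String token, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(fileUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/octet-stream");
        if (token != null) {
            // Assumption: the API accepts a bearer token.
            conn.setRequestProperty("Authorization", "Bearer " + token);
        }
        try (InputStream in = conn.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: BinaryDownloadSketch <url> <target-file> [token]");
            return;
        }
        String token = args.length > 2 ? args[2] : null;
        long bytes = download(args[0], token, Paths.get(args[1]));
        System.out.println("wrote " + bytes + " bytes to " + args[1]);
    }
}
```

The important part is that the response never passes through a String: bytes are read from the connection and written to the file unchanged.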

With a User-defined Java Class step like this written, you can deal with any number of files: create the API call in the PDI data stream and pass that into your download step. You could also call an external piece of Java code to do most of the work (just put it in a JAR file in design-tools/data-integration/lib [if running as a separate process, such as by means of pan.bat/.sh] or, if you're running your transformation right in Tomcat/Pentaho, in server/pentaho-server/tomcat/lib). I'm guessing there must be Java code available on a website somewhere that would connect to a URL and download a file.

This may be more complex than you want to tackle. I might be able to get permission to post part of my solution, but as it was work done for my company, they'd have to approve my doing that. If you're interested, let me know.

John

Peter Boogert

Thank you Petr and John for sharing your insights.
Yes, you both understand me correctly.
I am trying to receive a zip file via an API call. The file is too big, therefore I have to retrieve it in chunks. I am able to authenticate with the API, split it up, and retrieve it in strings.
I will look into the HTTP job entry and requesting the response as an octet stream.
Will let you know if I get it working.