Pentaho

 Save image file returned as octet-stream to disk

Ezra Wise posted 04-01-2024 13:19

Hello All! 

I am trying to retrieve an image file (likely a JPEG, but not necessarily) using a RESTClient step and save it to local disk. I was able to do this using the code below (courtesy of Naimish: Extraction of BLOB Content using Pentaho Kettle CE); however, the local file is corrupted, likely because what is returned is an octet-stream rather than a blob. I'm trying to convert the octet-stream to a blob in the User-Defined Java Class step so that the file is re-created correctly. My Java skill is practically non-existent, so any help with this would be greatly appreciated!

Here is the original code that successfully creates the file, although corrupted:

import java.io.File;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // Target path and raw bytes both come from the incoming row.
    String filename = get(Fields.In, "imageFile").getString(r);
    byte[] blobBytes = get(Fields.In, "attachment").getBinary(r);

    BufferedOutputStream bos = null;
    try {
        // FileOutputStream creates the file if it is missing and truncates it otherwise.
        bos = new BufferedOutputStream(new FileOutputStream(new File(filename)));
        bos.write(blobBytes);
        bos.flush();
    } catch (IOException e) {
        // Report the failure instead of silently swallowing it.
        throw new KettleException("Could not write " + filename, e);
    } finally {
        // Close the stream even when the write fails.
        if (bos != null) {
            try { bos.close(); } catch (IOException ignored) { }
        }
    }
    return true;
}

where "attachment" is the stream field containing the raw octet-stream data.

Most of the information I find online pertains to JavaScript rather than pure Java. I'm assuming that I'll need to construct a byte array out of the octet-stream and then do some kind of byte-wise manipulation, but I'm a bit out of my element here. Has anyone done this before? It seems like this might be a frequent need.
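
For reference, here is a minimal standalone sketch (plain Java, outside Kettle) of what I understand so far: an octet-stream is just raw bytes, so writing the byte array unchanged should reproduce the file, whereas passing the bytes through a String on the way does not. Whether that String conversion is what actually happens to the "attachment" field in my transformation is only an assumption on my part. The class and method names (OctetStreamSketch, saveBytes, roundTripThroughString) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class OctetStreamSketch {

    // Writes the bytes exactly as received; no conversion is needed for an octet-stream.
    static Path saveBytes(Path target, byte[] body) throws IOException {
        return Files.write(target, body);
    }

    // Simulates a binary body being decoded to text and encoded back (hypothetical failure mode).
    static byte[] roundTripThroughString(byte[] body) {
        String asText = new String(body, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The first four bytes of a typical JPEG file; 0xFF is never valid in UTF-8.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        Path tmp = Files.createTempFile("image", ".jpg");
        saveBytes(tmp, jpegStart);

        // Raw write: the bytes on disk match the input.
        System.out.println(Arrays.equals(jpegStart, Files.readAllBytes(tmp))); // prints true

        // Text round trip: invalid sequences are replaced, so the bytes no longer match.
        System.out.println(Arrays.equals(jpegStart, roundTripThroughString(jpegStart))); // prints false

        Files.delete(tmp);
    }
}
```

If this is right, no byte-wise manipulation should be needed as long as the data reaches the write call as an untouched byte array.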

BTW, the point of all of this is that I'm grabbing image files remotely via an API (hence the RESTClient step) and storing them locally, temporarily, while I run a report using PRD. The image references are included in the report stream in an image-field object.
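
One alternative I am considering, sketched below in plain Java, is to skip the intermediate field and copy the response stream straight to disk. This is only a rough sketch under assumptions: it uses the standard java.net.URL and java.nio.file.Files classes rather than anything Pentaho-specific, it ignores authentication and headers that a real API call would probably need, and the names ImageFetchSketch and fetchToFile are hypothetical. In main, a local temp file stands in for the remote image so the example runs without a network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ImageFetchSketch {

    // Streams the resource at 'source' to 'target' byte for byte and returns the byte count.
    static long fetchToFile(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the remote image: a local file exposed through a file: URL.
        Path source = Files.createTempFile("remote", ".jpg");
        Files.write(source, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0});

        Path target = Files.createTempFile("local", ".jpg");
        long copied = fetchToFile(source.toUri().toURL(), target);
        System.out.println(copied + " bytes copied"); // prints "4 bytes copied"

        Files.delete(source);
        Files.delete(target);
    }
}
```

Because the data never leaves stream form here, there would be no opportunity for a text conversion to alter it.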

Thanks in advance!

--Ezra