Pentaho

 User Defined Java Class yields duplicate values in table output step

Daniel Clark posted 06-26-2019 18:33

I've created a User Defined Java Class (UDJC) step (below) to parse nested JSON returned from a web service call. When I preview the UDJC step, I can see the values parsed into variables, which are mapped to output fields. A hop connects it to a Table Output step, which inserts the values into a table. The problem is that many of the values never appear in the table, while many others are repeated 40-50 times. What am I missing?

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;

String content_field;
String courseid_output_field;
String coursecode_output_field;
String coursename_output_field;
String modid_output_field;
String modname_output_field;
String sessionid_output_field;
String sessionname_output_field;
String start_date_output_field;
String end_date_output_field;
String active_output_field;
String courseId;
String courseCode;
String courseName;
Boolean active;
String modId;
String modName;
String sessionId;
String sessionName;
String startDate;
String endDate;
// String input_session_end_field;
// String output_session_end_field;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {

    if (first) {
        content_field = getParameter("CONTENT");
        courseid_output_field = getParameter("COURSEID");
        coursecode_output_field = getParameter("COURSECODE");
        coursename_output_field = getParameter("COURSENAME");
        modid_output_field = getParameter("MODID");
        modname_output_field = getParameter("MODNAME");
        sessionid_output_field = getParameter("SESSIONID");
        sessionname_output_field = getParameter("SESSIONNAME");
        start_date_output_field = getParameter("STARTDATE");
        end_date_output_field = getParameter("ENDDATE");
        active_output_field = getParameter("ACTIVE");
        // output_session_start_field = getParameter("OUTPUT_SESSION_START");
        // input_session_end_field = getParameter("INPUT_SESSION_END");
        // output_session_end_field = getParameter("OUTPUT_SESSION_END");
        // logDebug("dateField is set to: " +dateField);
        // completedDateField = getParameter("COMPLETED_DATE_FIELD");
        // logDebug("completedDateField is set to: " +completedDateField);
        first = false;
    }

    // First, get a row from the default input hop
    Object[] r = getRow();

    if (r == null) {
        setOutputDone();
        return false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

    // JSONParser jsonParser = new JSONParser();
    JSONObject jsonObject = null;
    String contentStr = get(Fields.In, content_field).getString(r);
    JSONArray array = (JSONArray) JSONValue.parse(contentStr);

    for (int i = 0; i < array.size(); i++) {
        jsonObject = (JSONObject) array.get(i);
        courseId = (String) jsonObject.get("Id");
        courseCode = (String) jsonObject.get("Code");
        courseName = (String) jsonObject.get("Name");
        active = (Boolean) jsonObject.get("Active");

        // Set the value in the output field
        get(Fields.Out, courseid_output_field).setValue(outputRow, courseId);
        get(Fields.Out, coursecode_output_field).setValue(outputRow, courseCode);
        get(Fields.Out, coursename_output_field).setValue(outputRow, courseName);
        get(Fields.Out, active_output_field).setValue(outputRow, active);

        JSONArray modules = (JSONArray) jsonObject.get("Modules");
        for (int j = 0; j < modules.size(); j++) {
            JSONObject modObject = (JSONObject) modules.get(j);
            modId = (String) modObject.get("Id");
            modName = (String) modObject.get("Name");

            // Set the value in the output field
            get(Fields.Out, modid_output_field).setValue(outputRow, modId);
            get(Fields.Out, modname_output_field).setValue(outputRow, modName);

            JSONArray sessions = (JSONArray) modObject.get("Sessions");
            for (int k = 0; k < sessions.size(); k++) {
                JSONObject sessionObject = (JSONObject) sessions.get(k);
                sessionId = (String) sessionObject.get("Id");
                sessionName = (String) sessionObject.get("Name");
                startDate = (String) sessionObject.get("StartDate");
                endDate = (String) sessionObject.get("EndDate");

                // Set the value in the output field
                get(Fields.Out, sessionid_output_field).setValue(outputRow, sessionId);
                get(Fields.Out, sessionname_output_field).setValue(outputRow, sessionName);
                get(Fields.Out, start_date_output_field).setValue(outputRow, startDate);
                get(Fields.Out, end_date_output_field).setValue(outputRow, endDate);
            }
        }

        // Send the row on to the next step.
        putRow(data.outputRowMeta, outputRow);
    }

    return true;
}


Manuel Destremon

Hi,

A bit late to the party, but I'm adding my 2 cents in case someone else runs into this.

Try moving the createOutputRow and putRow calls inside the loop, so that every row you send on gets its own Object[] instead of reusing a single array, like this:

for (int i = 0; i < array.size(); i++) {
    // Allocate a fresh output row for each element
    Object[] outputRowCopy = createOutputRow(r, data.outputRowMeta.size());
    jsonObject = (JSONObject) array.get(i);
    courseId = (String) jsonObject.get("Id");
    ...
    get(Fields.Out, "courseid_output_field").setValue(outputRowCopy, courseId);
    ...
    putRow(data.outputRowMeta, outputRowCopy);
}
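To show why a fresh array per row matters, here is a minimal standalone sketch in plain Java with no Kettle classes. The `out` list is a hypothetical stand-in for whatever buffers rows after putRow; my assumption is that putRow hands on the array reference without copying it, so later writes to the same array change rows that were already sent.

```java
import java.util.ArrayList;
import java.util.List;

public class RowReuseDemo {

    // Reuses ONE array for every emitted row (the pattern in the question).
    static List<Object[]> emitShared(String[] ids) {
        List<Object[]> out = new ArrayList<>(); // hypothetical stand-in for the downstream row buffer
        Object[] row = new Object[1];
        for (String id : ids) {
            row[0] = id;
            out.add(row); // the same reference is added every time
        }
        return out;
    }

    // Allocates a fresh array for every emitted row.
    static List<Object[]> emitFresh(String[] ids) {
        List<Object[]> out = new ArrayList<>();
        for (String id : ids) {
            Object[] row = new Object[1];
            row[0] = id;
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] ids = {"A", "B", "C"};
        for (Object[] r : emitShared(ids)) System.out.print(r[0] + " "); // prints: C C C
        System.out.println();
        for (Object[] r : emitFresh(ids)) System.out.print(r[0] + " ");  // prints: A B C
        System.out.println();
    }
}
```

With the shared array, every buffered row ends up showing the last value written, which looks a lot like the repeated values described in the question.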

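Oh, and I don't know if it really matters, but I also double-quoted the 2nd parameter of the get function. (Quoted, it is read as a literal field name rather than as the variable holding the parameter value, so the output field has to actually be called that.)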

Hope this helps, at least it worked for me.
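One more thought on the values that never show up: in the code from the question, putRow is called once per course, after the module and session loops have finished, so I'd guess only the last module and session written into the row survive. If the goal is one table row per session, the row probably needs to be created and emitted in the innermost loop. Below is a rough standalone sketch of that shape. The `Course`, `Module` and `Session` classes and the `rows` list are hypothetical stand-ins for the parsed JSON and for putRow, not Kettle or json-simple APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenDemo {

    // Hypothetical stand-ins for the nested JSON structure.
    static class Session {
        final String id;
        Session(String id) { this.id = id; }
    }

    static class Module {
        final String id;
        final List<Session> sessions;
        Module(String id, List<Session> sessions) { this.id = id; this.sessions = sessions; }
    }

    static class Course {
        final String id;
        final List<Module> modules;
        Course(String id, List<Module> modules) { this.id = id; this.modules = modules; }
    }

    // Emits one fresh row per (course, module, session) combination.
    static List<Object[]> flatten(List<Course> courses) {
        List<Object[]> rows = new ArrayList<>(); // stand-in for putRow
        for (Course c : courses) {
            for (Module m : c.modules) {
                for (Session s : m.sessions) {
                    // New array each time, filled with parent and child values.
                    Object[] row = new Object[] { c.id, m.id, s.id };
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}
```

Note that in this sketch a course without modules, or a module without sessions, produces no row at all; whether that is acceptable depends on what the target table should contain.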

 

Cheers