
What objects/classes/methods/constructors are needed?

Question asked by David Marlow on Oct 20, 2018

I'm having a hard time getting a small Java program integrated with Spoon. I think the biggest problem is that I don't really understand how to use the User Defined Java Class step. This code works as standalone Java:

package getlinks;

import java.io.IOException;
import java.io.PrintWriter;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class testme {
    
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
                
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); /* turns off annoying htmlunit warnings; comment out to see them */
            
        WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        
        HtmlPage currentPage = webClient.getPage("http://google.com");
        webClient.waitForBackgroundJavaScript(10 * 1000); /* waits up to 10 s for background JavaScript to finish */
        // the whole page
        String Source = currentPage.asXml(); 
        
        PrintWriter writer1 = new PrintWriter("G:\\Temp\\HtmlOutSource1.html", "UTF-8");
        writer1.print(Source);
        webClient.close();
        writer1.close();
    }

}

 

What I am trying to do is get the IP address from a database, download the page, and pass the page to the next step as a string field. The problem is that I can't run htmlunit inside the processRow method, and I can't figure out how to call the class/method so that I pass it the IP address, it downloads the page, and it returns the downloaded page as a field that processRow will accept.

I know it should be a simple thing, but I'm missing something basic here and keep going around and around. My test is 3 steps: Generate Rows with 1 field (LLDB = http://google.com, 1 row), User Defined Java Class, and output to Dummy. This is the code where I'm stumped at the moment.

//********** NEW STUFF **************
import java.util.*;
import java.util.List;
import java.util.Arrays;
import javax.swing.*;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

//*** Globals?***
public GetPage getPage;
String LLDBField;
String SourceField;
String source;

// ******************** this all works ********************
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
// Let's look up parameters only once for performance reason.
    if (first) {
        LLDBField = getParameter("LLDB_FIELD");
        first=false;
    }
 Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

    String LLDBField = get(Fields.In, "LLDB").getString(r);
    LLDBField = "\""+LLDBField+"\""; 
    
    // How do I call GetPage from here, pass it the IP address, and get the page back?
    // And how do I set the output field?
 putRow(data.outputRowMeta, outputRow);
    return true;
}

public class GetPage {
    public int source;

    public String getPage(String LLDBField) throws java.io.IOException {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
        WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        HtmlPage currentPage = (HtmlPage) webClient.getPage(LLDBField);
        webClient.waitForBackgroundJavaScript(10 * 1000);

        String source = currentPage.asXml();
        webClient.close(); // release the client, as in the standalone version
        return source;
    }
}
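
One detail in processRow above may be worth checking in isolation: the line that wraps LLDBField in literal double quotes before it is used as an address. The following is a small sketch using only the JDK (the class name QuotedUrlCheck and the method parses are made up for illustration) that shows how java.net.URL treats the quoted and unquoted strings. Whether htmlunit parses the string the same way is an assumption here, not something confirmed in the post.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class QuotedUrlCheck {

    /** Returns true if the string can be parsed as an absolute URL by java.net.URL. */
    public static boolean parses(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lldb = "http://google.com";      // value of the LLDB field in the test
        String quoted = "\"" + lldb + "\"";     // same value with literal quotes added

        System.out.println("plain:  " + parses(lldb));
        System.out.println("quoted: " + parses(quoted));
    }
}
```

If the quoted form is rejected, passing the field value through unchanged would be the thing to try first.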

 

I just don't know what objects I need to create, or where, to make this work. When I do get something back to processRow, I keep getting a "not an rvalue" error.
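
For reference, the general shape being asked about (create the helper object once, call its method for each row, and put the returned string into an extra output column) can be sketched in plain Java outside of Spoon. Everything below is hypothetical and only illustrates the wiring: RowFetchSketch and PageGetter are invented names, the rows are bare Object[] arrays rather than Kettle rows, and the fetcher is a stub standing in for the htmlunit calls shown earlier. In the real step, the last column would presumably be set through the step's own field helpers for an output field declared in the Fields tab.

```java
import java.util.Arrays;
import java.util.function.Function;

public class RowFetchSketch {

    /** Hypothetical helper: maps a URL string to page source through an injected fetcher. */
    public static class PageGetter {
        private final Function<String, String> fetcher;

        public PageGetter(Function<String, String> fetcher) {
            this.fetcher = fetcher;
        }

        public String getPage(String url) {
            return fetcher.apply(url);
        }
    }

    // Created once and reused for every row, rather than rebuilt per call.
    private final PageGetter pageGetter;

    public RowFetchSketch(Function<String, String> fetcher) {
        this.pageGetter = new PageGetter(fetcher);
    }

    /** Copies the input row and appends the fetched page as one extra column. */
    public Object[] processRow(Object[] inputRow) {
        String url = (String) inputRow[0];                 // stands in for the LLDB field
        Object[] outputRow = Arrays.copyOf(inputRow, inputRow.length + 1);
        outputRow[inputRow.length] = pageGetter.getPage(url);
        return outputRow;
    }

    public static void main(String[] args) {
        // Stub fetcher for the sketch; a real one would wrap the htmlunit code.
        RowFetchSketch step = new RowFetchSketch(url -> "<html>" + url + "</html>");
        Object[] out = step.processRow(new Object[] {"http://google.com"});
        System.out.println(out[1]);
    }
}
```

The point of the sketch is only that the helper's method returns a String and the caller owns the row, so the helper never needs to know about processRow.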

 

Any help or suggestions are appreciated.
