Hi,
There is no step that parses HTML itself, so you will have to do it yourself, and you have a few options.
Either way, you will first have to "prepare" the HTML, that is, convert it to tabular data.
As far as I can see, you want to get the HTML table into a data stream. In that case, a very simple way to do it would be to copy the "table" element manually, replace the tags with a delimiter, and save the result as a "csv" file.
An HTML table is made of something like:
<table><th>.....
<tr><td>VALUE</td><td>VALUE1</td></tr>
You could replace:
- "<tr>" and "<td>" with nothing (blank)
- "</td>" with the separator ";"
- "</tr>" with a line feed "\n" (or with nothing, depending on your file)
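The replacement rules above can be tried on the sample row before touching a real file. This is a minimal sketch, assuming GNU sed (the `I` case-insensitive flag and `\n` in the replacement are GNU extensions); the function name `html_row_to_csv` is hypothetical. It reads from stdin and prints to stdout instead of editing a file in place:

```shell
#!/bin/sh
# Hypothetical helper: apply the replacement rules to HTML read on stdin.
# Assumes GNU sed: the I flag makes each match case-insensitive,
# and \n in the replacement inserts a line feed.
html_row_to_csv() {
  sed -e 's/<tr>//gI' \
      -e 's/<td>//gI' \
      -e 's/<\/td>/;/gI' \
      -e 's/<\/tr>/\n/gI'
}

# Sample row from above; prints "VALUE;VALUE1;" followed by an empty line
printf '%s\n' '<tr><td>VALUE</td><td>VALUE1</td></tr>' | html_row_to_csv
```

The trailing ";" comes from the last "</td>" of the row, and the empty line appears because the row already ended with a line break, which is the case where blanking "</tr>" is the better choice.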
On Linux there is a very easy way with "sed": just copy the table to a new file, and it could go like this (the "I" flag, a GNU sed extension, makes each match case-insensitive):
sed -i 's/<tr>//gI' my.html
sed -i 's/<td>//gI' my.html
sed -i 's/<\/td>/;/gI' my.html
sed -i 's/<\/tr>/\n/gI' my.html
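Real tables usually also contain "<th>" header cells, tags with attributes such as "<td class=...>", and the surrounding "<table>" tags. A minimal sketch that handles those in one pass, assuming GNU sed, one "<tr>...</tr>" per line, no nested tables, and no ";" inside cell values; the function name `html_table_to_csv` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turn a file holding one copied HTML table into
# ";"-separated rows on stdout.
# Assumptions: GNU sed (I flag, \n in the replacement), each <tr>...</tr>
# on a single line, no nested tables, and no ";" inside cell values.
html_table_to_csv() {
  # 1st sed: the replacement rules, extended to <th> cells and to tags
  # with attributes such as <td class="x">; any other tag is dropped.
  # 2nd sed: remove the trailing ";" left by the last </td> of each row
  # and delete the empty lines produced by the </tr> line feed.
  sed -e 's/<tr[^>]*>//gI' \
      -e 's/<t[dh][^>]*>//gI' \
      -e 's/<\/t[dh]>/;/gI' \
      -e 's/<\/tr>/\n/gI' \
      -e 's/<[^>]*>//g' \
      "$1" |
    sed -e 's/;[[:space:]]*$//' -e '/^[[:space:]]*$/d'
}
```

Usage: `html_table_to_csv my.html > my.csv`. For the row `<tr><td class="x">apple</td><td>3</td></tr>` this gives `apple;3`. Writing to a separate file instead of using `-i` keeps the original HTML intact in case a substitution does not fit your data.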
You can also parse the HTML "manually" in PDI using the "Replace in string" step.