Pentaho

  • 1.  Pentaho DI Community Edition slow on purpose?

    Posted 04-01-2022 03:55

    Good morning everyone,

    I need to choose an ETL tool for daily tasks, so I wanted to try Pentaho with the Community Edition first to see whether the Enterprise Edition would be worth buying.
    So far the product really satisfies my needs. However, with a large volume of data the transformations are very slow, and I was wondering whether the Community Edition is slow on purpose to get you to buy the Enterprise one.

    Here is an example of a transformation I'm working on:

    The CSV contains users with information about them. The purpose is to put those users in a database by sending HTTP requests to a REST API linked to a MySQL database.
    It works well with around 1000 rows in my CSV. But as soon as I get above 10000 rows it gets very slow during the "REST client" step (around 20 rows are processed per second, then after 20 minutes it goes down to 5 rows per second).
    Processing the whole 10000 rows takes 40 minutes, which is far too long compared to other solutions I tried, which took less than 2 minutes.

    Any ideas on how to overcome this problem?
    Is PDI Community Edition slowed down on purpose?

    Thanks in advance, and sorry for any mistakes, as English isn't my native language.

    Cordially.



    ------------------------------
    Robert Daye
    Systems Engineer
    Esisar
    ------------------------------


  • 2.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-03-2022 19:42
    Hi Robert

    I've never noticed a difference in performance between CE and EE.

    Have you tried starting multiple copies of the HTTP step (right-click, 'Change number of copies to start')?

    Andrew Cave

    ------------------------------
    Andrew Cave
    Systems Engineer
    BizCubed Pty Ltd
    Australia
    ------------------------------



  • 3.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-05-2022 02:49
    Hi Andrew,

    Thanks for your answer. Indeed, increasing the number of copies of the HTTP step was a good way to make this transformation faster!

    Robert

    ------------------------------
    Robert Daye
    Systems Engineer
    Esisar
    ------------------------------



  • 4.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-04-2022 02:55
    Hi Robert,
    10000 is the default limit of the rowset size. Beyond that limit PDI starts caching.

    You could try increasing it to see what happens (with the transformation open in Spoon, click Edit -> Settings -> Miscellaneous tab -> "Nr of rows in rowset" option).

    In any case, I've also noticed that the JSON input and JSON output components are way slower than the Text Input / Output. Has anyone else noticed that?

    Antonio Petrella

    ------------------------------
    Antonio Petrella
    Data Service Manager
    UNOG
    ------------------------------



  • 5.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-04-2022 06:32
    Hi Robert
    As far as I know, there are no performance differences between Pentaho EE and Pentaho CE.
    The main differences between the two versions are that the CE version has no online help, some features or capabilities are disabled, and the Analyzer software is not included. There are surely other differences too, such as the update packages for bug fixes.
    Best Regards


    ------------------------------
    Carl Messner
    Data Analyst
    MCSF
    ------------------------------



  • 6.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-05-2022 03:17

    Thanks to your advice I was able to bring the transformation time down from 38 minutes to 2 minutes!

    Now I have another question: when I try to do the same with a 100000-row CSV, Pentaho just stops responding (between the two JSON blocks).

    I've tried increasing the Nr of rows in rowset but nothing changes.

    Any idea? Thanks in advance,

    Robert.



    ------------------------------
    Robert Daye
    Systems Engineer
    Esisar
    ------------------------------



  • 7.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-05-2022 05:59
    Hi Robert,
    Maybe the issue is in the REST client step: perhaps it is working too slowly, and that is what blocks the JSON steps.
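    A minimal Python sketch of that blocking effect, assuming the rowset between two steps behaves like a bounded buffer (an assumption about PDI internals, not something confirmed in this thread). `run_pipeline`, the row count and the delay are hypothetical, and the sleep only stands in for a slow REST call. When the downstream step is the bottleneck, a larger buffer lets the upstream step run further ahead but should not shorten the total run:

```python
import queue
import threading
import time

def run_pipeline(n_rows, buffer_size, consumer_delay_s):
    """Push n_rows through a bounded buffer into a slow consumer."""
    buf = queue.Queue(maxsize=buffer_size)  # stands in for the rowset (assumption)
    sentinel = object()                     # marks the end of the stream
    processed = []

    def producer():
        for i in range(n_rows):
            buf.put(i)                      # blocks while the buffer is full
        buf.put(sentinel)

    def consumer():
        while True:
            row = buf.get()
            if row is sentinel:
                break
            time.sleep(consumer_delay_s)    # simulated slow downstream step
            processed.append(row)

    start = time.perf_counter()
    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(processed), time.perf_counter() - start

# Same workload, two buffer sizes (all values are illustrative).
small = run_pipeline(200, buffer_size=10, consumer_delay_s=0.002)
large = run_pipeline(200, buffer_size=1000, consumer_delay_s=0.002)
print(f"buffer=10:   {small[0]} rows in {small[1]:.2f}s")
print(f"buffer=1000: {large[0]} rows in {large[1]:.2f}s")
```

    If both runs take about the same time, the slow step rather than the buffer size is the limit, which would be consistent with the rowset setting making no visible difference.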
    Best regards

    ------------------------------
    Carl Messner
    Data Analyst
    MCSF
    ------------------------------



  • 8.  RE: Pentaho DI Community Edition slow on purpose?

    Posted 04-06-2022 03:57
    Hi Robert,

    I suggest that in your case you try increasing the "Nb of copies to start" (right-click on the REST client step).
    Otherwise it is 1, which means all the requests are sequential: each one must wait for the previous one to complete before it starts.

    I usually set it to 20 copies, for instance (you have to experiment), so that you'll have 20 concurrent requests.
    Of course, there are situations where you won't want concurrent requests, but I don't think that applies in your case.
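
    To illustrate the difference between 1 copy and 20 copies, here is a rough Python sketch using a thread pool. It does not call PDI or a real API: `fake_rest_call` and its 10 ms latency are made up, and a thread pool is only an analogy for step copies.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_rest_call(row):
    """Stand-in for one HTTP request; sleeps to imitate server latency."""
    time.sleep(0.01)  # assumed 10 ms latency, purely illustrative
    return {"row": row, "status": 201}

def run_requests(rows, copies):
    """Send one simulated request per row using `copies` worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # pool.map returns results in input order; step copies in PDI
        # may not keep the original row order.
        results = list(pool.map(fake_rest_call, rows))
    return results, time.perf_counter() - start

rows = list(range(100))
_, sequential_s = run_requests(rows, copies=1)         # one request at a time
results, concurrent_s = run_requests(rows, copies=20)  # up to 20 in flight
print(f"1 copy: {sequential_s:.2f}s, 20 copies: {concurrent_s:.2f}s")
```

    The gain only holds while the server can keep up with the extra load, which is why the right number of copies has to be found by trial.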

    You may also want to track the server's response time (the REST client step can output it).
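
    As a sketch of what tracking the response time can reveal, the Python below times each call and compares the early and late averages. `timed_call` and `fake_send` are hypothetical helpers, and the growing delay is simulated, not measured from any real server.

```python
import statistics
import time

def timed_call(send, row):
    """Call send(row) and return (response, elapsed milliseconds)."""
    start = time.perf_counter()
    response = send(row)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms

def fake_send(row):
    """Simulated server whose latency grows as more rows have been sent."""
    time.sleep(0.01 * (1 + row // 10))  # 10 ms, then 20 ms, then 30 ms
    return 200

timings = [timed_call(fake_send, row)[1] for row in range(30)]
first_avg = statistics.mean(timings[:10])
last_avg = statistics.mean(timings[-10:])
print(f"first 10 calls: {first_avg:.1f} ms, last 10 calls: {last_avg:.1f} ms")
```

    A rising average over the run would point at the server (or the database behind it) slowing down rather than at the ETL tool itself.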

    Regards,
    Olivier

    ------------------------------
    Olivier Pessin
    Application Services Manager
    KDS
    ------------------------------