Pentaho


Pentaho performs very slowly when unzipping files on a mounted Samba share


    Posted 07-05-2022 14:03
    Hi,
    I am migrating tasks from Windows to Linux (Carte).

    I have run into a performance problem when unzipping ZIP files on a mounted Samba share. Pentaho is far slower than the same operation run from the console.

    This is the job log for a ZIP file that contains two files:
    2022/07/05 09:27:08 - S_Unzip_File - Starting job entry
    2022/07/05 09:27:08 - S_Unzip_File - Target folder [/mnt/driveN/Ficheros/504] exists
    2022/07/05 09:27:08 - S_Unzip_File - The Zip file [/mnt/driveN/FicherosFtp/EMPRESA/504] exists
    2022/07/05 09:27:08 - S_Unzip_File - Processing file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
    2022/07/05 09:27:08 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
    2022/07/05 09:27:08 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]. It will be extracted
    2022/07/05 09:27:08 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] to [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]
    2022/07/05 09:27:43 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
    2022/07/05 09:27:43 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]. It will be extracted
    2022/07/05 09:27:43 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] to [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]
    2022/07/05 09:28:18 - S_Unzip_File - File [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] was moved to [/mnt/driveN/Ficheros/504]
    2022/07/05 09:28:18 - S_Unzip_File - =======================================
    2022/07/05 09:28:18 - S_Unzip_File - Nr errors : 0
    2022/07/05 09:28:18 - S_Unzip_File - Nr unzipped files : 1
    2022/07/05 09:28:18 - S_Unzip_File - =======================================
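
To make the per-entry cost visible, here is a minimal Python sketch that computes the gap between consecutive log lines. The timestamps are copied from the log above; the message labels are shortened by hand, and the `gaps` helper is only an illustration, not part of Pentaho. Note that the second gap also covers moving the ZIP file, so it is an upper bound for that entry.

```python
from datetime import datetime

# Timestamps copied from the job log above; labels abbreviated by hand.
LOG = [
    ("2022/07/05 09:27:08", "Extracting entry CABECERA.json"),
    ("2022/07/05 09:27:43", "Extracting entry DETALLE.json"),
    ("2022/07/05 09:28:18", "Zip file moved to target folder"),
]


def gaps(entries):
    """Return (label, seconds until the next log line) for each line but the last."""
    fmt = "%Y/%m/%d %H:%M:%S"
    times = [datetime.strptime(ts, fmt) for ts, _ in entries]
    return [
        (entries[i][1], int((times[i + 1] - times[i]).total_seconds()))
        for i in range(len(entries) - 1)
    ]


for label, seconds in gaps(LOG):
    print(f"{seconds:>3d} s  {label}")
```

Both gaps come out at 35 seconds, even though the two files differ in size (317560 vs 124047 bytes), which suggests a fixed per-entry delay rather than a cost proportional to the data written.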

    If I run the same operation by hand inside the Carte container, the performance is much better:

    pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ unzip -l 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
    Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
       317560  2022-07-05 09:17   CABECERA.json
       124047  2022-07-05 09:17   DETALLE.json
    ---------                     -------
       441607                     2 files
    pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ time unzip 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip -d /mnt/driveN/Ficheros/504/juan/
    Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
      inflating: /mnt/driveN/Ficheros/504/juan/CABECERA.json
      inflating: /mnt/driveN/Ficheros/504/juan/DETALLE.json

    real 0m0.040s
    user 0m0.005s
    sys 0m0.003s

    From the console it takes 0.04 seconds, while Pentaho takes (09:28:18 - 09:27:08) = 70 seconds.
    So in this case Pentaho is roughly 1750 times slower (70 / 0.04).
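
The comparison can be reproduced with a short Python sketch. All inputs are taken from the log and the `time unzip` output above; the variable names are mine, and the throughput figure is only a rough estimate that treats the whole 70 seconds as extraction time.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Start and end of the Pentaho job entry, from the log.
pentaho_seconds = (
    datetime.strptime("09:28:18", FMT) - datetime.strptime("09:27:08", FMT)
).total_seconds()

# "real" time reported by `time unzip` in the console.
console_seconds = 0.040

# Total uncompressed size reported by `unzip -l`.
total_bytes = 441607

ratio = pentaho_seconds / console_seconds
throughput = total_bytes / pentaho_seconds  # bytes per second through Pentaho

print(f"Pentaho: {pentaho_seconds:.0f} s, console: {console_seconds:.3f} s")
print(f"Slowdown factor: about {ratio:.0f}x")
print(f"Effective Pentaho throughput: about {throughput / 1024:.1f} KiB/s")
```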

    When the ZIP contains more files, it is even worse.

    Any idea of what may be going on?

    Thanks a lot


    ------------------------------
    Juan Sierra Pons
    Systems Engineer
    ------------------------------