Hi,
I am migrating tasks from Windows to Linux (Carte).
I have run into a performance problem when unzipping ZIP files on a mounted Samba share. Pentaho is far slower than the same operation run from the console.
This is the log for a ZIP file that contains 2 files:
2022/07/05 09:27:08 - S_Unzip_File - Starting job entry
2022/07/05 09:27:08 - S_Unzip_File - Target folder [/mnt/driveN/Ficheros/504] exists
2022/07/05 09:27:08 - S_Unzip_File - The Zip file [/mnt/driveN/FicherosFtp/EMPRESA/504] exists
2022/07/05 09:27:08 - S_Unzip_File - Processing file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:08 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:08 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]. It will be extracted
2022/07/05 09:27:08 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] to [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]
2022/07/05 09:27:43 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:43 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]. It will be extracted
2022/07/05 09:27:43 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] to [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]
2022/07/05 09:28:18 - S_Unzip_File - File [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] was moved to [/mnt/driveN/Ficheros/504]
2022/07/05 09:28:18 - S_Unzip_File - =======================================
2022/07/05 09:28:18 - S_Unzip_File - Nr errors : 0
2022/07/05 09:28:18 - S_Unzip_File - Nr unzipped files : 1
2022/07/05 09:28:18 - S_Unzip_File - =======================================
If I run the same extraction from the console inside the Carte container, the performance is much better:
pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ unzip -l 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   317560  2022-07-05 09:17   CABECERA.json
   124047  2022-07-05 09:17   DETALLE.json
---------                     -------
   441607                     2 files
pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ time unzip 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip -d /mnt/driveN/Ficheros/504/juan/
Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
inflating: /mnt/driveN/Ficheros/504/juan/CABECERA.json
inflating: /mnt/driveN/Ficheros/504/juan/DETALLE.json
real 0m0.040s
user 0m0.005s
sys 0m0.003s
From the console it takes 0.04 seconds, while Pentaho takes 70 seconds (09:28:18 - 09:27:08).
So in this case Pentaho is roughly 1750 times slower.
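To make the arithmetic explicit, here is a minimal Python sketch that recomputes the step durations from the log timestamps above. The event labels are my own shorthand, and the log only has one-second resolution, so the figures are approximate.

```python
from datetime import datetime

# Timestamps copied from the Pentaho log above (one-second resolution).
# The labels are shorthand for this sketch, not text from the log.
FMT = "%Y/%m/%d %H:%M:%S"
events = [
    ("start extracting CABECERA.json", "2022/07/05 09:27:08"),
    ("start processing DETALLE.json", "2022/07/05 09:27:43"),
    ("zip moved, job entry finished", "2022/07/05 09:28:18"),
]


def step_durations(events):
    """Return the seconds elapsed between consecutive log events."""
    times = [datetime.strptime(ts, FMT) for _, ts in events]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


durations = step_durations(events)
total = sum(durations)
console_seconds = 0.040  # "real" time reported by `time unzip`

print("seconds per step:", durations)
print("total seconds:", total)
print("slowdown factor:", round(total / console_seconds))
```

Each of the two steps comes out at 35 seconds, even though CABECERA.json (317560 bytes) is more than twice the size of DETALLE.json (124047 bytes). The second step also includes moving the zip file. This might point to a fixed delay per file rather than slow throughput, but that is only a guess.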
When the zip contains more files, it is even worse.
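To check whether the cost grows per entry independently of Pentaho, each entry could be timed separately from a script. This is only a sketch: it assumes Python 3 is available in the container, and `time_entries` is a hypothetical helper name, not something Pentaho provides.

```python
import sys
import time
import zipfile


def time_entries(zip_path, target_dir):
    """Extract each entry on its own and return (name, seconds) pairs."""
    timings = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            start = time.perf_counter()
            archive.extract(info, path=target_dir)
            timings.append((info.filename, time.perf_counter() - start))
    return timings


# Usage (assumed): python3 time_entries.py <zip file> <target folder>
if __name__ == "__main__" and len(sys.argv) == 3:
    for name, seconds in time_entries(sys.argv[1], sys.argv[2]):
        print(f"{seconds:8.3f}s  {name}")
```

If every entry is fast here as well, the delay would seem to come from how Pentaho accesses the share rather than from the share itself.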
Any idea of what may be going on?
Thanks a lot
------------------------------
Juan Sierra Pons
Systems Engineer
------------------------------