Hi,
I have been able to reproduce it in Kettle, so it is not related to Carte or Docker.
I have also tested Samba performance and it looks fine: creating a 512 MB file on the Samba share takes only 4 seconds.
XXXXX@XXXXXX:/SERVER/driveN/Ficheros/juanTests$ time dd if=/dev/zero of=./test bs=512 count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 4.01275 s, 128 MB/s
real 0m4.018s
user 0m0.722s
sys 0m1.329s
My suspicion is that it is related to the VFS:
2022/07/05 09:27:08 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] to [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]
Best regards
------------------------------
Juan Sierra Pons
Systems Engineer
------------------------------
Original Message:
Sent: 07-05-2022 08:21
From: Juan Sierra Pons
Subject: Pentaho performs very slow when unzipping files in a mounted samba share
Hi,
I am migrating tasks from Windows to Linux (Carte).
I have run into a performance problem when unzipping ZIP files on a mounted Samba share.
Pentaho performs far worse than the same operation run from the console.
This is the log for a zip file that contains 2 files:
2022/07/05 09:27:08 - S_Unzip_File - Starting job entry
2022/07/05 09:27:08 - S_Unzip_File - Target folder [/mnt/driveN/Ficheros/504] exists
2022/07/05 09:27:08 - S_Unzip_File - The Zip file [/mnt/driveN/FicherosFtp/EMPRESA/504] exists
2022/07/05 09:27:08 - S_Unzip_File - Processing file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:08 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:08 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]. It will be extracted
2022/07/05 09:27:08 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/CABECERA.json] to [/mnt/driveN/Ficheros/504//CABECERA_20220705_092708151.json]
2022/07/05 09:27:43 - S_Unzip_File - Processing zipped entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] from file [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] ...
2022/07/05 09:27:43 - S_Unzip_File - We can find a file called [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]. It will be extracted
2022/07/05 09:27:43 - S_Unzip_File - Extracting entry [zip:file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip!/DETALLE.json] to [/mnt/driveN/Ficheros/504//DETALLE_092743526_092743526.json]
2022/07/05 09:28:18 - S_Unzip_File - File [file:///mnt/driveN/FicherosFtp/EMPRESA/504/3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip] was moved to [/mnt/driveN/Ficheros/504]
2022/07/05 09:28:18 - S_Unzip_File - =======================================
2022/07/05 09:28:18 - S_Unzip_File - Nr errors : 0
2022/07/05 09:28:18 - S_Unzip_File - Nr unzipped files : 1
2022/07/05 09:28:18 - S_Unzip_File - =======================================
If I do the same thing by hand inside the Carte container, the performance is much better:
pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ unzip -l 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   317560  2022-07-05 09:17   CABECERA.json
   124047  2022-07-05 09:17   DETALLE.json
---------                     -------
   441607                     2 files
pentaho@75e631ccf11c:/mnt/driveN/FicherosFtp/EMPRESA/504/juan$ time unzip 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip -d /mnt/driveN/Ficheros/504/juan/
Archive: 3f9353c9-cec5-4568-8b6e-d23708506713_2022057091756.zip
inflating: /mnt/driveN/Ficheros/504/juan/CABECERA.json
inflating: /mnt/driveN/Ficheros/504/juan/DETALLE.json
real 0m0.040s
user 0m0.005s
sys 0m0.003s
In the console it takes 0.04 seconds, while Pentaho takes (09:28:18 - 09:27:08) = 70 seconds.
So for this case Pentaho is about 1750 times slower.
For zip files that contain more files, it is even worse.
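As a rough cross-check of these numbers, the timestamps from the log above can be diffed with a short Python sketch. The timestamps and the 0.04 s console time are copied from the output in this message; the helper name seconds_between is made up for illustration and is not part of Pentaho.

```python
from datetime import datetime

# Timestamp layout used by the Kettle log lines quoted above.
FMT = "%Y/%m/%d %H:%M:%S"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the S_Unzip_File log.
job_start = "2022/07/05 09:27:08"     # job entry starts, CABECERA.json extraction begins
second_entry = "2022/07/05 09:27:43"  # processing of DETALLE.json begins
job_end = "2022/07/05 09:28:18"       # zip moved and summary printed

cabecera = seconds_between(job_start, second_entry)
detalle = seconds_between(second_entry, job_end)  # includes the final move of the zip
total = seconds_between(job_start, job_end)

console_seconds = 0.04  # "real 0m0.040s" from the manual unzip run
slowdown = total / console_seconds

print(f"CABECERA.json: {cabecera:.0f} s")
print(f"DETALLE.json:  {detalle:.0f} s")
print(f"total: {total:.0f} s, roughly {slowdown:.0f}x slower than the console")
```

Both entries take about 35 seconds even though their sizes differ (317560 vs 124047 bytes), so this may be a fixed delay per entry rather than a throughput limit. That is only a reading of the log timestamps, not something I have confirmed.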
Any idea of what may be going on?
Thanks a lot for your time
------------------------------
Juan Sierra Pons
Systems Engineer
------------------------------