Pentaho


 List Files in AWS S3 Bucket

  • Pentaho
  • Kettle
  • Pentaho Data Integration PDI
Rob Burgess posted 05-18-2018 18:14

What is the best way to list the files in an AWS S3 bucket using a Job or Transformation?


Steven Brown

Hi Rob,

I would use the AWS CLI to list the files, redirect the output to a text file, and then read that file with a Text file input step. Something like:

aws s3 ls s3://{bucket-name}/{path} >result.txt

Or, if you wanted to get the entire contents:

aws s3 ls s3://{bucket-name} --recursive >result.txt

And then you could use the regular expression /(.*)\/ to parse the path.

HTH,

Steven
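
For anyone who wants to check the listing outside of PDI first, here is a minimal sketch of the parsing step described above. It assumes result.txt has the usual `aws s3 ls` layout (date, time, size, key, with `PRE` lines for prefixes in the non-recursive listing). The names `S3Entry` and `parse_s3_ls` are made up for this example and are not part of the AWS CLI or Pentaho.

```python
from typing import Iterable, List, NamedTuple


class S3Entry(NamedTuple):
    """One object row from an `aws s3 ls` listing (hypothetical helper type)."""
    date: str
    time: str
    size: int
    key: str
    directory: str  # everything before the last "/" in the key, "" if none


def parse_s3_ls(lines: Iterable[str]) -> List[S3Entry]:
    """Parse lines of `aws s3 ls` output, skipping blank and PRE (prefix) rows."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split into at most 4 fields so keys containing spaces stay intact.
        parts = line.split(None, 3)
        if parts[0] == "PRE" or len(parts) < 4 or not parts[2].isdigit():
            continue
        date, time, size, key = parts
        # Same idea as the greedy (.*)/ regex: keep everything up to the last slash.
        directory = key.rsplit("/", 1)[0] if "/" in key else ""
        entries.append(S3Entry(date, time, int(size), key, directory))
    return entries


if __name__ == "__main__":
    # result.txt is the file produced by the aws s3 ls commands above.
    with open("result.txt", encoding="utf-8") as handle:
        for entry in parse_s3_ls(handle):
            print(entry.directory, entry.key, entry.size)
```

Splitting on whitespace with a field limit, rather than applying the regex to the whole line, keeps the date and size columns out of the captured path.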

Kiran Rajendran

Use the Get File Names step with a path in this format:

s3://<AWS Access Key>:<AWS Secret Access Key>@s3/MyBucket/path
