PDI itself does not have the kind of SQL capability you are asking about. If you read the files in, you can join the resulting streams, but that could mean holding a lot of data in memory. You probably want a different tool for this task. For example, you might look at Dremio or Google BigQuery. Both can connect to Parquet files and expose a SQL layer over them, including the cross-file joins you are describing.
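
To make the memory concern concrete, here is a minimal sketch in Python of a generic in-memory hash join over two row streams. It is only an illustration of the general technique, not how PDI implements its join steps; the `hash_join` function, the row layout, and the field names are all hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Inner-join two row streams (iterables of dicts) on `key`.

    The whole build side is indexed in memory before any output is
    produced, which is where the memory cost comes from.
    """
    index = defaultdict(list)
    for row in build_rows:
        index[row[key]].append(row)

    # The probe side is streamed one row at a time.
    for row in probe_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data standing in for rows read from two files.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 3, "name": "Bob"},
]
orders = [
    {"customer_id": 1, "total": 20},
    {"customer_id": 2, "total": 35},
    {"customer_id": 1, "total": 5},
]

joined = list(hash_join(customers, orders, "customer_id"))
for row in joined:
    print(row)
```

Only the probe side is streamed here; the build side has to fit in memory. With large Parquet files on both sides that can become a problem, which is the reason to push the join down to an engine that is designed for it.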