Kenice,
We use both separation of tasks into 2 transformations and the Block this step until steps finish methods (depending upon how things fit together most logically). That is, sometimes the job running a transformation is already complicated enough without adding another transformation that would do the update and it just makes sense to put in the "wait for other things to finish" (Block this step until steps finish) step before continuing with subsequent steps.
One thing I notice is missing from the above replies is a really critical item to keep in mind:
All steps in a single transformation proceed in parallel.
The only exception to this, from a practical point of view, is that some steps need input data to do anything; in which case, they wait for data. What this means in your example is that the step to read the data (ERP Input) starts by invoking the query against its Connection, and as soon as there is a batch of rows, they're passed to the output step (ODS Update) which operates on them as they come in. You can control the size of the batch used between steps in the Transformation settings and an Update Table step has its own batch size (that would typically control how often a COMMIT occurs or a batch size if using batch updates). But, if you watch the SQL operations feeding into the ODS DB from your PDI step, you'd expect to see a whole series of UPDATE statements happening as soon as there is any data to operate on. A step that does not have a hop leading to it will start as soon as the transformation starts (that is, before any data is available for the ODS Update step, in your example). But if you put a hop to it, then the question becomes, does it need any input data to run or not? If not, it'll start right along with the first step in the hop-chain.
So, if you need steps to occur in a particular order, there are two things that will not do what you have in mind:
- Simply put the steps into the transformation with hops indicating the order.That doesn't work because the steps start running in parallel (and just wait for data to work on--if they need input data);
- Put the step in without a hop.That means that step starts as soon as the others do (with no waiting for data to come through from a prior step).
The options listed above will work:
- Put the step you want to run after the others into its own transformation and then create a job that has the two transformations in it (the hops in a job do indicate execution order: the first transformation finishes before the next one starts).
- Add a Block this step until steps finish step before the step you want to delay in your transformationAnd configure it to wait for the appropriate step to finish (in this case ODS Output).
There's also other ways to accomplish things that may be simpler and/or more efficient, depending upon what your needs are. For example, suppose you need to set a value in the ODS DB row based on a foreign key to another table in the ODS DB (that is, you cannot read the value in question from the ERP DB itself, but you can read its foreign key). In that case, rather than doing the update as a separate step after the ODS Update step, you might add the lookup step before the ODS Output step so that you have three steps joined by hops in your transformation:
- ERP Input
- LookupUses the ODS DB connection to look up one or more values in another ODS table using a key from the ERP Input query
- ODS Update
We use this pattern frequently in our PDI transformations and it's a very useful one to have in your personal toolkit. This is particularly useful when the ODS Update touches a tiny subset of the rows in the target table and setting the extra value requires examining each row in the target table if it's done at the end with an UPDATE statement that joins in one or more tables. Also, if the lookup table is of a modest size or the input data has many rows with the same foreign key, you can configure the Lookup step to cache the values so it doesn't hit the DB for each entry.
Another pattern we use occasionally is to have a prior step feed one or more values into an Execute SQL Script which then puts those values into the SQL script by Variable substitution: The SQL script has ? placeholders and the step's configuration lists the values from the prior step's output to feed into the SQL's placeholders as Parameters.
Perhaps one or the other of these latter patterns would be more useful in your particular case. If not, as you get deeper into PDI work, you're likely to run into cases that are more effectively dealt with using one of these latter patterns.
John