Lookup, Join, Merge – When to use

January 20, 2012

DataStage has three stages that can be used to combine data sets: Lookup, Join and Merge. Each one has its benefits and drawbacks. Below is a synopsis of when to use each stage.

Lookup

Does not need to be partitioned or sorted
Only one primary input but can have many reference tables
Reference data is held in memory, which makes it faster
Can have one reject link
Use lookups with small amounts of data (fewer than about 5 million rows)
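
To make the Lookup behaviour concrete, here is a minimal Python sketch of the idea, not DataStage code. It assumes the reference table fits in memory as a dictionary; the names lookup, orders and customers are made up for illustration.

```python
def lookup(primary_rows, reference_rows, key):
    """Enrich each primary row from a reference table held in memory."""
    # Load the whole reference table into a dict once (the in-memory cache).
    reference = {row[key]: row for row in reference_rows}
    matched, rejected = [], []
    for row in primary_rows:  # the primary input is read in any order, unsorted
        ref = reference.get(row[key])
        if ref is None:
            rejected.append(row)  # the single reject link
        else:
            matched.append({**ref, **row})
    return matched, rejected


# Hypothetical sample data
orders = [
    {"cust_id": 3, "amount": 40},
    {"cust_id": 1, "amount": 15},
    {"cust_id": 9, "amount": 7},
]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 3, "name": "Bob"}]

matched, rejected = lookup(orders, customers, "cust_id")
```

The dictionary is what makes this fast, and it is also why the approach stops paying off once the reference data no longer fits comfortably in memory.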

Merge

Inputs must be hash-partitioned and sorted on the merge keys
Only one master input but can have many update inputs
Can have multiple reject links (one per update input)
Use merges with large amounts of data
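
The master/update structure can be sketched in Python as follows. This only illustrates the shape of the stage (one master, several updates, one reject list per update) and is not how DataStage implements it: it uses a dictionary for brevity instead of streaming over sorted inputs. It also assumes that unmatched master rows are kept and that rejects are update rows with no matching master. The name merge_stage and the sample data are hypothetical.

```python
def merge_stage(master, updates, key):
    """Apply several update inputs to one master input.

    Returns the merged master rows and one reject list per update input.
    """
    merged = [dict(row) for row in sorted(master, key=lambda r: r[key])]
    by_key = {row[key]: row for row in merged}
    rejects = []
    for update in updates:
        rejected = []
        for row in sorted(update, key=lambda r: r[key]):
            target = by_key.get(row[key])
            if target is None:
                rejected.append(row)  # update row with no matching master
            else:
                target.update(row)  # add the update's columns to the master row
        rejects.append(rejected)
    return merged, rejects


# Hypothetical sample data: one master and two update inputs
master = [{"id": 2, "name": "Bob"}, {"id": 1, "name": "Ann"}]
updates = [
    [{"id": 2, "city": "Oslo"}, {"id": 5, "city": "Rome"}],
    [{"id": 1, "tier": "gold"}],
]

merged, rejects = merge_stage(master, updates, "id")
```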

Join

Inputs must be hash-partitioned and sorted on the join keys
Must have at least two inputs
Cannot have reject links
Use joins with large amounts of data
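
The reason a Join wants sorted input is that it can then walk both inputs in step without holding either one fully in memory. Below is a rough Python sketch of an inner sort-merge join under that assumption. It sorts inside the function only so the example is self-contained; the names sort_merge_join, left and right are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_join(left, right, key):
    """Inner join of two inputs by walking both in key order."""
    get_key = itemgetter(key)
    left_groups = groupby(sorted(left, key=get_key), key=get_key)
    right_groups = groupby(sorted(right, key=get_key), key=get_key)
    done = (None, None)
    out = []
    lk, lrows = next(left_groups, done)
    rk, rrows = next(right_groups, done)
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(left_groups, done)  # no match: row is dropped
        elif lk > rk:
            rk, rrows = next(right_groups, done)  # no match: row is dropped
        else:
            right_rows = list(rrows)
            for l_row in lrows:
                for r_row in right_rows:
                    out.append({**l_row, **r_row})
            lk, lrows = next(left_groups, done)
            rk, rrows = next(right_groups, done)
    return out


# Hypothetical sample data
left = [{"id": 2, "a": "x"}, {"id": 1, "a": "y"}, {"id": 4, "a": "z"}]
right = [
    {"id": 1, "b": 10},
    {"id": 2, "b": 20},
    {"id": 2, "b": 21},
    {"id": 3, "b": 30},
]

joined = sort_merge_join(left, right, "id")
```

Note that unmatched rows simply disappear here. There is nowhere for them to go, which mirrors the lack of reject links on a Join.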

As you can see, each stage has its advantages. Use Lookups when you are dealing with small amounts of data. The reference data is cached in memory, which makes the Lookup faster. However, as the amount of data you are combining grows, you will see a decline in performance. When this happens, switch out the Lookup for a Join or Merge. Join and Merge do pretty much the same thing, except that a Merge can have reject links. So if you do not need a reject link, go with a Join.
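
The rule of thumb in this post can be written down as a small helper. This is only a sketch of the reasoning: the 5 million cutoff is the rough figure mentioned earlier, not a hard limit, and the names choose_stage and LOOKUP_ROW_LIMIT are made up for illustration.

```python
# Rough cutoff taken from the 5 million figure in this post (an assumption, tune it).
LOOKUP_ROW_LIMIT = 5_000_000


def choose_stage(reference_rows, need_reject_link):
    """Suggest a stage from the data volume and the need for reject links."""
    if reference_rows < LOOKUP_ROW_LIMIT:
        return "Lookup"  # small enough to cache in memory
    return "Merge" if need_reject_link else "Join"


small = choose_stage(100_000, need_reject_link=False)
large_with_rejects = choose_stage(50_000_000, need_reject_link=True)
large_plain = choose_stage(50_000_000, need_reject_link=False)
```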

Knowing when to use each stage will help your job performance. When in doubt, just run the same job twice, once with a Lookup and once with a Join or Merge, and see which one performs better.
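
The "run it both ways" test can be sketched as a tiny timing harness. This is plain Python with stand-in functions, not something that launches DataStage jobs; in practice each callable would start your real job. The names time_variants, lookup_variant and join_variant are hypothetical.

```python
import time


def time_variants(variants, repeats=3):
    """Run each callable a few times and keep its best wall-clock time."""
    timings = {}
    for name, run in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Stand-in data and variants, for illustration only
reference = list(range(20_000))
probes = list(range(0, 20_000, 7))


def lookup_variant():
    table = set(reference)  # in-memory cache
    return [p for p in probes if p in table]


def join_variant():
    left, right = sorted(probes), sorted(reference)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # walk both sorted inputs in step
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out


timings = time_variants({"lookup": lookup_variant, "join": join_variant})
fastest = min(timings, key=timings.get)
```

Run the comparison on data volumes close to production, since the faster variant on a small sample may not be the faster one at full size.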
