Saturday, June 13, 2015

What is "High Water Mark" in ETLing?

When data is extracted from one database to another on a continuous basis, two techniques are common: delete the destination and reload it in full, or load only the changes as an incremental load. The first technique is straightforward, but the second requires changes to be identified. This is commonly done with a datetime column in the source that holds either the created date or the last modified date.

What is the link between this and "High Water Mark"? A high water mark indicates the highest level that water reached during a tide or flood. The same idea applies to our scenario: the highest value of the source datetime column that has been loaded so far is treated as the High Water Mark, and the next extraction picks up only the rows with a later value.
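To make the idea concrete, here is a minimal sketch of an incremental load driven by a high water mark. It uses Python with in-memory SQLite databases purely for illustration. The `orders` table, its `modified_at` column and the `incremental_load` function are hypothetical names, not part of any particular ETL tool, and a real package would keep the mark in a control table rather than in a variable.

```python
import sqlite3

def incremental_load(src, dst, last_mark):
    """Copy rows modified after last_mark and return the new high water mark."""
    rows = src.execute(
        "SELECT id, name, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_mark,),
    ).fetchall()
    # INSERT OR REPLACE covers both new rows and updates to existing rows.
    dst.executemany(
        "INSERT OR REPLACE INTO orders (id, name, modified_at) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    # The highest modified_at value just loaded becomes the new mark;
    # if nothing changed, the old mark is kept.
    return max((r[2] for r in rows), default=last_mark)

# Hypothetical source and destination with the same table layout.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
    )

src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "first", "2015-06-01 10:00:00"),
        (2, "second", "2015-06-02 09:30:00"),
    ],
)
src.commit()

# First run: the starting mark is earlier than any row, so everything is loaded.
mark = incremental_load(src, dst, "1900-01-01 00:00:00")

# The source changes: one new row and one update to an existing row.
src.execute("INSERT INTO orders VALUES (3, 'third', '2015-06-03 08:00:00')")
src.execute(
    "UPDATE orders SET name = 'first (edited)', "
    "modified_at = '2015-06-03 12:00:00' WHERE id = 1"
)
src.commit()

# Second run: only rows later than the stored mark are extracted.
mark = incremental_load(src, dst, mark)
```

The timestamps are stored as ISO-style text, so they sort in date order when compared as strings. Note that this approach only sees inserts and updates; rows deleted from the source leave no datetime value behind and would need separate handling.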

