13 April 2011

Archiving BAM Data

BAM is intended to provide information about the status of active business processes; it’s not recommended to keep historical data in the BAMPrimaryImport database. Instead, old data should be moved out of BAMPrimaryImport and into the BAMArchive database. Each time the bm.exe is use to deploy a new BAM activity, a SQL Server Integration Services (SSIS) package called BM_DM_<Activity> is created.

The SSIS package have two functions:
  • It partitions the data in BAMPrimaryImport.
  • It moves old data from BAMPrimaryImport into the BAMArchive database.
Data is partitioned by creating new tables in BAMPrimaryImport to contain activity data. When the archive package runs, it partitions the activity tables by creating additional tables called bam_<activity>_<GUID> and bam_<activity>_<GUID>_ Relationships. Partitioning data into separate tables is a performance improvement, especially for real-time aggregations. It also simplifies the archival process. When the package runs, it looks for partitions that are available to archive, and processes the entire partition.

Caution If you write code against the BAM Primary Import database, it’s essential to use the views, not the underlying tables. If you write code directly against the underlying tables, BAM data may disappear when the job runs and partitions the table.

A partition is available to archive if all scheduled views that depend on the partition have been processed, and the partition is older than the online window for the activity. By default, the online window for an activity is six months. The online window can be changed by modifying the values of the OnlineWindowTimeUnit and OnlineWindowTimeLength columns in the bam_Metadata_Activities table in BAMPrimaryImport.

It’s important to bear in mind that the SSIS package only archives partitions older than the online window. As an example, imagine an activity where the archive package has never been run. There is a year’s worth of data in the BAMPrimaryImport database, but there are no partitions. The online window is the default: six months. Now consider what happens if you schedule the archive package to run each day. The first day the package runs and creates a partition table in BAMPrimaryImport. This partition table contains a year’s worth of data. However, there are no partitions older than six months, so no data is removed from BAMPrimaryImport. The next day, the package runs again, and a second partition table is created. This partition will contain one day’s worth of data. The partition table created the previous day is only 24 hours old, and so it will be left in BAMPrimaryImport. If the online window remains six months, it will be a full 6 months after the job is scheduled before any data is removed from BAMPrimaryImport, but then a full year of data will be removed in one day.

There are two design decisions to make when setting up archiving: the length of the online window and how often the archive job runs. The length of the archive window is driven by business needs for historical data. The archive job should be run on a regular basis. Running this job daily or weekly is a good choice.

No comments:

Post a Comment