Anatomy of an IMAGE Database, Part 2

Get into the details (and the masters) of IMAGE
to better understand your HP 3000's database


Second of two parts

By Patrick Mullen

Detail Dataset Capacity

As we discovered in Part 1, IMAGE entries are stored in IMAGE blocks, which are contained in the MPE records of privileged files.

Our example dataset DB10’s capacity can be determined via the LISTF,2 command. Since this is a detail dataset, the capacity is equal to the (eof * blocking factor). The eof is the number of blocks in the dataset, and the blocking factor is the number of IMAGE entries per block. In this case: (65 * 11) = 715. The capacity of the detail dataset, INV-DTL aka DB10, is 715. In regular (non-DDX) detail datasets, the eof and limit will be equal.

IMAGE has a feature for detail datasets called DDX (dynamic detail expansion). In a set configured for DDX you will notice that the eof does not have the same value as the limit. The limit will be higher than the eof, due to the configurable Max Cap parameter. For a complete explanation of DDX please refer to Fred White’s article, available at www.adager.com/TechnicalPapers.html.

For the purposes of this paper, eof * blocking factor = capacity, and limit * blocking factor = maximum capacity.
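To make the arithmetic concrete, here is a minimal sketch in Python (the helper name is my own, not anything shipped with IMAGE) that computes both figures from the LISTF,2 numbers:

# A minimal sketch of the detail-dataset capacity arithmetic,
# using the LISTF,2 figures for the example set DB10.

def detail_capacities(eof_blocks, limit_blocks, blocking_factor):
    """Return (capacity, max capacity) for a detail dataset."""
    capacity = eof_blocks * blocking_factor        # entries the set can hold today
    max_capacity = limit_blocks * blocking_factor  # ceiling for a DDX-enabled set
    return capacity, max_capacity

# DB10: 65 blocks at 11 entries per block; in a non-DDX set eof == limit.
print(detail_capacities(65, 65, 11))   # -> (715, 715)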

Master datasets are a little different, as we’ll see below.

The dataset’s gross dimensional information (block length, blocking factor, capacity, max capacity, increment, etc.) is all contained in the root file.


Detail User Label

Now that we know the physical layout of the detail dataset and its relation to the root file, one last piece of identifying information is needed to complete our anatomy lesson on detail datasets. As I mentioned above, the capacity is contained in the root file, so we know how many entries a detail dataset can contain.

We’ll presume that the database is healthy and passes Adager’s consistency check; therefore, the capacity contained in the root file is correct with regards to the number of physical records the detail dataset can hold. But how do we know if there are any records in the dataset?

IMAGE uses one user label, a 128-word MPE data structure associated with every dataset in the database. The LISTF,-3 listing has a heading called NUM LABELS, and for any IMAGE dataset that number will always be one.

IMAGE uses the user label to store three pieces of information that aid in determining how many records are in the dataset and where new entries should be added: The highwater mark, the number of free entries, and the location of the first free entry.

The highwater mark records and retains the highest record number that has ever been entered in the dataset. In cases where records have never been deleted or an Adager “repack” has just been performed, the highwater mark equals the number of entries.

If the highwater mark equals the number of entries, then the address of the first free entry will be zero, and the number of free entries will equal the difference between the capacity and the highwater mark (both the first free entry and the free-entry count are kept in the label).

In such a scenario, the next entry added by IMAGE will be added above the highwater mark, the highwater mark will be incremented by one, and the number of free entries will be decremented by one.

If in this scenario a delete occurs next, the highwater mark will remain the same, the number of free entries will be incremented by one, and the first free entry will contain the address of the deleted entry.

This is the method by which IMAGE recycles the free space below the highwater mark. If a second record is deleted, its record number will be entered in the user label as the new first free entry, and its backward pointer will contain the address of the old first free entry, forming what’s called the delete chain. In this manner, IMAGE only needs to keep track of the latest deleted entry in the User Label. The next addition of an entry in this set will be added at the record number of the first free entry. If this entry has a backward pointer, then the value of the backward pointer becomes the new first free entry.
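As an illustration only (the names are invented, not IMAGE source code), the following Python sketch mimics how the user label's three values behave across adds and deletes; the hwm_put flag anticipates the highwater mark PUT option described below:

# An illustration of the detail user label: highwater mark, free-entry
# count, and first free entry, plus the delete chain that links freed slots.

class DetailLabel:
    def __init__(self, capacity, hwm_put=False):
        self.capacity = capacity
        self.highwater = 0         # highest record number ever used
        self.free_entries = capacity
        self.first_free = 0        # 0 means there is no delete chain
        self.chain = {}            # record number -> backward pointer (delete chain)
        self.hwm_put = hwm_put

    def add(self):
        if self.first_free and not self.hwm_put:
            rec = self.first_free              # recycle space below the highwater mark
            self.first_free = self.chain.pop(rec)
        else:
            self.highwater += 1                # place the new entry above the highwater mark
            rec = self.highwater
        self.free_entries -= 1
        return rec

    def delete(self, rec):
        self.chain[rec] = self.first_free      # backward pointer to the old first free entry
        self.first_free = rec                  # deleted record becomes the first free entry
        self.free_entries += 1

label = DetailLabel(capacity=715)
a, b, c = label.add(), label.add(), label.add()   # records 1, 2, 3
label.delete(b)                                   # first free entry is now record 2
print(label.add())                                # -> 2: space below the HWM is recycled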

Recycling space this way is extremely efficient in terms of space management. However, the resulting order of entries in the detail becomes random, and scattering detail records that share a search field value across the dataset compromises retrieval performance. Robelle's HowMessy identifies the poor performance, and Adager's Repack can restore it. See my paper on the subject (in PDF format) for more on this topic.

The IMAGE lab has provided a means to avoid the randomized placement of detail records. In a database that is enabled for highwatermark put, IMAGE adds new records above the highwater mark, even in the presence of a delete chain.

Highwater mark PUT is enabled with DBUTIL at the database level rather than at the dataset level. The ability to configure highwater mark PUT at the set level would be a nice enhancement to IMAGE.


Master Datasets

Master and detail datasets are structurally similar. Both exist as MPE files with a file code of 401, made up of MPE records that contain IMAGE blocks. There are two types of masters: manuals and autos. Autos are confined to a single field, the search field, while manuals can contain the search field plus as many as 254 additional fields.

Both masters and details use bit maps to denote the existence of a particular record in a block, and both carry one 128-word user label per file. Each block in a master begins with the bit map, and each record in a master block begins with the synonym chain.

The synonym chain is unique to master datasets. Since the location of entries in master datasets is based upon the value of the search field, it is possible that two or more completely different search field values can be located at a particular master address. The synonym chain provides a link for any entry which must be relocated, since two entries cannot reside at the same address.

The relocation process serially reads the master for the next free entry and puts the new entry there. The relocated entry is called a secondary, and the entry that forced the secondary is called a primary. Synonym chain pointers in the primary contain the number of entries that locate to that address (count) and the location of the first and last secondary (backward and forward pointers).

In the secondary, the count is zero, and the backward and forward pointers point to any other members of the synonym chain, very much like detail record pointers. The synonym chain is made up of five words, one word for the count and two words each for the backward and forward pointers.

In master datasets with paths to detail datasets, each record contains a chainhead for each path. The chainhead for path one is the next data structure after the synonym chain data in each master record; paths two, three, and so on follow accordingly. The chainheads (six words per path) contain a two-word count, a two-word backward pointer, and a two-word forward pointer.

After the bit map, synonym chain, and path information comes the entry itself: the data portion of the record, whose size is the entry length.
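As a quick illustration (the helper name is my own, not HP's), this Python sketch adds up the per-record overhead just described, using the figures from the automatic master DB09 shown below:

# Per-record ("media record") size of a master: synonym chain plus one
# chainhead per path plus the entry data.

def master_media_record_words(entry_length_words, path_count):
    synonym_chain = 5               # 1-word count + 2-word backward + 2-word forward pointer
    chainheads = 6 * path_count     # per path: 2-word count + 2-word backward + 2-word forward
    return synonym_chain + chainheads + entry_length_words

# DB09, discussed below: 2-word entry, 3 paths.
print(master_media_record_words(2, 3))   # -> 25 words per media record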


Master Capacity

The capacity of a master will not always equal the eof * blocking factor as in a detail dataset. The capacity will always be somewhere in the last block of the dataset, but not necessarily at the end of the last block.

If you have ever changed the capacity of a master dataset, you have probably had to choose a prime number as the capacity. The record number equal to that prime capacity will fall somewhere between the first and last record of the last block. The record numbers between the capacity and the last record will be flagged in the bit map as “on” (in use) to prevent records from ever being written there.
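A minimal sketch, assuming the block arithmetic described above, of which record numbers in the last block end up permanently flagged (the function name is my own):

import math

def last_block_padding(capacity, blocking_factor):
    blocks = math.ceil(capacity / blocking_factor)      # number of IMAGE blocks (the eof)
    last_slot = blocks * blocking_factor                # last record number in the last block
    padded = range(capacity + 1, last_slot + 1)         # slots flagged "on" but never written
    return blocks, padded

# DB09, discussed below: capacity 1500, blocking factor 45.
blocks, padded = last_block_padding(1500, 45)
print(blocks, min(padded), max(padded))   # -> 34 1501 1530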

The very latest enhancement to master datasets in IMAGE is the ability to expand their capacities dynamically. HP calls this MDX, and it first became available in the PowerPatch 5 release for MPE/iX 5.5. In a future article the Adager lab will outline this new and powerful feature.

Master User Label

The highwater mark in the user label of a master dataset will equal the capacity. The number of free entries will be the number of available entries in the dataset, and the first free entry is not used in masters: because IMAGE places master entries according to the value of the search field, a first free entry can never be known.

A master example

Figure 1 contains the report set output for DB09, an automatic master dataset that has three paths:


Database DB.PUB.TEST THU, AUG 20, 1998, 7:12 AM


Set 9: INV-A, Automatic
83.80 % full: 1257 entries 243 available

Capacity: 1500
Block length (words): 1128
Media-record length (words): 25
Entry length (words): 2
Blocking factor: 45
Fields per entry: 1
Number of paths: 3
Search-field hash type: 0
Search field is field number 1

   Field   1st.   Total
   number  word   words

       1      1       2   INVOICE#      I2   (search)

Entry length: 2


The entry length is two words, and the media entry length is 25 words. The media entry length is made up of the synonym chain (five words) and three paths (six words each, 18 words), for a total of 23 words, plus the two-word entry length: 25 words in all.

The blocking factor is 45, which means that three bit map words are necessary per block (each bit map word covers 16 entries, so 45 entries require three words). The block length is 1128 words:


block length = bit map words + (blocking factor * media entry length)
        1128 = 3 + (45 * 25)


Since this block must reside in an MPE record that is a multiple of 128 words, 1152 words is the next available MPE record size large enough to accommodate this IMAGE block. Figure 2 displays the listf,2:

:listf DB09,2
ACCOUNT=  TEST        GROUP=  PUB

FILENAME  CODE  ------------LOGICAL RECORD-----------  ----SPACE----
                  SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

DB09      PRIV   1152W  FB          34         34   1      320  2  *
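To tie the Report Set figures to the LISTF,2 output, here is a small Python sketch (the names are my own) of the block-geometry arithmetic:

# Bit map words per block, IMAGE block length, and the MPE record size
# needed to hold the block (the next multiple of 128 words).

import math

def master_block_geometry(blocking_factor, media_record_words):
    bitmap_words = math.ceil(blocking_factor / 16)           # one bit per entry, 16 bits per word
    block_words = bitmap_words + blocking_factor * media_record_words
    mpe_record_words = math.ceil(block_words / 128) * 128    # MPE records come in 128-word multiples
    return bitmap_words, block_words, mpe_record_words

# DB09: blocking factor 45, 25-word media record.
print(master_block_geometry(45, 25))   # -> (3, 1128, 1152), matching the 1152W in LISTF,2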


IMAGE Enhancement

In writing this article, I am reminded of two separate instances in which a user lost the database’s root file. In the first instance, there were thirty datasets in the database, which contained a fair number of ASCII fields. After about 80 hours of focused diagnostics and repair, we were able to recover the database completely.

In the second instance, the database contained over one hundred datasets, and the data comprised mostly non-ASCII fields. After two hours of analysis, we determined that this database was unrecoverable.

Since then I have spoken to a different user who generates a schema of his database weekly. The schema is backed up on tape along with the database every night. Perhaps his example will prompt other users to incorporate such a policy and protect themselves against the catastrophe of losing a root file.

Creating a root file from nothing is a difficult matter. One of my concerns is that the root file is the sole container for the blueprint of the database. Could HP incorporate additional user labels per dataset in order to keep “self-describing” (SD) data about the layout of each dataset? The self-describing data could be crucial in the unfortunate event of a missing root file.

A recovery method via Adager, for example, could rebuild the root based upon the self-describing data contained in the additional user labels for each set.

My discussions with HP programmers have yielded positive feedback, and we have discovered additional benefits IMAGE would gain by keeping self-describing data in each dataset. The idea might even bolster the incredible resilience that is IMAGE.


Patrick Mullen is a member of the Adager support and development team.


Copyright 1998, The 3000 NewsWire. All rights reserved.