Last of a series
In this series on HP 3000 performance we've examined the essential components of making your HP 3000 run more efficiently. We've talked about getting management support for your efforts to keep your systems performing at their best. We also looked at the relationship between your CPU measurements and overall performance. Last month we examined the HP 3000's memory and disk components. This month we wrap up with a look at database performance.
Several programs for the HP 3000 provide reporting information that can be used to analyze database inefficiencies. HOWMESSY from Robelle (800.561.8311), DBLOADNG from the Interex contributed library and Bradmark's (800.275.2723) DBGENERAL full analysis option all have the ability to give information about database and dataset inefficiencies.
In all three of these solutions you can identify an item called ELONGATION. This is used only in the case of detail datasets. Details can be identified under the type column with either a "D" or a "Det".
Since a detail set can have more then one search field, it is important to identify the key search field. This is the data item that the set is ordered by, and it should be the set that most online activity uses to retrieve data. Therefore, the key item is the one search field we need to check the elongation factor for. This item can be identified out of the many that may be listed for a set by looking for the one with an "!" in front of it. Look for elongation factors greater than 1.5 as ones of initial concern and those above 3.0 as those with greater concern.
The method used to determine the most efficient size for a master set uses two items from any of the reports. This applies only to datasets marked with either a "Man" or an "Ato" on the Howmessy report. One column is the percentage of secondaries. This value tells us what percentage of the total entries in the master set are secondaries. Secondaries result when a record has already been "hashed" to the location calculated for this entry.
The other item of importance is the inefficient pointers percentage. This column tells us what percentage of the secondaries cross into another block of data within the set. The reason we don't rely on only one of these indicators is that a secondary is only bad for performance when it falls into another block and causes another I/O to occur to retrieve the data. The factor we use is secondaries times inefficient pointers divided by 100. When this number is greater than 8.0 we begin to be concerned, and when it is above 13.0 we have great concern.
The disk I/O queue length is important in measuring your efficiency in accessing I/O in the disk area. The disk I/O queue length tells how many disk I/O requests are awaiting service in the disk environment. Obviously you want to make the wait for any I/O, represented by the number of requests awaiting service, as short as possible. But if your disk files are fragmented, the datasets in use are not as efficiently organized, which means several disks are performing more I/O than others. In this environment each request will take more time to service.
A disk I/O queue length of greater than zero will result in pause for disk -- and when looked at from a process level, results in a slower completion of the processes work. You want to take a view of the disk drives which helps identify when certain drives are performing more I/O than others.
Sometimes certain of LDEV's perform more I/O than others. This can be influenced by the way you set up your volume sets. You should be sure that a large number of your disk drives are not taken out of the available servers. In an environment where volume sets are not in use and all of the disk drives are of equal size and speed, you should seek relative equity in I/O. The basic method, when seeking this balance, is to identify those files responsible for the majority of the I/O and try to set them out on the different disks. The file system tracks opens and closes to files, so this can be used to identify files responsible for I/O. You will find that a small number of files account for the majority of your I/O. A system level logging event can be turned on to track file closes.
The Lund Performance Solutions (541.926.3800) programs FILERPT or REDWOOD can be used to create a report once the two needed events, FILE_CLOSE and CM_FILE_CLOSE, are turned on. Once the log events are stored in the system logfiles the report can be created. These events are usually not turned on because they tend to cause the proliferation of system logfiles. A backdoor method of turning these on exists in the form a program called SYSLOG.
There are some issues to watch for when adding new disk drives. New disks need to have data moved to them, some of these files should be responsible for I/O, and files can be moved to specific disks in a number of ways (such as Vesoft's (310.282.0420) MPEX, a specific restore, De-Frag/X or DBUTIL).
A bottleneck is a restriction in resources that occurs when that resource is not large enough to handle the load. Systems usually encounter more than one bottleneck, and that one bottleneck may hide another. Bottlenecks cause a restriction in the resource, which will cause user requests to queue or wait for the resource causing the users demands to be unmet.
Experience is your best guide in looking for bottlenecks. To help you rely upon the experience of others we're offering our Pulse Points chart.
It identifies thresholds for green, yellow and red levels of resource usage. When yellow is registered, you should watch the resource and begin a plan for eventually becoming a red zone problem. Red points call for immediate action. There are several possible responses to a bottleneck: increase the resource, reduce the demand, or manage the resource differently.
HP 3000 PULSE POINTS
|CPU Queue Length||<5||5-15||>15|
|High Priority (%)(Total Busy-DQ and EQ)||<50||50-85||>85||High priority processing only.|
|Low priority batch excluded (it depends).|
|Total Busy (%)||<50||50-85||>85||All processing combined.|
|Process Preemption (%)||<15||15-30||>30||On most active processes.|
|AQ+BQ (%)||<5||5-8||>8||Operating system versus dependent.|
|CPU (%) CM||<10||10-50||>50||Subjective.|
|ICS + Dispatcher (%)||<10||10-15||>15|
|CPU Memory (%)||<4||4-10||>10||Very reliable indicator.|
|Page Fault Rate/Second||<10||10-25||>25||CPU dependent; see table below|
|Library Fault (%) (as % of Page Faults)||<10%||Libraries are XL, NL and SL files.|
|CPU Pause for User I/O (%)||<5||5-15||>15||Reflects data locality; can reflect I/O efficiency.|
|Read Hit (%)||>95||85-95||<85||Measurement of locality.|
|Disk Queue Length||<.5||.5-1.0||>1.0||Overall average.|
|Disk I/O Rate/Second||<10||10-25||>25||Per drive.|
TurboIMAGE: (DBLoading/Howmessy Tools)
|Detail Elongation||<1.5||.5-3.0||>3.0||Most important overall indicator|
|of database performance.|
|Master Secondaries||<8||8-15||>15||Depends on inefficient pointers.|
|Secondaries Inefficient Pointers||<20||20-30||>30|
|Compatibility Mode/Native Mode Switches||Ranges vary by processor model:|
|>150-922 and >1000-980 (Red zone).|
|Native Mode/Compatibility Mode Switches||>75||Ranges vary by processor model.|
In the case of the CPU, you can move batch jobs to evening hours (use of a scheduler may help this work); move on-line users to another time frame (not likely to make them very happy); experiment with changing the way the queuing works on your system (sometimes this will help extend the life of a bottlenecked system); split the load for the system on to another system (Netbase from Quest can execute this solution); or increase the CPU processor capabilities (an upgrade).
In the case of a memory bottleneck, you can increase the amount of memory. Memory is currently very cheap. But before you upgrade your memory you may want to check your disk environment. We have seen that inefficient disk environments can cause the CPU to expend more resources managing memory.
When disk I/O bottlenecks are encountered try some of the following strategies: optimize your databases by re-packing detail sets and resizing master sets (this will help make the Pre-Fetching mechanism more effective at eliminating I/O); move files from one disk to another to help equalize your I/Os between disk drives; reload or defragment your disk drives to move files back into one piece (extent) and make the available free space on disk one large piece; move to faster disk drive technology to help decrease the amount of time to retrieve I/O, or increase the number of servers (disk drives) in the environment.
You can do a lot to identify and optimize your HP 3000. The ideal is to proactively monitor your system to avoid being controlled by the "Tyranny of the Urgent." If you apply Front Line Performance monitoring, your work life will be much improved as you learn to control the data processing environment and its companion -- stress.
Jeff Kubler is a performance expert working for Lund Performance Solutions whose experience with the HP 3000 dates from 1982. He can be reached at firstname.lastname@example.org
How Messy is My Database?
HowMessy is a Robelle tool that prints a report on the internal efficiency of IMAGE/SQL databases. Here are some tips on how to analyze the reports:
Master Datasets (Automatic or Manual)
If Secondaries are over 30 percent and Inefficient Pointers are over 50 percent, the dataset is either too full or hashing improperly. Increase capacity to a higher number that is not a power of 2, or change the data-type and the format of the search field. Increasing the Block Factor should reduce Inefficient Pointers. Look for indications of clustering problems such as Load Factor less than 80 percent, Secondaries less than 5 percent and Max Blocks greater than 100. Clustering usually results from a sequential binary key; change it to type X or U.
Secondaries are created when a synonym occurs in IMAGE/SQL. Synonyms are two or more key values that hash to the same location in a master dataset. The first entry gets the coveted location and becomes a Primary, while the others look for "secondary" space in adjacent unused locations. Of course if a new key value comes along that belongs in a secondary's location, the secondary must be moved. This is called migrating secondaries.
Normally, synonyms are not a problem. But when you have a "binary" key (types I, J, K and R), strange things can happen because IMAGE/SQL does not attempt to select random locations. It just divides the right 32 bits of the key value by the dataset capacity and uses the remainder as the primary hash location. The weirdest case we have seen was a UK user with an R4 key field. R4 is a long floating-point. It has 64 bits, with the exponent and the most significant portion of the mantissa in the leftmost bits. So when IMAGE/SQL selects the rightmost four bytes, they are always 0 and all entries hash to the same location. The user discovered this fact when he exceeded the limit on maximum synonyms per location!
Ignore the Load Factor, unless dataset overflow is likely, and even that is not too serious with the new "automatic detail expansion." If the dataset has more than one path, check that the Primary Path has a large Average Chain Length and is often accessed. Elongation tells how inefficiently packed the chains are, relative to their optimum packing. Elongation of 8.0 on a Primary Path means that disk I/Os are reduced by a factor of eight if you reload the dataset. Look for Average Chain equal to 1.00, Standard Deviation about 0.01, and Max Chain of 2 or 3; this is usually a dataset that should have exactly one entry per key value but actually has duplicate key values. Look for paths with long chains (Average Chain plus Std Dev greater than 100), especially if the path is Sorted (S). Just remember that every search key exists for an important online user need.
A detail dataset with a blocking factor of 10 may have had so many puts and
time that its free-entry chain points all over the place. When 10 new line
items are placed
on the same chain into this detail, they end up in 10 different blocks. If
this detail were
repacked, all 10 entries could reside in the same block. Instead of 10 I/Os
to read the
chain, it would then take just one. If there is more than one path,
remember that a
Dbunload followed by a Dbload repacks only the primary path, not all of the
Detpack from Adager gives you more options, including selecting which path
and how long to work on it. If you want to pursue this further, download
Messy tutorial as a self-extracting archive.