net.digest April 2004


Net.digest summarizes helpful technical discussions on the HP 3000 Internet newsgroup and mailing list. Advice here is offered on a best-effort, Good Samaritan basis. Test these concepts for yourself before applying them to your HP 3000s.

Edited by John Burke

The results of the 2004 System Improvement Ballot (SIB) are in, and there were few surprises. This year’s SIB was radically different, as befits the current status of the HP e3000, MPE/iX and IMAGE. The ballot was split in two: one ballot on strategic items and the other on the more traditional tactical items.

We now must wait for HP’s response. Count on it being reported and analyzed in the 3000 NewsWire. As I write this, voting is underway for the Board of Directors of the recently reorganized OpenMPE. I hope you are a member. If not, why not, since membership is still free? We need to energize the MPE-IMAGE community. OpenMPE is the best shot for MPE-IMAGE to exist post-2006 in some supportable and maintainable form. If you have not joined OpenMPE and are not planning to, contact me and give me a chance to talk you into changing your mind.

March saw more of the now all-too-familiar off-topic threads about Iraq or religion or politics, or Iraq and religion and politics. This time we added a long thread on taxes. However, there was still a surprisingly large amount of good technical content, some of which we summarize below.

I always like to hear from readers of net.digest and Hidden Value. Even negative comments are welcome. If you think I’m full of it or goofed, or a horse’s behind, let me know. You can reach me at john@burke-consulting.com.

Hardware mirroring and value in user volumes

We got a good argument going between some real MPE pros on this question. In favor of user volumes were the arguments that, even in a robust hardware mirroring scenario:

• User volumes give you an extra measure of protection in case of a catastrophic failure requiring a re-install;

• User volumes may improve performance, especially in a multi-CPU system.

Against user volumes were these arguments:

• The overhead of maintaining the account structure with user volumes more than offsets any small performance gain;

• The overhead of maintaining user volumes more than offsets the minuscule risk of a catastrophic failure requiring a re-install. Only in the case of hundreds of GB of storage might user volumes make sense.

For what it’s worth, here is what I think. The person who asked the question already had two Model 20s with all storage configured as part of the system volume set. Even if you were in favor of user volumes, changing to user volumes for this customer would entail a re-install and a delicate manipulation of account structure. Clearly the claimed benefits do not justify such a drastic measure. Similarly, if you are moving from an unprotected environment where you used user volumes to a hardware RAID environment, the benefits of going to a single volume set do not justify the delicate operation required on the account structure. If you were starting out from scratch with a hardware RAID solution then I would probably not recommend user volumes unless you were looking at several hundred GB of total storage and/or more than a dozen or so LUNs.

You’ve GOT to be joking

This is not strictly technical, but it is so funny I could not let it pass. Someone wrote on 3000-L, “This ‘Got to be joking’ is the message that I get after I connect to our HP 3000 from a newly arrived, but used, HP 3000 and try to issue any FTP command.” What is going on? After several people speculated on possible errors, James Hofmeister of WTEC replied, “This couldn’t possibly be coming from the FTPSRVR code? ARGH... Okay, joking aside it is true; this message is coming from the MPE/iX FTPSRVR. I checked the code and verified the cause for this goofy message is: ‘port.tcp_addr > 1023’. The standard TCP ports for FTP are 20 and 21.”
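For readers who like to see the condition spelled out, here is a minimal Python sketch of the test Hofmeister quotes. It is only an illustration of the comparison, not the FTPSRVR source: the function name, the constant names, the "OK" reply and the exact wording of the message are assumptions made for the example, and the posting does not say which port value the server examines.

```python
# Hypothetical illustration only; this is NOT the MPE/iX FTPSRVR code.
WELL_KNOWN_PORT_MAX = 1023            # ports 0-1023 are the well-known range
JOKE_REPLY = "You've GOT to be joking"  # assumed wording, taken from the thread

def reply_for_port(tcp_port: int) -> str:
    """Apply the quoted test 'port.tcp_addr > 1023' to a port number."""
    if tcp_port > WELL_KNOWN_PORT_MAX:
        return JOKE_REPLY
    return "OK"  # placeholder for "no complaint" in this sketch
```

With this sketch, the standard FTP ports 20 and 21 pass the test, while any port from 1024 upward draws the joke message.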

Watch that LDEV 1 disk usage figure

Mike Hornsby of Beechglen first reported this problem: “The problem surfaces if you attempt to reboot your system while it is out of permanent space on the system volume set. The reboot process will hang while attempting to build the next NMLG (network log) file. In other words, rebooting with no available disk space can render your system unbootable. Completely filling the permanent disk space in the system volume set can happen more easily than one might expect. Inadvertently restoring large file sets, batch processes that loop, and enabling low-level logging events, to name a few.

“One workaround we have developed involves running stand-alone offline diagnostic after the fact, to patch the volume information. This is a delicate and time-consuming process, but certainly beats the alternative of a re-install. The other workaround is to build a temporary file of sufficient size to ‘reserve’ some space that will be recovered when the system is rebooted:

:BUILD TAKESPAC;DISC=20000,1,1;DEV=1;TEMP

The simplest place to put this would be in a job like JINETD. Check your HPSWINFO.PUB.SYS file to determine whether you are at risk. Systems that have one of the following patches installed are susceptible:

Release 6.5, NMSGDT1A; Release 7.0, NMSGDT2A; Release 7.5, NMSGDV1A

James Hofmeister confirmed the problem: “This problem is resolved in beta test patches NMSHD77 (6.5), NMSHD78 (7.0) and NMSHD79 (7.5). You can contact the HP Response Center to request the beta fixes for SR 8606351808.”
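If you have many systems to check, the patch lookup can be scripted. The following is a rough Python sketch, assuming you have copied the contents of HPSWINFO.PUB.SYS to a workstation as plain text and that installed patch IDs appear in it literally. The function name and the simple substring match are assumptions for illustration; the actual layout of the file is not described in the thread.

```python
# Patch IDs come from the thread; everything else here is an assumption.
AT_RISK = {"NMSGDT1A": "6.5", "NMSGDT2A": "7.0", "NMSGDV1A": "7.5"}
BETA_FIX = {"NMSHD77": "6.5", "NMSHD78": "7.0", "NMSHD79": "7.5"}

def scan_hpswinfo(text: str):
    """Return (at-risk patches found, beta-fix patches found), each sorted.

    A plain substring search is used, so a fix listed with a version
    suffix (for example NMSHD77 followed by a letter) is still matched.
    """
    upper = text.upper()
    at_risk = sorted(p for p in AT_RISK if p in upper)
    fixes = sorted(p for p in BETA_FIX if p in upper)
    return at_risk, fixes

def needs_attention(text: str) -> bool:
    """True when an at-risk patch is present and no beta fix is listed."""
    at_risk, fixes = scan_hpswinfo(text)
    return bool(at_risk) and not fixes
```

A system that reports an at-risk patch with no fix listed is a candidate for the TAKESPAC workaround or a call to the Response Center. Treat the result as a hint and confirm by reading the file yourself.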

XM and scanning memory

There was a fascinating thread about XM and memory scanning that brought in a number of MPE heavyweights (no pun intended). I am going to copy most of the last posting to the thread. Bill Cadier of vCSY wrote the posting. You know you are a geek if you enjoy reading it.

“I thought I’d mention that the ‘enhanced checkpoint’ feature (ALTERCHKPTSTAT in VOLUTIL) is not used as of 6.5; the command remains in VOLUTIL (no idea why!) but it does nothing. The feature used a bit map to track changed pages and that didn’t scale with large files. And the ‘system-wide’ semaphore mentioned is also gone, replaced with a more granular object-based locking scheme.

“And I thought I’d also share some historical information about the early days of 6.5 and 7.0 with large memory and large files that might help put the ‘scanning memory’ statement into perspective. Some of this might have been discussed here several years ago. The memory manager scans memory. When XM needs to ensure that pages of files have posted to disk (been made durable) so that it can reuse a log half, it calls memory management (MM) routines to do that.

“In addition to handling post requests or fetch (or prefetch) requests MM has to try to keep memory organized. These activities include making present pages into ‘recoverable overlay candidates’, or ‘roc’ pages if they have not been accessed recently. This may also include starting a background write if the page is dirty. MM will also try to take pages from the roc list and make them absent (free) if they remain un-accessed and if their background write has finished. This activity will become more urgent as the pool of free pages drops below certain thresholds and can include bumping the priority of background writes so they complete more quickly and being more aggressive about roc’ing present pages.

“These list management algorithms scale based on memory size. And unlike on MPE V where this activity could occur during idle periods because there was far less memory to manage, on MPE/iX we don’t have that luxury. The memory manager has to try to do some amount of list maintenance almost any time it is called.

“Early in 6.5 on systems with large memory and large files we found that these algorithms did not scale as well as we would have liked. They might take too long by trying to do too much, too frequently. And this may be where the ‘scanning memory’ observation was made.

“We made a number of enhancements to speed memory management activities. This was several years ago and by now I’d hope most 6.5 and 7.0 systems have these patches. Here are some of the more significant of those large memory performance improvement patches:

“MPELXG6 — The first of two enhancements to the memory manager list maintenance algorithms. This reduced the length of time MM would spend traversing its lists and while doing so, keeping parts of memory locked.

“MPELXH8 — The second of two, further shortening the amount of time memory manager locks are held and changing the frequency and location of some of the free page replenishment activities.

“MPELXF8 — Enhancement to storage management allowing ‘big’ files of 1GB or more to be held on a ‘least recently used’ list longer. This list holds ‘GUFD’s’ or ‘global unique file descriptors’ of files that are closed and have NO accessors. The expectation being that normal memory management activity might whittle away at the file pages posting them and minimizing the impact of the post that would have to occur when the ‘GUFD’ structure needed to be reused for another file.

“MPELXH5 — The ‘whittling’ wasn’t happening fast enough so we added code to do that. It’s done in small increments from the end of the file upwards so if the file reopens before it is fully mapped out we minimize the page faults needed to bring it back into memory.

“MPELXF2 — An enhancement to a memory manager internal API called make_pages_roc that allowed callers to make pages free rather than just recoverable. Rather than letting the memory manager discover these unneeded pages, it can be told that the pages are no longer needed and can be tossed out of memory right away.

“MPELXJ9 — A further improvement to that enhanced API so both present and recoverable pages would be made free. The initial code change in LXF2 unnecessarily skipped roc pages.

“These patches are all superseded by others. For 6.5, patch MPEMXE5A will install all these changes and many more. The 7.0 patch is MPEMXC7A. These changes were submitted to 7.5 so there are no 7.5 versions needed.”
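To make the page life cycle in Bill’s posting easier to picture, here is a toy Python model of it: present pages that go untouched become roc pages (with a background write started if they are dirty), and roc pages that stay untouched become absent once any write has finished. This is a sketch under heavy assumptions, not MPE/iX code. Every class, field and function name is hypothetical, real MM thresholds and priorities are ignored, and the `include_roc` flag is only my way of contrasting the LXF2 behavior (roc pages skipped) with the LXJ9 behavior.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    PRESENT = "present"
    ROC = "roc"        # recoverable overlay candidate
    ABSENT = "absent"  # free

@dataclass
class Page:
    state: State = State.PRESENT
    dirty: bool = False
    write_pending: bool = False  # background write started, not finished
    accessed: bool = False       # touched since the last maintenance pass

def maintenance_pass(pages):
    """One simplified list-maintenance pass over all pages."""
    for p in pages:
        if p.accessed:
            if p.state is State.ROC:
                p.state = State.PRESENT  # recovered before being freed
            p.accessed = False
            continue
        if p.state is State.PRESENT:
            p.state = State.ROC
            if p.dirty:
                p.write_pending = True   # start a background write
        elif p.state is State.ROC and not p.write_pending:
            p.state = State.ABSENT       # un-accessed and safely on disk

def complete_writes(pages):
    """Pretend every outstanding background write has finished."""
    for p in pages:
        if p.write_pending:
            p.write_pending = False
            p.dirty = False

def make_pages_free(pages, include_roc=True):
    """Caller-directed freeing; the toy ignores dirty data here.

    include_roc=False mimics the described LXF2 behavior, True the LXJ9 one.
    Returns the number of pages made absent.
    """
    targets = {State.PRESENT, State.ROC} if include_roc else {State.PRESENT}
    freed = 0
    for p in pages:
        if p.state in targets:
            p.state = State.ABSENT
            freed += 1
    return freed
```

In this model a clean page needs two untouched passes to go from present to absent, while a dirty page also has to wait for its write to complete. That is the "discovery" cost the caller-directed freeing avoids.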