REXXpertise: Efficiency


Monday, September 19, 2022

Breathing Room

When I left my last contract, I had an idea for software that was — to use an expression — 'half-baked'. It was an idea for a problem reporting-responding system for your local friendly tools maven — someone like me. Someone, perhaps, like you. I thought there was a need for a vehicle for programmers — the folks who do the real work of producing software that generates the reports and files that management uses on a day-to-day basis to run the companies that produce the wealth that pays our salaries — to log that some particular tool exhibits unusual behavior or to suggest that some tool needs additional function.

The problem is that to develop mainframe software, one really needs a mainframe, and when I say 'mainframe', what I really mean is an 'IBM-ish operating system'.

There aren't very many of those around, and the few that exist are generally devoted to revenue-producing activity, as opposed to blue-sky and pie-in-the-sky ventures... like problem-reporting systems.

It's not easy to find such things — IBM-ish mainframes — but it's not impossible, either, and I managed to find one that didn't require me to pay $50/hr of computing time.

As a result, I've managed to get that problem reporting-responding system pulled together into a form that seems to me ready for some real-world testing. So...

I have put 'FIXTOOL' out on my alternate source code page, http://frankclarke.dx.am/REXX/listindx.html, and now invite all and sundry to steal a copy and run it through its paces, reporting any errors back to me at my main email address. You will also need to snag copies of several other routines that are needed in one way or another, and FIXTOOL also requires that SMTPNOTE (an IBM-originated routine) be active on your site.

Routine DFLTTLIB should be downloaded and adjusted to fit your local environment. It designates the local default ISPTLIB dataset that will house all the ISPF tables used by my software. Every user of FIXTOOL must be able to WRITE to its table.
Routine ICEUSER should be downloaded and customized to identify the 'special' users who are expected to manage the tools.
Before running FIXTOOL, you must run TBLMSTR. TBLMSTR will create an AA-form table (AAMSTR) with a single entry describing itself, and write that table onto the default ISPTLIB.
You, the maintainer, must add a row onto AAMSTR for the IT-type table so that TBLGEN will know how to build it when ordered. The easiest way to do this is to invoke FIXTOOL with the single parameter 'LOADQ'. This only has to be done ONCE. Subsequent invocations of FIXTOOL need not (should not) use LOADQ.

So, here is the component list of all the software that must be in place in order to run FIXTOOL:

DFLTTLIB
FCCMDUPD
FIXTOOL
ICEUSER
LA
TBLGEN
TBLMSTR
TRAPOUT

Several of these routines reference SYSUMON, a usage logging routine that uses RXVSAM to update a KSDS where tool-usage statistics are kept. You can either disable it or build the appropriate KSDS and let it count. Up to you.

If you decide to fix any problems you find rather than have me fix them, I would still appreciate hearing from you what those problems were and how you addressed them.

Sunday, May 22, 2016

Quick! Get me member (Q47MMT1)!

Imagine a partitioned dataset with 8,000 members (or more). This is getting into the range where finding the directory entry for a specific member is becoming a real chore and is chewing up cycles. I heard of an imaginative way to speed up the process.

Define the partitioned dataset as a Generation Data Group and make the group large enough that, when the dataset is split, searching the directory of each is less of a chore (it will be even if only because each fragment of the whole is smaller). Let's say, for the sake of argument, that we break it into 27 generations, one for each letter of the alphabet plus a catch-all. Now copy all the members beginning with non-alphabetics into generation #1, all the "A"s into #2, all the "B"s into #3, etc. When you access the group via its base-name (without specifying a generation) you get them all concatenated in 27-26-25...3-2-1 order.

When you look for member 'Q47MMT1', the directory of generation #27 is scanned, but member names are always in alphabetical order and this directory starts with 'Z...'. That's not it; skip to G0026V00. Its first entry starts with 'Y...'. Nope. G0025V00 starts with 'X...', G0024V00 starts with 'W...', G0023V00 starts with 'V...', G0022V00 starts with 'U...', G0021V00 starts with 'T...', G0020V00 starts with 'S...', G0019V00 starts with 'R...', G0018V00 starts with 'Q...'. Got it! You quickly find the required member and processing continues. What's happening here is that instead of searching through 8,000+ directory entries and finding what you seek in (on average) 4000-or-so lookups, you looked at (on average) 13 generations to find the right one plus ~150 entries inside it (8000 / 27 / 2). As the original partitioned dataset gathers more members, the contrast gets more stark. At some point, the difference is so great that someone will wonder if the quicker method failed because it just couldn't complete that fast.
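The arithmetic behind that comparison can be checked with a few lines of REXX. This is only a back-of-the-envelope sketch: it assumes the members are spread evenly across the 27 generations and that every generation is equally likely to hold the target, neither of which is guaranteed in real life, and the variable names are made up for illustration. (A strict average of 1 through 27 comes out at 14 probes rather than 13; the difference is lost in the rounding.)

```rexx
/* REXX - rough lookup estimate; all names here are illustrative   */
members     = 8000          /* members in the original PDS         */
generations = 27            /* 26 letters plus the catch-all       */

flat_avg  = members / 2                 /* one big directory        */
probes    = (generations + 1) / 2       /* first-entry checks       */
in_gen    = members / generations / 2   /* scan inside the winner   */
split_avg = probes + in_gen

say "One directory :" format(flat_avg,,0)  "entries on average"
say "Split 27 ways :" format(split_avg,,0) "entries on average"
say "Ratio         :" format(flat_avg / split_avg,,1)
```

Change 'members' to 80000 and rerun it to see how the gap widens as the dataset grows.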

Monday, May 9, 2016

ALIAS is not a four-letter word

Are you one of those who think "Alias? Why bother?" Aliases do have their uses, and with a little imagination they can be leveraged to deliver surprising productivity gains.

Aliases come in two flavors: member aliases and dataset aliases.

Member aliases are nothing more than entries in a partitioned dataset's directory. Each such entry holds the TTR (relative track and record) of an existing member, called the "base member". If you edit an alias and save it, BPAM writes the saved text at the back of the dataset and records the new TTR in the directory entry for the alias, making it a base member in its own right (no longer an alias of some other base member). But as long as it is an alias, any reference to the base name or any of its aliases points to the same code. Most languages have the facility of knowing by which name the routine was called, and the logic may branch differently for each (or not; it depends).

Dataset aliases provide the same sort of facility but at the dataset level. These aliases must be kept in the same catalog that holds the dataset name for which the alias is created. The kicker here is that, in practice, the alias and the dataset it aliases must have the same high-level qualifier. If they didn't, the two names would normally be directed to different user catalogs and couldn't live side by side in the same one.

So, what can you do with a dataset alias? Why would you bother? Well, here's a practical application that can save hours of updating and weeks of grief: You have a dataset (or a series of datasets) that IBM or some other maintainer periodically updates. Maybe it's the PL/I compiler or something similar. You have a cataloged procedure, a PROC, or possibly several of them that programmers use to compile programs. If your PROC(s) all reference SYS1.COMPLIB.R012V04.LOADLIB, then when IBM sends down the next update, somebody is going to have to change all those PROCs to reference ...R013V01... and slip them into the PROCLIB at exactly the right moment. Usually that means Sunday afternoon when the system is quiesced for maintenance and only the sysprogs are doing any work. Or...

You could alias whichever is the currently supported version as SYS1.COMPLIB.CURRENT.LOADLIB. When the new version has been adequately tested and is ready to be installed for everyone's use, you use IDCAMS to DELETE ALIAS the old one and DEFINE ALIAS the new one. These two operations will happen so fast it will be like flipping a switch: one instant everyone is using R012V04, and the next they're using R013V01. The system doesn't even have to be down. You can do it Tuesday during lunch. Nobody's JCL has to change, but (more importantly) none of the PROCs have to change, either. Your favorite beta-testers can access the next level just by overriding the STEPLIB. Everybody else just uses the PROC as-is. If somebody reallyreallyreally needs to get to the prior version, that, too, is just a STEPLIB override.
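For the curious, the flip itself might look something like this when driven from a REXX exec under TSO (the same two IDCAMS commands could just as well go into a batch IDCAMS step). Treat it as a sketch: the full name of the R013V01 library is assumed from the pattern of the R012V04 name, and whoever runs it needs the authority to update the catalog that holds SYS1.

```rexx
/* REXX - sketch of the alias flip; run under TSO by someone with  */
/* authority to update the catalog.  The R013V01 dataset name is   */
/* an assumption based on the naming pattern of R012V04.           */
alias_dsn = "SYS1.COMPLIB.CURRENT.LOADLIB"
new_dsn   = "SYS1.COMPLIB.R013V01.LOADLIB"

address TSO
"DELETE '"alias_dsn"' ALIAS"            /* drop the old pointer     */
if rc <> 0 then do
   say "DELETE ALIAS failed, rc="rc
   exit rc
   end

"DEFINE ALIAS( NAME('"alias_dsn"') RELATE('"new_dsn"') )"
if rc <> 0 then
   say "DEFINE ALIAS failed, rc="rc "- CURRENT now points nowhere!"
exit rc
```

There is a sliver of time between the DELETE and the DEFINE when the alias does not exist, so a job that allocates the library in that instant would fail. That is one reason to keep the two commands back-to-back in a single exec or job step.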

I think (but I don't know for certain) that you can write an ACF2 rule that allows certain privileges to an ALIAS that are prohibited to the BASE (and vice versa), but the most amazing ALIAS-trick (as far as I'm concerned) is the ability to swap one dataset for another with none of the users being any the wiser.

Thursday, October 24, 2013

Retrospective

So, here I am on the feather-edge of retirement (I'll be 70 in a few months) and I'm still learning things. I had an insight last night that kept me awake mulling it over. My last contract was with Bank of America in Texas and, while it was fun, it was also more than just a little frustrating.

When I first started looking at the code I would be working with at BofA, I was confused. Everybody these days writes 'strictly structured', right? No, wrong, and that was what was so confusing. Last night's insight cleared away all the cobwebs... just in time for Hallowe'en.

There are two ways to approach a programming problem. In the first, you start out by assuming that this is an easy problem; you have to do A, B, C, and finally D. Voila! You sit down and write code and it comes out as a single long module. It may be fairly complex. (This was typical for most of the BofA code I saw.)

If, instead, you assume that the problem will be complex, that you have to do A, B, C, and finally D, you will sit down and write a top-level module with stubs for the called subroutines. Then you will write the innards of the subroutines, probably as routers with stubs for their called subroutines. This process will continue through n levels until each subroutine is so simple it just doesn't make sense to break it down further. The resulting program will be longish, but (all things considered) pretty simple regardless of the initial estimate of complexity.

I (almost) always presume a programming task will be complex. If that turns out to be wrong, no big loss. If I were to assume some programming task was simple and it turned out not to be quite as simple as I originally thought, that would hurt. It would hurt because halfway through writing that 'one long module', I would discover the need for the same code I used in lines 47 through 101. Stop. Grab a copy of that code. Create a subroutine at the end of the code. Insert CALLs at both places. Continue writing that 'one long module' where you left off.

If that scene happens more than once or twice, what we wind up with is a long main module with several calls to randomly-placed subroutines. The coefficient of complexity has just been bumped up, and the bump could be quite a lot. If it's one of the newly-created subroutines whose function needs to be partitioned, the code soon takes on a distinct air of 'disorganization'.

Do I have to point out that there's way too much overly-complex and disorganized code out there and running in production? No, I probably don't; we've all experienced Windows.

So, there's a built-in penalty for assuming simplicity, and it turns out this penalty applies (in REXX, at any rate) no matter how complex the eventual program actually is.

If a (REXX) program is written as 'one long module', possibly with a few random subroutines for function required in more than one place, diagnosis becomes a problem. Unless the programmer has anticipated bypassing iterative loops, a trace will have to endure every iteration in every loop before getting to the next stage. To avoid this most painful experience, what happens most often with such code is a quick one-time fix to turn TRACE on here and shut it off there. But then, the program being diagnosed is no longer the program that failed; it's a modified version of the failing program.

If a (REXX) program is highly-structured, function will be very encapsulated to the point that any error will be isolated to one or a very small number of suspect segments. Running such a heavily-encapsulated program in trace-mode means that entire trees of logic can be bypassed: if TRACE is on for a higher-level module, it can be turned off in a submodule (and all its children) but will still be on when control returns to the higher-level module. The more structured the code, the easier it is to debug. With one proviso...
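Here is a small demonstration of that behavior. REXX saves the TRACE setting when an internal routine is CALLed and restores it on RETURN, so a subroutine can silence itself without the caller having to switch tracing back on. The routine name and the busy-work loop are invented for the example.

```rexx
/* REXX - TRACE set inside a subroutine is undone on RETURN        */
trace "R"                        /* trace the mainline              */
before = trace()                 /* setting as seen by the mainline */
call B_CRUNCH                    /* the noisy loop runs untraced    */
after  = trace()                 /* restored without being reset    */
trace "O"                        /* quiet again for the report      */
say "before="before "inside="inside "after="after
exit

B_CRUNCH:                        /* hypothetical busy subroutine    */
   trace "O"                     /* silence this routine and below  */
   inside = trace()
   total  = 0
   do i = 1 to 1000
      total = total + i
   end
return
```

The thousand iterations never show up in the trace output, yet 'after' reports the same setting as 'before'. Nothing in the program had to be edited to skip the loop.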

You can have a highly-structured program that is nevertheless disorganized. If, for example, you place your subroutines in alphabetical order, the flow of control will appear chaotic. Ideally, submodules that are merely segments of an upper-level router should appear in roughly their order-of-execution. Although they're broken out into separate segments, they still retain the flavor of that 'one long module' insofar as they appear one after the other like the cars of a train. Reading such code becomes easier because a CALL to a submodule is a call to code which is (probably) physically close by. (This is not always strictly true.)

COBOL programmers long ago adopted a more-or-less universal convention: they prefix the name of each code segment with a glyph ('D100', perhaps) that indicates its logical position in the complete program. A COBOL programmer seeing a reference to 'D100-something' in module 'C850-GET-USER-ID' knows to look later in the code for that segment. The same technique works equally well in all languages, and REXX is not an exception. (I tend to use alpha-only such that the mainline calls A_INIT, B_something, etc. Module C_whatever calls CA_blah, CB_blah, etc. Whatever works...)
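A bare skeleton shows how the prefixes and the physical order work together. Apart from A_INIT, the routine names are placeholders invented for this sketch, and the 'order' variable exists only to record who ran when.

```rexx
/* REXX - skeleton only; routine names other than A_INIT are       */
/* placeholders, and 'order' just records the execution sequence   */
order = ""
call A_INIT
call B_LOAD
call C_REPORT
say strip(order)
exit

A_INIT:
   order = order "A_INIT"
return

B_LOAD:
   order = order "B_LOAD"
return

C_REPORT:                        /* a router: its children are CA_, CB_ */
   order = order "C_REPORT"
   call CA_HEADINGS
   call CB_DETAIL
return

CA_HEADINGS:
   order = order "CA_HEADINGS"
return

CB_DETAIL:
   order = order "CB_DETAIL"
return
```

Reading top to bottom gives the order of execution, and a reference to CB_DETAIL tells you both who owns it (C_REPORT) and roughly where to look for it.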

Exactly the same sorts of things can be said about modifying an existing program. The 'one long module' requires careful planning and skillful execution when inserting new function or changes to existing function. Testing the new function is a chore to the same extent diagnosing an error is a chore, and for the same reasons. Highly-structured code is designed to be modified; it was written that way.

Summarizing: a highly-structured REXX program may be a little longer than it (strictly) has to be, but it will be easier to understand and easier to diagnose in case of an error. This understanding can be enhanced by strategic naming of segments and by arranging the segments to more closely align with the actual order of execution.

Recommendation: Structure is your friend. It may be your best friend.

Monday, November 7, 2011

Adding a row to a stem array

Les Koehler taught me this trick that I have put to wide use since the mid-90s. Yes, adding a row to a stem array is a fairly simple process:

   zz     = log.0 + 1
   log.zz = msgtext
   log.0  = zz

That is an example of using "a bigger hammer" even though many REXX programmers will look at it and exclaim "There's nothing wrong with that!" Indeed, there's nothing wrong with it... except that it's slow. If you're doing it several thousand times, you'll probably want something a little quicker. In fact, after you've found that 'something quicker', you may well decide to always use the quicker method. Those who don't follow REXXpertise will cock their heads to one side as if to ask "What in the world are you doing?" You'll get a chance then to explain the process to them ;-)

   parse value  log.0+1  msgtext     with,
                zz      log.zz    1  log.0   .

Here we first construct a value-string composed of "log.0 + 1" and "msgtext". This is parsed as "zz" and "log.zz". Since "zz" is set first (from the value of "log.0+1"), "log.zz" now points at the next available slot. The location pointer is reset to "1" and the parse continues, loading "log.0" with the incremented value. The remainder of the line is discarded. Once you understand the protocol, it makes perfect sense.
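If you'd like to convince yourself that the two forms do the same job, a throwaway exec along these lines will run them side by side. The stem 'old.' and the three sample messages are invented for the test.

```rexx
/* REXX - both ways of appending to a stem, side by side           */
samples = "first second third"           /* invented test messages  */
log.0 = 0                                /* row count lives in .0   */
old.0 = 0

do i = 1 to words(samples)
   msgtext = word(samples, i)

   /* the three-statement form */
   zz     = old.0 + 1
   old.zz = msgtext
   old.0  = zz

   /* the single-PARSE form */
   parse value  log.0+1  msgtext     with,
                zz      log.zz    1  log.0   .
end

do n = 1 to log.0                        /* show the rows together  */
   say n":" log.n "/" old.n
end
```

Both stems should end up with the same count in their '.0' element and the same text in every row.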

Saturday, October 29, 2011

Rapid Initialization

If you, like me, always write your REXX code beginning with a "signal on novalue" (to prevent the use of uninitialized variables), then you (like me) always want to make sure that all your variables are properly initialized. Here's a fast way to initialize a whole load of variables WHAM!:

   parse value "0 0 0 0 0 0 0"    with ,
               rpt#   rpt. ,
               .
   parse value ""    with ,
               slist   ltoken  stoken  loadlist   ,
               tag   taglist   ,
               .
   parse value 0   "ISR00000  YES       Error-Press PF1"    with,
               sw.  zerrhm    zerralrm  zerrsm

The first PARSE uses a string of several (many, a whole lot of) zeroes because it's concerned with zeroing-out several counters, among which are 'rpt#' and 'rpt.'. Note the '.' on the last line to flush any unused zeroes.

Any number of counters can be zeroed in a single PARSE by just adding their names to the list. If more zeroes are needed in the value-pattern, they are easy enough to add because you never have to count. Need a few more? Add another twenty or thirty!

The second PARSE is concerned with variables that need to be blanked-out. Here, you never have to worry about whether there are enough blanks. To add another variable-to-be-blanked, just add it to the existing list.

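The third PARSE initializes a collection of variables to several different values. 'sw.' is set to '0'; 'zerrhm' is set to 'ISR00000'; 'zerralrm' is set to 'YES'; 'zerrsm' is set to 'Error-Press PF1', preceded by any extra alignment blanks in the value-string, because the last variable in a template keeps everything after the single delimiting blank. PARSEs of this type need to be carefully constructed.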
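A quick way to see exactly what that third PARSE assigns is to run it and display the results between brackets. This is a sketch for experimenting, not part of any production exec; the NOVALUE handler and the 'sw.1' probe are additions for the demonstration.

```rexx
/* REXX - what the mixed-value PARSE actually assigns               */
signal on novalue
parse value 0   "ISR00000  YES       Error-Press PF1"    with,
            sw.  zerrhm    zerralrm  zerrsm

say "sw.1     = ["sw.1"]"        /* any tail picks up the stem default */
say "zerrhm   = ["zerrhm"]"
say "zerralrm = ["zerralrm"]"
say "zerrsm   = ["zerrsm"]"      /* last variable keeps extra blanks   */
say "stripped = ["strip(zerrsm)"]"
exit

novalue:                         /* demonstration handler only         */
   say "Uninitialized variable used at line" sigl
   exit 8
```

If the leading blanks on the last variable matter, either close up the value-string or run the variable through STRIP afterward.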

Rather than have each variable assigned its value on a separate line, this technique clusters many assignments together in a compact form that does not distract from reading the code for comprehension. It is at least as fast, execution-wise, as doing them one-by-one, and it's a heckuvalot easier to type.