Thursday, November 17, 2011

Associative arrays

In REXX, arrays (stem variables) do not have to be indexed by numbers.  The index, or tail, can be essentially any character string.  Certainly they can be indexed by names or other non-numeric tokens, and they don't have to have values for every possible index.  Les Koehler calls these 'content-addressable arrays' because it is the data content itself that serves as the index-value for the array.
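
As a small illustration, here is a stem indexed by country names instead of numbers.  The stem name capital. and its data are made up for this sketch.  Assigning to the bare stem gives every index a default value, so indices that were never set need no special handling:

```rexx
/* A stem indexed by names rather than numbers (illustrative data). */
capital. = '(unknown)'       /* default value for every unassigned index */
capital.France = 'Paris'     /* symbol tail: the index becomes FRANCE     */
country = 'Japan'
capital.country = 'Tokyo'    /* data-content tail: the index is Japan     */
say capital.FRANCE           /* Paris                                     */
say capital.country          /* Tokyo                                     */
say capital.Peru             /* (unknown): never assigned                 */
```

Only two elements were ever assigned; every other index simply falls back to the stem's default.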

It's common to see such arrays in matching routines where an array might be initialized to '0' and set to '1' for each of several dozen/hundred/thousand values as they are encountered in the course of processing data.  The value of the array-element tells us, in that case, whether or not we have encountered the index token before, or in the case of an actual counter, how many times we have encountered that token.  If you are, for instance, trying to eliminate duplicate data in a file and the duplicates might be scattered randomly throughout the data, this is just what the doctor ordered:

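exists. = 0                        /* every element starts out as 0 */
do while lines() > 0               /* over the entire input stream (ANSI stream I/O) */
   token = linein()                /* get the next token: here, one per line */
   if exists.token then iterate    /* seen before: discard this input */
   say token                       /* stand-in for processing this input data */
   exists.token = 1                /* discard all future occurrences */
end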

This logic causes each token to be processed only the first time it is encountered; all later occurrences are flushed.  If, instead, we wished to know how many of each token were present, the code would simply increment a counter each time the token was encountered.  In that case, however, it will be nigh-on impossible to retrieve the counts at the end of processing unless the program has separately kept a list of the tokens it found, because the tokens could be anything... literally, and classic REXX offers no built-in way to enumerate the indices that have been assigned in a stem.
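
A minimal sketch of the counting variant, keeping that separate list as a second, numbered stem.  The stem names count. and seen., the convention of holding the list length in seen.0, and the sample data are all assumptions made for illustration:

```rexx
/* Count each distinct word, and remember every distinct word in a  */
/* numbered list (seen.1 .. seen.n) so the counts can be reported.  */
data = 'apple Banana apple cherry banana apple'   /* sample data    */
count. = 0                      /* every counter starts at zero      */
seen.0 = 0                      /* number of distinct words so far   */
do while data \= ''
   parse var data token data    /* get the next token                */
   if count.token = 0 then do   /* first sighting: add it to the list */
      n = seen.0 + 1
      seen.n = token
      seen.0 = n
   end
   count.token = count.token + 1
end
do i = 1 to seen.0              /* walk the list to report the counts */
   tok = seen.i
   say tok count.tok
end
```

This prints "apple 3", "Banana 1", "cherry 1" and "banana 1", one per line.  Note that 'Banana' and 'banana' are counted separately, because an index taken from a variable keeps the case of the data.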

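WARNING!  One aspect of content-addressable arrays has caused me endless grief, because I keep forgetting the lesson so painfully learned the last time it happened.  When an index-value is written directly in the program as a symbol, REXX translates it to upper case, just as it does every other symbol.  That is: if you set a value as

fullname.Tom  = "Tom Swift"

the element actually created is fullname.TOM (assuming no variable named TOM has been given a value, in which case that value would become the index instead).  Stated another way, written this way you cannot have separate values for 'fullname.Tom' and 'fullname.TOM'.  They occupy the same space.  Only the last set value will exist, and it will be indexed by 'TOM'.  The trap works in the other direction, too: an index taken from a variable keeps exactly the case of its data content, so a lookup through a variable must use "TOM" to find the element set above, and an element stored under the data value 'Tom' is not the one you get by writing fullname.Tom in the program.  It is a worthwhile 'exercise for the student' to write a small demonstration program to expose this behavior.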
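
One possible demonstration program, as a sketch.  The variable name key and the alternate names are invented for the example, and it assumes no variable named TOM has been assigned:

```rexx
/* Symbol tails are uppercased; tails taken from variables are not. */
fullname.Tom = 'Tom Swift'      /* symbol tail Tom: index is TOM        */
fullname.TOM = 'Thomas Swift'   /* same element: overwrites the above   */
say fullname.tom                /* Thomas Swift                         */

key = 'Tom'                     /* mixed-case data content              */
fullname.key = 'Tom Thumb'      /* index is the string Tom, not TOM     */
say fullname.key                /* Tom Thumb                            */
say fullname.Tom                /* still Thomas Swift: index is TOM     */
```

The first three lines show the two spellings collapsing into one element; the last three show that data content with mixed case lands in a different element entirely.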
