COBOL level 88 data type - documentation

Very basic question here.
I have to write out a data glossary for a COBOL program. This data glossary includes the following details about every variable:
Name
Data type
Range of values (if applicable)
Line numbers
Fuller name
I have several variables that include level 88 switches. My question is this: Are these level 88 switches counted as variables, and should I include them in the data glossary? Or, judging by the data glossary structure I have to work with, should they be ignored in this context?
And while I'm here, another simple question. Should fillers be included in data glossaries? This program in particular contains a LOT of filler variables, most being simple "PIC X" variables.

Assuming I understand the question being asked.
It would help if you could give an example with a COBOL layout and a data glossary entry, one with and one without an 88 entry. However, I'll do my best to answer the question.
No, 88 level entries are not variables and they do not increase or decrease the length of the record. They simply allow you to create a conditional statement.
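For example (a minimal sketch; the names here are hypothetical), an 88 level lets you test a named condition instead of comparing against literal values:
01 order-status pic X.
   88 order-open value 'O'.
   88 order-closed value 'C'.
...
if order-open
    perform process-open-order
end-if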
With that being said, should your data glossary only include variables that contribute to the length of the record?
If yes, then there shouldn't be a separate data glossary entry per 88 item. However, the 88 values might help to explain a given variable's values (under item 3, maybe item 5, or even an extra line for expected values).
01 record-store.
   02 location pic 9(4).
      88 dist-center value 100, 101, 102.
   02 store-value pic 9(6).
   02 paid pic X(1).
      88 paid-yes value 'Y', 'y'.
      88 paid-no value 'N', 'n'.
Your data glossary would/could be:
location
Name: location
Data Type: integer
Range of Value: 0-9999
Line Numbers: 20
Fuller name: location of the data
Expected Values:
100, 101, 102 for distribution centers
1-99 for customers
0, 103-9999 invalid
Now, knowing your expected values, you might go back and change your 88 values:
...
02 location pic 9(4).
   88 dist-center value 100, 101, 102.
   88 customers value 1 thru 99.
   88 invalid-loc value 0, 103 thru 9999.
...
If no, then:
You could have a separate data glossary entry per 88 level entry.
Your data glossary would/could be:
location
Name: location
Data Type: integer
Range of Value: 0000-9999
Line Numbers: 20
Fuller Name: The location of the data
dist-center
Name: dist-center
Data Type: boolean
Range of Value: 100, 101, 102
Line Numbers: 5
Fuller Name: Is location a distribution center
customers
Name: customers
Data Type: boolean
Range of Value: 1-99
Line Numbers: 5
Fuller Name: Is location a customer
invalid-loc
Name: invalid-loc
Data Type: boolean
Range of Value: 0, 103-9999
Line Numbers: 5
Fuller Name: Is location an invalid value

As usual, it depends. :-)
The level 88 values seem to belong under part 3 "Range of values", especially if they document the only values allowed for some variable.
The FILLER fields are of course important if the documentation is used to reconstruct the records. If you just want to document the usage of the other fields, they are not very interesting.

The 'PIC X' FILLER variables are probably flags in working storage with 88 levels, and therefore quite important.
For instance, we use this type of construct a lot:
01 FILLER PIC X.
88 OPTION-IS-ON VALUE 'Y', FALSE 'N'.
88 OPTION-IS-OFF VALUE 'N'.
This defines a flag which we only reference using its conditions. For example, we might use it like this:
SET OPTION-IS-ON TO TRUE.    *> This puts a 'Y' in the PIC X
.
.
.
IF OPTION-IS-ON
do something
END-IF
In this case we never need to refer to the actual flag value itself, and hence you do not need to give it a name.
The 'FALSE' in the 88 level just allows you to specify what is stored when you use the statement:
SET OPTION-IS-ON TO FALSE    *> This puts an 'N' in the PIC X
which of course is the same as saying:
SET OPTION-IS-OFF TO TRUE    *> This also puts an 'N' in the PIC X
It all depends on what is more readable at the time.

Related

Convert character variable to numeric variable in SAS

I'm trying to convert a character variable to a numeric variable, but unfortunately I'm really struggling. Help would be appreciated!
I keep getting the following error: 'Invalid argument to function INPUT at line 3259 column 17'
Syntax:
Data want;
Set have;
Dosis_num = input(Dosis, best12.);
run;
I have also tried multiplying the variable by 1. This doesn't work either.
The variable looks like this:
Dosis
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
Want:
Dosis_num
155.0
201.0
2.1
0.8
123.8
12.0
3333.4
0.6
Thanks a lot!
The code will work with the data you show. So either the values in the character variable are not what you think or you are not using the right variable name for the variable.
The code is trying to use only the first 12 bytes of the character variable. Normally you don't need to restrict the number of characters you ask the INPUT() function to use; in fact, the INPUT() function does not care if the width of the informat is larger than the length of the string being read. So just use 32. as the informat, since 32 is the maximum width that the normal numeric informat can read. Note that BEST is the name of a FORMAT; if you use it as the name of an informat, it is just an alias for the normal numeric informat.
If the variable has a length longer than 12, then perhaps there are leading spaces in the variable (note that ODS output does not properly display leading spaces); if so, use the LEFT() function to remove them.
Dosis_num = input(left(Dosis), 32.);
The typical thing to do here is to find out what's actually in the character variable. There is likely something in there that is causing the issue.
Try this:
data have;
input #1 Dosis $8.;
datalines;
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
;;;;
run;
data check;
set have;
put dosis hex32.;
run;
What I get is this:
83 data check;
84 set have;
85 put dosis hex32.;
86 run;
3135352020202020
3230312020202020
322E312020202020
302E382020202020
3132332E38302020
31322E3020202020
333333332E342020
30302E3620202020
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 8 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
All those 2020202020 are spaces, which should be there (all strings are space-padded to full length). Period/Decimal Point is 2E, Digits are 3x where x is the digit (because the ASCII for 0 is 30, not because of any other reason). So for example for the last one, 00.6, 30 means zero, 30 means zero, 2E means period, and 36 means 6.
Check to make sure that you don't have any other characters other than digits (3x) and period (2e) and space (20).
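One way to automate that check is with the VERIFY() function, which returns the position of the first character of its first argument that does not appear in its second argument (0 if the value is clean). A sketch, with the variable name badpos made up here:
data check2;
  set have;
  /* Position of the first character that is not a digit, period, or space */
  badpos = verify(Dosis, '0123456789. ');
  if badpos > 0 then put 'Unexpected character in ' Dosis= 'at position ' badpos=;
run;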
The other thing to verify is that your system is set to use . as the decimal separator and not , as many European systems are; otherwise this requires the COMMAw. informat. You can actually just try the COMMAw. informat (COMMA12. is sufficient if a width of 12 is plenty, and don't include anything after the period), since anything that 12. can read can also be read by COMMAw..
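A minimal sketch of that fallback, following the suggestion above (the dataset and variable names are the question's):
data want;
  set have;
  /* COMMA32. reads everything the standard numeric informat reads, and tolerates commas */
  Dosis_num = input(left(Dosis), comma32.);
run;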

Composite indexing using Redis in a hierarchical data model

I have a data model like this:
Fields:
counter number (e.g. 00888, 00777, 00123 etc)
counter code (e.g. XA, XD, ZA, SI etc)
start date (e.g. 2017-12-31 ...)
end date (e.g. 2017-12-31 ...)
Other counter date (e.g. xxxxx)
Current Datastructure organization is like this (root and multiple child format):
counter_num + counter_code
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
---> start_date + end_date --> xxxxxxxx
Example:
00888 + XA
---> Jan 10 + Jan 20 --> xxxxxxxx
---> Jan 21 + Jan 31 --> xxxxxxxx
---> Feb 01 + Dec 31 --> xxxxxxxx
00888 + ZI
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
00777 + XA
---> Jan 09 + Feb 24 --> xxxxxxxx
---> Feb 25 + Dec 31 --> xxxxxxxx
Today the retrieval happens in 2 ways:
//Fetch unique counter data using all the composite keys
counter_number + counter_code + date (start_date <= date <= end_date)
//Fetch all the counter codes and corresponding data matching the below conditions
counter_number + date (start_date <= date <= end_date)
What's the best way to model this in Redis, given that I need to cache some of the frequently hit data? I feel sorted sets should do this somehow, but I am unable to model it.
UPDATE:
Just to remove the confusion: the ask here is not for an SQL-style BETWEEN query, because I don't know what the start_date and end_date values are. Think of them as just column names.
What I don't want is
SELECT * FROM redis_db
WHERE counter_num AND
date_value BETWEEN start_date AND end_date
What I want is
SELECT * FROM redis_db
WHERE counter_num AND
start_date <= specifc_date AND end_date >= specific_date
NOTE: The requirement is pretty close to the 2D indexing proposed in the Redis multi-dimensional indexing document:
https://redis.io/topics/indexes#multi-dimensional-indexes
I understand the concept but am unable to digest the implementation detail given there.
I'm unlikely to get this done in time for the bounty, but what the hell...
This sounds like a job for geohashing. Geohashing is what you do when you want to index a 2-dimensional (or higher) dataset. For example, if you have a database of cities and you want to be able to quickly respond to queries like "find all the cities within 50km of X", you use geohashing.
For the purposes of this question, you can think of start_date and end_date as x and y coordinates. Normally in geohashing you're searching for points in your dataset near a particular point in space, or in a certain bounded region of space. In this case you just have a lower bound on one of the coordinates and an upper bound on the other one. But I suppose in practice the whole dataset is bounded anyway, so that's not a problem.
It would be nice if there was a library for doing this in Redis. There probably is, if you look hard enough. The newer versions of Redis have built-in geohashing functionality. See the commands starting with GEO. But it doesn't claim to be very accurate, and it's designed for the surface of a sphere rather than a flat surface.
So as far as I can see you have 3 options:
Map your search space to a small part of the sphere, preferably near the equator. Use the Redis GEO commands. To search, use GEORADIUS with a circle covering the triangle you're trying to search, taking into account the built-in inaccuracy and the distortion you get by mapping onto the sphere, then filter the results to get the ones that are actually inside the triangle.
Find some 3rd-party geohashing client for Redis which works on flat space and is more accurate than GEO.
Read the rest of this answer, or some other primer on geohashing, then implement it yourself on top of Redis. This is the hardest (but most educational) option.
If you have a database that indexes data using a numerical ordering, such that you can do queries like "find all the rows/records for which z is between a and b", you can build a geohash index on top of it. Suppose the coordinates are (non-negative) integers x and y. Then you add an integer-valued column z, and index by z. To calculate z, write x and y in binary, then take alternate digits from each. Example:
x = 969 = 0 1 1 1 1 0 0 1 0 0 1
y = 1130 = 1 0 0 0 1 1 0 1 0 1 0
z = 1750214 = 0110101011010011000110
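Here is that interleaving as a small Python sketch (a hypothetical helper, not tied to any Redis client library):

def interleave(x: int, y: int, bits: int) -> int:
    """Build z by alternating the bits of x and y, x first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)  # next bit of x
        z = (z << 1) | ((y >> i) & 1)  # next bit of y
    return z

# Reproduces the worked example above
assert interleave(969, 1130, 11) == 1750214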
Note that the index allows you to find, for example, all records positioned with z between 0101100000000000000000 and 0101101111111111111111 inclusive. In other words, all records for which z starts with 010110. Or to put it another way, you can find all records for which x starts with 001 and y starts with 110. This set of records corresponds to a square in the 2-dimensional space we are trying to search.
Not all squares can be searched in this way; we'll call the ones that can searchable squares. Suppose the client sends a request for all records for which (x,y) is inside a particular rectangle (or a circle, or some other reasonable geometric shape). Then you need to find a set of searchable squares which cover the rectangle. For each of the squares you've chosen, query the database for records inside that square and send the results to the client. (But you'll have to filter the results, because not all the records in the square are actually in the original rectangle.)
There's a balance to be struck. If you choose a small number of large searchable squares, you'll probably end up covering a much larger area of the map than you need; the query to the database will return lots of extra results that you'll have to filter out. Alternatively, if you use lots of little searchable squares, you'll be doing lots of queries to the database, many of which will return no results.
I said above that x and y could be start_time and end_time. But actually the distribution of your dataset won't be as symmetrical as in most uses of geohashing. So the performance might be better (or worse) if you use x = end_time + start_time and y = end_time - start_time.
Because your question remains a bit vague about how you want to query your data, it is unclear exactly how to answer it. With that in mind, here are my thoughts on how I might model your data:
Updated answer, detailing how to use SORTED SET
I have edited this answer to show how to store your values in a way that you can query by dynamic date ranges. This edit assumes that your database values are timestamps; that is, each value is for a single point in time, not two, as in your current setup.
Yes, you are correct that Sorted Sets can accomplish this. I suggest that you always use a Unix timestamp value for the score component in these sorted sets.
In case you are not already familiar with Redis, let me explain its indexing limitations. Redis is a simple key-value store, designed to quickly retrieve values by a key. Because of this design, it does not contain many features of a traditional DBMS, like indexing a column.
In Redis, you accomplish indexing by using a key; the most nested key-like structures are available in HASH and SORTED SET, but you only get two key-like levels. In a HASH, you have the key (same as any data type) and an inner hash key, which can take the form of any string.
In a SORTED SET, you have the key (same as any data type), and a numeric value.
A HASH is nice to use to keep a grouped data organized.
A SORTED SET is nice if you want to query by a range of values. This could be a good fit for your data.
Your SORTED SET would look like the following:
key
00888:XA =>
score (date value) value
1452427200 (2016-01-10) xxxxxxxx
1452859200 (2016-01-15) yyyyxxxx
1453291200 (2016-01-20) zzzzxxxx
Let's use a more intuitive example, the 2017 Juventus roster:
To produce the SORTED SET in the table below, issue this command in your redis client:
ZADD JUVENTUS 32 "Emil Audero" 1 "Gianluigi Buffon" 42 "Mattia Del Favero" 36 "Leonardo Loria" 25 "Neto" 15 "Andrea Barzagli" 4 "Medhi Benatia" 19 "Leonardo Bonucci" 3 "Giorgio Chiellini" 40 "Luca Coccolo" 29 "Paolo De Ceglie" 26 "Stephan Lichtsteiner" 12 "Alex Sandro" 24 "Daniele Rugani" 43 "Alessandro Semprini" 23 "Dani Alves" 22 "Kwadwo Asamoah" 7 "Juan Cuadrado" 6 "Sami Khedira" 18 "Mario Lemina" 46 "Mehdi Leris" 38 "Rolando Mandragora" 8 "Claudio Marchisio" 14 "Federico Mattiello" 45 "Simone Muratore" 20 "Marko Pjaca" 5 "Miralem Pjanic" 28 "Tomás Rincón" 27 "Stefano Sturaro" 21 "Paulo Dybala" 9 "Gonzalo Higuaín" 34 "Moise Kean" 17 "Mario Mandzukic"
Jersey Name Jersey Name
32 Emil Audero 23 Dani Alves
1 Gianluigi Buffon 42 Mattia Del Favero
36 Leonardo Loria 25 Neto
15 Andrea Barzagli 4 Medhi Benatia
19 Leonardo Bonucci 3 Giorgio Chiellini
40 Luca Coccolo 29 Paolo De Ceglie
26 Stephan Lichtsteiner 12 Alex Sandro
24 Daniele Rugani 43 Alessandro Semprini
22 Kwadwo Asamoah 7 Juan Cuadrado
6 Sami Khedira 18 Mario Lemina
46 Mehdi Leris 38 Rolando Mandragora
8 Claudio Marchisio 14 Federico Mattiello
45 Simone Muratore 20 Marko Pjaca
5 Miralem Pjanic 28 Tomás Rincón
27 Stefano Sturaro 21 Paulo Dybala
9 Gonzalo Higuaín 34 Moise Kean
17 Mario Mandzukic
To query the roster by a range of jersey numbers:
ZRANGEBYSCORE JUVENTUS 1 5
Output:
1) "Gianluigi Buffon"
2) "Giorgio Chiellini"
3) "Medhi Benatia"
4) "Miralem Pjanic"
Note that the scores are not returned; however, the ZRANGEBYSCORE command orders the results in ascending order by score.
To add the scores, append "WITHSCORES" to the command, like so: ZRANGEBYSCORE JUVENTUS 1 5 WITHSCORES
By using ZRANGEBYSCORE, you should be able to query any key (counter number + counter code) with a date range, producing the values in that range.
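For instance, a minimal sketch against the counter data above (the scores are the Unix timestamps from the earlier table; the payloads are the question's placeholder values):
ZADD 00888:XA 1452427200 "xxxxxxxx"
ZADD 00888:XA 1452859200 "yyyyxxxx"
ZADD 00888:XA 1453291200 "zzzzxxxx"
ZRANGEBYSCORE 00888:XA 1452384000 1452945600
The last command returns "xxxxxxxx" and "yyyyxxxx", the entries whose timestamps fall between 2016-01-10 00:00 and 2016-01-16 12:00 UTC.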
Original: Below is my original answer, recommending HASH
Based on your examples, I recommend you use a HASH.
With a hash, you would have a main key to find the hash (Ex. 00888:XA). Then within the hash, you have key -> value pairs (Ex. 2017-01-10:2017-01-20 -> xxxxxxxx). I prefer to delimit or tokenize my keys' components with the colon char :, but you can use any delimiter.
HASH follows your example data structure very well:
key
00888:XA =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 yyyyxxxx
2017-02-01:2017-12-31 zzzzxxxx
key
00888:ZI =>
hashkey value
2017-01-10:2017-01-20 xxxxxxxx
2017-01-21:2017-01-31 xxxxyyyy
2017-02-01:2017-12-31 xxxxzzzz
When querying for data, instead of GET key, you would query with HGET key hashkey. Same for setting values, instead of SET key value, use HSET key hashkey value.
Example commands
HSET 00777:XA 2017-01-10:2017-01-20 xxxxxxxx
HSET 00777:XA 2017-01-21:2017-01-31 yyyyyyyy
HSET 00777:XA 2017-02-01:2017-12-31 zzzzzzzz
(Note: there is also an HMSET command to simplify this into a single command)
Then:
HGET 00777:XA 2017-01-21:2017-01-31
Would return yyyyyyyy
Unless there is some specific performance consideration, or other goal for your data, I think Hashes will work great for your system.
It's also very convenient if you want to get all hash keys or all values for a given hash, using commands like HKEYS, HVALS, or HGETALL.
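For example (a quick sketch using the 00777:XA hash populated above), HGETALL returns alternating hash keys and values:
HGETALL 00777:XA
1) "2017-01-10:2017-01-20"
2) "xxxxxxxx"
3) "2017-01-21:2017-01-31"
4) "yyyyyyyy"
5) "2017-02-01:2017-12-31"
6) "zzzzzzzz"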

Efficient way to store FilePath

Currently I have a table with the following format/Desc:
ColumnName  ColID  PK IndexPos  Null  DataType
ID          1      1            N     VARCHAR2 (1 Byte)
FILEPATH    2                   N     VARCHAR2 (127 Byte)
As you can see, the length of the ID column is only 1 byte, so we can store only 36 different file paths. I have more than 35 different file paths that have to be stored and retrieved. I know increasing the length of ID solves the issue, but I would also like to know whether there is an efficient way to handle this.
Thanks!
The assertion that you can store only 36 different values in the table is incorrect, because varchar2 characters are not limited to letters and digits (even if they were, you'd have 26 letters + 10 digits + 1 empty string = 37, not 36, possibilities).
If you need to store a few more paths, say 40 or 50, you could make your keys mixed case, so 'a' and 'A' would reference different paths. This would instantly give you 26 extra possibilities.
Expanding past the limit of 63 is a little harder, because you need to bring special characters into the mix. However, the theoretical maximum for a single byte is 256 values, plus one combination for an empty string.

How to display the numeric numbers

Here's the content of my DataGrid
id
1
2
3A
4
5
6A
..
...
10V1
I want to get the max number from the datagrid. Then, I want to display the next number (in this case: 11) in the textbox beside the grid.
Expected Output
id
1
2
3A
4
5
6A
..
...
10V1
11
I tried the following code:
textbox1.text = gridList.Rows(gridlist.RowCount() - 1).Cells(1).Value + 1
It works if the previous row value is entirely numeric. However, if the value is alphanumeric, I get the following error:
Conversion from string "10V1" to type 'Double' is not valid.
Can someone help me solve this problem? I am looking for a solution in VB.Net
You may want to look into Regex to do that (based on what I understand from your question)
Here's a related question on this.
Regex.Match will return the part of the string that matches the expression... In your case, you want the first number in your string (try "^\d+" as your expression; it will find any series of digits at the beginning of your string). You can then convert the result string into an int and add 1 to it.
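A minimal sketch of that approach (reusing the control names from the question, and assuming the last row holds the highest leading number, as in the example):
' Requires: Imports System.Text.RegularExpressions
Dim lastId As String = gridList.Rows(gridList.RowCount - 1).Cells(1).Value.ToString()
' Keep only the leading digits, then add 1
Dim m As Match = Regex.Match(lastId, "^\d+")
If m.Success Then
    textbox1.Text = (Integer.Parse(m.Value) + 1).ToString()
End If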
Hope this helps!
Edit: Here's more info on regex expressions.

COBOL data buffering without moving character by character

I am reading a variable-length input file and want to create an output buffer (indexed table) without using a character-by-character move.
For example: my first input record is 79 characters, and I can move it to the group level of the table. My second input record is 101 characters -- how can I take these 101 characters and place them in my table beginning at position 80 for a length of 101? And the next input record beginning at position 181, and so on. We currently Perform Varying 1 by 1, but this is incredibly CPU-intensive compared to a block move to a beginning address.
We do this millions of times a day and a solution would be quite useful.
Use reference modification with the length from your record. Consider:
01 In-Record.
   05 Rec-LL Pic S9(4) Binary.
   05 Rec-Data Pic X(32767).
01 Tgt-Area Pic X(10000000).
01 Curr-Ptr Pic S9(8) Binary.
Once you read your record, you can move based on the length like so:
Move 1 to Curr-Ptr
Perform Get-Next-Record
Perform until no-more-data
    Move Rec-Data (1:Rec-LL) to Tgt-Area (Curr-Ptr:Rec-LL)
    Compute Curr-Ptr = Curr-Ptr + Rec-LL
    Perform Get-Next-Record
End-Perform
Or the old-fashioned (we are talking COBOL here, so old-fashioned = Jurassic) way:
01 In-Record.
   05 REC-LL PIC S9(4) BINARY.
   05 REC-DATA.
      10 REC-BYTES PIC X OCCURS 1 TO 32767 TIMES DEPENDING ON REC-LL.
01 TARGET-AREA.
   05 TARGET-HEADER PIC X(79).
   05 TARGET-REC PIC X(101) OCCURS 50 TIMES.
01 TGT-INDEX PIC S9(8) BINARY VALUE 1.
* Length calculation happens by magic!
Perform Read-Record.
Move REC-DATA to TARGET-HEADER.
Perform until no-more-data
    Perform Read-Record
    Move REC-DATA to TARGET-REC(TGT-INDEX)
    Add +1 to TGT-INDEX
End-Perform
Or if records really vary between 1 and 101 bytes:
01 In-Record.
   05 REC-LL PIC S9(4) BINARY.
   05 REC-DATA.
      10 REC-BYTES PIC X OCCURS 1 TO 32767 TIMES DEPENDING ON REC-LL.
01 TARGET-AREA.
   05 TGT-LL PIC S9(8) BINARY.
   05 TGT-REC.
      10 TGT-BYTE PIC X OCCURS 1 TO 3175 TIMES DEPENDING ON TGT-LL.
   05 TGT-EXTRA PIC X(101).
Perform Read-Record.
Move +0 to TGT-LL.
Perform until no-more-data
    Move REC-DATA to TGT-EXTRA
    Add REC-LL to TGT-LL
    Perform Read-Record
End-Perform
Take a look at the STRING verb, in particular the WITH POINTER clause. Don't forget the ON OVERFLOW imperative when stringing things together like this.
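A minimal sketch of that approach (reusing Rec-Data, Rec-LL, Tgt-Area, and Curr-Ptr from the first answer; STRING advances the pointer past the last character stored):
Move 1 to Curr-Ptr
Perform Get-Next-Record
Perform until no-more-data
    String Rec-Data (1:Rec-LL) delimited by size
        into Tgt-Area
        with pointer Curr-Ptr
        on overflow
            Display 'Tgt-Area is full'
    End-String
    Perform Get-Next-Record
End-Perform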
For details, grab a copy of Gary Cutler's OpenCOBOL Programmer's Guide.
http://opencobol.add1tocobol.com/OpenCOBOL%20Programmers%20Guide.pdf
This is a world class COBOL manual, and it's an open and free document (GNU FDL).