AS/400: Most efficient DB2 SQL to unpack EBCDIC subfield values as strings?

Most files I am working with only have the following fields:
F00001 - usually 1 (f1) or 9 (f9)
K00001 - usually only 1-3 sub-fields of zoned decimals and EBCDIC
F00002 - sub-fields of EBCDIC, zoned and packed decimals
Occasionally other field names K00002, F00003 and F00004 will appear in cross reference files.
Example Data:
+--------+-------------------------------------+----------------------------------------------------------------------+
| F00001 | K00001                              | F00002                                                               |
+--------+-------------------------------------+----------------------------------------------------------------------+
| f1     | f0 f0 f0 f0 f1 f2 f3 f4 f5 f6 d7 c8 | e2 e3 c1 c3 d2 d6 e5 c5 d9 c6 d3 d6 e7 40 12 34 56 7F e2 d2 c5 c5 e3 |
+--------+-------------------------------------+----------------------------------------------------------------------+
Currently using:
SELECT SUBSTR(HEX(F00001), 1, 2)   AS FNAME_1,
       SUBSTR(HEX(K00001), 1, 14)  AS KNAME_1,
       SUBSTR(HEX(K00001), 15, 2)  AS KNAME_2,
       SUBSTR(HEX(K00001), 17, 2)  AS KNAME_3,
       SUBSTR(HEX(F00002), 1, 28)  AS FNAME_2,
       SUBSTR(HEX(F00002), 29, 8)  AS FNAME_3,
       SUBSTR(HEX(F00002), 37, 10) AS FNAME_4
FROM QS36F.FILE
Is this the best way to unpack EBCDIC values as strings?

You asked for 'the best way'. Manually fiddling with the bytes is categorically NOT the best way. @JamesA has a better answer: externally describe the table and use more traditional SQL to access it. I see in your comments that you have multiple record layouts within the same table. This was typical years ago when we converted from punched cards to disk. I feel your pain, having experienced this many times.
If you are using SQL to run queries, I think you have several options, all of which revolve around having a sane DB2 table instead of a jumbled S/36 flat file. Without more details on the business problem, all we can do is offer suggestions.
1) Add a trigger to QS36F.FILE that will break out the intermingled records into separate SQL defined tables. Query those.
2) Write some UDFs that will pack and unpack numbers. If you're querying today, you'll be updating tomorrow and if you think you have some chance of maintaining the raw HEX(this) and HEX(that) for SELECTS, wait until you try to do an UPDATE that way.
3) Write stored procedures that will extract the bits you need for a given query and put them into SQL tables - maybe even a GLOBAL TEMPORARY TABLE. Have the SP query those bits and return a result set that can be consumed by other SQL queries; IBM i supports user defined table functions as well. (See the sketch after this list.)
4) Have the RPG team write you a conversion program that will read the old file and create a data warehouse that you can query against.
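A minimal sketch of option 3, assuming the layout implied by the query in the question (SESSION.FILE_DECODED and its column names are illustrative, and the SUBSTR/HEX offsets are simply copied from that query):
DECLARE GLOBAL TEMPORARY TABLE SESSION.FILE_DECODED
    (FNAME_1 CHAR(2),
     KNAME_1 CHAR(14),
     KNAME_2 CHAR(2),
     KNAME_3 CHAR(2),
     FNAME_2 CHAR(28),
     FNAME_3 CHAR(8),
     FNAME_4 CHAR(10))
    WITH REPLACE;

INSERT INTO SESSION.FILE_DECODED
SELECT SUBSTR(HEX(F00001), 1, 2),
       SUBSTR(HEX(K00001), 1, 14),
       SUBSTR(HEX(K00001), 15, 2),
       SUBSTR(HEX(K00001), 17, 2),
       SUBSTR(HEX(F00002), 1, 28),
       SUBSTR(HEX(F00002), 29, 8),
       SUBSTR(HEX(F00002), 37, 10)
FROM QS36F.FILE;

-- A stored procedure can populate this temporary table and return a result set
-- from it, so downstream queries never have to re-slice the hex themselves.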

It almost looks as if old S/36 files are being accessed and the system runs under CCSID 65535. That could cause the messy "hex" representation issue as well as at least some of the column name issues. A little more info about the server environment would be useful.

Related

Postgresql performing partitioning to find time difference

I am trying to fill column D and column E.
Column A: varchar(64) - unique for each trip
Column B: smallint
Column C: timestamp without time zone (Excel messed it up in the image below, but you can assume this is a timestamp column)
Column D: numeric - need to find the time from origin, in minutes
Column E: numeric - time to destination, in minutes
Each trip has different intermediate stations, and I am trying to figure out the time elapsed since the origin and the time remaining to the destination.
Cell D2 = C2 - C2 = 0
Cell D3 = C3 - C2
Cell D4 = C4 - C2
Cell E2 = C6 - C2
Cell E3 = C6 - C3
Cell E6 = C6 - C6 = 0
The main issue is that each trip contains a different number of stations for each trip_id. I can think of using PARTITION BY on a column, but can't figure out how to implement it.
Another sub-question: I am dealing with a very large table (100 million rows). What is the best way PostgreSQL experts implement data modifications? Do you create a sample table from the original data and implement everything on the sample before applying the modifications to the original table, or do you use something like "BEGIN TRANSACTION" on the original data so that you can roll back in case of any error?
PS: Help with question title appreciated.
You don't need to know the number of stops:
with a as (
    select *,
           -- epoch/60 gives total minutes, even when the difference exceeds an hour
           extract(epoch from c - min(c) over (partition by a)) / 60 as dd,
           extract(epoch from max(c) over (partition by a) - c) / 60 as ee
    from td
)
update td
set d = a.dd, e = a.ee
from a
where a.a = td.a and a.b = td.b;
http://sqlfiddle.com/#!17/c9112/1
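For the sub-question about safely modifying a 100-million-row table: one common pattern (a sketch, reusing the td table and columns from above) is to preview the derived values with a plain SELECT first, then run the UPDATE inside an explicit transaction so it can be rolled back:
-- Preview the derived values for a handful of rows before writing anything
select a, b, c,
       extract(epoch from c - min(c) over (partition by a)) / 60 as d_minutes,
       extract(epoch from max(c) over (partition by a) - c) / 60 as e_minutes
from td
limit 100;

-- Wrap the actual UPDATE so it can be undone if the spot checks look wrong
begin;
-- ... the UPDATE from the answer above ...
select * from td limit 100;   -- spot check
rollback;                     -- or COMMIT once the results look right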

How do I run md5() on a bigint in Presto?

select md5(15)
returns
Query failed (#20160818_193909_00287_8zejd): line 1:8:
Unexpected parameters (bigint) for function md5. Expected: md5(varbinary)
How do I hash 15 and get back a string? I'd like to select 1 in 16 items at random, e.g. where md5(id) like '%3'.
FYI I might be on version 0.147, don't know how to tell.
FYI I found this PR. md5 would be cross-platform, which is nice, but I'd take a Presto-dependent hash function that spreads ids relatively uniformly. I suppose I could implement my own linear formula, but that seems awkward.
The best thing I could come up with was to cast the integer to a varchar, turn it into varbinary via to_utf8, then apply md5 to the varbinary:
presto> select md5(to_utf8(cast(15 as varchar)));
_col0
-------------------------------------------------
9b f3 1c 7f f0 62 93 6a 96 d3 c8 bd 1f 8f 2f f3
(1 row)
If this is not the result you get, you can always turn it into a hex string manually:
presto> select to_hex(md5(to_utf8(cast(15 as varchar))));
_col0
----------------------------------
9BF31C7FF062936A96D3C8BD1F8F2FF3
(1 row)
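To get the 1-in-16 sampling from the question, the same expression can go into the predicate; a sketch, where my_table and id are placeholder names:
select *
from my_table
where to_hex(md5(to_utf8(cast(id as varchar)))) like '%3'
Since the last hex digit takes each of its 16 values roughly uniformly, this keeps about 1 row in 16.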

Oracle SQL - Combine results from two columns

I am seeking to combine the results of two columns and view them in a single column:
select description1, description2 from daclog where description2 is not null;
The query returns two rows:
1st row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begins DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
2nd row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begins DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
I need the values of description1 and description2 in the same column.
Is it possible?
Thank you!
You can combine two columns into one by using the || operator.
select description1 || description2 as description from daclog where description2 is not null;
If you would like to use only substrings from each of the descriptions, you can apply string functions and then combine the results: FNC(description1) || FNC(description2), where FNC might be a function that returns the desired substring of your columns.
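For example, a sketch that combines substrings of both columns (the 1..100 ranges are only illustrative):
select substr(description1, 1, 100) || substr(description2, 1, 100) as description
from daclog
where description2 is not null;
Note that Oracle treats NULL as an empty string in concatenation, so dropping the WHERE clause would still return rows whose description2 is NULL, just without anything appended.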

How to read a Date from a VB binary file in another language?

I get a binary file from a VB application which consists of about 1400 records, each a timestamp in Date format followed by 19 Long values.
I am able to read the data in VBA with this code:
Dim myDate As Date
Dim myLong As Long
Dim iFileNum As Integer
Dim i As Integer

iFileNum = FreeFile
Open "C:\test.bin" For Binary Access Read As #iFileNum
Do While Not EOF(iFileNum)
    Get iFileNum, , myDate
    MsgBox myDate
    For i = 1 To 19
        Get iFileNum, , myLong
    Next i
Loop
Close #iFileNum
Now I want to read the Date timestamps from the file (I am already able to read the Long values) within Java, but I cannot find any information on how to interpret the 8 bytes of the Date type.
As an example, the first timestamp in binary form is c5 b3 a2 11 80 7b e4 40.
VB output for this is 2014-11-05 0:03:06 AM.
To clarify, I am not looking for a Java implementation, but for information about the binary representation of the 'Date' data type from VB
(for which I wasn't able to find any more information than this: http://msdn.microsoft.com/en-us/library/3eaydw6e.aspx, which doesn't help much).
As #RBarryYoung mentions in his comment to the question, VBA stores Date/Time values internally as floating-point Double values. The integer part is the number of days since 1899-12-30 and the fractional part is the time (e.g., 0.25 = 6 AM, 0.5 = Noon, 0.75 = 6 PM).
In the example you gave, the Date/Time (Double) value is stored in bytes as c5 b3 a2 11 80 7b e4 40. Windows has always been a little-endian environment, so by reversing the order of the bytes we know that it corresponds to the 64-bit binary value 0x40e47b8011a2b3c5.
A binary-to-IEEE-double converter like this one tells us that the decimal Double value is 4.19480021527777789742685854435E4, and if we ask VBA what the corresponding Date/Time value is, we get
?CDate(41948.0021527777789742685854435)
2014-11-05 00:03:06
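As a quick check on that value: the integer part 41948, counted in days from 1899-12-30, lands on 2014-11-05, and the fractional part 0.0021527... times 86400 seconds per day is about 186 seconds, i.e. 00:03:06, matching the VB output above.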

Redis and linked hashes

Hi everyone,
I would like to ask the community for help in finding a way to cache our huge plain table by splitting it into multiple hashes, or otherwise.
A sample of the table, as an example of the structure:
A1 B1 C1 D1 E1 X1
A1 B1 C1 D1 E1 X2
A7 B5 C2 D1 E2 X3
A8 B1 C1 D1 E2 X4
A1 B6 C3 D2 E2 X5
A1 B1 C1 D2 E1 X6
This is our denormalized data; we don't have any ability to normalize it.
So currently we must perform a 'group by' to get the required items; for instance, to get all D* we perform 'data.GroupBy(A1).GroupBy(B1).GroupBy(C1)', and it takes a lot of time.
As a temporary workaround, we created composite string keys:
A1 -> 'list of lines begin A1'
A1:B1 -> 'list of lines begin A1:B1'
A1:B1:C1 -> 'list of lines begin A1:B1:C1'
...
as a cache of results of grouping operations.
The question is: how can this be stored efficiently?
The estimated number of lines in the denormalized data is around 10M records, and since, as in my example, there are 6 columns, that would be 60M entries in the hash. So I'm looking for an approach to look up values in O(N), if possible.
Thanks.