How Do I Convert a Y2K Date to SQL DATE In SSIS? - sql

I am using SSIS (SQL Server 2008) to bring data over from an AS400. The date values are stored on the 400 as a 7-digit numeric in the format "CYYMMDD", where C is a "century digit": 0 = 1900 and 1 = 2000. I have been looking into derived columns and script components. I am very new to SSIS, and all the casting required, compounded with the different cases, is making me a dull boy. Also, I am losing leading zeros. I am not sure whether that is because the values are a numeric type and I would see them correctly if I cast them to string. Below is what I am seeing in SQL after a direct pull from the 400 using SSIS.
AS400 = Actual
101 01/01/1900 (I think these are "unknown" dates)
1231 12/31/1900 (I think these are "unknown" dates)
20702 07/02/1902
151231 12/31/1915
1000102 01/02/2000
1110201 02/01/2011

You should be able to use this expression
(DT_DBDATE)((DT_STR,8,1252)(AS400 + 19000000))

Firstly, add the leading zeros in a Derived Column task:
RIGHT("000000000" + (DT_STR,10,1252)AS400,7)
Pass this to another Derived column task, and use an expression to perform the conversion depending on the century digit, something like:
SUBSTRING([Derived Column 2],1,1) == "0" ? (DT_UI4)[Derived Column 2] + 19000000 : (DT_UI4)SUBSTRING([Derived Column 2],2,8) + 20000000
Which should give you something like 20110201. You can then convert this, or shred it into date parts as required.

Neither of the two answers was 100% right, but both helped me figure out the problem. Not sure whom to mark as "correct". Here is what I did. I had to use 2 derived columns.
1. ((DT_WSTR,8)(<<AS400>> + 19000000))
2. (DT_DBDATE)(SUBSTRING(DCDateString,1,4) + "-" + SUBSTRING(DCDateString,5,2) + "-" + SUBSTRING(DCDateString,7,2))
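For anyone who wants to check the arithmetic outside SSIS, here is a minimal Python sketch of the same idea: add 19000000 to the CYYMMDD number, then split the result into year, month and day. The function name cyymmdd_to_date is made up for illustration, and returning None for values that are not real dates (such as 0) is my assumption, not something the package above does.

```python
from datetime import date
from typing import Optional


def cyymmdd_to_date(value: int) -> Optional[date]:
    """Convert a CYYMMDD number (C=0 -> 19xx, C=1 -> 20xx) to a date.

    Hypothetical helper: returns None when the digits do not form a real date.
    """
    yyyymmdd = value + 19000000          # 1110201 -> 20110201
    year, rest = divmod(yyyymmdd, 10000)  # 2011, 201
    month, day = divmod(rest, 100)        # 2, 1
    try:
        return date(year, month, day)
    except ValueError:
        # e.g. 0 -> 19000000, which has month 0 and day 0
        return None


if __name__ == "__main__":
    for raw in (101, 1231, 20702, 151231, 1000102, 1110201, 0):
        print(raw, cyymmdd_to_date(raw))
```

Because the value stays numeric until after the addition, the lost leading zeros do not matter here.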

select substring(t1.datefield,4,2)|| '/' || substring(t1.datefield,6,2) || '/' || (cast(substring(t1.datefield,1,3) as integer) + 1900) as RegularDate from db.table1 t1

If I recall correctly, the century mark date didn't go back to 1900, but rather had to do with an arbitrary date. You might want to check the AS400 redbooks related to the Y2K dates. I programmed on the AS400 from 1992 to 2005. The break years are 1939 and 2039. Dates before or after these, respectively, fail under the century mark system. This is because IBM decided that any two-digit year greater than 39 referred to the 1900s, and anything less than or equal to 39 referred to the 2000s. If you are dealing with future dates, this might cause a snag.
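The windowing rule described in this answer can be sketched in a few lines of Python. This only illustrates the rule as stated above (two-digit years above 39 map to the 1900s, the rest to the 2000s); the function name is hypothetical, and it applies to two-digit years, not to the CYYMMDD century digit.

```python
def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year onto the 1940-2039 window described above."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    # 40..99 -> 1940..1999, 00..39 -> 2000..2039
    return 1900 + yy if yy > 39 else 2000 + yy


if __name__ == "__main__":
    for yy in (40, 99, 0, 39):
        print(yy, expand_two_digit_year(yy))
```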

Related

Is there any way to use CASE in WHERE clause in Oracle?

I have the following data in DB which supposed to be a date but in number format:
date_col:
20200130.95656
20200910.85213
0
20220101.55412
0
0
20220730.85626
Now I need to format the dates to mm/dd/yyyy so I used the following command in oracle:
to_char(to_date(trunc(date_col),'yyyymmdd'),'mm/dd/yyyy')
to output it like this: 07/30/2022. The problem is that I still need to get the 0 values in date_col and, at the same time, only the dates between 09/10/2020 and 07/30/2022, so I was wondering if there is a way to do this in Oracle.
I can filter the dates by using the following code:
SELECT to_char(to_date(trunc(date_col),'yyyymmdd'),'mm/dd/yyyy')
FROM table1 WHERE
date_col != 0
AND to_char(to_date(trunc(date_col),'yyyymmdd'),'mm/dd/yyyy') BETWEEN '09/10/2020' AND '07/30/2022';
but I need to get the 0 values at the same time. I'm thinking of using CASE in the WHERE clause but don't know how. I Googled but nothing seems to answer my question. I'm only used to using CASE in the SELECT and not in the WHERE, so I'm wondering if there is a trick to using it in the WHERE.
Update
Output should be:
date_col
________________
- 09/10/2020
- 0
- 01/01/2022
- 0
- 0
- 07/30/2022
The date is stored as a number because of the vendor who designed the date format.
Before we get into this, there are a few things I need to point out here. I know you may not have much influence on the schema design, but you need to understand you are working with a BROKEN SCHEMA.
Building on this, it bears repeating that, because of internationalization/cultural issues, converting these numbers to strings and then to dates is about the slowest and most error-prone way possible to get date values.
Finally, the BETWEEN '09/10/2020' AND '07/30/2022' comparison is horribly broken. The left-hand side is cast back to char, so you are doing string comparisons instead of date comparisons. No value will ever satisfy this condition: when the two boundary strings are compared, the leading 0s are equal, the 9 is already greater than the 7, and that's as far as the comparison ever gets. The lower bound sorts after the upper bound, so the range is empty.
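To see the string-versus-date problem in isolation, here is a small Python sketch. It is not Oracle code, just the same comparison done on plain strings and then on parsed dates; the sample values are taken from the expected output in the question.

```python
from datetime import datetime

LOWER, UPPER = "09/10/2020", "07/30/2022"
samples = ["09/10/2020", "01/01/2022", "07/30/2022"]


def parse(text: str):
    """Parse an mm/dd/yyyy string into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# String comparison: LOWER > UPPER lexically, so nothing can fall "between" them.
as_strings = [s for s in samples if LOWER <= s <= UPPER]

# Date comparison: the same bounds behave as intended once parsed.
as_dates = [s for s in samples if parse(LOWER) <= parse(s) <= parse(UPPER)]

if __name__ == "__main__":
    print("string compare:", as_strings)
    print("date compare:  ", as_dates)
```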
That out of the way, you can use CASE in a WHERE clause. However, remember that CASE is not like an if() statement. You don't use it to select code branches. It is an expression... it produces a value. Therefore, to use CASE with a WHERE clause, you want something to compare with that value result. For example: WHERE 0 = CASE WHEN SomeColumn=1 THEN 0 ELSE 1 END
But we don't need CASE here. This will do the job:
SELECT TO_CHAR(to_date(trunc(date_col),'yyyymmdd'),'mm/dd/yyyy')
FROM table1
WHERE date_col = 0 OR date_col BETWEEN 20200910.0 AND 20220730.99999
This will perform much better, not only because it avoids the extra conversions, but in leaving the column values intact for the comparison it also preserves the ability to use indexes with the column, which cuts to the core of database performance.
You could use this. If your date_col has no decimal part, you could use BETWEEN instead.
SELECT TO_CHAR(TO_DATE(TRUNC(date_col),'yyyymmdd'),'mm/dd/yyyy')
FROM table1
WHERE date_col = 0
OR (date_col >= 20200910 AND date_col < 20220731);
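As a rough cross-check of the filter logic (not Oracle code), here is a Python sketch that keeps the zeros, compares the raw numbers against a half-open range, and only formats the rows that survive. Printing zeros as the text "0" is my assumption based on the expected output in the question; the function names are hypothetical.

```python
from datetime import date
from typing import List


def format_date_col(value: float) -> str:
    """Format a yyyymmdd.fraction number as mm/dd/yyyy; zeros pass through as '0'."""
    if value == 0:
        return "0"  # assumption: zeros are shown as-is, per the expected output
    whole = int(value)                 # same role as TRUNC(date_col)
    year, rest = divmod(whole, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day).strftime("%m/%d/%Y")


def select_rows(values: List[float]) -> List[str]:
    """Keep zeros plus anything from 2020-09-10 up to, but not including, 2022-07-31."""
    return [
        format_date_col(v)
        for v in values
        if v == 0 or 20200910 <= v < 20220731
    ]


if __name__ == "__main__":
    rows = [20200130.95656, 20200910.85213, 0, 20220101.55412, 0, 0, 20220730.85626]
    print(select_rows(rows))
```

The upper bound is written as "less than the next day" so that the fractional part on 20220730 does not push the row out of range.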

Microsoft Query displaying data correctly but won't return data to Excel- Error SQ20448

Good morning,
I have a semi-functioning SQL query to pull data from the IBM AS/400 into Microsoft Excel. The data displays correctly inside Microsoft Query, but when I click on "Return Data" to return the data to Excel, I receive the following error message:
[IBM][System i Access ODBC Driver][DB2 for i5/OS]SQ20448 - Expression not
valid using format string specified for TIMESTAMP_FORMAT.
Essentially, my code is joining a few files to provide me with item information and uses dates to remove duplicates so that I can get the most recent transactions.
I use practically identical code for other files and it works correctly, so I suspect there may be an incorrectly formatted date in my original data, which would cause an error when converting the IBM date to the Microsoft-compatible date with the TO_DATE function. I know that my date conversion works correctly, provided the IBM date is an actual date.
My question is, how could I code in an exception to either ignore incorrectly formatted data, or how could I return data that is formatted incorrectly so that I could write an exception in my code?
Here is my code (hopefully the comments are helpful):
SELECT xh.ITNBR, yh.VNDNR, zh.VN35VM, yh.BUYNO, xh.Create_Date -- Item #, Vendor #, Vendor Name, Buyer, Create_Date
FROM
(SELECT PO_IH.ITNBR, -- Item #
max(TO_DATE((CONCAT(
CONCAT(
(CONCAT(SUBSTRING(PO_MH.ACTDT,4,2), '/')),
(CONCAT(SUBSTRING(PO_MH.ACTDT,6,2), '/'))),
SUBSTRING(PO_MH.ACTDT,2,2))), 'MM/DD/YY')) as Create_Date -- Converts IBM date format to work with Microsoft Query
FROM POHISTI as PO_IH, POHSTM as PO_MH
WHERE PO_IH.ORDNO = PO_MH.ORDNO AND
(TO_DATE((CONCAT(
CONCAT(
(CONCAT(SUBSTRING(PO_MH.ACTDT,4,2), '/')),
(CONCAT(SUBSTRING(PO_MH.ACTDT,6,2), '/'))),
SUBSTRING(PO_MH.ACTDT,2,2))), 'MM/DD/YY')) =
(SELECT MIN((TO_DATE((CONCAT(
CONCAT(
(CONCAT(SUBSTRING(PO_MH.ACTDT,4,2), '/')),
(CONCAT(SUBSTRING(PO_MH.ACTDT,6,2), '/'))),
SUBSTRING(PO_MH.ACTDT,2,2))), 'MM/DD/YY'))) -- All of this chaos basically removes duplicate information and converts IBM date
FROM POHSTM as PO_MH2
WHERE PO_MH.ORDNO = PO_MH2.ORDNO AND
PO_MH.ACTDT NOT LIKE '0%' AND -- Removes dates that start with 0 (i.e. IBM's way of saying "no date")
PO_MH.ACTDT NOT LIKE '9%') -- Removes dates from 20th century
GROUP BY PO_IH.ITNBR) xh
LEFT JOIN
(SELECT PO_IH2.ITNBR, -- Item #
PO_IH2.BUYNO, -- Buyer
PO_IH2.VNDNR, -- Vendor
(TO_DATE((CONCAT(
CONCAT(
(CONCAT(SUBSTRING(PO_MH2.ACTDT,4,2), '/')),
(CONCAT(SUBSTRING(PO_MH2.ACTDT,6,2), '/'))),
SUBSTRING(PO_MH2.ACTDT,2,2))), 'MM/DD/YY')) as Create_Date
FROM POHISTI as PO_IH2, POHSTM as PO_MH2
WHERE PO_IH2.ORDNO = PO_MH2.ORDNO AND
PO_MH2.ACTDT NOT LIKE '0%' AND -- Removes dates that start with 0 (i.e. IBM's way of saying "no date")
PO_MH2.ACTDT NOT LIKE '9%') yh -- Removes dates from 20th century
ON xh.ITNBR = yh.ITNBR AND xh.Create_Date = yh.Create_Date
LEFT JOIN VENNAML0 zh -- Vendor Name
ON yh.VNDNR = zh.VNDRVM
Here is the output in Microsoft Query:
ITNBR VNDNR VN35VM BUYNO CREATE_DATE
A-FUL 76 HOLLAND COMP SUSY 2016-12-06 00:00:00.000000
A-MINI 76 HOLLAND COMP SUSY 2016-11-28 00:00:00.000000
A-SHIMBOX 76 HOLLAND COMP SUSY 2014-10-16 00:00:00.000000
A-001 76 HOLLAND COMP SUSY 2016-12-19 00:00:00.000000
A-002 76 HOLLAND COMP SUSY 2016-12-19 00:00:00.000000
....
Like I said, the information displays perfectly in Microsoft Query but when I return it to Excel I get the above error. I tried using the above "NOT LIKE" statements to deal with the two most common errors, but I'm at a loss as to how to find other errors.
I don't really care if I get bad data, as long as it dumps into Excel. At that point I can correct it. But I suspect that if Microsoft Query can't convert a date, it won't return the data to Excel.
Thanks.
You have a numeric field that is formatted like CYYMMDD. This is a common date format in the IBM i world, where C is a century code (0=>19, 1=>20, 2=>21, ..., 9=>28). This came about because, before the Y2K remediation, most decimal dates were stored in packed decimal format with 2-digit years. Packed decimal always has room for an odd number of digits due to the configuration of the format, but most dates were defined as (6,0) (length, decimal places). This left room on disk for one extra digit to the left of the date, and folks could define the dates as (7,0) without changing the format of the data in the records. Thus the 7-digit date was born as a result of Y2K. Synon was the first company I am aware of that did this, in their 2E code generator. It was quite popular, and the format is everywhere. It even found its way into the IBM operating system.
So when SQL casts that to CHAR, a date like 0951107 will look like 951107, and when SQL casts 1171107 to CHAR, it will look like 1171107. Unfortunately, NOT LIKE '9%' will be unreliable at some point, because that leading 9 could be the first non-zero digit of a 1990s-era date, or it could be a 2800s-era date. Even worse, 1900s dates cast to CHAR could start with anything from 1-9. For example, a date of 1985/12/05 would look like 0851205 in CYYMMDD digit format. That would be cast to 851205. So when dealing with dates you need to use DIGITS to cast the number to a CHAR so you don't lose the leading 0 character. And you need to test your date field for 0, which is literally 0000000 (not 00/00/00, even though that is what it looks like when it is formatted with EDITC(Y)).
Here is an example of what is happening:
create table datetest
(decdt decimal(7,0));
insert into datetest
values (0), (941107), (1170304), (1000101);
select substr(decdt,4,2) || '/' ||
substr(decdt,6,2) || '/' ||
substr(decdt,2,2),
decdt
from datetest;
results in:
/ / 0
10/7 /41 941,107
03/04/17 1,170,304
01/01/00 1,000,101
I bet your procedure TO_DATE() is not handling invalid dates properly as they are almost certainly being passed. If you use digits in the substr function, you will get something more sane.
select substr(digits(decdt),4,2) || '/' ||
substr(digits(decdt),6,2) || '/' ||
substr(digits(decdt),2,2),
decdt
from datetest
results in:
00/00/00 0
11/07/94 941,107
03/04/17 1,170,304
01/01/00 1,000,101
Notice that the month day and year parts are all in the correct place now, and all you have to code for is the 0 date which means no date. In any case the function to_date() needs to detect an invalid date and either ignore it, or set it to something usable like 0001-01-01 or null.
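The difference between the two queries can be mimicked in Python, as a rough illustration only. Here str() stands in for the plain cast (no leading zeros) and a zero-padded 7-character format stands in for DIGITS(); both function names are made up, and Python strings carry no trailing blanks, so the broken output differs slightly in spacing from the DB2 result above.

```python
def mmddyy_unpadded(decdt: int) -> str:
    """Slice the plain string form, as the first query effectively does."""
    text = str(decdt)  # 941107 -> '941107' (leading zero lost)
    return f"{text[3:5]}/{text[5:7]}/{text[1:3]}"


def mmddyy_padded(decdt: int) -> str:
    """Slice a 7-digit zero-padded form, which is what DIGITS() provides."""
    text = f"{decdt:07d}"  # 941107 -> '0941107'
    return f"{text[3:5]}/{text[5:7]}/{text[1:3]}"


if __name__ == "__main__":
    for value in (0, 941107, 1170304, 1000101):
        print(value, repr(mmddyy_unpadded(value)), repr(mmddyy_padded(value)))
```

With padding, the only special case left to handle before calling a date conversion is the all-zero value.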

HIVE. SQL. Calculating time difference between string

I have data about users showing up online. In my query I need to select only those between 13:00:00 and 14:00:00.
The data rows about time look like:
170214074534 where it is YYMMDDHHMMSS - 14 February 2017, 07:45:34
Can you help me with the query part please?
I think it should be easier to find a way without converting it to a datetime format. Another way seems to be to ignore the first 6 characters and select data between 130000 and 135959.
You can use string functions for this:
where substr(col, 7, 2) = '13'
I would also suggest that you fix your data format. That is an arcane way of storing date/time values.
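Here is the same substring idea as a small Python sketch, in case it helps to test the positions before writing the Hive query. SQL's substr(col, 7, 2) is 1-based, so it corresponds to the slice [6:8] in Python. The function names are hypothetical, and the length check is my own addition.

```python
from datetime import datetime


def in_hour(stamp: str, hour: int = 13) -> bool:
    """True when a YYMMDDHHMMSS string falls inside the given hour."""
    # Characters 7-8 (1-based) hold HH, i.e. the Python slice [6:8].
    return len(stamp) == 12 and stamp[6:8] == f"{hour:02d}"


def parse_stamp(stamp: str) -> datetime:
    """Full conversion, for when a real datetime is needed after all."""
    return datetime.strptime(stamp, "%y%m%d%H%M%S")


if __name__ == "__main__":
    for s in ("170214074534", "170214130000", "170214135959", "170214140000"):
        print(s, in_hour(s), parse_stamp(s))
```

Note that this matches 13:00:00 through 13:59:59, so a row stamped exactly 14:00:00 is excluded.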

IBM i Date Diff with CYYMMDD - can't use DATE()

(title edited)
Good afternoon, all!
Using IBM i version 7.1 and looking to calculate difference between two dates in a query. Since nothing is ever easy, one date is in CYYMMDD format, the other (curdate()) is YYYY-MM-DD. I tried to CAST my CYYMMDD formatted date (field name APENGD) as a varchar(10) then wrapped that in a CAST as a date (since decimals can't be CASTed as dates):
Cast(Cast(APENGD + 19000000 As varchar(10)) As date) As math
but I only see a result ++++++++++++++ for whatever reason. I was able to test a few different versions of this and found I can't use DATE anywhere...can anyone suggest an alternative??
Thanks in advance!
Matt
Casting varchar to date only works when the string includes separators.
At 7.1 you could use TIMESTAMP_FORMAT(), but you'd end up with a timestamp instead of just a date. But that's easily dealt with.
Date(Timestamp_format(char(APENGD + 19000000),'YYYYMMDD')) As math
My preferred solution when dealing with numeric/character value dates is creating a User Defined Function to handle the conversion.
You could write your own, or use the one I do: iDate, written by Alan Campin. Then your code would simply be:
idate(APENGD,'*CYMD') as math
Note that if you're trying to use date differences in a WHERE clause, like so
WHERE CURRENT_DATE - 3 months <= idate(APENGD,'*CYMD')
The above will perform poorly since an existing index over APENGD can't be used (directly). Assuming a recent(6.1+) version of the OS, you can create a new index that includes the expression you're using to convert APENGD to date.
Or you could code it using the Date->Numeric function ConvertToIdate that Alan helpfully includes. That would allow existing indexes to be used.
WHERE ConvertToiDate(CURRENT_DATE - 3 months,'*CYMD') <= APENGD
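The reason the second form is index-friendly is that the conversion runs once on the constant side instead of on every row. As a rough Python illustration of that direction of conversion (this is not the iDate source, and date_to_cyymmdd is a made-up name), turning a date into a CYYMMDD number is plain arithmetic:

```python
from datetime import date


def date_to_cyymmdd(d: date) -> int:
    """Encode a date as a CYYMMDD number (1900-01-01 -> 101, 2011-02-01 -> 1110201)."""
    # (year - 1900) yields the century digit plus the two-digit year, e.g. 2011 -> 111.
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


if __name__ == "__main__":
    cutoff = date_to_cyymmdd(date(2016, 9, 19))
    stored = [1160101, 1160919, 1161219, 951107]
    # Compare the stored numbers directly against the converted cutoff.
    print(cutoff, [v for v in stored if v >= cutoff])
```

CYYMMDD numbers sort in date order, which is why the raw column can be compared against the converted cutoff.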
The DDL was not offered [to define the column APENGD]. No matter, as the following should suffice, mostly irrespective the definition; either as a string or as a zero-scale numeric. The effect depends on the SQL recognition of a 14-character [up to 26-character, since some v7 release] character-string as an unformatted [i.e. lacking any delimiters, thus digits-only] TIMESTAMP representation:
date(timestamp((APENGD + 19000000) concat '000000'))
IBM i 7.3->Database->Reference->SQL reference->Language elements->Data types->Datetime values->String representations of datetime values->Timestamp strings
A string representation of a timestamp is a character or a Unicode graphic string that starts with a digit and has a length of at least 14 characters. …
If you want to calculate the difference between 2 timestamps, you can use:
`TIMESTAMPDIFF(32, cast(MYTIMESTAMP1 - MYTIMESTAMP2 as char(22)))`
The first argument of the function specifies the unit of the result:
1 : microseconds
2 : seconds
4 : minutes
8 : hours
16 : days
32 : weeks
64 : months
128 : quarters
256 : years

How to add Leading Zeroes and Decimal Points at the same time in SQL?

I have tried many combinations of the SQL functions; so as to have a 12 digit number including the dot character, including leading zeroes and decimal points.
For example:
for the number 121.22, I want to format it to 000000121.22
or for the number 12.2, I want to format it to 000000012.20
or for the number 100, I want to format it to 000000100.00
I have used the following function; but I lost the decimal points if it's zero.
SELECT RIGHT('000000000000'+ STR(CONVERT(VARCHAR,MYNUMBER),12,2),12);
Any idea on how to solve this problem in Microsoft SQL?
If you're on SQL Server 2012 or later, you can use the format() function.
SELECT FORMAT(121.22, '000000000.00')
SELECT FORMAT(12.2, '000000000.00')
000000121.22
000000012.20
For MS SQL versions before 2012:
cast(right('000000000',9-len(floor(the_number))) as varchar)
+ cast( cast(the_number as decimal(10,2))as varchar)
For MS SQL versions 2012 and later:
format(the_number ,'000000000.00')
SELECT padded_id = REPLACE(STR(id, 12, 2), SPACE(1), '0')
is what I had to use (in SQL Server) to add leading 0s as needed. Change the 12 to whatever total number of characters you want it to be, and the 2 to the number of decimal places.
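If it helps to sanity-check the expected strings outside SQL Server, the same layout (12 characters in total, including the dot, with two decimals and leading zeros) can be produced in Python with a format specifier. This is only a cross-check of the target output, not a T-SQL solution, and pad_amount is a made-up name.

```python
def pad_amount(value: float, width: int = 12, decimals: int = 2) -> str:
    """Zero-pad a number to a fixed total width, keeping a fixed number of decimals."""
    # '0' fill, total width including the decimal point, fixed-point notation
    return f"{value:0{width}.{decimals}f}"


if __name__ == "__main__":
    for number in (121.22, 12.2, 100):
        print(number, pad_amount(number))
```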
This allows for non hard coded values, just make sure id or whatever column/param you want to format is set.