How to find wrong dates in DB2 - sql

I want to find all the records with a wrong date in my database. There are some records dated like 0645-14-10. Please note that the data type of the column is VARCHAR.
I have tried with this query :
SELECT * from LTRECT_JOURNALS_T
where DATE_PART (YEAR,CREATE_DATE like '06%')
So how can I find these kinds of records?

You could use a simple test to find dates in the seventh century CE:
SELECT * from LTRECT_JOURNALS_T
where CREATE_DATE < date '0700-01-01'
/

Cast the string representation of the date first instead of comparing it directly to a date constant. The direct comparison does not work in Db2; to see this, uncomment the commented-out line and comment out the last one.
WITH LTRECT_JOURNALS_T (CREATE_DATE) AS
(
VALUES '0645-14-10', '2003-14-10', '2002-14-10'
)
SELECT *
FROM LTRECT_JOURNALS_T
WHERE
--CREATE_DATE < date('2003-01-01')
YEAR(TO_DATE(CREATE_DATE, 'YYYY-DD-MM')) < 2003
;

You can use a UDF that attempts to convert the string to a date, captures the error raised if the conversion fails, and returns 0 instead.
E.g.
CREATE OR REPLACE FUNCTION IS_DATE(i VARCHAR(64)) RETURNS INTEGER
CONTAINS SQL
ALLOW PARALLEL
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN
DECLARE NOT_VALID CONDITION FOR SQLSTATE '22007';
DECLARE EXIT HANDLER FOR NOT_VALID RETURN 0;
RETURN CASE WHEN CAST(i AS DATE) IS NOT NULL THEN 1 END;
END
Change the statement terminator when creating the above. E.g. use # not ;
On Db2 11.1 or lower, remove the ALLOW PARALLEL line from the above SQL
Then, e.g.
VALUES IS_DATE('0645-14-10')
will return 0, but
VALUES IS_DATE('0645-12-10')
will return 1
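The same "try the conversion and catch the failure" idea can be sketched outside the database. This is only an illustration, assuming the strings are meant to be in YYYY-MM-DD form; the Python function name is_date is borrowed from the UDF above and the sample list is hypothetical.

```python
from datetime import date


def is_date(text):
    """Return 1 if text parses as a YYYY-MM-DD date, otherwise 0."""
    try:
        # fromisoformat raises ValueError for impossible dates such as month 14
        date.fromisoformat(text)
    except ValueError:
        return 0
    return 1


# Hypothetical sample values, mirroring the ones used in this thread
samples = ["0645-14-10", "0645-12-10", "2003-14-10"]
invalid = [s for s in samples if is_date(s) == 0]
print(invalid)
```

Note that a value such as 0645-12-10 is a valid calendar date even though the year is implausible, so a separate range check on the year would still be needed to catch it.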

Related

Issue While Creating Product of All Values Of Column (UDF in Snowflake)

I was trying to create a Snowflake SQL UDF
that computes the product of all the values in a column and returns the result to the user.
First, I tried the following approach:
-- The UDF that returns the result.
CREATE OR REPLACE FUNCTION PRODUCT_OF_COL_VAL()
RETURNS FLOAT
LANGUAGE SQL
AS
$$
SELECT EXP(SUM(LN(COL))) AS RESULT FROM SCHEMA.SAMPLE_TABLE
$$
The above code executes perfectly fine.
As you can see above, I have hardcoded the table name and column name, which is not what I actually want.
So I tried the following approach, passing the column name dynamically:
create or replace function PRODUCT_OF_COL_VAL(COL VARCHAR)
RETURNS FLOAT
LANGUAGE SQL
AS
$$
SELECT EXP(SUM(LN(COL))) AS RESULT from SCHEMA.SAMPLE_TABLE
$$
But it throws the following error:
Numeric Value 'Col' is not recognized
To elaborate, the data type of the column that I am passing is NUMBER(38,6),
and in the background it is doing the following work:
EXP(SUM(LN(TO_DOUBLE(COL))))
Does anyone have any idea why this runs fine in scenario 1 and not in scenario 2?
Hopefully Snowflake will support this kind of UDF one day. In the meantime, consider this answer, which uses ARRAY_AGG() and a Python UDF:
Sample usage:
select count(*) how_many, multimy(array_agg(score)) multiplied, tags[0] tag
from stack_questions
where score > 0
group by tag
limit 100
The UDF in Python - which also protects against numbers beyond float's limits:
create or replace function multimy (x array)
returns float
language python
handler = 'x'
runtime_version = '3.8'
as
$$
import math
def x(x):
    res = math.prod(x)
    return res if math.log10(res)<308 else 'NaN'
$$
;
A parameter defined in a SQL UDF is evaluated as a literal value, not as a column name:
When you call the function like PRODUCT_OF_COL_VAL('Col'), the SQL statement you execute becomes:
SELECT EXP(SUM(LN('Col'))) AS RESULT from SCHEMA.SAMPLE_TABLE
What you want is to generate a new SQL statement based on the parameters, and that is only possible using stored procedures. Check this one:
Dynamic SQL in a Snowflake SQL Stored Procedure
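The difference between a parameter used as a value and a column name spliced into the statement text can be sketched with Python's built-in sqlite3 module. This is not Snowflake code; the table, the column names, the allow-list, and the helper product_of_column are all assumptions made for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (col REAL, other REAL)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?)",
    [(2.0, 1.0), (3.0, 1.0), (4.0, 1.0)],
)

# A bound parameter is a value: every row returns the literal string 'col',
# not the contents of the column named col.
literal_rows = conn.execute("SELECT ? FROM sample_table", ("col",)).fetchall()

# Hypothetical allow-list so that only known identifiers reach the SQL text
ALLOWED_COLUMNS = {"col", "other"}


def product_of_column(connection, column_name):
    """Build the statement text dynamically, then multiply the values."""
    if column_name not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column_name}")
    query = f"SELECT {column_name} FROM sample_table"
    values = [row[0] for row in connection.execute(query)]
    return math.prod(values)


product = product_of_column(conn, "col")
print(literal_rows, product)
```

Checking the name against a fixed list before building the statement is one way to keep dynamic SQL from accepting arbitrary input.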

SQL Server: Return a string in a specific format

In TSQL, I need to format a string in a predefined format.
For example:
SNO  STRING      FORMAT        OUTPUT
1    A5233GFCOP  *XXXXX-XXXXX  *A5233-GFCOP
2    K92374      /X-000XXXXX   /K-00092374
3    H91543987   XXXXXXXXX     H91543987
I am trying with FORMATMESSAGE() built in function.
For ex:
FORMATMESSAGE('*%s-%s','A5233','GFCOP')
FORMATMESSAGE('/%s-000%s','K','92374')
FORMATMESSAGE('%s','H91543987')
I am able to get the first argument by replace function but issue is second/third/fourth/.. arguments.
I don't know how to count respective X's between the various delimiters, so that I can use substring to pass in second/third/.. arguments. If I can count the respective # of X's from the Format column, I feel using substring we can get it but not sure how to count the respective X's.
Please let me know how to get through it or if there is any other simple approach.
Appreciate your help.
Thanks!
In theory this is quite simple. It could probably be done set-based using string_split, but that is not ideal because the ordering of its output is not guaranteed. Since the strings are fairly short, a scalar function should suffice, although I don't think it can benefit from function inlining.
The logic is simple: keep a counter for each string, loop one character at a time, and pull a character from one string or the other into the output depending on whether the current format character is an X.
create or alter function dbo.fnFormatString(@string varchar(20), @format varchar(20))
returns varchar(20)
as
begin
    declare @scount int=1, @fcount int=1, @flen int=Len(@format), @output varchar(20)=''
    -- Walk the format string only; stopping at its end avoids an endless loop
    -- when @string has more characters than @format has X placeholders
    while @fcount<=@flen
    begin
        if Substring(@format,@fcount,1)='X'
        begin
            -- placeholder: take the next character from the input string
            set @output+=Substring(@string,@scount,1)
            select @scount+=1, @fcount+=1
        end
        else
        begin
            -- literal: copy the format character as-is
            set @output+=Substring(@format,@fcount,1)
            set @fcount+=1
        end
    end
    return @output
end;
select *, dbo.fnFormatString(string, [format])
from t
See working Fiddle
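For readers who want to check the character-by-character logic outside SQL Server, here is a minimal Python sketch of the same walk over the format string. The name format_string is hypothetical, and the sketch assumes that X is the only placeholder character.

```python
def format_string(value, fmt):
    """Fill each X in fmt with the next character of value; copy other characters."""
    out = []
    pos = 0  # index of the next unused character in value
    for ch in fmt:
        if ch == "X":
            # placeholder: consume one input character if any are left
            if pos < len(value):
                out.append(value[pos])
            pos += 1
        else:
            # literal character from the format
            out.append(ch)
    return "".join(out)


# The three rows from the question
for value, fmt in [
    ("A5233GFCOP", "*XXXXX-XXXXX"),
    ("K92374", "/X-000XXXXX"),
    ("H91543987", "XXXXXXXXX"),
]:
    print(format_string(value, fmt))
```

Because the loop runs over the format rather than over the input, no separate count of the X characters between delimiters is needed.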

sqlldr - how to use if/then logic on a field?

I am loading a particular field that has date values. However, some of them are not complete... for example the values look like this
START_DATE
'2015-06-12'
'2016-12-24'
'2015-02' <--- this is what causes an error
'2016-01-03'
I have tried solving this by combining NULLIF with a LENGTH() function like so, but this is not allowed:
Start_date NULLIF LENGTH(:start_date)<10 to_date .....
this returns the error
Expecting positive integer or column name, found keyword length.
My main objective is to load dates that are of a proper format, and load NULL otherwise. What is the easiest way to do this within the ctl file? Can I avoid creating a custom function?
Say I have a table like this:
create table dateTable(START_DATE date)
and I need to load this file, where I want to insert NULL where the string does not match my pattern
'2016-12-28'
'2016-12-'
'2016-12-31'
I can add some logic in my ctl file to check the length of the string to load this way:
load data
infile dateTable.csv
into TABLE dateTable
fields enclosed by "'"
( START_DATE "to_date(case when length(:START_DATE) = 10 then :START_DATE end, 'yyyy-mm-dd')"
)
This simply checks the length of the string, but you can edit it any way you need to build your own logic. Notice that CASE gives NULL when no condition is matched, so this is equivalent to case when length(:START_DATE) = 10 then :START_DATE else NULL end.
This gives the following result:
SQL> select * from dateTable;
START_DATE
----------
28-DEC-16
31-DEC-16
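The effect of the CASE expression without an ELSE branch can be mimicked in a few lines of Python. This is only a sketch of the rule, not of SQL*Loader itself; the function name load_start_date is hypothetical, and the input rows are the three values from the sample file.

```python
from datetime import date, datetime


def load_start_date(raw):
    """Mirror to_date(case when length(x) = 10 then x end, 'yyyy-mm-dd')."""
    # CASE with no ELSE: anything that is not 10 characters long becomes NULL
    candidate = raw if len(raw) == 10 else None
    if candidate is None:
        return None
    # Only strings that passed the length gate are converted
    return datetime.strptime(candidate, "%Y-%m-%d").date()


rows = ["2016-12-28", "2016-12-", "2016-12-31"]
loaded = [load_start_date(r) for r in rows]
print(loaded)
```

The length test only filters on shape: a 10-character string that is not a real date would still fail at the conversion step, so stricter conditions may be needed depending on the data.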
In Oracle, you can check whether a string is a valid date. Please check the IsDate function.

PostgreSQL: Ignore/Load Null Dates via Select Substring

I am having trouble getting my fixed-width INSERT INTO statement to handle a date field properly. I found some code on here to create an "is_date" function, but I am not sure how to make the PostgreSQL pgAdmin III SQL query window recognize it.
Error: function is_date(date) does not exist
What I found on another post on StackOverflow:
create or replace function is_date(s varchar) returns boolean as $$
begin
perform s::date;
return true;
exception when others then
return false;
end;
$$ language plpgsql;
What I wrote, with lots of help!
CREATE TABLE marc (statusdate date);
INSERT INTO marc (statusdate)
SELECT CASE
WHEN is_date(to_date(substring(data,228,60), 'DD/MON/YYYY')) = 1
THEN to_date(substring(data,228,60), 'DD/MON/YYYY')
ELSE NULL
END As statusdate
FROM marctemp;
Can anyone help tell me what I am doing wrong?
You are using to_date() which returns date while your is_date() expects a string. PostgreSQL doesn't do automatic conversions and in this case it would be silly to check if a date is a date since it can't be anything else. Remove the to_date() from inside the is_date() call.
The error message tells you all you need to know: you are passing a date to a function that expects a varchar
CREATE TABLE marc (statusdate date);
INSERT INTO marc (statusdate)
SELECT CASE
WHEN is_date(substring(data,228,60)) = true
THEN to_date(substring(data,228,60), 'DD/MON/YYYY')
ELSE NULL
END As statusdate
FROM marctemp;
Boolean values are either true or false, so that is what the result of is_date() must be compared to; 1 is not a boolean value.
You can actually leave out the = true completely: when is_date(..) then .. is perfectly fine.

SQL: Replacing dates contained within a text string

I am using SQL Server Management Studio 2012. I work with medical records and need to de-identify reports. The reports are structured in a table with columns Report_Date, Report_Subject, Report_Text, etc... The string I need to update is in report_text and there are ~700,000 records.
So if I have:
"patient had an EKG on 04/09/2012"
I need to replace that with:
"patient had an EKG on [DEIDENTIFIED]"
I tried
UPDATE table
SET Report_Text = REPLACE(Report_Text, '____/___/____', '[DEIDENTIFIED]')
because I need to replace anything in there that looks like a date, and it runs but doesn't actually replace anything, because apparently I can't use the _ wildcard in this command.
Any recommendations on this? Advance thanks!
You can use PATINDEX to find the location of Date and then use SUBSTRING and REPLACE to replace the dates.
Since there may be multiple dates in the Text you have to run a while loop to replace all the dates.
The SQL below works for all dates of the form MM/DD/YYYY:
WHILE EXISTS( SELECT 1 FROM dbo.MyTable WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0 )
BEGIN
UPDATE t
SET Report_Text = REPLACE(Report_Text, DateToBeReplaced, '[DEIDENTIFIED]')
FROM ( SELECT * ,
SUBSTRING(Report_Text,PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text), 10) AS DateToBeReplaced
FROM dbo.MyTable AS a
WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0
) AS t
END
I have tested the above SQL on a dummy table with a few rows. I don't know how it will scale for your data, but I recommend giving it a try.
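If the text can be processed outside the database, the same pattern can be replaced in one pass with a regular expression. The following is a minimal Python sketch, assuming the same two-digit/two-digit/four-digit shape as the PATINDEX pattern above; the function name deidentify is hypothetical.

```python
import re

# Same shape as the PATINDEX pattern: NN/NN/NNNN
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def deidentify(text):
    """Replace every date-shaped substring, not just the first one."""
    return DATE_PATTERN.sub("[DEIDENTIFIED]", text)


print(deidentify("patient had an EKG on 04/09/2012"))
print(deidentify("seen 01/02/2011, follow-up 03/04/2011"))
```

Like the PATINDEX version, this matches on shape only, so it would also replace digit groups that are not real dates and would miss dates written in other formats.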
To keep it simple, assume that a number represents an identifying element in the string so look for the position of the first number in the string and the position of the last number in the string. Not sure if this will apply to your entire set of records but here is the code ...
I created two test strings ... the one you supplied and one with the date at the beginning of the string.
Declare @tstString varchar(100)
Set @tstString = 'patient had an EKG on 04/09/2012'
Set @tstString = '04/09/2012 EKG for patient'
Select @tstString
-- Calculate 1st Occurrence of a Number
,PATINDEX('%[0-9]%',@tstString)
-- Calculate last Occurrence of a Number
,LEN(@tstString) - PATINDEX('%[0-9]%',REVERSE(@tstString))
,CASE
-- No numbers in the string, return the string
WHEN PATINDEX('%[0-9]%',@tstString) = 0 THEN @tstString
-- Number is the first character to find the last position and remove front
WHEN PATINDEX('%[0-9]%',@tstString) = 1 THEN
CONCAT('[DEIDENTIFIED]',SUBSTRING(@tstString, LEN(@tstString)-PATINDEX('%[0-9]%',REVERSE(@tstString))+2,LEN(@tstString)))
-- Just select string up to the first number
ELSE CONCAT(SUBSTRING(@tstString,1,PATINDEX('%[0-9]%',@tstString)-1),'[DEIDENTIFIED]')
END AS 'newString'
As you can see, this is messy in SQL.
I would rather achieve this with a parser service and move the data with SSIS and call the service.