sqlldr - how to use if/then logic on a field? - sql

I am loading a particular field that has date values. However, some of them are not complete... for example the values look like this
START_DATE
'2015-06-12'
'2016-12-24'
'2015-02' <--- this is what causes an error
'2016-01-03'
I have tried solving this by combining NULLIF with a LENGTH() function like so, but this is not allowed:
Start_date NULLIF LENGTH(:start_date)<10 to_date .....
this returns the error
Expecting positive integer or column name, found keyword length.
My main objective is to load dates that are of a proper format, and load NULL otherwise. What is the easiest way to do this within the ctl file? Can I avoid creating a custom function?

Say I have a table like this:
create table dateTable(START_DATE date)
and I need to load this file, where I want to insert NULL where the string does not match my pattern
'2016-12-28'
'2016-12-'
'2016-12-31'
I can add some logic in my ctl file to check the length of the string to load this way:
load data
infile dateTable.csv
into TABLE dateTable
fields enclosed by "'"
( START_DATE "to_date(case when length(:START_DATE) = 10 then :START_DATE end, 'yyyy-mm-dd')"
)
This simply checks the length of the string, but you can edit it any way you need to build your own logic. Note that CASE gives NULL when no condition is matched, so this is equivalent to case when length(:START_DATE) = 10 then :START_DATE else NULL end.
This gives the following result:
SQL> select * from dateTable;
START_DATE
----------
28-DEC-16
31-DEC-16
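The same rule can be sketched outside the control file, for instance to pre-check an input file before loading. This is only a Python illustration of the logic, not part of SQL*Loader, and the name parse_start_date is hypothetical. Like the CASE expression above, it yields None (the NULL case) for any string that is not exactly 10 characters long and only then attempts the yyyy-mm-dd conversion.

```python
from datetime import date, datetime
from typing import Optional


def parse_start_date(raw: str) -> Optional[date]:
    """Return a date for 10-character values, None otherwise (hypothetical helper)."""
    if len(raw) != 10:
        # Mirrors CASE falling through to NULL when no condition matches.
        return None
    # Mirrors to_date(..., 'yyyy-mm-dd'); raises ValueError on a bad 10-character value.
    return datetime.strptime(raw, "%Y-%m-%d").date()


rows = ["2016-12-28", "2016-12-", "2016-12-31"]
loaded = [parse_start_date(r) for r in rows]
```

As with the control file, a value that is 10 characters long but not a real date still raises an error, so a stricter check may be needed depending on the data.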

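In Oracle, you can check whether a string is a valid date before converting it. Note that Oracle has no built-in IsDate function, so you would have to write one yourself; on 12.2 and later, the built-in VALIDATE_CONVERSION function can do this check instead.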

Related

Timestamp comparison is failing in spark SQL in databricks

I was executing the simple Spark SQL code below on Azure Databricks.
val df2=spark.sql(
s"""
select
mbrgm.mbrgm_id as case_id,
case
when mbr_hist.meck is not null
and mbr_hist.efdt is not null
and mbr_hist.efdt <= mbr_pgm.credttm
and (
mbr_hist.exp_dt is null
or mbr_hist.exp_dt > mbrgm.creat_dttm
) then mbr_hist.meck
else mbrgm.facmbid
end as mb_fid,
.....
from
tempview1 mbrgm
left join left outer join tempview2 mbr_hist on (mbrgm.mrid = mbr_hist.mrid
and mbr_hist.efdt <= mbrgm.credttm
and mbr_hist.exdt > mbrgm.credttm
Every time I execute it, I get the else value for the mb_fid field, i.e. mbrgm.facmbid. I have checked my data against the logic, and according to the logic it should take the then branch. I think the comparison mbr_hist.efdt <= mbr_pgm.credttm is never true.
mbr_hist.efdt is a String, e.g. 2017-07-22 21:58:46, and mbr_pgm.credttm is a timestamp, e.g. 2011-08-13T11:00:00.910+0000. Is my comparison failing because the values have different lengths? What can I use to compare them correctly?
Databricks can't directly compare a string with a timestamp; you need to convert the string to a timestamp first. By default, cast works only with strings in ISO 8601 format, so use the to_timestamp function with an explicit date/time pattern to do the conversion.
For example:
select to_timestamp(mbr_hist.efdt, 'pattern') as efdt ...
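To see why comparing the text form and comparing parsed values can disagree, here is a small Python sketch. It does not reproduce Spark's casting rules; it only illustrates the general pitfall. The string value is taken from the question, while the timestamp on the same day is a hypothetical value chosen for the demonstration.

```python
from datetime import datetime

efdt_text = "2017-07-22 21:58:46"          # string value, as in the question
credttm = datetime(2017, 7, 22, 11, 0, 0)  # hypothetical timestamp on the same day

# Text comparison: the first 10 characters match, then ' ' sorts before 'T',
# so the string looks "earlier" regardless of the time of day.
text_says_earlier = efdt_text <= credttm.isoformat()

# Parsed comparison: convert the string with an explicit pattern first.
efdt = datetime.strptime(efdt_text, "%Y-%m-%d %H:%M:%S")
time_says_earlier = efdt <= credttm

print(text_says_earlier, time_says_earlier)
```

Here the text comparison reports True while the parsed comparison reports False, which is why converting with an explicit pattern before comparing is the safer route.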

How to find wrong dates in DB2

I want to find all the records with an invalid date in my database. There are some records with dates like 0645-14-10. Please note the data type of the column is VARCHAR.
I have tried with this query :
SELECT * from LTRECT_JOURNALS_T
where DATE_PART (YEAR,CREATE_DATE like '06%')
So how can I find these kinds of records?
You could use a simple test to find dates in the seventh century CE:
SELECT * from LTRECT_JOURNALS_T
where CREATE_DATE < date '0700-01-01'
/
You should convert the string representation of the date to a date first, rather than compare the string directly to a date constant. The direct comparison doesn't work in Db2, as you can see if you uncomment the commented-out line below and comment out the last one.
WITH LTRECT_JOURNALS_T (CREATE_DATE) AS
(
VALUES '0645-14-10', '2003-14-10', '2002-14-10'
)
SELECT *
FROM LTRECT_JOURNALS_T
WHERE
--CREATE_DATE < date('2003-01-01')
YEAR(TO_DATE(CREATE_DATE, 'YYYY-DD-MM')) < 2003
;
You can use a UDF that attempts to convert the string to a date, but captures the error raised if the conversion fails and returns 0 instead.
E.g.
CREATE OR REPLACE FUNCTION IS_DATE(i VARCHAR(64)) RETURNS INTEGER
CONTAINS SQL
ALLOW PARALLEL
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN
DECLARE NOT_VALID CONDITION FOR SQLSTATE '22007';
DECLARE EXIT HANDLER FOR NOT_VALID RETURN 0;
RETURN CASE WHEN CAST(i AS DATE) IS NOT NULL THEN 1 END;
END
Change the statement terminator when creating the above. E.g. use # not ;
On Db2 11.1 or lower, remove the ALLOW PARALLEL line from the above SQL
Then, e.g.
VALUES IS_DATE('0645-14-10')
will return 0, but
VALUES IS_DATE('0645-12-10')
will return 1
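The same try-and-catch idea can be sketched in Python for comparison. This is an illustration of the technique rather than a translation of the Db2 function: it assumes only the YYYY-MM-DD layout is acceptable, whereas CAST in Db2 may accept other date formats as well.

```python
from datetime import datetime


def is_date(text: str) -> int:
    """Return 1 if text parses as a YYYY-MM-DD date, 0 otherwise."""
    try:
        # Attempt the conversion; an impossible month or day raises ValueError.
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Equivalent of the EXIT HANDLER returning 0.
        return 0
    return 1


print(is_date("0645-14-10"))  # month 14 does not exist
print(is_date("0645-12-10"))
```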

Why does my update query to replace string not work?

I have an Access table where I have transaction IDs in the below format:
Transaction_ID
39296165-1
39296165-2
39296165-3
39284029-1
39284029-2
I am trying to write a query which finds the dash and removes the -1,-2,-3 etc., so I can then de-duplicate based on the string before the dash.
I've written the below:
UPDATE mytable
SET Transaction_ID=Left(Transaction_ID,InStr(1,Transaction_ID,"-")-1)
This works fine, except that when it comes across a Transaction_ID that doesn't have a dash in the string, it gives me a type conversion error and replaces the string with a blank value.
Any advice on error-trapping this?
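Add a WHERE clause so that the update only runs when InStr finds a dash. InStr returns 0 when there is no match, which makes the length passed to Left equal to -1 and causes the error: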
WHERE InStr(1,Transaction_ID,"-") > 0
This would also work and would be more efficient.
WHERE Transaction_ID LIKE "*-*"
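The guard can be illustrated with a short Python sketch: only cut the string when a dash is actually present, and leave other values untouched. The function name strip_suffix and the dash-free ID 39300000 are made up for the example.

```python
def strip_suffix(transaction_id: str) -> str:
    """Drop everything from the first dash onward; leave dash-free IDs unchanged."""
    if "-" not in transaction_id:
        # Same role as the WHERE clause: skip rows without a dash.
        return transaction_id
    return transaction_id[: transaction_id.index("-")]


ids = ["39296165-1", "39296165-2", "39284029-1", "39300000"]
deduped = sorted({strip_suffix(t) for t in ids})
print(deduped)
```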

Search Through All Between Values SQL

I have the following data structure:
_ID _BEGIN _END
7003 99210 99217
7003 10225 10324
7003 111111
I want to look through every _BEGIN and _END and return all rows where the input value is between the range of values including the values themselves (i.e. if 10324 is the input, row 2 would be returned)
I have tried this filter, but it does not work:
where #theInput between a._BEGIN and a._END
--THIS WORKS
where convert(char(7),'10400') >= convert(char(7),a._BEGIN)
--BUT ADDING THIS BREAKS AND RETURNS NOTHING
AND convert(char(7),'10400') < convert(char(7),a._END)
The less-than (<) and greater-than (>) operators work on xCHAR data types without any syntax error, but the result may be semantically wrong because strings are compared character by character. Look at these examples:
1 - SELECT 'ab' BETWEEN 'aa' AND 'ac' # returns TRUE
2 - SELECT '2' BETWEEN '1' AND '10' # returns FALSE
Stored as an xCHAR type, the character 2 compares greater than any string that starts with 1, such as 10.
So you should CAST the values to a numeric type here. [The example is for MySQL; for standard SQL compatibility, change UNSIGNED to INTEGER.]
WHERE CAST(#theInput as UNSIGNED)
BETWEEN CAST(a._BEGIN as UNSIGNED) AND CAST(a._END as UNSIGNED)
It would be better to change the column types to a numeric type to avoid this ambiguity later.
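The difference between text and numeric range checks can be reproduced in a few lines of Python, whose string comparison is also character by character. The helper names are hypothetical, and the value 102999 is made up; the bounds 10225 and 10324 come from the question's data.

```python
def between_text(value: str, low: str, high: str) -> bool:
    """Inclusive range check using character-by-character string comparison."""
    return low <= value <= high


def between_number(value: str, low: str, high: str) -> bool:
    """Inclusive range check after converting every operand to an integer."""
    return int(low) <= int(value) <= int(high)


# '2' sorts after '10' as text, so the text check misses a real match.
print(between_text("2", "1", "10"), between_number("2", "1", "10"))

# A six-digit value can fall "inside" a five-digit range as text.
print(between_text("102999", "10225", "10324"),
      between_number("102999", "10225", "10324"))
```

The text check is wrong in both directions: it rejects 2 for the range 1 to 10, and it accepts 102999 for the range 10225 to 10324.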
This would be the obvious answer...
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #theInput between a._BEGIN and a._END
If the data is stored as strings (an assumption, since we don't know which DB this is), you could add this:
Declare #searchArg VARCHAR(30) = CAST(#theInput as VARCHAR(30));
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #searchArg between a._BEGIN and a._END
If you care about performance and you have a lot of data and indexes, you won't want function calls on the column values. You could in-line this conversion on the input instead, which keeps your predicates sargable.
SELECT * FROM myTable a
WHERE
(CAST(#theInput AS char) >= a._BEGIN AND CAST(#theInput AS char) < a._END);
I also saw several of the same type of questions:
SQL "between" not inclusive
MySQL "between" clause not inclusive?
When I write queries like this, I usually start with a single greater-than or less-than comparison on one side and work from there. Maybe that can help. I'm very slow, but I do lots of trial and error.
Or, use Tony's convert.
I suppose you can convert them to anything appropriate for your program, numeric or text.
Also, see here, http://technet.microsoft.com/en-us/library/aa226054%28v=sql.80%29.aspx.
I am not convinced you cannot do your CAST in the SELECT.
Nick, here is a MySQL version from SO, MySQL "between" clause not inclusive?

Microsoft Access SQL Date Comparison

I am using Access 2007.
I need to return rows with a date/time field falling within a date range to be specified in query parameters.
The following doesn't error out, but doesn't appear to work.
SELECT FIELDS FROM FOO
WHERE (FOO.CREATED_DTG BETWEEN [START_DTG] And [END_DTG]);
Likewise this doesn't work for me
SELECT FIELDS FROM FOO
WHERE (FOO.CREATED_DTG >= [START_DTG] And FOO.CREATED_DTG < [END_DTG]);
How can I get this to work?
Update: Using CDate doesn't seem to make a difference.
Is BLAH the name of a field or a table? As you SELECT BLAH I imagine it names a field, but then BLAH.CREATED_DTG makes no sense -- do you mean FOO.CREATED_DTG perchance?
Do your dates start and end with a #?
Also, you have <= and >= ... you probably only want = on one of these operators.
Are you sure the CREATED_DTG field is Date format?
Have you tried
WHERE (FOO.CREATED_DTG BETWEEN #01/01/1971# And #07/07/2009#);
(or whatever is appropriate in the way of dates -- the point is, not a parameter query)
Are [START_DTG] and [END_DTG] fields in the table FOO, or are they parameters? If they are parameters, then you need to declare their type in order to get validation of the input values. If so, you should add this before the first line of your SELECT statement:
PARAMETERS [START_DTG] DateTime, [END_DTG] DateTime;