Retrieve date values with time in different formats from a string (T-SQL)

I have a requirement to pull a date/time value out of a string, but the values come in different formats, which makes a SUBSTRING-based approach complicated.
Here's what I came up with. Is there another method that would let me simply retrieve dates of different formats, with their times, and convert them all to a single format?
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
CREATE TABLE #temp (
comments varchar(500)
)
insert into #temp (comments)
(
select 'Mailed on 1/1/22 at 5 pm'
union
select 'Mailed on 01/2/2222 @ 6 am'
union
select 'Mailed on 01/2/22 in night'
union
select 'Mailed on 1/02/2222 at 4 pm'
union
select 'Mailed on 1/1/2222 at 4 pm'
);
select *
from #temp
cross apply (select PATINDEX('%Mailed On%',comments) as start_pos) as start_pos
cross apply (select case when substring(comments,patindex('%Mailed On%',comments)+9,11) like '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' then 1
when substring(comments,patindex('%Mailed On%',comments)+9,8) like '%[0-9][0-9]/[0-9]/[0-9][0-9]%' then 2
when substring(comments,patindex('%Mailed On%',comments)+9,10) like '%[0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%' then 3
when substring(comments,patindex('%Mailed On%',comments)+9,9) like '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%' then 4
when substring(comments,patindex('%Mailed On%',comments)+9,9) like '%[0-9]/[0-9]/[0-9][0-9][0-9][0-9]%' then 5
when substring(comments,patindex('%Mailed On%',comments)+9,7) like '%[0-9]/[0-9]/[0-9][0-9]%' then 6 else null end as substr) as substr
--cross apply (select case when substring(authcomments,start_pos + 9, 11) like '%[1-9]/[0123][0-9]/[0-9][0-9][0-9][0-9]%' then 1 else null end as substr) as substr
cross apply (select case when substr = 1 then substring(comments,patindex('%Mailed On%',comments)+9,11)
when substr = 2 then substring(comments,patindex('%Mailed On%',comments)+9,8)
when substr = 3 then substring(comments,patindex('%Mailed On%',comments)+9,10)
when substr = 4 then substring(comments,patindex('%Mailed On%',comments)+9,9)
when substr = 5 then substring(comments,patindex('%Mailed On%',comments)+9,9)
when substr = 6 then substring(comments,patindex('%Mailed On%',comments)+9,7)
else null end as maileddate
) as maileddate

@user1672315,
Sometimes you get stuff like this and in order to fix it so that you can get the dates and times to store in a table or whatever, ya gotta do what ya gotta do to get it and, contrary to the comments, it certainly CAN be done in SQL. It's just not that difficult. Ya just gotta know some of the "gazintas" ;)
So, using the readily consumable test data that you were nice enough to provide, run the following code against it...
SELECT t.*
,TheDateAndTime = DATEADD(hh,ca4.cHour,ca3.cDate)
FROM #temp t
CROSS APPLY(VALUES(SUBSTRING(comments,PATINDEX('%[0-9]%',comments),500))) ca1(DT)
CROSS APPLY(VALUES(SUBSTRING(ca1.dt,PATINDEX('% [0-9]%',ca1.dt),500))) ca2(TM)
CROSS APPLY(VALUES(TRY_CONVERT(DATETIME,SUBSTRING(ca1.DT,1,PATINDEX('%[0-9] %',ca1.DT))))) ca3(cDate)
CROSS APPLY(VALUES(IIF(ca2.TM LIKE '%night%',23,DATEPART(hh,TRY_CONVERT(DATETIME,ca2.TM)))))ca4(cHour)
;
... and see that you CAN do it in SQL... BUT, see the warnings below.
You also need to figure out what hour "night" is going to be assigned. I assigned "23" as the hour.
I'm thinking that your "2222" years are in error, though. :D
One thing I do agree on is that the format needs to be somewhat consistent. No code in the world, Python or otherwise, will be able to distinguish between a mm-dd-yy and dd-mm-yy format when dd and mm are both less than 13. The code I posted assumes (m)m-(d)d-yy and is based on the current LANGUAGE and DATEFORMAT that I'm using. It WILL return NULLs where the mm part isn't between 1 and 12 or if the dd part isn't between 1 and 31 or if the date is an "illegal date" like 2/29/2021, etc, though.
It also assumes that the numeric date will always be the first set of numeric values it comes across in the string and that the time will always be the last thing in the string. We can add more checks, if needed, but, like I said, unless mm is >= 13, it cannot (nor can anything else) determine if it should be mm-dd-yy or dd-mm-yy because there's simply no other information in the string to indicate which format is being used. You MUST check your date format to use this, as well. If the strings are supposed to be in the dd-mm-yy format, we may have to make a change (although I believe SQL Server will auto-magically accommodate that if the DATEFORMAT matches the intention of the string).
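To make the DATEFORMAT point concrete, here is a small T-SQL sketch. The literal strings are made up for illustration (they are not from the test data), and it assumes SQL Server 2012 or later so that TRY_CONVERT is available.

```sql
-- The same string parses differently depending on the session's DATEFORMAT.
SET DATEFORMAT mdy;
SELECT TRY_CONVERT(DATETIME, '1/2/22')  AS parsed_mdy,   -- January 2
       TRY_CONVERT(DATETIME, '13/2/22') AS invalid_mdy;  -- NULL: there is no month 13

SET DATEFORMAT dmy;
SELECT TRY_CONVERT(DATETIME, '1/2/22')  AS parsed_dmy,   -- February 1
       TRY_CONVERT(DATETIME, '13/2/22') AS valid_dmy;    -- February 13
```

Nothing in '1/2/22' itself says which reading is right, so the setting has to match what the people who typed the comments meant.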

Related

CAST - Make string length to be 2 characters long

I need to combine two fields but force the characters of the second string to be 2 characters.
I'm combining a year field and month field and want the result to be YYYY_MM. Forcing any single months (e.g. 1,2,3,4) into a two digit format e.g. (01).
Below is my formula for combining the fields, but I need help making the month two digits.
Thanks, L
WITH so_header(soh_build_year,soh_build_week) AS (
SELECT 2020, 3
UNION ALL SELECT 2020,13
)
SELECT
CAST(SO_HEADER.SOH_Build_Year AS VARCHAR)
+'_'
+CAST(SO_HEADER.SOH_Build_Week AS VARCHAR) as [Build YYYY_WW]
FROM so_header;
Try this out (Syntax: SQL Server)
SELECT
CAST(2019 AS VARCHAR)
+'_'
+CAST(format (1, '0#') AS VARCHAR) as [Build YYYY_WW]
Replace your values with your variables
Try this:
WITH so_header(soh_build_year,soh_build_week) AS (
SELECT 2020, 3
UNION ALL SELECT 2020,13
)
SELECT
CAST(SO_HEADER.SOH_Build_Year AS VARCHAR)
+ '_'
+ SUBSTR(
CAST(100+SO_HEADER.SOH_Build_Week AS VARCHAR)
, 2
, 2
) as Build_YYYY_WW
FROM so_header;
-- out Build_YYYY_WW
-- out ---------------
-- out 2020_03
-- out 2020_13
If you are using SQL Server never use varchar (or related types) with no length. The default varies by context and may not be large enough for what you want.
If you are trying to convert a date to YYYY_MM format, you can use format():
select format(getdate(), 'yyyy_MM')
I recommend using dates, if they are available. If you are not using SQL Server, most other databases have similar functionality.
If not, you can simply use:
select concat(so_header.SOH_Build_Year, '_',
right(concat('00', so_header.soh_build_week), 2)
)
from so_header;
concat() does not require explicitly converting the values to strings.
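For versions where concat() is not available, the same padding idea can be sketched with explicit casts. This is only an illustration in T-SQL that reuses the sample CTE from the question and gives every VARCHAR an explicit length, per the advice above.

```sql
WITH so_header(soh_build_year, soh_build_week) AS (
    SELECT 2020, 3
    UNION ALL SELECT 2020, 13
)
SELECT CAST(soh_build_year AS VARCHAR(4))
       + '_'
       -- prepend a zero, then keep only the last two characters
       + RIGHT('0' + CAST(soh_build_week AS VARCHAR(2)), 2) AS build_yyyy_ww
FROM so_header;
-- returns 2020_03 and 2020_13
```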

REGEX Date Match Format

I currently have a dataset with varying date entries (and a mixture of string entries) that I need to parse. There are a few formats: 'M/DD/YY', 'M/D/YY', 'MM/DD/YY', 'MM/D/YY', 'MM/DD/YYYY'... I could use some support with improving my regex to handle the varying formats and possible text entered in the date field.
My current Postgres query breaks out other entries into another column and reformats the date. Although, I've increased the year to 4 digits rather than 2, I believe the issue may live somewhere in the 'YYYY-MM-DD' formatting or that my query does not properly accommodate additional formatting within.
CASE WHEN date ~ '^\\\\d{1,2}/\\\\d{1,2}/\\\\d{4}$' THEN TO_DATE(date::date, 'YYYY-MM-DD')
ELSE NULL END AS x_date,
CASE WHEN NOT date ~ '^\\\\d{1,2}/\\\\d{1,2}/\\\\d{4}$' AND date <> '' THEN date
ELSE NULL END AS x_date_text
For the various date formats, they should be reformatted accordingly and for other non-date values, they should be moved over to the other column.
Based on your list of formats, I believe that just two regexes should be enough to check the values:
'^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$' would map to date format 'MM/DD/YYYY'
'^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$' would map to 'MM/DD/YY'
You can use a CASE construct to check the value against the regex and apply the proper mask when using TO_DATE().
However, since you need to split the data over two columns, you would need to tediously repeat the CASE expression twice, one for each column.
One way to simplify the solution (and to make it easier to maintain afterwards) would be to use a CTE to list the regexes and the associated date format. You can LEFT JOIN the CTE with the table.
Consider the following query:
WITH vars AS (
SELECT '^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$' reg, 'MM/DD/YYYY' format
UNION ALL SELECT '^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$', 'MM/DD/YY'
)
SELECT
CASE WHEN vars.reg IS NOT NULL THEN TO_DATE(t.date, vars.format) END x_date,
CASE WHEN vars.reg IS NULL THEN t.date END x_date_text
FROM
mytable t
LEFT JOIN vars ON t.date ~ vars.reg
If more regex/format pairs are needed, you just have to expand the CTE. Just make sure the regexes are mutually exclusive (i.e. no two regexes can match the same value); otherwise you will get duplicated records in the result.
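One way to check for overlapping regexes before relying on the join is to count how many patterns match each value. The following Postgres sketch uses a deliberately overlapping first pattern and two made-up sample values; both are assumptions for illustration only.

```sql
WITH vars(reg, format) AS (
    VALUES ('^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}$', 'MM/DD/YYYY'),  -- deliberately overlaps the next pattern
           ('^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$',   'MM/DD/YY')
),
mytable(date) AS (
    VALUES ('6/3/19'), ('oops')
)
SELECT t.date, count(v.reg) AS matching_patterns
FROM mytable t
LEFT JOIN vars v ON t.date ~ v.reg
GROUP BY t.date;
-- '6/3/19' matches both patterns (count 2), 'oops' matches none (count 0)
```

Any value with a count above 1 would come back as duplicated rows in the main query.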
While the regex by @GMB ensures format validity, it passes many invalid dates, and the liberal to_date conversion in Postgres could introduce errors and/or confusion. Run the following to see the liberal conversion:
set datestyle = 'ISO';
select dd,'01/' || dd || '/2019' mmddyyyy, to_date ( '01/' || dd || '/2019', 'mm/dd/yyyy')
from ( select generate_series( 0,40)::text dd) d;
select mm , mm ||'/01/2019' mmddyyyy, to_date ( mm ||'/01/2019', 'mm/dd/yyyy')
from ( select generate_series( 0,40)::text mm) d;
If that liberal date conversion is acceptable, great. But if not, we can tighten it down considerably (although still not with 100% valid results). Let's break the format down:
for date formats mm/dd/yyyy or mm/dd/yy
MM    valid 1 - 12
      valid characters: optional 0 followed by 1-9
                        1 followed by 0-2
      regex (0?[1-9]|1[0-2])
DD    valid 1 - 31 (sort of)
      day 31 of April, June, Sep, Nov also evaluates as valid but becomes
      day 1 of May, July, Oct, Dec respectively
      days 29-31 of Feb also evaluate as valid but become days
      1-3 of March in non-leap years (days 30-31 become days 1-2 of March in leap years)
      valid characters: optional 0 followed by 1-9
                        1-2 followed by 0-9
                        3 followed by 0-1
      regex (0?[1-9]|[1-2][0-9]|3[0-1])
YEAR  intended 1900 - 2999 (no ancient history)
      valid characters: 1-2 followed by 0-9,0-9,0-9 (4-digit year; note this actually accepts 1000 - 2999)
                        0-9,0-9 (2-digit year)
Now putting that together we get.
-- setup
drop table if exists my_dates;
create table my_dates(test_date text, status text);
insert into my_dates (test_date, status)
values ('01/15/2019', 'valid')
, ('12/25/0001', 'invalid year < 1900')
, ('12/01/2020', 'valid')
, ('oops', 'yea a date NOT')
, ('6/3/19', 'valid')
, ('2/29/2019', 'valid sort of, Postgres liberal evaluation of to_date')
, ('2/30/2019', 'valid sort of, Postgres liberal evaluation of to_date')
, ('2/31/2019', 'valid sort of, Postgres liberal evaluation of to_date')
, ('2/29/2020', 'valid')
, ('14/29/2020', 'invalid month 14')
, ('01/32/2019', 'invalid day 32')
, ('04/31/2019', 'valid sort of, Postgres liberal evaluation of to_date')
;
-- as query
set datestyle = 'ISO';
with patterns (pat, fmt) as (values ('^(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])/[12][0-9]{3}$'::text, 'mm/dd/yyyy')
, ('^(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])/[0-9]{2}$'::text, 'mm/dd/yy')
)
select to_date(test_date, fmt),status, test_date, pat, fmt
from my_dates
left join patterns on test_date ~ pat;
------------------------------------------------------------------
-- function accessible from SQL
create or replace function parse_date(check_date_in text)
returns date
language sql
as $$
with patterns (pat, fmt) as (values ('^(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])/[12][0-9]{3}$'::text, 'mm/dd/yyyy')
, ('^(0?[1-9]|1[0-2])/(0?[1-9]|[1-2][0-9]|3[0-1])/[0-9]{2}$'::text, 'mm/dd/yy')
)
select to_date(check_date_in, fmt)
from patterns
where check_date_in ~ pat;
$$;
--- test function
select test_date, parse_date(test_date), status from my_dates;
-- use demo
select * from my_dates
where parse_date(test_date) >= date '2020-01-02';

Find data with specific date and month only

I am trying to find data using a WHERE clause on day and month only, but I am receiving an error. Can anyone help me with this?
select *
from my_data
where date BETWEEN '11-20' AND '12-15'
I am using MS SQL Server Management Studio.
I am receiving this error:
Conversion failed when converting date and/or time from character string
Most databases support functions to extract components of dates. So, one way of doing what you want is to convert the values to numbers and make a comparison like this:
where month(date) * 100 + day(date) between 1120 and 1215
The functions for extracting date parts differ by database, so your database might have somewhat different methods for doing this.
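As a quick illustration of the month * 100 + day trick, here is a T-SQL sketch run against a handful of made-up dates (the inline VALUES list stands in for my_data and is purely hypothetical).

```sql
SELECT d.[date],
       MONTH(d.[date]) * 100 + DAY(d.[date]) AS mmdd  -- e.g. November 20 becomes 1120
FROM (VALUES (CAST('2013-11-19' AS date)),
             (CAST('2014-11-20' AS date)),
             (CAST('2015-12-15' AS date)),
             (CAST('2015-12-16' AS date))) AS d([date])
WHERE MONTH(d.[date]) * 100 + DAY(d.[date]) BETWEEN 1120 AND 1215;
-- keeps 2014-11-20 (1120) and 2015-12-15 (1215), whatever the year
```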
The conversion is failing because you are not specifying a year. If you specify '11-20-2015', your query will work; just insert whatever year you need.
SELECT *
FROM my_data
WHERE date BETWEEN '11-20-2015' AND '12-15-2015'
Alternatively, if you want data from that range of dates for multiple years, I would use a WHILE loop to insert the rows into a # (temp) table and then read from that table. Depending on the amount of data this could be quick or sloooowww. Here is an example:
DECLARE @mindatestart date, @mindateend date, @maxdatestart date
SET @mindatestart = '11-20-2010'
SET @mindateend = '12-15-2010'
SET @maxdatestart = '11-20-2015'
-- Create an empty temp table with my_data's columns plus an integer year column
SELECT TOP 0 *, [year] = 0
INTO #mydata
FROM my_data
-- <= so that the final year (2015) is included
WHILE @mindatestart <= @maxdatestart
BEGIN
INSERT INTO #mydata
SELECT *, YEAR(@mindatestart)
FROM my_data
WHERE date BETWEEN @mindatestart AND @mindateend
SET @mindatestart = DATEADD(Year, 1, @mindatestart)
SET @mindateend = DATEADD(Year, 1, @mindateend)
END
This will loop and insert the data from 2010-2015 for those date ranges and add an extra column on the end, so you can query the data and order by year if you want, like this:
SELECT * FROM #mydata ORDER BY [year]
Hopefully some part of this helps!
FROM THE COMMENT BELOW
SELECT *
FROM my_data
WHERE DAY(RIGHT(date, 5)) between DAY(11-20) and DAY(12-15)
The reason '11-20' doesn't work is that it is a character string (which is why you have to put it between ' ') that cannot be converted to a date. Without the quotes, 11-20 is plain integer arithmetic: it evaluates to -9, which SQL Server then treats as a number of days relative to 1900-01-01, and MONTH() returns the month of the resulting date. You can see this by using these queries:
SELECT MONTH(11-20) --Returns 12
SELECT MONTH(11-20-2015) -- Returns 6
SELECT MONTH(11-20-2014) -- Returns 6
Using RIGHT(date, 5) you only get month-day; you then take the day value of that with DAY(RIGHT(date, 5)), and you should get something that in theory falls within those date ranges regardless of the year. However, I'm not sure how accurate the data will be, and it's a lot of work just to avoid adding an additional 8 characters to your original query.
Since you only care about month and day, but not year, you need to use DATEPART to split up the date. Note that the month and day conditions have to be paired per month; otherwise no row can satisfy both the lower and the upper bound. Try this:
select *
from my_data
WHERE (DATEPART(m, date) = 11 AND DATEPART(d, date) >= 20)
OR (DATEPART(m, date) = 12 AND DATEPART(d, date) <= 15)

Recursive Query for Date Range

I'm trying to create a query so that I can generate a date range between a specific start and end point.
I have the following:
WITH DATE_RANGE(DATE_FOR_SHIFT)
AS (SELECT DATE('2015-04-01')
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATE_FOR_SHIFT + 1 DAY
FROM DATE_RANGE
WHERE DATE_FOR_SHIFT <= @END)
SELECT DATE_FOR_SHIFT
FROM DATE_RANGE;
Output (assuming that @END equals 2015-05-01):
2015-04-01
2015-04-02
2015-04-03
2015-04-04
...
2015-05-01
The output is correct, but I want to be able to change the start and end points based on parameters provided, rather than having to rewrite the query or end up with a query prone to SQL injection.
How would I rewrite this query in order to accomplish this?
Your SELECT is fine, other than the hard coded start date.
What I think you're missing is wrapping it in either a stored procedure or user defined table function (UDTF). Assuming you'll want to JOIN the date range to other tables, I'd suggest a UDTF.
create function date_range (@str date, @end date)
returns table (date_for_shift date)
language SQL
reads SQL data
return
WITH DATE_RANGE(DATE_FOR_SHIFT)
AS (SELECT @str
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATE_FOR_SHIFT + 1 DAY
FROM DATE_RANGE
WHERE DATE_FOR_SHIFT <= @END)
SELECT DATE_FOR_SHIFT
FROM DATE_RANGE;
Then you'd call it...
select * from table(date_range(date('2015-04-01'),date('2015-05-01'))) as tbl;
However, instead of generating this date range on the fly, consider simply creating a calendar (aka dates) table: basically just a table with dates from, say, 1900-01-01 to 2500-12-31, or whatever you'd like. Besides the date column, you can include lots of additional columns such as business_day, holiday, etc. that make life much easier.
Google "SQL calendar table" for plenty of examples.
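A minimal sketch of what such a table and the equivalent range query might look like in DB2 follows. The table name, the column types, and the 0/1 flag convention are assumptions for illustration, and the table still has to be populated once.

```sql
-- Hypothetical calendar table; names and flag columns are illustrative only.
CREATE TABLE calendar (
    calendar_date DATE     NOT NULL PRIMARY KEY,
    business_day  SMALLINT NOT NULL,  -- 1 = business day, 0 = not
    holiday       SMALLINT NOT NULL   -- 1 = holiday, 0 = not
);

-- Once the table is populated, a date range is a plain range query.
SELECT calendar_date AS date_for_shift
FROM calendar
WHERE calendar_date BETWEEN DATE('2015-04-01') AND DATE('2015-05-01')
ORDER BY calendar_date;
```

With this approach the start and end points become ordinary parameters of a non-recursive query.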
A bit of playing with this in perl gives me:
#!/opt/myperl/5.20.2/bin/perl
use 5.10.0;
use DBI;
use DBD::DB2;
use Data::Dump;
my $sql = <<EOSQL;
WITH DATE_RANGE(DATE_FOR_SHIFT)
AS (SELECT CAST(? AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATE_FOR_SHIFT + 1 DAY
FROM DATE_RANGE
WHERE DATE_FOR_SHIFT < CAST(? AS DATE))
SELECT DATE_FOR_SHIFT
FROM DATE_RANGE;
EOSQL
my $dbh = DBI->connect('dbi:DB2:sample');
my $sth = $dbh->prepare_cached($sql);
$sth->execute('2015-04-01','2015-05-01');
my $rc = $sth->fetchall_arrayref();
dd($rc);
This does give an error during prepare ("The recursive common table expression "MYSCHEMA.DATE_RANGE" may contain an infinite loop") that I haven't figured out yet, but the fetch does work, the final return goes from 04-01 to 05-01. Hopefully you can port this to your desired language.

Date arithmetic in SQL on DB2/ODBC

I'm building a query against a DB2 database, connecting through the IBM Client Access ODBC driver. I want to pull fields that are less than 6 days old, based on the field 'a.ofbkddt'... the problem is that this field is not a date field, but rather a DECIMAL field, formatted as YYYYMMDD.
I was able to break down the decimal field by wrapping it in a call to char(), then using substr() to pull the year, month and day fields. I then formatted this as a date, and called the days() function, which gives a number that I can perform arithmetic on.
Here's an example of the query:
select
days( current date) -
days( substr(char(a.ofbkddt),1,4) concat '-' -- YYYY-
concat substr(char(a.ofbkddt),5,2) concat '-' -- MM-
concat substr(char(a.ofbkddt),7,2) ) as difference, -- DD
a.ofbkddt as mydate
from QS36F.ASDF a
This yields the following:
difference mydate
2402 20050402
2025 20060306
...
4 20110917
3 20110918
2 20110919
1 20110920
This is what I expect to see... however when I use the same logic in the where clause of my query:
select
days( current date) -
days( substr(char(a.ofbkddt),1,4) concat '-' -- YYYY-
concat substr(char(a.ofbkddt),5,2) concat '-' -- MM-
concat substr(char(a.ofbkddt),7,2) ) as difference, -- DD
a.ofbkddt as mydate
from QS36F.ASDF a
where
(
days( current date) -
days( substr(char(a.ofbkddt),1,4) concat '-' -- YYYY-
concat substr(char(a.ofbkddt),5,2) concat '-' -- MM
concat substr(char(a.ofbkddt),7,2) ) -- DD
) < 6
I don't get any results back from my query, even though it's clear that I am getting date differences of as little as 1 day (obviously less than the 6 days that I'm requesting in the where clause).
My first thought was that the return type of days() might not be an integer, causing the comparison to fail... according to the documentation for days() found at http://publib.boulder.ibm.com/iseries/v5r2/ic2924/index.htm?info/db2/rbafzmst02.htm, it returns a bigint. I cast the difference to integer, just to be safe, but this had no effect.
You're going about this backwards. Rather than using a function on every single value in the table (so you can compare it to the date), you should pre-compute the difference in the date. It's costing you resources to run the function on every row - you'd save a lot if you could just do it against CURRENT_DATE (it'd maybe save you even more if you could do it in your application code, but I realize this might not be possible). Your dates are in a sortable format, after all.
The query looks like so:
SELECT ofbkddt as myDate
FROM QS36F.ASDF
WHERE ofbkddt > ((int(substr(char(current_date - 6 days, ISO), 1, 4)) * 10000) +
(int(substr(char(current_date - 6 days, ISO), 6, 2)) * 100) +
(int(substr(char(current_date - 6 days, ISO), 9, 2))))
Which, when run against your sample datatable, yields the following:
myDate
=============
20110917
20110918
20110919
20110920
You might also want to look into creating a calendar table, and add these dates as one of the columns.
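The same cutoff can also be built arithmetically, without the string round trip. This is only a sketch of an equivalent form using DB2's YEAR(), MONTH() and DAY() functions; it assumes ofbkddt always holds a well-formed YYYYMMDD value.

```sql
SELECT ofbkddt AS mydate
FROM QS36F.ASDF
-- build the YYYYMMDD number for "6 days ago" and compare the raw decimal column to it
WHERE ofbkddt > YEAR(CURRENT DATE - 6 DAYS) * 10000
              + MONTH(CURRENT DATE - 6 DAYS) * 100
              + DAY(CURRENT DATE - 6 DAYS);
```

Either way, the column itself stays untouched in the comparison, so the cutoff is computed from the current date rather than from each row's value.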
What if you try a common table expression?
WITH A AS
(
select
days( current date) -
days( substr(char(a.ofbkddt),1,4) concat '-' -- YYYY-
concat substr(char(a.ofbkddt),5,2) concat '-' -- MM-
concat substr(char(a.ofbkddt),7,2) ) as difference, -- DD
a.ofbkddt as mydate
from QS36F.ASDF a
)
SELECT
*
FROM
a
WHERE
difference < 6
Does your data have some nulls in a.ofbkddt? Maybe this is causing some funny behaviour in how db2 is evaluating the less than operation.