How to update dates stored as varying character formats (PL/SQL)? - sql

Problem: I have a large database table (~500k records) which has a list of dates stored in a varchar2(15) column. These dates are stored in varying formats, i.e. some are yyyy-mm-dd, some are mm/dd/yyyy, some are dd/mm/yy, some are mm/dd/yy, etc. For example:
1994-01-13
01/13/1994
01/13/94
13/01/94
13/01/1994
etc
I need to be able to shift these dates slightly, for example to add 30 days to each date. (This is an oversimplification of my objective but it's easier to explain this way).
If all the dates were formatted consistently, I would achieve this as follows:
UPDATE history_table
SET some_date_col =
to_char(to_date(some_date_col, 'mm/dd/yyyy')+30, 'mm/dd/yyyy')
WHERE some_date_col IS NOT NULL;
Due to the size of the database, I cannot afford to loop through the values one by one and parse the date value. Can anyone suggest a means to accomplish this without loops, i.e. with a mass UPDATE statement?

Are the formats of these dates really that important? They should be datetime columns. Then you could just use date math functions on that field.

Well, you've got a real problem here.
07/07/1994 is valid for 'MM/DD/YYYY' and 'DD/MM/YYYY'
However, outside of that issue, you can try nesting decodes.
I entered the following dates into a varchar field:
01/12/2009, 01-12-2009, 2009-01-12, 01/12/09
and using the below, I was consistently returned 1/12/2009. You'll have to figure out all the patterns possible and keep nesting decodes. The other thing you could do is create a function to handle this. Within the function, you can check with a little more detail as to the format of the date. It will also be easier to read. You can use the function in your update statement so that should be faster than looping through, as you mentioned.
(For what it's worth, looping through 500k rows like this shouldn't take very long. I regularly have to update tables of 12 million records row by row.)
select mydate,
decode(instr(mydate,'-'),5,to_date(mydate,'YYYY-MM-DD'),3,to_date(mydate,'MM-DD-YYYY'),
decode (length(mydate),8,to_date(mydate,'MM/DD/YY'),10,to_date(mydate,'MM/DD/YYYY')))
from mydates;
and here is the update statement:
update mydates set revdate = decode(instr(mydate,'-'),5,to_date(mydate,'YYYY-MM-DD'),3,to_date(mydate,'MM-DD-YYYY'),
decode (length(mydate),8,to_date(mydate,'MM/DD/YY'),10,to_date(mydate,'MM/DD/YYYY')))
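The "create a function" suggestion above can be sketched outside the database as well. The following is a minimal Python illustration, not the answerer's code: the `PATTERNS` table and the `shift_date` name are hypothetical, regular expressions stand in for the `instr`/`length` checks, and slash-separated dates are assumed to be month-first, which is exactly the ambiguity discussed in this thread.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern table: (shape of the text, matching strptime format).
# Assumption: slash-separated dates are month-first (MM/DD), as in the decode above.
PATTERNS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "%Y-%m-%d"),
    (re.compile(r"^\d{2}-\d{2}-\d{4}$"), "%m-%d-%Y"),
    (re.compile(r"^\d{2}/\d{2}/\d{4}$"), "%m/%d/%Y"),
    (re.compile(r"^\d{2}/\d{2}/\d{2}$"), "%m/%d/%y"),
]


def shift_date(text, days=30):
    """Shift a date string by `days`, returning it in the format it arrived in."""
    for pattern, fmt in PATTERNS:
        if pattern.match(text):
            # strptime raises ValueError for impossible values such as month 13.
            shifted = datetime.strptime(text, fmt) + timedelta(days=days)
            return shifted.strftime(fmt)
    raise ValueError(f"unrecognised date format: {text!r}")
```

A value such as 13/01/94 matches the last pattern but fails to parse as month-first, so it raises an error rather than being shifted silently under the wrong interpretation.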

IMHO, you have a bigger problem:
If some dates are dd/mm/yyyy and some are mm/dd/yyyy, how can you tell which format applies to a given date?
For example, how can I know if a value "12/09/2008" means December or September?
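One way to measure how big this problem is would be to count the values that parse under more than one format. A small Python sketch, with the hypothetical helper `matching_formats` and only the two four-digit-year slash formats considered:

```python
from datetime import datetime

# The two competing interpretations of a slash-separated date.
FORMATS = ["%m/%d/%Y", "%d/%m/%Y"]


def matching_formats(text):
    """Return every format in FORMATS under which `text` is a valid date."""
    matches = []
    for fmt in FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # not valid under this interpretation
        matches.append(fmt)
    return matches
```

A value with two matches (any date whose first two fields are both 12 or less) cannot be resolved from the text alone and needs outside information, such as the source system that wrote the row.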

Related

Simple way to standardize varchar "dates" to single date format in SQL?

I have a table in a postgres database with a varchar date column that mixes MM/DD/YY with MM/DD/YYYY data formats. For example:
1/17/89
1/28/2018
12/30/2006
10/1/17
I'd like all of the dates to follow a MM/DD/YYYY format:
1/17/1989
1/28/2018
12/30/2006
10/1/2017
I'm aware it's not a best practice for dates to be in a varchar field, but I did not create this table and I cannot change the data type. Is it possible to use SQL to make this kind of change to my table?
I'm aware of similar questions like this one, but this seems a bit more than what I'm looking for, and I can't seem to extract an answer from it that's appropriate for my issue.
This question seems closer to what I'm looking for, but again, I can't seem to implement the answer. How would it know which table and field to make changes to? (I'm a total SQL noob if you can't tell).
You can try to use this code:
SELECT to_char(to_date(my_date,'MM/DD/YY')::TIMESTAMP, 'MM/DD/YYYY') as new_varchar_date FROM my_table;
-- to update the actual values
UPDATE my_table SET my_date = to_char(to_date(my_date,'MM/DD/YY')::TIMESTAMP, 'MM/DD/YYYY');
First convert the varchar to a date, then to a timestamp, and then back to a varchar.
Result should look like:
01/17/1989
01/28/2018
12/30/2006
10/01/2017
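For comparison, the same normalization can be sketched in Python. This is an illustration rather than part of the answer; `normalize` is a hypothetical name, and the century chosen for two-digit years follows Python's own rule (69 to 99 become 19xx, 00 to 68 become 20xx), which may differ from the database's rule.

```python
from datetime import datetime


def normalize(text):
    """Rewrite an M/D/YY or M/D/YYYY string as zero-padded MM/DD/YYYY."""
    # Try the four-digit-year format first; %Y does not accept two-digit years,
    # and %y rejects four-digit ones, so exactly one of the two can succeed.
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")
```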

SQL - How to search WHERE but ignore first two characters

I need to perform a date search but the data is a String with the format
'dd/mm/yyyy'
I want to search only for 'mm/yyyy'
For example I want all records that have '07/2014' regardless of what day?
I'm sure it's something simple, I just can't figure it out.
EDIT:
It looks like the format is MM/DD/YYYY
Looks like I got this sorted; I just used:
RIGHT(BookedDate,5) = '/2014'
AND LEFT (BookedDate,2) = '7/'
Thanks All :)
If your string is in the format of dd/mm/yyyy always, as in 01/09/2014 you could use right:
declare @val as varchar(10)
set @val = '01/02/2014'
select RIGHT(@val, 7)
if you are not sure of the format but know that there is a / you could search for it:
declare @val as varchar(10)
set @val = '1/2/2014'
select right(@val, len(@val) - patindex('%/%', @val))
myfield like '%/07/2014'
Beware, since the wildcard (%) is put at the beginning of the query no indexes (if they exist) will be used. This will always be a full table scan.
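The LIKE approach can be tried out against an in-memory SQLite database from Python. This is only a sketch: the `bookings` table and its rows are made up, and the values are assumed to be zero-padded dd/mm/yyyy as in the original question (not the MM/DD/YYYY of the later edit).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (BookedDate TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO bookings VALUES (?)",
    [("01/07/2014",), ("15/07/2014",), ("07/08/2014",), ("30/07/2013",)],
)

# The leading wildcard stands in for the day, so any day in 07/2014 matches.
rows = conn.execute(
    "SELECT BookedDate FROM bookings WHERE BookedDate LIKE ? ORDER BY BookedDate",
    ("%/07/2014",),
).fetchall()
july_2014 = [row[0] for row in rows]
```

Here `july_2014` holds only the two July 2014 rows; 07/08/2014 is skipped because the pattern is anchored to the end of the string.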
If you store your date values in a character-based column, then jyparask's answer is good enough, but if you store them in a date/time-based column, then use date/time functions or intervals:
WHERE
myDateColumn >= '01/07/2014'
AND myDateColumn < '01/08/2014'
The above WHERE condition means: all values in July, 2014.
This will ensure that, because it's a string, if the value is longer than expected the first three characters will always be removed.
SELECT RIGHT(field, LEN(field)-3) FROM database
This feels like a very bad idea. Most likely there is a ton of optimizations that could be done automatically for your queries by the database if you used Date instead of the String.
This is certainly going to be some kind of bottleneck if your database grows, it would have to ask and parse every single row to find out if it matches your request.

Query "Select max(date) from table where date <= somedate" not working

I am querying a SQLite database table as follows:
SELECT MAX(Date) from Intra360 WHERE Date <= "05/04/2013 00:00"
The right record in return should be the number 47, i.e. 04/04/2013 23:00:
However, the execution of this statement returns a different value:
I confess I know almost nothing about SQL, but this outcome is strange. Where am I being wrong?
NOTE "Intra360" is the name of the table and the field containing the dates is called "Date"
ADDITIONAL NOTE what I need is the closest available date to a user input. It is a Python program that does some analysis, but the dates the user inputs will not necessarily exist in the database. So I'm just trying to re-select them so that the SQL statement that loads the data for the analysis won't fail because of a missing record. So "05/04/2013 00:00" is the user input, and the query should therefore return 04/04/2013 (and definitely not 04/06/2013).
The comparisons are performed on strings with alphabetical ordering, not on datetime stamps with chronological ordering.
Store your datetimes in a format that compares the way you want. For example, unix epoch timestamps and ISO 8601 yyyy-MM-dd'T'HH:mm:ss datetimes have this property.
If you cannot influence how the data is stored, you can use substr() to mangle the timestamps in SQL. See e.g. Sqlite convert string to date for more.
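A sketch of the substr() approach, run from Python against an in-memory SQLite table. The sample rows are invented, and the stored text is assumed to be 'DD/MM/YYYY HH:MM' (inferred from the question, not confirmed); `ISO_KEY` and `to_iso` are hypothetical helper names.

```python
import sqlite3

# SQL expression that reorders 'DD/MM/YYYY HH:MM' text into 'YYYY-MM-DD HH:MM',
# a layout whose alphabetical order is also its chronological order.
ISO_KEY = (
    "substr(Date,7,4) || '-' || substr(Date,4,2) || '-' || "
    "substr(Date,1,2) || ' ' || substr(Date,12,5)"
)


def to_iso(text):
    """Apply the same reordering to the user's input on the Python side."""
    return f"{text[6:10]}-{text[3:5]}-{text[0:2]} {text[11:16]}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Intra360 (Date TEXT)")
conn.executemany(
    "INSERT INTO Intra360 VALUES (?)",
    [("03/04/2013 22:00",), ("04/04/2013 23:00",),
     ("04/06/2013 10:00",), ("06/04/2013 01:00",)],
)

user_input = "05/04/2013 00:00"

# Plain string comparison: '04/06/...' sorts below '05/04/...' and wins MAX().
naive = conn.execute(
    "SELECT MAX(Date) FROM Intra360 WHERE Date <= ?", (user_input,)
).fetchone()[0]

# Comparison on the reordered key gives the chronologically closest earlier row.
closest = conn.execute(
    f"SELECT Date FROM Intra360 WHERE {ISO_KEY} <= ? "
    f"ORDER BY {ISO_KEY} DESC LIMIT 1",
    (to_iso(user_input),),
).fetchone()[0]
```

With these rows, `naive` is the June record while `closest` is 04/04/2013 23:00, which reproduces the behaviour described in the question.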

Unknown SQL coding issue in Oracle SQL Developer

I'm writing an SQL statement that is supposed to do a count based on a date range. But for some reason no data is being returned. Before I try to filter the count with my date range, everything works fine. Here is that code.
SELECT
CR.GCR_RFP_ID
,S.RFP_RECEIVED_DT
,CR.GCR_RECEIVED_DT
,CT.GCT_LOB_IND
FROM ADM.GROUP_CHANGE_TASK_FACT CT
JOIN ADM.B_GROUP_CHANGE_REQUEST_DIM CR
ON CR.GROUP_CHANGE_REQUEST_KEY = CT.GROUP_CHANGE_REQUEST_KEY
JOIN ADM.B_RFP_WC_COVERAGE_DIM S
ON S.RFP_ID = CR.GCR_RFP_ID
WHERE CT.GCT_LOB_IND = 'WC'
AND CR.GCR_CHANGE_TYPE_ID IN ('10','20','30','50','60','70','80','90','100','110',
'120','130','140', '150','160','170','180','190','200',
'210','220','230','240','260','270','280','300','310',
'320','330','340','350','360','370','371','372')
AND S.RFP_AUDIT_IND = 'N'
AND S.RFP_TYPE_IND = 'A'
The date field I'm using is called CR.GCR_RECEIVED_DT. This is a new field in the db and all the records are 01-JAN-00. But I'm still doing the count just to make sure I can grab the data. Now, I added this line:
AND CR.GCR_RECEIVED_DT LIKE '01-JAN-00'
just as a random test thing. I know all the dates are the same. And it works fine, no issues. So I remove that line and replace it with this:
AND CR.GCR_RECEIVED_DT BETWEEN '31-DEC-99' AND '02-JAN-00'
I used this small range to keep it simple. But even though 01-JAN-00 definitely falls between those two dates, no data is returned. I have no idea why this is happening. I even tried this line too:
AND CR.GCR_RECEIVED_DT = '01-JAN-00'
and I still don't get data returned. It only seems to work with LIKE. I have checked and the field is a date type. Any help would be much appreciated.
If your NLS_DATE_FORMAT is set to DD-MON-YY then the apparent discrepancy between the first two results can be explained.
When you use LIKE it implicitly converts the date value on the left-hand side to a string for the comparison, using the default format model, and then compares that to the fixed string; and '01-JAN-00' is like '01-JAN-00'. You're effectively doing:
AND TO_CHAR(CR.GCR_RECEIVED_DT, 'DD-MON-YY') LIKE '01-JAN-00'
Using LIKE to compare dates doesn't really make any sense though. When you use BETWEEN, though, the left-hand side is being left as a date, so you're effectively doing:
AND CR.GCR_RECEIVED_DT BETWEEN TO_DATE('31-DEC-99', 'DD-MON-YY')
AND TO_DATE('02-JAN-00', 'DD-MON-YY')
... and TO_DATE('31-DEC-99', 'DD-MON-YY') is December 31st 2099, not 1999. BETWEEN only works when the first value is lower than the second (from the docs, 'If expr3 < expr2, then the interval is empty'). So you're looking for values between 2099 and 2000, and that will always be empty. If your date model was DD-MON-RR, from the NLS parameter or explicitly via TO_DATE, then it would be looking for values between 1999 and 2000, and would find your records.
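The difference between the YY and RR models can be illustrated with a short Python sketch. This models the documented century rules by hand and is not Oracle code; `yy_year` and `rr_year` are hypothetical names, and the current year is passed in explicitly so the result does not depend on when the code runs.

```python
from datetime import date


def yy_year(two_digit, current_year):
    """YY: the two digits are placed in the current century."""
    return current_year // 100 * 100 + two_digit


def rr_year(two_digit, current_year):
    """RR: pick the century that keeps the year near the current one."""
    century = current_year // 100 * 100
    given_low = two_digit < 50             # given year is 00-49
    current_low = current_year % 100 < 50  # current year ends in 00-49
    if given_low == current_low:
        return century + two_digit         # same half: current century
    if given_low:
        return century + 100 + two_digit   # 00-49 given late in a century: next
    return century - 100 + two_digit       # 50-99 given early in a century: previous


# With YY, the lower bound of the BETWEEN lands after the upper bound.
lower_yy = date(yy_year(99, 2024), 12, 31)
lower_rr = date(rr_year(99, 2024), 12, 31)
upper = date(rr_year(0, 2024), 1, 2)
```

Evaluated for a current year of 2024, `lower_yy` is in 2099 and exceeds `upper`, so the interval is empty, whereas `lower_rr` is in 1999 and the interval contains 1 January 2000.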
Your third result is a little more speculative but suggests that the values in your GCR_RECEIVED_DT field have a time component, or are not in the century you think. This is similar to the LIKE version, except this time the fixed string is being converted to a date, rather than the date being converted to a string; effectively:
AND CR.GCR_RECEIVED_DT = TO_DATE('01-JAN-00', 'DD-MON-YY')
If they were at midnight on 2000-01-01 this would work. Because it doesn't that suggests they are either some time after midnight, or maybe more likely - since you're using a 'magic' date in your existing records - they are another date entirely, quite possibly 1900-01-01.
Here are SQL Fiddles for just past midnight and 1900.
If the field will eventually have a time component for new records you might want to structure the condition like this, and use date literals to be a bit clearer (IMO):
AND CR.GCR_RECEIVED_DT >= DATE '2000-01-01'
AND CR.GCR_RECEIVED_DT < DATE '2000-01-02'
That will find any records at any time on 2000-01-01, and can use an index on that column if one is available. BETWEEN is inclusive, so using BETWEEN DATE '2000-01-01' AND DATE '2000-01-02' would include any records that are exactly at midnight on the later date, which you probably don't want.
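The boundary behaviour is easy to check with plain Python datetimes. This sketch only mirrors the two comparison shapes; the function names are made up for illustration.

```python
from datetime import datetime

START = datetime(2000, 1, 1)
END = datetime(2000, 1, 2)


def in_half_open(ts):
    """Mirrors: col >= DATE '2000-01-01' AND col < DATE '2000-01-02'."""
    return START <= ts < END


def in_between(ts):
    """Mirrors an inclusive BETWEEN over the same two bounds."""
    return START <= ts <= END
```

A timestamp at exactly midnight on 2000-01-02 is rejected by `in_half_open` but accepted by `in_between`, while anything during 2000-01-01 passes both.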
Whatever you end up doing, avoid relying on implicit conversions using NLS_DATE_FORMAT as one day it might not be set to what you expect, causing potentially data-corrupting or hard to find bugs; and specify the full four-digit year in the model if you can to avoid ambiguity.
Try something like this:
WHERE TRUNC(CR.GCR_RECEIVED_DT) = TO_DATE('01-JAN-00','DD-Mon-YY')
TRUNC without parameter removes hours, minutes and seconds from a DATE.

split string in sql query

I have a value in field called "postingdate" as string in 2009-11-25, 12:42AM IST format, in a table named "Post".
I need the query to fetch the details based on date range. I tried the following query, but it throws an error. Please guide me to fix this issue. Thanks in advance.
select postingdate
from post
where TO_DATE(postingDate,'YYYY-MM-DD')>61689
and TO_DATE(postingDate,'YYYY-MM-DD')<61691
As you've now seen, trying to perform any sort of query against a string column which represents a date is a problem. You've got a few options:
1. Convert the postingdate column to some sort of DATE or TIMESTAMP datatype. I think this is your best choice as it will make querying the table using this field faster, more flexible, and less error prone.
2. Leave postingdate as a string and use functions to convert it back to a date when doing comparisons. This will be a performance problem as most queries will turn into full table scans unless your database supports function-based indexes.
3. Leave postingdate as a string and compare it against other strings. Not a good choice as it's tough to come up with a way to do ranged queries this way, as I think you've found.
If it was me I'd convert the data. Good luck.
In SQL Server you can say
Select postingdate from post
where postingdate between '6/16/1969' and '6/16/1991'
If it's really a string, you're lucky that it's in YYYY-MM-DD format. You can sort and compare that format as a string, because the most significant numbers are on the left side. For example:
select *
from Posts
where StringDateCol between '2010-01-01' and '2010-01-02'
There's no need to convert the string to a date; the , 12:42AM IST appendage only matters at the upper bound, because a value such as '2010-01-02, 12:42AM IST' sorts after the bare '2010-01-02'. Unless, of course, your table contains dates from a different time zone :)
You will need to convert your string into a date before you run date range queries on it. You may get away with just using the string if you're not interested in the time portion.
The actual functions will depend on your RDBMS
for strings only
select * from posts
where LEFT(postingDate,10) > '2010-01-21'
or
for datetime ( Sybase example)
select * from posts
where convert(DateTime,postingDate) between '2010-01-21' and '2010-01-31'
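The "convert before comparing" idea can be sketched in Python for the exact string shape in the question. This is an illustration under assumptions: `parse_posting` and `in_range` are hypothetical names, the trailing zone label is simply dropped rather than interpreted, and AM/PM parsing relies on the default C/English locale.

```python
from datetime import datetime


def parse_posting(text):
    """Parse '2009-11-25, 12:42AM IST' into a datetime, ignoring the zone label."""
    stamp, _zone = text.rsplit(" ", 1)  # split off the trailing 'IST'
    return datetime.strptime(stamp, "%Y-%m-%d, %I:%M%p")


def in_range(text, start, end):
    """True when the parsed value falls in the half-open range [start, end)."""
    return start <= parse_posting(text) < end
```

For example, `in_range("2009-11-25, 12:42AM IST", datetime(2009, 11, 25), datetime(2009, 11, 26))` is true, since 12:42AM parses as 00:42 on the 25th.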