Invalid data error in Redshift - sql

I have a query I am running in Redshift that produces an error when I try to compare two dates. I have determined this is due to a data problem: the dates are stored as VARCHAR and some are empty strings. The best solution is clearly to fix this at the source, but while trying to build a workaround, I stumbled upon some very odd behavior.
To get around this, I preselect the dates that are not empty strings, cast them as dates, then build the integer date format (YYYYMMDD) and convert it to INT. This runs fine. However, if I try to compare the result with an integer in a WHERE clause, the query crashes with a data type error.
Here is a toy version of the working query
SELECT
date_id,
COUNT(*)
FROM
(
SELECT
CONVERT(int, date_id) AS date_id
FROM
(
SELECT
DATE_PART('year', start_dttm)*10000+DATE_PART('month', start_dttm)*100+DATE_PART('day', start_dttm) AS date_id
FROM
(
SELECT
CAST(start_dttm AS DATETIME) AS start_dttm
FROM
sfe.calendar_detail
WHERE
start_dttm <> ''
) cda
) cdb
) cd
GROUP BY
date_id
;
And here is the failed query
SELECT
date_id,
COUNT(*)
FROM
(
SELECT
CONVERT(int, date_id) AS date_id
FROM
(
SELECT
DATE_PART('year', start_dttm)*10000+DATE_PART('month', start_dttm)*100+DATE_PART('day', start_dttm) AS date_id
FROM
(
SELECT
CAST(start_dttm AS DATETIME) AS start_dttm
FROM
sfe.calendar_detail
WHERE
start_dttm <> ''
) cda
) cdb
) cd
WHERE
date_id >= 20170920
GROUP BY
date_id
;
As I mentioned above, the correct solution is to fix the data type and count empty dates as Nulls not empty strings, but I am very curious as to why the second query crashes on an invalid data type error.
Many Thanks!
Edit:
Here is the error
ERROR: Invalid digit, Value '1', Pos 0, Type: Integer
DETAIL:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Integer
code: 1207
context:
query: 2006739
location: :0
process: query0_39 [pid=0]
-----------------------------------------------

Rather than converting dates to the human-readable YYYYMMDD format, it is better to keep them as the DATE or TIMESTAMP type. This way, date operations can be easily performed (e.g. adding 5 days to a date). You can still use simple comparison operators by writing 'YYYYMMDD'::DATE.
Given that you are converting from a String, and casting to a Date seems to work, and that you have some empty strings, use this to convert it to a date:
SELECT
NULLIF(start_dttm, '')::DATE AS dt
FROM sfe.calendar_detail
WHERE dt > '20170920'::DATE
This will return a NULL if the string is empty, and a Date if it contains a date that could be converted.
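The NULLIF-then-cast idea can be sketched outside the database as well. The following Python is only an illustration of the logic, not Redshift code; the helper names and the sample values are made up, and it assumes the strings are in an ISO-like 'YYYY-MM-DD HH:MM:SS' layout, which the question does not confirm.

```python
from datetime import date, datetime
from typing import Optional


def nullif(value: str, unwanted: str) -> Optional[str]:
    """Mimic SQL NULLIF: return None when value equals unwanted."""
    return None if value == unwanted else value


def to_date_or_none(raw: str) -> Optional[date]:
    """Turn '' into None first, then convert what is left to a date."""
    cleaned = nullif(raw, "")
    if cleaned is None:
        return None
    # Assumption: ISO-like text such as '2017-09-25 08:30:00'.
    return datetime.fromisoformat(cleaned).date()


# Hypothetical sample values, including one empty string.
rows = ["2017-09-25 08:30:00", "", "2017-09-01 00:00:00"]
cutoff = date(2017, 9, 20)

# Compare as dates; rows that became None are simply skipped.
kept = [d for d in map(to_date_or_none, rows) if d is not None and d > cutoff]
```

Because the empty string becomes None before any conversion is attempted, the conversion step never sees a value it cannot parse.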

Related

Cannot check if varchar is BETWEEN date and date

I created a partition projection in Athena named 'dt', which is a STRING and contains date information in the format 2020/12/11/20.
I'm running the following query in Athena
SELECT
DATE_FORMAT(dt, '%Y-%m') as dt,
count(*) as "total_visualization",
count(*)/cast(date_format(DATE '{END_DATE}', '%d') as integer) as "average_dia"
FROM
user.dashborad
WHERE
event = 'complete'
AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'
GROUP BY 1;
The resulting raw query received by Athena is:
SELECT
DATE_FORMAT(dt, '%Y-%m') as dt,
count(*) as "total_visualization",
count(*)/cast(date_format(DATE '2022-08-08', '%d') as integer) as "average_day"
FROM user.dashborad
WHERE event = 'complete' AND dt BETWEEN DATE '2022-08-01' and DATE '2022-08-08'
GROUP BY 1;
However, I get the following error:
Error querying the database: SYNTAX_ERROR: line 2:62: Cannot check if varchar is BETWEEN date and date.
I've tried to find a workaround in an attempt to convert it into a date format using date_parse but it didn't work. And with str_to_date I get this error:
SYNTAX_ERROR: line 2:2: Function str_to_date not registered
Is there any other way I can modify the query to convert 'dt' from a varchar into a format Athena understands?
It is always a bad idea to store a date in a string instead of using the appropriate data type. You even call the column dt which suggests a datetime. This makes it harder to spot inappropriate handling.
Here
AND dt BETWEEN DATE '{START_DATE}' and DATE '{END_DATE}'
you compare a string with dates. Thus you rely on the DBMS guessing the string's date format correctly. Don't do this. Convert the string explicitly to a date, because you know the format. Or, as 'YYYY-MM-DD' strings sort in date order, work with the strings right away (this requires dt to be stored in that same format):
AND dt BETWEEN '{START_DATE}' and '{END_DATE}'
Here
DATE_FORMAT(dt, '%Y-%m')
you invoke a date function on a string. This means the DBMS must again guess your format, convert your string into a date accordingly and then invoke the function to convert the date into a string. Instead, just use the appropriate string function on the string:
SUBSTR(dt, 1, 7)
The complete query:
SELECT
SUBSTR(dt, 1, 7) AS year_month,
COUNT(*) AS total_visualization,
COUNT(*) / CAST(SUBSTR('{END_DATE}', 9, 2) AS INTEGER) AS average_dia
FROM
user.dashborad
WHERE
event = 'complete'
AND dt BETWEEN '{START_DATE}' and '{END_DATE}'
GROUP BY SUBSTR(dt, 1, 7)
ORDER BY SUBSTR(dt, 1, 7);
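Comparing dates as strings only matches date order when both sides use the same zero-padded, year-first layout. The Python sketch below (not Athena SQL; the function names are made up for illustration) shows this, and assumes the partition values really look like 2020/12/11/20, as described in the question.

```python
from datetime import date, datetime


def in_range_as_strings(dt: str, start: str, end: str) -> bool:
    """Plain lexicographic range check, like BETWEEN on two strings."""
    return start <= dt <= end


def partition_to_date(dt: str) -> date:
    """Parse a partition value in the assumed 'YYYY/MM/DD/HH' layout."""
    return datetime.strptime(dt, "%Y/%m/%d/%H").date()


# Same layout on both sides: string order agrees with date order.
same_layout = in_range_as_strings("2022-08-05", "2022-08-01", "2022-08-08")

# Mixed layouts: '/' sorts after '-', so the check fails even though
# the date itself lies inside the range.
mixed_layout = in_range_as_strings("2022/08/05/10", "2022-08-01", "2022-08-08")

# Parsing with the known layout first gives a reliable comparison.
parsed = partition_to_date("2022/08/05/10")
in_range_as_dates = date(2022, 8, 1) <= parsed <= date(2022, 8, 8)
```

If dt does use slashes, either the bounds have to be written in the same layout or the value has to be parsed with its known format before comparing.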

How to convert an int to DateTime in BigQuery

I have an INT64 column called "Date" which contains many different numbers like: "20210209" or "20200305". I want to turn those numbers into a date with this format: MM-YYYY (so in these cases, 02-2021 and 03-2020). Ultimately I want to sum all the data in each month together. The problem is that BigQuery can't convert INT64 to date, only to strings. I'm not sure if I should convert to a string and then to a date or if there is a better way.
Although converting to a string and then to a date works and is very concise, over a large enough number of rows (which may be the case in BigQuery) you may be better off using integer maths and DATE(year, month, day)...
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date
SELECT
DATE(
DIV( 20210209 , 10000), -- Which gives 2021
DIV(MOD(20210209, 10000), 100), -- Which gives 02
MOD(20210209, 100) -- Which gives 09
)
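The same DIV/MOD decomposition can be checked quickly in Python. This is only a sketch of the arithmetic, not BigQuery code, and the function names are hypothetical.

```python
from datetime import date


def yyyymmdd_to_date(n: int) -> date:
    """Split an integer such as 20210209 into year, month and day."""
    year, rest = divmod(n, 10000)   # 20210209 -> (2021, 209)
    month, day = divmod(rest, 100)  # 209 -> (2, 9)
    return date(year, month, day)


def month_label(n: int) -> str:
    """Format the integer date as MM-YYYY, the layout asked for."""
    return yyyymmdd_to_date(n).strftime("%m-%Y")
```

For the two values in the question, month_label returns '02-2021' and '03-2020'.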
You can convert the value to a string and use parse_date():
select parse_date('%Y%m%d', cast(20210209 as string))
Another option
select date,
regexp_replace('' || date, r'(\d{4})(\d{2})(\d{2})', r'\2-\1') as MM_YYYY
from your_table
If applied to the sample data in your question, this gives 02-2021 and 03-2020.
Yet another option
select date,
format_date('%m-%Y', parse_date('%Y%m%d', '' || date)) as MM_YYYY
from your_table
with the same output.

Redshift can't convert a string to a date, tried multiple functions

I have a table with a field called ADATE, it is a VARCHAR(16) and the values are like so: 2019-10-22-09:00.
I am trying to convert this to a DATE type but cannot get this to work.
I have tried:
1
TO_DATE(ADATE, 'YYYY-MM-DD')
Can't cast database type date to string
2
TO_DATE(LEFT(ADATE, 10), 'YYYY-MM-DD')
Can't cast database type date to string
3
TO_DATE(TRUNC(ADATE), 'YYYY-MM-DD')
XX000: Invalid digit, Value '-', Pos 4, Type: Decimal
4
CAST(ADATE AS DATE)
Error converting text to date
5
CAST(LEFT(ADATE, 10) AS DATE)
Error converting text to date
6
CAST(TRUNC(ADATE) AS DATE)
Error converting numeric to date
The issue was the data containing blanks (not Nulls) so the error was around them.
I resolved this by using the following code:
TO_DATE(LEFT(CASE WHEN adate = '' THEN NULL ELSE adate END, 10), 'YYYY-MM-DD') adate
Clearly, you have bad date string values -- which is why the value should be stored as a date to begin with.
I don't think Redshift has a way of validating the date before attempting the conversion, or of avoiding an error. But you can use case and regular expressions to see if the value is reasonable. This might help:
(case when left(adate, 10) ~ '^(19|20)[0-9][0-9]-[0-1][0-9]-[0-3][0-9]$'
then to_date(left(adate, 10), 'YYYY-MM-DD')
end)
This is not precise . . . you can make it more complex so month 19 is not permitted (for instance), but it is likely to catch the errors.
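To see why the pattern is only a rough filter, here is a Python sketch of the same "check the shape, then convert" approach. It is not Redshift code; the function name is made up, and the try/except stands in for the stricter check that the regular expression alone cannot provide.

```python
import re
from datetime import date, datetime
from typing import Optional

# Same shape test as the SQL pattern: century, then two-digit groups.
SHAPE = re.compile(r"(19|20)[0-9][0-9]-[0-1][0-9]-[0-3][0-9]")


def safe_to_date(adate: Optional[str]) -> Optional[date]:
    """Return a date for well-formed values and None for everything else."""
    if not adate:  # covers both None and the empty string
        return None
    head = adate[:10]  # like LEFT(adate, 10)
    if not SHAPE.fullmatch(head):
        return None
    try:
        return datetime.strptime(head, "%Y-%m-%d").date()
    except ValueError:
        # The shape matched but the value is not a real date (e.g. month 19).
        return None
```

A value such as '2019-19-22-09:00' passes the pattern but is still rejected by the actual conversion, which is the imprecision mentioned above.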

How to convert nvarchar into int type for SQL Server

I tried to compute the total of the column [Trans Det Amt ex Tax] like the following:
SELECT
Loyalty_Type_Code,
COUNT([Loyalty_Number]),
FORMAT(SUM([Trans_Det_Amt_ex_Tax]), '##,###,###,##0')
FROM
CRM_POWERBI_RETAIL
WHERE
Trans_Hdr_Sale_Date BETWEEN '2019-01-01' AND '2019-10-31'
GROUP BY
Loyalty_Type_Code
UNION
SELECT
'TOTAL',
COUNT(*) AS CCC,
COUNT(*) AS BBB
FROM
CRM_POWERBI_RETAIL
I get this error:
Msg 245, Level 16, State 1, Line 39
Conversion failed when converting the nvarchar value '67,527,726,031' to data type int.
So I tried to convert this to INT type by using the following:
SELECT
CONVERT(INT, Trans_Det_Amt_ex_Tax)
FROM
CRM_POWERBI_RETAIL
But the result still said
Conversion failed when converting the nvarchar value '67,527,726,031' to data type int
Please let me know how to fix this.
Thank you for all answers.
Don't convert the value to a string:
SELECT Loyalty_Type_Code , COUNT([Loyalty_Number]),
SUM([Trans_Det_Amt_ex_Tax])
FROM CRM_POWERBI_RETAIL
WHERE Trans_Hdr_Sale_Date BETWEEN '2019-01-01' AND '2019-10-31'
GROUP BY Loyalty_Type_Code
UNION
SELECT 'TOTAL', COUNT(*) AS CCC, COUNT(*) AS BBB
FROM CRM_POWERBI_RETAIL;
The column types in a UNION (or UNION ALL) need to be compatible. If it sees a string and an integer -- as in the third column -- it will try to convert the string to an integer. That is not possible with commas.
Alternatively, you can convert both columns to strings. A simpler method uses GROUPING SETS:
SELECT COALESCE(Loyalty_Type_Code, 'Total'),
COUNT([Loyalty_Number]),
FORMAT(SUM([Trans_Det_Amt_ex_Tax]), '##,###,###,##0')
FROM CRM_POWERBI_RETAIL
WHERE Trans_Hdr_Sale_Date BETWEEN '2019-01-01' AND '2019-10-31'
GROUP BY GROUPING SETS ( Loyalty_Type_Code, () );
select CONVERT(bigint, Trans_Det_Amt_ex_Tax)
INT can't hold a number that big.
If you want to convert the string to a number, try removing the commas and casting to a data type that is large enough to store the value:
SELECT CAST(REPLACE('67,527,726,031',',','') AS BIGINT);
This will strip the commas and store the data as a BIGINT.
I'm not 100% sure, but you may need to CAST the SUM as a larger data type if you're going to be adding up a lot of values.
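The two separate problems here (the commas, and the size of the number) can be illustrated with a small Python sketch. It is not T-SQL; the function names are made up, and the limits are the usual signed 32-bit and 64-bit ranges that INT and BIGINT use in SQL Server.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1        # range of a 32-bit INT
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1  # range of a 64-bit BIGINT


def parse_formatted_number(text: str) -> int:
    """Strip thousands separators, like REPLACE(value, ',', '')."""
    return int(text.replace(",", ""))


def fits_int(value: int) -> bool:
    """True when the value fits in a signed 32-bit integer."""
    return INT_MIN <= value <= INT_MAX


def fits_bigint(value: int) -> bool:
    """True when the value fits in a signed 64-bit integer."""
    return BIGINT_MIN <= value <= BIGINT_MAX


value = parse_formatted_number("67,527,726,031")
```

Here value is 67527726031, which is above the INT maximum of 2147483647 but well within the BIGINT range, so both the comma removal and the wider type are needed.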

SQL Server: converting to Date fails, DateTime works

I have a table with a varchar(25) column that holds a date value. A typical value is '11/04/2017'.
This query returns 0 rows
select *
from myTable
where isdate(inputDate) = 0
I am trying to find a max on this, using a date sort.
This query returns the expected result
;with gooddates as
(
select
medcomfolder, PatientId, PatientBirthday, InputDate
from
myTable
where
isdate(inputDate) = 1
)
select max(convert(datetime, inputDate))
from gooddates
This query returns an error.
;with gooddates as
(
select
medcomfolder, PatientId, PatientBirthday, InputDate
from
dwhFuData
where
isdate(inputdate) = 1
)
select max(convert(date, inputdate))
from gooddates
This is the returned error
Msg 241, Level 16, State 1, Line 274
Conversion failed when converting date and/or time from character string
The difference between the 2 queries is that the first is converting to a dateTime while the latter is converting to a date.
At this point, I can move forward w/ the dateTime option, but I am left wondering what I am missing.
I have checked that there are no embedded spaces, and all the columns have a len(InputDate) = 10 (there is NO time data included)
I selected the distinct values, put them in Excel, and applied a date function to each row. I was hoping to get a #VALUE on 1 row. All the rows worked.
So there is nothing silly like '02/31/2019' going on.
How can a dateTime conversion pass when a simple date conversion does not?
My guess is that you have values that include a time stamp following the date (based on the fact that isdate() is never zero).
If so, one simple solution would be to use convert(date, left(inputdate, 10)). Another solution uses try_convert():
try_convert(date, inputdate)
To find the offending values:
select inputdate
from dwhFuData
where try_convert(date, inputdate) is null and inputdate is not null;