Azure SQL Data Warehouse - Strange DateTime conversion error/behaviour

I am reading data from a data lake (CSV), and when running the query below I get a 'Conversion failed when converting date and/or time from character string' error message.
select convert(datetime, NullIf(ltrim(rtrim([Date started])), ''), 111)
FROM dl.temp
I looked through the data and checked the source file as well, but couldn't spot anything unusual.
As soon as I include the * and change the query to the below, everything runs fine and the conversion seems to do its job.
select convert(datetime, NullIf(ltrim(rtrim([Date started])), ''), 111),*
from dl.temp
Out of curiosity I also wanted to check the max and minimum date, and running MAX gives me back the value 'Date started'.
However, when I search for that particular value as below, I don't get any rows returned. It looks like MAX is returning the column name itself. Does anyone know what is going on?
select *
from dl.temp
where [Date started] = 'Date started'
I am running this against Azure SQL Data Warehouse.
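For anyone debugging the same symptom: TRY_CONVERT returns NULL instead of raising the conversion error, so a filter along these lines (a sketch against the same table and column) can surface the offending values, such as a stray header row:
select [Date started]
from dl.temp
where try_convert(datetime, NullIf(ltrim(rtrim([Date started])), ''), 111) is null
and NullIf(ltrim(rtrim([Date started])), '') is not null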

I think you'll find the issue is in your external file format.
In your CREATE EXTERNAL FILE FORMAT you probably need to add FIRST_ROW = 2 to the FORMAT_OPTIONS, so the header row is skipped instead of being read as data.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
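For reference, a minimal sketch of such a file format definition; the format name and delimiters here are assumptions, so match them to your actual file:
CREATE EXTERNAL FILE FORMAT csv_skip_header
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2, -- skip the header row so 'Date started' is no longer read as data
        USE_TYPE_DEFAULT = FALSE
    )
);
The external table (dl.temp here) then needs to be dropped and recreated against the new format, since the file format is bound when the table is created.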

Related

Converting a DateTime column to a string in ADF

I am trying to build a fully parametrised pipeline template in ADF. With the work I have done so far I can do a full load without any issues, but when it comes to the delta load my queries don't seem to work. I believe the reason for this is that my WHERE clause looks somewhat like this:
SELECT #{item().source_columns} FROM #{item().source_schema}.#{item().source_table}
WHERE #{item().source_watermarkcolumn} > #{item().max_watermarkcolumn_loaded} AND #{item().source_watermarkcolumn} <= #{activity('Watermarkvalue').output.firstRow.max_watermarkcolumn_loaded}
where 'max_watermarkcolumn_loaded' is a datetime and the activity output is obviously a string.
Please correct me if my assumption is wrong, and let me know what I can do to fix it.
EDIT:
[screenshot of the error]
ADF is picking up the date from the SQL column 'max_watermarkcolumn_loaded' in the format '"2021-09-29T06:11:16.333Z"', and I think that's where the problem is.
I tried to reproduce this error by giving the parameter to a sample query without single quotes, and got the same failure. Wrap the date parameters in single quotes.
Corrected query:
SELECT #{item().source_columns} FROM
#{item().source_schema}.#{item().source_table}
WHERE #{item().source_watermarkcolumn} >
'#{item().max_watermarkcolumn_loaded}' AND
#{item().source_watermarkcolumn} <=
'#{activity('Watermarkvalue').output.firstRow.max_watermarkcolumn_loaded}'
With this query, the pipeline runs successfully.
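For illustration, after parameter substitution the quoted version renders to valid T-SQL along these lines (the column, table, and values below are made up):
SELECT col1, col2 FROM dbo.my_table
WHERE modified_date > '2021-09-28T06:11:16.333Z'
AND modified_date <= '2021-09-29T06:11:16.333Z'
Without the quotes the timestamp lands in the query as a bare token, which the SQL parser cannot interpret as a datetime literal.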

Can't handle unfamiliar date format in BigQuery

I'm trying to query a BigQuery table that has a column "date" (set to type DATE in the schema) formatted as yyyy-mm-dd-??. In other words, there's an extra piece of information on the date and I'm not really sure what it is. When I try to query the "date" column I run into this error:
SQL Error [100032] [HY000]: [Simba]BigQueryJDBCDriver Error executing query job. Message: Invalid date: '2022-09-03-01'
I've tried cast(date as string), cast(left(date, 10) as string), and all kinds of workarounds, but the error persists. No matter how hard I try to insist in the query that I want this odd date column read as a string so I can work with it, BigQuery still wants to treat it as a date, I guess because that's how it's set up in the schema. I don't care whether it's parsed into a proper date or read as a string that I can parse afterwards; I just want to be able to query the date column without getting an error.
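One hedged sketch of the parse-it-as-a-string route described above, assuming the column can be exposed as a string (for example by declaring it STRING in an external table's schema; the table and column names below are placeholders):
-- SAFE.PARSE_DATE returns NULL instead of raising an error on unparseable values
SELECT SAFE.PARSE_DATE('%Y-%m-%d', LEFT(date_string, 10)) AS parsed_date
FROM `my_project.my_dataset.my_table`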

Odd error with casting to timestamp in standard SQL/Tableau

The latest version of Tableau has started using standard SQL when it connects to Google's BigQuery.
I recently tried to update a large table but found that there appeared to be errors when trying to parse datetimes. The table originates as a CSV which is loaded into BigQuery, where further manipulation happens. The datetime column in the original CSV contains strings in ISO standard datetime format (basically yyyy-mm-dd hh:mm). This saves a lot of annoying manipulation later.
But on trying to convert the datetime strings in Tableau into dates or datetimes I got a bunch of errors. On investigation they seemed to come from BigQuery and looked like this:
Error: Invalid timestamp: '2015-06-28 02:01'
I thought at first this might be a Tableau issue, so I loaded a chunk of the original CSV into Tableau directly, where the conversion of the string to a date worked perfectly well.
I then tried simpler versions of the conversion (to a year rather than a full datetime) and they still failed. The generated SQL for the simplest conversion looks like this:
SELECT
EXTRACT(YEAR
FROM
CAST(`Arrival_Date` AS TIMESTAMP)) AS `yr_Arrival_Date_ok`
FROM
`some_dataset`.`some_table` `some_table`
GROUP BY
1
The invalid timestamp in the error message always looks to me like a perfectly valid timestamp. And further analysis suggests it doesn't happen for all the rows in the source table, just occasional ones.
This error did not appear in older versions of Tableau/BigQuery where legacy SQL was the default for Tableau. So I'm presuming it is a consequence of standard SQL.
So is there an intermittent problem with casting to timestamps in BigQuery? Or is this a Tableau problem which causes the SQL to be incorrectly formatted? And what can I do about it?
The seconds part of the canonical timestamp representation is required if the hour and minute are also present. Try this instead with PARSE_TIMESTAMP and see if it works:
SELECT
EXTRACT(YEAR
FROM
PARSE_TIMESTAMP('%F %R', `Arrival_Date`)) AS `yr_Arrival_Date_ok`
FROM
`some_dataset`.`some_table` `some_table`
GROUP BY
1
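A quick standalone check of the difference (a sketch that runs as-is in BigQuery):
-- %F is shorthand for %Y-%m-%d and %R for %H:%M, so the format string
-- matches 'yyyy-mm-dd hh:mm' values that lack a seconds component
SELECT PARSE_TIMESTAMP('%F %R', '2015-06-28 02:01') AS ts -- 2015-06-28 02:01:00 UTC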

SSIS - Data source with SQL statement containing a Convert

I have created a SQL Server OLE DB data source and I have the following Command Text:
select replace(convert(varchar(10), dob, 111), '/', '-') As DOB FROM Person
I get a warning on executing that says: [OLE DB Source [1]] Warning: Truncation may occur due to retrieving data from database column "DOB" with a length of 8000 to data flow column "DOB" with a length of 10.
When I try to change the external column from 8000 to 10 (as stated in the query), the designer automatically changes it back. I thought that external columns represented metadata in the data source. The DOB in the data source (see query above) is a varchar(10), so why does it have to have a length of 8000? I don't have a lot of experience with SSIS.
I have found the solution. The REPLACE function's return value is capped at 8,000 characters, so SQL Server reports the expression as varchar(8000) unless it is explicitly cast back down. I had to do this:
select cast(replace(convert(varchar(10), dob, 111), '/', '-') As varchar(10)) As DOB from Person
There is another workaround: you can ignore the truncation errors and let the package execute by setting the appropriate property on the 'Error Output' tab of the Source component.

SSIS getdate into DateTimeOffset column - data value overflowed the type

I have an SSIS package. The source is a SQL query. The destination is a table. The package worked until I changed a column in a destination table from datetime to datetimeoffset(0).
Now, all records fail with a "Conversion failed because the data value overflowed the type used by the provider" error on this particular column.
The value in the source query is getdate(). I tried TODATETIMEOFFSET(getdate(),'-05:00') without success.
In fact, the only thing that has worked so far is to hard code the following into the source query:
cast('3/14/12' as datetime)
The only other interesting piece of information is that the package worked fine when running the source query against another server implying that maybe a setting is involved - but I see no obvious differences between the two servers.
I was going to suggest adding a Data Conversion component to deal with it, but since you only changed the destination, you can instead change your source query to:
select cast(YOUR_DATE_COLUMN as datetimeoffset(0))
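Since the value in the question's source query is getdate(), that would look like this (a sketch; the alias is just illustrative):
select cast(getdate() as datetimeoffset(0)) as load_date
The provider then receives a datetimeoffset(0) value that matches the destination column exactly, so no conversion happens at the destination.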
In case anyone else is looking, we found an alternative solution that works if the source is SQL Server 2005 (which has no support for datetimeoffset).
select dateAdd(minute,datediff(minute,0,getutcdate()),0)
The intent is to reduce the precision. Granted, I also lose the seconds, but if I try the above line at second precision I get an overflow error.
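The same dateadd/datediff pattern works at other precisions too; for example, truncating to the day instead of the minute (a sketch):
select dateadd(day, datediff(day, 0, getutcdate()), 0) -- today's date at midnight UTC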