Extracting hour by using coalese in SQL on a timestamp - google-bigquery

I am trying to update a query to extract the hour from a timestamp and I keep getting an error. The error I get is due to the FROM clause I was using.
SELECT
analytics_platform_data_type
, activity_date_pt
, activity_timestamp_pt
, analytics_platform_timestamp_utc
, analytics_platform_timestamp_utc_iso
--This is the clause that is causing the problem (Begin)
, extract(hour from coalesce(activity_timestamp_pt)) as latd_hour_pt
--Clause above is the issue; Line above is line 9 (End)
, analytics_platform_ platform
, ad_channel_name
, publisher_name
, ip_address
, analytics_platform_unique_activity_id
, click_id
, latd_custom_fields
FROM table_date_range([AllData_AnalyticsMobileData_], timestamp('2018-09-
25'), timestamp('2018-09-27'))
where 1=1
and analytics_platform_data_type = 'CLICK'
and partner_name = 'ABC123'
If I remove the extract hour piece the query works fine. When I add it I get the error: Encountered " "FROM" "from "" at line 9, column 16. Was expecting: ")" ...
I have seen the clause I am trying to use in the above query used before, but it was a much more complex query that was using sub queries. Really not sure what the issue is. (Using Google Big Query Legacy SQL)

Your query is mixing Legacy Syntax (table_date_range) with Standard Syntax (Extract)
If for some reason you need to stick with Legacy SQL - use HOUR() instead of EXTRACT()
But it is much recommended to migrate stuff to Standard SQL - where you should use wildcard functions instead of table_date_range
Something like
FROM `project.dataset.AllData_AnalyticsMobileData_*`
WHERE _TABLE_SUFFIX BETWEEN '2018-09-25' AND '2018-09-27'
see more at https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#table_decorators_and_wildcard_functions in Migrating to Standard SQL doc

Related

timestamp VS TIMESTAMP_NTZ in snowflake sql

I am using an sql script to parse a json into a snowflake table using dbt.
One of the cols contain this datetime value: '2022-02-09T20:28:59+0000'.
What's the correct way to define ISO datetime's data type in Snowflake?
I tried date, timestamp and TIMESTAMP_NTZ like this in my dbt sql script:
JSON_DATA:",my_date"::TIMESTAMP_NTZ AS MY_DATE
but clearly, these aren't the correct one because later on when I test it in snowflake with select * , I get this error:
SQL Error [100040] [22007]: Date '2022-02-09T20:28:59+0000' is not recognized
or
SQL Error [100035] [22007]: Timestamp '2022-02-13T03:32:55+0100' is not recognized
so I need to know which Snowflake time/date data type suits the best for this one
EDIT:
This is what I am trying now.
SELECT
JSON_DATA:"date_transmission" AS DATE_TRANSMISSION
, TO_TIMESTAMP(DATE_TRANSMISSION:text, 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM') AS DATE_TRANSMISSION_TS_UTC
, JSON_DATA:"authorizerClientId"::text AS AUTHORIZER_CLIENT_ID
, JSON_DATA:"apiPath"::text API_PATH
, MASTERCLIENT_ID
, META_FILENAME
, META_LOAD_TS_UTC
, META_FILE_TS_UTC
FROM {{ source('INGEST_DATA', 'TABLENAME') }}
I get this error:
000939 (22023): SQL compilation error: error line 6 at position 4
10:21:46 too many arguments for function [TO_TIMESTAMP(GET(DATE_TRANSMISSION, 'text'), 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM')] expected 1, g
However, if I comment out the the first 2 lines(related to timpstamp types), the other two work perfectly fine. What's the correct syntax of parsing json with TO_TIMESTAMP?
Not that JSON_DATA:"apiPath"::text API_PATH gives the correct value for it in my snowflake tables.
Did some testing and it seems you have 2 options.
You can either get rid of the +0000 at the end: left(column_date, len(column_date)-5)
or try_to_timestamp with format
try_to_timestamp('2022-02-09T20:28:59+0000','YYYY-MM-DD"T"HH24:MI:SS+TZHTZM')
TZH and TZM are TimeZone Offset Hours and Minutes
So there are 2 main points here.
when getting data from JSON to pass to any of the timestamp functions that want a ::TEXT object, but the values to get from JSON are still ::VARIANT so they need to be cast. This is the cause of the error you quote
(22023): SQL compilation error: error line 6 at position 4
10:21:46 too many arguments for function [TO_TIMESTAMP(GET(DATE_TRANSMISSION, 'text'), 'YYYY-MM-DDTHH24:MI:SS.FFTZH:TZM')] expected 1, g
also your SQL is wrong there it should have been
TO_TIMESTAMP(DATE_TRANSMISSION::text,
How you handle the timezone format.As other have noted you (as I did in your last question) do you want to ignore the timezone values or read them. I forgot about the TZHTZM formatting. Given you have timezone data, you should use the TO_TIMESTAMP_TZ`TRY_TO_TIMESTAMP_TZto make sure the time zone data is keep, given you second example shows+0100`
putting those together (assuming you didn't want an extra date_transmission as a variant in you data) :
SELECT
TO_TIMESTAMP_TZ(JSON_DATA:"date_transmission"::text, 'YYYY-MM-DDTHH24:MI:SS+TZHTZM') AS DATE_TRANSMISSION_TS_UTC
, JSON_DATA:"authorizerClientId"::text AS AUTHORIZER_CLIENT_ID
, JSON_DATA:"apiPath"::text AS API_PATH
, MASTERCLIENT_ID
, META_FILENAME
, META_LOAD_TS_UTC
, META_FILE_TS_UTC
FROM {{ source('INGEST_DATA', 'TABLENAME') }}
You should use timestamp (not date which does not store the time information), but probably the format you are using is not autodetected. You can specify the input format as YYYY-MM-DD"T"HH24:MI:SSTZHTZM as shown here. The autodetected one has a : between the TZHTZM.

Apache Drill Timestampdiff on Oracle DB

Hey everyone im relativly new to Apache Drill and im having troubles converting my Oracle specific sql scripts (pl/sql) to Drill based querys.
For example i have a Scripts who checks for processed data in the last X Days.
In this script im using the the sysdate function.
Here is my old script:
SELECT i.id,i.status,status_text,i.kunnr,i.bukrs,i.belnr,i.gjahr,event,i.sndprn,i.createdate,executedate,tstamp,v.typ_text,i.docnum,i.description, i.*
FROM in_job i JOIN vstatus_injob v ON i.id= v.id
WHERE 1=1
AND i.createdate > sysdate - 30.5
order by i.createdate desc;
When i looked up in terms of drill specific Datetime Diff functions i found "TIMESTAMPDIFF".
So here is my "drillified" script:
SELECT i.id, i.status, status_text, i.kunnr, i.bukrs, i.belnr, i.gjahr, i.event, i.sndprn, i.createdate, i.executedate, i.tstamp,v.typ_text,i.docnum,i.description,i.*
FROM SchemaNAME.IN_JOB i JOIN SchemaNAME.VSTATUS_INJOB v ON i.id=v.id
WHERE TIMESTAMPDIFF(DAY, CURRENT_TIMESTAMP, i.createdate) >=30
And the Error that is returned reads like this:
DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL query.
By further inspection i can see the Oracle specific error that reads:
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "TIMESTAMPDIFF": invalid ID
So now my question:
I thought apache drill replaces the function "TIMSTAMPDIFF" at runtime. But from what i can see in the logs its more like that Drill Hands over the Function Call "TIMESTAMPDIFF" to the Oracle database.
If thats true, how could i change my script to calculate the time difference (in days) and compare it to an int (ie 30 in the script).
If i use sysdate like above Apache Drill jumps in and says it doesnt know "sysdate".
How would you guyes handle that?
Thanks in advance and so long
:)
I have found a solution...
Just in Case someone (or even me in the future) is having a similar problem.
{
"queryType": "SQL",
"query": "select to_char(SELECT CURRENT_TIMESTAMP - INTERVAL XX MONTH FROM (VALUES(1)),'dd.MM.yy')"
}
With some to_char and the use of the CURRENT_TIMESTAMP - Interval Function Calls i can get everything i needed.
I took the query above packed it into an Grafana Variable, named it "timeStmpDiff" and then queried everything with an json Api Call to my Drill instance.
Basically:
"query" : "SELECT i.id, i.status, status_text, i.kunnr, i.bukrs, i.belnr, i.gjahr, i.event, i.sndprn, i.createdate, i.executedate, i.tstamp,v.typ_text,i.docnum,i.description,i.* FROM ${Schema}.IN_JOB i JOIN ${Schema}.VSTATUS_INJOB v ON i.id=v.id WHERE i.createdate >= '${timeStmpDiff}' order by i.createdate desc"
You can, of course query it in on go with an subselect.
But because i use grafana it made sense to me to bundle that in a Variable.

Google Cloud Big Query , Github Dataset Syntax Error

I am trying to make a query but google cloud gives a syntax error.
I had coppied this code which written in 2017 .
I have no idea about Sql
Syntax error: Unexpected "[" at [5:6]. If this is a table identifier, escape the name with `, e.g. `table.name` rather than [table.name].
The query is:
SELECT
f.repo_name,
f.path,
c.pkey
FROM
[bigquery-public-data:github_repos.files] f
JOIN (
SELECT
id,
You are probably using Standard SQL -- which is a good thing.
Try writing the table reference as:
FROM `bigquery-public-data.github_repos.files` f

Google BigQuery: TABLE_QUERY AND TABLE_DATE_RANGE

I have Big Query tables like below, and like to issue a query to the tables marked <=.
prefix_AAAAAAA_20170320
prefix_AAAAAAA_20170321
prefix_AAAAAAA_20170322 <=
prefix_AAAAAAA_20170323 <=
prefix_AAAAAAA_20170324 <=
prefix_AAAAAAA_20170325
prefix_BBBBBBB_20170320
prefix_BBBBBBB_20170321
prefix_BBBBBBB_20170322 <=
prefix_BBBBBBB_20170323 <=
prefix_BBBBBBB_20170324 <=
prefix_BBBBBBB_20170325
prefix_CCCCCCC_20170320
prefix_CCCCCCC_20170321
prefix_CCCCCCC_20170322
prefix_CCCCCCC_20170323
prefix_CCCCCCC_20170324
prefix_CCCCCCC_20170325
I made a query as this
SELECT * FROM
(TABLE_QUERY(mydataset,
'table_id CONTAINS "prefix" AND
(table_id CONTAINS "AAAAAA" OR table_id CONTAINS "BBBBBB")' )
AND
TABLE_DATE_RANGE(mydataset.prefix, TIMESTAMP('2017-03-22'), TIMESTAMP('2017-03-24')))
I got this error.
Error: Encountered " "AND" "AND "" at line 5, column 4. Was expecting: ")" ...
Does anybody has ideas?
You cannot mix TABLE_QUERY and TABLE_DATE_RANGE for exactly same FROM!
Try something like below
#legacySQL
SELECT *
FROM (TABLE_QUERY(mydataset, 'REGEXP_MATCH(table_id, "prefix_[AB]{7}_2017032[234]")'))
Consider Migrating to BigQuery Standard SQL
In this case you can Query Multiple Tables Using a Wildcard Table
See How to Migrate from TABLE_QUERY() to _TABLE_SUFFIX
I think, in this case your query can look like
#standardSQL
SELECT *
FROM `mydataset.prefix_*`
WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, '[AB]{7}_2017032[234]')
I can not migrate to Standard SQL because ...
If I would like to search for example between 2017-03-29 and 2017-04-02, do you have any smart SQL
Try below version
#legacySQL
SELECT *
FROM (TABLE_QUERY(mydataset,
'REGEXP_MATCH(table_id, r"prefix_[AB]{7}_(\d){8}") AND
RIGHT(table_id, 8) BETWEEN "20170329" AND "20170402"'))
Of course yo can adjust above to use whatever exactly logic yo need to apply!

"Circular reference caused by ..." error in Access SQL (but not in T-SQL)

I have the following SQL statement which returns the desired result in SQL Server 2012:
SELECT
S.ONOMA
, S.DIEY
, S.POLH
, S.TK
, S.IDIOT
, S.KODIKOS
, S.AFM
FROM
SYNERG AS S
INNER JOIN
(SELECT
G.AFM, MIN(KODIKOS) AS KODIKOS
FROM SYNERG AS G
WHERE LEN(ISNULL(AFM, '')) != 0
GROUP BY AFM) AS I ON S.KODIKOS = I.KODIKOS
ORDER BY
S.AFM
but when I run the same SQL statement in MS Access 2007 I get an error:
Circular reference caused by 'KODIKOS' in query definition's SELECT list.
Any help would be appreciated.
As explained in the link by HansUp:
The alias of a calculated field cannot be identical to any of the field names used to calculate the field.
This can be rather annoying (esp. if it is a field that is returned by the query), but there is no way around it.
So you need to change the alias, e.g.:
SELECT
S.ONOMA
, S.DIEY
, S.POLH
, S.TK
, S.IDIOT
, S.KODIKOS
, S.AFM
FROM
SYNERG AS S
INNER JOIN
(SELECT
G.AFM, MIN(KODIKOS) AS MinKODIKOS
FROM SYNERG AS G
WHERE LEN(Nz(AFM, '')) <> 0
GROUP BY AFM) AS I ON S.KODIKOS = I.MinKODIKOS
ORDER BY
S.AFM
Note also that an IsNull() function exists in Access, but has a different meaning (it takes one argument and returns a Boolean). The corresponding function is Nz()
And (thanks #HansUp), the unequal operator is <>, not !=. I always use <> in SQL Server too, no need to make things more complicated than necessary. :)