Compare timestamps stored as strings to a string formatted date - google-bigquery

event_date contains timestamps stored as strings.
1382623200
1382682600
1384248600
...
How can I SELECT rows where event_date is less than a string formatted date? This is my best attempt:
SELECT *
FROM [analytics:workspace.events]
WHERE TIMESTAMP(event_date) < PARSE_UTC_USEC("2013-05-02 09:09:29");
I get all rows regardless of what date I pass to PARSE_UTC_USEC().

It looks like the event_date strings represent Unix seconds. Try this using standard SQL (uncheck "Use Legacy SQL" under "Show Options"):
WITH T AS (
SELECT x, event_date
FROM UNNEST(['1382623200',
'1382682600',
'1384248600']) AS event_date WITH OFFSET x
)
SELECT *
FROM (
SELECT * REPLACE (TIMESTAMP_SECONDS(CAST(event_date AS INT64)) AS event_date)
FROM T
)
WHERE event_date < '2013-05-02 09:09:29';
The subquery converts the event_date string into a timestamp using the REPLACE clause.
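To make the conversion step concrete outside BigQuery, here is a minimal Python sketch of the same logic: parse each string as Unix seconds, turn it into a UTC timestamp, and compare it with the cutoff. The function name `rows_before` is hypothetical, and treating both sides as UTC is an assumption.

```python
from datetime import datetime, timezone

def rows_before(event_dates, cutoff):
    """Return the Unix-second strings that fall strictly before `cutoff` (UTC)."""
    cutoff_ts = datetime.strptime(cutoff, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return [
        s for s in event_dates
        # CAST(event_date AS INT64) followed by TIMESTAMP_SECONDS(...)
        if datetime.fromtimestamp(int(s), tz=timezone.utc) < cutoff_ts
    ]

events = ["1382623200", "1382682600", "1384248600"]
print(rows_before(events, "2013-05-02 09:09:29"))  # []
print(rows_before(events, "2013-10-25 07:30:00"))  # ['1382623200', '1382682600']
```

All three sample values fall in October and November 2013, so a cutoff in May 2013 correctly matches nothing.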

Try the query below. Hope this helps.
SELECT event_date, TIMESTAMP(event_date) as ts
FROM -- [analytics:workspace.events]
(
SELECT event_date FROM
(SELECT '1382623200' AS event_date),
(SELECT '1382682600' AS event_date),
(SELECT '1384248600' AS event_date)
)
WHERE TIMESTAMP(event_date) < PARSE_UTC_USEC("2013-10-25 07:30:00")
The above is just an example. In practice, you should query your own table:
SELECT event_date, TIMESTAMP(event_date) as ts
FROM [analytics:workspace.events]
WHERE TIMESTAMP(event_date) < PARSE_UTC_USEC("2013-10-25 07:30:00")

Related

greenplum string_agg conversion into hivesql supported

We are migrating a Greenplum SQL query to Hive SQL, and the statement below uses string_agg. How do we migrate it? Kindly help us. Below is the sample Greenplum code that needs to be migrated to Hive.
select string_agg(Display_String, ';' order by data_day )
from
(
select data_day,
sum(revenue)/1000000.00 as revenue,
data_day||' '||trim(to_char(sum(revenue),'9,999,999,999')) as Display_String
from(
select case when data_date = current_date then 'D:'
when data_date = current_date - 1 then ' D-01:'
when data_date = current_date - 2 then ' D-02:'
when data_date = current_date - 7 then ' D-07:'
when data_date = current_date - 28 then ' D-28:'
end data_day, revenue/1000000.00 revenue
from test.testable
where data_date between current_date - 28 and current_date and hour <=(Select hour from ( select row_number() over(order by hour desc) iRowsID, hour from test.testable where data_date = current_date and type = 'UVC')tbl1
where irowsid = 2) and type in( 'UVC')
order by 1 desc) a
group by 1)aa;
There is nothing like this in Hive. However, you can use collect_list with a window (PARTITION BY / ORDER BY) to calculate it.
select concat_ws(';', max(concat_str))
FROM (
SELECT collect_list(Display_String) over (order by data_day) concat_str
FROM
(your above SQL) s
) concat_qry
Explanation:
collect_list gathers the strings into a list, and the ORDER BY in the window orders the data by the data_day column while doing so.
The outermost MAX() picks the largest value, which is the complete concatenated list.
Please note that this is a very slow operation. Test performance as well before implementing it.
Here is a sample SQL and result to help you.
select
id, concat_ws(';', max(concat_str))
from
( select
s.id, collect_list(s.c) over (partition by s.id order by s.c ) concat_str
from
( select 1 id,'ax' c union
select 1,'b'
union select 2,'f'
union select 2,'g'
union all select 1,'b'
union all select 1,'b' )s
) gs
group by id
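As a rough illustration of what the collect_list plus MAX() combination is relied on to do, the Python sketch below builds the growing list that an ordered window would produce for each row and then keeps the complete one. The function name `string_agg` and the (sort_key, text) row shape are assumptions for illustration only; this does not reproduce Hive's actual comparison rules for arrays.

```python
def string_agg(rows, sep=";"):
    """Join `text` values ordered by `sort_key`.

    rows: iterable of (sort_key, text) pairs.
    """
    ordered = sorted(rows, key=lambda r: r[0])

    # Emulate collect_list(...) OVER (ORDER BY sort_key):
    # each row sees the list of all values up to and including itself.
    running = []
    prefixes = []
    for _, text in ordered:
        running.append(text)
        prefixes.append(list(running))

    # Emulate the outer MAX(): keep the complete (longest) list.
    longest = max(prefixes, key=len) if prefixes else []

    # Emulate concat_ws(sep, ...).
    return sep.join(longest)

print(string_agg([(2, "b"), (1, "a"), (3, "c")]))  # a;b;c
```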

Window functions ordered by date when some dates don't exist

Suppose this example query:
select
id
, date
, sum(var) over (partition by id order by date rows 30 preceding) as roll_sum
from tab
When some dates are missing from the date column, the window does not account for them: it simply counts the 30 preceding rows. How can I make this window aggregation take the missing dates into account?
Many thanks!
You can join a sequence containing all dates from a desired interval.
select
*
from (
select
d.date,
q.id,
q.roll_sum
from unnest(sequence(date '2000-01-01', date '2030-12-31')) as d(date)
left join ( your_query ) q on q.date = d.date
) v
where v.date > (select min(my_date) from tab2)
and v.date < (select max(my_date) from tab2)
In standard SQL, you would typically use a window range specification, like:
select
id,
date,
sum(var) over (
partition by id
order by date
range interval '30' day preceding
) as roll_sum
from tab
However, I am unsure whether Presto supports this syntax. You can resort to a correlated subquery instead:
select
id,
date,
(
select sum(var)
from tab t1
where
t1.id = t.id
and t1.date >= t.date - interval '30' day
and t1.date <= t.date
) roll_sum
from tab t
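The difference between a row-based and a date-range-based window can be sketched in Python. The version below mirrors the correlated subquery: for each row it sums every value of the same id whose date lies within the previous 30 days, so gaps in the dates do not stretch the window. The function name and the (id, date, var) tuple layout are assumptions for illustration.

```python
from datetime import date, timedelta

def rolling_sum_by_range(rows, days=30):
    """rows: list of (id, date, var). Returns a list of (id, date, roll_sum)."""
    out = []
    for rid, d, _ in rows:
        start = d - timedelta(days=days)
        # Same predicate as the subquery: same id, start <= date <= d.
        total = sum(v for i, dd, v in rows if i == rid and start <= dd <= d)
        out.append((rid, d, total))
    return out

rows = [
    (1, date(2020, 1, 1), 10),
    (1, date(2020, 1, 15), 5),
    (1, date(2020, 3, 1), 7),  # more than 30 days after the previous row
]
for r in rolling_sum_by_range(rows):
    print(r)
```

With a row-based frame the last row would still pick up the January values; here it only sees its own value because nothing else falls inside the 30-day range.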
I don't think Presto supports window functions with interval ranges. Alas. There is an old-fashioned way of doing this, by counting "ins" and "outs" of values:
with t as (
select id, date, var, 1 as is_orig
from tab
union all
-- each value "leaves" the window 30 days after it entered
select id, date + interval '30' day, -var, 0
from tab
)
select x.*
from (select id, date, sum(sum(var)) over (partition by id order by date) as running_30,
sum(is_orig) as is_orig
from t
group by id, date
) x
where is_orig > 0
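The "ins and outs" idea can be sketched in Python as a sweep over signed events: every value is added on its own date and subtracted again 30 days later, and a running total over the sorted events gives the rolling sum. The function name and the (id, date, var) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_sum_in_out(rows, days=30):
    """rows: list of (id, date, var). Returns {(id, date): rolling sum}."""
    delta = defaultdict(int)   # net change per (id, date)
    original = set()           # keys that correspond to real input rows
    for rid, d, v in rows:
        delta[(rid, d)] += v                          # value enters the window
        delta[(rid, d + timedelta(days=days))] -= v   # value leaves the window
        original.add((rid, d))

    result = {}
    running = defaultdict(int)
    for rid, d in sorted(delta):                      # ordered by id, then date
        running[rid] += delta[(rid, d)]
        if (rid, d) in original:                      # keep only real rows
            result[(rid, d)] = running[rid]
    return result

rows = [
    (1, date(2020, 1, 1), 10),
    (1, date(2020, 1, 15), 5),
    (1, date(2020, 3, 1), 7),
]
print(rolling_sum_in_out(rows))
```

Note that with this construction a value stops counting exactly 30 days after its date, so the effective window is the half-open interval (date - 30 days, date].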

Check if timestamp is contained in date

I'm trying to check whether a datetime falls on the current date, but I haven't been able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, count(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
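The same filter-then-count logic can be sketched in Python: convert each timestamp to a calendar date, keep only those equal to the target date, and count them. The function name `count_events_on` is hypothetical, and the sketch assumes the timestamps are Unix seconds interpreted in UTC.

```python
from collections import Counter
from datetime import date, datetime, timezone

def count_events_on(timestamps_secs, day):
    """Count timestamps (Unix seconds, UTC) whose calendar date equals `day`."""
    counts = Counter(
        # DATE(TIMESTAMP_SECONDS(ts)) ... GROUP BY event_date
        datetime.fromtimestamp(ts, tz=timezone.utc).date()
        for ts in timestamps_secs
    )
    # WHERE event_date = <day>
    return {d: n for d, n in counts.items() if d == day}

stamps = [1494977678, 1494977679, 1382623200]
print(count_events_on(stamps, date(2017, 5, 16)))
```

To compare against today, pass `datetime.now(timezone.utc).date()` as `day`.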

Compare datetimes from SQL Server table with a datetime from user and get the date from table closest to the user's datetime

I have a table in SQL Server 2014 with time stamps.
This is my table:
I want to compare each time stamp in my table with a time stamp that I input, and get the time stamp from my table for which datediff(table_timeStamp, @myTimestamp) is the smallest. I hope it is clear what I want. This is for a function, and I want to know how to do it in the easiest way possible.
SELECT TOP 1 * ........... order by ABS(datediff(second,table_timeStamp, @myTimestamp) )
If you want the closest date previous to the date input:
;With cteTestDates As
(
Select *, DateDiff(Second,datefield1, '2016-07-01') DateDifference
From TestDates
)
Select Top 1 *
From cteTestDates
Where DateDifference >= 0
Order By DateDifference
If you want the closest date regardless of whether it is in the past or the future:
;With cteTestDates As
(
Select *, ABS(DateDiff(Second,datefield1, '2016-07-03')) DateDifference
From TestDates
)
Select Top 1 *
From cteTestDates
Order By DateDifference
The query below runs much faster if the table has an index on the Time_Stamp column.
The winning query will ALWAYS do a table scan and NEVER use an index at all.
SELECT TOP 1 Time_Stamp FROM (
SELECT MAX(Time_Stamp) AS Time_Stamp FROM MyTable WHERE Time_Stamp <= @myTimestamp
UNION
SELECT MIN(Time_Stamp) AS Time_Stamp FROM MyTable WHERE Time_Stamp > @myTimestamp) AS t
WHERE Time_Stamp IS NOT NULL
ORDER BY ABS(DATEDIFF(second, Time_Stamp, @myTimestamp))
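The reason an index helps is that the closest value can only be one of two neighbours: the largest time stamp at or before the input, or the smallest one after it. A minimal Python sketch of that idea, using binary search on a sorted list as a stand-in for an index seek (the function name `closest` is hypothetical):

```python
from bisect import bisect_left
from datetime import datetime

def closest(sorted_stamps, target):
    """Return the element of `sorted_stamps` nearest to `target`, or None if empty."""
    if not sorted_stamps:
        return None
    i = bisect_left(sorted_stamps, target)
    # Only the neighbour just before and the neighbour at/after can be closest.
    candidates = sorted_stamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s - target))

stamps = [datetime(2016, 7, 1), datetime(2016, 7, 2), datetime(2016, 7, 5)]
print(closest(stamps, datetime(2016, 7, 3)))
```

On a tie, this sketch returns the earlier of the two neighbours.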

Google BigQuery: How to convert lag() function result to timestamp

Hi, is it possible to convert the result of a lag function to a timestamp? I basically want to get the difference between two timestamps in seconds.
With the following code, the system tells me that the type of 'last_timestamp' is unknown. When I hover the mouse cursor over the column 'last_timestamp' of the inner query, I can see that it is of type timestamp.
SELECT clientId, timestamp
FROM (
SELECT clientId, timestamp,
LAG(timestamp,1) OVER
(PARTITION BY clientId ORDER BY timestamp)
AS last_timestamp
FROM [oxidation.201602]
) last
WHERE (TIMESTAMP_TO_SEC(timestamp) - TIMESTAMP_TO_SEC(last_timestamp) >= (60 * 30))
OR last_timestamp IS NULL
Convert the timestamp to seconds in an inner subquery first, then apply LAG() to that integer column:
SELECT clientId, timestamp
FROM (
SELECT clientId, timestamp, timestamp_sec,
LAG(timestamp_sec, 1)
OVER (PARTITION BY clientId ORDER BY timestamp_sec) AS prev_timestamp_sec
FROM (
SELECT clientId, timestamp, TIMESTAMP_TO_SEC(timestamp) as timestamp_sec
FROM [oxidation.201602]
)
) last
WHERE timestamp_sec - prev_timestamp_sec >= 60 * 30
OR prev_timestamp_sec IS NULL
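What the query selects can be sketched in Python: for each client, walk the events in time order and keep those that either have no previous event or come at least 30 minutes after the previous one. The function name `session_starts` and the (client_id, ts_secs) tuple layout are assumptions for illustration.

```python
def session_starts(events, gap_secs=30 * 60):
    """events: list of (client_id, ts_secs). Returns rows that start a new session."""
    last_seen = {}   # client_id -> previous timestamp (the LAG value)
    starts = []
    for client, ts in sorted(events):          # PARTITION BY client, ORDER BY ts
        prev = last_seen.get(client)
        if prev is None or ts - prev >= gap_secs:
            starts.append((client, ts))
        last_seen[client] = ts
    return starts

events = [("a", 0), ("a", 600), ("a", 2400), ("b", 100)]
print(session_starts(events))  # [('a', 0), ('a', 2400), ('b', 100)]
```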