Syntax error in Redshift SQL query with subqueries

I'm quite new to SQL in general and haven't dealt with Redshift before. I'm trying to run a query that works perfectly in PostgreSQL, but I get a syntax error in Redshift. The query is:
SELECT
test.table_1.user_id as user_id,
test.table_1.timestamp as start_session,
test.table_1.step_3 :: timestamp + interval '1 hour' as end_session,
test.table_1.step_3 :: timestamp + interval '1 hour' - test.table_1.timestamp :: timestamp as session_duration
FROM (SELECT *,
min(case when page = 'second_page' then timestamp end) OVER (partition by user_id order by timestamp desc rows between unbounded preceding and unbounded following) as step_2,
min(case when page = 'third_page' then timestamp end) OVER (partition by user_id order by timestamp desc rows between unbounded preceding and unbounded following) as step_3
FROM test.table_1) test.table_1
WHERE
test.table_1.page = 'first_page' AND
step_2 > test.table_1.timestamp AND
step_3 > step_2 AND
step_3 :: timestamp - step_2 :: timestamp < '1 hour' AND
step_2 :: timestamp - test.table_1.timestamp :: timestamp < '1 hour'
ORDER BY
user_id,start_session
The error is:
Error running query: syntax error at or near "." LINE 11: FROM test.vimbox_pages) test.vimbox_pages ^
It points at the line FROM test.table_1) test.table_1 in the query above.
I don't understand what's wrong there.
With this query I'm trying to build a list of user sessions from pages read in a particular order.
I'd be thankful for any help!

Aliases are identifiers and need to follow the rules for identifiers; test.table_1 contains a dot, so it is not a valid alias for the derived table. You can also simplify your query in other ways:
SELECT t.user_id, t.timestamp as start_session,
(t.step_3::timestamp + interval '1 hour') as end_session,
(t.step_3::timestamp + interval '1 hour' - t.timestamp::timestamp) as session_duration
FROM (SELECT t.*,
MIN(CASE WHEN page = 'second_page' THEN timestamp END) OVER (PARTITION BY user_id) as step_2,
MIN(CASE WHEN page = 'third_page' THEN timestamp END) OVER (partition by user_id) as step_3
FROM test.table_1 t
) t
WHERE t.page = 'first_page' AND
step_2 > t.timestamp AND
step_3 > step_2 AND
step_3::timestamp < step_2::timestamp + interval '1 hour' AND
step_2::timestamp < t.timestamp::timestamp + interval '1 hour'
ORDER BY user_id, start_session;
Notes:
Your windowing clause is unnecessarily complex. No ORDER BY is necessary if you want the entire window range.
The conversions to timestamp should be unnecessary, given the names of the columns. But I have left them in.
t.user_id as user_id is redundant. The column name is going to be user_id anyway.
I don't ever see spaces around ::. Of course they are allowed, but the type conversion has very high precedence and is typically written without spaces.
I prefer comparing timestamps to timestamps, rather than converting to intervals. Strange things can happen with intervals.
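As a minimal illustration of the alias rule (the names here are just placeholders, not taken from the question):
-- not valid: an alias cannot be a qualified (dotted) name
SELECT x FROM (SELECT 1 AS x) test.t1;
-- valid: a plain identifier as the alias
SELECT t1.x FROM (SELECT 1 AS x) t1;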

Related

How to get the percentage change from same time 7 days ago?

I have a big PostgreSQL database with time series data.
I query the data with a resample to one hour. What I want is to compare the mean value from the last hour to the value at the same time 7 days ago, and I don't know how to do it.
This is what I use to get the latest value:
SELECT DATE_TRUNC('hour', datetime) AS time, AVG(value) as value, id FROM database
WHERE datetime > now() - '01:00:00'::interval
GROUP BY id, time
You can use a CTE to calculate last week's average in the same time period, then join on id and hour.
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
order by 1
I'm making a few assumptions on how your data appear.
EDIT: I just noticed you asked for percent change.
Showing the change as a decimal (multiply by 100 for a percentage):
select id,
extract(hour from time_now) as hour_now,
avg_now,
avg_last_week,
coalesce(((avg_now - avg_last_week) / avg_last_week), 0) AS CHANGE
from (
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
)z
group by 1,2,3,4
order by 1,2
db-fiddle found here: https://www.db-fiddle.com/f/rWJATypGzHPZ8sG2vXAGXC/4

Finding groups within a time series in PostgreSQL

I have a list of timestamps and I want to tag them as a group when they are close enough (less than a 15-second interval). This is what I want to end up with:
time       group number
18:01:00   1
18:01:06   1
18:10:00   /
18:20:30   2
18:20:40   2
18:20:50   2
18:25:02   /
Use lag() and date comparisons to determine where a group begins. Then use a cumulative sum. You actually only want to assign a group number to rows whose group contains multiple rows, so this is a little trickier than the simple gaps-and-islands problem:
select t.*,
(case when prev_time > time - interval '15 second' or
next_time < time + interval '15 second'
then sum(case when prev_time > time - interval '15 second' then 0 else 1 end) over (order by time)
end) as group_number
from (select t.*,
lag(time) over (order by time) as prev_time,
lead(time) over (order by time) as next_time
from t
) t

Postgres: must appear in the GROUP BY while I am using an aggregate function

When I run the SQL below I receive the following error message:
column subQuery.numbers must appear in the GROUP BY clause or be used in an aggregate function
I don't understand why this error occurs when I'm using the aggregate functions sum and count in a left join with an alias.
I think the parent query doesn't recognize the subquery by its alias ("subQuery").
I've tried to find a solution but haven't found other cases like mine.
Could you please explain why this error occurs even though aggregate functions are used?
select to_char(customer1.date_time, 'MM-dd') as DateTime,
(case when subQuery.numbers is null then 0 else subQuery.numbers end) as "2019-numbers",
(case when subQuery.amount is null then 0 else subQuery.amount end) as "2019-amount"
from customer_table customer1
left join (
select to_char(customer2.date_time, 'MM-dd') as DateTime,
count(*) as numbers,
sum(amount) as amount
from customer_table customer2
where customer2.date_time > date_trunc('day', (now() - interval '1 day') - interval '1 year')
and customer2.date_time < date_trunc('day', now() - interval '1 year')
and customer2.status = 'OK'
group by to_char(customer2.date_time, 'MM-dd')) as subQuery on subQuery.DateTime = to_char(customer1.date_time, 'MM-dd')
where customer1.date_time > date_trunc('day', now() - interval '1 day')
and customer1.date_time < current_date - interval '1 day' + time '23:59'
group by to_char(customer1.date_time, 'MM-dd');
Simply add subQuery.numbers and subQuery.amount (they appear in the SELECT list) to the GROUP BY clause.
Otherwise, which of the several subQuery.numbers values that belong to one to_char(customer1.date_time, 'MM-dd') should be used?
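A sketch of the corrected query (it is the query from the question, unchanged apart from the extended GROUP BY at the end):
select to_char(customer1.date_time, 'MM-dd') as DateTime,
(case when subQuery.numbers is null then 0 else subQuery.numbers end) as "2019-numbers",
(case when subQuery.amount is null then 0 else subQuery.amount end) as "2019-amount"
from customer_table customer1
left join (
select to_char(customer2.date_time, 'MM-dd') as DateTime,
count(*) as numbers,
sum(amount) as amount
from customer_table customer2
where customer2.date_time > date_trunc('day', (now() - interval '1 day') - interval '1 year')
and customer2.date_time < date_trunc('day', now() - interval '1 year')
and customer2.status = 'OK'
group by to_char(customer2.date_time, 'MM-dd')) as subQuery on subQuery.DateTime = to_char(customer1.date_time, 'MM-dd')
where customer1.date_time > date_trunc('day', now() - interval '1 day')
and customer1.date_time < current_date - interval '1 day' + time '23:59'
group by to_char(customer1.date_time, 'MM-dd'), subQuery.numbers, subQuery.amount;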

PostgreSQL generate_series with WHERE clause

I'm having an issue generating a series of dates and then returning the COUNT of rows matching each date in the series.
SELECT generate_series(current_date - interval '30 days', current_date, '1 day':: interval) AS i, COUNT(*)
FROM download
WHERE product_uuid = 'someUUID'
AND created_at = i
GROUP BY created_at::date
ORDER BY created_at::date ASC
I want the output to be, for each date in the series, the number of rows that match that date:
05-05-2018, 35
05-06-2018, 23
05-07-2018, 0
05-08-2018, 10
...
The schema has the following columns: id, product_uuid, created_at. Any help would be greatly appreciated. I can add more detail if needed.
Put the table-generating function in the FROM clause and use a join:
SELECT g.dte, COUNT(d.product_uuid)
FROM generate_series(current_date - interval '30 days', current_date, '1 day'::interval
) g(dte) left join
download d
on d.product_uuid = 'someUUID' AND
d.created_at::date = g.dte
GROUP BY g.dte
ORDER BY g.dte;

Difference of datetime column in SQL

I have a table of 20,000 records. Each record has a datetime field. I want to select all records where the gap between one record and the subsequent record is more than one hour (the condition applies to the datetime field).
Can anyone give me the SQL for this purpose?
ANSI SQL supports the lead() function. However, date/time functions vary by database. The following is the logic you want, although the exact syntax varies, depending on the database:
select t.*
from (select t.*,
lead(datetimefield) over (order by datetimefield) as next_datetimefield
from t
) t
where datetimefield + interval '1 hour' < next_datetimefield;
Note: In Teradata, the where would be:
where datetimefield + interval '1' hour < next_datetimefield;
This can also be done with a subquery, which should work on all DBMSs. As Gordon said, date/time functions differ between them.
SELECT t.* FROM YourTable t
WHERE t.DateCol + interval '1 hour' < (SELECT min(s.DateCol) FROM YourTable s
WHERE t.ID = s.ID AND s.DateCol > t.DateCol)
You can replace this:
t.DateCol + interval '1 hour'
with one of these so it will work on almost every DBMS:
DATE_ADD(t.DateCol, INTERVAL 1 HOUR)
DATEADD(hour, 1, t.DateCol)
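For example, the same subquery filter written for SQL Server with DATEADD (a sketch, using the same placeholder table and column names as above):
SELECT t.* FROM YourTable t
WHERE DATEADD(hour, 1, t.DateCol) < (SELECT min(s.DateCol) FROM YourTable s
WHERE t.ID = s.ID AND s.DateCol > t.DateCol)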
Although Teradata doesn't support standard SQL's LEAD, it's easy to rewrite:
select tab.*,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
qualify
ts < next_ts - interval '1' hour
If you don't need to show the next timestamp:
select *
from tab
qualify
ts < min(ts) over (order by ts rows between 1 following and 1 following) - interval '1' hour
QUALIFY is a Teradata extension, but really nice to have; it is similar to HAVING after GROUP BY.
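For engines without QUALIFY, the same filter can be written with a derived table and an ordinary WHERE (a sketch, assuming a table tab with a timestamp column ts as in the queries above):
select ts, next_ts
from (select ts,
min(ts) over (order by ts rows between 1 following and 1 following) as next_ts
from tab
) t
where ts < next_ts - interval '1 hour';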