How to split column based on the min and max value of another column in postgresql - sql

I am very new to Postgres and trying to create a query, but I am stuck halfway.
Here is the structure of my table:
I need to return a list of rows from the events table that has the following columns:
1. The customer id
2. The time difference (in seconds) between their first and last events
3. The "types" of the first and last events
4. The location that the events originated from
I was able to create a query, but it does not solve point 3, and I am stuck.
select customer_id, location,
       EXTRACT(EPOCH FROM (max(tstamp) - min(tstamp))) AS difference
from events
GROUP BY customer_id, location;
Here is my partial solution output:
(screenshot of partial output)
Any help would be much appreciated.

The location seems to be tied to the customer. For the rest, I would suggest conditional aggregation with row_number():
select customer_id, location, min(tstamp), max(tstamp),
       extract(epoch from max(tstamp) - min(tstamp)) as difference,
       min(type) filter (where seqnum_asc = 1) as first_event,
       min(type) filter (where seqnum_desc = 1) as last_event
from (select e.*,
             row_number() over (partition by customer_id order by tstamp) as seqnum_asc,
             row_number() over (partition by customer_id order by tstamp desc) as seqnum_desc
      from events e
     ) e
group by customer_id, location;
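A minimal sketch of this first/last-event query, run through Python's sqlite3 (the sample rows are invented for illustration; SQLite has no extract(epoch ...), so strftime('%s', ...) supplies Unix seconds, and min(case when ... end) stands in for the filter clause, which only newer SQLite versions support):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (customer_id INT, location TEXT, type TEXT, tstamp TEXT);
INSERT INTO events VALUES
  (1, 'NY', 'login',    '2020-01-01 10:00:00'),
  (1, 'NY', 'purchase', '2020-01-01 10:05:00'),
  (1, 'NY', 'logout',   '2020-01-01 10:30:00'),
  (2, 'LA', 'login',    '2020-01-01 09:00:00'),
  (2, 'LA', 'logout',   '2020-01-01 09:10:00');
""")

# Tag each row with its ascending and descending position per customer,
# then pick the type at position 1 from each direction.
rows = con.execute("""
SELECT customer_id, location,
       strftime('%s', max(tstamp)) - strftime('%s', min(tstamp)) AS diff_seconds,
       min(CASE WHEN seqnum_asc  = 1 THEN type END) AS first_event,
       min(CASE WHEN seqnum_desc = 1 THEN type END) AS last_event
FROM (SELECT e.*,
             row_number() OVER (PARTITION BY customer_id ORDER BY tstamp)      AS seqnum_asc,
             row_number() OVER (PARTITION BY customer_id ORDER BY tstamp DESC) AS seqnum_desc
      FROM events e) e
GROUP BY customer_id, location
ORDER BY customer_id
""").fetchall()
print(rows)
```

Customer 1's events span 30 minutes (1800 seconds) from login to logout, which is what the query reports.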

Related

Redshift - Group Table based on consecutive rows

I am working right now with this table:
What I want to do is to clean up this table a little bit, grouping some consecutive rows together.
Is there any form to achieve this kind of result?
The first table is already working fine, I just want to get rid of some rows to free some disk space.
One method is to peek at the previous row to see when the value changes. Assuming that valid_to and valid_from are really dates:
select id, class, min(valid_from), max(valid_to)
from (select t.*,
             sum(case when prev_valid_to >= valid_from - interval '1 day' then 0 else 1 end)
                 over (partition by id, class order by valid_to
                       rows between unbounded preceding and current row) as grp
      from (select t.*,
                   lag(valid_to) over (partition by id, class order by valid_to) as prev_valid_to
            from t
           ) t
     ) t
group by id, class, grp;
If they are not dates, then this gets trickier. You could convert them to dates, or you could use the difference of row_numbers:
select id, class, min(valid_from), max(valid_to)
from (select t.*,
             row_number() over (partition by id order by valid_from) as seqnum,
             row_number() over (partition by id, class order by valid_from) as seqnum_2
      from t
     ) t
group by id, class, (seqnum - seqnum_2);
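The difference-of-row_numbers trick can be checked with a small sqlite3 session (table name and rows are invented; the point is that a class which reappears later gets a new group rather than merging with its earlier block):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INT, class TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO t VALUES
  (1, 'A', '2020-01-01', '2020-01-10'),
  (1, 'A', '2020-01-11', '2020-01-20'),
  (1, 'B', '2020-01-21', '2020-01-25'),
  (1, 'A', '2020-01-26', '2020-01-31');
""")

# seqnum counts rows per id; seqnum_2 counts rows per id+class.
# Their difference is constant within a run of consecutive same-class rows.
rows = con.execute("""
SELECT id, class, min(valid_from), max(valid_to)
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY id        ORDER BY valid_from) AS seqnum,
             row_number() OVER (PARTITION BY id, class ORDER BY valid_from) AS seqnum_2
      FROM t) t
GROUP BY id, class, (seqnum - seqnum_2)
ORDER BY min(valid_from)
""").fetchall()
print(rows)
```

The two leading 'A' rows collapse into one span, while the trailing 'A' row stays separate because 'B' interrupted the run.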

Why do all the rank numbers become 1 when using a window function in a subquery?

I have a sessions table with traffic_id, date, start_time, session_id, page, platform, page_views, revenue, segment_id, and customer_id columns. Each customer_id can have multiple session_ids with different revenue/date/start_time/page/platform/page_views/segment_id values. Sample data is shown below.
traffic_id|date|start_time|session_id|page|platform|page_views|revenue|segment_id|customer_id
303|1/1/2017|05:23:33|123457080|homepage|mobile|581|37.40|1|310559
I would like to know the max session revenue per customer and the session sequence number as the table shown below.
Customer_id|Date|Maximum_session_revenue|Session_id|Session_Sequence
138858|1/13/17|100.44|123458749|5
I thought I could just use a subquery to do the job, but all the ranking values are 1, and session_id and date are wrong. Please help!
SELECT max(revenue), customer_id, date, session_id, session_sequence
FROM (SELECT revenue,
             date,
             customer_id,
             session_id,
             RANK() OVER (PARTITION BY customer_id ORDER BY date, start_time ASC) AS session_sequence
      FROM sessions
     ) AS a
GROUP BY customer_id;
Your query should generate an error because the GROUP BY columns and SELECT columns are inconsistent.
Presumably you want the maximum revenue and the sequence number where that occurs.
SELECT s.*
FROM (SELECT s.*,
             RANK() OVER (PARTITION BY customer_id ORDER BY date, start_time ASC) AS session_sequence,
             MAX(revenue) OVER (PARTITION BY customer_id) AS max_revenue
      FROM sessions s
     ) s
WHERE revenue = max_revenue;
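Here is a runnable sketch of that answer via sqlite3, with a pared-down, invented sessions table (only the columns the query touches). The windowed MAX marks each customer's best revenue, and the outer WHERE keeps just that session along with its sequence number:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sessions (customer_id INT, date TEXT, start_time TEXT,
                       session_id INT, revenue REAL);
INSERT INTO sessions VALUES
  (310559, '2017-01-01', '05:23:33', 100, 37.40),
  (310559, '2017-01-02', '08:00:00', 101, 99.10),
  (310559, '2017-01-03', '09:00:00', 102, 12.00),
  (138858, '2017-01-13', '10:00:00', 200, 100.44);
""")

# RANK() numbers each customer's sessions chronologically; MAX() OVER
# carries the customer's best revenue onto every row so we can filter.
rows = con.execute("""
SELECT customer_id, date, session_id, revenue, session_sequence
FROM (SELECT s.*,
             rank() OVER (PARTITION BY customer_id ORDER BY date, start_time) AS session_sequence,
             max(revenue) OVER (PARTITION BY customer_id) AS max_revenue
      FROM sessions s) s
WHERE revenue = max_revenue
ORDER BY customer_id
""").fetchall()
print(rows)
```

Customer 310559's top session is the second one chronologically, so its sequence number is 2, not 1.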

Count occurrences in a row using aggregate functions

Consider the following relation
Column measured_at holds thousands of different timestamps and column cell_id holds the number of the cell tower used at each timestamp. I want to query, for each day saved in measured_at, which cell tower has the most occurrences (i.e. was used the most on that day; the time is irrelevant, only the date matters). This can probably be done using window functions, but I want to do it using only aggregate functions and simple queries.
The output should look, for example, like this:
cell_id measured_at
27997442 2015-12-22
for the above example because on 22-12-2015 tower number 27997442 has been used the most.
You can use aggregation and distinct on. To get the counts:
select date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id;
And then extend this to keep only one row per day:
select distinct on (date_trunc('day', measured_at))
       date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id
order by date_trunc('day', measured_at), count(*) desc;
Of course, you can use window functions as well -- and that is the better approach if you also want ties:
select dte, cell_id, cnt
from (select date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt,
             rank() over (partition by date_trunc('day', measured_at) order by count(*) desc) as seqnum
      from t
      group by dte, cell_id
     ) dc
where seqnum = 1;
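The window-function variant can be exercised with sqlite3 (sample timestamps invented; SQLite has no distinct on or date_trunc, so date(measured_at) does the day truncation here, and only the rank() approach is portable):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (cell_id INT, measured_at TEXT);
INSERT INTO t VALUES
  (27997442, '2015-12-22 08:00:00'),
  (27997442, '2015-12-22 09:00:00'),
  (11111111, '2015-12-22 10:00:00'),
  (11111111, '2015-12-23 10:00:00');
""")

# Aggregate to per-day, per-tower counts, then rank towers within each day
# by count descending; seqnum = 1 keeps the busiest tower(s) per day.
rows = con.execute("""
SELECT dte, cell_id, cnt
FROM (SELECT date(measured_at) AS dte, cell_id, count(*) AS cnt,
             rank() OVER (PARTITION BY date(measured_at) ORDER BY count(*) DESC) AS seqnum
      FROM t
      GROUP BY dte, cell_id) dc
WHERE seqnum = 1
ORDER BY dte
""").fetchall()
print(rows)
```

On 2015-12-22 tower 27997442 wins with two uses; on 2015-12-23 the only tower used wins by default.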

Running total per year, ordered by person, based on latest date info

We are trying to calculate the running total for each year, ordered by person, based on the latest date info. Here is an example of how the data is ordered:
Expected result:
So for each downloaded date we want the running total of all persons, ordered by year (currently the only year is 2018).
What we have so far:
sum(Amount)
  over (partition by [Year], [Person]
        order by [Enddate])
where max(Downloaded)
Any idea how to fix this?
Just use a window function:
select t.*,
       sum(Amount) over (partition by [Year] order by Downloaded) as RunningTotal
from YourTable t;
Try using a subquery with a moving downloaded date range.
SELECT T.*,
       RunningTotalByDate = (SELECT SUM(N.Amount)
                             FROM YourTable AS N
                             WHERE N.Downloaded <= T.Downloaded)
FROM YourTable AS T
ORDER BY T.Downloaded ASC, T.Person ASC;
Or use a windowed SUM(). Do not include a PARTITION BY, because it would reset the sum every time the partitioning column's value changes.
SELECT T.*,
       RunningTotalByDate = SUM(T.Amount) OVER (ORDER BY T.Downloaded ASC)
FROM YourTable AS T
ORDER BY T.Downloaded ASC, T.Person ASC;
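The windowed SUM() version translates directly to sqlite3 (the `RunningTotalByDate =` alias syntax above is T-SQL; standard `AS` is used below, and the table contents are invented). With ORDER BY inside the OVER clause, the default frame runs from the start of the data to the current row, which is exactly a running total across all persons:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE YourTable (Person TEXT, Year INT, Downloaded TEXT, Amount REAL);
INSERT INTO YourTable VALUES
  ('Ann', 2018, '2018-01-05', 10.0),
  ('Bob', 2018, '2018-01-06', 20.0),
  ('Ann', 2018, '2018-01-07', 5.0);
""")

# No PARTITION BY: the sum accumulates over everyone, ordered by download date.
rows = con.execute("""
SELECT Person, Downloaded, Amount,
       sum(Amount) OVER (ORDER BY Downloaded) AS RunningTotalByDate
FROM YourTable
ORDER BY Downloaded
""").fetchall()
print(rows)
```

The totals climb 10 → 30 → 35 across persons, instead of resetting per person as a PARTITION BY would make them.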

Tagging consecutive days

Suppose I have data something like this:
ID,DATE
101,01jan2014
101,02jan2014
101,03jan2014
101,07jan2014
101,08jan2014
101,10jan2014
101,12jan2014
101,13jan2014
102,08jan2014
102,09jan2014
102,10jan2014
102,15jan2014
How could I efficiently code this in Greenplum SQL so that I get a grouping of consecutive days similar to the one below:
ID,DATE,PERIOD
101,01jan2014,1
101,02jan2014,1
101,03jan2014,1
101,07jan2014,2
101,08jan2014,2
101,10jan2014,3
101,12jan2014,4
101,13jan2014,4
102,08jan2014,1
102,09jan2014,1
102,10jan2014,1
102,15jan2014,2
You can do this using row_number(). For a consecutive group, the difference between the date and the row_number() is a constant. Then, use dense_rank() to assign the period:
select id, date,
       dense_rank() over (partition by id order by grp) as period
from (select t.*,
             date - row_number() over (partition by id order by date) * interval '1 day' as grp
      from t
     ) t;
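This consecutive-day grouping can be reproduced on the question's own sample data via sqlite3 (SQLite has no date arithmetic with interval, so julianday(date) - row_number() plays the role of date minus row_number days; dates are rewritten in ISO format):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INT, date TEXT);
INSERT INTO t VALUES
  (101,'2014-01-01'),(101,'2014-01-02'),(101,'2014-01-03'),
  (101,'2014-01-07'),(101,'2014-01-08'),(101,'2014-01-10'),
  (101,'2014-01-12'),(101,'2014-01-13'),
  (102,'2014-01-08'),(102,'2014-01-09'),(102,'2014-01-10'),
  (102,'2014-01-15');
""")

# Within a run of consecutive days, date-minus-row_number is constant,
# so grp identifies each run; dense_rank then numbers the runs 1, 2, 3, ...
rows = con.execute("""
SELECT id, date,
       dense_rank() OVER (PARTITION BY id ORDER BY grp) AS period
FROM (SELECT t.*,
             julianday(date) - row_number() OVER (PARTITION BY id ORDER BY date) AS grp
      FROM t) t
ORDER BY id, date
""").fetchall()
print(rows)
```

The period column comes out 1,1,1,2,2,3,4,4 for id 101 and 1,1,1,2 for id 102, matching the table the asker wanted.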