Count occurrences in a row using aggregate functions


Consider a relation in which column measured_at holds thousands of different timestamps and column cell_id holds the number of the cell tower used at each timestamp. For each day present in measured_at, I want to find the cell tower with the most occurrences, i.e. the one used most on that day (the time of day is irrelevant; only the date matters). This can probably be done with window functions, but I want to do it using only aggregate functions and simple queries.
The output should look like this, for example:
cell_id measured_at
27997442 2015-12-22
because on 2015-12-22 tower number 27997442 was used the most.

You can use aggregation and distinct on. To get the counts:
select date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id;
Then extend this to keep only one row per day:
select distinct on (date_trunc('day', measured_at)) date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt
from t
group by dte, cell_id
order by date_trunc('day', measured_at), count(*) desc;
Of course, you can use window functions as well, and that is the better approach if you want to return ties too:
select dte, cell_id, cnt
from (select date_trunc('day', measured_at) as dte, cell_id, count(*) as cnt,
             rank() over (partition by date_trunc('day', measured_at) order by count(*) desc) as seqnum
      from t
      group by dte, cell_id
     ) dc
where seqnum = 1;
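To check the logic outside the database, here is a minimal Python sketch of the aggregate-only idea: count rows per (day, cell_id), then keep the highest count per day. The function name busiest_tower_per_day and the sample rows are made up for illustration. Ties are broken by the smaller cell_id, which is an assumption the answer above does not make.

```python
from collections import Counter
from datetime import datetime


def busiest_tower_per_day(rows):
    """rows: iterable of (measured_at: datetime, cell_id: int) pairs (hypothetical shape)."""
    # Same role as GROUP BY day, cell_id with COUNT(*).
    counts = Counter((measured_at.date(), cell_id) for measured_at, cell_id in rows)
    # Same role as DISTINCT ON (day) ... ORDER BY day, count DESC:
    # sort so the best row of each day comes first, then keep only that row.
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1], kv[0][1]))
    best = {}
    for (day, cell_id), cnt in ordered:
        best.setdefault(day, (cell_id, cnt))
    return best


# Made-up sample data, not taken from the question.
sample = [
    (datetime(2015, 12, 22, 8, 0), 27997442),
    (datetime(2015, 12, 22, 9, 30), 27997442),
    (datetime(2015, 12, 22, 10, 0), 11111111),
    (datetime(2015, 12, 23, 7, 15), 11111111),
]
print(busiest_tower_per_day(sample))
```

Sorting and keeping the first row per day mirrors what distinct on does after the order by; returning all tied towers would need the rank() variant instead.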

Related

How to split column based on the min and max value of another column in postgresql

I am very new to Postgres and am stuck halfway through writing a query against my events table.
I need to return a list of rows from the events table with the following columns:
1. The customer id
2. The time difference (in seconds) between their first and last events
3. The “types” of the first and last events
4. The location that the events originated from
I was able to write the query below, but it does not solve point 3, and I am stuck.
select customer_id, location, EXTRACT(EPOCH FROM (max(tstamp) - min(tstamp))) AS difference
from events
GROUP BY customer_id ,location;
Any help would be much appreciated.

location seems to be tied to the customer. For the rest, I would suggest conditional aggregation with row_number():
select customer_id, location, min(tstamp), max(tstamp),
       extract(epoch from max(tstamp) - min(tstamp)) as difference,
       min(type) filter (where seqnum_asc = 1) as first_event,
       min(type) filter (where seqnum_desc = 1) as last_event
from (select e.*,
             row_number() over (partition by customer_id order by tstamp) as seqnum_asc,
             row_number() over (partition by customer_id order by tstamp desc) as seqnum_desc
      from events e
     ) e
group by customer_id, location;
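The same idea can be sketched in plain Python to make the "first row / last row per customer" step concrete. This is only an illustration under assumptions: the tuple layout, the function name first_last_events and the event type names are hypothetical, and, as in the answer, each customer is assumed to have a single location.

```python
from datetime import datetime


def first_last_events(events):
    """events: iterable of (customer_id, location, tstamp, event_type) tuples."""
    by_customer = {}
    for customer_id, location, tstamp, event_type in events:
        by_customer.setdefault(customer_id, []).append((tstamp, event_type, location))

    result = {}
    for customer_id, rows in by_customer.items():
        first = min(rows, key=lambda r: r[0])  # the row where seqnum_asc = 1
        last = max(rows, key=lambda r: r[0])   # the row where seqnum_desc = 1
        result[customer_id] = {
            "location": first[2],  # assumes one location per customer
            "difference": (last[0] - first[0]).total_seconds(),
            "first_event": first[1],
            "last_event": last[1],
        }
    return result


# Made-up sample data with hypothetical event types.
sample_events = [
    (1, "Berlin", datetime(2020, 1, 1, 10, 0, 30), "click"),
    (1, "Berlin", datetime(2020, 1, 1, 10, 0, 0), "login"),
    (1, "Berlin", datetime(2020, 1, 1, 10, 2, 0), "logout"),
    (2, "Paris", datetime(2020, 1, 2, 9, 0, 0), "login"),
]
print(first_last_events(sample_events))
```

A customer with a single event gets a difference of 0 seconds and the same type for both first and last event.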

How to calculate the median in Postgres?

I have created a basic database and am trying to find the following:
"Median total amount spent per user in each calendar month"
I tried the following, but I am getting errors:
SELECT
user_id,
AVG(total_per_user)
FROM (SELECT user_id,
ROW_NUMBER() over (ORDER BY total_per_user DESC) AS desc_total,
ROW_NUMBER() over (ORDER BY total_per_user ASC) AS asc_total
FROM (SELECT EXTRACT(MONTH FROM created_at) AS calendar_month,
user_id,
SUM(amount) AS total_per_user
FROM transactions
GROUP BY calendar_month, user_id) AS total_amount
ORDER BY user_id) AS a
WHERE asc_total IN (desc_total, desc_total+1, desc_total-1)
GROUP BY user_id
;

In Postgres, you could just use aggregate function percentile_cont():
select
user_id,
percentile_cont(0.5) within group(order by total_per_user) median_total_per_user
from (
select user_id, sum(amount) total_per_user
from transactions
group by date_trunc('month', created_at), user_id
) t
group by user_id
Note that date_trunc() is probably closer to what you want than extract(month from ...) - unless you do want to sum amounts of the same month for different years together, which is not how I understood your requirement.

Just use percentile_cont(). I don't fully understand the question, but if you want the median of the monthly spending per user, then:
SELECT user_id,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_per_user) AS median_total_per_user
FROM (SELECT DATE_TRUNC('month', created_at) AS calendar_month,
             user_id, SUM(amount) AS total_per_user
      FROM transactions t
      GROUP BY calendar_month, user_id
     ) um
GROUP BY user_id;
There is a built-in function for median. No need for fancier processing.
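As a cross-check of what these queries compute, here is a small Python sketch: sum amounts per (user, calendar month), then take the median of each user's monthly totals. The tuple layout, the function name and the sample rows are assumptions for illustration. statistics.median averages the two middle values for an even count, which is also what percentile_cont(0.5) yields through interpolation.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_monthly_total_per_user(transactions):
    """transactions: iterable of (user_id, created_at: date, amount) tuples."""
    # Inner query: SUM(amount) grouped by month and user.
    totals = defaultdict(float)
    for user_id, created_at, amount in transactions:
        month = (created_at.year, created_at.month)  # like date_trunc('month', ...)
        totals[(user_id, month)] += amount

    # Outer query: median of the monthly totals, grouped by user.
    per_user = defaultdict(list)
    for (user_id, _month), total in totals.items():
        per_user[user_id].append(total)
    return {user_id: median(values) for user_id, values in per_user.items()}


# Made-up sample data.
sample_transactions = [
    (1, date(2020, 1, 5), 10.0),
    (1, date(2020, 1, 20), 20.0),
    (1, date(2020, 2, 3), 50.0),
    (1, date(2021, 1, 9), 100.0),
    (2, date(2020, 1, 2), 5.0),
    (2, date(2020, 2, 2), 15.0),
]
print(median_monthly_total_per_user(sample_transactions))
```

Keying on (year, month) keeps January 2020 and January 2021 apart, which is the difference between date_trunc('month', ...) and extract(month from ...).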

ORACLE SQL: Find last minimum and maximum consecutive period

I have a sample data set that lists water meters that were not working, for a specific reason, over a certain period (January 2016 to December 2018).
I would like a query that retrieves the last consecutive period, as a minimum and maximum period, during which each meter was not working within that range.
Any help will be greatly appreciated.

You have two options:
select code, to_char(min_period, 'yyyymm') min_period, to_char(max_period, 'yyyymm') max_period
from (
select code, min(period) min_period, max(period) max_period,
max(min(period)) over (partition by code) max_min_period
from (
select code, period, sum(flag) over (partition by code order by period) grp
from (
select code, period,
case when add_months(period, -1)
= lag(period) over (partition by code order by period)
then 0 else 1 end flag
from (select mrdg_acc_code code, to_date(mrdg_per_period, 'yyyymm') period from t)))
group by code, grp)
where min_period = max_min_period
Explanation:
flag rows whose period is not equal to the previous period plus one month,
create column grp, which is the running sum of the flags,
group the data by code and grp, additionally finding the latest period start,
show only rows where min_period = max_min_period.
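The four steps above can be sketched in Python for a single meter. This is a minimal illustration, assuming periods are integers in yyyymm form; the function names are hypothetical.

```python
from itertools import accumulate


def month_index(yyyymm):
    """Turn a yyyymm integer into a running month count so neighbours differ by 1."""
    return (yyyymm // 100) * 12 + yyyymm % 100


def last_consecutive_run(periods):
    """Return (min_period, max_period) of the latest run of consecutive months."""
    ordered = sorted(set(periods))
    if not ordered:
        return None
    # Step 1: flag = 1 where a row does not directly follow the previous month.
    flags = [
        1 if i == 0 or month_index(p) - month_index(ordered[i - 1]) != 1 else 0
        for i, p in enumerate(ordered)
    ]
    # Step 2: the running sum of the flags is the group number (grp).
    groups = list(accumulate(flags))
    # Steps 3 and 4: the group with the latest start is the last group.
    last_group = groups[-1]
    run = [p for p, g in zip(ordered, groups) if g == last_group]
    return run[0], run[-1]


# Made-up sample periods for one meter.
print(last_consecutive_run([201601, 201602, 201605, 201611, 201612, 201701]))
```

In the SQL version the same grouping happens per code through partition by; the sketch handles one code at a time.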
The second option is a recursive CTE, available in Oracle 11g Release 2 and above:
with
data(period, code) as (
select to_date(mrdg_per_period, 'yyyymm'), mrdg_acc_code from t
where mrdg_per_period between 201601 and 201812),
cte (period, code) as (
select to_char(period, 'yyyymm'), code from data
where (period, code) in (select max(period), code from data group by code)
union all
select to_char(data.period, 'yyyymm'), cte.code
from cte
join data on data.code = cte.code
and data.period = add_months(to_date(cte.period, 'yyyymm'), -1))
select code, min(period) min_period, max(period) max_period
from cte group by code
Explanation:
The subquery data keeps only rows from 2016 to 2018 and converts period to a date, which the function add_months needs in order to work.
cte is recursive. The anchor member finds the starting rows, those with the maximum period for each code. After union all comes the recursive member, which looks for the row one month older than the current one. If it finds one, it continues with that row; if not, the recursion stops.
The final select groups the data. Notice that periods which were not consecutive were rejected by cte.
Though recursive queries are often slower than traditional ones, there can be scenarios where the second solution is better.
Here is the dbfiddle demo for both queries. Good luck.

Use aggregate functions with group by:
select max(mrdg_per_period) mrdg_per_period, mrdg_acc_code, max(mrdg_date_read), rea_desc, min(mrdg_per_period) not_working_as_from
from tablename
group by mrdg_acc_code, rea_desc

This is a bit tricky: it is a gaps-and-islands problem. To get all continuous periods, it helps to have an enumeration of months. So, convert the period to a number of months and then subtract a sequence generated using row_number(). The difference is constant for a group of adjacent months.
This looks like:
select acc_code, min(period), max(period)
from (select t.*,
             row_number() over (partition by acc_code order by period_num) as seqnum
      from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
            from t
           ) t
      where rea_desc = 'METER NOT WORKING'
     ) t
group by acc_code, (period_num - seqnum);
Then, if you want the last one for each account, you can use a subquery:
select t.*
from (select acc_code, min(period) as min_period, max(period) as max_period,
             row_number() over (partition by acc_code order by max(period) desc) as seqnum
      from (select t.*,
                   row_number() over (partition by acc_code order by period_num) as seqnum
            from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
                  from t
                 ) t
            where rea_desc = 'METER NOT WORKING'
           ) t
      group by acc_code, (period_num - seqnum)
     ) t
where seqnum = 1;

Tagging consecutive days

Suppose I have data like this:
ID,DATE
101,01jan2014
101,02jan2014
101,03jan2014
101,07jan2014
101,08jan2014
101,10jan2014
101,12jan2014
101,13jan2014
102,08jan2014
102,09jan2014
102,10jan2014
102,15jan2014
How could I efficiently code this in Greenplum SQL so that consecutive days are grouped, as in the output below:
ID,DATE,PERIOD
101,01jan2014,1
101,02jan2014,1
101,03jan2014,1
101,07jan2014,2
101,08jan2014,2
101,10jan2014,3
101,12jan2014,4
101,13jan2014,4
102,08jan2014,1
102,09jan2014,1
102,10jan2014,1
102,15jan2014,2

You can do this using row_number(). For a group of consecutive days, the difference between the date and the row_number() is a constant. Then, use dense_rank() to assign the period:
select id, date,
       dense_rank() over (partition by id order by grp) as period
from (select t.*,
             date - row_number() over (partition by id order by date) * interval '1 day' as grp
      from t
     ) t;
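To see why the difference stays constant within a run, here is a short Python sketch of the same technique applied to the data from the question. The function name tag_periods is made up, and the sketch assumes each id has no duplicate dates.

```python
from datetime import date, timedelta


def tag_periods(rows):
    """rows: iterable of (id, date). Returns a list of (id, date, period)."""
    by_id = {}
    for id_, day in rows:
        by_id.setdefault(id_, []).append(day)

    tagged = []
    for id_, days in by_id.items():
        period, prev_grp = 0, None
        for seq, day in enumerate(sorted(days), start=1):
            # date minus row number: identical for all days of one consecutive run
            grp = day - timedelta(days=seq)
            if grp != prev_grp:
                # grp only grows in date order, so counting changes equals dense_rank()
                period += 1
                prev_grp = grp
            tagged.append((id_, day, period))
    return tagged


# Data from the question.
rows = (
    [(101, date(2014, 1, d)) for d in (1, 2, 3, 7, 8, 10, 12, 13)]
    + [(102, date(2014, 1, d)) for d in (8, 9, 10, 15)]
)
for id_, day, period in tag_periods(rows):
    print(id_, day.isoformat(), period)
```

For id 101, the first three days all map to 2013-12-31, the next two to 2014-01-03, and so on, which yields the periods 1, 1, 1, 2, 2, 3, 4, 4 requested in the question.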

SQL Query Early/Late dates

I am trying to create an SQL view based on the results for the earliest and latest dates. I am aware of the min and max functions, but I have not been able to apply them correctly. So far I have:
select distinct
name,
study,
group,
ROUND (TLength * POWER (TWidth, 2) * 0.000523, 3) as Volume,
firstDate as firstDate,
lastDate as lastDate
from
(select
name,
study,
group,
min(operation_time) firstDate,
max(operation_time) lastDate,
MAX(DECODE (ACTIVITY,'length', RESULT_VALUE, NULL)) TLength,
MAX(DECODE (ACTIVITY,'width', RESULT_VALUE,NULL)) TWidth
from mx_all_data_vw
where mx_all_data_vw.study_name like '%MT%'
group by name, study, group);
This gives me a single row with two columns, one for the earliest and one for the latest date.
I want two rows instead: one containing all data for the earliest date and another containing all data for the latest date, rather than two columns separating the early and late dates.
Thanks.

Simplified for readability:
SELECT *
FROM (
SELECT mx_all_data_vw.*,
ROW_NUMBER() OVER (PARTITION BY name, study, "group" ORDER BY operation_time) rna,
ROW_NUMBER() OVER (PARTITION BY name, study, "group" ORDER BY operation_time DESC) rnd,
DECODE(activity, 'length', result_value, NULL) AS TLength,
DECODE(activity, 'width', result_value, NULL) AS TWidth
FROM mx_all_data_vw
WHERE mx_all_data_vw.study_name like '%MT%'
)
WHERE 1 IN (rna, rnd)
Add the computed expressions instead of *.
