Add cumulative total sum over many columns in Postgres - sql

My table is like this:
+----+--------+--------+--------+---------+
| id | type | c1 | c2 | c3 |
+----+--------+--------+--------+---------+
| a | 0 | 10 | 10 | 10 |
| a | 0 | 0 | 10 | |
| a | 0 | 50 | 10 | |
| c | 0 | | 10 | 20 |
| c | 0 | | 10 | |
+----+--------+--------+--------+---------+
I need the output to look like this:
+----+---------+--------+--------+---------+
| id | type | c1 | c2 | c3 |
+----+---------+--------+--------+---------+
| a | 0 | 10 | 10 | 10 |
| a | 0 | 0 | 10 | |
| a | 0 | 50 | 10 | |
| c | 0 | | 10 | 20 |
| c | 0 | | 10 | |
+----+---------+--------+--------+---------+
|total | 0 | 60 | 50 | 30 |
+------------------------------------------+
|cumulative| 0 | 60 | 110 | 140 |
+------------------------------------------+
My query so far:
WITH res_1 AS
(SELECT id,c1,c2,c3 FROM cloud10k.dash_reportcard),
res_2 AS
(SELECT 'TOTAL'::VARCHAR, SUM(c1),SUM(c2),SUM(c3) FROM cloud10k.dash_reportcard)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2;
It produces a sum total per column.
How can I add the cumulative total sum?
Note: the demo has 3 data columns, my actual table has more than 250.

It would be very tedious and increasingly inefficient to list 250 columns over and over for the sum of columns - an O(n²) problem in disguise. Effectively, you want the equivalent of a window-function to calculate the running total over columns instead of rows.
You can:
Transform the row to a set ("unpivot").
Run the window aggregate function sum() OVER (...).
Transform the set back to a row ("pivot").
WITH total AS (
SELECT 'total'::text AS id, 0 AS type
, sum(c1) AS s1, sum(c2) AS s2, sum(c3) AS s3 -- more ...
FROM cloud10k.dash_reportcard
)
TABLE cloud10k.dash_reportcard
UNION ALL
TABLE total
UNION ALL
SELECT 'cumulative', 0, a[1], a[2], a[3] -- more ...
FROM (
SELECT ARRAY(
SELECT sum(v.s) OVER (ORDER BY rn)
FROM total
, LATERAL (VALUES (1, s1), (2, s2), (3, s3)) v(rn, s) -- more ...
)::int[] AS a
) sub;
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
The last step could also be done with crosstab() from the tablefunc module, but for this simple case it's simpler to just aggregate into an array and break out the elements into separate columns in the outer SELECT.
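To make the unpivot / running sum / pivot idea concrete outside the database, here is a minimal Python sketch. It only illustrates the logic and is not a replacement for the query. The sample rows mirror the demo table, None stands in for NULL, and the helper names column_totals and running_total are made up for this example.

```python
from itertools import accumulate

# Demo rows holding (c1, c2, c3); None stands in for SQL NULL.
rows = [
    (10, 10, 10),
    (0, 10, None),
    (50, 10, None),
    (None, 10, 20),
    (None, 10, None),
]


def column_totals(rows):
    """Sum each column, skipping None values."""
    return [sum(v for v in col if v is not None) for col in zip(*rows)]


def running_total(totals):
    """Running sum across the column totals, left to right."""
    return list(accumulate(totals))


totals = column_totals(rows)
cumulative = running_total(totals)
print("total     ", totals)
print("cumulative", cumulative)
```

Because the running sum is computed once over the list of totals, the work grows linearly with the number of columns, which is the point of the window-function approach for 250 columns.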
Alternative for Postgres 9.1
Same as above, but:
...
UNION ALL
SELECT 'cumulative'::text, 0, a[1], a[2], a[3] -- more ...
FROM (
SELECT ARRAY(
SELECT sum(v.s) OVER (ORDER BY rn)
FROM (
SELECT row_number() OVER (), s
FROM unnest((SELECT ARRAY[s1, s2, s3] FROM total)) s -- more ...
) v(rn, s)
)::int[] AS a
) sub;
Consider:
PostgreSQL unnest() with element number

Just add another CTE to get cumulative row:
WITH res_1 AS
(SELECT id,c1,c2,c3
FROM dash_reportcard),
res_2 AS
(SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumC1,
SUM(c2) AS sumC2, SUM(c3) AS sumC3
FROM dash_reportcard),
res_3 AS
(SELECT 'CUMULATIVE'::VARCHAR, sumC1,
sumC2+sumC1, sumC1+sumC2+sumC3
FROM res_2)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2
UNION ALL
SELECT * FROM res_3;

WITH total AS (
SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumc1, SUM(c2) AS sumc2, SUM(c3) AS sumc3
FROM cloud10k.dash_reportcard
), cum_total AS (
SELECT 'CUMULATIVE'::varchar, sumc1, sumc1+sumc2, sumc1+sumc2+sumc3
FROM total
)
SELECT id, c1, c2, c3 FROM cloud10k.dash_reportcard
UNION ALL
SELECT * FROM total
UNION ALL
SELECT * FROM cum_total;

Related

Oracle SQL, how to select * having distinct columns

I want to have a query something like this (this doesn't work!)
select * from foo where rownum < 10 having distinct bar
Meaning I want to select all columns from ten random rows with distinct values in column bar. How to do this in Oracle?
Here is an example. I have the following data
| item | rate |
-------------------
| a | 50 |
| a | 12 |
| a | 26 |
| b | 12 |
| b | 15 |
| b | 45 |
| b | 10 |
| c | 5 |
| c | 15 |
And result would be for example
| item no | rate |
------------------
| a | 12 | --from (26 , 12 , 50)
| b | 45 | --from (12 ,15 , 45 , 10)
| c | 5 | --from (5 , 15)
Always with a distinct item no.
SQL Fiddle
Oracle 11g R2 Schema Setup:
Generate a table with 12 items A - L each with rates 0 - 4:
CREATE TABLE items ( item, rate ) AS
SELECT CHR( 64 + CEIL( LEVEL / 5 ) ),
MOD( LEVEL - 1, 5 )
FROM DUAL
CONNECT BY LEVEL <= 60;
Query 1:
SELECT item,
rate
FROM (
SELECT i.*,
-- Give the rates for each item a unique index assigned in a random order
ROW_NUMBER() OVER ( PARTITION BY item ORDER BY DBMS_RANDOM.VALUE ) AS rn
FROM items i
ORDER BY DBMS_RANDOM.VALUE -- Order all the rows randomly
)
WHERE rn = 1 -- Only get the first row for each item
AND ROWNUM <= 10 -- Only get the first 10 items.
Results:
| ITEM | RATE |
|------|------|
| A | 0 |
| K | 2 |
| G | 4 |
| C | 1 |
| E | 0 |
| H | 0 |
| F | 2 |
| D | 3 |
| L | 4 |
| I | 1 |
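The same two-step idea (pick one random row per item, then keep at most 10 items in random order) can be sketched in plain Python. This is only an illustration of the logic under the assumption that all rows fit in memory; the function name sample_one_per_item is hypothetical, and the data reproduces the 12 items A to L with rates 0 to 4 generated above.

```python
import random


def sample_one_per_item(rows, limit=10, rng=random):
    """Pick one random row per distinct item, then keep at most `limit` rows."""
    by_item = {}
    for item, rate in rows:
        by_item.setdefault(item, []).append(rate)
    # One randomly chosen rate per item (the "rn = 1" step).
    picked = [(item, rng.choice(rates)) for item, rates in by_item.items()]
    # Random order across items, then the row limit (the "ROWNUM <= 10" step).
    rng.shuffle(picked)
    return picked[:limit]


# 12 items A..L, each with rates 0..4, like the generated items table.
rows = [(chr(65 + i // 5), i % 5) for i in range(60)]
result = sample_one_per_item(rows)
for item, rate in result:
    print(item, rate)
```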
Here is the table setup and a query that returns one row per distinct item, in this case the row with the highest rate (ref. SqlFiddle):
create table foo(item varchar(20), rate int);
insert into foo values('a',50);
insert into foo values('a',12);
insert into foo values('a',26);
insert into foo values('b',12);
insert into foo values('b',15);
insert into foo values('b',45);
insert into foo values('b',10);
insert into foo values('c',5);
insert into foo values('c',15);
-- Number the rows within each item, then keep only the first row per item:
select item, rate from (
select item, rate, ROW_NUMBER() over(PARTITION BY item ORDER BY rate desc)
row_num from foo
) where row_num=1;

sql query to find unique records

I am new to SQL and need your help to achieve the following. I have tried using GROUP BY and COUNT, but the rows that are duplicated also end up in the unique group.
Below is my source data.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
543,xxx-23,12,12,500
543,xxx-23,12,12,501
543,xxx-23,12,12,510
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
766,xxx-74,32,1,300
877,xxx-32,12,2,300
877,xxx-32,12,2,300
877,xxx-32,12,2,301
Please note: the source contains multiple copies of each record, so when I do the count, the unique set does not show up with count = 1.
Example: the data below appears in the source with 60 records for each combination:
877,xxx-32,12,2,300 -- 60 records
877,xxx-32,12,2,301 -- 60 records
I am trying to get the unique records, but the duplicate records are also included.
Below are the rows that should come up in the unique group. There can be multiple Call_Plans for the same combination of CDR_ID,TelephoneNo,Call_ID,call_Duration. I want to read only the records for which there is exactly one call plan for each unique combination of CDR_ID,TelephoneNo,Call_ID,call_Duration.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
Please advise on this.
Thanks and Regards
To do more complex groupings you could also use a Common Table Expression/Derived Table along with windowed functions:
declare @t table(CDR_ID int,TelephoneNo nvarchar(20),Call_ID int,call_Duration int,Call_Plan int);
insert into @t values (543,'xxx-23',12,12,500),(543,'xxx-23',12,12,501),(543,'xxx-23',12,12,510),(643,'xxx-33',11,17,700),(343,'xxx-33',11,17,700),(766,'xxx-74',32,1,300),(766,'xxx-74',32,1,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,301);
with cte as
(
select CDR_ID
,TelephoneNo
,Call_ID
,call_Duration
,Call_Plan
,count(*) over (partition by CDR_ID,TelephoneNo,Call_ID,call_Duration) as c
from (select distinct * from @t) a
)
select *
from cte
where c = 1;
Output:
+--------+-------------+---------+---------------+-----------+---+
| CDR_ID | TelephoneNo | Call_ID | call_Duration | Call_Plan | c |
+--------+-------------+---------+---------------+-----------+---+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+---+
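For readers who want to check the grouping logic independently of the database, here is a minimal Python sketch of the same rule: keep a combination only if it has exactly one distinct Call_Plan. It is an illustration, not part of the original answer; the function name single_plan_rows is hypothetical and the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows: (CDR_ID, TelephoneNo, Call_ID, call_Duration, Call_Plan)
rows = [
    (543, "xxx-23", 12, 12, 500),
    (543, "xxx-23", 12, 12, 501),
    (543, "xxx-23", 12, 12, 510),
    (643, "xxx-33", 11, 17, 700),
    (343, "xxx-33", 11, 17, 700),
    (766, "xxx-74", 32, 1, 300),
    (766, "xxx-74", 32, 1, 300),
    (877, "xxx-32", 12, 2, 300),
    (877, "xxx-32", 12, 2, 300),
    (877, "xxx-32", 12, 2, 301),
]


def single_plan_rows(rows):
    """Return one row per key whose set of distinct call plans has size 1."""
    plans = defaultdict(set)
    for *key, plan in rows:
        plans[tuple(key)].add(plan)  # the set removes exact duplicates
    return sorted(
        key + (next(iter(p)),) for key, p in plans.items() if len(p) == 1
    )


unique_rows = single_plan_rows(rows)
for row in unique_rows:
    print(row)
```

Collecting the plans into a set plays the role of select distinct, and the length check plays the role of c = 1.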
using not exists()
select distinct *
from t
where not exists (
select 1
from t as i
where i.cdr_id = t.cdr_id
and i.telephoneno = t.telephoneno
and i.call_id = t.call_id
and i.call_duration = t.call_duration
and i.call_plan <> t.call_plan
)
rextester demo: http://rextester.com/RRNNE20636
returns:
+--------+-------------+---------+---------------+-----------+-----+
| cdr_id | TelephoneNo | Call_id | call_Duration | Call_Plan | cnt |
+--------+-------------+---------+---------------+-----------+-----+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+-----+
Basically you should try this:
SELECT DISTINCT A.CDR_ID, A.TelephoneNo, A.Call_ID, A.call_Duration, A.Call_Plan
FROM YOUR_TABLE A
INNER JOIN (SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration
FROM YOUR_TABLE
GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration
HAVING COUNT(DISTINCT Call_Plan)=1 -- exactly one distinct plan, even if the row itself is duplicated
) B ON A.CDR_ID= B.CDR_ID AND A.TelephoneNo=B.TelephoneNo AND A.Call_ID=B.Call_ID AND A.call_Duration=B.call_Duration
You can write a shorter query using the window function COUNT(*) OVER ...
The query below returns the required result:
SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration, MIN(Call_Plan) AS Call_Plan, COUNT(*)
FROM TABLE_NAME GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration
HAVING COUNT(DISTINCT Call_Plan) = 1;
It also returns the row count per combination. If that is not required, you can remove it.
Select *, count(CDR_ID)
from table
group by CDR_ID, TelephoneNo, Call_ID, call_Duration, Call_Plan
having count(CDR_ID) = 1

Only return top n results for each group in GROUPING SETS query

I have a rather complicated query performing some aggregations using GROUPING SETS, it looks roughly like the following:
SELECT
column1,
[... more columns here]
count(*)
FROM table_a
GROUP BY GROUPING SETS (
column1,
[... more columns here]
)
ORDER BY count DESC
This works very well in general, as long as the number of results for each group is reasonably small. But some columns in this query can have a large number of distinct values, which results in a large number of rows being returned.
I'm actually only interested in the top results for each group in the grouping set. But there doesn't seem to be an obvious way to limit the number of results per group in a query using grouping sets; LIMIT doesn't work in this case.
I'm using PostgreSQL 9.6, so I'm not restricted in which newer features I can use here.
So what my query does is something like this:
| column1 | column2 | count |
|---------|---------|-------|
| DE | | 32455 |
| US | | 3445 |
| FR | | 556 |
| GB | | 456 |
| RU | | 76 |
| | 12 | 10234 |
| | 64 | 9805 |
| | 2 | 6043 |
| | 98 | 2356 |
| | 65 | 1023 |
| | 34 | 501 |
What I actually want is something that only returns the top 3 results per group:
| column1 | column2 | count |
|---------|---------|-------|
| DE | | 32455 |
| US | | 3445 |
| FR | | 556 |
| | 12 | 10234 |
| | 64 | 9805 |
| | 2 | 6043 |
Use row_number and grouping
select a, b, total
from (
select
a, b, total,
row_number() over(
partition by g
order by total desc
) as rn
from (
select a, b, count(*) as total, grouping ((a),(b)) as g
from t
group by grouping sets ((a),(b))
) s
) s
where rn <= 3
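The effect of partitioning by the grouping set and keeping rn <= 3 can be sketched in plain Python: count the values of each column separately, then keep the n most frequent per column. This is a hedged illustration only; top_n_per_column and the sample rows are made up for the example.

```python
from collections import Counter


def top_n_per_column(rows, columns, n=3):
    """For each column, return its n most frequent values with their counts."""
    result = {}
    for idx, name in enumerate(columns):
        counts = Counter(row[idx] for row in rows)  # one "grouping set" per column
        result[name] = counts.most_common(n)        # the "rn <= n" filter
    return result


# Hypothetical sample rows with columns (a, b).
rows = [
    ("kla", "k"), ("kle", "m"), ("foo", "k"), ("bar", "m"),
    ("bar", "k"), ("foo", "m"), ("foo", "p"),
]
top = top_n_per_column(rows, ["a", "b"], n=3)
for name, values in top.items():
    print(name, values)
```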
Something like this:
WITH T(column1 , column2, cnt) AS
(
SELECT 'kla', 'k', 10
UNION ALL
SELECT 'kle', 'm', 30
UNION ALL
SELECT 'foo', 'k', 10
UNION ALL
SELECT 'bar', 'm', 30
UNION ALL
SELECT 'bar', 'k', 20
UNION ALL
SELECT 'foo', 'm', 15
UNION ALL
SELECT 'foo', 'p', 10
),
tt AS (select column1, column2, COUNT(*) AS cnt from t GROUP BY GROUPING SETS( (column1), (column2)) )
(SELECT column1, NULL as column2, cnt FROM tt WHERE column1 IS NOT NULL ORDER BY cnt desc LIMIT 3)
UNION ALL
(SELECT NULL as column1, column2, cnt FROM tt WHERE column2 IS NOT NULL ORDER BY cnt desc LIMIT 3)

Row-wise maximum in T-SQL [duplicate]

I've got a table with a few columns, and for each row I want the maximum:
-- Table:
+----+----+----+----+----+
| ID | C1 | C2 | C3 | C4 |
+----+----+----+----+----+
| 1 | 1 | 2 | 3 | 4 |
| 2 | 11 | 10 | 11 | 9 |
| 3 | 3 | 1 | 4 | 1 |
| 4 | 0 | 2 | 1 | 0 |
| 5 | 2 | 7 | 1 | 8 |
+----+----+----+----+----+
-- Desired result:
+----+---------+
| ID | row_max |
+----+---------+
| 1 | 4 |
| 2 | 11 |
| 3 | 4 |
| 4 | 2 |
| 5 | 8 |
+----+---------+
With two or three columns, I'd just write it out in iif or a CASE statement.
select ID
, iif(C1 > C2, C1, C2) row_max
from table
But with more columns this gets cumbersome fast. Is there a nice way to get this row-wise maximum? In R, this is called a "parallel maximum", so I'd love something like
select ID
, pmax(C1, C2, C3, C4) row_max
from table
What about unpivoting the data to get the result? You've said tsql but not what version of SQL Server. In SQL Server 2005+ you can use CROSS APPLY to convert the columns into rows, then get the max value for each row:
select id, row_max = max(val)
from yourtable
cross apply
(
select c1 union all
select c2 union all
select c3 union all
select c4
) c (val)
group by id
Note that this could be abbreviated by using a table value constructor.
This could also be accomplished via the UNPIVOT function in SQL Server:
select id, row_max = max(val)
from yourtable
unpivot
(
val
for col in (C1, C2, C3, C4)
) piv
group by id
Both versions give this result:
| id | row_max |
|----|---------|
| 1 | 4 |
| 2 | 11 |
| 3 | 4 |
| 4 | 2 |
| 5 | 8 |
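As a cross-check of the expected result, the row-wise maximum is simple to express in plain Python, where each row's value columns are treated as a small list (which is what the unpivot step does in SQL). The function name row_max is hypothetical and the data is the sample table from the question.

```python
# Sample table: (ID, C1, C2, C3, C4)
table = [
    (1, 1, 2, 3, 4),
    (2, 11, 10, 11, 9),
    (3, 3, 1, 4, 1),
    (4, 0, 2, 1, 0),
    (5, 2, 7, 1, 8),
]


def row_max(table):
    """Return (id, max of the remaining columns) for every row."""
    return [(row[0], max(row[1:])) for row in table]


for row_id, value in row_max(table):
    print(row_id, value)
```

This sketch assumes no missing values. None entries would have to be filtered out before calling max.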
You can use the following query:
SELECT id, (SELECT MAX(c)
FROM (
SELECT c = C1
UNION ALL
SELECT c = C2
UNION ALL
SELECT c = C3
UNION ALL
SELECT c = C4
) as x(c)) maxC
FROM mytable
One method uses cross apply:
select t.id, m.maxval
from table t cross apply
(select max(val) as maxval
from (values (c1), (c2), (c3), (c4)) v(val)
) m

how to get median for every record?

There's no median function in sql server, so I'm using this wonderful suggestion:
https://stackoverflow.com/a/2026609/117700
this computes the median over an entire dataset, but I need the median per record.
My dataset is:
+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
| 214220 | 1 |
| 215425 | 1 |
| 212839 | 4 |
| 215249 | 1 |
| 210498 | 3 |
| 110655 | 1 |
| 110655 | 1 |
| 110655 | 12 |
| 215425 | 4 |
| 100196 | 1 |
| 110032 | 1 |
| 110032 | 1 |
| 101944 | 3 |
| 101232 | 2 |
| 101232 | 1 |
+-----------+-------------+
here's the query I am using:
select client_id,
(
SELECT
(
(SELECT MAX(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested ) AS BottomHalf)
+
(SELECT MIN(TimesTested ) FROM
(SELECT TOP 50 PERCENT t.TimesTested
FROM counted3 t
where t.timestested>1
and CLIENT_ID=t.CLIENT_ID
ORDER BY t.TimesTested DESC) AS TopHalf)
) / 2 AS Median
) TotalAvgTestFreq
from counted3
group by client_id
but it is giving me funny data:
+-----------+------------------+
| client_id | median???????????|
+-----------+------------------+
| 100007 | 84 |
| 100008 | 84 |
| 100011 | 84 |
| 100014 | 84 |
| 100026 | 84 |
| 100027 | 84 |
| 100028 | 84 |
| 100029 | 84 |
| 100042 | 84 |
| 100043 | 84 |
| 100071 | 84 |
| 100072 | 84 |
| 100074 | 84 |
+-----------+------------------+
How can I get the median for every client_id?
I am currently trying to use this awesome query from Aaron's site:
select c3.client_id,(
SELECT AVG(1.0 * TimesTested ) median
FROM
(
SELECT o.TimesTested ,
rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested ), c.c
FROM counted3 AS o
CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
where count>1
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2)
) a
from counted3 c3
group by c3.client_id
unfortunately, as Richardthekiwi points out:
it's for a single median whereas this question is about a median
per-partition
I would like to know how I can join it on counted3 to get the median per partition.
Note: If testFreq is an int or bigint type, you need to CAST it before taking an average, otherwise you'll get integer division, e.g. (2+5)/2 => 3 if 2 and 5 are the median records - e.g. AVG(Cast(testfreq as float)).
select client_id, avg(testfreq) median_testfreq
from
(
select client_id,
testfreq,
rn=row_number() over (partition by CLIENT_ID
order by testfreq),
c=count(testfreq) over (partition by CLIENT_ID)
from tbk
where timestested>1
) g
where rn in ((c + 1)/2, (c + 2)/2)
group by client_id;
The median is found either as the central record in an ODD number of rows, or the average of the two central records in an EVEN number of rows. This is handled by the condition rn in ((c + 1)/2, (c + 2)/2): with integer division, both expressions give the same central row when c is odd and the two central rows when c is even.
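The per-partition median can be checked with a small Python sketch. It sorts the values of each client and averages the rows at 1-based positions (c + 1)/2 and (c + 2)/2 using integer division, the same positions used in the ROW_NUMBER query quoted in the question. The function name median_per_client is hypothetical, the rows are a subset of the question's sample data, and no timestested > 1 filter is applied here.

```python
from collections import defaultdict


def median_per_client(rows):
    """Median of the values for each client_id, computed per partition."""
    groups = defaultdict(list)
    for client_id, value in rows:
        groups[client_id].append(value)

    medians = {}
    for client_id, values in groups.items():
        values.sort()
        c = len(values)
        # 1-based positions of the central row(s); identical when c is odd.
        lo, hi = (c + 1) // 2, (c + 2) // 2
        # Float division avoids the integer-division trap mentioned above.
        medians[client_id] = (values[lo - 1] + values[hi - 1]) / 2
    return medians


# Subset of the sample data: (client_id, TimesTested)
rows = [
    (212839, 4),
    (110655, 1), (110655, 1), (110655, 12),
    (215425, 1), (215425, 4),
]
medians = median_per_client(rows)
for client_id, value in medians.items():
    print(client_id, value)
```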
try this:
select client_id,
(
SELECT
(
(SELECT MAX(testfreq) FROM
(SELECT TOP 50 PERCENT t.testfreq
FROM counted3 t
where t.timestested>1
and c3.CLIENT_ID=t.CLIENT_ID
ORDER BY t.testfreq) AS BottomHalf)
+
(SELECT MIN(testfreq) FROM
(SELECT TOP 50 PERCENT t.testfreq
FROM counted3 t
where t.timestested>1
and c3.CLIENT_ID=t.CLIENT_ID
ORDER BY t.testfreq DESC) AS TopHalf)
) / 2 AS Median
) TotalAvgTestFreq
from counted3 c3
group by client_id
I added the c3 alias to the outer CLIENT_ID references and the outer table.