Row-wise maximum in T-SQL [duplicate] - sql

I've got a table with a few columns, and for each row I want the maximum:
-- Table:
+----+----+----+----+----+
| ID | C1 | C2 | C3 | C4 |
+----+----+----+----+----+
| 1 | 1 | 2 | 3 | 4 |
| 2 | 11 | 10 | 11 | 9 |
| 3 | 3 | 1 | 4 | 1 |
| 4 | 0 | 2 | 1 | 0 |
| 5 | 2 | 7 | 1 | 8 |
+----+----+----+----+----+
-- Desired result:
+----+---------+
| ID | row_max |
+----+---------+
| 1 | 4 |
| 2 | 11 |
| 3 | 4 |
| 4 | 2 |
| 5 | 8 |
+----+---------+
With two or three columns, I'd just write it out in iif or a CASE statement.
select ID
, iif(C1 > C2, C1, C2) row_max
from table
But with more columns this gets cumbersome fast. Is there a nice way to get this row-wise maximum? In R, this is called a "parallel maximum", so I'd love something like
select ID
, pmax(C1, C2, C3, C4) row_max
from table

What about unpivoting the data to get the result? You've said tsql but not what version of SQL Server. In SQL Server 2005+ you can use CROSS APPLY to convert the columns into rows, then get the max value for each row:
select id, row_max = max(val)
from yourtable
cross apply
(
select c1 union all
select c2 union all
select c3 union all
select c4
) c (val)
group by id
Note that this could be abbreviated by using a table value constructor.
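As a rough sketch of that abbreviation, assuming SQL Server 2008 or later (where a VALUES table value constructor can be used as a derived table) and the same yourtable and column names as above:

```sql
-- Sketch: the UNION ALL list replaced by a table value constructor.
-- Assumes SQL Server 2008+; yourtable and c1..c4 are the names used above.
select id, row_max = max(val)
from yourtable
cross apply
(
    values (c1), (c2), (c3), (c4)
) c (val)
group by id;
```

Each row of yourtable is expanded into four (val) rows, and max(val) then collapses them back to one row per id.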
This could also be accomplished via the UNPIVOT function in SQL Server:
select id, row_max = max(val)
from yourtable
unpivot
(
val
for col in (C1, C2, C3, C4)
) piv
group by id
Both versions give the result:
| id | row_max |
|----|---------|
| 1 | 4 |
| 2 | 11 |
| 3 | 4 |
| 4 | 2 |
| 5 | 8 |

You can use the following query:
SELECT id, (SELECT MAX(c)
FROM (
SELECT c = C1
UNION ALL
SELECT c = C2
UNION ALL
SELECT c = C3
UNION ALL
SELECT c = C4
) as x(c)) maxC
FROM mytable

One method uses cross apply:
select t.id, m.maxval
from table t cross apply
(select max(val) as maxval
from (values (c1), (c2), (c3), (c4)) v(val)
) m
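Newer versions also have a built-in that is close to the pmax the question asks for. A minimal sketch, assuming SQL Server 2022 or later (or Azure SQL Database), where GREATEST is available; yourtable is a placeholder name as in the answers above:

```sql
-- Assumes SQL Server 2022+ or Azure SQL Database, where GREATEST exists.
-- yourtable is a placeholder; ID and C1..C4 follow the question's sample data.
select ID
     , greatest(C1, C2, C3, C4) as row_max
from yourtable;
```

On older versions, the CROSS APPLY and UNPIVOT approaches above remain the options.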

Related

SQL Query - Check for Two Distinct Values

Given the below data set I want to run a query to highlight any 'pairs' that do not consist of a 'left' and 'right'.
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 1 | A | A1 | Left |
| 1 | A | A2 | Right |
| 2 | B | B1 | Right |
| 2 | B | B2 | Left |
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
| 4 | D | D1 | Right |
| 4 | D | D2 | Left |
| 5 | E | E1 | Left |
| 5 | E | E2 | Right |
+---------+-----------+---------------+----------------------+
In this instance Pair 3 'C' has two lefts. Therefore, I would look to display the following:
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
+---------+-----------+---------------+----------------------+
You can simply use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) ;
With an index on (pair_id, Individual_Direction), this should not only be the most concise solution but also the fastest.
If you also want to be sure that there are actual pairs (the query above also returns singletons):
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) and
exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_ID <> t.Individual_ID
);
You can also do this using window functions:
select t.*
from (select t.*,
count(*) over (partition by pair_id) as cnt,
min(Individual_Direction) over (partition by pair_id) as min_direction,
max(Individual_Direction) over (partition by pair_id) as max_direction
from t
) t
where cnt > 1 and min_direction = max_direction;
One option uses aggregation:
WITH cte AS (
SELECT Pair_Name
FROM yourTable
WHERE Individual_Direction IN ('Left', 'Right')
GROUP BY Pair_Name
HAVING MIN(Individual_Direction) = MAX(Individual_Direction)
)
SELECT *
FROM yourTable
WHERE Pair_Name IN (SELECT Pair_Name FROM cte);
The HAVING clause used above asserts that a matching pair has both a minimum and maximum direction which are the same. This implies that such a pair only has one direction.
As is the case with Gordon's answer, an index on (Pair_Name, Individual_Direction) might help performance:
CREATE INDEX idx ON yourTable (Pair_Name, Individual_Direction);
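The same check can also be phrased by counting the distinct directions per pair. This is only a sketch of an equivalent condition, reusing the yourTable name from above; the extra COUNT(*) > 1 test is an added assumption that single-row pairs should be excluded:

```sql
-- Sketch: a pair "has only one direction" when it has exactly one distinct value.
-- COUNT(*) > 1 is an added assumption to skip pairs that have a single row.
SELECT *
FROM yourTable
WHERE Pair_Name IN (
    SELECT Pair_Name
    FROM yourTable
    WHERE Individual_Direction IN ('Left', 'Right')
    GROUP BY Pair_Name
    HAVING COUNT(DISTINCT Individual_Direction) = 1
       AND COUNT(*) > 1
);
```

With the sample data from the question, only the two rows of pair C satisfy this condition.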
There is probably a more elegant way of using window functions than what I wrote:
WITH ranked AS
(
SELECT *, RANK() OVER(ORDER BY Pair_Id, Pair_Name, Individual_Direction) AS r
FROM pairs
),
counted AS
(
SELECT Pair_Id, Pair_Name, Individual_Direction,r, COUNT(r) as times FROM ranked
GROUP BY Pair_Id, Pair_Name, Individual_Direction, r
HAVING COUNT(r) > 1
)
SELECT ranked.Pair_Id, ranked.Pair_Name, ranked.Individual_Id, ranked.Individual_Direction FROM ranked
RIGHT JOIN counted
ON ranked.Pair_Id=counted.Pair_Id
AND ranked.Pair_Name=counted.Pair_Name
AND ranked.Individual_Direction=counted.Individual_Direction

Impala - Does impala allow multi GROUP_CONCAT in one query

For example, I have the table below:
+-----------+-------+------------+
| Id | a | b |
+-----------+-------+------------+
| 1 | 6 | 20 |
| 1 | 4 | 55 |
| 1 | 9 | 56 |
| 1 | 2 | 67 |
| 1 | 7 | 80 |
| 1 | 5 | 66 |
| 1 | 3 | 33 |
| 1 | 8 | 34 |
| 1 | 1 | 52 |
+-----------+-------+------------+
I want the output to look like this, using Impala:
+-----------+-------------------+-----------------------------+
| Id | a | b |
+-----------+-------------------+-----------------------------+
| 1 | 6,4,9,2,7,5,3,8,1 | 20,55,56,67,80,66,33,34,52 |
+-----------+-------------------+-----------------------------+
In Impala, I have used
SELECT Id,
group_concat(DISTINCT a) AS a,
group_concat(DISTINCT b) AS b
FROM table GROUP BY Id
This always fails with a syntax error. Is it not allowed to use multiple group_concat calls in one query in Impala, or is it using multiple DISTINCTs in one query that is not allowed?
From the documentation for GROUP_CONCAT:
You cannot apply the DISTINCT operator to the argument of this function.
But, as a workaround, we can use two separate subqueries to find the distinct values:
WITH cte1 AS (
SELECT Id, GROUP_CONCAT(a) AS a
FROM (SELECT DISTINCT Id, a FROM yourTable) t
GROUP BY Id
),
cte2 AS (
SELECT Id, GROUP_CONCAT(b) AS b
FROM (SELECT DISTINCT Id, b FROM yourTable) t
GROUP BY Id
)
SELECT
t1.Id,
t1.a,
t2.b
FROM cte1 t1
INNER JOIN cte2 t2
ON t1.Id = t2.Id;

oracle transposing with analytical function [duplicate]

I am transposing key value pairs from a table and facing an issue.
I am using Oracle 12C database.
The test data looks like this (the table is tab1):
+---------------------------+
| Name | VAL | ID | grp_id|
+---------------------------+
| a | 3 | 1 | 1 |
| b | 5 | 2 | 1 |
| c | 8 | 3 | 1 |
| c | 9 | 4 | 2 |
+---------------------------+
My expected result is
+-------------------------+
| grp_id| a | b | c |
+-------------------------+
| 1 | 3 | 5 | 8 |
| 2 | null | null | 9 |
+-------------------------+
What I did so far is
with t as(
select row_number() over (partition by grp_id order by grp_id) rn,
name,
grp_id,
lead(val,0) over (partition by grp_id order by grp_id) as a,
lead(val,1) over (partition by grp_id order by grp_id) as b,
lead(val,2) over (partition by grp_id order by grp_id) as c
from tab1 where grp_id in (1,2) and name in ('a', 'b','c')
)
select grp_id,a,b,c from t where rn=1;
When the data is consistent and the key-value pairs are the same for all grp_id values, this query works fine. But when some keys are missing for a grp_id, I get a result like the following, which is wrong and not what I expect:
+----------------------------+
| grp_id| a | b | c |
+----------------------------+
| 1 | 3 | 5 | 8 |
| 2 | 9 | null | null |
+----------------------------+
How can I improve the query to work correctly? I want to avoid using pivot.
I would do this using conditional aggregation:
select grp_id,
max(case when name = 'a' then val end) as a,
max(case when name = 'b' then val end) as b,
max(case when name = 'c' then val end) as c
from tab1
group by grp_id;
grp_id is already defined so I see no need for analytic functions.
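To see why this handles the missing keys, here is a self-contained sketch that feeds the question's four sample rows through the same query. The inline WITH clause standing in for tab1 is only an illustration device, and Oracle syntax (dual) is assumed:

```sql
-- Sketch: sample rows from the question supplied inline instead of a real table.
with tab1 (name, val, id, grp_id) as (
    select 'a', 3, 1, 1 from dual union all
    select 'b', 5, 2, 1 from dual union all
    select 'c', 8, 3, 1 from dual union all
    select 'c', 9, 4, 2 from dual
)
select grp_id,
       -- CASE without ELSE yields NULL for non-matching rows, and MAX ignores NULLs
       max(case when name = 'a' then val end) as a,
       max(case when name = 'b' then val end) as b,
       max(case when name = 'c' then val end) as c
from tab1
group by grp_id
order by grp_id;
```

Because each column is matched by name rather than by position within the group, grp_id 2 gets NULL for a and b and 9 for c, which is the expected result from the question.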

Grouping by similar values in multiple columns

I have a table of entities with an id, and a category (few different values with NULL allowed) from 3 different years (category can be different from 1 year to another), in 'wide' table format:
| ID | CATEG_Y1 | CATEG_Y2 | CATEG_Y3 |
+-----+----------+----------+----------+
| 1 | NULL | B | C |
| 2 | A | A | C |
| 3 | B | A | NULL |
| 4 | A | C | B |
| ... | ... | ... | ... |
I would like to simply count the number of entities by category, grouped by category, independently for the year:
+-------+----+----+----+
| CATEG | Y1 | Y2 | Y3 |
+-------+----+----+----+
| A | 6 | 4 | 5 | <- 6 entities w/ categ_y1, 4 w/ categ_y2, 5 w/ categ_y3
| B | 3 | 1 | 10 |
| C | 8 | 4 | 5 |
| NULL | 3 | 3 | 3 |
+-------+----+----+----+
I guess I could do it by grouping values one column after the other and UNION ALL the results, but I was wondering if there was a more rapid & convenient way, and if it can be generalized if I have more columns/years to manage (e.g. 20-30 different values)
A bit clumsy, but probably someone has a better idea. The query first collects all different categories (the union query in the from part), and then counts the occurrences with dedicated subqueries in the select part. One could omit the union part if there is a table already defining the available categories (I suppose categ_y1 is a foreign key to such a primary category table). Hope there are not too many typos:
select categories.cat,
(select count(categ_y1) from my_table ty1 where categories.cat = ty1.categ_y1) as y1,
(select count(categ_y2) from my_table ty2 where categories.cat = ty2.categ_y2) as y2,
(select count(categ_y3) from my_table ty3 where categories.cat = ty3.categ_y3) as y3
from ( select categ_y1 as cat from my_table t1
union select categ_y2 as cat from my_table t2
union select categ_y3 as cat from my_table t3) categories
Use jsonb functions to transpose the data (from the question) to this format:
select categ, jsonb_object_agg(key, count) as jdata
from (
select value as categ, key, count(*)
from my_table t,
jsonb_each_text(to_jsonb(t)- 'id')
group by 1, 2
) s
group by 1
order by 1;
categ | jdata
-------+-----------------------------------------------
A | {"categ_y1": 2, "categ_y2": 2}
B | {"categ_y1": 1, "categ_y2": 1, "categ_y3": 1}
C | {"categ_y2": 1, "categ_y3": 2}
| {"categ_y1": 1, "categ_y3": 1}
(4 rows)
For a known (static) number of years you can easily unpack the jsonb column:
select categ, jdata->'categ_y1' as y1, jdata->'categ_y2' as y2, jdata->'categ_y3' as y3
from (
select categ, jsonb_object_agg(key, count) as jdata
from (
select value as categ, key, count(*)
from my_table t,
jsonb_each_text(to_jsonb(t)- 'id')
group by 1, 2
) s
group by 1
) s
order by 1;
categ | y1 | y2 | y3
-------+----+----+----
A | 2 | 2 |
B | 1 | 1 | 1
C | | 1 | 2
| 1 | | 1
(4 rows)
To get fully dynamic solution you can use the function create_jsonb_flat_view() described in Flatten aggregated key/value pairs from a JSONB field.
I would do this using union all followed by aggregation:
select categ, sum(categ_y1) as y1, sum(categ_y2) as y2,
sum(categ_y3) as y3
from ((select categ_y1 as categ, 1 as categ_y1, 0 as categ_y2, 0 as categ_y3
from t
) union all
(select categ_y2, 0 as categ_y1, 1 as categ_y2, 0 as categ_y3
from t
) union all
(select categ_y3, 0 as categ_y1, 0 as categ_y2, 1 as categ_y3
from t
)
) x
group by categ ;
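With many year columns, the repeated union all branches can be folded into a single pass. The following is a sketch of a different technique, not the approach above: it assumes PostgreSQL 9.4 or later (LATERAL plus the aggregate FILTER clause) and the my_table name used in the jsonb answer:

```sql
-- Sketch, assuming PostgreSQL 9.4+: unpivot each row with LATERAL VALUES,
-- then count per category and year. my_table is a placeholder name.
select v.categ,
       count(*) filter (where v.yr = 1) as y1,
       count(*) filter (where v.yr = 2) as y2,
       count(*) filter (where v.yr = 3) as y3
from my_table t
cross join lateral (
    values (1, t.categ_y1),
           (2, t.categ_y2),
           (3, t.categ_y3)
) as v(yr, categ)
group by v.categ
order by v.categ;
```

NULL categories end up in their own group, and each additional year needs only one more VALUES entry and one more count(*) filter line.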

Add cumulative total sum over many columns in Postgres

My table is like this:
+----+--------+--------+--------+---------+
| id | type | c1 | c2 | c3 |
+----+--------+--------+--------+---------+
| a | 0 | 10 | 10 | 10 |
| a | 0 | 0 | 10 | |
| a | 0 | 50 | 10 | |
| c | 0 | | 10 | 20 |
| c | 0 | | 10 | |
+----+--------+--------+--------+---------+
I need the output like this:
+----+---------+--------+--------+---------+
| id | type | c1 | c2 | c3 |
+----+---------+--------+--------+---------+
| a | 0 | 10 | 10 | 10 |
| a | 0 | 0 | 10 | |
| a | 0 | 50 | 10 | |
| c | 0 | | 10 | 20 |
| c | 0 | | 10 | |
+----+---------+--------+--------+---------+
|total | 0 | 60 | 50 | 30 |
+------------------------------------------+
|cumulative| 0 | 60 | 110 | 140 |
+------------------------------------------+
My query so far:
WITH res_1 AS
(SELECT id,c1,c2,c3 FROM cloud10k.dash_reportcard),
res_2 AS
(SELECT 'TOTAL'::VARCHAR, SUM(c1),SUM(c2),SUM(c3) FROM cloud10k.dash_reportcard)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2;
It produces a sum total per column.
How can I add the cumulative total sum?
Note: the demo has 3 data columns, my actual table has more than 250.
It would be very tedious and increasingly inefficient to list 250 columns over and over for the sum of columns - an O(n²) problem in disguise. Effectively, you want the equivalent of a window-function to calculate the running total over columns instead of rows.
You can:
1. Transform the row to a set ("unpivot").
2. Run the window aggregate function sum() OVER (...).
3. Transform the set back to a row ("pivot").
WITH total AS (
SELECT 'total'::text AS id, 0 AS type
, sum(c1) AS s1, sum(c2) AS s2, sum(c3) AS s3 -- more ...
FROM cloud10k.dash_reportcard
)
TABLE cloud10k.dash_reportcard
UNION ALL
TABLE total
UNION ALL
SELECT 'cumulative', 0, a[1], a[2], a[3] -- more ...
FROM (
SELECT ARRAY(
SELECT sum(v.s) OVER (ORDER BY rn)
FROM total
, LATERAL (VALUES (1, s1), (2, s2), (3, s3)) v(rn, s) -- more ...
)::int[] AS a
) sub;
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
The last step could also be done with crosstab() from the tablefunc module, but for this simple case it's simpler to just aggregate into an array and break out the elements into separate columns in the outer SELECT.
Alternative for Postgres 9.1
Same as above, but:
...
UNION ALL
SELECT 'cumulative'::text, 0, a[1], a[2], a[3] -- more ...
FROM (
SELECT ARRAY(
SELECT sum(v.s) OVER (ORDER BY rn)
FROM (
SELECT row_number() OVER (), s
FROM unnest((SELECT ARRAY[s1, s2, s3] FROM total)) s -- more ...
) v(rn, s)
)::int[] AS a
) sub;
Consider:
PostgreSQL unnest() with element number
Just add another CTE to get cumulative row:
WITH res_1 AS
(SELECT id,c1,c2,c3
FROM dash_reportcard),
res_2 AS
(SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumC1,
SUM(c2) AS sumC2, SUM(c3) AS sumC3
FROM dash_reportcard),
res_3 AS
(SELECT 'CUMULATIVE'::VARCHAR, sumC1,
sumC2+sumC1, sumC1+sumC2+sumC3
FROM res_2)
SELECT * FROM res_1
UNION ALL
SELECT * FROM res_2
UNION ALL
SELECT * FROM res_3;
WITH total AS (
SELECT 'TOTAL'::VARCHAR, SUM(c1) AS sumc1, SUM(c2) AS sumc2, SUM(c3) AS sumc3
FROM cloud10k.dash_reportcard
), cum_total AS (
SELECT 'CUMULATIVE'::varchar, sumc1, sumc1+sumc2, sumc1+sumc2+sumc3
FROM total
)
SELECT id, c1, c2, c3 FROM cloud10k.dash_reportcard
UNION ALL
SELECT * FROM total
UNION ALL
SELECT * FROM cum_total;