Categorize by columns - SQL

I have no idea what to call this request, which is why I haven't found any answers to it.
Basically, I have a common statement like:
SELECT A,B,C,D,E FROM TABLE
Example result:
A B C D E
1 1 2 3 4
1 2 3 4 5
1 2 7 8 9
2 1 4 5 6
How do I 'categorize' by certain columns (A and B in the example) so that repeated values in those columns are omitted?
Preferred result:
A B C D E
1 1 2 3 4
1 2 3 4 5
    7 8 9
2 1 4 5 6

In your case I guess it would make sense to have the result ordered by A,B?
In that case you could use:
SELECT DECODE(RN, 1, A, NULL) AS A,
       DECODE(RN, 1, B, NULL) AS B,
       C,
       D,
       E
FROM (SELECT A,
             B,
             ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) AS RN,
             C,
             D,
             E
      FROM (SELECT * FROM TEST_TABLE ORDER BY A, B));
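For reference, a minimal Oracle setup to try the query above might look like this (the column types are my assumption; the LAG-based answer below runs against the same data under the name t_table):
-- column types are an assumption; sample rows taken from the example result above
CREATE TABLE TEST_TABLE (A NUMBER, B NUMBER, C NUMBER, D NUMBER, E NUMBER);
INSERT INTO TEST_TABLE (A, B, C, D, E) VALUES (1, 1, 2, 3, 4);
INSERT INTO TEST_TABLE (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
INSERT INTO TEST_TABLE (A, B, C, D, E) VALUES (1, 2, 7, 8, 9);
INSERT INTO TEST_TABLE (A, B, C, D, E) VALUES (2, 1, 4, 5, 6);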

You can use the LAG analytic function to access values from the previous row. See the example below:
SELECT CASE
         WHEN LAG(a, 1, NULL) OVER (ORDER BY a, b, c, d, e) = a
          AND LAG(b, 1, NULL) OVER (ORDER BY a, b, c, d, e) = b
         THEN NULL
         ELSE a
       END AS new_a,
       CASE
         WHEN LAG(a, 1, NULL) OVER (ORDER BY a, b, c, d, e) = a
          AND LAG(b, 1, NULL) OVER (ORDER BY a, b, c, d, e) = b
         THEN NULL
         ELSE b
       END AS new_b,
       c,
       d,
       e
FROM t_table t
ORDER BY a, b, c, d, e;

Related

BigQuery, FIRST_VALUE, and null

In the following example, I would have expected the results
Row a b f0_
1 1 1 3
2 1 2 3
3 1 3 5
4 1 4 5
5 1 5 null
because, in general, aggregates tend to ignore nulls. If FIRST_VALUE doesn't ignore nulls, what value does it have over using LEAD?
Example:
select a, b,
       first_value(c) over (partition by a order by b asc
                            rows between 1 following and unbounded following)
from
  (select 1 as a, 1 as b, 1 as c),
  (select 1 as a, 2 as b, null as c),
  (select 1 as a, 3 as b, 3 as c),
  (select 1 as a, 4 as b, null as c),
  (select 1 as a, 5 as b, 5 as c)
gives:
Row a b f0_
1 1 1 null
2 1 2 3
3 1 3 null
4 1 4 5
5 1 5 5
I would have expected the results
The trick below gives the result you expected in your question:
SELECT
a, b,
MAX(c) OVER (PARTITION BY a ORDER BY grp ASC RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
FROM (
SELECT
a, b, c,
COUNT(c) OVER (PARTITION BY a ORDER BY b ASC rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) grp
FROM
(SELECT 1 AS a, 1 AS b, 1 AS c),
(SELECT 1 AS a, 2 AS b, NULL AS c),
(SELECT 1 AS a, 3 AS b, 3 AS c),
(SELECT 1 AS a, 4 AS b, NULL AS c),
(SELECT 1 AS a, 5 AS b, 5 AS c)
)
what value does it have over using LEAD
LEAD has a richer signature - LEAD(<expr>[, <offset>[, <default_value>]]) - so when you just need the first value you can shortcut it to FIRST_VALUE(<field_name>). I think this is the major practical difference.
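For what it's worth, in BigQuery standard SQL (as opposed to the legacy dialect used above) FIRST_VALUE accepts an IGNORE NULLS modifier, which produces the result the question expected. A minimal sketch, assuming the same sample rows:
SELECT
  a, b,
  -- IGNORE NULLS skips the NULL values of c, so each row sees the next non-null c
  FIRST_VALUE(c IGNORE NULLS) OVER (
    PARTITION BY a ORDER BY b
    ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS next_c
FROM (
  SELECT 1 AS a, 1 AS b, 1 AS c UNION ALL
  SELECT 1, 2, NULL UNION ALL
  SELECT 1, 3, 3 UNION ALL
  SELECT 1, 4, NULL UNION ALL
  SELECT 1, 5, 5)
ORDER BY b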

Sorting columns within rows in BigQuery

I have a table like the one below:
id a b c
1 2 1 3
2 3 2 1
3 16 14 15
4 10 12 13
5 15 16 14
6 10 12 8
I need to "normalize" this table by sorting the values in columns a, b, c within each row, then deduplicating the rows and counting the duplicates.
Expected result
a b c dups
1 2 3 2
14 15 16 2
10 12 13 1
8 10 12 1
I do have a solution, but I don't see how to "scale" it easily to the case where I have more than 3 columns to normalize. As you can see below, the first and last columns are not an issue; things get messy for the columns in the middle once the number of columns is greater than 3.
select a, b, c, count(1) as dups from (
select a1 as a, if(a != a1 and a != c1, a, if(b != a1 and b != c1, b, c)) as b, c1 as c
from (select a, b, c, least(a, b, c) as a1, greatest(a, b, c) as c1 from table)
) group by a, b, c
Can anyone suggest another approach?
The example below works for 4 columns and can be adjusted to any number of columns by adding an extra STRING(x) to CONCAT() and an extra REGEXP_EXTRACT line for each additional column.
SELECT a, b, c, d, COUNT(1) AS dups
FROM (
SELECT id,
REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){0}(.*),') AS a,
REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){1}(.*),') AS b,
REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){2}(.*),') AS c,
REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){3}(.*),') AS d
FROM (
SELECT id, GROUP_CONCAT(s) AS s FROM (
SELECT id, s,
INTEGER(s) AS e,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY e) pos
FROM (
SELECT id,
SPLIT(CONCAT(STRING(a),',',STRING(b),',',STRING(c),',',STRING(d))) AS s
FROM table
) ORDER BY id, pos
) GROUP BY id
)
) GROUP BY a, b, c, d
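If standard SQL is an option, the same normalization can be sketched with an array sort instead of string manipulation; scaling to more columns only means extending the array literal and the select list. This is just a sketch, with my_table and its integer columns a, b, c, d standing in for the real table:
SELECT
  sorted[OFFSET(0)] AS a,
  sorted[OFFSET(1)] AS b,
  sorted[OFFSET(2)] AS c,
  sorted[OFFSET(3)] AS d,
  COUNT(*) AS dups
FROM (
  -- sort the four values of each row into an array
  SELECT ARRAY(SELECT x FROM UNNEST([a, b, c, d]) AS x ORDER BY x) AS sorted
  FROM my_table)
GROUP BY 1, 2, 3, 4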

SQL grouping trouble

I ran into trouble with a SQL script like this:
SELECT A, B, C,
       CASE WHEN D < 21 THEN '0-20'
            WHEN D < 51 THEN '21-50'
            WHEN D < 101 THEN '51-100'
            ELSE '>101' END AS E,
       COUNT(*)
FROM TABLE_X
GROUP BY A, B, C, D;
The result set looks like this:
A B C E count(*)
CAR 1 2 21-50 1
CAR 1 2 21-50 1
BIKE 1 3 0-20 1
The first row is a CAR with D=25, so it falls into 21-50. The second row is also a CAR, with D=32, so it falls into 21-50 too.
In short, I want a result set like this:
A B C E count(*)
CAR 1 2 21-50 2
BIKE 1 3 0-20 1
So the CAR count must be 2 when grouping by the bucketed D column (E). How can I achieve this?
The problem here is that you're grouping by D first and only then applying the case logic. If you add D to the select list, you'd see results that probably look like this:
A B C D E count(*)
CAR 1 2 20 21-50 1
CAR 1 2 30 21-50 1
BIKE 1 3 7 0-20 1
In order to avoid this, you could apply the case first and only then the group by clause, by using a subquery:
SELECT A, B, C, E, COUNT(*)
FROM (SELECT A, B, C,
CASE WHEN D < 21 THEN '0-20'
WHEN D < 51 THEN '21-50'
WHEN D < 101 THEN '51-100'
ELSE '>101' END AS E
FROM TABLE_X) t
GROUP BY A, B, C, E;
The query below should work. Basically, I am just pulling the COUNT(1) function, and hence the GROUP BY clause, out to an outer query while leaving all the remaining functionality to the inner query.
SELECT A, B, C, E, COUNT(1)
FROM
(
  SELECT A, B, C,
         CASE WHEN D < 21 THEN '0-20'
              WHEN D < 51 THEN '21-50'
              WHEN D < 101 THEN '51-100'
              ELSE '>101' END AS E
  FROM TABLE_X
) x
GROUP BY A, B, C, E;
Group by the calculation for D, not D itself, like this:
SELECT A, B, C,
CASE WHEN D < 21 THEN ' 0-20'
WHEN D < 51 THEN '21-50'
WHEN D < 101 THEN '51-100'
ELSE '>101' END AS E
,COUNT(*) as "Coun"
FROM TABLE_X
GROUP BY A, B, C,
CASE WHEN D < 21 THEN ' 0-20'
WHEN D < 51 THEN '21-50'
WHEN D < 101 THEN '51-100'
ELSE '>101' END
yields this
A B C E Count
---- ----------- ----------- ------ -----------
BIKE 1 3 0-20 1
CAR 1 2 21-50 2
when run in SQL Server 2012 on a table loaded with these values:
INSERT INTO TABLE_X (A, B, C, D)
VALUES ('CAR', 1, 2, 22)
      ,('CAR', 1, 2, 23)
      ,('BIKE', 1, 3, 2);
You can also try the query below:
SELECT A, B, C,
       CASE WHEN D < 21 THEN '0-20'
            WHEN D < 51 THEN '21-50'
            WHEN D < 101 THEN '51-100'
            ELSE '>101' END AS E,
       COUNT(*)
FROM TABLE_X
GROUP BY 1, 2, 3, 4;
Here the GROUP BY clause groups by relative position in the select list, where the first field is 1, the next field is 2, and so on. Note that grouping by ordinal position works in MySQL and PostgreSQL, for example, but is not supported by SQL Server or Oracle.

SQLSERVER group by (aggregate column based on other column)

I have a table which has 3 columns: A, B, C.
I want to do a query like this:
select A, MAX(B), ( C from the row that has the max B ) from Table group by A
Is there a way to write such a query?
Test Data:
A B C
2 5 3
2 6 1
4 5 1
4 7 9
6 5 0
the expected result would be:
2 6 1
4 7 9
6 5 0
;WITH CTE AS
(
SELECT A,
B,
C,
RN = ROW_NUMBER() OVER(PARTITION BY A ORDER BY B DESC)
FROM YourTable
)
SELECT A, B, C
FROM CTE
WHERE RN = 1
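For reference, a minimal SQL Server setup matching the test data above (the column types are my assumption; YourTable is the name used in the CTE):
-- column types are an assumption; rows taken from the test data in the question
CREATE TABLE YourTable (A INT, B INT, C INT);
INSERT INTO YourTable (A, B, C)
VALUES (2, 5, 3),
       (2, 6, 1),
       (4, 5, 1),
       (4, 7, 9),
       (6, 5, 0);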
Try this:
select t.*
from YourTable t
join (select A, max(B) as B from YourTable group by A) c
  on c.A = t.A
 and c.B = t.B
Note that, unlike the ROW_NUMBER version above, this join returns every row that ties for the maximum B within a group.

SQL grouping

I have a table with the following columns:
A B C
---------
1 10 X
1 11 X
2 15 X
3 20 Y
4 15 Y
4 20 Y
I want to group the data by the B and C columns and count the distinct values of the A column. But if two or more rows have the same value in the A column, I want to keep only the row with the maximum value in the B column.
If I do a simple group by, the result would be:
B C Count
--------------
10 X 1
11 X 1
15 X 1
20 Y 2
15 Y 1
What I want is this result:
B C Count
--------------
11 X 1
15 X 1
20 Y 2
Is there any query that can return this result? The server is SQL Server 2005.
I like to work in steps: first get rid of duplicate A records, then group. Not the most efficient, but it works on your example.
with t1 as (
select A, max(B) as B, C
from YourTable
group by A, C
)
select count(A) as CountA, B, C
from t1
group by B, C
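For reference, a minimal SQL Server 2005 setup matching the sample data (the column types are my assumption; YourTable is the name used in the query above, and 2005 does not accept multi-row VALUES, hence the separate inserts):
-- column types are an assumption; rows taken from the sample table in the question
CREATE TABLE YourTable (A INT, B INT, C CHAR(1));
INSERT INTO YourTable (A, B, C) VALUES (1, 10, 'X');
INSERT INTO YourTable (A, B, C) VALUES (1, 11, 'X');
INSERT INTO YourTable (A, B, C) VALUES (2, 15, 'X');
INSERT INTO YourTable (A, B, C) VALUES (3, 20, 'Y');
INSERT INTO YourTable (A, B, C) VALUES (4, 15, 'Y');
INSERT INTO YourTable (A, B, C) VALUES (4, 20, 'Y');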
I have actually tested this:
SELECT
MAX( B ) AS B,
C,
Count
FROM
(
SELECT
B, C, COUNT(DISTINCT A) AS Count
FROM
t
GROUP BY
B, C
) X
GROUP BY C, Count
and it gives me:
B C Count
---- ---- --------
15 X 1
15 Y 1
20 Y 2
WITH cteA AS
(
SELECT
A, C,
MAX(B) OVER(PARTITION BY A, C) [Max]
FROM T1
)
SELECT
[Max] AS B, C,
COUNT(DISTINCT A) AS [Count]
FROM cteA
GROUP BY C, [Max];
Check this out. This should work in Oracle, although I haven't tested it:
select count(a), BB, CC from
(
select a, max(B) BB, Max(C) CC
from yourtable
group by a
)
group by BB,CC