Creating a column for every group in GROUP BY - SQL

Suppose I have a table T which has entries as follows:
id | type | value
------------------
 1 | A    |     7
 1 | B    |     8
 2 | A    |     9
 2 | B    |    10
 3 | A    |    11
 3 | B    |    12
 1 | C    |    13
 2 | C    |    14
For each type, I want a different column. Since the set of types is fixed and known, I would like every distinct type enumerated, with a corresponding column for each. I want id to be the primary key of the resulting table.
So, the desired output is something like:
id | A's value | B's value | C's value
--------------------------------------
 1 |         7 |         8 |        13
 2 |         9 |        10 |        14
 3 |        11 |        12 |      NULL
Please note that this is a simplified version. The actual table T is derived from a much bigger table using group by. And for each group, I would like a separate column. Is that even possible?

Use conditional aggregation:
select id,
       max(case when type = 'A' then value end) as a_value,
       max(case when type = 'B' then value end) as b_value,
       max(case when type = 'C' then value end) as c_value
from t
group by id;
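The question also notes that T itself comes from a GROUP BY over a much bigger table. The same conditional aggregation can sit directly on top of that derived query; here is a minimal sketch, where big_t and the inner sum(value) are hypothetical stand-ins for the real source:
select id,
       max(case when type = 'A' then value end) as a_value,
       max(case when type = 'B' then value end) as b_value,
       max(case when type = 'C' then value end) as c_value
from (
    -- stand-in for the aggregation that produces T; big_t is assumed for illustration
    select id, type, sum(value) as value
    from big_t
    group by id, type
) t
group by id;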

I'd recommend looking into the PIVOT function:
https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
The main blocker with this function, though, is that the list of values for the pivot_column needs to be pre-determined. To build that list, I normally use the LISTAGG function:
https://docs.snowflake.com/en/sql-reference/functions/listagg.html
I've included a query below to show you how to build that string. Doing this end to end in a script such as Python, or even in a stored procedure, should be fairly straightforward: build the pivot_column list, build the aggregate/pivot command, execute the aggregate/pivot command.
I hope this helps... Rich
CREATE OR REPLACE TABLE monthly_sales(
empid INT,
amount INT,
month TEXT)
AS SELECT * FROM VALUES
(1, 10000, 'JAN'),
(1, 400, 'JAN'),
(2, 4500, 'JAN'),
(2, 35000, 'JAN'),
(1, 5000, 'FEB'),
(1, 3000, 'FEB'),
(2, 200, 'FEB'),
(2, 90500, 'FEB'),
(1, 6000, 'MAR'),
(1, 5000, 'MAR'),
(2, 2500, 'MAR'),
(2, 9500, 'MAR'),
(1, 8000, 'APR'),
(1, 10000, 'APR'),
(2, 800, 'APR'),
(2, 4500, 'APR');
SELECT *
FROM monthly_sales
PIVOT(SUM(amount)
FOR month IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p
ORDER BY empid;
SELECT LISTAGG( DISTINCT ''''||month||'''', ', ' )
FROM monthly_sales;
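Putting the two steps together dynamically can also stay entirely in SQL. Below is a minimal sketch, assuming Snowflake Scripting is available (e.g. run from a Snowsight worksheet); it builds the pivot list with LISTAGG, assembles the PIVOT statement, and executes it:
DECLARE
  pivot_list STRING;
  stmt STRING;
BEGIN
  -- Step 1: build the quoted, comma-separated list of pivot values.
  SELECT LISTAGG(DISTINCT '''' || month || '''', ', ')
    INTO :pivot_list
    FROM monthly_sales;

  -- Step 2: assemble the PIVOT statement around that list and run it.
  stmt := 'SELECT * FROM monthly_sales ' ||
          'PIVOT(SUM(amount) FOR month IN (' || pivot_list || ')) AS p ' ||
          'ORDER BY empid';

  LET res RESULTSET := (EXECUTE IMMEDIATE :stmt);
  RETURN TABLE(res);
END;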

SQL: detecting consecutive blocks of sequential rows with same key

My problem boils down to the following. I have a table with some natural sequencing, and in it I have a key value which may repeat over time. I want to find the blocks where the key is the same, then changes, and then comes back to being the same. Example:
A
A
B
B
B
C
C
A
A
C
C
Here I want the result to be
A, 1-2
B, 3-5
C, 6-7
A, 8-9
C, 10-11
So I can't simply group by that key value A, B, C, because the same key can appear in several blocks; I just want to collapse uninterrupted runs of repeated occurrences.
Needless to say, I want the simplest SQL one can come up with. It would presumably use OLAP window functions.
I am usually pretty good with complicated SQL, but with sequences I am not so good. I will work on this a little bit myself, of course, and annex some ideas below this question in a subsequent edit.
Let's begin by defining the table for our discussion:
CREATE TABLE Seq (
num integer,
key char
);
UPDATE 1: doing some research I find a similar question here: How to find consecutive rows based on the value of a column? but both the question and the answers are wrapped up into a lot of extra stuff and confusing.
UPDATE 2: I already got one answer, thanks. Inspecting it now. Here is my test I am typing into PostgreSQL even as we speak:
CREATE TABLE Seq ( num int, key char );
INSERT INTO Seq VALUES
(1, 'A'), (2, 'A'),
(2, 'B'), (3, 'B'), (5, 'B'),
(6, 'C'), (7, 'C'),
(8, 'A'), (9, 'A'),
(10, 'C'), (11, 'C');
UPDATE 3: The first contender for a solution is this:
SELECT key, min(num), max(num)
FROM (
SELECT seq.*,
row_number() over (partition by key order by num) as seqnum
FROM Seq
) s
GROUP BY key, (num - seqnum)
ORDER BY min;
yields:
key | min | max
-----+-----+-----
A | 1 | 2
B | 2 | 3
B | 5 | 5
C | 6 | 7
A | 8 | 9
C | 10 | 11
(6 rows)
For some reason B appears twice. I see why: I made a "mistake" in my test data, skipping sequence num 4 and going straight from 3 to 5.
This mistake is fortunate, because it allows me to point out that while in this example the sequence number is discrete, I am intending the sequence to arise from some continuous domain (e.g., time).
There is another "mistake" I made, in that I have num 2 repeated. Is that allowable? Probably not. So, cleaning up the example, removing the duplicate but leaving the gap:
DROP TABLE Seq;
CREATE TABLE Seq ( num int, key char );
INSERT INTO Seq VALUES
(1, 'A'), (2, 'A'),
(3, 'B'), (4, 'B'), (6, 'B'),
(7, 'C'), (8, 'C'),
(9, 'A'), (10, 'A'),
(11, 'C'), (12, 'C');
this still leaves us with the duplicate B block:
key | min | max
-----+-----+-----
A | 1 | 2
B | 3 | 4
B | 6 | 6
C | 7 | 8
A | 9 | 10
C | 11 | 12
(6 rows)
Now going with that first intuition by Gordon Linoff and trying to understand it and add to it:
SELECT s.*, num - seqnum AS diff
FROM (
SELECT seq.*,
row_number() over (partition by key order by num) as seqnum
FROM Seq
) s
ORDER BY num;
here is the num - seqnum trick before grouping:
num | key | seqnum | diff
-----+-----+--------+------
1 | A | 1 | 0
2 | A | 2 | 0
3 | B | 1 | 2
4 | B | 2 | 2
6 | B | 3 | 3
7 | C | 1 | 6
8 | C | 2 | 6
9 | A | 3 | 6
10 | A | 4 | 6
11 | C | 3 | 8
12 | C | 4 | 8
(11 rows)
I doubt that this is the answer quite yet.
Because of the gaps you can't use num directly, as Gordon's solution suggests. Enumerate the rows with a second row_number() over the whole sequence and use that instead:
select key, min(num), max(num)
from (select seq.*,
row_number() over (order by num) as rn,
row_number() over (partition by key order by num) as seqnum
from seq
) s
group by key, (rn - seqnum)
order by min(num);
This answers the original problem.
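Against the cleaned-up test data (the version with the gap at num = 5), this should yield:
 key | min | max
-----+-----+-----
 A   |   1 |   2
 B   |   3 |   6
 C   |   7 |   8
 A   |   9 |  10
 C   |  11 |  12
(5 rows)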
You can enumerate the rows for each key and subtract that from num. Voila! This number is constant when the key is constant on adjacent rows:
select key, min(num), max(num)
from (select seq.*,
row_number() over (partition by key order by num) as seqnum
from seq
) s
group by key, (num - seqnum);
Here is a db<>fiddle showing that it works.

Roll up multiple rows into one when joining in SQL Server

I have a table, Foo
ID | Name
-----------
1 | ONE
2 | TWO
3 | THREE
And another, Bar:
ID | FooID | Value
------------------
1 | 1 | Alpha
2 | 1 | Alpha
3 | 1 | Alpha
4 | 2 | Beta
5 | 2 | Gamma
6 | 2 | Beta
7 | 3 | Delta
8 | 3 | Delta
9 | 3 | Delta
I would like a query that joins these tables, returning one row for each row in Foo, rolling up the 'value' column from Bar. I can get back the first Bar.Value for each FooID:
SELECT * FROM Foo f OUTER APPLY
(
SELECT TOP 1 Value FROM Bar WHERE FooId = f.ID
) AS b
Giving:
ID | Name | Value
---------------------
1 | ONE | Alpha
2 | TWO | Beta
3 | THREE | Delta
But that's not what I want, and I haven't been able to find a variant that will bring back a rolled-up value: the single Bar.Value if it is the same for every Bar row of the corresponding Foo, or a static string such as '(multiple)' if not:
ID | Name | Value
---------------------
1 | ONE | Alpha
2 | TWO | (multiple)
3 | THREE | Delta
I have found some solutions (albeit not very elegant) that would bring back concatenated values, 'Alpha, Alpha, Alpha', 'Beta, Gamma, Beta', etc., but that's not what I want either.
One method, using a CASE expression and assuming that [Value] cannot be NULL:
WITH Foo AS
(SELECT *
FROM (VALUES (1, 'ONE'),
(2, 'TWO'),
(3, 'THREE')) V (ID, [Name])),
Bar AS
(SELECT *
FROM (VALUES (1, 1, 'Alpha'),
(2, 1, 'Alpha'),
(3, 1, 'Alpha'),
(4, 2, 'Beta'),
(5, 2, 'Gamma'),
(6, 2, 'Beta'),
(7, 3, 'Delta'),
(8, 3, 'Delta'),
(9, 3, 'Delta')) V (ID, FooID, [Value]))
SELECT F.ID,
F.[Name],
CASE COUNT(DISTINCT B.[Value]) WHEN 1 THEN MAX(B.Value) ELSE '(Multiple)' END AS [Value]
FROM Foo F
JOIN Bar B ON F.ID = B.FooID
GROUP BY F.ID,
F.[Name];
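If COUNT(DISTINCT ...) is a concern, a sketch of an equivalent check (under the same assumption that [Value] is never NULL) compares the smallest and largest value per group instead:
SELECT F.ID,
       F.[Name],
       CASE WHEN MIN(B.[Value]) = MAX(B.[Value]) THEN MIN(B.[Value])
            ELSE '(Multiple)'
       END AS [Value]
FROM Foo F
JOIN Bar B ON F.ID = B.FooID
GROUP BY F.ID,
         F.[Name];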
You can also try the query below:
SELECT F.ID, F.Name, (case when B.Value like '%,%' then '(Multiple)' else B.Value end) as Value
FROM Foo F
outer apply
(
    -- Build a ', '-separated list of the distinct Bar values for this Foo row;
    -- start the SUBSTRING at 3 to drop the leading ', ' separator.
    select SUBSTRING((
        SELECT distinct ', ' + isnull(Value, ',') FROM Bar WHERE FooId = F.ID
        FOR XML PATH('')
    ), 3, 9999) as Value
) as B

SELECT check the column of the max row

Here is the result of my first select:
SELECT
user.id, analytic_youtube_demographic.age,
analytic_youtube_demographic.percent
FROM
`user`
INNER JOIN
analytic ON analytic.user_id = user.id
INNER JOIN
analytic_youtube_demographic ON analytic_youtube_demographic.analytic_id = analytic.id
Result:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |13-17| 19.6 |
| 1 |18-24| 38.4 |
| 1 |25-34| 22.5 |
| 1 |35-44| 11.5 |
| 1 |45-54| 5.3 |
| 1 |55-64| 1.6 |
| 1 |65+ | 1.2 |
| 2 |13-17| 10 |
| 2 |18-24| 10 |
| 2 |25-34| 25 |
| 2 |35-44| 5 |
| 2 |45-54| 25 |
| 2 |55-64| 5 |
| 2 |65+ | 20 |
---------------------------
The max value by user_id:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |18-24| 38.4 |
| 2 |45-54| 25 |
| 2 |25-34| 25 |
---------------------------
And I need to filter on Age in ('25-34', '65+').
At the end I must have:
-----------
| id |
|----------
| 2 |
-----------
Thanks a lot for your help.
I have tried using MAX(analytic_youtube_demographic.percent), but I don't know how to filter on the age as well.
You can use the rank() function to identify the largest percentage values within each user's data set, and then a simple WHERE clause to get those entries that are both of the highest rank and belong to one of the specific demographics you're interested in. Since you can't use windowed functions like rank() in a WHERE clause, this is a two-step process with a subquery or a CTE. Something like this ought to do it:
-- Sample data from the question:
create table [user] (id bigint);
insert [user] values
(1), (2);
create table analytic (id bigint, [user_id] bigint);
insert analytic values
(1, 1), (2, 2);
create table analytic_youtube_demographic (analytic_id bigint, age varchar(32), [percent] decimal(5, 2));
insert analytic_youtube_demographic values
(1, '13-17', 19.6),
(1, '18-24', 38.4),
(1, '25-34', 22.5),
(1, '35-44', 11.5),
(1, '45-54', 5.3),
(1, '55-64', 1.6),
(1, '65+', 1.2),
(2, '13-17', 10),
(2, '18-24', 10),
(2, '25-34', 25),
(2, '35-44', 5),
(2, '45-54', 25),
(2, '55-64', 5),
(2, '65+', 20);
-- First, within the set of records for each user.id, use the rank() function to
-- identify the demographics with the highest percentage.
with RankedDataCTE as
(
select
[user].id,
youtube.age,
youtube.[percent],
[rank] = rank() over (partition by [user].id order by youtube.[percent] desc)
from
[user]
inner join analytic on analytic.[user_id] = [user].id
inner join analytic_youtube_demographic youtube on youtube.analytic_id = analytic.id
)
-- Now select only those records that are (a) of the highest rank within their
-- user.id and (b) either the '25-34' or the '65+' age group.
select
id,
age,
[percent]
from
RankedDataCTE
where
[rank] = 1 and
age in ('25-34', '65+');
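If, as the question's expected output suggests, only the ids are needed, the final SELECT over the same RankedDataCTE can simply be narrowed:
select distinct id
from
  RankedDataCTE
where
  [rank] = 1 and
  age in ('25-34', '65+');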

SQL combine two records based on one value

Update - work done in SQL-92
I work in a SQL reporting tool and am trying to combine two records into one. Let's say there are some duplicates where the time got split into two values, hence the duplication. Basically, any values that are not duplicates should be added together.
wo---text---time---value
1----test---5------1
1----test---2------a
3----aaaa---3------1
4----bbbb---4------2
Results
wo---text---time----value
1----test---7--------1a
3----aaaa---3--------1
4----bbbb---4--------2
I tried:
SELECT ....
FROM ....
GROUP BY wo SUM (time) but that did not even work.
Set-up:
create table so48345659a
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
create table so48345659b
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
insert into so48345659a (wo, text, time, value) values (1, 'test', 5, '1');
insert into so48345659a (wo, text, time, value) values (1, 'test', 2, 'a');
insert into so48345659a (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659a (wo, text, time, value) values (4, 'bbbb', 4, '2');
insert into so48345659b (wo, text, time, value) values (1, 'test', 7, '1a');
insert into so48345659b (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659b (wo, text, time, value) values (4, 'bbbb', 4, '2');
UNION, by default, removes duplicates:
select wo, text, time, value from so48345659a
union
select wo, text, time, value from so48345659b;
Result:
wo | text | time | value
----+------+------+-------
1 | test | 7 | 1a
1 | test | 2 | a
3 | aaaa | 3 | 1
1 | test | 5 | 1
4 | bbbb | 4 | 2
(5 rows)
So now run sum on the union
select
wo,
sum(time) as total_time
from
(
select wo, text, time, value from so48345659a
union
select wo, text, time, value from so48345659b
) x
group by
wo;
Result:
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)
From your supplementary question (22-Jan-2017), I guess you mean that you have one table that contains duplicate rows. Is that right?
If so, it might look like this:
select * from so48345659c;
wo | text | time | value
----+------+------+-------
1 | test | 5 | 1
1 | test | 2 | a
3 | aaaa | 3 | 1
4 | bbbb | 4 | 2
1 | test | 7 | 1a
3 | aaaa | 3 | 1
4 | bbbb | 4 | 2
(7 rows)
So then you get the sum of the times, ignoring duplicate rows, like this:
select
wo,
sum(time) as total_time
from
(
select distinct wo, text, time, value from so48345659c
) x
group by
wo;
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)
With just two values per group, you can do:
select wo, text, sum(time) as time,
       case when min(value) = max(value) then min(value)
            else concat(min(value), max(value))
       end as value
from t
group by wo, text;
This relies on the fact that '1' sorts before 'a', so min() and max() return the two values in the desired order; the CASE handles groups that contain only a single value.
Most databases support string aggregation of some sort (group_concat(), listagg(), and string_agg() are typical functions). You can use one of these for a more general solution.
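For example, a sketch using PostgreSQL's string_agg() over the de-duplicated rows (assuming they are in a table t with the same columns as above):
select wo, text, sum(time) as time,
       string_agg(value, '' order by value) as value
from t
group by wo, text;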

How to convert all columns value to rows and then check for a particular column?

I have a table like this:
student | group
   1    |   A
   2    |   B
   1    |   B
   3    |   C
I want to produce output like the following:
Student | Group_A | Group_B | Group_C
   1    |   Yes   |   Yes   |
   2    |         |   Yes   |
   3    |         |         |   Yes
Does anybody have any idea how I can produce this type of report? I tried several ways using PIVOT and UNPIVOT but it's not working.
I believe you are looking for pivot.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
declare @students table(id int, type_ nvarchar(1));
insert into @students
values(1, 'A'),
      (2, 'B'),
      (3, 'C'),
      (1, 'B');
select result.id Student,
iif(result.A = 1, 'Yes', 'No') Group_A,
iif(result.B = 1, 'Yes', 'No') Group_B,
iif(result.C = 1, 'Yes', 'No') Group_C
from (
    select *
    from @students a
    pivot(count(type_) for type_ in (
        [A],
        [B],
        [C]
    )) as pivotExample
) result;
If a dynamic pivot is required, I would refer you to this:
SQL Server dynamic PIVOT query?
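For reference, here is a minimal sketch of such a dynamic pivot, assuming SQL Server 2017+ (for STRING_AGG) and that the data sits in a regular table named dbo.Students rather than a table variable, since dynamic SQL cannot see table variables:
declare @cols nvarchar(max), @sql nvarchar(max);

-- Build the bracketed, comma-separated column list from the data itself.
select @cols = string_agg(quotename(type_), ', ')
from (select distinct type_ from dbo.Students) t;

-- Assemble and run the pivot with that list.
set @sql = N'select * from dbo.Students
             pivot (count(type_) for type_ in (' + @cols + N')) as p;';
exec sys.sp_executesql @sql;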