How to split values for the same id into columns - sql

How to split values for the same id into columns?
Example Table
I want to achieve this:
I try:
SUM (CASE WHEN Type = 1 THEN Price END) AS Type_1
SUM (CASE WHEN Type = 2 THEN Price END) AS Type_2
SUM (CASE WHEN Type = 3 THEN Price END) AS Type_3
--SUM (CASE WHEN Type = 1 THEN CarsID END) AS CarsID_Type1
--SUM (CASE WHEN Type = 2 THEN CarsID END) AS CarsID_Type2
--SUM (CASE WHEN Type = 3 THEN CarsID END) AS CarsID_Type3
GROUP BY ID
3 additional columns (Type_1, Type_2, Type_3) are correctly created and everything is in one line. Unfortunately, the last commented out part causes an error:
Operand data type uniqueidentifier is invalid for sum operator.
What to replace with SUM to make the query run correctly.
I will be grateful for your help.

This query is more complex then I thought, but it will do what you want for any kind of id. But there is a problem, it was construct to work with the max of 3 different "types". If your column "Type" have more then 3 different values for the same "id" you will need to adapt the code below.
/*Shifting the value to another column*/
with first_lag as (
select
id
, type_
, case when count(type_)over(partition by id) > 1 then lag(type_)over(order by type_ desc)
end as lag_type_1
, CarsID
, case when count(CarsID)over(partition by id) > 1 then lag(CarsID)over(order by CarsID desc )
end as lag_cars_1
, price::int
from stack_overflow so
)
/*Shifting again the value to another column*/
, second_lag as(
select
id
, type_
, lag_type_1
, lag(lag_type_1)over(order by lag_type_1 desc) lag_type_2
, CarsID
, lag_cars_1
, lag(lag_cars_1)over(order by lag_cars_1 desc) lag_cars_2
, sum(price) over(partition by id) as price_by_id
from first_lag
)
/*Counting how many "types" the same id have (preparing to filter)*/
, counting_rows as (
select
*
, count(type_)over(partition by type_) +count(lag_type_1) over(partition by type_) +count(lag_type_2) over(partition by type_) as counting
from second_lag
)
/*Knowing which row have the max number of "types" (preparing to filter)*/
, selecting_max as (
select
*
,max(counting)over(partition by id) as max_flag
from counting_rows
)
/*Selecting the columns and filtering just the row with the max number of "types"*/
select id,type_,lag_type_1,lag_type_2,carsid,lag_cars_1,lag_cars_2,price_by_id
from selecting_max
where counting = max_flag
--group by id
Note it will work for different ids. (But only if it has 3 or less different"types")
Image

Related

flatten data in SQL based on fixed set of column

I am stuck with a specific scenario of flattening the data and need help for it. I need the output as flattened data where the column values are not fixed. Due to this I want to restrict the output to fixed set of columns.
Given Table 'test_table'
ID
Name
Property
1
C1
xxx
2
C2
xyz
2
C3
zz
The scenario is, column Name can have any no. of values corresponding to an ID. I need to flatten the data based in such a way that there is one row per ID field. Since the Name field varies with each ID, I want to flatten it for fix 3 columns like Co1, Co2, Co3. The output should look like
ID
Co1
Co1_Property
Co2
Co2_Property
Co3
Co3_Property
1
C1
xxx
null
null
2
C2
xyz
C3
zz
Could not think of a solution using Pivot or aggregation. Any help would be appreciated.
You can use arrays:
select id,
array_agg(name order by name)[safe_ordinal(1)] as name_1,
array_agg(property order by name)[safe_ordinal(1)] as property_1,
array_agg(name order by name)[safe_ordinal(2)] as name_2,
array_agg(property order by name)[safe_ordinal(2)] as property_2,
array_agg(name order by name)[safe_ordinal(3)] as name_3,
array_agg(property order by name)[safe_ordinal(3)] as property_3
from t
group by id;
All current answers are too verbose and involve heavy repetition of same fragments of code again and again and if you need to account more columns you need to copy paste and add more lines which will make it even more verbose!
My preference is to avoid such type of coding and rather use something more generic as in below example
select * from (
select *, row_number() over(partition by id) col
from `project.dataset.table`)
pivot (max(name) as name, max(property) as property for col in (1, 2, 3))
If applied to sample data in your question - output is
If you want to change number of output columns - you just simply modify for col in (1, 2, 3) part of query.
For example if you would wanted to have 5 columns - you would use for col in (1, 2, 3, 4, 5) - that simple!!!
The standard practice is to use conditional aggregation. That is, to use CASE expressions to pick which row goes to which column, then MAX() to collapse multiple rows into individual rows...
SELECT
id,
MAX(CASE WHEN name = 'C1' THEN name END) AS co1,
MAX(CASE WHEN name = 'C1' THEN property END) AS co1_property,
MAX(CASE WHEN name = 'C2' THEN name END) AS co2,
MAX(CASE WHEN name = 'C2' THEN property END) AS co2_property,
MAX(CASE WHEN name = 'C3' THEN name END) AS co3,
MAX(CASE WHEN name = 'C3' THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id
Background info:
Not having an ELSE in the CASE expression implicitly means ELSE NULL
The intention is therefore for each column to recieve NULL from every input row, except for the row being pivoted into that column
Aggregates, such as MAX() essentially skip NULL values
MAX( {NULL,NULL,'xxx',NULL,NULL} ) therefore equals 'xxx'
A similar approach "bunches" the values to the left (so that NULL values always only appears to the right...)
That approach first uses row_number() to give each row a value corresponding to which column you want to put that row in to..
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
FROM
yourTable
)
SELECT
id,
MAX(CASE WHEN seq_num = 1 THEN name END) AS co1,
MAX(CASE WHEN seq_num = 1 THEN property END) AS co1_property,
MAX(CASE WHEN seq_num = 2 THEN name END) AS co2,
MAX(CASE WHEN seq_num = 2 THEN property END) AS co2_property,
MAX(CASE WHEN seq_num = 3 THEN name END) AS co3,
MAX(CASE WHEN seq_num = 3 THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id

Transpose row value to additional pre-defined field when value is less than specified value

I have a table that contains two identifying columns, a date, and a value. This value can be up to 100. What I want to do is where [ID] and [DATE] is the same across subsequent rows and the values are less than 100 (which also means [ID_SECONDARY] is always different), I want a query to place each one of these values in a column '[VALUE_1]...[VALUE_N]' along with the Value Description ([ID_SECONDARY]-->[VALUE_1_DESC]...[VALUE_N_DESC]). Ultimately each row should contain a unique [ID], [DATE], and an aggregation of the different [ID_SECONDARY] descriptions along with their values [VALUE_1]...[VALUE_N]. The number of unique [ID_SECONDARY] will not surpass 4, but could be from 1 to 4.
My initial inclination is to approach this using a cursor, but am hopeful there is a better alternative.
The first image is a sample of the information provided in the table, the second image is the output I'm looking for. Any help is greatly appreciated.
.
As far as I can tell this is different from the various dynamic pivot posts out there because the columns are independent of the secondary ID and are fully dependent on the VALUE column to determine if the value itself belongs in columns 1-4.
Try this
WITH a AS (
SELECT
ID
, [DATE]
, ID_SECONDARY
, VALUE
, ROW_NUMBER() OVER (PARTITION BY ID, DATE ORDER BY ID) AS RNUM
)
SELECT
a.ID
, a.[DATE]
, MAX (
CASE a.RNUM
WHEN 1 THEN a.VALUE
ELSE NULL
) AS VALUE_1
, MAX (
CASE a.RNUM
WHEN 1 THEN a.ID_SECONDARY
ELSE NULL
) AS VALUE_1_DESC
, MAX (
CASE a.RNUM
WHEN 2 THEN a.VALUE
ELSE NULL
) AS VALUE_2
, MAX (
CASE a.RNUM
WHEN 2 THEN a.ID_SECONDARY
ELSE NULL
) AS VALUE_2_DESC
, MAX (
CASE RNUM
WHEN 3 THEN a.VALUE
ELSE NULL
) AS VALUE_3
, MAX (
CASE RNUM
WHEN 3 THEN a.ID_SECONDARY
ELSE NULL
) AS VALUE_3_DESC
, MAX (
CASE RNUM
WHEN 4 THEN a.VALUE
ELSE NULL
) AS VALUE_4
, MAX (
CASE RNUM
WHEN 4 THEN a.ID_SECONDARY
ELSE NULL
) AS VALUE_4_DESC
FROM a
GROUP BY a.ID, a.[DATE]

Calculation of occurrence of strings

I have a table with 3 columns, id, name and vote. They're populated with many registers. I need that return the register with the best balance of votes. The votes types are 'yes' and 'no'.
Yes -> Plus 1
No -> Minus 1
This column vote is a string column. I am using SQL SERVER.
Example:
It must return Ann for me
Use conditional Aggregation to tally the votes as Kannan suggests in his answer
If you really only want 1 record then you can do it like so:
SELECT TOP 1
name
,SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) AS VoteTotal
FROM
#Table
GROUP BY
name
ORDER BY
VoteTotal DESC
This will not allow for ties but you can use this method which will rank the responses and give you results use RowNum to get only 1 result or RankNum to get ties.
;WITH cteVoteTotals AS (
SELECT
name
,SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) AS VoteTotal
,ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) DESC) as RowNum
,DENSE_RANK() OVER (PARTITION BY 1 ORDER BY SUM(CASE WHEN vote = 'yes' THEN 1 ELSE -1 END) DESC) as RankNum
FROM
#Table
GROUP BY
name
)
SELECT name, VoteTotal
FROM
cteVoteTotals
WHERE
RowNum = 1
--RankNum = 1 --if you want with ties use this line instead
Here is the test data used and in the future do NOT just put an image of your test data spend the 2 minutes to make a temp table or a table variable so that people you are asking for help do not have to!
DECLARE #Table AS TABLE (id INT, name VARCHAR(25), vote VARCHAR(4))
INSERT INTO #Table (id, name, vote)
VALUES (1, 'John','no'),(2, 'John','no'),(3, 'John','yes')
,(4, 'Ann','no'),(5, 'Ann','yes'),(6, 'Ann','yes')
,(9, 'Marie','no'),(8, 'Marie','no'),(7, 'Marie','yes')
,(10, 'Matt','no'),(11, 'Matt','yes'),(12, 'Matt','yes')
Use this code,
;with cte as (
select id, name, case when vote = 'yes' then 1 else -1 end as votenum from register
) select name, sum(votenum) from cte group by name
You can get max or minimum based out of this..
This one gives the 'yes' rate for each person:
SELECT Name, SUM(CASE WHEN Vote = 'Yes' THEN 1 ELSE 0 END)/COUNT(*) AS Rate
FROM My_Table
GROUP BY Name

pivot table returns more than 1 row for the same ID

I have a sql code which I am using to do pivot. Code is as follows:
SELECT DISTINCT PersonID
,MAX(pivotColumn1)
,MAX(pivotColumn2) --originally these were in 2 separate rows)
FROM(SELECT srcID, PersonID, detailCode, detailValue) FROM src) AS SrcTbl
PIVOT(MAX(detailValue) FOR detailCode IN ([pivotColumn1],[pivotColumn2])) pvt
GROUP BY PersonID
In the source data the ID has 2 separate rows due to having its own ID which separates the values. I have now pivoted it and its still giving me 2 separate rows for the ID even though i grouped it and used aggregation on the pivot columns. Ay idea whats wrong with the code?
So I have all my possible detailCode listed in the IN clause. So I have null returned when the value is none but I want it all summarised in 1 row. See image below.
If those are all the options of detailCode , you can use conditional aggregation with CASE EXPRESSION instead of Pivot:
SELECT t.personID,
MAX(CASE WHEN t.detailCode = 'cas' then t.detailValue END) as cas,
MAX(CASE WHEN t.detailCode = 'buy' then t.detailValue END) as buy,
MAX(CASE WHEN t.detailCode = 'sel' then t.detailValue END) as sel,
MAX(CASE WHEN t.detailCode = 'pla' then t.detailValue END) as pla
FROM YourTable t
GROUP BY t.personID

Looping in select query

I want to do something like this:
select id,
count(*) as total,
FOR temp IN SELECT DISTINCT somerow FROM mytable ORDER BY somerow LOOP
sum(case when somerow = temp then 1 else 0 end) temp,
END LOOP;
from mytable
group by id
order by id
I created working select:
select id,
count(*) as total,
sum(case when somerow = 'a' then 1 else 0 end) somerow_a,
sum(case when somerow = 'b' then 1 else 0 end) somerow_b,
sum(case when somerow = 'c' then 1 else 0 end) somerow_c,
sum(case when somerow = 'd' then 1 else 0 end) somerow_d,
sum(case when somerow = 'e' then 1 else 0 end) somerow_e,
sum(case when somerow = 'f' then 1 else 0 end) somerow_f,
sum(case when somerow = 'g' then 1 else 0 end) somerow_g,
sum(case when somerow = 'h' then 1 else 0 end) somerow_h,
sum(case when somerow = 'i' then 1 else 0 end) somerow_i,
sum(case when somerow = 'j' then 1 else 0 end) somerow_j,
sum(case when somerow = 'k' then 1 else 0 end) somerow_k
from mytable
group by id
order by id
this works, but it is 'static' - if some new value will be added to 'somerow' I will have to change sql manually to get all the values from somerow column, and that is why I'm wondering if it is possible to do something with for loop.
So what I want to get is this:
id somerow_a somerow_b ....
0 3 2 ....
1 2 10 ....
2 19 3 ....
. ... ...
. ... ...
. ... ...
So what I'd like to do is to count all the rows which has some specific letter in it and group it by id (this id isn't primary key, but it is repeating - for id there are about 80 different values possible).
http://sqlfiddle.com/#!15/18feb/2
Are arrays good for you? (SQL Fiddle)
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
;
id | total | somecol | totalcol
----+-------+---------+----------
1 | 6 | {b,a,c} | {2,1,3}
2 | 5 | {d,f} | {2,3}
In 9.2 it is possible to have a set of JSON objects (Fiddle)
select row_to_json(s)
from (
select
id,
sum(totalcol) as total,
array_agg(somecol) as somecol,
array_agg(totalcol) as totalcol
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
;
row_to_json
---------------------------------------------------------------
{"id":1,"total":6,"somecol":["b","a","c"],"totalcol":[2,1,3]}
{"id":2,"total":5,"somecol":["d","f"],"totalcol":[2,3]}
In 9.3, with the addition of lateral, a single object (Fiddle)
select to_json(format('{%s}', (string_agg(j, ','))))
from (
select format('%s:%s', to_json(id), to_json(c)) as j
from
(
select
id,
sum(totalcol) as total_sum,
array_agg(somecol) as somecol_array,
array_agg(totalcol) as totalcol_array
from (
select id, somecol, count(*) as totalcol
from mytable
group by id, somecol
) s
group by id
) s
cross join lateral
(
select
total_sum as total,
somecol_array as somecol,
totalcol_array as totalcol
) c
) s
;
to_json
---------------------------------------------------------------------------------------------------------------------------------------
"{1:{\"total\":6,\"somecol\":[\"b\",\"a\",\"c\"],\"totalcol\":[2,1,3]},2:{\"total\":5,\"somecol\":[\"d\",\"f\"],\"totalcol\":[2,3]}}"
In 9.2 it is also possible to have a single object in a more convoluted way using subqueries in instead of lateral
SQL is very rigid about the return type. It demands to know what to return beforehand.
For a completely dynamic number of resulting values, you can only use arrays like #Clodoaldo posted. Effectively a static return type, you do not get individual columns for each value.
If you know the number of columns at call time ("semi-dynamic"), you can create a function taking (and returning) polymorphic parameters. Closely related answer with lots of details:
Dynamic alternative to pivot with CASE and GROUP BY
(You also find a related answer with arrays from #Clodoaldo there.)
Your remaining option is to use two round-trips to the server. The first to determine the the actual query with the actual return type. The second to execute the query based on the first call.
Else, you have to go with a static query. While doing that, I see two nicer options for what you have right now:
1. Simpler expression
select id
, count(*) AS total
, count(somecol = 'a' OR NULL) AS somerow_a
, count(somecol = 'b' OR NULL) AS somerow_b
, ...
from mytable
group by id
order by id;
How does it work?
Compute percents from SUM() in the same SELECT sql query
SQL Fiddle.
2. crosstab()
crosstab() is more complex at first, but written in C, optimized for the task and shorter for long lists. You need the additional module tablefunc installed. Read the basics here if you are not familiar:
PostgreSQL Crosstab Query
SELECT * FROM crosstab(
$$
SELECT id
, count(*) OVER (PARTITION BY id)::int AS total
, somecol
, count(*)::int AS ct -- casting to int, don't think you need bigint?
FROM mytable
GROUP BY 1,3
ORDER BY 1,3
$$
,
$$SELECT unnest('{a,b,c,d}'::text[])$$
) AS f (id int, total int, a int, b int, c int, d int);