SQL Transform Crosstab Pivot Data

I am using SQL Server 2008 and would like to transform my data such that:
Dataset:
ID Item Columns Result
1 1 X A
2 1 Y B
3 1 Z C
4 2 X D
5 2 Y E
6 2 Z NULL
7 3 X F
8 3 Y G
9 3 Z H
Results Desired:
Item X Y Z
1 A B C
2 D E NULL
3 F G H
At this time, I am doing the following, then pasting the columns I need into Excel:
Select * from thisTable where Columns = 'X'
Select * from thisTable where Columns = 'Y'
Select * from thisTable where Columns = 'Z'
However, not all of the rows match up, so I can't just smack the tables side by side. For columns without a Result, I'd like NULL to show up, so that every column has the same number of records.
I looked up PIVOT, but I don't think it works here. What is this type of data transformation called? I don't think it's a crosstab...
Thanks!

You can do a crosstab using conditional aggregation:
SELECT
Item,
[X] = MAX(CASE WHEN [Columns] = 'X' THEN Result END),
[Y] = MAX(CASE WHEN [Columns] = 'Y' THEN Result END),
[Z] = MAX(CASE WHEN [Columns] = 'Z' THEN Result END)
FROM thisTable
GROUP BY Item
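Here is a minimal sketch for trying the conditional-aggregation query outside SQL Server. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption; CASE, MAX and GROUP BY behave the same way for this query. The table name thisTable and the sample rows come from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thisTable (ID INTEGER, Item INTEGER, Columns TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO thisTable VALUES (?, ?, ?, ?)",
    [(1, 1, "X", "A"), (2, 1, "Y", "B"), (3, 1, "Z", "C"),
     (4, 2, "X", "D"), (5, 2, "Y", "E"), (6, 2, "Z", None),
     (7, 3, "X", "F"), (8, 3, "Y", "G"), (9, 3, "Z", "H")],
)

# One MAX(CASE ...) per output column. CASE yields NULL for every other row
# and MAX ignores NULLs, so each cell holds the single matching Result.
rows = conn.execute("""
    SELECT Item,
           MAX(CASE WHEN Columns = 'X' THEN Result END) AS X,
           MAX(CASE WHEN Columns = 'Y' THEN Result END) AS Y,
           MAX(CASE WHEN Columns = 'Z' THEN Result END) AS Z
    FROM thisTable
    GROUP BY Item
    ORDER BY Item
""").fetchall()

for row in rows:
    print(row)
```

Item 2 comes back with None (NULL) for Z. An item that had no Z row at all would also get NULL, because the CASE never matches for it.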

Use PIVOT (available since SQL Server 2005):
select *
from (
select Item, Columns, Result
from thisTable
) t
pivot (
max (Result)
for Columns in (X, Y, Z)
) p

Related

Filtering on two conditions in SQL

I am using Oracle SQL and have the following table, which I would like to filter to exclude the records in which ID = 2 and GRP = X, and ID = 3 and GRP = X, as these were entered in error.
ID GRP
1 X
2 B
2 X
3 C
3 X
What is the correct syntax to do so? My desired end result table is:
ID GRP
1 X
2 B
3 C
Using row value constructor:
SELECT *
FROM tab
WHERE (ID, GRP) NOT IN ((2,'X'),(3,'X'))
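Without row values, negate the combined condition. The two tests must be joined with OR, not AND; otherwise valid rows such as 1 X are filtered out too:
SELECT *
FROM tab
WHERE ID NOT IN (2,3) OR GRP <> 'X'
or
SELECT *
FROM tab
WHERE (ID <> 2 AND ID <> 3) OR GRP <> 'X'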
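Here is a quick check of the filter logic against the question's data. It uses SQLite through Python's sqlite3 as a stand-in for Oracle, which is an assumption. Because SQLite does not accept a literal list of row values on the right of IN, the sketch uses the equivalent NOT (... AND ...) form. It also shows that combining the negated tests with AND returns no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ID INTEGER, GRP TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?)",
                 [(1, "X"), (2, "B"), (2, "X"), (3, "C"), (3, "X")])

def run(where):
    """Return the rows of tab matching the given WHERE clause, sorted."""
    return conn.execute(f"SELECT ID, GRP FROM tab WHERE {where} ORDER BY ID, GRP").fetchall()

# Exclude a row only when BOTH conditions hold for it.
correct = run("NOT (ID IN (2, 3) AND GRP = 'X')")
# The same predicate after applying De Morgan's law.
de_morgan = run("ID NOT IN (2, 3) OR GRP <> 'X'")
# Joining the negated tests with AND drops every row of this data.
wrong = run("ID NOT IN (2, 3) AND GRP <> 'X'")

print(correct)  # [(1, 'X'), (2, 'B'), (3, 'C')]
print(wrong)    # []
```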

Getting both an individual value and a sum from a table in BigQuery

Suppose after a query from a bigger dataset I have a table like this:
day x y
1 4 5
2 3 6
3 3 2
4 2 1
5 8 3
From that table I want to get the values of x and y for day 1 and the sums of x and y over all days into a new table, with the results in two rows instead of one, like this:
day x y
day1 4 5
days1-5 20 17
Now the best I can do is this:
SELECT
SUM(x) AS allx,
SUM(y) AS ally,
SUM(CASE WHEN day = 1 THEN x END) AS day1x,
SUM(CASE WHEN day = 1 THEN y END) AS day1y
FROM (
..
..
)
I guess there is a more clever way of doing this.
BigQuery - Legacy SQL:
Using comma style UNION ALL
SELECT
day, x, y
FROM
( SELECT 'day1' AS day, x, y
FROM YourTable
WHERE day = 1 ),
( SELECT
CONCAT('day1-',STRING(COUNT(1))) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable )
OR
Using ROLLUP
SELECT
CONCAT('day_', IFNULL(STRING(day), 'all')) AS day,
x,
y
FROM (
SELECT
DAY,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
GROUP BY ROLLUP(day)
)
WHERE IFNULL(day, 1) = 1
BigQuery - Standard SQL:
Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.
SELECT
'day1' AS day,
x,
y
FROM YourTable
WHERE day = 1
UNION ALL
SELECT
FORMAT('day1-%d', COUNT(1)) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
Output from the UNION ALL versions is as expected (the ROLLUP version labels the rows day_1 and day_all instead):
day x y
day1 4 5
day1-5 20 17
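The standard-SQL UNION ALL version can be tried locally with SQLite through Python's sqlite3 as a stand-in for BigQuery, which is an assumption. SQLite has no FORMAT('%d', ...), so the label is built with || concatenation. The output column is named label here so it stays distinct from the day column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (day INTEGER, x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)",
                 [(1, 4, 5), (2, 3, 6), (3, 3, 2), (4, 2, 1), (5, 8, 3)])

# First branch: the day-1 row. Second branch: one aggregate row over all days.
# String concatenation replaces BigQuery's FORMAT() for the label.
rows = conn.execute("""
    SELECT 'day1' AS label, x, y FROM YourTable WHERE day = 1
    UNION ALL
    SELECT 'day1-' || COUNT(*), SUM(x), SUM(y) FROM YourTable
    ORDER BY label
""").fetchall()
print(rows)  # [('day1', 4, 5), ('day1-5', 20, 17)]
```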

SQL Rows to Separate Columns

I realise this may be similar to other questions, but I am stuck!
I am having trouble organising some data into an appropriate format to export to another tool. Basically I have an ID column and then 2 response columns. I would like to separate the ID and then list the responses under each. See the example below for clarification.
I have played around with Pivot and UnPivot but can't get it quite right.
Here is how the data looks now.
ID X1 X2
1 2 Y
1 5 Y
1 3 N
1 7 N
1 6 Y
2 5 N
2 4 Y
2 8 Y
2 3 N
3 5 Y
3 1 N
3 9 N
Here is how I would like the data to look:
ID1_X1 ID1_X2 ID2_X1 ID2_X2 ID3_X1 ID3_X2
2 Y 5 N 5 Y
5 Y 4 Y 1 N
3 N 8 Y 9 N
7 N 3 N null null
6 Y null null null null
Here is the code to create/populate the table.
create table #test (ID int, X1 int, X2 varchar(1))
insert into #test values
('1','2','Y'),('1','5','Y'),('1','3','N'),('1','7','N'),
('1','6','Y'),('2','5','N'),('2','4','Y'),('2','8','Y'),
('2','3','N'),('3','5','Y'),('3','1','N'),('3','9','N')
You can do this using aggregation and row_number() . . . assuming you know the ids in advance:
select max(case when id = 1 then x1 end) as x1_1,
max(case when id = 1 then x2 end) as x2_1,
max(case when id = 2 then x1 end) as x1_2,
max(case when id = 2 then x2 end) as x2_2,
max(case when id = 3 then x1 end) as x1_3,
max(case when id = 3 then x2 end) as x2_3
from (select t.*,
row_number() over (partition by id order by (select null)) as seqnum
from #test t
) t
group by seqnum;
I should note that SQL tables represent unordered sets. Your original data has no indication of ordering, so this is not guaranteed to put the values in the same order as the original data (actually, there is no such order, so that statement is a tautology). If you have another column that defines the ordering, use it in the ORDER BY instead of (select null).
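Here is a runnable sketch of the row_number() plus conditional aggregation approach. It assumes SQLite 3.25 or later (for ROW_NUMBER()) through Python's sqlite3 as a stand-in for SQL Server. It also assumes a hypothetical seq column that records the original row order; the question's #test table has no such column, which is exactly the ordering caveat above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column; without one, row order is undefined.
conn.execute("CREATE TABLE test (seq INTEGER, ID INTEGER, X1 INTEGER, X2 TEXT)")
data = [(1, 2, "Y"), (1, 5, "Y"), (1, 3, "N"), (1, 7, "N"), (1, 6, "Y"),
        (2, 5, "N"), (2, 4, "Y"), (2, 8, "Y"), (2, 3, "N"),
        (3, 5, "Y"), (3, 1, "N"), (3, 9, "N")]
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)",
                 [(i, *row) for i, row in enumerate(data, start=1)])

# Number the rows within each ID, then fold rows with the same number
# into one output row, one column pair per ID.
rows = conn.execute("""
    SELECT MAX(CASE WHEN ID = 1 THEN X1 END) AS ID1_X1,
           MAX(CASE WHEN ID = 1 THEN X2 END) AS ID1_X2,
           MAX(CASE WHEN ID = 2 THEN X1 END) AS ID2_X1,
           MAX(CASE WHEN ID = 2 THEN X2 END) AS ID2_X2,
           MAX(CASE WHEN ID = 3 THEN X1 END) AS ID3_X1,
           MAX(CASE WHEN ID = 3 THEN X2 END) AS ID3_X2
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY ID ORDER BY seq) AS seqnum
          FROM test t) AS numbered
    GROUP BY seqnum
    ORDER BY seqnum
""").fetchall()

for row in rows:
    print(row)
```

IDs with fewer rows run out first, so their columns come back as None (NULL) in the later rows, matching the desired layout.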
Here is an alternative approach to Gordon's good answer, using OUTER JOINs.
It assumes there is an identity column in your table (IDENTITY_COL below) that defines the order of X1 within each ID, and a fixed number of IDs:
;WITH FST
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID1_X1,X2 AS ID1_X2
FROM #TEST A
WHERE ID = 1),
SCD
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID2_X1,X2 AS ID2_X2
FROM #TEST A
WHERE ID = 2),
TRD
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID3_X1,X2 AS ID3_X2
FROM #TEST A
WHERE ID = 3)
SELECT ID1_X1,ID1_X2,ID2_X1,ID2_X2,ID3_X1,ID3_X2
FROM FST A
FULL OUTER JOIN SCD B
ON A.RN = B.RN
FULL OUTER JOIN TRD C
ON C.RN = COALESCE(B.RN, A.RN)

Selecting certain values that contain every number but not only one

Allow me to preface this by saying that I am fairly new to sql, and I'm sure there is an easy way to do this that I'm not understanding.
Let's say we have a table:
X | Y
2 | 2
3 | 1
3 | 3
3 | 2
I am trying to find values of y such that x contains both 2 and 3.
Basically, y = 2 is the only value that satisfies this.
EDIT: I know that in relational algebra this is trivial with division.
Using a conditional SUM: if a group of Y contains X = 2, the first sum is greater than 0, and likewise for X = 3 with the second sum.
SELECT Y
FROM YourTable
GROUP BY Y
HAVING SUM(CASE WHEN X = 2 THEN 1 ELSE 0 END) > 0
and SUM(CASE WHEN X = 3 THEN 1 ELSE 0 END) > 0
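Here is the conditional-SUM query run against the question's four rows. It uses SQLite through Python's sqlite3 as a stand-in engine, which is an assumption; the table name YourTable is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (X INTEGER, Y INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(2, 2), (3, 1), (3, 3), (3, 2)])

# Keep a Y group only if it has at least one X = 2 row AND at least one X = 3 row.
rows = conn.execute("""
    SELECT Y
    FROM YourTable
    GROUP BY Y
    HAVING SUM(CASE WHEN X = 2 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN X = 3 THEN 1 ELSE 0 END) > 0
""").fetchall()
print(rows)  # [(2,)]
```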
You could probably try this:
select y
from test
where x in (2,3)
group by y
having count(*) = 2;
EDIT: Note a good recommendation by Juan: if your data can contain duplicate rows (for example, X=2 and Y=2 twice), COUNT(*) counts both of them, so a better way of writing the query is this:
select y
from test
where x in (2,3)
group by y
having count(distinct x) = 2;
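Here is a small demonstration of why COUNT(DISTINCT x) is safer. It uses SQLite through Python's sqlite3 as a stand-in engine, which is an assumption. The duplicated (2, 5) row is a hypothetical addition to the question's data: with COUNT(*) it makes y = 5 look like a match even though it never pairs with x = 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (x INTEGER, y INTEGER)")
# The question's rows plus a duplicated (2, 5) row, added only to show the pitfall.
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(2, 2), (3, 1), (3, 3), (3, 2), (2, 5), (2, 5)])

count_all = [y for (y,) in conn.execute(
    "SELECT y FROM test WHERE x IN (2, 3) GROUP BY y HAVING COUNT(*) = 2 ORDER BY y")]
count_distinct = [y for (y,) in conn.execute(
    "SELECT y FROM test WHERE x IN (2, 3) GROUP BY y HAVING COUNT(DISTINCT x) = 2 ORDER BY y")]

print(count_all)       # [2, 5]  (5 is a false positive)
print(count_distinct)  # [2]
```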
I'd use INTERSECT:
SELECT Y
FROM YourTable
WHERE X = 2
INTERSECT
SELECT Y
FROM YourTable
WHERE X = 3
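Here is the INTERSECT version on the same sample data. It uses SQLite through Python's sqlite3 as a stand-in engine, which is an assumption. Because plain INTERSECT is a set operation that returns distinct rows, repeated rows cannot produce a false match here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (X INTEGER, Y INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(2, 2), (3, 1), (3, 3), (3, 2)])

# Y values paired with X = 2, intersected with Y values paired with X = 3.
rows = conn.execute("""
    SELECT Y FROM YourTable WHERE X = 2
    INTERSECT
    SELECT Y FROM YourTable WHERE X = 3
""").fetchall()
print(rows)  # [(2,)]
```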
Using the analytic LAG() function.
SELECT y
FROM
( SELECT x,
y,
lag(x) OVER(PARTITION BY y ORDER BY x) x_lag FROM your_table WHERE x IN (2, 3)
)
WHERE x_lag = x - 1;
Working demo:
SQL> WITH DATA AS(
SELECT 2 X, 2 Y FROM dual UNION ALL
SELECT 3 X, 1 Y FROM dual UNION ALL
SELECT 3 X, 3 Y FROM dual UNION ALL
SELECT 3 X, 2 Y FROM dual
)
SELECT y
FROM
( SELECT x,
y,
lag(x) OVER(PARTITION BY y ORDER BY x) x_lag FROM data WHERE x IN (2, 3)
)
WHERE x_lag = x - 1;
Y
----------
2

"Cluster" Code Help in SQL

I am a relative newcomer to SQL, but I have gained many useful ideas through the site. Now I'm stuck on a piece of code that seems simple enough, but for some reason I can't wrap my head around it.
I am trying to create a third column (Column Z) based off of the first two columns below:
Column X Column Y
-------------------
1 a
1 b
1 c
2 a
2 d
2 e
2 f
4 b
5 i
5 c
3 g
3 h
6 j
6 k
6 l
What I need to happen in Column Z:
For each individual value found in Column Y, note the value of Column X
Likewise, for each individual value in Column X, note the value of Column Y
Then, cluster (RANK/ROW_NUMBER?) these into groups seen below:
Column X Column Y Column Z
-----------------------------
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
I hope I've been clear enough without over-complicating things. My head has been spinning all morning. Let me know if anyone needs any more info.
Greatly appreciated in advance!
I have faced exactly this problem in some past analyses. The only way I could get it to work was with a loop that incrementally adds in the information.
The loop assigns the minimum "x" value within each group as the group id. By your rules, this is guaranteed to be unique. It starts by assigning the current x value to z. It then finds the minimum z along the x and y dimensions. It repeats this process until no records change.
Given your data, the following is an outline of how to do it:
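-- Start with each row labeled by its own x
update t set z = x;
while 1=1
begin
-- Pull the smallest label seen along the x and y dimensions onto each row
;with toupdate as (
select t.*,
min(z) over (partition by x) as idx,
min(z) over (partition by y) as idy from t
)
update toupdate
set z = (case when idx < idy then idx else idy end)
where z > idx or z > idy;
-- Stop once a pass changes nothing
if (@@ROWCOUNT = 0) break;
end;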
;with a as
(
select z, dense_rank() over (order by z) newZ from t
)
update a set z = newZ
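The loop above can be mirrored in plain Python to see how the labels converge. This is an illustration of the same min-label propagation, not the answerer's code. The list of (x, y) pairs is the question's sample data, and each pass reads a snapshot of the labels before updating, as the single UPDATE statement does.

```python
from collections import defaultdict

pairs = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "d"), (2, "e"), (2, "f"),
         (4, "b"), (5, "i"), (5, "c"), (3, "g"), (3, "h"), (6, "j"), (6, "k"), (6, "l")]

# Start with z = x for every row (the "update t set z = x" step).
z = [x for x, _ in pairs]

while True:
    # Minimum current label per x value and per y value (the two MIN() OVER windows).
    min_by_x = defaultdict(lambda: float("inf"))
    min_by_y = defaultdict(lambda: float("inf"))
    for (x, y), label in zip(pairs, z):
        min_by_x[x] = min(min_by_x[x], label)
        min_by_y[y] = min(min_by_y[y], label)
    changed = 0
    for i, (x, y) in enumerate(pairs):
        best = min(min_by_x[x], min_by_y[y])
        if z[i] > best:
            z[i] = best
            changed += 1
    if changed == 0:  # plays the role of the @@ROWCOUNT = 0 check
        break

# Renumber the surviving labels 1, 2, 3, ... (the DENSE_RANK step).
ranks = {label: rank for rank, label in enumerate(sorted(set(z)), start=1)}
clusters = [ranks[label] for label in z]
print(clusters)
```

On this data it settles after two changing passes. A long chain of links would need more passes, which is why the SQL version loops until no rows change.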
Maybe not the best way, but it works
SQLFiddle http://sqlfiddle.com/#!3/99532/1
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS row_nb
FROM #t
)
, c2 AS (
SELECT e1.*
,CASE WHEN EXISTS(SELECT * FROM cte e2 WHERE e1.Y = e2.Y and e2.row_nb < e1.row_nb) THEN 1 ELSE 0 END as ex
FROM cte e1
)
, c3 AS (
SELECT X,1 - SIGN(SUM(ex)) as ex,MAX(row_nb) as max_row_nb
FROM c2
GROUP BY X
)
SELECT
cte.X,cte.Y
,(SELECT SUM(cc3.ex) FROM c3 cc3 where cc3.max_row_nb<= c3.max_row_nb) AS Z
FROM cte
INNER JOIN c3
ON c3.X = cte.X
ORDER BY cte.row_nb
create table #t (x tinyint, y char(1), z tinyint)
insert #t (x,y) values(1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'d'),(2,'e'),(2,'c'),
(2,'f'),(4,'b'),(5,'i'),(5,'c'),(3,'g'),(3,'h'),(6,'j'),(6,'k'),(6,'l'),(8,'j'),(7,'v')
;with a as
(
select x,parent from
(
select x, min(x) over (partition by y) parent from #t
) a
where x > parent
), b as
(
select x, parent from a
union all
select a.x, b.parent
from a join b on a.parent = b.x
), c as
(
select x, min(parent) parent
from b
group by x
), d as
(
select t.x,t.y, t.z,
dense_rank() over (order by coalesce(c.parent, t.x)) calculatedZ
from #t t
left join c on t.x = c.x
)
select x,y,calculatedZ as z from d
-- if you want to update instead of selecting, replace the last line with:
-- update d set z = calculatedZ
-- select x,y,z from #t
option (maxrecursion 0)
Result:
x y z
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 c 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
8 j 3
7 v 4
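As a cross-check of the result above, the same clustering can be computed in Python with a union-find (disjoint-set) structure. This is a different technique from either SQL answer. Every x is merged with the first x seen for each y, and clusters are numbered by their smallest x, as the DENSE_RANK does. The pairs below are the rows of the result table above.

```python
def find(parent, x):
    """Return the root of x's set, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster(pairs):
    """Return (x, y, z) rows where z numbers the connected groups 1, 2, 3, ..."""
    parent = {x: x for x, _ in pairs}
    first_x_for_y = {}
    for x, y in pairs:
        if y in first_x_for_y:
            # Two x values share this y: merge their sets, keeping the smaller root,
            # so every root is the smallest x in its group.
            a, b = find(parent, x), find(parent, first_x_for_y[y])
            if a != b:
                parent[max(a, b)] = min(a, b)
        else:
            first_x_for_y[y] = x
    roots = sorted({find(parent, x) for x in parent})
    rank = {root: i for i, root in enumerate(roots, start=1)}
    return [(x, y, rank[find(parent, x)]) for x, y in pairs]

pairs = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "d"), (2, "e"), (2, "c"),
         (2, "f"), (4, "b"), (5, "i"), (5, "c"), (3, "g"), (3, "h"),
         (6, "j"), (6, "k"), (6, "l"), (8, "j"), (7, "v")]

for x, y, z in cluster(pairs):
    print(x, y, z)
```

Union-find follows links of any length in a single pass over the rows, so no recursion limit or repeat-until-stable loop is needed.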