"Cluster" Code Help in SQL - sql

I am a relative newcomer to SQL, but have gained many useful ideas through the site. Now I'm stuck on a piece of code that seems simple enough, but for some reason I can't wrap my head around it.
I am trying to create a third column (Column Z) based off of the first two columns below:
Column X Column Y
-------------------
1 a
1 b
1 c
2 a
2 d
2 e
2 f
4 b
5 i
5 c
3 g
3 h
6 j
6 k
6 l
What I need to have happen in Column Z:
For each individual value found in Column Y, note the value of Column X
Likewise, for each individual value in Column X, note the value of Column Y
Then, cluster (RANK/ROW_NUMBER?) these into groups seen below:
Column X Column Y Column Z
-----------------------------
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
I hope I've been clear enough without over-complicating things. My head has been spinning all morning. Let me know if anyone needs any more info.
Greatly appreciated in advance!

I have faced exactly this problem for some analyses in the past. The only way I could get it to work is by doing a loop, that incrementally adds in the information.
The loop assigns the minimum "x" value within each group as the group id. By your rules, this is guaranteed to be unique. It starts by assigning the current x value to z. It then finds the minimum z along the x and y dimensions. It repeats this process until no records change.
Given your data, the following is an outline of how to do it:
update t set z = x
while 1=1
begin
with toupdate as (
select t.*,
min(z) over (partition by x) as idx,
min(z) over (partition by y) as idy from t
)
update toupdate
set z = (case when idx < idy then idx else idy end)
where z > idx or z > idy;
if (@@ROWCOUNT = 0) break;
end;
;with a as
(
select z, dense_rank() over (order by z) newZ from t
)
update a set z = newZ
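The loop described above can be sketched outside the database as well. The following is a minimal Python illustration of the same idea (start with z = x, repeatedly take the minimum z seen along x and along y, stop when nothing changes, then renumber), not a replacement for the T-SQL. The function name `cluster` is made up for this sketch, and the rows are the ones from the question.

```python
def cluster(rows):
    """Label each (x, y) row with a group id by repeated minimum propagation."""
    # Start with z = x, like the initial UPDATE.
    z = [x for x, _ in rows]
    changed = True
    while changed:
        changed = False
        # Smallest current label seen for each x value and for each y value.
        min_x, min_y = {}, {}
        for (x, y), label in zip(rows, z):
            min_x[x] = min(min_x.get(x, label), label)
            min_y[y] = min(min_y.get(y, label), label)
        # Lower a row's label when its x or y already carries a smaller one.
        for i, (x, y) in enumerate(rows):
            best = min(min_x[x], min_y[y])
            if best < z[i]:
                z[i] = best
                changed = True
    # Renumber the surviving labels 1, 2, 3, ... (the DENSE_RANK step).
    rank = {label: n for n, label in enumerate(sorted(set(z)), start=1)}
    return [rank[label] for label in z]


rows = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "d"), (2, "e"), (2, "f"),
        (4, "b"), (5, "i"), (5, "c"), (3, "g"), (3, "h"),
        (6, "j"), (6, "k"), (6, "l")]
groups = cluster(rows)
print(list(zip(rows, groups)))
```

On the question's data this takes two passes that change labels and a third that confirms nothing moved.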

Maybe not the best way, but it works
SQLFiddle http://sqlfiddle.com/#!3/99532/1
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS row_nb
FROM #t
)
, c2 AS (
SELECT e1.*
,CASE WHEN EXISTS(SELECT * FROM cte e2 WHERE e1.Y = e2.Y and e2.row_nb < e1.row_nb) THEN 1 ELSE 0 END as ex
FROM cte e1
)
, c3 AS (
SELECT X,1 - SIGN(SUM(ex)) as ex,MAX(row_nb) as max_row_nb
FROM c2
GROUP BY X
)
SELECT
cte.X,cte.Y
,(SELECT SUM(cc3.ex) FROM c3 cc3 where cc3.max_row_nb<= c3.max_row_nb) AS Z
FROM cte
INNER JOIN c3
ON c3.X = cte.X
ORDER BY cte.row_nb

declare @t table (x tinyint, y char(1), z tinyint)
insert @t (x,y) values(1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'d'),(2,'e'),(2,'c'),
(2,'f'),(4,'b'),(5,'i'),(5,'c'),(3,'g'),(3,'h'),(6,'j'),(6,'k'),(6,'l'),(8,'j'),(7,'v')
;with a as
(
    -- for each y, link every x to the smallest x that shares that y
    select x, parent from
    (
        select x, min(x) over (partition by y) parent from @t
    ) a
    where x > parent
), b as
(
    -- follow the parent links recursively
    select x, parent from a
    union all
    select a.x, b.parent
    from a join b on a.parent = b.x
), c as
(
    -- keep the smallest ancestor found for each x
    select x, min(parent) parent
    from b
    group by x
), d as
(
    select t.x, t.y, t.z,
        dense_rank() over (order by coalesce(c.parent, t.x)) calculatedZ
    from @t t
    left join c on t.x = c.x
)
select x, y, calculatedZ as z from d
-- if you want to update instead of selecting, replace the last line with:
-- update d set z = calculatedZ
-- select x,y,z from @t
option (maxrecursion 0)
Result:
x y z
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 c 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
8 j 3
7 v 4
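For comparison, the parent-pointer idea in the query above resembles a union-find (disjoint set) structure. The following Python sketch is a different technique from the recursive CTE, shown only to illustrate the grouping logic; the function names are hypothetical, and the rows are taken from the result listing above.

```python
def find(parent, node):
    """Return the root of node, shortening the chain along the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node


def cluster_by_shared_y(rows):
    """Map each x to a dense group number; x values sharing a y are merged."""
    parent = {}
    first_x_for_y = {}
    for x, y in rows:
        parent.setdefault(x, x)
        if y in first_x_for_y:
            a = find(parent, x)
            b = find(parent, first_x_for_y[y])
            if a != b:
                # Keep the smaller x as root, so a root is its group's minimum.
                parent[max(a, b)] = min(a, b)
        else:
            first_x_for_y[y] = x
    roots = {x: find(parent, x) for x in parent}
    rank = {r: n for n, r in enumerate(sorted(set(roots.values())), start=1)}
    return {x: rank[root] for x, root in roots.items()}


rows = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "d"), (2, "e"), (2, "c"),
        (2, "f"), (4, "b"), (5, "i"), (5, "c"), (3, "g"), (3, "h"),
        (6, "j"), (6, "k"), (6, "l"), (8, "j"), (7, "v")]
z_by_x = cluster_by_shared_y(rows)
print(z_by_x)
```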

Related

SQL Transform Crosstab Pivot Data

I am using SQL Server 2008 and would like to transform my data such that:
Dataset:
ID Item Columns Result
1 1 X A
2 1 Y B
3 1 Z C
4 2 X D
5 2 Y E
6 2 Z NULL
7 3 X F
8 3 Y G
9 3 Z H
Results Desired:
Item X Y Z
1 A B C
2 D E NULL
3 F G H
At this time, I am doing the following, then pasting the columns I need into Excel:
Select * from thisTable where [Columns] = 'X'
Select * from thisTable where [Columns] = 'Y'
Select * from thisTable where [Columns] = 'Z'
However, not all of the rows match up, so I can't just put the result sets side by side. For columns without a Result, I'd like NULL to show up to fill in the rows so that they all have the same number of records.
I looked up PIVOT but I don't think this works here...what is this type of data transformation called? I don't think it's a crosstab...
Thanks!
You can do a crosstab using conditional aggregation:
SELECT
Item,
[X] = MAX(CASE WHEN [Columns] = 'X' THEN Result END),
[Y] = MAX(CASE WHEN [Columns] = 'Y' THEN Result END),
[Z] = MAX(CASE WHEN [Columns] = 'Z' THEN Result END)
FROM thisTable
GROUP BY Item
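To see the conditional aggregation at work without a SQL Server instance, here is a small sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in here, so the `[X] = ...` alias syntax is replaced with `AS X`; the data is the dataset from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE thisTable (ID INTEGER, Item INTEGER, "Columns" TEXT, Result TEXT)'
)
conn.executemany(
    "INSERT INTO thisTable VALUES (?, ?, ?, ?)",
    [(1, 1, "X", "A"), (2, 1, "Y", "B"), (3, 1, "Z", "C"),
     (4, 2, "X", "D"), (5, 2, "Y", "E"), (6, 2, "Z", None),
     (7, 3, "X", "F"), (8, 3, "Y", "G"), (9, 3, "Z", "H")],
)

# Each CASE yields Result only for its own column label and NULL otherwise,
# so MAX picks the single non-NULL value per Item (or NULL if there is none).
pivoted = conn.execute('''
    SELECT Item,
           MAX(CASE WHEN "Columns" = 'X' THEN Result END) AS X,
           MAX(CASE WHEN "Columns" = 'Y' THEN Result END) AS Y,
           MAX(CASE WHEN "Columns" = 'Z' THEN Result END) AS Z
    FROM thisTable
    GROUP BY Item
    ORDER BY Item
''').fetchall()
print(pivoted)
```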
Use PIVOT:
select *
from (
select Item, Columns, Result
from thisTable
) t
pivot (
max (Result)
for Columns in (X, Y, Z)
) p

Microsoft SQL Server : comparing a vertical table as a horizontal table for vector

I have a set of vectors where each vector has an element of a through z.
I would like to have a query such that I get the first vector on the left and the vector it is being compared to on the right.
Let's say the first vector is (a,b,c) and the other two vectors are (a,b) and (c).
What I really want is this:
v1 el1 v2 el2
-----------------
1 a 2 a
1 b 2 b
1 c 2 null
1 a 3 null
1 b 3 null
1 c 3 c
This way it would be easy to have another pass to calculate the metrics per vector as they relate to vector #1.
DROP TABLE #vector
CREATE TABLE #vector (v VARCHAR(10),el VARCHAR(10))
INSERT INTO #vector(v, el) VALUES ('1', 'a')
INSERT INTO #vector(v, el) VALUES ('1', 'b')
INSERT INTO #vector(v, el) VALUES ('1', 'c')
INSERT INTO #vector(v, el) VALUES ('2', 'a')
INSERT INTO #vector(v, el) VALUES ('2', 'b')
INSERT INTO #vector(v, el) VALUES ('3', 'c')
SELECT *
FROM #vector a
LEFT JOIN #vector b on a.el = b.el AND a.v <> b.v
WHERE a.v = '1'
I actually get this:
v el v el
--------------
1 a 2 a
1 b 2 b
1 c 3 c
I thought about PIVOT:
WITH vectors AS (
select *
from
( select v,el from #vector ) src
PIVOT (
count(el) for el in ([a],[b],[c])
) piv)
SELECT * FROM vectors a JOIN vectors b ON b.v <> a.v WHERE a.v=1
Which returns this:
v a b c v a b c
------------------------------
1 1 1 1 2 1 1 0
1 1 1 1 3 0 0 1
Admittedly, I can use this, but it requires me to rewrite a simple summation query into one in which I must specify a through z.
SELECT
v, el, present
FROM
(SELECT *
FROM
(SELECT v, el FROM #vector) src
PIVOT (count(el) for el in ([a],[b],[c]) ) piv) foo
UNPIVOT (present FOR el IN (a,b,c)) AS up;
This returns:
v el present
------------------
1 a 1
1 b 1
1 c 1
2 a 1
2 b 1
2 c 0
3 a 0
3 b 0
3 c 1
So as a possible final answer:
SELECT v, el, present
INTO #vector2
FROM (select * FROM ( select v,el from #vector ) src PIVOT ( count(el) for el in ([a],[b],[c]) ) piv) foo
UNPIVOT (present FOR el IN (a,b,c)) AS up;
SELECT * FROM #vector2 a LEFT JOIN #vector2 b on a.el = b.el AND a.v <> b.v WHERE a.v='1'
ORDER BY a.v,b.v
Returns:
v el present v el present
1 a 1 2 a 1
1 b 1 2 b 1
1 c 1 2 c 0
1 c 1 3 c 1
1 b 1 3 b 0
1 a 1 3 a 0
So through the PIVOT and UNPIVOT, I can get the zeros filled in.
However, this seems like a complicated solution.
Is there an easier way?
One idea would be to alter #vector and add 'present' and populate the zero entries. But populating the other zero entries wastes space and it is non-trivial to determine which 0s to insert.
Thank you for your help. :)
Instead of pivot/unpivot, use UNION:
SELECT * FROM (
SELECT 2 as part, a.v as a_v, a.el as a_el, b.v as b_v, b.el as b_el
FROM #vector a
LEFT JOIN #vector b on b.v='2' and a.el=b.el
WHERE a.v='1'
UNION ALL
SELECT 3 as part, a.*, b.*
FROM #vector a
LEFT JOIN #vector b on b.v='3' and a.el=b.el
WHERE a.v='1') t
order by part, a_v, a_el
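The UNION approach needs one branch per comparison vector. As a hedged variation (not part of the answer above), the list of other vectors can be derived with a CROSS JOIN so that no vector id is hard-coded except the reference vector '1'. The sketch below runs it through Python's sqlite3 module as a stand-in for SQL Server, with a plain table named `vector` instead of the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vector (v TEXT, el TEXT)")
conn.executemany(
    "INSERT INTO vector VALUES (?, ?)",
    [("1", "a"), ("1", "b"), ("1", "c"), ("2", "a"), ("2", "b"), ("3", "c")],
)

# Pair every element of vector 1 with every other vector id, then look up
# whether that other vector actually contains the element.
compared = conn.execute("""
    SELECT a.v AS v1, a.el AS el1, o.v AS v2, b.el AS el2
    FROM vector AS a
    CROSS JOIN (SELECT DISTINCT v FROM vector WHERE v <> '1') AS o
    LEFT JOIN vector AS b ON b.v = o.v AND b.el = a.el
    WHERE a.v = '1'
    ORDER BY o.v, a.el
""").fetchall()
for row in compared:
    print(row)
```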

SQL Rows to Separate Columns

I realise this may be similar to other questions, but I am stuck!
I am having trouble organising some data into an appropriate format to export to another tool. Basically I have an ID column and then 2 response columns. I would like to separate the ID and then list the responses under each. See the example below for clarification.
I have played around with Pivot and UnPivot but can't get it quite right.
Here is how the data looks now.
ID X1 X2
1 2 Y
1 5 Y
1 3 N
1 7 N
1 6 Y
2 5 N
2 4 Y
2 8 Y
2 3 N
3 5 Y
3 1 N
3 9 N
Here is how I would like the data to look
ID1_X1 ID1_X2 ID2_X1 ID2_X2 ID3_X1 ID3_X2
2 Y 5 N 5 Y
5 Y 4 Y 1 N
3 N 8 Y 9 N
7 N 3 N null null
6 Y null null null null
Here is the code to create/populate the table.
create table #test (ID int, X1 int, X2 varchar(1))
insert into #test values
('1','2','Y'),('1','5','Y'),('1','3','N'),('1','7','N'),
('1','6','Y'),('2','5','N'),('2','4','Y'),('2','8','Y'),
('2','3','N'),('3','5','Y'),('3','1','N'),('3','9','N')
You can do this using aggregation and row_number() . . . assuming you know the ids in advance:
select max(case when id = 1 then x1 end) as x1_1,
max(case when id = 1 then x2 end) as x2_1,
max(case when id = 2 then x1 end) as x1_2,
max(case when id = 2 then x2 end) as x2_2,
max(case when id = 3 then x1 end) as x1_3,
max(case when id = 3 then x2 end) as x2_3
from (select t.*,
row_number() over (partition by id order by (select null)) as seqnum
from #test t
) t
group by seqnum;
I should note that SQL tables represent unordered sets. Your original data doesn't have an indication of the ordering, so this is not guaranteed to put the values in the same order as the original data (actually, there is no such order, so that statement is a tautology). If you have another column with the ordering, then you can use that.
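The pairing that seqnum performs (first row of each id on line 1, second row on line 2, and so on) can be illustrated with a short Python sketch. This is only an illustration of the layout logic, not a SQL solution; unlike a SQL table, a Python list does have an order, and the sketch assumes the list order is the order you want.

```python
from itertools import zip_longest

rows = [(1, 2, "Y"), (1, 5, "Y"), (1, 3, "N"), (1, 7, "N"), (1, 6, "Y"),
        (2, 5, "N"), (2, 4, "Y"), (2, 8, "Y"), (2, 3, "N"),
        (3, 5, "Y"), (3, 1, "N"), (3, 9, "N")]

# Collect the (X1, X2) pairs of each ID in input order.
by_id = {}
for id_, x1, x2 in rows:
    by_id.setdefault(id_, []).append((x1, x2))

# Line n of the output holds the n-th pair of every ID; shorter IDs are
# padded with (None, None), which plays the role of NULL.
wide = [
    tuple(value for pair in line for value in pair)
    for line in zip_longest(*(by_id[i] for i in sorted(by_id)),
                            fillvalue=(None, None))
]
for line in wide:
    print(line)
```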
Here is an alternative approach to Gordan's good answer, using OUTER JOINs.
This assumes that there is an identity column in your table (IDENTITY_COL below) that defines the order of X1 within each ID, and that the number of IDs is fixed:
;WITH FST
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID1_X1,X2 AS ID1_X2
FROM #TEST A
WHERE ID = 1),
SCD
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID2_X1,X2 AS ID2_X2
FROM #TEST A
WHERE ID = 2),
TRD
AS (SELECT ROW_NUMBER()OVER(ORDER BY IDENTITY_COL) RN,X1 AS ID3_X1,X2 AS ID3_X2
FROM #TEST A
WHERE ID = 3)
SELECT ID1_X1,ID1_X2,ID2_X1,ID2_X2,ID3_X1,ID3_X2
FROM FST A
FULL OUTER JOIN SCD B
ON A.RN = B.RN
FULL OUTER JOIN TRD C
ON C.RN = COALESCE(B.RN, A.RN)

Selecting certain values that contain every number but not only one

Allow me to preface this by saying that I am fairly new to sql, and I'm sure there is an easy way to do this that I'm not understanding.
Let's say we have a table:
X | Y
2 | 2
3 | 1
3 | 3
3 | 2
I am trying to find values of y such that x contains both 2 and 3.
Basically, y = 2 is the only value that satisfies this.
EDIT: I know that in relational algebra this is trivial with division
Use a conditional SUM. If a group of Y contains X = 2, the first sum will be greater than 0; the same goes for X = 3 and the second sum.
SELECT Y
FROM YourTable
GROUP BY Y
HAVING SUM(CASE WHEN X = 2 THEN 1 ELSE 0 END) > 0
and SUM(CASE WHEN X = 3 THEN 1 ELSE 0 END) > 0
You could probably try this:
select y
from test
where x in (2,3)
group by y
having count(*) = 2;
EDIT: Note the good recommendation by Juan. In case your data contains duplicate rows (for example, two rows with X=2 and Y=2), a better way of writing the query would be this:
select y
from test
where x in (2,3)
group by y
having count(distinct x) = 2;
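The difference between count(*) and count(distinct x) only shows up when the table holds duplicate rows. The sketch below uses Python's sqlite3 module as a stand-in database and adds one duplicate row, (3, 1), to the question's data; that extra row is an assumption made purely to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    # The last row duplicates (3, 1); it is not in the original question.
    [(2, 2), (3, 1), (3, 3), (3, 2), (3, 1)],
)

query = """
    SELECT y FROM test
    WHERE x IN (2, 3)
    GROUP BY y
    HAVING {condition}
    ORDER BY y
"""
with_count_star = [r[0] for r in conn.execute(query.format(condition="COUNT(*) = 2"))]
with_distinct = [r[0] for r in conn.execute(query.format(condition="COUNT(DISTINCT x) = 2"))]

print(with_count_star)  # y = 1 slips in because (3, 1) is counted twice
print(with_distinct)    # only y values that really have both x = 2 and x = 3
```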
I'd use INTERSECT:
SELECT Y
FROM YourTable
WHERE X = 2
INTERSECT
SELECT Y
FROM YourTable
WHERE X = 3
Using the analytic LAG() function.
SELECT y
FROM
( SELECT x,
y,
lag(x) OVER(PARTITION BY y ORDER BY x) x_lag FROM your_table WHERE x IN (2, 3)
)
WHERE x_lag = x - 1;
Working demo:
SQL> WITH DATA AS(
SELECT 2 X, 2 Y FROM dual UNION ALL
SELECT 3 X, 1 Y FROM dual UNION ALL
SELECT 3 X, 3 Y FROM dual UNION ALL
SELECT 3 X, 2 Y FROM dual
)
SELECT y
FROM
( SELECT x,
y,
lag(x) OVER(PARTITION BY y ORDER BY x) x_lag FROM data WHERE x IN (2, 3)
)
WHERE x_lag = x - 1;
Y
----------
2

Combination of group by, order and distinct

My query
SELECT a, b, c
FROM table
WHERE
a > 0 AND a < 4 AND
b IN (
SELECT z FROM table2
WHERE x = y
)
produces the following output:
A B C
1 1 Car
1 1 Keyboard
1 2 Apple
1 3 Frog
2 1 Carrot
2 2 Parrot
3 1 Doll
What I want is the following output:
A B C
1 1 Car
2 1 Carrot
3 1 Doll
So basically for every A, the lowest B and associated C (as well as other columns).
I tried various join types, group bys, but I am running out of ideas.
How can I accomplish this?
Use a Top N Apply:
SELECT a, b, c
FROM table
CROSS APPLY (SELECT top 1 z
FROM table2
WHERE x = y
order by z ) t2
WHERE a > 0 AND a < 4 AND
b = t2.z -- assumed completion; the original condition was cut off after AND
Do a join on a subquery:
SELECT a, b, c
FROM table t1
INNER JOIN (SELECT a a2, MIN(b) b2 FROM table GROUP BY a) t2
ON t1.a = t2.a2 AND t1.b = t2.b2
WHERE
a > 0 AND a < 4 AND
b IN (
SELECT z FROM table2
WHERE x = y
)
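Note that a join on MIN(b) returns every row that ties for the lowest b, so with the sample output in the question both Car and Keyboard would come back for a = 1. The following is a minimal sketch of one way to keep a single row per a, run through Python's sqlite3 module as a stand-in database. The table name `items` is hypothetical, the table2 filter is left out for brevity, and breaking ties by the smallest c is an assumption, since the question does not say which tied row should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 1, "Car"), (1, 1, "Keyboard"), (1, 2, "Apple"), (1, 3, "Frog"),
     (2, 1, "Carrot"), (2, 2, "Parrot"), (3, 1, "Doll")],
)

# Keep a row only if no other row with the same a has a lower b,
# or the same b and an alphabetically smaller c (assumed tie-breaker).
lowest = conn.execute("""
    SELECT t1.a, t1.b, t1.c
    FROM items AS t1
    WHERE t1.a > 0 AND t1.a < 4
      AND NOT EXISTS (
          SELECT 1
          FROM items AS t2
          WHERE t2.a = t1.a
            AND (t2.b < t1.b OR (t2.b = t1.b AND t2.c < t1.c))
      )
    ORDER BY t1.a
""").fetchall()
print(lowest)
```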