SQL transpose based on values in column 1

I have a SQL query that gives me a two-column result like this:
A | B
-----
2 | 1
3 | 2
3 | 3
3 | 4
3 | 5
4 | 6
4 | 7
4 | 8
4 | 9
I would like to split this into multiple columns, like this: for each distinct value in A, turn the corresponding B values into columns. A particular value x can occur at most 4 times in column A (if that helps).
A | B1 | B2 | B3 | B4
----------------------------
2 | 1 | NULL | NULL | NULL
3 | 2 | 3 | 4 | 5
4 | 6 | 7 | 8 | 9
I have been stuck for hours trying to do this with PIVOT. I am now thinking about querying the data from Python (the client that uses it) and doing the transformation there, which is easy. But can this be done in SQL? I am using SQL Server 2016.
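For comparison, the client-side route the question mentions could look roughly like this. It is a minimal sketch that assumes the query result arrives as (a, b) pairs, for example from a DB-API cursor's fetchall(); the function name pivot_rows and the width parameter are hypothetical.

```python
from collections import defaultdict

def pivot_rows(rows, width=4):
    """Group (a, b) pairs by a and spread each group's b values across
    `width` columns, padding with None (Python's counterpart of NULL)."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    result = []
    for a in sorted(groups):
        bs = sorted(groups[a])  # same ordering as "order by b" in SQL
        if len(bs) > width:
            raise ValueError(f"value {a!r} occurs {len(bs)} times; only {width} columns")
        result.append((a, *bs, *[None] * (width - len(bs))))
    return result

rows = [(2, 1), (3, 2), (3, 3), (3, 4), (3, 5), (4, 6), (4, 7), (4, 8), (4, 9)]
for row in pivot_rows(rows):
    print(row)
```

Raising on more than `width` values mirrors the stated limit of 4 occurrences instead of silently dropping data.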

You can use row_number() and conditional aggregation:
select a,
       max(case when seqnum = 1 then b end) as b_1,
       max(case when seqnum = 2 then b end) as b_2,
       max(case when seqnum = 3 then b end) as b_3,
       max(case when seqnum = 4 then b end) as b_4
from (select t.*, row_number() over (partition by a order by b) as seqnum
      from t
     ) t
group by a;
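To check the query end to end, here is a small sketch that runs it against the sample data. It uses Python's built-in sqlite3 module with SQLite (3.25 or later, for window functions) as a stand-in for SQL Server, which is an assumption; the table name t comes from the answer, and an ORDER BY is added only to make the output order stable.

```python
import sqlite3

# SQLite stands in for SQL Server; row_number() and CASE behave the same here.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int)")
conn.executemany(
    "insert into t (a, b) values (?, ?)",
    [(2, 1), (3, 2), (3, 3), (3, 4), (3, 5), (4, 6), (4, 7), (4, 8), (4, 9)],
)

query = """
select a,
       max(case when seqnum = 1 then b end) as b_1,
       max(case when seqnum = 2 then b end) as b_2,
       max(case when seqnum = 3 then b end) as b_3,
       max(case when seqnum = 4 then b end) as b_4
from (select t.*, row_number() over (partition by a order by b) as seqnum
      from t
     ) t
group by a
order by a
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Missing positions come back as None because max() over an all-NULL CASE yields NULL, which is exactly the padding the question asks for.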

Related

Group observations with SQL and assign them to the same group

I have a table consisting of two columns (X, Y) that represent relations between observations, like below:
X Y
1 2
2 3
3 4
A B
B C
I want to create a new column that represents the relation between observations: 1 becomes 2, 2 becomes 3, 3 becomes 4. So I want to show these values in the same group (1, 2, 3 and 4 belong to the same group). The table should look like this:
X Y Z
1 2 Group 1
2 3 Group 1
3 4 Group 1
A B Group 2
B C Group 2
I am using SAS Enterprise Guide. A solution in PROC SQL or any SQL dialect would be great; I mainly need the logic.
Note: I have no additional information except this table.
Try the following. The query is written for PostgreSQL, but you may be able to use the same logic.
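with cte as
(
    select
        *,
        lag(y) over (order by x) as prev_y  -- y value of the previous row
    from myTable
)
select
    x,
    y,
    -- a new group starts whenever x does not continue the previous row's y
    concat('Group ', sum(case when x = prev_y then 0 else 1 end) over (order by x)) as z
from cte;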
Output:
| x | y | z |
| --- | --- | ------- |
| 1 | 2 | Group 1 |
| 2 | 3 | Group 1 |
| 3 | 4 | Group 1 |
| A | B | Group 2 |
| B | C | Group 2 |
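A runnable sketch of the same lag-plus-running-sum idea, using Python's sqlite3 with SQLite (3.25+) standing in for PostgreSQL. It swaps concat() for the || operator, since older SQLite versions lack concat(). Like the answer, it assumes each chain is contiguous when the rows are sorted by x; it does not handle links that appear out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (x text, y text)")
conn.executemany(
    "insert into mytable (x, y) values (?, ?)",
    [("1", "2"), ("2", "3"), ("3", "4"), ("A", "B"), ("B", "C")],
)

query = """
with cte as (
    select x, y,
           lag(y) over (order by x) as prev_y   -- y of the previous row
    from mytable
)
select x, y,
       -- a running count of chain starts gives the group number
       'Group ' || sum(case when x = prev_y then 0 else 1 end) over (order by x) as z
from cte
order by x
"""
grouped = conn.execute(query).fetchall()
for row in grouped:
    print(row)
```

The first row has no previous y, so prev_y is NULL, the comparison is not true, and the row opens group 1.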

Query to count occurrences of a particular column value

Let's say I have a table with the following value
1
1
1
2
2
2
3
3
3
1
1
1
2
2
2
I need to get output like this, which numbers each occurrence of a particular value:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
1 1
1 2
1 3
2 1
2 2
2 3
NB: This is a sample table. The actual table is complex, with lots of rows and columns, and the query contains some more conditions.
If the number repeats over different "islands", you first need to calculate a value that keeps those islands apart (grpnum). That first step can be done by subtracting a partitioned row number from a raw top-to-bottom row number (raw_rownum). Within each island that difference stays constant, so together with the value itself it identifies the island and can be used to partition a subsequent row number. Because each ORDER BY can disturb the outcome, I find it necessary to use separate steps and pass each prior calculation up so it can be reused.
MS SQL Server 2014 Schema Setup:
CREATE TABLE Table1 ([num] int);
INSERT INTO Table1 ([num])
VALUES (1),(1),(1),(2),(2),(2),(3),(3),(3),(1),(1),(1),(2),(2),(2);
Query 1:
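select
    num
    -- restart numbering in each island; partition by num AND grpnum,
    -- because grpnum + num can coincide for two different islands
  , row_number() over(partition by num, grpnum order by raw_rownum) rn
  , grpnum + num island_num  -- display only
from (
    select
        num
        -- constant within a run of equal values
      , raw_rownum - row_number() over(partition by num order by raw_rownum) grpnum
      , raw_rownum
    from (
        select
            num
            -- (select null) imposes no real order; use an explicit ordering column if you have one
          , row_number() over(order by (select null)) as raw_rownum
        from table1
    ) r
) d
;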
Results:
| num | rn | island_num |
|-----|----|------------|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 5 |
| 2 | 2 | 5 |
| 2 | 3 | 5 |
| 1 | 1 | 7 |
| 1 | 2 | 7 |
| 1 | 3 | 7 |
| 3 | 1 | 9 |
| 3 | 2 | 9 |
| 3 | 3 | 9 |
| 2 | 1 | 11 |
| 2 | 2 | 11 |
| 2 | 3 | 11 |
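A small sketch of the row-number-difference technique, run through Python's sqlite3 (SQLite 3.25+ standing in for SQL Server). It assumes a hypothetical id column that records the intended row order, since the sample table has none, and partitions the final row_number() by num and grpnum together. The helper name number_within_islands is illustrative only.

```python
import sqlite3

def number_within_islands(values):
    """Return (num, rn) pairs where rn restarts at 1 for each run of equal nums.

    The id column (an assumption) records the intended order, because a
    table has no inherent row order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table1 (id integer primary key, num int)")
    conn.executemany("insert into table1 (id, num) values (?, ?)", enumerate(values, start=1))
    query = """
    select num,
           row_number() over (partition by num, grpnum order by id) as rn
    from (
        select id, num,
               -- constant within a run of equal nums
               row_number() over (order by id)
                 - row_number() over (partition by num order by id) as grpnum
        from table1
    ) d
    order by id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

sample = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2]
for num, rn in number_within_islands(sample):
    print(num, rn)
```

With the input [2, 1, 1], grpnum + num is 2 for both the single 2 and the pair of 1s, so partitioning by that sum alone would number them 1, 2, 3 instead of 1, 1, 2; keeping num and grpnum as separate partition keys avoids the clash.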
SQL Server provides the row_number() function:
select ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN FROM <TABLE_NAME>
EDIT:
select *,
       -- assumes every run of equal values is exactly 3 rows long
       case when (row_number() over (order by (select 1))) % 3 = 0 then 3
            else (row_number() over (order by (select 1))) % 3
       end [rn]
from <TABLE_NAME>
I think there is a problem with your sample: you have an implied order but not an explicit one. There is no guarantee that the database will store and return the values in the order you listed them, so there has to be some explicit ordering mechanism that tells the database to return them exactly that way.
For example, if you did this:
update test
set val = val + 2
where val < 3
You would find your select * no longer comes back the way you expected.
You indicated your actual table is huge, so I assume it has a column you can use for this: something that indicates the order you want, such as a timestamp or a surrogate key.
That said, assuming you have something like that and can leverage it, I believe a series of windowing functions would work.
with rowed as (
    select
        val,
        -- 1 marks the first row of a new run of equal values, 0 a continuation
        case
            when lag (val, 1, -1) over (order by 1) = val then 0  -- fix this once you have your order
            else 1
        end as idx,
        row_number() over (order by 1) as rn -- fix this once you have your order
    from
        test
),
partitioned as (
    select
        val, rn,
        -- running total of run starts = run (instance) number
        sum (idx) over (order by rn) as instance
    from rowed
)
select
    val, instance, count (1) over (partition by instance order by rn)
from
    partitioned
order by rn
This example orders by whatever order the rows happen to come back from the database; change the ORDER BY in lag() and row_number() to whatever your real ordering mechanism is.
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
2 2 3
3 3 1
3 3 2
3 3 3
1 4 1
1 4 2
1 4 3
2 5 1
2 5 2
2 5 3
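Here is the same lag-plus-running-sum approach as a runnable sketch, with a hypothetical id column supplying the explicit order the answer asks for. It uses Python's sqlite3 (SQLite 3.25+) in place of the asker's database, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer primary key, val int)")
vals = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2]
conn.executemany("insert into test (id, val) values (?, ?)", enumerate(vals, start=1))

query = """
with rowed as (
    select id, val,
           -- 1 marks the first row of a new run, 0 a continuation
           case when lag(val, 1, -1) over (order by id) = val then 0 else 1 end as idx
    from test
),
partitioned as (
    select id, val,
           sum(idx) over (order by id) as instance   -- running count of runs so far
    from rowed
)
select val, instance,
       count(*) over (partition by instance order by id) as occurrence
from partitioned
order by id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(*row)
```

Because every window uses the same id ordering, lag(), the running sum and the running count all agree on which row comes first.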

Updating Rows Into Different Columns

Both of these tables already exist, so I'm not looking for a dynamic solution. The goal is to consolidate the data rows horizontally, filling the leftmost available "Data" column first. There will never be a 4th entry.
I am using Microsoft SQL Server
Table1:
ID|Data
--------
A | 1
A | 2
B | 3
C | 4
C | 5
C | 6
Table2:
ID | Data 1 | Data 2 | Data 3
------------------------------
A | | |
B | | |
C | | |
Desired Result of Table2:
ID | Data 1 | Data 2 | Data 3
------------------------------
A | 1 | 2 |
B | 3 | |
C | 4 | 5 | 6
You can use row_number() with conditional aggregation:
select id,
max(case when rn = 1 then data end) as data_1,
max(case when rn = 2 then data end) as data_2,
max(case when rn = 3 then data end) as data_3
from (
select t.*,
row_number() over (
partition by id order by data
) as rn
from your_table t
) t
group by id;
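The answer produces the rows as a SELECT, while the question wants them written into the existing Table2. A minimal sketch of that last step, run through Python's sqlite3 (SQLite 3.25+ as a stand-in for SQL Server): the pivot query is saved as a view and copied over with correlated subqueries. The column names data_1 to data_3 and the view name pivoted are assumptions. In SQL Server you would more likely join the derived table in an UPDATE ... FROM instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1 (id text, data int);
insert into table1 values ('A', 1), ('A', 2), ('B', 3), ('C', 4), ('C', 5), ('C', 6);

create table table2 (id text primary key, data_1 int, data_2 int, data_3 int);
insert into table2 (id) values ('A'), ('B'), ('C');

-- the answer's pivot query, saved as a view so the update can reuse it
create view pivoted as
select id,
       max(case when rn = 1 then data end) as data_1,
       max(case when rn = 2 then data end) as data_2,
       max(case when rn = 3 then data end) as data_3
from (select t.*, row_number() over (partition by id order by data) as rn
      from table1 t) t
group by id;

-- copy each pivoted column into the matching Table2 row
update table2
set data_1 = (select p.data_1 from pivoted p where p.id = table2.id),
    data_2 = (select p.data_2 from pivoted p where p.id = table2.id),
    data_3 = (select p.data_3 from pivoted p where p.id = table2.id);
""")
table2 = conn.execute("select * from table2 order by id").fetchall()
for row in table2:
    print(row)
```

Ordering row_number() by data is what puts the smallest value in the leftmost column; order by a different column if "leftmost" should follow insertion order instead.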

Merge multiple rows in SQL with tie breaking on primary key

I have a table with data like the following:
key | A | B | C
---------------------------
1 | x | 0 | 1
2 | x | 2 | 0
3 | x | NULL | 4
4 | y | 7 | 1
5 | y | 3 | NULL
6 | z | NULL | 4
I want to merge the rows based on column A, using the largest primary key as the 'tie breaker' among values that are not NULL.
Result:
key | A | B | C
---------------------------
1 | x | 2 | 4
2 | y | 3 | 1
3 | z | NULL | 4
What would be the best way to achieve this assuming my data is actually 40 columns and 1 million rows with an unknown level of duplications?
Using ROW_NUMBER and conditional aggregation:
WITH cte AS(
SELECT *,
rnB = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN B IS NULL THEN 0 ELSE 1 END DESC, [key] DESC),
rnC = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN C IS NULL THEN 0 ELSE 1 END DESC, [key] DESC)
FROM tbl
)
SELECT
[key] = ROW_NUMBER() OVER(ORDER BY A),
A,
B = MAX(CASE WHEN rnB = 1 THEN B END),
C = MAX(CASE WHEN rnC = 1 THEN C END)
FROM cte
GROUP BY A
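A runnable sketch of the same query on the sample data, using Python's sqlite3 (SQLite 3.25+ standing in for SQL Server). The T-SQL alias form `rnB = ROW_NUMBER() ...` is rewritten with AS, and [key] is quoted as "key"; otherwise the logic is the answer's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table tbl ("key" integer primary key, a text, b int, c int)')
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [(1, "x", 0, 1), (2, "x", 2, 0), (3, "x", None, 4),
     (4, "y", 7, 1), (5, "y", 3, None), (6, "z", None, 4)],
)

query = """
with cte as (
    select *,
           -- non-NULL values first, then the largest key wins
           row_number() over (partition by a
                              order by case when b is null then 0 else 1 end desc, "key" desc) as rnb,
           row_number() over (partition by a
                              order by case when c is null then 0 else 1 end desc, "key" desc) as rnc
    from tbl
)
select row_number() over (order by a) as "key",
       a,
       max(case when rnb = 1 then b end) as b,
       max(case when rnc = 1 then c end) as c
from cte
group by a
order by a
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

Each column needs its own row_number() because the winning row can differ per column (for x, B comes from key 2 but C from key 3), so a 40-column table means 40 such window expressions.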

Select non distinct rows from two columns

My question is very similar to Multiple NOT distinct, except that it deals with multiple columns instead of one. I have a table like so:
A B C
1 1 0
1 2 1
2 1 2
2 1 3
2 2 4
2 3 5
2 3 6
3 1 7
3 3 8
3 1 9
And the result should be:
A B C
2 1 2
2 1 3
2 3 5
2 3 6
3 1 7
3 1 9
Essentially, as in that question, I want to remove all unique entries, except that uniqueness is determined by two columns instead of one. I already tried various tweaks to that question's answer but couldn't get any of them to work.
You are using SQL Server, so this is easier than in Access:
select A, B, C
from (select t.*, count(*) over (partition by A, B) as cnt
from t
) t
where cnt > 1;
Here count(*) is used as a window function: it counts the rows that share the same values of A and B. The final WHERE keeps only the rows whose (A, B) pair occurs more than once.
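A quick runnable check of the window-count query against the sample table, using Python's sqlite3 (SQLite 3.25+ as a stand-in for SQL Server). The table name t is the answer's; the ORDER BY is added only for a stable result.

```python
import sqlite3

rows = [(1, 1, 0), (1, 2, 1), (2, 1, 2), (2, 1, 3), (2, 2, 4),
        (2, 3, 5), (2, 3, 6), (3, 1, 7), (3, 3, 8), (3, 1, 9)]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int, c int)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select a, b, c
from (select t.*, count(*) over (partition by a, b) as cnt  -- rows sharing (a, b)
      from t) t
where cnt > 1
order by c
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

Unlike GROUP BY ... HAVING, the window count keeps every original row, so column C survives without a join back to the table.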
Another possible solution, using EXISTS:
SELECT a, b, c
FROM Table1 t
WHERE EXISTS
(
SELECT 1
FROM Table1
WHERE a = t.a
AND b = t.b
AND c <> t.c
)
It should be fast enough.
Output:
| A | B | C |
-------------
| 2 | 1 | 2 |
| 2 | 1 | 3 |
| 2 | 3 | 5 |
| 2 | 3 | 6 |
| 3 | 1 | 7 |
| 3 | 1 | 9 |
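The two answers agree on the sample data, but the EXISTS version only matches a row when another row with the same A and B has a different C. This sketch (Python's sqlite3, SQLite 3.25+ standing in for SQL Server) adds a hypothetical exact-duplicate row (4, 4, 10) twice, which is not in the question, to show where the two diverge.

```python
import sqlite3

rows = [(1, 1, 0), (1, 2, 1), (2, 1, 2), (2, 1, 3), (2, 2, 4),
        (2, 3, 5), (2, 3, 6), (3, 1, 7), (3, 3, 8), (3, 1, 9),
        (4, 4, 10), (4, 4, 10)]  # hypothetical exact duplicates, not in the question
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (a int, b int, c int)")
conn.executemany("insert into table1 values (?, ?, ?)", rows)

window_query = """
select a, b, c
from (select t.*, count(*) over (partition by a, b) as cnt from table1 t) t
where cnt > 1
order by a, b, c
"""
exists_query = """
select a, b, c
from table1 t
where exists (select 1 from table1
              where a = t.a and b = t.b and c <> t.c)
order by a, b, c
"""
by_window = conn.execute(window_query).fetchall()
by_exists = conn.execute(exists_query).fetchall()
print(set(by_window) - set(by_exists))  # rows only the window version returns
```

If C is guaranteed unique (as in the sample), both queries return the same rows; otherwise the window-count version is the safer choice.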