SQL revising table data to a more compact form - sql

I have a table with data pairs modeled like the following:
Id1 Id2
-----------
100 50
120 70
70 50
34 20
50 40
40 10
Id1 is always bigger than Id2. The pairs represent replacements to be made: 100 is replaced with 50, but 50 is in turn replaced with 40, which is then replaced with 10.
So the result would be like this:
Id1 Id2
-----------
100 10
120 10
34 20
Is there a nice succinct way that I can alter, or join this table to represent this?
I know I can join the table to itself, something akin to:
SELECT t1.Id1, t2.Id2
FROM mytable t1
JOIN mytable t2 ON t2.Id1 = t1.Id2
But each join only follows one step of a chain, so this would require several passes. That is why I ask: is there a nicer way to accomplish it?

declare @t table(Id1 int, Id2 int)
insert @t values (100, 50)
insert @t values (120, 70)
insert @t values (70, 50)
insert @t values (34, 20)
insert @t values (50, 40)
insert @t values (40, 10)
;with a as
(
-- find all rows without parent <*>
select id2, id1 from @t t where not exists (select 1 from @t where t.id1 = id2)
union all -- recursively work down to the lowest child while keeping the parent id1
select t.id2 , a.id1
from a
join @t t on a.id2 = t.id1
)
-- show the lowest child for each row found in <*>
select id1, min(id2) id2 from a
group by id1
Result:
id1 id2
----------- -----------
34 20
100 10
120 10
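You can try the recursive-CTE idea without a SQL Server instance. The sketch below replays the same data in SQLite through Python's built-in sqlite3 module, where the syntax is WITH RECURSIVE; the function name collapse_chains is only illustrative. Like the answer, it assumes Id1 > Id2 in every pair. That means chains cannot loop and MIN(id2) is the end of each chain.

```python
import sqlite3


def collapse_chains(pairs):
    """Follow each replacement chain Id1 -> Id2 -> ... to its final target.

    Sketch in SQLite; assumes Id1 > Id2 in every pair (no cycles),
    so the smallest id2 reached is the end of the chain.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", pairs)
    rows = con.execute("""
        WITH RECURSIVE a(id2, id1) AS (
            -- anchor: rows whose id1 never appears as anyone's id2
            SELECT t.id2, t.id1 FROM t
            WHERE NOT EXISTS (SELECT 1 FROM t AS p WHERE p.id2 = t.id1)
            UNION ALL
            -- step: follow the chain one link further, keeping the original id1
            SELECT t.id2, a.id1 FROM a JOIN t ON a.id2 = t.id1
        )
        SELECT id1, MIN(id2) FROM a GROUP BY id1 ORDER BY id1
    """).fetchall()
    con.close()
    return rows


pairs = [(100, 50), (120, 70), (70, 50), (34, 20), (50, 40), (40, 10)]
print(collapse_chains(pairs))  # [(34, 20), (100, 10), (120, 10)]
```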

Related

SELECT query based on mentioned result table - Oracle SQL

I have 3 tables.
TABLE1 with one column ROLL_NO
CREATE TABLE TABLE1 (ROLL_NO NUMBER(3) NOT NULL);
TABLE2 with 3 columns ROLL_NO, CLASS, SEC
CREATE TABLE TABLE2 (ROLL_NO NUMBER(3) NOT NULL, CLASS NUMBER(3) NOT NULL, SEC NUMBER(3) NOT NULL);
TABLE3 with 3 columns ROLL_NO, CODE, AMT
CREATE TABLE TABLE3 (ROLL_NO NUMBER(3) NOT NULL, CODE VARCHAR2(3) NOT NULL, AMT NUMBER(3) NOT NULL);
INSERT INTO TABLE1 VALUES (101);
INSERT INTO TABLE1 VALUES (102);
INSERT INTO TABLE1 VALUES (103);
INSERT INTO TABLE1 VALUES (104);
----------------------------------
INSERT INTO TABLE2 VALUES (101,1, 12);
INSERT INTO TABLE2 VALUES (102,1, 12);
INSERT INTO TABLE2 VALUES (103,1, 12);
INSERT INTO TABLE2 VALUES (104,1, 12);
--------------------------------------
INSERT INTO TABLE3 VALUES (101, 'A2', 100);
INSERT INTO TABLE3 VALUES (101, '10', 100);
INSERT INTO TABLE3 VALUES (102, 'B3', 200);
INSERT INTO TABLE3 VALUES (102, '10', 200);
INSERT INTO TABLE3 VALUES (103, '04', 300);
The SQL query is shown below:
SELECT T1.ROLL_NO,
T2.CLASS,
T2.SEC,
NVL(T3.CODE,0) AS CODE,
NVL(T3.AMT, 0) AS AMT
FROM TABLE1 T1
JOIN TABLE2 T2 ON T1.ROLL_NO = T2.ROLL_NO
LEFT JOIN TABLE3 T3 ON T1.ROLL_NO = T3.ROLL_NO
WHERE T1.ROLL_NO IN (101,102,103,104);
If no CODE and AMT record exists for a particular ROLL_NO, both default to 0.
The result for above query:
ROLL_NO CLASS SEC CODE AMT
-------------------------------------
101 1 12 A2 100
101 1 12 10 100
102 1 12 B3 200
102 1 12 10 200
103 1 12 4 300
104 1 12 0 0
I am looking for a query that does the following:
a) If a particular ROLL_NO has CODE 10 and also other CODE values, return only the row whose CODE is 10.
b) If a particular ROLL_NO doesn't have CODE 10 but has other CODE values (or none at all), return that row.
In the previous table, ROLL_NO 101 and 102 fall under case 'a', and 103 and 104 fall under case 'b'.
Final result should be
ROLL_NO CLASS SEC CODE AMT
-------------------------------------
101 1 12 10 100
102 1 12 10 200
103 1 12 4 300
104 1 12 0 0
I have not been able to write a query that produces the above result. So far, I tried using the RANK function, partitioning by ROLL_NO, ordering by CODE descending and selecting the first row in each partition, but it doesn't work when a particular ROLL_NO has an additional code greater than 10.
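You can use the analytic function ROW_NUMBER(). Order CODE '10' first with a CASE expression, because a plain descending sort on the VARCHAR2 column would rank 'A2' and 'B3' above '10':
select roll_no, class, sec, code, amt
from (SELECT T1.ROLL_NO,
T2.CLASS,
T2.SEC,
NVL(T3.CODE,0) AS CODE,
NVL(T3.AMT, 0) AS AMT,
--analytic function: CODE '10' first, then the remaining codes in descending order
row_number() over (partition by t1.roll_no order by case when T3.CODE = '10' then 0 else 1 end, NVL(T3.CODE,0) desc) as row_num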
FROM TABLE1 T1
JOIN TABLE2 T2 ON T1.ROLL_NO = T2.ROLL_NO
LEFT JOIN TABLE3 T3 ON T1.ROLL_NO = T3.ROLL_NO
WHERE T1.ROLL_NO IN (101,102,103,104)) t
where t.row_num = 1;
You simply need the ROW_NUMBER() function to achieve your desired result:
SELECT T1.ROLL_NO,
T2.CLASS,
T2.SEC,
NVL(T3.CODE,0) AS CODE,
NVL(T3.AMT, 0) AS AMT
FROM TABLE1 T1
JOIN TABLE2 T2 ON T1.ROLL_NO = T2.ROLL_NO
LEFT JOIN (SELECT ROLL_NO, CODE, AMT,
ROW_NUMBER() OVER(PARTITION BY ROLL_NO ORDER BY CASE WHEN CODE = '10' THEN 0 ELSE 1 END, CODE DESC) RN
-- a CASE expression instead of a plain ORDER BY CODE DESC, because codes such as 'A2' sort above '10'
FROM TABLE3) T3 ON T1.ROLL_NO = T3.ROLL_NO
AND RN = 1
WHERE T1.ROLL_NO IN (101,102,103,104);
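Both answers depend on the ORDER BY inside ROW_NUMBER(). The sketch below replays the question's data in SQLite via Python's sqlite3 module instead of Oracle. COALESCE stands in for NVL, and window functions need SQLite 3.25 or later. Ranking CODE '10' first with a CASE expression picks the intended row even though 'A2' and 'B3' sort after '10' as strings. When a roll number has several codes other than '10', this sketch simply keeps the alphabetically smallest one, because the question does not say which should win.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (roll_no INTEGER NOT NULL);
    CREATE TABLE table2 (roll_no INTEGER NOT NULL, class INTEGER NOT NULL, sec INTEGER NOT NULL);
    CREATE TABLE table3 (roll_no INTEGER NOT NULL, code TEXT NOT NULL, amt INTEGER NOT NULL);
    INSERT INTO table1 VALUES (101), (102), (103), (104);
    INSERT INTO table2 VALUES (101, 1, 12), (102, 1, 12), (103, 1, 12), (104, 1, 12);
    INSERT INTO table3 VALUES (101, 'A2', 100), (101, '10', 100),
                              (102, 'B3', 200), (102, '10', 200),
                              (103, '04', 300);
""")

rows = con.execute("""
    SELECT roll_no, class, sec, code, amt
    FROM (
        SELECT t1.roll_no, t2.class, t2.sec,
               COALESCE(t3.code, '0') AS code,   -- NVL(T3.CODE, 0) in Oracle
               COALESCE(t3.amt, 0) AS amt,
               ROW_NUMBER() OVER (
                   PARTITION BY t1.roll_no
                   -- CODE '10' ranks first; otherwise fall back to the smallest other code
                   ORDER BY CASE WHEN t3.code = '10' THEN 0 ELSE 1 END, t3.code
               ) AS rn
        FROM table1 t1
        JOIN table2 t2 ON t1.roll_no = t2.roll_no
        LEFT JOIN table3 t3 ON t1.roll_no = t3.roll_no
    )
    WHERE rn = 1
    ORDER BY roll_no
""").fetchall()

for row in rows:
    print(row)
```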

Avoid duplicates for joining two tables with one of them having GROUP BY on SQL server

On SQL server, I have a table
Id1, id2, id3, val1, val2
1 15 20 110 25.69
1 15 20 120 26.17
2 19 58 110 17.11
3 66 75 129 9.55
3 66 75 268 66.82
I need to find all rows whose Id1, id2, id3 combination has more than one distinct val1 value.
The expected output should be:
Id1, id2, id3, val1, val2
1 15 20 110 25.69
1 15 20 120 26.17
3 66 75 129 9.55
3 66 75 268 66.82
This is because, for the same Id1, id2, id3, they have more than one value in "val1".
I know how to do it like this:
SELECT a.Id1, a.id2, a.id3, a.val1, a.val2
FROM table1 AS a
INNER JOIN
(
SELECT Id1, id2, id3
FROM table1
GROUP BY Id1, id2, id3
HAVING COUNT(DISTINCT val1) > 1
) AS b
ON a.id1 = b.id1 and a.id2 = b.id2 and a.id3 = b.id3
But this may produce duplicated rows, because
Id1, id2, id3
1 15 20
can be joined to get
Id1, id2, id3, val1, val2
1 15 20 110 25.69
1 15 20 120 26.17
1 15 20 110 25.69
1 15 20 120 26.17
I do not want to use "distinct" for floating point numbers.
How to improve the query ?
Is it possible to do it without sub-query ?
You can use WHERE EXISTS:
DECLARE @tablename TABLE (Id1 int, id2 int, id3 int, val1 int, val2 decimal(16,2))
INSERT INTO @tablename VALUES
(1, 15, 20, 110, 25.69)
,(1, 15, 20, 120, 26.17)
,(2, 19, 58, 110, 17.11)
,(3, 66, 75, 129, 9.55)
,(3, 66, 75, 268, 66.82)
SELECT *
FROM @tablename T1
WHERE EXISTS (SELECT *
FROM @tablename T2
WHERE T2.Id1 = T1.Id1
AND T2.id2 = T1.id2
AND T2.id3 = T1.id3
AND T2.val1 <> T1.val1)
Produces output:
Id1 id2 id3 val1 val2
1 15 20 110 25.69
1 15 20 120 26.17
3 66 75 129 9.55
3 66 75 268 66.82
I would use window functions:
select Id1, id2, id3, val1, val2
from (select t1.*,
min(val1) over (partition by id1, id2, id3) as min1,
max(val1) over (partition by id1, id2, id3) as max1
from table1
) t1
where min1 <> max1;
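To check that neither approach needs DISTINCT on the floating-point val2 column, you can run both against the sample data. This sketch uses SQLite through Python's sqlite3 module rather than SQL Server, and window functions need SQLite 3.25+. The table name table1 comes from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id1 INT, id2 INT, id3 INT, val1 INT, val2 REAL)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", [
    (1, 15, 20, 110, 25.69),
    (1, 15, 20, 120, 26.17),
    (2, 19, 58, 110, 17.11),
    (3, 66, 75, 129, 9.55),
    (3, 66, 75, 268, 66.82),
])

# EXISTS: keep a row if another row in the same group has a different val1.
exists_rows = con.execute("""
    SELECT * FROM table1 AS a
    WHERE EXISTS (SELECT 1 FROM table1 AS b
                  WHERE b.id1 = a.id1 AND b.id2 = a.id2 AND b.id3 = a.id3
                    AND b.val1 <> a.val1)
    ORDER BY id1, val1
""").fetchall()

# Window functions: a group has 2+ distinct val1 values iff its MIN and MAX differ.
window_rows = con.execute("""
    SELECT id1, id2, id3, val1, val2
    FROM (SELECT *,
                 MIN(val1) OVER (PARTITION BY id1, id2, id3) AS min1,
                 MAX(val1) OVER (PARTITION BY id1, id2, id3) AS max1
          FROM table1)
    WHERE min1 <> max1
    ORDER BY id1, val1
""").fetchall()

print(exists_rows == window_rows)  # True
print(len(exists_rows))            # 4
```

Each base row is read once in both queries, so no row can be repeated.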
This will find all repetitions of Val1 and remove all but the first of them from the result:
select *
from (
select *
, row_number() over(partition by val1
order by (select null)) rn
from yourtable
) d
where rn = 1

Collect all Similar Persons to One Group

I have persons with several IDs.
Some of them are in column Id1 and some of them in Id2.
I want to collect all the IDs of the same person into one group.
If id1=10 is in the same row as id2=20, it means the person with id1=10 is the same person as id2=20.
The Input and Output example:
Input
Id1 Id2
--- ---
10 20
10 30
30 30
10 40
50 70
60 50
70 70
Output
NewId OldId
----- -----
1 10
1 20
1 30
1 40
2 50
2 60
2 70
For recursive tasks you should use a recursive CTE.
with cq as
(
select distinct Id2, Id1 from #Tmp -- get your table
union
select distinct Id1, Id2 from #Tmp -- get your table (or sort output)
union
select distinct Id1, Id1 from #Tmp -- add root from Id1
union
select distinct Id2, Id2 from #Tmp -- add root from Id2
), cte (Id1, Id2, lvl)
as (
select t.Id1, t.Id2, 0 lvl
from cq t
union all
select t2.Id2, c.Id1, lvl + 1 lvl
from cq t2, cte c
where t2.Id1 = c.Id2
and t2.Id1 != c.Id1
and c.lvl < 5 -- maximum level of recursion
)
select
Id1,
min(Id2) FirstId1,
dense_rank() over(order by min(Id2)) rn
from cte
group by Id1
The maximum lvl and the != condition aren't necessary if your table is well ordered.
I suspect this could be done with recursive CTEs, but here is a less elegant solution.
-- CREATE Temps
CREATE TABLE #Table (id1 INT, id2 INT)
CREATE TABLE #NewTable (NewID INT, OldID INT)
CREATE TABLE #AllIDs (ID INT)
-- Insert Test data
INSERT #Table
( id1, id2 )
VALUES ( 10, 20 ),
( 10, 30 ),
( 30, 20 ),
( 10, 40 ),
( 50, 70 ),
( 60, 50 ),
( 70, 70 ),
( 110, 120 ),
( 120, 130 ),
( 140, 130 )
-- Assemble all possible OldIDs
INSERT INTO #AllIDs
SELECT id1 FROM #Table
UNION
SELECT id2 FROM #Table
DECLARE @NewID INT = 1,
@RowCnt int
-- Insert seed OldID
INSERT #NewTable
SELECT TOP 1 @NewID, id
FROM #AllIDs
WHERE id NOT IN (SELECT OldID FROM #NewTable)
ORDER BY 2
SET @RowCnt = @@ROWCOUNT
WHILE @RowCnt > 0
BEGIN
WHILE @RowCnt > 0
BEGIN
-- Check for id2 that match current OldID
INSERT #NewTable
SELECT DISTINCT @NewID, id2
FROM #Table t
INNER JOIN #NewTable nt ON t.id1 = nt.OldID
WHERE nt.[NewID] = @NewID
AND t.id2 NOT IN (SELECT OldID FROM #NewTable WHERE [NewID] = @NewID)
SELECT @RowCnt = @@ROWCOUNT
-- Check for id1 that match current OldID
INSERT #NewTable
SELECT DISTINCT @NewID, id1
FROM #Table t
INNER JOIN #NewTable nt ON t.id2 = nt.OldID
WHERE nt.[NewID] = @NewID
AND t.id1 NOT IN (SELECT OldID FROM #NewTable WHERE [NewID] = @NewID)
SELECT @RowCnt = @RowCnt + @@ROWCOUNT
END
SET @NewID = @NewID + 1
-- Add another seed OldID if any left
INSERT #NewTable
SELECT TOP 1 @NewID, id
FROM #AllIDs
WHERE id NOT IN (SELECT OldID FROM #NewTable)
ORDER BY 2
SELECT @RowCnt = @@ROWCOUNT
END
-- Get Results
SELECT * FROM #NewTable ORDER BY [NewID], OldID
Anna, is that a good example?
This is a connected components issue.
The CTE version. Note that I have added a few more data points to simulate duplicates and lone Ids.
--create test data
declare @table table (Id1 int, Id2 int);
insert @table values
(10, 20),
(10, 30),
(30, 30),
(10, 40),
(40, 45),
(20, 40),
(50, 70),
(60, 50),
(70, 70),
(80, 80);
select *
from @table;
--join related IDs with recursive CTE
;with min_first_cte as (
select case when Id1 <= Id2 then Id1 else Id2 end Id1,
case when Id1 <= Id2 then Id2 else Id1 end Id2
from @table
), related_ids_cte as (
--anchor IDs
select distinct Id1 BaseId, Id1 ParentId, Id1 ChildId
from min_first_cte
where Id1 not in ( select Id2
from min_first_cte
where Id2 <> Id1)
union all
--related recursive IDs
select r.BaseId, m.Id1 ParentId, M.Id2 ChildId
from min_first_cte m
join related_ids_cte r
on r.ChildId = m.Id1
and m.Id1 <> m.Id2
), distinct_ids_cte as (
select distinct r.BaseId, r.ChildId
from related_ids_cte r
)
select dense_rank() over (order by d.BaseId) [NewId],
d.ChildId OldId
from distinct_ids_cte d
order by BaseId, ChildId;
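The core trick in these CTE answers is reachability over an undirected graph. Here is a compact alternative sketch in SQLite, driven from Python's sqlite3 module; it is not either author's query. SQL Server only allows UNION ALL between the anchor and the recursive member, which is why the T-SQL answers need a level cap or extra filters. SQLite also accepts UNION there, which discards rows already produced, so cycles such as 30-30 terminate without a cap. Each id is then labelled with the smallest id it can reach, and DENSE_RANK turns those labels into consecutive group numbers. The function name group_ids is hypothetical, and window functions need SQLite 3.25+.

```python
import sqlite3


def group_ids(pairs):
    """Return (new_id, old_id) rows, one group per connected set of ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pairs (id1 INT, id2 INT)")
    con.executemany("INSERT INTO pairs VALUES (?, ?)", pairs)
    rows = con.execute("""
        WITH RECURSIVE
        edges(a, b) AS (            -- treat each pair as an undirected edge
            SELECT id1, id2 FROM pairs
            UNION SELECT id2, id1 FROM pairs
        ),
        reach(root, node) AS (      -- every id reaches itself ...
            SELECT a, a FROM edges
            UNION                   -- ... UNION (not ALL) skips rows already seen, so cycles stop
            SELECT reach.root, edges.b
            FROM reach JOIN edges ON edges.a = reach.node
        ),
        comp(node, min_id) AS (     -- label each id by the smallest id that reaches it
            SELECT node, MIN(root) FROM reach GROUP BY node
        )
        SELECT DENSE_RANK() OVER (ORDER BY min_id) AS new_id, node AS old_id
        FROM comp
        ORDER BY new_id, old_id
    """).fetchall()
    con.close()
    return rows


question_pairs = [(10, 20), (10, 30), (30, 30), (10, 40), (50, 70), (60, 50), (70, 70)]
print(group_ids(question_pairs))
# [(1, 10), (1, 20), (1, 30), (1, 40), (2, 50), (2, 60), (2, 70)]
```

The reach CTE can grow quadratically within a large group, so this suits modest data sizes.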
Conceptually, it's about finding connected components given a list of connected pairs, and then assigning each group a new id. The following implementation works:
CREATE TABLE #pairs (a int, b int)
CREATE TABLE #groups (a int, group_id int)
INSERT INTO #pairs
VALUES (1, 2), (3, 4), (5, 6), (5, 7), (3, 9), (8, 10), (11, 12), (1, 3)
-- starting stage - all items belong to their own group
INSERT INTO #groups(a, group_id)
SELECT a, a
FROM #pairs
UNION
SELECT b, b
FROM #pairs
DECLARE @a INT
DECLARE @b INT
DECLARE @cGroup INT
SET ROWCOUNT 0
SELECT * INTO #mytemp FROM #pairs
SET ROWCOUNT 1
SELECT @a = a, @b = b FROM #mytemp
WHILE @@ROWCOUNT <> 0
BEGIN
SET ROWCOUNT 0
DECLARE @aGroup INT, @bGroup INT, @newGroup INT
SELECT @aGroup = group_id FROM #groups WHERE a = @a
SELECT @bGroup = group_id FROM #groups WHERE a = @b
SELECT @newGroup = MIN(group_id) FROM #groups WHERE a IN (@a, @b)
-- update the grouping table with the new group
UPDATE #groups
SET group_id = @newGroup
WHERE group_id IN (@aGroup, @bGroup)
DELETE FROM #mytemp
WHERE a = @a
AND b = @b
SET ROWCOUNT 1
SELECT @a = a, @b = b FROM #mytemp
END
SET ROWCOUNT 0
SELECT * FROM #groups
DROP TABLE #mytemp
DROP TABLE #pairs
DROP TABLE #groups
Here's the explanation:
initially, assign each number to a group equal to its own value
iterate over the pairs
for each pair
find the minimum of the two group ids and set it as the new group id
set the new group id on all the numbers whose current group id is the same as that of either number in the current pair
In terms of a procedure, these are two nested iterations (one over the pairs, and one over the grouping table in each UPDATE) that keep updating the group ids to the minimum in the group, so the cost is O(n²).
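The steps above translate almost line for line into plain Python. This is a sketch of the same relabelling idea, and the name merge_groups is hypothetical. It runs on the pairs from the T-SQL script. The inner loop plays the role of the UPDATE, which is where the quadratic cost comes from.

```python
def merge_groups(pairs):
    """Connected components by repeated relabelling (sketch of the loop above)."""
    group = {}
    for a, b in pairs:                # starting stage: every id is its own group
        group.setdefault(a, a)
        group.setdefault(b, b)
    for a, b in pairs:                # iterate over the pairs
        ga, gb = group[a], group[b]
        new = min(ga, gb)             # the smaller group id wins
        for item in group:            # like UPDATE ... WHERE group_id IN (@aGroup, @bGroup)
            if group[item] in (ga, gb):
                group[item] = new
    return group


pairs = [(1, 2), (3, 4), (5, 6), (5, 7), (3, 9), (8, 10), (11, 12), (1, 3)]
print(sorted(merge_groups(pairs).items()))
```

Because each group always carries the smallest id among its members, the final labels match the group_id values the T-SQL script leaves in #groups.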

Full Outer Join on Incomplete Data (by id variable)

I have two tables (see example data below). I need to keep all of the ID values in table 1 and merge table 1 with table 2 by sequence. The tricky part is that I also have to retain the field value1 from table 1 and value2 from table 2.
table 1:
ID sequence value1
-------------------------
p1 1 5
p1 2 10
p2 1 15
p2 2 20
table 2:
sequence value2
-------------------------
1 10
2 20
3 30
4 40
I need the resulting table to appear like so:
ID sequence value1 value2
----------------------------------
p1 1 5 10
p1 2 10 20
p1 3 - 30
p1 4 - 40
p2 1 15 10
p2 2 20 20
p2 3 - 30
p2 4 - 40
I have tried the following SQL code, but it doesn't merge the missing values of the value1 field from table 1 with the value2 field from table 2:
select t1.ID, t2.sequence, t1.value1, t2.value2 from
t2 full outer join t1 on t2.sequence=t1.sequence
Any assistance you can provide is greatly appreciated.
You can try something like this:
select coalesce(t1.[id], t3.[id]) as [id]
, t2.[sequence]
, t1.[value1]
, t2.[value2]
from [tbl2] t2
left join [tbl1] t1 on t1.[sequence] = t2.[sequence]
left join (select distinct [id] from [tbl1]) t3 on t1.[id] is null
One way with CROSS JOIN and OUTER APPLY:
DECLARE @t1 TABLE(ID CHAR(2), S INT, V1 INT)
DECLARE @t2 TABLE(S INT, V2 INT)
INSERT INTO @t1 VALUES
('p1', 1, 5),
('p1', 2, 10),
('p2', 1, 15),
('p2', 2, 20)
INSERT INTO @t2 VALUES
(1, 10),
(2, 20),
(3, 30),
(4, 40)
SELECT c.ID, t2.S, ca.V1, t2.V2 FROM @t2 t2
CROSS JOIN (SELECT DISTINCT ID FROM @t1) c
OUTER APPLY(SELECT * FROM @t1 t1 WHERE c.ID = t1.ID AND t1.S = t2.S) ca
ORDER BY c.ID, t2.S
Output:
ID S V1 V2
p1 1 5 10
p1 2 10 20
p1 3 NULL 30
p1 4 NULL 40
p2 1 15 10
p2 2 20 20
p2 3 NULL 30
p2 4 NULL 40
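The CROSS JOIN is the key step. It builds every (ID, sequence) combination first and then attaches value1 where it exists. SQLite has no OUTER APPLY, so this sketch (Python's sqlite3 module, with column names shortened to id and seq as an assumption) uses a LEFT JOIN on both ID and sequence. That is equivalent here because each (ID, sequence) pair occurs at most once in table 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id TEXT, seq INT, value1 INT);
    CREATE TABLE t2 (seq INT, value2 INT);
    INSERT INTO t1 VALUES ('p1', 1, 5), ('p1', 2, 10), ('p2', 1, 15), ('p2', 2, 20);
    INSERT INTO t2 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

rows = con.execute("""
    SELECT ids.id, t2.seq, t1.value1, t2.value2
    FROM (SELECT DISTINCT id FROM t1) AS ids      -- every ID ...
    CROSS JOIN t2                                 -- ... paired with every sequence
    LEFT JOIN t1 ON t1.id = ids.id AND t1.seq = t2.seq   -- value1 only where it exists
    ORDER BY ids.id, t2.seq
""").fetchall()

for row in rows:
    print(row)   # missing value1 entries come back as None
```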
Given this schema:
create table #table_1
(
ID varchar(8) not null ,
sequence int not null ,
value int not null ,
primary key clustered ( ID , sequence ) ,
unique nonclustered ( sequence , ID ) ,
)
create table #table_2
(
sequence int not null ,
value int not null ,
primary key clustered ( sequence ) ,
)
go
insert #table_1 values ( 'p1' , 1 , 5 )
insert #table_1 values ( 'p1' , 2 , 5 )
insert #table_1 values ( 'p2' , 1 , 15 )
insert #table_1 values ( 'p2' , 2 , 20 )
insert #table_2 values ( 1 , 10 )
insert #table_2 values ( 2 , 20 )
insert #table_2 values ( 3 , 30 )
insert #table_2 values ( 4 , 40 )
go
This should get you what you want:
select ID = map.ID ,
sequence = map.sequence ,
value1 = t1.value ,
value2 = t2.value
from ( select distinct
t1.ID ,
t2.sequence
from #table_1 t1
cross join #table_2 t2
) map
left join #table_1 t1 on t1.ID = map.ID
and t1.sequence = map.sequence
join #table_2 t2 on t2.sequence = map.sequence
order by map.ID ,
map.sequence
go
Producing:
ID sequence value1 value2
== ======== ====== ======
p1 1 5 10
p1 2 5 20
p1 3 - 30
p1 4 - 40
p2 1 15 10
p2 2 20 20
p2 3 - 30
p2 4 - 40

How to merge ranges from different tables

Given the following two tables:
T1
------------------
From | To | Value
------------------
10 | 20 | XXX
20 | 30 | YYY
30 | 40 | ZZZ
T2
------------------
From | To | Value
------------------
10 | 15 | AAA
15 | 19 | BBB
19 | 39 | CCC
39 | 40 | DDD
What is the best way to get the result below, using T-SQL on SQL Server 2008?
The From/To ranges are sequential (there are no gaps), and each From always has the same value as the previous To.
Desired result
-------------------------------
From | To | Value1 | Value2
-------------------------------
10 | 15 | XXX | AAA
15 | 19 | XXX | BBB
19 | 20 | XXX | CCC
20 | 30 | YYY | CCC
30 | 39 | ZZZ | CCC
39 | 40 | ZZZ | DDD
First I declare data that looks like the data you posted; please correct me if any of my assumptions are wrong. It would be better to post your own declaration in the question so we are all working with the same data.
CREATE TABLE #T1 (
[From] INT,
[To] INT,
[Value] CHAR(3)
);
INSERT INTO #T1 (
[From],
[To],
[Value]
)
VALUES
(10, 20, 'XXX'),
(20, 30, 'YYY'),
(30, 40, 'ZZZ');
CREATE TABLE #T2 (
[From] INT,
[To] INT,
[Value] CHAR(3)
);
INSERT INTO #T2 (
[From],
[To],
[Value]
)
VALUES
(10, 15, 'AAA'),
(15, 19, 'BBB'),
(19, 39, 'CCC'),
(39, 40, 'DDD');
Here is my select query to generate your expected result:
SELECT
CASE
WHEN [#T1].[From] > [#T2].[From]
THEN [#T1].[From]
ELSE [#T2].[From]
END AS [From],
CASE
WHEN [#T1].[To] < [#T2].[To]
THEN [#T1].[To]
ELSE [#T2].[To]
END AS [To],
[#T1].[Value],
[#T2].[Value]
FROM #T1
INNER JOIN #T2 ON
(
[#T1].[From] <= [#T2].[From] AND
[#T1].[To] > [#T2].[From]
) OR
(
[#T2].[From] <= [#T1].[From] AND
[#T2].[To] > [#T1].[From]
);
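The CASE expressions clip each overlapping pair to its intersection: the later of the two starts and the earlier of the two ends. The same logic in a few lines of Python reproduces the desired result. This sketch assumes half-open ranges [From, To) with From < To, and clip_overlaps is a hypothetical name.

```python
def clip_overlaps(t1_ranges, t2_ranges):
    """Pair up overlapping half-open ranges [start, end) and keep each intersection."""
    result = []
    for f1, e1, v1 in t1_ranges:
        for f2, e2, v2 in t2_ranges:
            start = max(f1, f2)   # the later of the two starts
            end = min(e1, e2)     # the earlier of the two ends
            if start < end:       # equivalent to the OR'ed join condition when From < To
                result.append((start, end, v1, v2))
    return sorted(result)


t1_ranges = [(10, 20, "XXX"), (20, 30, "YYY"), (30, 40, "ZZZ")]
t2_ranges = [(10, 15, "AAA"), (15, 19, "BBB"), (19, 39, "CCC"), (39, 40, "DDD")]
for row in clip_overlaps(t1_ranges, t2_ranges):
    print(row)
```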
Stealing @isme's data setup, I wrote the following:
;With EPs as (
select [From] as EP from #T1
union
select [To] from #T1
union
select [From] from #T2
union
select [To] from #T2
), OrderedEndpoints as (
select EP,ROW_NUMBER() OVER (ORDER BY EP) as rn from EPs
)
select
oe1.EP,
oe2.EP,
t1.Value,
t2.Value
from
OrderedEndpoints oe1
inner join
OrderedEndpoints oe2
on
oe1.rn = oe2.rn - 1
inner join
#T1 t1
on
oe1.EP < t1.[To] and
oe2.EP > t1.[From]
inner join
#T2 t2
on
oe1.EP < t2.[To] and
oe2.EP > t2.[From]
That is, you create a set containing all of the possible end points of periods (EPs), then you "sort" those and assign each one a row number (OrderedEndpoints).
Then the final query assembles each "adjacent" pair of rows together, and joins back to the original tables to find which rows from each one overlap the selected range.
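Here is the endpoint approach replayed in SQLite through Python's sqlite3 module. It is a sketch rather than the author's exact query: From and To are quoted because they are keywords, and ROW_NUMBER needs SQLite 3.25 or later. Because neither table has gaps, each elementary range between adjacent endpoints lies inside exactly one row of each table, so each adjacent pair matches one T1 row and one T2 row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 ("From" INT, "To" INT, value TEXT);
    CREATE TABLE t2 ("From" INT, "To" INT, value TEXT);
    INSERT INTO t1 VALUES (10, 20, 'XXX'), (20, 30, 'YYY'), (30, 40, 'ZZZ');
    INSERT INTO t2 VALUES (10, 15, 'AAA'), (15, 19, 'BBB'), (19, 39, 'CCC'), (39, 40, 'DDD');
""")

rows = con.execute("""
    WITH eps(ep) AS (                        -- every distinct endpoint of either table
        SELECT "From" FROM t1 UNION SELECT "To" FROM t1
        UNION SELECT "From" FROM t2 UNION SELECT "To" FROM t2
    ),
    ordered(ep, rn) AS (                     -- number the endpoints in ascending order
        SELECT ep, ROW_NUMBER() OVER (ORDER BY ep) FROM eps
    )
    SELECT o1.ep, o2.ep, a.value, b.value
    FROM ordered o1
    JOIN ordered o2 ON o1.rn = o2.rn - 1     -- adjacent endpoints form an elementary range
    JOIN t1 a ON o1.ep < a."To" AND o2.ep > a."From"
    JOIN t2 b ON o1.ep < b."To" AND o2.ep > b."From"
    ORDER BY o1.ep
""").fetchall()

for row in rows:
    print(row)
```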
The query below finds the smallest ranges, then picks the values back out of the tables:
SELECT ranges.[from], ranges.[to], T1.Value, T2.Value
FROM (SELECT all_from.[from], min(all_to.[to]) as [to]
FROM (SELECT T1.[From]
FROM T1
UNION
SELECT T2.[From]
FROM T2) all_from
JOIN (SELECT T1.[To]
FROM T1
UNION
SELECT T2.[To]
FROM T2) all_to ON all_from.[from] < all_to.[to]
GROUP BY all_from.[from]) ranges
JOIN T1 ON ranges.[from] >= T1.[From] AND ranges.[to] <= T1.[To]
JOIN T2 ON ranges.[from] >= T2.[From] AND ranges.[to] <= T2.[To]
ORDER BY ranges.[from]
Thanks for the answers, but I ended up using a CTE, which I think is cleaner.
DECLARE @T1 TABLE ([From] INT, [To] INT, [Value] CHAR(3));
DECLARE @T2 TABLE ([From] INT, [To] INT, [Value] CHAR(3));
INSERT INTO @T1 ( [From], [To], [Value]) VALUES (10, 20, 'XXX'), (20, 30, 'YYY'), (30, 40, 'ZZZ');
INSERT INTO @T2 ( [From], [To], [Value]) VALUES (10, 15, 'AAA'), (15, 19, 'BBB'), (19, 39, 'CCC'), (39, 40, 'DDD');
;with merged1 as
(
select
t1.[From] as from1,
t1.[to] as to1,
t1.Value as Value1,
t2.[From] as from2,
t2.[to] as to2,
t2.Value as Value2
from @T1 t1
inner join @T2 t2
on t1.[From] < t2.[To]
and t1.[To] > t2.[From] -- strict, so ranges that only touch at an endpoint are not paired
)
,merged2 as
(
select
case when from2>=from1 then from2 else from1 end as [From]
,case when to2<=to1 then to2 else to1 end as [To]
,value1
,value2
from merged1
)
select * from merged2