SQL Optimization - Match Analysis sorting matched pairs

Scenario:
A matching algorithm has identified that ID1 and ID2 match. I need to do further analysis on the matches, and for that I need to reduce the number of rows in the output and sort them correctly.
This input is just a small sample subset; having thousands of actual records makes the task difficult.
INPUT:
ID1 NAME1 ID2 NAME2
222 SIM 333 SIM
111 SAM 222 SIM
111 SAM 333 SIM
111 SAM 444 SOM
111 SAM 555 SAM
222 SIM 444 SOM
222 SIM 555 SAM
333 SIM 444 SOM
444 SOM 555 SAM
013 AAA 014 BBB
021 SUB 111 SAM
010 CCC 011 DDD
023 SOB 333 SIM
EXPECTED OUTPUT:
ID NAME
111 SAM
222 SIM
333 SIM
444 SOM
555 SAM
021 SUB
023 SOB
013 AAA
014 BBB
010 CCC
011 DDD
The ID column of the output should contain the distinct IDs from ID1 and ID2 combined, which is fine, as I can do that with DISTINCT and UNION.
The tricky part is sorting the output: rows that are linked by matches need to stay together.
Example:
111, 222, 333, 444, 555, 021 and 023 are linked to each other through matches between ID1 and ID2, so they have to be sorted together. Within this group the order doesn't matter; they just need to be adjacent. There could be many such groups.
For the rest, where there is only one pair, the two IDs just need to be sorted together, like 013,014 and 010,011, and so on.
Can anyone help me with this query?

If you know the patterns in advance, UNION plus ORDER BY CASE WHEN would work here:
select * from (
select ID1, NAME1 from tab1 union
select ID2, NAME2 from tab1 ) a
-- names starting with S sort first (as one block); everything else sorts by name
order by case when NAME1 like 'S%' then '' else NAME1 end;
To group based on ID1 and ID2 alone, I added a sorter column that concatenates the smaller and the larger ID of each row, using LEAST and GREATEST on MySQL:
-- min(sorter) keeps the query valid when ONLY_FULL_GROUP_BY is enabled
select min(sorter) sorter, ID1, NAME1 from (
select concat(least(ID1, ID2), greatest(ID1, ID2)) sorter, ID1, NAME1 from tab1 union
select concat(least(ID1, ID2), greatest(ID1, ID2)) sorter, ID2, NAME2 from tab1 ) a
group by ID1, NAME1
order by min(sorter)
or with case when on SQL Server:
select min(sorter) sorter, ID1, NAME1 from (
select concat(case when ID1<ID2 then ID1 else ID2 end, case when ID1>ID2 then ID1 else ID2 end) sorter, ID1, NAME1 from tab1 union
select concat(case when ID1<ID2 then ID1 else ID2 end, case when ID1>ID2 then ID1 else ID2 end) sorter, ID2, NAME2 from tab1 ) a
group by ID1, Name1
order by min(sorter)
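The sorter key above looks at one pair at a time, so two IDs that are linked only through other IDs (for example 021 and 023, which connect via 111 and 333) are not guaranteed to get the same key. A more robust key is the connected component of the match graph, i.e. the transitive closure of the matches. The sketch below is a swapped-in technique, not the answerer's method, and rests on a few assumptions: it runs on SQLite through Python's sqlite3 module so it is self-contained, it labels each group with its smallest ID, and the helper name grouped_ids is hypothetical. MySQL 8.0 also supports WITH RECURSIVE with UNION; SQL Server's recursive CTEs only allow UNION ALL, so there you would need explicit cycle handling.

```python
import sqlite3

# Sample rows from the question: (ID1, NAME1, ID2, NAME2).
SAMPLE = [
    ("222", "SIM", "333", "SIM"),
    ("111", "SAM", "222", "SIM"),
    ("111", "SAM", "333", "SIM"),
    ("111", "SAM", "444", "SOM"),
    ("111", "SAM", "555", "SAM"),
    ("222", "SIM", "444", "SOM"),
    ("222", "SIM", "555", "SAM"),
    ("333", "SIM", "444", "SOM"),
    ("444", "SOM", "555", "SAM"),
    ("013", "AAA", "014", "BBB"),
    ("021", "SUB", "111", "SAM"),
    ("010", "CCC", "011", "DDD"),
    ("023", "SOB", "333", "SIM"),
]

GROUPED_QUERY = """
WITH RECURSIVE
  -- every match in both directions, so the graph is undirected
  edges(a, b) AS (
    SELECT ID1, ID2 FROM tab1
    UNION
    SELECT ID2, ID1 FROM tab1
  ),
  -- all IDs reachable from each starting ID; UNION drops repeated rows, so cycles terminate
  reach(start, node) AS (
    SELECT a, a FROM edges
    UNION
    SELECT r.start, e.b FROM reach r JOIN edges e ON e.a = r.node
  ),
  -- distinct (ID, NAME) pairs from both sides, as in the question
  names(id, name) AS (
    SELECT ID1, NAME1 FROM tab1
    UNION
    SELECT ID2, NAME2 FROM tab1
  )
SELECT n.id, n.name, MIN(r.node) AS grp  -- smallest reachable ID labels the group
FROM names n
JOIN reach r ON r.start = n.id
GROUP BY n.id, n.name
ORDER BY grp, n.id
"""


def grouped_ids(rows):
    """Load rows into an in-memory tab1 and return (ID, NAME, group) sorted by group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab1 (ID1 TEXT, NAME1 TEXT, ID2 TEXT, NAME2 TEXT)")
        conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?, ?)", rows)
        return conn.execute(GROUPED_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in grouped_ids(SAMPLE):
        print(row)
```

On the sample, the groups come out ordered by their smallest ID (010, 013, 021), which differs from the question's expected order, but the question only requires each group to be contiguous. Note that reach holds one row per (ID, reachable ID) pair, so very large groups make it grow roughly quadratically.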

Related

TSQL: Find duplicate values based on two database values

I would like to find duplicate entries based on the same "Article" AND "Warehouse" columns. I can't find a solution for an MSSQL query that finds the differing "Value1" and "Value2" values in the following table:
Article Value1 Value2 Warehouse
123 123 01.01.2021 1
123 456 02.12.2022 1
123 789 05.05.2024 1
123 123 01.01.2021 2
123 123 01.01.2021 3
456 123 01.01.2021 1
456 123 01.01.2021 1
456 123 01.01.2021 1
The result should be:
Article Value1 Value2 Warehouse
123 123 01.01.2021 1
123 456 02.12.2022 1
123 789 05.05.2024 1
EDIT: Warehouse and Article vary. In the result I want to see the rows for an article and warehouse that have different entries in Value1 and Value2.
As you can see, article "123" in warehouse "1" has different entries in Value1 and Value2, so I'd like to get those rows in the result of the SQL query.
But article "456" has the SAME entries in Value1 and Value2 for warehouse 1, so I don't want those rows in the result.
Thank you very much for your help!
Use COUNT DISTINCT.
select *
from mytable t1
where exists
(
select null
from mytable t2
where t2.article = t1.article and t2.warehouse = t1.warehouse
having count(distinct value1) > 1 or count(distinct value2) > 1
)
order by article, warehouse, value1, value2;
(This would be more readable with an IN clause in my opinion, but SQL Server doesn't allow IN clauses on tuples like WHERE (article, warehouse) IN (...).)
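Where a tuple IN is not available, joining back to a grouped derived table expresses the same filter. The sketch below shows that variant (a swapped-in technique, not the answer above). It runs the query on SQLite through Python's sqlite3 only so that it is self-contained; the table name mytable follows the answer, value2 is stored as text here by assumption, and the helper name differing_rows is hypothetical. The SELECT itself should also be valid T-SQL.

```python
import sqlite3

# (article, value1, value2, warehouse) from the question
ROWS = [
    (123, 123, "01.01.2021", 1),
    (123, 456, "02.12.2022", 1),
    (123, 789, "05.05.2024", 1),
    (123, 123, "01.01.2021", 2),
    (123, 123, "01.01.2021", 3),
    (456, 123, "01.01.2021", 1),
    (456, 123, "01.01.2021", 1),
    (456, 123, "01.01.2021", 1),
]

DIFF_QUERY = """
SELECT t.article, t.value1, t.value2, t.warehouse
FROM mytable t
JOIN (
    -- article/warehouse groups with more than one distinct value1 or value2
    SELECT article, warehouse
    FROM mytable
    GROUP BY article, warehouse
    HAVING COUNT(DISTINCT value1) > 1 OR COUNT(DISTINCT value2) > 1
) d ON d.article = t.article AND d.warehouse = t.warehouse
ORDER BY t.article, t.warehouse, t.value1, t.value2
"""


def differing_rows(rows):
    """Return every row whose (article, warehouse) group has differing values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (article INTEGER, value1 INTEGER, "
            "value2 TEXT, warehouse INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(DIFF_QUERY).fetchall()
    finally:
        conn.close()
```

The grouped subquery is evaluated once rather than per outer row, which can be easier for the optimizer than the correlated EXISTS, though that depends on the engine and indexes.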
Use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.article = t.article and
t2.warehouse = t.warehouse and
(t2.value1 <> t.value1 or t2.value2 <> t.value2)
);
It is unclear from your question whether both values have to be different or either value. The above implements either value being different.
For performance, I would recommend an index on (article, warehouse, value1, value2).

identify NULL and update for same key column in oracle

I have a test table with the following data:
ID Key_COLUMN final_Value
1 aaa 1234
2 bbb 2345
3 bbb NULL
4 ccc 456
5 ccc 145
Desired Output:
--final_value updated from NULL to 2345 based on key_column (bbb)
ID Key_COLUMN final_Value
1 aaa 1234
2 bbb 2345
3 bbb 2345
4 ccc 456
5 ccc 145
I need to identify key_column values that have both NULL and non-NULL final_value rows, and update the NULLs with the non-NULL value.
This update is required on a huge amount of data.
Please assist.
You can use window functions:
select t.*,
coalesce(final_value, max(final_value) over (partition by key_column)) as imputed_final_value
from t;
If you wanted an update -- to actually change the data -- you can use a correlated subquery:
update t
set final_value = (select t2.final_value
from t t2
where t2.key_column = t.key_column and
t2.final_value is not null and
rownum = 1
)
where final_value is null;
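With rownum = 1 the subquery takes whichever non-NULL value is found first; using MAX() instead makes the choice deterministic when a key has several non-NULL values (MAX ignores NULLs). The sketch below runs that variant on SQLite through Python's sqlite3 so it is self-contained; the table name t follows the answer, the helper name fill_nulls is hypothetical, and the UPDATE is plain SQL that Oracle also accepts. On a very large table, an index on key_column may help the correlated subquery.

```python
import sqlite3

# (id, key_column, final_value) from the question; None stands for NULL
ROWS = [(1, "aaa", 1234), (2, "bbb", 2345), (3, "bbb", None), (4, "ccc", 456), (5, "ccc", 145)]

FILL_NULLS = """
UPDATE t
SET final_value = (SELECT MAX(t2.final_value)   -- MAX skips NULLs
                   FROM t t2
                   WHERE t2.key_column = t.key_column)
WHERE final_value IS NULL
"""


def fill_nulls(rows):
    """Apply the UPDATE to an in-memory copy of the table and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, key_column TEXT, final_value INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        conn.execute(FILL_NULLS)
        return conn.execute("SELECT id, key_column, final_value FROM t ORDER BY id").fetchall()
    finally:
        conn.close()
```

If every row for a key is NULL, MAX returns NULL and those rows simply stay NULL.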

Joining two tables on column A or column B in SQL

I have two tables called Plan and Actual. Every row in each table represents a unique item, and I need to find items that are in the Plan table, but not the Actual table, and vice versa.
There are three columns that uniquely identify each item, and the value for each of these columns may or may not be null.
For Example:
Say "Plan" looks like this:
ID_1 ID_2 ID_3
aaa Null Null
Null 111 Null
Null Null 123
bbb 222 Null
ccc Null 456
Null 333 789
ddd 444 202
Say "Actual" looks like this:
ID_1 ID_2 ID_3
aaa Null Null
Null 111 Null
bbb 222 Null
Null 333 789
Null 555 Null
eee Null 303
Using SQL, how can I identify the "In plan not in actual" rows of:
Null Null 123
ccc Null 456
ddd 444 202
And in "In actual not in plan" rows of:
Null 555 Null
eee Null 303
Thank you for your help!
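If you are using Oracle, you can use the MINUS operator. It selects rows in table "Plan" that are not in table "Actual". Set operators treat two NULLs in the same column as equal, so rows such as "aaa Null Null" are matched correctly.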
SELECT ID_1, ID_2, ID_3
FROM Plan
MINUS
SELECT ID_1, ID_2, ID_3
FROM Actual;
If you are using SQL Server, use EXCEPT instead of MINUS. For the "in actual not in plan" rows, swap the two SELECTs.
You can use a LEFT JOIN. Below is an example for MySQL. A plain = comparison is never true when either side is NULL, so the join uses MySQL's NULL-safe equality operator <=>, which treats two NULLs as equal.
In plan not in actual
SELECT Plan.* FROM Plan
LEFT JOIN Actual ON (Plan.ID_1 <=> Actual.ID_1 AND Plan.ID_2 <=> Actual.ID_2 AND Plan.ID_3 <=> Actual.ID_3)
WHERE Actual.ID_1 IS NULL AND Actual.ID_2 IS NULL AND Actual.ID_3 IS NULL;
In actual not in plan
SELECT Actual.* FROM Actual
LEFT JOIN Plan ON (Plan.ID_1 <=> Actual.ID_1 AND Plan.ID_2 <=> Actual.ID_2 AND Plan.ID_3 <=> Actual.ID_3)
WHERE Plan.ID_1 IS NULL AND Plan.ID_2 IS NULL AND Plan.ID_3 IS NULL;
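Both answers hinge on how NULLs compare: set operators such as EXCEPT/MINUS treat two NULLs as equal, while = never matches a NULL. The sketch below checks this on the sample data by loading both tables into SQLite through Python's sqlite3 (an assumption made so it runs anywhere) and running EXCEPT in both directions. SQLite spells the NULL-safe comparison IS rather than MySQL's <=>, so the LEFT JOIN variant uses that. The helper names load_tables and diff are hypothetical.

```python
import sqlite3

# (ID_1, ID_2, ID_3) from the question; None stands for NULL
PLAN = [("aaa", None, None), (None, 111, None), (None, None, 123), ("bbb", 222, None),
        ("ccc", None, 456), (None, 333, 789), ("ddd", 444, 202)]
ACTUAL = [("aaa", None, None), (None, 111, None), ("bbb", 222, None), (None, 333, 789),
          (None, 555, None), ("eee", None, 303)]

# Table names are quoted identifiers; only the fixed names below are substituted.
EXCEPT_QUERY = 'SELECT ID_1, ID_2, ID_3 FROM "{a}" EXCEPT SELECT ID_1, ID_2, ID_3 FROM "{b}"'

# LEFT JOIN variant: IS is SQLite's NULL-safe equality (MySQL: <=>)
JOIN_QUERY = """
SELECT p.ID_1, p.ID_2, p.ID_3
FROM "Plan" p
LEFT JOIN "Actual" a
  ON p.ID_1 IS a.ID_1 AND p.ID_2 IS a.ID_2 AND p.ID_3 IS a.ID_3
WHERE a.ID_1 IS NULL AND a.ID_2 IS NULL AND a.ID_3 IS NULL
"""


def load_tables():
    """Create in-memory "Plan" and "Actual" tables filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("Plan", PLAN), ("Actual", ACTUAL)):
        conn.execute(f'CREATE TABLE "{name}" (ID_1, ID_2, ID_3)')
        conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?, ?)', rows)
    return conn


def diff(conn, a, b):
    """Rows of table a that have no identical row (NULLs equal) in table b."""
    return set(conn.execute(EXCEPT_QUERY.format(a=a, b=b)).fetchall())
```

EXCEPT also removes duplicate rows from its result, which is harmless here because each row identifies a unique item.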

return columns vertically

Let's say I have a simple select query that returns the following:
ID Name1 Name2 Description1 Description2 Notes1 Notes2
1 A B AA BB AAA BBB
2 C D CC DD CCC DDD
and I want to return dataset as follows:
ID ColumnName 1st 2nd
1 Name A B
1 Description AA BB
1 Notes AAA BBB
2 Name C D
2 Description CC DD
2 Notes CCC DDD
Is there any way of doing that in SQL Server 2008 R2?
It looks like a job for PIVOT, but I'm confused about how to achieve this with PIVOT.
You can use this, assuming the values are static or not so numerous that patching it up with your actual values isn't too painful:
SELECT ID, 'Name' AS ColumnName, Name1 AS [1st], Name2 AS [2nd]
FROM YourTable
UNION ALL -- UNION ALL skips the needless duplicate-removal step of UNION
SELECT ID, 'Description' AS ColumnName, Description1 AS [1st], Description2 AS [2nd]
FROM YourTable
UNION ALL
SELECT ID, 'Notes' AS ColumnName, Notes1 AS [1st], Notes2 AS [2nd]
FROM YourTable
ORDER BY ID
Yet another great example of why data normalization is so important.
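The UNION query above reads YourTable three times. A single-pass alternative is to cross join each row with a small list of column names and pick the matching pair of source columns with CASE. The sketch below is a swapped-in technique, not the answer above; it runs on SQLite through Python's sqlite3 so it is self-contained, and the CTE name cols, its sort_order column and the helper name unpivot are hypothetical. The CTE is written with SELECT ... UNION ALL so the same SQL should also work in SQL Server 2008 R2, which additionally offers UNPIVOT and CROSS APPLY (VALUES ...) for this job.

```python
import sqlite3

# (ID, Name1, Name2, Description1, Description2, Notes1, Notes2) from the question
ROWS = [(1, "A", "B", "AA", "BB", "AAA", "BBB"),
        (2, "C", "D", "CC", "DD", "CCC", "DDD")]

UNPIVOT_QUERY = """
WITH cols(sort_order, ColumnName) AS (
    SELECT 1, 'Name' UNION ALL
    SELECT 2, 'Description' UNION ALL
    SELECT 3, 'Notes'
)
SELECT t.ID, c.ColumnName,
       -- pick the source column that matches the current ColumnName
       CASE c.ColumnName WHEN 'Name' THEN t.Name1
                         WHEN 'Description' THEN t.Description1
                         ELSE t.Notes1 END AS "1st",
       CASE c.ColumnName WHEN 'Name' THEN t.Name2
                         WHEN 'Description' THEN t.Description2
                         ELSE t.Notes2 END AS "2nd"
FROM YourTable t
CROSS JOIN cols c          -- one output row per (source row, column name)
ORDER BY t.ID, c.sort_order
"""


def unpivot(rows):
    """Load rows into an in-memory YourTable and return the vertical layout."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE YourTable (ID INTEGER, Name1 TEXT, Name2 TEXT, "
                     "Description1 TEXT, Description2 TEXT, Notes1 TEXT, Notes2 TEXT)")
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(UNPIVOT_QUERY).fetchall()
    finally:
        conn.close()
```

The sort_order column keeps Name, Description, Notes in the order the question asks for instead of alphabetical order.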

Problem with joining db tables

I have a problem when joining tables (LEFT JOIN):
table1:
id1 amt1
1 100
2 200
3 300
table2:
id2 amt2
1 150
2 250
2 350
My query:
select id1,amt1,id2,amt2 from table1
left join table2 on table2.id2=table1.id1
The output I get is:
id1 amt1 id2 amt2
row1: 1 100 1 150
row2: 2 200 2 250
row3: 2 200 2 350
I want row3 of the output to be
2 null 2 350
i.e. I want to avoid repeating the data (amt1).
This really is a formatting issue which is best handled by the client. For instance, in SQL*Plus we can use BREAK ....
SQL> select t1.*, t2.* from t1, t2
2 /
A B C D C1
--- --- --- --- ----------
aaa bbb ccc ddd 111
aaa bbb ccc ddd 222
SQL> break on a on b on c on d
SQL> select t1.*, t2.* from t1, t2
2 /
A B C D C1
--- --- --- --- ----------
aaa bbb ccc ddd 111
222
SQL>
Note: in the absence of any further information I opted for a Cartesian product.
Edit:
BREAK is a SQL*Plus command that suppresses repeated values in the listed columns. It only works in the SQL*Plus client and, as might be expected, it is covered in Oracle's SQL*Plus User Guide.
I used BREAK as an example of the proper way of doing things, because it is clean and correctly implements the separation of concerns. If you are using a different client, you would need to use its formatting capabilities. It is possible to tweak the SQL (see below), but that diminishes the utility of the query, because we cannot reuse it in other places that don't want the duplicated values suppressed.
Anyway, here is one solution which uses the ROW_NUMBER() analytic function in an inline view.
SQL> select * from t1
2 /
A B C D ID
--- --- --- --- ----------
eee fff ggg hhh 1
aaa bbb ccc ddd 2
SQL> select * from t2
2 /
C1 ID
---------- ----------
333 2
111 1
222 2
444 2
SQL> select t1_id
, case when rn = 1 then a else null end as a
, t2_id
, c1
from (
  select t1.id as t1_id
  , row_number () over (partition by t1.id order by t2.c1) as rn
  , t1.a
  , t2.c1
  , t2.id as t2_id
  from t1, t2
  where t1.id = t2.id
)
order by t1_id, rn
/
T1_ID A T2_ID C1
---------- --- ---------- ----------
1 eee 1 111
2 aaa 2 222
2 2 333
2 2 444
SQL>
I chose not to use LAG(), because that only works with fixed offsets, and it seemed likely that the number of rows in T2 would be variable.
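The ROW_NUMBER() approach above can be applied to the question's own table1/table2 data with the corrected join condition. The sketch below runs it on SQLite through Python's sqlite3 so it is self-contained (window functions need SQLite 3.25 or later); which table2 row keeps amt1 depends on the ORDER BY inside OVER, chosen here as amt2 by assumption, and the helper name suppress_repeats is hypothetical. Because it is a LEFT JOIN, id1 = 3 also appears, with NULLs for the table2 columns.

```python
import sqlite3

TABLE1 = [(1, 100), (2, 200), (3, 300)]   # (id1, amt1)
TABLE2 = [(1, 150), (2, 250), (2, 350)]   # (id2, amt2)

SUPPRESS_QUERY = """
SELECT id1,
       CASE WHEN rn = 1 THEN amt1 END AS amt1,  -- keep amt1 only on the first row per id1
       id2,
       amt2
FROM (
    SELECT t1.id1, t1.amt1, t2.id2, t2.amt2,
           ROW_NUMBER() OVER (PARTITION BY t1.id1 ORDER BY t2.amt2) AS rn
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id2 = t1.id1
) x
ORDER BY id1, rn
"""


def suppress_repeats(table1, table2):
    """Return the joined rows with repeated amt1 values replaced by NULL (None)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (id1 INTEGER, amt1 INTEGER)")
        conn.execute("CREATE TABLE table2 (id2 INTEGER, amt2 INTEGER)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1)
        conn.executemany("INSERT INTO table2 VALUES (?, ?)", table2)
        return conn.execute(SUPPRESS_QUERY).fetchall()
    finally:
        conn.close()
```

As the answer points out, this bakes presentation into the query, so a reporting client's own "suppress duplicates" option is usually the better home for it.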
Assuming you want all the data in a single column, you can just do a UNION select...
Select fieldA from tableA
Union
Select fieldB from TableB
Note that both columns need the same (or a compatible) datatype, so you may need to cast one of them.
If you need a different answer, please format the expected result better ;)
Ok...
You have corrected the formatting...
In the case above I would simply return two cursors from my query. The example data provides no field to link the two tables together, so I see no way to join them in a reasonable manner. A stored procedure can return several result sets.
You've done the cartesian product of the two tables since you haven't specified any join criteria. In order to eliminate duplicates, you need to specify how you want the tables to join.
For example, you could try
select * from table1, table2 where table2.val = 111;
Your example doesn't have any join key, so there's no obvious value to join the tables on. But in a more typical example, there would be a related value in both tables so that you could join them together in a meaningful way.
You seem to be doing a cross join here. I suspect you wanted either an equi join or a left outer join.