Compare whether groups have the same binding keys - SQL

I have sample data in two different tables, shown below. I need to write SQL that checks whether the same source keys are bound together in both tables. The binding key values themselves differ between the two tables.
In the example below, table 1 has three binding keys: 1, 2 and 3. Binding key 1 has three source keys attached to it: ABC, XYZ and QBC. Binding keys 2 and 3 each have two source keys attached.
In table 2, binding key 99 has the same three source keys as table 1's binding key 1 (both the count and the keys are identical). Binding key 78 has the same count as table 1's binding key 2, but its source keys are different. Binding keys 64 and 65 each have one source key.
Table 1:
==============================
Binding Key|source Key
1|ABC
1|XYZ
1|QBC
2|xxx
2|yyy
3|uuu
3|ddd
Table 2:
==========================
Binding Key|source Key
99|XYZ
99|QBC
99|ABC
78|xxx
78|QQQ
64|uuu
65|ddd
The expected output lists the source keys of every group that has no exact match in the other table, either in count or in its source key members.
Expected Output:
===========================
xxx
yyy
uuu
ddd
QQQ
Many Thanks!!

I found a solution. It uses the listagg function to concatenate the strings within each group and then compares the resulting lists. Example SQL is shown below.
-- order members by elmnt so that identical groups produce identical lists
SELECT * FROM (SELECT grp, LISTAGG(elmnt, ',') WITHIN GROUP (ORDER BY elmnt) AS list
               FROM table1 GROUP BY grp) t1
WHERE t1.list NOT IN (SELECT LISTAGG(elmnt, ',') WITHIN GROUP (ORDER BY elmnt)
                      FROM table2 GROUP BY grp)
UNION
SELECT * FROM (SELECT grp, LISTAGG(elmnt, ',') WITHIN GROUP (ORDER BY elmnt) AS list
               FROM table2 GROUP BY grp) t2
WHERE t2.list NOT IN (SELECT LISTAGG(elmnt, ',') WITHIN GROUP (ORDER BY elmnt)
                      FROM table1 GROUP BY grp);
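The LISTAGG approach works because it reduces each group to a canonical signature: its members in a fixed order. Two groups then match exactly when their signatures are equal. Below is a minimal Python sketch of the same idea. It assumes the rows are already loaded as (grp, elmnt) pairs and that source keys are unique within a group. `group_signatures` and `unmatched_groups` are hypothetical helper names.

```python
from collections import defaultdict

def group_signatures(rows):
    """Map each group key to a sorted tuple of its members (like LISTAGG ... ORDER BY elmnt)."""
    groups = defaultdict(list)
    for grp, elmnt in rows:
        groups[grp].append(elmnt)
    return {grp: tuple(sorted(members)) for grp, members in groups.items()}

def unmatched_groups(rows_a, rows_b):
    """Return (table, grp, signature) for groups whose signature has no exact match in the other table."""
    sig_a = group_signatures(rows_a)
    sig_b = group_signatures(rows_b)
    values_a, values_b = set(sig_a.values()), set(sig_b.values())
    result = [("table1", g, s) for g, s in sig_a.items() if s not in values_b]
    result += [("table2", g, s) for g, s in sig_b.items() if s not in values_a]
    return result

table1 = [(1, "ABC"), (1, "XYZ"), (1, "QBC"), (2, "xxx"), (2, "yyy"), (3, "uuu"), (3, "ddd")]
table2 = [(99, "XYZ"), (99, "QBC"), (99, "ABC"), (78, "xxx"), (78, "QQQ"), (64, "uuu"), (65, "ddd")]

for row in unmatched_groups(table1, table2):
    print(row)
```

Sorting the members before building the signature plays the same role as ORDER BY elmnt inside WITHIN GROUP. Without a fixed order, two identical groups could produce different strings and fail to match.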

This query gives the desired output:
with t1 as ( select row_number() over (partition by grp order by elmnt) pos,
grp, elmnt from table1 ),
t2 as ( select row_number() over (partition by grp order by elmnt) pos,
grp, elmnt from table2 ),
tx1 as (select pos, grp grp1, elmnt,
listagg(elmnt, ',') within group (order by pos)
over (partition by grp) list
from t1),
tx2 as (select pos, grp grp2, elmnt,
listagg(elmnt, ',') within group (order by pos)
over (partition by grp) list
from t2)
select distinct elmnt
from (select * from tx1 full join tx2 using (list, elmnt))
where grp1 is null or grp2 is null;
To show the lists instead, just replace distinct elmnt with distinct list. My query differs from your answer in two ways. It uses the analytic version of listagg. It also uses a filtered full join instead of a UNION combined with two NOT IN clauses.
The first two subqueries (t1 and t2) only add the pos column, which you did not include in the original question ;-) This can probably also be done with the MINUS operator.
Test data and output:
create table table1 (grp number(3), elmnt varchar2(5));
insert into table1 values (1, 'ABC');
insert into table1 values (1, 'XYZ');
insert into table1 values (1, 'QBC');
insert into table1 values (2, 'xxx');
insert into table1 values (2, 'yyy');
insert into table1 values (3, 'uuu');
insert into table1 values (3, 'ddd');
create table table2 (grp number(3), elmnt varchar2(5));
insert into table2 values (99, 'XYZ');
insert into table2 values (99, 'QBC');
insert into table2 values (99, 'ABC');
insert into table2 values (78, 'xxx');
insert into table2 values (78, 'QQQ');
insert into table2 values (64, 'uuu');
insert into table2 values (65, 'ddd');
ELMNT
-----
uuu
QQQ
yyy
ddd
xxx
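For databases without LISTAGG, the comparison can be written without string concatenation. Two groups are equal when the number of source keys they share equals the size of each group. Below is a minimal sketch using Python's built-in sqlite3 module and the test data above. It assumes source keys are unique within each group. `mismatched_keys` is a hypothetical helper name.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (grp INTEGER, elmnt TEXT);
CREATE TABLE table2 (grp INTEGER, elmnt TEXT);
INSERT INTO table1 VALUES (1,'ABC'),(1,'XYZ'),(1,'QBC'),(2,'xxx'),(2,'yyy'),(3,'uuu'),(3,'ddd');
INSERT INTO table2 VALUES (99,'XYZ'),(99,'QBC'),(99,'ABC'),(78,'xxx'),(78,'QQQ'),(64,'uuu'),(65,'ddd');
"""

QUERY = """
WITH c1 AS (SELECT grp, COUNT(*) AS n FROM table1 GROUP BY grp),
     c2 AS (SELECT grp, COUNT(*) AS n FROM table2 GROUP BY grp),
     -- number of source keys each (table1 group, table2 group) pair has in common
     shared_counts AS (SELECT a.grp AS g1, b.grp AS g2, COUNT(*) AS n
                       FROM table1 a JOIN table2 b ON a.elmnt = b.elmnt
                       GROUP BY a.grp, b.grp),
     -- a pair matches when the shared count equals both group sizes
     matching_pairs AS (SELECT s.g1, s.g2
                        FROM shared_counts s
                        JOIN c1 ON c1.grp = s.g1
                        JOIN c2 ON c2.grp = s.g2
                        WHERE s.n = c1.n AND s.n = c2.n)
SELECT elmnt FROM table1 WHERE grp NOT IN (SELECT g1 FROM matching_pairs)
UNION
SELECT elmnt FROM table2 WHERE grp NOT IN (SELECT g2 FROM matching_pairs)
"""

def mismatched_keys(conn):
    """Source keys belonging to any group without an exact counterpart in the other table."""
    return sorted(row[0] for row in conn.execute(QUERY))

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(mismatched_keys(conn))  # → ['QQQ', 'ddd', 'uuu', 'xxx', 'yyy']
```

The result contains the same five source keys as the expected output. Python's sort places the uppercase QQQ first.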

Related

Query to Find maximum possible combinations between two columns

The goal is to create all possible combinations of the two columns. Every combination must include each article of the first column ('100','101','102','103').
Sample Code
create table basis
(article Integer,
supplier VarChar(10) );
Insert into basis Values (100, 'A');
Insert into basis Values (101, 'A');
Insert into basis Values (101, 'B');
Insert into basis Values (101, 'C');
Insert into basis Values (102, 'D');
Insert into basis Values (103, 'B');
Result set
combination_nr;article;supplier
1;100;'A'
1;101;'A'
1;102;'D'
1;103;'B'
2;100;'A'
2;101;'B'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'C'
3;102;'D'
3;103;'B'
Suppose we add one more row for article 102 with supplier 'A'. The result set then becomes the following.
According to the calculations below, there are now 24 result rows (6 combinations of 4 rows each).
1;100;'A'
1;101;'A'
1;102;'A'
1;103;'B'
2;100;'A'
2;101;'A'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'B'
3;102;'A'
3;103;'B'
4;100;'A'
4;101;'B'
4;102;'D'
4;103;'B'
5;100;'A'
5;101;'C'
5;102;'A'
5;103;'B'
6;100;'A'
6;101;'C'
6;102;'D'
6;103;'B'
Already tried code
I have tried different cross joins, but they always return more rows than my expected result set.
SELECT article, supplier
FROM (SELECT DISTINCT supplier FROM basis2) AS t1
CROSS JOIN (SELECT DISTINCT article FROM basis2) AS t2;
Calculations:
article 100: 1 supplier ('A')
article 101: 3 suppliers ('A','B','C')
article 102: 1 supplier ('D')
article 103: 1 supplier ('B')
unique articles: 4 (100,101,102,103)
1 x 3 x 1 x 1 = 3 combinations, x 4 articles = 12 (combination rows)
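The calculation above is the size of a Cartesian product taken per article, with one supplier chosen for each article. It is not a cross join of all suppliers with all articles, which is why the plain CROSS JOIN returns too many rows. Below is a minimal Python sketch of that counting, using itertools.product on the sample rows. `combinations_by_article` is a hypothetical helper name.

```python
from itertools import groupby, product

def combinations_by_article(rows):
    """Return one tuple of (article, supplier) pairs per combination, choosing one supplier per article."""
    rows = sorted(rows)
    # one list of candidate (article, supplier) pairs per article
    per_article = [[(article, supplier) for _, supplier in group]
                   for article, group in groupby(rows, key=lambda r: r[0])]
    return list(product(*per_article))

basis = [(100, "A"), (101, "A"), (101, "B"), (101, "C"), (102, "D"), (103, "B")]
combos = combinations_by_article(basis)
print(len(combos), sum(len(c) for c in combos))  # → 3 12

combos_extra = combinations_by_article(basis + [(102, "A")])
print(len(combos_extra), sum(len(c) for c in combos_extra))  # → 6 24
```

Adding (102, 'A') doubles the choices for article 102, giving 1 x 3 x 2 x 1 = 6 combinations and 24 rows.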
You can do what you want using a recursive CTE. It is easier to put the combinations in single rows rather than across multiple rows:
with b as (
select b.*, dense_rank() over (order by article) as seqnum
from basis b
),
cte as (
select convert(varchar(max), concat(article, ':', supplier)) as suppliers, seqnum
from b
where seqnum = 1
union all
select concat(cte.suppliers, ',', concat(article, ':', supplier)), b.seqnum
from cte join
b
on b.seqnum = cte.seqnum + 1
)
select row_number() over (order by suppliers), suppliers
from (select cte.*, max(seqnum) over () as max_seqnum
from cte
) cte
where seqnum = max_seqnum;
For your particular result set, you can unroll the string:
with b as (
select b.*, dense_rank() over (order by article) as seqnum
from basis b
),
cte as (
select convert(varchar(max), concat(article, ':', supplier)) as suppliers, seqnum
from b
where seqnum = 1
union all
select concat(cte.suppliers, ',', concat(article, ':', supplier)), b.seqnum
from cte join
b
on b.seqnum = cte.seqnum + 1
)
select seqnum,
left(s.value, charindex(':', s.value) - 1) as article,
stuff(s.value, 1, charindex(':', s.value), '') as supplier
from (select row_number() over (order by suppliers) as seqnum, suppliers
from (select cte.*, max(seqnum) over () as max_seqnum
from cte
) cte
where seqnum = max_seqnum
) cte cross apply
string_split(suppliers, ',') s;
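The recursive CTE can also be tried outside SQL Server. Below is a minimal sketch using Python's sqlite3 module, with three substitutions:

- SQLite's || operator replaces CONCAT.
- A correlated COUNT(DISTINCT ...) replaces DENSE_RANK(), so the sketch does not depend on window-function support.
- Python does the splitting that STRING_SPLIT does in the T-SQL version.

`combination_rows` is a hypothetical helper name.

```python
import sqlite3

QUERY = """
WITH RECURSIVE
  b AS (  -- number the articles 1, 2, 3, ... (same role as dense_rank() over (order by article))
    SELECT article, supplier,
           (SELECT COUNT(DISTINCT b2.article) FROM basis b2 WHERE b2.article <= b1.article) AS seqnum
    FROM basis b1
  ),
  cte(suppliers, seqnum) AS (
    SELECT article || ':' || supplier, seqnum FROM b WHERE seqnum = 1
    UNION ALL
    -- extend each partial combination with every supplier of the next article
    SELECT cte.suppliers || ',' || b.article || ':' || b.supplier, b.seqnum
    FROM cte JOIN b ON b.seqnum = cte.seqnum + 1
  )
SELECT suppliers FROM cte
WHERE seqnum = (SELECT MAX(seqnum) FROM b)
ORDER BY suppliers
"""

def combination_rows(conn):
    """Unroll each combination string into (combination_nr, article, supplier) rows."""
    rows = []
    for nr, (suppliers,) in enumerate(conn.execute(QUERY), start=1):
        for pair in suppliers.split(","):
            article, supplier = pair.split(":")
            rows.append((nr, int(article), supplier))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basis (article INTEGER, supplier TEXT)")
conn.executemany("INSERT INTO basis VALUES (?, ?)",
                 [(100, "A"), (101, "A"), (101, "B"), (101, "C"), (102, "D"), (103, "B")])
for row in combination_rows(conn):
    print(row)
```

Only complete combinations (those that reached the last article) are kept. Ordering by the combination string numbers the combinations in the same order as the expected result set.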

Finding a random sample of unique data across multiple columns - SQL Server

Given a set of data in a SQL Server database with the following columns
AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2
The three UserID columns all reference the primary key of the same users table. I need to find a random sample that includes every UserID appearing in any of the three columns, regardless of position, while guaranteeing the fewest possible unique AccountIDs.
--SET UP TEST DATA
CREATE TABLE MY_TABLE
(
AccountID int,
UserID_Salesperson int,
UserID_Servicer1 int,
UserID_Servicer2 int
)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12345, 1, 1, 2)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12346, 3, 2, 1)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12347, 4, 3, 1)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12348, 1, 2, 3)
--VIEW THE NEW TABLE
SELECT * FROM MY_TABLE
--NORMALIZE DATA (Unique List of UserID's)
SELECT DISTINCT MyDistinctUserIDList
FROM
(SELECT UserID_Salesperson as MyDistinctUserIDList, 'Sales' as Position
FROM MY_TABLE
UNION
SELECT UserID_Servicer1, 'Service1' as Position
FROM MY_TABLE
UNION
SELECT UserID_Servicer2, 'Service2' as Position
FROM MY_TABLE) MyDerivedTable
--NORMALIZED DATA
SELECT *
FROM
(SELECT AccountID, UserID_Salesperson as MyDistinctUserIDList, 'Sales' as Position
FROM MY_TABLE
UNION
SELECT AccountID, UserID_Servicer1, 'Service1' as Position
FROM MY_TABLE
UNION
SELECT AccountID, UserID_Servicer2, 'Service2' as Position
FROM MY_TABLE) MyDerivedTable
DROP TABLE MY_TABLE
For this example table, I could select AccountIDs (12347 and 12348) OR (12347 and 12346) to get the fewest accounts that cover all users.
My current solution is inefficient and can make mistakes. I select a random AccountID and insert its data into a temp table. I then try to find a next record whose users are not already in the temp table, looping through the records until I find one not used before. After a few thousand loops it gives up and selects any record.
I don't know how you guarantee the fewest account ids, but you can get one row per user id using:
select t.*
from (select t.*,
row_number() over (partition by UserId order by newid()) as seqnum
from my_table t cross apply
(values (t.UserID_Salesperson), (t.UserID_Servicer1), (t.UserID_Servicer2)
) v(UserID)
) t
where seqnum = 1;
Your original table doesn't have a primary key. Assuming that there is one row per account, you can dedup this so it doesn't have duplicate accounts:
select top (1) with ties t.*
from (select t.*,
row_number() over (partition by UserId order by newid()) as seqnum
from my_table t cross apply
(values (t.UserID_Salesperson), (t.UserID_Servicer1), (t.UserID_Servicer2)
) v(UserID)
) t
where seqnum = 1
order by row_number() over (partition by accountID order by accountID);
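Choosing the fewest accounts that together cover every user is an instance of the set cover problem, which is NP-hard. No simple query guarantees the minimum on large data. A common compromise is the greedy heuristic: repeatedly pick the account that covers the most still-uncovered users.

The sketch below loads the rows through Python's sqlite3 module. `greedy_cover` is a hypothetical helper, and its result is small but not guaranteed to be minimal. Passing a seeded random.Random breaks ties randomly, which gives the random sampling the question asks for.

```python
from __future__ import annotations

import random
import sqlite3

def greedy_cover(accounts: dict[int, set[int]], rng: random.Random | None = None) -> list[int]:
    """Greedy set cover: pick accounts until every user ID is covered."""
    uncovered = set().union(*accounts.values())
    chosen: list[int] = []
    while uncovered:
        # how many still-uncovered users each account would add
        gains = {acct: len(users & uncovered) for acct, users in accounts.items()}
        best = max(gains.values())
        candidates = sorted(acct for acct, gain in gains.items() if gain == best)
        # random tie-break if an rng is given, otherwise the lowest AccountID
        pick = rng.choice(candidates) if rng is not None else candidates[0]
        chosen.append(pick)
        uncovered -= accounts[pick]
    return chosen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (AccountID INTEGER, UserID_Salesperson INTEGER, "
             "UserID_Servicer1 INTEGER, UserID_Servicer2 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)",
                 [(12345, 1, 1, 2), (12346, 3, 2, 1), (12347, 4, 3, 1), (12348, 1, 2, 3)])

# AccountID -> set of all user IDs on that account, regardless of position
accounts: dict[int, set[int]] = {}
for account_id, *user_ids in conn.execute("SELECT * FROM my_table"):
    accounts.setdefault(account_id, set()).update(user_ids)

print(greedy_cover(accounts))                     # → [12346, 12347]
print(greedy_cover(accounts, random.Random(42)))  # random tie-break
```

On the sample data, greedy always finds a two-account cover, which matches the minimum the question identifies by hand.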

Counting repeated data

I'm trying to get the most frequently repeated value in a table. I tried many ways but could not make it work. The result I'm looking for is:
"james";"108"
This is because 108, the concatenation of the two fields loca and locb, is repeated twice, while the other values are not.
The query I tried is:
select * from (
select name,CONCAT(loca,locb),loca,locb
, row_number() over (partition by CONCAT(loca,locb) order by CONCAT(loca,locb) ) as att
from Table1
) tt
where att=1
Edit: adding the complete table structure and data:
CREATE TABLE Table1
(name varchar(50),loca int,locb int)
;
insert into Table1 values ('james',100,2);
insert into Table1 values ('james',100,3);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,7);
insert into Table1 values ('james',10,6);
insert into Table1 values ('james',0,7);
insert into Table1 values ('james',10,0);
insert into Table1 (name, loca) values ('james',10);
insert into Table1 (name, loca) values ('james',10);
What I'm looking for is (james,108), because that value is repeated twice in the entire data. There is also a repetition of (james,10), but those rows have a null value in locb. Zero and null values are to be ignored: only rows that have a value in both loca and locb should be considered.
select distinct on (name) *
from (
select name, loca, locb, count(*) as total
from Table1
where loca is not null and locb is not null
group by 1,2,3
) s
order by name, total desc
WITH concat AS (
-- get concat values
SELECT name,concat(loca,locb) as merged
FROM table1 t1
WHERE t1.locb NOTNULL
AND t1.loca NOTNULL
), concat_count AS (
-- calculate count for concat values
SELECT name,merged,count(*) OVER (PARTITION BY name,merged) as merged_count
FROM concat
)
SELECT cc.name,cc.merged
FROM concat_count cc
WHERE cc.merged_count = (SELECT max(merged_count) FROM concat_count)
GROUP BY cc.name,cc.merged;
select name,
newvalue
from (
select name,
CONCAT(loca,locb) newvalue,
COUNT(CONCAT(loca,locb)) as total,
row_number() over (order by COUNT(CONCAT(loca,locb)) desc) as att
from Table1
where loca is not null
and locb is not null
GROUP BY name, CONCAT(loca,locb)
) tt
where att=1
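Below is a minimal sketch of the GROUP BY approach using Python's sqlite3 module and the sample data. The two incomplete rows are inserted with a NULL locb.

The query groups on the (loca, locb) pair rather than on CONCAT(loca, locb), because different pairs can concatenate to the same string: (11, 8) and (1, 18) both give '118'. Treating 0 as a missing value is an assumption based on the question's wording; drop that filter if zeros should count. If several values tie for the top count, all of them are returned.

```python
import sqlite3

QUERY = """
WITH counts AS (
    SELECT name, loca, locb, COUNT(*) AS total
    FROM table1
    WHERE loca IS NOT NULL AND locb IS NOT NULL
      AND loca <> 0 AND locb <> 0      -- assumption: 0 is treated as "no value"
    GROUP BY name, loca, locb          -- group on the pair, not the concatenated string
)
SELECT name, CAST(loca AS TEXT) || CAST(locb AS TEXT) AS merged, total
FROM counts c
WHERE total = (SELECT MAX(total) FROM counts c2 WHERE c2.name = c.name)
ORDER BY name, merged
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, loca INTEGER, locb INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("james", 100, 2), ("james", 100, 3), ("james", 10, 8), ("james", 10, 8),
    ("james", 10, 7), ("james", 10, 6), ("james", 0, 7), ("james", 10, 0),
    ("james", 10, None), ("james", 10, None),
])
print(conn.execute(QUERY).fetchall())  # → [('james', '108', 2)]
```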

How to insert many records excluding some

I want to create a table with a subset of records from a master table.
For example, I have:
id name code
1 peter 73
2 carl 84
3 jack 73
I want to store peter and carl but not jack, because he has the same code as peter.
I need high performance because I have 20M records.
I tried this:
SELECT id, name, DISTINCT(code) INTO new_tab
FROM old_tab
WHERE (conditions)
but it doesn't work.
Assuming you want to pick the row with the maximum id per code, then this should do it:
insert into new_tab (id, name, code)
SELECT id, name, code
FROM (
    SELECT id, name, code,
           rank() OVER (PARTITION BY code ORDER BY id DESC) AS rnk
    FROM old_tab
) ranked
WHERE rnk = 1;
For the minimum id per code, just change the sort order in the rank from DESC to ASC:
insert into new_tab (id, name, code)
SELECT id, name, code
FROM (
    SELECT id, name, code,
           rank() OVER (PARTITION BY code ORDER BY id ASC) AS rnk
    FROM old_tab
) ranked
WHERE rnk = 1;
Using a derived table, you can find the minimum id for each code. Then join back to it in the outer query to get the rest of the columns for that id from old_tab. As in your attempt, SELECT ... INTO creates new_tab:
select t.id, t.name, t.code
into new_tab
from old_tab t inner join
     (select min(id) as minId, code
      from old_tab group by code) x
  on t.id = x.minId
WHERE (conditions)
Try this:
CREATE TABLE #Temp
(
ID INT,
Name VARCHAR(50),
Code INT
)
INSERT #Temp VALUES (1, 'Peter', 73)
INSERT #Temp VALUES (2, 'Carl', 84)
INSERT #Temp VALUES (3, 'Jack', 73)
SELECT t2.ID, t2.Name, t2.Code
FROM #Temp t2
JOIN (
SELECT t.Code, MIN(t.ID) ID
FROM #temp t
JOIN (
SELECT DISTINCT Code
FROM #Temp
) d
ON t.Code = d.Code
GROUP BY t.Code
) b
ON t2.ID = b.ID
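Below is a minimal, self-contained sketch of the "minimum id per code" approach using Python's sqlite3 module and the three sample rows. The index name is hypothetical. Whether an index on (code, id) actually helps at 20M rows depends on the database and should be checked against its query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_tab (id INTEGER PRIMARY KEY, name TEXT, code INTEGER);
CREATE TABLE new_tab (id INTEGER PRIMARY KEY, name TEXT, code INTEGER);
INSERT INTO old_tab VALUES (1, 'peter', 73), (2, 'carl', 84), (3, 'jack', 73);
-- hypothetical index name; an index on (code, id) can help the per-code MIN(id) lookup
CREATE INDEX idx_old_tab_code_id ON old_tab (code, id);
""")

# keep only the row with the smallest id for each code
conn.execute("""
INSERT INTO new_tab (id, name, code)
SELECT t.id, t.name, t.code
FROM old_tab t
JOIN (SELECT code, MIN(id) AS min_id FROM old_tab GROUP BY code) m
  ON t.id = m.min_id
""")
print(conn.execute("SELECT * FROM new_tab ORDER BY id").fetchall())
# → [(1, 'peter', 73), (2, 'carl', 84)]
```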

How to select top 3 values from each group in a table with SQL which have duplicates [duplicate]

This question already has answers here:
Select top 10 records for each category
(14 answers)
Closed 5 years ago.
Assume we have a table with two columns. One column contains the names of some people, and the other contains numeric values related to each person. One person can have more than one value. We want to select the top 3 values for each person from the table. If a person has fewer than 3 values, we select all the values for that person.
If there are no duplicates in the table, the query in the article "Select top 3 values from each group in a table with SQL" solves this. But if there are duplicates, what is the solution?
For example, John has 5 values related to him: 20, 7, 7, 7, 4. I need to return the name/value pairs below, ordered by value descending for each name:
+------+-------+
| name | value |
+------+-------+
| John |    20 |
| John |     7 |
| John |     7 |
+------+-------+
Only 3 rows should be returned for John even though there are three 7s for John.
In many modern DBMS (e.g. Postgres, Oracle, SQL Server, DB2 and many others), the following works fine. It uses a CTE and the ranking function ROW_NUMBER(), which is part of the SQL standard:
WITH cte AS
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
)
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
Without CTE, only ROW_NUMBER():
SELECT name, value, rn
FROM
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
Postgres
Oracle
SQL-Server
In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.
The column tid is assumed to be the primary key of the table:
SELECT t.tid, t.name, t.value, -- self join and GROUP BY
COUNT(*) AS rn
FROM t
JOIN t AS t2
ON t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;
SELECT t.tid, t.name, t.value, rn
FROM
( SELECT t.tid, t.name, t.value,
( SELECT COUNT(*) -- inline, correlated subquery
FROM t AS t2
WHERE t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
) AS rn
FROM t
) AS t
WHERE rn <= 3
ORDER BY name, rn ;
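The correlated-subquery version needs no window functions, so it can be tried with Python's sqlite3 module. Below is a minimal sketch that uses sample data with duplicate values. It assumes SQLite's implicit rowid stands in for the tid primary key. Ties on value are broken by rowid, so at most three rows per name are kept even when values repeat.

```python
import sqlite3

QUERY = """
SELECT name, value, rn
FROM (
    SELECT t.name, t.value,
           (SELECT COUNT(*)              -- rows ranked at or above this one
            FROM t AS t2
            WHERE t2.name = t.name
              AND (t2.value > t.value
                   OR (t2.value = t.value AND t2.rowid <= t.rowid))) AS rn
    FROM t
) ranked
WHERE rn <= 3
ORDER BY name, rn
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 5), ("Jack", 5),
    ("Jane", 30), ("Jane", 21), ("John", 5), ("John", -1), ("John", -1), ("Jane", 18),
])
for row in conn.execute(QUERY):
    print(row)
```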
Tested in MySQL
I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.
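Assuming you are looking for a database-independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example. Note that it ranks distinct values, so duplicates share a rank and SELECT DISTINCT returns each value once: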
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3)
However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.
Here is an example of it working in SQL Server:
with t as (
select 1 as personid, 5 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 7 as val union all
select 1 as personid, 8 as val
)
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3);
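You can do that in MySQL using GROUP_CONCAT and FIND_IN_SET. Note that every row whose value appears in the top-3 list matches, so ties can return more than three rows per name: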
SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
FROM tbl t1
WHERE t1.name = t.name
GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc
If your result set is not very large, you can write a stored procedure (or an anonymous PL/SQL block) instead. It would iterate over the result set and find the biggest three values by simple comparison.
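The procedural approach can be sketched in Python. Stream the rows once and keep a bounded min-heap of at most three values per name, replacing the smallest kept value when a larger one arrives. `top_n_per_group` is a hypothetical helper. Duplicates are kept because the heap stores every qualifying value, not distinct ones.

```python
import heapq
from collections import defaultdict

def top_n_per_group(rows, n=3):
    """Return {name: values in descending order}, keeping at most n values per name (duplicates allowed)."""
    heaps = defaultdict(list)
    for name, value in rows:
        heap = heaps[name]
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)  # drop the smallest kept value
    return {name: sorted(heap, reverse=True) for name, heap in heaps.items()}

rows = [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4)]
print(top_n_per_group(rows))  # → {'John': [20, 7, 7]}
```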
Try this -
CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);
WITH cte
AS (
SELECT NAME
,value
,ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY (value) DESC
) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC
This works in MS SQL Server. It should also work in any other SQL dialect that can assign row numbers within a group, using an OVER clause or an equivalent.
if object_id('tempdb..#Data') is not null drop table #Data;
GO
create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO
with D as (
SELECT
name
,Value
,row_number() over (partition by name order by value desc) rn
From
#Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc
Name Value
Jack 5
Jane 30
Jane 21
Jane 18
John 20
John 7
John 7