Finding duplicate **Set of Values** in multiple rows in SQL Server table


I have a SQL Server database table with this sample data:
ProductID GenericID MG
---------------------------------
1 1 2g
1 2 5g
2 2 5g
3 1 2g
3 2 5g
4 1 2g
5 1 2g
5 3 7g
6 2 5g
7 1 2g
8 1 2g
I want a query that, given a ProductID, first finds the GenericIDs associated with that product. In the sample data, ProductID=1 is associated with GenericID 1 and 2.
It should then go through all rows and return the rows of every product that has exactly the same set of GenericID values, which is the set { 1, 2 } for ProductID=1. I am struggling with the query logic.
For example, if I select ProductID=1, this is the output I want. It has four rows because only ProductID 3 has the same set of GenericID values as ProductID 1:
ProductID GenericID MG
-------------------------------
1 1 2g
1 2 5g
3 1 2g
3 2 5g
A product can have one or more GenericID values, and the set is not fixed in advance.
Another example: if I select ProductID=7, the query should return only the products whose sole GenericID is 1, because ProductID=7 has only GenericID=1. Any product that has GenericID=1 together with other GenericIDs must be excluded. This is the output I want:
ProductID GenericID MG
------------------------------
7 1 2g
8 1 2g
4 1 2g
I need a query that produces these outputs: all of the products that have the same set of GenericIDs as the selected product.
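To make the requirement concrete, here is a minimal Python sketch of the intended semantics. It is not SQL Server code. It groups the sample rows into one set of GenericIDs per ProductID and keeps the rows whose product's set is exactly equal to the selected product's set. The function name `same_generic_set` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (ProductID, GenericID, MG)
ROWS = [
    (1, 1, "2g"), (1, 2, "5g"), (2, 2, "5g"), (3, 1, "2g"), (3, 2, "5g"),
    (4, 1, "2g"), (5, 1, "2g"), (5, 3, "7g"), (6, 2, "5g"), (7, 1, "2g"),
    (8, 1, "2g"),
]


def same_generic_set(rows, product_id):
    """Return the rows of every product whose set of GenericIDs is exactly
    equal to the set of product_id (equality, not overlap or subset)."""
    generic_sets = defaultdict(set)
    for pid, gid, _mg in rows:
        generic_sets[pid].add(gid)
    # An unknown product has an empty set, which no product matches.
    target = generic_sets.get(product_id, set())
    return [row for row in rows if generic_sets[row[0]] == target]
```

Usage: `same_generic_set(ROWS, 1)` returns the rows of products 1 and 3. `same_generic_set(ROWS, 7)` returns the rows of products 4, 7 and 8, and excludes product 5 because {1, 3} is not equal to {1}.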

The simplest method is probably to use string_agg(), which is available in SQL Server 2017 and later:
with t as (
-- one row per product: its GenericIDs as a sorted, comma-separated string
select productID, string_agg(genericId, ',') within group (order by genericId) as genericIds
from sample
group by productID
)
select s.*
from t join
t t2 -- t2 holds the key of the selected product
on t.genericIds = t2.genericIds and t2.productId = 1 join
sample s
on s.productId = t.productId;
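The string_agg() trick works because sorting the GenericIDs before concatenating turns each set into a canonical string key, so equal sets produce equal strings. The Python sketch below mimics this outside the database. It is not SQL Server code, and `generic_keys` and `rows_with_same_key` are hypothetical helpers. It also shows why `within group (order by genericId)` matters: without it, the same set can produce different strings depending on the order in which rows arrive.

```python
from collections import defaultdict


def generic_keys(rows, ordered=True):
    """Mimic string_agg(GenericID, ',') per ProductID.

    ordered=True sorts the IDs first, like WITHIN GROUP (ORDER BY GenericID);
    ordered=False keeps whatever order the rows arrive in."""
    ids = defaultdict(list)
    for pid, gid, _mg in rows:
        ids[pid].append(gid)
    keys = {}
    for pid, gids in ids.items():
        if ordered:
            gids = sorted(gids)
        keys[pid] = ",".join(str(g) for g in gids)
    return keys


def rows_with_same_key(rows, product_id):
    """Join back on the key, like joining t to t2 on genericIds."""
    keys = generic_keys(rows)
    return [row for row in rows if keys[row[0]] == keys[product_id]]


# Product 3's rows arrive in the opposite order from product 1's.
SAMPLE = [(1, 1, "2g"), (1, 2, "5g"), (3, 2, "5g"), (3, 1, "2g"), (4, 1, "2g")]
```

With `ordered=False`, products 1 and 3 get the keys "1,2" and "2,1" and would not match. SQL Server only guarantees the concatenation order of STRING_AGG when WITHIN GROUP (ORDER BY ...) is given.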

Gordon, thanks a lot for your prompt response. I forgot to mention that I am using SQL Server 2014, which does not have string_agg(), so that function wasn't an option for me. I really appreciate your help, and your quick answer made my day. I wrote the following query with the help of another one of your answers, which turned out to be a very helpful resource:
DECLARE @PID int = 1; -- the product whose set of GenericIDs should be matched
select PG.PID2 as Alternatives
from (select d1.ProductID as PID1, d2.ProductID as PID2, d2.cnt as PID2Count
from (select distinct ProductID from ProductsGenerics Where ProductID=@PID) d1 cross join
(select ProductID, count(*) as cnt from ProductsGenerics group by ProductID) d2
) PG inner join
ProductsGenerics e1
on e1.ProductID = PG.PID1 left outer join
ProductsGenerics e2
on e2.ProductID = PG.PID2 and e2.GenericID = e1.GenericID -- and e1.MG = e2.MG
group by PG.PID1, PG.PID2, PG.PID2Count
-- every GenericID of @PID must exist for PID2, and PID2 must have no extra GenericIDs
-- (assumes each ProductID, GenericID pair appears only once)
having SUM(case when e2.GenericID is null then 1 else 0 end) = 0 and
COUNT(*) = PG.PID2Count
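On versions without STRING_AGG, which arrived in SQL Server 2017, another way to express "exactly the same set" is a double NOT EXISTS with EXCEPT. The first check ensures that nothing in the selected product's set is missing from the candidate, and the second ensures that the candidate has nothing extra. SQL Server has supported EXCEPT since 2005. The sketch below runs the query on SQLite through Python's sqlite3 module so it can be tried without a server. The table name ProductsGenerics follows the query above, and the `:pid` parameter would be `@PID` in T-SQL. It has not been checked against SQL Server itself.

```python
import sqlite3

# Sample rows from the question: (ProductID, GenericID, MG)
ROWS = [
    (1, 1, "2g"), (1, 2, "5g"), (2, 2, "5g"), (3, 1, "2g"), (3, 2, "5g"),
    (4, 1, "2g"), (5, 1, "2g"), (5, 3, "7g"), (6, 2, "5g"), (7, 1, "2g"),
    (8, 1, "2g"),
]

SAME_SET_SQL = """
SELECT pg.ProductID, pg.GenericID, pg.MG
FROM ProductsGenerics pg
WHERE NOT EXISTS (  -- nothing in the selected product's set is missing here
        SELECT GenericID FROM ProductsGenerics WHERE ProductID = :pid
        EXCEPT
        SELECT GenericID FROM ProductsGenerics WHERE ProductID = pg.ProductID)
  AND NOT EXISTS (  -- and this product has no GenericID the selected one lacks
        SELECT GenericID FROM ProductsGenerics WHERE ProductID = pg.ProductID
        EXCEPT
        SELECT GenericID FROM ProductsGenerics WHERE ProductID = :pid)
ORDER BY pg.ProductID, pg.GenericID
"""


def make_db(rows):
    """Build an in-memory ProductsGenerics table from the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductsGenerics (ProductID INTEGER, GenericID INTEGER, MG TEXT)"
    )
    conn.executemany("INSERT INTO ProductsGenerics VALUES (?, ?, ?)", rows)
    return conn


def same_set_rows(conn, pid):
    """Rows of every product whose GenericID set equals that of pid."""
    return conn.execute(SAME_SET_SQL, {"pid": pid}).fetchall()
```

Because EXCEPT compares distinct values, this version does not depend on each (ProductID, GenericID) pair being unique.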

Related

Finding duplicate values in multiple rows in SQL Server table


How about
SELECT *
FROM yourtable
WHERE
GenericID in (SELECT GenericID FROM yourtable WHERE ProductID=1)
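The query above tests for overlap (any shared GenericID), not for an identical set, which is why the update below was needed. This small sketch lists the products that the IN filter lets through. It runs on SQLite through Python's sqlite3 with the question's sample rows, and it is an illustration rather than part of the original answer.

```python
import sqlite3

# Sample rows from the question: (ProductID, GenericID, MG)
ROWS = [
    (1, 1, "2g"), (1, 2, "5g"), (2, 2, "5g"), (3, 1, "2g"), (3, 2, "5g"),
    (4, 1, "2g"), (5, 1, "2g"), (5, 3, "7g"), (6, 2, "5g"), (7, 1, "2g"),
    (8, 1, "2g"),
]

# The IN filter from the answer; DISTINCT ProductID just keeps the result compact.
OVERLAP_SQL = """
SELECT DISTINCT ProductID
FROM yourtable
WHERE GenericID IN (SELECT GenericID FROM yourtable WHERE ProductID = ?)
ORDER BY ProductID
"""


def overlapping_products(rows, pid):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (ProductID INTEGER, GenericID INTEGER, MG TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    products = [r[0] for r in conn.execute(OVERLAP_SQL, (pid,))]
    conn.close()
    return products
```

For ProductID=1, every product shares GenericID 1 or 2, so all eight products come back. For ProductID=7, products 1, 3 and 5 are returned alongside 4, 7 and 8.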
Update:
If the whole set of GenericIDs needs to match, this should work on SQL Server 2017 or later, which added STRING_AGG (assuming each ProductID, GenericID combination is unique):
WITH yourtable_gids AS (
SELECT
ProductID,
STRING_AGG(GenericID, ',') WITHIN GROUP (ORDER BY GenericID) as GenericIDs
FROM yourtable
GROUP BY ProductID
)
SELECT y.*
FROM yourtable y
JOIN yourtable_gids yg ON y.ProductID=yg.ProductID
WHERE
yg.GenericIDs in (SELECT GenericIDs FROM yourtable_gids WHERE ProductID=1)

So if I interpret the question right, it's:
I select a product ID, and additionally want every product with the same generic IDs as the ones I got.
select * from products
where productID = X
or genericId in (select genericId from products where productID = X)
This should be the answer if I got your question right.

Left Join Display All Data From Table1 and Table2

I am trying to do a left join so that I get all of my rows from Table 1 even if there is no value corresponding to it in the second table.
My structures are:
Location Table:
ID LocName
1 Trk1
2 Trk2
3 Trk3
4 Unk
Quantity Table:
ID PartID Quantity LocationID
1 1 2 1
2 3 12 2
3 2 6 1
4 6 8 3
5 6 5 1
I am trying to join but also make a query on a specific PartID. My query is:
SELECT
INV_LOCATIONS.ID AS LocationID,
INV_LOCATIONS.NAME AS LocationName,
INV_QUANTITY.QUANTITY AS Quantity
FROM INV_LOCATIONS
LEFT JOIN INV_QUANTITY ON INV_LOCATIONS.ID = INV_QUANTITY.LOCATION_ID
WHERE INV_QUANTITY.PART_ID = 1;
My output right now would be:
ID LocName Quantity
1 Trk1 5
3 Trk3 8
The Desired output is:
ID LocName Quantity
1 Trk1 5
2 Trk2 NULL/0
3 Trk3 8
4 Unk NULL/0
I assume it is because of the WHERE INV_QUANTITY.PART_ID = 1 condition, which forces a matching row to exist in the quantity table. I need to be able to check that it is the right part, but how do I also include locations where the part doesn't exist? I know I have done something very similar before, but I cannot remember which project, so I cannot find the code anywhere.

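You need to move the filtering logic to the ON clause. A WHERE condition on the right-hand table runs after the join and discards the NULL-extended rows that the LEFT JOIN produced, which effectively turns it into an inner join: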
SELECT il.ID AS LocationID, il.NAME AS LocationName,
iq.QUANTITY AS Quantity
FROM INV_LOCATIONS il LEFT JOIN
INV_QUANTITY iq
ON il.ID = iq.LOCATION_ID AND iq.PART_ID = 1;
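The difference between the two placements can be checked with a sketch that runs both queries on SQLite through Python's sqlite3. Table and column names follow the query (INV_LOCATIONS.NAME, INV_QUANTITY.PART_ID, LOCATION_ID), and the data follows the question's sample tables. The question's current and desired outputs (Trk1 5, Trk3 8) match PartID 6 in the sample Quantity table rather than PartID 1, so the sketch uses 6.

```python
import sqlite3

SCHEMA = """
CREATE TABLE INV_LOCATIONS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE INV_QUANTITY (ID INTEGER PRIMARY KEY, PART_ID INTEGER,
                           QUANTITY INTEGER, LOCATION_ID INTEGER);
INSERT INTO INV_LOCATIONS VALUES (1, 'Trk1'), (2, 'Trk2'), (3, 'Trk3'), (4, 'Unk');
INSERT INTO INV_QUANTITY VALUES (1, 1, 2, 1), (2, 3, 12, 2), (3, 2, 6, 1),
                                (4, 6, 8, 3), (5, 6, 5, 1);
"""

# Filter in WHERE: applied after the join, so NULL-extended rows are dropped.
WHERE_SQL = """
SELECT il.ID, il.NAME, iq.QUANTITY
FROM INV_LOCATIONS il
LEFT JOIN INV_QUANTITY iq ON il.ID = iq.LOCATION_ID
WHERE iq.PART_ID = ?
ORDER BY il.ID
"""

# Filter in ON: only decides which quantity rows match; every location survives.
ON_SQL = """
SELECT il.ID, il.NAME, iq.QUANTITY
FROM INV_LOCATIONS il
LEFT JOIN INV_QUANTITY iq ON il.ID = iq.LOCATION_ID AND iq.PART_ID = ?
ORDER BY il.ID
"""


def run(sql, part_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(sql, (part_id,)).fetchall()
    conn.close()
    return rows
```

If 0 is preferred over NULL for missing locations, `COALESCE(iq.QUANTITY, 0)` in the select list handles that in both SQLite and SQL Server.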

Find whether id matches and substitute using Case Hive query

I have a table called "Scan" of customer transactions, where an individual_id appears once for every transaction, along with columns such as scan_id.
I have another table called ids, which contains random individual_ids sampled from the Scan table.
I would like to join ids with Scan and get a single record per id, flagged with 1 if any of its scan_ids matches certain values.
Suppose the data looks like this:
Scan table
Ids scan_id
---- ------
1 100
1 111
1 1000
2 100
2 111
3 124
4 1000
4 111
Ids table
id
1
2
3
4
5
I want the output below, i.e. MT is 1 if any scan_id is either 100 or 1000:
Id MT
------ ------
1 1
2 1
3 0
4 1
I executed the query below and got an error:
select MT, d.individual_id
from
(
select
CASE
when scan_id in (90069421,53971306,90068594,136739913,195308160) then 1
ELSE 0
END as MT
from scan cs join ids r
on cs.individual_id = r.individual_id
where
base_div_nbr =1
and
country_code ='US'
and
retail_channel_code=1
and visit_date between '2019-01-01' and '2019-12-31'
) as d
group by individual_id;
I would appreciate any suggestions or help with this Hive query, including whether there is a more efficient way to get this done.

Use a group by:
select s.individual_id,
max(case when s.scan_id in (100, 1000) then 1 else 0 end) as mt
from scan s
group by s.individual_id;
The ids table doesn't seem to be needed for this query.
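The MAX(CASE ...) pattern collapses each individual's scans into one 0/1 flag. The sketch below runs it on SQLite through Python's sqlite3, not Hive, using the question's example values and the answer's column names (individual_id, scan_id). The second query keeps the ids join. The join is only relevant if the real scan table contains individuals outside the sample. Here it gives the same result, because id 5 has no scans.

```python
import sqlite3

SCHEMA = """
CREATE TABLE scan (individual_id INTEGER, scan_id INTEGER);
CREATE TABLE ids (individual_id INTEGER);
INSERT INTO scan VALUES (1, 100), (1, 111), (1, 1000), (2, 100), (2, 111),
                        (3, 124), (4, 1000), (4, 111);
INSERT INTO ids VALUES (1), (2), (3), (4), (5);
"""

# One row per individual: MT = 1 if any of their scans is 100 or 1000.
MT_SQL = """
SELECT s.individual_id,
       MAX(CASE WHEN s.scan_id IN (100, 1000) THEN 1 ELSE 0 END) AS mt
FROM scan s
GROUP BY s.individual_id
ORDER BY s.individual_id
"""

# Same flag, restricted to the sampled ids by an inner join.
MT_SAMPLED_SQL = """
SELECT s.individual_id,
       MAX(CASE WHEN s.scan_id IN (100, 1000) THEN 1 ELSE 0 END) AS mt
FROM scan s
JOIN ids r ON r.individual_id = s.individual_id
GROUP BY s.individual_id
ORDER BY s.individual_id
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```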

Nested sum loop until foreign key 'dies out'

I am pulling my hair out over a data retrieval function I'm trying to write. In essence this query is meant to SUM up the count of all voorwerpnummers in the Voorwerp_in_Rubriek table, grouped by their rubrieknummer gathered from Rubriek.
After that, I want to keep rolling the sums up until they reach their 'top level parent'. Rubriek has a foreign key to itself, hoofdrubriek, which is easiest to think of as its parent in a category tree.
This also means categories can be nested. A NULL in the hoofdrubriek column means that the category is a top-level parent. The idea behind this query is to SUM up the count of voorwerpnummers in Voorwerp_in_rubriek and keep adding them together until they reach their 'top level parent'.
As the database and test data are quite large, I've decided not to add the code directly to this question but to link to a dbfiddle instead, so there's more structure.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=8068a52da6a29afffe6dc793398f0998
I got it working to some degree using this query:
SELECT R2.hoofdrubriek ,
COUNT(Vr.rubrieknummer) AS aantal
FROM Rubriek R1
RIGHT OUTER JOIN Rubriek R2 ON R1.rubrieknummer = R2.hoofdrubriek
INNER JOIN Voorwerp_in_rubriek Vr ON R2.rubrieknummer = Vr.rubrieknummer
WHERE NOT EXISTS ( SELECT *
FROM Rubriek
WHERE hoofdrubriek = R2.rubrieknummer )
AND R1.hoofdrubriek IS NOT NULL
GROUP BY Vr.rubrieknummer ,
R2.hoofdrubriek
But that doesn't return all items and generally falls apart. I hope someone can help me.

If I understood correctly:
declare @t table (
rubrieknummer int,
cnt int);
-- direct item count per category
INSERT @t(rubrieknummer, cnt)
SELECT R.rubrieknummer, COUNT(Vr.voorwerpnummer)
FROM Rubriek R
INNER JOIN voorwerp_in_rubriek Vr ON R.rubrieknummer = Vr.rubrieknummer
GROUP BY Vr.rubrieknummer, R.rubrieknummer;
--select * from @t;
-- recursively pass each count up to the parent; top-level categories pass it to NULL
with t as(
select rubrieknummer, cnt
from @t
union all
select r.hoofdrubriek, cnt
from t
join Rubriek r on t.rubrieknummer = r.rubrieknummer
)
-- total per category: its own items plus those of all its descendants
select rubrieknummer, sum(cnt) cnt
from t
group by rubrieknummer;
Applied to your fiddle data, this returns:
rubrieknummer cnt
<null> 42
100 42
101 26
102 6
103 10
10000 8
10100 4
10101 1
10102 3
10500 4
10501 2
10502 2
15000 18
15100 6
15101 2
15102 2
15103 2
15500 12
15501 4
15502 3
15503 5
20000 6
20001 2
20002 1
20003 1
20004 2
25000 4
25001 1
25002 1
25003 1
25004 1
30001 2
30002 1
30004 3
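The recursive CTE pushes each category's direct count up to every ancestor and then sums per category. That is why the <null> row above holds the grand total. Here is a self-contained sketch of the same idea on SQLite through Python's sqlite3, using a small hypothetical tree rather than the fiddle data. SQLite spells the recursive form `WITH RECURSIVE`, while SQL Server uses plain `WITH`. The intermediate table variable is replaced by an ordinary CTE named `direct_counts`.

```python
import sqlite3

# Hypothetical miniature category tree (not the fiddle data):
#   1 (top level) -> 2 -> 3,   1 -> 4
SCHEMA = """
CREATE TABLE Rubriek (rubrieknummer INTEGER PRIMARY KEY, hoofdrubriek INTEGER);
CREATE TABLE Voorwerp_in_rubriek (voorwerpnummer INTEGER, rubrieknummer INTEGER);
INSERT INTO Rubriek VALUES (1, NULL), (2, 1), (3, 2), (4, 1);
INSERT INTO Voorwerp_in_rubriek VALUES (10, 2), (11, 3), (12, 3), (13, 4);
"""

ROLLUP_SQL = """
WITH RECURSIVE
  direct_counts(rubrieknummer, cnt) AS (   -- items directly in each category
      SELECT R.rubrieknummer, COUNT(Vr.voorwerpnummer)
      FROM Rubriek R
      JOIN Voorwerp_in_rubriek Vr ON R.rubrieknummer = Vr.rubrieknummer
      GROUP BY R.rubrieknummer),
  t(rubrieknummer, cnt) AS (               -- pass each count up to every ancestor
      SELECT rubrieknummer, cnt FROM direct_counts
      UNION ALL
      SELECT r.hoofdrubriek, t.cnt
      FROM t JOIN Rubriek r ON t.rubrieknummer = r.rubrieknummer)
SELECT rubrieknummer, SUM(cnt) FROM t GROUP BY rubrieknummer
"""


def rolled_up_counts():
    """Map each category (and None for the grand total) to its rolled-up count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    totals = dict(conn.execute(ROLLUP_SQL).fetchall())
    conn.close()
    return totals
```

In SQL Server, a recursive CTE stops with an error after 100 levels unless `OPTION (MAXRECURSION n)` is added. Both versions also assume the hoofdrubriek chain has no cycles.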

Find difference between two sets of records

I have a set of data in SQL Server, something like:
ID ID_Invoice Article Quantity Status
1 10 carrot 10 null
2 10 carrot 5 C
3 10 onion 8 null
4 10 onion 4 C
5 11 tomato 20 null
6 11 tomato 18 C
7 11 onion 2 null
8 11 onion 1 C
It means that a customer ordered 10 carrots and 8 onions (on one invoice) but actually received only 5 carrots and 4 onions. If Status is null, the row holds the original quantity; if Status is C, it holds the corrected quantity.
I need to generate a table like
ID ID_Invoice Article Quantity
1 10 carrot -5
2 10 onion -4
3 11 tomato -2
4 11 onion -1
which shows the difference between the ordered and the actual quantity on each invoice. I have no idea how to begin. Any help is deeply appreciated :)

An option with a simple CASE expression and no extra JOIN:
SELECT ID_Invoice, Article,
SUM(CASE WHEN Status IS NULL
THEN -1 * Quantity ELSE Quantity END) AS Quantity
FROM dbo.test38
GROUP BY ID_Invoice, Article
Result:
ID_Invoice Article Quantity
10 carrot -5
10 onion -4
11 onion -1
11 tomato -2
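A quick way to check the conditional-aggregation answer is to run it on SQLite through Python's sqlite3 module with the question's rows. The `dbo.` prefix is dropped because SQLite does not use SQL Server's dbo schema, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE test38 (ID INTEGER, ID_Invoice INTEGER, Article TEXT,
                     Quantity INTEGER, Status TEXT);
INSERT INTO test38 VALUES
  (1, 10, 'carrot', 10, NULL), (2, 10, 'carrot', 5, 'C'),
  (3, 10, 'onion', 8, NULL),   (4, 10, 'onion', 4, 'C'),
  (5, 11, 'tomato', 20, NULL), (6, 11, 'tomato', 18, 'C'),
  (7, 11, 'onion', 2, NULL),   (8, 11, 'onion', 1, 'C');
"""

# Ordered quantities count negative and corrected ones positive, so each
# group sums to corrected - ordered.
DIFF_SQL = """
SELECT ID_Invoice, Article,
       SUM(CASE WHEN Status IS NULL THEN -1 * Quantity ELSE Quantity END) AS Quantity
FROM test38
GROUP BY ID_Invoice, Article
ORDER BY ID_Invoice, Article
"""


def differences():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(DIFF_SQL).fetchall()
    conn.close()
    return rows
```

With this formulation, an article that has no 'C' row comes out as minus its ordered quantity.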

Least resource-intensive, since it reads the table once with conditional aggregation and needs no self-join:
SELECT id_invoice
, article
, org_quantity
, new_quantity
, new_quantity - org_quantity diff
FROM (SELECT id_invoice
, article
, max(CASE WHEN status IS NULL THEN quantity else null END) org_quantity
, max(CASE WHEN status = 'C' THEN quantity else null END) new_quantity
FROM orders
GROUP BY id_invoice
, article) t -- SQL Server requires an alias on a derived table
See it working here: http://sqlfiddle.com/#!4/f96adf/14

My answer uses only ANSI-standard SQL :) so it should work in SQL Server as well as in any other mainstream RDBMS.
SELECT
yt1.ID_Invoice,
yt1.Article,
yt2.Quantity - yt1.Quantity AS Quantity
FROM
yourTable yt1
INNER JOIN yourTable yt2 ON yt1.ID_Invoice = yt2.ID_Invoice
AND yt1.Article = yt2.Article
AND yt2.Status = 'C'
WHERE
yt1.Status IS NULL
This answer assumes there is always a record with Status NULL and a corresponding row with Status 'C'. If that is not the case, you'd have to adjust it like this:
SELECT
yt1.ID_Invoice,
yt1.Article,
CASE WHEN yt2.Quantity IS NULL THEN yt1.Quantity ELSE yt2.Quantity - yt1.Quantity END AS Quantity
FROM
yourTable yt1
LEFT JOIN yourTable yt2 ON yt1.ID_Invoice = yt2.ID_Invoice
AND yt1.Article = yt2.Article
AND yt2.Status = 'C'
WHERE
yt1.Status IS NULL
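To see the difference between the two versions, here is a sketch that runs the answer's self-join on SQLite through Python's sqlite3. It uses the question's rows plus one hypothetical extra article (leek on invoice 12) that has no 'C' row. To keep it short, both runs use the CASE expression from the adjusted query. In the INNER JOIN run, yt2.Quantity is never NULL, so the CASE always takes the ELSE branch and matches the first query.

```python
import sqlite3

# Question's data plus one hypothetical article (leek) that has no 'C' row.
SCHEMA = """
CREATE TABLE yourTable (ID INTEGER, ID_Invoice INTEGER, Article TEXT,
                        Quantity INTEGER, Status TEXT);
INSERT INTO yourTable VALUES
  (1, 10, 'carrot', 10, NULL), (2, 10, 'carrot', 5, 'C'),
  (3, 10, 'onion', 8, NULL),   (4, 10, 'onion', 4, 'C'),
  (5, 11, 'tomato', 20, NULL), (6, 11, 'tomato', 18, 'C'),
  (7, 11, 'onion', 2, NULL),   (8, 11, 'onion', 1, 'C'),
  (9, 12, 'leek', 3, NULL);
"""

# {join} is filled with INNER JOIN or LEFT JOIN.
JOIN_SQL = """
SELECT yt1.ID_Invoice, yt1.Article,
       CASE WHEN yt2.Quantity IS NULL THEN yt1.Quantity
            ELSE yt2.Quantity - yt1.Quantity END AS Quantity
FROM yourTable yt1
{join} yourTable yt2 ON yt1.ID_Invoice = yt2.ID_Invoice
                    AND yt1.Article = yt2.Article
                    AND yt2.Status = 'C'
WHERE yt1.Status IS NULL
ORDER BY yt1.ID_Invoice, yt1.Article
"""


def run(join):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(JOIN_SQL.format(join=join)).fetchall()
    conn.close()
    return rows
```

The INNER JOIN drops the leek entirely. The LEFT JOIN keeps it and, following the adjusted query's CASE, reports its original quantity (3).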

First, separate the actual quantities from the ordered ones with two subqueries, then left join the actual quantities to the orders, something like this:
select
-- take the key columns from Ordered, since Received is NULL when there is no 'C' row
Ordered.ID,
Ordered.ID_Invoice,
Ordered.Article,
Received.Quantity - Ordered.Quantity as Quantity
from
(select * from dataTable where Status is null) as Ordered
left join (select * from dataTable where Status = 'C') as Received on (Ordered.ID_Invoice = Received.ID_Invoice and Ordered.Article = Received.Article )
NOTE: you would be better off with an ID for each article to use in the LEFT JOIN instead of comparing varchars.
Here is a fiddle example: http://sqlfiddle.com/#!2/16666/1
