SQL - Looking to show when 2 columns combined have the same data

I have a database table that has a Vendor_ID column and a Vendor_Item column.
Vendor_id Vendor_item
101 111
101 111
101 123
I need a way to show the rows where the combination of vendor_id and vendor_item occurs more than once. The same vendor_item number can appear multiple times as long as each occurrence has a different vendor_id. For the data above, the result should be:
Vendor_id Vendor_item
101 111
101 111
I have tried the following, but it returns only one row per duplicated combination and doesn't show both records as in the example above.
SELECT vendor_id,vendor_item
From Inventory_master
group by vendor_id,vendor_item
having count(*) >1
If possible, I would also like to add another column (UPC) to the results. The system I am working on can import records back in by UPC, so I would be able to fix the duplicates.
Vendor_id Vendor_item UPC
101 111 456
101 111 789

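I'm not sure where the UPC column comes from, but if it is a column of Inventory_master, SELECT * will return it. You can adapt your existing query as below; the check has to cover both vendor_id and vendor_item, otherwise items shared by different vendors would be reported too.
SELECT * FROM Inventory_master im WHERE EXISTS (
SELECT 1
From Inventory_master d
where d.vendor_id = im.vendor_id and d.vendor_item = im.vendor_item
group by d.vendor_id, d.vendor_item
having count(*) >1);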

You can use a subquery and then JOIN back to the inventory_master table:
SELECT im.*
FROM
Inventory_master im INNER JOIN (
SELECT vendor_id, vendor_item
From Inventory_master
group by vendor_id,vendor_item
having count(*) >1) s
ON im.vendor_id = s.vendor_id AND im.vendor_item = s.vendor_item
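As a quick check of the join-back approach, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. The upc column on Inventory_master, the UPC values 555 and 999, and the extra (102, 111) row are assumptions added for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory_master (vendor_id INTEGER, vendor_item INTEGER, upc INTEGER)"
)
# Sample rows: the first two share vendor_id and vendor_item.
# The (102, 111) row reuses the item under a different vendor, which is allowed.
conn.executemany(
    "INSERT INTO Inventory_master VALUES (?, ?, ?)",
    [(101, 111, 456), (101, 111, 789), (101, 123, 555), (102, 111, 999)],
)

dup_query = """
SELECT im.vendor_id, im.vendor_item, im.upc
FROM Inventory_master im
INNER JOIN (
    SELECT vendor_id, vendor_item
    FROM Inventory_master
    GROUP BY vendor_id, vendor_item
    HAVING COUNT(*) > 1
) s ON im.vendor_id = s.vendor_id AND im.vendor_item = s.vendor_item
ORDER BY im.upc
"""
duplicates = conn.execute(dup_query).fetchall()
for row in duplicates:
    print(row)
```

Only the two (101, 111) rows come back, each with its own UPC, so both records of the duplicate are visible.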

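Try this, partitioning the count by both columns so that only repeated vendor_id/vendor_item pairs are flagged:
select * from (
select vendor_id, vendor_item, count(*) over (partition by vendor_id, vendor_item) cnt
from Inventory_master
) t where cnt > 1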

Related

Joining a transaction fact table to a periodic snapshot table in SQL using the nearest date

I am using Redshift on AWS and I have two tables, the first is a list of transactions like so:
cust_ID order_date product
100 2022/05/01 A
101 2022/05/01 A
100 2022/05/05 B
101 2022/05/07 B
The second is a snapshot table which has customer attributes for each customer at a specific point in time. Though the second table has rows for most dates, it doesn't have rows for every customer at every date.
cust_ID as_of_date favourite_colour
100 2022/05/01 blue
100 2022/05/02 red
100 2022/05/05 green
100 2022/05/07 red
101 2022/05/01 blue
101 2022/05/04 red
101 2022/05/05 green
101 2022/05/08 yellow
How can I join the tables such that the transaction table has the customer attributes either on the date of the order itself, or if the transaction date is not available in table 2, at the nearest available date before the transaction?
An example of the desired output would be:
cust_ID order_date product Favourite_colour as_of_date
100 2022/05/01 A blue 2022/05/01
101 2022/05/01 A blue 2022/05/01
100 2022/05/05 B green 2022/05/05
101 2022/05/07 B green 2022/05/05
Joining by cust_ID and order_date = as_of_date doesn't work due to edge cases where the order_date/id combination is not in the second table.
I've also tried something like:
with snapshot as (
SELECT
row_number() OVER(PARTITION BY cust_ID ORDER BY as_of_date DESC) as row_number,
cust_ID,
favourite_color,
as_of_date
FROM table2 t2
INNER JOIN table1 t1
ON t1.cust_ID = t2.cust_ID
AND t2.as_of_date <= t1.order_date
)
SELECT * FROM snapshot
WHERE row_number = 1
However, this doesn't handle cases where the same customer has multiple transactions in table 1. When I check the count of the resulting table, the number of distinct cust_IDs is the same as count(*) so it seems like the resulting table is only retaining one transaction per customer.
Any help would be appreciated.

Using your provided table inputs, I tested this solution in DB Fiddle and it works for your desired output.
with my_cte AS (
select *,
row_number() OVER(PARTITION BY cust_id, order_date ORDER BY as_of_date desc) ranked
from transactions
left join attribs using (cust_id)
where as_of_date <= order_date
)
select cust_id, order_date, product, favourite_colour, as_of_date
from my_cte
where ranked = 1
order by order_date, cust_id;
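The same idea can be tried locally with the question's sample data. This is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later), not a Redshift test. The table names transactions and attribs follow the answer above. One deliberate difference, which is my own choice: the date condition sits in the ON clause rather than in WHERE, so a transaction with no earlier snapshot would still be returned with NULL attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (cust_id INTEGER, order_date TEXT, product TEXT);
CREATE TABLE attribs (cust_id INTEGER, as_of_date TEXT, favourite_colour TEXT);
INSERT INTO transactions VALUES
  (100, '2022/05/01', 'A'), (101, '2022/05/01', 'A'),
  (100, '2022/05/05', 'B'), (101, '2022/05/07', 'B');
INSERT INTO attribs VALUES
  (100, '2022/05/01', 'blue'),  (100, '2022/05/02', 'red'),
  (100, '2022/05/05', 'green'), (100, '2022/05/07', 'red'),
  (101, '2022/05/01', 'blue'),  (101, '2022/05/04', 'red'),
  (101, '2022/05/05', 'green'), (101, '2022/05/08', 'yellow');
""")

# Fixed-width YYYY/MM/DD strings compare correctly as text.
query = """
WITH ranked AS (
    SELECT t.cust_id, t.order_date, t.product,
           a.favourite_colour, a.as_of_date,
           ROW_NUMBER() OVER (
               PARTITION BY t.cust_id, t.order_date, t.product
               ORDER BY a.as_of_date DESC
           ) AS rn
    FROM transactions t
    LEFT JOIN attribs a
      ON a.cust_id = t.cust_id AND a.as_of_date <= t.order_date
)
SELECT cust_id, order_date, product, favourite_colour, as_of_date
FROM ranked
WHERE rn = 1
ORDER BY order_date, cust_id
"""
joined = conn.execute(query).fetchall()
for row in joined:
    print(row)
```

Partitioning by the transaction (customer, date, product) rather than by customer alone is what keeps one row per transaction instead of one row per customer.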

Counting unique combinations of values across multiple columns regardless of order?

I have a table that looks a bit like this:
Customer_ID | Offer_1 | Offer_2 | Offer_3
------------|---------|---------|--------
111 | A01 | 001 | B01
222 | A01 | B01 | 001
333 | A02 | 001 | B01
I want to write a query to figure out how many unique combinations of offers there are in the table, regardless of what order the offers appear in.
So in the example above there are two unique combinations: customers 111 & 222 both have the same three offers, so they count as one unique combination, and customer 333 is the only customer with their particular set of three offers. So the desired output of the query would be 2.
For some additional context:
The customer_ID column is in integer format, and all the offer columns are in varchar format.
There are 12 offer columns and over 3 million rows in the actual table, with over 100 different values in the offer columns. I simplified the example to better illustrate what I'm trying to do, but any solution needs to scale to this number of possible combinations.
I can concatenate all of the offer columns together and then run a count distinct statement on the result, but this doesn't account for customers who have the same unique combination of offers but ordered differently (like customers 111 & 222 in the example above).
Does anyone know how to solve this problem please?

Assuming the character / doesn't show up in any of the offer names, you can do:
select count(distinct offer_combo) as distinct_offers
from (
select listagg(offer, '/') within group (order by offer) as offer_combo
from (
select customer_id, offer_1 as offer from t
union all select customer_id, offer_2 from t
union all select customer_id, offer_3 from t
) x
group by customer_id
) y
Result:
DISTINCT_OFFERS
---------------
2
See running example at db<>fiddle.

One way to do it would be to union all the offers into one column, then use select distinct listagg... to get the combinations of offers. Try this:
with u as
(select Customer_ID, Offer_1 as Offer from table_name union all
select Customer_ID, Offer_2 as Offer from table_name union all
select Customer_ID, Offer_3 as Offer from table_name)
select distinct listagg(Offer, ',') within group(order by Offer) from u
group by Customer_ID
Fiddle

The solution without UNION ALLs. It should have better performance.
/*
WITH MYTAB (Customer_ID, Offer_1, Offer_2, Offer_3) AS
(
VALUES
(111, 'A01', '001', 'B01')
, (222, 'A01', 'B01', '001')
, (333, 'A02', '001', 'B01')
)
*/
SELECT COUNT (DISTINCT LIST)
FROM
(
SELECT LISTAGG (V.Offer, '|') WITHIN GROUP (ORDER BY V.Offer) LIST
FROM MYTAB T
CROSS JOIN TABLE (VALUES T.Offer_1, T.Offer_2, T.Offer_3) V (Offer)
GROUP BY T.CUSTOMER_ID
)
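All three answers rely on the same trick: put each customer's offers into a canonical (sorted) order before comparing them. Outside the database, that can be sketched in a few lines of Python. The function name count_unique_combos is hypothetical, and the sketch assumes no offer value is NULL/None.

```python
# Sample rows from the question: (customer_id, offer_1, offer_2, offer_3)
rows = [
    (111, "A01", "001", "B01"),
    (222, "A01", "B01", "001"),
    (333, "A02", "001", "B01"),
]


def count_unique_combos(rows):
    """Count distinct offer combinations, ignoring the column order."""
    # Sorting each row's offers gives one canonical tuple per combination,
    # so rows that differ only in ordering collapse into the same set entry.
    combos = {tuple(sorted(offers)) for _customer_id, *offers in rows}
    return len(combos)


unique_combos = count_unique_combos(rows)
print(unique_combos)  # 2
```

Using a sorted tuple rather than a set keeps repeated offers within a row, which matches what LISTAGG over the unpivoted rows does.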

Find the Missing Key ID or Numbers from a Column values

I need to find the numbers that are missing from a column, either because the rows were deleted or because the values were never used.
For example:
I have a table named Person with the columns [PersonID] and [PersonName].
[PersonID] is the primary key and an incrementing number, e.g. from 1 to N.
PersonID PersonName
1001 ABC
1002 ABC
1003 XYZ
1004 MNO
1006 ABC
1008 MNO
1009 ABC
1010 ABC
1011 XYZ
1014 ABC
1015 ABC
1016 XYZ
1017 MNO
In the given table, some numbers are missing from the PersonID column:
1005
1007
1012
1013
I need to find only the missing numbers.
Note: There are more than 20 million records in my table, so please suggest a fast method to find them.

Thanks to everyone who helped and shared suggestions. I found a way to list the missing IDs using ROW_NUMBER(). It assumes the IDs start at 1, and it only checks values up to the number of rows in the table.
SELECT NOTEXIST
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY PERSONID) AS NOTEXIST, PERSONID FROM #A ) T
WHERE NOTEXIST NOT IN ( SELECT PERSONID FROM #A )

Create another table and populate all the numbers between Min and Max ranges of PersonID. Do an anti join (Left/right) to get the list of numbers missing.
select * from NewIDTable a
left join OriginalTable b on a.PersonID=b.PersonID
where b.Personid is null

The simplest way is to get ranges. You can do this with lead():
select personid + 1, next_personid - 1 as end_range,
next_personid - personid - 1 as num_missing
from (select t.*,
lead(personid) over (order by personid) as next_personid
from t
) t
where next_personid <> personid + 1;
If you still want the list of ids, you can expand out the ranges, but that depends on the database.
SQL Server 2008 does not support lead(), so the equivalent is much more expensive there, but you can do it with cross apply:
select t.personid + 1 as start_range, tnext.personid - 1 as end_range,
tnext.personid - t.personid - 1 as num_missing
from t cross apply
(select top (1) t2.personid
from t t2
where t2.personid > t.personid
order by t2.personid asc
) tnext
where tnext.personid <> t.personid + 1;
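To illustrate the range approach and the "expand out the ranges" step, here is a minimal sketch that runs the lead() query in an in-memory SQLite database (3.25 or later) and expands the gaps in Python. The table is loaded with the PersonID values from the question; expanding the ranges client-side is just one option I chose for the illustration.

```python
import sqlite3

ids = [1001, 1002, 1003, 1004, 1006, 1008, 1009,
       1010, 1011, 1014, 1015, 1016, 1017]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Person VALUES (?)", [(i,) for i in ids])

# One row per gap: the first and last missing ID in each gap.
gap_query = """
SELECT PersonID + 1 AS start_range, next_id - 1 AS end_range
FROM (
    SELECT PersonID, LEAD(PersonID) OVER (ORDER BY PersonID) AS next_id
    FROM Person
) t
WHERE next_id <> PersonID + 1
ORDER BY start_range
"""
gaps = conn.execute(gap_query).fetchall()

# Expand each (start, end) range into the individual missing IDs.
missing = [n for start, end in gaps for n in range(start, end + 1)]
print(gaps)
print(missing)
```

The query returns one row per gap rather than one row per missing number, so the result stays small even on a large table; the last row has no next ID and is filtered out by the comparison with NULL.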

Trying to group quantities based off an ID

We have a table with two columns, ID and QTY. The layout is along these lines:
ID QTY
-------------
123 456
123 634
123 4235
234 67
234 735
234 666
What I am trying to do is add up the quantities for each ID, so the result would look like:
ID QTY
-------------
123 5325
234 1468
I currently have the following SQL query:
SELECT CLIENT_ID, ID, QTY_ON_HAND,
SUM(QTY_ON_HAND)
FROM
(select CLIENT_ID, ID, QTY_ON_HAND
FROM INVENTORY
WHERE CLIENT_ID = '(CLIENT ID HERE)')
GROUP BY QTY_ON_HAND
I would appreciate it if anyone could show me a simple way to do this.

I do not have a test DB at hand, but it should be this:
select
ID,
sum(QTY) as TOTAL
from
YourTableName
group by
ID;
YourTableName is the name of the table with the two columns ID and QTY. Use the full table name where needed; it may be something like dbo.yourtablename.
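A quick way to confirm the GROUP BY query is to run it against the question's sample rows. This is a minimal sketch with Python's sqlite3 module; the table name INVENTORY is borrowed from the question's query, while the two-column layout (ID, QTY) is a simplification of the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INVENTORY (ID INTEGER, QTY INTEGER)")
conn.executemany(
    "INSERT INTO INVENTORY VALUES (?, ?)",
    [(123, 456), (123, 634), (123, 4235), (234, 67), (234, 735), (234, 666)],
)

# Group by the ID only; grouping by the quantity itself (as in the
# question's attempt) would produce one group per distinct quantity.
totals = conn.execute(
    "SELECT ID, SUM(QTY) AS TOTAL FROM INVENTORY GROUP BY ID ORDER BY ID"
).fetchall()
for row in totals:
    print(row)
```

The result matches the totals asked for in the question: 5325 for ID 123 and 1468 for ID 234.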

Full Join and sum 2 queries in access

I am very new to this kind of complicated join and summing function.
I have 2 queries that return the same fields (i.e. ProdID, ProdName, NetQty, Unit).
The 1st query contains OpStock and the 2nd one contains Purchase.
I want to combine these 2 queries into a single result so that I can see the current stock.
The data for the 1st table looks like this:
ProdID ProdName Qty
100 Rose 700
101 Lilly 550
103 Jasmine 600
105 Lavender 400
The data for the 2nd table looks like this:
ProdID ProdName Qty
100 Rose 400
101 Lilly 250
104 Lotus 1000
106 MariGold 400
The final data for the combined (3rd) table should look like this:
ProdID ProdName Qty
100 Rose 1100
101 Lilly 800
103 Jasmine 600
104 Lotus 1000
105 Lavender 400
106 MariGold 400
How can I achieve this using SQL in Access 2003?
Thanks.
I am very sorry, Ciaran.
This is purely an Access database used from vb.net.
Here is my Access query 1:
SELECT sp_OpenIandP_Prod.ProdID,
sp_OpenIandP_Prod.ProdName,
Sum(([sp_OpenIandP_Prod.SumOfNetQty]-[sp_OpenSales_Prod.SumOfNetQty])) AS NetQty,
sp_OpenIandP_Prod.UnitSName
FROM sp_OpenIandP_Prod
INNER JOIN sp_OpenSales_Prod ON sp_OpenIandP_Prod.ProdID=sp_OpenSales_Prod.ProdID
GROUP BY sp_OpenIandP_Prod.ProdID,
sp_OpenIandP_Prod.ProdName,
sp_OpenIandP_Prod.UnitSName;
The 1st query result would be like this:
ProdID ProdName NetQty UnitSName
1 Rose 0 Kgs
2 Lilly 7125 Mts
3 Lotus 12374 Nos
The second query is:
SELECT Products.ProdID, Products.ProdName,
Sum(OPDDetails.NetQty) AS SumOfNetQty, Units.UnitSName
FROM Units
INNER JOIN (Products
INNER JOIN (Brands
INNER JOIN OPDDetails ON Brands.BrID=OPDDetails.BrandID)
ON Products.ProdID=Brands.ProdID)
ON Units.UnitID=Products.UnitID
WHERE (((OPDDetails.PurID)>0)
AND ((OPDDetails.OPDDate)>=[StartDate] And (OPDDetails.OPDDate)<=[EndDate]))
GROUP BY Products.ProdID, Products.ProdName, Units.UnitSName;
and the result set would be like this:
ProdID ProdName SumOfNetQty UnitSName
1 Rose 1800 Kgs
2 Lilly 21000 Mts
I got the above results easily.
But it is not as easy to get a combined result like this:
ProdID ProdName SumOfNetQty UnitSName
1 Rose 1800 Kgs
2 Lilly 28125 Mts
3 Lotus 12374 Nos
That's all. Thanks again.

This is not a vb.net question; however, you need to use a UNION ALL inside a derived table and then sum over it:
Select ProdId, ProdName, Sum(Qty) As QtySum
From (Select ProdId, ProdName, Qty From TableA
Union All
Select ProdId, ProdName, Qty From TableB) DerivedView
Group By ProdId, ProdName

You can create a UNION between your 2 queries then group by product:
SELECT ProdID, ProdName, Sum(NetQty) As NetQty, UnitSName
FROM
(
SELECT sp_OpenIandP_Prod.ProdID,
sp_OpenIandP_Prod.ProdName,
Sum(([sp_OpenIandP_Prod.SumOfNetQty]-[sp_OpenSales_Prod.SumOfNetQty])) AS NetQty,
sp_OpenIandP_Prod.UnitSName
FROM sp_OpenIandP_Prod
INNER JOIN sp_OpenSales_Prod ON sp_OpenIandP_Prod.ProdID=sp_OpenSales_Prod.ProdID
GROUP BY sp_OpenIandP_Prod.ProdID,
sp_OpenIandP_Prod.ProdName,
sp_OpenIandP_Prod.UnitSName
UNION ALL
SELECT Products.ProdID, Products.ProdName,
Sum(OPDDetails.NetQty) AS NetQty, Units.UnitSName
FROM Units
INNER JOIN (Products
INNER JOIN (Brands
INNER JOIN OPDDetails ON Brands.BrID=OPDDetails.BrandID)
ON Products.ProdID=Brands.ProdID)
ON Units.UnitID=Products.UnitID
WHERE (((OPDDetails.PurID)>0)
AND ((OPDDetails.OPDDate)>=[StartDate] And (OPDDetails.OPDDate)<=[EndDate]))
GROUP BY Products.ProdID, Products.ProdName, Units.UnitSName
)
GROUP BY ProdID, ProdName, UnitSName
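The UNION ALL plus GROUP BY pattern can be checked against the flower data from the question. This is a minimal sketch using Python's sqlite3 module rather than Access, so the syntax may need small adjustments there. The table names OpStock and Purchase are hypothetical stand-ins for the two saved queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OpStock  (ProdID INTEGER, ProdName TEXT, Qty INTEGER);
CREATE TABLE Purchase (ProdID INTEGER, ProdName TEXT, Qty INTEGER);
INSERT INTO OpStock VALUES
  (100, 'Rose', 700), (101, 'Lilly', 550),
  (103, 'Jasmine', 600), (105, 'Lavender', 400);
INSERT INTO Purchase VALUES
  (100, 'Rose', 400), (101, 'Lilly', 250),
  (104, 'Lotus', 1000), (106, 'MariGold', 400);
""")

# UNION ALL stacks the rows of both sources (keeping duplicates),
# then the outer query sums per product, like a full join with totals.
stock_query = """
SELECT ProdID, ProdName, SUM(Qty) AS Qty
FROM (
    SELECT ProdID, ProdName, Qty FROM OpStock
    UNION ALL
    SELECT ProdID, ProdName, Qty FROM Purchase
) AS Combined
GROUP BY ProdID, ProdName
ORDER BY ProdID
"""
current_stock = conn.execute(stock_query).fetchall()
for row in current_stock:
    print(row)
```

Products that appear in only one source (Jasmine, Lotus, Lavender, MariGold) still show up, which is the effect the question was hoping to get from a full join.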
