Select Top Row in a Group By without any numerical aggregation - sql

I am trying to build an Item table in Access. I have item number, mfg name, and product description. I would like my PK to be item number and mfg name. However, I have about 5k cases where the product description creates duplicates because of a slight variation in the description itself. I would like Access to create the table by grouping all items on item number and mfg name and then selecting the first result.
NOTE: the method I have attempted below uses MIN/MAX. This does NOT have to be the method suggested. The ultimate goal is to select the top row, or any single row, for each group. So if I have 2 rows for part number 123 with 2 different product descriptions, I just want one of those descriptions to display. It does NOT matter which one.
Example:
Item_Num, MFG_Name, Product_Desc
414001000, AMBU INC., ASCOPE 3,LARGE,5.8/2.8 5EA/BX
414001000, AMBU INC., ASCOPE 3,LARGE,5.8/2.8 5EA/BX
06L21-01, ABBOTT LABORATORIES INC, 07K0040AT HAVAB-M CALB 4ML RX
06L21-01, ABBOTT LABORATORIES INC, ARCHITECT HAVAB-M CALB 4ML RX
Ideally, this is my result:
Item_Num, MFG_Name, Product_Desc
414001000, AMBU INC., ASCOPE 3,LARGE,5.8/2.8 5EA/BX
06L21-01, ABBOTT LABORATORIES INC, 07K0040AT HAVAB-M CALB 4ML RX
My idea so far is to count the length of the description to quantify it, then use MIN/MAX to select the one that is desired. My code so far is:
SELECT
x.distributor_item_number,
x.mfg_item_number,
x.mfg_name,
x.distributor_product_description,
min(x.[LENGTH OF DESC])
INTO Product_Table
FROM [Product Table] AS x
INNER JOIN
(SELECT p.distributor_item_number,
max(p.[LENGTH OF DESC]) AS [MAX LENGTH]
FROM [Product Table] AS p
GROUP BY p.distributor_item_number) AS y ON (y.distributor_item_number = x.distributor_item_number) AND (y.[MAX LENGTH] = X.[LENGTH OF DESC])
GROUP BY x.distributor_item_number, x.mfg_item_number, x.mfg_name, x.distributor_product_description;
However, it doesn't seem to be working. I am still having duplicates in the data.
Any help would be wonderful.

I ended up adding a sequencing number in the select statement that would sequence for each group. Then I just selected the first row of each sequence. Code below
SELECT
p1.mfg_item_number,
p1.mfg_name,
p1.distributor_product_description,
Count(*) AS Seq
INTO Clean_Product_Table
FROM Product_Table AS p1
INNER JOIN Product_Table AS p2 ON (p2.mfg_item_number = p1.mfg_item_number)
AND (P2.MFG_NAME = P1.MFG_NAME)
AND (p2.InoSeq <= p1.InoSeq)
GROUP BY p1.mfg_item_number,
p1.mfg_name,
p1.distributor_product_description
HAVING COUNT(*) = 1
ORDER BY 1, 2;
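Since the question says any one description per group will do, a plain GROUP BY with MIN() on the text column may be enough, because MIN() also works on strings. The sketch below is an assumption-laden illustration, not the accepted method: it uses Python's sqlite3 module as a stand-in for Access, the simplified column names from the question's example, and the four sample rows shown there. In Access itself, the First() aggregate could play the same role as MIN() here.

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_Table (Item_Num TEXT, MFG_Name TEXT, Product_Desc TEXT)"
)

# The four sample rows from the question.
rows = [
    ("414001000", "AMBU INC.", "ASCOPE 3,LARGE,5.8/2.8 5EA/BX"),
    ("414001000", "AMBU INC.", "ASCOPE 3,LARGE,5.8/2.8 5EA/BX"),
    ("06L21-01", "ABBOTT LABORATORIES INC", "07K0040AT HAVAB-M CALB 4ML RX"),
    ("06L21-01", "ABBOTT LABORATORIES INC", "ARCHITECT HAVAB-M CALB 4ML RX"),
]
conn.executemany("INSERT INTO Product_Table VALUES (?, ?, ?)", rows)

# One row per (Item_Num, MFG_Name); MIN() picks the alphabetically first
# description, which is an arbitrary but deterministic choice.
clean = conn.execute(
    """
    SELECT Item_Num, MFG_Name, MIN(Product_Desc) AS Product_Desc
    FROM Product_Table
    GROUP BY Item_Num, MFG_Name
    ORDER BY Item_Num
    """
).fetchall()

for row in clean:
    print(row)
```

Each item number and manufacturer pair appears once in the result, so the pair can then serve as the primary key.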

Related

Find duplicate values in SQL search

I'm having trouble finding the right solution for this problem. I want to find duplicate invoices in a list of thousands of invoices, to make sure we're not paying the same invoice twice. The same invoice may have been scanned into our system twice, with a different invoice number but the same supplier id. I have tried to use COUNT to find all the invoices that have the same supplier id and the same amount, but cannot get it to work.
(In the example I want to find the two Johns Bakery invoices)
You would seem to want:
select ip.*
from invoice_payment ip
where exists (select 1
from invoice_payment ip2
where ip2.supplier_id = ip.supplier_id and
ip2.name = ip.name and
ip2.amount = ip.amount and
ip2.invoice_number <> ip.invoice_number
)
order by supplier_id, name, amount;
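To see what the EXISTS query returns, here is a small sketch run through Python's sqlite3 module. The table layout follows the column names used in the answer; the rows themselves (invoice numbers, the second supplier, the amounts) are hypothetical and exist only to exercise the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_payment ("
    "invoice_number TEXT, supplier_id INTEGER, name TEXT, amount INTEGER)"
)

# Hypothetical sample rows: the first two differ only in invoice number.
conn.executemany(
    "INSERT INTO invoice_payment VALUES (?, ?, ?, ?)",
    [
        ("INV-1001", 1, "Johns Bakery", 250),
        ("INV-1002", 1, "Johns Bakery", 250),
        ("INV-1003", 1, "Johns Bakery", 90),
        ("INV-2001", 2, "City Dairy", 250),
    ],
)

# A row is a suspect when another row shares supplier, name and amount
# but carries a different invoice number.
suspects = conn.execute(
    """
    select ip.invoice_number, ip.name, ip.amount
    from invoice_payment ip
    where exists (select 1
                  from invoice_payment ip2
                  where ip2.supplier_id = ip.supplier_id and
                        ip2.name = ip.name and
                        ip2.amount = ip.amount and
                        ip2.invoice_number <> ip.invoice_number)
    order by ip.invoice_number
    """
).fetchall()

for row in suspects:
    print(row)
```

Only the two matching Johns Bakery rows come back; the 90 invoice and the other supplier's invoice with the same amount are left out.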

T-SQL JOIN Table On Self Based on Closest Date

Thank you in advance for reading!
The question I'm trying to answer is: "How much do parts really cost to make?" We manufacture by machining raw metal billets down to metal parts. Final parts are sold to a customer and scrap metal from the process is sold to the scrap yard.
For business/ERP configuration reasons our scrap vendor is listed as a customer and we ship him 'parts' like our other customers. These dummy parts are simply for each of the metal alloys we work with, so there is one dummy scrap part for each alloy we use. The scrap shipments are made whenever we fill our scrap bins so there's no defined time interval.
I'm trying to connect the ship date of a real part to a real customer to the closest scrap ship date of the same alloy. Then I can grab the scrap value per pound we were paid and include it in our revenue for the parts we make. If I can ask for the world it would be helpful to know how to grab the scrap shipment immediately before or immediately after the shipment of a real part - I'm sure management will change their minds several times debating if they want to use the 'before' or 'after' number.
I've tried other solutions and can't get them to work. I'm crying uncle, I simply can't get it to work....the web SQL interface our ERP uses claims it's T-SQL... thank you for reading this far!
What I'd like the output to look like is:
Customer Part Price Alloy Weight_Lost Scrap_Value Ship_Date
ABC Widget1 99.99 C182 63 2.45 10-01-2016
Here's the simplest I can boil the tables down to:
SELECT
tbl_Regular_Sales.Customer
,tbl_Regular_Sales.Part
,tbl_Regular_Sales.Price
,tbl_Regular_Sales.Alloy
,tbl_Regular_Sales.Weight_Lost
,tbl_Scrap_Sales.Price AS 'Scrap_Value'
,tbl_Regular_Sales.Ship_Date
FROM
(SELECT P.Part
,P.Alloy
,P.Price
,S.Ship_Date
,S.Customer
FROM Part AS P
JOIN Shipper AS S
ON S.Part_Key = P.Part_Key
WHERE S.Customer = 'Scrap_Yard'
) AS tbl_Scrap_Sales
JOIN
(SELECT P.Part
,P.Weight_Lost
,P.Alloy
,P.Price
,S.Ship_Date
,S.Customer
FROM Part AS P
JOIN Shipper AS S
ON S.Part_Key = P.Part_Key
WHERE S.Customer <> 'Scrap_Yard' ) AS tbl_Regular_Sales
ON
tbl_Regular_Sales.Alloy = tbl_Scrap_Sales.Alloy
AND <Some kind of date JOIN to get the closest scrap shipment value>
Something like this may do the trick:
WITH cteScrapSales AS (
SELECT
P.Alloy
,P.Price
,S.Ship_Date
FROM Part AS P
JOIN Shipper AS S ON S.Part_Key = P.Part_Key
WHERE S.Customer = 'Scrap_Yard'
), cteRegularSales AS (
SELECT
P.Part_Key
,P.Part
,P.Weight_Lost
,P.Alloy
,P.Price
,S.Ship_Date
,S.Customer
FROM Part AS P
JOIN Shipper AS S ON S.Part_Key = P.Part_Key
WHERE S.Customer <> 'Scrap_Yard'
)
SELECT
C.Customer
,C.Part
,C.Price
,C.Alloy
,C.Weight_Lost
,C.Scrap_Value
,C.Ship_Date
FROM (
SELECT R.*, S.Price AS Scrap_Value, ROW_NUMBER() OVER (PARTITION BY R.Part_Key ORDER BY DATEDIFF(SECOND, R.Ship_Date, S.Ship_Date)) ix
FROM cteRegularSales R
JOIN cteScrapSales S ON S.Alloy = R.Alloy AND S.Ship_Date > R.Ship_Date
) AS C
WHERE C.ix = 1;
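Because the question also asks how to switch between the scrap shipment immediately before and immediately after a part shipment, the following sketch shows both side by side. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than T-SQL, it flattens the two CTEs into hypothetical tables regular_sales and scrap_sales, and apart from the 2.45 price and the 2016-10-01 ship date taken from the desired output, the data is made up. Each direction is a correlated subquery that sorts the candidate scrap shipments by date and keeps one; in T-SQL the same idea is usually written with OUTER APPLY and TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, flattened versions of the two CTEs.
    CREATE TABLE regular_sales (customer TEXT, part TEXT, alloy TEXT, ship_date TEXT);
    CREATE TABLE scrap_sales (alloy TEXT, price REAL, ship_date TEXT);

    INSERT INTO regular_sales VALUES ('ABC', 'Widget1', 'C182', '2016-10-01');
    INSERT INTO scrap_sales VALUES
        ('C182', 2.10, '2016-09-20'),
        ('C182', 2.45, '2016-10-05'),
        ('C182', 2.60, '2016-11-01'),
        ('C101', 9.99, '2016-09-30');
    """
)

# ISO-formatted date strings compare correctly as text, so plain
# comparison operators are enough to order the shipments.
result = conn.execute(
    """
    SELECT r.customer, r.part, r.ship_date,
           (SELECT s.price FROM scrap_sales s
             WHERE s.alloy = r.alloy AND s.ship_date <= r.ship_date
             ORDER BY s.ship_date DESC LIMIT 1) AS scrap_before,
           (SELECT s.price FROM scrap_sales s
             WHERE s.alloy = r.alloy AND s.ship_date > r.ship_date
             ORDER BY s.ship_date ASC LIMIT 1) AS scrap_after
    FROM regular_sales r
    """
).fetchall()

for row in result:
    print(row)
```

Keeping both columns in the result means the choice between the 'before' and 'after' price can be made later without rewriting the join. If no scrap shipment exists on one side, that column is NULL.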

Using the HAVING Clause with GROUP by to Return Unique Records

Good day all,
I am having difficulty understanding the mechanics of the GROUP BY and HAVING clauses and was hoping for some advice.
I am trying to query two tables - PRODUCTS and ORDER_ITEMS. The PRODUCT_ID column is used to link these two tables.
I wish to view products which have been ordered from a certain supplier (filtered using the SUPPLIER_ID column which is in ORDER_ITEMS); have been successfully ordered before (ORDER_STATUS 6 in ORDER_ITEMS); and which have not been deleted (RECORD_DELETED column in ORDER_ITEMS). I only use the PRODUCTS table to show the name of the product. Furthermore, I only want distinct products returned, meaning I want to exclude any results which duplicate the PRODUCT_ID column.
This is the query that I am using:
SELECT
PD.PRODUCT_ID,
PD.PRODUCT_NAME,
PD.BARCODE,
PD.SUPPLIER_BARCODE,
COUNT(PD.PRODUCT_ID) AS COUNTED,
ODI.ORDER_ITEM_ID
FROM PRODUCTS PD
INNER JOIN ORDER_ITEMS ODI
ON PD.PRODUCT_ID = ODI.PRODUCT_ID
WHERE ODI.SUPPLIER_ID = 34359738399
AND ORDER_STATUS = 6
AND ODI.RECORD_DELETED = 0
GROUP BY PD.PRODUCT_ID,PD.PRODUCT_NAME,PD.BARCODE,PD.SUPPLIER_BARCODE,ODI.ORDER_ITEM_ID
HAVING COUNT(ODI.PRODUCT_ID) = 1
ORDER BY PRODUCT_ID ASC
Unfortunately this is returning 502 records with many of them duplicating the PRODUCT_ID. If I remove the ORDER_ITEM_ID column from the query 175 records are returned. These 175 records are products that meet the criteria given above. The problem is that I also need to pull the ORDER_ITEM_ID from ORDER_ITEMS (along with some other columns).
I vaguely understand that when I include ORDER_ITEM_ID the query is going to group the data by the ORDER_ITEM_ID column and so will count the PRODUCT_ID values for each individual ORDER_ITEM_ID. This results in there always being a count of 1 for each product.
How does one get around this? Also, is there a more suitable way of carrying out this task which would allow me to include one ORDER_ITEM record for every duplicated product? Rather than omitting them altogether as I am doing above?
This is some of the data that is returned by the query above:
PRODUCT_ID,PRODUCT_NAME,BARCODE,SUPPLIER_BARCODE,COUNTED,ORDER_ITEM_ID
34359738628,ADCORTYL INTRA-ARTIC/DERMAL 10MG/ML 5ML,5099627022132,5012712000037,1,34359755708
34359739609,ARTELAC 3.2MG/ML EYE DROPS SOLN,5099627456722,5027519008933,1,34359741719
34359739626,ASACOLON 500MG SUPPOSITORIES,5099627516587,5015313012737,1,34359742783
34359739767,ATROVENT 250MCG/1ML UDV NEB SOLN,5099627639637,5012816012561,1,34359738421
34359739770,ATROVENT 500MCG/2ML UDV NEB SOLN,5099627460293,5012816012592,1,34359743524
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749091
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749687
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749715
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359754053
34359740053,BACTIGRAS MED DRSS 10CMX10CM STERILE GMS,5099627672368,5000223421984,1,34359748101
34359740062,BACTROBAN 2% OINTMENT,5099627053914,5099211003165,1,34359755226
34359740558,BETNOVATE RD CREAM,5099627005692,5099211001642,1,34359752422
34359740558,BETNOVATE RD CREAM,5099627005692,5099211001642,1,34359738487
34359741045,BISODOL ANTACID TABS,5099627057707,5014398001438,1,34359750542
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359746555
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359751650
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359751783
34359742132,BURINEX 1MG TABS,5099627551328,5702191004212,1,34359749705
34359742152,BUSCOPAN 20MG/ML SOLN FOR INJ,5099627006620,5012816018532,1,34359749083
In the example above, several records were returned with duplicate PRODUCT_ID values, e.g. AZOPT 10MG/ML EYE DROPS SUSP.
You need a GROUP_CONCAT/LISTAGG equivalent in SQL Server. You can use FOR XML PATH, STUFF and a correlated subquery as a replacement.
If PRODUCT_ID is UNIQUE you can use:
WITH cte AS
(
SELECT
PD.PRODUCT_ID,
PD.PRODUCT_NAME,
PD.BARCODE,
PD.SUPPLIER_BARCODE,
ODI.ORDER_ITEM_ID
FROM PRODUCTS PD
JOIN ORDER_ITEMS ODI
ON PD.PRODUCT_ID = ODI.PRODUCT_ID
WHERE ODI.SUPPLIER_ID = 34359738399
AND ORDER_STATUS = 6
AND ODI.RECORD_DELETED = 0
)
SELECT PRODUCT_ID,
PRODUCT_NAME,
BARCODE,
SUPPLIER_BARCODE,
[COUNTED] = COUNT(c1.PRODUCT_ID),
[ORDER_ITEM_ID] = STUFF((SELECT CONCAT(',' , ORDER_ITEM_ID)
FROM cte c2
WHERE c2.PRODUCT_ID = c1.PRODUCT_ID
ORDER BY c2.ORDER_ITEM_ID
FOR XML PATH ('')), 1, 1, '')
FROM cte c1
GROUP BY PRODUCT_ID,PRODUCT_NAME,BARCODE,SUPPLIER_BARCODE
HAVING COUNT(PRODUCT_ID) = 1
ORDER BY PRODUCT_ID ASC;
Otherwise correlate using multiple columns:
[ORDER_ITEM_ID] = STUFF((SELECT CONCAT(',' , ORDER_ITEM_ID)
FROM cte c2
WHERE c2.PRODUCT_ID = c1.PRODUCT_ID
AND c2.PRODUCT_NAME = c1.PRODUCT_NAME
AND ...
ORDER BY c2.ORDER_ITEM_ID
FOR XML PATH ('')), 1, 1, '')
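The question also asks for a way to keep one ORDER_ITEMS record per product instead of a concatenated list. One possible approach, sketched here and not part of the answer above, is to group by the product columns only and take MIN(ORDER_ITEM_ID) as the representative order item. The sketch runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and filtered_items is a hypothetical table holding rows that already passed the supplier, status and deleted filters. The IDs and names are taken from the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-filtered join of PRODUCTS and ORDER_ITEMS.
conn.execute(
    "CREATE TABLE filtered_items ("
    "PRODUCT_ID INTEGER, PRODUCT_NAME TEXT, ORDER_ITEM_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO filtered_items VALUES (?, ?, ?)",
    [
        (34359739626, "ASACOLON 500MG SUPPOSITORIES", 34359742783),
        (34359739893, "AZOPT 10MG/ML EYE DROPS SUSP", 34359749091),
        (34359739893, "AZOPT 10MG/ML EYE DROPS SUSP", 34359749687),
        (34359740558, "BETNOVATE RD CREAM", 34359752422),
        (34359740558, "BETNOVATE RD CREAM", 34359738487),
    ],
)

# ORDER_ITEM_ID is aggregated rather than grouped on, so each product
# collapses to one row; COUNTED shows how many order items it had.
summary = conn.execute(
    """
    SELECT PRODUCT_ID,
           PRODUCT_NAME,
           COUNT(*) AS COUNTED,
           MIN(ORDER_ITEM_ID) AS FIRST_ORDER_ITEM_ID
    FROM filtered_items
    GROUP BY PRODUCT_ID, PRODUCT_NAME
    ORDER BY PRODUCT_ID
    """
).fetchall()

for row in summary:
    print(row)
```

If other ORDER_ITEMS columns are needed as well, the selected ORDER_ITEM_ID can be joined back to ORDER_ITEMS to fetch the rest of that row.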

SSRS is removing multiple lines in grouping

I have an SSRS report with the following query:
SELECT DISTINCT
Rtrim(ltrim(CUSTNAME)) as 'CUSTNAME',
ItemName,
ISNULL(NAME, LOGCREATEDBY) AS 'Modified By'
,b.ITEMID as 'Item Id'
,[PRICE_NEW] as 'New Price'
,[PRICE_OLD] as 'Old Price'
,[PRICEUNIT_NEW] as 'New Unit Price'
,[PRICEUNIT_OLD] as 'Old Unit Price'
,LOGCREATEDDATE as 'Created Date'
,LOGCREATEDTIME
,(select Description from Dimensions where a.Dimension2_ = Dimensions.Num) as 'Division'
,(Select TOP 1 INVENTTRANS.DATEFINANCIAL From INVENTTRANS Where
INVENTTRANS.ITEMID = B.ITEMID and InvoiceID like 'Inv%' order by INVENTTRANS.DATEFINANCIAL desc) As 'LastInvoice'
FROM PMF_INVENTTABLEMODULELOG AS b
LEFT JOIN USERINFO ON ID = LOGCREATEDBY
LEFT JOIN INVENTTABLE AS a on a.ITEMID in (b.itemId)
WHERE LOGCREATEDDATE between @beginCreatedDate and @endCreatedDate
and a.dimension2_ in (@dimension)
order by LOGCREATEDDATE,LOGCREATEDTIME desc
In short, it goes through a table, picks out an item number and lists each price change for that item.
The query, when run, will return something like:
CUSTNAME    | Modified By | Item ID | New Price | Old Price
------------------------------------------------------------------
Performance | Joe         | 12345   | 21.50     | 21.49
Performance | Mary        | 12345   | 21.49     | 19.10
(This happens to be the result that is causing the problem.)
My report lists each line by division, customer name and item number. The problem is that when I have an Item ID group, it adds up the total (which makes sense). So I got rid of the item number group, but now it lists only one item per customer!
It should show the two lines for Performance in the example, but instead it lists neither. I would like it to show every single line for each customer. It must be the Item ID group, but I can't seem to get it right.
Rather than getting rid of the group, change it to show detail data.
Right-click on the group, select 'Group Properties', and select the Group On expression. Then click the Delete button. It will then no longer sum, as it is a detail group.
I would recommend that you then remove Sum from the relevant expressions to avoid confusion, as they will only be summing single values but will make it look otherwise.

How can I modify my SQL subquery to get my query to work?

So basically, I know SQL doesn't allow this, but I wish I could do this because it's the only way I can think of to make my query.
So for example, say that there are 2 delivery trucks heading to the address '55 Alaska Rd.' with some items to deliver.
1 truck has 100 iPads and 200 iPhones
1 truck has 150 iPads
I am happily monitoring them by running this query:
select truck.truck_id,
truck.driver_name,
truck.current_location,
item.prtnum,
item.quantity
from truck,
box,
item,
shipment
where item.box_id = box.box_id
and box.truck_id = truck.truck_id
and truck.ship_adr = shipment.ship_adr
and shipment.ship_adr = '55 Alaska Rd.'
It tells me who my 2 guys are, their current location, and what they're carrying. It returns 3 rows:
| Truck ID | Driver Name | Current Location | Item Number | Item Quantity |
|---TRK83--|---Gene R.---|------Hwy 18------|----iPad-----|------100------|
|---TRK83--|---Gene R.---|------Hwy 18------|---iPhone----|------200------|
|---TRK59--|---Jill M.---|------Hwy 894-----|----iPad-----|------150------|
Then my manager calls me and DEMANDS that I send him this same query, but modified so that it only returns trucks that have 1 item on it. So in this instance, he only wants the last row to be returned, because it only has iPads on it, and the other one has iPads and iPhones.
This is how I wish I could do it.
select t.truck_id,
t.driver_name,
t.current_location,
i.prtnum,
i.quantity
from item i,
box b,
truck t,
shipment s
where i.box_id = b.box_id
and b.truck_id = t.truck_id
and t.truck_id in (select tr.truck_id,
decode(max(it.prtnum), min(it.prtnum), max(it.prtnum), 'Mixed Items') prtnum
from item it,
box bo,
truck tr
where it.box_id = bo.box_id
and bo.truck_id = tr.truck_id
and tr.truck_id = t.truck_id
and prtnum != 'Mixed Items'
group by tr.truck_id) p
and t.ship_adr = s.ship_adr
and s.ship_adr = '55 Alaska Rd.'
That subquery is supposed to be selecting only the trucks in the parent query that do not have Mixed Parts on it. But that doesn't work because:
I can only have "tr.truck_id" in the subquery select; the decode can't also be there, but I don't know where else to put it.
I can't use the alias "prtnum" like that.
Does anyone know how I can achieve what my hypothetical boss wants me to do? Does anyone have any ideas on how I can alter the query to make it only select the trucks that have 1 item in it? I am going to have to change a lot of queries to do this, and I just can't figure out a good clean way. Or even a bad way that works.
Thank you for reading, and thank you for any help!
You can create a group for each truck, and then per truck, demand that they carry only one type of item. Group-wide conditions are set with the having clause. Here's an example:
select truck.truck_id
, truck.driver_name
, truck.current_location
, sum(item.quantity) as sum_quantity
from truck
join box
on box.truck_id = truck.truck_id
join item
on item.box_id = box.box_id
where truck.ship_adr = '55 Alaska Rd.'
group by
truck.truck_id
, truck.driver_name
, truck.current_location
having count(distinct item.prtnum) = 1 -- Only one item type for this truck
There's no need to join the shipment table. You can use the ship_adr from the truck table to filter on address.
I've added the sum of all quantities per truck to show how you can display group-wide statistics in addition to filtering on them.
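If the original row-per-item output is still needed, the same HAVING condition can be moved into an IN subquery, which is close to what the question was trying to write. The following is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than Oracle, the trucks and items come from the question's example, and the box ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE truck (truck_id TEXT, driver_name TEXT,
                        current_location TEXT, ship_adr TEXT);
    CREATE TABLE box (box_id INTEGER, truck_id TEXT);
    CREATE TABLE item (box_id INTEGER, prtnum TEXT, quantity INTEGER);

    INSERT INTO truck VALUES
        ('TRK83', 'Gene R.', 'Hwy 18', '55 Alaska Rd.'),
        ('TRK59', 'Jill M.', 'Hwy 894', '55 Alaska Rd.');
    -- Box ids are made up for the illustration.
    INSERT INTO box VALUES (1, 'TRK83'), (2, 'TRK83'), (3, 'TRK59');
    INSERT INTO item VALUES (1, 'iPad', 100), (2, 'iPhone', 200), (3, 'iPad', 150);
    """
)

# The subquery returns only truck_id, so it is valid inside IN;
# the "one item type" rule lives in its HAVING clause.
single_item_trucks = conn.execute(
    """
    select t.truck_id, t.driver_name, t.current_location, i.prtnum, i.quantity
    from truck t
    join box b on b.truck_id = t.truck_id
    join item i on i.box_id = b.box_id
    where t.ship_adr = '55 Alaska Rd.'
      and t.truck_id in (select b2.truck_id
                         from box b2
                         join item i2 on i2.box_id = b2.box_id
                         group by b2.truck_id
                         having count(distinct i2.prtnum) = 1)
    """
).fetchall()

for row in single_item_trucks:
    print(row)
```

Only Jill M.'s truck is returned, with its item-level columns intact, so the shape of the original report does not change.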
Not having your dataset makes it a bit awkward but, as you only want the single count rows a GROUP BY with a trailing HAVING clause may fix the problem:
SELECT truck.truck_id,
truck.driver_name,
truck.current_location,
MAX(item.prtnum) AS prtnum, -- the truck's only part number, given the HAVING filter
SUM(item.quantity) AS quantity -- total quantity of that part on the truck
FROM truck,
box,
item,
shipment
WHERE item.box_id = box.box_id
AND box.truck_id = truck.truck_id
AND truck.ship_adr = shipment.ship_adr
AND shipment.ship_adr = '55 Alaska Rd.'
GROUP BY truck.truck_id,
truck.driver_name,
truck.current_location
HAVING COUNT(DISTINCT item.prtnum) = 1
Sounds like a job for analytic functions:
with truck_info as (select truck.truck_id,
truck.driver_name,
truck.current_location,
item.prtnum,
item.quantity,
count(distinct item.prtnum) over (partition by truck.truck_id, truck.driver_name, truck.current_location) cnt
from truck,
box,
item,
shipment
where item.box_id = box.box_id
and box.truck_id = truck.truck_id
and truck.ship_adr = shipment.ship_adr
and shipment.ship_adr = '55 Alaska Rd.')
select truck_id,
driver_name,
current_location,
prtnum,
quantity
from truck_info
where cnt = 1;
N.B. untested
If you've never come across analytic functions before, they're well worth learning about!