Removing Duplicate Rows in SQL


I have a query that returns a list of devices that have multiple "moved" dates. I only want the oldest date entry. I used the MIN function to give me the oldest date, but I'm still getting multiple entries (though fewer than before). I tried to make the JOIN more precise, but I couldn't narrow the fields down any further.
If you look at the screenshot, the first three rows have the same "wonum" but three different "Moved Dates." I am thinking that if I can somehow take the oldest "Moved Date" out of those three and remove the other rows, that would give me the result I'm looking for. However, I'm not skilled enough to do that (I've only been working in SQL for a few months). Would that work, or is there a better way to narrow down my results? I'm wondering if I need some kind of sub-query to get what I need.
I've looked around but can't find anything that lets me remove a row of data the way I want to. Nor can I find a reason why my MIN function isn't paring down the data any more than it is. Below is the code I'm currently using. Thanks for any help.
SELECT wo.wonum, wo.location, wo.statusdate, wo.status, l.subcontractor,
wo.description, MIN(ast.datemoved) AS 'Moved Date'
FROM workorder wo
JOIN locations l ON wo.location = l.location
JOIN asset a ON wo.location = a.location
-- AND wo.assetnum = a.assetnum
JOIN assettrans ast ON a.assetnum = ast.assetnum
-- AND a.assetid = ast.assetid
WHERE wo.description LIKE '%deteriorating%'
AND wo.status != 'close'
GROUP BY wo.wonum, wo.location, wo.statusdate,
wo.status, l.subcontractor, wo.description
ORDER BY wo.wonum;
[Screenshot: DBV SQL query result]
[Update: table data]

You need to do the grouping inside a subquery in your join (not tested, but you'll get the idea).
Replace
JOIN assettrans ast ON a.assetnum = ast.assetnum
with
INNER JOIN
(
SELECT ast.assetnum, MIN(ast.datemoved) AS MovedDate
FROM assettrans ast
GROUP BY ast.assetnum
) grouped
ON a.assetnum = grouped.assetnum
So the full query looks like:
SELECT wo.wonum, wo.location, wo.statusdate, wo.status, l.subcontractor,
wo.description, grouped.MovedDate
FROM workorder wo
JOIN locations l ON wo.location = l.location
JOIN asset a ON wo.location = a.location
INNER JOIN
(
select ast.assetnum,MIN(ast.datemoved) AS MovedDate
from assettrans ast
group by ast.assetnum
) grouped
on a.assetnum = grouped.assetnum
WHERE wo.description LIKE '%deteriorating%'
AND wo.status != 'close'
ORDER BY wo.wonum;
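To see why pre-aggregating inside the join helps, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are tiny stand-ins for asset and assettrans, and the rows are made up for illustration; the real schema will differ.

```python
import sqlite3

# Hypothetical miniature versions of the asset and assettrans tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (assetnum TEXT, location TEXT);
    CREATE TABLE assettrans (assetnum TEXT, datemoved TEXT);
    INSERT INTO asset VALUES ('A1', 'LOC1'), ('A2', 'LOC2');
    INSERT INTO assettrans VALUES
        ('A1', '2015-03-01'), ('A1', '2016-07-15'), ('A1', '2017-01-20'),
        ('A2', '2016-05-05');
""")

# Joining the raw table yields one row per recorded move.
raw_rows = conn.execute("""
    SELECT a.assetnum, ast.datemoved
    FROM asset a
    JOIN assettrans ast ON a.assetnum = ast.assetnum
    ORDER BY a.assetnum, ast.datemoved
""").fetchall()

# Joining the pre-aggregated subquery yields one row per asset.
grouped_rows = conn.execute("""
    SELECT a.assetnum, grouped.MovedDate
    FROM asset a
    JOIN (
        SELECT assetnum, MIN(datemoved) AS MovedDate
        FROM assettrans
        GROUP BY assetnum
    ) grouped ON a.assetnum = grouped.assetnum
    ORDER BY a.assetnum
""").fetchall()

print(len(raw_rows))   # 4
print(grouped_rows)    # [('A1', '2015-03-01'), ('A2', '2016-05-05')]
```

Note that this only collapses the moves per asset. If several assets share a location, the outer query can still return more than one row per work order.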

Please test before using in production.
-- If you have an id column and want to keep the oldest record:
delete from T1 from MyTable T1, MyTable T2
where T1.dupField = T2.dupField -- add more filters if they apply
and T1.uniqueField > T2.uniqueField
-- If you want to delete the newer "Moved Dates" and keep the oldest one:
delete from T1 from MyTable T1, MyTable T2
where T1.dupField = T2.dupField -- add more filters if they apply
and T1.[Moved Dates] > T2.[Moved Dates]
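The same idea can be tried safely on a throwaway table first. The sketch below uses Python's sqlite3 module; because SQLite does not accept the two-table DELETE form above, it swaps in an equivalent correlated EXISTS. The table name moves and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE moves (id INTEGER PRIMARY KEY, wonum TEXT, datemoved TEXT);
    INSERT INTO moves (wonum, datemoved) VALUES
        ('W1', '2016-07-15'), ('W1', '2015-03-01'), ('W1', '2017-01-20'),
        ('W2', '2016-05-05');
""")

# Delete every row for which an older row with the same wonum exists.
conn.execute("""
    DELETE FROM moves
    WHERE EXISTS (
        SELECT 1
        FROM moves AS older
        WHERE older.wonum = moves.wonum
          AND older.datemoved < moves.datemoved
    )
""")

remaining = conn.execute(
    "SELECT wonum, datemoved FROM moves ORDER BY wonum"
).fetchall()
print(remaining)  # [('W1', '2015-03-01'), ('W2', '2016-05-05')]
```

Rows that tie on the oldest date would all survive this delete, so a unique column is the safer tie-breaker when one is available.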

Related

sql counting the number is not working correctly

I'm running related queries and the count comes out wrong. With three joins the count is correct, but when I add a fourth join and another condition, it is not. The first version returns 2:
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
GROUP BY
pxixolog_details.id
The second one doesn't count correctly; it returns 4:
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
LEFT JOIN psixologs_times ON pxixolog_details.id = psixologs_times.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
AND (psixologs_times.time = '09:00' OR psixologs_times.time = '10:00')
GROUP BY
pxixolog_details.id
What am I doing wrong?

You get double the count with four JOINs because the new (fourth) JOIN matches two records (09:00 and 10:00) for each row produced by the first three JOINs. That can lead to the observed result.
Check your data and make sure that the fourth JOIN condition yields a 1:1 match with the other data.
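The multiplication is easy to reproduce on a toy data set. This is a rough sketch with Python's sqlite3 module; direction_link and time_slot are simplified, hypothetical stand-ins for psixologs_direction and psixologs_times, with English placeholder values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE direction_link (psixolog_id INTEGER, direction TEXT);
    CREATE TABLE time_slot (psixolog_id INTEGER, time TEXT);
    INSERT INTO direction_link VALUES (1, 'sleep'), (1, 'relationships');
    INSERT INTO time_slot VALUES (1, '09:00'), (1, '10:00');
""")

# Without the extra join: one row per direction.
count_before = conn.execute(
    "SELECT COUNT(direction) FROM direction_link WHERE psixolog_id = 1"
).fetchone()[0]

# With the extra join: every direction is paired with every matching time.
joined_rows = conn.execute("""
    SELECT d.direction, t.time
    FROM direction_link d
    JOIN time_slot t ON t.psixolog_id = d.psixolog_id
    WHERE t.time IN ('09:00', '10:00')
    ORDER BY d.direction, t.time
""").fetchall()

print(count_before)      # 2
print(len(joined_rows))  # 4
```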

The last table, psixologs_times, matches multiple rows for each psixolog_id.
You can easily see this using a query:
select psixolog_id, count(*)
from psixologs_times
group by psixolog_id
having count(*) > 1;
How you fix this problem depends on what you want to do. The simplest solution is to use count(distinct):
COUNT(DISTINCT directions.direction) as procent
However, this might just be hiding the problem. You might want to choose one row from the psixologs_times table. Or pre-aggregate it. Or do something else.
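As a hedged illustration of two of those options, the sketch below (Python's sqlite3 module, hypothetical simplified tables direction_link and time_slot) compares COUNT(DISTINCT ...) with an EXISTS filter that never multiplies rows in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE direction_link (psixolog_id INTEGER, direction TEXT);
    CREATE TABLE time_slot (psixolog_id INTEGER, time TEXT);
    INSERT INTO direction_link VALUES (1, 'sleep'), (1, 'relationships');
    INSERT INTO time_slot VALUES (1, '09:00'), (1, '10:00');
""")

# Option 1: keep the join, but count each direction only once.
distinct_count = conn.execute("""
    SELECT COUNT(DISTINCT d.direction)
    FROM direction_link d
    JOIN time_slot t ON t.psixolog_id = d.psixolog_id
    WHERE t.time IN ('09:00', '10:00')
""").fetchone()[0]

# Option 2: test for a matching time with EXISTS, so no rows are duplicated.
exists_count = conn.execute("""
    SELECT COUNT(d.direction)
    FROM direction_link d
    WHERE EXISTS (
        SELECT 1
        FROM time_slot t
        WHERE t.psixolog_id = d.psixolog_id
          AND t.time IN ('09:00', '10:00')
    )
""").fetchone()[0]

print(distinct_count, exists_count)  # 2 2
```

The EXISTS form is closer to "choose one row" in spirit: it only asks whether a suitable time exists, so other aggregates in the same query are not inflated either.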

SQL Count uses info from join

I need to count the number of times InternalMenuLinkItemNumber appears per site number and per order mode. Then I need to show MenuItemID, which I do with an inner join on item numbers, but when I add this join it skews the QTY result. I've tried using DISTINCT in the COUNT, but then every QTY is 1. Please assist.
Query where the QTY result is 100% correct but there is no MenuItemID:
SELECT ST_Sites.BusinessUnit,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,InternalMenuLinkItemNumber,OrderMode,SellingPrice,COUNT(ST_SalesMixTransactions_RealTimeFeed.InternalMenuLinkItemNumber) as QTY
FROM ST_AlohaSalesMixTransactions_RealTimeFeed
inner join ST_Sites on ST_Sites.SiteNumber= [ST_SalesMixTransactions_RealTimeFeed].SiteNumber
where [ST_SalesMixTransactions_RealTimeFeed].BusinessDate between '2017-06-27' and '2017-07-03' and [ST_SalesMixTransactions_RealTimeFeed].SiteNumber = '1001006'
group by InternalMenuLinkItemNumber,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,OrderMode,SellingPrice,ST_Sites.BusinessUnit
order by InternalMenuLinkItemNumber
Result where QTY comes out as expected:
If I add the inner join to get MenuItemID:
Query:
SELECT ST_Sites.BusinessUnit,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,InternalMenuLinkItemNumber,[ST_SalesMix].MenuItemID,OrderMode,SellingPrice,COUNT(ST_SalesMixTransactions_RealTimeFeed.InternalMenuLinkItemNumber) as QTY
FROM ST_AlohaSalesMixTransactions_RealTimeFeed
inner join ST_SalesMix on [ST_AlohaSalesMixTransactions_RealTimeFeed].InternalMenuLinkItemNumber= ST_SalesMix.ItemNumber
inner join ST_Sites on ST_Sites.SiteNumber= [ST_SalesMixTransactions_RealTimeFeed].SiteNumber
where [ST_SalesMixTransactions_RealTimeFeed].BusinessDate between'2017-06-27'and'2017-07-03' and [ST_SalesMixTransactions_RealTimeFeed].SiteNumber = '1001006'
group by InternalMenuLinkItemNumber,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,OrderMode,SellingPrice,ST_Sites.BusinessUnit,[ST_SalesMix].MenuItemID
order by InternalMenuLinkItemNumber
Result where QTY is now way off:
If I use distinct:
Query:
SELECT ST_Sites.BusinessUnit,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,InternalMenuLinkItemNumber,[ST_SalesMix].MenuItemID,OrderMode,SellingPrice,COUNT(distinct ST_SalesMixTransactions_RealTimeFeed.InternalMenuLinkItemNumber) as QTY
FROM ST_AlohaSalesMixTransactions_RealTimeFeed
inner join ST_SalesMix on [ST_AlohaSalesMixTransactions_RealTimeFeed].InternalMenuLinkItemNumber= ST_SalesMix.ItemNumber
inner join ST_Sites on ST_Sites.SiteNumber= [ST_SalesMixTransactions_RealTimeFeed].SiteNumber
where [ST_SalesMixTransactions_RealTimeFeed].BusinessDate between'2017-06-27'and'2017-07-03' and [ST_SalesMixTransactions_RealTimeFeed].SiteNumber = '1001006'
group by InternalMenuLinkItemNumber,[ST_SalesMixTransactions_RealTimeFeed].SiteNumber,OrderMode,SellingPrice,ST_Sites.BusinessUnit,[ST_SalesMix].MenuItemID
order by InternalMenuLinkItemNumber
Result for QTY is now all 1:

If I understand correctly, you want something like
SELECT SiteNumber, OrderMode, count([DISTINCT?] InternalMenuLinkItemNumber)
...
GROUP BY SiteNumber, OrderMode
You want to count the InternalMenuLinkItemNumber, so InternalMenuLinkItemNumber must not occur in the GROUP BY clause.
EDIT:
When using GROUP BY, the SELECT list may only contain columns also mentioned in the GROUP BY clause, or aggregate functions (on arbitrary columns).
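A small sketch of the difference, using Python's sqlite3 module. The feed table and the order-mode values are hypothetical stand-ins for the real transaction feed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed (
        SiteNumber TEXT, OrderMode TEXT, InternalMenuLinkItemNumber INTEGER
    );
    INSERT INTO feed VALUES
        ('1001006', 'DineIn', 10), ('1001006', 'DineIn', 10),
        ('1001006', 'DineIn', 11), ('1001006', 'TakeOut', 10);
""")

# Grouping by site and order mode only: the item column is free to be counted.
per_mode = conn.execute("""
    SELECT SiteNumber, OrderMode,
           COUNT(InternalMenuLinkItemNumber),
           COUNT(DISTINCT InternalMenuLinkItemNumber)
    FROM feed
    GROUP BY SiteNumber, OrderMode
    ORDER BY OrderMode
""").fetchall()

# Grouping by the item as well: each group holds exactly one distinct item,
# so COUNT(DISTINCT ...) can only ever be 1.
per_item = conn.execute("""
    SELECT InternalMenuLinkItemNumber, OrderMode,
           COUNT(DISTINCT InternalMenuLinkItemNumber)
    FROM feed
    GROUP BY SiteNumber, OrderMode, InternalMenuLinkItemNumber
    ORDER BY OrderMode, InternalMenuLinkItemNumber
""").fetchall()

print(per_mode)  # [('1001006', 'DineIn', 3, 2), ('1001006', 'TakeOut', 1, 1)]
print(per_item)  # [(10, 'DineIn', 1), (11, 'DineIn', 1), (10, 'TakeOut', 1)]
```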

Try this:
SELECT a.InternalMenuLinkItemNumber, a.SiteNumber, a.OrderMode, a.SellingPrice, a.BusinessUnit, a.QTY, CASE WHEN MAX(b.MenuItemID) = MIN(b.MenuItemID) THEN MAX(b.MenuItemID) ELSE -1 END AS MenuItemID
FROM
(SELECT ST_Sites.BusinessUnit, [ST_SalesMixTransactions_RealTimeFeed].SiteNumber, InternalMenuLinkItemNumber, OrderMode, SellingPrice, COUNT(ST_SalesMixTransactions_RealTimeFeed.InternalMenuLinkItemNumber) as QTY
FROM ST_AlohaSalesMixTransactions_RealTimeFeed
INNER JOIN ST_Sites on ST_Sites.SiteNumber = [ST_SalesMixTransactions_RealTimeFeed].SiteNumber
WHERE [ST_SalesMixTransactions_RealTimeFeed].BusinessDate between '2017-06-27' and '2017-07-03' and [ST_SalesMixTransactions_RealTimeFeed].SiteNumber = '1001006'
GROUP BY InternalMenuLinkItemNumber, [ST_SalesMixTransactions_RealTimeFeed].SiteNumber, OrderMode, SellingPrice, ST_Sites.BusinessUnit
) a
INNER JOIN ST_SalesMix b ON a.InternalMenuLinkItemNumber = b.ItemNumber
GROUP BY a.InternalMenuLinkItemNumber, a.SiteNumber, a.OrderMode, a.SellingPrice, a.BusinessUnit, a.QTY
ORDER BY a.InternalMenuLinkItemNumber
It works on the theory that your first query gives good counts, so keep that as it is (it's now the inner query) and then do the problematic join outside of it. Obviously there are many rows from ST_SalesMix for each properly counted row in the first query, so I'm grouping on the original group list but that means that you might get multiple MenuItemIDs. I'm checking for that in the CASE statement by testing the MAX and MIN MenuItemIDs - if they are the same return MAX(MenuItemID) otherwise I'm returning -1 as an error flag to indicate that there were multiple MenuItemIDs associated with this group. It might not be the most efficient method but I didn't have much to go on.
I hope this helps.
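For anyone who wants to try the pattern in isolation, here is a stripped-down sketch using Python's sqlite3 module. The feed and salesmix tables and their rows are made up; only the count-first, join-afterwards shape and the MIN/MAX check are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed (item INTEGER);
    CREATE TABLE salesmix (ItemNumber INTEGER, MenuItemID INTEGER);
    INSERT INTO feed VALUES (10), (10), (10), (11), (11);
    -- Item 10 maps to a single MenuItemID twice; item 11 maps to two IDs.
    INSERT INTO salesmix VALUES (10, 500), (10, 500), (11, 600), (11, 601);
""")

# Joining before counting inflates QTY by the number of salesmix matches.
naive = conn.execute("""
    SELECT f.item, COUNT(f.item)
    FROM feed f
    JOIN salesmix s ON f.item = s.ItemNumber
    GROUP BY f.item
    ORDER BY f.item
""").fetchall()

# Counting first, then joining, keeps QTY intact; -1 flags ambiguous IDs.
fixed = conn.execute("""
    SELECT a.item, a.qty,
           CASE WHEN MAX(b.MenuItemID) = MIN(b.MenuItemID)
                THEN MAX(b.MenuItemID) ELSE -1 END AS MenuItemID
    FROM (SELECT item, COUNT(item) AS qty FROM feed GROUP BY item) a
    JOIN salesmix b ON a.item = b.ItemNumber
    GROUP BY a.item, a.qty
    ORDER BY a.item
""").fetchall()

print(naive)  # [(10, 6), (11, 4)]
print(fixed)  # [(10, 3, 500), (11, 2, -1)]
```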

All is sorted now. Thanks to everyone.
@jwolf, your suggested query was the answer.

Matching values from two columns with the values of two columns in a sub-query

I have a problem I can't really figure out, even though I thought I had the solution.
I think this is DB2 SQL by the way.
I have a customer number and a country code (extracted from a string using SUBSTR), and I want to exclude rows where that combination appears in the results of a subquery, like so:
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country
FROM db811.bet_utl bu
LEFT JOIN db811.henv_utl bh
ON bu.betaling_urn = bh.betaling_urn
LEFT JOIN db811.betaling_status bs
ON bu.betaling_status = bs.betaling_status
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE bu.kanal = 'N'
AND (ku.orgnr, Substr(bu.bank_account_swiftadr,5,2)) ;
...should NOT be IN the results of the query below:
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country ,
COUNT(*) AS numberof
FROM db811.bet_utl_hist bu
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE bu.kanal = 'N'
AND bu.betalingsdato > '2016-01-01'
GROUP BY ku.orgnr ,
substr(bu.bank_account_swiftadr,5,2);
This should work, I thought, but it seems to match on just one of them, and I need both to be true in order to exclude a row with the NOT IN.
I assume I am missing something basic since I am quite new at this.
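One possible way to require both columns to match at once is a correlated NOT EXISTS. The sketch below uses Python's sqlite3 module rather than DB2, and current_pay and hist_pay are hypothetical two-column stand-ins for the two queries above, so treat it as an illustration of the idea rather than a tested DB2 answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_pay (customer TEXT, country TEXT);
    CREATE TABLE hist_pay (customer TEXT, country TEXT);
    INSERT INTO current_pay VALUES ('C1', 'NO'), ('C1', 'SE'), ('C2', 'NO');
    INSERT INTO hist_pay VALUES ('C1', 'NO'), ('C2', 'SE');
""")

# Checking each column separately excludes a row when EITHER value has been
# seen before, which is too strict here: nothing is returned.
separate = conn.execute("""
    SELECT c.customer, c.country
    FROM current_pay c
    WHERE c.customer NOT IN (SELECT customer FROM hist_pay)
      AND c.country NOT IN (SELECT country FROM hist_pay)
    ORDER BY c.customer, c.country
""").fetchall()

# NOT EXISTS with both conditions excludes only pairs seen together.
pairs = conn.execute("""
    SELECT c.customer, c.country
    FROM current_pay c
    WHERE NOT EXISTS (
        SELECT 1
        FROM hist_pay h
        WHERE h.customer = c.customer
          AND h.country = c.country
    )
    ORDER BY c.customer, c.country
""").fetchall()

print(separate)  # []
print(pairs)     # [('C1', 'SE'), ('C2', 'NO')]
```

NOT EXISTS also sidesteps a common NOT IN surprise: if the subquery returns a NULL (possible here, since orgnr comes from a LEFT JOIN), NOT IN matches nothing at all.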

Using a Correlated Subquery within a Left Join

I need a fresh set of eyes on this query. Without going into too much depth: I'm doing a left join to pull from the TXP_Digital_Signatures (tds) table, which stores signatures for the most current version of Treatment Plans (txp_master txp). The query brings back results where tds.signed is null (no signature) or marked N (No). This works, and the report shows people which No's need to become Yes. But when a newer Yes is recorded, the old No row is left behind, so the plan_id is still pulled even though its most recent signature status is Y (Yes). The code snippet below, added to the where clause, works, but it hides all No's even if there is no newer Y (Yes).
tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
Can anyone think of a way to either add a correlated subquery to the left join, so it only pulls the max(tds.date) for each tds.plan_id, or rework my where clause so that the No's without a newer Yes and the nulls still show up? I really don't want to redo the entire report as a grouped report if I can help it; I feel it would break a ton of stuff and basically have me redoing this report from scratch. I'm on SQL Server 2008 R2.
SELECT case_status,
CONVERT(CHAR(10), episode_open_date, 101)AS 'Enrolled' ,
txp.patient_id,
p.lname+', ' + p.fname AS 'Client',
CONVERT(CHAR(10), txp.effective_date, 101)AS 'Effective',
CONVERT(CHAR(10), next_review_date, 101)AS 'Review',
txp.signed,
(SELECT location_code FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Clinic',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Coordinator',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = ts.team_member_id ) AS 'Team',
ts.signed,
tds.signed as 'Patient Sig'
FROM txp_master txp join patient p ON p.patient_id = txp.patient_id and p.episode_id = txp.episode_id
join txp_signature ts on ts.plan_id = txp.plan_id and ts.version_no = txp.version_no and ts.team_member_id <> txp.txp_coordinator_id
left join TXP_Digital_Signatures tds on tds.plan_id = txp.plan_id
where p.case_status = 'A' and
txp.status <> 'er' and patient_signed_date is null
and tds.signed is null or tds.signed = 'N'
and txp.effective_date > '2016-12-31 00:00:00.000'
and tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
order by patient_id

Your query should work if you correct the sub-query, like this:
(select Max(date) from TXP_Digital_Signatures x where x.plan_id = tds.plan_id)
Currently, the sub-query is not filtered on its own table: its where clause compares only the outer tds and txp rows, so it returns the maximum date across all of TXP_Digital_Signatures.
One other thing to note: you have a LEFT JOIN on TXP_Digital_Signatures tds, yet you reference it in the WHERE clause. This effectively converts it to an INNER JOIN, so decide which join you require and change accordingly.
If you want results regardless of TXP_Digital_Signatures tds, move those conditions to the ON clause.
If you only want results based on TXP_Digital_Signatures tds, change to an INNER JOIN.
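Here is a minimal sketch of the ON-versus-WHERE difference, run with Python's sqlite3 module instead of SQL Server. The plans and sigs tables are hypothetical reductions of txp_master and TXP_Digital_Signatures, and the date column is renamed sig_date to avoid a keyword clash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plans (plan_id INTEGER);
    CREATE TABLE sigs (plan_id INTEGER, signed TEXT, sig_date TEXT);
    INSERT INTO plans VALUES (1), (2), (3);
    -- Plan 1: an old N followed by a newer Y. Plan 2: only an N. Plan 3: none.
    INSERT INTO sigs VALUES
        (1, 'N', '2017-01-01'), (1, 'Y', '2017-02-01'),
        (2, 'N', '2017-01-05');
""")

# Latest-signature test in the ON clause: unsigned plans keep their NULL row.
on_rows = conn.execute("""
    SELECT p.plan_id, s.signed
    FROM plans p
    LEFT JOIN sigs s
      ON s.plan_id = p.plan_id
     AND s.sig_date = (SELECT MAX(x.sig_date) FROM sigs x
                       WHERE x.plan_id = s.plan_id)
    WHERE s.signed IS NULL OR s.signed = 'N'
    ORDER BY p.plan_id
""").fetchall()

# Same test in the WHERE clause: the NULL row of plan 3 fails the comparison,
# so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT p.plan_id, s.signed
    FROM plans p
    LEFT JOIN sigs s ON s.plan_id = p.plan_id
    WHERE (s.signed IS NULL OR s.signed = 'N')
      AND s.sig_date = (SELECT MAX(x.sig_date) FROM sigs x
                        WHERE x.plan_id = s.plan_id)
    ORDER BY p.plan_id
""").fetchall()

print(on_rows)     # [(2, 'N'), (3, None)]
print(where_rows)  # [(2, 'N')]
```

In both variants plan 1 drops out because its most recent signature is Y; only the ON-clause version keeps the plan that has no signature rows at all.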

SQL: SELECT DISTINCT not returning distinct values

The code below is supposed to return unique records in the lp_num field from the subquery, to then be used in the outer query, but I am still getting multiples of the lp_num field. A ReferenceNumber can have multiple ApptDate records, but each lp_num can only have one ref_num. That's why I tried to retrieve unique lp_num records all the way down in the subquery, but it doesn't work. I am using Report Builder 3.0.
[Screenshot: current output]
The desired output has only unique values in the lp_num field, because each lp_num value is one single pallet. The information to the right is when it arrived (ApptDate) and the reference number for the delivery (ref_num). It makes no sense for a pallet to have multiple receipt dates; it can only arrive once.
SELECT DISTINCT
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
(MIN(CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101))) as appt_date_only,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty,
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'F'
THEN 'Produced internally'
ELSE
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'R'
THEN 'Received from outside'
END)
END
) as original_source
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.CW_Dock_Schedule ON LTRIM(RTRIM(dbo.ISW_LPTrans.ref_num)) = dbo.CW_Dock_Schedule.ReferenceNumber
INNER JOIN dbo.CW_CheckInOut ON dbo.CW_CheckInOut.TruckID = dbo.CW_Dock_Schedule.TruckID
INNER JOIN dbo.item ON dbo.item.item = dbo.ISW_LPTrans.item
WHERE
(dbo.ISW_LPTrans.trans_type = 'R') AND
--CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101) <= CONVERT(VARCHAR(10),dbo.ISW_LPTrans.trans_date,101) AND
dbo.ISW_LPTrans.lp_num IN
(SELECT DISTINCT
dbo.ISW_LPTrans.lp_num
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.item ON dbo.ISW_LPTrans.item = dbo.item.item
INNER JOIN dbo.job ON dbo.ISW_LPTrans.ref_num = dbo.job.job AND dbo.ISW_LPTrans.ref_line_suf = dbo.job.suffix
WHERE
(dbo.ISW_LPTrans.trans_type = 'W' OR dbo.ISW_LPTrans.trans_type = 'I') AND
dbo.ISW_LPTrans.ref_num IN
(SELECT
dbo.ISW_LPTrans.ref_num
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE @item AND
dbo.ISW_LPTrans.lot LIKE @lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_num
) AND
dbo.ISW_LPTrans.ref_line_suf IN
(SELECT
dbo.ISW_LPTrans.ref_line_suf
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE @item AND
dbo.ISW_LPTrans.lot LIKE @lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_line_suf
)
GROUP BY
dbo.ISW_LPTrans.lp_num
HAVING
SUM(dbo.ISW_LPTrans.qty) < 0
)
GROUP BY
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
dbo.CW_CheckInOut.ApptDate,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty
ORDER BY
dbo.ISW_LPTrans.lp_num

In a nutshell: the way you use DISTINCT is logically wrong from an SQL perspective.
Your DISTINCT is in an IN subquery in the WHERE clause, and at that point in the code it has absolutely no effect (apart from a performance penalty). Think about it: if the outer query returns non-unique values of dbo.ISW_LPTrans.lp_num (which obviously happens), those values can still be within the distinct values of the IN subquery. The IN does not enforce a 1-to-1 match; it only requires that the outer query values are among the inner values, and they can match multiple times. So it is definitely not DISTINCT's fault.
I would go through the following check steps:
1. See if there are insufficient JOIN ON conditions in the outer FROM section that lead to data multiplication (e.g. a table has a primary-to-foreign key relation on several columns, but you join on only one of them).
2. Check which of the sources in the outer FROM section contains non-distinct records, then either cleanse your source or adjust the JOIN condition and/or the WHERE clause so that you only pick distinct and correct records. In fact, you might need to SELECT DISTINCT in the FROM sections; there it would make much more sense.
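The point about IN can be checked on a toy example. The sketch below uses Python's sqlite3 module; pallets and appts are hypothetical reductions of ISW_LPTrans and CW_CheckInOut. It also shows one possible fix: the question's query groups by ApptDate and ApptTime, which gives MIN nothing to collapse, whereas grouping by the pallet alone leaves one row per lp_num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pallets (lp_num TEXT, ref_num TEXT);
    CREATE TABLE appts (ref_num TEXT, appt_date TEXT);
    INSERT INTO pallets VALUES ('LP1', 'R1');
    INSERT INTO appts VALUES ('R1', '2017-01-01'), ('R1', '2017-01-02');
""")

# DISTINCT inside IN only deduplicates the filter list, not the outer rows.
in_rows = conn.execute("""
    SELECT p.lp_num, a.appt_date
    FROM pallets p
    JOIN appts a ON a.ref_num = p.ref_num
    WHERE p.lp_num IN (SELECT DISTINCT lp_num FROM pallets)
    ORDER BY a.appt_date
""").fetchall()

# Aggregating over the multiplied rows leaves one row per pallet.
min_rows = conn.execute("""
    SELECT p.lp_num, MIN(a.appt_date)
    FROM pallets p
    JOIN appts a ON a.ref_num = p.ref_num
    GROUP BY p.lp_num
""").fetchall()

print(in_rows)   # [('LP1', '2017-01-01'), ('LP1', '2017-01-02')]
print(min_rows)  # [('LP1', '2017-01-01')]
```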
