Change duplicate value in a column - sql

Can you please tell me what SQL query I can use to change duplicate values in one column of my table?
I found these duplicates:
SELECT Model, count(*) FROM Devices GROUP BY model HAVING count(*) > 1;
I looked for how to change one of the duplicate values, but I did not find anything that fits my case; almost everything I found is about deleting the rows with duplicate values, which is not what I need. I am not strong in SQL at all, so I am asking for help. Thank you so much.

You can use a window function such as ROW_NUMBER(), partitioned by the Model column, to number the rows within each group of duplicates, and then pick the first row of each group (rn = 1) from the subquery, such as
WITH d AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Model ORDER BY ID) AS rn
FROM Devices
)
SELECT ID, Model -- , and the other columns
FROM d
WHERE rn = 1

Use EXISTS as follows:
update d
set Model = '-'
from Devices d
where exists (select 1 from Devices dd where dd.Model = d.Model and dd.id > d.id)

After running the command:
SELECT Model, count(*) FROM Devices GROUP BY Model HAVING count(*) > 1;
I get this result:
1895 rows where Model is NULL;
3383 rows with duplicate values;
and there are 1243 such values in total.
After applying your command:
update Devices
set Model = '-'
where id not in
(select min(Devices.id)
from Devices
group by Devices.Model)
I got 4035 rows changed.
If you count, it works out: 3383 + 1895 = 5278, and 5278 - 1243 = 4035.
So everything adds up, the result suits me, and it works.
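The count in this follow-up can be reproduced on a small table. The sketch below is an illustration only: it uses Python's built-in sqlite3 module with an in-memory database, and the sample rows are made up. It applies the same NOT IN / MIN(id) update and checks that the number of changed rows equals the total row count minus the number of Model groups (NULL counts as one group).

```python
import sqlite3

# In-memory database with made-up sample rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Devices (id INTEGER PRIMARY KEY, Model TEXT)")
conn.executemany(
    "INSERT INTO Devices VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, None), (5, None), (6, "A"), (7, "C")],
)

# Count rows and Model groups before the update; GROUP BY puts all NULLs in one group.
total = conn.execute("SELECT COUNT(*) FROM Devices").fetchone()[0]
groups = conn.execute(
    "SELECT COUNT(*) FROM (SELECT Model FROM Devices GROUP BY Model)"
).fetchone()[0]

# Keep the lowest id of every group and overwrite Model on all other rows.
cur = conn.execute(
    """
    UPDATE Devices
    SET Model = '-'
    WHERE id NOT IN (SELECT MIN(id) FROM Devices GROUP BY Model)
    """
)
changed = cur.rowcount

models = conn.execute("SELECT id, Model FROM Devices ORDER BY id").fetchall()
print(total, groups, changed)
print(models)
```

Every group keeps exactly one row, so the update touches total minus groups rows, which is the same relationship as 5278 - 1243 = 4035 above.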

Related

SQL Server 2008 R2 - How to filter a group of records with conditional logic?

I have the following dataset:
Each sales order line has an item which can be found in various location areas in our warehouse (UPPER, GROUND, FLOOR). What I want is a way to evaluate each sales order line and then pick one location, based on a condition.
The condition would be: if the SO line contains a location with FLOOR, pick only that location; else, if it contains GROUND, pick that; and if it contains neither GROUND nor FLOOR, return UPPER.
I don't want to see multiple location areas for each SO line. What are the ways this can be done? I'd imagine some form of CASE expression with a HAVING clause?
This can be done using the row_number function by ordering the location areas based on the conditions.
select *
from (select t.*,
             row_number() over (partition by so#
                                order by case when location_area = 'FLOOR' then 1
                                              when location_area = 'GROUND' then 2
                                              else 3 end) as rn
      from tablename t
     ) x
where rn = 1
Select coalesce(f.SO, g.SO, u.SO) SO,
coalesce(f.Line, g.Line, u.Line) Line,
coalesce(f.item_code, g.item_code, u.item_code) item_code,
coalesce(f.item_description, g.item_description, u.item_description) item_description,
coalesce(f.SO_Qty, g.SO_Qty, u.SO_Qty) SO_Qty,
coalesce(f.branch_Number, g.branch_Number, u.branch_Number) branch_Number,
coalesce(f.location_area, g.location_area, u.location_area) location_area
from myTable f
full join myTable g
on f.location_area='floor'
and g.SO = f.So
and g.location_area='ground'
full join myTable u
on u.SO = f.So
and u.location_area='upper'
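To see the row_number approach from the first answer in action, here is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name so_lines, its columns, and the sample rows are assumptions made for illustration, and the partition is taken per SO and line number, which is one reading of "each sales order line".

```python
import sqlite3

# Hypothetical table and rows; the real dataset is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so_lines (so TEXT, line INTEGER, location_area TEXT)")
conn.executemany(
    "INSERT INTO so_lines VALUES (?, ?, ?)",
    [
        ("SO1", 1, "UPPER"), ("SO1", 1, "FLOOR"), ("SO1", 1, "GROUND"),
        ("SO1", 2, "UPPER"), ("SO1", 2, "GROUND"),
        ("SO2", 1, "UPPER"),
    ],
)

# Rank the areas inside each SO line by priority (FLOOR, then GROUND, then anything else)
# and keep only the best-ranked row.
picked = conn.execute(
    """
    SELECT so, line, location_area
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY so, line
                     ORDER BY CASE location_area
                                  WHEN 'FLOOR' THEN 1
                                  WHEN 'GROUND' THEN 2
                                  ELSE 3
                              END) AS rn
          FROM so_lines t) x
    WHERE rn = 1
    ORDER BY so, line
    """
).fetchall()
print(picked)
```

Each SO line yields exactly one row, and the CASE expression is the only place where the priority order is encoded.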

Getting count of latest items from secondary view

I've got a problem constructing a somewhat advanced query.
I have two views - A and B where B is the child of A.
This relationship is handled by
A vw_StartDate.MapToID
=
B vw_TrackerFeaturesBasic.StartDateMapToID.
What I need to do is grab every parent A and a count of the LATEST added children B.
This is a query that gets the latest children B in a SSRS-report: (This does not use A at all!):
/****** Selecting the incomplete, applicable issues of the latest insert. ******/
SELECT DISTINCT [TRK_Feature_LKID]
,[TrackerGroupDescription]
,[ApplicableToProject]
,[ReadyForWork]
,[DateStamp]
FROM [vw_TRK_TrackerFeaturesBasic] as temp
WHERE ApplicableToProject = 1
AND DateStamp = (SELECT MAX(DateStamp) from [vw_TRK_TrackerFeaturesBasic] WHERE [TRK_StartDateID] = @WSCTrackerID AND StartDateMapToID = @HierarchyID AND [TRK_Feature_LKID] = temp.TRK_Feature_LKID )
ORDER BY DateStamp DESC
I've tried a few different ways, but I can't figure out how to get the latest added children from the subquery (I've mainly used a subquery nested in a COUNT / CASE + SUM).
Since SQL Server doesn't allow aggregate functions inside aggregate functions, I'm not sure how to get the latest added item in a subquery, as the subquery most likely has to be nested in a COUNT or something similar.
Below is the version I'm working on. It doesn't work and fails with this error:
Column 'vw_TRK_TrackerFeaturesBasic.StartDateMapToID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The query:
SELECT b.TRK_StartDateID
,(SELECT COUNT(b.TRK_Feature_LKID) FROM b )
FROM vw_TRK_StartDate as a
LEFT JOIN vw_TRK_TrackerFeaturesBasic as b
ON b.StartDateMapToID = a.MapToID AND b.DateStamp = (SELECT MAX(DateStamp) FROM [vw_TRK_TrackerFeaturesBasic] WHERE [TRK_StartDateID] = 47 AND [StartDateMapToID] = 13)
WHERE MapToId = 13
--(SELECT MAX(DateStamp) from [vw_TRK_TrackerFeaturesBasic] WHERE [TRK_StartDateID] = @WSCTrackerID AND StartDateMapToID = @HierarchyID AND [TRK_Feature_LKID] = temp.TRK_Feature_LKID
GROUP BY b.TRK_StartDateID
Your question is a bit hard to follow, because I don't see a relationship between your queries and this request:
What I need to do is grab every parent A and a count of the LATEST added children B.
Focusing on this statement, you can do this readily with window functions:
SELECT tfb.StartDateMapToID, COUNT(*)
FROM (SELECT tfb.*,
             MAX(tfb.DateStamp) OVER (PARTITION BY tfb.StartDateMapToID) as max_DateStamp
      FROM vw_TRK_TrackerFeaturesBasic tfb
     ) tfb
WHERE tfb.DateStamp = tfb.max_DateStamp
GROUP BY tfb.StartDateMapToID;
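As a rough check of the window-function idea, the sketch below runs it against SQLite from Python (SQLite 3.25 or later for window functions). The table name features and the sample rows are invented for illustration; only the column names are taken from the question. A row counts as "latest" when its DateStamp equals the maximum DateStamp of its parent.

```python
import sqlite3

# Hypothetical stand-in for vw_TRK_TrackerFeaturesBasic with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (StartDateMapToID INTEGER, TRK_Feature_LKID INTEGER, DateStamp TEXT)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [
        (13, 1, "2016-01-01"),
        (13, 1, "2016-02-01"),
        (13, 2, "2016-02-01"),
        (14, 3, "2016-02-15"),
        (14, 3, "2016-03-01"),
    ],
)

# Attach the per-parent maximum DateStamp to every row, keep only rows that
# carry that maximum, then count them per parent.
latest_counts = conn.execute(
    """
    SELECT StartDateMapToID, COUNT(*)
    FROM (SELECT f.*,
                 MAX(f.DateStamp) OVER (PARTITION BY f.StartDateMapToID) AS max_DateStamp
          FROM features f) t
    WHERE DateStamp = max_DateStamp
    GROUP BY StartDateMapToID
    ORDER BY StartDateMapToID
    """
).fetchall()
print(latest_counts)
```

Parents with no children at all do not appear in this result; listing them with a count of 0 would need a LEFT JOIN from the parent view, which is not shown here.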

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes spanning 20+ years. And the data science guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts?
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
    select
        * -- simplifying here to show the important parts
        ,row_number() over (
            partition by f.master_quote_number, f.quote_version_number, f.quote_line_number
            order by f.maintenance_months desc) as seqnum
    from w_quote_line_d d
    inner join product_quotes pq
        on (pq.master_quote_number = d.master_quote_number)
    inner join w_quote_f f
        on (f.quote_line_number = d.quote_line_number
        and f.master_quote_number = d.master_quote_number
        and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() with this partition by and order by guarantees that only ONE row for each combination of master_quote_number / quote_version_number / quote_line_number gets the value 1, and it will be the one with the highest value in maintenance_months (if your colleagues are right, there would only be one with a value > 0 anyway).
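A small sketch of this dedupe rule, run against SQLite from Python rather than Postgres (window functions need SQLite 3.25 or later). The single table quote_lines and its rows are made up to keep the example short; the real query joins w_quote_line_d and w_quote_f. Note that a line with no duplicate and 0 maintenance is kept, which is the case a plain maintenance filter would lose.

```python
import sqlite3

# Hypothetical, flattened stand-in for the joined quote tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE quote_lines (
        master_quote_number TEXT,
        quote_version_number INTEGER,
        quote_line_number INTEGER,
        maintenance_months INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO quote_lines VALUES (?, ?, ?, ?)",
    [
        ("Q1", 1, 1, 0), ("Q1", 1, 1, 12),  # duplicate line, one with maintenance
        ("Q1", 1, 2, 0),                     # single line with 0 maintenance
        ("Q2", 1, 1, 24),                    # single line with maintenance
        ("Q2", 2, 1, 0), ("Q2", 2, 1, 6),    # duplicate line in another version
    ],
)

# Number the rows of each (quote, version, line) from highest maintenance down
# and keep the first one.
deduped = conn.execute(
    """
    SELECT master_quote_number, quote_version_number, quote_line_number, maintenance_months
    FROM (SELECT q.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY master_quote_number, quote_version_number, quote_line_number
                     ORDER BY maintenance_months DESC) AS seqnum
          FROM quote_lines q) x
    WHERE seqnum = 1
    ORDER BY master_quote_number, quote_version_number, quote_line_number
    """
).fetchall()
print(deduped)
```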
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

SQL Statement to select row where previous row status = 'C' AS400

This is being run on sql for IBMI Series 7
I have a table which stores info about orders. Each row has an order number (ON), part number (PN), and sequence number (SEQ). Each ON has multiple PNs linked to it, and each PN has multiple SEQ numbers. Each sequence number represents the order in which work is done on the part. Elsewhere in the system, once the part is at a location and ready to be worked on, a flag is shown. What I want is a list of orders for a location that have not yet arrived but have been closed out at the previous location (which means the part is on its way).
I have a query listed below that I believe should work but I get the following error: "The column qualifier or table t undefined". Where is my issue at?
Select * From (SELECT M2ON as Order__Number , M2SEQ as Sequence__Number,
M2PN as Product__Number,ML2OQ as Order__Quantity
FROM M2P
WHERE M2pN in (select R1PN FROM R1P WHERE (RTWC = '7411') AND (R1SEQ = M2SEQ)
)
AND M2ON IN (SELECT M1ON FROM M1P WHERE ML1RCF = '')
ORDER BY ML2OSM ASC) as T
WHERE
T.Order__Number in (Select t3.m2on from (SELECT *
FROM(Select * from m2p
where m2on = t.Order__Number and m2pn = t.Product__Number
order by m2seq asc fetch first 2 rows only
)as t1 order by m2seq asc fetch first row only
) as t3 where t3.m2stat = 'C')
EDIT - Answer for anyone else with this issue
Clutton's answer worked with a slight modification, so thank you to him for the fast response! I had to name my outer table and reference that name in the subquery; otherwise the AS400 would kick back and tell me it couldn't find the columns. I also had to order by the sequence number descending so that I grabbed the highest record that was below the parameter (otherwise, for example, if my sequence number was 20 it could grab 5 even though 10 was available and should be shown first). Here is the subquery I now use. Please note the actual query names m2p as T1.
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2ON = T1.M2ON
AND M2SEQ < T1.M2SEQ
AND M2PN IN (select R1PN FROM R1P WHERE (RTWC = #WC) AND (R1SEQ = T1.M2SEQ))
ORDER BY M2SEQ DESC
FETCH FIRST ROW ONLY
), 'NULL') as PRIOR_M2STAT
Just reading your question, it looks like something I do frequently to emulate RPG READPE op codes. Is the key to M2P Order/Seq? If so, here is a basic piece that may help you build out the rest of the query.
I am assuming that you are trying to get the prior record by key using SQL. In RPG this would be like doing a READPE on the key for a file with Order/Seq key.
Here is an example using a subquery to get the status field of the prior record.
SELECT
M2ON, M2PN, M2OQ, M2STAT,
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2P_1.M2ON = M2ON
AND M2P_1.M2SEQ < M2SEQ
FETCH FIRST ROW ONLY
), '') as PRIOR_M2STAT
FROM
M2P
Note that this wraps the subquery in an IFNULL to handle the case where it is the first sequence number and no prior sequence exists.
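The prior-record lookup, including the two adjustments described in the edit above (an alias on the outer table and a descending sort), can be tried out with the sketch below. It is an illustration in Python with sqlite3, not DB2 for i: SQLite uses LIMIT 1 where DB2 uses FETCH FIRST ROW ONLY, and the sample rows are invented. Matching on part number inside the subquery is also an assumption about the key.

```python
import sqlite3

# Made-up rows; column names follow the question's M2P file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m2p (m2on INTEGER, m2pn TEXT, m2seq INTEGER, m2stat TEXT)")
conn.executemany(
    "INSERT INTO m2p VALUES (?, ?, ?, ?)",
    [
        (100, "P1", 5, "O"),
        (100, "P1", 10, "C"),
        (100, "P1", 20, "O"),
        (200, "P2", 10, "O"),
        (200, "P2", 20, "O"),
    ],
)

# For each row, fetch the status of the closest lower sequence number of the
# same order and part. The outer table is aliased (t1) so the subquery can
# refer to it, and ORDER BY ... DESC makes "closest lower" explicit.
with_prior = conn.execute(
    """
    SELECT t1.m2on, t1.m2seq,
           IFNULL((SELECT p.m2stat
                   FROM m2p AS p
                   WHERE p.m2on = t1.m2on
                     AND p.m2pn = t1.m2pn
                     AND p.m2seq < t1.m2seq
                   ORDER BY p.m2seq DESC
                   LIMIT 1), '') AS prior_m2stat
    FROM m2p AS t1
    ORDER BY t1.m2on, t1.m2seq
    """
).fetchall()

# Rows whose previous step is closed ('C') are the ones "on their way".
on_the_way = [(order_no, seq) for order_no, seq, prior in with_prior if prior == "C"]
print(with_prior)
print(on_the_way)
```

Without the descending sort, sequence 20 of order 100 could pick up the status of sequence 5 instead of sequence 10.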

Collapse Data in SQL without stored procedure or function if a value is the same as the value from row above

I got a problem regarding grouping if a value is the same as in the row above.
Our statement looks like this:
SELECT pat_id,
treatData.treatmentdate AS Date,
treatMeth.name AS TreatDataTableInfo,
treatData.treatmentid AS TreatID
FROM dialysistreatmentdata treatData
LEFT JOIN hdtreatmentmethods treatMeth
ON treatMeth.id = treatData.hdtreatmentmethodid
WHERE treatData.hdtreatmentmethodid IS NOT NULL
AND Year(treatData.treatmentdate) >= 2013
AND ekeyid = 12
ORDER BY treatData.ekeyid,
treatmentdate DESC,
treatdatatableinfo;
The output looks like this:
The output should be grouped when the value is the same as in the row or rows before, and there should be a ToDate, as you can see in the screenshot, which is the date of the next row minus 1 day.
The desired output should look like this:
I hope someone has a solution regarding this matter!
Or maybe someone has an idea how to solve this problem within qlikview.
Looking forward to solutions
Michael
You want to collapse episodes of treatment into single rows. This is a "gaps-and-islands" problem. I like the difference of row numbers approach:
select Pat_ID, min(Date) as fromdate, max(Date) as todate, TreatDataTableInfo,
       min(TreatID) as TreatID
from (select td.Pat_ID, td.TreatmentDate As Date, tm.Name As TreatDataTableInfo,
             td.TreatmentID As TreatID,
             row_number() over (partition by td.Pat_ID order by td.TreatmentDate) as seqnum_p,
             row_number() over (partition by td.Pat_ID, tm.Name order by td.TreatmentDate) as seqnum_pn
      from DialysisTreatmentData td Left join
           HDTreatmentMethods tm
           On tm.ID = td.HDTreatmentMethodID
      where td.HDTreatmentMethodID Is Not Null And
            td.TreatmentDate >= '2013-01-01' and
            EKeyID = 12
     ) t
group by Pat_ID, TreatDataTableInfo, (seqnum_p - seqnum_pn)
order by Pat_ID, fromdate Desc, TreatDataTableInfo;
Note: This uses the ANSI standard window function row_number(), which is available in most databases.
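To show why the difference of the two row numbers identifies each episode, here is a minimal sketch in Python with sqlite3 (SQLite 3.25 or later). The table treatments, the shortened method names, and the five rows are made up for illustration. Within one uninterrupted run of the same method both counters advance together, so their difference stays constant; once another method interrupts, the difference changes.

```python
import sqlite3

# Made-up treatments for one patient; dates are ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatments (pat_id INTEGER, treatmentdate TEXT, method TEXT)")
conn.executemany(
    "INSERT INTO treatments VALUES (?, ?, ?)",
    [
        (12, "2013-01-18", "HDF"),
        (12, "2013-01-25", "HDF"),
        (12, "2013-02-07", "HD"),
        (12, "2013-02-21", "HDF"),
        (12, "2013-02-23", "HDF"),
    ],
)

# seqnum_p counts all rows per patient, seqnum_pn counts rows per patient and
# method; grouping by their difference separates consecutive runs.
episodes = conn.execute(
    """
    SELECT pat_id, MIN(treatmentdate) AS fromdate, MAX(treatmentdate) AS todate, method
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY pat_id ORDER BY treatmentdate) AS seqnum_p,
                 ROW_NUMBER() OVER (PARTITION BY pat_id, method ORDER BY treatmentdate) AS seqnum_pn
          FROM treatments t) x
    GROUP BY pat_id, method, (seqnum_p - seqnum_pn)
    ORDER BY pat_id, fromdate DESC
    """
).fetchall()
print(episodes)
```

Here todate is the last treatment date inside the episode, as in the answer's query; the "date of the next row minus 1 day" from the question would need an extra step that this sketch does not attempt.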
Below is a possible Qlikview solution. I've put some comments in the script. If it's not clear just let me know. The result picture is below the script.
RawData:
Load * Inline [
Pat_ID,Date,TreatDataTableInfo,TreatId
PatNum_12,08.07.2016,HDF Pradilution,1
PatNum_12,07.07.2016,HDF Predilution,2
PatNum_12,23.03.2016,HD,3
PatNum_12,24.11.2015,HD,4
PatNum_12,22.11.2015,HD,5
PatNum_12,04.09.2015,HD,6
PatNum_12,01.09.2015,HD,7
PatNum_12,30.07.2015,HD,8
PatNum_12,12.01.2015,HD,9
PatNum_12,09.01.2015,HD,10
PatNum_12,26.08.2014,Hemodialysis,11
PatNum_12,08.07.2014,Hemodialysis,12
PatNum_12,23.05.2014,Hemodialysis,13
PatNum_12,19.03.2014,Hemodialysis,14
PatNum_12,29.01.2014,Hemodialysis,15
PatNum_12,14.12.2013,Hemodialysis,16
PatNum_12,26.10.2013,Hemodialysis,17
PatNum_12,05.10.2013,Hemodialysis,18
PatNum_12,03.10.2013,HD,19
PatNum_12,24.06.2013,Hemodialysis,20
PatNum_12,03.06.2013,Hemodialysis,21
PatNum_12,14.05.2013,Hemodialysis,22
PatNum_12,26.02.2013,HDF Postdilution,23
PatNum_12,23.02.2013,HDF Pradilution,24
PatNum_12,21.02.2013,HDF Postdilution,25
PatNum_12,07.02.2013,HD,26
PatNum_12,25.01.2013,HDF Pradilution,27
PatNum_12,18.01.2013,HDF Pradilution,28
];
GroupedData:
Load
*,
// assign new GroupId for all rows where the TreatDataTableInfo is equal
if( RowNo() = 1, 1,
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
peek('GroupId') + 1, peek('GroupId'))) as GroupId,
// assign new GroupSubId (incremental int) for all the records in each group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
1, peek('GroupSubId') + 1) as GroupSubId,
// pick the first TreatId value and spread it across the group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'), TreatId, peek('TreatId_Temp')) as TreatId_Temp
Resident
RawData
;
Drop Table RawData;
right join (GroupedData)
// get the max GroupSubId for each group and right join it to
// the GroupedData table to remove the records we dont need
MaxByGroup:
Load
max(GroupSubId) as GroupSubId,
GroupId
Resident
GroupedData
Group By
GroupId
;
// these are not needed anymore
Drop Fields GroupId, GroupSubId, TreatId;
// replace the old TreatId with the new TreatId_Temp field
// which contains the first TreatId for each group
Rename Field TreatId_Temp to TreatId;
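The script's core idea is to walk the rows in order and start a new group whenever TreatDataTableInfo differs from the previous row. As a rough illustration of that idea only (this is Python, not Qlikview script, and it is not a translation of the script above), itertools.groupby does the same kind of consecutive-run grouping. The rows are a shortened, made-up subset in the same newest-first order, with dates rewritten as ISO strings.

```python
from itertools import groupby

# (date, TreatDataTableInfo, TreatId), newest first; a shortened made-up subset.
rows = [
    ("2016-07-08", "HDF", 1),
    ("2016-07-07", "HDF", 2),
    ("2016-03-23", "HD", 3),
    ("2015-11-24", "HD", 4),
    ("2014-08-26", "Hemodialysis", 11),
    ("2013-10-03", "HD", 19),
]

# groupby only merges neighbouring rows with the same key, so a method that
# reappears later (HD here) starts a new group instead of joining the old one.
collapsed = []
for method, run in groupby(rows, key=lambda r: r[1]):
    run = list(run)
    first_treat_id = run[0][2]   # TreatId of the first row in the run
    newest_date = run[0][0]      # rows are newest first
    oldest_date = run[-1][0]
    collapsed.append((method, first_treat_id, oldest_date, newest_date))

print(collapsed)
```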