SQL Delete Duplicates with greater difference between two columns

I have a table similar to this:
ID Value1 Value2
122 800 1600
122 800 1800
133 700 1500
154 800 1800
133 700 1500
188 700 1400
176 900 1500
From this table I want to delete the duplicate rows (IDs 122 and 133) that have the greater difference between Value2 and Value1.
This means that for ID 122 I want to keep the first row (1800 - 800 > 1600 - 800).
For ID 133 I can keep either one, because both have the same difference.
ID Value1 Value2
122 800 1600
122 800 1800 <------delete this row
133 700 1500 <------delete either this row or the other identical row
154 800 1800
133 700 1500 <------delete either this row or the other identical row
188 700 1400
176 900 1500
The real table is much larger than this, so I can't just delete records individually.
Is there a way to write a statement that will delete every duplicate from my table whose Value2 - Value1 is greater than the Value2 - Value1 of its duplicate?

SQL Server has a handy feature: updatable CTEs and subqueries. So you can do this as follows:
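-- Replace yourtable with the real table name ("table" itself is a reserved word)
with todelete as (
select t.*,
-- 1 = smallest Value2 - Value1 within each id
row_number() over (partition by id order by value2 - value1) as diff_seqnum
from yourtable t
)
delete from todelete
where diff_seqnum > 1;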
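That is, number the rows for each id in ascending order of the difference between the two values, then keep only the row whose sequence number is 1 (the one with the smallest difference). Deleting through the CTE removes the matching rows from the underlying table.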
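To try the same ROW_NUMBER() idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). SQLite cannot delete through a CTE, so this sketch swaps in a delete by the implicit rowid instead; the table name t is a hypothetical placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table t holding the sample rows from the question.
conn.execute("CREATE TABLE t (id INTEGER, value1 INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(122, 800, 1600), (122, 800, 1800), (133, 700, 1500), (154, 800, 1800),
     (133, 700, 1500), (188, 700, 1400), (176, 900, 1500)],
)

# Number the rows of each id by ascending value2 - value1; row 1 (smallest
# difference) is kept. SQLite cannot DELETE through a CTE, so the rows to
# remove are identified by their implicit rowid instead.
conn.execute("""
    DELETE FROM t
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   row_number() OVER (PARTITION BY id
                                      ORDER BY value2 - value1) AS diff_seqnum
            FROM t
        )
        WHERE diff_seqnum > 1
    )
""")

remaining = conn.execute(
    "SELECT id, value1, value2 FROM t ORDER BY id, value2").fetchall()
print(remaining)
```

For ID 133 the two rows tie, so which physical row survives is arbitrary, exactly as the question allows.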

Related

I need the top x most recent (by SALEDT) rows grouped by neighborhood (NBHD)

I'm using Microsoft Access and I need a SQL query to return the top x (40 in my case) most recent sales for each neighborhood (NBHD). My data looks something like this:
PARID PRICE SALEDT SALEVAL NBHD
04021000 140000 1/29/2016 11 700
04021000 160000 2/16/2016 11 700
04018470 250000 4/23/2015 08 701
04018470 300000 4/23/2015 08 701
04016180 40000 5/9/2017 11 705
04023430 600000 6/12/2017 19 700
What I need is the top 40 most recent SALEDT entries for each NBHD. If the same PARID would show up in that top 40 twice or more, I only want the most recent one, and if the rows have the same PARID and the same SALEDT, I need only the most expensive one. For this small set of sample data, I would get:
PARID PRICE SALEDT SALEVAL NBHD
04021000 160000 2/16/2016 11 700
04023430 600000 6/12/2017 19 700
04018470 300000 4/23/2015 08 701
04016180 40000 5/9/2017 11 705
I get row 2 (as it has a later SALEDT than row 1), row 4 (as it has a higher PRICE than row 3), and rows 5 and 6. Hopefully that is clear. Also, I'm using MS Access SQL to do this, but I wouldn't be opposed to a VBA solution if that is easier. Thanks in advance.
Here you go:
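-- #table is SQL Server temp-table syntax; in Access, use the real table name
select a.parid, max(a.price) as price, a.saledt, a.saleval, a.nbhd
from #table a join (
-- latest sale date for each parcel
select parid, max(saledt) as saledt from #table
group by parid ) b on a.parid = b.parid and a.saledt = b.saledt
group by a.parid, a.saledt, a.saleval, a.nbhd
order by a.nbhd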
In MS Access, you can do the following to get the 40 most recent entries for each neighborhood:
select t.*
from t
where t.saledt in (select top 40 t2.saledt
from t as t2
where t2.nbhd = t.nbhd
order by t2.saledt desc
);
Your additional constraints are rather confusing. I'm not sure I fully follow them because I don't know what the columns really refer to.
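One reading of the extra constraints is "first reduce each PARID to its latest sale (highest PRICE on a tie), then take the 40 newest remaining sales per NBHD". The sketch below expresses that reading with ROW_NUMBER() in SQLite through Python's sqlite3 module. It is not a drop-in Access query, because Access SQL has no window functions. The table name sales and the ISO date format are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "sales"; dates are stored as ISO text (YYYY-MM-DD) so
# that string ordering matches chronological ordering.
conn.execute("CREATE TABLE sales (parid TEXT, price INTEGER, saledt TEXT, "
             "saleval TEXT, nbhd INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("04021000", 140000, "2016-01-29", "11", 700),
    ("04021000", 160000, "2016-02-16", "11", 700),
    ("04018470", 250000, "2015-04-23", "08", 701),
    ("04018470", 300000, "2015-04-23", "08", 701),
    ("04016180", 40000, "2017-05-09", "11", 705),
    ("04023430", 600000, "2017-06-12", "19", 700),
])

TOP_N = 40
rows = conn.execute("""
    WITH latest_per_parcel AS (
        -- one row per PARID: latest SALEDT, highest PRICE breaks ties
        SELECT *, row_number() OVER (PARTITION BY parid
                                     ORDER BY saledt DESC, price DESC) AS rn
        FROM sales
    ),
    ranked AS (
        -- number the surviving rows within each neighborhood, newest first
        SELECT parid, price, saledt, saleval, nbhd,
               row_number() OVER (PARTITION BY nbhd
                                  ORDER BY saledt DESC) AS nbhd_rank
        FROM latest_per_parcel
        WHERE rn = 1
    )
    SELECT parid, price, saledt, saleval, nbhd
    FROM ranked
    WHERE nbhd_rank <= ?
    ORDER BY nbhd, saledt DESC
""", (TOP_N,)).fetchall()

for row in rows:
    print(row)
```

Deduplicating before ranking means each neighborhood gets up to 40 distinct parcels; ranking first and deduplicating afterwards could leave fewer than 40.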

Select records for batch processing in loop

I need to select the records in batches. The example below has 17 records; with a batch size of 10 there would be two loops. The problem is that if I take the top 10, the value 555 is split, because its rows are at positions 10 and 11. Hence both 555 rows should be included in the first batch. How can I achieve this? This is just an example; in the real scenario I have 900 million records to process and my batch size will be 2 million.
ID
-------
111
111
111
222
222
333
333
444
444
555
555
666
666
777
777
888
888
You can use TOP ... WITH TIES. This might return more records than requested, but it will not split rows with the same ID across different batches:
Create and populate the sample table (please save us this step in your future questions):
CREATE TABLE #T
(ID int);
INSERT INTO #T VALUES
(111),(111),(111),
(222),(222),
(333),(333),
(444),(444),
(555),(555),
(666),(666),
(777),(777),
(888),(888);
The select statement:
SELECT TOP 10 WITH TIES ID
FROM #T
ORDER BY ID
Results:
row ID
1 111
2 111
3 111
4 222
5 222
6 333
7 333
8 444
9 444
10 555
11 555
Alternatively, while selecting the records you can group them by ID before limiting their number, so that the limit counts distinct IDs rather than rows.
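TOP ... WITH TIES covers the first batch; to walk all of the data, each later batch must start after the last ID of the previous one. The sketch below shows that keyset loop in SQLite (via Python's sqlite3), which has no WITH TIES, so it finds the ID at position batch_size and then takes every row up to and including that ID. The function batches_with_ties and the table t are hypothetical names.

```python
import sqlite3


def batches_with_ties(conn, batch_size):
    """Yield lists of ids from hypothetical table t; equal ids stay together.

    Emulates TOP (n) WITH TIES ... WHERE ID > @last ORDER BY ID: find the id
    at position batch_size after the last processed id, then return every
    row up to and including that id.
    """
    last_id = None  # None means "start from the smallest id"
    while True:
        cutoff = conn.execute(
            "SELECT max(id) FROM (SELECT id FROM t "
            "WHERE ? IS NULL OR id > ? ORDER BY id LIMIT ?)",
            (last_id, last_id, batch_size),
        ).fetchone()[0]
        if cutoff is None:  # no rows left after last_id
            return
        batch = [r[0] for r in conn.execute(
            "SELECT id FROM t WHERE (? IS NULL OR id > ?) AND id <= ? "
            "ORDER BY id",
            (last_id, last_id, cutoff),
        )]
        yield batch
        last_id = cutoff


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in [
    111, 111, 111, 222, 222, 333, 333, 444, 444,
    555, 555, 666, 666, 777, 777, 888, 888]])
conn.execute("CREATE INDEX t_id ON t (id)")

batches = list(batches_with_ties(conn, 10))
for b in batches:
    print(len(b), b)
```

With a batch size of 10 the first batch has 11 rows (both 555 rows) and the second has the remaining 6. The `? IS NULL OR` shortcut keeps the sketch short; on a 900-million-row table, separate first-batch and next-batch queries may let the ID index be used more effectively.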

SQL selecting values between two columns with a list

I'm attempting to find rows, given a list of values, where one of the values falls in the range between two of the columns. For example:
id column1 column2
1 1 5
2 6 10
3 11 15
4 16 20
5 21 25
...
99 491 495
100 496 500
I'd like to give a list of values, e.g. (23, 83, 432, 334, 344) which would return the rows
id column1 column2
5 21 25
17 81 85
87 431 435
67 331 335
69 341 345
The only way I can think of doing this so far is to split each value into its own call:
SELECT * FROM TableA WHERE (column1 < num1 AND num1 < column2)
However, this scales quite poorly when the list contains several million numbers.
Is there any better way of doing this?
Thanks for the help.
Putting millions of numbers into the SQL command itself would be unwieldy.
Obviously, you have to put the numbers into a (temporary) table.
Then you can just join the two tables:
SELECT *
FROM TableA JOIN TempTable
ON TempTable.Value BETWEEN TableA.column1 AND TableA.column2;
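Here is a minimal runnable sketch of the temp-table join, using SQLite through Python's sqlite3 module. TableA is filled with the question's layout (id n covers 5n-4 to 5n), and TempTable and the index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TableA as in the question: id n covers [5n-4, 5n], i.e. 1-5, 6-10, ..., 496-500.
conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, "
             "column1 INTEGER, column2 INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)",
                 [(n, 5 * n - 4, 5 * n) for n in range(1, 101)])
# An index on the range columns may help the engine narrow candidate rows.
conn.execute("CREATE INDEX tablea_range ON TableA (column1, column2)")

# The lookup values go into a temporary table instead of the SQL text.
conn.execute("CREATE TEMP TABLE TempTable (Value INTEGER)")
conn.executemany("INSERT INTO TempTable VALUES (?)",
                 [(v,) for v in (23, 83, 432, 334, 344)])

rows = conn.execute("""
    SELECT TableA.id, TableA.column1, TableA.column2, TempTable.Value
    FROM TableA JOIN TempTable
      ON TempTable.Value BETWEEN TableA.column1 AND TableA.column2
    ORDER BY TableA.id
""").fetchall()

for row in rows:
    print(row)
```

Note that BETWEEN is inclusive at both ends, whereas the question's `column1 < num1 AND num1 < column2` would miss values equal to a boundary.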

select columns from a pivot query

I have the following pivot query:
select *
from
(
select order_id,unit_price,quantity,sum(unit_price*quantity)
over (partition by order_id) as Total
from DEMO_ORDER_ITEMS
) tbla
pivot
(
sum(unit_price*quantity) as unit_totals
for unit_price in(30,50,60,80,110,120,125,150)
) tblb
order by order_id;
producing the following result:
ORDER_ID TOTAL 30_UNIT_TOTALS 50_UNIT_TOTALS 60_UNIT_TOTALS 80_UNIT_TOTALS 110_UNIT_TOTALS 120_UNIT_TOTALS 125_UNIT_TOTALS 150_UNIT_TOTALS
1 1890 500 640 750
2 2380 60 250 180 480 220 240 500 450
3 1640 100 240 320 480 500
4 1090 180 200 220 240 250
5 950 150 180 320 300
6 1515 330 360 375 450
7 905 90 250 120 320 125
8 1060 160 330 120 450
9 730 240 240 250
10 870 250 320 300
I would like to change the order of the columns so that TOTAL comes last. How can I select the columns in my preferred order?
This works: select tblb.* ...., but select tblb.30_UNIT_TOTALS fails.
You have to quote identifiers if they don't start with an alphabetic character. In addition, quoting makes the identifier case sensitive. So you have to write:
tblb."30_UNIT_TOTALS"
From the documentation:
Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.
[...]
Nonquoted identifiers are not case sensitive. Oracle interprets them as uppercase. Quoted identifiers are case sensitive.
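The quoting rule can be tried in SQLite through Python's sqlite3 module, using a hypothetical table tblb that stands in for the pivot output. SQLite does not share Oracle's rule that quoted identifiers are case sensitive, so this sketch only demonstrates the digit-leading column name and the explicit column order with TOTAL last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the pivot result tblb: two of its columns have
# names beginning with a digit, so they must be created with double quotes.
conn.execute('CREATE TABLE tblb (order_id INTEGER, total INTEGER, '
             '"30_UNIT_TOTALS" INTEGER, "50_UNIT_TOTALS" INTEGER)')
conn.execute("INSERT INTO tblb VALUES (2, 2380, 60, 250)")

# Unquoted, 30_UNIT_TOTALS is not a valid identifier and the query fails.
try:
    conn.execute("SELECT tblb.30_UNIT_TOTALS FROM tblb")
    unquoted_failed = False
except sqlite3.OperationalError:
    unquoted_failed = True

# Quoted, each column can be listed explicitly, which also lets TOTAL go last.
row = conn.execute('SELECT tblb.order_id, tblb."30_UNIT_TOTALS", '
                   'tblb."50_UNIT_TOTALS", tblb.total FROM tblb').fetchone()
print(unquoted_failed, row)
```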

How to assign the Parent Group IDs to each record of a hierarchical table in Oracle 11g?

Given the following sample hierarchical data in the TECH_VALUES table, how can I create a view, say TECH_VALUES_VW, that returns the same data plus an additional column, GROUP_ID_PARENT? For each row, this column should show the GROUP_ID of the top-level ancestor (the row whose PARENT_GROUP_ID is 0) that the row belongs to. Here is the sample data with the new column:
ID GROUP_ID LINK_ID PARENT_GROUP_ID TECH_TYPE GROUP_ID_PARENT
------- ------------- ------------ -------------------- ---------- ---------------
1 100 LETTER_A 0 100
2 200 LETTER_B 0 200
3 300 LETTER_C 0 300
4 400 LETTER_A1 100 A 100
5 500 LETTER_A2 100 A 100
6 600 LETTER_A3 100 A 100
7 700 LETTER_AA1 400 B 100
8 800 LETTER_AAA1 700 C 100
9 900 LETTER_B2 200 B 200
10 1000 LETTER_BB5 900 B 200
12 1200 LETTER_CC1 300 C 300
13 1300 LETTER_CC2 300 C 300
14 1400 LETTER_CC3 300 A 300
15 1500 LETTER_CCC5 1400 A 300
16 1600 LETTER_CCC6 1500 C 300
17 1700 LETTER_BBB8 900 B 200
18 1800 LETTER_B 0 1800
19 1900 LETTER_B2 1800 B 1800
20 2000 LETTER_BB5 1900 B 1800
21 2100 LETTER_BBB8 1900 B 1800
So based on the above, I want to take the table definition:
Table Name: TECH_VALUES:
ID,
GROUP_ID,
LINK_ID,
PARENT_GROUP_ID,
TECH_TYPE
and create a new view
View Name: TECH_VALUES_VW:
ID,
GROUP_ID,
LINK_ID,
PARENT_GROUP_ID,
TECH_TYPE,
GROUP_ID_PARENT
based on the above sample data from the TECH_VALUES table.
I am looking to create a query to build this new view that uses only the GROUP_IDs of the rows whose PARENT_GROUP_ID is 0.
Updated
To make it a whole lot clearer exactly what I am after: if I take only the records in the TECH_VALUES table where PARENT_GROUP_ID is 0, i.e.
ID GROUP_ID LINK_ID PARENT_GROUP_ID
------- ------------- ------------ --------------------
1 100 LETTER_A 0
2 200 LETTER_B 0
3 300 LETTER_C 0
18 1800 LETTER_B 0
I want to use just the GROUP_ID values of these 4 records and assign each one, as a new column in TECH_VALUES_VW, to all of the child records under that parent LINK_ID, as well as to the parent record itself (where PARENT_GROUP_ID is 0), as shown in the sample data set above.
If I understood your question correctly, then this might be what you're after:
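CREATE OR REPLACE VIEW tech_values_vw
AS
SELECT TV.*,
-- CONNECT_BY_ROOT returns the value from the root row of each tree;
-- for a root row (LEVEL = 1) that is its own group_id
CONNECT_BY_ROOT group_id AS group_id_parent
FROM tech_values TV
START WITH parent_group_id = 0              -- top-level groups
CONNECT BY PRIOR group_id = parent_group_id -- child row points to its parent
;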
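SQLite has no CONNECT BY, so the sketch below (Python's sqlite3 module) expresses the same walk with a swapped-in technique: a recursive common table expression, which Oracle itself supports from 11g Release 2 onward. The blank TECH_TYPE of the root rows is assumed to be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tech_values (id INTEGER, group_id INTEGER, "
             "link_id TEXT, parent_group_id INTEGER, tech_type TEXT)")
conn.executemany("INSERT INTO tech_values VALUES (?, ?, ?, ?, ?)", [
    (1, 100, "LETTER_A", 0, None), (2, 200, "LETTER_B", 0, None),
    (3, 300, "LETTER_C", 0, None), (4, 400, "LETTER_A1", 100, "A"),
    (5, 500, "LETTER_A2", 100, "A"), (6, 600, "LETTER_A3", 100, "A"),
    (7, 700, "LETTER_AA1", 400, "B"), (8, 800, "LETTER_AAA1", 700, "C"),
    (9, 900, "LETTER_B2", 200, "B"), (10, 1000, "LETTER_BB5", 900, "B"),
    (12, 1200, "LETTER_CC1", 300, "C"), (13, 1300, "LETTER_CC2", 300, "C"),
    (14, 1400, "LETTER_CC3", 300, "A"), (15, 1500, "LETTER_CCC5", 1400, "A"),
    (16, 1600, "LETTER_CCC6", 1500, "C"), (17, 1700, "LETTER_BBB8", 900, "B"),
    (18, 1800, "LETTER_B", 0, None), (19, 1900, "LETTER_B2", 1800, "B"),
    (20, 2000, "LETTER_BB5", 1900, "B"), (21, 2100, "LETTER_BBB8", 1900, "B"),
])

# Seed with the top-level rows (parent_group_id = 0), each carrying its own
# group_id as the root; then repeatedly attach children, copying the root
# value down the tree.
rows = conn.execute("""
    WITH RECURSIVE tree(id, group_id, group_id_parent) AS (
        SELECT id, group_id, group_id
        FROM tech_values
        WHERE parent_group_id = 0
        UNION ALL
        SELECT c.id, c.group_id, p.group_id_parent
        FROM tech_values c
        JOIN tree p ON c.parent_group_id = p.group_id
    )
    SELECT id, group_id_parent FROM tree ORDER BY id
""").fetchall()
print(rows)
```

The resulting id/group_id_parent pairs match the GROUP_ID_PARENT column in the question's sample, including the separate LETTER_B tree rooted at 1800.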