Delete rows except for one for every id - sql

I have a dataset with multiple ids. For every id there are multiple entries. Like this:
--------------
| ID | Value |
--------------
| 1 | 3 |
| 1 | 4 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 3 |
| 3 | 5 |
--------------
Is there a SQL DELETE query to delete (random) rows for every id, except for one (random rows would be nice but is not essential)? The resulting table should look like this:
--------------
| ID | Value |
--------------
| 1 | 2 |
| 2 | 1 |
| 3 | 5 |
--------------
Thanks!

It doesn't look like HSQLDB fully supports OLAP functions (in this case row_number() over (partition by ...)), so you'll need to use a derived table to identify the one value you want to keep for each ID. It certainly won't be random, but I don't think anything else will be either.
This query will give you the first part:
select
id,
min(value) as minval
from
<your table>
group by id
Then you can delete from your table where you don't match:
delete from
<your table> t1
inner join
(
select
id,
min(value) as minval
from
<your table>
group by id
) t2
on t1.id = t2.id
and t1.value <> t2.minval
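A minimal sketch of the same "keep the minimum value per id" idea, written with a correlated subquery instead of a join, since not every engine accepts a join inside DELETE. It uses Python's built-in sqlite3 module purely as a test bed; the table name entries and the helper keep_min_per_id are assumptions made for this illustration, and the SQL may need adjusting for HSQLDB.

```python
import sqlite3


def keep_min_per_id(rows):
    """Delete every row whose value is not the minimum for its id."""
    conn = sqlite3.connect(":memory:")
    # "entries" is a hypothetical table name standing in for <your table>.
    conn.execute("CREATE TABLE entries (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    # Correlated subquery: for each row, look up the minimum value of its id
    # and delete the row if its own value differs from that minimum.
    conn.execute(
        """
        DELETE FROM entries
        WHERE value <> (SELECT MIN(e2.value)
                        FROM entries AS e2
                        WHERE e2.id = entries.id)
        """
    )
    result = conn.execute(
        "SELECT id, value FROM entries ORDER BY id, value"
    ).fetchall()
    conn.close()
    return result


# Sample data from the question.
sample = [(1, 3), (1, 4), (1, 2), (2, 1), (2, 2), (3, 3), (3, 5)]
print(keep_min_per_id(sample))
```

If two rows of the same id share the minimum value, both survive, so this only leaves exactly one row per id when the (id, value) pairs are unique.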

Try this:
alter ignore table a add unique(id);
Here a is the table name. Note that ALTER IGNORE is MySQL-specific syntax and was removed in MySQL 5.7.4.

This should do what you want:
SELECT ID, Value
FROM (SELECT ID, Value, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NEWID()) AS RN
FROM #Table) AS A
WHERE A.RN = 1

I tried the given answers with HSQLDB, but it refused to execute those queries for different reasons (a join is not allowed in a DELETE query, and IGNORE is not allowed in an ALTER statement). Thanks to Andrew I came up with this solution, which is a bit more roundabout but makes it possible to delete random rows:
Add a new column for random values:
ALTER TABLE <table> ADD COLUMN rand INT
Fill this column with random data:
UPDATE <table> SET rand = RAND() * 1000000
Delete all rows which don't have the minimum random value for their id:
DELETE FROM <table> WHERE rand NOT IN (SELECT MIN(rand) FROM <table> GROUP BY id)
Drop the random column:
ALTER TABLE <table> DROP rand
For larger tables you probably should ensure that the random values are unique, but this worked perfectly for me.
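The steps above can be sketched end to end as follows. This is only an illustration run against SQLite through Python's sqlite3 module, not HSQLDB; the table name entries, the column name rnd and the helper keep_random_row_per_id are assumptions. To sidestep the uniqueness concern, the random keys are drawn without repetition in Python instead of with RAND(). The final "drop the column" step is left out because SQLite only gained ALTER TABLE ... DROP COLUMN in version 3.35.

```python
import random
import sqlite3


def keep_random_row_per_id(rows, seed=None):
    """Keep one randomly chosen row per id using a temporary random key."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)

    # Step 1: add a helper column for the random key.
    conn.execute("ALTER TABLE entries ADD COLUMN rnd INTEGER")

    # Step 2: fill it with random keys that are guaranteed to be unique
    # (sampling without replacement; assumes fewer than 1,000,000 rows).
    rowids = [r[0] for r in conn.execute("SELECT rowid FROM entries")]
    keys = rng.sample(range(1_000_000), len(rowids))
    conn.executemany(
        "UPDATE entries SET rnd = ? WHERE rowid = ?", zip(keys, rowids)
    )

    # Step 3: delete every row that does not hold the minimum key of its id.
    conn.execute(
        "DELETE FROM entries "
        "WHERE rnd NOT IN (SELECT MIN(rnd) FROM entries GROUP BY id)"
    )

    result = conn.execute(
        "SELECT id, value FROM entries ORDER BY id"
    ).fetchall()
    conn.close()
    return result


sample = [(1, 3), (1, 4), (1, 2), (2, 1), (2, 2), (3, 3), (3, 5)]
print(keep_random_row_per_id(sample))
```

Because the keys are unique across the whole table, the NOT IN comparison cannot accidentally keep a second row of one id that happens to share another id's minimum.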

Related

Can I join each row of table1 to a unique row of table 2?

I was hoping to query all the rows of a second table whose ids start at some number, and update each row of the original table with a one-to-one match from that second table.
For example:
normal
id | fk_test_id
----------------
1 | null
2 | null
3 | null
starts_after
id |
----
12 |
13 |
14 |
What UPDATE can I use to make normal look like this:
id | fk_test_id
----------------
1 | 12
2 | 13
3 | 14
I tried:
UPDATE normal SET fk_test_id = starts_after.id FROM starts_after; which just joins on the first row of starts_after.
UPDATE normal SET fk_test_id = (SELECT id FROM starts_after ORDER BY random() LIMIT 1); Where the subquery only executes once.
Filtering the subquery by which fk_test_ids are already chosen, but it only executes on the pre-updated data.
If you added the records to starts_after in a specific order, you can use the query below:
update normal n
set fk_test_id = tmp.id
from (select id,
row_number() over (order by id)
from starts_after) tmp
where tmp.row_number = n.id;
I ordered the starts_after table by id (ASC) and generated a row number for each record:
id | row_number
----------------
12 | 1
13 | 2
14 | 3
After that I join the two tables on that row number and update the records. Note that this relies on normal.id running 1, 2, 3, ... without gaps.
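A hedged variation on this idea numbers the rows of both tables, so the pairing does not depend on the ids of normal being consecutive. The sketch below runs against SQLite (3.25 or newer for window functions) through Python's sqlite3 module and uses a correlated subquery rather than UPDATE ... FROM; the helper name pair_rows and the ids 1, 2, 5 are made up for illustration.

```python
import sqlite3


def pair_rows(normal_ids, start_ids):
    """Assign the k-th id of starts_after to the k-th row of normal."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE normal (id INTEGER PRIMARY KEY, fk_test_id INTEGER)"
    )
    conn.execute("CREATE TABLE starts_after (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO normal (id) VALUES (?)", [(i,) for i in normal_ids]
    )
    conn.executemany(
        "INSERT INTO starts_after (id) VALUES (?)", [(i,) for i in start_ids]
    )
    # Number the rows of both tables by id, match equal row numbers, and
    # pick the matching starts_after id for the row being updated.
    conn.execute(
        """
        UPDATE normal
        SET fk_test_id = (
            SELECT s.id
            FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
                  FROM starts_after) AS s
            JOIN (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
                  FROM normal) AS n
              ON n.rn = s.rn
            WHERE n.id = normal.id
        )
        """
    )
    result = conn.execute(
        "SELECT id, fk_test_id FROM normal ORDER BY id"
    ).fetchall()
    conn.close()
    return result


# Hypothetical ids with a gap (1, 2, 5) to show the pairing still works.
print(pair_rows([1, 2, 5], [12, 13, 14]))
```

Rows of normal that have no counterpart in starts_after end up with NULL in fk_test_id.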

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
Ideally, I'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
One solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
This might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique, or switch to row_number to get an indeterminate row.
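The difference between rank and row_number on ties can be checked with a small sketch. QUALIFY is not available in SQLite, so the filter is moved into an outer query here; this is an assumption-laden illustration using Python's sqlite3 module (SQLite 3.25 or newer), with a hypothetical helper last_line_per_id and one extra made-up row that creates a tie.

```python
import sqlite3

# Rows from the question: (id, line, value).
ROWS = [(1, 1, 10), (1, 2, 5), (2, 1, 6), (3, 1, 7), (3, 2, 4)]


def last_line_per_id(rows, func="RANK"):
    """Return (id, value) for the highest line of each id."""
    if func not in ("RANK", "ROW_NUMBER"):
        raise ValueError("func must be RANK or ROW_NUMBER")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER, line INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    # The window function is computed in a derived table and filtered in
    # the outer query, which plays the role of QUALIFY.
    query = f"""
        SELECT id, value
        FROM (SELECT id, value,
                     {func}() OVER (PARTITION BY id ORDER BY line DESC) AS rnk
              FROM myTable) AS ranked
        WHERE rnk = 1
        ORDER BY id, value
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(last_line_per_id(ROWS))

# Hypothetical extra row: id 3 now has two rows with line = 2.
tied = ROWS + [(3, 2, 9)]
print(len(last_line_per_id(tied, "RANK")))
print(len(last_line_per_id(tied, "ROW_NUMBER")))
```

With the tie in place, RANK keeps both rows of id 3 while ROW_NUMBER keeps exactly one of them, and which one is not defined.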

SQL - select distinct only on one column [duplicate]

I have searched far and wide for an answer to this problem. I'm using Microsoft SQL Server. Suppose I have a table that looks like this:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 2 | 3968 | Spain | Spanish |
| 3 | 3968 | USA | English |
| 4 | 1234 | Greece | Greek |
| 5 | 1234 | Italy | Italian |
I want to perform one query that selects only one row per unique value of the 'NUMBER' column (whether it is the first or last row doesn't bother me). So this would give me:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 4 | 1234 | Greece | Greek |
How is this achievable?
A very typical approach to this type of problem is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by number order by id) as seqnum
from t
) t
where seqnum = 1;
This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.
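As a rough illustration of that generalization, the sketch below parameterizes the seqnum cutoff. It runs against SQLite (3.25 or newer) through Python's sqlite3 module rather than SQL Server, and the helper name first_n_per_number is an assumption for this example.

```python
import sqlite3

# Rows from the question: (id, number, country, lang).
ROWS = [
    (1, 3968, "UK", "English"),
    (2, 3968, "Spain", "Spanish"),
    (3, 3968, "USA", "English"),
    (4, 1234, "Greece", "Greek"),
    (5, 1234, "Italy", "Italian"),
]


def first_n_per_number(n):
    """Return the ids of the first n rows (by id) for each number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, number INTEGER, country TEXT, lang TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    # seqnum restarts at 1 for every distinct number, ordered by id.
    result = conn.execute(
        """
        SELECT id
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY number ORDER BY id)
                         AS seqnum
              FROM t) AS ranked
        WHERE seqnum <= ?
        ORDER BY id
        """,
        (n,),
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


print(first_n_per_number(1))
print(first_n_per_number(2))
```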
Since you don't care, I chose the max ID for each number.
select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id
Query Explanation
select
tbl.* -- give me all the data from the base table (tbl)
from
tbl
inner join ( -- only return rows in tbl which match this subquery
select
max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
from
tbl
group by
NUMBER -- how to group rows for the MAX aggregation
) maxID
on maxID.maxID = tbl.id -- join condition ie only return rows in tbl
-- whose ID is also a MAX ID for a given NUMBER
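You could use the following query, although SQL Server (which the question uses) rejects it because the selected columns are neither aggregated nor listed in GROUP BY: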
SELECT * FROM [table] GROUP BY NUMBER;
Where [table] is the name of the table.
This provides a unique listing for the NUMBER column; however, the other columns may be meaningless depending on the vendor implementation. That is to say, together they may not correspond to a specific row or rows.

Updating Single Row per Group

The Background
I have a temporary table containing information including a unique rowID, OrderNumber, and guestCount. RowID and OrderNumber already exist in this table, and I am running a new query to fill in the missing guestCount for each orderNumber. I would like to then update the temp table with this information.
Example
What I currently have looks something like this, with only RowID being unique, meaning that there can be multiple items having the same OrderNumber.
RowID | OrderNumber | guestCount
1 | 30001 | 0
2 | 30002 | 0
3 | 30002 | 0
4 | 30003 | 0
My query returns the following table, only returning one total number of guests per orderNumber:
OrderNumber | guestCount
30001 | 3
30002 | 10
30003 | 5
The final table should look like:
RowID | OrderNumber | guestCount
1 | 30001 | 3
2 | 30002 | 10
3 | 30002 | 0
4 | 30003 | 5
I'm only interested in updating one (doesn't matter which) entry per orderNumber, but my current logic is resulting in errors:
UPDATE temp
SET temp.guestCount = cc.guestCount
FROM( SELECT OrderNumber, guestCount
FROM (SELECT OrderNumber, guestCount, RowID = MIN(RowID)
FROM #tempTable
GROUP BY RowID, OrderNumber, guestCount) t)temp
INNER JOIN queryTable q ON temp.OrderNumber = q.OrderNumber
I'm not sure if this logic is even a valid way of doing this, but I do know that I'm getting errors in my update due to the fact that I'm using an aggregate function, as well as a GROUP function. Is there any way to go about this operation differently?
You can define the row to update by using row_number() in a CTE. This identifies the first row in the group for the update:
with toupdate as (
select tt.*, row_number() over (partition by OrderNumber order by RowID) as seqnum
from #tempTable tt
)
UPDATE toupdate
SET toupdate.guestCount = q.guestCount
FROM toupdate
INNER JOIN queryTable q
ON toupdate.OrderNumber = q.OrderNumber
where toupdate.seqnum = 1;
The problem with your query is that temp is based on an aggregation subquery. Such a subquery is not updatable, because it does not have a 1-1 relationship with the rows of the original table. A CTE that uses row_number() is updatable. In addition, your SET clause uses the table alias cc, which is not defined in the query.
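For engines that cannot update through a CTE, the same "first row of each group" selection can be moved into the WHERE clause. The following is a sketch under that assumption, run against SQLite (3.25 or newer) via Python's sqlite3 module; the names order_rows, query_result and the snake_case columns are stand-ins for #tempTable, queryTable and their columns.

```python
import sqlite3


def fill_one_row_per_order():
    """Copy each order's guest count onto exactly one of its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_rows ("
        "row_id INTEGER PRIMARY KEY, order_number INTEGER, guest_count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE query_result (order_number INTEGER, guest_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO order_rows VALUES (?, ?, ?)",
        [(1, 30001, 0), (2, 30002, 0), (3, 30002, 0), (4, 30003, 0)],
    )
    conn.executemany(
        "INSERT INTO query_result VALUES (?, ?)",
        [(30001, 3), (30002, 10), (30003, 5)],
    )
    # Only rows numbered 1 within their order (lowest row_id) are updated;
    # the new value is looked up by order number.
    conn.execute(
        """
        UPDATE order_rows
        SET guest_count = (SELECT q.guest_count
                           FROM query_result AS q
                           WHERE q.order_number = order_rows.order_number)
        WHERE row_id IN (
            SELECT row_id
            FROM (SELECT row_id,
                         ROW_NUMBER() OVER (PARTITION BY order_number
                                            ORDER BY row_id) AS seqnum
                  FROM order_rows) AS numbered
            WHERE seqnum = 1
        )
        """
    )
    result = conn.execute(
        "SELECT row_id, order_number, guest_count FROM order_rows "
        "ORDER BY row_id"
    ).fetchall()
    conn.close()
    return result


print(fill_one_row_per_order())
```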

Select multiple distinct rows from table SQL

I am attempting to select distinct (last updated) rows from a table in my database. I am trying to get the last updated row for each "Sub section". However I cannot find a way to achieve this.
The table looks like:
ID | Name |LastUpdated | Section | Sub |
1 | Name1 | 2013-04-07 16:38:18.837 | 1 | 1 |
2 | Name2 | 2013-04-07 15:38:18.837 | 1 | 2 |
3 | Name3 | 2013-04-07 12:38:18.837 | 1 | 1 |
4 | Name4 | 2013-04-07 13:38:18.837 | 1 | 3 |
5 | Name5 | 2013-04-07 17:38:18.837 | 1 | 3 |
What I am trying to get my SQL Statement to do is return rows:
1, 2, and 5.
They are distinct for the Sub, and the most recent.
I have tried:
SELECT DISTINCT Sub, LastUpdated, Name
FROM TABLE
WHERE LastUpdated = (SELECT MAX(LastUpdated) FROM TABLE WHERE Section = 1)
Which only returns the distinct row for the most recent updated Row. Which makes sense.
I have googled what I am trying to do and checked relevant posts on here, but I have not managed to find one that really answers it.
You can use the row_number() window function to assign numbers for each partition of rows with the same value of Sub. Using order by LastUpdated desc, the row with row number one will be the latest row:
select *
from (
select row_number() over (
partition by Sub
order by LastUpdated desc) as rn
, *
from YourTable
) as SubQueryAlias
where rn = 1
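Wouldn't it be enough to use group by? The WHERE clause has to come before GROUP BY, and DISTINCT is redundant once you group by Sub:
SELECT Sub, MAX(LastUpdated), MIN(Name) FROM TABLE WHERE Section = 1 GROUP BY Sub
Note that MIN(Name) is aggregated independently of MAX(LastUpdated), so the name is not guaranteed to come from the most recently updated row.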
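To compare the two approaches on the sample rows, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or newer). The table name items, the lower-case column names and both helper functions are assumptions for this illustration; the timestamps are stored as text, which sorts correctly in this format.

```python
import sqlite3

# Rows from the question: (id, name, last_updated, section, sub).
ROWS = [
    (1, "Name1", "2013-04-07 16:38:18.837", 1, 1),
    (2, "Name2", "2013-04-07 15:38:18.837", 1, 2),
    (3, "Name3", "2013-04-07 12:38:18.837", 1, 1),
    (4, "Name4", "2013-04-07 13:38:18.837", 1, 3),
    (5, "Name5", "2013-04-07 17:38:18.837", 1, 3),
]


def _connect():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER, name TEXT, last_updated TEXT, "
        "section INTEGER, sub INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def grouped():
    """GROUP BY version: each aggregate is computed on its own."""
    conn = _connect()
    result = conn.execute(
        "SELECT sub, MAX(last_updated), MIN(name) FROM items "
        "WHERE section = 1 GROUP BY sub ORDER BY sub"
    ).fetchall()
    conn.close()
    return result


def latest():
    """row_number version: all columns come from the same (latest) row."""
    conn = _connect()
    result = conn.execute(
        """
        SELECT sub, last_updated, name
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY sub
                                        ORDER BY last_updated DESC) AS rn
              FROM items
              WHERE section = 1) AS numbered
        WHERE rn = 1
        ORDER BY sub
        """
    ).fetchall()
    conn.close()
    return result


print(grouped())
print(latest())
```

For Sub 3 the GROUP BY version pairs the newest timestamp (from row 5) with Name4 (from row 4), while the row_number version returns Name5, the name that actually belongs to the latest row.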