Updating Single Row per Group - sql

The Background
I have a temporary table containing information including a unique rowID, OrderNumber, and guestCount. RowID and OrderNumber already exist in this table, and I am running a new query to fill in the missing guestCount for each orderNumber. I would like to then update the temp table with this information.
Example
What I currently have looks something like this, with only RowID being unique, meaning that there can be multiple items having the same OrderNumber.
RowID | OrderNumber | guestCount
1 | 30001 | 0
2 | 30002 | 0
3 | 30002 | 0
4 | 30003 | 0
My query returns the following table, only returning one total number of guests per orderNumber:
OrderNumber | guestCount
30001 | 3
30002 | 10
30003 | 5
The final table should look like:
RowID | OrderNumber | guestCount
1 | 30001 | 3
2 | 30002 | 10
3 | 30002 | 0
4 | 30003 | 5
I'm only interested in updating one (doesn't matter which) entry per orderNumber, but my current logic is resulting in errors:
UPDATE temp
SET temp.guestCount = cc.guestCount
FROM( SELECT OrderNumber, guestCount
FROM (SELECT OrderNumber, guestCount, RowID = MIN(RowID)
FROM #tempTable
GROUP BY RowID, OrderNumber, guestCount) t)temp
INNER JOIN queryTable q ON temp.OrderNumber = q.OrderNumber
I'm not sure if this logic is even a valid way of doing this, but I do know that I'm getting errors in my update due to the fact that I'm using an aggregate function, as well as a GROUP function. Is there any way to go about this operation differently?

You can define the row to update by using row_number() in a CTE. This identifies the first row in the group for the update:
with toupdate as (
select tt.*, row_number() over (partition by OrderNumber order by id) as seqnum
from #tempTable tt
)
UPDATE toupdate
SET toupdate.guestCount = q.guestCount
FROM toupdate
INNER JOIN queryTable q
ON temp.OrderNumber = q.OrderNumber
where toupdate.seqnum = 1;
The problem with you query is that temp is based on an aggregation subquery. Such a subquery is not updatable, because it does not have a 1-1 relationship with the rows of the original query. Using the CTE with row_number() is updatable. In addition, your set statement uses the table alias cc which is not defined in the query.

Related

Can I join each row of table1 to a unique row of table 2?

I was hoping to query in all the rows of a table that has its ids starting at some number, and update each row of the original table with a one to one of the second table.
For example:
normal
id | fk_test_id
----------------
1 | null
2 | null
3 | null
starts_after
id |
----
12 |
13 |
14 |
What UPDATE can I use to make normal look like this:
id | fk_test_id
----------------
1 | 12
2 | 13
3 | 14
I tried:
UPDATE normal SET fk_test_id = starts_after.id FROM starts_after; which just joins on the first row of starts_after.
UPDATE normal SET fk_test_id = (SELECT id FROM starts_after ORDER BY random() LIMIT 1); Where the subquery only executes once.
Filtering the subquery by which fk_test_ids are already chosen, but it only executes on the pre-updated data.
If you added record with specific order in to starts_after you can use below query:
update normal n
set fk_test_id = tmp.id
from (select id,
row_number() over (order by id)
from starts_after) tmp
where tmp.row_number = n.id;
I ordered by id from starts_after table (ASC) and create range of record with row num:
id | row_number
----------------
12 | 1
13 | 2
14 | 3
After that i join two table and update records

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
ideally, i'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
one solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
Might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique or switch to row_number to get an indeterminate row.

Is it possible to do so without using nested SELECTS?

Suppose I have the following table:
--------------------------------------------
ReceiptNo | Date | EmployeeID | Qty
--------------------------------------------
1 | 12-DEC-2015 | 1 | 200
2 | 13-DEC-2015 | 1 | 500
3 | 13-DEC-2015 | 1 | 100
4 | 13-DEC-2015 | 3 | 100
5 | 13-DEC-2015 | 3 | 500
6 | 13-DEC-2015 | 2 | 75
--------------------------------------------
Show the tuples with maximum Qty.
Answer:
--------------------------------------------
2 | 13-DEC-2015 | 1 | 500
5 | 13-DEC-2015 | 3 | 500
--------------------------------------------
I need to use aggregate function MAX().
Is it possible to do so without using nested SELECTS?
Try this in sql server
SELECT TOP 1 WITH TIES *
FROM TABLE
ORDER BY QTY DESC
No.
You can't show the tuples with maximum Qty, using the max aggregate function while avoiding nested selects.
VR46 posted a nice way to do it without using nested selects, but also without the max aggregate. A similar approach can be used in Oracle 12c using the FETCH clause:
select *
from table
order by qty desc
fetch first row with ties
If you want to use the max aggregate, this is the way to do it:
select *
from table
where qty = (select max(qty) from table)
Another way to do it would be using the rank or dense_rank window functions, but they require a nested select, and do not use the max aggregate function:
select *
from (select t.*,
dense_rank() over (order by t.qty desc) as rnk
from table t) t
where t.rnk = 1
Not using max, but plain "cross-platform" ANSI SQL without nested queries:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2 ON t2.Qty > t1.Qty
WHERE t2.Qty IS NULL
Retrieves all records for which there is no record with a greater quantity in the same table.

Delete rows except for one for every id

I have a dataset with multiple ids. For every id there are multiple entries. Like this:
--------------
| ID | Value |
--------------
| 1 | 3 |
| 1 | 4 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 3 |
| 3 | 5 |
--------------
Is there a SQL DELETE query to delete (random) rows for every id, except for one (random rows would be nice but is not essential)? The resulting table should look like this:
--------------
| ID | Value |
--------------
| 1 | 2 |
| 2 | 1 |
| 3 | 5 |
--------------
Thanks!
It doesn't look like hsqldb fully supports olap functions (in this case row_number() over (partition by ...), so you'll need to use a derived table to identify the one value you want to keep for each ID. It certainly won't be random, but I don't think anything else will be either. Something like so
This query will give you the first part:
select
id,
min(value) as minval
from
group by id
Then you can delete from your table where you don't match:
delete from
<your table> t1
inner join
(
select
id,
min(value) as minval
from
<your table>
group by id
) t2
on t1.id = t2.id
and t1.value <> t2.value
Try this:
alter ignore table a add unique(id);
Here a is the table name
This should do what you want:
SELECT ID, Value
FROM (SELECT ID, Value, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NEWID()) AS RN
FROM #Table) AS A
WHERE A.RN = 1
I tried the given answers with HSQLDB but it refused to execute those queries for different reasons (join is not allowed in delete query, ignore statement is not allowed in alter query). Thanks to Andrew I came up with this solution (which is a little bit more circumstantial, but allows it to delete random rows):
Add a new column for random values:
ALTER TABLE <table> ADD COLUMN rand INT
Fill this column with random data:
UPDATE <table> SET rand = RAND() * 1000000
Delete all rows which don't have the minimum random value for their id:
DELETE FROM <table> WHERE rand NOT IN (SELECT MIN(rand) FROM <table> GROUP BY id)
Drop the random column:
ALTER TABLE <table> DROP rand
For larger tables you probably should ensure that the random values are unique, but this worked perfectly for me.

How do I write a query to delete duplicates in a table?

Given a table resembling this one, called VehicleUser:
VehicleUserId | VehicleId | UserId
1 | 1001 | 2
2 | 1001 | 2
3 | 1001 | 2
4 | 1001 | 3
5 | 1001 | 3
6 | 1001 | 3
How do I write a query that can delete the duplicates? row 2 and 3 are identical to row 1 except for a different VehicleUserId and rows 5 and 6 are identical to 4 except for a different VehicleUserId.
;with cte as (
select row_number() over
(partition by VehicleId, UserId order by VehicleUserId) as rn
from VehicleUser)
delete from cte
where rn > 1;
You could filter duplicates with a exists clause, like:
delete v1
from VehicleUser v1
where exists
(
select *
from VehicleUser v2
where v1.VehicleId = v2.VehicleId
and v1.UserId = v2.UserId
and v1.VehicleUserId > v2.VehicleUserId
)
Before you run this, check if it works by replacing the delete with a select:
select *
from VehicleUser v1
where exists
(
...
The rows that show up will be deleted.
here's your unique values:
select vehicleid, userid, min(vehicleuserid) as min_id
from vehicleuser
group by vehicleid, userid
you can put them in a new table before deleting anything to make sure you have what you want, then delete vehicleUser or use an outer join to delete rows from vehicleUser that aren't in the new table.
Debugging before deleting rows is safer.
I don't think you can do this purely in a single query.
I'd do a grouped query to find the duplicates, then iterate the results, deleting all but the first VehicleUserId row.
select VehicleId, UserId
from VehicleUser
group by VehicleId, UserId
having count(*) > 1
Will get you the VehicleId/UserId combinations for which there are duplicates.