Is it possible to delete a certain number of rows with GROUP BY?


I have one table called "Park" and another called "WaterSource". The idea is that each row of Park says how many water sources I have, and that many rows are created in the second table. When I change the value in Park to a smaller number (say I had 5 and now want 3), I need to delete the "last" rows of "WaterSource" for the last 2 "Parks".
I need to do something like this:
BEGIN
    declare @x int;
    set @x = 1;
    while @x > 0
    BEGIN
        delete top (@diff) from [WaterSource]
        where [IdView] = @IdView
        order by [Park] DESC;
        set @x = @@ROWCOUNT;
    END
END
I don't know how to do this because the only parameter in the second table is IdView, and I just want to "group" all rows by "Park", order them by Park Id descending, and delete them.
http://www.sqlfiddle.com/#!9/2cbe3fb/1
So my idea is that if I change the number of parks to 1, all rows with idPark 2 should be deleted, and Park 2 should be deleted as well. I always delete from the last to the first.

Based on the sample provided, I assume that SQL Server is the database engine. The key to my solution is creating a temporary table with a rank or row number column produced by the window function ROW_NUMBER() (more info on window functions here: http://www.sqlservertutorial.net/sql-server-window-functions/).
DROP TABLE IF EXISTS #ranker;
DECLARE @ParkId INT = 2;
DECLARE @newNumWaterSources INT = (SELECT numWaterSources FROM Parks WHERE id = @ParkId);
SELECT
    ws.id,
    ROW_NUMBER() OVER (ORDER BY ws.id) AS rn
INTO #ranker
FROM WaterSources ws
WHERE idPark = @ParkId;
DELETE FROM WaterSources
WHERE idPark = @ParkId
  AND id IN
    (SELECT id FROM #ranker WHERE rn > @newNumWaterSources);
There are many ways to automate this so that the two variables are populated with the most relevant values. You may want to play around with using an AFTER UPDATE TRIGGER to query the deleted table for the park table id.
This solution would only work if the id column in the WaterSources table is unique. So you would need to change the insert to WaterSources in your fiddle to the following:
INSERT INTO WaterSources VALUES (1, 1, 37);
INSERT INTO WaterSources VALUES (2, 1, 37);
INSERT INTO WaterSources VALUES (3, 1, 37);
INSERT INTO WaterSources VALUES (4, 2, 37);
INSERT INTO WaterSources VALUES (5, 2, 37);
INSERT INTO WaterSources VALUES (6, 2, 37);
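To try the idea without a SQL Server instance, here is a minimal sketch that runs the same ROW_NUMBER() pattern through Python's built-in sqlite3 module. It is an illustration rather than the script above: SQLite stands in for SQL Server, a subquery replaces the #ranker temp table, and the Parks(id, numWaterSources) and WaterSources(id, idPark, idView) layouts are assumptions inferred from the queries in this answer. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, inferred from the answer's queries (not taken from the fiddle).
conn.executescript("""
    CREATE TABLE Parks (id INTEGER PRIMARY KEY, numWaterSources INTEGER);
    CREATE TABLE WaterSources (id INTEGER PRIMARY KEY, idPark INTEGER, idView INTEGER);
    INSERT INTO Parks VALUES (1, 3), (2, 3);
    INSERT INTO WaterSources VALUES
        (1, 1, 37), (2, 1, 37), (3, 1, 37),
        (4, 2, 37), (5, 2, 37), (6, 2, 37);
""")


def trim_water_sources(conn, park_id):
    """Delete the highest-id water sources beyond the park's numWaterSources."""
    conn.execute(
        """
        DELETE FROM WaterSources
        WHERE id IN (
            SELECT id FROM (
                SELECT ws.id AS id,
                       ROW_NUMBER() OVER (ORDER BY ws.id) AS rn
                FROM WaterSources AS ws
                WHERE ws.idPark = :park
            )
            WHERE rn > (SELECT numWaterSources FROM Parks WHERE id = :park)
        )
        """,
        {"park": park_id},
    )


# Park 2 goes from 3 water sources down to 1, so its two newest rows must go.
conn.execute("UPDATE Parks SET numWaterSources = 1 WHERE id = 2")
trim_water_sources(conn, 2)
remaining = [r[0] for r in conn.execute("SELECT id FROM WaterSources ORDER BY id")]
print(remaining)
```

Because the rows are numbered by ascending id, the rows with the largest ids are the ones removed, which only makes sense if id is unique as noted above.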

Related

How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that have subsequent records in the partition with some non-zero value should stay.
create table #x (_id int identity, _category int, _date date, _value int)
insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)
insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)
insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)
I see a few ways to do it. The brute force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if it holds until the category changes. I'm trying to avoid T-SQL though if I can do it in a query.
To that end, I thought I could apply some gaps and islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it useful in identifying partitions that have the criteria, not so much in helping me get the ID's of the records to delete.
The desired result is the deletion of records 10, 11, 12 and 17.
Appreciate any help.

I'm not sure that your requirement requires a gaps and islands approach. Simple exists logic should work.
SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
EXISTS (
SELECT 1
FROM #x x2
WHERE x2._category = x1._category AND
x2._date > x1._date AND
x2._value <> 0
);
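The query above selects the rows to keep, so the delete is its logical negation: rows with _value = 0 that have no later non-zero row in the same category. Below is a small sketch of that negated form, run against the question's sample data with Python's sqlite3 module. SQLite is used here only so the example is self-contained; the table is named x instead of #x and dates are stored as ISO text, both of which are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " _category INT, _date TEXT, _value INT)"
)
sample = [
    (1, "2022-10-01", 12), (1, "2022-10-03", 0), (1, "2022-10-04", 10),
    (1, "2022-10-06", 11), (1, "2022-10-07", 10),
    (2, "2022-10-01", 1), (2, "2022-10-02", 0), (2, "2022-10-05", 19),
    (2, "2022-10-10", 18), (2, "2022-10-12", 0), (2, "2022-10-13", 0),
    (2, "2022-10-15", 0),
    (3, "2022-10-02", 10), (3, "2022-10-03", 0), (3, "2022-10-05", 0),
    (3, "2022-10-06", 12), (3, "2022-10-08", 0),
]
conn.executemany("INSERT INTO x (_category, _date, _value) VALUES (?, ?, ?)", sample)

# Negation of the keep-condition: a zero row with no later non-zero row
# in the same category is a trailing zero.
TRAILING_ZEROS = """
    SELECT x1._id
    FROM x AS x1
    WHERE x1._value = 0
      AND NOT EXISTS (
          SELECT 1
          FROM x AS x2
          WHERE x2._category = x1._category
            AND x2._date > x1._date
            AND x2._value <> 0
      )
"""

deleted_ids = [r[0] for r in conn.execute(TRAILING_ZEROS + " ORDER BY x1._id")]
conn.execute(f"DELETE FROM x WHERE _id IN ({TRAILING_ZEROS})")
remaining = conn.execute("SELECT COUNT(*) FROM x").fetchone()[0]
print(deleted_ids, remaining)
```

The ids it removes match the result the question asks for (10, 11, 12 and 17).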

Assuming that all _values are greater than or equal to 0 you can use MAX() window function in an updatable CTE:
WITH cte AS (
SELECT *,
MAX(_value) OVER (
PARTITION BY _category
ORDER BY _date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) max
FROM #x
)
DELETE FROM cte
WHERE max = 0;
If there are negative _values use MAX(ABS(_value)) instead of MAX(_value).
See the demo.
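To see why the windowed MAX() identifies trailing zeros, it can help to look at the computed column directly. The sketch below uses Python's sqlite3 module (standing in for SQL Server) on a made-up five-row category; the table name x, the column alias max_after, and the data are illustrative assumptions. SQLite cannot delete through a CTE, so this only shows the values the DELETE would filter on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (_id INTEGER PRIMARY KEY, _category INT, _date TEXT, _value INT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2022-10-05", 19),
        (2, 2, "2022-10-10", 0),   # zero, but a non-zero row follows
        (3, 2, "2022-10-12", 18),
        (4, 2, "2022-10-13", 0),   # trailing zero
        (5, 2, "2022-10-15", 0),   # trailing zero
    ],
)

# For each row, take the maximum _value from that row to the end of its category.
suffix_max = conn.execute(
    """
    SELECT _id, _value,
           MAX(_value) OVER (
               PARTITION BY _category
               ORDER BY _date
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS max_after
    FROM x
    ORDER BY _id
    """
).fetchall()

trailing = [row[0] for row in suffix_max if row[2] == 0]
print(suffix_max, trailing)
```

A row's max_after is 0 only when it and every later row in the category are 0, which is exactly the "trailing" condition, provided no values are negative.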

Using common table expressions, you can use:
WITH CTE_NumberedRows AS (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
FROM #x
),
CTE_Keepers AS (
SELECT _category, rnLastKeeper = MAX(rn)
FROM CTE_NumberedRows
WHERE _value <> 0
GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)
See this db<>fiddle for a working demo.
EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks. (The original can still be found here.)
Tim Biegeleisen's post may be the simpler approach.

Delete Result Rows from a Table in SQL

I have two tables named BM_Data1 and BM_Data1_May62019. Both tables contain the same data, since BM_Data1_May62019 is a copy of BM_Data1. But BM_Data1 has some extra rows. How can I delete those extra rows from BM_Data1 so that it matches BM_Data1_May62019?
I got the extra rows using the following query.
SELECT * FROM
(SELECT * FROM BM_DATA1
EXCEPT
SELECT * FROM BM_DATA1_MAY62019) a
There are 7803 extra rows, how can I delete them from BM_Data1 table?
Thank You

As you confirmed that RECID is common to both tables and holds unique values, you can try the following script:
DELETE FROM BM_DATA1
WHERE RECID NOT IN
(
SELECT RECID FROM BM_DATA1_MAY62019
)

Use a MERGE statement with WHEN NOT MATCHED BY SOURCE THEN DELETE.
MERGE works like a JOIN of sorts, and you need to be able to identify which rows are equal. You do this by the ON clause - for you, that would be RECID.
I suggest you run these in a transaction first, so you can verify that you only delete the data you intend to, and commit the transaction only when you are sure you have the right configuration. If something is wrong, you can roll back.
BEGIN TRANSACTION
MERGE BM_DATA1 AS Target
USING BM_DATA1_MAY62019 AS Source
ON (Target.RECID = Source.RECID)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
SELECT * FROM BM_DATA1
-- ROLLBACK TRANSACTION -- Uncomment and use this if it deleted the wrong data
-- COMMIT -- Uncomment and use this if it deleted the right data!

I've included some DDL so you can run this for yourself to help you understand the example. The ID column is the one that is common to both tables and you are removing the rows in A that are not in B
create table #data_a( id int, val int)
create table #data_b( id int, val int)
insert into #data_a select 1, 1
insert into #data_a select 2, 4
insert into #data_a select 3, 5
insert into #data_a select 4, 5
insert into #data_a select 5, 5
insert into #data_a select 6, 5
insert into #data_a select 7, 5
insert into #data_a select 8, 5
insert into #data_b select 1, 1
insert into #data_b select 2, 4
insert into #data_b select 3, 5
insert into #data_b select 4, 5
insert into #data_b select 5, 5
-- delete the extra rows in A
delete a from #data_a as a
left join #data_b as b on a.id = b.id
where b.id is null
-- we can see the extra rows are no longer in A
select * from #data_a
select * from #data_b
drop table #data_a
drop table #data_b

create table t1(id int, demo int);
create table t2(id int, demo int);
insert into t1 values (1, 1);
insert into t2 values (1, 1);
insert into t1 values (2, 2);
insert into t2 values (2, 2);
insert into t1 values (3, 3);
insert into t2 values (3, 3);
insert into t1 values (4, 4); -- t1 table has some extra rows
insert into t1 values (5, 5); -- t1 table has some extra rows
insert into t1 values (6, 6); -- t1 table has some extra rows
To delete the records from the first table that are not in the second table:
delete from t1 where id not in (select id from t2)

Just use DELETE with a correlated subquery:
delete from [BM_DATA1]
where not exists
(select 1
from [BM_DATA1_MAY62019]
where [BM_DATA1_MAY62019].RECID = [BM_DATA1].RECID -- put your identifying column name here
)
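One practical difference between the NOT IN and NOT EXISTS forms shown in these answers is how they treat NULLs: if the subquery behind NOT IN returns even one NULL, the predicate is never true and nothing is deleted. The sketch below shows this with Python's sqlite3 module, which follows the same three-valued logic; the t1/t2 tables mirror the earlier example, and the NULL row in t2 is an assumption added purely to trigger the effect. If RECID is never NULL, both forms behave the same.

```python
import sqlite3


def make_tables():
    """Build t1 with three extra rows and t2 with one NULL id (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INT, demo INT);
        CREATE TABLE t2 (id INT, demo INT);
        INSERT INTO t1 VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6);
        INSERT INTO t2 VALUES (1, 1), (2, 2), (3, 3), (NULL, 0);
    """)
    return conn


# NOT IN: "4 NOT IN (1, 2, 3, NULL)" evaluates to unknown, so no row qualifies.
conn = make_tables()
not_in_deleted = conn.execute(
    "DELETE FROM t1 WHERE id NOT IN (SELECT id FROM t2)"
).rowcount

# NOT EXISTS: the NULL row simply never matches, so the extra rows are removed.
conn = make_tables()
not_exists_deleted = conn.execute(
    "DELETE FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)"
).rowcount

print(not_in_deleted, not_exists_deleted)
```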

Show row number column from result set in oracle sql view

I want to add an extra column so if I get, let's say, 4 rows in the result this column will have values 1,2,3,4.
I've tried ROWNUM, but since this is a view it shows the actual row number in the whole view, and that's not what I want.
Here is a sample schema:
CREATE TABLE TEST (RID NUMBER, RVAL VARCHAR2(100 BYTE));
INSERT INTO TEST (RID, RVAL) VALUES (1, 'ONE');
INSERT INTO TEST (RID, RVAL) VALUES (2, 'TWO');
INSERT INTO TEST (RID, RVAL) VALUES (3, 'THREE');
INSERT INTO TEST (RID, RVAL) VALUES (4, 'FOUR');
CREATE OR REPLACE VIEW VTEST AS
SELECT ROWNUM AS NUMROW, RID, RVAL FROM TEST;
Here are two sample queries. The first shows the result I want. The second how I want to get it (with a simple select against the view)
SELECT ROWNUM,RID,RVAL FROM TEST WHERE RID = 3 OR RID = 4;
SELECT * FROM VTEST WHERE RID = 3 OR RID = 4;
Here is the fiddle: http://sqlfiddle.com/#!4/4e816/3

In Oracle, use ROW_NUMBER() (see the Oracle docs).
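A short sketch of what that could look like: the numbering is computed in the query against the view rather than inside the view, so it is applied after the WHERE filter. The example runs on SQLite through Python's sqlite3 module only to stay self-contained; the ROW_NUMBER() OVER (ORDER BY ...) syntax is the same in Oracle. Dropping NUMROW from the view definition is an assumption of this sketch, not something the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST (RID INTEGER, RVAL TEXT);
    INSERT INTO TEST (RID, RVAL) VALUES (1, 'ONE');
    INSERT INTO TEST (RID, RVAL) VALUES (2, 'TWO');
    INSERT INTO TEST (RID, RVAL) VALUES (3, 'THREE');
    INSERT INTO TEST (RID, RVAL) VALUES (4, 'FOUR');
    -- The view no longer carries a precomputed row number.
    CREATE VIEW VTEST AS SELECT RID, RVAL FROM TEST;
""")

# Window functions are evaluated after WHERE, so numbering restarts at 1
# for whatever rows survive the filter.
numbered = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY RID) AS NUMROW, RID, RVAL
    FROM VTEST
    WHERE RID = 3 OR RID = 4
    ORDER BY RID
    """
).fetchall()
print(numbered)
```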

Order guarantee for identity assignment in multi-row insert in SQL Server

When using a Table Value Constructor (http://msdn.microsoft.com/en-us/library/dd776382(v=sql.100).aspx) to insert multiple rows, is the order of any identity column populated guaranteed to match the rows in the TVC?
E.g.
CREATE TABLE A (a int identity(1, 1), b int)
INSERT INTO A(b) VALUES (1), (2)
Are the values of a guaranteed by the engine to be assigned in the same order as b, i.e. in this case so they match a=1, b=1 and a=2, b=2.

Piggybacking on my comment above, and knowing that the behavior of an insert / select+order by will guarantee generation of identity order (#4: from this blog)
You can use the table value constructor in the following fashion to accomplish your goal (not sure if this satisfies your other constraints) assuming you wanted your identity generation to be based on category id.
insert into thetable(CategoryId, CategoryName)
select *
from
(values
(101, 'Bikes'),
(103, 'Clothes'),
(102, 'Accessories')
) AS Category(CategoryID, CategoryName)
order by CategoryId

It depends; the order holds as long as you're inserting the records in one shot. For example, if after inserting you delete the record where a=2 and then re-insert the value b=2, the identity column does not reuse 2 but continues from the last value generated, giving a=3.
To demonstrate
DECLARE @Sample TABLE
(a int identity(1, 1), b int)
Insert into @Sample values (1),(2)
a b
1 1
2 2
Delete from @Sample where a=2
Insert into @Sample values (2)
Select * from @Sample
a b
1 1
3 2

How can I insert random values into a SQL Server table?

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:
stackoverflow.com/.../update-sql-table-with-random-value-from-other-table
When I tried this, all of the "random" values inserted were exactly the same for all 3000 records.
When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:
SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
The random value's seed is the same on every record the query updates
I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?
This is the code I'm using:
DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))
INSERT INTO @randomStuff ([id], [val])
VALUES ( 1, 'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2, 'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3, 'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4, 'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5, 'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

When the query engine sees this...
(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"
You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautology of MyTable.MyColumn = MyTable.MyColumn. It ain't smart enough to see through this, though:
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 val
                FROM @randomStuff r
                INNER JOIN MyTable _MT
                    ON M.Id = _MT.Id
                ORDER BY NEWID())
FROM MyTable M
By bringing the outer table (M) into the subquery, the query engine assumes the subquery will need to be re-evaluated for each row. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.
A cursor would probably be just as fast, but is most certainly not as fun.

use a cross join to generate random data

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.
Once @randomStuff is set up, we do this (note that in my case @MyTable is a table variable; adjust accordingly for your normal table):
DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)
INSERT INTO @randomMappings
SELECT M.id, val, NEWID() AS sort
FROM @MyTable AS M
CROSS JOIN @randomStuff
So at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then:
DELETE others FROM @randomMappings AS others
INNER JOIN @randomMappings AS lower
ON (lower.id = others.id) AND (lower.sorter < others.sorter)
This is an old trick which deletes all rows for a given MyTable.id except for the one with the lowest sort value: join the table to itself where the value is smaller, and delete any row where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left. Then we just plug it back into the table:
UPDATE @MyTable
SET MyColumn = random.val
FROM @MyTable m, @randomMappings AS random
WHERE (random.id = m.id)
And you're done!
I said it was hacky...
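For comparison, the same "cross join, then keep one random candidate per row" idea can be sketched with a window function in place of the self-join delete. This is a swapped-in variant, not the answer's code: it runs on SQLite via Python's sqlite3 module, uses random() where SQL Server would use NEWID(), and the MyTable(id, MyColumn) layout, the picks temp table, and the sample values are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn TEXT);
    CREATE TABLE randomStuff (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO MyTable (id) VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO randomStuff VALUES
        (1, 'Test Value 1'), (2, 'Test Value 2'), (3, 'Test Value 3');
""")

# Every (MyTable row, value) pair is numbered in random order within its
# MyTable id; keeping rn = 1 leaves exactly one random value per id.
conn.execute("""
    CREATE TEMP TABLE picks AS
    SELECT id, val FROM (
        SELECT m.id AS id, r.val AS val,
               ROW_NUMBER() OVER (PARTITION BY m.id ORDER BY random()) AS rn
        FROM MyTable AS m
        CROSS JOIN randomStuff AS r
    )
    WHERE rn = 1
""")

# Plug the chosen values back into the target table.
conn.execute("""
    UPDATE MyTable
    SET MyColumn = (SELECT val FROM picks WHERE picks.id = MyTable.id)
""")

assigned = [r[0] for r in conn.execute("SELECT MyColumn FROM MyTable ORDER BY id")]
print(assigned)
```

Like the table-variable version, this materializes rows × values candidates, so it suits test-data generation more than large production tables.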

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.
then you would have
UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()

There is no optimization going on here.
You're using a subquery that selects a single value, so there is nothing to optimize.
You can also try putting a column from the table you're updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable:
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff
                WHERE MyTable.MyColumn = MyTable.MyColumn
                ORDER BY NEWID())

I came up with a solution which is a bit of a hack and very inefficient (about 10 seconds to update 3000 records). Because this is being used to generate test data, however, I don't have to be concerned about speed.
In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:
DECLARE @rows INT
DECLARE @currentRow INT
SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1
WHILE @currentRow <= @rows -- <= so the last row is updated too
BEGIN
    UPDATE MyTable
    SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
    WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
                          FROM (SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
                                FROM MyTable a) AS b
                          WHERE @currentRow = b.rownumber
                         )
    SET @currentRow = @currentRow + 1
END
