I have an UPDATE query that I need to write and I'm struggling with it.
I have 3 columns, ID, Income, AverageIncome.
ID: a string, but ordered alphabetically.
AverageIncome: the average of the previous 10 Income entries.
All the values of the AverageIncome are incorrect and I need to update them to be correct.
Any tips?
Thanks!
In MySQL syntax, and assuming that the order is defined by the id column:
CREATE TEMPORARY TABLE my_table_copy AS SELECT * FROM my_table;
-- The LIMIT has to sit inside a derived table so it applies *before* AVG();
-- attached directly to the aggregate query it would have no effect and the
-- average would cover ALL preceding rows. Note: the outer reference to t.id
-- inside a derived table requires MySQL 8.0.14 or later.
UPDATE my_table t
SET average_income = (SELECT AVG(last10.income)
                      FROM (SELECT tc.income
                            FROM my_table_copy tc
                            WHERE tc.id < t.id
                            ORDER BY tc.id DESC
                            LIMIT 10) AS last10
                     );
DROP TABLE my_table_copy;
Of course you will have to make sure that CREATE TABLE and UPDATE execute atomically, i.e. without any modification of the data between one and the other.
Also keep in mind that this is not a very good design, as other users already pointed out. You will have redundancy in your data. You might be better off with a view in this case.
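For instance, on MySQL 8.0+ a window function can compute the rolling average on the fly, so nothing redundant is ever stored. A minimal sketch, assuming the table and column names used above:
CREATE VIEW my_table_with_avg AS
SELECT id,
       income,
       -- average of the 10 rows preceding the current one
       AVG(income) OVER (ORDER BY id
                         ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING) AS average_income
FROM my_table;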
I have this SQL query
SELECT ACCBAL_DATE, ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE ACC_KEY = '964570223'
AND ACCBAL_KEY = '16'
ORDER BY ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;
It returns one row but I need to use this query for many ACC_KEYS (about 600).
The first way would be to run this query about 600 times, each time with a different ACC_KEY parameter.
The second way, I think, is to create a procedure
that takes an acc_key variable and uses it in the WHERE clause.
Issue is that I can't create procedure stored on server because of permissions.
Is there some way to solve it without storing procedure on server?
EDIT: I know about the IN clause, but that is not what I need. I need something that runs the query about 600 times, each execution with a different ACC_KEY in the WHERE clause, and the output should be 600 rows.
When I put the keys in an IN clause, the query still returns only one row. I limit it to one row because without the limit it returns about 100 rows per key, and I want only the first row, which has the data I need. For each ACC_KEY it should return exactly one row.
You can still do that with an IN() clause listing all 600 key values:
select acc_key,
max(accbal_date) as accbal_date,
max(accbal_amount) keep (dense_rank last order by accbal_date) as accbal_amount
from account_balances t
where acc_key in ('964570223', '964570224', ...) -- up to 1000 allowed
and accbal_key = '16'
group by acc_key
order by acc_key;
This is using aggregate functions and grouping by the key, so you will get one row per key, with the data for the most recent date.
Read more about keep/last.
It would still be better to use a collection or a table - maybe an external table loaded from your Excel sheet, saved as a CSV; not least because you can only supply 1000 entries to a single IN() clause - or any expression list - but also for performance and readability/maintenance reasons.
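For illustration, a minimal sketch of such an external table. The directory object name (data_dir) and the file name are assumptions, and a DBA would have to create and grant the DIRECTORY object first:
CREATE TABLE acc_keys_ext (
  acc_key VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir           -- assumed DIRECTORY object
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('acc_keys.csv')            -- assumed file name
)
REJECT LIMIT UNLIMITED;
You could then replace the long IN() list with WHERE t.acc_key IN (SELECT acc_key FROM acc_keys_ext).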
You can store the keys in a table or use a derived table in the query. I would recommend something more like this:
WITH keys as (
SELECT '964570223' as ACC_KEY FROM DUAL UNION ALL
. . .
)
SELECT k.ACC_KEY, MAX(ab.ACCBAL_DATE) as ACCBAL_DATE,
MAX(ab.ACCBAL_AMOUNT) KEEP (DENSE_RANK FIRST ORDER BY ab.ACCBAL_DATE DESC) as ACCBAL_AMOUNT
FROM keys k LEFT JOIN
ACCOUNT_BALANCES ab
ON ab.ACC_KEY = k.ACC_KEY AND
ab.ACCBAL_KEY = '16'
GROUP BY k.ACC_KEY;
Of course the CTE keys could be replaced with a table that has the accounts of interest.
Note that this replaces your logic with aggregation logic. You just want the most recent date and balance, which Oracle supports using the KEEP keyword.
Step-1: Create a table with a single ACC_KEY column that stores the full list of keys.
Step-2: Run the query.
-- FETCH FIRST 1 ROWS ONLY would return one row overall, not one per key;
-- ROW_NUMBER() keeps the most recent row for each ACC_KEY instead.
SELECT ACC_KEY, ACCBAL_DATE, ACCBAL_AMOUNT
FROM (SELECT T.ACC_KEY, T.ACCBAL_DATE, T.ACCBAL_AMOUNT,
             ROW_NUMBER() OVER (PARTITION BY T.ACC_KEY
                                ORDER BY T.ACCBAL_DATE DESC) AS RN
      FROM ACCOUNT_BALANCES T
      WHERE EXISTS (SELECT A.ACC_KEY FROM <TABLENAME> A WHERE A.ACC_KEY = T.ACC_KEY)
        AND T.ACCBAL_KEY = '16')
WHERE RN = 1;
I have an existing app I can’t modify. It needs to execute a SQL GROUP BY, but cannot. However it can and does read a GroupNumber field from the same table.
What I’m doing now is executing the grouping SQL statement, processing it in code and writing back the GroupNumber to the table so that App can do its thing. What I’d like to do is execute a single SQL statement to do both the grouping and the writeback in a single step. I can’t figure out how to do this, if indeed it’s possible. Simple example:
SELECT FirstName, LastName, Age
FROM Persons
WHERE ....
GROUP BY Age
ORDER BY Age
I execute this, then do
for ( i = 1; i <= result_set.n; i++ )
    Sql = "UPDATE Persons "
        + "SET GroupNumber = " + fixed( i )
        + " WHERE Age = " + fixed( result_set.Age[i] )
I need to do this every time a record gets added to the table (so yes, if someone younger than me gets added, my group number changes - don’t ask).
Clearly you want a trigger. However trigger definitions vary from database server to database server. I'll hazard a guess and say you are using some version of Microsoft SQL Server: the create trigger syntax and a couple of examples can be found at http://msdn.microsoft.com/en-us/library/ms189799.aspx. There might be some small complication with the trigger modifying the same table it is sourcing data from, but I believe you can generally do that in most SQL server databases (SQLite may be one of the few where that is difficult).
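For illustration only, a sketch of what such a trigger might look like in SQL Server, using the Persons/GroupNumber names from the question; DENSE_RANK() reproduces the sequential numbering your loop assigns per distinct Age. (By default SQL Server does not fire the trigger recursively when it updates its own table.)
CREATE TRIGGER trg_Persons_GroupNumber
ON Persons
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- renumber every age group after each insert
    UPDATE p
    SET GroupNumber = r.rn
    FROM Persons p
    INNER JOIN (
        SELECT Age, DENSE_RANK() OVER (ORDER BY Age) AS rn
        FROM Persons
    ) r ON r.Age = p.Age;
END;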
Try that and see if that helps.
I'm not really sure what you are after; here is my best guess:
;WITH AllRows AS (--get one row per age, and number them
SELECT
Age, ROW_NUMBER() OVER (ORDER BY Age) AS RowNumber -- no PARTITION BY: partitioning by Age would make every row's number 1
FROM Persons
WHERE ...
GROUP BY Age
)
UPDATE p --update all the people, getting their GroupNumber based on their Age's row number
SET GroupNumber=a.RowNumber
FROM Persons p
INNER JOIN AllRows a ON p.Age=a.Age
WHERE p.GroupNumber IS NULL OR p.GroupNumber != a.RowNumber
I use SQL Server, but this is fairly standards based code.
You'd think I came straight here to ask my question, but I googled an awful lot without finding a decisive answer.
Facts: I have a table with 3.3 million rows, 20 columns.
The first column is the primary key, thus unique.
I have to remove all rows where columns 2 through 11 are duplicated. In essence a basic question, but there are so many different approaches, even though everyone is ultimately after the same thing: removing the duplicates.
I was personally thinking about GROUP BY HAVING COUNT(*) > 1
Is that the way to go or what do you suggest?
Thanks a lot in advance!
L
As a generic answer:
WITH cte AS (
SELECT ROW_NUMBER() OVER (
PARTITION BY <groupbyfield> ORDER BY <tiebreaker>) as rn
FROM Table)
DELETE FROM cte
WHERE rn > 1;
I find this more powerful and flexible than GROUP BY ... HAVING. In fact, GROUP BY ... HAVING only gives you the duplicates; you're still left with the 'trivial' task of choosing a 'keeper' among them.
ROW_NUMBER() OVER (...) gives more control over how to distinguish among duplicates (the tiebreaker) and allows for behavior like 'keep the first 3 of the duplicates' rather than only 'keep just 1', which is really hard to do with GROUP BY ... HAVING.
The other part of your question is how to approach this for 3.3M rows. Well, 3.3M is not really that big, but I would still recommend doing this in batches: DELETE TOP (10000) at a time, otherwise you'll push a huge transaction into the log and might overwhelm your log drives.
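A minimal sketch of that batched variant, with hypothetical names (pk as the primary key; col2 and col3 standing in for your duplicate-defining columns 2 through 11):
WHILE 1 = 1
BEGIN
    ;WITH cte AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY col2, col3   -- list all of columns 2 through 11
                   ORDER BY pk) AS rn
        FROM dbo.MyTable
    )
    DELETE TOP (10000) FROM cte
    WHERE rn > 1;

    IF @@ROWCOUNT = 0 BREAK;   -- stop once no duplicates remain
END;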
And the final question is whether this will perform acceptably. It depends on your schema. If the ROW_NUMBER() has to scan the entire table and spool to count, and you have to repeat this in batches N times, then it won't perform. An appropriate index will help. But I can't say anything more without knowing the exact schema involved (structure of clustered index/heap, all non-clustered indexes, etc.).
Group by the fields you want to be unique, and get an aggregate value (like min) for your pk field. Then insert those results into a new table.
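A sketch under the same hypothetical names (pk, col2, col3):
-- keep one row per duplicate group in a fresh table
SELECT MIN(pk) AS pk, col2, col3       -- extend with columns 4 through 11
INTO dbo.MyTable_dedup
FROM dbo.MyTable
GROUP BY col2, col3;
Join dbo.MyTable_dedup back to the original table on pk if you need the remaining columns.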
If you have SQL Server 2005 or newer, then the easiest way would be to use a CTE (Common Table Expression).
You need to know what criteria you want to "partition" your data by - e.g. create partitions of data that is considered identical/duplicate - and then you need to order those partitions by something - e.g. a sequence ID, a date/time or something.
You didn't provide much details about your tables - so let me just give you a sample:
;WITH Duplicates AS
(
SELECT
OrderID,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS RowN
FROM
dbo.Orders
)
DELETE FROM Duplicates   -- delete through the CTE, not the base table directly
WHERE RowN > 1
The CTE ( WITH ... AS (...) ) gives you an "inline view" for the next SQL statement - it's not persisted or anything - it just lives for that next statement and then it's gone.
Basically, I'm "grouping" (partitioning) my Orders by CustomerID, and ordering by OrderDate. So for each CustomerID, I get a new "group" of data, which gets a row number starting with 1. The ORDER BY OrderDate DESC gives the newest order for each customer the RowN = 1 value - this is the one order I keep.
All other orders for each customer are deleted through the CTE (the WITH ... expression).
You'll need to adapt this for your own situation, obviously - but the CTE with the PARTITION BY and ROW_NUMBER() are a very reliable and easy technique to get rid of duplicates.
If you don't want to deal with creating a new table, then just use DELETE TOP(1). Use a subquery to get the IDs of rows that have duplicates, and then use DELETE TOP to remove them. You might have to run it more than once if a group has more than one duplicate, but you get the point.
-- MAX(ID) picks one deletable row per duplicated group (a bare ID would not
-- be valid in a GROUP BY Field query); rerun until nothing is deleted.
DELETE TOP (1) FROM Table
WHERE ID IN (SELECT MAX(ID) FROM Table GROUP BY Field HAVING COUNT(*) > 1)
You get the idea hopefully. This is just some pseudo code to help demonstrate.
For example I have:
create table a (i int);
Assume there are 10k rows.
I want to count 0's in the last 20 rows.
Something like:
select count(*) from (select i from a limit 20) where i = 0;
Is that possible to make it more efficient? Like a single SQL statement or something?
PS. DB is SQLite3 if that matters at all...
UPDATE
PPS. No need to group by anything in this instance; assume the table literally has 1 column (and, presumably, the internal DB row_ID or something). I'm just curious whether this is possible to do without the nested selects?
You'll need to order by something in order to determine the last 20 rows. When you say last, do you mean by date, by ID, ...?
Something like this should work:
select count(*)
from (
select i
from a
order by j desc  -- j being whatever column defines "last" (a date, an ID, ...)
limit 20
) where i = 0;
If you do not remove rows from the table, you may try the following hacky query:
SELECT COUNT(*) as cnt
FROM A
WHERE
ROWID > (SELECT MAX(ROWID)-20 FROM A)
AND i=0;
It operates with ROWIDs only. As the documentation says: Rows are stored in rowid order.
You need to remember to order by when you use limit, otherwise the result is indeterminate. To get the latest rows added, you need to include a column with the insertion date, then you can use that. Without this column you cannot guarantee that you will get the latest rows.
To make it efficient you should ensure that there is an index on the column you order by, possibly even a clustered index.
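For example, assuming a hypothetical created_at column that records the insertion time:
CREATE INDEX idx_a_created ON a(created_at);

SELECT COUNT(*)
FROM (SELECT i FROM a ORDER BY created_at DESC LIMIT 20)  -- last 20 by insertion time
WHERE i = 0;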
I'm afraid that you need a nested select to be able to count and restrict to last X rows at a time, because something like this
SELECT count(*) FROM a GROUP BY i HAVING i = 0
will count 0's, but in ALL table records, because a LIMIT in this query will basically have no effect.
However, you can optimize by using COUNT(i), as it is faster to COUNT a single field than 2 or more (in this case your table has 2 fields: i and the rowid that SQLite creates automatically in tables without a primary key).
What's the best way to delete all rows from a table in sql but to keep n number of rows on the top?
DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Edit:
Chris brings up a good performance point, since the TOP 10 query would be run for each row. If this is a one-time thing it may not be a big deal, but if it is a common thing it's worth a closer look.
I would select the ID column(s) of the set of rows that you want to keep into a temp table or table variable. Then delete all the rows that do not exist in the temp table. The syntax mentioned by another user:
DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Has a potential problem. The "SELECT TOP 10" query will be executed for each row in the table, which could be a huge performance hit. You want to avoid making the same query over and over again.
This syntax should work, based what you listed as your original SQL statement:
create table #nuke (NukeID int)
insert into #nuke (NukeID) select top 1000 id from article
delete article where not exists (select 1 from #nuke where NukeID = article.id)
drop table #nuke
Future reference for those of us who don't use MS SQL.
In PostgreSQL use ORDER BY and LIMIT instead of TOP.
DELETE FROM table
WHERE id NOT IN (SELECT id FROM table ORDER BY id LIMIT n);
MySQL -- well...
Error -- This version of MySQL does not yet support 'LIMIT &
IN/ALL/ANY/SOME subquery'
Not yet I guess.
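The usual workaround is to wrap the limited subquery in a derived table; that also sidesteps MySQL's restriction on reading the table you are deleting from. A sketch, reusing the table name and n = 10 from the examples above:
DELETE FROM t
WHERE id NOT IN (
    SELECT id
    FROM (SELECT id FROM t ORDER BY id LIMIT 10) keepers  -- materialized, so legal
);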
Here is how I did it. This method is faster and simpler:
Delete all but top n from database table in MS SQL using OFFSET command
WITH CTE AS
(
SELECT ID
FROM dbo.TableName
ORDER BY ID DESC
OFFSET 11 ROWS
)
DELETE CTE;
Replace ID with the column by which you want to sort.
Replace the number after OFFSET with the number of rows you want to keep.
Choose DESC or ASC - whatever suits your case.
I think using a virtual table would be much better than an IN-clause or temp table.
DELETE
Product
FROM
Product
LEFT OUTER JOIN
(
SELECT TOP 10
Product.id
FROM
Product
) TopProducts ON Product.id = TopProducts.id
WHERE
TopProducts.id IS NULL
This really is going to be language specific, but I would likely use something like the following for SQL Server.
DECLARE @n int;
SET @n = (SELECT COUNT(*) FROM dTable);
-- note: without an ORDER BY, which 10 rows survive is arbitrary
DELETE TOP (@n - 10) FROM dTable;
If you don't care about the exact number of rows, there is always
DELETE TOP (90) PERCENT FROM dTable;
I don't know about other flavors but MySQL DELETE allows LIMIT.
If you could order things so that the n rows you want to keep are at the bottom, then you could do a DELETE FROM table LIMIT tablecount-n.
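In MySQL that might look like the sketch below; LIMIT in a DELETE must be a literal, so tablecount - n has to be computed by the caller (here assuming 10000 total rows and n = 10):
DELETE FROM t
ORDER BY id ASC   -- the rows you keep (highest id) sort to the bottom
LIMIT 9990;       -- tablecount - n, computed in application code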
Edit
Oooo. I think I like Cory Foy's answer better, assuming it works in your case. My way feels a little clunky by comparison.
I would solve it using the technique below. The example expects an article table with an id on each row.
Delete article where id not in (select top 1000 id from article)
Edit: Too slow to answer my own question ...
Refactored?
Delete a From Table a Inner Join (
    Select Top ((Select Count(tableID) From Table) - 10) tableID
    From Table Order By tableID Desc
) b On b.tableID = a.tableID
edit: tried them both in the query analyzer, current answer is fastest (damn order by...)
A better way would be to insert the rows you DO want into another table, drop the original table, and then rename the new table so it has the same name as the old table.
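In SQL Server terms that might look like this sketch (table and column names assumed); note that SELECT ... INTO copies no indexes, constraints, or triggers, so those must be recreated afterwards:
SELECT TOP 10 *
INTO dbo.MyTable_keep
FROM dbo.MyTable
ORDER BY ID;

DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_keep', 'MyTable';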
I've got a trick to avoid executing the TOP expression for every row. We can combine TOP with MAX to get the MaxId we want to keep. Then we just delete everything greater than MaxId.
-- Declare Variable to hold the highest id we want to keep.
DECLARE #MaxId as int = (
SELECT MAX(temp.ID)
FROM (SELECT TOP 10 ID FROM table ORDER BY ID ASC) temp
)
-- Delete anything greater than MaxId. If MaxId is null, there is nothing to delete.
IF #MaxId IS NOT NULL
DELETE FROM table WHERE ID > #MaxId
Note: It is important to use ORDER BY when declaring MaxId to ensure proper results are queried.