I have a limited history table. It uses composite keys - one of the columns is the timestamp in long.
I want to select/delete rows that are the oldest, and keep the latest 1000 by checking the timestamp. How do I write the statement for such a case?
You can run a DELETE query that skips the latest 1000 records.
The NOT IN subquery selects the 1000 latest records; every other row is deleted.
DELETE FROM table WHERE Id NOT IN (SELECT TOP 1000 Id FROM table ORDER BY date DESC)
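The same NOT IN pattern can be tried out in a minimal sketch like the one below, which runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. The table name history, its columns and the helper keep_latest are hypothetical. SQLite uses LIMIT where SQL Server uses TOP. Because the table in the question has a composite key rather than a single Id, the sketch filters on the timestamp column and assumes timestamps are distinct (ties at the boundary would keep a few extra rows).

```python
import sqlite3


def keep_latest(conn, n):
    """Delete every row of the hypothetical `history` table except the n newest by ts."""
    cur = conn.execute(
        "DELETE FROM history WHERE ts NOT IN "
        "(SELECT ts FROM history ORDER BY ts DESC LIMIT ?)",
        (n,),
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (device TEXT, ts INTEGER, payload TEXT, "
    "PRIMARY KEY (device, ts))"  # composite key, with the timestamp stored as an integer
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [("dev", t, "x") for t in range(1, 1501)],
)
conn.commit()

deleted = keep_latest(conn, 1000)
remaining = conn.execute("SELECT COUNT(*), MIN(ts), MAX(ts) FROM history").fetchone()
```

Replacing DELETE with SELECT * in the same statement lists the rows that would be removed, which is a cheap check before deleting anything.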
Related
Here is the problem scenario:
Current DB design: I have 40 tables (each table currently holds 5 billion records). The tables do not share a structure: each one holds unique records in its own table format. Each table has a primary key (pk) "timestamp", which is in UTC with time zone in ISO format. Around 1000 records are inserted into each table every 5 minutes.
Implementation: I now have to get the data for the most recent timestamp from every table.
For every table I tried select * from table_name where timestamp = (select timestamp from table_name order by timestamp desc limit 1), which returns the most recent records, but it took some time to get the results. After this query, I tried select * from table_name where timestamp = (select max(timestamp) from table_name), but this also took some time to return the data.
So, how can I minimize the query time to get the data from the database?
(All design and query suggestions are welcome.)
Thank you.
FYI, I'm using Python 3.6 and psycopg2.
My SQL table consists of three columns - Event (type xml), InsertedTime (type datetime) and status (type nvarchar - possible values processed and unprocessed). None of them are unique identifiers and all of these are mandatory.
As part of a select query, I retrieve the top 1000 rows of the table (based on the unprocessed status), use the XML to retrieve some values, and would like to update the status of these exact 1000 rows to processed status.
My question is: I'm using SELECT TOP 1000 FROM table WHERE status ='Unprocessed' ORDER BY InsertedTime to retrieve the rows, and UPDATE TOP 1000 table WHERE status = 'Processed' ORDER BY InsertedTime to update them.
I understand that in Oracle, I can use the rowid pseudocolumn to ensure that I'm updating the same rows that were retrieved in the first place. But how do I achieve this without having any unique identifier or primary key in the table in SQL?
Note: The table is being written to continuously.
You're selecting rows to be handled and then trying to update the status of those rows? Instead of doing a SELECT followed by an UPDATE, you could use the OUTPUT clause, with something like this:
UPDATE TOP (1000) table
set status = 'Processed'
output deleted.Event, deleted.InsertedTime, deleted.status
where status = 'Unprocessed'
This both updates the rows and returns the Event, InsertedTime and status fields (old values). If you need the new values, you can use the virtual table inserted.
Assuming that processes may try to insert new rows while your two queries are processing, you have a few options:
Wrap the two queries in a transaction. This should guarantee atomicity between them, at the cost of extra locking on the table.
Find the boundary InsertedTime value from the first query (the latest InsertedTime among the rows it returned) and use that in the WHERE clause of the 2nd query.
Combine the UPDATE and SELECT into a single statement via an OUTPUT clause.
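The first two options can be combined in a rough sketch, shown here against in-memory SQLite via Python's sqlite3 rather than SQL Server. The table name events, the column names and the helper claim_batch are hypothetical. The sketch reads the second option as "use the latest InsertedTime within the selected batch as the cutoff". If several rows share the cutoff value, the UPDATE can touch rows that were not in the selected batch, so this is only safe when that tie cannot occur or does not matter.

```python
import sqlite3


def claim_batch(conn, n):
    """Select up to n unprocessed rows (oldest first) and mark them processed.

    Both statements run inside one explicit transaction.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT event, inserted_time FROM events "
            "WHERE status = 'Unprocessed' ORDER BY inserted_time LIMIT ?",
            (n,),
        ).fetchall()
        if rows:
            cutoff = rows[-1][1]  # latest inserted_time within the batch
            conn.execute(
                "UPDATE events SET status = 'Processed' "
                "WHERE status = 'Unprocessed' AND inserted_time <= ?",
                (cutoff,),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows


# isolation_level=None: the sqlite3 module does not open transactions implicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (event TEXT, inserted_time TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, 'Unprocessed')",
    [(f"e{i}", f"2024-01-01 00:00:0{i}") for i in range(1, 6)],
)

batch = claim_batch(conn, 3)
counts = dict(conn.execute("SELECT status, COUNT(*) FROM events GROUP BY status").fetchall())
```

The single-statement OUTPUT approach avoids the tie problem altogether, since the rows returned are by construction the rows updated.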
I have around 50k records in a table, and I want to move all of them into another table in random order and in groups. E.g. on the first run the system selects 10 (random) records and inserts them into the 2nd table, on the second run it selects 15 (random) records and inserts them into the 2nd table, and so on until all the records have been moved to the 2nd table.
I tried to use order by ABS(Checksum(NewID()) % XXX) to get the random order, but how can I do the grouping so that I control the minimum and maximum number of records in the select query?
Use ORDER BY NEWID() at the end, like:
select * from table order by newid()
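One way to get the minimum/maximum control the question asks about is to pick the batch size in application code and let the database do the random ordering. The sketch below assumes SQLite via Python's sqlite3, where RANDOM() plays the role of NEWID() and LIMIT the role of TOP. The tables src and dst, the batch_no column and the helper move_in_random_batches are all hypothetical.

```python
import random
import sqlite3


def move_in_random_batches(conn, min_size, max_size):
    """Move every row from src to dst in random batches; return the batch sizes used."""
    sizes = []
    batch_no = 0
    while True:
        n = random.randint(min_size, max_size)  # batch size for this run
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM src ORDER BY RANDOM() LIMIT ?", (n,))]
        if not ids:
            break  # src is empty, nothing left to move
        batch_no += 1
        marks = ",".join("?" * len(ids))
        conn.execute(
            f"INSERT INTO dst (id, val, batch_no) "
            f"SELECT id, val, ? FROM src WHERE id IN ({marks})",
            [batch_no, *ids],
        )
        conn.execute(f"DELETE FROM src WHERE id IN ({marks})", ids)
        conn.commit()
        sizes.append(len(ids))
    return sizes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT, batch_no INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(i, f"v{i}") for i in range(1, 51)])
conn.commit()

sizes = move_in_random_batches(conn, 10, 15)
left = conn.execute("SELECT COUNT(*) FROM src").fetchone()[0]
moved = conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
```

The final batch can be smaller than the minimum when fewer rows remain than the size drawn for that run.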
I have a table that has no indexed rows, nor a specific column...
Let's say "City, PersonName, PersonAge". I need to obtain the last 5 people inserted in that table...
How can I do it in DB2?
I tried
select * from PEOPLE fetch first 5 rows only
This works perfectly... but I have no idea how to do it with the LAST rows...
You can't select the last 5 rows inserted, the database doesn't keep track of this. You need some sort of autoincremented ID or timestamp and order by that column descending.
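As an illustration of ordering by an autoincremented ID, here is a small sketch using SQLite through Python's sqlite3 (not DB2). The people table layout and the sample data are made up for the example. In DB2 the corresponding pieces would, as far as I know, be an identity column and ORDER BY id DESC FETCH FIRST 5 ROWS ONLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "  # grows with every insert
    "city TEXT, person_name TEXT, person_age INTEGER)"
)
conn.executemany(
    "INSERT INTO people (city, person_name, person_age) VALUES (?, ?, ?)",
    [("Rome", f"person{i}", 20 + i) for i in range(1, 11)],
)
conn.commit()

# Newest rows first, limited to five
last_five = [
    r[0]
    for r in conn.execute("SELECT person_name FROM people ORDER BY id DESC LIMIT 5")
]
```

This only works for rows inserted after the column exists; rows that were already in the table carry no record of their insertion order.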
I have a simple join table with two id columns in SQL Server.
Is there any way to select all rows in the exact order they were inserted?
If I try to make a SELECT *, even if I don't specify an ORDER BY clause, the rows are not being returned in the order they were inserted, but ordered by the first key column.
I know it's a weird question, but this table is very big and I need to check exactly when a strange behavior has begun, and unfortunately I don't have a timestamp column in my table.
UPDATE #1
I'll try to explain why I'm saying that the rows are not returned in 'natural' order when I SELECT * FROM table without an ORDER BY clause.
My table was something like this:
id1 id2
---------------
1 1
2 2
3 3
4 4
5 5
5 6
... and so on, with about 90.000+ rows
Now, I don't know why (probably a software bug inserted these rows), but my table has 4.5 million rows and looks like this:
id1 id2
---------------
1 1
1 35986
1 44775
1 60816
1 62998
1 67514
1 67517
1 67701
1 67837
...
1 75657 (100+ "strange" rows)
2 2
2 35986
2 44775
2 60816
2 62998
2 67514
2 67517
2 67701
2 67837
...
2 75657 (100+ "strange" rows)
Crazy, my table now has millions of rows. I have to find out when this happened (when the rows were inserted) because I have to delete them, but I can't just delete using *WHERE id2 IN (strange_ids)* because there are "right" id1 values that belong to these id2 values, and I can't delete those. So I'm trying to see exactly when these rows were inserted in order to delete them.
When I SELECT * FROM table, it returns the rows ordered by id1, like the table above, and the rows were not inserted into my table in this order. I think my table is not corrupted, because this is the second time that this strange behavior has happened in the same way, but now I have so many rows that I can't delete them manually like I did the 1st time. Why are the rows not being returned in the order they were inserted? These "strange rows" were definitely inserted yesterday and should be returned near the end of my table if I do a SELECT * without an ORDER BY, shouldn't they?
A select query with no order by does not retrieve the rows in any particular order. You have to have an order by to get an order.
SQL Server does not have any default method for retrieving by insert order. You can do it, if you have the information in the row. The best way is a primary key identity column:
TableId int identity(1, 1) not null primary key
Such a column is incremented as each row is inserted.
You can also have a CreatedAt column:
CreatedAt datetime default getdate()
However, this could have duplicates for simultaneous inserts.
The key point, though, is that a select with no order by clause returns an unordered set of rows.
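To make the point about duplicate CreatedAt values concrete, the sketch below (SQLite via Python's sqlite3, not SQL Server) stores both an auto-incremented key and a creation timestamp, and uses the key as a tie-breaker. The table name link, the column names and the sample rows are assumptions, and the timestamps are supplied by hand to force a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link ("
    "table_id INTEGER PRIMARY KEY AUTOINCREMENT, "  # stand-in for identity(1, 1)
    "id1 INTEGER NOT NULL, "
    "id2 INTEGER NOT NULL, "
    "created_at TEXT NOT NULL)"
)
rows = [
    (5, 6, "2024-01-01 10:00:00"),
    (1, 35986, "2024-01-02 09:00:00"),  # same timestamp as the next row
    (1, 1, "2024-01-02 09:00:00"),
]
conn.executemany("INSERT INTO link (id1, id2, created_at) VALUES (?, ?, ?)", rows)
conn.commit()

# created_at alone cannot separate the last two rows; table_id breaks the tie
ordered = conn.execute(
    "SELECT id1, id2 FROM link ORDER BY created_at, table_id"
).fetchall()
```

With the explicit ORDER BY the rows come back in insertion order even though id1 and id2 sort differently.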
As others have already written, you will not be able to get the rows out of the link table in the order they were inserted.
If there is some sort of internal ordering of the rows in one or both of the tables that this link table is joining, then you can use that to try to figure out when the link table rows were created. Basically, they cannot have been created BEFORE both of the rows containing the PKs were created.
But on the other hand you will not be able to find out how long after they have been created.
If you have decent backups, you could try to restore one or a few backups of varying age and then see whether those backups also contain this strange behaviour. It could give you at least some clue about when the strangeness started.
But the bottom line is that using just a select, there is no way to get the rows out of a table like this in the order they were inserted.
If SELECT * doesn't return them in 'natural' order and you didn't insert them with a timestamp or auto-incrementing ID then I believe you're sunk. If you've got an IDENTITY field, order by that.
But the question I have is, how can you tell that SELECT * isn't returning them in the order they were inserted?
Update:
Based on your update, it looks like there is no method by which to return records as you wish. I'd guess you've got a clustered index on ID1?
Select *, %%physloc%% as pl from table
order by pl desc