Delete a row at a certain line number in Postgres

I am building an app in which I need to delete a table row at a certain line number. I don't want to use or rely on an id, because if I delete a row, the following rows won't "shift down" -- line 8 today could be line 7 tomorrow, but line 8 will still have an id of 8.
How can I write a Postgres SQL statement that essentially does this:
DELETE FROM Table
WHERE <row_number> = n;
And row_number isn't a real attribute.

Your question is rather ill-defined; as Milen comments, what do you really mean by "line" and "line number"? I hope you've got ORDER BYs in all your queries if you're doing stuff like that. This question also becomes trivial if your rows have primary keys... do yours not? A table with no primary keys is a table that's asking for trouble, and usually indicates a serious design flaw.
Anyway, if you want to go full speed ahead, damning the potential problems en route, the windowing functions in 8.4 will probably do what you need with minimal fuss. Or you could save yourself a ton of trouble tomorrow by writing a better schema today.
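For instance, here is a minimal sketch of the window-function approach, assuming a hypothetical table mytable with a hypothetical ordering column created_at (ctid is Postgres's physical row locator, so no id column is required):
DELETE FROM mytable
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               row_number() OVER (ORDER BY created_at) AS rn
        FROM mytable
    ) AS numbered
    WHERE rn = 8  -- the "line number" to delete
);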

What you want is probably the OID:
DELETE FROM Table
WHERE oid = n;
See the PostgreSQL documentation on object identifiers for more details. Note that a table only carries OIDs if it was created WITH OIDS, which has not been the default for a long time.

Not quite the row number of the table as such, but it should do the desired job. It is perhaps a little more flexible, because you can delete by date-time order or whatever order you like. I'm going to use this for a recent list.
DELETE FROM table
WHERE pk IN
(
    SELECT pk
    FROM table
    ORDER BY pk
    OFFSET n
);
Note that without a LIMIT this removes every row after the first n in the chosen order, which is what you want when capping a list at n rows.

I'd go for:
DELETE FROM Table
WHERE id IN
(
    SELECT id
    FROM Table
    ORDER BY id
    OFFSET <row_number> LIMIT 1
);
(DELETE itself does not accept ORDER BY, OFFSET, or LIMIT, so the ordering and offset have to happen in a subquery.)

Related

SQLite: How to SELECT "most recent record for each user" from single table with composite key?

I'm not a database guru and feel like I'm missing some core SQL knowledge to grok a solution to this problem. Here's the situation as briefly as I can explain it.
Context:
I have a SQLite database table that contains timestamped user event records. The records can be uniquely identified by the combination of timestamp and user ID (i.e., when the event took place and who the event is about). I understand this situation is called a "composite primary key." The table looks something like this (with a bunch of other columns removed, of course):
sqlite> select Last_Updated,User_ID from records limit 4;
Last_Updated User_ID
------------- --------
1434003858430 1
1433882146115 3
1433882837088 3
1433964103500 2
Question: How do I SELECT a result set containing only the most recent record for each user?
Given the above example, what I'd like to get back is a table that looks like this:
Last_Updated User_ID
------------- --------
1434003858430 1
1433882837088 3
1433964103500 2
(Note that the result set only includes user 3's most recent record.)
In reality, I have approximately 2.5 million rows in this table.
Bonus: I've been reading answers about JOINs, de-dupe procedures, and a bunch more, and I've been googling for tutorials/articles in the hopes that I would find what I'm missing. I have extensive programming background so I could de-dupe this dataset in procedural code like I've done a hundred times before, but I'm tired of writing scripts to do what I believe should be possible in SQL. That's what it's for, right?
So, what do you think is missing from my understanding of SQL, conceptually, that I need in order to understand why the solution you've provided to my question actually works? (A reference to a good article that actually explains the theory behind the practice would suffice.) I want to know WHY the solution actually works, not just that it does.
Many thanks for your time!
You could try this:
select user_id, max(last_updated) as latest
from records
group by user_id
This should give you the latest record per user. I assume you have an index on user_id and last_updated combined.
In the above query, generally speaking, we are asking the database to group the user_id records. If there is more than one record for user_id 1, they will all be grouped together, and the maximum last_updated is picked from that group for output. Then the next group is processed the same way.
If you have a composite index, sqlite will likely just use the index because the index contains both fields addressed in the query. Indexes are smaller than the table itself, so scanning or seeking is faster.
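If you also need the other columns of each latest row, not just the two grouped ones, a portable pattern (sketched here, assuming records has further columns) is to join back against the grouped subquery:
SELECT r.*
FROM records r
JOIN (
    SELECT User_ID, MAX(Last_Updated) AS latest
    FROM records
    GROUP BY User_ID
) m ON m.User_ID = r.User_ID AND r.Last_Updated = m.latest;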
Well, in true "d'oh!" fashion, right after I ask this question, I find the answer.
For my case, the answer is:
SELECT MAX(Last_Updated),User_ID FROM records GROUP BY User_ID
I was making this more complicated than it needed to be by thinking I needed to use JOINs and stuff. Applying an aggregate function like MAX() is all that's needed to select only those rows whose content matches the function result. (Note that pulling the matching row's other bare columns this way is a documented SQLite extension for MIN()/MAX(); standard SQL does not guarantee it.) That means this statement…
SELECT MAX(Last_Updated),User_ID FROM records
…would therefore return a result set containing only one row: the most recent event.
By adding the GROUP BY clause, however, the result set contains a row for each "group" of results, i.e., for each user. My programmer-brain did not understand that GROUP BY is how we say "for each" in SQL. I think I get it now.
Note to self: keep it simple, stupid. :)

SQL / Derby: Get last line in table. Bad performance with MAX function

I need to know the last entry in a derby database with a few million entries. Using the MAX() function I get very poor performance. I also tried scrollable resultSets and jumping to the last entry but this is even worse.
Any easy simple performant way to get the last line without iterating over all entries?
SQL: select max(IDDATALINE) from DATADOUBLE_2 where DATADOUBLE_2.DATATYPE = 19
Table: IDDATALINE [BIGINT] | DATATYPE[INTEGER] | DATA [DOUBLE]
I don't know if any indexes are defined; I'm working with source code I took over. I'll see if I find something. If I don't find anything, where/how do I add an index?
This is your query:
select max(IDDATALINE)
from DATADOUBLE_2
where DATADOUBLE_2.DATATYPE = 19;
You should be able to improve performance by creating an index. I would recommend:
create index idx_DATADOUBLE2_DATATYPE_IDDATALINE on DATADOUBLE_2(DATATYPE, IDDATALINE)
Note this is a composite index using two columns in that particular order.
You can sort descending and fetch the first row. Example:
select Something
from SomeTable
order by SomeField desc
fetch first 1 rows only
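Applied to the table from the question, that would look something like the following; with the composite index suggested above, Derby should be able to answer it from the index alone:
select IDDATALINE
from DATADOUBLE_2
where DATATYPE = 19
order by IDDATALINE desc
fetch first 1 rows only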

SQL Limit on "WHERE X IN (...)"

I've got some data I'd like to pull off our SQL server.
This old database does not have any primary keys associated with it, so pulling data is like querying an Excel spreadsheet (what it actually originated as years ago).
I need to run reports on this data, though.
Currently, I get a list of distinct serial numbers for a given time period, then pull all of the records for a given serial number. For a 1-month time frame, this can be 1500 to 3000 serial numbers. The serial number field is formatted as char(20), even though the serial numbers are only 15 characters long.
Update:
There are typically 5 to 15 entries in this table per Serial_Number.
There are at most 10 machines writing data to this table, so identical Date_Time values are possible.
This process takes a while, but between different serial numbers in the list, I am able to update the Windows Form with a Progress Bar so management knows something is happening and about how much longer to expect.
I am always trying to make this query run faster.
Now, I am thinking about pulling the data I need using a WHERE clause such as:
SELECT Col1, Col2, Col3
FROM Table1
WHERE Serial_Number IN (
    SELECT DISTINCT Serial_Number
    FROM Table1
    WHERE Date_Time BETWEEN #startDate AND #endDate
)
My question is: are there any issues I could run into with this, particularly because we have so many distinct serial numbers during a given time frame?
And, of course, you know someone in Management is going to try running a year's worth of data when they are bored! Then, they are going to try running data since Jesus was born, just because they've got nothing better to do.
Restate Question: Is there a limit to the WHERE clause's IN method that limits the number of items I can pass in?
Index Serial_Number and Date_Time in Table1 (with separate indexes, not a single compound index) and this should perform fairly well for you unless the table is really truly ginormous.
You might get a little more speed with one index on Serial_Number and the second on (Date_Time, Serial_Number). That second index covers the sub query, allowing it to be answered from the index alone.
Note: I'm suggesting indexes, not primary keys, which don't require uniqueness.
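For instance (the index names here are made up; use your own conventions):
CREATE INDEX IX_Table1_Serial_Number ON Table1 (Serial_Number);
CREATE INDEX IX_Table1_Date_Time ON Table1 (Date_Time);
-- or, for the covering variant described above:
CREATE INDEX IX_Table1_DateTime_Serial ON Table1 (Date_Time, Serial_Number);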
Well, in the naive case where there are no indexes (which it sounds like is your case) you're going to have to scan over all the rows in Table1 to perform the DISTINCT on Serial_Number anyway. So I'm not sure it's going to help you much.
I would highly recommend the following:
Use the execution plan to determine what's going on in your query, and
Use that information to add some relevant indexes to speed your operations.
Just from what we see here, it sounds like Date_Time would be a good candidate for a clustered index in Table1.
Edit:
To make a nonunique clustered index as I describe above, you can use the following:
CREATE CLUSTERED INDEX IX_Table1_Date_Time
ON Table1 (Date_Time)
(from http://msdn.microsoft.com/en-us/library/aa258260(v=sql.80).aspx)
This will reorder your table such that all rows are sorted in Date_Time order. Further work with the execution plan will help identify other indexes that may greatly help your performance, depending on the exact types of queries you run.
Honestly, I see no benefit to the WHERE clause as it is written.
You use an expensive inner query, but don't do anything meaningful with the results. I don't even see you getting the Serial_Number in the results anywhere. However, based on your question, it does sound like you need it.
I don't see the need for the DISTINCT keyword for Serial_Number, since the duplicates would not be eliminated in the results in the outer query.
What is wrong with doing this?
SELECT Serial_Number, Col1, Col2, Col3
FROM Table1
WHERE Date_Time Between #startDate AND #endDate
This should do the same thing as your original query. But it would eliminate the expensive nested query.
Just put an index on Date_Time and it should work. This would also eliminate the need for the index on Serial_Number.
Apparently, there is no way to tell what the maximum length of the WHERE X IN (...) list can be. (That concern applies to explicit value lists; the IN (subquery) form above is not subject to a list-length limit, though a very large result can still be slow.)
For now, this is the answer.
If, at some later point in time, someone comes along and finds something to the contrary, please post that answer and I will mark it as such.
Thanks,
Joe

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
First off, I am not allowed to post images yet to the board since I'm a new user, so see appropriate links below
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an SQL query that will renumber the column I specify. I would prefer not to drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
    on ItemTicket.item_ticket_id = ItemVoid.item_ticket_id
where ItemTicket.ID in (select ID from results)
Example Tables Before:
Example Tables After:
As you can see, two rows were deleted from both tables based on the ID column. So now I have to figure out how to renumber the item_ticket_id and item_void_id columns so that the higher numbers decrease to fill in the missing values, the next highest decreases, and so on. Problem #2: if an item_ticket_id changes in order to stay sequential in ItemTicket, that change has to be propagated to ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(Answering an old question, as it's the first search result when I was looking this up.)
(MS T-SQL)
Resequencing an ID column (not an identity one) that has gaps can be done with a simple CTE that uses ROW_NUMBER() to generate the new sequence. The UPDATE works through the CTE 'virtual table' without any extra problems, actually updating the underlying original table. Don't worry about ID values clashing during the update; if you wonder what happens when IDs are set that already exist, it doesn't suffer that problem: the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
    SELECT ID,
           ROW_NUMBER() OVER (ORDER BY ID) AS ID_New
    FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
Since you are looking for advice on this, my advice is you need to redesign this as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining records, use a bit flag that will mark the records as Inactive. Then when you are querying the records, just include a WHERE clause to only include the records are that active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
1. Create a new table.
2. Insert your original data into the new table, generating the new numbers.
3. Drop the old table.
4. Rename the new table to the old name.
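As a rough T-SQL sketch of that rebuild (assuming, for illustration, that ItemTicket has one other column, some_col; sp_rename is SQL Server's rename procedure):
SELECT ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS item_ticket_id,
       some_col
INTO ItemTicket_New
FROM ItemTicket;
DROP TABLE ItemTicket;
EXEC sp_rename 'ItemTicket_New', 'ItemTicket';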
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET InActive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function, row_number(). It is part of standard SQL and is supported by SQL Server and by MySQL 8.0+. You use it as follows:
select row_number() over (order by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you against using cursors, since they have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.
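A minimal T-SQL sketch of that sequence, using the table names from the question (the matching fix-up of ItemVoid.item_ticket_id, the asker's problem #2, would need its own UPDATE and is omitted here):
BEGIN TRANSACTION;
DELETE FROM ItemTicket
WHERE ID IN (SELECT ID FROM results);
WITH Renumbered AS
(
    SELECT item_ticket_id,
           ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS id_new
    FROM ItemTicket
)
UPDATE Renumbered SET item_ticket_id = id_new;
COMMIT TRANSACTION;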

What is the most efficient way to count rows in a table in SQLite?

I've always just used "SELECT COUNT(1) FROM X" but perhaps this is not the most efficient. Any thoughts? Other options include SELECT COUNT(*) or perhaps getting the last inserted id if it is auto-incremented (and never deleted).
How about if I just want to know if there is anything in the table at all? (e.g., count > 0?)
SELECT COUNT will generally be the fastest way to get a count of things (the database optimizes the query internally). As for which form to use: in SQLite, a bare SELECT COUNT(*) with no WHERE clause is specially optimized by the engine, so COUNT(1) and COUNT(*) are both fine choices; counting a named column is not faster.
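For the second part of the question, checking whether the table contains anything at all, you can skip counting entirely; a sketch:
SELECT EXISTS (SELECT 1 FROM X);  -- 1 if X has any row, 0 otherwise
EXISTS stops at the first row it finds, so this stays fast no matter how large the table is.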
To follow up on girasquid's answer, as a data point: I have a SQLite table with 2.3 million rows. Using select count(*) from table, it took over 3 seconds to count the rows. I also tried SELECT rowid FROM table (thinking that rowid is a default primary indexed key), but that was no faster. Then I made an index on one of the fields in the database (an arbitrary field, but I chose an integer field because I knew from past experience that indexes on short fields can be very fast, I think because the index stores a copy of the value). SELECT my_short_field FROM table brought the time down to less than a second.
If you are sure (really sure) that you've never deleted any row from that table, and your table has not been defined with the WITHOUT ROWID optimization, you can get the number of rows by calling:
select max(RowId) from table;
Or, if your table is a circular queue, you could use something like
select MaxRowId - MinRowId + 1
from (select max(RowId) as MaxRowId from table)
join (select min(RowId) as MinRowId from table);
This is really, really fast (milliseconds), but you must pay attention: SQLite only guarantees that the row id is unique among all rows in the same table; it does not promise that row ids are, or will always be, consecutive numbers.
The fastest way to get row counts is directly from the table metadata, if any. Unfortunately, I can't find a reference for this kind of data being available in SQLite.
Failing that, any query of the type
SELECT COUNT(non-NULL constant value) FROM table
should optimize to avoid the need for a table, or even an index, scan. Ideally the engine will simply return the current number of rows known to be in the table from internal metadata. Failing that, it simply needs to know the number of entries in the index of any non-NULL column (the primary key index being the first place to look).
As soon as you introduce a column into the SELECT COUNT you are asking the engine to perform at least an index scan and possibly a table scan, and that will be slower.
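Concretely, any constant that is never NULL qualifies, for example:
SELECT COUNT(1) FROM table;
SELECT COUNT('x') FROM table;
Both count every row without the engine having to examine column values.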
I do not believe you will find a special method for this. However, you could do your select count on the primary key to be a little bit faster.
sp_spaceused 'table_name'
This returns the number of rows in the given table, and it is the most efficient way I have come across yet. (Note that sp_spaceused is a SQL Server system procedure; SQLite itself has no equivalent.)
It is more efficient than select Count(1) from table_name.
sp_spaceused can be used for any table, and it is very helpful when the table is exceptionally big (hundreds of millions of rows): it returns the number of rows right away, whereas select Count(1) might take more than 10 seconds. Moreover, it does not need any column names or key fields to consider.