Here are my SQLite DB details:
sqlite> .table
url_db
sqlite> .schema url_db
CREATE TABLE url_db(URL TEXT UNIQUE);
sqlite> select * from url_db;
play.googleapis.com
notifications.google.com
contacts.skype.com
edge.skype.com
people.skype.com
I wanted to execute the command delete from url_db limit 1.
So I downloaded the full source code (sqlite-src-3240000.zip) from the official download page and compiled it with the option SQLITE_ENABLE_UPDATE_DELETE_LIMIT=1.
When I execute that command, it sometimes deletes a random entry rather than the first one. I wanted to delete play.googleapis.com, but the command deleted contacts.skype.com instead.
sqlite> select * from url_db;
play.googleapis.com
notifications.google.com
edge.skype.com
people.skype.com
What's the cause of this behavior? I am implementing a FIFO list: when the entries reach 500, I need to delete the first (oldest) entry.
it sometimes deletes a random entry rather than the first one.
You misunderstand relational databases. A table represents an unordered set. There is no "first" row in a table unless you explicitly define an ordering.
Because this problem occurs so often, SQLite has a built-in work-around, the rowid "column". You can use this as a regular column, resulting in:
delete from url_db
order by rowid
limit 1;
Personally, I prefer an explicitly declared auto-increment column, but SQLite builds this in (unlike most other databases).
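For your FIFO use case, a minimal sketch with the existing url_db table might look like this (the hostname is just a placeholder, and the 500-row cap comes from your description):
-- insert as usual; rowid normally increases as rows are appended
insert into url_db(URL) values ('example.com');
-- once the table holds more than 500 entries, drop the oldest one
delete from url_db
where (select count(*) from url_db) > 500
order by rowid
limit 1;
If you want a guarantee that ordering values are never reused after deletes, declare an INTEGER PRIMARY KEY AUTOINCREMENT column and ORDER BY that instead of rowid.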
Background: I am migrating from PostgreSQL to Vertica and found that there are some issues with IDENTITY or AUTO_INCREMENT columns. One of these issues is that Vertica cannot assign values to IDENTITY columns, nor alter a column that already has data into an IDENTITY column. Therefore I created a sequence, set the column's default value from it, and added a UNIQUE constraint:
SELECT MAX(id_column) FROM MY_SCHEMA.my_table; -- returns 12345
CREATE SEQUENCE MY_SCHEMA.seq_id_column MINVALUE 12346 CACHE 1;
ALTER TABLE MY_SCHEMA.my_table
ALTER COLUMN id_column SET DEFAULT(MY_SCHEMA.seq_id_column.nextval);
ALTER TABLE MY_SCHEMA.my_table ADD UNIQUE(id_column);
This works as expected. In this case, I have the cache deactivated, as I am on a single-node installation and I want my ID column to be contiguous. However, this is not an option on a cluster installation, as the required lock leads to a bottleneck.
Question: In a vertica cluster with several nodes, how can I access the ID of the last insert in a session (without an additional select)?
E.g. in postgreSQL I could do something like
INSERT INTO MY_SCHEMA.my_table RETURNING id_column;
which does not work in Vertica. Furthermore, Vertica's LAST_INSERT_ID() function does not work for named sequences. I also suspect that querying the current_value of MY_SCHEMA.seq_id_column could give wrong results due to caching, but I am unsure about this.
Why no additional SELECT?
To my knowledge, such a SELECT will only give correct values after a commit, and I cannot commit after every single insert for performance reasons.
The comments from LukStorms pointed me in the right direction.
The NEXTVAL() function (as far as I have tested) gives contiguous values when a single session queries them. Furthermore, under concurrent access, CURRVAL issued after an insert retrieves the session's cached value, which is guaranteed to be unique but not necessarily contiguous. As I never call NEXTVAL anywhere other than in my default clause, this solves the problem for me, although there might be cases where an additional call to NEXTVAL between inserts increments the sequence counter.
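In other words, the pattern is roughly the following (some_column and its value are hypothetical; the sequence and default clause are the ones set up above):
-- id_column is filled from MY_SCHEMA.seq_id_column via the DEFAULT clause
INSERT INTO MY_SCHEMA.my_table (some_column) VALUES ('some value');
-- CURRVAL returns the value the sequence last handed out in this session,
-- i.e. the id assigned by the insert above
SELECT CURRVAL('MY_SCHEMA.seq_id_column');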
One case I can think of (and that I will test in the future) is what happens if AUTOCOMMIT is set to OFF; it is ON by default for the Vertica client drivers.
UPDATE:
This even seems to work with AUTOCOMMIT set to OFF (shown using the vertica-python client driver, where C is the connection and cur is the cursor):
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SET SESSION AUTOCOMMIT TO OFF")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4
However, the sequence seems to be unaffected by a rollback of the connection, so the following happens:
C.rollback()
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4
I have 730,000+ records that I need to delete in an Ingres DB (ANSI92) without overloading the database. A simple DELETE with a search condition doesn't work: the DB just uses all its memory and throws an error. I'm thinking of running it in a loop and deleting in portions of 10-20K records.
I tried to use TOP and it didn't work:
delete top (10) from TABLE where web_id < 0;
I also tried LIMIT, which didn't work either:
DELETE FROM TABLE where web_id < 0 LIMIT 10;
Any ideas how to do this? Thank you!
You could use a session temporary table to hold the first 10 tids (tuple IDs) and then delete based on those:
declare global temporary table session.tenrows as
select first 10 tid the_tid from "table" where web_id<0
on commit preserve rows with norecovery;
delete from "table" where tid in (select the_tid from session.tenrows);
When you say "without overloading the database", do you mean avoiding hitting the force-abort limit of the transaction log file? If so, what might work for you is:
set session with on_logfull=notify;
delete from table where web_id<0;
This would automatically commit your transaction at the points where force-abort is reached and then carry on, rather than rolling back and reporting an error.
A downside of using this setting is that it can be tricky to unpick what has/hasn't been done if any other error should occur (your work will likely be partially committed), but since this appears to be a straight delete from a table it should be quite obvious which rows remain and which don't.
The "set session" statement must be run at the start of a transaction.
I would advise not running concurrent sessions with "on_logfull=notify" (there have been bugs in this area; whether they're fixed in your installation depends on your version/patch level).
When entering the following command:
\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;
I get the following error:
ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"
I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?
The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.
It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a select from the data copied into a temporary table. This works fairly well but can be slow.
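A rough sketch of that per-row approach, assuming the data has already been copied into a staging table with the same column layout as the target (the table names are placeholders):
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM stagingtable LOOP
        BEGIN
            -- each iteration gets its own subtransaction via the EXCEPTION block
            INSERT INTO realtable VALUES (r.*);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- skip rows that violate the unique constraint
        END;
    END LOOP;
END;
$$;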
It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:
CREATE TEMPORARY TABLE stagingtable(...);
\copy stagingtable FROM 'somefile.csv'
INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;
Do keep concurrency issues in mind, though. If you're trying to do a merge/upsert via COPY, you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do: a copy-if-not-exists. If so, skipping errors is absolutely the wrong approach. See:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Insert, on duplicate update in PostgreSQL?
Postgresql - Clean way to insert records if they don't exist, update if they do
Can COPY be used with a function?
Postgresql csv importation that skips rows
... this is a much-discussed issue.
One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.
Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.
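For example (a sketch; the column name phonenumber is inferred from the constraint name in the error message):
-- load only rows whose phone number is not already present in the target
INSERT INTO mmcompany
SELECT s.*
FROM stagingtable s
WHERE NOT EXISTS (
    SELECT 1 FROM mmcompany m WHERE m.phonenumber = s.phonenumber
);
Note that this does not remove duplicates within the staging table itself, and it is still subject to the concurrency caveats mentioned above.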
Another useful method is to use pgloader.
I use a generated SQL script (SQL Server 2005) to add a column to a table.
I need to add a "check if exists" because my clients sometimes run the script twice. (I have no control over this part, and it happens over and over again.)
I found a way by joining sysobjects and syscolumns, and it works.
My problem is that I have to add a column to another table, where the column is not at the end of the table.
For this one, SQL Server generates that long script: create a new temp table with the new column, fill it from the old table, drop the old table, and finally rename the temp table.
The issue here is that this script has lots of GOs in it, along with transactions.
What can I do?
1.) Remove all the GOs? (I don't like the idea.)
2.) Add my IF between every GO pair? (I don't like that either.)
3.) Is there another way that makes sense and would not be too hard to implement?
I cannot really think of anything. I could check for a release version or something else instead of my sysobjects/syscolumns join, but the issue would be the same:
because of the GOs, my IF is "forgotten" by the time execution reaches the END of the BEGIN block.
I'm not sure I follow the entirety of your question, but you would check for the existence of a column like this:
if not exists (select * from information_schema.columns
               where table_name = '[the table name]'
               and column_name = '[column name]')
begin
    -- alter table here
end
Why worry about the ordinal position of the column? New columns get a new colid and are appended at the "end"; this shouldn't cause any problems.
If you make frequent updates by shipping these kinds of scripts, I would create a version table and just query it at the beginning of the script.
How are they running the scripts (since you are using a tool that supports the GO batch separator)? SQLCMD?
I would consider putting it all in a string and using EXEC. Several DDL commands have to be the first command in a batch. Also, you can sometimes run into parsing issues:
ALTER TABLE Executing regardless of Condition Evaluational Results
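A sketch of the EXEC approach (the table and column names are hypothetical):
if not exists (select * from information_schema.columns
               where table_name = 'MyTable' and column_name = 'NewColumn')
begin
    exec('alter table MyTable add NewColumn int null');
end
Because the string passed to EXEC is compiled as its own batch, the ALTER TABLE is not parsed until the IF has decided to run it, which sidesteps the parsing issues linked above.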
Also, you may want to look at SQLCMD's control features http://www.simple-talk.com/sql/sql-tools/the-sqlcmd-workbench/
(I've tried this in MySQL.)
I believe TRUNCATE TABLE and DELETE without a WHERE clause are semantically equivalent. Why doesn't the engine identify this trivial case and speed it up?
TRUNCATE TABLE cannot be rolled back; it is like dropping and recreating the table.
...just to add some detail.
Calling the DELETE statement tells the database engine to generate transaction log entries for all the records deleted. If the delete was done in error, you can restore your records.
Calling the TRUNCATE statement is a blanket "all or nothing" that removes all the records with no transaction log to restore from. It is definitely faster, but should only be done when you're sure you don't need any of the records you're going to remove.
DELETE FROM table deletes rows one at a time and adds a record to the transaction log so that the operation can be rolled back. The time taken to delete is also proportional to the number of indexes on the table and to any foreign key constraints (for InnoDB).
TRUNCATE effectively drops the table and recreates it, and cannot be performed within a transaction. It therefore requires fewer operations and executes quickly. TRUNCATE also does not fire any ON DELETE triggers.
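A minimal contrast (my_table is a placeholder):
-- removes rows one at a time, is logged, and can be rolled back inside a transaction
-- (with a transactional engine such as InnoDB)
DELETE FROM my_table;
-- drops and recreates the table; in MySQL this causes an implicit commit and cannot be rolled back
TRUNCATE TABLE my_table;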
Exact details about why this is quicker in MySql can be found in the MySql documentation:
http://dev.mysql.com/doc/refman/5.0/en/truncate-table.html
Your question was about MySQL, and I know little to nothing about MySQL as a product, but I thought I'd add that in SQL Server a TRUNCATE statement can be rolled back. Try it for yourself:
create table test1 (col1 int)
go
insert test1 values(3)
begin tran
truncate table test1
select * from test1
rollback tran
select * from test1
In SQL Server, TRUNCATE is logged; it's just not logged as verbosely as DELETE. I believe it's referred to as a minimally logged operation. Effectively, the data pages still contain the data, but their extents have been marked for deallocation. As long as the data pages still exist, you can roll back the truncate. Hope this is helpful. I'd be interested to know the results if somebody tries it on MySQL.
For MySQL 5 using InnoDB as the storage engine, TRUNCATE acts just like DELETE without a WHERE clause: i.e., for large tables it takes ages because it deletes rows one by one. This is changing in version 6.x.
See http://dev.mysql.com/doc/refman/5.1/en/truncate-table.html for 5.1 info (row-by-row with InnoDB) and http://blogs.mysql.com/peterg/category/personal-opinion/ for the changes in 6.x.
Editor's note
This answer is clearly contradicted by the MySQL documentation:
"For an InnoDB table before version 5.0.3, InnoDB processes TRUNCATE TABLE by deleting rows one by one. As of MySQL 5.0.3, row by row deletion is used only if there are any FOREIGN KEY constraints that reference the table. If there are no FOREIGN KEY constraints, InnoDB performs fast truncation by dropping the original table and creating an empty one with the same definition, which is much faster than deleting rows one by one."
TRUNCATE works at the table level, while DELETE works at the row level. If you were to translate this into another syntax, TRUNCATE would be:
DELETE FROM table
thus deleting all rows at once, while the DELETE statement (in phpMyAdmin) goes like:
DELETE FROM table WHERE id = 1
DELETE FROM table WHERE id = 2
and so on until the table is empty. Each query takes a number of (milli)seconds, which add up to longer than a single TRUNCATE.