Can I use the ROWID in place of a timestamp in an SQLite table?
I need to get the 50 most recent items from an SQLite table. I was thinking about using a separate timestamp field, but then I figured that a bigger ROWID means a newer item, and the ROWID is already there, so there's no need to change the schema.
Well, as it says here in the SQLite documentation:
The data for rowid tables is stored as a B-Tree structure containing
one entry for each table row, using the rowid value as the key. This
means that retrieving or sorting records by rowid is fast. Searching
for a record with a specific rowid, or for all records with rowids
within a specified range is around twice as fast as a similar search
made by specifying any other PRIMARY KEY or indexed value.
However, it also says:
Rowid values may be modified using an UPDATE statement in the same way
as any other column value can, either using one of the built-in
aliases ("rowid", "oid" or "_rowid_") or by using an alias created by
an integer primary key.
It would certainly be faster, but it sort of hurts my feeling for "open" design, in that you're relying on a feature of the implementation rather than making it explicit.
Having said that, the same link also says this:
With one exception noted below, if a rowid table has a primary key
that consists of a single column and the declared type of that column
is "INTEGER" in any mixture of upper and lower case, then the column
becomes an alias for the rowid. Such a column is usually referred to
as an "integer primary key". A PRIMARY KEY column only becomes an
integer primary key if the declared type name is exactly "INTEGER".
Other integer type names like "INT" or "BIGINT" or "SHORT INTEGER" or
"UNSIGNED INTEGER" causes the primary key column to behave as an
ordinary table column with integer affinity and a unique index, not as
an alias for the rowid.
Which I think gives you the perfect answer:
Define an INTEGER primary key on your table and use that for selection. You'll get the speed of using the ROWID (because as it says above, it's just an alias) and you'll get visibility in the schema of what you're doing.
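As a minimal sketch of that suggestion, assuming Python's built-in sqlite3 module and a hypothetical items table with a payload column, the 50 most recent rows are fetched by ordering on the INTEGER PRIMARY KEY, and a second query checks that id and rowid really hold the same value. "Newest = highest id" only holds while ids are left for SQLite to assign and nobody changes them with an UPDATE.

```python
import sqlite3

# In-memory database; "items" and "payload" are hypothetical names.
conn = sqlite3.connect(":memory:")
# Declaring exactly "INTEGER PRIMARY KEY" makes id an alias of the rowid.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
# Insert 200 rows without supplying id, so SQLite assigns increasing values.
conn.executemany(
    "INSERT INTO items (payload) VALUES (?)",
    [(f"item {n}",) for n in range(1, 201)],
)

# The 50 most recent rows: highest ids first.
latest = conn.execute(
    "SELECT id, payload FROM items ORDER BY id DESC LIMIT 50"
).fetchall()

# id and rowid hold the same value in every row.
mismatches = conn.execute(
    "SELECT COUNT(*) FROM items WHERE id <> rowid"
).fetchone()[0]

print(latest[0], latest[-1], mismatches)  # (200, 'item 200') (151, 'item 151') 0
```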
Related
I have this PostgreSQL table that has a 'unique together' index:
CREATE TABLE IF NOT EXISTS entity_properties
(
entity_id INTEGER NOT NULL REFERENCES entities,
property_id UUID NOT NULL REFERENCES properties,
value VARCHAR,
value_label VARCHAR,
UNIQUE (entity_id, property_id)
);
I want to create an index on 'value' column to minimize search time:
CREATE INDEX index_property_value ON entity_properties (value);
I get this error:
index row requires 8296 bytes, maximum size is 8191
As the error clearly says, creating this index would exceed the maximum index row size.
You can see this answer.
But I really need the 'value' column to be indexed for efficiency reasons. In my database this table holds the largest part of the data (millions of rows), and it gets updated very frequently. As far as I know, updating indexed columns affects performance, which is why I am concerned about it.
How can I achieve this?
PS: my other thought is that I could add the 'value' column to the 'unique together' index:
CREATE TABLE IF NOT EXISTS entity_properties
(
entity_id INTEGER NOT NULL REFERENCES entities,
property_id UUID NOT NULL REFERENCES properties,
value VARCHAR,
value_label VARCHAR,
UNIQUE (entity_id, property_id, value)
);
Can this be a solution? If so, is it the best approach? If not, what is the best approach?
PostgreSQL has a built-in hash index type which doesn't suffer from this limitation, so you can just create one of those:
CREATE INDEX index_property_value ON entity_properties USING hash (value);
This has an advantage over using a functional index (as Laurenz suggests): you don't need to write your query in an unnatural way. Note that a hash index only supports equality (=) lookups.
That said, is it sensible that the "value" column can contain values this large? Maybe the best solution would be to investigate the large data and clean it up if it is not sensible.
Trying to add this as another column to an existing unique index would just make things worse: it would still need 8296 bytes, plus more for the other columns.
It is an unusual requirement to search for long texts. To avoid the error and get efficient index access, use a hash of the column:
CREATE INDEX ON entity_properties (hashtext(value));
This can be used with a query like:
SELECT ...
FROM entity_properties
WHERE value = 'long string'
AND hashtext(value) = hashtext('long string');
The first condition is necessary to deal with hash collisions.
I am following a tutorial and encountered the following code. What does date mean here? I thought date was not supported in sqlite3.
create table log_date (
id integer primary key autoincrement,
entry_date date not null
);
SQLite does not have a DATE type, but that doesn't mean it won't let you use it in a CREATE TABLE and then use a different data type underneath.
DATE gets NUMERIC affinity, as described in Affinity Name Examples.
There's a prioritized set of rules that determines which affinity a not-formally-supported type name gets, and DATE/DATETIME both work out to NUMERIC.
date is the declared type which, due to SQLite's flexibility, actually gives a type affinity of NUMERIC (see 3.1 in the link below for how the type affinity is determined).
The declared type can be virtually anything (not KEYWORDS, and there are limitations with regard to values in parentheses). A type of grUmpleStiltskin is even usable (again, it falls into the NUMERIC bucket).
However, due to SQLite's flexibility, any type of data can be stored in any type of column. The exception is a column that is an alias of the rowid (as your id column is, because it is declared INTEGER PRIMARY KEY), or the rowid column itself: it can only store an integer value (64-bit signed).
You may wish to have a read of Datatypes In SQLite Version 3.
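To see those affinity rules in action, here is a small sketch using Python's sqlite3 module and a hypothetical demo table. The same three text values go into a date column and a grUmpleStiltskin column; typeof() reports how each was actually stored, and the rowid alias refuses non-numeric text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "date" and "grUmpleStiltskin" both resolve to NUMERIC affinity.
conn.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY, entry_date date, odd grUmpleStiltskin)"
)
# NUMERIC affinity converts text to INTEGER or REAL only if it looks like a number.
for text in ("2024-01-15", "123", "1.5"):
    conn.execute("INSERT INTO demo (entry_date, odd) VALUES (?, ?)", (text, text))

storage = conn.execute(
    "SELECT typeof(entry_date), typeof(odd) FROM demo ORDER BY id"
).fetchall()
print(storage)  # [('text', 'text'), ('integer', 'integer'), ('real', 'real')]

# A rowid alias only accepts integers; non-numeric text is rejected.
try:
    conn.execute("INSERT INTO demo (id, entry_date) VALUES ('abc', 'x')")
    rowid_rejected_text = False
except sqlite3.DatabaseError:  # SQLite reports "datatype mismatch"
    rowid_rejected_text = True
print(rowid_rejected_text)  # True
```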
one called ID that has to be an integer, it is assigned a unique key
and updates (w.e that means).
As said, because you have INTEGER PRIMARY KEY, the id column is an alias of the normally hidden rowid column. If no value is specified when inserting a row, SQLite will assign a unique 64-bit integer value, typically 1 greater than the highest existing rowid.
AUTOINCREMENT, as you have coded, introduces a rule/constraint that only values higher than the highest rowid ever used (even if that row has since been deleted) can be assigned. AUTOINCREMENT can only be used on a column that is an alias of the rowid column.
However, the very first sentence in SQLite Autoincrement is :-
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and
disk I/O overhead and should be avoided if not strictly needed. It is
usually not needed.
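The difference between the two rules can be sketched with Python's sqlite3 module (the dates are made up for illustration): insert three rows, delete the one with the highest id, then insert another and see which id it gets.

```python
import sqlite3

def id_after_deleting_highest(autoincrement: bool) -> int:
    """Insert 3 rows, delete the one with the highest id, insert one more."""
    conn = sqlite3.connect(":memory:")
    keyword = " AUTOINCREMENT" if autoincrement else ""
    conn.execute(
        f"CREATE TABLE log_date (id INTEGER PRIMARY KEY{keyword}, entry_date date)"
    )
    conn.executemany(
        "INSERT INTO log_date (entry_date) VALUES (?)",
        [("2024-01-01",), ("2024-01-02",), ("2024-01-03",)],
    )
    conn.execute("DELETE FROM log_date WHERE id = 3")
    cur = conn.execute("INSERT INTO log_date (entry_date) VALUES ('2024-01-04')")
    return cur.lastrowid

plain = id_after_deleting_highest(False)   # highest existing id is 2, so 3 is reused
with_ai = id_after_deleting_highest(True)  # sqlite_sequence remembers 3, so 4
print(plain, with_ai)  # 3 4
```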
I am using auto increment on an integer data column in SQLite. Since it is auto increment, the data is already sorted by that column in ascending order. So, I was wondering if SQLite will perform a binary search over the auto increment column whenever data is searched by that column.
Effectively yes, but not really.
That is, all AUTOINCREMENT does is add a constraint that requires the value assigned to the column to be higher than any existing value, or higher than any value that has been used, in that column.
But there's more to it than that: it is your_column INTEGER PRIMARY KEY (AUTOINCREMENT can only be used on such a column, and there can only be one such column per table) that makes the column an alias of the hidden rowid column.
The rowid is what is indexed: it is basically the most primary index and the most efficient one, so a lookup on it is a B-tree search rather than a scan. It always exists unless the table is defined using the WITHOUT ROWID keywords.
So an AUTOINCREMENT column is an alias of the rowid column, and it uses a different, more expensive algorithm to generate values than an alias of the rowid without AUTOINCREMENT.
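One way to check that lookups on such a column use the rowid B-tree rather than a scan is EXPLAIN QUERY PLAN. A minimal sketch with Python's sqlite3 module and a hypothetical items table; the exact wording of the plan text varies between SQLite versions, but both the point lookup and the range should mention USING INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT)")
conn.executemany(
    "INSERT INTO items (data) VALUES (?)",
    [(f"row {n}",) for n in range(1, 1001)],
)

def plan(sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN for a query."""
    return [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

point = plan("SELECT data FROM items WHERE id = ?", (500,))
span = plan("SELECT data FROM items WHERE id BETWEEN ? AND ?", (100, 200))
found = conn.execute("SELECT data FROM items WHERE id = ?", (500,)).fetchone()
print(point)
print(span)
print(found)  # ('row 500',)
```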
That is, without AUTOINCREMENT, SQLite generates the value for the rowid column by finding the maximum value in the table and incrementing it, unless that would exceed 9223372036854775807, in which case SQLite will make some attempts to find an unused lower value (normally between 1 and 9223372036854775807).
With AUTOINCREMENT, the algorithm takes the higher of the maximum value and the value stored in the sqlite_sequence table for the respective table, and increments that (thus any deleted higher values will not be reused). However, if 9223372036854775807 has been used, an SQLITE_FULL error will be raised.
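The edge case at 9223372036854775807 can be sketched by inserting that value explicitly and then letting SQLite choose the next id (Python's sqlite3 module assumed; the table t is hypothetical). Without AUTOINCREMENT a random unused lower value should be picked; with it, the insert fails with SQLITE_FULL.

```python
import sqlite3

MAX_ROWID = 2**63 - 1  # 9223372036854775807, the largest possible rowid

def insert_after_max(autoincrement: bool):
    """Store MAX_ROWID explicitly, then let SQLite pick the next id."""
    conn = sqlite3.connect(":memory:")
    keyword = " AUTOINCREMENT" if autoincrement else ""
    conn.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY{keyword})")
    conn.execute("INSERT INTO t (id) VALUES (?)", (MAX_ROWID,))
    try:
        return conn.execute("INSERT INTO t (id) VALUES (NULL)").lastrowid
    except sqlite3.DatabaseError as exc:  # SQLITE_FULL surfaces as a DatabaseError subclass
        return exc

plain = insert_after_max(False)   # some unused id below MAX_ROWID
with_ai = insert_after_max(True)  # the error object
print(plain)
print(repr(with_ai))
```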
The following should be noted :-
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and
disk I/O overhead and should be avoided if not strictly needed. It is
usually not needed.
You may well want to read SQLite Autoincrement.
Additional
Regarding the comment :-
If I don't use AUTOINCREMENT I have to explicitly create unique
integer IDs and then insert them in database each time a new row is
inserted.
The following demonstrates that there is no requirement for AUTOINCREMENT:-
CREATE TABLE IF NOT EXISTS xyz (ID INTEGER PRIMARY KEY);
INSERT INTO xyz VALUES(null);
SELECT * FROM xyz;
After running it twice, the result is two rows, with ID values 1 and 2, assigned automatically without AUTOINCREMENT.
I have a table TABLE1 with my data, and another, empty table TABLE2, as follows:
"TABLE2"(ipt TEXT,instant NUM, id integer auto_increment);
I want to select IP and instant from TABLE1 and insert them into TABLE2, but I don't know why the auto_increment doesn't work.
Does anyone have an idea?
AUTOINCREMENT can only be used in one situation, and that is for a column defined with a type/constraint of INTEGER PRIMARY KEY.
auto_increment is not a valid keyword. If you used id integer auto_increment, the result would be a column named id with a declared type of integer auto_increment, which effectively gives the column a type affinity of INTEGER (see Datatypes In SQLite Version 3). Nothing is generated for such a column, so id is simply NULL when no value is supplied.
That is, you must have exactly column_name INTEGER PRIMARY KEY AUTOINCREMENT (case is irrelevant).
It can also only be coded once per table.
INTEGER PRIMARY KEY, with or without AUTOINCREMENT, is a special case where the named column is made to be an alias of the rowid column. The rowid is a column that uniquely identifies the row and is generally hidden (for want of a better description). The rowid will not exist if the table is created with the WITHOUT ROWID keywords, in which case AUTOINCREMENT cannot be coded within a column definition (see Clustered Indexes and the WITHOUT ROWID Optimization).
That said, it is almost certain that you do not in fact need to code AUTOINCREMENT; just coding column_name INTEGER PRIMARY KEY will very likely be sufficient and better for your needs. The column will still be given a unique identifier: a 64-bit signed integer, 1 for the first row inserted, then likely 2, 3, 4 and so on.
- Note that there is no guarantee that the numbering will be sequential or monotonically increasing.
Adding AUTOINCREMENT only ensures that each new identifier is greater than any previously used, and in doing so imposes a limit: once the identifier has reached the highest possible value, any subsequent insert results in an SQLITE_FULL exception, whereas without AUTOINCREMENT, unused/free values (e.g. from deleted rows) that are lower than the highest could be utilised.
To quote the SQLite documentation :-
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and
disk I/O overhead and should be avoided if not strictly needed. It is
usually not needed.
SQLite Autoincrement
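Applied to the question, here is a sketch with Python's sqlite3 module, where table1/table2 and the sample rows are hypothetical stand-ins for TABLE1/TABLE2. Naming the target columns in INSERT ... SELECT and leaving id out lets SQLite fill it in when id is INTEGER PRIMARY KEY, while the auto_increment version just stores NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for TABLE1, with a few sample rows.
conn.execute("CREATE TABLE table1 (ip TEXT, instant NUM)")
conn.executemany(
    "INSERT INTO table1 (ip, instant) VALUES (?, ?)",
    [("10.0.0.1", 1700000000), ("10.0.0.2", 1700000060), ("10.0.0.3", 1700000120)],
)

# As in the question: "integer auto_increment" is only a type name, so nothing is generated.
conn.execute("CREATE TABLE table2_broken (ip TEXT, instant NUM, id integer auto_increment)")
# Fixed version: id is an alias of the rowid, so SQLite assigns it.
conn.execute("CREATE TABLE table2 (ip TEXT, instant NUM, id INTEGER PRIMARY KEY)")

for target in ("table2_broken", "table2"):
    # List the target columns and leave id out of the INSERT ... SELECT.
    conn.execute(f"INSERT INTO {target} (ip, instant) SELECT ip, instant FROM table1")

broken_ids = [row[0] for row in conn.execute("SELECT id FROM table2_broken")]
fixed_ids = [row[0] for row in conn.execute("SELECT id FROM table2 ORDER BY id")]
print(broken_ids)  # [None, None, None]
print(fixed_ids)   # [1, 2, 3]
```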
I have a large number of records of type [crc INT, name VARCHAR]
Some (few) of the crc records will be duplicates. I am interested in a fast way to select items that have a specific crc value.
Is it worth it (in terms of performance) to make the crc field an INTEGER PRIMARY KEY (which is unique) and make name a compound value (it's doable but ugly, I think), or should I just create an index on the crc field?
Will making the crc column the PRIMARY KEY add significant performance?
Can someone help ?
1) select items that have a specific crc value:
SELECT * FROM tablename WHERE crc=777;
2) Don't make the crc field an INTEGER PRIMARY KEY, because it may contain duplicates. You'd better create an 'id' field that is unique and mark it as INTEGER PRIMARY KEY AUTOINCREMENT.
Storing a list of names will create problem when you want to access individual names.
(But if you always look up or retrieve all those names at once, there will not be much of a problem.)
If there are duplicates, you cannot make that column the primary key.
Just creating a (non-unique) index is the simplest solution.
Using an INTEGER PRIMARY KEY index is a little bit more efficient than a separate index, but the difference is unlikely to be significant.
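A minimal sketch of the simplest option, assuming Python's sqlite3 module and a hypothetical records table: a separate id INTEGER PRIMARY KEY (here without AUTOINCREMENT, which is usually not needed) plus a non-unique index on crc. Duplicate crc values are allowed, and EXPLAIN QUERY PLAN should show the index being used for the crc = ? lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, crc INTEGER, name TEXT)")
# Non-unique index: duplicate crc values are allowed.
conn.execute("CREATE INDEX idx_records_crc ON records (crc)")
conn.executemany(
    "INSERT INTO records (crc, name) VALUES (?, ?)",
    [(777, "a.bin"), (123, "b.bin"), (777, "c.bin"), (456, "d.bin")],
)

query = "SELECT id, name FROM records WHERE crc = ?"
rows = conn.execute(query + " ORDER BY id", (777,)).fetchall()
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + query, (777,))]
print(rows)  # [(1, 'a.bin'), (3, 'c.bin')]
print(plan)
```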