SQLite scanning table performance

My table has the following fields:
Date (integer)
State (integer)
ProductId (integer)
ProductName (integer)
Description (text) (maximum text length 3000 characters)
There will be more than 8 million rows. I need to decide whether I should put the product description in another table. My main goal is to have this statement very fast:
SELECT Date, State, ProductId, ProductName FROM tablename ORDER BY Date DESC LIMIT 100
The SQL result will not fetch the Description field value in the above statement. The user will see the description only when the row is selected in the application (new query).
I would really like to have the product Description in the same table, but I'm not sure how SQLite scans the rows. If the Date value doesn't match, I would assume that SQLite can quickly skip to the next row. Or does it need to scan all fields of the row, up to the end of the Description value, to know where the row ends? If it needs to scan all fields to reach the next row, will a 3000-character value in the Description field slow things down a lot?
EDIT: No indexing should be used, since INSERT speed is important.
EDIT: The only reason for trying to have it all in one table is that I want to do INSERTs and UPDATEs in one transaction of hundreds of items. The same item could be inserted and later updated in the same transaction, so I cannot know the last insert id per item.

When you use that query and do not have an index on the Date column, SQLite will read all records from the table, and use a temporary table to sort the result.
When you have an index on the Date column, SQLite will look up the last 100 records in the index, then read all the data of those records from the table.
When you have a covering index, i.e., one index with the four columns Date, State, ProductId, and ProductName, SQLite will just read the last 100 entries from the index.
Whenever SQLite reads from the database file, it does not read values or records, but entire pages (typically, 1 KB or 4 KB).
In case 1, SQLite will read all pages of the table.
In case 2, SQLite will read the last page of the index (because the 100 dates will fit into one page), and 100 pages of the table (one for each record, assuming that no two of these records happen to be in the same page).
In case 3, SQLite will read the last few pages of the index.
Case 2 will be much faster than case 1; case 3 will be faster still, but probably not enough to be noticeable.
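For illustration, the indexes for cases 2 and 3 would look something like this (the index names are made up; the table name is taken from the question):
-- case 2: plain index on the sort column
CREATE INDEX idx_date ON tablename(Date);
-- case 3: covering index; the query above can be answered from the index alone
CREATE INDEX idx_date_covering ON tablename(Date, State, ProductId, ProductName);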

I would suggest relying on good old database normalization rules, in this case specifically 1NF. If that Description (the same goes for ProductName) is going to be repeated, you have a database design issue, and whether it is in SQLite or elsewhere has little to do with it. CL is right about the indexes, mind you; proper indexing will still matter.
Review your model, make a table for products and another for inventory.
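A minimal sketch of such a split, with hypothetical table definitions based on the fields in the question:
CREATE TABLE products (
    ProductId INTEGER PRIMARY KEY,
    ProductName TEXT,
    Description TEXT          -- up to 3000 characters, stored once per product
);
CREATE TABLE inventory (
    Date INTEGER,
    State INTEGER,
    ProductId INTEGER REFERENCES products(ProductId)
);
The frequent query then reads only the narrow inventory rows, and joins to products only when a description is actually needed.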

Related

Updating rows in order with SQL

I have a table with 4 columns. The first column is unique for each row, but it's a string (URL format).
I want to update my table, but instead of using "WHERE", I want to update the rows in order.
The first query will update the first row, the second query updates the second row and so on.
What's the SQL code for that? I'm using SQLite.
Edit: My table schema
CREATE TABLE mytable (  -- a table name is required; "mytable" is assumed here
    url VARCHAR(150),
    views INT(5),
    clicks INT(5)
)
Edit2: What I'm doing right now is a loop of SQL queries:
update mytable set views = 5, clicks = 10 where url = 'http://someurl.com';
There are around 4 million records in the database, and the loop takes around 16 seconds on my server. Since the loop updates the rows in order (the first query updates the first row, and so on), I'm wondering whether updating the rows in order could be faster than using the WHERE clause, which needs to scan all 4 million rows.
You can't do what you want without using WHERE as this is the only way to select rows from a table for reading, updating or deleting. So you will want to use:
UPDATE mytable SET url = ... WHERE url = '<whatever>'
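(As an aside: if each of those updates has to scan the table, an index on the lookup column is the usual remedy. A one-line sketch, with a made-up index name:
CREATE INDEX idx_url ON mytable(url);
With the index in place, each WHERE lookup becomes a seek instead of a scan over all 4 million rows.)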
HOWEVER... SQLite has an extra feature - the autogenerated ROWID column. You can use this column in queries. You don't see this data by default, so if you want it you need to request it explicitly, e.g.:
SELECT ROWID, * FROM mytable
What this means is that you may be able to do what you want by referencing this column directly:
UPDATE mytable SET url = ... WHERE ROWID = 1
You still need to use the WHERE clause, but this lets you access the rows in insert order without doing anything else.
CAVEAT
ROWID effectively stores the INSERT order of the rows. If you delete rows from the table, the ROWIDs for remaining rows will NOT change - hence it is possible to have gaps in the ROWID sequence. This is by design and there is no workaround short of re-creating the table and re-populating the data.
PORTABILITY
Note that this only applies to SQLite - you may not be able to do the same thing with other SQL engines should you ever need to port this. It would be MUCH better to add an EXPLICIT auto-number column (aka an IDENTITY field) that you can use and manage.
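In SQLite, the idiomatic way to get such a column is INTEGER PRIMARY KEY, which makes the column an alias for ROWID. A minimal sketch, reusing the table name assumed above:
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,   -- explicit, stable row key
    url VARCHAR(150),
    views INT(5),
    clicks INT(5)
);
UPDATE mytable SET views = 5, clicks = 10 WHERE id = 1;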

sqlite: save array?

I have a simple table with two columns: "id" INTEGER as a key, and "data" INTEGER.
One of the user's requirements is to save the order in which he views the data, so I have to save the order of records in the table.
The simple solution, as I see it: id, data, order_id.
But in this case, if the user adds a record into the middle of his table's view, we have to update many records.
Another idea: id, data, next_id, previous_id.
Then insertion is fast, but extracting records in the defined order is slow.
So what is the best (fast) method to save the order of records in a table using SQLite? Fast = fast insert + fast extraction of records in the defined order.
Update:
The problem with order_id, as I see it, is the insertion of new records. I expect around 10 * 10^3 records, so inserting a new record will in the worst case require updating all 10 * 10^3 records. The SQLite database file is on flash memory, so it is not as fast as on a PC, and it is better to reduce the write volume to increase the lifetime of the flash.
I think the order_id approach is better; you only need one update statement to shift the existing records and make room for the new one:
update table
set order_id = order_id + 1
where order_id >= #newRecordOrder
I do wonder whether that order is unique across the whole table or only within a subset, which would then need a second key field.
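A minimal sketch of the whole insertion, assuming a table named items with the columns from the question plus order_id (all names and values hypothetical):
BEGIN;
-- shift everything at or after the target position down by one
UPDATE items SET order_id = order_id + 1 WHERE order_id >= 5;
-- insert the new record into the freed slot
INSERT INTO items (data, order_id) VALUES (42, 5);
COMMIT;
Reading the records back in order is then a simple ORDER BY order_id.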

How to insert X amount of rows at the beginning of a pre-existing table of data in sqlite

I am relatively new to SQL, so I had a question about insertion.
I have a table of data that I need to import above the existing content of another table. For example, the table I am bringing in has 100 rows, and the table I'm bringing the data into also has 100.
I need to make the table I am bringing new data into have 200 rows, and have the first 100 rows blank (so I can update those rows with my new content).
Is there an easy way to do that that I am just missing? Thanks for your help!!
Consider that the database is just a data store. How it's ordered should be up to the client or the caller. Usually the best means of this is with the ORDER BY clause when SELECTing.
So I'd suggest not worrying about how the RDBMS is storing the data, but how it's being extracted.
Likely there's a column or attribute that you're focused on keeping/maintaining order. Perhaps it's a date or number? Consider using that column in your ORDER BY, and remember you can use more than one column in your ordering.
We shouldn't rely on how the data is stored for presentation later on.
/* use SQLite's CURRENT_TIMESTAMP to record when these rows were created */
INSERT INTO MyTable (Foo, Bar, CreatedOn)
SELECT Foo, Bar, CURRENT_TIMESTAMP
FROM OtherTable
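The ordering then happens at read time; for example, to show the newest rows first:
SELECT Foo, Bar FROM MyTable ORDER BY CreatedOn DESC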

How are these tasks done in SQL?

I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through the entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practice), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
For the last part, use SUM():
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming the values are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
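To make that concrete, here is a minimal SQLite-flavoured sketch of both mechanisms, with hypothetical names (the syntax for the DEFAULT and the trigger varies by engine):
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,  -- set automatically on insert
    updated_at TEXT
);
CREATE TRIGGER events_touch AFTER UPDATE ON events
BEGIN
    -- SQLite does not re-fire this trigger recursively by default
    UPDATE events SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;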
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.

performance issue in a select query from a single table

I have a table as below
dbo.UserLogs
----------------------------------------
Id | UserId | Date | Name | P1 | Dirty
----------------------------------------
There can be several records per UserId [even in the millions].
I have a clustered index on the Date column and query this table very frequently over time ranges.
The column 'Dirty' is non-nullable and can only take the values 0 or 1, so I have no indexes on 'Dirty'.
I have several million records in this table, and in one particular case in my application I need to query it to get all UserIds that have at least one record marked dirty.
I tried this query: select distinct(UserId) from UserLogs where Dirty=1
I have 10 million records in total, and this takes about 10 minutes to run; I want it to run much faster than that.
[I am able to query this table on the Date column in less than a minute.]
Any comments/suggestions are welcome.
My environment:
64-bit, Sybase 15.0.3, Linux
My suggestion would be to reduce the amount of data that needs to be queried by "archiving" log entries to an archive table at suitable intervals.
You can still access all entries through a union view over the current and archived log data, but queries that touch only the current logs would have much less data to scan.
Add an index containing both the UserId and Dirty fields. Put UserId before Dirty in the index as it has more unique values.
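A sketch of that index in T-SQL-style syntax (the index name is made up; the table and columns come from the question):
CREATE INDEX IX_UserLogs_UserId_Dirty ON dbo.UserLogs (UserId, Dirty)
Because the index contains both columns the query references, it is covering: the engine can answer select distinct(UserId) ... where Dirty=1 from the index alone, without touching the table rows.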