sqlite: save array?

I have a simple table with two columns: "id" INTEGER as a key, and "data" INTEGER.
One of the user's requirements is to save the order in which he views the data.
So I have to save the order of records in the table.
The simple solution as I see it: id, data, order_id.
But in this case, if the user adds a record into the middle of his table's view,
we have to update many records.
Another idea: id, data, next_id, previous_id.
Insertion is fast, but extraction of records in the defined order is slow.
So what is the best (fast) method to save the order of records in a table
using sqlite? fast = fast insert + fast extraction of records in the defined order.
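For reference, reading the linked-list variant back in the defined order needs a recursive walk, which is why extraction is slow; a sketch in SQLite (the table name mytable is assumed, the columns are from the question):

-- walk the chain from the head row; each step is a separate lookup,
-- so fetching N rows in order costs N lookups instead of one range scan
WITH RECURSIVE ordered(id, data, next_id) AS (
    SELECT id, data, next_id FROM mytable WHERE previous_id IS NULL
    UNION ALL
    SELECT t.id, t.data, t.next_id
    FROM mytable t JOIN ordered o ON t.id = o.next_id
)
SELECT id, data FROM ordered;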
Update:
The problem with order_id, as I see it, is the insertion of new records. I expect that we will have 10 * 10^3 records. Inserting a new record will in the worst case require updating all 10 * 10^3 records. The SQLite database file is on flash memory, so it is not as fast as on a PC, and it would be better to reduce the amount of "writes" to increase the lifetime of the flash.

I think the order_id is better; you only need one update instruction to shift everything at or after the insertion point:
update mytable
set order_id = order_id + 1
where order_id >= #newRecordOrder
I do wonder whether that order is unique to the whole table or to a subset, thus needing a second PK field.
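A minimal sketch of the whole insertion, assuming a table named mytable and placeholder parameters in the same #-style as above, wrapped in a transaction so the shift and the insert are atomic:

BEGIN;
-- make room at the insertion point
UPDATE mytable SET order_id = order_id + 1 WHERE order_id >= #newRecordOrder;
-- insert the new record at the freed position
INSERT INTO mytable (data, order_id) VALUES (#data, #newRecordOrder);
COMMIT;

With an index on order_id, extraction in the defined order is then a single range scan: SELECT data FROM mytable ORDER BY order_id;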

Optimize Delete

I have an importer system which updates columns of already existing rows in a table. Since UPDATE was taking time, I changed it to DELETE and BULK INSERT.
Here is my database setup snippet
Table: ParameterDefinition
Columns: Id, Name, Other Cols
Table: ParameterValue
Columns: Id, CustId, ParameterDefId, Value
I get the values associated with ParameterDefinition.Name from my XML source, so to import I first delete all the existing ParameterValue rows for all the ParameterDefinition.Name values passed in the XML, and finally bulk insert all the values from the XML. Here is my query:
DELETE FROM ParameterValue WHERE CustId = ? AND ParameterDefId IN (?,?...?);
For 1000 customers the above DELETE statement is called 1000 times, which is very time consuming: approximately 64 seconds.
Is there any better way to handle DELETE of 1000 customers?
Thanks,
Sheeju
Create a temporary table for the bulk insert (ParameterValue_Import). Do the bulk inserts into this table, then update/insert/delete based on the imported data:
INSERT INTO .. SELECT .. WHERE NOT EXISTS ( .. ) for the new rows
UPDATE .. FROM for the updates
DELETE FROM .. WHERE NOT EXISTS ( .. ) for the deletions
Bulk operations have better performance than standalone operations. Most DBMSs are designed to handle set-based operations instead of record-based ones.
Edit
To delete or update one record based on a WHERE clause which refers to only one record, the DBMS has to either do a full table scan (if there is no index for the WHERE condition) or do an index lookup. Only after the record is successfully identified does the DBMS proceed with the original request (update or delete). Depending on the number of records in the table and/or the size/depth of the index, this can be really expensive. The process is repeated for each and every command in the batch, so summing up the total cost, it can be more than the cost of updating/deleting records based on another table (especially if the operations touch nearly all records in the target table).
When you delete/update several records at once (e.g. based on another table), the DBMS can do the lookups with only one table scan/index lookup and perform a logical join when processing your request.
The cost of actually updating a record is the same in each case; only the total cost of the lookups can differ significantly.
Furthermore, deleting and then inserting a record to update it can require more resources: when you delete a record, all related indexes are updated, and when you insert the new record, the indexes are updated once more, whereas with an update only the indexes related to an updated column have to be touched (and each of those is updated only once).
Here is the exact syntax for the idea given by @Pred above.
After the bulk insert, let's say you have the data in ParameterValue_Import.
To INSERT the records in ParameterValue_Import which are not yet in ParameterValue:
INSERT INTO ParameterValue (CustId, ParameterDefId, Value)
SELECT CustId, ParameterDefId, Value
FROM ParameterValue_Import
WHERE NOT EXISTS (
    -- a row counts as "new" when no row exists for the same customer/parameter pair
    SELECT null
    FROM ParameterValue
    WHERE ParameterValue.CustId = ParameterValue_Import.CustId
      AND ParameterValue.ParameterDefId = ParameterValue_Import.ParameterDefId
);
To UPDATE the records in ParameterValue which are also in ParameterValue_Import:
UPDATE ParameterValue
SET Value = ParameterValue_Import.Value
FROM ParameterValue_Import
WHERE ParameterValue.ParameterDefId = ParameterValue_Import.ParameterDefId
  AND ParameterValue.CustId = ParameterValue_Import.CustId;
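For completeness, the deletion step Pred listed could look something like the sketch below; it removes rows whose customer/parameter pair no longer appears in the import (the matching columns are assumed to be the same ones used in the two statements above):

DELETE FROM ParameterValue
WHERE NOT EXISTS (
    SELECT null
    FROM ParameterValue_Import
    WHERE ParameterValue_Import.CustId = ParameterValue.CustId
      AND ParameterValue_Import.ParameterDefId = ParameterValue.ParameterDefId
);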

Moving large amounts of data instead of updating it

I have a large table (about 40M rows) with a number of columns that are 0 but need to be null instead, so we can better key the data.
I've written scripts to chop the update into chunks of 10000 records, find the occurrences of columns with zero, and update them to null.
Example:
update FooTable
set order_id = case when order_id = 0 then null else order_id end,
person_id = case when person_id = 0 then null else person_id end
WHERE person_id = 0
OR order_id = 0
This works great, but it takes forever.
I'm thinking a better way to do this would be to create a second table, insert the data into it, and then rename it to replace the old table with the zero-valued columns.
The question is: can I do an INSERT INTO table2 SELECT ... FROM table1 and, in the process, cleanse the data from table1 before it goes in?
You can usually create a new, sanitised table, depending on the actual DB server you are using.
The hard part is that if there are other tables in the database, you may have issues with foreign keys, indexes, etc. which refer to the original table.
Whether making a new sanitised table will be quicker than updating your existing table is something you can only tell by trying it.
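A minimal sketch of that copy-and-cleanse, assuming a pre-created FooTable_clean with the same schema (the name is illustrative, and only the two columns from the question's example are shown); NULLIF turns 0 into NULL during the copy:

INSERT INTO FooTable_clean (order_id, person_id)
SELECT NULLIF(order_id, 0), NULLIF(person_id, 0)
FROM FooTable;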
Dump the PK/clustered key of all the records you want to update into a temp table, then perform the update joining to the temp table. That will ensure the lowest locking level and quickest access. You can also add an identity column to the temp table; then you can loop through and do the updates in batches.
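A sketch of that approach in T-SQL-style syntax (this answer talks about clustered keys and identity columns), assuming a hypothetical id primary key on FooTable:

-- collect the keys of the rows that actually need fixing
SELECT id INTO #to_fix
FROM FooTable
WHERE person_id = 0 OR order_id = 0;

-- update only those rows, joining on the narrow key table
UPDATE f
SET f.order_id = NULLIF(f.order_id, 0),
    f.person_id = NULLIF(f.person_id, 0)
FROM FooTable AS f
JOIN #to_fix AS t ON t.id = f.id;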

SQLite scanning table performance

My table has the following fields:
Date (integer)
State (integer)
ProductId (integer)
ProductName (integer)
Description (text) (maximum text length 3000 characters)
There will be more than 8 million rows. I need to decide whether I should put the product description in another table. My main goal is for this statement to be very fast:
SELECT Date, State, ProductId, ProductName FROM tablename ORDER BY Date DESC LIMIT 100
The SQL result will not fetch the Description field value in the above statement. The user will see the description only when the row is selected in the application (new query).
I would really like to have the product Description in the same table, but I'm not sure how SQLite scans the rows. If the Date value doesn't match, I would assume SQLite can quickly skip to the next row. Or does it need to scan all fields of the row, up to the end of the Description value, to know where the row ends? If it has to scan all fields to get to the next row, will a 3000-character Description value slow things down a lot?
EDIT: No indexing should be used since INSERT speed is important.
EDIT: The only reason for trying to have it all in one table is that I want to do the INSERTs and UPDATEs of hundreds of items in a single transaction. The same item could be inserted and later updated in the same transaction, so I cannot know the last insert id per item.
When you use that query and do not have an index on the Date column, SQLite will read all records from the table, and use a temporary table to sort the result.
When you have an index on the Date column, SQLite will look up the last 100 records in the index, then read all the data of those records from the table.
When you have a covering index, i.e., one index with the four columns Date, State, ProductId, and ProductName, SQLite will just read the last 100 entries from the index.
Whenever SQLite reads from the database file, it does not read values or records, but entire pages (typically, 1 KB or 4 KB).
In case 1, SQLite will read all pages of the table.
In case 2, SQLite will read the last page of the index (because the 100 dates will fit into one page), and 100 pages of the table (one for each record, assuming that no two of these records happen to be on the same page).
In case 3, SQLite will read the last few pages of the index.
Case 2 will be much faster than case 1; case 3 will be faster still, but probably not enough to be noticeable.
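For reference, the covering index from case 3 would be created something like this (the index name is illustrative):

CREATE INDEX idx_date_covering
ON tablename (Date, State, ProductId, ProductName);

SQLite can then answer the SELECT above from the index alone, reading only its last few pages.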
I would suggest relying on good old database normalization rules, in this case specifically 1NF. If that Description (and the same goes for the ProductName) is going to be repeated, you have a database design issue, and whether it sits in SQLite or elsewhere has little to do with it. CL is right about the indexes, mind you; proper indexing will still matter.
Review your model, make a table for products and another for inventory.
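A hedged sketch of that split, reusing the field names from the question (the table names and the key are assumptions):

CREATE TABLE Product (
    ProductId   INTEGER PRIMARY KEY,
    ProductName TEXT,
    Description TEXT
);

CREATE TABLE Inventory (
    Date      INTEGER,
    State     INTEGER,
    ProductId INTEGER REFERENCES Product(ProductId)
);

The hot query then scans the narrow Inventory table (joining to Product for the name), and the 3000-character descriptions never sit between the rows it reads.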

Updating rows in order with SQL

I have a table with 4 columns. The first column is unique for each row, but it's a string (URL format).
I want to update my table, but instead of using "WHERE", I want to update the rows in order.
The first query will update the first row, the second query updates the second row and so on.
What's the SQL code for that? I'm using SQLite.
Edit: My table schema
CREATE TABLE mytable (
    url VARCHAR(150),
    views INT(5),
    clicks INT(5)
);
Edit2: What I'm doing right now is a loop of SQL queries:
update mytable set views = 5, clicks = 10 where url = 'http://someurl.com';
There are around 4 million records in the database. It takes around 16 seconds on my server to run the updates. Since the loop updates the rows in order (the first query updates the first row), I'm wondering whether updating the rows in order could be faster than using the WHERE clause, which has to search through 4 million rows.
You can't do what you want without using WHERE, as this is the only way to select rows from a table for reading, updating or deleting. So you will want to use:
UPDATE mytable SET url = ... WHERE url = '<whatever>'
HOWEVER... SQLite has an extra feature: the autogenerated ROWID column. You can use this column in queries, but you don't see its data by default, so if you want it you need to request it explicitly, e.g.:
SELECT ROWID, * FROM mytable
What this means is that you may be able to do what you want by referencing this column directly:
UPDATE mytable SET url = ... WHERE ROWID = 1
You still need to use the WHERE clause, but this lets you access the rows in insert order without doing anything else.
CAVEAT
ROWID effectively stores the INSERT order of the rows. If you delete rows from the table, the ROWIDs of the remaining rows will NOT change; hence it is possible to have gaps in the ROWID sequence. This is by design, and there is no workaround short of re-creating the table and re-populating the data.
PORTABILITY
Note that this only applies to SQLite; you may not be able to do the same thing with other SQL engines should you ever need to port this. It would be MUCH better to add an EXPLICIT auto-number column (aka an IDENTITY field) that you can use and manage.
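A sketch of that explicit column (in SQLite an INTEGER PRIMARY KEY is an alias for ROWID, so it costs nothing extra; the column name id is illustrative):

CREATE TABLE mytable (
    id     INTEGER PRIMARY KEY,  -- explicit, portable stand-in for ROWID
    url    VARCHAR(150),
    views  INT(5),
    clicks INT(5)
);

UPDATE mytable SET views = 5, clicks = 10 WHERE id = 1;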

How to insert X amount of rows at the beginning of a pre-existing table of data in sqlite

I am relatively new to SQL, so I had a question about insertion.
I have a table of data that I need to import above the existing content of another table. For example, the table I am bringing in has 100 rows, and the table I'm bringing the data into also has 100.
I need the table I am bringing new data into to end up with 200 rows, with the first 100 rows blank (so I can update those rows with my new content).
Is there an easy way to do that that I am just missing? Thanks for your help!!
Consider that the database is just a data store. How it's ordered should be up to the client or the caller, and usually the best means of controlling this is the ORDER BY clause when SELECTing.
So I'd suggest not worrying about how the RDBMS is storing the data, but about how it's being extracted.
Likely there's a column or attribute you're relying on to keep/maintain order. Perhaps it's a date or a number? Consider using that column in your ORDER BY, and remember you can use more than one column in your ordering.
We shouldn't rely on how the data is stored for presentation later on.
/* use SQLite's CURRENT_TIMESTAMP to record when these rows were created
   (CURRENT_TIME alone stores only the time of day) */
INSERT INTO MyTable (Foo, Bar, CreatedOn)
SELECT Foo, Bar, CURRENT_TIMESTAMP
FROM OtherTable;
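The presentation order is then applied at query time; for instance, to show the newly imported rows first:

SELECT Foo, Bar
FROM MyTable
ORDER BY CreatedOn DESC, Foo;  -- a second column breaks ties, as noted above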