I've got an SQLite table with potentially hundreds of thousands of entries, which is being added to (and occasionally removed from) in the background at irregular intervals. The UI needs to display this table in an arbitrary user-selected sorted order, in a wxWidgets wxListCtrl.
I'm planning to use a wxLC_VIRTUAL list control, and query the table for small groups of items as needed using LIMIT and OFFSET, but I foresee trouble. When the background process makes changes to items that are "above" the currently-viewed ones, I can't see any way to know how the offsets of the currently-viewed items will change.
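For reference, the paging query I have in mind looks something like this (the table and column names are just placeholders):

SELECT id, name, size
FROM items
ORDER BY name ASC                  -- user-selected sort column and direction
LIMIT 50 OFFSET :first_visible_row;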
Is there some SQLite trick to handle this? Maybe a way to identify what offset a particular record is at in a specific sorted order, without iterating through all of the records returned by a SELECT statement?
Alternatively, is there some way to create an unchanging view of the database at a particular time, without a time-consuming duplication of it?
If all else fails, I can store the changed items and add them later, but I'm hoping I won't have to.
Solved it by creating a query to find the index of an item, by counting the number of items that are "less than" (in the user-defined order) the one I'm looking for. A little complex to write, because of the user-defined ordering, but it works, and runs surprisingly fast even on a huge table.
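Roughly, the query looks like this - a sketch assuming a placeholder table items sorted on a single column name; the real statement gets built dynamically from whatever sort column and direction the user picked:

SELECT COUNT(*) AS row_index
FROM items
WHERE name < (SELECT name FROM items WHERE id = :target_id)
   OR (name = (SELECT name FROM items WHERE id = :target_id)
       AND id < :target_id);       -- tie-break on id so the index is unambiguous

With an index on the sort column this stays fast even on a very large table.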
I have redesigned an existing data table so that, instead of bunching up unrelated information in a single column, there are more columns per row, and most of them are sortable now.
After the redesign, paginating the data table (25 or 50 rows per page) locks up the browser for about 3-5 seconds, where it used to take less than 1 second.
The items of the table come from a computed property. I was going to try optimizing the calculation of the computed value, but the computed property doesn't get recalculated when paginating, so I don't see the point in doing that. I don't know how else I can speed things up again.
Answering my own question:
Putting the data that feeds the table into a rows variable in data() rather than in a computed property made everything a lot faster.
In my case, there are only a handful of places that necessitate a recalculation of the table data, so I added a tableUpdated() method which updates rows whenever I need.
This change made not only pagination but also changing rows per page, dynamically adding/removing columns, and search filtering a lot faster.
For an Android launcher (home screen) app project I want to implement a feature called "Sort by usage". This will sort apps by their launch count within a user-settable timeframe.
The current idea for the implementation is to store an array of Unix epoch timestamps, one for each launch.
Additionally, it will store a counter caching the current number of launches within the selected timeframe, incremented with every launch. Of course, this counter would regularly have to be rebuilt as time passes, but only every few hours, or after at least x percent of the selected timeframe has passed, so the computation definitely wouldn't run as often as it would without the counter, since this information is required every time the app entries on screen need to be sorted - but I'm not quite sure whether that matters in any way during actual use.
I am now unsure how to store the timestamp array inside the SQL database. As there is a table holding one record with information about each launcher entry, I thought about the following options:
1. Store the array of Unix epoch timestamps in serialized form (maybe a JSON array) in one field of the entry's record
2. Create a separate table for launch times, with
a. each record starting with an id associated with an entry, followed by all launch times, one per field
b. each record being a combination of entry id and one launch time
These options would obviously have the advantage of storing the timestamps using an appropriate type (a rough sketch of option 2b follows).
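With placeholder names, option 2b could look like:

CREATE TABLE launch_times (
    entry_id    INTEGER NOT NULL REFERENCES entries(id),
    launched_at INTEGER NOT NULL    -- Unix epoch seconds
);
CREATE INDEX idx_launch_times_entry_time
    ON launch_times (entry_id, launched_at);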
I probably didn't quite understand why you need a second piece of data for your launch counter - the fact you saved a timestamp already means a launch - why not just count timestamps? Less updating, less record locking, more concurrency.
Now, let's say you've got a separate table with timestamps in a classic one-to-many setting.
Pros of this setup - you never need to update anything, just keep inserting. You can easily cluster your table by timestamp, filter on your timeframe, and issue a group-by and count rows. The client will then get the numbers and sort by count (I believe it's generally better not to sort in SQL). Cons - you need a join to the parent table and probably need to get your indexes right.
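As a rough sketch, reusing the placeholder launch_times table from the question, the whole aggregation is one statement:

SELECT entry_id, COUNT(*) AS launch_count
FROM launch_times
WHERE launched_at >= :timeframe_start   -- start of the user-selected timeframe
GROUP BY entry_id;

The client then sorts its entries by launch_count.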
Alternatively, you store the timestamps in a text blob (JSON, CSV, whatever) with your main records. This definitely means you'll have to update your records a lot, which potentially opens you up to locking issues. Then, I'm not entirely sure what you'll have to do to get your final launch counts - read all entities, deserialise all timestamps, filter by timeframe and then count? It does feel a bit more convoluted in your case.
I don't think there's such a thing as a "best" way. You have to consider pros and cons. From what I gather, you might be better off with the classic SQL approach, unless there's something I didn't catch that outweighs my points above.
We have a navbar menu that will need to contain the quantity of certain items following the section name. Additionally, each user of the system has a different quantity value. The closest analogy I can make to this is a user's inbox and mail folders with the count parenthesized near the name of the folder.
inbox (113)
sent (45)
MyFolder (161)
etc....
My question is this - the navbar is displayed on every page. I am inclined to store the count value of each folder for a user in a separate table, but I know that this is considered bad practice (i.e. it denormalizes the tables). The cost of making the query right now is small but the database will grow. Is querying the DB each time a view is requested (with aggregates) the best practice?
The cost of making the query right now is small but the database will grow.
Growing the database does not necessarily mean that the time it takes to get the counts will grow proportionally. A good index structure will help your queries remain fast even when the size of the database grows considerably. For example, if your query retrieves the count of child entities, and the child table has an index on the foreign key, then retrieving the count is about as fast as if you had stored the count in a separate table.
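As an illustration only (the table and column names here are made up), an index on the foreign key lets the per-folder counts for one user be answered largely from the index:

CREATE INDEX idx_messages_folder ON messages (folder_id);

SELECT f.name, COUNT(m.id) AS item_count
FROM folders f
LEFT JOIN messages m ON m.folder_id = f.id
WHERE f.user_id = :user_id
GROUP BY f.id, f.name;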
Is querying the DB each time a view is requested (with aggregates) the best practice?
There is no universal answer. There are situations when you have to denormalize your data in order to achieve acceptable performance. This happens when your aggregate query is inherently complex, so trying to optimize it by adding an index or two does not work.
It does not look like your application is at the point where you have to denormalize, so I would keep the normalized structure in place. You can always add a table with aggregates if the performance becomes unacceptable later.
Is there any difference in the performance of read and write operations in SQL? Using Linq to SQL in an ASP.NET MVC application, I often update many values in one of my tables in a single post (during this process, many posts of this type will come in rapidly from the user, although the user is unable to submit new data until the previous update is complete). My current implementation is to loop through the input (a list of the current values for each row) and write each one to the field (a nullable int). I wonder if there would be any performance difference if I instead read the current DB value and only wrote when it has changed. Most of these operations change the values for roughly 1/4 to 2/3 of the rows, some change fewer, and few change more than 2/3 of the rows.
I don't know much about the comparative speeds of these operations (or whether there is even any difference). Is there any benefit to be gained from doing this? If so, what table sizes would benefit the most or not at all, and is there a percentage of changed rows that acts as a threshold for this improvement?
It's always faster to read.
A write is actually always a read followed by a write.
SQL needs to know which row to write to, which involves reading either an index or the table itself in a seek or scan operation, then writing to the appropriate row.
Writing also needs to update any applicable indexes. Depending on the circumstance, the index may get "updated" even when the data doesn't change.
As a very general rule, it's a good idea to modify only the data that actually needs to change.
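If you do want to skip the no-op writes, the idea is a conditional UPDATE along these lines (hypothetical table and column names; the comparison is spelled out because the column is a nullable int and plain <> treats NULL specially):

UPDATE my_table
SET value = :new_value
WHERE id = :row_id
  AND (value <> :new_value
       OR (value IS NULL AND :new_value IS NOT NULL)
       OR (value IS NOT NULL AND :new_value IS NULL));

Rows that already hold the new value are never written, so neither the row nor its indexes get touched.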
I have a normalized database and need to produce web-based reports frequently that involve joins across multiple tables. These queries are taking too long, so I'd like to keep the results precomputed so that I can load pages quickly. There are frequent updates to the tables I am summarising, and I need the summary to reflect all updates so far.
All tables have autoincrementing integer primary keys, I almost always add new rows, and I can arrange to clear the computed results if they change.
I approached a similar problem where I needed a summary of a single table by arranging to iterate over each row in the table, keeping track of the iterator state and the highest primary key (i.e. the "high-water mark") seen. That's fine for a single table, but for multiple tables I'd end up keeping one high-water value per table, and that feels complicated. Alternatively I could denormalise down to one table (with fairly extensive application changes), which feels like a step backwards and would probably grow my database from about 5GB to about 20GB.
(I'm using sqlite3 at the moment, but MySQL is also an option).
I see two approaches:
You move the data into a separate, denormalized database with some precalculation, optimized for quick access and reporting (this sounds like a small data warehouse). This implies you have to think about jobs (scripts, a separate application, etc.) that copy and transform the data from the source to the destination. Depending on how you want the copying to be done (full/incremental), the frequency of copying, and the complexity of the data model (both source and destination), it might take a while to implement and then optimize the process. It has the advantage of leaving your source database untouched.
You keep the current database, but you denormalize it. As you said, this might imply changes to the logic of the application (but you might find a way to minimize the impact on the logic using the database; you know the situation better than me :) ).
Can the reports be refreshed incrementally, or does reworking the report require a full recalculation? If it has to be a full recalculation, then you basically just want to cache the result set until the next refresh is required. You can create some tables to contain the report output (and a metadata table to define which report output versions are available), but most of the time this is overkill and you are better off just saving the query results off to a file or other cache store.
If it is an incremental refresh, then you need the PK ranges to work with anyhow, so you would want something like your high-water mark data (except you may want to store min/max pairs).
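A sketch of the incremental variant, with made-up names: a summary table keyed uniquely by customer_id, and a one-row refresh_state table holding the last order id already folded in. This uses SQLite 3.24+ upsert syntax (MySQL would use ON DUPLICATE KEY UPDATE instead), and both statements should run in one transaction so rows inserted in between aren't missed:

-- fold in only the orders added since the last refresh
INSERT INTO summary (customer_id, total_order_value)
SELECT o.customer_id, SUM(o.order_value)
FROM orders o
WHERE o.id > (SELECT last_seen_order_id FROM refresh_state)
GROUP BY o.customer_id
ON CONFLICT (customer_id) DO UPDATE
    SET total_order_value = total_order_value + excluded.total_order_value;

-- advance the high-water mark
UPDATE refresh_state
SET last_seen_order_id = (SELECT MAX(id) FROM orders);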
You can create triggers.
As soon as one of the values feeding the calculation changes, you can do one of the following:
Update the calculated field (Preferred)
Recalculate your summary table
Store a flag that a recalculation is necessary. The next time you need the calculated values, check this flag first and do the recalculation if necessary
Example:
CREATE TRIGGER update_summary_table AFTER UPDATE OF order_value ON orders
BEGIN
    UPDATE summary
    SET total_order_value = total_order_value
                            - old.order_value
                            + new.order_value;
    -- OR: do a complete recalculation
    -- OR: store a flag
END;
More Information on SQLite triggers: http://www.sqlite.org/lang_createtrigger.html
In the end I arranged for a single program instance to make all database updates, and maintain the summaries in its heap, i.e. not in the database at all. This works very nicely in this case but would be inappropriate if I had multiple programs doing database updates.
You haven't said anything about your indexing strategy. I would look at that first - making sure that your indexes are covering.
Then I think the trigger option discussed is also a very good strategy.
Another possibility is the regular population of a data warehouse with a model suitable for high performance reporting (for instance, the Kimball model).