SQLite caching vs Application Caching - objective-c

So I'm writing an application that is very heavy on SQLite usage. I'm working on writing an in-memory caching system into my application that will allow me to sort and filter my data (my own personal Core Data, in essence). I'm doing this because it seems to me that this is a better/faster option than constantly making read requests against the SQLite database. Plus, most fields/columns will be searchable/sortable, and setting up an index on each one seems less than ideal. But I'm not sure. I know the SQLite database is cached in memory to some extent, but I don't know to what degree or how much of an advantage that would be for me. Implementing my own caching system will be complex and will probably add to my memory footprint, especially since I'm loading each table completely into memory to perform the sorts/filters. I'm more than willing to do it if it helps the performance of my app, but will it? Is the SQLite caching sufficient for me to rely solely on it, or will it get bogged down when the tables start getting large (10,000+ rows)? I guess I'm asking if anyone has enough experience with SQLite to recommend one over the other.
Before anyone asks: no, I can't use Core Data. Core Data isn't flexible enough for me to use in my application.

OK, so here's what I've figured out: the choice depends greatly on your requirements. I ended up removing (as much as possible) the SQLite cache, loading up what I needed, and sorting/filtering it using my own routines. This works remarkably well for me. But I've realized by implementing this that it wouldn't work in a lot of situations. Specifically, I've done a lot to make sure that my DB size is as small as possible. I'm basically storing only simple/small text and numbers; everything else is a reference to an outside file. This makes my database small enough to act less as a database and more as an indexing service, which works well for loading the info into memory and sorting/filtering it.
So, the answer depends a lot on the database. If you're storing large fields that might potentially take a lot of memory, it's probably best to let SQLite handle the cache. On the other hand, if you know the fields will be small, the SQLite cache will only inflate your memory footprint, and the round trips to the database to sort/filter data will only increase your latency. In that case it's better to do the sorting/filtering yourself, though I will admit that it's a lot of work. But in the end it made my app a lot faster than round-tripping to the DB.
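For illustration, here is a minimal Objective-C sketch of that load-once, sort/filter-in-memory approach. The records table, its columns, and the dictionary keys are hypothetical stand-ins, not taken from the original post:

    #import <Foundation/Foundation.h>
    #import <sqlite3.h>

    // Load the whole (small) table into memory once.
    // (In a real app you would keep the resulting array around, not reload it per query.)
    static NSArray *loadRecords(sqlite3 *db) {
        NSMutableArray *rows = [NSMutableArray array];
        sqlite3_stmt *stmt = NULL;
        if (sqlite3_prepare_v2(db, "SELECT id, name, size FROM records",
                               -1, &stmt, NULL) == SQLITE_OK) {
            while (sqlite3_step(stmt) == SQLITE_ROW) {
                const char *name = (const char *)sqlite3_column_text(stmt, 1);
                [rows addObject:@{ @"id"   : @(sqlite3_column_int64(stmt, 0)),
                                   @"name" : name ? @(name) : @"",
                                   @"size" : @(sqlite3_column_int(stmt, 2)) }];
            }
        }
        sqlite3_finalize(stmt);
        return rows;
    }

    // Every subsequent search/sort is a pure in-memory operation; no DB round trip.
    static NSArray *matchingRecordsBySize(sqlite3 *db, NSString *term) {
        NSArray *records  = loadRecords(db);
        NSArray *filtered = [records filteredArrayUsingPredicate:
            [NSPredicate predicateWithFormat:@"name CONTAINS[cd] %@", term]];
        return [filtered sortedArrayUsingDescriptors:
            @[[NSSortDescriptor sortDescriptorWithKey:@"size" ascending:NO]]];
    }

The trade-off is exactly the one described above: this only stays cheap because every row is small.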

Related

Writing many values received in real time to database on iPhone with SQLite3

I'm currently writing an iOS app and I have many records that I'm writing to a database.
Even though with the iPhone you are writing to flash memory, RAM still has a faster access time.
To improve performance, I am writing to a temporary cache in RAM and then, at certain points, appending that cache to the database.
What is a standard practice/technique for knowing how often to write the cache to the database?
How can I fine tune this?
Thanks in advance!
I had a similar issue with a cache that needed to be flushed to a server instead of a local DB. I used Instruments to find the "typical" size of one of the cached objects (mine were fairly uniform), and I just maintain a count of how many are in the cache; when I cross the threshold, I empty my cache to the server. I then learned about NSCache, which has much of this same behavior. I investigated ways to dynamically determine the size of each object in the cache, but found it tedious and brittle.
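As a rough sketch of that count-threshold idea, adapted to the asker's local SQLite case: the samples table, its columns, and the threshold value below are assumptions you would tune with Instruments, not details from the answer.

    #import <Foundation/Foundation.h>
    #import <sqlite3.h>

    static const NSUInteger kFlushThreshold = 500; // tune from profiling data

    @interface WriteCache : NSObject
    - (instancetype)initWithDatabase:(sqlite3 *)db;
    - (void)addRecord:(NSDictionary *)record;
    - (void)flush; // also call this when the app resigns active
    @end

    @implementation WriteCache {
        sqlite3 *_db;
        NSMutableArray *_pending;
    }

    - (instancetype)initWithDatabase:(sqlite3 *)db {
        if ((self = [super init])) {
            _db = db;
            _pending = [NSMutableArray array];
        }
        return self;
    }

    - (void)addRecord:(NSDictionary *)record {
        [_pending addObject:record];
        if (_pending.count >= kFlushThreshold) {
            [self flush];
        }
    }

    - (void)flush {
        if (_pending.count == 0) return;
        // One transaction per batch: one fsync instead of one per INSERT.
        sqlite3_exec(_db, "BEGIN", NULL, NULL, NULL);
        sqlite3_stmt *stmt = NULL;
        if (sqlite3_prepare_v2(_db, "INSERT INTO samples (ts, value) VALUES (?, ?)",
                               -1, &stmt, NULL) == SQLITE_OK) {
            for (NSDictionary *r in _pending) {
                sqlite3_bind_double(stmt, 1, [r[@"ts"] doubleValue]);
                sqlite3_bind_double(stmt, 2, [r[@"value"] doubleValue]);
                sqlite3_step(stmt);
                sqlite3_reset(stmt); // rewind the statement for the next row
            }
        }
        sqlite3_finalize(stmt);
        sqlite3_exec(_db, "COMMIT", NULL, NULL, NULL);
        [_pending removeAllObjects];
    }

    @end

The flush frequency then becomes a single constant you can adjust after measuring; the trade-off is the window of unflushed data you can afford to lose in a crash.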
Basically, I think you need to decide what makes sense for your app based on the usage characteristics gathered with Instruments. I found the video from the 2011 WWDC conference, "Session 318 - iOS Performance in Depth", to be very helpful for similar situations. You can find it on iTunes U.

Web Caching Servers for SQL Server OLTP Env. Recommendations

I inherited a high-volume OLTP DB which I have free rein to improve as much as I find reasonably possible. The improvements made so far have been very helpful, but I want to take it to the next level. The data access patterns I found make it, IMO, a good candidate for caching the data on other servers, and I would love to hear anyone's experience with or recommendations for this type of setup.
We have a DB that gets about 3GB of data added to a table every day, and reporting on it used to be very slow. The data does not change once it's put in, and no data gets inserted that is over a week old. Rows inserted within the last 3 days tend to see thousands of hits among tens of millions of rows.
I was thinking of having data over 2 weeks old get pushed out to MongoDB. I could then have the 2-week sliding window of data that has not yet been pushed out to Mongo be cached by some kind of caching software, so queries hit the cache instead of reading the data out of the DB the whole time. I figure this way we still get full ACID compliance by having the DB engine validate all the data, and get high read performance since reads are not hitting the DB; then Mongo can take the data once it is no longer 'transactional'.
Anyone have any recommended solutions? I was looking at memcached, but I'm not quite sure if that's a good or even plausible solution. Thanks!
Another thing you could consider is using the new In-Memory OLTP feature in SQL Server 2014. That feature improves efficiency and scaling for OLTP workloads. You will potentially be able to get a lot more out of your existing server, without the need to consider specific caching mechanisms.
I don't have specific experience with SQL Server, but what you are describing does seem like a valid use case for MongoDB.
Note that while MongoDB can't directly handle transactions, it is capable of handling certain operations in an atomic fashion (see findAndModify, for instance). Additionally, with journaling enabled, you shouldn't have any reason to worry about durability. MongoDB is a reliable data store and will not lose or corrupt your data.
MongoDB itself can also act as a performant cache if you run a second deployment with journaling disabled. In this instance, writes will take place in memory and only be persisted to disk every 60 seconds (unless configured otherwise). This provides performance comparable to memcached, which is solely in-memory, while allowing you to keep your stack a bit simpler.
Hope this helps!

Which ORM has better performance: OpenAccess or LLBLGen Pro?

I'm working on a new project right now and am thinking of using an ORM such as OpenAccess, LLBLGen Pro, or SubSonic. This project may see large quantities of data and many concurrent hits, so our performance requirements are very high.
Please compare them and recommend one to me.
Thanks
Jim.
Jim,
For the best results in answering this question, you'll need to do your own comparison since your specific requirements and data access scenarios will likely affect the results of any such performance testing.
That said, we use LLBLGen for a high-throughput web application and the performance is exceptional. We have found that the big issue is in the application design itself. Using SQL Server Profiler, we are able to see (during development) which parts of the application create a lot of hits on the database. The biggest penalty we found was with loading a grid and then doing another database operation in the OnDataBinding/DataBound events. This creates a huge amount of traffic to the SQL Server database, a lot of reads, and a lot of disk swapping. We have been very well served by making sure we get all the data in the first query: making good design choices about the set of data/joins/etc. when building the application, or refactoring later once we find the performance is slow.
The overhead for LLBLGen, at least, is very minimal. It's fast even when creating huge numbers of objects. The much, much bigger performance hit comes when we make queries that spawn other queries (example above) and our DB reads go through the roof.
You may also wish to evaluate both for which one you feel is a better match for your skills and productivity.

Design a database with a lot of new data

I'm new to database design and need some guidance.
A lot of new data is inserted to my database throughout the day. (100k rows per day)
The data is never modified or deleted once it has been inserted.
How can I optimize this database for retrieval speed?
My ideas
Create two databases (possibly on different hard drives) and merge the two at night when traffic is low
Create some special indexes...
Your recommendation is highly appreciated.
UPDATE:
My database only has a single table.
100k/day is actually fairly low: 3M/month, 40M/year. You could store a 10-year archive and still not reach 1B rows.
The most important thing to choose in your design will be the clustered key(s). You need to make sure they are narrow and can serve all the queries your application will normally run. Any query that ends up as a table scan will completely trash your memory by pulling in the entire table. So, no surprises there: the driving factor in your design is the actual load you'll have, i.e., exactly which queries you will be running.
A common problem (more often neglected than not) with any high insert rate is that eventually every row inserted will have to be deleted. Not acknowledging this is a pipe dream. The proper strategy depends on many factors, but probably the best bet is a sliding-window partitioning scheme. See How to Implement an Automatic Sliding Window in a Partitioned Table. This cannot be an afterthought; the choice of how to remove data will permeate every aspect of your design, and you had better start making a strategy now.
The best tip I can give, which all big sites use to speed up their websites, is:
CACHE CACHE CACHE
Use redis/memcached to cache your data! Memory is blazingly fast and disk I/O is expensive.
Queue writes
Also, for extra performance, you could queue up the writes in memory for a little while before flushing them to disk, i.e. writing them to the SQL database. Of course, you then run the risk of losing data if you keep it in memory and your machine crashes or loses power.
Context missing
Also, I don't think you gave us much context!
What I think is missing is:
Architecture.
What kind of server you have: VPS or shared hosting.
Which operating system it runs: Linux, Windows, or Mac OS X.
Machine specifics, like how much memory is available, the CPU, etc.
Also, I find your definition of the data a bit vague. Could you attach a diagram or something that explains your domain a little bit? For example, something made using http://yuml.me/.
Your requirements are way too general. For MS SQL Server, 100k (more or less "normal") records per day should not be a problem if you have decent hardware. Obviously you want to write to the database fast, but you're asking for optimization of retrieval performance; those two goals don't match very well! ;-) Tuning a database is a special skill of its own, so you will never get the general answer you would like to have.

Is ORM slow? Does it matter? [closed]

I really like ORMs compared to stored procedures, but one thing I'm afraid of is that an ORM could be slow because of its layers and layers of abstraction. Will using an ORM slow down my application? Or does it even matter?
Yes, it matters. It is using more CPU cycles and consequently slowing your application down. Hear me out though...
But, consider this: what is more expensive? Server hardware or another programmer? Server hardware, generally, is cheaper than hiring another team of programmers. So, while ORM may be costing you CPU cycles, you need one less programmer to manage your SQL queries, often resulting in a lower net cost.
To determine if it's worth it for you, calculate or determine how many hours you saved by using an ORM. Then, figure out how much money you spent on the server to support ORM. Multiply the hours you saved by your hourly rate and compare to the server cost.
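To make that concrete with purely illustrative numbers: if an ORM saves 200 developer-hours at $75/hour, that's $15,000 saved, against perhaps a few thousand dollars of extra server capacity to absorb the overhead -- a favorable trade in that scenario.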
Of course, whether an ORM actually saves you time is a whole other debate...
Is ORM slow?
Not inherently. Some heavyweight ORMs can add a general drag to things, but we're not talking an orders-of-magnitude slowdown.
What does make ORM slow is naïve usage. If you're using an ORM because it looks easy and you don't know how the underlying relational data model works, you can easily write code that seems reasonable to an OO programmer, but will murder performance.
ORM is a handy tool, but you need the lower-level understanding (that usually comes from writing SQL queries) to go with it.
Does it matter?
If you end up performing a looped query for each of thousands of entities at once, instead of a single fast join, then certainly it can.
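To make that pitfall concrete, here is a hedged sketch of both access patterns in the Objective-C/SQLite setting of the original question; the authors/posts schema is hypothetical, an ORM's lazy loading can silently generate the first shape, and against a networked database each extra round trip costs far more than it does in-process:

    #import <sqlite3.h>

    // N+1 anti-pattern: one query per entity.
    void loadNPlusOne(sqlite3 *db) {
        sqlite3_stmt *authors = NULL, *posts = NULL;
        sqlite3_prepare_v2(db, "SELECT id FROM authors", -1, &authors, NULL);
        sqlite3_prepare_v2(db, "SELECT title FROM posts WHERE author_id = ?",
                           -1, &posts, NULL);
        while (sqlite3_step(authors) == SQLITE_ROW) {
            sqlite3_bind_int64(posts, 1, sqlite3_column_int64(authors, 0));
            while (sqlite3_step(posts) == SQLITE_ROW) { /* consume row */ }
            sqlite3_reset(posts); // rewind to run again for the next author
        }
        sqlite3_finalize(posts);
        sqlite3_finalize(authors);
    }

    // The single-join equivalent: one statement, one pass over the results.
    void loadWithJoin(sqlite3 *db) {
        sqlite3_stmt *stmt = NULL;
        sqlite3_prepare_v2(db, "SELECT a.id, p.title FROM authors a "
                               "JOIN posts p ON p.author_id = a.id",
                           -1, &stmt, NULL);
        while (sqlite3_step(stmt) == SQLITE_ROW) { /* consume row */ }
        sqlite3_finalize(stmt);
    }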
ORMs are slower and do add overhead to applications (unless you specifically know how to get around these problems, which is not very common). The database is the most critical element, and web applications should be designed around it.
With many OOP frameworks using Active Record or ORMs, developers in general treat the database as an unimportant afterthought and tend to look at it as something they don't really need to learn. But performance and scalability usually suffer as the DB is heavily taxed!
Many large-scale web apps have fallen flat, wasting millions and months to years of time, because they didn't recognize the importance of the database. Hundreds of concurrent users and tables with millions of records require database tuning and optimization. And I believe the problem becomes noticeable with even a few users and less data.
Why are developers so afraid to learn proper SQL and tuning measures when it's the key to performance?
In a Windows Mobile 5 project using SqlCe, I went from hand-coded objects to code-generated (CodeSmith) objects built from an ORM template. In the process, all my data access used CSLA as a base layer.
The straight conversion improved my performance by 32% in local testing, almost all of it a result of better access methods.
After that change, we adjusted the templates (after seeing some SqlCe performance material at PDC by Steve Lasker) and in less than 20 minutes our entire data layer was greatly improved; our average 'slow' calls went from 460ms to ~20ms. The cool part about the ORM approach is that we only had to implement (and unit test) these changes once and all the data access code got changed. It was an amazing time saver; we probably saved 40 hours or more.
The above being said, we did lose some time by taking out a bunch of 'waiting' and 'progress' dialogs that were no longer needed.
I have used a few of the ORM tools, and I can recommend two of them:
.NET Tiers
CSLA codegen templates
Both of them have performed quite nicely and any performance loss has not been noticeable.
I've always found it doesn't matter. You should use whatever will make you the most productive, responsive to changes, and whatever is easiest to debug and maintain.
Most applications never see enough load for the difference between an ORM and SPs to be noticeable. And there are optimizations that make ORMs faster.
Finally, a well-written app will have its data access separated from everything else, so that switching from an ORM to something else would be possible in the future.
Is ORM slow?
Yes (compared with stored procedures)
Does it matter?
No (unless your main concern is speed)
I think the problem is that many people think of ORM as an object "trick" for databases, a way to code less or to simplify SQL usage, while in reality it is... well, an Object-to-Relational (DB) Mapping.
ORM is used to persist your objects to a relational database management system, and not (just) to substitute for SQL or make it easier (although it does a good job of that too).
If you don't have a good object model, or you're using it to build reports, or even if you're just trying to fetch some information, ORM is not worth it.
If, on the other hand, you have a complex system modeled through objects, where each one has different rules and they interact dynamically, and your concern is persisting that information to the database rather than replacing some existing SQL scripts, then go for ORM.
Yes, an ORM will slow down your application. By how much depends on how far the abstraction goes, how well your object model maps to the database, and other factors. The question should be: are you willing to spend more developer time on straight data access, or to trade less dev time for slower runtime performance?
Overall, the good ORMs have little overhead and, by and large, are considered well worth the trade off.
Yes, ORMs affect performance; whether that matters ultimately depends on the specifics of your project.
Programmers often love ORMs because they like nice front-end coding environments like Visual Studio and dislike coding raw SQL with no IntelliSense, etc.
ORMs have other limitations besides a performance hit -- they often do not do what you need 100% of the time, they add the complexity of an additional abstraction layer that must be maintained and re-established every time changes are made, and there are also caching issues to be dealt with.
Just a thought -- if the database vendors would make the SQL programming environment as nice as Visual Studio, and provide a more natural linkage between the db code and front-end code, we wouldn't need the ORMs...I guess things may go in that direction eventually.
Obvious answer: It depends
ORM does a good job of insulating a programmer from SQL. This, in effect, substitutes mediocre, computer-generated queries for the catastrophically bad queries a programmer might otherwise write.
Even in the best case, an ORM is going to do some extra work, loading fields it doesn't need to, explicitly checking constraints, and so forth.
When these become a bottleneck, most ORMs let you side-step them and inject raw SQL.
If your application fits well with objects, but not quite so easily with relations, then this can still be a win. If instead your app fits nicely around a relational model, then the ORM represents a coding bottleneck on top of a possible performance bottleneck.
One thing I've found particularly offensive about most ORMs is their handling of primary keys. Most ORMs require PKs for everything they touch, even if there is no conceivable use for them. Example: authors should have PKs, and blog posts should have PKs, but the links (the join table) between authors and posts should not need them.
I have found that the difference between "too slow" and "not too much slower" depends on whether you have your ORM's 2nd-level (SessionFactory) cache enabled. With it off, the app handles fine under development load but will crush your system under mild production load. After turning on the 2nd-level cache, the server handled the expected load and scaled nicely.
An ORM can be an order of magnitude slower, not just on the grounds of wasting a lot of CPU cycles on its own, but also by using much more memory, which then has to be GC-ed.
Much worse than that, however, is that there is no standard for ORMs (unlike SQL), and that by and large ORMs use SQL very inefficiently, so at the end of the day you still have to dig into SQL to fix performance issues, and every time an ORM makes a mess you have to debug it. Meaning you haven't gained anything at all.
It's terribly immature technology for real production-level applications. The very problematic areas are handling indexes, foreign keys, tweaking tables to fit object hierarchies, and terribly long transactions, which mean many more deadlocks and retries -- if an ORM knows how to handle that at all.
It actually makes servers less scalable, which multiplies costs, but these costs don't get mentioned at the beginning -- a little inconvenient truth :-) When something uses transactions 10-100 times bigger than optimal, it becomes impossible to scale the SQL side at all. We're talking about serious systems again, not home/toy/academic stuff.
An ORM will always add some overhead because of the layers of abstraction, but unless it is a poorly designed ORM, that overhead should be minimal. The time to actually query the database will be many times more than the additional overhead of the ORM infrastructure if you are doing it correctly; for example, not loading the full object graph when it's not required. A good ORM (NHibernate) will also give you many options for the queries run against the database, so you can optimise as required.
Using an ORM is generally slower. But the boost in productivity you get will get your application up and running much faster. And the time you save can later be spent finding the portions of your application that are causing the biggest slow down - you can then spend time optimizing the areas where you get the best return on your development effort. Just because you've decided to use an ORM doesn't mean you can't use other techniques in the sections of code that can really benefit from it.
An ORM can be slower, but this is offset by its ability to cache data; however fast the alternative, you can't get much faster than reading from memory.
I never really understood why people think this is slower or that is slower... get a real machine, I say. I have had mixed results: I've seen cases where execution time for a stored procedure was much slower than the ORM and vice versa, but in both cases the difference was due to differences in hardware.