I'm creating an Android application that contains a large amount of data, and it takes a lot of time to access it. What optimization techniques can I use? Are there any optimized queries?
This page will give you a lot of good tips on how to optimize SQLite things:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
Keep in mind that some stuff SQLite will optimize for you as well when it runs a query:
http://www.sqlite.org/optoverview.html
Use parameterized queries to reduce the number of queries that need to be parsed; see: How do I get around the "'" problem in sqlite and c#?
You can use parametrized statements in Android:
http://developer.android.com/reference/android/database/sqlite/SQLiteDatabase.html#compileStatement%28java.lang.String%29
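For illustration, here is a minimal sketch of a compiled, parameterized insert on Android, wrapped in a transaction; the person table, name column, and names list are hypothetical:

// A sketch, not production code.
// import android.database.sqlite.SQLiteDatabase;
// import android.database.sqlite.SQLiteStatement;
void insertAll(SQLiteDatabase db, java.util.List<String> names) {
    SQLiteStatement stmt = db.compileStatement(
            "INSERT INTO person (name) VALUES (?)");
    db.beginTransaction();
    try {
        for (String name : names) {
            stmt.bindString(1, name);  // parameter indices are 1-based
            stmt.executeInsert();
            stmt.clearBindings();
        }
        db.setTransactionSuccessful();
    } finally {
        db.endTransaction();
        stmt.close();
    }
}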
SQLite does not support dates natively (SQLite uses strings to store dates). If you are using an index to access dates, you'll get slow query times (not to mention inaccurate or wrong results).
If you really need to sort through dates then I'd suggest you create separate columns for the various date elements that you want to index (like years, months, and days). Define these columns as integers and add INDEX statements to index their contents.
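A minimal sketch of that layout (table and column names are illustrative):

CREATE TABLE event (
    event_id INTEGER PRIMARY KEY,
    year     INTEGER,   -- e.g. 2010
    month    INTEGER,   -- 1-12
    day      INTEGER,   -- 1-31
    payload  TEXT
);
CREATE INDEX idx_event_date ON event (year, month, day);

-- Range queries can now use the composite index:
SELECT * FROM event WHERE year = 2010 AND month BETWEEN 1 AND 3;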
I have been working with R, interfacing with my Oracle DB using the DBI package. I read that preparing a query is often good practice when running the same statement multiple times.
My question is: assuming infinite RAM to accommodate the downloaded data, which factors may influence the difference in run times between two scenarios: running a prepared query N times, or using a single WHERE ... BETWEEN filter?
Let's say I have to run a query to analyze some time series information between 2012 and 2018. I have found different download times between running a prepared query for each month within my analysis window and simply filtering on the whole window.
It depends on how the database optimizes your query. Maybe it chooses to optimize with an index when selecting just a single month, maybe it chooses to use a full table scan to retrieve the whole window at once.
Usually I would expect a query that retrieves the entire dataset at once to be more efficient than one that breaks it up into one part per month.
Factors that play a role are, among others:
What percentage of rows in the table are you accessing?
Are there indexes that can be used?
Which data was recently accessed (and might be cached)?
How much data can the database handle/cache in memory?
Did you use bind variables for the statements?
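To make the comparison concrete, here is a sketch of the two access patterns (Oracle syntax; the series table and ts column are illustrative):

-- One round trip for the whole window:
SELECT ts, value FROM series
WHERE ts BETWEEN DATE '2012-01-01' AND DATE '2018-12-31';

-- versus one prepared statement with bind variables, executed once per month:
SELECT ts, value FROM series
WHERE ts BETWEEN :month_start AND :month_end;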
I am facing a problem comparing data from two huge tables.
Scenario:
Problem: I have to find gaps between two sets of data stored in tables in an Oracle DB that has a live Siebel application running on it. I can't simply run a SELECT statement over the whole set of data (8,000,000 rows), as that affects the performance of the application.
What I have done so far:
I simply put a cursor on one set of data and compare it against the other set according to my logic, inserting the gaps into other tables. But this solution compares one row at a time, which is a very slow process and times out after a while.
Can anyone suggest a better solution to speed up the process? I'd really appreciate the help.
There can be several different ways to improve performance depending on the exact use case. Based on the information you have provided, below are a few things that might work.
Rewrite the queries using MINUS or NOT EXISTS (see the sketch after this list).
Create indexes on the columns that are used in the WHERE clause. Note that index creation takes time and resources and impacts the system, so it is advisable to do it when the load on the server is low. If indexes are already there and not being used, try to use hints.
If the data in those tables is static try to duplicate tables in test environment and run appropriate tests.
Using a cursor on 8M rows does not sound very efficient unless that is the only way to go.
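A sketch of the set-based rewrite from the first suggestion (table and column names are placeholders):

-- Keys present in table_a but missing from table_b:
SELECT key_col FROM table_a
MINUS
SELECT key_col FROM table_b;

-- Equivalent NOT EXISTS form:
SELECT a.key_col
FROM table_a a
WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.key_col = a.key_col);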
If you give more details, we might be able to give better suggestions.
To get good performance with Spark, I'm wondering whether it is better to run SQL queries via SQLContext or to express queries via DataFrame functions like df.select().
Any idea? :)
There is no performance difference whatsoever. Both methods use exactly the same execution engine and internal data structures. At the end of the day, it all boils down to personal preference.
Arguably, DataFrame queries are much easier to construct programmatically and provide a minimum of type safety.
Plain SQL queries can be significantly more concise and easier to understand. They are also portable and can be used without any modifications with every supported language. With HiveContext, these can also be used to expose some functionalities which can be inaccessible in other ways (for example UDF without Spark wrappers).
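As a sketch of the equivalence (Java API; the events.parquet input and name column are hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;

public class SqlVsDataFrame {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("demo").getOrCreate();
        Dataset<Row> df = spark.read().parquet("events.parquet"); // hypothetical input
        df.createOrReplaceTempView("events");

        // SQL form:
        Dataset<Row> bySql = spark.sql(
            "SELECT name, COUNT(*) AS cnt FROM events GROUP BY name ORDER BY cnt DESC");

        // Equivalent DataFrame form; both compile to the same Catalyst plan:
        Dataset<Row> byDf = df.groupBy("name")
                              .agg(count("*").alias("cnt"))
                              .orderBy(col("cnt").desc());

        bySql.show();
        byDf.show();
    }
}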
Ideally, Spark's Catalyst optimizer should compile both calls to the same execution plan, and the performance should be the same. How you call it is just a matter of style.
In reality, there is a difference according to the report by Hortonworks (https://community.hortonworks.com/articles/42027/rdd-vs-dataframe-vs-sparksql.html), where SQL outperforms DataFrames in a case where you need grouped records with their total counts, sorted descending by record name.
By using DataFrames, one can break the SQL into multiple statements/queries, which helps with debugging, incremental enhancement, and code maintenance.
Breaking complex SQL queries into simpler ones and assigning the results to DFs brings better understanding.
By splitting the query into multiple DFs, the developer gains the advantage of using cache and repartitioning (to distribute data evenly across the partitions using a unique/close-to-unique key).
The only thing that matters is what kind of underlying algorithm is used for grouping.
HashAggregation is more efficient than SortAggregation. SortAggregation sorts the rows and then gathers the matching rows together: O(n log n).
HashAggregation builds a HashMap with the grouping columns as the key and the rest of the columns as the values.
Spark SQL uses HashAggregation where possible (i.e. when the data for the values is mutable): O(n).
Chasing down some DB performance issues in a fairly typical EclipseLink/JPA application.
I am seeing frequent queries that are taking 25-100ms. These are simple queries, just selecting all columns from a table where its primary key is equal to a value. They shouldn't be slow.
I'm looking at the query time in the Postgres log, using log_min_duration_statement, so this should eliminate any network or application overhead.
This query is not slow, but it is used very often.
Why would selecting * by primary key be slow?
Is this specific to postgres or is it a generic DB issue?
How can I speed this up? In general? For postgres?
Sample query from the pg log:
2010-07-28 08:19:08 PDT - LOG: duration: 61.405 ms statement: EXECUTE <unnamed> [PREPARE: SELECT coded_element_key, code_system, code_system_label, description, label, code, concept_key, alternate_code_key FROM coded_element WHERE (coded_element_key = $1)]
Table has around 3.5 million rows.
I have also run EXPLAIN and EXPLAIN ANALYZE on this query; it's only doing an index scan.
SELECT * makes your database work harder and, as a general rule, is bad practice. There are tons of questions/answers on Stack Overflow about it.
Have you tried replacing * with the field names?
Could you be getting some kind of locking contention? What kind of locks are you taking when performing these queries?
Well, I don't know much about PostgreSQL, so I'll give you a tip for MS SQL Server which might be applicable.
MS SQL Server has the concept of a "clustered index", which is the physical layout of the data on disk. It's good to use on a field where you'll be seeking a range between two values (date fields, mostly). It's not much use if you're looking for an exact value (like a primary key lookup). However, sometimes the primary key index is inadvertently set as a clustered index, which can turn an index lookup into a table scan.
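If that is the situation, a sketch of the fix in T-SQL (all names hypothetical): declare the primary key nonclustered so the clustered index can go on a range column instead:

CREATE TABLE example_t (
    id INT PRIMARY KEY NONCLUSTERED,  -- PK lookup still uses a regular index
    ts DATETIME NOT NULL
);
-- Put the clustered index on the column used for range seeks:
CREATE CLUSTERED INDEX ix_example_ts ON example_t (ts);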
Is the row unusually large, or does it contain BLOBs or large binary fields?
Is this directly through the console, or is this query being run through some data access API like JDBC or ADO.NET? You mention JPA, which looks like a data access API. For short queries, the data access API becomes a larger percentage of execution time: creating the command, creating objects to hold the rows and cells, etc.
SELECT * is almost always a very, very bad idea.
If the order of the fields changes, it will break your code.
According to comments, this isn't really important given the abstraction library you're using.
You're probably returning more data from the table than you actually want. Selecting for the specific fields you want can save transfer time.
25ms is about the lower bound you're going to see on almost any kind of SQL query -- that's only two disk accesses! You might want to look into ways to reduce the number of times the query is run rather than trying to optimize the query.
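For instance, if the application issues many of these single-key lookups in a row, batching the keys into one query reduces the per-query overhead (the key values here are illustrative; the columns come from the logged query above):

SELECT coded_element_key, code, description
FROM coded_element
WHERE coded_element_key IN (101, 102, 103);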
I want to know optimization techniques for a database that has nearly 80,000 records, and a list of possibilities for optimizing it.
I am using it in my mobile project on the Android platform.
I use SQLite, and it takes a lot of time to retrieve the data.
Thanks
Well, with only 80,000 records and assuming your database is well designed and normalized, just adding indexes on the columns that you frequently use in your WHERE or ORDER BY clauses should be sufficient.
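For example, a minimal sketch in SQLite, with illustrative table and column names:

CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_created  ON orders (created_at);

-- Confirm the index is actually used:
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;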
There are other more sophisticated techniques you can use (such as denormalizing certain tables, partitioning, etc.) but those normally only start to come into play when you have millions of records to deal with.
ETA:
I see you updated the question to mention that this is on a mobile platform - that could change things a bit.
Assuming you can't pare down the data set at all, one thing you might be able to do would be to try to partition the database a bit. The idea here is to take your one large table and split it into several smaller identical tables that each hold a subset of the data.
Which of those tables a given row goes into depends on how you choose to partition the data. For example, if you had a "customer_id" field that could range from 0 to 10,000, you might put customers 0-2,500 in table1, 2,500-5,000 in table2, and so on, splitting the one large table into 4 smaller ones. You would then have logic in your app that figures out which table (or tables) to query to retrieve a given record.
You would want to partition your data in such a way that you generally only need to query one of the partitions at a time. Exactly how you would partition the data would depend on what fields you have and how you are using them, but the general idea is the same.
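A sketch of that manual split (SQLite has no native partitioning; table names and boundaries are illustrative):

CREATE TABLE customer_p1 AS SELECT * FROM customer WHERE customer_id < 2500;
CREATE TABLE customer_p2 AS SELECT * FROM customer WHERE customer_id >= 2500 AND customer_id < 5000;
-- ...and so on; the app then routes each lookup to the partition
-- covering the requested customer_id.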
Create indexes
Delete indexes
Normalize
Denormalize
80k rows isn't many rows these days. Clever indexes, with queries that utilise those indexes, will serve you well.
Learn how to display query execution maps, then learn to understand what they mean, then optimize your indices, tables, queries accordingly.
Such a wide topic, and it does depend on what you want to optimise for. But the basics:
indexes. A good indexing strategy is important, indexing the right columns that are frequently queried on/ordered by is important. However, the more indexes you add, the slower your INSERTs and UPDATEs will be so there is a trade-off.
maintenance. Keep indexes defragged and statistics up to date
optimised queries. Identify queries that are slow (using the profiler/built-in information available from SQL 2005 onwards) and see if they could be written more efficiently (e.g. avoid CURSORs, use set-based operations where possible).
parameterisation/SPs. Use parameterised SQL to query the db instead of ad-hoc SQL with hardcoded search values (see the sketch after this list). This will allow better execution plan caching and reuse.
start with a normalised database schema, and then de-normalise if appropriate to improve performance
80,000 records is not much, so I'll stop there (for large DBs with millions of rows, I'd have suggested partitioning the data).
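As a sketch of the parameterisation point (SQL Server syntax; table, column, and parameter names are illustrative):

-- Parameterised: the plan for this statement text can be cached and reused.
EXEC sp_executesql
    N'SELECT * FROM orders WHERE customer_id = @cid',
    N'@cid INT',
    @cid = 42;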
You really have to be more specific with respect to what you want to do. What is your mix of operations? What is your table structure? The generic advice is to use indices as appropriate but you aren't going to get much help with such a generic question.
Also, 80,000 records is nothing. It is a moderate-sized table and should not make any decent database break a sweat.
First of all, indexes are really a necessity if you want a well-performing database.
Besides that, though, the techniques depend on what you need to optimize for: Size, speed, memory, etc?
One thing worth knowing is that using a function on an indexed field in the WHERE clause will cause the index not to be used.
Example (Oracle):
SELECT indexed_text FROM your_table WHERE upper(indexed_text) = 'UPPERCASE TEXT';
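If the uppercase filter is unavoidable, one option in Oracle is a function-based index, sketched here with an illustrative index name:

CREATE INDEX idx_upper_text ON your_table (UPPER(indexed_text));
-- The query above can now use idx_upper_text instead of a full scan.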