I first have to say that I really am a rookie in caching, so please do elaborate on any explanation and bear with me if my question is stupid.
I have a server with pretty limited resources, so I'm really interested in caching db-queries as effectively as I can. My issue is this:
I have a MySQL DB with a table for geolocations that has lat and lng columns. I only indexed lat, since a query will always filter on both lat and lng, and to my understanding only one index can be used effectively per query (?).
The coordinates in the queries vary constantly, for example:
select lat, lng
from geolocations -- table name assumed; the question only calls it "a table for geolocations"
where lat BETWEEN 123123123 AND 312412312 AND lng BETWEEN 235124231 AND 34123124
where the long numbers that form the boundaries of the BETWEEN clauses are constantly changing. Is there a smart way to cache this, so that the cache doesn't require an exact query match, but the results of previous BETWEEN queries can be checked against a new one to save some DB resources?
I hope you get my question - if not please ask.
Thank you so much
Update 24/01/2011
Now that I've gotten some response I want to know what the most efficient way of querying would be.
1. Would the BETWEEN query with INT values execute faster, or
2. would the radius calculation with Point values execute faster?
If 1., what would the optimal index look like?
If your table is MyISAM, you can use the Point datatype (see this answer for more details).
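For reference, a minimal sketch of that approach, assuming a MyISAM table called mytable with lat/lon columns (all names here are illustrative):

-- add a POINT column, populate it, and index it spatially (MyISAM required for SPATIAL indexes in this MySQL era)
ALTER TABLE mytable ADD COLUMN coord POINT NOT NULL;
UPDATE mytable SET coord = POINT(lon, lat);
CREATE SPATIAL INDEX sx_mytable_coord ON mytable (coord);
-- rectangle search via the minimum bounding rectangle of the search box
SELECT lat, lon
FROM mytable
WHERE MBRContains(
        GeomFromText('POLYGON((50.0 10.0, 50.5 10.0, 50.5 10.5, 50.0 10.5, 50.0 10.0))'),
        coord);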
If you are not willing or not able to use spatial indexes, you should create two separate indexes:
CREATE INDEX ix_mytable_lat_lon ON mytable (lat, lon)
CREATE INDEX ix_mytable_lon_lat ON mytable (lon, lat)
In this case, MySQL can use an index_merge intersection over these indexes, which is sometimes faster than merely filtering with a single index.
Even if it does not, it can pick a more selective index if there are two of those.
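A quick way to check what the optimizer actually does with the two indexes is EXPLAIN (a sketch; the table name and boundary values are placeholders):

EXPLAIN
SELECT lat, lon
FROM mytable
WHERE lat BETWEEN 10.0 AND 10.5
  AND lon BETWEEN 50.0 AND 50.5;
-- the type, key and Extra columns show whether MySQL merges the two indexes
-- or simply range-scans the more selective one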
As for caching, all pages read from the indexes are cached and stay in memory until they are overwritten with hotter data (if not all of the database fits in the cache).
This saves MySQL from having to read the data from disk.
MySQL is also able to cache whole result sets in memory (the query cache); however, this requires the query to be repeated verbatim, with all parameters exactly the same.
I think to do significantly better you'll need to characterize your data better. If you've got data that's uniformly distributed across longitude and latitude, with no correlation, and if your queries are similarly distributed and independent - you're stuck. But if your data or your queries cluster in interesting ways, you may find that you can introduce new columns that make at least some queries quicker. If most queries happen within some hard range, maybe you can set that data aside - add a flag, link it to some other table, even put the frequently-requested data into its own table. Can you tell us any more about the data?
I am new to SQL but I am learning. I have a forum where people can search for a location using a radius and latitudes, and the site is loading pretty slowly; would indexing the search fields be appropriate? I have read that indexes should not be used on tables that undergo updates and inserts, yet I do not know what else I can do to improve performance. I was thinking of maybe creating an identical table with the same information and using that only for searches, with an index.
"Select * from threads where latitudes >= #ThirtyLatMin AND #ThirtyLatMax >= latitudes AND longitudes >= #ThirtyLonMin AND #ThirtyLonMax>=longitudes order by activities desc"
Any modern RDBMS should be able to handle a few indexes on a given table; while it's true that index maintenance has a cost during table INSERTs and UPDATEs, that cost is generally outweighed by appropriate index utilization during a SELECT statement (and other queries).
Some general tips, based on your question (and sample query):
Only return the data you need; using SELECT * in a query is usually a red flag when performance tuning, because it means the coder didn't take the time to spell out what was needed by the calling application. If you're returning a bunch of data that you're not using, you're holding your performance back.
Once you've specified the columns to be returned, try to "cover" them with an index; in other words, build an index that contains all of the columns necessary to satisfy the query (see the sketch after these tips). Note that this becomes a balancing act when dealing with multiple different queries (different columns) that hit the same table. Too many indexes on a table increase the cost mentioned above, and may eventually outweigh the benefits. Note that different RDBMSs have different ways of dealing with this; for example, Microsoft SQL Server uses an INCLUDE syntax to build broad, multi-column covering indexes without significant key overhead.
Index not only the columns returned, but make the leading columns of your index the ones used in your WHERE clause. For the example above, I'd create an index that starts with latitudes and longitudes and includes the remaining necessary columns.
Whenever possible, move sorting out of the database. Sorts are expensive operations, and unless you're using them to assist in filtering data (e.g. TOP N searches), you'll probably see an improvement by moving them out of SQL.
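As a sketch of the covering-index idea for the query above, using SQL Server's INCLUDE syntax (the included columns are hypothetical stand-ins for whatever the page actually displays instead of SELECT *):

CREATE INDEX ix_threads_lat_lon
    ON threads (latitudes, longitudes)
    INCLUDE (activities, title, created_on);  -- title and created_on are made-up display columns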
I have read that indexes should not be used on tables that undergo updates and inserts
I would love to see where you have read that because that is incorrect. Indexes are crucial to performance.
Please go read http://use-the-index-luke.com for a good introduction to indexes in SQL.
I have a lot of records in table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(#pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(#pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(#pStartDate,getdate()-1) and ISNULL(#pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(#pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The stxn table has more than 100,000 records.
The query is run from a report viewer in asp.net c#.
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column = ISNULL(#column, column) syntax. MSSQL won't use an index for that. Consider changing it to (#column IS NULL OR column = #column).
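Here is roughly what that looks like for this query, following the pattern from the linked article (a sketch only: the @-parameters are the usual T-SQL spelling of the #-parameters above, and OPTION (RECOMPILE) is assumed to be acceptable so the optimizer can discard the branches whose parameters are NULL):

SELECT TOP (10)
       StxnID, Sprovider.description AS SProvider, txnID, Request, Raw, Status, txnBal,
       Stxn.CreatedBy, Stxn.CreatedOn, Stxn.ModifiedBy, Stxn.ModifiedOn, Stxn.isDeleted
FROM   Stxn
       INNER JOIN Sprovider ON Sprovider.Sproviderid = Stxn.SproviderID
WHERE  (@pSProviderID IS NULL OR Stxn.SProviderid = @pSProviderID)
  AND  (@pStatus      IS NULL OR Stxn.status      = @pStatus)
  AND  Stxn.CreatedOn BETWEEN ISNULL(@pStartDate, GETDATE() - 1) AND ISNULL(@pEndDate, GETDATE())
  AND  (@pSellerId    IS NULL OR Stxn.CreatedBy   = @pSellerId)
ORDER BY StxnID DESC
OPTION (RECOMPILE);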
You should consider looking at the execution plan and checking for missing indexes. Also, how long does it take to execute? What is slow for you?
Maybe you could also return fewer rows, but that is just a guess. Really, we need to see your table and indexes plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP (10) instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper INNER JOIN instead of potentially ending up with a cartesian product using the old-style table,table join syntax (that is not the case here, but it can happen much more easily with the old syntax). Should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... hard to tell without understanding the data types, indexes and seeing the plan.
There are a lot of things that could impact the performance of a query, although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained? In other words, are the drives fast enough to keep up with the queries you need to run? If it's memory constrained, then disk speed becomes an even larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your where clause? If the tables do not have indexes then it will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the proc. Unless you are executing this WITH RECOMPILE (which re-sniffs the parameters on each execution), those functions will have to be evaluated for each row.
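For example, a stored procedure wrapping the query above might normalise the date parameters once, up front, instead of relying on ISNULL in the WHERE clause (a sketch, not the original code):

-- at the top of the stored procedure, before the SELECT
SET @pStartDate = COALESCE(@pStartDate, DATEADD(DAY, -1, GETDATE()));
SET @pEndDate   = COALESCE(@pEndDate,   GETDATE());
-- the WHERE clause can then use a plain BETWEEN on Stxn.CreatedOn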
Network:
Is the network slow between you and the server? Depending on the amount of data pulled, you could be pulling GBs of data across the wire. I'm not sure what is stored in the "Raw" column. The first question you need to ask here is "how much data is going back to the client?" For example, if each record is 1 MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without a lot of questions asked by you. All of these will bear out if you profile the queries, understand what and how much is going back to the client and watch the interactions amongst the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema:
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
Something to consider: When ROWCOUNT or TOP are used with an ORDER BY clause, the entire result set is created and sorted first and then the top 10 results are returned.
How does this run without the Order By clause?
Performance question ...
I have a database of houses that have geolocation data (longitude & latitude).
What I want to do is find the best way to store the locational data in my MySQL (v5.0.24a) using InnoDB database-engine so that I can perform a lot of queries where I'm returning all the home records that are between x1 and x2 latitude and y1 and y2 longitude.
Right now, my database schema is
---------------------
Homes
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------
And my query is:
SELECT ...
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2
1. Is what I described above the best way to store the latitude and longitude data in MySQL, using Float(10,6) and separating out the longitude/latitude? If not, what is? Float, Decimal and even Spatial exist as data types.
2. Is this the best way to perform the SQL from a performance standpoint? If not, what is?
3. Does using a different MySQL database engine make sense?
UPDATE: Still Unanswered
I have 3 different answers below. One person say to use Float. One person says to use INT. One person says to use Spatial.
So I used MySQL's EXPLAIN statement to measure the SQL execution speed. It appears that there is absolutely no difference in SQL execution (result set fetching) between using INT or FLOAT for the longitude and latitude data type.
It also appears that using the "BETWEEN" statement is SIGNIFICANTLY faster than using the ">" or "<" SQL statements. It's nearly 3x faster to use "BETWEEN" than to use the ">" and "<" statement.
With that being said, I am still uncertain what the performance impact of using Spatial would be, since it's unclear to me whether it's even supported by the version of MySQL I'm running (v5.0.24), as well as how I enable it if it is supported.
Any help would be greatly appreciated.
float(10,6) is just fine.
Any other convoluted storage schemes will require more translation in and out, and floating-point math is plenty fast.
I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.
I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use-case.
The problem with using any other data type than "spatial" here is that your kind of "rectangular selection" can (usually, this depends on how bright your DBMS is - and MySQL certainly isn't generally the brightest) only be optimised in one single dimension.
The system can pick either the longitude index or the latitude index, and use that to reduce the set of rows to inspect. But after it has done that, there is a choice of: (a) fetching all found rows and scanning over those to test the "other dimension", or (b) doing a similar process on the other dimension and then matching the two result sets to see which rows appear in both. This latter option may not be implemented as such in your particular DBMS engine.
Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case, but it may also be the case that it doesn't significantly outperform the other solutions, and that it's just not worth the bother. This depends on all sorts of things like the volume of and the distribution in your actual data etc. etc.
It is certainly true that float (tree) indexes are by necessity slower than integer indexes, because of the longer time it usually takes to execute '>' on floats than it does on integers. But I would be surprised if this effect were actually noticeable.
Google uses float(10,6) in their "Store locator" example. That's enough for me to go with that.
https://stackoverflow.com/a/5994082/1094271
Also, starting with MySQL 5.6.x, spatial extension support is much better and comparable to PostGIS in features and performance.
I would store them as integers (INT, 4 bytes) representing 1/1,000,000th of a degree. That would give you a resolution of a few inches.
I don't think there is any intrinsic spatial datatype in MySQL.
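A rough sketch of the integer scheme suggested above, using the Homes columns from the question (the new column and index names are made up):

-- store microdegrees (1/1,000,000th of a degree) as 4-byte integers
ALTER TABLE Homes
  ADD COLUMN lat_e6 INT NOT NULL,
  ADD COLUMN lng_e6 INT NOT NULL;
UPDATE Homes SET lat_e6 = ROUND(geolat * 1000000), lng_e6 = ROUND(geolng * 1000000);
CREATE INDEX ix_homes_lat_lng ON Homes (lat_e6, lng_e6);
-- the rectangle query then becomes a BETWEEN over the integer columns
SELECT geolat, geolng
FROM Homes
WHERE lat_e6 BETWEEN 40123456 AND 40723456
  AND lng_e6 BETWEEN -74123456 AND -73523456;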
Float (10,6)
Where is latitude or longitude 5555.123456?
Don't you mean Float(9,6) instead?
I have the exact same schema (float(10,6)) and query (selecting inside a rectangle), and I found that switching the DB engine from InnoDB to MyISAM doubled the speed for a "point in rectangle look-up" in a table with 780,000 records.
Additionally, I converted all lng/lat values to cartesian integers (x,y) and created a two-column index on the x,y and my speed went from ~27 ms to 1.3 ms for the same look-up.
It really depends on how you are using the data. But as a gross over-simplification of the facts, decimal is faster but less accurate in approximations. More info here:
http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx
Also, the standard for GPS coordinates is specified in ISO 6709:
http://en.wikipedia.org/wiki/ISO_6709
I know you have probably moved past this problem by now; I just wanted to add another approach, in case someone is looking to store geolocation data.
You could encode the latitude and longitude information into a geohash. Geohashes are prefix-searchable to a required degree of precision, so you can convert your query into a start and end prefix and do a prefix search with a LIKE query.
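A sketch of what that could look like, assuming a precomputed, indexed geohash column on the Homes table (the column name and hash prefix are made up):

CREATE INDEX ix_homes_geohash ON Homes (geohash);
-- a prefix match selects everything inside the geohash cell for that prefix
SELECT geolat, geolng
FROM Homes
WHERE geohash LIKE 'u4pru%';
-- note: a real bounding-box search usually unions a handful of neighbouring
-- cell prefixes and then filters the exact rectangle afterwards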
I want to know optimization techniques for a database that has nearly 80,000 records.
Please list the possibilities for optimizing it.
I am using it for my mobile project on the Android platform.
I use SQLite, and it takes a lot of time to retrieve the data.
Thanks
Well, with only 80,000 records and assuming your database is well designed and normalized, just adding indexes on the columns that you frequently use in your WHERE or ORDER BY clauses should be sufficient.
There are other more sophisticated techniques you can use (such as denormalizing certain tables, partitioning, etc.) but those normally only start to come into play when you have millions of records to deal with.
ETA:
I see you updated the question to mention that this is on a mobile platform - that could change things a bit.
Assuming you can't pare down the data set at all, one thing you might be able to do would be to try to partition the database a bit. The idea here is to take your one large table and split it into several smaller identical tables that each hold a subset of the data.
Which of those tables a given row would go into would depend on how you choose to partition it. For example, if you had a "customer_id" field that could range from 0 to 10,000 you might put customers 0 - 2500 in table1, 2,500 - 5,000 in table2, etc. splitting the one large table into 4 smaller ones. You would then have logic in your app that would figure out which table (or tables) to query to retrieve a given record.
You would want to partition your data in such a way that you generally only need to query one of the partitions at a time. Exactly how you would partition the data would depend on what fields you have and how you are using them, but the general idea is the same.
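A minimal sketch of that idea (the table names and id ranges are the ones from the example above; the other columns are illustrative):

-- identical tables, split by customer_id range
CREATE TABLE table1 (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table2 (customer_id INTEGER PRIMARY KEY, name TEXT);
-- the app routes by id range before querying:
--   customer_id 0 - 2,499     -> table1
--   customer_id 2,500 - 4,999 -> table2, and so on
SELECT * FROM table2 WHERE customer_id = 3100;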
Create indexes
Delete indexes
Normalize
DeNormalize
80k rows isn't many rows these days. A clever index (or indexes), combined with queries that utilise those indexes, will serve you well.
Learn how to display query execution plans, then learn to understand what they mean, then optimize your indices, tables and queries accordingly.
Such a wide topic, which does depend on what you want to optimise for. But the basics:
indexes. A good indexing strategy is important: index the right columns, those that are frequently queried on/ordered by. However, the more indexes you add, the slower your INSERTs and UPDATEs will be, so there is a trade-off.
maintenance. Keep indexes defragged and statistics up to date
optimised queries. Identify queries that are slow (using the profiler/built-in information available from SQL 2005 onwards) and see if they could be written more efficiently (e.g. avoid CURSORs, use set-based operations where possible).
parameterisation/SPs. Use parameterised SQL to query the db instead of ad hoc SQL with hardcoded search values (see the sketch after this list). This will allow better execution plan caching and reuse.
start with a normalised database schema, and then de-normalise if appropriate to improve performance
80,000 records is not much, so I'll stop there (for large dbs with millions of rows, I'd have suggested partitioning the data)
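As a sketch of the parameterisation point (T-SQL; the table and column names are made up):

EXEC sp_executesql
     N'SELECT id, name FROM dbo.Customers WHERE Region = @Region',
     N'@Region nvarchar(50)',
     @Region = N'West';
-- the plan for this statement is cached once and reused for any @Region value,
-- unlike an ad hoc string with the search value hardcoded into it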
You really have to be more specific with respect to what you want to do. What is your mix of operations? What is your table structure? The generic advice is to use indices as appropriate but you aren't going to get much help with such a generic question.
Also, 80,000 records is nothing. It is a moderate-sized table and should not make any decent database break a sweat.
First of all, indexes are really a necessity if you want a well-performing database.
Besides that, though, the techniques depend on what you need to optimize for: Size, speed, memory, etc?
One thing that is worth knowing is that using a function on an indexed field in the WHERE clause will prevent the index from being used.
Example (Oracle):
SELECT indexed_text FROM your_table WHERE upper(indexed_text) = 'UPPERCASE TEXT';
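If you do need to filter on the uppercased value, the usual Oracle workaround is a function-based index (a sketch for the example above; the index name is arbitrary):

CREATE INDEX ix_your_table_upper_text ON your_table (UPPER(indexed_text));
-- now WHERE upper(indexed_text) = 'UPPERCASE TEXT' can use this index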
What are the patterns you use to determine the frequent queries?
How do you select the optimization factors?
What are the types of changes one can make?
This is a nice question, if rather broad (and none the worse for that).
If I understand you, then you're asking how to attack the problem of optimisation starting from scratch.
The first question to ask is: "is there a performance problem?"
If there is no problem, then you're done. This is often the case. Nice.
On the other hand...
Determine Frequent Queries
Logging will get you your frequent queries.
If you're using some kind of data access layer, then it might be simple to add code to log all queries.
It is also a good idea to log when the query was executed and how long each query takes. This can give you an idea of where the problems are.
Also, ask the users which bits annoy them. If a slow response doesn't annoy the user, then it doesn't matter.
Select the optimization factors?
(I may be misunderstanding this part of the question)
You're looking for any patterns in the queries / response times.
These will typically be queries over large tables or queries which join many tables in a single query. ... but if you log response times, you can be guided by those.
Types of changes one can make?
You're specifically asking about optimising tables.
Here are some of the things you can look for:
Denormalisation. This brings several tables together into one wider table, so instead of your query joining several tables together, you can just read one table. This is a very common and powerful technique. NB. I advise keeping the original normalised tables and building the denormalised table in addition - this way, you're not throwing anything away. How you keep it up to date is another question. You might use triggers on the underlying tables, or run a refresh process periodically.
Normalisation. This is not often considered to be an optimisation process, but it is in 2 cases:
updates. Normalisation makes updates much faster because each update is the smallest it can be (you are updating the smallest possible table, in terms of both columns and rows). This is almost the very definition of normalisation.
Querying a denormalised table to get information which exists on a much smaller (fewer rows) table may be causing a problem. In this case, store the normalised table as well as the denormalised one (see above).
Horizontal partitioning. This means making tables smaller by putting some rows in another, identical table. A common use case is to have all of this month's rows in table ThisMonthSales, and all older rows in table OldSales, where both tables have an identical schema. If most queries are for recent data, this strategy can mean that 99% of all queries are only looking at 1% of the data - a huge performance win.
Vertical partitioning. This is chopping fields off a table and putting them in a new table which is joined back to the main table by the primary key. This can be useful for very wide tables (e.g. with dozens of fields), and may possibly help if tables are sparsely populated.
Indices. I'm not sure if your question covers these, but there are plenty of other answers on SO concerning the use of indices. A good way to find a case for an index is: find a slow query, look at the query plan and find a table scan, then index fields on that table so as to remove the table scan. I can write more on this if required - leave a comment.
You might also like my post on this.
That's difficult to answer without knowing which system you're talking about.
In Oracle, for example, the Enterprise Manager lets you see which queries took up the most time, lets you compare different execution profiles, and lets you analyze queries over a block of time so that you don't add an index that's going to help one query at the expense of every other one you run.
Your question is a bit vague. Which DB platform?
If we are talking about SQL Server:
Use the Dynamic Management Views. Use SQL Profiler. Install the SP2 and the performance dashboard reports.
After determining the most costly queries (i.e. number of times run x cost of one query), examine their execution plans, look at the sizes of the tables involved, and check whether they are predominantly read or write, or a mixture of both.
If the system is under your full control (apps and DB) you can often re-write queries that are badly formed (quite a common occurrence), such as deep correlated sub-queries which can often be re-written as derived-table joins with a little thought. Otherwise, your options are to create covering non-clustered indexes and ensure that statistics are kept up to date.
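For example, one commonly used DMV query for finding the most expensive cached statements looks roughly like this (SQL Server 2005+; ordering by total CPU is just one possible choice):

SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       st.text AS query_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;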
For MySQL there is a feature called the slow query log.
The rest depends on what kind of data you have and how it is set up.
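For instance, on reasonably recent MySQL versions the slow query log can be switched on at runtime (older 5.0-era servers set the equivalent options in my.cnf instead); the threshold here is just an example:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log statements that take longer than 1 second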
In SQL Server you can use a trace to find out how your query is performing, or press Ctrl+K or Ctrl+L to view the execution plan.
For example, if you see a full table scan happening on a table with a large number of records, then it probably is not a good query.
A more specific question will definitely fetch you better answers.
If your table is predominantly read, place a clustered index on the table.
My experience is with mainly DB2 and a smattering of Oracle in the early days.
If your DBMS is any good, it will have the ability to collect stats on specific queries and explain the plan it used for extracting the data.
For example, if you have a table (x) with two columns (date and diskusage) and only have an index on date, the query:
select diskusage from x where date = '2008-01-01'
will be very efficient since it can use the index. On the other hand, the query
select date from x where diskusage > 90
would not be so efficient. In the former case, the "explain plan" would tell you that it could use the index. In the latter, it would have said that it had to do a table scan to get the rows (that's basically looking at every row to see if it matches).
Really intelligent DBMSs may also explain what you should do to improve the performance (add an index on diskusage in this case).
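In this example that fix is simply (the index name is arbitrary):

CREATE INDEX ix_x_diskusage ON x (diskusage);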
As to how to see what queries are being run, you can either collect that from the DBMS (if it allows it) or force everyone to do their queries through stored procedures so that the DBA control what the queries are - that's their job, keeping the DB running efficiently.
Indices on PKs and FKs, and one thing that always helps: PARTITIONING...
1. What are the patterns you use to determine the frequent queries?
Depends on what level you are dealing with the database at. If you're a DBA or have access to the tools, DBs like Oracle allow you to run jobs and generate stats/reports over a specified period of time. If you're a developer writing an application against a db, you can just do performance profiling within your app.
2. How do you select the optimization factors?
I try to get a general feel for how the table is being used and the data it contains. I go about it with the following questions.
Is it going to be updated a ton and on what fields do updates occur?
Does it have columns with low cardinality?
Is it worth indexing? (tables that are very small can be slowed down if accessed by an index)
How much maintenance/headache is it worth to have it run faster?
Ratio of updates/inserts vs queries?
etc.
3. What are the types of changes one can make?
-- If using Oracle, keep statistics up to date! =)
-- Normalization/De-Normalization: either one can improve performance depending on the usage of the table. I almost always normalize, and then de-normalize only if I can in no other practical way make the query faster. A nice way to denormalize for queries, when your situation allows it, is to keep the real tables normalized and create a denormalized "table" with a materialized view (see the sketch after this list).
-- Index judiciously. Too many can be bad on many levels. BitMap indexes are great in Oracle as long as you're not updating the column frequently and that column has a low cardinality.
-- Using Index organized tables.
-- Partitioned and sub-partitioned tables and indexes
-- Use stored procedures to reduce round trips by applications, increase security, and enable query optimization without affecting users.
-- Pin tables in memory if appropriate (accessed a lot and fairly small)
-- Device partitioning between index and table database files.
..... the list goes on. =)
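As a sketch of the materialized-view idea mentioned in the list above (Oracle syntax; the table and column names are made up, and the refresh strategy depends entirely on your workload):

CREATE MATERIALIZED VIEW mv_order_summary
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT o.order_id, o.order_date, c.customer_name, c.region
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;
-- the base tables stay normalised; read-heavy queries hit mv_order_summary instead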
Hope this is helpful for you.