SQL Server 2005 Scalar UDF performance - sql-server-2005

I have a table where I'm storing Lat/Long coordinates, and I want to make a query where I want to get all the records that are within a distance of a certain point.
This table has about 10 million records, and there's an index over the Lat/Long fields.
This does not need to be precise. Among other things, I'm considering that 1 degree Long == 1 degree Lat, which I know is not true, but the ellipse I'm getting is good enough for this purpose.
For my examples below, let's say the point in question is [40, 140], and my radius, in degrees, is 2 degrees.
I've tried this 2 ways:
1) I created a UDF to calculate the Square of the Distance between 2 points, and I'm running that UDF in a query.
SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND dbo.SquareDistance(Lat, Long, 40, 140) < 4
I'm filtering by a square first, to speed up the query and let SQL use the index, and then refining that to match only the records that fall within the circle with my UDF.
2) Run the query to get the square (same as before, but without the last line), feed ALL those records to my ASP.Net code, and calculate the circle in the ASP.Net side (same idea, calculate the square of the distance to save the Sqrt call, and compare to the square of my radius).
To my surprise, calculating the circle on the .Net side is about 10 times faster than using the UDF, which leads me to believe that I'm doing something horribly wrong with that UDF...
This is the code I'm using:
CREATE FUNCTION [dbo].[SquareDistance]
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
-- Declare the return variable here
DECLARE @Result float
DECLARE @LatDiff float, @LongDiff float
SELECT @LatDiff = @Lat1 - @Lat2
SELECT @LongDiff = @Long1 - @Long2
SELECT @Result = (@LatDiff * @LatDiff) + (@LongDiff * @LongDiff)
-- Return the result of the function
RETURN @Result
END
Am I missing something here?
Shouldn't using a UDF within SQL Server be much faster than feeding about 25% more records than necessary to .Net, with the overhead of the DataReader, the communication between processes and whatnot?
Is there something I'm doing horribly wrong in that UDF that makes it run slow?
Is there any way to improve it?
Thank you very much!

You can improve the performance of this UDF by NOT declaring variables and doing your calculations more in-line. This will likely improve performance a little bit (but probably not much).
CREATE FUNCTION [dbo].[SquareDistance]
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
Return ( SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2)) + ((@Long1 - @Long2) * (@Long1 - @Long2)))
END
Even better would be to remove the function and put the calculations in the original query.
SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
There is a little bit of overhead with calling a user defined function. By removing the function, you are likely to gain a little in performance.
Also, I encourage you to check your execution plan just to make sure you are getting index seeks like you expect.
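If you want to quantify the difference between the UDF version and the inline version, one quick way (a sketch; [Table] stands in for the question's placeholder table name) is to turn on the I/O and timing statistics around each query and compare the output in the Messages tab:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Lat, Long FROM [Table]
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
The same output, together with the plan, will also tell you whether the seek on the Lat/Long index is actually happening.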

There is a lot of overhead in using a UDF.
Even coding it in-line may not be good, because an index cannot be used for the distance expression, although here the BETWEEN clauses should reduce the data that needs to be crunched.
To extend G Mastros' idea, separate the select bit from the square bit. It may help the optimiser.
SELECT
Lat, Long
FROM
(
SELECT
Lat, Long
FROM
Table
WHERE
(Lat BETWEEN 38 AND 42)
AND
(Long BETWEEN 138 AND 142)
) foo
WHERE
((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
Edit: You may be able to reduce the actual calculations involved.
This next idea may reduce the number of calcs from 7 to 5
...
SELECT
Lat, Long,
Lat - 40 AS LatDiff, Long - 140 AS LongDiff
FROM
...
(LatDiff * LatDiff) + (LongDiff * LongDiff) < 4
...
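Putting the derived table and the pre-computed differences together, the whole statement might look like this (a sketch only; as noted below, the optimiser may or may not take advantage of it):
SELECT
Lat, Long
FROM
(
SELECT
Lat, Long,
Lat - 40 AS LatDiff, Long - 140 AS LongDiff
FROM
Table
WHERE
(Lat BETWEEN 38 AND 42)
AND
(Long BETWEEN 138 AND 142)
) foo
WHERE
(LatDiff * LatDiff) + (LongDiff * LongDiff) < 4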
Basically, try the 3 solutions offered and see what works.
The optimiser may ignore the derived table, it may use it, or it may generate an even worse plan.

Check this article, which describes why scalar UDFs in SQL Server are, generally speaking, a bad idea. Unless you're pretty sure the table you're invoking the UDF against will never grow much, be aware that the UDF is called for ALL the rows the engine has to evaluate, not (as one might wrongly guess) only for the rows in the final result set. This can give you a big performance hit as the database grows.
The linked article also details some ways to overcome the problem, but the real issue is that SQL Server's T-SQL dialect lacks a way to declare a scalar function as deterministic and inlineable the way Oracle does.
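A common workaround (not necessarily the one the article describes; this sketch and the name SquareDistanceTVF are mine) is to rewrite the scalar UDF as an inline table-valued function, which the optimiser can expand into the calling query, and invoke it with CROSS APPLY:
CREATE FUNCTION [dbo].[SquareDistanceTVF]
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS TABLE
AS
RETURN
(
SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2))
+ ((@Long1 - @Long2) * (@Long1 - @Long2)) AS SquareDistance
);

-- usage ([Table] is the question's placeholder table name)
SELECT t.Lat, t.Long
FROM [Table] t
CROSS APPLY dbo.SquareDistanceTVF(t.Lat, t.Long, 40, 140) d
WHERE (t.Lat BETWEEN 38 AND 42)
AND (t.Long BETWEEN 138 AND 142)
AND d.SquareDistance < 4;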

Updates:
GMastros: You were absolutely right. Doing the math in the query itself is infinitely faster than the UDF. I'm using the SQUARE() function to do the multiplication, which makes it a bit more concise, but performance is the same.
However, doing it this way is still twice as slow as doing the math in .Net.
I can't really understand that, but I've come to a compromise that is useful for my particular situation (which sucks, because I need to duplicate code, but it's the best scenario unless we can find a way to make the circle calculation in SQL faster).
Thanks!

Related

Redshift numeric precision truncating

I have encountered a situation that I can't explain: how Redshift handles division of SUMs.
Here is an example table:
create table public.datatype_test(
a numeric(19,6),
b numeric(19,6));
insert into public.datatype_test values(222222.2222, 333333.3333);
insert into public.datatype_test values(444444.4444, 666666.6666);
Now I try to run query:
select sum(a)/sum(b) from public.datatype_test;
I get the result 0.6666 (4 decimals). It is not a display issue in the tool; it really returns only 4 decimal places, and it doesn't matter how big or small the numbers in the table are. In my case 4 decimals is not precise enough.
Same stands true if I use AVG instead of SUM.
If I use MAX instead of SUM, I get : 0.6666666666666666666 (19 decimals).
It also returns the correct result (0.6666666666666667) when no physical table is used:
with t as (
select 222222.2222::numeric(19,6) as a, 333333.3333::numeric(19,6) as b union all
select 444444.4444::numeric(19,6) as a, 666666.6666::numeric(19,6) as b
)
select sum(a)/sum(b) as d from t;
I have looked into the Redshift documentation about SUM and Computations with Numeric Values, but I still don't get the result the documentation suggests.
Using float datatype for table columns is not an option as I need to store precise currency amounts and 15 significant digits is not enough.
Using cast on SUM aggregation also gives 0.6666666666666666666 (19 decimals).
select sum(a)::numeric(19,6)/sum(b) from public.datatype_test;
But it looks wrong, and I can't force BI tools to use this workaround; also, everyone who uses this data shouldn't have to rely on this kind of workaround.
I have tried to use same test in PostgreSQL 10, and it works as it should, returning sufficient amount of decimals for division.
Is there anything I can do with database setup to avoid casting in SQL Query?
Any advice or guidance is highly appreciated.
Redshift version:
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.4081
Using dc2.8xlarge nodes
I have run into similar issues, and although I don't have a solution that doesn't require a workaround, I can at least explain it.
The precision/scale of the result of division is defined by the rules in the "computations with numeric values" document.
A consequence of those rules is that a decimal(19,6) divided by another decimal(19,6) will return decimal(38,19).
What's happening to you, though, is that MAX returns the same precision/scale as the underlying column, but SUM returns decimal(38,*) no matter what.
(This is probably a safety precaution to prevent overflow on sums of "big data"). If you divide decimal(38,6) by another, you get decimal(38,4).
AWS support will probably not consider this a defect -- there is no SQL standard for how to treat decimal precision in division, and given that this is documented behavior, it's probably a deliberate decision.
The only way to address this is to typecast the numerator, or multiply it by something like sum(a) * cast(1 as decimal(10,9)) which is portable SQL and will force more decimal places in the numerator and thus the result.
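Applied to the example table, that workaround looks like this (a sketch; the exact scale of the result still follows the rules below):
select sum(a) * cast(1 as decimal(10,9)) / sum(b) from public.datatype_test;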
As a convenience I made a calculator in JSFiddle with the rules so you can play around with different options:
scale = Math.max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale
if (precision > 38) {
scale = Math.max((38 + scale - precision), 4)
precision = 38
}

Optimizing SQL WHERE for calculating distance between latitude and longitude positions

I'm creating a database that is being hosted on a MS SQL 2012 server. The primary function of this database is to return results that are within a certain distance from an origin. Locations are stored as latitude / longitude.
By reading here on Stack Overflow, I found a very nice way to query the database for exactly what I am looking for, and it works like a charm! However, I'm thinking of a possible way to optimize this.
Original SQL query
DECLARE @orig_lat DECIMAL(12, 9)
DECLARE @orig_lng DECIMAL(12, 9)
SET @orig_lat=56.xxxxxx
SET @orig_lng=14.xxxxxx
DECLARE @orig geography = geography::Point(@orig_lat, @orig_lng, 4326);
SELECT *
FROM foobar
WHERE @orig.STDistance(geography::Point(foobar.latitude, foobar.longitude, 4326)) < 2000
My guess is that this query does a linear scan of the foobar table, returning only the matching rows. However, since this table contains positions from all over the world, I want to know if I can help the database by reducing the number of rows it has to run the distance calculation on. My guess is that this calculation is heavy for the server.
I know the origin of the request being made and I also know that the maximum distance between the points will never be larger than lets say 100km.
Hypothesis
Since I know that I don't have to search the whole world, only up to 100 km from the point of origin, I can improve upon the WHERE statement as seen below, by creating a minimum and maximum bound for the latitude and longitude. This is done by moving the position by some number in each direction.
I explain:
Origin latitude 56.xxxxxx
Min latitude 55.xxxxxx
Max latitude 57.xxxxxx
Origin longitude 14.xxxxxx
Min longitude 13.xxxxxx
Max longitude 15.xxxxxx
By doing this I create a zone around the origin reaching about 126 km. By adding this to the WHERE statement I first make sure the requested position is within the correct bounds. After that I run the distance calculation to get the exact distance. The distance calculation is now only run against the rows that are within the min and max bounds instead of the whole world.
Optimization proposal
DECLARE @orig_lat DECIMAL(12, 9)
DECLARE @orig_lng DECIMAL(12, 9)
DECLARE @orig_latMin DECIMAL(12, 9)
DECLARE @orig_latMax DECIMAL(12, 9)
DECLARE @orig_lngMin DECIMAL(12, 9)
DECLARE @orig_lngMax DECIMAL(12, 9)
SET @orig_lat=56.xxxxxx
SET @orig_lng=14.xxxxxx
SET @orig_latMin=55.xxxxxx
SET @orig_latMax=57.xxxxxx
SET @orig_lngMin=13.xxxxxx
SET @orig_lngMax=15.xxxxxx
DECLARE @orig geography = geography::Point(@orig_lat, @orig_lng, 4326);
SELECT *
FROM foobar
WHERE ([latitude] > @orig_latMin
AND [latitude] < @orig_latMax
AND [longitude] > @orig_lngMin
AND [longitude] < @orig_lngMax)
AND @orig.STDistance(geography::Point(foobar.latitude, foobar.longitude, 4326)) < 2000
I don't know database implementation details, but does this improve the query or does it make it worse? My guess is that it depends on how the WHERE statement actually works and in what order it does things. My hope is that the boundary checks will be run before the distance calculation, in order to reduce the number of times a distance calculation is done.
EDIT
Just implemented the suggested index proposal with the following results.
Without indexing:
With optimized statement have a cost of 0,025352
Without optimized statement have a cost of 0,025323
With indexing:
With optimized statement have a cost of 0,0104057
Without optimized statement have a cost of 0,0253234
A good rule of thumb is that the execution time of a database query depends on the number of disk pages that have to be read. The CPU time can usually be ignored.
According to this rule, your proposed optimization will improve the execution time if it makes a difference to the number of disk pages. This will be the case if there is an index on latitude and longitude that allows many table rows, and therefore many disk pages, to be skipped. If that's the case, the optimizer will certainly evaluate that part of the WHERE clause before the distance.
If there's no index that helps with those two columns, I doubt you will see a big difference.
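For example, a plain composite index on the two columns would let the bounding-box predicates be answered with an index seek (a sketch; the index name is made up, the table and column names come from the question):
CREATE NONCLUSTERED INDEX IX_foobar_lat_lng
ON foobar (latitude, longitude);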
You can analyze the query time using SQL Server Management Studio: run a big query with different WHERE clauses, and it will even show you which part of the query takes which amount of time.
Press Ctrl+L to display the estimated execution plan,
or Ctrl+M to include the actual execution plan (when you run it).
Run the query once with the bounding-box conditions and once without them, and compare which is slower.
If you don't have enough data, the difference might not be visible.

How does the Average function work in relational databases?

I'm trying to find the geometric average of values from a table with millions of rows. For those that don't know: to find the geometric average, you multiply all the values together and then take the Nth root, where N is the number of values.
You probably already see the problem: the running product will quickly exceed the system maximum. I found a great solution that uses the natural log.
http://timothychenallen.blogspot.com/2006/03/sql-calculating-geometric-mean-geomean.html
However, that got me wondering: wouldn't the same problem apply to the arithmetic mean? If you have N records, and N is very large, the running sum can also exceed the system maximum.
So how do RDBMSs calculate averages during queries?
I don't know an exact implementation for arithmetic mean in an RDBMS, nor did you specify one in your original question. But the RDBMS does not need to sum a million rows in a column in order to obtain the arithmetic mean. Consider the following summation:
sum = (x1 + x2 + x3 + ... + x1000000)
Then the mean can be written as
mean = sum / N = (x1 + x2 + x3 + ... + x1000000) / N, for N = 1,000,000
But this expression can be broken up into pieces like this:
mean = [(x1 + x2 + x3) / N ] + [(x4 + x5 + x6) / N] + ...
In other words, the RDBMS can simply scan down the million rows in a column and find the mean section by section, without running the risk of an overflow. And since each number in the column is presumably within range for the type storing it, there is no chance of the mean value itself overflowing.
Most databases don't support a product() function the way they support an average.
However, you can do what you want with logs. The product (simplified) is like:
select exp(sum(ln(x))) as product
The average would be:
select power(exp(sum(ln(x))), 1.0 / count(*)) as geoaverage
or
select EXP(AVG(LN(x))) as geoaverage
The LN() function might be LOG() on some platforms...
These are schematic; the functions for exp(), ln(), and power() vary depending on the database. Plus, if you have to take zero or negative numbers into account, the logic is more complicated.
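As an illustration of that last point, one way to keep a zero or negative value from blowing up LN() is to decide the outcome up front and guard the argument (a sketch, assuming a table t with a non-NULL column x; the inner CASE is needed because the aggregate is evaluated for every row regardless of which outer branch is returned):
select case
when min(x) < 0 then null   -- geometric mean left undefined for negative values
when min(x) = 0 then 0      -- any zero makes the product, and so the mean, zero
else exp(avg(ln(case when x > 0 then x else 1 end)))
end as geoaverage
from t;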
Very easy to check. For example, SQL Server 2008.
DECLARE @T TABLE(i int);
INSERT INTO @T(i) VALUES
(2147483647),
(2147483647);
SELECT AVG(i) FROM @T;
result
(2 row(s) affected)
Msg 8115, Level 16, State 2, Line 7
Arithmetic overflow error converting expression to data type int.
There is no magic. The column type is int, the server adds the values together using an internal variable of the same int type, and the intermediate result exceeds the range of int.
You can run a similar check for any other DBMS that you use. Different engines may behave differently, but I would expect all of them to stick to the original type of the column. For example, averaging two int values 100 and 101 may result in 100 or 101 (still int), but never 100.5.
For SQL Server this behavior is documented. I would expect something similar for all other engines:
AVG () computes the average of a set of values by dividing the sum of
those values by the count of nonnull values. If the sum exceeds the
maximum value for the data type of the return value an error will be
returned.
So, you have to be careful when calculating simple average as well, not just product.
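A common way to sidestep the overflow in the example above (my sketch, not part of the quoted behavior) is to widen the type before aggregating, so the internal accumulator is wide enough:
DECLARE @T TABLE(i int);
INSERT INTO @T(i) VALUES
(2147483647),
(2147483647);

SELECT AVG(CAST(i AS bigint)) FROM @T;  -- 2147483647, no overflow error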
Here is an extract from the SQL-92 standard:
6) Let DT be the data type of the < value expression >.
9) If SUM or AVG is specified, then:
a) DT shall not be character string, bit string, or datetime.
b) If SUM is specified and DT is exact numeric with scale S, then the
data type of the result is exact numeric with implementation-defined
precision and scale S.
c) If AVG is specified and DT is exact numeric, then the data type of
the result is exact numeric with implementation- defined precision not
less than the precision of DT and implementation-defined scale not
less than the scale of DT.
d) If DT is approximate numeric, then the data type of the result is
approximate numeric with implementation-defined precision not less
than the precision of DT.
e) If DT is interval, then the data type of the result is interval
with the same precision as DT.
So, DBMS can convert int to larger type when calculating AVG, but it has to be an exact numeric type, not floating-point. In any case, depending on the values you can still get arithmetic overflow.
Some DBMS — specifically, the Informix DBMS — convert from an INT type to a floating point type to do the calculation:
SQL[2148]: create table t(i int);
SQL[2149]: insert into t values(214748347);
SQL[2150]: insert into t values(214748347);
SQL[2151]: insert into t values(214748347);
SQL[2152]: select avg(i) from t;
214748347.0
SQL[2153]: types on;
SQL[2154]: select i from t;
INTEGER
214748347
214748347
214748347
SQL[2155]: select avg(i) from t;
DECIMAL(32)
214748347.0
SQL[2156]:
Similarly with other types. This can still end with an overflow under some circumstances; you then get a runtime error. However, it is rather seldom that you exceed the precision — it typically takes a very large number of rows for the sum to exceed the limits, even if you're counting the US deficit over the next century in atto-Zimbabwean dollars circa 2009.

Distance between two coordinates, how can I simplify this and/or use a different technique?

I need to write a query which allows me to find all locations within a range (Miles) from a provided location.
The table is like this:
id | name | lat | lng
So I have been doing research and found this MySQL presentation.
I have tested it on a table with around 100 rows, but it will have plenty more, so it must be scalable!
I tried something more simple like this first:
-- just some test data; this would come from user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2)
+ COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest, locations orig
WHERE orig.id = '1'
HAVING distance < 1
ORDER BY distance;
This returned rows in around 50ms which is pretty good!
However this would slow down dramatically as the rows increase.
EXPLAIN shows it's only using the PRIMARY key which is obvious.
Then after reading the article linked above. I tried something like this:
-- defining variables; when this is made into a stored procedure,
-- these values will come from a SELECT query
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;
-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69);
set @lat2 = @mylat+(@dist/69);
-- run the query:
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2) ,2)
+ COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest
WHERE dest.lng BETWEEN @lon1 AND @lon2
AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
ORDER BY distance;
The time of this query is around 240ms; this is not too bad, but it is slower than the first. However, I can imagine that at a much higher number of rows this would work out faster. An EXPLAIN shows the possible keys as lat, lng, or PRIMARY, and it used PRIMARY.
How can I do this better???
I know I could store the lat/lng as a POINT(), but I haven't found much documentation on this showing whether it's faster or as accurate.
Any other ideas would be happily accepted!
Thanks very much!
-Stefan
UPDATE:
As Jonathan Leffler pointed out, I had made a few mistakes which I hadn't noticed:
I had only put abs() on one of the lat values, and I was using an id search in the WHERE clause of the second query as well, when there was no need. The first query was purely experimental; the second one is more likely to hit production.
After these changes, EXPLAIN shows the query is now using the key on the lng column, and the average response time is around 180ms, which is an improvement.
Any other ideas would be happily accepted!
If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.
MySQL implements part of the OpenGIS specifications, although it is / was (last time I checked) very rough around the edges / premature (not useful for any real work).
PostGis on PostgreSql would make this trivially easy and readable:
(this finds all points from tableb which are closer than 1000 meters to point a in tablea with id 123)
select
myvalue
from
tablea, tableb
where
st_dwithin(tablea.the_geom, tableb.the_geom, 1000)
and
tablea.id = 123
The first query ignores the parameters you set - using 1 instead of @dist for the distance, and using the table alias orig instead of the parameters @orig_lat and @orig_lng.
You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that there's only one row from orig joined with each of the rows in dest (including the point with dest.id = 1; you should probably have a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems. The HAVING clause is not relating any aggregates, but a HAVING clause is (primarily) for comparing aggregate values.
Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
You have a dose of the magic numbers - the value 69 is presumably number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth.
I'm suspicious of the box calculated if the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.
You should move the condition in the HAVING clause into the WHERE clause.
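For example, combining that with the bounding box and dropping the ABS() (a sketch only; MySQL won't let you reference the distance alias in WHERE, so the calculation is wrapped in a derived table instead):
SELECT *
FROM (
SELECT dest.*,
3956 * 2 * ASIN(
SQRT( POWER(SIN((@mylat - dest.lat) * pi()/180 / 2), 2)
+ COS(@mylat * pi()/180) * COS(dest.lat * pi()/180)
* POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) )) AS distance
FROM locations dest
WHERE dest.lng BETWEEN @lon1 AND @lon2
AND dest.lat BETWEEN @lat1 AND @lat2
) AS boxed
WHERE distance < @dist
ORDER BY distance;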
But I'm not sure this is really helping...
The NGS Online Inverse Geodesic Calculator is the traditional reference means to calculate the distance between any two locations on the earth ellipsoid:
http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl
But the above calculator is still problematic. Especially between two near-antipodal locations, the computed distance can show an error of some tens of kilometres! The origin of the numeric trouble was identified a long time ago by Thaddeus Vincenty (page 92):
http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf
In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:
http://geographiclib.sourceforge.net/cgi-bin/Geod
Some thoughts on improving performance. It wouldn't simplify things from a maintainability standpoint (makes things more complex), but it could help with scalability.
Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query to eliminate some rows without having to do the trig calcs.
You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost when inserting the record, but if queries outnumber inserts, this would be good. See this answer for an idea of this approach:
Query to get records based on Radius in SQLite?
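A minimal sketch of the pre-calculation idea (the extra column names are made up for illustration):
ALTER TABLE locations
ADD COLUMN lat_rad DOUBLE,
ADD COLUMN lng_rad DOUBLE,
ADD COLUMN cos_lat DOUBLE;

UPDATE locations
SET lat_rad = RADIANS(lat),
lng_rad = RADIANS(lng),
cos_lat = COS(RADIANS(lat));
The distance formula can then use these columns directly instead of recomputing the trig functions for every row on every query.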
You could look at something like geohashing.
When used in a database, the structure of geohashed data has two advantages. ... Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.
You could search SO for some ideas on how to implement:
https://stackoverflow.com/search?q=geohash
If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.
SELECT *, SQRT(POWER(RADIANS(@mylat - dest.lat), 2) +
POWER(RADIANS(@mylon - dest.lng)*COS(RADIANS(@mylat)), 2)
)*@radiusOfEarth AS approximateDistance
…
You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try changing POWER(x, 2) to x*x, which might be computed faster.

Optimizing Sqlite query for INDEX

I have a table of 320000 rows which contains lat/lon coordinate points. When a user selects a location, my program gets the coordinates of the selected location and executes a query which brings back all the points from the table that are nearby. This is done by calculating the distance between the selected point and each coordinate point in my table. This is the query I use:
select street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;
As you can see the WHERE clause is a simple Pythagoras formula to calculate the distance between two points.
Now my problem is that I cannot get an INDEX to be used. I've tried with
CREATE INDEX indx ON locations(lat,lon)
and also with
CREATE INDEX indx ON locations(street,lat,lon)
with no luck. I've noticed that when there is a math operation involving lat or lon, the index is not being used. Is there any way I can optimize this query to use an INDEX so as to gain speed?
Thanks in advance!
The problem is that the SQL engine needs to evaluate all the records to do the comparison (WHERE ... <= ...) and filter the points, so the indexes don't speed up the query.
One approach to solving the problem is to compute minimum and maximum latitude and longitude bounds to restrict the number of records.
Here is a good link to follow: Finding Points Within a Distance of a Latitude/Longitude
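A sketch of that approach against the question's table (the 0.00124 bound is reused from the question for both axes; in practice, derive the longitude bound from the radius and latitude as the linked article shows):
-- with CREATE INDEX indx ON locations(lat, lon), the range on lat can now use the index
select street from locations
where lat between -34.594804 - 0.00124 and -34.594804 + 0.00124
and lon between -58.377676 - 0.00124 and -58.377676 + 0.00124
and ( ((lat - (-34.594804)) * (lat - (-34.594804)))
+ ((lon - (-58.377676)) * (lon - (-58.377676))) ) <= (0.00124 * 0.00124)
group by street;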
Did you try adjusting the page size? A table like this might gain from having a different (i.e. the largest?) available page size.
PRAGMA page_size = 32768;
Or any power of 2 between 512 and 32768. If you change the page_size, don't forget to vacuum the database (assuming you are using SQLite 3.5.8. Otherwise, you can't change it and will need to start a fresh new database).
Also, running the operation on floats might not be as fast as running it on integers (big maybe), so you might gain speed by storing all your coordinates multiplied by 1,000,000.
Finally, Euclidean distance will not yield very accurate proximity results. The further you get from the equator, the more the circle around your point will flatten to resemble an ellipse. There are fast approximations which are not as calculation-intensive as a great-circle distance calculation (avoid that at all cost!).
You should search in a square instead of a circle. Then you will be able to optimize.
Surely you have a primary key in locations? Probably called id?
Why not just select the id along with the street?
select id, street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;