Searching by Zip Code proximity - MySql - sql

I'm having some trouble getting a search by zip code proximity query working. I've search and searched google but everything I find is either way too slow or I can't get working. Here's the issue:
I have a database with a table with all the US Zip Codes (~70,500 of them), and I have a table of several thousand stores (~10,000+), which includes their zip code. I need to be able to provide a zip code and return a list of the closest stores to that zip code, sorted by distance.
Can anyone point me to a good resource for this that they have used and can handle this much load, or share a query they've used that works and is fairly quick about it? It would be MUCH appreciated. Thanks!

You should build a table that has each zip code with an associated latitude and longitude. When someone enters a zip and a distance, you calculate the range of latitudes and longitudes that fall within it, and then select all the zip codes that fall within that bounding box. Then you select any stores that have zip codes within that set, and calculate their distance from the provided zip and sort by it. (Use the haversine formula for calculating the distance between points on a globe)
If speed is your main concern, you might want to precompute all the distances. Have a table that contains a store zip code column, the other zip code, and a distance column. You can restrict the other zip codes to zip codes within a certain distance (say 100 miles, or what have you) if you need to cut down on rows. If you don't restrict the links based on distance, you'll have a table with > 700 million rows, but you could certainly do fast lookups.

Related

What is the industry standard way to store country / state / city in a database of web APP?

For country and state, there are ISO numbers. With City, there is not.
Method 1:
Store in one column:
[Country ISO]-[State ISO]-[City Name]
Method 2:
Store in 3 separate columns.
Also, how to handle city names if there is no unique identifier?
First and foremost, three separate columns to keep your data. If you want to create a unique identifier, the easiest way would be giving a random 3-10 digit code depending on the size of your data set. However, I would suggest concatenating [country-code]-[state-code]-[code] if you have a small data set and if you want human readability to a certain point. code can be several things. Here are some ideas:
of course a random id or even a database row id
licence plate number/code if there is for a city
phone area code of the city or the code of the center
same logic may apply to zip codes
combination of latitude and longitude of the city center up to certain degree
Here are also more references that can be used:
ISO 3166 is a country codes. In there you can find codes for states or cities depending on the country.
As mentioned IATA has both Airport and City codes list but they are hard to obtain.
UN Location list is a good mention but it can be difficult to gather the levels of differentiation, like the airport code or city code or a borough code can be on the same list, but eventually the UN/LOCODE must be unique. (Airport codes are used for ICAO, similar to IATA but not the same)
there are several data sets out there like OpenTravelData or GeoNames that can be used for start but may require digging and converting. They provide unique codes for locations. And many others can be found.
Bonus:
I would suggest checking Schema.org's City Schema and other Place Schemas for a conscious setup.

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
Image of the above csv:
Now there is a very small country lookup table which has all possible countries the source data can come with, listed. PFB:
I want to have the output data's number of columns always fixed (this is to ensure the reporting/visualization tool doesn't get dynamic number columns with every day's new source data ingestions depending on the varying distinct number of countries present).
So, I've to somehow join the source data with the country_lookup csv and populate all those columns with default value as F. Every country column would be binary with T or F being the possible values.
The original dataset from the above has to be converted into below:
Data (I've kept the Amount field unsolved for column Type having Derived Yield as is, rather than calculating them below for a better understanding and for you to match with the formulae):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
The image of the above expected output data for a structured look at it:
Part 1 -
Part 2 -
Formulae for populating Amount Field for Derived Type:
Derived Amount = Sum of Harvested of all countries with T (True) grouped by Year and Commodity columns divided by Sum of Production of all countries with T (True)grouped by Year and Commodity columns.
So, the target is to have a combination of all the countries from source and calculate the sum of respective Harvested and Production values which then has to be divided. The commodity can be more than one in the actual scenario for any given country, but that should not bother as the summation of amount happens on grouped commodity and year.
Note: The users in the frontend can select any combination of countries. The sole purpose of doing it in the backend rather than dynamically doing it in the frontend is because AWS QuickSight (our visualisation tool), even though can populate sum on selected column filters but doesn't yet support calculation on those derived summed fields. Hence, the entire calculation of all combination of countries has to be pre-populated (very naive approach) in order to make it available in report on dynamic users selection of countries.
Also if you've any better approach (than the above naive approach mentioned in note) to solve this problem, you are most welcome to guide me. I've also posted a question on the same problem without writing my expected approach for experts to show me the path on how we can solve this kind of a problem better than this naive approach. If you want to help solve it with some other technique, you're most welcome, here is the link to that question.
Any help shall be greatly acknowledged.

Advice to create a little script to associate near shipping address in the next 3 weeks

Hi guys I'm searching an advice on how to proceed to create a script that can calculate the distance between all shipping address in the next 3 weeks and if the distance between some of these locations is under 50 km it must send an email to make that thing notice to the operator. I'm starting from sql 2005 to take location and make all combination. Then I was thinking if there is some online service that, with a link structure where I can put the two location name, can retrieve the distance between those two point, then store it in different variable to use to make comparisons. I'll wait for some advice.thank you guys
Your script will probably take in 2 locations as parameters and you can determine the distance using Google Maps Matrix to get your distances from one place to another. Using that info you should be able to send an email using the script.
I'm not really sure what a database has to do with this so I recommend being more specific on your SQL implementation.

search Big data table

I have a table with 10 million records. Each record indicates one person. Each record has person_id, latitude, longitude, postal-code. I want to pick one query and tell how many other people in 10 miles radius (Distance can be calculated from Latitudes and Longitudes). Searching 10 million records and calculating distance to check if inside 10 million is not a good way. So, I will search only in neighboring postal codes(I will get it somehow). How can I search entry having specific postal code(not all 10 million records)?
Why not take lat/long and create a box extending 10 miles in all four directions first?
Then issue a query looking for people with lat/long in that box. Use a WHERE that does
x > xLess10 and x < xPlus10 and y > yLess10 and y < yPlus10
Now you have a smaller list and you can calculate the actual distance with something similar to sqrt((x1 - x2)^2 + (y1 - y2)^2) for that smaller list. But it has to work on a sphere, not a grid marked off in miles.
You can try adding a and zip in (555555, 555556, etc) to see if that runs faster or not. A precomputed list of all other zip codes with locations within 10 miles of anywhere within a zip code would be pretty easy to set up in another table.
#Randy made a comment that made me realize that this doesn't work very well for locations within 10 miles of the north and south poles. Maybe that doesn't matter because the population is pretty small up there. Or use another method of just getting everyone within a cirle around the pole and 10 miles south (or north) or the x,y location.
Also, you have to find a way to convert from lat/long to miles. The longitudinal lines get closer together the farther you are from the equator.

Get geo-locations from cities

I am trying to display a list of closest restaurants of a given city within a radius. In order to do this, I'll have to convert the city to a longitude/latitude.
When the user fills in his/her restaurant information, he/she will fill in the address. Based on that address i need to get a longitude/latitude and save that to the database.
I don't need to point this on a map. I'll be just displaying a list of all closest restaurants within the given radius to the user.
How can I do this with ASP.NET-MVC4?
Also, this project is based on Code First. And for address I have DbGeography as datatype set.
The project is kind of based on this tutorial
Edit
To avoid anyone else telling me this is not possible with ASP.NET on it's own, I am aware of that. an example with Google maps / .NET wrapper would be great.
This query itself is typically done in the database back-end because the program is the consumer of the data not the keeper of the data. You will have a table of restaurants, possibly for many cities, each restaurant having a latitude and longitude. You create a geospatial index on the table, which performs a tesselation for performance; you configure the tesselation parameters. You must determine the latitude/longitude of your restaurant-seeker's location. Your command object would feed that data to a stored procedure, along with the desired radius. The stored procedure would return an enumerable set of rows to your client program.
I found this .NET wrapper which fulfills my need.