search Big data table - sql

I have a table with 10 million records. Each record represents one person and has person_id, latitude, longitude, and postal-code. I want to pick one person and find how many other people are within a 10-mile radius (distance can be calculated from the latitudes and longitudes). Scanning all 10 million records and calculating the distance for each one to check whether it is inside 10 miles is not a good approach. So I will search only in neighboring postal codes (I will get those somehow). How can I search only the entries with specific postal codes, rather than all 10 million records?

Why not take lat/long and create a box extending 10 miles in all four directions first?
Then issue a query looking for people with lat/long in that box. Use a WHERE that does
x > xLess10 and x < xPlus10 and y > yLess10 and y < yPlus10
Now you have a smaller list and you can calculate the actual distance with something similar to sqrt((x1 - x2)^2 + (y1 - y2)^2) for that smaller list. But it has to work on a sphere, not a grid marked off in miles.
You can try adding AND zip IN (555555, 555556, etc.) to the WHERE clause to see whether that runs faster. A precomputed list of all other zip codes with locations within 10 miles of anywhere within a zip code would be pretty easy to set up in another table.
@Randy made a comment that made me realize that this doesn't work very well for locations within 10 miles of the north and south poles. Maybe that doesn't matter because the population is pretty small up there. Otherwise, use another method there: just get everyone within a circle around the pole that extends 10 miles south (or north) of the x,y location.
Also, you have to find a way to convert from lat/long to miles. The longitudinal lines get closer together the farther you are from the equator.
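The box-then-distance idea above can be sketched in Python. This is only an illustration under stated assumptions: the rows are hypothetical in-memory (person_id, latitude, longitude) tuples standing in for the table, the Earth is treated as a sphere with a mean radius of 3958.8 miles, and the function names are made up for the example. The box computation is not valid near the poles or across the 180th meridian.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical approximation)


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Assumes the circle does not touch a pole or cross the 180th meridian.
    """
    angular = radius_miles / EARTH_RADIUS_MILES  # radius as an angle in radians
    dlat = math.degrees(angular)
    # Longitude lines converge away from the equator, so the same number of
    # miles spans more degrees of longitude at higher latitudes.
    dlon = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def people_within(people, lat, lon, radius_miles=10.0):
    """Return ids of people within radius_miles of (lat, lon)."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_miles)
    # Cheap box filter first: this is the part the WHERE clause would do.
    candidates = [
        p for p in people
        if min_lat <= p[1] <= max_lat and min_lon <= p[2] <= max_lon
    ]
    # Exact distance only for the short list that survived the box filter.
    return [
        p[0] for p in candidates
        if haversine_miles(lat, lon, p[1], p[2]) <= radius_miles
    ]
```

In a database, the four box limits would be passed as parameters to the range conditions, and an index on the latitude and longitude columns would let the engine skip most of the 10 million rows before any distance is computed.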

Related

GPS creating anchor point by trip mode and time

I was hoping for some help:
I have 200 participants, each with their own GPS file. Each GPS file has thousands of points, one for every 15-second epoch. I have already cleaned the data through PALMS and have brought the file into ArcMap. I would like to merge multiple points that share the same location and trip mode type (e.g., stationary, location number 1, see table attached). The main difference is that the merging needs to be sensitive to time (e.g., there might be 4 points at location 1 because that person visited the location on 4 separate occasions). I don't have Tracking Analyst, so any help would be much appreciated! The end goal is to create 1 km anchor points based on places where the track stayed in one location for a large amount of time (e.g., a minimum of two hours, at least three times per week). Could someone suggest how I might do this?
Thanking you.
Erika

Can you graph a time-series with continuously changing entries?

How do I graphically represent a time-series where entries of this graph change over time?
For example I have a database of cities and their corresponding average temperature per day. I want a graphical representation of the ten hottest cities per day, and how they change over time. Do new cities appear on this list? Which cities drop out of this list?
Normally 6/10 of these cities will always be on the “top ten hottest” list, but sometimes a particular entry may spike up and join the top ten list. Is there a way to analyze the top ten list and compare it over time?
I’m having trouble thinking of a way to graph this because of the varying entries.
Your x-axis is day, but what's on the y-axis? Temperature? If so, you can have a different series (may be called something else depending on your charting package) for each city, and just add points to the series when it is one of the top ten for that day. This may require you to do some pre-processing on your data, in order to figure out which set of cities makes the top-ten list over your time frame.
In one of our widget implementations we have a setting for the time chart to display only top-n series, ranked by avg or last value. This was done to remove clutter by hiding too many unimportant series from the chart.
In your case why not show a bar for each period where the bar would contain top-N series for the period and a grey area for the remainder?
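The pre-processing step mentioned above (working out which cities make the top-ten list on each day, and which ones enter or leave it) might look roughly like this in Python. The input shape, a list of (day, city, temperature) tuples with sortable day strings, and the function names are assumptions made for the sketch.

```python
from collections import defaultdict


def top_n_by_day(readings, n=10):
    """Map each day to its n hottest cities, hottest first.

    readings: iterable of (day, city, temperature) tuples (assumed shape).
    Ties in temperature are broken alphabetically by city name.
    """
    by_day = defaultdict(list)
    for day, city, temp in readings:
        by_day[day].append((temp, city))
    return {
        day: [city for temp, city in sorted(rows, key=lambda r: (-r[0], r[1]))[:n]]
        for day, rows in sorted(by_day.items())
    }


def list_changes(top_by_day):
    """Yield (day, entered, dropped) comparing each day with the previous one."""
    previous = None
    for day, cities in top_by_day.items():
        current = set(cities)
        if previous is not None:
            yield day, sorted(current - previous), sorted(previous - current)
        previous = current
```

Each city that appears in any day's list can then become its own series in the chart, with points only on the days it made the list.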

Ordering a list SQL or Excel

I have a simple task to do at work almost daily and I think it can be done easily with some help. There is a table with a single column "PC name". I have to divide the list of PCs into waves.
Wave 1: 2%
Wave 2: 3%
Wave 3: 25%
Wave 4: 45%
Wave 5: 25%
So what I usually do is copy the list of PCs into Excel and add a column named "wave assign". For example, if the list is 100 PCs, the first two PCs will be assigned to Wave 1, three PCs to Wave 2, 25 PCs to Wave 3 and so on.
I need a way to automate this since it takes me too long to do it manually. It doesn't matter if there is a small change in the % in order to round up the number of PCs in each wave.
Assuming the list is in ColumnA starting in Row1:
=VLOOKUP(ROWS(A$1:A1)/COUNTA(A:A),wArray,2)
in Row1 and copied down should work, provided a lookup array of the following kind is created:
and named wArray.
In case the list is shorter than 100, I have added .002 to the 'logical' breakpoints (cumulative proportions) so that it is not the minority waves that get rounded down. Otherwise, at say 50 items, Wave 1 would not feature at all, and its absence would stand out rather more than an approximation in a larger group.
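For anyone who prefers a script, the same cumulative-breakpoint lookup can be sketched in Python. The breakpoint values here are an assumption reconstructed from the description above (cumulative proportions plus .002), not a copy of the wArray table, and the function names are made up for the example.

```python
from bisect import bisect_right

WAVE_SHARES = [0.02, 0.03, 0.25, 0.45, 0.25]  # Wave 1..5 from the question
NUDGE = 0.002  # assumed offset, following the ".002" described above


def wave_breakpoints(shares, nudge=NUDGE):
    """Lower bound of each wave: 0 for Wave 1, then cumulative share + nudge."""
    points = [0.0]
    total = 0.0
    for share in shares[:-1]:
        total += share
        points.append(round(total + nudge, 6))
    return points


def assign_waves(pc_names, shares=WAVE_SHARES):
    """Return (name, wave_number) pairs in list order.

    Mirrors an approximate-match lookup: each PC's position divided by the
    list length is matched to the largest breakpoint not exceeding it.
    """
    points = wave_breakpoints(shares)
    count = len(pc_names)
    return [
        (name, bisect_right(points, position / count))
        for position, name in enumerate(pc_names, start=1)
    ]
```

With 100 names this gives 2, 3, 25, 45 and 25 PCs per wave; with 50 names the first PC still lands in Wave 1 because of the offset.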

Ordering Dimension Hierarchy SSAS

I'm quite new to SSAS so bear with me!
I have created a snowflake schema with Members in the fact table, and I have created a distance-from-club table with DistanceID, Distance, DistanceRange. This is denormalised in SQL Server, with the distance range appearing multiple times, once per distance: e.g. Distance 1 has a range of 1 - 10 and Distance 2 also has a range of 1 - 10.
I have then created a hierarchy with Distance Range at the top and Distance beneath it. This works OK in terms of providing drill-down functionality, but the ordering is wrong for Distance Range. It is ordering the ranges as strings, so I get 1-10 followed by 100-110 and then 20-30.
How do I tell the Distance Range to order by Distance ID?
Not sure if I'm doing it right.
When you are editing your Dimension, click on the attribute DistanceRange; in the properties, there should be options for 'OrderBy' and 'OrderByAttribute'. Try using those to get the result you need. Otherwise, you might want to try changing the 'Type' in the properties menu and see if that works.

Searching by Zip Code proximity - MySql

I'm having some trouble getting a search by zip code proximity query working. I've searched and searched Google, but everything I find is either way too slow or I can't get it working. Here's the issue:
I have a database with a table with all the US Zip Codes (~70,500 of them), and I have a table of several thousand stores (~10,000+), which includes their zip code. I need to be able to provide a zip code and return a list of the closest stores to that zip code, sorted by distance.
Can anyone point me to a good resource for this that they have used and can handle this much load, or share a query they've used that works and is fairly quick about it? It would be MUCH appreciated. Thanks!
You should build a table that has each zip code with an associated latitude and longitude. When someone enters a zip and a distance, you calculate the range of latitudes and longitudes that fall within it, and then select all the zip codes that fall within that bounding box. Then you select any stores that have zip codes within that set, and calculate their distance from the provided zip and sort by it. (Use the haversine formula for calculating the distance between points on a globe)
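A rough sketch of that flow, using Python's sqlite3 module so it can be run without a MySQL server. The table and column names (zip_codes, stores, zip, lat, lon, name) are hypothetical, the Earth radius is an assumed mean value, and the bounding box is not valid near the poles or across the 180th meridian.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

# Hypothetical schema for the sketch; the real tables will differ.
SCHEMA = """
CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, lat REAL, lon REAL);
CREATE TABLE stores (name TEXT, zip TEXT);
"""


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_stores(conn, zip_code, radius_miles):
    """Return [(store_name, miles)] within radius_miles, closest first."""
    row = conn.execute(
        "SELECT lat, lon FROM zip_codes WHERE zip = ?", (zip_code,)
    ).fetchone()
    if row is None:
        return []  # unknown zip code
    lat, lon = row
    angular = radius_miles / EARTH_RADIUS_MILES
    dlat = math.degrees(angular)
    dlon = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    # Step 1: let the database narrow things down to the bounding box.
    rows = conn.execute(
        """
        SELECT s.name, z.lat, z.lon
        FROM stores AS s
        JOIN zip_codes AS z ON z.zip = s.zip
        WHERE z.lat BETWEEN ? AND ?
          AND z.lon BETWEEN ? AND ?
        """,
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    ).fetchall()
    # Step 2: exact haversine distance for the few rows left, then sort.
    scored = [(name, haversine_miles(lat, lon, zlat, zlon)) for name, zlat, zlon in rows]
    return sorted((r for r in scored if r[1] <= radius_miles), key=lambda r: r[1])
```

The same two-step shape should carry over to MySQL: the range conditions can use an index on the latitude and longitude columns, and the distance calculation only touches the rows inside the box.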
If speed is your main concern, you might want to precompute all the distances. Have a table that contains a store zip code column, the other zip code, and a distance column. You can restrict the other zip codes to zip codes within a certain distance (say 100 miles, or what have you) if you need to cut down on rows. If you don't restrict the links based on distance, you'll have a table with > 700 million rows, but you could certainly do fast lookups.
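The precomputed-distance table could be built along these lines. Again this uses sqlite3 for a runnable sketch; the table name zip_distance, its columns, and the shape of the inputs (a dict of zip code to coordinates and a list of store zip codes) are assumptions made for illustration.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def build_distance_table(conn, zip_coords, store_zips, max_miles=100.0):
    """Precompute (store_zip, other_zip, miles) for pairs within max_miles.

    zip_coords: {zip: (lat, lon)}; store_zips: zip codes that have a store.
    """
    conn.execute(
        """
        CREATE TABLE zip_distance (
            store_zip TEXT,
            other_zip TEXT,
            miles REAL,
            PRIMARY KEY (other_zip, store_zip)
        )
        """
    )
    rows = []
    for store_zip in sorted(set(store_zips)):
        slat, slon = zip_coords[store_zip]
        for other_zip, (olat, olon) in zip_coords.items():
            miles = haversine_miles(slat, slon, olat, olon)
            if miles <= max_miles:  # the cutoff keeps the table small
                rows.append((store_zip, other_zip, miles))
    conn.executemany("INSERT INTO zip_distance VALUES (?, ?, ?)", rows)


def store_zips_near(conn, zip_code, radius_miles):
    """Look up store zip codes near zip_code without any trigonometry."""
    return conn.execute(
        "SELECT store_zip, miles FROM zip_distance "
        "WHERE other_zip = ? AND miles <= ? ORDER BY miles",
        (zip_code, radius_miles),
    ).fetchall()
```

Putting other_zip first in the primary key means a search by the customer's zip code is a single index range scan; joining the result to the stores table on store_zip gives the sorted store list.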