How to query a supplementary relationship in a graph? - cypher

I have a graph database where every node has an intersects relationship with each node it intersects. The intersects relationship also has a property degrees (the angle at which the two nodes intersect).
I want to order the nodes which intersect a given node at a specified angle.
Since the two angles at an intersection are supplementary, I save only one relationship between the two nodes (for example, (a)-[:intersects]->(b) and (b)-[:intersects]->(a) describe the same intersection, but the angle differs between the two directions, and the two angles are supplementary). For example, if I need all cities intersecting city A at a 45 degree angle, the two queries are:
GRAPH.QUERY Nearby "MATCH (a:City { name: 'A' })-[t:intersects]->(b:City) RETURN b.name, t.degrees ORDER BY abs(t.degrees - 45)"
GRAPH.QUERY Nearby "MATCH (a:City { name: 'A' })<-[t:intersects]-(b:City) RETURN b.name, t.degrees ORDER BY abs(abs(t.degrees - 180) - 45)"
I want to combine these two queries into one (and avoid sorting in the client library). Is it possible to do this in RedisGraph? I was going to try (a)-[:intersects]-(b) with a CASE...WHEN, but it seems support for getting the relationship's source node is not available yet.
Any help would be really appreciated.

Consider the following:
MATCH (a:City {name: 'A'})-[t0:intersects]->(b:City)
WITH a, collect({x:b, angle:abs(t0.degrees - 45)}) AS a_to_b
MATCH (a)<-[t1:intersects]-(c:City)
WITH a_to_b, collect({x:c, angle:abs(abs(t1.degrees - 180) - 45)}) AS c_to_a
WITH a_to_b + c_to_a AS intersections
UNWIND intersections AS city
RETURN city.x ORDER BY city.angle
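If it helps to sanity-check the angle arithmetic outside the database, here is a small Python sketch (city names and angles are made up) of the combined scoring: an outgoing edge is scored with the stored angle, an incoming edge with abs(abs(degrees - 180) - target), i.e. its supplement.

```python
# Edges are stored once, as (src, dst, degrees); the angle seen from the
# destination side is the supplement (180 - degrees).
edges = [  # hypothetical data
    ("A", "B", 30.0),
    ("C", "A", 140.0),  # so seen from A, the angle is 40 degrees
    ("A", "D", 45.0),
]

def score(city, edges, target):
    """Rank neighbours of `city` by how far their intersection angle,
    seen from `city`, is from `target` - whichever way the edge is stored."""
    out = []
    for src, dst, deg in edges:
        if src == city:                       # outgoing: use the angle as-is
            out.append((dst, abs(deg - target)))
        elif dst == city:                     # incoming: use the supplement
            out.append((src, abs(abs(deg - 180.0) - target)))
    return sorted(out, key=lambda t: t[1])

print(score("A", edges, 45.0))
```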

Related

How to find distance between two points using latitude and longitude

I have a ROUTES table which has columns SOURCE_AIRPORT and DESTINATION_AIRPORT and describes a particular route that an airplane would take to get from one to the other.
I have an AIRPORTS table which has columns LATITUDE and LONGITUDE describing an airport's geographic position.
I can join the two tables on the columns they share: SOURCE_AIRPORT_ID and DESTINATION_AIRPORT_ID in the routes table, and IATA in the airports table (a 3-letter airport code such as LHR for London Heathrow).
My question is, how can I write an SQL query using all of this information to find, for example, the longest route out of a particular airport such as LHR?
I believe I have to join the two tables, and for every row in the routes table where the source airport is LHR, look at the destination airport's latitude and longitude, calculate how far away that is from LHR, save that as a field called "distance", and then order the data by the highest distance first. But in terms of SQL syntax I'm at a loss.
This would probably have been a better question for the Mathematics Stack Exchange, but I'll provide some insight here. If you are relatively familiar with trigonometry, I'm sure you can understand the implementation given this resource: https://en.m.wikipedia.org/wiki/Haversine_formula. You are looking to compute the distance between two points on the surface of a sphere, measured along the surface (not a straight line; you can't travel through the Earth).
The page gives the haversine formula:

d = 2r · arcsin( √( sin²((φ2 − φ1) / 2) + cos(φ1) · cos(φ2) · sin²((λ2 − λ1) / 2) ) )

Where
• r is the radius of the sphere (for the Earth, roughly 6371 km),
• φ1, φ2 are the latitudes of point 1 and point 2 (in radians),
• λ1, λ2 are the longitudes of point 1 and point 2 (in radians).

If your data is in degrees, you can simply convert to radians by multiplying by π/180.
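For reference, the whole formula fits in a few lines of Python. This is only a sketch: the 6371 km Earth radius is the usual spherical approximation, and the airport coordinates below are illustrative.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km (spherical approximation)
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Example: London Heathrow (LHR) to New York JFK, roughly 5500-5600 km
print(round(haversine_km(51.4700, -0.4543, 40.6413, -73.7781)))
```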
There is a formula called the great-circle distance for calculating the distance between two points; you can probably load it as a library for your operating system. Forget the haversine: our planet is not a perfect sphere.
If you use this value often, save it in your routes table.
I think you're about 90% of the way there in terms of method. I'll add some detail on a potential SQL query to get your answer. There are two steps to calculating the distances. Step 1 is to create a table joining the ROUTES table to the AIRPORTS table, picking up the latitude/longitude of both the SOURCE_AIRPORT and the DESTINATION_AIRPORT on each route. That might look something like this:
SELECT t1.*, CONVERT(FLOAT, t2.LATITUDE) AS SOURCE_LAT, CONVERT(FLOAT, t2.LONGITUDE) AS SOURCE_LONG, CONVERT(FLOAT, t3.LATITUDE) AS DEST_LAT, CONVERT(FLOAT, t3.LONGITUDE) AS DEST_LONG, 0.00 AS DISTANCE_CALC
INTO ROUTE_CALCULATIONS
FROM ROUTES t1 LEFT OUTER JOIN AIRPORTS t2 ON t1.SOURCE_AIRPORT_ID = t2.IATA
LEFT OUTER JOIN AIRPORTS t3 ON t1.DESTINATION_AIRPORT_ID = t3.IATA;
The resulting output should create a new table titled ROUTE_CALCULATIONS made up of all the ROUTES columns, the longitude/latitude for both the SOURCE and DESTINATION airports, and a placeholder DISTANCE_CALC column with a value of 0.
Step 2 is calculating the distance. This should be a relatively straightforward calculation and update.
UPDATE ROUTE_CALCULATIONS
SET DISTANCE_CALC = 2 * 3961 * ASIN(SQRT(POWER(SIN(RADIANS((DEST_LAT - SOURCE_LAT) / 2)), 2) + COS(RADIANS(SOURCE_LAT)) * COS(RADIANS(DEST_LAT)) * POWER(SIN(RADIANS((DEST_LONG - SOURCE_LONG) / 2)), 2)))
And that should give the calculated distance in the DISTANCE_CALC column of ROUTE_CALCULATIONS for all routes in the data. From there you should be able to do whatever distance-related route analysis you want.

This is the query I am trying to run in Neo4j, but it takes too long to run

I am trying to run this query using Neo4j, but it takes too long (more than 30 minutes for almost 2500 nodes and 1.8 million relationships):
Match (a:Art)-[r1]->(b:Art)
with collect({start:a.url,end:b.url,score:r1.ed_sc}) as row1
MATCH (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)
Where a.url<>c.url
with row1 + collect({start:a.url,end:c.url,score:r1.ed_sc*r2.ed_sc}) as row2
Match (a:Art)-[r1]->(b:Art)-[r2]->(c:Art)-[r3]->(d:Art)
WHERE a.url<>c.url and b.url<>d.url and a.url<>d.url
with row2+collect({start:a.url,end:d.url,score:r1.ed_sc*r2.ed_sc*r3.ed_sc}) as allRows
unwind allRows as row
RETURN row.start AS start, row.end AS end, sum(row.score) AS final_score LIMIT 10;
Here :Art is the label, with 2500 nodes, and there are bidirectional relationships between these nodes which have a property called ed_sc. So basically I am trying to find the score between two nodes by traversing one-, two- and three-hop paths, and then sum these scores.
Is there a more optimized way to do this?
For one, I'd discourage the use of bidirectional relationships. If your graph is densely connected, this kind of modeling will play havoc with most queries like this.
Assuming url is unique for each :Art node, it would be better to compare the nodes themselves rather than their properties.
We should also be able to use variable-length relationships in place of your current approach:
MATCH p = (start:Art)-[*..3]->(end:Art)
WHERE all(node in nodes(p) WHERE single(t in nodes(p) where node = t))
WITH start, end, reduce(score = 1, rel in relationships(p) | score * rel.ed_sc) as score
WITH start, end, sum(score) as final_score
LIMIT 10
RETURN start.url as start, end.url as end, final_score
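Outside the database, the intended computation (for each (start, end) pair, summing the product of ed_sc over every simple path of length 1 to 3) can be sketched in Python over a small, made-up adjacency structure:

```python
# Hypothetical directed graph: adjacency[u] = list of (v, ed_sc) edges.
adjacency = {
    "a": [("b", 0.5)],
    "b": [("c", 0.4)],
    "c": [("d", 0.2)],
    "d": [],
}

def path_scores(adjacency, max_len=3):
    """Sum, over all simple paths of length 1..max_len, of the product of
    edge scores, keyed by (start, end) node pair."""
    totals = {}

    def walk(start, node, score, visited, depth):
        if depth > 0:  # record every path endpoint reached so far
            totals[(start, node)] = totals.get((start, node), 0.0) + score
        if depth == max_len:
            return
        for nxt, ed_sc in adjacency.get(node, []):
            if nxt not in visited:  # simple paths only, no repeated nodes
                walk(start, nxt, score * ed_sc, visited | {nxt}, depth + 1)

    for s in adjacency:
        walk(s, s, 1.0, {s}, 0)
    return totals

print(path_scores(adjacency))
```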

How to concatenate the result sets of multiple queries in OrientDB without using UNIONALL

I have a student node which has the following edges:
Student - HasLocation - Location (HasLocation Edge from Student node to Location node)
Student-HasSchool-School
Student - HasUniversity - University
Student - HasInterest - Interest
Student - LookingFor - Type (Full Time or Internship)
Similarly, we have Companies providing internships which have the following edges:
Company - HasLocation - Location
Company - HasInterest - Interest
Company - HasType - Type (Full Time or Internship)
Now I want to write a query that matches all the conditions (i.e. the location, type, interest, etc.), and then one that drops the type condition and matches on location, interest, etc.
As of now I have written separate queries for them, taken a UNIONALL of the results, and then taken the distinct of them. Because of the time this takes, I have not been able to apply pagination either.
Is there any better way to achieve this in OrientDB?
Note: I am not using the approach of assigning a temp variable the number of matching conditions and ordering by it, as that would take an immense amount of time, traversing each and every node to assign the temp value.

geolocating self join too slow

I am trying to get the count of all records within 50 miles of each record in a huge table (1m+ records), using a self join as shown below:
proc sql;
create table lab as
select distinct a.id, sum(case when b.value = "New York" then 1 else 0 end) as ny_count
from latlon a, latlon b
where a.id <> b.id
and geodist(a.lat, a.lon, b.lat, b.lon, "M") <= 50
and a.state = b.state
group by a.id;
quit;
This ran for 6 hours and was still running when I last checked.
Is there a way to do this more efficiently?
UPDATE: My intention is to get the number of New Yorkers within a 50-mile radius of every record in table latlon, which has name, location, and latitude/longitude. The lat/lon could be anywhere in the world, but the location will be a person's hometown. I have to do this for close to a dozen towns. Looks like this is the best it could get; I may have to write C code for this one, I guess.
The geodist() function you're using has no chance of exploiting any index, so you have an algorithm that's O(n**2) at best. That's gonna be slow.
You can take advantage of a simple fact of spherical geometry, though, to get an indexable query. A degree of latitude (north-south) is equivalent to sixty nautical miles, 69 statute miles, or 111.111 km. The British nautical mile was originally defined as one minute of latitude. The original Napoleonic meter was defined as one part in ten million of the distance from the equator to the pole, an arc of 90 degrees.
(These definitions depend on the assumption that the earth is spherical. It isn't, quite. If you're a civil engineer, these definitions break down. If you use them to design a parking lot, it will have some nasty puddles in it when it rains, and will encroach on the neighbors' property.)
So, what you want is to use a bounding range. Assuming your latitude values a.lat and b.lat are in degrees, two of them are certainly more than fifty statute miles apart unless
a.lat BETWEEN b.lat - 50.0/69.0 AND b.lat + 50.0/69.0
Let's refactor your query. (I don't understand the case stuff about New York so I'm ignoring it. You can add it back.) This will give the IDs of all pairs of places lying within 50 miles of each other. (I'm using the 21st century JOIN syntax here).
select distinct a.id, b.id
from latlon a
JOIN latlon b ON a.id<>b.id
AND a.lat BETWEEN b.lat - 50.0/69.0 AND b.lat + 50.0/69.0
AND a.state = b.state
AND geodist(a.lat,a.lon,b.lat,b.lon,"M") <= 50
Try creating an index on the table on the lat column. That should help performance a LOT.
Then try creating a compound index on (state, lat, id, lon, value). Try those columns in the compound index in different orders if you don't get satisfactory performance acceleration. It's called a covering index, because some of its columns (the first two in this case) are used for quick lookups and the rest provide values that would otherwise have to be fetched from the main table.
Your question is phrased ambiguously - I'm interpreting it as "give me all (A, B) city pairs within 50 miles of each other." The NYC special case seems to be for a one-off test - the problem is not to (trivially, in O(n) time) find all cities within 50 miles of NYC.
Rather than computing Great Circle distances, find Manhattan distances instead, using simple addition and simple bounding boxes. Given (A, B) city tuples with Manhattan distance under 50·√2 ≈ 71 miles, it is straightforward to prune out the few (on diagonals) whose Great Circle (or Euclidean) distance exceeds 50 miles.
You didn't show us EXPLAIN output describing the backend optimizer's plan.
You didn't tell us about indexes on the latlon table.
I'm not familiar with the SAS RDBMS. Oracle, MySQL, and others have geospatial extensions to support multi-dimensional indexing. Essentially, they merge high-order coordinate bits, down to low-order coordinate bits, to construct a quadtree index. The technique could prove beneficial to your query.
Your DISTINCT keyword will make a big difference for the query plan. Often it will force a tablescan and a filesort. Consider deleting it.
The equijoin on state seems wrong, but maybe you don't care about the tri-state metropolitan area and similar densely populated regions near state borders.
You definitely want the WHERE clause to prune out b rows that are more than 50 miles from the current a row:
too far north, OR
too far south, OR
too far west, OR
too far east
Each of those conditionals boils down to a simple range query that the RDBMS backend can evaluate and optimize against an index. Unfortunately, if it chooses the latitude index, any longitude index that's on disk will be ignored, and vice versa. Which motivates using your vendor's geospatial support.
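A sketch of that bounding-box prefilter in Python (using the approximate 69 statute miles per degree of latitude from above; the longitude window has to widen by 1/cos(latitude) away from the equator):

```python
from math import radians, cos

MILES_PER_DEG_LAT = 69.0  # approximate statute miles per degree of latitude

def bounding_box(lat, lon, miles):
    """Return (south, north, west, east) limits; a point outside this box is
    (approximately) certain to be more than `miles` away. Degenerates near
    the poles, where cos(lat) approaches zero."""
    dlat = miles / MILES_PER_DEG_LAT
    dlon = miles / (MILES_PER_DEG_LAT * cos(radians(lat)))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)

# Only rows inside the box need the exact geodist() check.
south, north, west, east = bounding_box(40.7, -74.0, 50)  # roughly NYC
print(south, north, west, east)
```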

find number of polygons a line intersects

I am working with SQL Server and spatial types, and have found them very interesting. I have reached a situation I am not sure how to work out. Let me outline it:
I have 2 points (coordinates), e.g. (3, 9) and (50, 20); joining the two gives a straight line.
I have multiple polygons (50 +).
I want to be able to calculate how many polygons the above line passes through. What I mean by "pass through": when I join the 2 coordinates, how many polygons does the line intersect? I want to work this out with a SQL query.
Please let me know if not clear - it is difficult to explain!
Based on your coordinates, I'm assuming geometry (as opposed to geography), but the approach should hold regardless. If you have a table dbo.Shapes that has a Shape column of type geometry and each row contains one polygon, this should do:
declare @line geometry = geometry::STLineFromText('LINESTRING(3 9, 50 20)', 0);
select count(*)
from dbo.Shapes as s
where s.shape.STIntersects(@line) = 1;
If you want to know which polygons intersect, just change the count(*) to something more appropriate.
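For what it's worth, the same count can be reproduced outside the database with a standard segment-intersection test plus a point-in-polygon check (for the case where the line lies entirely inside a polygon). The polygons below are made up; this is a sketch, not a replacement for STIntersects:

```python
def ccw(a, b, c):
    """Twice the signed area of triangle abc (> 0 for counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross (touching at an
    endpoint is not counted in this sketch)."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return (d1 > 0) != (d2 > 0) and (d3 > 0) != (d4 > 0)

def point_in_polygon(pt, poly):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > pt[1]) != (y2 > pt[1]):
            xcross = x1 + (pt[1] - y1) * (x2 - x1) / (y2 - y1)
            if pt[0] < xcross:
                inside = not inside
    return inside

def line_hits_polygon(p1, p2, poly):
    """True if segment p1-p2 crosses any polygon edge, or lies inside it."""
    n = len(poly)
    if any(segments_cross(p1, p2, poly[i], poly[(i + 1) % n]) for i in range(n)):
        return True
    return point_in_polygon(p1, poly)

# Toy data: the line from (3, 9) to (50, 20) against two made-up polygons.
line = ((3, 9), (50, 20))
polygons = [
    [(10, 0), (20, 0), (20, 30), (10, 30)],   # vertical band: crossed
    [(0, 100), (5, 100), (5, 105)],           # far away: not crossed
]
print(sum(line_hits_polygon(*line, poly) for poly in polygons))
```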