Creating a Hive view


I have a Hive UDF named find_distance which calculates the coordinate distance between a pair of lat-long coordinates.
I also have a table containing a list of city names and their respective lat-long coordinates.
So currently if I need to find the distance between two cities, say Denver and San Jose, I need to perform a self join:
Select find_Distance(cityA.latitude, cityA.longitude, cityB.latitude, cityB.longitude) from
(select latitude, longitude from city_data.map_info where city = 'Denver') cityA
join
(select latitude, longitude from city_data.map_info where city = 'San Jose') cityB;
How would I go about building a view that would accept just the city names as parameters? So in effect I could just use
SELECT distance from city_distance where cityA = 'Denver' and cityB = 'San Jose'

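Hive views cannot take parameters, but you can expose both city names as columns (here city_from and city_to) and filter on them in the WHERE clause. Try this view: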
CREATE VIEW city_distance AS
SELECT
    cityA.city AS city_from,
    cityB.city AS city_to,
    find_Distance(cityA.latitude, cityA.longitude, cityB.latitude, cityB.longitude) AS distance
FROM
    (SELECT city, latitude, longitude FROM city_data.map_info) cityA
CROSS JOIN
    (SELECT city, latitude, longitude FROM city_data.map_info) cityB;
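To see what such a view returns, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Hive. The find_distance function below is a hypothetical haversine implementation registered as a SQL function (the real UDF is not shown in the question), the table is called map_info without the city_data schema, and the coordinates are approximate, illustrative values.

```python
import math
import sqlite3


def find_distance(lat1, lon1, lat2, lon2):
    """Hypothetical stand-in for the Hive UDF: great-circle distance in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


con = sqlite3.connect(":memory:")
con.create_function("find_distance", 4, find_distance)
con.execute("CREATE TABLE map_info (city TEXT, latitude REAL, longitude REAL)")
con.executemany(
    "INSERT INTO map_info VALUES (?, ?, ?)",
    [("Denver", 39.74, -104.99), ("San Jose", 37.34, -121.89)],  # approximate values
)

# Same shape as the Hive view: every ordered pair of cities with its distance.
con.execute(
    """
    CREATE VIEW city_distance AS
    SELECT a.city AS city_from,
           b.city AS city_to,
           find_distance(a.latitude, a.longitude, b.latitude, b.longitude) AS distance
    FROM map_info a CROSS JOIN map_info b
    """
)

# The "parameters" are just filters on the two city columns.
rows = con.execute(
    "SELECT distance FROM city_distance WHERE city_from = ? AND city_to = ?",
    ("Denver", "San Jose"),
).fetchall()
pair_count = con.execute("SELECT COUNT(*) FROM city_distance").fetchone()[0]
print(rows, pair_count)
```

Note that the cross join produces one row per ordered pair of cities, so on a large table you would normally always filter on both city columns when querying the view.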

Related

PostGIS - Assign value based on ST_Intersects and ST_Area

I have two data sets, US County boundaries (tiger.counties2020) and parcel data (temp.parcel). I would like to assign a county geoid to the parcel data based on a majority of the parcel polygon being within a county boundary. In some cases these parcel polygons overlap multiple counties, but I would like to assign the geoid to the county that the parcel has most of its area in.
UPDATE
temp.parcel
SET
geoid_majority = upd.geoid
FROM
tiger.counties2020 upd
WHERE
upd.geoid = '00000'
AND ST_Intersects(temp.parcel.geom_polygon, upd.geom) = TRUE
AND 51 <= 100 *(
ST_Area(
ST_Intersection(temp.parcel.geom_polygon, upd.geom)
) / LEAST(
ST_Area(temp.parcel.geom_polygon),
ST_Area(upd.geom)
)
)
AND geoid_majority IS NULL;
This query has yielded the most results so far; however, I still have parcels with a NULL value for geoid_majority.
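One possible reason for leftover NULLs, offered as an assumption rather than a diagnosis of this data: a parcel split across three or more counties may not have 51% of its area in any single one. The sketch below uses axis-aligned boxes as hypothetical stand-ins for the real polygons (the geoids and coordinates are made up) and picks the county with the largest overlap instead of applying a fixed threshold.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def majority_geoid(parcel: Box, counties: Dict[str, Box]) -> Optional[str]:
    """Return the geoid whose box shares the largest area with the parcel."""
    best_geoid, best_area = None, 0.0
    for geoid, county in counties.items():
        area = overlap_area(parcel, county)
        if area > best_area:
            best_geoid, best_area = geoid, area
    return best_geoid


# Hypothetical counties meeting near the point (10, 10).
counties = {
    "08001": (0, 0, 10, 10),
    "08005": (10, 0, 20, 10),
    "08031": (0, 10, 10, 20),
}
parcel = (8, 8, 11, 11)  # area 9, spread over all three counties

parcel_area = (parcel[2] - parcel[0]) * (parcel[3] - parcel[1])
largest_share = overlap_area(parcel, counties["08001"]) / parcel_area
assigned = majority_geoid(parcel, counties)
print(assigned, round(largest_share, 2))
```

Here the largest share is 4 of 9 area units, which is below 51%, so a threshold test leaves the parcel unassigned while the largest-overlap rule still picks a county.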

SQL in R: HAVING condition with only the condition of one row?

I am learning to use SQL in R.
I want to select cities that are more northern than Berlin from a dataset.
For this, I have tried:
sql4 = "SELECT Name, Latitude FROM city HAVING Latitude > Latitude(Berlin)"
result4 = dbSendQuery(con, sql4)
df4 = dbFetch(result4)
head(df4)
and
sql4 = "SELECT Name, Latitude FROM city HAVING Latitude > (Name IN 'Berlin')"
result4 = dbSendQuery(con, sql4)
df4 = dbFetch(result4)
head(df4)
Neither syntax works, unfortunately.
So my question is: How do I select all cities "north from Berlin", i.e. latitude value higher than that of the Name row 'Berlin'? Is there a different, better approach?

Assuming Berlin occurs at most once in the city table, you can use:
SELECT Name, Latitude
FROM city
WHERE Latitude > (SELECT Latitude FROM city WHERE Name = 'Berlin');
You want to be using WHERE here to filter, rather than HAVING. HAVING is for filtering aggregates when using GROUP BY, which you are not using.
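The WHERE plus scalar subquery pattern above can be tried end to end with a small sketch. It uses Python's standard-library sqlite3 module in place of the R DBI connection from the question, and the city rows are illustrative values, not data from the asker's table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city (Name TEXT, Latitude REAL)")
con.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Berlin", 52.52), ("Hamburg", 53.55), ("Munich", 48.14), ("Oslo", 59.91)],
)

# The subquery returns Berlin's latitude as a single value;
# the outer WHERE compares every row against it.
north = con.execute(
    """
    SELECT Name, Latitude
    FROM city
    WHERE Latitude > (SELECT Latitude FROM city WHERE Name = 'Berlin')
    ORDER BY Latitude
    """
).fetchall()
names = [name for name, _ in north]
print(names)
```

If Berlin could appear more than once, the subquery would need an aggregate such as MAX(Latitude) to stay a single value.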

You cannot call Latitude(Berlin), because Latitude is a column, not a function. I typically use a subquery like this:
SELECT Name, Latitude FROM city WHERE Latitude > (SELECT Latitude FROM city WHERE Name = 'Berlin')
Hope this helps.
-Suhas

Complex SQL query for a large record set

I have 3 tables. Let's start by explaining the first one.
tblDistances: (airport1, airport2, distance) // airport1 and airport2 are airport codes
This table contains the distances in miles between all airports in America. There are 3745 airports in total, and the distances were calculated using a nested for loop whose counter was decremented on each pass. So for the first airport we calculated 3744 distances. For the second we calculated 3743 distances, as its distance to the first airport had already been calculated in the first pass. Now let's say the first airport is Animas Air Park (K00C) and the second airport is Broadus Airport (K00F). The records would appear in tblDistances as
(K00C, other 3744 airports, distance)
For the second airport:
(K00C, K00F, distance) // This one record was already calculated in the first iteration of the loop
(K00F, other 3743 airports, distance)
So, except for the first airport, if we want to find all the distances for a particular airport, say K00F, we need the union query given below.
(SELECT * FROM tblDistances WHERE tblDistances.airport1 = 'K00F')
UNION ALL
(SELECT * FROM tblDistances WHERE tblDistances.airport2 = 'K00F');
I hope I have explained it clearly. Now let's come to the other two tables, tblHave and tblNeed:
tblHave: (departure, departCode, arrival, arrivalCode, flightDate)
tblNeed: (departure, departCode, arrival, arrivalCode, flightDate)
Departure is the name of the airport from which the flight departs, and departCode (e.g. K00C, K00F) is that airport's code; the same goes for arrival and arrivalCode.
Assume that tblNeed contains a flight from (departure) San Francisco Intl (KSFO) to (arrival) South Bend Rgnl (KSBN). Now comes the real problem: we have to find all the flights in tblHave for which
the flight is on the same date as the given flight, and
the departure airport is KSFO or within 500 miles of San Francisco Intl (KSFO), using the union explained above (let's call it qryDepart), and
the arrival airport is KSBN or within 500 miles of South Bend Rgnl (KSBN), using the union explained above (let's call it qryArrival).
Sample qryDepart (it filters on the departure airport, KSFO):
SELECT tblDistances.airport2 as nearBy
FROM tblDistances
WHERE tblDistances.airport1 = 'KSFO' AND (((Abs([tblDistances].[distance]))<=500))
UNION ALL SELECT tblDistances.airport1 as nearBy
FROM tblDistances
WHERE tblDistances.airport2 = 'KSFO' AND (((Abs([tblDistances].[distance]))<=500));
I cannot figure out how to do this, and the total number of distance combinations for all airports is more than 7 million. The records are in an Access database. What I have worked out so far is to find the nearby departure airports and nearby arrival airports from tblDistances and then use the IN clause to get the final results:
Select * from tblHave where arrivalCode IN (qryArrival) AND departCode IN (qryDepart) AND Date = #dd/mm/yyyy#;
This is not working, and the union takes too much time because the number of records is very large.

You don't need to use UNION here. You can do it in one query which should cut your execution time in half at least since you won't be checking every record twice. You can use a nested iif statement to determine which field to use for nearBy, and then change your WHERE to check both fields of the record. Like this:
SELECT
    iif(
        tblDistances.airport1 = 'KSFO',
        tblDistances.airport2,
        iif(tblDistances.airport2 = 'KSFO',
            tblDistances.airport1,
            null)
    ) as nearBy
FROM tblDistances
WHERE
    (
        tblDistances.airport1 = 'KSFO'
        OR tblDistances.airport2 = 'KSFO'
    )
    AND (((Abs([tblDistances].[distance]))<=500))
This would be much easier to read with a CASE expression, but Access doesn't support CASE. The above query does the same thing as:
SELECT
    CASE
        WHEN tblDistances.airport1 = 'KSFO' then tblDistances.airport2
        WHEN tblDistances.airport2 = 'KSFO' then tblDistances.airport1
    END as nearBy
FROM tblDistances
WHERE
    (
        tblDistances.airport1 = 'KSFO'
        OR tblDistances.airport2 = 'KSFO'
    )
    AND (((Abs([tblDistances].[distance]))<=500))
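As a quick check that the single-pass form returns the same airports as the UNION ALL form, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for Access (SQLite supports CASE, so the second variant is used). The distances are made-up illustrative values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblDistances (airport1 TEXT, airport2 TEXT, distance REAL)")
con.executemany(
    "INSERT INTO tblDistances VALUES (?, ?, ?)",
    [
        ("KOAK", "KSFO", 11.0),    # KSFO stored in the second column
        ("KSFO", "KSJC", 30.0),    # KSFO stored in the first column
        ("KSBN", "KSFO", 1900.0),  # involves KSFO but is too far away
        ("KOAK", "KSJC", 29.0),    # does not involve KSFO at all
    ],
)

# One pass over the table: pick whichever column is NOT the given airport.
single_pass = """
    SELECT CASE
               WHEN airport1 = :code THEN airport2
               WHEN airport2 = :code THEN airport1
           END AS nearBy
    FROM tblDistances
    WHERE (airport1 = :code OR airport2 = :code)
      AND ABS(distance) <= 500
"""

# The original two-query form from the question.
union_all = """
    SELECT airport2 AS nearBy FROM tblDistances
    WHERE airport1 = :code AND ABS(distance) <= 500
    UNION ALL
    SELECT airport1 AS nearBy FROM tblDistances
    WHERE airport2 = :code AND ABS(distance) <= 500
"""

params = {"code": "KSFO"}
nearby_case = sorted(row[0] for row in con.execute(single_pass, params))
nearby_union = sorted(row[0] for row in con.execute(union_all, params))
print(nearby_case, nearby_union)
```

Whether the single query is actually faster depends on the engine and on the indexes available for airport1 and airport2, so it is worth timing both forms on the real table.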

SQL Query to find 5 mile radius of a given zip code

I'm currently using Redshift and I have a table that has a list of zip codes along with their latitudes and longitudes. I'm trying to write a sql statement where I can specify a given zip code and have it return all of the zip codes within a 5 mile radius.
Any ideas on how I can approach this?
Here's what I had tried:
SELECT zip, city, latitude, longitude,
69.0* DEGREES(ACOS(COS(RADIANS(latpoint))
* COS(RADIANS(latitude))
* COS(RADIANS(longpoint) - RADIANS(longitude))
+ SIN(RADIANS(latpoint))
* SIN(RADIANS(latitude)))) AS distance_in_miles
FROM zip_code_db
JOIN (
SELECT 42.81 AS latpoint, -70.81 AS longpoint
) AS p ON 1=1
ORDER BY distance_in_miles
I'm trying to see if there is a way to use zip codes instead of specifying latitudes and longitudes.
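One way to start from a zip code, sketched here under assumptions: join the table to itself, with one side restricted to the given zip so that it supplies latpoint and longpoint. The example uses Python's standard-library sqlite3 module as a stand-in for Redshift, a hypothetical miles() helper that applies the same formula as the query above, and made-up zip codes and coordinates.

```python
import math
import sqlite3


def miles(lat1, lon1, lat2, lon2):
    """Same spherical law of cosines formula as the query, 69.0 miles per degree."""
    c = (
        math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1) - math.radians(lon2))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return 69.0 * math.degrees(math.acos(c))


con = sqlite3.connect(":memory:")
con.create_function("miles", 4, miles)
con.execute("CREATE TABLE zip_code_db (zip TEXT, latitude REAL, longitude REAL)")
con.executemany(
    "INSERT INTO zip_code_db VALUES (?, ?, ?)",
    [
        ("00001", 42.81, -70.81),  # hypothetical zip codes and coordinates
        ("00002", 42.84, -70.85),
        ("00003", 43.50, -70.81),
    ],
)

# p is the single row for the given zip; z ranges over every zip in the table.
within_radius = con.execute(
    """
    SELECT zip, distance_in_miles
    FROM (
        SELECT z.zip AS zip,
               miles(p.latitude, p.longitude, z.latitude, z.longitude) AS distance_in_miles
        FROM zip_code_db AS p
        CROSS JOIN zip_code_db AS z
        WHERE p.zip = ?
    )
    WHERE distance_in_miles <= 5
    ORDER BY distance_in_miles
    """,
    ("00001",),
).fetchall()
zips = [zip_code for zip_code, _ in within_radius]
print(zips)
```

In Redshift itself the same self-join shape should work with the original trigonometric expression in place of miles(); the given zip appears in its own result at distance zero, so filter it out if that is not wanted.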

Oracle Spatial Geometry covered by the most

I have a table which contains a number of geometries. I am attempting to extract the one which is most covered by another geometry.
This is best explained with pictures and code.
Currently I am doing this simple spatial query to get any rows that spatially interact with a passed in WKT Geometry
SELECT ID, NAME FROM MY_TABLE WHERE
sdo_anyinteract(geom,
sdo_geometry('POLYGON((400969 95600,402385 95957,402446 95579,400905 95353,400969 95600))',27700)) = 'TRUE';
Works great, returns a bunch of rows that interact in any way with my passed in geometry.
What I preferably want though is to find which one is covered most by my passed in geometry. Consider this image.
The coloured blocks represent 'MY_TABLE'. The black polygon over the top represents my passed in geometry I am searching with. The result I want returned from this is Polygon 2, as this is the one that is most covered by my polygon. Is this possible? Is there something I can use to pull the cover percentage in and order by that or a way of doing it that simply returns just that one result?
--EDIT--
Just to supplement the accepted answer (which you should go down and give an upvote as it is the entire basis for this) this is what I ended up with.
SELECT name, MI_PRINX,
SDO_GEOM.SDO_AREA(
SDO_GEOM.SDO_INTERSECTION(
GEOM,
sdo_geometry('POLYGON((400969.48717156524 95600.59583240788,402385.9445972018 95957.22742049221,402446.64806962677 95579.91508788493,400905.95874489535 95353.03765349534,400969.48717156524 95600.59583240788))',27700)
,0.005
)
,0.005) AS intersect_area
FROM LIFE_HEATHLAND WHERE sdo_anyinteract(geom, sdo_geometry('POLYGON((400969.48717156524 95600.59583240788,402385.9445972018 95957.22742049221,402446.64806962677 95579.91508788493,400905.95874489535 95353.03765349534,400969.48717156524 95600.59583240788))',27700)) = 'TRUE'
ORDER BY INTERSECT_AREA DESC;
This returns me all the results that intersect my query polygon with a new column called INTERSECT_AREA, which provides the area. I can then sort this and pick up the highest number.

Just compute the intersection between each of the returned geometries and your query window (using SDO_GEOM.SDO_INTERSECTION()), compute the area of each such intersection (using SDO_GEOM.SDO_AREA()) and return the row with the largest area (order the results in descending order of the computed area and only retain the first row).
For example, the following computes how much space Yellowstone National Park occupies in each state it covers. The results are ordered by area (descending).
SELECT s.state,
sdo_geom.sdo_area (
sdo_geom.sdo_intersection (
s.geom, p.geom, 0.5),
0.5, 'unit=sq_km') area
FROM us_states s, us_parks p
WHERE SDO_ANYINTERACT (s.geom, p.geom) = 'TRUE'
AND p.name = 'Yellowstone NP'
ORDER by area desc;
Which returns:
STATE                          AREA
------------------------------ ----------
Wyoming                        8100.64988
Montana                        640.277886
Idaho                          154.657145
3 rows selected.
To only retain the row with the largest intersection do:
SELECT * FROM (
SELECT s.state,
sdo_geom.sdo_area (
sdo_geom.sdo_intersection (
s.geom, p.geom, 0.5),
0.5, 'unit=sq_km') area
FROM us_states s, us_parks p
WHERE SDO_ANYINTERACT (s.geom, p.geom) = 'TRUE'
AND p.name = 'Yellowstone NP'
ORDER by area desc
)
WHERE rownum = 1;
giving:
STATE                          AREA
------------------------------ ----------
Wyoming                        8100.64988
1 row selected.
The following variant also returns the percentage of the park's surface in each intersecting state:
WITH p AS (
SELECT s.state,
sdo_geom.sdo_area (
sdo_geom.sdo_intersection (
s.geom, p.geom, 0.5),
0.5, 'unit=sq_km') area
FROM us_states s, us_parks p
WHERE SDO_ANYINTERACT (s.geom, p.geom) = 'TRUE'
AND p.name = 'Yellowstone NP'
)
SELECT state, area,
RATIO_TO_REPORT(area) OVER () * 100 AS pct
FROM p
ORDER BY pct DESC;
If you want to return the geometry of each intersection, just include it in your result set.
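For readers unfamiliar with RATIO_TO_REPORT, the arithmetic it performs here can be sketched in a few lines of Python using the three areas from the output above. Each percentage is that row's area divided by the sum of all returned areas, so it equals the share of the park only on the assumption that the listed states together cover the whole park.

```python
# Intersection areas in sq km, copied from the query output above.
areas = {"Wyoming": 8100.64988, "Montana": 640.277886, "Idaho": 154.657145}

total = sum(areas.values())

# Equivalent of RATIO_TO_REPORT(area) OVER () * 100 for each row.
pct = {state: 100 * area / total for state, area in areas.items()}

# Equivalent of ordering by area descending and keeping the first row.
largest = max(areas, key=areas.get)

for state in sorted(pct, key=pct.get, reverse=True):
    print(f"{state:<10} {pct[state]:6.2f}%")
```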
