Finding the closest geographic points between two tables in BigQuery


I have two tables with latitude and longitude points. I would like to create a new table which has information from both tables based on finding the closest points between tables. This is similar to a question previously asked; however one of the tables has arrays. The solution from the previously asked question did not seem to work with arrays.
Table A
|--------|-------------|-------------|-------------|
| id | latitude | longitude | address |
|--------|-------------|-------------|-------------|
| 1 | 39.79 | 86.03 | 123 Vine St |
|--------|-------------|-------------|-------------|
| 2 | 39.89 | 84.01 | 123 Oak St |
|--------|-------------|-------------|-------------|
Table B
|-------------|-------------|-------------|--------------|
| latitude | longitude | parameter1 | parameter2 |
|-------------|-------------|-------------|--------------|
| 39.74 | 86.33 | [1, 2, 3] | [.1, .2, .3] |
|-------------|-------------|-------------|--------------|
| 39.81 | 83.90 | [4, 5, 6] | [.4, .5, .6] |
|-------------|-------------|-------------|--------------|
I would like to create a new table, Table C, which has all the rows from Table A and adds the information from Table B. For each row in Table A, the information added is taken from the row in Table B whose point is closest to it.
Table C
|------|-------------|-------------|--------------|
| id_A | address | parameter1 | parameter2 |
|------|-------------|-------------|--------------|
| 1 | 123 Vine St | [1, 2, 3] | [.1, .2, .3] |
|------|-------------|-------------|--------------|
| 2 | 123 Oak St | [4, 5, 6] | [.4, .5, .6] |
|------|-------------|-------------|--------------|
Thank you in advance!

The query below is for BigQuery Standard SQL:
#standardSQL
SELECT AS VALUE
ARRAY_AGG(STRUCT(id, address, parameter1, parameter2) ORDER BY ST_DISTANCE(a.point, b.point) LIMIT 1)[OFFSET(0)]
FROM (SELECT *, ST_GEOGPOINT(longitude, latitude) point FROM `project.dataset.tableA`) a,
(SELECT *, ST_GEOGPOINT(longitude, latitude) point FROM `project.dataset.tableB`) b
GROUP BY id
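Applied to the sample data from your question:
#standardSQL
WITH `project.dataset.tableA` AS (
SELECT 1 id, 39.79 latitude, 86.03 longitude, '123 Vine St' address UNION ALL
SELECT 2, 39.89, 84.01, '123 Oak St'
), `project.dataset.tableB` AS (
SELECT 39.74 latitude, 86.33 longitude, [1, 2, 3] parameter1, [.1, .2, .3] parameter2 UNION ALL
SELECT 39.81, 83.90, [4, 5, 6], [.4, .5, .6]
)
SELECT AS VALUE
ARRAY_AGG(STRUCT(id, address, parameter1, parameter2) ORDER BY ST_DISTANCE(a.point, b.point) LIMIT 1)[OFFSET(0)]
FROM (SELECT *, ST_GEOGPOINT(longitude, latitude) point FROM `project.dataset.tableA`) a,
(SELECT *, ST_GEOGPOINT(longitude, latitude) point FROM `project.dataset.tableB`) b
GROUP BY id
The output pairs each row of Table A with its nearest row of Table B, which for this sample data matches Table C in the question.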
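To make the nearest-point logic easy to check outside BigQuery, here is a minimal Python sketch of the same idea. It is not the BigQuery implementation: the haversine formula on a sphere of radius 6371 km stands in for ST_DISTANCE, and the names haversine_km and nearest_join are made up for this illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Sample data copied from the question.
table_a = [
    {"id": 1, "latitude": 39.79, "longitude": 86.03, "address": "123 Vine St"},
    {"id": 2, "latitude": 39.89, "longitude": 84.01, "address": "123 Oak St"},
]
table_b = [
    {"latitude": 39.74, "longitude": 86.33, "parameter1": [1, 2, 3], "parameter2": [0.1, 0.2, 0.3]},
    {"latitude": 39.81, "longitude": 83.90, "parameter1": [4, 5, 6], "parameter2": [0.4, 0.5, 0.6]},
]

def nearest_join(rows_a, rows_b):
    """For every row of A, attach the arrays of the closest row of B (brute force)."""
    out = []
    for a in rows_a:
        b = min(
            rows_b,
            key=lambda r: haversine_km(a["latitude"], a["longitude"], r["latitude"], r["longitude"]),
        )
        out.append({
            "id_A": a["id"],
            "address": a["address"],
            "parameter1": b["parameter1"],
            "parameter2": b["parameter2"],
        })
    return out

table_c = nearest_join(table_a, table_b)
for row in table_c:
    print(row)
```

Like the cross join in the query, this compares every row of A with every row of B, so the cost grows with the product of the two table sizes.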

Related

SQL Using specific input find corresponding columns and create new summary table

Data is a flat normalised table:
| ID | Product selected | Product Code 1 | Product Code 2 | Product Code 3 | Cost of Product 1 | Cost of Product 2 | Cost of Product 3 | Rate of Product 1 | Rate of Product 2 | Rate of Product 3 |
|----|------------------|----------------|----------------|----------------|-------------------|-------------------|-------------------|-------------------|-------------------|-------------------|
| 1  | ABCDEDFHIJKL     | AAABBBCCCDDD   | ABCDEDFHIJKL   | DDDCCCBBBAAA   | 995               | 495               | 0                 | 4.4               | 6.3               | 7.8               |
| 2  | DDDCCCBBBAAA     | AAABBBCCCDDD   | ABCDEDFHIJKL   | DDDCCCBBBAAA   | 995               | 495               | 0                 | 4.4               | 6.3               | 7.8               |
What:
Using the product selected (ABCDEDFHIJKL), look across the rows to find the corresponding locations of columns with data relating to the product selected.
Desired Output:
| Product selected | Cost of Product | Rate of Product |
|------------------|-----------------|-----------------|
| ABCDEDFHIJKL     | 495             | 6.3             |
| DDDCCCBBBAAA     | 0               | 7.8             |
Doing this in R is straightforward, and I'm sure that for someone more knowledgeable in SQL than I am, this will be easy.

You can use cross apply:
select t.product_selected, x.cost_of_product, x.rate_of_product
from mytable t
cross apply (values
(product_code_1, cost_of_product_1, rate_of_product_1),
(product_code_2, cost_of_product_2, rate_of_product_2),
(product_code_3, cost_of_product_3, rate_of_product_3)
) as x(product_selected, cost_of_product, rate_of_product)
where x.product_selected = t.product_selected

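You can use UNPIVOT, UNION ALL, or CROSS APPLY.
UNPIVOT sample:
SELECT [Product selected], ProductCode, ProductCost, ProductRate
FROM
(
SELECT *
FROM dbo.mytable -- placeholder table name
) AS cp
UNPIVOT
(
ProductCode FOR PC IN ([Product Code 1], [Product Code 2], [Product Code 3])
) AS up
UNPIVOT
(
ProductCost FOR Po IN ([Cost of Product 1], [Cost of Product 2], [Cost of Product 3])
) AS up2
UNPIVOT
(
ProductRate FOR Pr IN ([Rate of Product 1], [Rate of Product 2], [Rate of Product 3])
) AS up3
-- Chained UNPIVOTs yield every code/cost/rate combination, so keep only the
-- combinations with the same column suffix (1, 2 or 3) that match the selected product.
WHERE RIGHT(PC, 1) = RIGHT(Po, 1)
AND RIGHT(PC, 1) = RIGHT(Pr, 1)
AND ProductCode = [Product selected];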
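The UNION ALL route is mentioned above without an example. Below is a small sketch of it, run through Python's sqlite3 module so it can be tried without a SQL Server instance. The table name mytable and the snake_case column names are assumptions borrowed from the CROSS APPLY answer, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER, product_selected TEXT,
        product_code_1 TEXT, product_code_2 TEXT, product_code_3 TEXT,
        cost_of_product_1 INTEGER, cost_of_product_2 INTEGER, cost_of_product_3 INTEGER,
        rate_of_product_1 REAL, rate_of_product_2 REAL, rate_of_product_3 REAL
    )
""")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "ABCDEDFHIJKL", "AAABBBCCCDDD", "ABCDEDFHIJKL", "DDDCCCBBBAAA", 995, 495, 0, 4.4, 6.3, 7.8),
        (2, "DDDCCCBBBAAA", "AAABBBCCCDDD", "ABCDEDFHIJKL", "DDDCCCBBBAAA", 995, 495, 0, 4.4, 6.3, 7.8),
    ],
)

# Stack the three (code, cost, rate) column groups into rows, then keep the
# stacked row whose code equals the product selected on the same source row.
query = """
    SELECT t.product_selected, u.cost_of_product, u.rate_of_product
    FROM mytable AS t
    JOIN (
        SELECT id, product_code_1 AS product_code,
               cost_of_product_1 AS cost_of_product,
               rate_of_product_1 AS rate_of_product
        FROM mytable
        UNION ALL
        SELECT id, product_code_2, cost_of_product_2, rate_of_product_2 FROM mytable
        UNION ALL
        SELECT id, product_code_3, cost_of_product_3, rate_of_product_3 FROM mytable
    ) AS u
      ON u.id = t.id AND u.product_code = t.product_selected
    ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The subquery reads the table three times, so on a large table the CROSS APPLY form is likely to be cheaper where it is available.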

How to fetch records from DB which fulfill a certain criteria

I have the following problem and wanted to ask if this is the correct way to do it or if there is a better way of doing it:
Assume I have the following table/data in my DB:
|---|----|------|-------------|---------|---------|
|id |city|street|street_number|lastname |firstname|
|---|----|------|-------------|---------|---------|
| 1 | ar | K1 | 13 |Davenport| Hector |
| 2 | ar | L1 | 27 |Cannon | Teresa |
| 3 | ar | A1 | 135 |Brewer | Izaac |
| 4 | dc | A2 | 8 |Fowler | Milan |
| 5 | fr | C1 | 18 |Kaiser | Ibrar |
| 6 | fr | C1 | 28 |Weaver | Kiri |
| 7 | ny | O1 | 37 |Petersen | Derrick |
I now get requests with the following structure: (city/street/street_number)
E.g.: {(ar,K1,13),(dc,A2,8),(ny,01,37)}
I want to retrieve the last name of the person living at each address. Since the number of requests is quite large, I don't want to process them one by one. My current implementation inserts the request data into a temporary table and joins on the values.
Is this the right approach or is there some better way of doing this?

You can construct a query using IN with tuples:
select t.*
from t
where (city, street, street_number) in ( ('ar', 'K1', '13'), ('dc', 'A2', '8'), ('ny', '01', '37') );
However, if the data starts in the database, then a temporary table or subquery is better than bringing the results back to the application and constructing such a query.
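For comparison, here is a rough sketch of the temporary-table approach described in the question, using Python's sqlite3 module. The table name t follows the query above, while the temp table name req and the column types are assumptions. The sketch also assumes the third request means street O1 (letter O), as in the table, rather than 01.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, city TEXT, street TEXT, street_number INTEGER,"
    " lastname TEXT, firstname TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "ar", "K1", 13, "Davenport", "Hector"),
        (2, "ar", "L1", 27, "Cannon", "Teresa"),
        (3, "ar", "A1", 135, "Brewer", "Izaac"),
        (4, "dc", "A2", 8, "Fowler", "Milan"),
        (5, "fr", "C1", 18, "Kaiser", "Ibrar"),
        (6, "fr", "C1", 28, "Weaver", "Kiri"),
        (7, "ny", "O1", 37, "Petersen", "Derrick"),
    ],
)

# Incoming lookups; 'O1' is assumed to be what the request's '01' was meant to be.
lookups = [("ar", "K1", 13), ("dc", "A2", 8), ("ny", "O1", 37)]

# Bulk-load the lookups into a temporary table, then resolve them all with one join.
conn.execute("CREATE TEMP TABLE req (city TEXT, street TEXT, street_number INTEGER)")
conn.executemany("INSERT INTO req VALUES (?, ?, ?)", lookups)

lastnames = [
    row[0]
    for row in conn.execute("""
        SELECT t.lastname
        FROM t
        JOIN req
          ON req.city = t.city
         AND req.street = t.street
         AND req.street_number = t.street_number
        ORDER BY t.id
    """)
]
print(lastnames)
```

With a large number of lookups, an index on (city, street, street_number) in the main table would probably matter more than the choice between IN and a join.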

I think you can use a hierarchical query and string functions as follows (Oracle):
WITH YOUR_INPUT_DATA AS
(SELECT '(ar,K1,13),(dc,A2,8),(ny,01,37)' AS INPUT_STR FROM DUAL),
--
-- One row per "(city,street,number)" group; then split each group into its
-- three comma-separated tokens, ignoring the parentheses.
CTE AS
( SELECT REGEXP_SUBSTR(STR,'[^(),]+',1,1) AS STR1,
REGEXP_SUBSTR(STR,'[^(),]+',1,2) AS STR2,
REGEXP_SUBSTR(STR,'[^(),]+',1,3) AS STR3
FROM (SELECT SUBSTR(INPUT_STR,
INSTR(INPUT_STR,'(',1,LEVEL),
INSTR(INPUT_STR,')',1,LEVEL) - INSTR(INPUT_STR,'(',1,LEVEL) + 1) STR
FROM YOUR_INPUT_DATA
CONNECT BY LEVEL <= REGEXP_COUNT(INPUT_STR,'\),\(') + 1))
--
SELECT * FROM YOUR_TABLE WHERE (city,street,street_number)
IN (SELECT STR1,STR2,STR3 FROM CTE);
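If the request string is assembled in application code, it may be simpler to parse it there. The sketch below shows the split that the CTE above is meant to perform; parse_requests is a hypothetical helper name, and the regular expression assumes values contain no commas or parentheses.

```python
import re

def parse_requests(input_str):
    """Split '(a,b,c),(d,e,f)' into a list of 3-tuples of strings."""
    # Each group is three comma-separated tokens wrapped in parentheses.
    return re.findall(r"\(([^,()]+),([^,()]+),([^,()]+)\)", input_str)

parsed = parse_requests("(ar,K1,13),(dc,A2,8),(ny,01,37)")
print(parsed)
```

Every value comes back as a string, so the street number would still need converting if the column is numeric.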

In Hive, how to combine multiple tables to produce single row containing array of objects?

I have two tables as follows:
users table
==========================
| user_id name age |
|=========================
| 1 pete 20 |
| 2 sam 21 |
| 3 nash 22 |
==========================
hobbies table
======================================
| user_id hobby time_spent |
|=====================================
| 1 football 2 |
| 1 running 1 |
| 1 basketball 3 |
======================================
First question: I would like to make a single Hive query that can return rows in this format:
{ "user_id":1, "name":"pete", "hobbies":[ {"hobby": "football", "time_spent": 2}, {"hobby": "running", "time_spent": 1}, {"hobby": "basketball", "time_spent": 3} ] }
Second question: If the hobbies table were to be as follows:
========================================
| user_id hobby scores |
|=======================================
| 1 football 2,3,1 |
| 1 running 1,1,2,5 |
| 1 basketball 3,6,7 |
========================================
Would it be possible to get the row output where scores is a list in the output as shown below:
{ "user_id":1, "name":"pete", "hobbies":[ {"hobby": "football", "scores": [2, 3, 1]}, {"hobby": "running", "scores": [1, 1, 2, 5]}, {"hobby": "basketball", "scores": [3, 6, 7]} ] }

I was able to find the answer to my first question:
select u.user_id, u.name,
collect_list(
str_to_map(
concat_ws(",", array(
concat("hobby:", h.hobby),
concat("time_spent:", h.time_spent)
))
)
) as hobbies
from users as u
join hobbies as h on u.user_id=h.user_id
group by u.user_id, u.name;
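The second question is not covered by the query above. As a reference for the target shape only, here is a plain-Python sketch (not Hive) that builds the requested structure from the sample rows; build_rows is a made-up name. In Hive itself, the built-in split(scores, ',') function would be the natural way to turn the scores string into an array, though the full query is not worked out here.

```python
import json
from collections import defaultdict

# Sample data copied from the question (second version of the hobbies table).
users = [(1, "pete", 20), (2, "sam", 21), (3, "nash", 22)]
hobbies = [
    (1, "football", "2,3,1"),
    (1, "running", "1,1,2,5"),
    (1, "basketball", "3,6,7"),
]

def build_rows(users, hobbies):
    """Group hobbies per user and turn each comma-separated score string into a list of ints."""
    by_user = defaultdict(list)
    for user_id, hobby, scores in hobbies:
        by_user[user_id].append(
            {"hobby": hobby, "scores": [int(s) for s in scores.split(",")]}
        )
    # Mirrors the inner join in the Hive query: users without hobbies are left out.
    return [
        {"user_id": user_id, "name": name, "hobbies": by_user[user_id]}
        for user_id, name, _age in users
        if user_id in by_user
    ]

rows = build_rows(users, hobbies)
print(json.dumps(rows[0]))
```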

Is it possible to make an sql query that counts from one table and then subtract the number it obtained from another table?

I have an Access database that I must use to manage interns and the places they work at. Right now, I have two tables: one for the people, with their personal details and a link to where they work, and another with the name of each workplace and its respective boss.
Like so:
(table 1, where the persons are listed)
Cadastro_de_estagiarios
id | Ativo | Nível | Lotação | Nome
1 | Verdadeiro | Superior | 1ª Vara Cível | Marina x
3 | Verdadeiro | Médio | 1ª Vara Cível | Raquel x
and so on...
(table 2, where the locations and bosses are specified)
Cadastro_de_varas_e_juizes
id | Vara | Juiz responsável | Vagas totais nível superior | Vagas totais nível médio
1 | 1ª Vara Cível | fist boss | 2 | 3
2 | 2ª Vara Cível | sec boss | 2 | 4
3 | 3ª Vara Cível | third boss | 2 | 3
and so on...
To clarify, I have two kinds of interns (nível superior and nível médio), as well as two kinds of job vacancies per workplace. For example, in 1ª Vara Cível I can have 2 interns at the "superior" level and 3 at the "médio" level.
What I need to do is find out how many interns are placed at each workplace per job type, and then have a query that tells me how many vacancies I still have per place and type.
I appreciate any help. Thanks!
Translating the tables
table1
id | Active | Education level of intern | Workplace | Name
table2
id | Workplace | Boss | Vacancies for college students | Vacancies for high school students

This should give you a starting point:
MySQL 5.6 Schema Setup:
CREATE TABLE table1 (
id INT,
Active VARCHAR(50),
Education_level VARCHAR(50),
Workplace VARCHAR(50),
Name VARCHAR(50)
);
CREATE TABLE table2 (
id INT,
Workplace VARCHAR(50),
Boss VARCHAR(50),
Vacancies_college INT,
Vacancies_high_school INT
);
INSERT INTO table1 VALUES
(1, 'Verdadeiro', 'Superior', '1ª Vara Cível', 'Marina x'),
(3, 'Verdadeiro', 'Médio', '1ª Vara Cível', 'Raquel x');
INSERT INTO table2 VALUES
(1, '1ª Vara Cível', 'fist boss', 2, 3),
(2, '2ª Vara Cível', 'sec boss', 2, 4),
(3, '3ª Vara Cível', 'third boss', 2, 3);
Query 1:
-- 'Superior' interns fill college vacancies; 'Médio' interns fill high school vacancies.
SELECT t1.Workplace, t1.Active, t1.Education_level,
(CASE
WHEN t1.Education_level = 'Superior' THEN t2.Vacancies_college
WHEN t1.Education_level = 'Médio' THEN t2.Vacancies_high_school
END) - COUNT(*) AS vacancies
FROM table1 t1 LEFT JOIN table2 t2
ON (t1.Workplace = t2.Workplace)
GROUP BY t1.Workplace, t1.Active, t1.Education_level
Results:
| Workplace | Active | Education_level | vacancies |
|---------------|------------|-----------------|-----------|
| 1ª Vara Cível | Verdadeiro | Médio | 2 |
| 1ª Vara Cível | Verdadeiro | Superior | 1 |
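The query above only returns workplaces that already have interns. Here is a sketch of one possible variation that starts from the workplace table, so every workplace is listed with its remaining vacancies per level. It runs on SQLite through Python's sqlite3 module rather than Access or MySQL, and it assumes that only interns with Ativo = 'Verdadeiro' occupy a vacancy and that 'Superior' maps to the college column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and sample rows follow the setup shown in the answer.
conn.executescript("""
    CREATE TABLE table1 (id INT, Active TEXT, Education_level TEXT, Workplace TEXT, Name TEXT);
    CREATE TABLE table2 (id INT, Workplace TEXT, Boss TEXT,
                         Vacancies_college INT, Vacancies_high_school INT);
    INSERT INTO table1 VALUES
        (1, 'Verdadeiro', 'Superior', '1ª Vara Cível', 'Marina x'),
        (3, 'Verdadeiro', 'Médio', '1ª Vara Cível', 'Raquel x');
    INSERT INTO table2 VALUES
        (1, '1ª Vara Cível', 'fist boss', 2, 3),
        (2, '2ª Vara Cível', 'sec boss', 2, 4),
        (3, '3ª Vara Cível', 'third boss', 2, 3);
""")

# Start from the workplaces so places without interns still appear, then
# subtract the number of active interns of each level from the matching total.
query = """
    SELECT t2.Workplace,
           t2.Vacancies_college
             - SUM(CASE WHEN t1.Education_level = 'Superior' THEN 1 ELSE 0 END) AS college_left,
           t2.Vacancies_high_school
             - SUM(CASE WHEN t1.Education_level = 'Médio' THEN 1 ELSE 0 END) AS high_school_left
    FROM table2 AS t2
    LEFT JOIN table1 AS t1
           ON t1.Workplace = t2.Workplace
          AND t1.Active = 'Verdadeiro'
    GROUP BY t2.id, t2.Workplace, t2.Vacancies_college, t2.Vacancies_high_school
    ORDER BY t2.id
"""
remaining = conn.execute(query).fetchall()
for row in remaining:
    print(row)
```

Access has no CASE expression, so an Access version would likely need IIf() instead; that translation is not tested here.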

Joining tables on an array with string

I am looking to do a left join between a table that has an array column called tags and a table that holds the tag definitions, tag_definitions. There will be at most one match per row in the Cities table. I can't join an array with a string and I'm not sure how to proceed.
Cities_Table
City_Code | State |Tags
NYC | NY | 1, 4, 5
SF | CA | 2,4, 6
CHI | IL | 3, 8, 10
Tag_Definitions
Tag_ID | Name
5 | East_Coast
6 | West_Coast
10 | MidWest
So I'm looking to get something like this...
City_Code | State |Tags | Tag_Descr
NYC | NY | 1, 4, 5 | East_Coast
SF | CA | 2,4, 6 | West_Coast
CHI | IL | 3, 8, 10 | MidWest

Depending on your database (the syntax might be different), you can do something like the following:
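select *
from cities_table c
-- strip the spaces from the tag list first, so that ', 5' still matches tag 5
join tag_definitions t on concat(',', replace(c.tags, ' ', ''), ',') like concat('%,', t.tag_id, ',%')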
However as noted, a better idea would be to create a City_Tags table and store the individual ids in that table. Generally it's not a good idea to store comma delimited data.
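As a rough illustration of the normalised design suggested above, the sketch below splits the comma-delimited tags into a city_tags table and then joins on plain integer ids. It uses SQLite through Python's sqlite3 module; the city_tags layout and its column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question, including the uneven spacing in the tag lists.
conn.executescript("""
    CREATE TABLE cities_table (city_code TEXT, state TEXT, tags TEXT);
    CREATE TABLE tag_definitions (tag_id INTEGER, name TEXT);
    CREATE TABLE city_tags (city_code TEXT, tag_id INTEGER);
    INSERT INTO cities_table VALUES
        ('NYC', 'NY', '1, 4, 5'),
        ('SF', 'CA', '2,4, 6'),
        ('CHI', 'IL', '3, 8, 10');
    INSERT INTO tag_definitions VALUES
        (5, 'East_Coast'), (6, 'West_Coast'), (10, 'MidWest');
""")

# One-off migration: one city_tags row per (city, tag); int() tolerates the stray spaces.
for city_code, tags in conn.execute("SELECT city_code, tags FROM cities_table").fetchall():
    conn.executemany(
        "INSERT INTO city_tags VALUES (?, ?)",
        [(city_code, int(tag)) for tag in tags.split(",")],
    )

# Left join keeps every city even if none of its tags has a definition.
query = """
    SELECT c.city_code, c.state, c.tags, m.name AS tag_descr
    FROM cities_table AS c
    LEFT JOIN (
        SELECT ct.city_code, d.name
        FROM city_tags AS ct
        JOIN tag_definitions AS d ON d.tag_id = ct.tag_id
    ) AS m ON m.city_code = c.city_code
    ORDER BY c.city_code
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Once city_tags exists, the join compares integers directly and no longer depends on how the original tag strings were spaced.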
