I've got a table with lat and lng coordinates, and need to add the distance as a new column called 'distance' in BigQuery.
Table:
start_lat  end_lat  start_lng  end_lng
41.8964    41.9322  -87.661    -87.6586
41.9244    41.9306  -87.7154   -87.7238
41.903     41.8992  -87.6975   -87.6722
I haven't a clue how to do it. I saw some examples, but simply couldn't apply them to this case.
Any tips?
The ST_DISTANCE function will calculate the distance (in meters) between 2 points.
with my_data as (
select 1 as trip_id, 41.8964 as start_lat, 41.9322 as end_lat, -87.661 as start_lng, -87.6586 as end_lng union all
select 2, 41., 41.9306, -87.7154, -87.7238
)
select trip_id,
ST_DISTANCE(ST_GEOGPOINT(start_lng, start_lat), ST_GEOGPOINT(end_lng, end_lat)) as distance_in_meters
from my_data
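Output (trip 2 uses 41. as start_lat in the sample data above rather than the question's 41.9244, hence the roughly 103 km result):
trip_id  distance_in_meters
1        3985.735019583467
2        103480.52812005761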
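As a rough cross-check outside BigQuery, the haversine formula on a sphere gives nearly the same numbers. This is a minimal Python sketch, not a description of how ST_DISTANCE works internally; the function name haversine_m and the radius of 6371008.8 m (a commonly used mean Earth radius) are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Trip 1 from the query above; should land close to the ST_DISTANCE result.
trip1 = haversine_m(41.8964, -87.661, 41.9322, -87.6586)
# Trip 2 with the start_lat from the question (41.9244) instead of 41.
trip2 = haversine_m(41.9244, -87.7154, 41.9306, -87.7238)
print(round(trip1), round(trip2))
```

Note the argument order: this helper takes (lat, lng), while ST_GEOGPOINT takes (longitude, latitude). With the question's own start_lat, trip 2 comes out at roughly 1 km.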
Related
I am trying to find the average of a query which has 600,000 rows. I have created a CTE to calculate distances based off of longitude and latitude. Is it possible for me to add an aggregate function to this, as the calculated_distance is not an existing column?
Thanks in advance!
WITH name AS (
SELECT
id,
latitude,
longitude,
name,
docks
FROM
santander_stations
),
ride_data AS (
SELECT
startstationid,
endstationid
FROM
public.santander_2016
UNION
SELECT
startstationid,
endstationid
FROM
public.santander_2017
UNION
SELECT
startstationid,
endstationid
FROM
public.santander_2018
)
SELECT
calculate_distance( a.latitude, a.longitude, b.latitude, b.longitude, 'K' ),
a.name AS Start_Station,
b.name AS End_Station
FROM
name AS a,
name AS b
WHERE
a.id IN ( SELECT startstationid FROM ride_data )
AND
b.id IN ( SELECT endstationid FROM ride_data )
ORDER BY
1 DESC
The number of rides is much more than the number of possible start & end stations. Therefore, we can calculate the distances of all combinations of (start station, end station) first, and then use the result as a lookup table for the 600,000 rides.
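Here's a way to calculate the average ride distance for 2016, 2017 and 2018. Replace (1) the distance_km formula with your calculate_distance() UDF and (2) santander_rides with your ride_data subquery. If you reuse ride_data, switch its UNION to UNION ALL, since UNION removes duplicate (start, end) rows and every ride should count toward the average.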
with cte_santander_station_distance as (
select x.id as start_station_id,
y.id as end_station_id,
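-- great-circle distance in km; latitude/longitude are assumed to be in degrees, 6371 is the approximate Earth radius in km
6371 * acos(sin(radians(x.latitude))*sin(radians(y.latitude))+cos(radians(x.latitude))*cos(radians(y.latitude))*cos(radians(y.longitude-x.longitude))) as distance_km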
from santander_stations x, santander_stations y
where x.id <> y.id)
select avg(ssd.distance_km) as average_distance_km,
count(*) as rides
from santander_rides sr
join cte_santander_station_distance ssd
on sr.start_station_id = ssd.start_station_id
and sr.end_station_id = ssd.end_station_id;
If there are lots of combinations of start station and end station, you can materialize the lookup table instead of recomputing it in every query.
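The lookup-table idea can be sketched in a few lines of Python: compute one distance per ordered station pair, then treat every ride as a lookup. The station coordinates, the rides list and the distance_km helper below are hypothetical stand-ins for santander_stations, the ride data and the calculate_distance() UDF.

```python
import math
from itertools import permutations


def distance_km(lat1, lng1, lat2, lng2):
    """Haversine distance in km (stand-in for the asker's UDF)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Hypothetical stations: id -> (latitude, longitude)
stations = {1: (51.5290, -0.1100), 2: (51.4990, -0.1970), 3: (51.5210, -0.0850)}
# Hypothetical rides: (start_station_id, end_station_id), duplicates kept on purpose
rides = [(1, 2), (1, 2), (2, 3), (3, 1)]

# Build the lookup once: one distance per ordered pair of distinct stations.
pair_distance = {
    (a, b): distance_km(*stations[a], *stations[b])
    for a, b in permutations(stations, 2)
}

# Each ride is now a dictionary lookup; rides without a matching pair are
# skipped, which mirrors the inner join in the SQL.
matched = [pair_distance[ride] for ride in rides if ride in pair_distance]
average_distance_km = sum(matched) / len(matched)
print(len(matched), average_distance_km)
```

The expensive distance computation runs once per station pair rather than once per ride, which is the point of the approach when rides vastly outnumber stations.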
Hi, I have a table that contains register_date, customer_id and lat_long (formatted as -6.2134,106.783876). I want to count the unique customer IDs that registered between 21 and 23 January 2022 and are within a radius of 500 m of longitude 106.8486583 and latitude -6.14462529. How do I do this in Google BigQuery?
Currently I am stuck at this
WITH params AS (
SELECT ST_GeogPoint(106.8486583, -6.14462529) AS center,
0.2 AS maxn_stations,
0.2 AS maxdist_km
),
distance_from_center AS (
SELECT
customer_id,
register_date,
ST_GeogPoint(CAST(SPLIT(lat_long,",")[OFFSET(1)] AS float64), CAST(SPLIT(lat_long,",")[OFFSET(0)] AS float64)) AS loc,
ST_Distance(ST_GeogPoint(CAST(SPLIT(lat_long,",")[OFFSET(1)] AS float64), CAST(SPLIT(lat_long,",")[OFFSET(0)] AS float64)), params.center) AS dist_meters
FROM
`lat long data`,
params
WHERE ST_DWithin(ST_GeogPoint(CAST(SPLIT(lat_long,",")[OFFSET(1)] AS float64), CAST(SPLIT(lat_long,",")[OFFSET(0)] AS float64)), params.center, params.maxdist_km*1000)
)
SELECT
*
FROM distance_from_center
WHERE
date(register_date)
BETWEEN
'2022-01-21'
AND
'2022-01-23'
solved
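For reference, the three conditions in the question (date range, 500 m radius, distinct customers) can be sketched in Python as below. The rows are made up, and haversine_m with its 6371008.8 m radius is an assumed approximation rather than BigQuery's own distance computation. Note that the query above uses maxdist_km = 0.2 (200 m) while the question asks for 500 m; in BigQuery the final count would use COUNT(DISTINCT customer_id).

```python
import math
from datetime import date


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters, assuming a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371008.8 * math.asin(math.sqrt(a))


CENTER_LAT, CENTER_LNG = -6.14462529, 106.8486583
RADIUS_M = 500
START, END = date(2022, 1, 21), date(2022, 1, 23)

# Hypothetical rows: (register_date, customer_id, lat_long)
rows = [
    (date(2022, 1, 21), "c1", "-6.1446,106.8487"),    # near the center, in range
    (date(2022, 1, 22), "c1", "-6.1447,106.8486"),    # same customer again
    (date(2022, 1, 23), "c2", "-6.2134,106.783876"),  # several km away
    (date(2022, 1, 25), "c3", "-6.1446,106.8487"),    # near, but outside the dates
]


def in_scope(register_date, lat_long):
    # lat_long is stored as "latitude,longitude"
    lat, lng = (float(part) for part in lat_long.split(","))
    in_dates = START <= register_date <= END
    return in_dates and haversine_m(lat, lng, CENTER_LAT, CENTER_LNG) <= RADIUS_M


unique_customers = {cid for reg_date, cid, lat_long in rows if in_scope(reg_date, lat_long)}
print(len(unique_customers))
```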
I have this value
I want to calculate a new column, which will add the product of multiplication from ticket_units_count and price, so it must be:
5 * 33104.0 + 4 * 23449.0 = 259316
How do I do that in BigQuery?
I tried this one
SELECT
SUM(CAST(price AS FLOAT64) * CAST(ticket_units_count AS INT64))
FROM table
But it shows this error: Bad double value: 33104.0;23449.0
Need your help to specify the query to get the expected result
Consider the approach below:
select *,
( select sum(cast(_count as int64) * cast(_price as float64))
from unnest(split(ticket_units_count, ';')) _count with offset
join unnest(split(price, ';')) _price with offset
using (offset)
) as total
from your_table
If applied to the sample data in your question, the total column is 259316.0 (5 * 33104.0 + 4 * 23449.0).
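The same pairing logic can be checked in a few lines of Python: split both strings on ';', pair the pieces by position, and sum the products. The function name order_total is hypothetical, and the input "5;4" is assumed from the arithmetic in the question (the price string is taken from the error message).

```python
def order_total(ticket_units_count: str, price: str) -> float:
    """Sum of count * price over positionally matched ';'-separated values."""
    counts = [int(c) for c in ticket_units_count.split(";")]
    prices = [float(p) for p in price.split(";")]
    # zip pairs elements by position, like the join on offset in the SQL
    return sum(c * p for c, p in zip(counts, prices))


total = order_total("5;4", "33104.0;23449.0")
print(total)
```

This also shows why the original CAST failed: the whole string "33104.0;23449.0" is not a number until it has been split.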
As the title says, I am running a query on bikesharing data stored in BigQuery.
I am able to extract the data and arrange it in the correct order to be displayed in a path chart. Some routes have coordinates for only the start and end longitude and latitude, or sometimes only the start. How do I remove any route with fewer than 4 points?
This is the code. I am also limited to SELECT statements only:
SELECT
routeID ,
json_extract(st_asgeojson(st_makeline( array_agg(st_geogpoint(locs.lon, locs.lat) order by locs.date))),'$.coordinates') as geo
FROM
howardcounty.routebatches
cross join UNNEST(locations) as locs
where unlockedAt between {{start_date}} and {{end_date}}
GROUP BY routeID
order by routeID
limit 10
have also included a screen shot for clarity
To apply a condition after a GROUP BY, use a HAVING clause. For a simple condition (are there at least two points for the route?) this query can be used; for at least four points, change the threshold to > 3:
With dummy as (
Select 1 as routeID, [struct(current_timestamp() as date, 1 as lon, 2 as lat),struct(current_timestamp() as date, 3 as lon, 4 as lat)] as locations
Union all select 2 as routeID, [struct(current_timestamp() as date, 10 as lon, 20 as lat)]
)
SELECT
routeID , count(locs.date) as amountcoord,
json_extract(st_asgeojson(st_makeline( array_agg(st_geogpoint(locs.lon, locs.lat) order by locs.date))),'$.coordinates') as geo
FROM
#howardcounty.routebatches
dummy
cross join UNNEST(locations) as locs
#where unlockedAt between {{start_date}} and {{end_date}}
GROUP BY routeID
having count(locs.date)>1
order by routeID
limit 10
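The GROUP BY plus HAVING pattern corresponds to grouping rows and then dropping groups that are too small. A minimal Python sketch with made-up points follows; MIN_POINTS = 4 matches the question's requirement, not the > 1 threshold in the query above.

```python
from collections import defaultdict

# Hypothetical flattened rows: (routeID, timestamp, lon, lat)
points = [
    (1, 1, 10.0, 20.0), (1, 3, 10.2, 20.2), (1, 2, 10.1, 20.1), (1, 4, 10.3, 20.3),
    (2, 1, 30.0, 40.0), (2, 2, 30.1, 40.1),
]

# GROUP BY routeID
by_route = defaultdict(list)
for route_id, ts, lon, lat in points:
    by_route[route_id].append((ts, lon, lat))

MIN_POINTS = 4  # keep routes with at least this many points

# HAVING count(*) >= MIN_POINTS, with each path ordered by timestamp
paths = {
    route_id: [[lon, lat] for _, lon, lat in sorted(pts)]
    for route_id, pts in by_route.items()
    if len(pts) >= MIN_POINTS
}
print(paths)
```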
For more complex ones, a nested select may do the job:
Select *
from (
--- your code ---
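) where length(geo)-length(replace(geo,"]","")) >= 1+4
In your code the JSON is returned as a string. Counting the ] characters and subtracting one for the closing bracket of the outer array gives the number of inner arrays, i.e. the number of points. The condition >= 1+4 therefore keeps routes with at least four points.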
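The bracket-counting trick is easy to verify on small strings. The sketch below assumes geo looks like a JSON array of [lon, lat] pairs; the helper name inner_array_count and the sample strings are hypothetical.

```python
def inner_array_count(geo: str) -> int:
    """Number of inner arrays: count the ']' characters, minus one for the outer array."""
    return geo.count("]") - 1


two_points = "[[1,2],[3,4]]"
four_points = "[[1,2],[3,4],[5,6],[7,8]]"
print(inner_array_count(two_points), inner_array_count(four_points))
```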
I have 2 different delimiter characters ('|' and ',') in one column in BigQuery. Using standard SQL, how do I split a column containing a string like the one below into multiple columns, separating on both '|' and ','?
Inbr | Evermore | In Banner Video, Canary Island | 702B6
The code I have so far is:
Thank you. Here is the code scenario; how do I apply that with the other columns I need in the table?
SELECT CAST(Date AS DATE) Date,
Data_Source_type,
Data_Source_id,
Campaign,
Data_Source,
Data_Source_name,
Data_Source_type_name,
Ad_legacy__AdWords,
Ad_Group_Name__AdWords,
Ad_Type__AdWords,
SPLIT(Campaign,'|')[safe_ordinal(1)] as Media,SPLIT(Campaign,'|')[safe_ordinal(2)] as Client,SPLIT(Campaign, '|')[safe_ordinal(3)] as Market_Type,SPLIT(Campaign,'|')[safe_ordinal(4)] as Market,SPLIT(Campaign,'|')[safe_ordinal(5)] as Market_ID,
City__AdWords,
FROM `data.aud_summary`
Consider the query below (as this is Campaign info, I assume the structure of the string in the column is consistent across rows and yields the same number of columns to be extracted):
select * except(key) from (
select to_json_string(t) key, offset, value
from `project.dataset.table` t,
unnest(regexp_extract_all(Campaign, r'[^,|]+')) value with offset
)
pivot(max(value) for offset in (0 as Media, 1 as Client, 2 as Market_Type, 3 as Market, 4 as Code))
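If applied to the sample data in your question, the output is one row with the columns Media, Client, Market_Type, Market and Code holding the five pieces of the string (this regex does not trim the surrounding spaces).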
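The regular expression can be tried on the sample string in Python, where re.findall plays the role of regexp_extract_all. The strip() step is an addition for illustration; the SQL above leaves the whitespace in place.

```python
import re

campaign = "Inbr | Evermore | In Banner Video, Canary Island | 702B6"

# Every run of characters that is neither ',' nor '|'
raw_parts = re.findall(r"[^,|]+", campaign)
# Optional cleanup that the SQL above does not perform
parts = [p.strip() for p in raw_parts]

names = ["Media", "Client", "Market_Type", "Market", "Code"]
columns = dict(zip(names, parts))
print(columns)
```

In BigQuery the equivalent cleanup would be to wrap value in TRIM().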
how do I apply that with the other columns I need in the table?
Just add t.* as in the example below:
select * from (
select t.*, offset, value
from `project.dataset.table` t,
unnest(regexp_extract_all(Campaign, r'[^,|]+')) value with offset
)
pivot(max(value) for offset in (0 as Media, 1 as Client, 2 as Market_Type, 3 as Market, 4 as Code))
Use REPLACE to turn each ',' into '|' before splitting the column, so that a single delimiter is left.
WITH
SampleData AS (
SELECT
"Inbr | Evermore | In Banner Video, Canary Island | 702B6" AS DATA )
SELECT
a[safe_ORDINAL(1)] AS Media,
a[safe_ORDINAL(2)] AS Client,
a[safe_ORDINAL(3)] AS Market_Type,
a[safe_ORDINAL(4)] AS Market,
a[safe_ORDINAL(5)] AS Fifth,
FROM (
SELECT
SPLIT(REPLACE(DATA, ",", "|"),'|') AS a
FROM
SampleData)
Result:
Media  Client    Market_Type      Market         Fifth
Inbr   Evermore  In Banner Video  Canary Island  702B6
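The replace-then-split step behaves the same way in Python. The safe_ordinal helper below is a hypothetical imitation of BigQuery's 1-based SAFE_ORDINAL, which returns NULL instead of raising an error when the index is out of range.

```python
data = "Inbr | Evermore | In Banner Video, Canary Island | 702B6"

# Normalise to a single delimiter, then split once
a = data.replace(",", "|").split("|")


def safe_ordinal(values, n):
    """1-based access that returns None when n is out of range."""
    return values[n - 1] if 1 <= n <= len(values) else None


market_type = safe_ordinal(a, 3)
missing = safe_ordinal(a, 6)
print(repr(market_type), missing)
```

The pieces keep their surrounding spaces after the split, so a trim may be wanted on each extracted value.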
Finally, applied to your table:
SELECT
* EXCEPT(a),
a[safe_ORDINAL(1)] AS Media,
a[safe_ORDINAL(2)] AS Client,
a[safe_ORDINAL(3)] AS Market_Type,
a[safe_ORDINAL(4)] AS Market,
a[safe_ORDINAL(5)] AS Fifth,
FROM (
SELECT
CAST(Date AS DATE) Date,
* EXCEPT(Date),
-- Campaign is the column that holds the delimited string in your table
SPLIT(REPLACE(Campaign, ',', '|'),'|') AS a
FROM
`data.aud_summary`)