Geography function over a column - sql

I am trying to use the ST_MAKELINE() function to create a line between each point and the next one, where all the points are stored in a single column.
Do I need to create another column that already contains both points?
with t1 as(
SELECT *, ST_GEOGPOINT(cast(long as float64) , cast(lat as float64)) geometry FROM `my_table.faissal.trajets_flix`
where id = 1
order by index_loc
)
select index_loc, geometry
from t1
Here are the results
Thanks for your help

You seem to want something like the query below (see the ST_MAKELINE documentation):
https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_makeline
WITH t1 as (
SELECT *, ST_GEOGPOINT(cast(long as float64), cast(lat as float64)) geometry
FROM `my_table.faissal.trajets_flix`
-- WHERE id = 1
)
SELECT id, ST_MAKELINE(ARRAY_AGG(geometry ORDER BY index_loc)) traj
FROM t1
GROUP BY id;
The output has one row per id, with traj holding that id's points joined in index_loc order; the result can be visualized on a map.

Also consider the simple and cheap option below:
select st_geogfromtext(format('linestring(%s)',
string_agg(long || ' ' || lat order by index_loc))
) as path
from `my_table.faissal.trajets_flix`
where id = 1
When applied to the sample data in your question, the output is a single path that can be visualized on a map.
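The string-building step of this option can be checked outside BigQuery. The following is a minimal Python sketch, not BigQuery code: the function name `build_linestring` and the sample coordinates are hypothetical. It orders the points by `index_loc` and joins them as `long lat` pairs with a comma (the default `string_agg` delimiter), which is the text that `st_geogfromtext` would then parse.

```python
def build_linestring(rows):
    """Build a WKT linestring from (index_loc, long, lat) tuples."""
    # Mirror "order by index_loc" inside string_agg.
    ordered = sorted(rows, key=lambda r: r[0])
    # Mirror long || ' ' || lat, joined with the default ',' delimiter.
    coords = ",".join(f"{lon} {lat}" for _, lon, lat in ordered)
    return f"linestring({coords})"


# Hypothetical sample points, deliberately out of order.
sample = [(2, 4.83, 45.76), (1, 2.35, 48.85), (3, 5.37, 43.3)]
print(build_linestring(sample))
```

Building the text first keeps the query to one aggregation, at the cost of going through a string representation of the coordinates.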


Convert struct values to row in big query

I want to convert the values of a struct into independent rows.
My table looks like this:
| id | details |
| 1 | {d_0:{id:'1_0'},d_1:{id:'1_1'}} |
| 2 | {d_0:{id:'2_0'},d_1:{id:'2_1'}} |
Expected result (flattening the inner struct):
| id |
|'1_0'|
|'1_1'|
|'2_0'|
|'2_1'|
Since I don't know how many fields details will contain, is there any way to convert all the individual fields of the struct into independent rows?
The schema of all the values in details.d_0, details.d_1, ... will be the same.
Any help or pointer to resources is appreciated.
You can use a script that iterates over an array of column names to get the desired output.
Creating the table:
CREATE TABLE `<proj_id>.<dataset>.<table>` as
WITH data AS (
SELECT "1" AS id, STRUCT(STRUCT( '1_0' as id) as d_0, STRUCT( '1_1' as id) as d_1) as details
union all SELECT "2" AS id, STRUCT(STRUCT( '2_0' as id) as d_0, STRUCT( '2_1' as id) as d_1) as details
),
tier_1 as (
select id,details.* from data
)
select * from tier_1
Actual Query:
DECLARE i INT64 DEFAULT 0;
DECLARE query_ary ARRAY<STRING> DEFAULT
ARRAY(
select concat(column_name,'.id') from `<dataset>.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name = '<your-table>' AND regexp_contains(column_name, r'd\_\d')
);
CREATE TEMP TABLE result(id STRING);
LOOP
SET i = i + 1;
IF i > ARRAY_LENGTH(query_ary) THEN
LEAVE;
END IF;
EXECUTE IMMEDIATE '''
INSERT result
SELECT ''' || query_ary[ORDINAL(i)] || ''' FROM `<proj_id>.<dataset>.<table>`
''';
END LOOP;
SELECT * FROM result;
Output:
Consider the approach below:
select id from your_table,
unnest(split(translate(format('%t', details), '()', ''), ', ')) id
When applied to the sample data in your question, defined as
with your_table as (
select "1" id, struct(struct('1_0' as id) as d_0, struct('1_1' as id) as d_1) details union all
select "2", struct(struct('2_0'), struct('2_1'))
)
the output contains one row per inner id value.
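The approach relies on the text form of the struct. Here is a minimal Python sketch of the same three steps (format, strip parentheses, split). It is not BigQuery code, and it assumes that format('%t', details) renders the first sample row as `((1_0), (1_1))`; the function name `flatten_struct_text` is hypothetical.

```python
def flatten_struct_text(text):
    """Strip parentheses from a struct's text form and split it into values."""
    # Mirror translate(text, '()', ''): delete both parenthesis characters.
    stripped = text.translate(str.maketrans("", "", "()"))
    # Mirror split(..., ', '): one element per inner value.
    return stripped.split(", ")


# Assumed %t rendering of struct(struct('1_0'), struct('1_1')).
rendered = "((1_0), (1_1))"
print(flatten_struct_text(rendered))
```

Note that this only works while the values themselves contain no parentheses and no `, ` sequence, since the split operates on plain text.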

Perform loop and calculation on BigQuery Array type

My original data, B is an array of INT64:
I want to calculate the difference B[n+1] - B[n] for each pair of consecutive elements, producing a new table like this:
I figured I could achieve this with LOOP and an IF condition:
DECLARE x INT64 DEFAULT 0;
LOOP
SET x = x + 1
IF(x < array_length(table.B))
THEN INSERT INTO newTable (SELECT A, B[OFFSET(x+1)] - B[OFFSET(x)]) from table
END IF;
END LOOP;
The problem is that this idea doesn't work on each row of my data: I would still need to loop through every row of the table, and I can't find a way to integrate the scripting part into a normal query of the form
SELECT A, [calculation script] from table
Can someone point out how I can do this, or suggest a better way to solve the problem?
Thank you.
The following works in BigQuery:
select * replace(
array(select diff from (
select offset, lead(el) over(order by offset) - el as diff
from unnest(B) el with offset
) where not diff is null
order by offset
) as B
)
from `project.dataset.table` t
When applied to the sample data in your question, B is replaced by the array of differences between consecutive elements.
You can use unnest() with offset for this purpose:
select id, a,
array_agg(b_el - prev_b_el order by n) as b_diffs
from (select t.*, b_el, lag(b_el) over (partition by t.id order by n) as prev_b_el
from t cross join
unnest(b) b_el with offset n
) t
where prev_b_el is not null
group by t.id, t.a
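To check the expected result of either query by hand, the per-row calculation can be sketched in Python. This is only an illustration of the logic (pair each element with its successor and subtract), not BigQuery code; the function name `adjacent_diffs` and the sample rows are hypothetical.

```python
def adjacent_diffs(values):
    """Return values[n+1] - values[n] for every consecutive pair."""
    # zip pairs each element with its successor, like lead()/lag() over the offset.
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


# Hypothetical rows: A is a key, B is an array of integers.
rows = [{"A": "x", "B": [1, 3, 6, 10]}, {"A": "y", "B": [5, 5]}]
result = [{"A": row["A"], "B": adjacent_diffs(row["B"])} for row in rows]
print(result)
```

As in the SQL versions, an array of length n yields n - 1 differences, so a single-element array produces an empty result.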

How to convert a JSON array to multiple columns in Hive

Example:
A Hive table has a JSON array column (type: string) like this:
"[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]"
How can I convert it into:
name age
alice 14
using Hive SQL?
I've tried lateral view explode, but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Adapt it and debug it on your real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
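For verifying the expected output outside Hive, the same pivot can be sketched in Python with the standard json module. This is not Hive code and not a replacement for the queries above; the function name `pivot_fields` and its `wanted` parameter are hypothetical.

```python
import json


def pivot_fields(json_text, wanted=("name", "age")):
    """Turn a JSON array of {"field", "value"} objects into one row of columns."""
    # Collect every field/value pair; a later duplicate field overwrites an earlier one.
    pairs = {item["field"]: item["value"] for item in json.loads(json_text)}
    # Keep only the requested columns; missing fields become None (like NULL).
    return {key: pairs.get(key) for key in wanted}


sample = (
    '[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
    '{"field":"something_else", "value":"somevalue"}]'
)
print(pivot_fields(sample))
```

Like the max(case when ...) pattern in the queries, fields that are not requested are simply ignored.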

How to create an aggregate function for median?

I need to create an aggregate function in Advantage-Database to calculate the median value.
SELECT
group_field
, MEDIAN(value_field)
FROM
table_name
GROUP BY
group_field
The solutions I am finding seem to be quite specific to the SQL engine used.
There is no built-in median aggregate function in ADS as you can see in the help file:
http://devzone.advantagedatabase.com/dz/webhelp/Advantage10.1/index.html
I'm afraid that you have to write your own stored procedure or sql script to solve this problem.
The accepted answer to the following question might be a solution for you:
Simple way to calculate median with MySQL
I've updated this answer with a solution that avoids the join in favor of storing some data in a json object.
SOLUTION #1 (two selects and a join, one to get counts, one to get rankings)
This is a little lengthy, but it does work, and it's reasonably fast.
SELECT x.group_field,
avg(
if(
x.rank - y.vol/2 BETWEEN 0 AND 1,
value_field,
null
)
) as median
FROM (
SELECT group_field, value_field,
@r:= IF(@current=group_field, @r+1, 1) as rank,
@current:=group_field
FROM (
SELECT group_field, value_field
FROM table_name
ORDER BY group_field, value_field
) z, (SELECT @r:=0, @current:='') v
) x, (
SELECT group_field, count(*) as vol
FROM table_name
GROUP BY group_field
) y WHERE x.group_field = y.group_field
GROUP BY x.group_field
SOLUTION #2 (uses a json object to store the counts and avoids the join)
SELECT group_field,
avg(
if(
rank - json_extract(@vols, path)/2 BETWEEN 0 AND 1,
value_field,
null
)
) as median
FROM (
SELECT group_field, value_field, path,
@rnk := if(@curr = group_field, @rnk+1, 1) as rank,
@vols := json_set(
@vols,
path,
coalesce(json_extract(@vols, path), 0) + 1
) as vols,
@curr := group_field
FROM (
SELECT p.group_field, p.value_field, concat('$.', p.group_field) as path
FROM table_name p
JOIN (SELECT @curr:='', @rnk:=1, @vols:=json_object()) v
ORDER BY group_field, value_field DESC
) z
) y GROUP BY group_field;
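The rank-and-count idea behind these queries can be sketched in Python to make the selection rule easier to follow. This is only an illustration of the logic, not ADS or MySQL code; the function name `median_by_group` and the sample rows are hypothetical. Within each group the values are ranked from 1, and the values whose rank satisfies 0 <= rank - vol/2 <= 1 are averaged.

```python
from collections import defaultdict


def median_by_group(rows):
    """Compute the median per group from (group_field, value_field) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    medians = {}
    for group, values in groups.items():
        values.sort()
        vol = len(values)  # row count of the group
        # Keep the values whose 1-based rank is within one step above vol/2:
        # one value for odd counts, the two middle values for even counts.
        middle = [v for rank, v in enumerate(values, start=1)
                  if 0 <= rank - vol / 2 <= 1]
        medians[group] = sum(middle) / len(middle)
    return medians


# Hypothetical sample data: group "a" has 3 values, group "b" has 4.
sample = [("a", 1), ("a", 3), ("a", 2),
          ("b", 10), ("b", 20), ("b", 30), ("b", 40)]
print(median_by_group(sample))
```

For an odd count only the middle rank passes the filter; for an even count the two middle ranks pass and their average is returned.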

SQL Server : convert sub select query to join

I have two tables, QuestionPool and Question, where Question has a many-to-one relationship with QuestionPool. I have created a query using a sub-select which returns the correct random results, but I need to return more than one column from the Question table.
The intent of the query is to return a random question from the Question table for each question pool in the QuestionPool table that belongs to a given QuizID.
SELECT QuestionPool.QuestionPoolID,
(
SELECT TOP (1) Question.QuestionPoolID
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
)
FROM QuestionPool
WHERE QuestionPool.QuizID = '5'
OUTER APPLY is suited to this:
Select *
FROM QuestionPool
OUTER APPLY
(
SELECT TOP 1 *
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
) x
WHERE QuestionPool.QuizID = '5'
Another example of OUTER APPLY usage: http://www.ienablemuch.com/2012/04/outer-apply-walkthrough.html
Live test: http://www.sqlfiddle.com/#!3/d8afc/1
create table m(i int, o varchar(10));
insert into m values
(1,'alpha'),(2,'beta'),(3,'delta');
create table x(i int, j varchar, k varchar(10));
insert into x values
(1,'a','hello'),
(1,'b','howdy'),
(2,'x','great'),
(2,'y','super'),
(3,'i','uber'),
(3,'j','neat'),
(3,'a','nice');
select m.*, '' as sep, r.*
from m
outer apply
(
select top 1 *
from x
where i = m.i
order by newid()
) r
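The behaviour of OUTER APPLY with TOP 1 ... ORDER BY NEWID() can be sketched in Python as: for every outer row, pick one random matching inner row, and keep the outer row even when nothing matches. This is only an illustration of the semantics, not SQL Server code; the function name `outer_apply_random` and the sample rows are hypothetical.

```python
import random


def outer_apply_random(outer_rows, inner_rows, key, rng=random):
    """Pair each outer row with one random inner row sharing the same key."""
    result = []
    for outer in outer_rows:
        matches = [row for row in inner_rows if row[key] == outer[key]]
        # OUTER APPLY keeps the outer row even with no match (NULLs in SQL).
        picked = rng.choice(matches) if matches else None
        result.append((outer, picked))
    return result


# Hypothetical data: i = 3 has no matching inner row.
m = [{"i": 1, "o": "alpha"}, {"i": 2, "o": "beta"}, {"i": 3, "o": "delta"}]
x = [{"i": 1, "j": "a"}, {"i": 1, "j": "b"}, {"i": 2, "j": "x"}]
pairs = outer_apply_random(m, x, "i")
print(pairs)
```

Using CROSS APPLY instead would correspond to dropping the outer rows for which no inner row matches.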
Not familiar with SQL server, but I hope this would do:
Select QuestionPool.QuestionPoolID, v.QuestionPoolID, v.xxx -- etc
FROM QuestionPool
JOIN
(
SELECT TOP (1) *
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
) AS v ON v.QuestionPoolID = QuestionPool.QuestionPoolID
WHERE QuestionPool.QuizID = '5'
Your query appears to be bringing back an arbitrary Question.QuestionPoolId for each QuestionPool.QuestionPoolId subject to the QuizId filter.
I think the following query does this:
select qp.QuestionPoolId, max(q.QuestionPoolId) as any_QuestionPoolId
from Question q join
QuestionPool qp
on q.GroupId = qp.QuestionPoolId
where qp.QuizID = '5'
group by qp.QuestionPoolId
This returns a particular question.
The following query would allow you to get more fields:
select qp.QuestionPoolId, q.*
from (select q.*, row_number() over (partition by GroupId order by (select NULL)) as randrownum
from Question q
) q join
QuestionPool qp
on q.GroupId = qp.QuestionPoolId
where qp.QuizID = '5' and
randrownum = 1
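This uses row_number() to enumerate the rows arbitrarily. The "select NULL" means no particular ordering is imposed, so the choice is arbitrary rather than truly random (alternatively, you could use "order by GroupId", or "order by NEWID()" for a random pick).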
Common Table Expressions (CTEs) are rather handy for this type of thing...
http://msdn.microsoft.com/en-us/library/ms175972(v=sql.90).aspx