I have source data like this, and the keys are not known in advance:
{k1: 0.5, k2: 0.6, k3: 0.3}
I want to concatenate the keys whose value is at least 0.5 into one string, separated by ',' and ordered by value descending. The expected result is 'k2,k1'.
How should I do that in hive?
Assuming your data is stored in table test1 as a column named data:
select concat_ws(',', collect_list(key)) res
from (
    select key, vals
    from (select explode(data) as (key, vals) from test1) sq1
    where vals >= 0.5
    order by vals desc
) a;
Note the inclusive >= 0.5, which is needed to keep k1 and produce 'k2,k1', and the use of order by rather than sort by: sort by only orders rows within each reducer, so it does not guarantee a total order for collect_list.
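As a sanity check, the filter / sort-descending / concatenate logic can be sketched in plain Python on the sample map from the question (note that reproducing the expected 'k2,k1' requires keeping values equal to 0.5, i.e. an inclusive cutoff):

```python
# Sketch of the explode -> filter -> sort desc -> concat_ws pipeline.
data = {"k1": 0.5, "k2": 0.6, "k3": 0.3}

# Keep keys whose value is at least 0.5; a strict > 0.5 would drop k1
# and contradict the expected 'k2,k1'. Sort by value descending.
kept = sorted((k for k, v in data.items() if v >= 0.5),
              key=lambda k: -data[k])
res = ",".join(kept)
print(res)  # k2,k1
```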
I am trying to use the st_makeline() function to build, in a single column, a line from every point to the next one.
Do I need to create another column that already holds the 2 points?
with t1 as(
SELECT *, ST_GEOGPOINT(cast(long as float64) , cast(lat as float64)) geometry FROM `my_table.faissal.trajets_flix`
where id = 1
order by index_loc
)
select index_loc geometry
from t1
Here are the results
Thanks for your help
You seem to want to write this code:
https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_makeline
WITH t1 as (
SELECT *, ST_GEOGPOINT(cast(long as float64), cast(lat as float64)) geometry
FROM `my_table.faissal.trajets_flix`
-- WHERE id = 1
)
SELECT id, ST_MAKELINE(ARRAY_AGG(geometry ORDER BY index_loc)) traj
FROM t1
GROUP BY id;
with output (one line per id), which can then be visualized on the map.
Consider also the simple and cheap option below:
select st_geogfromtext(format('linestring(%s)',
string_agg(long || ' ' || lat order by index_loc))
) as path
from `my_table.faissal.trajets_flix`
where id = 1
If applied to the sample data in your question, the output is a single linestring, which can be visualized as a path on the map.
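The string_agg trick simply builds a WKT linestring text for st_geogfromtext(). Here is the same construction sketched in plain Python, with hypothetical (long, lat) points already ordered by index_loc:

```python
# Hypothetical sample points (long, lat), already sorted by index_loc.
points = [(2.35, 48.85), (2.37, 48.86), (2.4, 48.88)]

# Build the WKT text the same way string_agg(long || ' ' || lat) does.
wkt = "linestring({})".format(
    ", ".join("{} {}".format(lon, lat) for lon, lat in points))
print(wkt)  # linestring(2.35 48.85, 2.37 48.86, 2.4 48.88)
```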
I have a BigQuery column which contains STRING values like
col1
[{"a":1,"b":2},{"a":2,"b":3}]
[{"a":3,"b":4},{"a":5,"b":6}]
Now, when doing a SELECT, I want to get just the maximum value of "a" in each JSON array. For example, here I would want the output of the SELECT on the table to be
2
5
Any ideas please? Thanks!
Use JSON_QUERY_ARRAY() to split the string into array elements, then JSON_VALUE() to pull "a" out of each element. Cast to a number before taking the max, since JSON_VALUE() returns a string and string comparison is lexicographic:
with t as (
select '[{"a":1,"b":2},{"a":2,"b":3}]' as col union all
select '[{"a":3,"b":4},{"a":5,"b":6}]'
)
select t.*,
       (select max(cast(json_value(el, '$.a') as int64))
        from unnest(json_query_array(col, '$')) el
       ) as max_a
from t;
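The same per-row maximum can be sketched in plain Python with the sample strings from the question; note the cast to int, since comparing the raw string values would be lexicographic ("10" < "2"):

```python
import json

# Sample column values from the question.
rows = ['[{"a":1,"b":2},{"a":2,"b":3}]',
        '[{"a":3,"b":4},{"a":5,"b":6}]']

# Parse each JSON array and take the max of the numeric "a" fields.
maxes = [max(int(el["a"]) for el in json.loads(col)) for col in rows]
print(maxes)  # [2, 5]
```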
I want to divide each 'Value' in this dataset by the Value at TIME=='1970-Q1' grouped by LOCATION.
This is how I'd implement the logic in SQL
WITH first_year AS (
SELECT LOCATION, Value
FROM `table`
WHERE TIME = '1970-Q1'
)
SELECT t.LOCATION, t.TIME, ((t.Value / f.Value) * 100) normValue
FROM `table` t,
first_year f
WHERE t.LOCATION = f.LOCATION
ORDER BY LOCATION, TIME ASC
Alternatively, you can assume that sorting the TIME column ascending within each group and taking the first value yields the same baseline; TIME is always a string like 'YYYY-QX', which sorts chronologically.
Expected result:
Try with groupby + transform:
df['normal'] = df.Value / df['Value'].where(df.TIME.str[5:] == 'Q1').groupby(df['LOCATION']).transform('first')
The where() masks out every non-Q1 row, and transform('first') broadcasts each LOCATION's first non-null (i.e. earliest Q1) value back to all of that group's rows.
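If you prefer the logic spelled out without pandas, here is a dependency-free sketch on made-up sample rows: group by LOCATION, sort by the TIME string (which orders 'YYYY-QX' chronologically), keep the first value per group, and scale by 100 as in the SQL version.

```python
# (LOCATION, TIME, Value) sample rows; values are made up for illustration.
rows = [
    ("AUS", "1970-Q1", 50.0), ("AUS", "1970-Q2", 55.0),
    ("USA", "1970-Q1", 40.0), ("USA", "1970-Q2", 42.0),
]

# Baseline value per LOCATION: first row after sorting by TIME ascending.
first = {}
for loc, time, val in sorted(rows, key=lambda r: (r[0], r[1])):
    first.setdefault(loc, val)

# Normalize each row against its group's baseline, times 100.
norm = [(loc, time, val * 100 / first[loc]) for loc, time, val in rows]
print(norm)
```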
My original data, B is an array of INT64:
And I want to calculate the difference B[n+1] - B[n], resulting in a new table as follows:
I figured out I can somehow achieve this by using LOOP and IF condition:
DECLARE x INT64 DEFAULT 0;
LOOP
  SET x = x + 1;
  IF (x < array_length(table.B)) THEN
    INSERT INTO newTable
    SELECT A, B[OFFSET(x+1)] - B[OFFSET(x)] FROM table;
  END IF;
END LOOP;
The problem is that the above idea doesn't work on each row of my data: I would still need to loop through every row of the table, and I can't find a way to embed the scripting part into a normal query, where I could write
SELECT A, [calculation script] FROM table
Can someone point me how can I do it? Or any better way to solve this problem?
Thank you.
The query below actually works in BigQuery:
select * replace(
  array(select diff from (
          select offset, lead(el) over(order by offset) - el as diff
          from unnest(B) el with offset
        )
        where diff is not null
        order by offset
  ) as B
)
from `project.dataset.table` t
If applied to the sample data in your question, the output is:
You can use unnest() with offset for this purpose:
select id, a,
array_agg(b_el - prev_b_el order by n) as b_diffs
from (select t.*, b_el, n, lag(b_el) over (partition by t.id order by n) as prev_b_el
from t cross join
unnest(b) b_el with offset n
) t
where prev_b_el is not null
group by t.id, t.a
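The lead/lag logic boils down to taking consecutive differences within each row's array. A plain-Python sketch with hypothetical sample rows:

```python
# Each row is (A, B) where B is a list of ints; sample data is made up.
rows = [("a", [1, 3, 6]), ("b", [10, 7, 7, 12])]

# B[n+1] - B[n] for every adjacent pair, per row.
diffs = [(a, [b[i + 1] - b[i] for i in range(len(b) - 1)]) for a, b in rows]
print(diffs)  # [('a', [2, 3]), ('b', [-3, 0, 5])]
```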
I have the following 3 query statements:
sSQLSting1 = "SELECT * From Column1"
sSQLSting2 = "SELECT * From Column2"
sSQLSting3 = "SELECT * From Column3"
Edit: Column1 is the name of table 1; each table holds just a single column.
I want to join them all into separate columns of a new DataTable, but when I tried
sSQLSting1 & " Union " & sSQLSting2 & " Union " & sSQLSting3
the result came back as a single column.
My desired result is a table that includes all of the above columns, so my question is:
can I do this with just one query statement or do I have to iterate and add data for each column? (i'm using c#).
Thanks a lot!
Perhaps this will help a bit
Again, there is NO guarantee of the proper sequence. Also, this assumes the same number of rows in each table.
Example
Select Col1 = A.SomeColumn
,Col2 = B.SomeColumn
,Col3 = C.SomeColumn
From ( Select SomeColumn,RN=row_number() over (order by SomeColumn ) from Column1 ) A
Join ( Select SomeColumn,RN=row_number() over (order by SomeColumn ) from Column2 ) B on A.RN=B.RN
Join ( Select SomeColumn,RN=row_number() over (order by SomeColumn ) from Column3 ) C on A.RN=C.RN
EDIT another option is a PIVOT
Select *
From (
Select Value=SomeColumn,Col=1,RN=row_number() over (order by SomeColumn ) from Column1
Union All
Select Value=SomeColumn,Col=2,RN=row_number() over (order by SomeColumn ) from Column2
Union All
Select Value=SomeColumn,Col=3,RN=row_number() over (order by SomeColumn ) from Column3
) src
Pivot (max(value) for Col in ([1],[2],[3]) ) pvt
This would probably be simplest to do on the client side:
for(int x = 1; x < dts.Length; x++){
dts[0].Columns.Add(dts[x].Columns[0].ColumnName);
for(int y = 0; y < dts[x].Rows.Count; y++)
dts[0].Rows[y][x] = dts[x].Rows[y][0];
}
It'll handle any number of DataTables in an array (called dts); change Length to Count if it's a list, etc.
All column names must be unique (yours are); logic can be added to append something to a name to make it unique.
All data is copied into the first table in the array.
If your columns are not strings, you can set the type of each new column added to dts[0] from the type of dts[x].Columns[0].
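The same positional stitching (the client-side counterpart of the row_number() join above) can be sketched in a few lines of Python, using made-up single-column result sets:

```python
# Three hypothetical single-column result sets, one list per source table.
col1 = ["a1", "a2", "a3"]
col2 = ["b1", "b2", "b3"]
col3 = ["c1", "c2", "c3"]

# Match rows by position and emit one combined row per index,
# just like joining on row_number() does in SQL.
table = [list(row) for row in zip(col1, col2, col3)]
print(table)  # [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
```

As with the SQL approaches, this assumes the columns have the same number of rows and that positional order is meaningful.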