How to Group By in Big Query - sql

I have a record data structure in BigQuery. When I run the following query, my output is as follows:
Query: SELECT v.key, v.value from table, unnest(dimensions.key_value) v;
key value
region region1
loc location1
region region1
loc location1
region region2
loc location2
Now I want to group by region and location, so that my output looks as follows:
groupBy Count
region1,location1 2
region2,location2 1
If I needed to group by only one key, it would be a simple query:
SELECT v.key, count(*) from table, unnest(dimensions.key_value) v group by v.key;
But how do I do this for more than one key?

Maybe pivot it first?
with pivotted as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc
from table t
)
select region, loc, count(*)
from pivotted
group by region, loc
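To sanity-check what the pivot-then-group query is expected to return, here is a minimal Python sketch of the same logic. It is not BigQuery code; the sample rows and the function names pivot and group_counts are made up for illustration, and it assumes each key appears at most once per row (which is also what the scalar subqueries above require).

```python
from collections import Counter

# Hypothetical sample data: one list of key/value pairs per table row,
# mirroring dimensions.key_value in the question.
rows = [
    [{"key": "region", "value": "region1"}, {"key": "loc", "value": "location1"}],
    [{"key": "region", "value": "region1"}, {"key": "loc", "value": "location1"}],
    [{"key": "region", "value": "region2"}, {"key": "loc", "value": "location2"}],
]


def pivot(key_values):
    """Turn one row's key/value pairs into a (region, loc) tuple."""
    lookup = {kv["key"]: kv["value"] for kv in key_values}
    return lookup.get("region"), lookup.get("loc")


def group_counts(rows):
    """Count rows per (region, loc) pair, like GROUP BY region, loc."""
    return Counter(pivot(r) for r in rows)


counts = group_counts(rows)
for (region, loc), n in counts.items():
    print(f"{region},{loc} {n}")
```

A missing key simply becomes None here, which corresponds to the NULL a scalar subquery returns when no element matches.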

Hmmm . . . You seem to be assuming that the ordering of the values is important. This is not a good way to store repeating pairs of data -- arrays of structs seems better. But you can use with offset and some arithmetic.
Assuming that region and loc are the only values and they are interleaved (as in your example):
with t as (
select struct(array[struct('region' as key, 'region1' as value),
struct('loc', 'location1'),
struct('region', 'region1'),
struct('loc', 'location1'),
struct('region', 'region2'),
struct('loc', 'location2')
] as key_value) as dimensions
)
select rl.region, rl.loc, count(*)
from (select (select array_agg(region_loc) as region_locs
from (select struct(max(case when kv.key = 'region' then kv.value end) as region,
max(case when kv.key = 'loc' then kv.value end) as loc
) as region_loc
from unnest(dimensions.key_value) kv with offset n
group by floor( n / 2 )
) rl2
) as region_locs
from t
) rl3 cross join
unnest(rl3.region_locs) rl
group by 1, 2;
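The offset arithmetic may be easier to follow outside SQL. Below is a rough Python sketch of the same idea, under the same assumption that region and loc are strictly interleaved in one array: elements at offsets 0 and 1 form the first pair, 2 and 3 the second, and so on, so the integer division n // 2 plays the role of floor(n / 2). The name pair_up is hypothetical.

```python
from collections import Counter

# The single array from the example above, as (key, value) tuples.
key_value = [
    ("region", "region1"), ("loc", "location1"),
    ("region", "region1"), ("loc", "location1"),
    ("region", "region2"), ("loc", "location2"),
]


def pair_up(key_value):
    """Group elements by offset // 2 and pivot each group into (region, loc)."""
    groups = {}
    for n, (key, value) in enumerate(key_value):
        groups.setdefault(n // 2, {})[key] = value
    return [(g.get("region"), g.get("loc")) for g in groups.values()]


pairs = pair_up(key_value)
counts = Counter(pairs)  # same role as the final GROUP BY 1, 2
```

If the elements are not strictly interleaved, the pairs will be wrong, which is the fragility the answer warns about.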

Related

How to convert a JSON array to multiple columns in Hive

Example:
There is a JSON array column (type: string) in a Hive table, like:
"[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]"
How can I convert it into:
name age
alice 14
using Hive SQL?
I've tried lateral view explode, but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it and debug it on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
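When debugging either query on real data, it can help to compute the expected row independently. The following is a small Python sketch of the same pivot (array of field/value objects to named columns). It is only a reference for checking results, not something that runs inside Hive; pivot_fields and its wanted parameter are made-up names.

```python
import json


def pivot_fields(json_str, wanted=("name", "age")):
    """Parse a JSON array of {"field", "value"} objects and keep the wanted fields."""
    entries = json.loads(json_str)
    lookup = {e["field"]: e["value"] for e in entries}
    # Fields missing from the array come back as None (NULL in the Hive result).
    return {name: lookup.get(name) for name in wanted}


sample = (
    '[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
    '{"field":"something_else", "value":"somevalue"}]'
)
row = pivot_fields(sample)
print(row)
```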

Getting Number of Common Values from 2 comma-separated strings

I have a table that contains comma-separated values in a column in Postgres.
ID PRODS
--------------------------------------
1 ,142,10,75,
2 ,142,87,63,
3 ,75,73,2,58,
4 ,142,2,
Now I want a query where I can give a comma-separated string and it will tell me the number of matches between the input string and the string present in the row.
For instance, for input value ',142,87,', I want the output like
ID PRODS No. of Match
------------------------------------------------------------------------
1 ,142,10,75, 1
2 ,142,87,63, 2
3 ,75,73,2,58, 0
4 ,142,2, 1
Try this:
SELECT
*,
ARRAY(
SELECT
*
FROM
unnest(string_to_array(trim(both ',' from prods), ','))
WHERE
unnest = ANY(string_to_array(',142,87,', ','))
)
FROM
prods_table;
Output is:
1 ,142,10,75, {142}
2 ,142,87,63, {142,87}
3 ,75,73,2,58, {}
4 ,142,2, {142}
Add the cardinality(anyarray) function to the last column to get just a number of matches.
And consider changing your database design.
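For reference, the counting logic itself (trim the surrounding commas, split, then count the elements that appear in the input list) can be sketched in a few lines of Python. This is only meant for checking the expected numbers against the sample table, not as a replacement for the query; split_ids and count_matches are hypothetical helper names.

```python
def split_ids(text):
    """Split ',142,87,' into ['142', '87'], ignoring the empty edge pieces."""
    return [p for p in text.strip(",").split(",") if p]


def count_matches(prods, wanted):
    """Count how many elements of prods also occur in wanted."""
    wanted_set = set(split_ids(wanted))
    return sum(1 for p in split_ids(prods) if p in wanted_set)


# Sample rows from the question.
table = {1: ",142,10,75,", 2: ",142,87,63,", 3: ",75,73,2,58,", 4: ",142,2,"}
matches = {id_: count_matches(prods, ",142,87,") for id_, prods in table.items()}
print(matches)
```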
Check this:
select T.*,
COALESCE(No_of_Match,'0')
from TT T Left join
(
select ID,count(ID) No_of_Match
from (
select ID,unnest(string_to_array(trim(t.prods, ','), ',')) A
from TT t)a
Where A in ('142','87')
group by ID
)B
On T.Id=b.id
If you install the intarray extension, this gets quite easy:
select id, prods, cardinality(string_to_array(trim(prods, ','), ',')::int[] & array[142,87])
from bad_design;
Otherwise it's a bit more complicated:
select bd.id, bd.prods, m.matches
from bad_design bd
join lateral (
select bd.id, count(v.p) as matches
from unnest(string_to_array(trim(bd.prods, ','), ',')) as l(p)
left join (
values ('142'),('87') --<< these are your input values
) v(p) on l.p = v.p
group by bd.id
) m on m.id = bd.id
order by bd.id;
Online example: http://rextester.com/ZIYS97736
But you should really fix your data model.
with data as
(
select *,
unnest(string_to_array(trim(both ',' from prods), ',') ) as v
from myTable
),
counts as
(
select id, count(t) as c from data
left join
( select unnest(string_to_array(',142,87,', ',') ) as t) tmp on tmp.t = data.v
group by id
order by id
)
select t1.id, t1.prods, t2.c as "No. of Match"
from myTable t1
inner join counts t2 on t1.id = t2.id;

Select columns maximum and minimum value for all records

I have a table as below. For every record, I want to get the names of the columns holding the maximum and minimum values, excluding the population column (which will of course always have the maximum value).
State Population age_below_18 age_18_to_50 age_50_above
1 1000 250 600 150
2 4200 400 300 3500
Result :
State Population Maximum_group Minimum_group Max_value Min_value
1 1000 age_18_to_50 age_50_above 600 150
2 4200 age_50_above age_18_to_50 3500 300
Assuming none of the values are NULL, you can use greatest() and least():
select state, population,
(case when age_below_18 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as maximum_group,
(case when age_below_18 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = least(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as minimum_group,
greatest(age_below_18, age_18_to_50, age_50_above) as maximum_value,
least(age_below_18, age_18_to_50, age_50_above) as minimum_value
from t;
If your result set is actually being generated by a query, there is likely a better approach.
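As a quick cross-check of what the greatest()/least() query should return for the sample data, here is a minimal Python sketch of the same per-row comparison. The column list and the name extremes are illustrative only; like the CASE expression, it returns the first matching column when there is a tie, and it assumes no NULLs.

```python
AGE_COLUMNS = ("age_below_18", "age_18_to_50", "age_50_above")


def extremes(row):
    """Return the column names and values of the largest and smallest age group."""
    max_col = max(AGE_COLUMNS, key=lambda c: row[c])
    min_col = min(AGE_COLUMNS, key=lambda c: row[c])
    return {
        "maximum_group": max_col,
        "minimum_group": min_col,
        "max_value": row[max_col],
        "min_value": row[min_col],
    }


# Sample rows from the question.
rows = [
    {"state": 1, "population": 1000, "age_below_18": 250, "age_18_to_50": 600, "age_50_above": 150},
    {"state": 2, "population": 4200, "age_below_18": 400, "age_18_to_50": 300, "age_50_above": 3500},
]
results = [extremes(r) for r in rows]
```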
An alternative method "unpivots" the data and then reaggregates:
select state, population,
max(which) keep (dense_rank first order by val desc) as maximum_group,
max(which) keep (dense_rank first order by val asc) as minimum_group,
max(val) as maximum_value,
min(val) as minimum_value
from ((select state, population, 'age_below_18' as which, age_below_18 as val
from t
) union all
(select state, population, 'age_18_to_50' as which, age_18_to_50 as val
from t
) union all
(select state, population, 'age_50_above' as which, age_50_above as val
from t
)
) t
group by state, population;
This approach will likely perform worse than the first, although it is perhaps easier to implement as the number of values increases. However, Oracle 12c supports lateral joins, where a similar approach would have competitive performance.
with CTE as (
select T.*
--step2: rank value
,RANK() OVER (PARTITION BY "State", "Population" order by "value") "rk"
from (
--step1: union all merges the three columns into one column
select
"State", "Population",
'age_below_18' as GroupName,
"age_below_18" as "value"
from TestTable
union all
select
"State", "Population",
'age_18_to_50' as GroupName,
"age_18_to_50" as "value"
from TestTable
union all
select
"State", "Population",
'age_50_above' as GroupName,
"age_50_above" as "value"
from TestTable
) T
)
select T1."State", T1."Population"
,T3.GroupName Maximum_group
,T4.GroupName Minimum_group
,T3."value" Max_value
,T4."value" Min_value
--step3: max rank get maxvalue,min rank get minvalue
from (select "State", "Population",max( "rk") as Max_rank from CTE group by "State", "Population") T1
left join (select "State", "Population",min( "rk") as Min_rank from CTE group by "State", "Population") T2
on T1."State" = T2."State" and T1."Population" = T2."Population"
left join CTE T3 on T3."State" = T1."State" and T3."Population" = T1."Population" and T1.Max_rank = T3."rk"
left join CTE T4 on T4."State" = T2."State" and T4."Population" = T2."Population" and T2.Min_rank = T4."rk"
Hope it helps you :)
Another option: use a combination of UNPIVOT, which "rotates columns into rows", and analytic functions, which "compute an aggregate value based on a group of rows", e.g.
Test data
select * from T ;
STATE POPULATION YOUNGERTHAN18 BETWEEN18AND50 OVER50
1 1000 250 600 150
2 4200 400 300 3500
UNPIVOT
select *
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
);
-- result
STATE POPULATION AGEGROUP QUANTITY
1 1000 youngest 250
1 1000 middleaged 600
1 1000 oldest 150
2 4200 youngest 400
2 4200 middleaged 300
2 4200 oldest 3500
Include Analytic Functions
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
)
;
-- result
STATE POPULATION MAXQ MINQ BIGGEST_GROUP SMALLEST_GROUP
1 1000 600 150 middleaged oldest
2 4200 3500 300 oldest middleaged
Example tested with Oracle 11g and Oracle 12c.
Caution: (1) The column headings need adjusting according to your requirements. (2) If there are NULLs in your original table, you should adjust the query, e.g. by using NVL().
An advantage of this approach is that the code remains fairly clear even if more categories are used. E.g. when working with 11 age groups, the query may look something like this:
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
y10 as 'youngerthan10'
, b10_20 as 'between10and20'
, b20_30 as 'between20and30'
, b30_40 as 'between30and40'
, b40_50 as 'between40and50'
, b50_60 as 'between50and60'
, b60_70 as 'between60and70'
, b70_80 as 'between70and80'
, b80_90 as 'between80and90'
, b90_100 as 'between90and100'
, o100 as 'over100'
)
)
order by state
;

Return column1 only if column 2 contains all zeros

In my T-SQL database, I have an ItemLocation table. It lists warehouses, storage locations in the warehouses, items stored in those locations, and the current quantity on hand.
A warehouse (whse) can have multiple locations, and a location can have multiple items. As you can see from the image, there are items within locations that have '0' qty_on_hand.
What I would like to do is write a query that only returns locations that have absolutely NO qty_on_hand. For example, the highlighted location in my image (01-00-00A) would not be present in the result set of the executed query because it contains items that do have quantity. I'm only interested in the locations that don't have any quantity for any item whatsoever.
SELECT itemloc.*
FROM itemloc
WHERE itemloc.qty_on_hand = '0'
AND whse IN ('MW10','MW40','MW60')
ORDER BY whse, itemloc.loc
My query filters on qty_on_hand = '0', but that is not what I want, because it returns every location that has at least one item with no inventory. I can't quite figure out what the query should look like for this situation.
Assuming qty_on_hand can only be 0 or a positive value, you can use aggregation to check whether the max value for that location is 0.
select loc
from itemloc
group by loc
having max(qty_on_hand) = 0
If it can be negative too, then use min as well:
select loc
from itemloc
group by loc
having max(qty_on_hand) = 0 and min(qty_on_hand) = 0
If you want to get the other columns as well, you can use:
select *
from itemloc
where loc in (
select loc
from itemloc
group by loc
having max(qty_on_hand) = 0
);
or a window function:
select *
from (
select
i.*,
max(qty_on_hand) over (partition by loc) max_qty
from itemloc i
) t where max_qty = 0;
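To see the aggregation approach in action without access to the real table, here is a small self-contained sketch that runs the HAVING query against an in-memory SQLite database from Python. The rows and item names are invented, qty_on_hand is assumed to be numeric, and SQLite stands in for SQL Server here, so treat it as an illustration of the logic rather than tested T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itemloc (whse TEXT, loc TEXT, item TEXT, qty_on_hand INTEGER)"
)
# Invented sample rows: 01-00-00A holds stock for one item, the others hold none.
conn.executemany(
    "INSERT INTO itemloc VALUES (?, ?, ?, ?)",
    [
        ("MW10", "01-00-00A", "item1", 0),
        ("MW10", "01-00-00A", "item2", 5),
        ("MW10", "01-00-00B", "item1", 0),
        ("MW10", "01-00-00B", "item3", 0),
        ("MW40", "02-00-00A", "item4", 0),
    ],
)

# A location qualifies only if every one of its rows has qty_on_hand = 0.
empty_locs = [
    row[0]
    for row in conn.execute(
        """
        SELECT loc
        FROM itemloc
        GROUP BY loc
        HAVING MAX(qty_on_hand) = 0 AND MIN(qty_on_hand) = 0
        ORDER BY loc
        """
    )
]
print(empty_locs)
```

If the same loc code can exist in more than one warehouse, grouping by whse and loc together is probably what you want.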
You can build the set of whse/location combinations that have a nonzero qty_on_hand (for any item), and return the rows not included in that set.
SELECT a.*
FROM itemloc a
LEFT JOIN (select distinct whse,loc from itemloc where qty_on_hand <> '0') b
ON a.whse = b.whse
AND a.loc = b.loc
WHERE b.whse IS NULL
SELECT *
FROM itemloc
WHERE
loc IN
(
SELECT loc
FROM itemloc
GROUP BY loc -- group per location; assumes qty_on_hand is never negative
HAVING SUM(qty_on_hand) = 0
)

SQL: Add all values from different rows

I have two SQL queries I want to combine into one. The first one selects all the IDs of the rows I need to add in the second one:
SELECT t.mp_id FROM t_mp AS t
JOIN t_mp_og USING (mp_id)
WHERE og_id = 2928
AND t.description = 'Energy'
The second one should add together the values from the rows returned by the first query. Up until now I've only been able to add several selects with a + in between them. For a dynamic query that adds all the rows returned by query one, I'd like to do something equivalent to "foreach(value from query1){ sum += value }" and return that sum.
SELECT(
(SELECT current_value FROM t_value_time WHERE mp_id = 29280001 AND time_id =
(SELECT time_id FROM t_time WHERE time_stamp =
(SELECT max(time_stamp) FROM v_value AS v WHERE time_stamp is not null AND mp_id = 29280001)))
+
(SELECT current_value FROM t_value_time WHERE mp_id = 29280015 AND time_id =
(SELECT time_id FROM t_time WHERE time_stamp =
(SELECT max(time_stamp) FROM v_value AS v WHERE time_stamp is not null AND mp_id = 29280015)))
+
(SELECT current_value FROM t_value_time WHERE mp_id = 29280022 AND time_id =
(SELECT time_id FROM t_time WHERE time_stamp =
(SELECT max(time_stamp) FROM v_value AS v WHERE time_stamp is not null AND mp_id = 29280022)))
);
My two problems: I don't know how to add all rows in a set, only the manual "+" way. I also don't know how to put the ID from the row into the SELECT getting the value. I've tried AS, but it seems to only work for tables, not single values.
Thanks for your help,
MrB
Here is the edited query:
select t.mp_id, sum(t.current_value)
from t_value_time t
join t_time tim on t.time_id = tim.time_id
where t.mp_id in (29280001, 29280015, 29280022)
-- an aggregate cannot appear directly in WHERE, so look up the latest time stamp per mp_id in a subquery
and tim.time_stamp = (select max(v.time_stamp)
from v_value v
where v.time_stamp is not null
and v.mp_id = t.mp_id)
group by t.mp_id
Use SUM() for aggregation.
Have you tried SELECT Name, SUM(X) FROM Table GROUP BY Name?
SELECT SUM(CURRENT_VALUE )
FROM
T_VALUE_TIME INNER JOIN T_TIME ON T_VALUE_TIME.TIME_ID=T_TIME.TIME_ID
JOIN V_VALUE ON T_TIME.TIME_STAMP=V_VALUE.TIME_STAMP
WHERE T_VALUE_TIME.MP_ID IN (SELECT t.mp_id FROM t_mp AS t JOIN t_mp_og USING (mp_id)
WHERE og_id = 2928
AND t.description = 'Energy' )
AND T_TIME.TIME_STAMP=(SELECT MAX(TIME_STAMP) FROM V_VALUE WHERE TIME_STAMP IS NOT NULL)
GROUP BY V_VALUE.MP_ID
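A possible way to get one grand total of the latest value per measuring point is sketched below, run against an in-memory SQLite database from Python. The table layouts are guesses based on the column names in the question, the data is invented, and v_value (a view in the original) is replaced by a direct join of t_value_time and t_time, so this shows the shape of the query rather than a drop-in answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed schemas, reduced to the columns used in the question.
    CREATE TABLE t_mp (mp_id INTEGER, description TEXT);
    CREATE TABLE t_mp_og (mp_id INTEGER, og_id INTEGER);
    CREATE TABLE t_time (time_id INTEGER, time_stamp TEXT);
    CREATE TABLE t_value_time (mp_id INTEGER, time_id INTEGER, current_value INTEGER);

    INSERT INTO t_mp VALUES (29280001, 'Energy'), (29280015, 'Energy'),
                            (29280022, 'Energy'), (29280030, 'Water');
    INSERT INTO t_mp_og VALUES (29280001, 2928), (29280015, 2928),
                               (29280022, 2928), (29280030, 2928);
    INSERT INTO t_time VALUES (1, '2020-01-01'), (2, '2020-01-02');
    INSERT INTO t_value_time VALUES (29280001, 1, 10), (29280001, 2, 11),
                                    (29280015, 1, 20), (29280015, 2, 21),
                                    (29280022, 1, 30), (29280030, 2, 99);
    """
)

# Sum the most recent current_value of every 'Energy' measuring point in og 2928.
total = conn.execute(
    """
    SELECT SUM(vt.current_value)
    FROM t_value_time vt
    JOIN t_time tm ON tm.time_id = vt.time_id
    WHERE vt.mp_id IN (SELECT t.mp_id
                       FROM t_mp t
                       JOIN t_mp_og og ON og.mp_id = t.mp_id
                       WHERE og.og_id = 2928 AND t.description = 'Energy')
      AND tm.time_stamp = (SELECT MAX(tm2.time_stamp)
                           FROM t_value_time vt2
                           JOIN t_time tm2 ON tm2.time_id = vt2.time_id
                           WHERE vt2.mp_id = vt.mp_id)
    """
).fetchone()[0]
print(total)
```

The correlated subquery picks the latest time stamp separately for each mp_id, and leaving out GROUP BY makes SUM() return a single total instead of one row per measuring point.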