Given two arrays:
select split(concat_ws(',', array('a','b'), array('c','d')), ',') as keys,
       split(concat_ws(',', array('1','2'), array('3','4')), ',') as values

keys                 values
["a","b","c","d"]    ["1","2","3","4"]
How can these be combined into a Hive map? The built-in map() constructor instead expects the keys and values interleaved:
map(k1,v1,k2,v2,k3,v3,...,k_n,v_n)
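To make the goal concrete, here is a minimal Python sketch (an illustration of the semantics, not HiveQL) of two equivalent views: pairing keys with values position by position, and flattening those pairs into the interleaved argument order that map() takes. The helper names to_map and interleave are hypothetical. If Spark SQL 2.4 or later is an option, its map_from_arrays(keys, values) function builds this map directly from two arrays.

```python
def to_map(keys, values):
    """Pair the i-th key with the i-th value (the map being asked for)."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


def interleave(keys, values):
    """Flatten the pairs into the k1, v1, k2, v2, ... order that map(...) takes."""
    args = []
    for k, v in zip(keys, values):
        args.extend([k, v])
    return args


keys = "a,b,c,d".split(",")
values = "1,2,3,4".split(",")
print(to_map(keys, values))      # {'a': '1', 'b': '2', 'c': '3', 'd': '4'}
print(interleave(keys, values))  # ['a', '1', 'b', '2', 'c', '3', 'd', '4']
```

Whatever HiveQL approach is used, it has to reproduce the positional pairing that zip performs here.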
Related
Is it possible to divide all columns in a table by one of them? There are 168 of them, so I'd rather not write column2/column1, column3/column1, etc.
As suggested by @Aishwary, you can write a script that gets the list of columns from INFORMATION_SCHEMA and then iterates over that list to build the desired query.
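A minimal sketch of that scripting approach in Python, using the standard-library sqlite3 module as a stand-in for the real database. SQLite has no INFORMATION_SCHEMA, so PRAGMA table_info is used here to list the columns; on a database that has it, you would query INFORMATION_SCHEMA.COLUMNS instead. The table readings and the helper ratio_query are hypothetical.

```python
import sqlite3


def ratio_query(table, columns, divisor):
    """Build one SELECT that divides every column except `divisor` by `divisor`."""
    exprs = [
        # * 1.0 forces floating-point division in SQLite (integer / integer truncates)
        f'"{c}" * 1.0 / "{divisor}" AS "{c}_ratio"'
        for c in columns
        if c != divisor
    ]
    return f'SELECT {", ".join(exprs)} FROM "{table}"'


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (column1 REAL, column2 REAL, column3 REAL)")
    conn.execute("INSERT INTO readings VALUES (2, 4, 10)")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    columns = [row[1] for row in conn.execute("PRAGMA table_info(readings)")]
    sql = ratio_query("readings", columns, "column1")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return sql, rows


if __name__ == "__main__":
    sql, rows = demo()
    print(sql)
    print(rows)  # [(2.0, 5.0)]
```

With 168 columns the generated SELECT list simply gets longer; the script never has to change.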
I have a string column in my table in which multiple values are separated by the pipe character, for example:
Value1|Value2|Value3
I want a query that returns three rows for this one row, similar to the concept of explode on DataFrames.
Note that I am using Spark SQL and want to achieve this in SQL, not with DataFrames.
I got it working by using the following query.
select t.*, explode(split(values, "\\|")) as value
from table t
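\\| here can also be replaced by [|]. Just specifying | doesn't work, because split() treats its second argument as a regular expression, and a bare | is the regex alternation operator rather than a literal pipe.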
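For readers coming from DataFrames, here is a minimal Python sketch of what the query above does: each input row is copied once per pipe-separated item, and the item is added as a new value field. The field names id and values and the helper explode_pipe are hypothetical. As with split() in Spark SQL, the separator is a regular expression, so the pipe is escaped.

```python
import re


def explode_pipe(rows, column):
    """Return one output row per '|'-separated item in row[column], keeping every other field."""
    out = []
    for row in rows:
        # r"\|" (or "[|]") matches a literal pipe; a bare "|" would be regex alternation
        for item in re.split(r"\|", row[column]):
            new_row = dict(row)      # like t.* : keep all original columns
            new_row["value"] = item  # like explode(...) AS value
            out.append(new_row)
    return out


rows = [{"id": 1, "values": "Value1|Value2|Value3"}]
for r in explode_pipe(rows, "values"):
    print(r["id"], r["value"])
```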
Here's a Python self-learner trying to find a way to work with columns that hold multiple values. The dataset is the TMDb Movie Dataset, and several columns, such as genres and cast, contain multiple values.
I managed to split the values and count them; that part is fine. But what if I want to see the relationship between genres and, for example, popularity? How can I group by genre after splitting properly?
The dataset looks like this:
I would use something like the stack function to create a row for every item when splitting. For example, when you want to group by genre, create one row per genre when splitting (keeping the other columns the same). After that, simple groupby operations work.
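Here is a minimal pure-Python sketch of that idea. It assumes the genres column holds pipe-separated strings such as "Action|Drama". The sample data was not preserved here, so the format, the field names and the sample movies are all assumptions. Each movie is expanded into one (genre, popularity) pair per genre, and the pairs are then grouped. In pandas, the same steps are usually written with Series.str.split("|"), DataFrame.explode (pandas 0.25+) and groupby.

```python
from collections import defaultdict


def genre_popularity_pairs(movies):
    """Yield one (genre, popularity) pair per genre of each movie (the 'explode' step)."""
    for movie in movies:
        for genre in movie["genres"].split("|"):
            yield genre, movie["popularity"]


def mean_popularity_by_genre(movies):
    """Group the exploded pairs by genre and average popularity within each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genre, popularity in genre_popularity_pairs(movies):
        totals[genre] += popularity
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Hypothetical sample rows, not taken from the real dataset
movies = [
    {"title": "Movie A", "genres": "Action|Drama", "popularity": 10.0},
    {"title": "Movie B", "genres": "Drama", "popularity": 4.0},
]
print(mean_popularity_by_genre(movies))  # {'Action': 10.0, 'Drama': 7.0}
```

A movie with two genres contributes to both groups, which is what makes per-genre statistics possible after splitting.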
I'm not sure how to phrase this, so I'm going to give an example:
Suppose I have a table called "market" that consists of just two columns and three rows, as follows:
What I want to know is whether there is a way to take all the purchased products and put them in different rows, for example:
You need to unnest the array into multiple rows, then you can extract the product name using the ->> operator:
select t.user_id, x.purchase ->> 'product' as product
from the_table t
cross join jsonb_array_elements(t.purchases) as x(purchase);
If your column is json rather than jsonb, use json_array_elements() instead.
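The same unnesting logic, sketched in Python for illustration only: json.loads plays the role of the stored jsonb value, the inner loop plays the role of jsonb_array_elements(), and the key lookup plays the role of ->> 'product'. The sample rows and the helper name unnest_products are hypothetical, since the original table contents were not preserved.

```python
import json


def unnest_products(rows):
    """Emulate CROSS JOIN jsonb_array_elements(purchases) plus ->> 'product'.

    Each input row is (user_id, purchases_json_text); one output row is produced
    per array element. Users with an empty array produce no rows, as with a cross join.
    """
    out = []
    for user_id, purchases in rows:
        for purchase in json.loads(purchases):
            # ->> yields NULL for a missing key; dict.get returns None likewise.
            # Product values are assumed to be strings (->> would return other types as text).
            out.append((user_id, purchase.get("product")))
    return out


market = [
    (1, '[{"product": "apple", "qty": 2}, {"product": "pear", "qty": 1}]'),
    (2, '[{"product": "milk", "qty": 3}]'),
]
print(unnest_products(market))  # [(1, 'apple'), (1, 'pear'), (2, 'milk')]
```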
I am evaluating Hive and need to concatenate a string field after a group by. I found a function named concat_ws, but it looks like I have to explicitly list all the values to be concatenated. Can I do something like the following with concat_ws in Hive? For example, I have a table named my_table with two fields, country and city. I want only one record per country, and each record should have two fields, country and cities:
select country, concat_ws("|", city) as cities
from my_table
group by country
Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.
In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.
Source: Aggregate function - Wikipedia
Hive's built-in aggregate functions are listed on the following page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)
So the only built-in option for Hive 0.11 is the following (Hive 0.13 and later also provide collect_list, which keeps duplicates):
array collect_set(col)
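This one answers your request as long as there are no duplicate city records per country, because it returns a set of objects with duplicate elements eliminated. Since concat_ws also accepts an array<string>, you can wrap it to get a single delimited string, e.g. concat_ws('|', collect_set(city)). Otherwise, create your own UDAF or aggregate outside of Hive.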
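A minimal Python sketch of the result such a grouped query produces, which also shows the difference between collect_set and collect_list when a country has duplicate cities. distinct=True stands in for concat_ws(sep, collect_set(city)) and distinct=False for concat_ws(sep, collect_list(city)). The helper group_concat and the sample rows are hypothetical.

```python
def group_concat(rows, sep="|", distinct=True):
    """Group (country, city) rows by country and join the cities with `sep`.

    distinct=True mimics concat_ws(sep, collect_set(city)); distinct=False mimics
    concat_ws(sep, collect_list(city)). This sketch keeps first-seen order, but the
    element order Hive returns is not something to rely on.
    """
    groups = {}
    for country, city in rows:
        cities = groups.setdefault(country, [])
        if distinct and city in cities:
            continue
        cities.append(city)
    return {country: sep.join(cities) for country, cities in groups.items()}


my_table = [("US", "New York"), ("US", "Boston"), ("US", "New York"), ("FR", "Paris")]
print(group_concat(my_table))                  # {'US': 'New York|Boston', 'FR': 'Paris'}
print(group_concat(my_table, distinct=False))  # {'US': 'New York|Boston|New York', 'FR': 'Paris'}
```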
References for writing UDAF:
Writing GenericUDAFs: A Tutorial
HivePlugins
Create/Drop Function