Perform numerical operations on a Hive table
id    array         value
a     [10:20]       2
b     [30:40:50]    5
I want to convert the above table into the following:
id    array         value    converted_array
a     [10:20]       2        [20:40]
b     [30:40:50]    5        [150:200:250]
I want to multiply the 'array' column by the 'value' column and create a new column 'converted_array' using HQL. I know how to do this in Python, but I was wondering if there's a way to do it in Hive.
Try using collect_list over an exploded view as shown below. This multiplies the constant integer in the value column with each element of the array (using explode) and then reassembles the result (using collect_list; collect_set would also work, but it removes duplicate elements and does not guarantee order):
select id, collect_list(pve) as pv, value1, collect_list(pve * value1) as converted_array
from stacko1
lateral view explode(price_lov) t as pve
group by id, value1;
This should give you the desired output.
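Since you mentioned you already know how to do this in Python, a minimal sketch of the same per-row logic in plain Python may help map the two (the sample rows mirror the tables in the question):

```python
# Sample rows mirroring the question's table: (id, array, value)
rows = [("a", [10, 20], 2), ("b", [30, 40, 50], 5)]

# Multiply every array element by the row's value, keeping the original columns
converted = [(id_, arr, val, [x * val for x in arr]) for id_, arr, val in rows]

print(converted)
# [('a', [10, 20], 2, [20, 40]), ('b', [30, 40, 50], 5, [150, 200, 250])]
```

The explode/collect pair in the HQL above plays the role of the inner list comprehension here.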
I have a single record with column (type) containing an array of values, e.g. [Map,Table,Pie]
Using Athena I need to flatten this record into 3 separate records, each with one value from the array in the (type) column.
SELECT type
FROM
athenadb.table,
UNNEST(type) as charttyp
This is the result of this query, three identical records.
1 [Map, Pie, Table]
2 [Map, Pie, Table]
3 [Map, Pie, Table]
What am I missing here? Clearly, on one hand, it recognizes that the array length = 3, but it does not parse the array elements...
SELECT ntype
FROM
athenadb.table,
UNNEST(type) as t(ntype)
You are missing the alias for the unnested column; specify it and use it in the resulting select, as described in the documentation (the Trino one):
SELECT flattened_type
FROM athenadb.table,
UNNEST(type) as t(flattened_type)
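Conceptually, UNNEST turns each array element into its own row. A quick Python sketch of the same flattening, using hypothetical sample data shaped like the question's record:

```python
# One record whose 'type' column holds an array, as in the question
rows = [(1, ["Map", "Pie", "Table"])]

# UNNEST-style flattening: one output row per array element
flattened = [(rec_id, t) for rec_id, types in rows for t in types]

print(flattened)  # [(1, 'Map'), (1, 'Pie'), (1, 'Table')]
```

Without the `t(flattened_type)` alias, the original query kept selecting the intact array column instead of the per-element value, which is why all three rows looked identical.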
So I have a column in the table whose data type is varchar, but it contains an array of tuples. What I need is to extract the first element of the first tuple of the array in the table.
This is the original table:
userid    comments
1         [["hello world",1],["How did you",1],[" this is the one",1]]
2         [["hello ",1],["How ",1],[" this",1]]
and this is what I am looking for; please note that the data type of the 'comments' column is varchar:
userid    comments
1         hello world
2         hello
json_extract_scalar should do the trick:
WITH dataset (userid, comments) AS (
    VALUES (1, json '[["hello world",1],["How did you",1],[" this is the one",1]]'),
           (2, json '[["hello ",1],["How ",1],[" this",1]]')
)
select userid,
       json_extract_scalar(comments, '$[0][0]') as comments
from dataset
Output:
userid    comments
1         hello world
2         hello
Note that this extracts only a single value; if you want multiple values you will need to do some casting (similar to the one done here, but using arrays, for example array(json)).
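For comparison, the same '$[0][0]' path lookup in plain Python with the standard json module (the sample strings are copied from the question):

```python
import json

rows = [(1, '[["hello world",1],["How did you",1],[" this is the one",1]]'),
        (2, '[["hello ",1],["How ",1],[" this",1]]')]

# Equivalent of json_extract_scalar(comments, '$[0][0]'):
# parse the varchar as JSON, take the first tuple, then its first element
result = [(userid, json.loads(comments)[0][0]) for userid, comments in rows]

print(result)  # [(1, 'hello world'), (2, 'hello ')]
```

Note that the second value keeps its trailing space, exactly as stored in the source string.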
I have a table Table_1 on Google BigQuery which includes a string column str_column. I would like to write a SQL query (compatible with Google BigQuery) to extract all numerical values in str_column and append them as new numerical columns to Table_1. For example, if str_column includes first measurement is 22 and the other is 2.5; I need to extract 22 and 2.5 and save them under new columns numerical_val_1 and numerical_val_2. The number of new numerical columns should ideally be equal to the maximum number of numerical values in str_column, but if that'd be too complex, extracting the first 2 numerical values in str_column (and therefore 2 new columns) would be fine too. Any ideas?
Consider the approach below:
select * from (
  select str_column, offset + 1 as offset, num
  from your_table,
  unnest(regexp_extract_all(str_column, r'\b([\d.]+)\b')) num with offset
)
pivot (min(num) as numerical_val for offset in (1, 2, 3))
Applied to sample data like that in your question, this produces one numerical_val_<n> column per offset listed in the pivot, holding the extracted values in order.
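The heart of this is the regexp_extract_all pattern. A small Python check of the same regex against the example string from the question:

```python
import re

s = "first measurement is 22 and the other is 2.5"

# Same pattern as the BigQuery query: word-bounded runs of digits and dots
nums = re.findall(r"\b[\d.]+\b", s)

print(nums)  # ['22', '2.5']
```

The WITH OFFSET clause in the query then numbers these matches (0-based, hence the `offset + 1`) so PIVOT can spread them into separate columns.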
My table is on AWS Athena. I am not familiar with SQL or HIVE or Athena in general. I have the following table
col_id    col_list
ABC       [abcde, 123gd, 12345, ...]
B3C       [bbbbb, ergdg, 12345, ...]
YUT       [uyteh, bbbbb, 12345, ...]
col_id is unique and the elements in the array of one single row are also unique.
I need to run a query that counts the total number of elements that repeat across arrays in different rows. In the example above, the element 12345 shows up in the 1st, 2nd, and 3rd rows, and bbbbb shows up in the 2nd and 3rd rows, so the number of repeated elements is 2.
The number of rows is not big so I guess the performance is not a concern here.
Could anyone please let me know how to write this query in Athena? Thank you!
You can explode the array and aggregate. Note that Athena uses Presto syntax, so you need UNNEST rather than Hive's LATERAL VIEW EXPLODE:
select col, count(*)
from t
cross join unnest(t.col_list) as u(col)
group by col
having count(*) > 1
order by count(*) desc;
Wrap this in select count(*) from (...) if you only need the number of repeated elements.
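The logic of the query is: count occurrences of each element across all rows' arrays, then keep those seen more than once. A short Python sketch using the sample rows (the "..." tails from the question are omitted):

```python
from collections import Counter

rows = {
    "ABC": ["abcde", "123gd", "12345"],
    "B3C": ["bbbbb", "ergdg", "12345"],
    "YUT": ["uyteh", "bbbbb", "12345"],
}

# Count each element across all rows (elements are unique within a row)
counts = Counter(x for arr in rows.values() for x in arr)

# Number of elements that appear in more than one row
repeated = sum(1 for c in counts.values() if c > 1)

print(repeated)  # 2  ('12345' and 'bbbbb')
```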
[Image: input scenarios 1-4 and the expected output table]
- For input from the table I have 5 columns: COL1, COL2, COL3, COL4, COL5.
- Scenarios 1, 2, 3, and 4 show the types of input I will receive. The value in COL4 can vary (for example 31-35 or 36-39 for the same value in COL1).
- The SUM column is the summation of values for all numbers in the VALUE column of each scenario, and it has to be populated in all the rows, e.g. 50 in each cell under SUM for Scenario 1.
The requirement:
Sum to get the value (e.g. 50) and then display all the rows (3-20) plus column G in the output table. So the input table has 17 rows and 5 columns (B, C, D, E, F); the output should have 17 rows and 6 columns (B, C, D, E, F, G).
I could do the summation by grouping with an Aggregator transformation in Informatica, but I cannot display all the rows, since grouping returns one row per group.
Do an aggregated sum based on columns B, C, and D, then use a Joiner transformation to join your aggregated output (4 rows) back to the original source rows (17 rows). Do not forget to enable sorted input in the Joiner, which is mandatory for this kind of self-join.
Source --> Sorter --> Aggregator --> Joiner --> Target
              |                        ^
              |________________________|
Configure the Joiner for a normal join on columns B, C, and D.
Why don't you just use the analytic function SUM(Value) OVER (PARTITION BY COL1, ..., COLN) AS ValueSum in Netezza? All you need to do is define how to partition the sums.
Read more here: https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_report_aggregation_family_syntax.html
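A window sum like SUM(Value) OVER (PARTITION BY ...) attaches the per-group total to every row instead of collapsing the group the way an aggregate does. A minimal Python sketch with hypothetical rows keyed on columns B, C, D:

```python
# Hypothetical rows: (B, C, D, VALUE)
rows = [("x", 1, "p", 10), ("x", 1, "p", 40), ("y", 2, "q", 5)]

# First pass: per-partition totals, keyed on (B, C, D)
totals = {}
for b, c, d, v in rows:
    totals[(b, c, d)] = totals.get((b, c, d), 0) + v

# Second pass: append the partition total to every row, like the window SUM
with_sum = [(b, c, d, v, totals[(b, c, d)]) for b, c, d, v in rows]

print(with_sum)
# [('x', 1, 'p', 10, 50), ('x', 1, 'p', 40, 50), ('y', 2, 'q', 5, 5)]
```

This two-pass shape (aggregate, then join back) is exactly what the Aggregator + Joiner mapping in the other answer builds by hand.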