Get next struct item from array in Hive

I have a table with a field of type array of structs:
CREATE TABLE complex_types(
key int,
value ARRAY<STRUCT<status:string,method:string>>
);
INSERT INTO TABLE complex_types
SELECT row_number() over () as key,
ARRAY(
named_struct('status', 'OK', 'method', 'Method 1'),
named_struct('status', 'Error', 'method', 'Method 2'),
named_struct('status', 'Failed', 'method', 'Method 3')
) as value
FROM some_table LIMIT 10
When I access the whole struct by index, it returns the correct item.
select key, value[0], value[1] from complex_types
Result is
key _c1 _c2
1 {"status":"OK","method":"Method 1"} {"status":"Error","method":"Method 2"}
2 {"status":"OK","method":"Method 1"} {"status":"Error","method":"Method 2"}
3 {"status":"OK","method":"Method 1"} {"status":"Error","method":"Method 2"}
But if I specify a field of the struct, every column returns the value from the last item:
select key, value[0].method, value[1].method from complex_types
and result is
key _c1 _c2
1 Method 2 Method 2
2 Method 2 Method 2
3 Method 2 Method 2
Thank you

Update
It looks like the issue is present only in HDP Ambari Hive View 2.0.
The query works fine in the Hive console and in the first version of Ambari Hive View.

Related

How to pivot table in sql?

I need to pivot a table, but I am stuck because of repeated Action values.
Goal: extract the values from the Action column and use them as the headers of new columns, then fill the new table with values from the Val column. In this instance there is only one group, so you can use a window function to capture all groups by the ID column. All SN values are unique, but other actions can be repeated for the same SN.
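I have a table:
+---------------------+----------+--------------------+
| Val                 | Action   | ID                 |
+---------------------+----------+--------------------+
| SN1844Q             | SN       | 94a52150-a24f-11ed |
| 2000                | Check_X  | 94a52150-a24f-11ed |
| 1                   | Pass     | 94a52150-a24f-11ed |
| 2022-01-12 23:51:31 | DateTime | 94a52150-a24f-11ed |
| up                  | Position | 94a52150-a24f-11ed |
| back                | Position | 94a52150-a24f-11ed |
| 890                 | Check_X  | 94a52150-a24f-11ed |
| SN1845Q             | SN       | 28497a86-8e8e-44da |
| ...                 | ...      | ...                |
+---------------------+----------+--------------------+
I want to see:
+---------+---------+------+---------------------+----------+
| SN      | Check_X | Pass | DateTime            | Position |
+---------+---------+------+---------------------+----------+
| SN1844Q | 2000    | 1    | 2022-01-12 23:51:31 | up       |
| SN1844Q | 890     | 1    | 2022-01-12 23:51:31 | back     |
| ...     | ...     | ...  | ...                 | ...      |
+---------+---------+------+---------------------+----------+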

You have to use SQL PIVOT with a dynamic query.

SELECT SN,
MAX(CASE WHEN Action = 'Action 1' THEN Val END) AS "Action 1",
MAX(CASE WHEN Action = 'Action 2' THEN Val END) AS "Action 2",
MAX(CASE WHEN Action = 'Action 3' THEN Val END) AS "Action 3"
FROM original_table
GROUP BY SN
In this query, each CASE expression matches one value of the Action column and returns the corresponding value from the Val column (and NULL otherwise). MAX aggregates those values, and the GROUP BY clause groups the results by the SN column. The result is a new table with one column per listed Action value, filled with the matching values from the Val column. 'Action 1', 'Action 2' and 'Action 3' are placeholders for the actual Action values.
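As a rough illustration of the MAX(CASE WHEN ...) technique described above, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database from Python. It is not the answerer's exact query: it assumes the rows are grouped by the ID column (SN is a value of Action in the question, not a column), and it leaves out the repeated Check_X and Position rows, so each ID has at most one value per action.

```python
import sqlite3

# Rows copied from the question, minus the repeated Check_X/Position rows
# (simplifying assumption: at most one Val per Action per ID).
rows = [
    ("SN1844Q", "SN", "94a52150-a24f-11ed"),
    ("2000", "Check_X", "94a52150-a24f-11ed"),
    ("1", "Pass", "94a52150-a24f-11ed"),
    ("2022-01-12 23:51:31", "DateTime", "94a52150-a24f-11ed"),
    ("up", "Position", "94a52150-a24f-11ed"),
    ("SN1845Q", "SN", "28497a86-8e8e-44da"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE original_table ("Val" TEXT, "Action" TEXT, "ID" TEXT)')
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows)

# One MAX(CASE ...) per target column; grouping by ID is an assumption.
pivot_sql = """
SELECT
    MAX(CASE WHEN "Action" = 'SN'       THEN "Val" END) AS SN,
    MAX(CASE WHEN "Action" = 'Check_X'  THEN "Val" END) AS Check_X,
    MAX(CASE WHEN "Action" = 'Pass'     THEN "Val" END) AS Pass,
    MAX(CASE WHEN "Action" = 'DateTime' THEN "Val" END) AS "DateTime",
    MAX(CASE WHEN "Action" = 'Position' THEN "Val" END) AS Position
FROM original_table
GROUP BY "ID"
ORDER BY SN
"""

pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

Actions that are missing for an ID come back as NULL (None in Python). With repeated actions, MAX keeps only one value per group, so the two-row output the question asks for would need an extra per-occurrence key, which this sketch does not attempt.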

Using json_build_object in PostgreSQL v14.x with the result of a SELECT statement

I'm trying to create a VIEW of a JSON object, with a varying number of key/value pairs, in PostgreSQL v14.x, from the results of a SELECT statement.
Using json_agg is returning an array of objects - a key of each rating possibility as they occur, and a value which is the count of all the ratings selected from a table of reviews. Instead of an array, I need an object that has multiple key/value pairs, where the value corresponds to the aggregated count() of the ratings column(s), grouped by product_id. Trying to reuse json_build_object isn't working as expected.
Using:
CREATE VIEW reviews.meta AS
SELECT product_id, json_agg(json_build_object(reviews.list.rating, reviews.list.rating))
FROM reviews.list
GROUP BY reviews.list.product_id
ORDER BY product_id;
returns:
product_id | reviews_count
---------------------------
1 | [{"5" : 5}, {"4" : 4}]
2 | [{"4" : 4}, {"4" : 4}, {"3" : 3}, {"5" : 5}, {"2" : 2}]
But I'm looking for:
product_id | reviews_count
---------------------------
1 | {"5" : 1, "4" : 1}
2 | {"4" : 2, "3" : 1, "5" : 1, "2" : 1}
A dynamically created object:
in rows by product_id
where the values are quantities of Integer ratings (1-5) as they appear in the reviews.list table
in an object rather than an array of objects
I am new to SQL / PL/pgSQL language.

You need two levels of aggregation, one to get the counts, and one to package the counts up. It is easy to do that by nesting one query inside the FROM of another:
CREATE or replace VIEW meta AS
SELECT product_id, jsonb_object_agg(rating, count)
FROM (select product_id, rating, count(*) from list group by product_id, rating) foo
GROUP BY product_id
ORDER BY product_id;
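The two levels of aggregation can be easier to see outside SQL. The following is a minimal Python sketch, not PostgreSQL code: the `reviews` list is sample data reconstructed from the question's example output, and the names `counts` and `meta` are only illustrative.

```python
from collections import Counter, defaultdict

# Sample (product_id, rating) pairs reconstructed from the question's example.
reviews = [(1, 5), (1, 4), (2, 4), (2, 4), (2, 3), (2, 5), (2, 2)]

# Level 1: count rows per (product_id, rating), like the inner GROUP BY.
counts = Counter(reviews)

# Level 2: package the counts into one object per product,
# which is the role jsonb_object_agg plays in the outer query.
meta = defaultdict(dict)
for (product_id, rating), n in counts.items():
    meta[product_id][str(rating)] = n  # JSON object keys are strings

meta = dict(meta)
print(meta)
```

Each product ends up with a single object whose keys are ratings and whose values are counts, rather than an array of one-key objects.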

Qlik/SQL Server

Long-time reader, first-time poster.
I need some help returning a corresponding value when a particular value appears.
For example, I want to find the position of A in Column A and return the value at that position in Column B. For the first row that would be 1, for the second row 3, and for the third row 1.
+-----------+-----------+
| Column A | Column B |
+-----------+-----------+
| A;B;C;D;E | 1;2;3;4;5 |
| B;A;C;D;E | 2;3;4;5;1 |
| D;C;E;A;B | 5;2;3;1;4 |
+-----------+-----------+

You can use the SubField function to split the values of both columns. Calling SubField without the field_no parameter creates one row for each value.
For example, if the MyField field contains A;B;C;D and we use SubField(MyField, ';'), Qlik will create 4 rows:
MyField
-------
A
B
C
D
So if we use SubField on both the ColumnA and ColumnB fields (and keep an index between the new rows), we can create a new "flat" table that links the split values of both columns.
Reloading the script below results in two tables: Data and Flatten. Data is the "raw" data and Flatten is the result.
Have a look at the Value and Index columns of the Flatten table. They are the result of using SubField on ColumnA and ColumnB.
// Load the sample data
// Preceding load to add an ID to each row
Data:
Load
ColumnA,
ColumnB,
RowNo() as RowId
;
Load * inline [
ColumnA, ColumnB
A;B;C;D;E, 1;2;3;4;5
B;A;C;D;E, 2;3;4;5;1
D;C;E;A;B, 5;2;3;1;4
];
// Passing RowId across just in case if we want to link back
// to the Data table
Flatten_Temp:
// Preceding load to add ID (ValueId) for each value in ColumnA
// this ID will be used to join back to the values in ColumnB
Load
RowId,
ColumnASplit as Value,
RowNo() as ValueId
;
// Using SubField function to create row for each value in ColumnA
Load
RowId,
SubField(ColumnA, ';') as ColumnASplit
Resident
Data
;
JOIN
// Preceding load to add ID (ValueId) for each value in ColumnB
Load
RowId,
ColumnBSplit as Index,
RowNo() as ValueId
;
// Using SubField function to create row for each value in ColumnB
Load
RowId,
SubField(ColumnB, ';') as ColumnBSplit
Resident
Data
;
NoConcatenate
// Load only 'A' values from the table above
Flatten:
Load
*
Resident
Flatten_Temp
Where
Value = 'A'
;
Drop Table Flatten_Temp;
Update: Script updated to return only values for A.
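The idea behind the script (split both columns, pair the pieces by position, keep the pair whose value is A) can be sketched in a few lines of Python. This is only an illustration of the logic, not Qlik script; the helper name `value_for` and its parameters are hypothetical.

```python
# Sample rows copied from the question: (Column A, Column B).
rows = [
    ("A;B;C;D;E", "1;2;3;4;5"),
    ("B;A;C;D;E", "2;3;4;5;1"),
    ("D;C;E;A;B", "5;2;3;1;4"),
]


def value_for(column_a, column_b, wanted="A", sep=";"):
    """Return the piece of column_b at the position where wanted sits in column_a."""
    # Pair the split values by position, like joining on ValueId in the script.
    pairs = zip(column_a.split(sep), column_b.split(sep))
    return next((index for value, index in pairs if value == wanted), None)


result = [value_for(a, b) for a, b in rows]
print(result)
```

For the sample rows this yields 1, 3 and 1 (as strings), matching the values the question expects.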

Postgresql add item in array except if null

I am trying to build a PostgreSQL array from values in a JSONB field.
When all the keys are present in the JSONB value, the result is fine. This query:
SELECT array [
(SELECT ('{"tech_id": 4, "admin_id": 5}'::jsonb->>'admin_id')::int),
(SELECT ('{"tech_id": 4, "admin_id": 5}'::jsonb->>'tech_id')::int)
];
returns the right result, because both admin_id and tech_id are present in the JSONB value:
-[ RECORD 1 ]
array | {5,4}
But if the JSONB value contains only one of the keys, the array contains a NULL value.
This query:
SELECT array [
(SELECT ('{"tech_id": 4}'::jsonb->>'admin_id')::int),
(SELECT ('{"tech_id": 4}'::jsonb->>'tech_id')::int)
];
gives me this result:
-[ RECORD 1 ]---
array | {NULL,4}
But I want an array like {4}, without the NULL value.
Do you know a way to avoid adding the NULL value in this case?

Finally, I found a solution: unnest the array and aggregate it again, keeping only the values that are not NULL.
I use the following code:
select array_agg(id) FROM unnest(array [
(SELECT ('{"tech_id": 4}'::jsonb->>'admin_id')::int),
(SELECT ('{"tech_id": 4}'::jsonb->>'tech_id')::int)
]) as id where id IS NOT NULL;
PostgreSQL returns the expected result:
-[ RECORD 1 ]--
array_agg | {4}
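The same "collect, then drop the NULLs" step can be sketched in Python for comparison. This is an illustrative analogue of the unnest/array_agg query, not PostgreSQL code; the function name `ids_without_nulls` and its `keys` parameter are hypothetical.

```python
import json


def ids_without_nulls(doc, keys=("admin_id", "tech_id")):
    """Collect the integer values of the given keys, skipping missing ones."""
    data = json.loads(doc)
    # A missing key yields None, much like ->> returning NULL in PostgreSQL.
    values = [data.get(key) for key in keys]
    # Keep only the values that are present, like WHERE id IS NOT NULL.
    return [int(value) for value in values if value is not None]


print(ids_without_nulls('{"tech_id": 4, "admin_id": 5}'))
print(ids_without_nulls('{"tech_id": 4}'))
```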

Google BigQuery: selecting rows with multiple values in a repeated field

Let's say I have a schema:
key STRING NULLABLE
values INTEGER REPEATED
Note that the second column is a repeated field of integers.
Let's say the data is something like:
key:'abc'
values: 1 2 3 (3 separate values, same for below values)
key:'def'
values: 1 2 5
key:'ghi'
values: 1 6 9
I want to find the keys whose values include both 1 and 2. I expect 'abc' and 'def' as the result set.
I am looking for a query for this. I want an 'and' condition ('in' does not work here): both values must be present for a key to be returned.

SELECT
key,
SUM(values = 1 or values = 2) WITHIN RECORD AS check
FROM yourtable
HAVING check = 2
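The query above counts, within each record, how many repeated values equal 1 or 2, and keeps the records where that count is 2. A minimal Python sketch of the underlying "both values must be present" check, using the sample data from the question (the names `records`, `required` and `matching` are illustrative only):

```python
# Sample data copied from the question: key -> repeated integer values.
records = {
    "abc": [1, 2, 3],
    "def": [1, 2, 5],
    "ghi": [1, 6, 9],
}

required = {1, 2}

# Keep the keys whose values contain every required number.
matching = [key for key, values in records.items() if required <= set(values)]
print(matching)
```

The set comparison is slightly stricter than counting matches: a record holding 1 twice and no 2 would reach a count of 2 but does not contain both values, so the counting approach assumes the repeated field has no duplicates.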
