Hive - Extract Arrays dynamically - apache-pig

I have JSON data like below.
{
    "userid": null,
    "appnumber": "9",
    "trailid": "1547383536",
    "visit": [{
            "visitNumber": "1",
            "time": "0",
            "hour": "18",
            "minute": "15"
        },
        {
            "visitNumber": "2",
            "time": "2942",
            "hour": "18",
            "minute": "15"
        }
    ]
}
I want to extract the visit array values dynamically.
Like below (pipe-delimited columns):
userid | appnumber | trailid |
visit.visitnumber | visit.time | visit.hour | visit.minute |
visit.visitnumber | visit.time | visit.hour | visit.minute
As you can see, there are 2 JSON elements inside the visit array, so I want to extract visitNumber, time, hour, and minute for each of them. Sometimes I may have 3 or 5 elements inside the array, so the query should extract all 3 or 5 automatically (I mean dynamically).
I'm going to run this on AWS Athena, or maybe on a Pig cluster.
Could someone help me with the exact queries?

You can use the approach below, but note that each entry of the array will end up on a separate row.
SELECT userid,
       appnumber,
       trailid,
       d.visitnumber,
       d.time,
       d.hour,
       d.minute
FROM table t1
LATERAL VIEW OUTER EXPLODE(visit) collection AS d;
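Since you mention Athena: if the table is defined there with visit as an array of structs, a rough Presto/Athena equivalent of the same idea uses UNNEST. This is only a sketch; your_table and the struct field names are assumptions based on the JSON above:
SELECT t1.userid,
       t1.appnumber,
       t1.trailid,
       v.visitnumber,
       v.time,
       v.hour,
       v.minute
FROM your_table t1
CROSS JOIN UNNEST(t1.visit) AS v(visitnumber, time, hour, minute);  -- expands each struct element into columns
As with the Hive version, each element of visit becomes its own row, with the parent fields repeated.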

Related

SQL group data to an array of objects

I have a table like this:
What I want to achieve is to group it by its point_delivery_number, with the other data collected into an array. The result should look like this:
point_delivery_number | data
----------------------------------------
AT2341234asdf5234452341243 | [
| { year: "2021-01...", month: "..", consumption: "..", generation: "...", self_coverage: "..." },
| { year: "2021-01...", month: "..", consumption: "..", generation: "...", self_coverage: "..." },
| ...
| ]
----------------------------------------
AT523452345sadf345 | [{ ... }]
Is this even possible with SQL?
I am using the AWS Timestream database, so there are some limitations; certain functions are not supported at all. You can see what is supported here:
https://docs.aws.amazon.com/timestream/latest/developerguide/reference.html
You can use JSON_ARRAYAGG combined with JSON_OBJECT:
SELECT point_delivery_number,
       JSON_ARRAYAGG(JSON_OBJECT(
           'year', year,
           'month', month,
           'consumption', consumption,
           'generation', generation,
           'self_coverage', self_coverage
       )) AS data
FROM t
GROUP BY point_delivery_number
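If JSON_ARRAYAGG turns out not to be in Timestream's supported function list, a Presto-style fallback is to aggregate JSON-cast maps. This is only a sketch: it assumes array_agg, the MAP constructor, and the MAP-to-JSON cast are available in Timestream (the linked reference is authoritative), and it casts every value to VARCHAR because a MAP needs a single value type:
SELECT point_delivery_number,
       array_agg(
           CAST(MAP(
               ARRAY['year', 'month', 'consumption', 'generation', 'self_coverage'],
               ARRAY[CAST(year AS VARCHAR), CAST(month AS VARCHAR),
                     CAST(consumption AS VARCHAR), CAST(generation AS VARCHAR),
                     CAST(self_coverage AS VARCHAR)]
           ) AS JSON)  -- one JSON object per source row
       ) AS data
FROM t
GROUP BY point_delivery_number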

How to query a nested json with varying key value pairs

My previous question has been answered; thanks to @Erwin Brandstetter for the help:
Query individual values in a nested json record
I have a follow-up:
Aurora Postgres - PostgreSQL 13.1. My jsonb column value looks like this:
'{
    "usertype": [
        {
            "type": "staff",
            "status": "active",
            "permissions": {
                "1": "add user",
                "2": "add account"
            }
        },
        {
            "type": "customer",
            "status": "suspended",
            "permissions": {
                "1": "add",
                "2": "edit",
                "3": "view",
                "4": "all"
            }
        }
    ]
}'
I would like to produce a table-style output where each permission item is shown as its own column, containing the value where present and NULL otherwise.
type | status | perm1 | perm2 | perm3 | perm4 | perm5 | perm6
----------+-----------+---------+------------+-------+-------+-------+-------
staff | active | adduser | addaccount | null | null | null | null
customer | suspended | add | edit | view | all | null | null
In other words, I would like a way to find out the maximum permissions count and show that many columns in the select query.
An SQL query has to return a fixed number of columns. The return type has to be known at call time (at the latest): the number, names, and data types of the columns in the returned row(s) are fixed by then. There is no way to get a truly dynamic number of result columns in SQL. You'd have to use two steps (two round trips to the DB server):
1. Determine the list of result columns.
2. Send a query to produce that result.
Notably, that leaves a time window for race conditions under concurrent write load.
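For step 1, here is a sketch of a query to discover the maximum number of permission columns needed; the table and column names (newtable, column1) are taken from the query further down:
SELECT max(c.cnt)
FROM newtable n
CROSS JOIN LATERAL jsonb_array_elements(n.column1 -> 'usertype') AS u(item)
CROSS JOIN LATERAL (
    SELECT count(*) AS cnt
    FROM jsonb_object_keys(u.item -> 'permissions')
) AS c;  -- count the keys of each permissions object, take the overall max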
Typically, it's simpler to just return an array or a list or a document type (like JSON) for a variable number of values. Or a set of rows.
If there is a low, well-known maximum of possible values, say 6, like in your added example, just over-provision:
SELECT id
, js_line_item ->> 'type' AS type
, js_line_item ->> 'status' AS status
, js_line_item #>> '{permissions, 1}' AS perm1
, js_line_item #>> '{permissions, 2}' AS perm2
-- , ...
, js_line_item #>> '{permissions, 6}' AS perm6
FROM newtable n
LEFT JOIN LATERAL jsonb_array_elements(n.column1 -> 'usertype') AS js_line_item ON true;
The LEFT JOIN retains rows without any permissions.

Creating a table from JSON data sub arrays within in snowflake/SQL

I have a table (table_1) in snowflake that has 3 columns. The first column is JSON data with arrays within it. Here is an example of one value in the column "JSON":
{
"authors": [
{
"name": "Jim Bob, Jimothy Bob"
}
],
"date": 1578352260,
"publishers": [
{
"name": "Bob Jim"
}
],
"title": "A Look at Ants Through The Ages",
"editors": [
{
"name": "Jim Bobby"
}
]
}
Now, I am trying to unnest and flatten all of this into a new table, but every time I try, it just creates a table with 0 rows and no data in it. Here is how I am trying to do this:
create or replace table table_2 as
select
json:editors::varchar as editors,
json:authors::varchar as authors,
json:publishers::varchar as publishers,
json:date::varchar as date,
json:title::varchar as title
from table_1,
lateral flatten(input=>json:table_1);
The desired result is
editors     | authors | publishers | date       | title
Jim Bobby   | Jim Bob | Bob Jim    | 1578352260 | A Look at Ants Through The Ages
Jimothy Bob | Jim Bob | Bob Jim    | 1578352260 | A Look at Ants Through The Ages
The actual result is a successfully created empty table.
How can I flatten out this JSON data?
Thank you for your help.
In your "desired result" I assume you have the editors and authors columns the wrong way round - as in the JSON it is the authors that has two values, not the editors?
However, you can't achieve what you want in pure JSON as you don't actually have two authors: you have a single name field with the value of "Jim Bob, Jimothy Bob". In order to split the data in the way you want the JSON would need to look something like this:
"authors": [
{
"names":{
"name1": "Jim Bob"
"name2": "Jimothy Bob"
}
}
],
In order to achieve what you want, you would need to write the JSON out to a table, splitting it into columns and leaving the value "Jim Bob, Jimothy Bob" in a single column; then split that column (e.g. using something like SPLIT_TO_TABLE) and join your data back together to get the required result.
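A sketch of that approach, assuming the JSON sits in a VARIANT column named json on table_1 (as in the question); SPLIT_TO_TABLE emits one row per comma-separated name:
create or replace table table_2 as
select
    json:editors[0].name::varchar    as editors,
    trim(a.value)                    as authors,      -- one row per split author
    json:publishers[0].name::varchar as publishers,
    json:date::varchar               as publish_date, -- renamed to avoid the DATE keyword
    json:title::varchar              as title
from table_1,
     lateral split_to_table(json:authors[0].name::varchar, ',') a;
Note this repeats the single editor (Jim Bobby) on both rows, which matches the JSON rather than the swapped columns in the question's table.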

Getting the value inside a random key in PostgreSQL JSONB?

Is there a way to extract the value from a jsonb if your key is "random"?
Example:
{"6": {"id": "6", "name": "book-name", "genre": "history", "book_id": "3"}}
The key here is "6", which in my case is a pseudo-random number. It's different for every row, so it can't be accessed by simply pointing at it (jsonb -> 'key').
I've tried almost everything I could find and still have no solution to this.
Is there a way to get to the value in this case?
The PG version is 9.6.
Thanks :)
It's unclear to me what you want as the output, but maybe you are looking for this:
with sample_data (js) as (
values ('{"6": {"id": "6", "name": "book-name", "genre": "history", "book_id": "3"}}'::jsonb)
)
select x.v
from sample_data sd
cross join lateral jsonb_each(sd.js) as x(k,v)
The above will return:
v
--------------------------------------------------------------------
{"id": "6", "name": "book-name", "genre": "history", "book_id": "3"}
If you want the individual key/values, then you can use:
with sample_data (js) as (
values ('{"6": {"id": "6", "name": "book-name", "genre": "history", "book_id": "3"}}'::jsonb)
)
select d.*
from sample_data sd
cross join lateral jsonb_each(sd.js) as x(k,v)
cross join lateral jsonb_each_text(x.v) as d(k,v);
The above will return:
k | v
--------+----------
id | 6
name | book-name
genre | history
book_id | 3
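And if you only need specific fields from the inner object, you can project them straight off the value column returned by jsonb_each (a sketch on the same sample data):
with sample_data (js) as (
    values ('{"6": {"id": "6", "name": "book-name", "genre": "history", "book_id": "3"}}'::jsonb)
)
select x.v ->> 'name'  as name,
       x.v ->> 'genre' as genre
from sample_data sd
cross join lateral jsonb_each(sd.js) as x(k,v);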

calculate sum of two columns in PostgreSQL 9.5

I have a table of posts, each with an insights jsonb column containing something like the data sample below.
Data sample (comes from Facebook, so cannot change format)
[
    {
        "name": "post_story_adds_unique",
        "values": [
            { "value": 93 }
        ]
    },
    {
        "name": "post_story_adds",
        "values": [
            { "value": 100 }
        ]
    },
    {
        "name": "post_impressions_organic_unique",
        "values": [
            { "value": 123 }
        ]
    },
    ...
]
I want to have a calculated sum of reach and viral and then have it ordered by the total.
Desired Results
id message post_created reach viral total
69 This World Family dablah... 2016-05-11 18:44:16 6683 646 7329
...
I managed to get the results so far, but I cannot figure out how to get the sum of the two columns. I don't know where to add another join or select to sum them.
Results so far
id message post_created reach viral
69 This World Family dablah... 2016-05-11 18:44:16 6683 646
58 blah blah flip flop blah... 2016-05-22 11:00:01 4880 403
55 This is another message ... 2016-05-24 10:00:00 4417 109
I've tried various ways, such as including SUM(reach + viral) AS total in the first SELECT, but mostly I get back errors saying the columns don't exist.
Here's my SQL so far:
SELECT
id,
message,
post_created,
obj.value->'values'->0->'value' AS reach,
obj2.value->'values'->0->'value' AS viral
FROM (
SELECT
id,
message,
post_created,
insights
FROM posts
WHERE (
page_id = 4 AND
post_created >= '2016-05-01 00:00:00' AND
post_created <= '2016-05-31 23:59:59' AND
insights #> '[{"name":"post_impressions_organic_unique"}, {"name":"post_impressions_viral_unique"}]'
)
) e1
JOIN LATERAL jsonb_array_elements(insights) obj(value) ON obj.value->>'name' = 'post_impressions_organic_unique'
JOIN LATERAL jsonb_array_elements(insights) obj2(value) ON obj2.value->>'name' = 'post_impressions_viral_unique'
ORDER BY reach DESC;
Unsure if jsonb will have any impact, but adding the values of two columns is as simple as using +:
create table foo(a integer,b integer);
insert into foo values (3,4);
select *, a+b as total from foo;
a | b | total
---+---+-------
3 | 4 | 7
(1 row)
To do that you would need to update your table like so:
UPDATE yourTable SET resultsRow = firstRow + secondRow;
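Applied to the original query, the problem is that the aliases reach and viral aren't visible in the same SELECT list. One way around that (a sketch; the ::int casts assume the jsonb values are plain numbers) is to wrap the query and add the sum outside:
SELECT *, reach + viral AS total
FROM (
    SELECT p.id,
           p.message,
           p.post_created,
           (obj.value  -> 'values' -> 0 ->> 'value')::int AS reach,   -- ->> yields text, so cast
           (obj2.value -> 'values' -> 0 ->> 'value')::int AS viral
    FROM posts p
    JOIN LATERAL jsonb_array_elements(p.insights) obj(value)
         ON obj.value ->> 'name' = 'post_impressions_organic_unique'
    JOIN LATERAL jsonb_array_elements(p.insights) obj2(value)
         ON obj2.value ->> 'name' = 'post_impressions_viral_unique'
    WHERE p.page_id = 4
      AND p.post_created >= '2016-05-01 00:00:00'
      AND p.post_created <= '2016-05-31 23:59:59'
) sub
ORDER BY total DESC;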