how to make json value from hive columns - hive

I want to convert hive columns to json value.
I know how to convert json value to string i.e. by using get_json_object.
For example, this is the hive table:
id | name
-------------
1 | kim
2 | lee
3 | park
Expected Output is:
[ {"1" : "kim"}, {"2" : "lee"}, {"3" : "park"} ]

You can use Brickhouse UDF collect:
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';
SELECT collect(map(CAST(id as STRING), name)) from table;

Related

Athena query get the index of any element in a list

I need to access to the elements in a column whose type is list according to the other elements' locations in another list-like column. Say, my dataset is like:
WITH dataset AS (
SELECT ARRAY ['hello', 'amazon', 'athena'] AS words,
ARRAY ['john', 'tom', 'dave'] AS names
)
SELECT * FROM dataset
And I'm going to achieve
SELECT element_at(words, index(names, 'john')) AS john_word
FROM dataset
Is there a way to have a function in Athena like "index"? Or how can I customize one like this? The desired result should be like:
| -------- |
| john_word|
| -------- |
| hello |
| -------- |
array_position:
array_position(x, element) → bigint
Returns the position of the first occurrence of the element in array x (or 0 if not found).
Note that in presto array indexes start from 1.
SELECT element_at(words, array_position(names, 'john')) AS john_word
FROM dataset

How to loop through JSON array of JSON objects to see if it contains a value that I am looking for in postgres?

Here is an example of the json object
rawJSON = [
{"a":0, "b":7},
{"a":1, "b":8},
{"a":2, "b":9}
]
And I have a table that essentially looks like this.
demo Table
id | ...(other columns) | rawJSON
------------------------------------
0 | ...(other columns info) | [{"a":0, "b":7},{"a":1, "b":8}, {"a":2, "b":9}]
1 | ...(other columns info) | [{"a":0, "b":17},{"a":11, "b":5}, {"a":12, "b":5}]
What I want is to return a row which insideRawJSON has value from "a" of less than 2 AND the value from "b" of less than 8. THEY MUST BE FROM THE SAME JSON OBJECT.
Essentially the query would similarly look like this
SELECT *
FROM demo
WHERE FOR ANY JSON OBJECT in rawJSON column -> "a" < 2 AND -> "b" < 8
And therefore it will return
id | ...(other columns) | rawJSON
------------------------------------
0 | ...(other columns info) | [{"a":0, "b":7},{"a":1, "b":8}, {"a":2, "b":9}]
I have searched from several posts here but was not able to figure it out.
https://dba.stackexchange.com/questions/229069/extract-json-array-of-numbers-from-json-array-of-objects
https://dba.stackexchange.com/questions/54283/how-to-turn-json-array-into-postgres-array
I was thinking of creating a plgpsql function but wasn't able to figure out .
Any advice I would greatly appreciate it!
Thank you!!
I would like to avoid cross join lateral because it will slow down a lot.
You can use a subquery that searches through the array elements together with EXISTS.
SELECT *
FROM demo d
WHERE EXISTS (SELECT *
FROM jsonb_array_elements(d.rawjson) a(e)
WHERE (a.e->>'a')::integer < 2
AND (a.e->>'b')::integer < 8);
db<>fiddle
If the datatype for rawjson is json rather than jsonb, use json_array_elements() instead of jsonb_array_elements().

KQL: mv-expand OR bag_unpack equivalent command to convert a list to multiple columns

According to mv-expand documentation:
Expands multi-value array or property bag.
mv-expand is applied on a dynamic-typed column so that each value in the collection gets a separate row. All the other columns in an expanded row are duplicated.
Just like the mv-expand operator will create a row each for the elements in the list -- Is there an equivalent operator/way to make each element in a list an additional column?
I checked the documentation and found Bag_Unpack:
The bag_unpack plugin unpacks a single column of type dynamic by treating each property bag top-level slot as a column.
However, it doesn't seem to work on the list, and rather works on top-level JSON property.
Using bag_unpack (like the below query):
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Smitha", "Age":30}),
]
| evaluate bag_unpack(d)
It will do the following:
Name Age
John 20
Dave 40
Smitha 30
Is there a command/way (see some_command_which_helps) I can achieve the following (convert a list to columns):
datatable(d:dynamic)
[
dynamic(["John", "Dave"])
]
| evaluate some_command_which_helps(d)
That translates to something like:
Col1 Col2
John Dave
Is there an equivalent where I can convert a list/array to multiple columns?
For reference: We can run the above queries online on Log Analytics in the demo section if needed (however, it may require login).
you could try something along the following lines
(that said, from an efficiency standpoint, you may want to check your options of restructuring the data set to begin with, using a schema that matches how you plan to actually consume/query it)
datatable(d:dynamic)
[
dynamic(["John", "Dave"]),
dynamic(["Janice", "Helen", "Amber"]),
dynamic(["Jane"]),
dynamic(["Jake", "Abraham", "Gunther", "Gabriel"]),
]
| extend r = rand()
| mv-expand with_itemindex = i d
| summarize b = make_bag(pack(strcat("Col", i + 1), d)) by r
| project-away r
| evaluate bag_unpack(b)
which will output:
|Col1 |Col2 |Col3 |Col4 |
|------|-------|-------|-------|
|John |Dave | | |
|Janice|Helen |Amber | |
|Jane | | | |
|Jake |Abraham|Gunther|Gabriel|
To extract key value pairs from text and convert them to columns without hardcoding the key names in query:
print message="2020-10-15T15:47:09 Metrics: duration=2280, function=WorkerFunction, count=0, operation=copy_into, invocationId=e562f012-a994-4fc9-b585-436f5b2489de, tid=lct_b62e6k59_prd_02, table=SALES_ORDER_SCHEDULE, status=success"
| extend Properties = extract_all(#"(?P<key>\w+)=(?P<value>[^, ]*),?", dynamic(["key","value"]), message)
| mv-apply Properties on (summarize make_bag(pack(tostring(Properties[0]), Properties[1])))
| evaluate bag_unpack(bag_)
| project-away message

extracting a substring from a text column in hive

We have text data in a column named title like below
"id":"S-1-98-13474422323-33566802","name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0","displayName":"Jund Lee","emailAddress":"jund.lee#bm.com","title":"Leading Product Investor"
Need to extract just the display name (Jund lee in this example) from the above text data in hive, I have tried using substring function but don't seem to work,Please help
Use regexp_extract function with the matching regex to capture only the displayName from your title field value.
Ex:
hive> with tb as(select string('"id":"S-1-98-13474422323-33566802",
"name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0",
"displayName":"Jund Lee","emailAddress":"jund.lee#bm.com",
"title":"Leading Product Investor"')title)
select regexp_extract(title,'"displayName":"(.*?)"',1) title from tb;
+-----------+--+
| title |
+-----------+--+
| Jund Lee |
+-----------+--+

Concatenating JSON results to single column postgresql

So, at the moment I have two columns in a table, one of which containing a JSON document, like so:
CID:
2
Name:
{"first" : "bob","1" : "william", "2" : "archebuster", "last" : "smith"}
When I do a search on this column using:
SELECT "CID", "Name"->>(json_object_keys("Name")) AS name FROM "Clients" WHERE
"Name"->>'first' LIKE 'bob' GROUP BY "CID";
I get:
CID | name
--------------
2 | bob
2 | william
2 | archebuster
2 | smith
When really I want:
CID | name
2 | bob william archebuster smith
How would i go about doing this? I'm new to JSON in postgresql.
I've tried string_agg and it wouldn't work, presumably because i'm working in a json column, despite the fact '->>' should type set the result to string
UPDATE:
First, you need to understand, if you include a set-returning function into the SELECT clause, you will create an implicit LATERAL CROSS JOIN.
Your query in reality looks like this:
SELECT "CID", "Name"->>"key" AS name
FROM "Clients"
CROSS JOIN LATERAL json_object_keys("Name") AS "foo"("key")
WHERE "Name"->>'first' LIKE 'bob'
GROUP BY "CID", "Name"->>"key"
If you really want to do that, you can apply an aggregate function here (possibly array_agg or string_agg).
SQLFiddle