Get field from unstructured array in Prestodb - arraylist

Im trying to query a category and category id (cat_id), that are in an array, in presto db source as following.
I used a query "json_extract_scalar"(data.reason, '$[0].column') for other column and it worked, but it didnt works for these array, because this levels are unstructured, not in json, its in array format.
Anyone has some suggestion? I can't to catch the [catid], and I tried a lot solutions.

It worked as following, using select and sailing between levels :
-- [data] is my array column
SELECT data[1][4] as tester,
data[1][4][2] as tester4,
data[1][4][3] as tester5
from MyTable

Related

Why json_extract works but json_extract_scalar does not?

I have a dataset containing a column in json with an attribute giving me a list, and I would like to unnest it to join some different data.
I thought about json_extract_scalar the json_data, then I could split it and finally unnest with other operations, however I got a problem.
In my case, when i run the json_extract it works fine but i cannot convert to a varchar. In the other hand, if i use json_extract_scalar it returns a null value.
I think the problem should be the quotation marks, but I am not sure how to deal with it - and even if this is the correct problem.
Let me give you a sample of the data:
{"my_test_list":["756596263-0","743349523-371296","756112380-0","755061590-0"]}
Can you guys give me some advice?
I'm querying SQL in Presto.
What you are storing under key my_test_list is a JSON array, not a scalar value - which is why json_extract_scalar() returns null.
It is rather unclear how you want to use that data. A typical solution is to cast it to an array, that you can then use as needed, for example by unnesting it. The base syntax would be:
cast(json_extract(mycol, '$.my_test_list') as array(varchar))
You would then use that in a lateral join, like:
select t.mycol, x.myval
from mytable t
cross join unnest(
cast(json_extract(mycol, '$.my_test_list') as array(varchar))
) as x(myval)

Convert strings into table columns in biq query

I would like to convert this table
to something like this
the long string can be dynamic so it's important to me that it's not a fixed solution for these values specifically
Please help, i'm using big query
You could start by using SPLIT SPLIT(value[, delimiter]) to convert your long string into separate key-value pairs in an array.
This will be sensitive to you having commas as part of your values.
SPLIT(session_experiments, ',')
Then you could either FLATTEN that array or access each element, and then use some REGEXs to separate the key and the value.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
It's not possible what you want, however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like that
You can use that sample query to understand how to use it.
with rawdata AS
(
SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
id,
(select array_agg(struct(split(param, '-')[offset(0)] as experiment, split(param, '-')[offset(1)] as value)) from unnest(split(experiments)) as param ) as experiments
from rawdata
The output will look like that:
After having that output, it's more convenient to manipulate the data

Athena/Presto : complex structure/array

Think that I am asking the impossible here, but throwing it out there.
Trying to query some json in Athena.
The data I'm working with looks like this (excerpt)
condition={
"foranyvalue:stringlike":{"s3:prefix":["lala","hehe"]},
"forallvalues:stringlike":{"s3:prefix":["apples","bananas"]
}
.. and I need to get to here :
... PLUS:the key names are not fixed, so one day I might get:
condition={"something not seen before":{"surprise":["haha","hoho"]}}
With that last point, I was hoping to treat this an an array, and start by splitting the 'foranyvalue' and 'forallvalues' parts into separate rows.
but with everything wrapped in {}, it refuses to unnest.
But despite the above failed plan - ANY tips on solving this by ANY means gratefully received !
Thank You
When you have JSON data that does not have a schema that is easy to describe you can use STRING as the type of the column and then use Athena/Presto's JSON functions to query them, in combination with casting to MAP and UNNEST to flatten the structures.
One way of achieving what I think you're trying to do would be something like this:
WITH the_table AS (
SELECT CAST(condition AS MAP(VARCHAR, JSON)) AS condition
FROM (
VALUES
(JSON '{"foranyvalue:stringlike":{"s3:prefix":["lala","hehe"]},"forallvalues:stringlike":{"s3:prefix":["apples","bananas"]}}'),
(JSON '{"something not seen before":{"surprise":["haha","hoho"]}}')
) AS t (condition)
),
first_flattening AS (
SELECT
SPLIT(first_level_key, ':', 2) AS first_level_key,
CAST(first_level_value AS MAP(VARCHAR, JSON)) AS first_level_value
FROM the_table
CROSS JOIN UNNEST (condition) AS t (first_level_key, first_level_value)
),
second_flattening AS (
SELECT
first_level_key,
second_level_key,
second_level_value
FROM first_flattening
CROSS JOIN UNNEST (first_level_value) AS t (second_level_key, second_level_value)
)
SELECT
first_level_key[1] AS "for",
TRY(first_level_key[2]) AS condition,
second_level_key AS "left",
second_level_value AS "right"
FROM second_flattening
I've included the two examples you gave as an inline VALUES list in the first CTE, and exactly what to do in the table declaration (i.e. what type for the column to use) and what processing to do in the query (i.e. the cast) depends on your data and how you want/can set up the table. YMMV.
The query flattens the JSON structure in a couple of separate steps, first flattening the first level of keys and values, then the keys and values of the inner documents. It might be possible to do this in one step, but doing it in two at least makes it easier to read.
Since the first level keys don't always have the colon I've used TRY to make sure that accessing the second value doesn't break anything. You could perhaps filter out values without a colon earlier and avoid this, since you're not interested in them.

Extract Array from JSON in HiveQL

I have a column which contains a large JSON object. For example, let's call the column Column1, and this is a typical element:
{"key1":value,"key2":[{"subK11":val,"subK12":val},{"subK21":val,"subK22":val}]}
So, I can extract a normal element fine using:
select get_json_object(Column1,'$.key1') as key1
But I have been unable to figure out how to extract the ARRAY in a usable form, as this:
select get_json_object(Column1,'$.key2') as key2
Returns a STRING type. So I can't select elements from the array like normal. That is, this query will fail:
select key2[1] as first_element
from
(select get_json_object(Column1,'$.key2') as key2)
OR
select explode(key2)
from
(select get_json_object(Column1,'$.key2') as key2 )
Both give errors, the later says "explode() requires array type". So the issue, I think, is that get_json_object returns a string. I need it to recognize that key2 contains an ARRAY, but I have no idea how to do that.
I'm new to Hive SQL, mainly an SQL user, so please let me know if there's anything crazy obvious I'm missing. I have not found a solution to this type of problem on any of the other questions.
you can use hive-third-functions, It provide json_array_extract function, you can extract json array info like this:
json_array_extract("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b"); => ["\"13\"","\"18\"","\"12\""]
json_array_extract_scalar("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b") => ["13","18","12"]

Get an average value for element in column of arrays of json data in postgres

I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is is data in one cell from a single column of similar data in my database.
The datatype of this stored in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
SELECT cast(json_items to varchar[]) as json_item
FROM my_table
WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift would support the json data type per se. At least, I found nothing in the online manual.
But I found a few JSON function in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
, round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
SELECT *, json_extract_array_element_text(json_items, pos) AS elem
FROM (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted with a poor man's solution: A dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum number of possible elements in your array. If you need this on a regular basis create an actual numbers table.
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.