How do I escape the dash '-' in Presto when accessing elements from an unnested array? - sql

Using Presto, I want to access students.home-room which is an array struct within a classrooms table.
I tried:
SELECT
class.students.home-room
FROM
school_table_json
cross join unnest (classrooms) c (class)
WHERE year = '2022'
I get an error:
Column 'class.students.home' cannot be resolved
The error suggests Presto interprets 'home-room' as 'home' and can't find the truncated 'home' in hive (as it doesn't exist). Similar structs can be accessed, like class.students.grades. Presto errors handling the dash '-'...?
How do I escape the dash '-' in Presto when accessing elements in an unnested array?
Any help would be much appreciated

Use double quotes:
-- sample data
with dataset (id, r) as (
values (1, CAST(ROW(1, 2.0) AS ROW(x BIGINT, "home-room" DOUBLE)))
)
-- query
select r."home-room"
from dataset;
Output:
home-room
2.0

Related

How to SELECT a postgresql tsrange column as json?

I want to get a tsrange column returned as json but do not understand how get it to work in my query:
reserved is of type TSRANGE.
c.id and a.reservationid are of type TEXT.
-- query one
SELECT DISTINCT c.*, to_json(a.reserved) FROM complete_reservations c
JOIN availability a ON (c.id = a.reservationid)
throws
ERROR: could not identify an equality operator for type json
LINE 1: SELECT DISTINCT c.*, to_json(a.reserved) FROM complete_reser...
^
SQL state: 42883
Character: 22
it works if i try it like
-- query two
SELECT to_json('[2011-01-01,2011-03-01)'::tsrange);
Result:
"[\"2011-01-01 00:00:00\",\"2011-03-01 00:00:00\")"
and I do not understand the difference between both scenarios.
How do I get query one to behave like query two?
as pointed out in this comment by Edouard, there seems to be no JSON representation of a tsrange. Therefore I framed my question wrong.
In my concrete case it was sufficent to turn the tsrange to an array of the upper() and lower() values of the TSRANGE and cast those values as strings. this way i can use the output as is and let my downstream tools handle them as json.
SELECT DISTINCT c.*, ARRAY[to_char(lower(a.reserved),'YYYY-MM-DD HH:MI:SS'),to_char(upper(a.reserved),'YYYY-MM-DD HH:MI:SS')] reserved FROM complete_reservations_with_itemidArray c
JOIN availability a ON (c.id = a.reservationid)
which returns a value like this in the reserved column: {"2023-02-04 04:57:00","2023-02-05 04:57:00"} which can be parsed as json if needed.
I post this for reference. I am not sure if it exactly answers my question as it was framed.

Flatten data source in Snowflake from Array

I am trying to fix an array in a dataset. Currently, I have a data set that has a reference number to multiple different uuids. What I would like to do is flatten this out in Snowflake to make it so the reference number has separate row for each uuid. For example
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 "[
""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",
""df662812-7f97-0b43-9d3e-12f64f504fbb"",
""08644a69-76ed-ce2d-afff-b236a22efa69"",
""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"",
]"
Should end up looking like:
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 05554f65-6aa9-4dd1-6271-8ce2d60f10c4
2) 9f823c2a-ced5-4dbe-be65-869311462f75 df662812-7f97-0b43-9d3e-12f64f504fbb
3) 9f823c2a-ced5-4dbe-be65-869311462f75 08644a69-76ed-ce2d-afff-b236a22efa69
4) 9f823c2a-ced5-4dbe-be65-869311462f75 f1162c2e-eeb5-83f6-5307-2ed644e6b9eb
I just started working in Snowflake so I am new to it. It looks like there is a lateral flatten, but this is either not working on telling me that I have all sorts of errors with it. The documentation from snowflake is a bit perplexing when it comes to this.
While FLATTEN is the right approach when exploding an array, the UUID column value shown in the original description is invalid if interpreted as JSON syntax: "[""val1"", ""val2""]" and that'll need correction before a LATERAL FLATTEN approach can be applied by treating it as a VARIANT type.
If your data sample in the original description is a literal one and applies for all columnar values, then the following query will help transform it into a valid JSON syntax and then apply a lateral flatten to yield the desired result:
SELECT
T.REFERENCE,
X.VALUE AS UUID
FROM (
SELECT
REFERENCE,
-- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]"
-- to valid JSON: ["a", "b"] by stripping away unnecessary quotes
PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '["', '['), ']"', ']')) AS UUID_ARR_CLEANED
FROM TABLENAME) T,
LATERAL FLATTEN(T.UUID_ARR_CLEANED) X
If your data is already in a valid VARIANT type with a successful PARSE_JSON done for the UUID column during ingest, and the example provided in the description was just a formatting issue that only displays the JSON invalid in the post, then the simpler version of the same query as above will suffice:
SELECT REFERENCE, X.VALUE AS UUID
FROM TABLENAME, LATERAL FLATTEN(TABLENAME.UUID) X

Google DataStudio on BigQuery Data, How to display Struct of Arrays

I'm trying to display a table in DataStudio plugged on a BigQuery Table. Where I have a String field, and a Struct of 2 Arrays. This is where my issue is.
When I want to include both of my arrays from the struct, the table kind of time out and shows a connection error. Whereas when I try to include on of them independently there are no issues.
This kind of struct is not supported in DataStudio? Or am I doing something wrong? Thank you.
It doesn't support it. You have to transform it on go in SELECT clause.
If you want to concatenate all strings from repeated string field you can use ARRAY_TO_STRING:
ARRAY_TO_STRING(recos.reco_sku)
or for integers, you have to cast them into a string and then concatenate them
ARRAY_TO_STRING(
ARRAY(
SELECT
CAST(i AS STRING)
FROM
UNNEST(recos.nb_asso) AS i WITH OFFSET o
ORDER BY
o
)
)
Otherwise, you can explode your array with LEFT/CROSS JOIN + UNNEST and make rows flat for each array entry.

unnest() not exploding array, returns error Column alias list has 1 entries but 't' has 2 columns available

I have some json data which includes a property 'characters' and it looks like this:
select json_data['characters'] from latest_snapshot_events
Returns: [{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]
This is returned on a single row. I would like a single row for each item within the array.
I found several SO posts and other blogs advising me to use unnest(). I've tried this several times and cannot get a result to return. For example, here is the documentation from presto. The bottom covers unnest as a stand in for hive's lateral view explode:
SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);
So I tried to apply this to my table:
characters as (
select
jdata.characters
from latest_snapshot_events
cross join unnest(json_data) as t(jdata)
)
select * from characters;
where json_data is the field in latest_snapshot_events that contains the the property 'characters' which is an array like the one shown above.
This returns an error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 69:12: Column alias list has 1 entries but 't' has 2 columns available
How can I unnest/explode latest_snapshot_events.json_data['characters'] onto multiple rows?
Since characters is a JSON array in textual representation, you'll have to:
Parse the JSON text with json_parse to produce a value of type JSON.
Convert the JSON value into a SQL array using CAST.
Explode the array using UNNEST.
For instance:
WITH data(characters) AS (
VALUES '[{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]'
)
SELECT entry
FROM data, UNNEST(CAST(json_parse(characters) AS array(json))) t(entry)
which produces:
entry
-----------------------------------------------------------------------
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,...
In the example above, I convert the JSON value into an array(json), but
you can further convert it to something more concrete if the values inside each
array entry have a regular schema. For example, for your data, it is
possible to cast it to an array(map(varchar, json)) since every element in the
array is a JSON object.
json_parse works if your initial data is a JSON string. However, for array(row) types (i.e. an array of objects/dictionaries), casting to array(json) will convert each row into an array, removing all keys from the object and preventing you from using dot notation or json_extract functions.
To unnest array(row) data, the syntax is much simpler:
CROSS JOIN UNNEST(my_array) AS my_row
I got stuck with this error trying to unpivot data.
This might help someone:
SELECT a_col, b_col
FROM
(
SELECT MAP(
ARRAY['a', 'b', 'c', 'd'],
ARRAY[1, 2, 3, 4]
) my_col
) CROSS JOIN UNNEST(my_col) as t(a_col, b_col)
t() allows you define multiple columns as outputs.

Get an average value for element in column of arrays of json data in postgres

I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is is data in one cell from a single column of similar data in my database.
The datatype of this stored in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
SELECT cast(json_items to varchar[]) as json_item
FROM my_table
WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift would support the json data type per se. At least, I found nothing in the online manual.
But I found a few JSON function in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
, round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
SELECT *, json_extract_array_element_text(json_items, pos) AS elem
FROM (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted with a poor man's solution: A dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum number of possible elements in your array. If you need this on a regular basis create an actual numbers table.
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.