Spark SQL: read a JSON string column in a query

I have a table in Spark. One column, named "mappings", holds key-value pairs as a string:
select mappings from hello;
Ex: {"foo": "baar", "foo1": "bar1" }
I want to cast the "mappings" column to a MAP and read the value of foo1, like this:
select CAST("mappings" as MAP) from hello;
This throws an error in Spark SQL. How can I convert this column to a map?

Related

How do I Unnest varchar to json in Athena

I am crawling data from Google BigQuery and staging it in Athena.
One of the columns, crawled as a string, contains JSON:
{
"key": "Category",
"value": {
"string_value": "something"
}
}
I need to unnest and flatten these values so that I can use them in a query. I require the key and the string value (so in my query it will be where Category = something).
I have tried the following:
WITH dataset AS (
SELECT cast(json_column as json) as json_column
from "thedatabase"
LIMIT 10
)
SELECT
json_extract_scalar(json_column, '$.value.string_value') AS string_value
FROM dataset
This returns null.
Casting json_column to json adds backslashes to the value:
"[{\"key\":\"something\",\"value\":{\"string_value\":\"app\"}}
If I try to use replace on the JSON value, it is rejected because the value is not a varchar.
So how do I extract the values from the json_column field?
Presto's json_extract_scalar can extract directly from a varchar (string) value:
-- sample data
WITH dataset(json_column) AS (
values ('{
"key": "Category",
"value": {
"string_value": "something"
}}')
)
--query
SELECT
json_extract_scalar(json_column, '$.value.string_value') AS string_value
FROM dataset;
Output:
string_value
something
Casting to json encodes the data as JSON rather than parsing it, so a string ends up double-encoded. To parse the string, use json_parse (not needed in this particular case, but there are cases where you will want it):
-- query
SELECT
json_extract_scalar(json_parse(json_column), '$.value.string_value') AS string_value
FROM dataset;
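The difference between encoding and parsing is not specific to Presto. As a rough analogy only (this uses Python's standard json module, not anything Athena provides), json.dumps on a string behaves like the cast to json described above, and json.loads behaves like json_parse:

```python
import json

# The raw column value: a string that already contains JSON text.
json_column = '{"key": "Category", "value": {"string_value": "something"}}'

# Encoding the string as JSON (the analogue of casting varchar to json):
# the result is a JSON *string*, so the inner quotes are escaped.
encoded = json.dumps(json_column)

# Parsing the string (the analogue of json_parse): the result is a
# structure whose fields can be navigated.
parsed = json.loads(json_column)

print(encoded)
print(parsed["value"]["string_value"])
```

The first print shows the backslash-escaped form that the question ran into. The second shows that the nested value is only reachable once the text has been parsed.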

Extracting JSON returns null (Presto Athena)

I'm working with Presto SQL in Athena, and one of my tables has a column named "data.input.additional_risk_data.basket" that contains JSON like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with JSON, but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
They all returned null. What is the correct way to get the values from that JSON?
Thank you!
There are multiple "problems" with your data and your JSON path selector. The keys are not conventional (and I have not found a way to tell Athena to escape them), and your JSON is actually an array of JSON objects. What you can do is cast the data to an array and process it. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference
"26484651"

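One plausible reading of the null results is that a path such as $.data.input... is split at the dots and followed as nested keys, while the data has a single flat key that merely contains dots. The sketch below illustrates that idea in Python with the standard json module. It is an analogy, not Athena's implementation; lookup_nested is a hypothetical helper, and the sample is shortened to two of the keys from the question.

```python
import json

KEY = "data.input.additional_risk_data.basket.val.item_reference"

# Shortened sample: an array holding one object with flat, dotted keys.
raw = json.dumps([{
    KEY: "26484651",
    "data.input.additional_risk_data.basket.val.unit_price": 769.0,
}])

def lookup_nested(doc, dotted_path):
    """Hypothetical helper: follow the path one dot-separated segment at a time."""
    current = doc
    for segment in dotted_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current

basket = json.loads(raw)                 # a list with a single object

nested = lookup_nested(basket[0], KEY)   # no "data" key exists, so None
direct = basket[0][KEY]                  # whole string used as one key

print(nested)
print(direct)
```

Note that Python lists are indexed from 0, whereas the Presto query above uses arr[1] for the first element.
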
Construct nested JSON value in sql/json (Oracle Database)

How do I construct a JSON value with a nested JSON value as a serialized string? I tried this:
SQL> select json { 'y' : json_serialize(json('{"hello":"world"}')) } x
from dual;
X
--------------------------------------------------------------------
{"y":{"hello":"world"}}
But the result I want is:
{"y":"{\"hello\":\"world\"}"}
I'm using Oracle Database 20c.
The JSON object constructor recognizes the output of json_serialize as serialized JSON and converts it to a JSON value when constructing the outer object. Use to_clob() instead:
select json { 'y' : to_clob(json('{"hello":"world"}')) } x
from dual;
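The two shapes in the question differ only in whether the inner value is embedded as a JSON value or as text. A small Python sketch of that distinction (using the standard json module as an analogy, not Oracle's SQL/JSON functions):

```python
import json

inner = {"hello": "world"}
compact = (",", ":")  # separators that avoid extra whitespace in the output

# Embedding the value itself nests it as an object.
nested = json.dumps({"y": inner}, separators=compact)

# Embedding its serialized text stores a plain string with escaped quotes.
inner_text = json.dumps(inner, separators=compact)
as_string = json.dumps({"y": inner_text}, separators=compact)

print(nested)     # {"y":{"hello":"world"}}
print(as_string)  # {"y":"{\"hello\":\"world\"}"}
```

Reading the second form back takes two parsing steps: one for the outer object and one for the string stored under "y".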

Scan unstructured JSON BYTEA into map[string]string

This seems like a common problem and may be posted somewhere already, but I can't find any threads talking about it, so here is the problem:
I have a Postgres table with a column of type BYTEA.
CREATE TABLE foo (
id VARCHAR PRIMARY KEY,
json_data BYTEA
)
The column json_data is really just JSON stored as BYTEA (not ideal, I know). It is unstructured, but guaranteed to be a JSON object that maps strings to strings.
When I query this table, I need to scan the query SELECT * FROM foo WHERE id = $1 into the following struct:
type JSONData map[string]string
type Foo struct {
ID string `db:"id"`
Data JSONData `db:"json_data"`
}
I'm using sqlx's Get method. When I execute a query I'm getting the error message sql: Scan error on column index 1, name "json_data": unsupported Scan, storing driver.Value type []uint8 into type *foo.JSONData.
Obviously, the scanner is having trouble scanning the JSON BYTEA into a map. I can implement my own scanner and call my custom scanner on the json_data column, but I'm wondering if there are better ways to do this. Could my JSONData type implement an existing interface to do this automatically?
As suggested by @iLoveReflection, implementing the Scanner interface on *JSONData worked. Here is the actual implementation:
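import (
"encoding/json"
"errors"
)

// Scan implements the sql.Scanner interface: it decodes the raw BYTEA
// bytes returned by the driver as JSON into the map.
func (j *JSONData) Scan(src interface{}) error {
b, ok := src.([]byte)
if !ok {
return errors.New("invalid data type")
}
return json.Unmarshal(b, j)
}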

Returning a tuple column type from slick plain SQL query

In Slick 3 with Postgres, I'm trying to use a plain SQL query with a tuple column return type. My query is something like this:
sql"""
select (column1, column2) as tup from table group by tup;
""".as[((Int, String))]
But at compile time I get the following error:
could not find implicit value for parameter rconv: slick.jdbc.GetResult[((Int, String), String)]
How can I return a tuple column type with a plain sql query?
GetResult[T] wraps a function PositionedResult => T. Slick expects an implicit GetResult in scope that uses PositionedResult methods such as nextInt and nextString to read the typed fields by position. The following implicit val should address your need:
import slick.jdbc.GetResult
implicit val getTableResult: GetResult[(Int, String)] = GetResult(r => (r.nextInt, r.nextString))
More details can be found in this Slick doc.