How to query json column using databricks sql? - sql

Query is
SELECT *
FROM companies c
where c.urls ->> 'Website' = '';
Here is the companies table.
urls
{'Website': '', 'Twitter': ''}
{'Website': 'www.google.com', 'Twitter: ''}
I'm querying the companies table for rows that have urls column with Website as empty string. However, this query is erroring out in dattabricks sql with:
mismatched input '->' expecting {<EOF>, ';'}(line 3, pos 13)
Does anyone know how to query the json column in databricks sql?

Take a look at the following page from the Databricks documentation: Query semi-structured data in SQL.
If the content of the column is JSON as a string, then you can make use of this syntax: <column-name>:<extraction-path>. For example:
select * from companies c
where c.urls:Website = ''
If the content of the column is a struct, then you can make use of this syntax: <column-name>:<nested-field>:
SELECT *
FROM companies c
where c.urls.Website = '';

Related

Extract complex json with random key field

I am trying to extract the following JSON into its own rows like the table below in Presto query. The issue here is the name of the key/av engine name is different for each row, and I am stuck on how I can extract and iterate on the keys without knowing the value of the key.
The json is a value of a table row
{
"Bkav":
{
"detected": false,
"result": null,
},
"Lionic":
{
"detected": true,
"result": Trojan.Generic.3611249',
},
...
AV Engine Name
Detected Virus
Result
Bkav
false
null
Lionic
true
Trojan.Generic.3611249
I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html but there is no mention of extraction if we don't know the key :( I am trying to find a solution that works in both presto & hive query, is there a common query that is applicable to both?
You can cast your json to map(varchar, json) and process it with unnest to flatten:
-- sample data
WITH dataset (json_str) AS (
VALUES (
'{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
)
)
--query
select k "AV Engine Name", json_extract_scalar(v, '$.detected') "Detected Virus", json_extract_scalar(v, '$.result') "Result"
from (
select cast(json_parse(json_str) as map(varchar, json)) as m
from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:
AV Engine Name
Detected Virus
Result
Bkav
false
Lionic
true
Trojan.Generic.3611249
The presto query suggested by #Guru works, but for hive, there is no easy way.
I had to extract the json
Parse it with replace to remove some character and bracket
Then convert it back to a map, and repeat for one more time to get the nested value out
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result

How can I count all NULL values, without column names, using SQL?

I'm reading and executing sql queries from file and I need to inspect the result sets to count all the null values across all columns. Because the SQL is read from file, I don't know the column names and thus can't call the columns by name when trying to find the null values.
I think using CTE is the best way to do this, but how can I call the columns when I don't know what the column names are?
WITH query_results AS
(
<sql_read_from_file_here>
)
select count_if(<column_name> is not null) FROM query_results
If you are using Python to read the file of SQL statements, you can do something like this which uses pglast to parse the SQL query to get the columns for you:
import pglast
sql_read_from_file_here = "SELECT 1 foo, 1 bar"
ast = pglast.parse_sql(sql_read_from_file_here)
cols = ast[0]['RawStmt']['stmt']['SelectStmt']['targetList']
sum_stmt = "sum(iff({col} is null,1,0))"
sums = [sum_sql.format(col = col['ResTarget']['name']) for col in cols]
print(f"select {' + '.join(sums)} total_null_count from query_results")
# outputs: select sum(iff(foo is null,1,0)) + sum(iff(bar is null,1,0)) total_null_count from query_results

Querying for a specific value in a String stored in a database field

{"create_channel":"1","update_comm":"1","channels":"*"}
This is the database field which I want to query.
What would my query look like if I wanted to select all the records that have a "create_channel": "1" and a "update_comm": "1"
Additional question:
View the field below:
{"create_channel":"0","update_comm":"0","channels":[{"ch_id":"33","news":"1","parties":"1","questions ":"1","cam":"1","edit":"1","view_subs":"1","invite_subs":"1"},{"ch_id":"18","news":"1","parties":"1","questions ":"1","cam":"1","edit":"1","view_subs":"1","invite_subs":"1"}]}
How would I go about finding out all those that are subadmins in the News, parties, questions and Cams sections
You can use the ->> operator to return a member as a string:
select *
from YourTable
where YourColumn->>'create_channel' = '1' and
YourColumn->>'update_comm' = '1'
To find a user who has news, parties, questions and cam in channel 33, you can use the #> operator to check if the channels array contains those properties:
select *
from YourTable
where YourColumn->'channels' #> '[{
"ch_id":"33",
"news":"1",
"parties":"1",
"questions ":"1",
"cam":"1"
}]';

Postgres SELECT * FROM Table where array has element

I am using Postgres 9.3.2 to make a database of contacts.
Example: If i have a row in my table that looks something like this.
{
firstName : "First name"
lastName : "Last name"
emails : ["email#one.com", "email#two.com", "email#three.com]
}
PS: firstName, lastName and emails are columns in my db and the value associated is the value for that column for that specific row.
I want to be able to query the db so that if i query for the email "email#four.com" the result is nothing but if i query for "email#two.com" the result will be the above row entry.
I dont think the query
"Select * from contactTable where emails="email#two.com""
will work. instead i want to do something like
"Select * from contactTable where emails contains "email#two.com""
any ideas on how to do this?
"Select * from contactTable where emails contains "email#two.com""
I think you want:
"Select * from contactTable where thejsonfield -> emails
Example setup, after fixing up your totally broken json:
CREATE TABLE contacts AS SELECT '{
"firstName" : "First name",
"lastName" : "Last name",
"emails" : ["email#one.com", "email#two.com", "email#three.com"]
}'::json AS myjsonfield;
The following will work in PostgreSQL 9.4, but unfortunately does not in 9.3 due to the oversight of the missing json_array_elements_text function:
select *
from contacts,
lateral json_array_elements_text(myjsonfield -> 'emails') email
where email = 'email#two.com';
For 9.3, you have to use a clumsier method to scan the json array for matching values:
select *
from contacts,
lateral json_array_length(myjsonfield -> 'emails') numemails,
lateral generate_series(0, numemails) n
WHERE json_array_element_text(myjsonfield -> 'emails', n) = 'email#two.com';
You can't use the simple IN or = ANY constructs because (at this point) PostgreSQL doesn't understand that you might have a json array, so it'll fail with:
regress=> SELECT * FROM contacts WHERE 'email#two.com' = ANY (myjsonfield->'emails');
ERROR: op ANY/ALL (array) requires array on right side
LINE 1: SELECT * FROM contacts WHERE 'email#two.com' = ANY (myjsonfi...
^
as it expects a PostgreSQL array, not a json array, and there's no convenient builtin to turn a json array into a PostgreSQL array yet.
Postgres has support for parsing JSON. Here is documentation: http://www.postgresql.org/docs/9.3/static/functions-json.html. I can't give you more detailed answer since you didn't provide exact data and schema, but it's easy to find the right function in documentation.

NHibernate native SQL in WHERE clause

I have a Geography column in a table in SQL Server and would like to filter rows with a specific geometry type, e.g. all records where geometry type is 'Point'
The SQL query would look like
select * from GeometryTable g where g.Geography.STGeometryType() = 'Point'
How can I create a criteria for that? The criteria is going to be used with other criterias
criteria.Add(Restrictions.Add(<Geography.STGeometryType()>, some.Value)
Thanks
Use this syntax:
var criteria = session.CreateCriteria<Geometry>();
criteria.Add
(
Expression.Sql(" {alias}.[Geography].STGeometryType() = ? "
, "Point" // a place for your parameter
, NHibernate.NHibernateUtil.String)
);
var list = criteria.List<Geometry>();