How to extract content from a json list? - sql

There is a long string value which contains a json list:
{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}
The objective is to extract the name values, so the expected result is 'jack,lee,smith'.
I tried get_json_object but it returned null; I also tried combining get_json_object with split, but that did not work either.
Is there a suitable function in Hive that can do this?

with t as (select '{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}' as myjson)
select get_json_object(concat('{"x":[',myjson,']}'),'$.x.name[*]') as names
from t
+------------------------+
| names                  |
+------------------------+
| ["jack","lee","smith"] |
+------------------------+
with t as (select '{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}' as myjson)
select translate(get_json_object(concat('{"x":[',myjson,']}'),'$.x.name[*]'),'[]"','') as names
from t
+----------------+
| names          |
+----------------+
| jack,lee,smith |
+----------------+
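If one name per row is more convenient than a single comma-separated string, here is a sketch of the same idea with LATERAL VIEW explode (assuming Hive; the split on '},{' is a workaround for this particular string, not a general JSON parser):
with t as (select '{"name":"jack","age":"38","city":"JP"},{"name":"lee","age":"42","city":"tjs"},{"name":"smith","age":"46","city":"kh"}' as myjson)
select get_json_object(j, '$.name') as name
from t
lateral view explode(split(regexp_replace(myjson, '\\},\\{', '}|{'), '\\|')) s as j
-- returns three rows: jack, lee, smith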

Related

Is there a way to alter all columns in a SQL table to have 'utf-8' format

I'm using Spark and I found that my data is not being correctly interpreted. I've tried using decode and encode built-in functions but they can be applied only to one column at a time.
Update:
An example of the behaviour I am seeing:
+--------+
| Pa�s   |
+--------+
| Espa�a |
+--------+
And the one I'm expecting:
+--------+
| País   |
+--------+
| España |
+--------+
The statement is just a simple
SELECT * FROM table
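For what it's worth, the decode/encode round trip mentioned above looks like this when applied to one column; a minimal sketch in Spark SQL, assuming the stored bytes are really Latin-1 and using hypothetical names pais/paises:
-- encode the current string back to Latin-1 bytes, then decode those bytes as UTF-8;
-- whether this recovers the accents depends on how the corruption happened
SELECT decode(encode(pais, 'ISO-8859-1'), 'UTF-8') AS pais FROM paises
If the file was simply read with the wrong charset, re-reading the source with the correct encoding is usually the cleaner fix.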

Remove/delete values in a column SQL

I am very new to using SQL and require help.
I have a table whose values contain a comma:
+-------------------+
| Sample            |
+-------------------+
| sdferewr,yyuyuy   |
| q45345,ty67rt     |
| wererert,rtyrtytr |
| werr,ytuytu       |
+-------------------+
I want to delete/remove the values after the comma (,) and keep only the values before it.
Required output:
+----------+
| Sample   |
+----------+
| sdferewr |
| q45345   |
| wererert |
| werr     |
+----------+
How would I be able to do this in SQL? Please help.
Assuming that the table name is "TABLE_NAME" and the field name is "sample", then:
update TABLE_NAME set sample=SUBSTRING_INDEX(`sample`, ',', 1)
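Before running the UPDATE, you can preview what it will write (assuming MySQL, where SUBSTRING_INDEX is available):
select `sample`, SUBSTRING_INDEX(`sample`, ',', 1) as trimmed from TABLE_NAME;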
The simplest way to do that is
UPDATE table_name
SET column = substring(column for position(',' in column) - 1)
WHERE condition;
position(',' in column) returns the position of the comma, and substring(column for n) returns the first n characters, so subtracting 1 keeps everything before the comma.
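A quick check of the expression against a literal (assuming a dialect such as PostgreSQL that supports this substring ... for ... syntax):
select substring('sdferewr,yyuyuy' for position(',' in 'sdferewr,yyuyuy') - 1);
-- returns 'sdferewr'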

Why am I getting an empty result when using regex in SQL?

There is a column named as keyword of the product table.
+---------+
| keyword |
+---------+
| dump    |
| dump2   |
| dump4   |
| dump5   |
| pro     |
+---------+
I am trying to fetch the rows from the product table whose keyword contains the string du anywhere, using a regex.
I used select * from products where keyword LIKE '%[du]%';
but it returns an empty set.
What am I doing wrong here?
If you must use regex, you can just use du as the regex; that will match the string du anywhere in the keyword:
SELECT *
FROM products
WHERE keyword REGEXP 'du'
Output:
keyword
dump
dump2
dump4
dump5
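If regex is not actually required, a plain LIKE with a simple wildcard works too; the original pattern failed because (assuming MySQL) LIKE has no [..] character-class syntax, so '%[du]%' looks for the literal text [du]:
SELECT *
FROM products
WHERE keyword LIKE '%du%';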

How to explode map datatype in Hive OR how to give multiple aliases in Hive

Suppose I query :
select explode(map_column_name) as exploded from table_name
I get this error:
The number of aliases in the AS clause does not match the number of
columns output by the UDTF, expected 2 aliases but got 1
and when I googled the error, I learned that to give more than one alias, we use the stack function.
How do I use the stack function along with the explode function so that I can explode a map datatype and give 2 aliases at a time?
Kindly bear with me as I am a beginner and learning Hive.
With default column names
select explode(map) from table_name
With aliases
select explode(map) as (mykey,myval) from table_name
Demo
With default column names
select explode (map('A',1,'B',2,'C',3))
;
+-----+-------+
| key | value |
+-----+-------+
| A   | 1     |
| B   | 2     |
| C   | 3     |
+-----+-------+
With aliases
select explode (map('A',1,'B',2,'C',3)) as (mykey,myvalue)
;
+-------+---------+
| mykey | myvalue |
+-------+---------+
| A     | 1       |
| B     | 2       |
| C     | 3       |
+-------+---------+
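If you also need other columns from the table next to the exploded key/value pairs, LATERAL VIEW is the usual route in Hive; a sketch with a hypothetical id column:
select t.id, m.mykey, m.myval
from table_name t
lateral view explode(map_column_name) m as mykey, myval;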

BigQuery query to find the column names of a table

I need a query to find column names of a table (table metadata) in Bigquery, like the following query in SQL:
SELECT column_name,data_type,data_length,data_precision,nullable FROM all_tab_cols where table_name ='EMP';
BigQuery now supports information schema.
Suppose you have a dataset named MY_PROJECT.MY_DATASET and a table named MY_TABLE, then you can run the following query:
SELECT column_name
FROM MY_PROJECT.MY_DATASET.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'MY_TABLE'
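INFORMATION_SCHEMA.COLUMNS also exposes data_type and is_nullable, so the Oracle-style query in the question translates roughly to (same placeholder names as above):
SELECT column_name, data_type, is_nullable
FROM MY_PROJECT.MY_DATASET.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'MY_TABLE'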
Yes, you can get table metadata using INFORMATION_SCHEMA.
One of the documented examples retrieves metadata from the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view for the commits table in the github_repos dataset; you just have to:
Open the BigQuery web UI in the GCP Console.
Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.
SELECT
*
FROM
`bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
table_name="commits"
AND (column_name="author"
OR column_name="difference")
Note: INFORMATION_SCHEMA view names are case-sensitive.
Click Run.
The results should look like the following
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_name | column_name | field_path | data_type | description |
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| commits | author | author | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP> | NULL |
| commits | author | author.name | STRING | NULL |
| commits | author | author.email | STRING | NULL |
| commits | author | author.time_sec | INT64 | NULL |
| commits | author | author.tz_offset | INT64 | NULL |
| commits | author | author.date | TIMESTAMP | NULL |
| commits | difference | difference | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL |
| commits | difference | difference.old_mode | INT64 | NULL |
| commits | difference | difference.new_mode | INT64 | NULL |
| commits | difference | difference.old_path | STRING | NULL |
| commits | difference | difference.new_path | STRING | NULL |
| commits | difference | difference.old_sha1 | STRING | NULL |
| commits | difference | difference.new_sha1 | STRING | NULL |
| commits | difference | difference.old_repo | STRING | NULL |
| commits | difference | difference.new_repo | STRING | NULL |
+------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
For newbies like me, these INFORMATION_SCHEMA queries follow this general form (placeholders supplied as string literals in the WHERE clause):
select * from project_name.dataset_name.INFORMATION_SCHEMA.COLUMNS where table_catalog = 'project_name' and table_schema = 'dataset_name' and table_name = 'table_name'
Update: This is now possible! See the INFORMATION_SCHEMA docs and the answers below.
Answer, circa 2012:
It's not currently possible to retrieve table metadata (i.e. column names and types) via a query, though this isn't the first time it's been requested.
Is there a reason you need to do this as a query? Table metadata is available via the tables API.
Actually, it is possible to do this using SQL. You need to query the logging table for the most recent log entry showing this particular table being created.
For example, assuming the table is loaded/created daily:
CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
RETURNS ARRAY<STRING> AS ((
SELECT
SPLIT(
REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
,',')
));
WITH valid_schema_columns AS (
WITH array_output AS (SELECT
jsonSchemaStringToArray(jsonSchema) AS column_names
FROM (
SELECT
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
, ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
WHERE
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
) AS t
WHERE
t.record_count = 1 -- grab the latest entry
)
-- this is actually what UNNESTS the array into standard rows
SELECT
valid_column_name
FROM array_output
LEFT JOIN UNNEST(column_names) AS valid_column_name
)
SELECT * FROM valid_schema_columns
To check the columns, you can also access your table through the CLI; it is easy and simple:
bq query --use_legacy_sql=false 'select Hour, sum(column1) as total from `project_id.dataset.table_name` where Date(Hour) = "2020-06-10"'
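If the goal is only to list the columns, bq show with the --schema flag prints the table schema directly (same placeholder identifiers as above):
bq show --schema --format=prettyjson project_id:dataset.table_name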