How to get query results from TableResult in JSON using the BigQuery API

I'm following the example given in https://cloud.google.com/bigquery/create-simple-app-api#bigquery-simple-app-java to obtain query results from the BigQuery API.
TableResult result = queryJob.getQueryResults();
It returns the results as a TableResult, but I need them in JSON format.
TableResult{rows=[[FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)},
FieldValue{attribute=PRIMITIVE, value=(...)}],(...)
How can I transform the TableResult into JSON, or even CSV?

To convert the rows of a table to JSON, you can use the function TO_JSON_STRING [1].
To get pretty-printed JSON, pass "true" as the second argument.
The new query would look like this:
#standardSQL
"SELECT TO_JSON_STRING(t, true) "
+ "FROM ( "
+ "SELECT CONCAT('https://stackoverflow.com/questions/', CAST(id AS STRING)) AS url, "
+ "view_count "
+ "FROM `bigquery-public-data.stackoverflow.posts_questions` "
+ "WHERE tags LIKE '%google-bigquery%' "
+ "ORDER BY favorite_count DESC LIMIT 10) AS t"
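Output (each row now holds a single string field, so row.get(0).getStringValue() returns the JSON text for that row):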
TableResult{rows=[[FieldValue{attribute=PRIMITIVE, value={
"url": "https://stackoverflow.com/questions/6607552",
"view_count": 24524
}}], [FieldValue{attribute=PRIMITIVE, value={
"url": "https://stackoverflow.com/questions/20349189",
"view_count": 4298
}}], [FieldValue{attribute=PRIMITIVE, value={
"url": "https://stackoverflow.com/questions/22734777",
"view_count": 7940
}}], [FieldValue{attribute=PRIMITIVE, value={
"url": "https://stackoverflow.com/questions/27537720",
"view_count": 2039
(...)

Related

PostgreSQL verify an empty array on json

I have the following row on select
jsonData
[]
[{"descricao":"falha na porta"}, {"descricao":"falha no ip"}]
[]
I have to identify the empty JSON arrays and then manually add a value to them (e.g. rows 1 and 3). I tried the following:
case when jsonData is null then cast('[{"descricao":"no error"}]' AS json) else jsonData end as opts
But the "is null" check fails for this type of data (an array of JSON objects). How can I identify '[]' values in this case?
Note: I only have select permission on this db
You can use json_array_length():
when json_array_length(jsondata) = 0 then ...
Casting the json to text before the comparison worked for this case:
case when jsondata::text = '[]'
Try this condition (the json type has no equality operator, so the comparison is done on jsonb):
jsondata::jsonb = JSONB '[]'

Google Public Patent Data SQL (BigQuery)

I am trying to retrieve specific cpc codes AND assignees via SQL in the Google public patent data. I am trying to search for the term "VOLKSWAGEN" and cpc.code "H01M8".
But I get the error:
No matching signature for operator = for argument types: ARRAY<STRUCT<name STRING, country_code STRING>>, STRING. Supported signature: ANY = ANY at [15:3]
Code:
SELECT
publication_number application_number,
family_id,
publication_date,
filing_date,
priority_date,
priority_claim,
ipc,
cpc.code,
inventor,
assignee_harmonized,
FROM
`patents-public-data.patents.publications`
WHERE
assignee_harmonized = "VOLKSWAGEN" AND cpc.code = "H01M8"
LIMIT
1000
I'm also interested in searching multiple assignees such as:
in ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
I have recently started to work with SQL and do not see the mistake :/
Many thanks for your help!
Many thanks, I have now written this code to screen multiple companies.
Is it possible to get all the values of "cpc__u.code" for a publication into a single cell, separated by ", "?
I would like to do the same for assignee_harmonized__u.name.
Do you think the companies will be screened correctly with this procedure and the "IN" operator?
SELECT
publication_number application_number,
family_id,
publication_date,
filing_date,
priority_date,
priority_claim,
cpc__u.code,
inventor,
assignee_harmonized,
assignee
FROM
`patents-public-data.patents.publications`,
UNNEST(assignee_harmonized) AS assignee_harmonized__u,
UNNEST(cpc) AS cpc__u
WHERE
assignee_harmonized__u.name in ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
AND cpc__u.code LIKE "H01M8%"
LIMIT
100000
In Google BigQuery UNNEST is needed to access ARRAY elements. This is described here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
The following query works for me.
SELECT
publication_number application_number,
family_id,
publication_date,
filing_date,
priority_date,
priority_claim,
ipc,
cpc__u.code,
inventor,
assignee_harmonized,
FROM
`patents-public-data.patents.publications`,
UNNEST(assignee_harmonized) AS assignee_harmonized__u,
UNNEST(cpc) AS cpc__u
WHERE
assignee_harmonized__u.name = "VOLKSWAGEN AG"
AND cpc__u.code LIKE "H01M8%"
LIMIT
1000
The following are the changes I made to generate results:
UNNEST(assignee_harmonized) AS assignee_harmonized__u, to access assignee_harmonized__u.name.
UNNEST(cpc) AS cpc__u, to access cpc__u.code.
assignee_harmonized__u.name = "VOLKSWAGEN AG", because "VOLKSWAGEN" returns no results.
cpc__u.code LIKE "H01M8%", because "H01M8" returns no results. An example value is H01M8/10.
This returns the following:
Query complete (2.3 sec elapsed, 29.2 GB processed)
If you want to screen multiple assignee names, IN works as shown below. However, each name must be an exact match, such as VOLKSWAGEN AG or AUDI AG.
assignee_harmonized__u.name IN ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
If you want to do a LIKE style match with multiple strings, you can try REGEXP_CONTAINS:
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains
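As a rough illustration of the REGEXP_CONTAINS idea, the sketch below builds the WHERE-clause fragment from a list of name parts by joining them with "|". The helper names escapeRegex and buildAssigneeFilter are made up for this example, and it assumes none of the names contains a double quote, since the pattern is placed in a raw string literal r"...". Because REGEXP_CONTAINS looks for a partial match, a pattern like VOLKSWAGEN should also match VOLKSWAGEN AG.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a WHERE-clause fragment that matches any of the given name parts.
// Assumption: names contain no double quotes (they would end the r"..." literal).
function buildAssigneeFilter(column, names) {
  var pattern = names.map(escapeRegex).join("|");
  return 'REGEXP_CONTAINS(' + column + ', r"' + pattern + '")';
}

console.log(
  buildAssigneeFilter("assignee_harmonized__u.name", ["VOLKSWAGEN", "AUDI", "PLUG POWER"])
);
// REGEXP_CONTAINS(assignee_harmonized__u.name, r"VOLKSWAGEN|AUDI|PLUG POWER")
```

The generated fragment would replace the IN (...) condition in the query above; the rest of the query stays the same.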

Safe casting a regexp match from REGEXP_REPLACE

I have a table with some code points (e.g. &#38;) which I want to strip out from a text value in BigQuery.
My strategy is to use a regexp replace that swaps each numeric reference for the character it encodes.
If I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
CODE_POINTS_TO_STRING([SAFE_CAST(REGEXP_EXTRACT(item, r"&#([0-9]{2})") AS INT64)]) as test_replace
FROM items
This will produce the output that I want for the entry
[
{
"test_replace": "&"
}
]
If I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CODE_POINTS_TO_STRING([SAFE_CAST("\\1" as INT64)])
) as full_replace
FROM items
This will produce a null output
[
{
"full_replace": null
}
]
However if I hard code the value in:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CODE_POINTS_TO_STRING([SAFE_CAST("38" as INT64)])
) as full_replace
FROM items
This works.
[
{
"full_replace": "Test & "
}
]
I know that the regexp is matching correctly, because if I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CONCAT("\\1", "test")
) as part_replace
FROM items
This will return:
[
{
"part_replace": "Test 38test "
}
]
My question is therefore: how do I get the SAFE_CAST() function to evaluate the regexp match? It seems to be evaluating the string literal "\\1" instead.
Try the approach in the example below:
#standardSQL
CREATE TEMP FUNCTION multiReplace(item STRING, arr ARRAY<STRUCT<x STRING, y STRING>>)
RETURNS STRING
LANGUAGE js AS """
for (var i = 0; i < arr.length; i++) {
item = item.replace(arr[i].x, arr[i].y)
};
return item;
""";
WITH items AS (
SELECT "Test &#38; abc &#39; xyz" AS item UNION ALL
SELECT "abc xyz"
)
SELECT item, multiReplace(item, points) full_replace
FROM (
SELECT
item,
ARRAY(
SELECT AS STRUCT val, CODE_POINTS_TO_STRING([SAFE_CAST(SUBSTR(val, -3, 2) AS INT64)]) point
FROM UNNEST(REGEXP_EXTRACT_ALL(item, r'(&#[0-9]{2};)')) val
) points
FROM items
)
with result:
Row  item                      full_replace
1    Test &#38; abc &#39; xyz  Test & abc ' xyz
2    abc xyz                   abc xyz
Option 2
A simpler way to do the same thing is to run the whole replacement inside the UDF:
#standardSQL
CREATE TEMP FUNCTION multiReplace(item STRING)
RETURNS STRING
LANGUAGE js AS """
var decodeHtmlEntity = function(str) {
return str.replace(/&#([0-9]{2});/g, function(match, dec) {
return String.fromCharCode(dec);
});
};
return decodeHtmlEntity(item);
""";
WITH items AS (
SELECT "Test &#38; abc &#39; xyz" AS item UNION ALL
SELECT "abc xyz"
)
SELECT item, multiReplace(item) full_replace
FROM items
with the same output
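The JavaScript inside the UDF can also be tried on its own before pasting it into BigQuery. The sketch below is a slightly generalized variant rather than the code from the answer: it accepts decimal references of any length instead of exactly two digits, and it uses String.fromCodePoint, which assumes an ES2015-capable JavaScript engine. The function name decodeNumericEntities is chosen for this example.

```javascript
// Decode decimal numeric character references such as "&#38;" or "&#8364;".
function decodeNumericEntities(str) {
  return str.replace(/&#([0-9]+);/g, function (match, dec) {
    var codePoint = parseInt(dec, 10);
    // Leave the reference untouched if it is outside the Unicode range.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("Test &#38; abc &#39; xyz"));
// Test & abc ' xyz
```

Strings without any references pass through unchanged, which matches the second row of the result above.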

SQL query to cakePHP format (invalid json result)

Hello, I have a PostgreSQL query that I would like to write in CakePHP format:
SELECT
id,
title,
author,
postdate,
postcontent,
userID
FROM posts
WHERE
userID = 12
AND id IN (SELECT articleID FROM favourites WHERE articlesUserID = 12)
ORDER BY postdate DESC;
This is how the query currently looks in CakePHP:
$favouritearticles = $this->Favourite->query('SELECT id, title, author, postdate, postdatecreation, posteditdate, postcontent, "userID" FROM posts WHERE "userID" = '.$userID.'AND id IN (SELECT lngblogarticleid FROM favourites WHERE lngloginuserid = '.$userID.') ORDER BY postdate DESC');
It works, but if I echo the result through json_encode like this:
echo json_encode($favouritearticles);
I get invalid JSON like the following (checked with JSONLint):
[
[
{
"id": 2,
"title": "Prison Or Treatment For the Mentally ill ",
"author": "mike123",
"postdate": "March 12, 2013 at 6:46 pm",
"postdatecreation": "2013-03-12",
"posteditdate": null,
"postcontent": "<p><span>The public revulsion over repeated mass shootings has placed mental health in the spotlight. This is both good and bad.<\/span><\/p>",
"userID": 34
}
]
][
]
So I thought that maybe I should rewrite my query in CakePHP format using the find method, something like:
$favouritearticles = $this->Favourite->find('all',array('conditions'=>array(".........
However, the query is quite complex and I don't see how to do so.
Thank you for any help.
The JSON is fine except for the extra [ ] at the end.
If you still want to rewrite the query in CakePHP format, use the following:
private function getFavouritePostsByUserId($userId) {
$db = $this->Post->getDataSource();
$subQuery = $db->buildStatement(
array(
'fields' => array('Favourite.articleID'),
'table' => $db->fullTableName($this->Favourite),
'alias' => 'Favourite',
'conditions' => array(
'Favourite.articlesUserID' => $userId
),
),
$this->Favourite
);
$subQuery = 'Post.id IN (' . $subQuery . ') ';
$subQueryExpression = $db->expression($subQuery);
$conditions = array($subQueryExpression, 'Post.userID' => $userId);
$fields = array('Post.*');
$order = 'Post.postdate DESC';
return $this->Post->find('all', compact('conditions', 'fields', 'order'));
}

Extract data from facebook graph api

From the following graph api url,
$graphURL = "https://graph.facebook.com/me/news.reads?access_token=$access_token&" .
//"callback=''&" .
"date_format=U&" .
"limit=5";
I get an array result in the following format:
{"data":[{"id":"10151079746224166",
"from":{"name":"Nicholas Billionea","id":"666859165"},
"start_time":1344853875,
"end_time":1344853875,
"publish_time":1344853875,
"application":{"name":"Muzikki","namespace":"muzikki","id":"354957834546710"},
"data":{"article":{"id":"10150987952714331",
"url":"http:\/\/muzikki.com\/articles\/headlines\/mustapha-qtac-become-ambassadors-for-gui\/",
"type":"article",
"title":"Mustapha, Q-Tac become ambassadors for Guiness campaign in kenya"}},
"type":"news.reads",
"no_feed_story":false,
"likes":{"count":0,"can_like":true,"user_likes":false},
"comments":{"count":0,"can_comment":true}},
...
How do I extract the article title from this multidimensional array?
Parse the output as JSON.
$d = json_decode($data,true); // Here $data is the output returned from facebook API
print_r($d['data'][0]['data']['article']);
The output will be:
Array
(
[id] => 10150987952714331
[url] => http://muzikki.com/articles/headlines/mustapha-qtac-become-ambassadors-for-gui/
[type] => article
[title] => Mustapha, Q-Tac become ambassadors for Guiness campaign in kenya
)
Parse it as JSON and extract the fields you need, as you would with any other array.
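The same traversal can be sketched outside PHP as well. The JavaScript below is only an illustration: the response string is a trimmed-down sample with the same shape as the output in the question, and extractArticleTitles is a name made up for this example. It collects the title of every entry rather than only the first one, and skips entries that have no article.

```javascript
// Trimmed-down sample with the same shape as the response shown in the question.
var response = '{"data":[{"id":"10151079746224166",' +
  '"data":{"article":{"id":"10150987952714331",' +
  '"title":"Mustapha, Q-Tac become ambassadors for Guiness campaign in kenya"}},' +
  '"type":"news.reads"}]}';

// Return the article title of every entry that has one.
function extractArticleTitles(jsonText) {
  var parsed = JSON.parse(jsonText);
  var entries = Array.isArray(parsed.data) ? parsed.data : [];
  return entries
    .filter(function (entry) { return entry.data && entry.data.article; })
    .map(function (entry) { return entry.data.article.title; });
}

console.log(extractArticleTitles(response));
```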