I have a Cosmo DB graph where I would like to access the 'name' field in an expression using the string matching CONTAINS in Cosmos DB. CONTAINS works at 1 level as in matching CONATINS
SELECT s.label, s.name FROM s WHERE CONTAINS(LOWER(s.name._value), "cara") AND s.label = "site"
I also tried with a UDF function
SELECT s.label, s.name FROM s WHERE(s.label = 'site' AND udf.strContains(s.name._value, '/cara/i'))
I don't get any hits or syntax errors from Cosmos DB even that should be at least one record in this example. Does anyone have a hint? Thanks in advance
[
{
"label": "site",
"name": [
{
"_value": "0315817 Caracol",
"id": "2e2f000d-2e0a-435a-b472-75d257236558"
}
]
},
{
"label": "site",
"name": [
{
"_value": "0315861 New Times",
"id": "48497172-1734-43d0-9866-51faf9f603ed"
}
]
}
]
I noticed that the name property is an array not an object.So, you need to use join in sql.
SELECT s.label, s.name , name._value FROM s
join name in s.name
where CONTAINS(LOWER(name._value), "cara") AND s.label = "site"
Output:
Hope it helps you.
Related
I have a table on AWS RDS PostgreSQL that stores JSON objects. For instance I have this registry:
{
"id": "87b05c62-4153-4341-9b58-e86bade25ffd",
"title": "Just Ok",
"rating": 2,
"gallery": [
{
"id": "1cb158af-0983-4bac-9e4f-0274b3836cdd",
"typeCode": "PHOTO"
},
{
"id": "aae64f19-22a8-4da7-b40a-fbbd8b2ef30b",
"typeCode": "PHOTO"
}
],
"reviewer": {
"memberId": "2acf2ea7-7a37-42d8-a019-3d9467cbdcd1",
},
"timestamp": {
"createdAt": "2011-03-30T09:52:36.000Z",
"updatedAt": "2011-03-30T09:52:36.000Z"
},
"isUserVerified": true,
}
And I would like to create a query for obtaining one of the gallery objects.
I have tried this but get both objects in the array:
SELECT jsonb_path_query(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? '1cb158af-0983-4bac-9e4f-0274b3836cdd'
With this other query I get the first object:
SELECT jsonb_path_query_first(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? '1cb158af-0983-4bac-9e4f-0274b3836cdd'
But filtering by the second array object id, I get no result:
SELECT jsonb_path_query_first(data->'gallery', '$[*]') AS content
FROM public.reviews
WHERE jsonb_path_query_first(data->'gallery', '$.id') ? 'aae64f19-22a8-4da7-b40a-fbbd8b2ef30b'
I have read the official documentation and tried other functions like jsonb_path_exists or jsonb_path_match on the where condition but was not able to make the query work.
Any help would be greatly appreciated. Thanks in advance.
I managed to get the query working as needed. Here is my proposal:
SELECT gallery
FROM public.reviews, jsonb_path_query(data->'gallery', '$[*]') as gallery
WHERE data->>'id' = '87b05c62-4153-4341-9b58-e86bade25ffd' and gallery->>'id' = 'aae64f19-22a8-4da7-b40a-fbbd8b2ef30b'
Hope it helps others.
Background:
I wish to locate the entire JSON document that has a condition where "state" = "new" and where length(Features.id) > 4
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
This is what I have tried to do:
Since this is a nested document. My query looks like this:
A stackoverflow member has helped me to access the nested contents within the query, but is there a way to obtain the full document
I have used:
SELECT VALUE t.id FROM t IN f.feedback.Features where t.state = 'new' and length(t.id)>4
This will give me the ids.
My desire is to have access to the full document with this condition?
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
Any help is appreciated
Try this
SELECT *
FROM f
WHERE
f.feedback.Features[0].state = 'new'
AND length(f.feedback.Features[0].id)>4
Here is the SELECT spec for CosmosDB for more details
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-select
Also, check out "working with JSON" in CosmosDB notes
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-working-with-json
If the Features array has more than 1 value, you can use EXISTS clause to search within them. See specs of EXISTS here with examples:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#exists-expression
I'm trying to merge two databases with the same schema on Google BigQuery.
I'm following the merge samples here: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
However, my tables have nested columns, ie "service.id" or "service.description"
My code so far is:
MERGE combined_table
USING table1
ON table1.id = combined_table.id
WHEN NOT MATCHED THEN
INSERT(id, service.id, service.description)
VALUES(id, service.id, service.description)
However, I get the error message: Syntax error: Expected ")" or "," but got ".", and a red squiggly underline under .id on the INSERT(...) line.
Here is a view of part of my table's schema:
[
{
"name": "id",
"type": "STRING"
},
{
"name": "service",
"type": "RECORD",
"fields": [
{
"name": "id",
"type": "STRING"
},
{
"name": "description",
"type": "STRING"
}
]
},
{
"name": "cost",
"type": "FLOAT"
}
...
]
How do I properly structure this INSERT(...) statement so that I can include the nested columns?
Syntax error: Expected ")" or "," but got "."
Looks like you are on the right direction, Note in the documentation how you need to insert value to a REPEATED column,
You need to define the structure to guide BigQuery what to expect, For example:
STRUCT<created DATE, comment STRING>
This is the full example from the documentation
MERGE dataset.DetailedInventory T
USING dataset.Inventory S
ON T.product = S.product
WHEN NOT MATCHED AND quantity < 20 THEN
INSERT(product, quantity, supply_constrained, comments)
-- insert values like this
VALUES(product, quantity, true, ARRAY<STRUCT<created DATE, comment STRING>>[(DATE('2016-01-01'), 'comment1')])
WHEN NOT MATCHED THEN
INSERT(product, quantity, supply_constrained)
VALUES(product, quantity, false)
I've found the answer.
It turns out when referencing the top level of a STRUCT, BigQuery references all of the nested columns as well. So if I wanted to INSERT service and all of it's sub-columns (service.id and service.description), I only have to include service in the INSERT(...) statement.
The following code worked:
...
WHEN NOT MATCHED THEN
INSERT(id, service)
VALUES(id, service)
This would merge all sub columns, including service.id and service.description.
I'm currently checking out Bigquery, and I want to know if it's possible to add new data to a nested table.
For example, if I have a table like this:
[
{
"name": "name",
"type": "STRING"
},
{
"name": "phone",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "number",
"type": "STRING"
},
{
"name": "type",
"type": "STRING"
}
]
}
]
And then I insert a phone number for the contact John Doe.
INSERT into socialdata.phones_examples (name, phone) VALUES("Jonh Doe", [("555555", "Home")]);
Is there an option to later add another number to the contact ? To get something like this:
I know I can update the whole field, but I want to know if there is way to append to the nested table new values.
When you insert data into BigQuery, the granularity is the level of rows, not elements of the arrays contained within rows. You would want to use a query like this, where you update the relevant row and append to the array:
UPDATE socialdata.phones_examples
SET phone = ARRAY_CONCAT(phone, [("555555", "Home")])
WHERE name = "Jonh Doe"
if you need to update multiple records for some users - you can use below
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM (
SELECT 'John Doe' name, STRUCT<number STRING, type STRING>('123-456-7892', 'work') new_phone UNION ALL
SELECT 'Abc Xyz' , STRUCT('123-456-7893', 'work') new_phone
) u
WHERE t.name = u.name
or if those updates are available in some table (for example socialdata.phones_updates):
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM `socialdata.phones_updates` u
WHERE t.name = u.name
I'm investigating the feasibility of using BigQuery to store sensor data in time series. The intent is to store the data in BQ and process it in Pandas... so far so good... Pandas can interpret a TIMESTAMP field index and create a Series.
An additional requirement is that the data support arbitrary tags as key/value pairs (e.g. job_id=1234, task_id=5678). BigQuery can support this nicely with REPEATED fields of type RECORD:
{'fields':
[
{
"mode": "NULLABLE",
"name": "timestamp",
"type": "TIMESTAMP"
},
{
"mode": "REPEATED",
"name": "tag",
"type": "RECORD",
"fields":
[
{
"name":"name",
"type":"STRING"
},
{
"name":"value",
"type":"STRING"
},
{
"mode": "NULLABLE",
"name": "measurement_1",
"type": "FLOAT"
},
{
"mode": "NULLABLE",
"name": "measurement_2",
"type": "FLOAT"
},
{
"mode": "NULLABLE",
"name": "measurement_3",
"type": "FLOAT"
},
]
},
]
}
This works great for storing the data and it even works great for querying if I only need to filter on a single key/value combination
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
However, I also need to be able to combine sets of tags in query expressions and I can't seem to make this work. For example this query returns no result
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
AND tag.name = 'task_id' AND tag.value = '5678'
Questions: Is it possible to formulate a query to do what I want using this schema? What is the recommended way to attach this type of variable data to an otherwise fixed schema in Big Query?
Thanks for any help or suggestions!
Note: If you're thinking this looks like a great fix for InfluxDB it's because that's what I've been using thus far. The seemingly insurmountable issue is the amount of series cardinality in my data set, so I'm looking for alternatives.
BigQuery Legacy SQL
SELECT measurement_1 FROM measurements
OMIT RECORD IF
SUM((tag.name = 'job_id' AND tag.value = '1234')
OR (tag.name = 'task_id' AND tag.value = '5678')) < 2
BigQuery Standard SQL
SELECT measurement_1 FROM measurements
WHERE (
SELECT COUNT(1) FROM UNNEST(tag)
WHERE ((name = 'job_id' AND value = '1234')
OR (name = 'task_id' AND value = '5678'))
) >= 2
Repeated are great way for storing data series, collection etc.
In order to filter out from repeated fields just the value of one interest I would use the following template
SELECT
MAX( IF( filter criteria, value_to_pull, null)) WITHIN RECORD AS some_name
FROM <table>
In your case it would be the following:
SELECT
MAX(IF(tag.name = 'job_id' AND tag.value = '1234', measurement_1, NULL)) WITHIN RECORD AS job_1234_meassurement_1,
MAX(IF(tag.name = 'task_id' AND tag.value = '5678', measurement_1, NULL)) WITHIN RECORD AS task_5678_meassurement_1,
FROM measurements