Google BigQuery: extract a string from a column with REGEXP_EXTRACT

I know there's a similar question, but in my case the solution didn't work since there's no valid delimiter.
I have a string inside a column:
{"module-version": "2.0", "more-details-link": ""}, "has-cve": true, "issue-package": "VM-Essentials", "legacy-scan-status": "removed"}
I need the value of issue-package. I tried:
REGEXP_EXTRACT(details_issues.issue_raw,r'\"issue-package=(.+?)\,')
But I'm getting NULL in the column. Any suggestions?

If this is a JSON string, you should use JSON_* functions instead.
You can try something like:
JSON_EXTRACT(details_issues.issue_raw, '$.issue-package')
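As a side note on why the original pattern returned NULL: it looks for the literal text issue-package= followed by a comma, while the stored text has "issue-package": " (quote, colon, space, quote). Below is a minimal sketch of a pattern that matches the sample. It is checked with Python's re module as a stand-in, on the assumption that this simple pattern behaves the same in BigQuery's RE2 engine. The variable names are illustrative.

```python
import re

# Sample value copied from the question.
raw = ('{"module-version": "2.0", "more-details-link": ""}, "has-cve": true, '
       '"issue-package": "VM-Essentials", "legacy-scan-status": "removed"}')

# Original attempt: expects `issue-package=`, which never occurs in the text.
original = re.search(r'\"issue-package=(.+?)\,', raw)

# Adjusted attempt: match the key, the colon, optional whitespace,
# then capture everything up to the next double quote.
fixed = re.search(r'"issue-package":\s*"([^"]*)"', raw)

package = fixed.group(1) if fixed else None
print(original, package)
```

The adjusted pattern could presumably be passed to REGEXP_EXTRACT in the same form, though the JSON functions remain the more robust choice when the column holds valid JSON.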

Bigquery named parameters regex in Java

I am looking for a way to use a regex as value in a named parameter in the Java SDK. According to the documentation, there is no datatype for that, and using a String parameter does not work.
Is there any way to use a regex as value in a named parameter?
QueryParameterValue Class has no datatype for that:
https://googleapis.dev/java/google-cloud-clients/0.91.0-alpha/com/google/cloud/bigquery/QueryParameterValue.html#of-T-com.google.cloud.bigquery.StandardSQLTypeName-
A regex in the query would e.g. look like this:
REGEXP_CONTAINS(some_attribute, r"^any\sregex\ssearchstring$")
and should be replaced by a named parameter like:
REGEXP_CONTAINS(some_attribute, @named_regex_parameter)
I tried different syntax in the query like
REGEXP_CONTAINS(some_attribute, r@named_regex_parameter)
etc. but none of them worked. The @named_regex_parameter is of type String. I tried to use values in the form of r"regex_expression" and just the regex_expression in the parameter value.
Seems like I need to build the query String without a named parameter for the regex part. Any hints to solve this with parameters would be really appreciated!
//Edit: added code example how the named parameters are used in the query config
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query)
    .setDestinationTable(TableId.of(destinationDataset, destinationTable))
    .setAllowLargeResults(true)
    .setCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
    .setTimePartitioning(partitioning)
    .setWriteDisposition(WriteDisposition.WRITE_APPEND)
    .setParameterMode("NAMED")
    .addNamedParameter("regexExpressionParam", QueryParameterValue.string(someRegexExpressionStringVariable)) //this does not work
    .addNamedParameter("someStringParam", QueryParameterValue.string(stringVariable))
    .setPriority(Priority.BATCH)
    .build();
The query should use the parameter @regexExpressionParam like so:
REGEXP_CONTAINS(theAttributeToQuery, @regexExpressionParam)
You need to pass the regular expression string without r'...'
I had a very similar problem when running parameterized queries in Python. It was something like this:
from google.cloud import bigquery

regex_input = "^begin_word.*end_here$"

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT word, word_count
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE REGEXP_CONTAINS(word, @regex)
    ORDER BY word_count DESC;
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        # First attempt: the pattern wrapped in r'...' before being passed in.
        bigquery.ScalarQueryParameter("regex", "STRING", f"r'{regex_input}'"),
    ]
)
query_job = client.query(query, job_config=job_config)
At first, I thought the input had to be wrapped in r'...', just like how I normally write a regex in the BQ explorer.
I tried to modify the string input to make it look like a regular expression literal, which is the f"r'{regex_input}'" part of the code.
But apparently BQ handles the string correctly without our help, so I can just pass the regex string as is: bigquery.ScalarQueryParameter("regex", "STRING", regex_input)
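The effect of the extra wrapper can be illustrated without BigQuery. The sketch below uses Python's re module as a stand-in for the query engine (an assumption, since BigQuery uses RE2) and a made-up test word. It shows that a parameter value is taken as the pattern itself, so an r'...' wrapper becomes literal characters that must match.

```python
import re

regex_input = "^begin_word.*end_here$"
word = "begin_word_and_end_here"  # hypothetical value for illustration

# Bare pattern: this is what the parameter value should contain.
bare_match = re.search(regex_input, word)

# Wrapped pattern: the r and the quotes are now part of the regex,
# so the engine looks for a literal "r'" before the ^ anchor.
wrapped_pattern = f"r'{regex_input}'"
wrapped_match = re.search(wrapped_pattern, word)

print(bool(bare_match), bool(wrapped_match))
```

The r prefix only matters when a pattern is written as a literal inside the SQL text. A parameter value never passes through that literal syntax.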

Update/replace array with JSON_MODIFY

I have a JSON object in a table column that contains an array-type key, roles.
I am trying to replace the roles value, but instead of replacing the existing roles, it nests the new value inside roles.
Here is the db fiddle.
current result:
{
"roles": {"roles":[{....}]}
}
expected result:
{
"roles":[{....}]
}
Any help is appreciated.
Thanks.
The query looks correct, but you are passing the object {roles:[]} as input to JSON_QUERY(j.newJson) instead of the array [] value.
You can either modify the input or extract the array value from the input with a JSON_QUERY path to get the expected result.
Before
JSON_MODIFY(fs1.[Schema],'$.roles',JSON_QUERY(j.newJson)) as [Schema]
After adding JSON_QUERY path
JSON_MODIFY(fs1.[Schema],'$.roles',JSON_QUERY(j.newJson,'$.roles')) as [Schema]
For more details, see the docs: JSON_QUERY (Transact-SQL)
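The difference between the two calls can be sketched with plain dictionaries. This is only an analogy in Python, not T-SQL, and the sample data is hypothetical.

```python
import json

# Hypothetical data standing in for the [Schema] column and the new JSON input.
schema = json.loads('{"name": "form1", "roles": [{"id": 1}]}')
new_json = json.loads('{"roles": [{"id": 2}, {"id": 3}]}')

# Like JSON_QUERY(j.newJson): the whole object is assigned, so roles is nested.
nested = dict(schema, roles=new_json)

# Like JSON_QUERY(j.newJson, '$.roles'): only the inner array is assigned.
replaced = dict(schema, roles=new_json["roles"])

print(json.dumps(nested))
print(json.dumps(replaced))
```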
The point of my original answer was to fully rebuild the JSON, which is why I added , ROOT ('roles').
So you don't need to use JSON_MODIFY, just replace the entire column.
Alternatively, if for example you have other parts to the JSON, you can remove , ROOT ('roles') and leave JSON_MODIFY in.

Error code: DelimitedTextMoreColumnsThanDefined Azure Data Factory

I am trying to copy data from a CSV file to a SQL table in Azure Data Factory.
These are my type properties for the CSV file:
"typeProperties": {
    "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": "2020-09-16-stations.csv",
        "container": "container"
    },
    "columnDelimiter": ",",
    "escapeChar": "\\",
    "firstRowAsHeader": true,
    "quoteChar": "\""
}
I receive the following error:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source '2020-09-16-stations.csv' with row number 2: found more columns than expected column count 11.,Source=Microsoft.DataTransfer.Common,'
This is row #2
0e18d0d3-ed38-4e7f,Station2,Mainstreet33,,12207,Berlin,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}"
I think the last column, which holds a JSON string, is causing the trouble. When I preview the data, it looks fine.
I thought that "quoteChar": "\"" would prevent exactly this problem with the last column. I have no idea why I get this error when I run debug.
Try setting the escape character to " (a double quote). Each pair of double quotes is then treated as a single literal quote rather than as a quote character within the string, so you end up with a string that looks like this (which the system knows is one string and not something it has to split):
{"openingTimes":[{"applicable_days":96,"periods":[{"startp":"08:00","endp":"20:00"}]},
{"applicable_days":31,"periods":[{"startp":"06:00","endp":"20:00"}]}]}
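The doubled-quote convention can be checked outside Data Factory. The sketch below uses Python's csv module, whose defaults treat "" inside a quoted field as one literal quote. It is an illustration of the convention, not of Data Factory's own parser, and the JSON column is shortened to a single entry to keep the row readable.

```python
import csv
import json

# Row 2 from the question, with the JSON column shortened to one entry.
line = ('0e18d0d3-ed38-4e7f,Station2,Mainstreet33,,12207,Berlin,48.1807,11.4609,'
        '1970-01-01 01:00:00+01,'
        '"{""openingTimes"":[{""applicable_days"":96,""periods"":'
        '[{""startp"":""08:00"",""endp"":""20:00""}]}]}"')

# A naive split on the delimiter ignores quoting and over-counts columns.
naive_columns = line.split(",")

# csv.reader defaults to quotechar='"' and doublequote=True, so the commas
# inside the quoted JSON field do not start new columns.
row = next(csv.reader([line]))
opening_times = json.loads(row[-1])

print(len(naive_columns), len(row))
print(opening_times["openingTimes"][0]["periods"][0])
```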
This is because the value "{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}" contains several commas and your columnDelimiter is ",", which causes the value to be split into several columns. So you need to change your columnDelimiter.

How to create a correct filter string with OR and AND operators for django?

My app has a frontend in Vue.js and a backend in Django REST framework. I need to build a filter string in Vue that does something like this:
((status=closed) | (status=canceled)) & (priority=middle)
but I get an error in the response:
["Invalid querystring operator. Matched: ') & '."]
After encoding my string looks like this:
?filters=((status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82)%20%7C%20(status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD))%20%26%20(priority%3D%D0%A1%D1%80%D0%B5%D0%B4%D0%BD%D0%B8%D0%B9)
which corresponds to
?filters=((status=closed)|(status=canceled))&(priority=middle)
What should a correct filter string for Django look like?
I have no problem if the statement includes only | or only &. For example, a filter string like this one works perfectly:
?filters=(status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82)%20%7C%20(status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD)
a.k.a. ?filters=(status=closed)|(status=canceled). But if I add an & after it, plus additional brackets to specify the order in which the conditions are evaluated, it fails with an error.
As an experiment, I also tried reducing the number of brackets and ended up with a string like this:
?filters=(status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82%20%7C%20status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD)
a.k.a. ?filters=(status=closed | status=canceled). This one doesn't work: I get neither an error nor any data.
I need a mixed result in my case: both statuses (closed and canceled) combined with priority=middle, but the string format isn't correct. Please explain which format would be OK.
The syntax you're trying to use there doesn't look very URI-friendly.
Try doing this instead:
?status[]=closed&status[]=canceled&priority=middle
Then use request.GET.getlist('status[]') to get back the list and use the values for logical OR queryset filtering:
qs = qs.filter(status__in=request.GET.getlist('status[]', []))
and then add any additional filtering, which works as a logical AND.
If you're using axios, it should automatically serialize a JS array passed as the status URL param into this format.
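The OR/AND logic of that query string can be sketched without Django. The example below uses the standard library's parse_qs and a hypothetical in-memory list of tickets in place of a queryset. The list membership test plays the role of status__in, and the extra condition plays the role of a chained filter.

```python
from urllib.parse import parse_qs

# Hypothetical rows standing in for a queryset.
tickets = [
    {"id": 1, "status": "closed", "priority": "middle"},
    {"id": 2, "status": "canceled", "priority": "high"},
    {"id": 3, "status": "open", "priority": "middle"},
    {"id": 4, "status": "canceled", "priority": "middle"},
]

params = parse_qs("status[]=closed&status[]=canceled&priority=middle")
statuses = params.get("status[]", [])  # like request.GET.getlist('status[]')
priority = params.get("priority", [None])[0]

# Membership in the status list gives the OR; the priority check adds the AND.
matched = [
    t for t in tickets
    if t["status"] in statuses and t["priority"] == priority
]
print([t["id"] for t in matched])
```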

Using Regex in a SQL Select

I have a SQL column that contains JSON payloads. I'm writing a SQL query that will extract part of that column w/ regex.
In the following field:
{
"myContent":{
"fieldG":null,
"valuable":"this is the text",
"fieldH":[
"a4a6ba1c2e0e4a9c89dac46f1092b505"
],
"fieldI":"1"
},
"fieldJ":"1441375349399"
}
I want to scrape the string immediately after "valuable":
I was experimenting w/ something like this:
\"valuable\":\".*\"
... but that doesn't help. For starters, I only want what's AFTER valuable and colon. Also, that particular regex matches from "valuable:" all the way through the entire end of the line. That's way more text than I need.
In SQL Server 2016 CTP3 you can use the JSON_VALUE function, e.g.:
SELECT JSON_VALUE(column, '$.myContent.valuable')
See http://blogs.msdn.com/b/jocapc/archive/2015/05/16/json-support-in-sql-server-2016.aspx
If you cannot wait for SQL Server 2016, you can use a JSON CLR library, e.g. http://www.codeproject.com/Articles/1000953/JSON-for-SQL-Server-Part
Otherwise you would need to use PATINDEX and SUBSTRING.
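On the regex itself: .* is greedy, so it runs to the last double quote on the line, and the pattern has no capture group for the part after the colon. Below is a minimal sketch in Python (the re module as a stand-in, since T-SQL has no built-in regex capture). It shows the greedy behavior, a negated character class that stops at the next quote, and JSON parsing as the alternative. The payload is the question's sample collapsed onto one line.

```python
import json
import re

payload = ('{"myContent":{"fieldG":null,"valuable":"this is the text",'
           '"fieldH":["a4a6ba1c2e0e4a9c89dac46f1092b505"],"fieldI":"1"},'
           '"fieldJ":"1441375349399"}')

# Greedy .* keeps going until the last double quote on the line.
greedy = re.search(r'"valuable":".*"', payload).group(0)

# A negated class stops at the next quote; the group holds only the value.
value = re.search(r'"valuable":"([^"]*)"', payload).group(1)

# Parsing the JSON avoids regex pitfalls such as escaped quotes in the value.
parsed = json.loads(payload)["myContent"]["valuable"]

print(value, parsed == value)
```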