Parametrized BigQuery query produces no results from a joined table - google-bigquery

I have a rather complex query that creates a table based on an inner join. The query is generated by Python code using SQLAlchemy. When the query runs, the result in the destination table is missing data from the right-hand joined table.
To find out why, I did the following:
1. Located the specific query in "Query History" of the BQ Console.
2. Using the job associated with the query, fetched the job's JSON file.
3. Substituted all the parameters (#) in the query text with their literal values.
4. Loaded the resulting SQL text into the BQ editor and executed the query.
To my surprise, the data from the joined table are now present in the result.
The chance that this stems from a bug in BQ is very slim. I think the difference is in the way I substitute the values of the parameters when recreating the query.
The specific query parameters are either "STRING" or "INT64". Here is a sample from the query JSON file:
{
  "name": "PARAM_802c1f6dd32747238ccdf80b305a4fd1",
  "parameterType": {"type": "INT64"},
  "parameterValue": {"value": "0"}
},
{
  "name": "PARAM_f7ad61d9a6414d0ea8560e097950ecbc",
  "parameterType": {"type": "STRING"},
  "parameterValue": {"value": "`column`"}
},
These are the rules I follow for replacing "#param*":
If a parameter is INT64, I use the value from the JSON as is (without quotes).
If a parameter is STRING, I check whether it is surrounded by backticks. If it is, I use it as is; if not, I add surrounding single quotes to the value.
I would be glad to hear from experts what might be wrong with my approach, or anything else that could help me solve (or debug) this problem.

It seems my problem was that I tried to pass a column name as a parameter value, and this is not supported. Here is an excerpt from the documentation:
Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
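To illustrate the distinction, here is a minimal sketch (the table, column, and parameter names are made up): a query parameter can only bind a value; used where an identifier belongs, it is treated as a plain string, never as a column reference.

-- Works: @min_count is bound as an INT64 value
SELECT *
FROM `my_project.my_dataset.my_table`
WHERE event_count >= @min_count;

-- Does not do what you might hope: @col_name is a STRING value,
-- not a column identifier, so this just compares two strings
SELECT *
FROM `my_project.my_dataset.my_table`
WHERE @col_name = 'some_value';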

Related

problem with auto-created bigquery field name that contains "."

I used a simple ETL tool to import QuickBooks data into Google BigQuery. Great! The only notable limitation on this step is that I can't do any translation ... it's more like an EL tool.
That said, now I want to query the imported table. It's no problem at all for correctly named fields in BigQuery (like txndate). However, some of the fields are of the format abc.xyz (e.g., deposittoaccountref.value) and can't be queried. The "." in the name is apparently confusing BigQuery.
If I dump the whole table, I can see the "." name fields and the associated values.
However, I can't create a custom query against those fields. They don't show up in the auto-generated schema that allows one to drag and drop field names into the query.
Also, I tried to manually type the field name in and received the following error message: Missing column alias. Any expression in a SELECT statement that is not a column from the original data source must be followed by an alias, for example: AS my_alias.
I've tried quoting the field name and bracketing the field name but they still throw the same error.
I traced back to QB API documentation and this is indeed how Intuit labels the fields.
Finally, as long as I can query these fields at all, I can rename them to eliminate the "." problem.
Please advise and thank you!
OK, I solved this myself.
The way to fix this within the BigQuery query editor is to manually type in the field name (i.e., it is not available in the auto-generated schema) and to parenthesize the field name.
e.g., deposittoaccountref.value becomes (deposittoaccountref.value)
Now, this will label the column in the result set as "value", so you may want to relabel the data field to something without the ".". For example, I took the original
deposittoaccountref.value and modified it to
(deposittoaccountref.value) as deposittoaccountref_value
Hopefully, this will help someone else in the future!
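Putting the whole thing together, a sketch of what the statement might look like (the table name and the other field here are made up; the error message in the question suggests the legacy SQL dialect):

-- The parentheses stop the "." from being read as a table qualifier,
-- and the alias removes the "." from the output column name
SELECT
  txndate,
  (deposittoaccountref.value) AS deposittoaccountref_value
FROM [mydataset.quickbooks_table]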
The above answer works when there is a single dot in the name, as in the example.
However, if there are multiple dots, e.g., "line.value.amount", then the parentheses trick doesn't work.
I've tried nesting the parentheses in different ways, to no avail.
e.g., (line.value.amount) = error, ((line.value).amount) = error, (line.(value.amount)) = error

SQL statement that checks if a table value is contained within a string I send it

I know about the LIKE operator, but for what I'm trying to do, I would need to swap the column with the value, which didn't work.
I'm using this in a Discord bot that is connected to a database with a table. The table has two columns, keyword and response. I need a query that could give me the response when a given string contains the keyword.
SELECT response FROM Reply WHERE (insert something here to see if provided string
contains table value)
Use of the SQL tag indicates your question relates to standard SQL (hover over the tag and read its description).
LIKE cannot be used for your purpose because the standard is quite clear about what you can specify:
<character like predicate> ::=
<row value predicand> <character like predicate part 2>
<character like predicate part 2> ::=
[ NOT ] LIKE <character pattern> [ ESCAPE <escape character> ]
Therefore you can't write WHERE 'myliteral' LIKE colname.
So you'd need a scalar function, but I am not aware of any scalar function defined in the standard that you can use for this purpose.
So you are confined to scalar functions offered by your particular DBMS. E.g. in DB2, there is POSSTR(source_string, search_string) that you could use as POSSTR('myliteral', colname).
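For example, with that DB2 function the query from the question might look like the sketch below (the literal stands in for whatever string the bot sends):

-- DB2 sketch: POSSTR(source, search) returns the position of 'search'
-- inside 'source', or 0 if it is not found
SELECT response
FROM Reply
WHERE POSSTR('the user-provided message text', keyword) > 0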
SELECT TOP 1 response
FROM Reply t
WHERE '<user provided text>' LIKE concat('%', keyword, '%')
-- using % lets us check whether the keyword is 'contained' in the user's text
PS: CONCAT used this way works on SQL Server; check the equivalent for the DBMS you're using. Also, TOP 1 stops the query once it finds the first keyword.

SQL Server 2016 EscapeCharacter problems

So I am generating a JSON file from SQL Server 2016 using 'FOR JSON'.
I have used JSON_QUERY to wrap queries to prevent the escape characters from appearing before the generated double quotes ("). This worked correctly except they are still showing up for the forward slashes (/) on the formatted dates.
One thing to note is that I am converting the datetime objects in SQL using the following method CONVERT(VARCHAR, [dateEntity], 101)
An example (This is a subquery)
JSON_QUERY((
SELECT [LegacyContactID]
,[NameType]
,[LastName]
,[FirstName]
,[Active]
,[Primary]
,CONVERT(VARCHAR,[StartDate],101) AS [StartDate]
,CONVERT(VARCHAR,[EndDate],101) AS [EndDate]
FROM [LTSS].[ConsumerFile_02_ContactName]
WHERE [LegacyContactID] = ContactList.[LegacyContactID]
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
)) AS ContactName
And the result will be
"ContactName": {
"LegacyContactID": "123456789",
"NameType": "Name",
"LastName": "Jack",
"FirstName": "Apple",
"Active": true,
"Primary": true,
"StartDate": "04\/01\/2016",
"EndDate": "04\/30\/2016"
}
I have the whole query wrapped in JSON_QUERY to eliminate the escaping but it still escapes the forward slashes on the dates.
I also have passed the dates as strings without the conversion and still get the same results.
Any insight?
One solution is to avoid the "/" in dates altogether by using the "right" JSON date format:
SELECT JSON_QUERY((
SELECT TOP 1 object_id, create_date
FROM sys.tables
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
))
Result
{"object_id":18099105,"create_date":"2017-08-14T11:19:22.670"}
UPDATED:
Ah, yes, escape and CRLF characters.
Unless your environment shows the offending characters, you will be forced to manually copy and paste from the result sets and replace the string from there.
Now, what you mention in your recent update got me wondering why you feel the need to transform your data in the first place. DATEs do not have formatting by default, so unless JSON is incompatible with handling SQL dates, there is really no need to transform this data inside JSON if your target tables enforce the correct format.
So unless there is still a concern for the truncation of data, from an ETL perspective there are two ways you can accomplish this:
1 - USE STAGING TABLES
Staging tables can either be temporary tables, CTEs, or actual empty tables you use to extract, cleanse, and transform your data.
Advantages: You are only dealing with the rows being inserted, do not have to be concerned with constraints, and can easily fix, outside of JSON, any corrupted or unstructured aspects of your data.
Disadvantages: Staging tables may mean more objects in your database, depending on how repetitive the need for them is. Thus, finding better, consistently structured data is preferable.
2 - ALTER YOUR TABLE TO USE STRINGS
Here you enforce the business rules that cleanse the data AFTER insertion into the persistent table.
Advantages: You save on space, simplify the cleansing process, and can still use indexes. SQL Server is pretty efficient at parsing DATE strings, and you can still take advantage of EXISTS() and possible SARGs to check for non-dates when running your insert.
Disadvantages: You lose a primary integrity check on your table while the dates are stored as strings, opening up the possibility of dirty data being exposed. Your UPDATE statements will be forced to scan the entire table, which can drag down performance.
JSON_QUERY((
SELECT [LegacyContactID]
,[NameType]
,[LastName]
,[FirstName]
,[Active]
,[Primary]
,[StartDate] --it is already in a date format
,[EndDate]
FROM [LTSS].[ConsumerFile_02_ContactName]
WHERE [LegacyContactID] = ContactList.[LegacyContactID]
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
)) AS ContactName
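And if the date columns do end up stored as strings (option 2 above), the insert-time check mentioned earlier could look roughly like this sketch (the staging and target table names are made up; TRY_CONVERT returns NULL for values that are not valid dates):

INSERT INTO dbo.ContactName_Target (LegacyContactID, StartDate, EndDate)
SELECT s.LegacyContactID, s.StartDate, s.EndDate
FROM dbo.ContactName_Staging AS s
WHERE TRY_CONVERT(date, s.StartDate) IS NOT NULL
  AND TRY_CONVERT(date, s.EndDate) IS NOT NULL;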
I have run into some similar issues. Without going into a ton of detail, I believe this is part of the reason the new JSON functionality isn't getting a ton of adoption yet, from what I can see.
I've added a couple comments to the MSDN about this and a tweet:
"Why can't the auto-escaping of ALL strings be turned off with a flag???" - https://msdn.microsoft.com/en-us/library/dn921889.aspx
"Almost there, but not quite yet..." - https://msdn.microsoft.com/en-us/library/dn921882.aspx
"Anyone else frustrated with forced auto-escaping of all JSON in #SQLServer / #AzureSQLDB? (see link for my comments) msdn.microso…" - https://twitter.com/brian_jorden/status/844621512711831552
If you come across a method or way to deal with this, would love to hear in this or any of those threads, and good luck...

How do you index an array inside a JSON with an Oracle 12c query?

I have a table "move" with one column "move_doc" which is a CLOB. The json stored inside has the structure:
{
  moveid: "123",
  movedate: "xyz",
  submoves: [
    {
      submoveid: "1",
      ...
    },
    {
      submoveid: "2",
      ...
    }
  ]
}
I know I can run an Oracle 12c query to access the submoves list with:
select move.move_doc.submoves from move move
How do I access particular submoves of the array? And the attributes inside a particular submove?
You have to use Oracle functions json_query and/or json_value like this:
SELECT json_value(move_doc, '$.submoves[0].submoveid' RETURNING NUMBER) FROM move;
returns 1.
SELECT json_query(move_doc, '$.submoves[1]') FROM move;
would return the second JSON element, i.e. something like
{
submoveid : "2",
...
}
json_value is used to retrieve a scalar value, json_query is used to retrieve JSON values. You might also want to have a look at json_table which returns an SQL result table and thus can be used in Joins.
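For instance, a json_table sketch that flattens each submove into its own row (the column sizes here are guesses):

SELECT jt.moveid, jt.submoveid
FROM   move m,
       JSON_TABLE(m.move_doc, '$'
         COLUMNS (
           moveid VARCHAR2(20) PATH '$.moveid',
           NESTED PATH '$.submoves[*]'
             COLUMNS (submoveid VARCHAR2(20) PATH '$.submoveid')
         )) jt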
See this Oracle Doc for more examples
Beda here from the Oracle JSON team.
We have added a new multi-value index in release 21c, allowing you to index values from a JSON array. Obviously, 21c is brand new and you want to know how to do this in older releases: functional indexes (using the JSON_Value function) are limited to a single value per JSON document and therefore are not capable of indexing array values. But: there is a 'JSON search index' which indexes your entire JSON document, and therefore also the values in the array. Another solution is to use a materialized view using JSON_Table. This will expand the array values into separate rows. Then you can add a regular B-tree index on that column.
Sample code here:
JSON indexing with functional indexes and JSON search index
https://livesql.oracle.com/apex/livesql/file/content_HN507PELCEEJGVNW4Q61L34DS.html
JSON and materialized views
https://livesql.oracle.com/apex/livesql/file/content_HYMB1YBP4CPMG6T6MXY5G9X5L.html
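Roughly, the two pre-21c options described above could look like this sketch (index and view names are made up; the search index assumes the move_doc column is known to contain JSON, e.g. via an IS JSON check constraint):

-- Option 1: JSON search index over the whole document, including the array values
CREATE SEARCH INDEX move_doc_search_ix ON move (move_doc) FOR JSON;

-- Option 2: materialized view over JSON_Table, then a regular B-tree index
CREATE MATERIALIZED VIEW move_submoves_mv AS
SELECT jt.submoveid
FROM   move m,
       JSON_TABLE(m.move_doc, '$.submoves[*]'
         COLUMNS (submoveid VARCHAR2(20) PATH '$.submoveid')) jt;

CREATE INDEX move_submoves_ix ON move_submoves_mv (submoveid);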
From what I've seen, in Oracle you can index the whole array as a single index entry, but not individual elements of an array.
NoSQL databases like MongoDB, Couchbase, Cassandra have "array/collection" indexes which can index individual elements or fields of objects within an array and query them.

How to extract property of a collection in the root document

I'm using RavenDB and I'm having trouble extracting a particular value using the Lucene Query.
Here is the JSON in my document:
{
  "customer": "my customer",
  "locations": [
    {
      "name": "vel arcu. Curabitur",
      "settings": {
        "enabled": true
      }
    }
  ]
}
Here is my query:
var list = session.Advanced.LuceneQuery<ExpandoObject>()
    .SelectFields<ExpandoObject>("customer", "locations;settings.enabled", "locations;name")
    .ToList();
The list is populated and contains a bunch of ExpandoObjects with customer properties but I can't for the life of me get the location -> name or location -> settings -> enabled to come back.
Is the ";" or "." incorrect usage??
It seems that you have misunderstood the concept of indexes and queries in RavenDB. When you load a document in RavenDB, you always load the whole document, including everything it contains. So in your case, if you load a customer, you already have the collection and all its children loaded. That means you can use standard LINQ-to-objects to extract all these values; no need for anything special like indexes or Lucene here.
If you want to do this extraction on the database side, so that you can query on those properties, then you need an index. Indexes are written using LINQ, but it's important to understand that they run on the server and just extract some data to populate the Lucene index from. But here again, in most cases you don't even have to write the indexes yourself, because RavenDB can create them automatically for you.
In no case do you need to write Lucene queries like the one in your question, because in RavenDB Lucene queries are always executed against a pre-built index, and these are generally flat. But again, chances are you don't need to do anything with Lucene to get what you want.
I hope that makes sense for you. If not, please update your question and tell us more about what you actually want to do.
Technically, you can use the comma operator "," to nest into collections.
That should work, but it isn't recommended. You can just load your whole object and use it; that is easier and faster.