BigQuery - Nested and Repeated fields not allowing NULL values on insert

I have a field with multiple Nested and Repeated fields inside it. It will not let me insert NULL values, or a NULL array, for any child field with the RECORD data type.
Is this a limitation, or is there a workaround?

Just the question I've been looking for, with no answer.
Luckily, I managed to find this question, which addresses the same issue, only for an UPDATE statement.
The solution there works, but it's a dirty hack: put the NULL inside an expression that references a field of the target type:
IF(false, struct_to_set_null, NULL)
where struct_to_set_null is an actual field name from your table.
Unfortunately, it didn't work in my scenario, because I used an inline SELECT statement that is not in the same scope as my table.
Instead, I simply used the CAST function (thanks to this answer). The downside is that I ended up with fairly complex expressions, such as:
# this represents a repeated record with a single field of type STRING
CAST(NULL AS ARRAY<STRUCT<STRING>>)
# same thing, but with multiple fields
CAST(NULL AS ARRAY<STRUCT<STRING, INTEGER>>)
On the bright side, it always works, and it seems clearer and more straightforward (at least to me).
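For completeness, here's a minimal sketch of the pattern in an INSERT, assuming a hypothetical table my_dataset.orders with a repeated RECORD column items that has a single STRING field name:
# hypothetical table; the explicitly typed NULL cast is the point here
INSERT INTO my_dataset.orders (id, items)
VALUES
  (1, [STRUCT('widget' AS name)]),                # a populated repeated record
  (2, CAST(NULL AS ARRAY<STRUCT<name STRING>>));  # an explicitly typed NULL array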
Hope this post will help someone out there

Related

"Cannot construct data type datetime" when filtering data, but all values filtered DO have valid dates

I am convinced that this question is NOT a duplicate of:
Cannot construct data type datetime, some of the arguments have values which are not valid
In that case the values passed in are explicitly not valid. Whereas in this case, all the values the function could reasonably be expected to be called on are valid.
I know what the actual problem is, and it's not something that would help most people that find the other question. But it IS something that would be good to be findable on SO.
Please read the answer, and understand why it's different from the linked question before voting to close as dupe of that question.
I've run some SQL that's errored with the error message: Cannot construct data type datetime, some of the arguments have values which are not valid.
My SQL uses DATETIMEFROMPARTS, but it's fine evaluating that function in the select - it's only a problem when I filter on the selected value.
It's also demonstrating weird, can't-possibly-be-happening behaviour w.r.t. other changes to the query.
My query looks roughly like this:
WITH FilteredDataWithDate AS (
    SELECT *, DATETIMEFROMPARTS(...some integer columns representing date data...) AS Date
    FROM Table
    WHERE <unrelated pre-condition filter>
)
SELECT * FROM FilteredDataWithDate
WHERE Date > '2020-01-01'
If I run that query, then it errors with the invalid data error.
But if I omit the final Date > filter, then it happily renders every result record, so clearly none of the values it's filtering on are invalid.
I've also manually examined the contents of Table WHERE <unrelated pre-condition filter> and verified that everything is a valid date.
It also has a wild collection of other behaviours:
If I replace all of ...some integer columns representing date data... with hard-coded numbers, then it's fine.
If I replace some parts of that data with hardcoded values, that fixes it, but other parts don't. I can't find any particular pattern in what does or doesn't help.
If I remove most of the * columns from the Table select, then it starts to be fine again.
Specifically, it appears to break any time I include an nvarchar(max) column in the CTE.
If I add an additional filter to the CTE that limits the results to Id values in the following ranges, then the results are:
Between 130,000 and 140,000: error.
Between 130,000 and 135,000: fine.
Between 135,000 and 140,000: fine?!
Filtering by the Date column breaks everything ... but ORDER BY Date is fine. (and confirms that all dates lie within perfectly sensible bounds.)
Adding TOP 1000000 makes it work ... even though there are only about 1000 rows.
... WTAF?!
This took me a while to decode, but it turns out that the SQL Server compiler doesn't necessarily restrict its evaluation of the function to just the rows that are, or could be, relevant to the result set.
Depending on the execution plan it arrives at, the function could get called on any record in Table, even one that doesn't satisfy WHERE <unrelated pre-condition filter>.
This was found by another user, for another function, over here.
So the fact that it could return all the results without the filter wasn't actually proving that every input into the function was valid. And indeed there were some records in the table that weren't in the result set, but still had invalid data.
That actually means that even if you were to add an explicit WHERE filter to exclude rows containing invalid date-component data ... that isn't actually guaranteed to fix it, because the function may still get called against the 'excluded' rows.
Each of the random other things I did was influencing the query plan in one way or another, and happened to fix or break things accordingly.
The solution is, naturally, to fix the underlying table data.
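If you need to locate the offending rows first, here's a minimal diagnostic sketch, assuming hypothetical integer columns [Y], [M] and [D] holding the date parts; TRY_CONVERT returns NULL instead of raising an error when the parts don't form a valid date:
-- rows whose date parts cannot be assembled into a valid date
SELECT *
FROM [Table]
WHERE TRY_CONVERT(date, CONCAT([Y], '-', [M], '-', [D])) IS NULL;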

SQL CASE How do I correct assign a value?

First, I would like to say that I have already spent due time and diligence on my part by watching videos, using other sources, and reading other posts and answers on Stack Overflow before asking this, and have been unable to find a solution.
The issue I am running into is that, in a particular case, a certain type is passed in that requires the use of LIKE, because the specific type itself will not match anything: three stored types share the one incoming type, say 'painting' in this situation. The database has a 'painting small' and a 'painting large'.
Code
-- I tried this
CASE WHEN type = 'painting' THEN inventory.type LIKE '%'+type+'%' ELSE inventory.type = type END
I keep running into the error "An expression of non-boolean type specified in a context where a condition is expected". There are a few other variations I have tried, as well as IF ELSE, but I run into the same issue. Someone else may be able to word this question better.
I mainly want to be pointed in the right direction and receive clarification on what I am doing wrong.
Thank you
There are a few problems with your query. Rather than the CASE expression itself I'm going to address the less obvious problem, your lack of prefixing. Take this clause:
inventory.type LIKE '%'+type+'%'
This would likely either error, due to an ambiguous column name, or resolve to inventory.type LIKE '%'+inventory.type+'%'; obviously the latter will always be true, unless the column type has the value NULL. Always prefix your column names, especially when your query involves two or more tables.
As for the actual problem, this is presumably part of a WHERE, therefore use OR and AND logic:
WHERE (({Other Table Prefix}.[type] = 'painting' AND inventory.[type] LIKE '%' + {Other Table Prefix}.[Type] + '%')
OR ({Other Table Prefix}.[type] != 'painting' AND inventory.[type] = {Other Table Prefix}.[Type]))
Obviously, you need to appropriately replace {Other Table Prefix} with the correct prefix.
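Putting that together, here's a minimal sketch, assuming a hypothetical requests table r supplying the incoming type:
SELECT i.*
FROM inventory AS i
JOIN requests AS r  -- hypothetical source of the incoming type
  ON ((r.[type] = 'painting' AND i.[type] LIKE '%' + r.[type] + '%')
   OR (r.[type] <> 'painting' AND i.[type] = r.[type]));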
The problem is in the
LIKE '%'+type+'%'
part, where LIKE returns a truth value; a CASE expression must return a scalar value, not a predicate, hence the non-boolean-type error.

SQL DB2 - How to SELECT or compare columns based on their name?

Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have (for example) 5 different surveys. They have many questions each, but SOME of the questions belong to the same metric, and we need to create a generic field that captures it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to kind of solve this with either COALESCE() or CASE statements but the challenge is that, as more surveys/survey versions continue to grow, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC", for example, and we'll need them BOTH in this same statement. I want to create something where having to go into the SQL and make changes isn't needed. Given that the columns for this specific metric will ALWAYS be named "something#_REC", is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall
Table and column information in Db2 is managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname
from syscat.columns
where colname like some_expression
and tabname = 'MYTABLE'
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function, so you could match it against some dynamic input.
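Building on that, here's a sketch of generating the COALESCE expression dynamically, assuming a hypothetical table MYSCHEMA.SURVEYS; note that _ is a LIKE wildcard, so it's escaped:
-- generates the expression text; feed the result into dynamic SQL (PREPARE/EXECUTE)
SELECT 'COALESCE(' || LISTAGG(colname, ', ') || ') AS REC'
FROM syscat.columns
WHERE tabschema = 'MYSCHEMA'
  AND tabname = 'SURVEYS'
  AND colname LIKE '%!_REC' ESCAPE '!';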
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.
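If you go that route, here's a sketch, assuming a recent Db2 release with the ISO SQL/JSON functions and a hypothetical SURVEYS table with a JSON document column DOC:
-- extracts a scalar value from the JSON document in each row
SELECT JSON_VALUE(doc, '$.rec' RETURNING VARCHAR(100)) AS rec
FROM surveys;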

PostgreSQL extract keys from jsonb, exception "cannot call jsonb_object_keys on a scalar"

I am trying to get my head around jsonb in Postgres. There are quite a few issues here; what I wanted to do was something like:
SELECT table.column->>'key_1' as a FROM "table"
I tried with -> and some combinations of brackets as well, but I was always getting nil in a.
So I tried to get all keys first to see if it is even recognizing jsonb or not.
SELECT jsonb_object_keys(table.column) as a FROM "table"
This threw an error:
cannot call jsonb_object_keys on a scalar
So, to check the column type (which I created, so I know it IS jsonb, but anyway):
SELECT pg_typeof(column) as a FROM "table" ORDER BY "table"."id" ASC LIMIT 1
This correctly gave me "jsonb" in the result.
Values in the column are similar to {"key_1":"New York","key_2":"Value of key","key_3":"United States"}.
So, I am really confused about what is actually going on here and why it says my JSON data is scalar. What does that actually mean, and how do I solve this problem?
Any help in this regard will be greatly helpful.
PS: I am using rails, posted this as a general question for the problem. Any rails specific solution would also work.
So the issue turned out to be something OTHER than just SQL.
As I mentioned, I am using Rails (5.1). I had used the default value '{}' for the jsonb column, and I was using a two-way serializer for the column, defined in my model for the table.
Removing this serializer and adjusting the default value to {} actually solved the problem.
I think my serializer was doing something to the values, but still, in the database, the value looked correct, as I mentioned in the question.
It is still not 100% clear to me what the problem was, but it is solved anyway. If anyone can shed some light on what exactly the problem was, that would be great.
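For anyone debugging something similar, a quick way to see what Postgres actually holds in each row is jsonb_typeof; here's a sketch, assuming a hypothetical table tbl with a jsonb column col:
-- any row that is not an object (e.g. a quoted string like '"{}"') will surface here
SELECT id, jsonb_typeof(col) AS json_type, col
FROM tbl
WHERE jsonb_typeof(col) <> 'object';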
Hope this might help someone.
In my case the ORM layer somehow managed to write a null string into the JSON column, and Postgres was happy with it. Trying to execute jsonb_object_keys on such a value resulted in the OP's error.
I managed to track down the place that allowed such null strings, and after fixing the code, I also fixed the data with the following query:
UPDATE tbl SET col = '{}'::jsonb WHERE jsonb_typeof(col) <> 'object';
If you intentionally mix the types stored in the column (e.g. sometimes an object, sometimes an array, etc.), you might want to filter out all rows that don't contain objects with a simple WHERE clause:
SELECT jsonb_object_keys(tbl.col) as a FROM tbl WHERE jsonb_typeof(col) = 'object';

Hide Empty columns

I've got a table with 75 columns. What is the SQL statement to display only the columns with values in them?
Thanks
It's true that no such statement exists (in a SELECT you can use condition filters only for rows, not for columns). But you could try to write a (slightly tricky) procedure. It must check, with queries, which columns contain at least one non-NULL/non-empty value. Once you have that list of columns, join them into a comma-separated string and compose a query you can run, returning what you wanted.
EDIT: I thought about it, and I think you can do it with a procedure, but only under one of these conditions:
find a way to retrieve column names dynamically in the procedure, that is, from the metadata (I'd never heard of doing that, but I'm new to procedures),
or hardcode all the column names (losing generality).
You could collect the column names in an array, if your DBMS's stored procedures support arrays (or write the procedure in a programming language like C), and loop over them, running a SELECT each time to check whether the column is empty*. If a column contains at least one value, concatenate its name into a comma-separated string. Finally, you can run your query with only the non-empty columns! (See the sketch after the footnote below.)
Alternatively to a stored procedure, you could write a short program (e.g. in Java), which gives you more flexibility.
*If you check only for NULL values it's simple, but if you check for empty values you will need to handle each column's data type... another array, with data types?
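As a concrete illustration of the idea, here's a sketch that generates one COUNT statement per column from the catalog, assuming SQL Server and a hypothetical table dbo.t75; COUNT(col) counts only non-NULL values, so a result of 0 marks an empty column:
-- run the generated statements; columns returning 0 can be left out of the final SELECT
SELECT 'SELECT COUNT([' + COLUMN_NAME + ']) AS non_null_rows, '''
       + COLUMN_NAME + ''' AS col FROM dbo.t75'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 't75';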
I would suggest that you write a SELECT statement and define which COLUMNS you wish to display and then save that QUERY as a VIEW.
This will save you the trouble of typing in the column names every time you wish to run that query.
As marc_s pointed out in the comments, there is no select statement to hide columns of data.
You could do a pre-parse and dynamically create a statement to do this, but it would be very inefficient from a SQL performance perspective. I would strongly advise against what you are trying to do.
A simplified version of this is to just select the relevant columns, which is what I needed personally. A quick look at what we're dealing with in a table:
SELECT * FROM table1 LIMIT 10;
-> shows 20 columns, of which I'm interested in 3. The LIMIT is just to avoid flooding the console.
SELECT column1, column3, column19 FROM table1 WHERE column3='valueX';
It's a bit of a manual filter, but it works for what I need.