How to skip records when appending to a table if a REQUIRED column is NULL? - google-bigquery

I would like to skip rows that have NULL values in a REQUIRED column when appending query results to a table.
Right now the whole query fails.
I would like behaviour similar to
the flag configuration.load.maxBadRecords, which is only available for JSON/CSV.
It skips bad records and, according to the following answer, stores the bad records in the status.errors field.
I haven't tested the above flag, but I see that it is related to configuration.load.ignoreUnknownValues and could probably be useful when records cannot be parsed (e.g. due to invalid characters).
Anyway, in my case the error is caused by NULL values in columns that are REQUIRED.
I would also be grateful for reasonable workarounds.
Setting default values with IFNULL is not an option for me.
I also want to avoid detecting such rows programmatically.
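For context, the kind of programmatic filtering I'd rather not maintain by hand looks roughly like this (a sketch only; required_col and source_table are placeholder names for my REQUIRED column and source):

-- Sketch: drop rows whose REQUIRED column is NULL before appending
-- (required_col and source_table are placeholder names).
SELECT *
FROM source_table
WHERE required_col IS NOT NULL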


"Cannot construct data type datetime" when filtering data, but all values filtered DO have valid dates

I am convinced that this question is NOT a duplicate of:
Cannot construct data type datetime, some of the arguments have values which are not valid
In that case the values passed in are explicitly not valid, whereas in this case all the values that the function could reasonably be expected to be called on are valid.
I know what the actual problem is, and it's not something that would help most people that find the other question. But it IS something that would be good to be findable on SO.
Please read the answer, and understand why it's different from the linked question before voting to close as dupe of that question.
I've run some SQL that's errored with the error message: Cannot construct data type datetime, some of the arguments have values which are not valid.
My SQL uses DATETIMEFROMPARTS, but it's fine evaluating that function in the select - it's only a problem when I filter on the selected value.
It's also demonstrating weird, can't-possibly-be-happening behaviour w.r.t. other changes to the query.
My query looks roughly like this:
WITH FilteredDataWithDate AS (
SELECT *, DATETIMEFROMPARTS(...some integer columns representing date data...) AS Date
FROM Table
WHERE <unrelated pre-condition filter>
)
SELECT * FROM FilteredDataWithDate
WHERE Date > '2020-01-01'
If I run that query, then it errors with the invalid data error.
But if I omit the final Date > filter, then it happily renders every result record, so clearly none of the values it's filtering on are invalid.
I've also manually examined the contents of Table WHERE <unrelated pre-condition filter> and verified that everything is a valid date.
It also has a wild collection of other behaviours:
If I replace all of ...some integer columns representing date data... with hard-coded numbers then it's fine.
If I replace some parts of that data with hardcoded values, that fixes it, but replacing others doesn't. I can't find any particular pattern in what does or doesn't help.
If I remove most of the * columns from the Table select, then it starts to be fine again.
Specifically, it appears to break any time I include an nvarchar(max) column in the CTE.
If I add an additional filter to the CTE that limits the results to Id values in the following ranges, then the results are:
Between 130,000 and 140,000: error.
Between 130,000 and 135,000: fine.
Between 135,000 and 140,000: fine(!)
Filtering by the Date column breaks everything ... but ORDER BY Date is fine. (and confirms that all dates lie within perfectly sensible bounds.)
Adding TOP 1000000 makes it work ... even though there are only about 1000 rows.
... WTAF?!
This took me a while to decode, but it turns out that the SQL Server query compiler doesn't necessarily restrict its evaluation of the function to just the rows that are, or could be, relevant to the result set.
Depending on the execution plan it arrives at, the function could get called on any record in Table, even one that doesn't satisfy WHERE <unrelated pre-condition filter>.
This was found by another user, for another function, over here.
So the fact that it could return all the results without the filter wasn't actually proving that every input into the function was valid. And indeed there were some records in the table that weren't in the result set, but still had invalid data.
That actually means that even if you were to add an explicit WHERE filter to exclude rows containing invalid date-component data ... that isn't actually guaranteed to fix it, because the function may still get called against the 'excluded' rows.
Each of the random other things I did will have been influencing the query plan in one way or another that happened to fix/break things.
The solution is, naturally, to fix the underlying table data.
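As a rough sketch of how to hunt down the bad underlying rows, assuming hypothetical component columns YearCol, MonthCol and DayCol (adapt names and checks to your schema):

-- Find rows whose date components can never form a valid datetime.
-- Range checks only; day-per-month validity is not fully covered here.
SELECT *
FROM   [Table]
WHERE  YearCol  NOT BETWEEN 1753 AND 9999
   OR  MonthCol NOT BETWEEN 1 AND 12
   OR  DayCol   NOT BETWEEN 1 AND 31;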

Oracle SQL Query - where clause starting and end in certain values

I just want to check whether my query is correct. I have written a simple query against a single table, looking at one field in that table: we want all rows where the value in that field starts with certain characters and ends with certain characters, but must not contain certain other characters. Thus -
WHERE field_1 LIKE 'ABC%' AND field_1 LIKE '%XWY' AND field_1 NOT LIKE '%GHI12XY%'
The result is no rows, which is what we wanted, but I am not 100% sure whether the query is spot on and would like others to confirm that this is correct.
It looks correct to me. A good idea in all database software development is to create and load one or more test tables with rows that can be used to verify your logic. Then run your query against the test table(s) and see if you get the results you expect. Ideally, the rows in your test table will cover all the possibilities, for example, where you're supposed to get output, and where you're not supposed to get any.
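For instance, a minimal sketch of such a test table and the query against it (Oracle syntax; the table name and test values are made up to cover each case):

-- Test rows covering match, NOT LIKE exclusion, wrong suffix, and wrong prefix.
CREATE TABLE like_test (field_1 VARCHAR2(50));

INSERT INTO like_test VALUES ('ABC_should_match_XWY');  -- expect this row back
INSERT INTO like_test VALUES ('ABC_GHI12XY_XWY');       -- excluded by NOT LIKE
INSERT INTO like_test VALUES ('ABC_wrong_ending');      -- excluded: does not end in XWY
INSERT INTO like_test VALUES ('no_prefix_XWY');         -- excluded: does not start with ABC

SELECT field_1
FROM   like_test
WHERE  field_1 LIKE 'ABC%'
AND    field_1 LIKE '%XWY'
AND    field_1 NOT LIKE '%GHI12XY%';
-- Expected result: only 'ABC_should_match_XWY'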

Autoincrement varchar column optionally

I have a table with a GUID identifier and one field that is a 5-character string that can be specified by the user, but it is optional, and it should be unique per user. I'm looking for a way to have this field always populated, even if the user doesn't specify it. The easiest approach is to have it like "00001", "00002", etc.: when the user doesn't specify it, it is stored like this. I'm using SQL and Entity Framework Core. What is the best way to achieve this?
EDIT: maybe a trigger that checks after insert whether that field was specified, and if not, just takes the current row number and converts it to a string? Does this make sense?
Cheers
Setting a default value like '00001' can be done by defining the field with:
NOT NULL DEFAULT right('0000' || to_char(SomeSequence.nextval),5) (pseudo-code to be adapted to the DBMS you are connected to).
Compared to the solution in your EDIT, this will at least guarantee that 2 inserts at the same time from 2 different users get assigned different values.
The real problem comes with the unique constraint on the column. This does not work nicely when mixing manual input with calculated values.
If, as a user, I manually input 00005, then an insertion will fail when SomeSequence reaches 5.
I think this problem will exist regardless of how you implement the generation of values (sequence, trigger, external code, ...)
Even if you are fine with coding some additional (and probably complicated) logic to manage that, it will probably decrease concurrency.
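For illustration, here is how that sequence-backed default could look in SQL Server syntax (an assumption on my part, since you mention Entity Framework Core; all object names are hypothetical, and the manual-input collision caveat above still applies):

-- Sketch only: sequence-backed default for the optional 5-character code,
-- with per-user uniqueness enforced by a composite constraint.
CREATE SEQUENCE CodeSeq START WITH 1 INCREMENT BY 1;

CREATE TABLE UserItem (
    Id     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    UserId UNIQUEIDENTIFIER NOT NULL,
    Code   VARCHAR(5) NOT NULL
           DEFAULT RIGHT('0000' + CAST(NEXT VALUE FOR CodeSeq AS VARCHAR(10)), 5),
    CONSTRAINT UQ_UserItem_User_Code UNIQUE (UserId, Code)
);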

T-SQL - Error converting data type - show offending row

On a simple INSERT command, I am getting an error:
Error converting data type...
The source data comes from multiple sources and, combined, amounts to hundreds of thousands of rows.
Can I re-write my statement to catch the error and show the offending data?
Thanks!
EDIT:
Requests for code:
insert Table_A
([ID]
,[rowVersion]
,[PluginId]
,[rawdataId]
...
...
...
)
select [ID]
,[rowVersion]
,[PluginId]
,[rawdataId]
...
...
...
FROM TABLE_B
Here are two approaches that I've taken when dealing with this problem. The issue is caused by an implicit conversion from a string to a date.
If you happen to know which field is being converted (which may be true in your example, but not always in mine), then just do:
select *
from table_B
where isdate(col) = 0 and col is not null
This may not be perfect for all data types, but it has worked well for me in practice.
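If you are on SQL Server 2012 or later, TRY_CONVERT is an alternative sketch of the same idea (SomeDateCol is a placeholder for whichever column you suspect):

-- Rows whose value cannot be converted to datetime (placeholder column name).
SELECT *
FROM   TABLE_B
WHERE  TRY_CONVERT(datetime, SomeDateCol) IS NULL
  AND  SomeDateCol IS NOT NULL;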
Sometimes, when I want to find the offending row in a select statement, I run the select outputting the data as text rather than to a grid. This is one of the options in SSMS, along the row of icons beneath the menus. It will output all the rows before the error, which sort of lets you identify the row with the error. This works best when there is an ORDER BY clause, but for debugging purposes it has worked for me.
In your case, I might create a temporary table that holds strings, and then do the analysis on this temporary table, particularly if Table_B is not really a table but a more complicated query.
An INSERT INTO ... SELECT or SELECT ... INTO ... FROM statement has no capability to find the offending data. Instead, you can use BCP with the max_errors and error-file options to output all of the offending data into an error file. Then you can simply analyze the error file to find all the offending rows. (See the MSDN BCP documentation.)
One solution is to do a binary search to find the problematic value(s). You can do that both by column and by row:
Try to insert only half of the columns; if that works, try the other half of the columns.
Try to insert only half of the rows; if that works, try the other half.
And repeat until you found the problem.
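As a sketch of the row-halving step, assuming an integer [ID] column you can split on (the boundary value 50000 is arbitrary; the elided columns are kept as placeholders):

-- Insert the first half of the rows; if this succeeds, try [ID] > 50000,
-- then keep narrowing the failing range.
INSERT Table_A ([ID], [rowVersion], [PluginId], [rawdataId] /* , ... */)
SELECT [ID], [rowVersion], [PluginId], [rawdataId] /* , ... */
FROM   TABLE_B
WHERE  [ID] <= 50000;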

How you get a list of updated columns in SQL server trigger?

I want to know which columns were updated during an update operation, inside a trigger. On first scanning Books Online it looks like COLUMNS_UPDATED is the perfect solution, but this function doesn't actually check whether values have changed; it only checks which columns were listed in the UPDATE statement's SET clause. Does anyone have other suggestions?
The only way you can check whether the values have changed is to compare the values in the DELETED and INSERTED virtual tables within the trigger. SQL doesn't check the existing value before updating to the new one; it will happily write a new identical value over the top. In other words, it takes your word for the update and tracks the update rather than the actual change.
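For illustration, a minimal sketch of that comparison (the table, key and column names are hypothetical):

-- Report rows where SomeCol genuinely changed value, not merely re-written.
CREATE TRIGGER trg_MyTable_Update
ON MyTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT i.Id, d.SomeCol AS OldValue, i.SomeCol AS NewValue
    FROM   inserted i
    JOIN   deleted  d ON d.Id = i.Id
    WHERE  i.SomeCol <> d.SomeCol
        OR (i.SomeCol IS NULL AND d.SomeCol IS NOT NULL)
        OR (i.SomeCol IS NOT NULL AND d.SomeCol IS NULL);
END;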
We can use the UPDATE() function to find out whether a particular column was updated:
IF UPDATE(ColumnName)
Refer to this link for details: http://msdn.microsoft.com/en-us/library/ms187326.aspx
As the others have posted, you'll need to interrogate INSERTED and DELETED. The only other useful bit of advice might be that you can get only the rows that have changed values (and discard the rows that didn't change) by using the EXCEPT operator - like this:
SELECT * FROM Inserted
EXCEPT
SELECT * FROM Deleted
The only way I can think of is that you can compare the values in DELETED and INSERTED to see which columns have changed.
Doesn't seem a particularly elegant solution though.
I asked this same question!
The previous posters are correct -- without directly comparing the values, you can't tell for sure whether the data has actually changed or not. However, there are several ways to do this type of checking, depending on what else you're trying to do in the trigger. My question has some good advice in the answers about those different mechanisms and their tradeoffs.