Replace null value in Table before TransformColumns - error-handling

In a Power BI data source I have a table with a column "SPRINT". This column contains a LIST of values or null (the value comes from the JIRA API).
In the next step I extract the values using Table.TransformColumns to get the string out:
= Table.TransformColumns(#"PREVIOUS", {"Sprint", each Text.Combine(List.Transform(_, Text.From), ";"), type text})
Result: the SPRINT column now contains one delimited text value per row.
In the next steps I do a Split Column in order to get the information I want out of this SPRINT string.
In the end, all rows where SPRINT was null end up with ERROR in several columns.
So my question is: how can I prevent all those errors? Ideally I would transform the SPRINT column values where SPRINT is null into something that does not end in ERROR, and skip the split entirely where it makes no sense.
I managed to add Table.ReplaceErrorValues at the end for all affected columns, and this works, but it seems like quite an overhead to me. I try to keep the transformations as slim as possible because the JIRA API is slow and tends to time out beyond a certain number of rows and operations.
Thanks for helping out!

If it is null, you can simply add one more step after expanding and replace the error with null or with whatever value you expect in place of null.
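For example, a minimal sketch (the step and split-column names are placeholders for whatever your query actually produces) - either replace the errors right after the split, or guard against null inside the existing TransformColumns step so the split never sees an error in the first place:

// one extra step after the split, as described above
= Table.ReplaceErrorValues(#"Split Column by Delimiter", {{"Sprint.1", null}, {"Sprint.2", null}})

// alternative: handle null inline, so no extra step is needed
= Table.TransformColumns(#"PREVIOUS", {"Sprint", each if _ = null then "" else Text.Combine(List.Transform(_, Text.From), ";"), type text})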

"Cannot construct data type datetime" when filtering data, but all values filtered DO have valid dates

I am convinced that this question is NOT a duplicate of:
Cannot construct data type datetime, some of the arguments have values which are not valid
In that case the values passed in are explicitly not valid, whereas in this case every value the function could reasonably be expected to be called on is valid.
I know what the actual problem is, and it's not something that would help most people that find the other question. But it IS something that would be good to be findable on SO.
Please read the answer, and understand why it's different from the linked question before voting to close as dupe of that question.
I've run some SQL that's errored with the error message: Cannot construct data type datetime, some of the arguments have values which are not valid.
My SQL uses DATETIMEFROMPARTS, but it's fine evaluating that function in the select - it's only a problem when I filter on the selected value.
It's also demonstrating weird, can't-possibly-be-happening behaviour w.r.t. other changes to the query.
My query looks roughly like this:
WITH FilteredDataWithDate AS (
SELECT *, DATETIMEFROMPARTS(...some integer columns representing date data...) AS Date
FROM Table
WHERE <unrelated pre-condition filter>
)
SELECT * FROM FilteredDataWithDate
WHERE Date > '2020-01-01'
If I run that query, then it errors with the invalid data error.
But if I omit the final Date > filter, then it happily renders every result record, so clearly none of the values it's filtering on are invalid.
I've also manually examined the contents of Table WHERE <unrelated pre-condition filter> and verified that everything is a valid date.
It also has a wild collection of other behaviours:
If I replace all of ...some integer columns representing date data... with hard-coded numbers then it's fine.
If I replace some parts of that data with hardcoded values, that fixes it, but replacing other parts doesn't. I can't find any particular pattern in what does or doesn't help.
If I remove most of the * columns from the Table select, then it starts to be fine again.
Specifically, it appears to break any time I include an nvarchar(max) column in the CTE.
If I add an additional filter to the CTE that limits the results to Id values in the following ranges, then the results are:
Id between 130,000 and 140,000: Error.
Id between 130,000 and 135,000: Fine.
Id between 135,000 and 140,000: Fine (!!!!).
Filtering by the Date column breaks everything ... but ORDER BY Date is fine. (and confirms that all dates lie within perfectly sensible bounds.)
Adding TOP 1000000 makes it work ... even though there are only about 1000 rows.
... WTAF?!
This took me a while to decode, but it turns out that the SQL Server compiler doesn't necessarily restrict its execution of the function just to rows that are, or could be, relevant to the result set.
Depending on the execution plan it arrives at, the function could get called on any record in Table, even one that doesn't satisfy WHERE <unrelated pre-condition filter>.
This was found by another user, for another function, over here.
So the fact that it could return all the results without the filter wasn't actually proving that every input into the function was valid. And indeed there were some records in the table that weren't in the result set, but still had invalid data.
That actually means that even if you were to add an explicit WHERE filter to exclude rows containing invalid date-component data ... that isn't actually guaranteed to fix it, because the function may still get called against the 'excluded' rows.
Each of the random other things I did will have been influencing the query plan in one way or another that happened to fix/break things.
The solution is, naturally, to fix the underlying table data.
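As a rough sketch of how to find those rows (the component column names below are placeholders, since the original query elides them), checking the parts directly avoids depending on where the optimizer decides to call DATETIMEFROMPARTS:

-- List rows whose date components cannot form a real datetime, ignoring the
-- unrelated filter, since the function may be evaluated for these rows anyway.
-- YearCol/MonthCol/DayCol stand in for the real integer columns.
SELECT *
FROM Table
WHERE YearCol NOT BETWEEN 1753 AND 9999
   OR MonthCol NOT BETWEEN 1 AND 12
   OR DayCol NOT BETWEEN 1 AND 31;  -- rough bound; month/day combinations need a finer check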

Invalid Position in SQL WHERE Clause

I have a query I am writing that examines an ID field and derives an ID number from that column based on several criteria. Now that I have the logic written, I want to run the query on each criterion to see whether the logic is working. So the last part of my query for doing so is as follows:
FROM TABLE1
WHERE SOURCE_SYSTEM_NM = 'XYZ' AND ((STRLEFT(SOURCE_ARRANGEMENT_ID,4)) NOT IN ('23CC','21CC'))
LIMIT 10000
Essentially what I am trying to do here is tell it to return only items with SOURCE_SYSTEM_NM equal to 'XYZ', while eliminating any whose SOURCE_ARRANGEMENT_ID starts with '21CC' or '23CC'. I have a third criterion I want to filter on as well, which is excluding IDs whose first three characters are '0CC'.
My problem is that when I run this, I get back an "Invalid Position" error. If I remove one of the strings from the criteria, it works. So I decided to put the second string in its own NOT IN (...) clause with an AND between them, but that resulted in the same error.
If I had to guess, the NOT IN ('21CC','23CC') puts an AND between them, and I think that must be the root of my issue. The criteria in my CASE statement derive the ID number with the following:
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 4) IN ('23CC','21CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND STRLEFT(SOURCE_ARRANGEMENT_ID, 3) IN ('0CC') THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-3)
WHEN (M_CRF_CU_PRODUCT_ARRANGEMENT.SOURCE_SYSTEM_NM) IN ('XYZ') AND (STRLEFT(SOURCE_ARRANGEMENT_ID, 4) NOT IN ('23CC','21CC') OR STRLEFT(SOURCE_ARRANGEMENT_ID, 3) NOT IN ('0CC')) THEN (SOURCE_ARRANGEMENT_ID)
So with that, I am just trying to check each criterion to make sure the ID derived/created is correct. I need to filter down to get results for that last WHEN statement above, but I keep getting that "Invalid Position" error from the WHERE clause at the end. I am using Aginity to run this query against an IBM Netezza database. Thanks in advance!
I figured out what the issue was on this - when performing
STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID)-4)
There are some of those Arrangement IDs that have fewer than 4 characters, so the length argument passed to STRRIGHT becomes negative and I was getting an "Invalid Position". I fixed this by updating the query to use substring() instead:
SUBSTRING(SOURCE_ARRANGEMENT_ID,5,LENGTH(SOURCE_ARRANGEMENT_ID))
This fixed my issue. Just wanted to post an answer in case others run into this. It's not Netezza-specific; any SQL variant will react this way.
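If you prefer to keep STRRIGHT, an equivalent guard is to make sure the length argument can never go negative; a sketch (note that, unlike the SUBSTRING version, this leaves IDs of four characters or fewer unchanged instead of returning an empty string):

-- only strip the 4-character prefix when there actually is one
CASE WHEN LENGTH(SOURCE_ARRANGEMENT_ID) > 4
     THEN STRRIGHT(SOURCE_ARRANGEMENT_ID, LENGTH(SOURCE_ARRANGEMENT_ID) - 4)
     ELSE SOURCE_ARRANGEMENT_ID
END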

How to skip records when appending to a table when a column is NULL but REQUIRED?

I would like to skip rows that have NULL values in a REQUIRED column when querying data into a table.
Right now the whole query fails.
I would like to have behaviour similar to the configuration.load.maxBadRecords flag, which only applies to JSON/CSV.
It skips bad records and, according to the following answer, stores the bad records in the status.errors field.
I didn't test the above flag, but I see that it is connected with configuration.load.ignoreUnknownValues and is probably useful when records cannot be parsed (e.g. due to invalid characters).
Anyway, in my case the error is caused by NULL values in columns that are REQUIRED.
I would also be grateful for some reasonable workarounds.
For me, setting default values via IFNULL is not an option.
I also want to avoid detecting such rows programmatically.

Access 2016 - Linked Excel Tbl made into Qry - get Data type Mismatch when trying Unmatched

This seems like a very simple question, however, I am still not getting rid of the Data Type Mismatch. Scenario:
-> Excel file linked in as table [tbl_Mast_CC_List]; I convert the possible Cost Center numbers into values for safety via a query - there are NO text values in the Cost Center field and no leading 000's
-> qry_CC_Clean is CostCenter:Val([tbl_Mast_CC_List.CostCenter])
-> then I create the Unmatched Query, here is the SQL:
SELECT
qry_CC_S1_Clean_F2F_Alloc.DataName
, qry_CC_S1_Clean_F2F_Alloc.Year
, qry_CC_S1_Clean_F2F_Alloc.CostCenter
FROM qry_CC_S1_Clean_F2F_Alloc
LEFT JOIN qry_CC_S1_Clean_Mast_CC_List
ON qry_CC_S1_Clean_F2F_Alloc.CostCenter = qry_CC_S1_Clean_Mast_CC_List.CostCenter
WHERE (((qry_CC_S1_Clean_Mast_CC_List.CostCenter) Is Null))
ORDER BY qry_CC_S1_Clean_F2F_Alloc.CostCenter;
The only time I can get it to work is if I make a table out of the query, and I don't really want to do that. Any suggestions would be greatly appreciated, because I have to run this unmatched query against numerous tables to make sure the company is not missing any cost centers rolling through. Thank you!
Your problem is likely due to using the Val() function and then trying to test the result against Null. My understanding is that Val() doesn't return Null; it returns 0 when it can't parse anything. You might be better off running the conversion in the opposite direction, i.e. using CStr() on the numeric CostCenter field and comparing that to the text data from Excel.
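A sketch of what that reversed comparison might look like in the unmatched query (this assumes the numeric CostCenter sits in the allocation query and the text version is in the linked Excel table; adjust the names to your actual queries):

SELECT
qry_CC_S1_Clean_F2F_Alloc.DataName
, qry_CC_S1_Clean_F2F_Alloc.Year
, qry_CC_S1_Clean_F2F_Alloc.CostCenter
FROM qry_CC_S1_Clean_F2F_Alloc
LEFT JOIN tbl_Mast_CC_List
ON CStr(qry_CC_S1_Clean_F2F_Alloc.CostCenter) = tbl_Mast_CC_List.CostCenter
WHERE (((tbl_Mast_CC_List.CostCenter) Is Null))
ORDER BY qry_CC_S1_Clean_F2F_Alloc.CostCenter;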
Alternately you could change the Excel field itself to a number format instead of text.

Large number of UPDATE queries slowing down page

I am reading and validating large fixed-width text files (ranging from 10-50K lines) that are submitted via our ASP.net website (coded in VB.Net). I do an initial scan of the file to check for basic issues (line length, etc). Then I import each row into an MS SQL table. Each DB row basically consists of a record_ID (primary, auto-incrementing) and about 50 varchar fields.
After the insert is done, I run a validation function on the file that checks each field in each row based on a bunch of criteria (trimmed length, isnumeric, range checks, etc). If it finds an error in any field, it inserts a record into the Errors table, which has an error_ID, the record_ID and an error message. In addition, if the field fails in a particular way, I have to do a "reset" on that field. A reset might consist of blanking the entire field, or simply replacing the value with another value (e.g. replacing the string with a new one that has all illegal chars taken out).
I have a 5,000 line test file. The upload, initial check, and import takes about 5-6 seconds. The detailed error check and insert into the Errors table takes about 5-8 seconds (this file has about 1200 errors in it). However, the "resets" part takes about 40-45 seconds for 750 fields that need to be reset. When I comment out the resets function (returning immediately without actually calling the UPDATE stored proc), the process is very fast. With the resets turned on, the pages take 50 seconds to return.
My UPDATE stored proc is using some recommended code from http://sommarskog.se/dynamic_sql.html, whereby it uses CASE instead of dynamic SQL:
UPDATE dbo.Records
SET dbo.Records.file_ID = CASE @field_name WHEN 'file_ID' THEN @field_value ELSE file_ID END,
.
. (all 50 varchar field CASE statements here)
.
WHERE dbo.Records.record_ID = @record_ID
Is there any way I can improve performance here? Can I somehow group all of these UPDATE calls into a single transaction? Should I be reworking the UPDATE query somehow? Or is it just the sheer quantity of 750+ UPDATEs, and things are just slow (it's a quad-proc server with 8GB RAM)?
Any suggestions appreciated.
Don't do this in SQL; fix the data up in code, then do your updates.
If you have SQL 2008, then look into table-valued parameters. They enable you to pass an entire table as a parameter to a stored proc. From there you just have the one insert/update or merge statement.
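A minimal sketch of that idea, with made-up type, proc and column names, and only two of the 50 fields shown:

-- Table type carrying the already-cleaned values for the rows being reset.
CREATE TYPE dbo.RecordResets AS TABLE
(
    record_ID int PRIMARY KEY,
    field1    varchar(255),
    field2    varchar(255)
    -- ...one column per resettable field
);
GO
-- One set-based UPDATE replaces hundreds of single-row calls.
CREATE PROCEDURE dbo.ApplyResets @resets dbo.RecordResets READONLY
AS
    UPDATE r
    SET r.field1 = COALESCE(t.field1, r.field1),  -- NULL in the TVP means "leave this field alone"
        r.field2 = COALESCE(t.field2, r.field2)
    FROM dbo.Records AS r
    JOIN @resets AS t ON t.record_ID = r.record_ID;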
If you're looping through the lines and doing individual updates/inserts, this can be really expensive... Consider using SqlBulkCopy, which can speed up all your inserts. Similarly, you can create a DataSet, make your updates on the DataSet and then submit them all in one shot through a SqlDataAdapter.
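A rough sketch of the bulk-copy half of that suggestion (connection and table names are assumed); the DataSet route is analogous, with SqlDataAdapter.Update pushing all changes in one batch:

// System.Data.SqlClient: push the whole staged DataTable to dbo.Records in
// one round trip instead of one INSERT per line of the uploaded file.
using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "dbo.Records";
    bulk.WriteToServer(recordsTable);  // recordsTable: DataTable built while parsing the file
}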
I believe you are doing 50 case statements on every update. Sounds like that would be slow.
It is possible to solve this problem with injection-proof code via parameterized queries and a constant table of query strings.
Quick and dirty example code.
string[] queryList = { "UPDATE records SET col1 = @val WHERE ID = @key",
"UPDATE records SET col2 = @val WHERE ID = @key",
"UPDATE records SET col3 = @val WHERE ID = @key",
...
"UPDATE records SET col50 = @val WHERE ID = @key" };
Then in your call to SQL you just pick the item in the array corresponding to the col you want to update and set the value and key for the parameterized items.
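For instance, the call site could look roughly like this (colIndex, newValue and recordId are whatever your reset loop already has in hand):

// queryList[colIndex] selects the UPDATE for the one column being reset;
// @val and @key are bound as real parameters, so values are never
// concatenated into the SQL text.
using (var cmd = new SqlCommand(queryList[colIndex], connection))
{
    cmd.Parameters.AddWithValue("@val", newValue);
    cmd.Parameters.AddWithValue("@key", recordId);
    cmd.ExecuteNonQuery();
}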
I'm guessing you will see a significant improvement... let me know how it goes.
Um. Why are you inserting numeric data into VARCHAR fields then trying to run numeric checks on it? This is yucky.
Apply correct data typing and constraints to your table, do the INSERT, and see if it failed. SQL Server will happily report errors back to you.
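A sketch of what that means in practice (the column names and rules here are invented for illustration):

-- Strongly typed staging table: bad rows fail the INSERT and SQL Server
-- reports why, instead of 50 varchar fields plus hand-rolled checks.
CREATE TABLE dbo.RecordsStaging
(
    record_ID   int IDENTITY(1,1) PRIMARY KEY,
    quantity    int NOT NULL CHECK (quantity BETWEEN 0 AND 99999),
    amount      decimal(10,2) NOT NULL,
    customer_cd char(4) NOT NULL
    -- ...one properly typed column per field in the fixed-width file
);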
I would try changing the recovery model to simple and look at my indexes. Kimberly Tripp did a session showing a scenario with improved performance using a heap.