I have a column that's generated from a custom component on the Data Flow in SSIS.
The column's data type is float [DT_R8], and along with valid float values it contains NaNs. I would like to identify these NaNs and replace them with NULL values.
I thought of doing something in the Derived Column Transformation like in the screenshot, but this didn't work.
It seems that the Expression column can only be built from the available functions, and there is no 'isNaN' function among them.
Would you know of any other approaches, or how it can be done?
Thanks!
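One property that may help here: under IEEE 754, NaN is the only floating-point value that does not compare equal to itself, so a self-comparison can stand in for a missing isNaN function. The sketch below shows the idea in Python rather than in the SSIS expression language; the function name nan_to_none is hypothetical, and whether the SSIS expression evaluator honors the same comparison is an assumption that is untested here.

```python
def nan_to_none(value):
    """Return None for NaN (or None) inputs, otherwise the value unchanged."""
    if value is None:
        return None
    # IEEE 754: NaN is the only float that is not equal to itself.
    if value != value:
        return None
    return value


# Hypothetical sample column containing valid floats and a NaN.
readings = [1.5, float("nan"), -2.0, float("inf")]
cleaned = [nan_to_none(v) for v in readings]
print(cleaned)
```

Infinity is left untouched on purpose, since only NaN fails the self-comparison.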
What is the best way to determine the datatype of a column value if the data has already been loaded and the data has been classified as STRING datatype (i.e. BQ table metadata has "STRING" as the datatype for every column)? I've found a few different methods, but not sure if I'm missing any or any of these is substantially more performant. The result should include statistics on the grain of each value, not just per column.
Using a combination of CASE and SAFE_CAST on the STRING value to sum up all the instances where it successfully was able to CAST to X data type (where X is any datatype, like INT64 or DATETIME and having a few lines in query repeat the SAFE_CAST to cover all potential datatypes)
Similar to above, but using REGEXP_CONTAINS instead of SAFE_CAST on every value and summing up all instances of TRUE (a community UDF also seems to tackle this: https://github.com/GoogleCloudPlatform/bigquery-utils/blob/master/udfs/community/typeof.sql)
(For above can also use countif(), if statements etc.)
Loading data into a pandas dataframe and using something like pd.api.types.infer_dtype to infer automatically, but this adds overhead and more components
Thanks!
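The first approach in the list (try each cast, count the successes) can be sketched outside BigQuery to make the per-value and per-column statistics concrete. This is only an illustration in Python: the names CASTERS, castable_types and count_castable are hypothetical, the datetime format is an assumption, and Python's parsing rules differ from BigQuery's SAFE_CAST in edge cases.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping from a type label to a function that raises on failure.
CASTERS = {
    "INT64": int,
    "FLOAT64": float,
    # Assumed datetime layout; real data may need other formats.
    "DATETIME": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}


def castable_types(value):
    """Per-value result: the labels of every type the value can be cast to."""
    matches = []
    for label, caster in CASTERS.items():
        try:
            caster(value)
        except (ValueError, TypeError):
            continue  # behaves like SAFE_CAST returning NULL
        matches.append(label)
    return matches


def count_castable(values):
    """Per-column result: how many values cast successfully to each type."""
    counts = Counter({label: 0 for label in CASTERS})
    for value in values:
        counts.update(castable_types(value))
    return dict(counts)


sample = ["42", "3.14", "2021-05-01 10:00:00", "abc", None]
print([castable_types(v) for v in sample])
print(count_castable(sample))
```

Keeping the per-value list separate from the per-column counts mirrors the requirement that the result cover the grain of each value, not just each column.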
I have a table with a column1 nvarchar(50) null. I want to insert this into a more 'tight' table with a nvarchar(30) not null. My idea was to insert a derived column task between source and destination task with this expression: Replace column1 = (DT_WSTR,30)Column1
I get the "truncation may occur" error and I am not allowed to insert the data into the new, tighter table.
Also I am 100% sure that no values are over 30 characters in the column. Moreover I do not have the possibility to change the column data type in the source.
What is the best way to create the ETL process?
JotaBe recommended using a data conversion transformation. Yes, that is another way to achieve the same thing, but it will also error out if truncation occurs. Your way should work (I tried it), provided the input data really is less than 30 characters.
You could modify your derived column expression to
(DT_WSTR,30)Substring([Column1], 1, 30)
Consider changing the truncation error disposition of the Derived Column component within your Data Flow. By default, a truncation will cause the Derived Column component to fail. You can configure the component to ignore or redirect rows which are causing a truncation error.
To do this, open the Derived Column Transformation editor and click the 'Configure Error Output...' button in the bottom-left of the dialog. From here, change the setting in the 'Truncation' column for any applicable columns as required.
Be aware that any data which is truncated for columns ignoring failure will not be reported by SSIS during execution. It sounds like you've already done this, but it's important to be sure you've analysed your data as it currently stands and taken into consideration any possible future changes to the nature of the data before disabling truncation reporting.
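The three truncation dispositions described above (fail, ignore, redirect) can be sketched as follows. This is a Python illustration of the behavior, not SSIS code; the function convert_rows and its on_truncation parameter are hypothetical names chosen for the example.

```python
def convert_rows(values, max_len=30, on_truncation="fail"):
    """Mimic the truncation dispositions of a narrowing string conversion.

    on_truncation:
      "fail"     - raise on the first value longer than max_len
      "ignore"   - silently cut the value down to max_len characters
      "redirect" - send the offending value to a separate error output
    """
    kept, redirected = [], []
    for value in values:
        if len(value) <= max_len:
            kept.append(value)
        elif on_truncation == "ignore":
            kept.append(value[:max_len])  # data loss is not reported
        elif on_truncation == "redirect":
            redirected.append(value)
        else:
            raise ValueError(
                f"value of length {len(value)} exceeds {max_len} characters"
            )
    return kept, redirected


rows = ["short name", "x" * 45]
print(convert_rows(rows, on_truncation="ignore"))
print(convert_rows(rows, on_truncation="redirect"))
```

The "ignore" branch shows why the warning above matters: the long value is shortened without any signal to the caller.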
To do so you must use a Data Conversion Transformation, which lets you change the data type from the original nvarchar(50) to the desired nvarchar(30).
You'll get a new column with the required data type.
Of course, by configuring this component you can decide what to do in case of a truncation error.
UPDATE
As there are people who have downvoted this answer, let's add 3 more comments:
This solution is checked and works. Create a table with a nvarchar(50) column, a new table with a nvarchar(30) column, add a data flow that uses a data conversion transform, and it works without a glitch. Please check it; I guarantee it. Besides, as the OP states "Also I am 100% sure that no values are over 30 characters in the column", in his case there will be no truncation problems. However, I recommend handling the possible errors, just in case they happen.
from MSDN: "a package can perform the following types of data conversions: ... Set the column length of string data"
from MSDN: "If the length of an output column of string data is shorter than the length of its corresponding input column, the output data is truncated."
I have a table with a package size column of data type text that I need to convert to an integer for mathematical reasons. The values in this column typically look something like "100ML","20GM","UD 20","13OZ". Here is where it gets tricky: there are occasionally values like "6X12ML","UD 5X6ML". For the ones with an "X" in them, I need to remove the "ML". I'm currently doing this with
Replace([TABLE_NAME].[COLUMN_NAME],"ML","")
in an expression column in a query. I can use nested Replace functions to remove the "ML","GM","OZ" and "UD ". All of my attempts to handle the values containing "X" have failed; I figured the end solution would be something like
IIf([TABLE_NAME].[COLUMN_NAME] Like "X", (CInt(Left([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")-1))*CInt(Right([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")+1))),[TABLE_NAME].[COLUMN_NAME])
I have tried using a variation of the code above to no avail. All suggestions are appreciated. I would prefer to get this knocked out in one query, but I do realize I can use an expression and just split the text before and after the "X" into two different expression columns, then use another query to multiply the values.
QTY_ORDERED: IIf(InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")>1,[CRX_HISTORIC_PO].[QUANTITY]/Left(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")-1)*Right(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),Len(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""))-InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X"))*-1,[CRX_HISTORIC_PO].[QUANTITY]/Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ","")*-1)
The code above is what I used to complete the task at hand.
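For readers who want to see the parsing logic without the nested Replace calls, here is a minimal Python sketch of the same idea: strip the unit text, split on "X", and multiply the parts. The function name package_quantity is hypothetical, and the unit list (ML, GM, OZ, UD) is assumed from the sample values in the question; the expression above does not strip "OZ".

```python
import re

# Assumed unit markers, taken from the sample values in the question.
UNIT_PATTERN = re.compile(r"ML|GM|OZ|UD|\s")


def package_quantity(text):
    """Convert values such as '100ML' or 'UD 5X6ML' to a total integer amount."""
    cleaned = UNIT_PATTERN.sub("", text.upper())
    total = 1
    # '6X12' means 6 packs of 12, so multiply the factors together.
    for factor in cleaned.split("X"):
        total *= int(factor)
    return total


for value in ["100ML", "20GM", "UD 20", "13OZ", "6X12ML", "UD 5X6ML"]:
    print(value, "->", package_quantity(value))
```

A value that still contains non-digit text after the units are removed raises ValueError, which makes unexpected formats visible instead of silently producing a wrong number.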
I am developing an SSIS 2008 package and I am trying to create a Derived Column transformation. But when I go to the Expression editor and try this expression, it gives me a lot of errors. I have tried several variations of it, but all have resulted in errors. I need one of you SQL experts to point out a better expression!
ISNULL(WITHDRAWAL_DATE)||TRIM(WITHDRAWAL_DATE)==""?NULL:CAST(WITHDRAWAL_DATE AS DATETIME)
So I want this WITHDRAWAL_DATE input String datatype to be compared to an empty string--if it is empty I want it to become Null, otherwise to be cast as a date.
Thanks guys for your help. I am so confused! WITHDRAWAL_DATE is a DATE data type input in the source XML file and now I have it as a STRING data type in my XSD file. Ultimately the problem is that some of the Withdrawal_Date fields in my XML source data are empty. So I want to insert Null values into my database for these records.
What data type do I need to specify in my XSD, XSLT, and SQL output table? And it doesn't matter to me if I use the Data Conversion task or the Derived Column transform, but since I am new to these, could you send me the expression syntax?
@BobS: when I ran with your updated solution, I received this error:
The conversion of a datetime2 data type to a datetime data type resulted in an out-of-range value
So when I googled datetime2 datatype it looks like this is a new data type that supports larger time/date fields. So I modified my SQL table to use DATETIME2 instead for this field and modified your cast expression below to use DATETIME2 but then output of transform didn't change accordingly.
I also tried changing WITHDRAWAL_DATE to datetime for all files and then changing SQL Table to say NOT NULL for this field. But this also gave me errors.
I see a couple of issues that you should address with this Derived Column transformation.
First, you can't change the data type of a column, which is what your expression is trying to do. I'm not sure if you're actually trying to do this. But, if your output column is the same as the input column, then you will have to change it. To do this, in the Derived Column editor, the Derived Column Name should be a new column name and the Derived Column should be <add as new column>
The expression needs two changes. To assign a NULL value, you use a null function with the syntax NULL(data-type). And the cast syntax is (DT_datatype)columnname. So, here's how your expression should look:
ISNULL(WITHDRAWAL_DATE) || TRIM(WITHDRAWAL_DATE) == "" ? NULL(DT_DBTIMESTAMP) : (DT_DBTIMESTAMP)WITHDRAWAL_DATE
UPDATE: You should be able to use the expression above; but, I did change it to reference the DT_DBTIMESTAMP data type. The SSIS DT_DBTIMESTAMP data type matches the DATETIME SQL Server data type.
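The logic of that expression (NULL or blank becomes NULL, anything else is cast to a date) can be sketched in Python to make the two branches explicit. The function name parse_withdrawal_date and the "%Y-%m-%d" format are assumptions for illustration; the actual layout of WITHDRAWAL_DATE in the XML source is not stated.

```python
from datetime import datetime


def parse_withdrawal_date(raw, fmt="%Y-%m-%d"):
    """Return None for missing or blank input, otherwise a parsed datetime.

    The default format is an assumption; adjust it to match the source data.
    """
    # Mirrors ISNULL(x) || TRIM(x) == "" in the expression.
    if raw is None or raw.strip() == "":
        return None
    # Mirrors the (DT_DBTIMESTAMP) cast; raises ValueError on bad input.
    return datetime.strptime(raw.strip(), fmt)


for value in [None, "", "   ", "2010-03-15"]:
    print(repr(value), "->", parse_withdrawal_date(value))
```

Both branches return the same kind of result (a datetime or a typed "no value"), which is the point of using NULL(DT_DBTIMESTAMP) rather than a bare NULL in the SSIS expression.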
To learn what data type you should use for the source component, you can right-click on the source component and select Show Advanced Editor... Select the Inputs and Outputs tab. Navigate the tree view to find your column and view the associated data type. The Advanced Editor is available on most (maybe all) data flow components.
UPDATE 2: If your output data type for the Derived Column component is DT_DBTIMESTAMP2 instead of DT_DBTIMESTAMP, make sure you change both DT_DBTIMESTAMP references in your expression. Before closing the Derived Column component, look at the Data Type column for your expression. You can't change it, but it will show the data type that the expression output will be. If it's not what you want, then there's still a problem with your expression.
For the source files, you can't change the data type of the external columns. At least, I haven't been able to do it. In SSIS, you have to work with what is interpreted by the Source component. If you can alter the files to change the data type, then great. Otherwise, use the Derived Column component to convert what you are given into what you need.
What errors are you getting? I suspect one of them is a cast error.
Hopefully this is easy to explain, but I have a lookup transformation in one of my SSIS packages. I am using it to look up the id for an employee record in a dimension table. However, my problem is that some of the source data has employee names in all capitals (ex: CHERRERA) and the comparison data I'm using is all lower case (ex: cherrera).
The lookup is failing for the records that are not 100% case similar (ex: cherrera vs cherrera works fine - cherrera vs CHERRERA fails). Is there a way to make the lookup transformation ignore case on a string/varchar data type?
I don't believe there is a way to make the transformation case-insensitive. However, you could modify the SQL statement for your transformation to ensure that the source data matches the case of your comparison data by using the LOWER() string function.
Set the CacheType property of the lookup transformation to Partial or None.
The lookup comparisons will now be done by SQL Server and not by the SSIS lookup component, and will no longer be case sensitive.
You have to change both the source data and the lookup data so that both are in the same case.
Based on this Microsoft Article:
The lookups performed by the Lookup transformation are case sensitive. To avoid lookup failures that are caused by case differences in data, first use the Character Map transformation to convert the data to uppercase or lowercase. Then, include the UPPER or LOWER functions in the SQL statement that generates the reference table
To read more about the Character Map transformation, follow this link:
Character Map Transformation
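The approach the answers converge on, normalizing the case on both sides before comparing, can be sketched in Python. This is an illustration only: build_lookup and lookup_employee_id are hypothetical names, and the sample ids are made up for the example.

```python
def build_lookup(reference_rows):
    """Build the reference table with lowercased keys (like LOWER() in SQL)."""
    return {name.lower(): emp_id for emp_id, name in reference_rows}


def lookup_employee_id(lookup, source_name):
    """Lowercase the incoming value too, so both sides share one case."""
    return lookup.get(source_name.lower())


# Hypothetical dimension rows: (id, employee name).
dimension = [(101, "cherrera"), (102, "jsmith")]
lookup = build_lookup(dimension)

print(lookup_employee_id(lookup, "CHERRERA"))
print(lookup_employee_id(lookup, "cherrera"))
print(lookup_employee_id(lookup, "unknown"))
```

Normalizing only one side is not enough: the match succeeds only when the reference keys and the incoming values are both converted to the same case.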