Replace null value with NA using Pentaho Kettle

I have an input CSV file in which one column's field value is empty. I want to replace that field value with NA in my destination table, and in my destination table that column is specified as a NOT NULL column.
I tried using the If field value is null and Value Mapper steps, but it doesn't work out. Can anyone suggest how to proceed?

NULLs cannot be replaced by the If field value is null step if you enable Lazy conversion in the CSV input step.
So untick the Lazy conversion? check box in the CSV input step. Then, in the If field value is null step, tick the Select fields check box, select the field you want to check for nulls, and type NA in the Replace by value column.

There is a specific step that does just that: it replaces null values. In the step you have a choice to a) pick one or more field types (String, Integer, etc.) or b) identify specific fields; then you provide a replacement string if you wish.
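For comparison, here is what the same fix looks like done on the database side instead of inside Kettle. This is only a minimal sketch assuming a SQL Server destination; the table and column names are made up for illustration:

-- Hypothetical table/column names: backfill nulls already in the destination
UPDATE dbo.DestinationTable
SET SomeColumn = 'NA'
WHERE SomeColumn IS NULL;

-- Or handle it while loading from a staging table:
INSERT INTO dbo.DestinationTable (SomeColumn)
SELECT COALESCE(SomeColumn, 'NA')
FROM dbo.StagingTable;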

Related

How can I force BigQuery to autodetect schema with all strings?

Is there a way to use --autodetect in BigQuery forcing all new fields to be treated as strings?
The issue is the following: I have a csv file separated by \t where all fields are quoted like this: '67.4'. Now, if I simply provide a schema, the bq load breaks for reasons I cannot understand. If I do bq load --autodetect it works fine, but the values are still quoted. So I tried
bq load --autodetect --quote="'" --max_bad_records=10000
--field_delimiter="\t" --source_format=CSV
repo:abc.2017 gs://abc/abc_2017-*.csv.gz
But it now breaks with
- gs://abc/abc_2017-04-16.csv.gz: Error while reading data,
error message: Could not parse '67.4' as int for field
int64_field_35 (position 35) starting at location 2138722
Here's one row, fields again are separated by tabs:
'333933353332333633383339333033333337' '31373335434633' 'pre' 'E' '1' '333933383335333833393333333333383338' '2017-02-01 05:13:59' '29' '333733333330333033323339333933313335333333303333333433393336' '333333353331333933363338333033373333333833323338333733323330' '3333343234313434' 'R' 'LC' '100' '-70.2' '-31.34' 'HSFC310' 'WOMT24I' '146' '1' '05'
Ideas?
Schema auto-detection samples up to the first 100 rows, so if a column contains only integers in those rows, the inferred data type will be integer. The purpose of the --quote flag is to specify the character that encloses column values.
Example:
Sample csv data:
col1, col2
1, "2"
If you don't specify --quote, it defaults to ". The data type for col2 will then be Integer and the value will be 2.
If you specify a --quote value other than the default ", the data is treated as enclosed by that character instead. Example: with --quote="'", col2 will be String type and the data value will be "2" (the double quotes themselves become part of the value).
As of now you can't force schema auto-detection to make all your columns a certain data type; otherwise it wouldn't be auto-detection after all. You may want to file a feature request for another flag on bq load (and even in the UI) that forces certain columns to a given data type (e.g. make columns 1, 2, 15, 100 String, or make all columns String/Integer/Numeric, etc.).
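If the data does load but the stray quotes and inferred types still bother you, one workaround is to clean up with a query after the load. A minimal sketch in BigQuery Standard SQL: int64_field_35 and the table name are taken from the question, while string_field_0 is a hypothetical auto-generated column name used only for illustration:

-- Hypothetical cleanup after an autodetected load: strip the stray single
-- quotes from a string column and cast a numeric column back to STRING.
SELECT
  TRIM(string_field_0, "'") AS cleaned_col,
  CAST(int64_field_35 AS STRING) AS forced_string_col
FROM `repo.abc.2017`;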

How do I split an ssrs text value by comma?

I have an SSRS report that takes a parameter called Customer ID Enroller List. Its datatype in SSMS is varchar(max) and its datatype in SSDT/SSRS is listed as text.
As an example, the user may pass in 2 customerID's like the following:
2110012639,2110179997
I'd like to create a document map based on the passed parameters, but I need to split the values first. I've tried using the following code:
=Split(Parameters!CustomerID_EnrollerList.Value,",")
My report runs but the value returned in the textbox is #Error. Any ideas on how to split a text datatype parameter by a comma delimiter?
The Split function returns an array, and you can select an item by its index.
The first value would be
=Split(Parameters!CustomerID_EnrollerList.Value,",")(0)
and the second value would be
=Split(Parameters!CustomerID_EnrollerList.Value,",")(1)
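If splitting on the database side is an option instead, SQL Server 2016+ has STRING_SPLIT, which hands the report one row per ID. A sketch with the parameter value hard-coded for illustration:

-- Hypothetical: split the delimited parameter in T-SQL instead of in the
-- report expression (requires SQL Server 2016 or later).
DECLARE @CustomerID_EnrollerList VARCHAR(MAX) = '2110012639,2110179997';

SELECT LTRIM(value) AS CustomerID
FROM STRING_SPLIT(@CustomerID_EnrollerList, ',');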

Replacing multiple null value columns with blank in tableau

I have a data source which consists of 16 columns. Many of these columns contain null values. I have to replace the null values with blanks (in all columns at once).
When I searched on the internet, one solution I found was to create a calculated field that replaces null values using the IFNULL() function. If I use this solution, I have to create 16 calculated fields, one for every single column.
Is there any solution so that I can replace all null columns with blanks simultaneously? Is there any global setting which will help me achieve this?
Thank you.
BLANK is the default replacement for NULL in Tableau.
You can set how Tableau treats NULLs per measure:
Right-click the measure and select Format.
On the left, you will see the "Special Values" section.
Set the text to whatever you want.
Use the LOOKUP() function: for the columns where values are present it shows the value, otherwise blank or '-'.
For example, LOOKUP(SUM([Sales]), 0) shows blank for null rows.
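If the data source is a database connection, another option is to blank out the nulls once at the source with Custom SQL rather than 16 calculated fields. A minimal sketch; the table and column names are made up:

-- Hypothetical Custom SQL for the Tableau connection: COALESCE each
-- nullable column to an empty string.
SELECT
  COALESCE(col1, '') AS col1,
  COALESCE(col2, '') AS col2
  -- ...repeat for the remaining 14 columns
FROM dbo.SourceTable;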

How to check and remove numbers from a string?

I'm using an SSIS package to bring data through from one table to another. However, I have a predicament where a field in the table (GroupName) comes through with numbers at the end. This comes in two forms: either the string is a name followed by a run of numbers less than 4 characters in length (e.g. Group Name 22),
or it is a name followed by four numeric characters (e.g. Group Name 2012). Now I'd like to check the data in SQL to see whether the run of numeric characters at the end of the string is less than 4 long. If so, remove the numbers.
Can anyone help?
You can use PATINDEX:
SELECT
    SUBSTRING('Group Name 2012',
              PATINDEX('%[0-9]%', 'Group Name 2012'),
              LEN('Group Name 2012')) AS NumberOnly,
    LEN(SUBSTRING('Group Name 2012',
                  PATINDEX('%[0-9]%', 'Group Name 2012'),
                  LEN('Group Name 2012'))) AS NumberLength
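That locates the digits and measures the run; here is a sketch that actually strips the trailing digits when the run is shorter than four characters, with the sample value hard-coded for illustration. It uses REVERSE so that only digits at the end of the string are counted:

-- Count trailing digits by finding the first non-digit in the reversed
-- string, then trim them off only when the run is 1-3 characters long.
DECLARE @s VARCHAR(100) = 'Group Name 22';
DECLARE @digits INT = PATINDEX('%[^0-9]%', REVERSE(@s)) - 1;

SELECT CASE WHEN @digits BETWEEN 1 AND 3
            THEN RTRIM(LEFT(@s, LEN(@s) - @digits))
            ELSE @s
       END AS CleanedName;  -- 'Group Name'; 'Group Name 2012' stays intact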
Alternatively, add a Derived Column transformation with
NumericCheck = RIGHT(stringvariable, 4)
and then, in a separate Derived Column transformation,
(DT_I4)NumericCheck == (DT_I4)NumericCheck ? 1 : 0
Note: you will need to configure the error output to "Ignore Failure" for this check, since the cast fails when the ending is not numeric. Then have a Conditional Split which sends the zero values to be updated via an OLE DB Command.

SQL Server comma delimiter for money datatype

I import Excel files via SSIS to SQL Server. I have a temp table to get everything in as nvarchar. For four columns I then cast the string to the money type and put it in my target table.
In my temp table, one of those four columns, let me call it X, has a comma as the decimal separator; the rest have a dot. Don't ask me why, I have everything in my SSIS set up the same.
In my Excel file the separator is a comma as well.
So now in my target table I have everything in comma values, but the X column has the comma moved two places to the right and looks like this:
537013,00 instead of 5370,13, which was the original cell value in the temp and Excel column.
I was thinking this is a culture setup problem, but then again it should (or shouldn't) work on all of these columns.
a) Why do I receive dot values in my temp table when my Excel file displays commas?
b) How can I fix this? Can I replace the "," in the temp table with a dot?
UPDATE
I think I found the reason but not the solution:
In this X column in Excel, the first three cells are empty; the other three columns all start with 0. If I fill these three cells of X with 0s, then I also get the dot in my temp table and the right value in my target table. But of course I have to use the Excel file as is.
Any ideas on that?
Try the code below. It checks whether the string value being converted to money is of a numeric data type: if it is, it converts it to the money data type; otherwise it returns NULL. It also replaces the decimal symbol and the digit grouping symbol of the string value to match the decimal symbol and digit grouping symbol SQL Server expects.
DECLARE @MoneyString VARCHAR(20)
SET @MoneyString = '$ 1.000,00'
SET @MoneyString = REPLACE(REPLACE(@MoneyString, '.', ''), ',', '.')
SELECT CAST(CASE WHEN ISNUMERIC(@MoneyString) = 1
            THEN @MoneyString
            ELSE NULL
       END AS MONEY)
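For question (b), the same REPLACE trick can be applied directly to the X column in the temp table before the cast. A sketch; the temp table name is made up:

-- Hypothetical temp table name: normalise X from '537013,00'-style values
-- (dot as grouping symbol, comma as decimal) to SQL Server's format.
UPDATE dbo.TempImport
SET X = REPLACE(REPLACE(X, '.', ''), ',', '.')
WHERE X IS NOT NULL;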
As for the reason why you get a comma instead of a dot, I have no clue. My first guess would be culture settings, but you already checked that. What about googling, did you get any results?
First, the "separator" in SQL is the decimal point; it's only Excel that is using the comma. You can change the formatting in Excel: format the Excel column as money and specify a decimal point as the separator. Then, in the SSIS import wizard, split out the transformation of the column so it imports to a money data type. It's a culture thing, but "delimiter" tends to be used in the context of signifying the end of one column and the start of the next (as in CSV).
HTH
Well, that's a longstanding problem with Excel. It uses the first 30 or so rows to infer the data type, which can lead to endless issues. I think your solution has to be to process everything as a string in the way Yaroslav suggested, or to supply an Excel template with predefined, formatted data type columns into which the values are then inserted. It's a pita.