Pentaho treating empty string as null

Hello, I'm quite new to Pentaho. Why does an empty string in Table Input become NULL when Table Output inserts it?
For example, in the table input:
ID|name
1|dan
2|
3|itzy
4|kim
5|(null)
When I preview my query, it's OK and shows the empty string,
but after it goes through Table Output it becomes NULL.
I checked in the DB with SELECT * FROM TABLE_OUTPUT:
ID|name
1|dan
2|(null)
3|itzy
4|kim
5|(null)
My transformation consists of TABLE_INPUT => TABLE_OUTPUT,
just a simple select and insert.
I'm using pdi-ce-9.1.0.0-324.
Table Input reads from a MySQL DB; Table Output writes to PostgreSQL 13.

I'm afraid the problem you have is beyond Pentaho: as you can see, it makes no change to the data at all, it only reads and writes it. I tried various steps to replace the empty string with a space or a tab, but didn't manage it.
You will have to handle the NULLs in your SQL instead. In SQL Server:
SELECT ISNULL(name, '')
or, in standard SQL (this is the form to use in PostgreSQL, which has no two-argument ISNULL function):
SELECT COALESCE(name, '')
This way, you'll be able to work with your DB if a task comes up that doesn't allow null values.
(You can use the 'SQL Script' step in Pentaho and run this after populating the table.)
I hope I did help you!
Have a good day!
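Here is a minimal sketch of what COALESCE(name, '') does, using Python's built-in sqlite3 as a stand-in for PostgreSQL (an assumption made only to get a runnable example; COALESCE is standard SQL and returns its first non-NULL argument there too). `names_with_empty_for_null` is a hypothetical helper name.

```python
import sqlite3


def names_with_empty_for_null(rows: list) -> list:
    """Load (id, name) rows into an in-memory table and read them back with COALESCE."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE table_output (id INTEGER, name TEXT)")
        con.executemany("INSERT INTO table_output VALUES (?, ?)", rows)
        # COALESCE returns its first non-NULL argument, so NULL names come back as ''.
        cur = con.execute(
            "SELECT id, COALESCE(name, '') FROM table_output ORDER BY id"
        )
        return cur.fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    data = [(1, "dan"), (2, None), (3, "itzy"), (4, "kim"), (5, None)]
    print(names_with_empty_for_null(data))
```

Note that row 5, which really was NULL in MySQL, also comes back as '', so this approach cannot restore the original distinction between an empty string and NULL.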

You need a setting in kettle.properties. If you use PDI standalone, the file is in the .kettle directory in your user home:
KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL = Y
BR Alexander
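If you need to apply the setting on several machines, a small script can do it. This is a minimal Python sketch, assuming the default location ~/.kettle/kettle.properties (a different KETTLE_HOME would change the path); `set_property` and `enable_empty_string_differs_from_null` are hypothetical helper names, not part of PDI. PDI will probably need a restart to pick up the change.

```python
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with `key` set to `value`.

    Replaces an existing, uncommented `key=...` entry, or appends a new one.
    """
    new_line = f"{key}={value}"
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        # Comment lines in .properties files start with '#' or '!'.
        is_entry = stripped and not stripped.startswith(("#", "!"))
        if is_entry and stripped.split("=", 1)[0].strip() == key:
            if not found:  # keep exactly one entry for the key
                out.append(new_line)
                found = True
            continue
        out.append(line)
    if not found:
        out.append(new_line)
    return "\n".join(out) + "\n"


def enable_empty_string_differs_from_null(props: Path) -> None:
    """Write KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL=Y into the given kettle.properties."""
    text = props.read_text(encoding="utf-8") if props.exists() else ""
    props.parent.mkdir(parents=True, exist_ok=True)
    props.write_text(
        set_property(text, "KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL", "Y"),
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Assumed default location; adjust if you use KETTLE_HOME.
    enable_empty_string_differs_from_null(Path.home() / ".kettle" / "kettle.properties")
```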

Related

String split/chunks into chunked columns from one varchar column

Hopefully everyone is having a productive lockdown all over the world. This is the second issue I wanted some assistance with today.
What I have is chat data from a telecom company signing up new customers.
I have successfully collapsed it into two rows per unique_id, where each unique_id is a single chat interaction between a customer and a company agent.
I would now like to take the text column in each row and split it into 5 equal varchar columns. The objective is to chunk a conversation into 5 different stages within this table.
I do not have delimiters to work with, because customers and company staff use the delimiting characters themselves, which makes this tricky.
Below are two images showing what the data looks like now and what I am looking for.
[Images not available: BEFORE, AFTER]
I have looked at the following articles to try to crack it, but I am stuck:
Split A Single Field Value Into Multiple Fixed-Length Column Values in T-SQL
How to Split String by Character into Separate Columns in SQL Server
How to split a comma-separated value to columns
How to split a single column values to multiple column values?
Split string in SQL Server to a maximum length, returning each as a row
Here is the SQL Fiddle page, but I am running this code in MS SQL Server: http://sqlfiddle.com/#!9/ddd08c
Here is the table creation code:
CREATE TABLE Table1
(`unique_id` double, `user` varchar(8), `text` varchar(144))
;
INSERT INTO Table1
(`unique_id`, `user`, `text`)
VALUES
(50314585222, 'customer', 'This is part 1 of long text. This is part 2 of long text. This is part 3 of long text. This is part 4 of long text. This is part 5 of long text.'),
(50314585222, 'company', 'This is part 1 of long text This is part 2 of long text This is part 3 of long text This is part 4 of long text This is part 5 of long text'),
(50319875222, 'customer', 'This is part 1 This is part 2 This is part 3 This is part 4 This is part 5'),
(50319875222, 'company', 'This is part 1 This is part 2 This is part 3 This is part 4 This is part 5')
;
I asked about an almost identical algorithm in R in an earlier question; now I am trying to do this in SQL.
I managed to solve this with the T-SQL statement below:
WITH DataSource AS
(
SELECT *
,'\b.{1,'+CAST(CEILING(LEN([text]) * 1.0 /5) AS VARCHAR(12)) +'}\b' AS [pattern]
FROM Table1
), PreparedData AS
(
SELECT unique_id
,[user]
,'text' + CAST(RM.matchID + 1 AS VARCHAR(12)) as [column]
,RM.CaptureValue AS [value]
FROM DataSource T
CROSS APPLY [dbo].[fn_Utils_RegexMatches] ([text], [pattern]) RM
)
SELECT *
FROM PreparedData DS
PIVOT
(
max([value]) for [column] IN ([text1], [text2], [text3], [text4], [text5])
) PVT;
In order to use this code, you need to implement SQL CLR function(s) for working with regular expressions in T-SQL (you need to invest some time in understanding how SQL CLR works); otherwise, you will not be able to use this solution.
With the RegexMatches function in place, the first step is to build a regular expression pattern for splitting the data:
SELECT *
,'\b.{1,'+CAST(CEILING(LEN([text]) * 1.0 /5) AS VARCHAR(12)) +'}\b' AS [pattern]
FROM Table1;
The pattern is \b.{1,N}\b, where N is CEILING(LEN([text]) * 1.0 / 5). It matches pieces of the string that are at most N characters long without cutting words (check whether the word boundary works for you, because in some cases it won't).
Then the regex matches function, applied in the second common table expression (PreparedData), returns one row per matched piece, labelled text1, text2, and so on in [column].
That data is then ready for pivoting, which is straightforward.
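To check the splitting logic outside SQL Server first, here is a minimal Python sketch of the same idea (Python's re module is my substitution for the SQL CLR regex function). It builds the same word-boundary pattern with N = ceil(len(text) / 5), collects the matches with re.findall, and then pivots them into text1..text5. `split_into_chunks` and `pivot_chunks` are hypothetical helper names. One difference to keep in mind: T-SQL's LEN ignores trailing spaces, while Python's len counts them.

```python
import math
import re


def split_into_chunks(text: str, parts: int = 5) -> list:
    """Split text into pieces of at most ceil(len(text) / parts) characters.

    Uses the same word-boundary pattern as the T-SQL answer, so pieces end
    on word boundaries. The number of pieces is NOT guaranteed to equal `parts`.
    """
    if not text:
        return []
    max_len = math.ceil(len(text) / parts)
    pattern = r"\b.{1,%d}\b" % max_len
    return re.findall(pattern, text)


def pivot_chunks(chunks: list, parts: int = 5) -> dict:
    """Map chunks to text1..textN columns; missing ones become None.

    Like PIVOT ... IN ([text1], ..., [text5]), anything past `parts` is dropped.
    """
    return {f"text{i + 1}": chunks[i] if i < len(chunks) else None for i in range(parts)}


if __name__ == "__main__":
    row = "This is part 1 This is part 2 This is part 3 This is part 4 This is part 5"
    for column, value in pivot_chunks(split_into_chunks(row)).items():
        print(column, repr(value))
```

For the third sample row, each of text1 to text5 holds one "This is part N" piece (with its trailing space, except the last). If a row produces more than five pieces, the extra ones are silently lost, just as with the PIVOT.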
So, the notes are:
you need to implement the Microsoft String Utility functions
you need to ensure the regex pattern works for your data
you can split up the T-SQL I used, check the other columns returned by the regex function, and even make the pivot dynamic; the code is an example and needs to be modified and checked before you use it in production

Access SQL Date Function

So I'm working on editing some SQL code and I've just begun learning it. I'm trying to fix an update query so it updates a table's value5 column with a corresponding database value. The value from the database is a number, which I want to convert to a date and place into my table. The number is in yyyymmdd format, so I've been trying to use DateFromParts(), which doesn't work. Anyone have any ideas?
UPDATE tbl INNER JOIN dB ON
(dB.value1= tbl.value1 OR
dB.value2 =tbl.value2 ) AND
(LEFT(dB.value3 ,5)=tbl.value3 ) AND
(dB.value4 =tbl.value4 )
SET tbl.value5 = DateFromParts(Left(dB.value5,4),Mid(dB.value5,5,2),Right(dB.value5,2))
WHERE tblInvoice.value5 IS NULL;
The current program uses the code "SET tbl.value5 = dB.value5" instead (it runs perfectly fine), and I am having another issue with testing the conversion code (DateFromParts()). Because I am converting from numbers to dates, I have to go into the design view of the target table and change the data type of the value5 column from Number to Date/Time. When I run the query with the conversion code, the query stalls for a bit and no values get updated, leaving me with a blank value5 column.
If I then want to fill in the original number values, I change the SQL back to its original "SET tbl.value5 = dB.value5", change the data type from Date/Time back to Number, and rerun the program. The query stalls, no values are updated, and I am again left with blank columns, even though the same code filled in the values correctly before I modified the SQL and the table's data types. I come from a VBA background and I'm just really confused about how this works. Any tips would be appreciated, thanks!
Have you tried using SUBSTRING instead?
SELECT DATEFROMPARTS ( left('20101231',4), substring('20101231',5,2), right('20101231',2) ) AS Result;
MS Access (and the Jet engine) has no DateFromParts function. Use DateSerial instead:
SET tbl.value5 = DateSerial(Left(dB.value5, 4), Mid(dB.value5, 5, 2), Right(dB.value5, 2))
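To sanity-check the conversion outside Access, here is a minimal Python sketch of the same Left/Mid/Right slicing that DateSerial receives (`yyyymmdd_to_date` is a hypothetical name, not an Access function).

```python
from datetime import date


def yyyymmdd_to_date(value) -> date:
    """Convert 20101231 (int or str) to date(2010, 12, 31).

    Same split as DateSerial(Left(v, 4), Mid(v, 5, 2), Right(v, 2)).
    """
    s = str(value).strip()
    if len(s) != 8 or not s.isdigit():
        raise ValueError(f"expected yyyymmdd, got {value!r}")
    # Left 4 = year, Mid 5..6 = month, Right 2 = day.
    return date(int(s[:4]), int(s[4:6]), int(s[6:]))


if __name__ == "__main__":
    print(yyyymmdd_to_date(20101231))
```

One design difference: DateSerial rolls out-of-range parts over (DateSerial(2010, 13, 1) gives 1 January 2011), whereas Python's date() raises ValueError, which makes bad source values visible instead of silently shifting them.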
It's not clear if you work with T-SQL or Access SQL. In Access, you can use Format:
SET tbl.value5 = CDate(Format(dB.value5, "####\/##\/##"))
In T-SQL you could use a similar method, e.g. CONVERT(date, CAST(dB.value5 AS char(8)), 112), where style 112 is the yyyymmdd format.

Copy section of data string from one column to another column in SQL

Table and (columns) in question are:
Attachment (att_name and att_path)
I make this call to Select the info I want to update:
select * from attachment
where att_path like '%//bahamas/attachments/images/logos/%'
I need to update the att_name column with the filename from the path above. For example, if SQL finds "//bahamas/attachments/images/logos/ABCDE.tif", I need SQL to replace whatever is currently in att_name with ABCDE.tif.
I have tried multiple tests on just one item, and I can't seem to get my SQL right for a global update that runs on all rows where att_path like '%//bahamas/attachments/images/logos/%'.
Any advice/help is greatly appreciated.
update attachment set
att_name = Right(att_path, CharIndex('/', Reverse(att_path)) - 1)
where att_path like '%//bahamas/attachments/images/logos/%'
should do what you want.
Reversing the string and then searching for the first occurrence of / in effect searches from the end.
Before running the update, however, I usually run a select like the following to check that I hit the correct character positions, as it is easy to get off-by-one errors with string manipulation:
select att_path, Right(att_path, CharIndex('/', Reverse(att_path)) - 1)
from attachment
where att_path like '%//bahamas/attachments/images/logos/%'
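The off-by-one reasoning can also be checked with a minimal Python sketch that mirrors the RIGHT/CHARINDEX/REVERSE expression step by step (`filename_from_path` is a hypothetical name; in plain Python you would normally just use path.rsplit("/", 1)[-1]).

```python
def filename_from_path(path: str) -> str:
    """Return everything after the last '/' in path.

    Mirrors RIGHT(path, CHARINDEX('/', REVERSE(path)) - 1):
    CHARINDEX is 1-based and returns 0 when '/' is not found.
    """
    charindex = path[::-1].find("/") + 1  # 1-based position of '/' in the reversed string
    if charindex == 0:
        # T-SQL would reject RIGHT(path, -1) with an invalid-length error.
        raise ValueError(f"no '/' in {path!r}")
    length = charindex - 1  # number of characters after the last '/'
    return path[len(path) - length:]


if __name__ == "__main__":
    print(filename_from_path("//bahamas/attachments/images/logos/ABCDE.tif"))
```

The WHERE clause in the answer guarantees a '/' is present, so the error branch only matters if you reuse the expression without that filter.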

Display hindi data store in nvarchar datatype without N prefix

SQL Server 2008: a table has an nvarchar(max) column that stores Hindi and English data inserted without the N prefix, e.g. "मांगलिक welcome", but the table stores it as "×梻çÜ·¤ welcome".
How can I display this data from SQL Server in .NET?
The N prefix only denotes that the string literal is NVARCHAR as opposed to VARCHAR.
See this for more info.
C# is Unicode by default, so your data will be OK.
In fact, re-reading your question, I'm not sure what you are asking.
Are you saying you store the data in the database WITHOUT the N prefix? Is this done via .NET?
Can you please make your question clearer?
EDIT:
I'm not sure you can. Without the N prefix the literal is treated as non-Unicode, so any characters outside its code page are lost on insert.
Check this page for further details.
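To see why characters get lost, here is a minimal Python sketch that round-trips the question's text through a single-byte code page. It assumes Windows-1252 (cp1252) as the code page of the database's default collation; that is an assumption, and the real code page depends on your collation. Characters the code page cannot represent are replaced with '?'. The exact garbage shown in the question differs, so some other conversion may also be involved there.

```python
def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot encode are replaced with '?',
    similar to what happens to a non-N string literal.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    # The Hindi word is lost; the English part survives.
    print(through_code_page("मांगलिक welcome"))
```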
First, create a table as shown:
Create table TestLang (strText nvarchar(max))
Next, insert values with the N prefix:
insert into TestLang values ( N'मांगलिक')
insert into TestLang values ( N'Welcome')
Now try searching for the text as shown:
SELECT * FROM TestLang WHERE strText LIKE N'मां%'
UPDATE:
If you want to display the data, try this:
using System.Globalization;
using System.Text.RegularExpressions;
string input = "0928;0940;0932;092E;";
Regex rx = new Regex(@"([0-9A-Fa-f]{4});");  // each "XXXX;" group is a 4-digit hex code point
string output = rx.Replace(input, match => ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
Output: "नीलम"
Taken from here.
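For comparison, the same decoding in Python, as a minimal sketch (`decode_hex_codepoints` is a hypothetical name). Like the C# snippet, it assumes the input is a sequence of 4-hex-digit code points, each followed by ';'; it does not repair the garbled text already stored in the question's table.

```python
import re


def decode_hex_codepoints(text: str) -> str:
    """Replace each 'XXXX;' group (4 hex digits + ';') with that Unicode character."""
    return re.sub(r"([0-9A-Fa-f]{4});", lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(decode_hex_codepoints("0928;0940;0932;092E;"))
```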

Searching a column containing CSV data in a MySQL table for existence of input values

I have a table say, ITEM, in MySQL that stores data as follows:
ID FEATURES
--------------------
1 AB,CD,EF,XY
2 PQ,AC,A3,B3
3 AB,CDE
4 AB1,BC3
--------------------
As input, I will get a CSV string, something like "AB,PQ". I want to get the records that contain AB or PQ. I realized that we have to write a MySQL function to achieve this. So, if we had a magical function MATCH_ANY defined in MySQL that does this, I would simply execute SQL as follows:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0
The above query would return the records 1, 2 and 3.
But I'm running into all sorts of problems implementing this function, as MySQL doesn't support arrays and there's no simple way to split strings on a delimiter.
Remodeling the table is the last option for me, as it involves a lot of issues.
I might also want to execute queries containing multiple MATCH_ANY functions, such as:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0 and MATCH_ANY(FEATURES, "CDE") = 0
In the above case, we would get an intersection of records (1, 2, 3) and (3) which would be just 3.
Any help is deeply appreciated.
Thanks
First of all, the database should of course not contain comma-separated values, but you are hopefully aware of this already. If the table were normalised, you could easily get the items using a query like:
select distinct i.ItemId
from Item i
inner join ItemFeature f on f.ItemId = i.ItemId
where f.Feature in ('AB', 'PQ')
You can match the strings in the comma separated values, but it's not very efficient:
select Id
from Item
where
instr(concat(',', Features, ','), ',AB,') <> 0 or
instr(concat(',', Features, ','), ',PQ,') <> 0
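The comma-wrapping trick is what makes AB match the element AB but not AB1 or CDE. Here is a minimal Python sketch of the same logic as a MATCH_ANY-style check (`match_any` is a hypothetical name; it assumes the stored values have no spaces around the commas, like the sample data).

```python
def match_any(features: str, wanted_csv: str) -> bool:
    """True if any token in wanted_csv is a whole element of the CSV string features.

    Same idea as instr(concat(',', Features, ','), ',AB,') <> 0:
    wrapping both sides in commas prevents partial matches like AB1.
    """
    haystack = f",{features},"
    tokens = [t.strip() for t in wanted_csv.split(",") if t.strip()]
    return any(f",{token}," in haystack for token in tokens)


if __name__ == "__main__":
    items = {1: "AB,CD,EF,XY", 2: "PQ,AC,A3,B3", 3: "AB,CDE", 4: "AB1,BC3"}
    print([i for i, f in items.items() if match_any(f, "AB,PQ")])
    print([i for i, f in items.items() if match_any(f, "AB,PQ") and match_any(f, "CDE")])
```

With the sample data this reproduces the results the question expects: records 1, 2 and 3 for "AB,PQ", and only 3 when "CDE" is also required.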
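For all you REGEXP lovers out there, I thought I would add this as a solution (the parentheses matter: without them, the alternation is parsed as [[:<:]]AB or PQ[[:>:]], so AB1 would also match):
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]';
and for case sensitivity:
SELECT * FROM ITEM WHERE FEATURES REGEXP BINARY '[[:<:]](AB|PQ)[[:>:]]';
For the second query:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]' AND FEATURES REGEXP '[[:<:]]CDE[[:>:]]';
Note that MySQL 8.0.4 and later use the ICU regex library, which does not support [[:<:]] and [[:>:]]; use '\\b' there instead, e.g. '\\b(AB|PQ)\\b'.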
Cheers!
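The precedence problem with an ungrouped alternation is easy to demonstrate. This minimal Python sketch uses Python's \b as an analogue of MySQL's [[:<:]] and [[:>:]] word-boundary markers (an analogy for illustration only; MySQL uses a different regex library). `UNGROUPED`, `GROUPED` and `matches` are hypothetical names.

```python
import re

# Ungrouped: alternation has the lowest precedence, so this is (\bAB) OR (PQ\b).
UNGROUPED = r"\bAB|PQ\b"
# Grouped: both word boundaries apply to both alternatives.
GROUPED = r"\b(?:AB|PQ)\b"


def matches(pattern: str, features: str) -> bool:
    """True if the regex finds a match anywhere in the FEATURES string."""
    return re.search(pattern, features) is not None


if __name__ == "__main__":
    for features in ["AB,CD,EF,XY", "PQ,AC,A3,B3", "AB,CDE", "AB1,BC3"]:
        print(features, matches(UNGROUPED, features), matches(GROUPED, features))
```

Only the ungrouped pattern matches row 4 ("AB1,BC3"), because \bAB matches the start of AB1 without requiring a boundary after it.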
select *
from ITEM
where CONCAT(',', FEATURES, ',') LIKE '%,AB,%'
or CONCAT(',', FEATURES, ',') LIKE '%,PQ,%'
Or create a custom function to do your MATCH_ANY.
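Alternatively, consider using the RLIKE operator (in MySQL, + is numeric addition rather than string concatenation, so the commas have to be added with CONCAT):
select *
from ITEM
where CONCAT(',', FEATURES, ',') RLIKE ',AB,|,PQ,';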
Just a thought:
Does it have to be done in SQL? This is the kind of thing you might normally expect to write in PHP or Python or whatever language you're using to interface with the database.
This approach means you can build your query string using whatever complex logic you need and then just submit a vanilla SQL query, rather than trying to build a procedure in SQL.
Ben
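Here is a minimal Python sketch of that idea: split the CSV input in application code and build a parameterized query with one LIKE condition per token. `build_match_any_query` is a hypothetical name, and the %s placeholders assume a driver using that parameter style (PyMySQL and mysql-connector-python both do).

```python
def build_match_any_query(csv_input: str) -> tuple:
    """Build (sql, params) selecting ITEM rows whose FEATURES contain any input token.

    Values go into params, not into the SQL text, so the driver escapes them.
    """
    tokens = [t.strip() for t in csv_input.split(",") if t.strip()]
    if not tokens:
        raise ValueError("no tokens in input")
    # One comma-wrapped LIKE per token, combined with OR.
    condition = " OR ".join(["CONCAT(',', FEATURES, ',') LIKE %s"] * len(tokens))
    params = [f"%,{t},%" for t in tokens]
    return f"SELECT * FROM ITEM WHERE {condition}", params


if __name__ == "__main__":
    sql, params = build_match_any_query("AB,PQ")
    print(sql)
    print(params)
```

You would then run it with something like cursor.execute(sql, params). Tokens containing % or _ would still act as LIKE wildcards, so escape them if your feature codes can contain those characters.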