Pentaho Data Integration: "Group by" concatenating strings separated by ";" instead of ","

I have to concatenate strings from different fields. The output should be all the strings in one field, separated by semicolons.
The "Group by" transformation step does everything I need, but I can only separate the values with spaces, tabs or commas.
The strings I need to concatenate contain commas themselves, so I can't use "," as the separator.

The Group By step has a second operation, "Concatenate strings separated by" (without the comma). With that operation you can enter your own separator, such as ";", in the Value column.
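The Group By step is configured in the Spoon dialog rather than in code. As a rough illustration of what the "Concatenate strings separated by" aggregation produces, here is a Python analogue using itertools.groupby. The field names and rows are hypothetical, and the input is assumed to be sorted on the grouping field (the Group By step expects sorted input as well).

```python
from itertools import groupby
from operator import itemgetter

def concat_grouped(rows, key_field, value_field, separator=";"):
    """Group rows (already sorted on key_field) and join value_field with separator."""
    result = {}
    for key, group in groupby(rows, key=itemgetter(key_field)):
        result[key] = separator.join(row[value_field] for row in group)
    return result

# Hypothetical sample rows; the values themselves contain commas.
rows = [
    {"customer": "A", "address": "Main St, 1"},
    {"customer": "A", "address": "High St, 2"},
    {"customer": "B", "address": "Low Rd, 3"},
]
print(concat_grouped(rows, "customer", "address"))
# {'A': 'Main St, 1;High St, 2', 'B': 'Low Rd, 3'}
```

Because the separator is a free parameter, the commas inside the values no longer collide with it.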

Related

Extract text using GREL in OpenRefine

I'm trying to add a column based on a column in OpenRefine using GREL.
I need to extract all the text after the second space in the scientific name.
Here are two examples of the original cell data ---> what I want to extract:
Amandinea punctata (Hoffm.) Coppins & Scheid. ---> (Hoffm.) Coppins & Scheid.
Agonimia tristicula (Nyl.) Zahlbr. ---> (Nyl.) Zahlbr.
Here are three ways to achieve the desired result on the given data, ordered from easy to understand to more advanced.
Use column splitting
You can split the column into three columns by choosing a space as the separator and limiting the number of new columns to 3 in the corresponding dialog. Then you can delete the first two columns and keep the third, which holds the desired result.
Use Array functions
You can use the same technique in GREL with arrays: split on spaces, discard the first two entries and join the rest with spaces.
value.split(" ").slice(2).join(" ")
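For readers more familiar with Python, the GREL expression above corresponds roughly to the following sketch (a Python analogue, not OpenRefine itself); the function name is hypothetical.

```python
def after_second_space(value: str) -> str:
    # Split on single spaces, drop the first two words (genus and species),
    # then rejoin whatever is left with single spaces.
    return " ".join(value.split(" ")[2:])

print(after_second_space("Amandinea punctata (Hoffm.) Coppins & Scheid."))
# (Hoffm.) Coppins & Scheid.
```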
Use regular expressions
You can also use the match function with a regular expression.
value.match(/\S+\s\S+\s(.+)/)[0]
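GREL's match() has to match the whole string and returns an array of the capture groups, so [0] is the first group. A Python analogue of the same regex uses re.fullmatch and group(1); this is a sketch with a hypothetical function name, returning None where GREL would yield no match.

```python
import re

# Two non-space tokens, each followed by one whitespace character,
# then capture everything that remains.
PATTERN = re.compile(r"\S+\s\S+\s(.+)")

def authorship(value: str):
    # fullmatch() mirrors GREL's whole-string matching; group(1) is match(...)[0].
    m = PATTERN.fullmatch(value)
    return m.group(1) if m else None

print(authorship("Agonimia tristicula (Nyl.) Zahlbr."))
# (Nyl.) Zahlbr.
```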
A solution:
Partition on what appears to be a good separator, " (", take the right-hand part and add back the missing "(" at the beginning.
"("+value.partition(" (")[2]
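Python's str.partition behaves like GREL's partition() here: it returns (before, separator, after) for the first occurrence. A minimal sketch with a hypothetical function name:

```python
def authorship_by_partition(value: str) -> str:
    # Index 2 is the text after the first " (", so the "(" has to be added back.
    return "(" + value.partition(" (")[2]

print(authorship_by_partition("Amandinea punctata (Hoffm.) Coppins & Scheid."))
# (Hoffm.) Coppins & Scheid.
```

One caveat of this approach: a value with no " (" in it comes back as a lone "(".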

How do I sql query for records containing an ampersand in MS Access?

I have a database of customer names and am looking to select only the names that contain an ampersand somewhere in them. I have concatenated the first and last name together and just need to return any records that contain an ampersand.
I've tried using contains and like %&%, but neither works, because MS Access seems to treat the ampersand as a variable of some sort.
Here's the code that gives me the list of full names, as well as a unique identifier I can use to find the record in the original database.
SELECT [BillTo_FirstName] & " " & [BillTo_LastName] AS FullName,
June_ampersand.MerchantReferenceNumber
FROM June_ampersand
;
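In its default (ANSI-89) mode, MS Access uses different wildcards for the LIKE operator: * instead of % (and ? instead of _). Try a LIKE '*&*' condition, for example WHERE [BillTo_FirstName] & " " & [BillTo_LastName] LIKE '*&*'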
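Access's * wildcard means "any number of characters", much like shell-style globbing. As a loose Python analogue (not Access itself), fnmatch.fnmatchcase can apply the same '*&*' pattern to the concatenated names; the customer data below is made up.

```python
from fnmatch import fnmatchcase

def names_with_ampersand(first_last_pairs):
    # Build "First Last" the same way the query does, then keep the names that
    # match the Access-style pattern '*&*' (any text, '&', any text).
    full_names = (f"{first} {last}" for first, last in first_last_pairs)
    return [name for name in full_names if fnmatchcase(name, "*&*")]

customers = [("Tom", "Smith"), ("Ann & Bob", "Jones"), ("Joe", "Black & Co")]
print(names_with_ampersand(customers))
# ['Ann & Bob Jones', 'Joe Black & Co']
```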

Split column preserving values

I have a column of words followed by numbers, like this:
I want to split it into two columns, putting the text to the left of the digits in the first column, and the digits and any text that follow into the second column.
I suspect I'll have to add a column based on this column, containing the digits and everything after. Then I'll have to delete the digits and everything after from the previous column.
I'm not great at GREL, and the examples I've found don't work. Help?
There are several ways. If you don't like GREL but you know some regular expressions, you can use Edit column > Split into several columns, and use this regex as the separator:
\s(?=\d)
It means "any space that is before a number".
(Don't forget to check the box "regular expression".)
If any of your values contain multiple numbers (e.g., "text 123 newtext 345 sometext"), specify "split into 2 columns at most".
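The same separator can be tried out in Python, whose re module also supports the (?=...) lookahead. This is a sketch, not OpenRefine; maxsplit=1 plays the role of "split into 2 columns at most", and the function name is hypothetical.

```python
import re

# A whitespace character immediately followed by a digit. The lookahead means
# the digit is not consumed, so it stays at the start of the second part.
SEPARATOR = re.compile(r"\s(?=\d)")

def split_before_number(value: str):
    return SEPARATOR.split(value, maxsplit=1)

print(split_before_number("text 123 newtext 345 sometext"))
# ['text', '123 newtext 345 sometext']
```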

SAS SQL: filter out corrupted rows

I need to copy data from one table to another and filter out corrupted rows.
I have a column with dates, and sometimes there are rows like " . ": a random number of spaces and one dot.
How can I make my SQL ignore these rows?
I tried using
where (trim(put(DatesOfRun) not like '.'
and multiple other variants of
"where not like"
or
"where <>"
but all of them gave me errors like
"Expression using equals (=) has components that are of different data types."
or
ERROR 22-322: Syntax error, expecting one of the following:
and a long list of operators
First, you need to confirm whether this is a character or a numeric field. "." is how SAS displays null (a missing value, in SAS terms) for numerics, so it's entirely possible you have a numeric field.
where not missing(DatesOfRun)
or
where DatesOfRun is not null
Either of those should do it, if it's numeric.
If it is character, then it's fairly simple.
where not (strip(DatesOfRun) = '.')
TRIM only removes trailing blanks; STRIP removes them from both sides.
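To make the numeric-versus-character distinction concrete, here is a Python analogue (not SAS) in which None stands in for a SAS numeric missing value and plain strings stand in for a character column. The sample values are made up; strip(" ") and rstrip(" ") are used so that only ordinary blanks are removed, as with SAS STRIP and TRIM.

```python
def keep_numeric(values):
    # Numeric column: None models a SAS numeric missing value (displayed as ".").
    return [v for v in values if v is not None]

def keep_character(values):
    # Character column: drop values that are just a dot surrounded by blanks.
    # strip(" ") removes blanks from both ends, like SAS STRIP.
    return [v for v in values if v.strip(" ") != "."]

print(keep_numeric([20230101, None, 20230105]))                # [20230101, 20230105]
print(keep_character(["2023-01-01", "   .  ", "2023-01-05"]))  # ['2023-01-01', '2023-01-05']
# rstrip(" ") only removes trailing blanks, like SAS TRIM, so the leading ones remain:
print(repr("   .  ".rstrip(" ")))                              # '   .'
```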
It's also possible you have non-breaking spaces or other characters that will break the character approach. If the STRIP version runs without an error but doesn't actually remove those rows, you may want to use a data step to put that variable to the log with the $HEX32. format (with an appropriate width: twice the maximum number of characters) and see what comes out. If you don't recognize the characters or don't know how to handle ASCII codes, come back and ask a new question with that information.
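The $HEX. format prints each byte as two hexadecimal digits. The same kind of inspection can be sketched in Python; the sample value (containing a non-breaking space) and the latin-1 default are assumptions for illustration, since the actual bytes depend on your session encoding.

```python
def hex_dump(value: str, encoding: str = "latin-1") -> str:
    # Two uppercase hex digits per byte, similar in spirit to SAS's $HEXw. format.
    return value.encode(encoding).hex().upper()

sample = " \u00a0. "  # blank, non-breaking space, dot, blank
print(hex_dump(sample))           # 20A02E20
print(hex_dump(sample, "utf-8"))  # 20C2A02E20
```

Ordinary blanks show up as 20 and the dot as 2E; any other byte (such as A0, or C2A0 in UTF-8) is a character that STRIP will not remove.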
Just to clarify: you are trying to ignore rows where the DatesOfRun column contains the character '.'? If so, you may want to use wildcard patterns such as '.%' or '%.%' if the '.' can appear in random positions.
Also, check the data type of the DatesOfRun column: LIKE only works on character columns, and '%.%' will also drop any valid value that happens to contain a dot.
A WHERE clause with two conditions could solve your issue; try this one and see whether it throws an error:
WHERE DatesOfRun is not null
AND DatesOfRun not like '%.%'
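The effect of those two conditions can be checked on a throwaway table. The sketch below uses an in-memory SQLite database from Python's standard library as a stand-in for the SAS dataset (SQLite, not SAS PROC SQL, though the % wildcard means the same here); the table name and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for the SAS dataset (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (DatesOfRun TEXT)")
con.executemany(
    "INSERT INTO runs VALUES (?)",
    [("2023-01-01",), ("   .  ",), (None,), ("01.02.2023",)],
)
rows = con.execute(
    "SELECT DatesOfRun FROM runs "
    "WHERE DatesOfRun IS NOT NULL AND DatesOfRun NOT LIKE '%.%'"
).fetchall()
con.close()
print(rows)  # [('2023-01-01',)]
```

Note the last sample row: a value that merely contains a dot is filtered out along with the corrupted one.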

Inserting Strings Without Trailing Spaces SQL

I have a database with a field named Field1 that has 100 nchars per entry.
Each time I add a value, it is stored as:
"value (100-ValueLength Spaces) "
So basically, each stored value has a string of spaces after it. This becomes an issue when I try:
if (value == "Example")
because of all of the empty spaces after the string.
How can I get it so the stored values don't have all of these trailing spaces?
If you want a variable-length string, use nvarchar(100) instead of nchar(100). The latter always has 100 characters; the former can hold up to 100 characters but isn't padded with spaces.
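The difference between the two column types can be modelled in a few lines of Python. This is a simplified sketch, not SQL Server: values are assumed to fit in the column, and the function names are hypothetical.

```python
WIDTH = 100  # column width from the question; values are assumed to fit

def stored_as_nchar(value: str) -> str:
    # Fixed-length type: padded with trailing spaces to the full width.
    return value.ljust(WIDTH)

def stored_as_nvarchar(value: str) -> str:
    # Variable-length type: stored as given, with no padding.
    return value

print(stored_as_nchar("Example") == "Example")           # False
print(stored_as_nvarchar("Example") == "Example")        # True
print(stored_as_nchar("Example").rstrip() == "Example")  # True
```

In SQL Server itself, the = comparison ignores trailing spaces, so the padding mostly shows up in application code such as the if statement in the question.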
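Use the SQL LTRIM and RTRIM functions when inserting. Keep in mind that a fixed-length nchar column pads the value back to its full length when storing it, so trimming on insert only helps together with a variable-length type such as nvarchar; with nchar, trim the value when you read it instead.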
Are you able to use nvarchar? That way no padding is added when the value is shorter than the column length, which might be better than constantly having to trim your strings.