Create a conditional column using a specific string as a delimiter in pentaho - sql

I'm trying to create a conditional column in Pentaho by splitting on the delimiter "NF", as in the image below...
I've tried Filter rows, Split fields, and a function in the Formula step, but since a specific string is required as the delimiter, I think there must be a better way to do this. Can someone help, please?

You don't state the output you are trying to get from the column with the NF delimiter. Let's say you are trying to get two new columns, BEFORE_NF and AFTER_NF:
IMPOSTOS                               | BEFORE_NF          | AFTER_NF
PIS APURACAO S/NF 0001 TAG COMERCIO    | PIS APURACAO S/    | 0001 TAG COMERCIO
COFINS APURACAO S/NF 0002 TAG COMERCIO | COFINS APURACAO S/ | 0002 TAG COMERCIO
To get this outcome you can use the Regex Evaluation step with this regular expression to separate your column:
(.*)(NF\s)(.*)
This separates your text into 3 groups: the text before "NF ", the text "NF " itself, and the text after "NF ".
The Regex Evaluation step can also create another column with a flag that indicates whether the regex matched the text.
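To see what the three groups capture outside Pentaho, here is a minimal Python sketch of the same regex. The function name split_on_nf and the output keys are hypothetical, and it assumes the pattern has to match the whole field value, which may differ from how your step is configured.

```python
import re

# Same pattern as above: text before "NF ", the delimiter itself, text after it.
NF_PATTERN = re.compile(r"(.*)(NF\s)(.*)")

def split_on_nf(text):
    """Return the parts around "NF " plus a flag telling whether the regex matched."""
    match = NF_PATTERN.fullmatch(text)
    if match is None:
        # No "NF " followed by whitespace: mimic the "result" flag being false.
        return {"matched": False, "before_nf": None, "after_nf": None}
    return {
        "matched": True,
        "before_nf": match.group(1),
        "after_nf": match.group(3),
    }

row = split_on_nf("PIS APURACAO S/NF 0001 TAG COMERCIO")
print(row["before_nf"], "|", row["after_nf"])
```

Because the first (.*) is greedy, a value containing "NF " more than once is split at the last occurrence.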

Getting specific rows in a Powershell variable/array

I hope I'm able to ask my question as simply as possible. I am very new to working with PowerShell.
Now to my question:
I use Invoke-Sqlcmd to run a query, which puts Data in a variable, let's say $Data.
In this case I query for triggers in an SQL Database.
Then I kind of split the array to get more specific information:
$Data2 = $Data | Where {$_.table -like 'dbo.sportswear'}
$Data3 = $Data2 | Where {$_.event -match "Delete"}
So in the end I have a variable with these Indexes(?), I'm not sure if they are called indexes.
table
trigger_name
activation
event
type
status
definition
Now all I want is to check something in the definition.
So I create a $Data4 = $Data3.definition, so far so good.
But now I have a big text and I want only the content of 2-3 specific rows.
When I used something like $Data4[1] or $Data4[1..100], I realized that PowerShell treats every character as a line/row.
But when I just write $Data4, it shows me the content nicely formatted with paragraphs, new lines and so on.
Does anyone have an idea how I can get specific rows or lines of my variable?
Thank you all :)
It appears $Data4 is a formatted string. Since it is a single string, any indexed element lookups return single characters (of type System.Char). If you want indexes to return longer substrings, you will need to split your string into multiple strings somehow or come up with a more sophisticated search mechanism.
If we assume the rows you are after are actual lines separated by line feed and/or carriage return, you can just split on those newline characters and use indexes to access your lines:
# Array indexing starts at 0 for line 1. So [1] is line 2.
# Outputs lines 2,3,4
($Data4 -split '\r?\n')[1..3]
# Outputs lines 2,7,20
($Data4 -split '\r?\n')[1,6,19]
-split uses regex to match characters and performs a string split on all matches. It results in an array of substrings. \r matches a carriage return. \n matches a line feed. ? makes the preceding \r optional (zero or one occurrence), which is needed in case there are no carriage returns preceding your line feeds.
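The same idea, sketched in Python rather than PowerShell so the effect of the \r?\n pattern can be checked in isolation. The trigger text in definition is made up for illustration and mixes both kinds of line ending on purpose.

```python
import re

# Hypothetical trigger definition with mixed CRLF and LF line endings.
definition = (
    "CREATE TRIGGER trg_demo\r\n"
    "ON dbo.sportswear\r\n"
    "AFTER DELETE\n"
    "AS\r\n"
    "BEGIN\r\n"
    "END"
)

# Indexing the raw string yields single characters, as in the question.
single_char = definition[1]

# Splitting on an optional carriage return followed by a line feed yields lines.
lines = re.split(r"\r?\n", definition)

# Zero-based indexes: lines 2 to 4, then lines 2, 4 and 6.
second_to_fourth = lines[1:4]
picked = [lines[i] for i in (1, 3, 5)]

print(second_to_fourth)
print(picked)
```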

SQL functions in NetSuite saved search results - how to fix these functions?

I am trying to achieve the following in the context of NetSuite saved search results output.
1. Remove every character after the first hyphen (-) or a colon (:) including space right before either of these characters.
For example:
Input: test 123 - xyz : 123
Output: test 123 (this should also remove the space that you see right before the hyphen).
I tried the below two codes
SUBSTR({custitem123}, 0, INSTR({custitem123}, '-')-1)
SUBSTR({custitem123}, 0, INSTR({custitem123}, ':')-1)
And these work fine on their own- so I am trying to combine these in one single formula that will look for either of these and remove all characters after them -- apart from this, it should also look for any space right before the hyphen or colon and replace it with nothing. Not sure how you would achieve this.
2. Remove all non-alphabet characters & space before the alphabet characters (if any).
for e.g. Input: 1. Test XYZ
This should have Output as:
Test XYZ
I tried achieving this by using the below formula-
TRIM({class}, '[^A-Za-z ]', '')
The problem with this approach is that it fails to replace the space character before the first letter of Test. I understand this is because I told it to skip replacing space characters. What I don't know is how to tell it to replace only the space that it finds before the first alphabet character.
In short, how do I make sure the output is:
Test XYZ
And not
" Test XYZ" (with a space before Test)
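You can use regexp_substr:
regexp_substr({custitem123}, '[^-]+')
This extracts everything before the first hyphen, so for the input test 123 - xyz : 123 it returns test 123 followed by a space.
If you wrap it in trim as well, the surrounding whitespace is removed:
trim(regexp_substr({custitem123}, '[^-]+'))
This gives test 123 as the trimmed output. Note that '[^-]+' only stops at a hyphen; to stop at a colon too, the character class would need to be '[^-:]+'.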
Use RTRIM instead of TRIM to remove only the trailing whitespace, like this:
RTRIM(regexp_substr({custitem123}, '[^-]+'))
test 123 - xyz : 123 resolves to test 123
Also, thanks for asking this question; it helped me solve my own similar issue :D
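For reference, both parts of the question can be prototyped outside NetSuite. This is a minimal Python sketch, not NetSuite formula syntax; the function names are hypothetical, and the regexes would still need to be ported to REGEXP_SUBSTR / REGEXP_REPLACE and tested in a saved search.

```python
import re

def before_separator(value):
    """Part 1: keep the text before the first '-' or ':',
    dropping any whitespace that sits right before that character."""
    return re.split(r"\s*[-:]", value, maxsplit=1)[0]

def strip_leading_non_letters(value):
    """Part 2: remove everything before the first letter (digits, dots, spaces),
    while leaving spaces inside the remaining text alone."""
    return re.sub(r"^[^A-Za-z]+", "", value)

print(before_separator("test 123 - xyz : 123"))
print(strip_leading_non_letters("1. Test XYZ"))
```

The anchor ^ in the second pattern is what limits the replacement to the characters in front of the first letter, which is the piece the original attempt was missing.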

Regex to match BIN ranges

I'm trying to write a regex that matches the numbers 456725 to 456744 (Last 2 digits, 25-44), but can't seem to figure out a correct regex format. I've tried ^(4567[2-4][0-9]) but using this also matches 456745 which it shouldn't.
If you do it like ^(4567[2-4][0-9]), you are allowing any number in the range between [2-4] together with any number in the range between [0-9], which is obviously not what you wanted.
So you need to change it to something like:
^4567(?:2[5-9]|3[0-9]|4[0-4])
Explanation
^ asserts position at start of the string
4567 matches the characters 4567 literally
Non-capturing group (?:2[5-9]|3[0-9]|4[0-4])
  1st Alternative 2[5-9]
    2 matches the character 2 literally
    Match a single character present in the list [5-9]
  2nd Alternative 3[0-9]
    3 matches the character 3 literally
    Match a single character present in the list [0-9]
  3rd Alternative 4[0-4]
    4 matches the character 4 literally
    Match a single character present in the list [0-4]
You could use regex101 to learn more and read good explanations on the subject. Hope it helps.
If your variable is just an integer, it is best to compare it as such.
For the regex, though: the ^(4567 part is correct. Your issue is that [2-4] and [0-9] are independent of each other. You need to put the pieces together so that only 25-29, 30-39 and 40-44 are allowed.
This should get you on the right track:
^(4567(?:2[5-9]|3[0-9]|4[0-4]))$
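A quick way to convince yourself the alternation covers exactly 456725 to 456744 is to run it over every six-digit value starting with 4567. This is a small Python check; the variable names are made up, and it uses the anchored form with $ so only six-digit inputs match.

```python
import re

# Anchored version of the corrected pattern.
BIN_RANGE = re.compile(r"^4567(?:2[5-9]|3[0-9]|4[0-4])$")

# The pattern from the question, for comparison.
ORIGINAL = re.compile(r"^(4567[2-4][0-9])")

candidates = range(456700, 456800)
matched = [n for n in candidates if BIN_RANGE.match(str(n))]
original_matched = [n for n in candidates if ORIGINAL.match(str(n))]

print(matched[0], matched[-1], len(matched))
print(456745 in original_matched)  # the false positive from the question
```

As the second answer notes, if the value is already an integer, a plain comparison such as 456725 <= n <= 456744 does the same job without a regex.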

Escape all commas in line except first and last

I have a CSV file which I'm trying to import to a SQL Server table. The file contains lines of 3 columns each, separated by a comma. The only problem is that some of the data in the second column contains an arbitrary number of commas. For example:
1281,I enjoy hunting, fishing, and boating,smith317
I would like to escape all occurrences of commas in each line except the first and the last, such that the result of this line would be:
1281,I enjoy hunting\, fishing\, and boating,smith317
I know I will need some type of regular expression to accomplish this task, but my knowledge of regular expressions is very limited. Currently, I'm trying to use Notepad++ find/replace with regex, but I am open to other ideas.
Any help would be greatly appreciated :-)
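This can be done manually in three find/replace passes:
1. Normal find: replace every , with \, so that every comma is escaped.
2. Regex find ^(.*)(\\,) and replace it with $1, (the greedy .* runs to the last \, on the line, so this un-escapes the last comma).
3. Regex find (\\,)(.*)$ and replace it with ,$2 (the match starts at the first \, on the line, so this un-escapes the first comma).
Worked for me in Sublime Text 2.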
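If a script is an option instead of an editor, the same result can be had without regex by locating the first and last comma on each line. This is a minimal Python sketch; escape_inner_commas is a hypothetical helper, and it assumes every line has at least two commas that delimit the three columns.

```python
def escape_inner_commas(line):
    """Escape every comma except the first and the last one on the line."""
    first = line.find(",")
    last = line.rfind(",")
    if first == -1 or first == last:
        # Zero or one comma: nothing lies between two delimiters, leave as is.
        return line
    middle = line[first + 1:last].replace(",", "\\,")
    return line[:first + 1] + middle + line[last:]

sample = "1281,I enjoy hunting, fishing, and boating,smith317"
print(escape_inner_commas(sample))
```

Applied to a whole file, this would be called once per line before the import.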

SQL cannot search

In my SQL table Image, when I perform a search query
SELECT * FROM Image WHERE platename LIKE 'WDD 666'
it returns no results (searching on other columns works without problems).
All the column data was inserted by C# code. (If I enter the data manually, the search works.)
Now I suspect that the characters in WDD 666 aren't English letters. Is this possible?
In C#, the plate number is generated as a string by a Tesseract wrapper.
What should I do to search for the plate number?
Thanks in advance and sorry for my bad English.
Since your case matches, I'm going to rule out Case-sensitivity.
There may be leading or trailing blank spaces - Try this..
SELECT * FROM Image WHERE platename LIKE '%WDD 666%'
Try running this command:
SELECT '*'+plateName+'*', len(plateName)
FROM image
I suspect platename has some non-printable characters in the field.
It appears to be a CR/LF at the end of the data. You can use
UPDATE image SET plateName = replace(plateName,char(13)+char(10),'')
WHERE plateName like '%'+char(13)+char(10)+'%'
If you get a positive row count, you'll know there was CR/LF data and it was removed. If you run the select afterwards, your lengths should be 7 and 8 based on your sample data
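To reproduce the symptom in isolation, here is a small sketch using Python's built-in sqlite3 module. It is only an illustration: SQLite is not SQL Server (it concatenates with || instead of + and uses length() instead of len()), and the two sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (platename TEXT)")
# The second row simulates OCR output that kept a trailing CR/LF.
conn.executemany(
    "INSERT INTO Image (platename) VALUES (?)",
    [("WDD 666",), ("WDD 666\r\n",)],
)

def count_matches():
    """Count rows found by the exact search from the question."""
    row = conn.execute(
        "SELECT COUNT(*) FROM Image WHERE platename LIKE 'WDD 666'"
    ).fetchone()
    return row[0]

before = count_matches()

# length() exposes the two hidden characters on the second row.
lengths = [r[0] for r in conn.execute(
    "SELECT length(platename) FROM Image ORDER BY rowid")]

# Strip the CR/LF pair, touching only rows that actually contain it.
cursor = conn.execute(
    "UPDATE Image SET platename = replace(platename, char(13) || char(10), '') "
    "WHERE platename LIKE '%' || char(13) || char(10) || '%'"
)
cleaned = cursor.rowcount
after = count_matches()

print(before, lengths, cleaned, after)
```

A longer term fix is probably to trim the OCR string in the C# code before inserting it, so the stray characters never reach the table.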