Exclude data having a specific value [substring within a string] using Pentaho

I have a column "Number field" (Excel sheet). It has values as shown below.
Test_Number   Number_field
1             0011 10 00A34 PS
2             0011 10 00A34 PS
3             0010 01 00A30 PS
4             0010 01 00A30 PS
5             0010 01 00A35 PS
6             0010 01 00A35 PS
Now, from these I need to remove the rows that contain "0A34" or "0A35". How can I achieve this? I tried the "filter" option, but I cannot search for a substring in a string using it. Please help.

You can simply do this in two steps, as follows. In the Filter rows step, add conditions that match the unwanted substrings.

Use a combination of a User Defined Java Expression step with the following parameters:
Java expression: (Number_field.indexOf("0A34") != -1 || Number_field.indexOf("0A35") != -1) ? "Remove" : "Ok"
Value type: String
New field: is_row_to_remove
and a Filter rows step with these parameters:
The condition: is_row_to_remove = Remove (String)
Send 'true' data to step: Your next step
Send 'false' data to step: Dummy (do nothing) step
Flow explanation:
User Defined Java Expression: the Java expression finds "0A34" or "0A35" and marks such a row with the value Remove in a new field, is_row_to_remove.
Filter rows: the step filters the record stream according to the value of is_row_to_remove. If the value is Remove, the row continues to the Dummy step; otherwise it continues to your next step.
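Outside of Pentaho, the same flag-and-filter flow can be sketched in plain Python for clarity (the row dictionaries below are made up from the question's data; this is only an illustration, not a Pentaho step):

# Flag rows containing the substrings, then split the stream, mirroring
# the User Defined Java Expression + Filter rows combination.
rows = [
    {"Test_Number": 1, "Number_field": "0011 10 00A34 PS"},
    {"Test_Number": 3, "Number_field": "0010 01 00A30 PS"},
    {"Test_Number": 5, "Number_field": "0010 01 00A35 PS"},
]

kept, removed = [], []
for row in rows:
    is_row_to_remove = "0A34" in row["Number_field"] or "0A35" in row["Number_field"]
    (removed if is_row_to_remove else kept).append(row)

print(kept)     # continues to your next step
print(removed)  # goes to the Dummy (do nothing) step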

If you want to do that in Excel itself, you can use the formula below and filter on it to remove the records from your sheet.
Add the formula below and drag it down across all your records, then create a filter on the new formula column and remove the matching records.
=IF(OR(IFERROR( SEARCH("A34",B2), 0),IFERROR( SEARCH("A35",B2), 0)), "REMOVE", "KEEP")
See the screenshot below.
Hope this helps. If it does, please mark it as the answer.
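If you would rather handle the sheet programmatically instead, here is a rough pandas sketch of the same filter (the file name and the Number_field column name are assumptions for illustration):

import pandas as pd

# Read the sheet, drop rows whose Number_field contains 0A34 or 0A35, and write the result.
df = pd.read_excel("input.xlsx")
mask = df["Number_field"].str.contains("0A34|0A35", regex=True, na=False)
df[~mask].to_excel("filtered.xlsx", index=False)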

Related

Counting rows of a one2many field in Odoo

How can I count the rows of a one2many field and output the current row count?
This is how it should work. For example:
1 ------column1------------
2 ------column2------------
3 ------column3------------
4 ------Column4------------
etc..
I have made an automated action for this, but it does not work as intended:
The automated action refers to the model of the one2many field and is triggered when a record is created. The following Python code is executed:
for line in record.picking_id.move_line_ids_without_package:
    for rec in str(record.x_studio_position):
        record['x_studio_position'] = len(record.picking_id.move_line_ids_without_package)
What happens is the following, e.g. for 4 columns:
4 ------column1------------
4 ------column2------------
4 ------column3------------
4 ------column4------------
It will write the total number in each row instead of the current column number.
You set the position to the number of lines in the move_line_ids_without_package field, so it will be the same for all lines.
You can use enumerate to get the line sequence.
Example:
for index, line in enumerate(record.picking_id.move_line_ids_without_package):
    line['x_studio_position'] = index + 1
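As a plain-Python illustration of the numbering (the list of strings below is made up; enumerate's start argument is an alternative to adding 1):

# Number items starting at 1, as the answer does with index + 1.
lines = ["column1", "column2", "column3", "column4"]
for position, line in enumerate(lines, start=1):
    print(position, line)
# 1 column1
# 2 column2
# 3 column3
# 4 column4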

Try to check whether a column consists of 3 digits, and change the value to the first digit of the code

I'm trying to create a new column called 'team'. In the image below you can see different types of codes. The first digit of the code is the team someone is in, IF the code consists of 3 characters. E.g.: 315 = team 3, 240 = team 2, and 3300 = NULL.
In the image below you can see my data flow so far and the expression I have tried, which doesn't work.
You forgot the parentheses () in your regex.
Try:
^([0-9]{3})$
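For illustration, here is the same rule applied in Python (the regex and the sample codes come from the question; the helper function name is made up):

import re

# A 3-digit code maps to the team given by its first digit; anything else becomes None (NULL).
def team_for(code):
    match = re.match(r"^([0-9]{3})$", code)
    return int(match.group(1)[0]) if match else None

for code in ["315", "240", "3300"]:
    print(code, "->", team_for(code))
# 315 -> 3
# 240 -> 2
# 3300 -> None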

OpenRefine: Remove row if specific cell in this row is empty

The input for OpenRefine is a csv file containing data like this
phy,205.4,,,Unterwasserakustik. Sonar-Technik,,
phy,205.6,,,Lärm. Lärmbekämpfung. Schallschutz. Filter (vgl.a.san 525),,
phy,205.9,,,Sonstiges,,
,,,,,,
,,Wärme. Statistische Physik (Temperaturstrahlung s. phy 495),,,,
,220,,Gesamtgebiet,,,
I would like to remove all rows where the second column (the numeric code) is empty.
In OpenRefine I created a Facet -> Customized facets -> Facet by blank on the second column. In the facet panel on the left, I clicked true (197 false, 2 true, which is correct). Then I went to All -> Edit rows -> Remove all matching rows. Instead of removing only the two rows, OpenRefine removed 143 rows and no data is shown anymore.
What has happened? And how can I remove only the two rows with an empty second column?
It might be connected to the row counter in the All column: the first time the entry "phy" in the first column is missing, there is no row count anymore.
1. phy 205.4 ...
2. phy 205.6 ...
3. phy 205.9 ...
Wärme...
220 ...
The 220 row does not contain the "phy" column and is incorrectly ignored.
It looks like you may be operating in "record mode" as opposed to "row mode". If the facet says 2 true and you have selected true, you should only see two rows displayed on the screen when you go to do your delete. If you see more than that, try selecting row mode.

Iteration in a Spark SQL dataframe: getting the 1st row value in the first iteration, the second row value in the next iteration, and so on

Below is the query that gives the date and distance where the distance is <= 10 km:
var s=spark.sql("select date,distance from table_new where distance <=10km")
s.show()
this will give the output like
12/05/2018 | 5
13/05/2018 | 8
14/05/2018 | 18
15/05/2018 | 15
16/05/2018 | 23
---------- | --
I want to use the first row of the dataframe s and store the date value in a variable v in the first iteration.
In the next iteration it should pick the second row, and the corresponding date value should replace the old value of v.
And so on.
I think you should look at Spark window functions. You may find what you need there.
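As a rough sketch of what a window-function approach could look like in PySpark (the table and column names are taken from the question; whether row_number is the right function depends on what you ultimately do with each date):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Same query as in the question (the 10 km filter is written as a plain number here).
s = spark.sql("select date, distance from table_new where distance <= 10")

# Number the rows by date so each row can be addressed in order without a manual loop.
w = Window.orderBy("date")
numbered = s.withColumn("row_num", F.row_number().over(w))
numbered.show()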
The "bad" way to do this would be to collect the dataframe using df.collect() which would return a list of Rows which you can manually iterate over each using a loop.This is bad cause it brings all the data in your driver.
The better way would be to use foreach() :
df.foreach(lambda x: <<your code here>>)
foreach() takes a lambda function as argument which iterates over each row of the dataframe without bringing all the data in the driver.But you cant use a simple local variable v inside a lambda fuction when there is overwriting involved.you can use spark accumulators for such a case.
e.g. if I want to sum all the values in the 2nd column:
counter = sc.accumulator(0)  # PySpark accumulator (longAccumulator is the Scala/Java API)
df.foreach(lambda row: counter.add(row[1]))
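For a more complete, runnable sketch of the accumulator pattern (the sample data below is made up; the accumulated total is read back on the driver through counter.value):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

df = spark.createDataFrame(
    [("12/05/2018", 5), ("13/05/2018", 8), ("14/05/2018", 18)],
    ["date", "distance"],
)

counter = sc.accumulator(0)
df.foreach(lambda row: counter.add(row[1]))  # executed on the executors

print(counter.value)  # 31, available back on the driver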

How to display the next number

Here's the content of my DataGrid
id
1
2
3A
4
5
6A
..
...
10V1
I want to get the max number from the DataGrid. Then I want to display the next number (in this case: 11) in the textbox beside the grid.
Expected Output
id
1
2
3A
4
5
6A
..
...
10V1
11
I tried the following code:
textbox1.text = gridList.Rows(gridlist.RowCount() - 1).Cells(1).Value + 1
It works if the previous row's value is entirely numeric. However, if the value is alphanumeric, I get the following error:
Conversion from string "10V1" to type 'Double' is not valid.
Can someone help me solve this problem? I am looking for a solution in VB.Net.
You may want to look into Regex to do that (based on what I understand from your question)
Regex.Match will return the part of the string that matches the expression. In your case, you want the leading number in your string (try "^\d+" as your expression; it will find any series of digits at the beginning of the string). You can then convert the resulting string into an integer and add 1 to it.
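To illustrate the idea (sketched in Python here purely to show the logic; in VB.Net you would use Regex.Match and Integer.Parse, and the grid values below are made up):

import re

# Take the leading digits of each id, find the max, and add 1.
ids = ["1", "2", "3A", "4", "5", "6A", "10V1"]

numbers = []
for value in ids:
    match = re.match(r"^\d+", value)
    if match:
        numbers.append(int(match.group(0)))

print(max(numbers) + 1)  # 11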
Hope this helps!