Find substring in array - mule

I want to count the number of occurrences of a substring in a field in an array.
Example:
The XML below has 3 occurrences of the substring 'TXT1' in Field1.
<Level1>
<Level2>
<Field1>10000TXT1</Field1>
<Field1>TXT210000</Field1>
<Field1>10001TXT1</Field1>
<Field1>TXT30000</Field1>
<Field1>10TXT1000</Field1>
<Field1>TXT20000</Field1>
</Level2>
</Level1>
fun countOccurences(txtToSearchFor) =
// Some code that can count how many times the text 'TXT1' occurs in all the Field1 fields.
I have tried the examples below, but they don't work:
1)
trim(upper(Field1)) contains "TXT1"
2)
(((Field1) find 'TXT1') joinBy '')
Hope you can help :-)

Hi, you can use the sumBy function from the dw::core::Arrays module. It takes an array and a lambda that returns the number to add for each element. The number of times one string occurs inside another is sizeOf applied to the result of find, so summing that value over every Field1 gives the total:
%dw 2.0
output application/json
import sumBy from dw::core::Arrays
fun timesOf(value: Array<String>, txtToSearchFor: String) =
    value sumBy ((text) -> sizeOf(text find txtToSearchFor))
---
payload.Level1.Level2.*Field1 timesOf "TXT1"

I found the answer: :-)
fun countOccurences(texts) =
sizeOf (Level1.Level2.*Field1 filter ($ contains texts))
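The two answers above measure slightly different things: the first sums every occurrence of the text inside each field, while the second counts how many fields contain the text at least once. A minimal Python sketch of both (not DataWeave; the function names `times_of` and `fields_containing` are made up for illustration, and `str.count` counts non-overlapping matches) shows where they agree and where they differ:

```python
import xml.etree.ElementTree as ET

# Sample document from the question, with the closing </Level1> tag added.
XML = """<Level1>
<Level2>
<Field1>10000TXT1</Field1>
<Field1>TXT210000</Field1>
<Field1>10001TXT1</Field1>
<Field1>TXT30000</Field1>
<Field1>10TXT1000</Field1>
<Field1>TXT20000</Field1>
</Level2>
</Level1>"""


def times_of(xml_text, needle):
    """Sum the non-overlapping occurrences of needle over every Field1."""
    root = ET.fromstring(xml_text)
    return sum((field.text or "").count(needle) for field in root.iter("Field1"))


def fields_containing(xml_text, needle):
    """Count the Field1 elements that contain needle at least once."""
    root = ET.fromstring(xml_text)
    return sum(1 for field in root.iter("Field1") if needle in (field.text or ""))


print(times_of(XML, "TXT1"))           # 3
print(fields_containing(XML, "TXT1"))  # 3
```

On the sample data both give 3. They only diverge when a single field holds the text more than once, for example a field containing TXT1TXT1.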

Related

numpy/pandas - find a substring by regex and replace it by selecting a random value from a list

There is a list like the one below.
list=[1,2,3,4,5.....]
Then there's a df like below.
message
"2022-12-18 23:56:32,939 vlp=type rev=2 td=robert CIP=x.x.x.x motherBoard=A motherName=""A"" ns=nsA. npd=npd1 messageID=sfsdfdsfsdsa nu=nuA diui=8"
...
...
I use the code below to find the messageID value first and then replace it with a random value from the list, but it doesn't work:
messageID = list(map(str, messageID))
df.messageID = df.messageID.str.replace(r'\s+messageID=(.*?)\s+', np.random.choice(messageID, size=len(df)) , regex=True)
Can any expert please take a look?
Thanks.
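Use a lookbehind with re.sub in a list comprehension. Series.str.replace accepts a single replacement string or a callable, not an array with one value per row, so each row is paired with its own random value instead:
import re
import numpy as np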
zipped = zip(df.messageID, np.random.choice(messageID, size=len(df)))
df['messageID'] = [re.sub(r'(?<=messageID=)\w+', s, r) for r, s in zipped]
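As a minimal sketch of the same lookbehind idea without pandas or NumPy, the following uses plain lists and the standard `random` module. The function name `replace_message_ids` and the sample values are assumptions made for illustration, not part of the original code:

```python
import random
import re

# Matches the word characters that directly follow "messageID=".
# The lookbehind keeps "messageID=" itself out of the match.
MESSAGE_ID = re.compile(r"(?<=messageID=)\w+")


def replace_message_ids(messages, candidates, rng=random):
    """Return a copy of messages with each messageID value swapped for a random candidate."""
    result = []
    for message in messages:
        replacement = str(rng.choice(candidates))
        # A callable replacement avoids any backslash interpretation in the new value.
        result.append(MESSAGE_ID.sub(lambda match: replacement, message))
    return result


messages = [
    "2022-12-18 23:56:32,939 vlp=type rev=2 messageID=sfsdfdsfsdsa nu=nuA diui=8",
]
print(replace_message_ids(messages, [1, 2, 3, 4, 5]))
```

Each message draws its own value, which is what pairing the column with `np.random.choice(..., size=len(df))` achieves in the answer above.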

Converting zip+4 to zip python

I am looking to convert zip+4 codes into zip codes in a pandas dataframe. I want it to identify that a zip+4 code exists and keep just the first 5 digits. I effectively want to do the below code (although this doesn't work in this format):
df.replace('^(\d{5}-?\d{4})', group(1), regex=True)
The following code does the same procedure for a list; I'm looking to do the same thing in the dataframe.
import re

my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
expression = re.compile(r'^(\d{5})-?(\d{4})?$')
my_output = []
for string in my_input:
    # Keep only the 5-digit group when the pattern matches; otherwise keep the value as-is
    if m := expression.match(string):
        my_output.append(m.group(1))
    else:
        my_output.append(string)
You can use
df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)
See the regex demo.
Details:
^ - start of string
(\d{5}) - Group 1 (\1): five digits
-? - an optional -
\d{4} - any four digits
$ - end of string.
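To check the pattern outside pandas, here is a minimal sketch that applies the same regex and `\1` backreference with `re.sub` to the sample list from the question. The helper name `to_zip5` is made up for illustration:

```python
import re

# Same pattern as in the answer: five digits, optional hyphen, four digits.
ZIP_PLUS_4 = re.compile(r"^(\d{5})-?\d{4}$")


def to_zip5(value):
    """Replace a zip+4 code with its first five digits; leave anything else unchanged."""
    return ZIP_PLUS_4.sub(r"\1", value)


my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
print([to_zip5(value) for value in my_input])
# ['01234', '01234', '01234', 'A1A 1A1', 'A1A1A1']
```

Values that do not match the whole pattern, such as a plain 5-digit zip or a Canadian postal code, pass through untouched because `re.sub` only rewrites what it matches.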

Get array from 1 to number of columns of csv in nextflow

One of my processes outputs a single CSV file. I want to create a channel that holds an array from 1 to the number of columns. For example:
My output
my_out_ch.view() -> test.csv
Assume, test.csv has 11 columns. Now I want to create a channel which gives me:
1,2,3,4,5,6,7,8,9,10,11
How could I get this? I have tried with splitText operator as below without luck:
my_out_ch.splitText(by:1,limit:1)
But it only gives me the column names. There is a parameter elem, but I am not sure whether elem could give me the array, or how to use it. Any help?
You could use the splitCsv operator to parse the CSV file. Then create an intRange using the map operator. Either call collect() to emit a java.util.ArrayList or call join() to emit a string. For example:
params.input_tsv = 'test.tsv'

Channel.fromPath( params.input_tsv )
    | splitCsv( sep: '\t', limit: 1 )
    | map { (1..it.size()).join(',') }
    | view()
Results:
1,2,3,4,5,6,7,8,9,10,11
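The underlying idea (read only the first row, count its fields, build the range 1..n) can be sketched outside Nextflow as well. The following is a Python illustration, not Nextflow code; the function name `column_numbers` and the in-memory sample file are assumptions:

```python
import csv
import io


def column_numbers(handle, delimiter=","):
    """Read only the first row and return [1, 2, ..., number_of_columns]."""
    first_row = next(csv.reader(handle, delimiter=delimiter))
    return list(range(1, len(first_row) + 1))


# An in-memory stand-in for test.csv with 11 columns.
header = ",".join(f"col{i}" for i in range(1, 12))
sample = io.StringIO(header + "\n" + ",".join(["0"] * 11) + "\n")

print(",".join(str(n) for n in column_numbers(sample)))
# 1,2,3,4,5,6,7,8,9,10,11
```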

How to print lists in a specific order in Kotlin?

I'm working on a project and I have a list in Kotlin like:
val list = listOf("banana", "1","apple","3","banana","2")
and I want to print it like:
Output:
banana = 1
banana = 2
apple = 3
So every word together with its number should act like one value, and I need to print them in a specific order (the order is too random for any sort command). I'm planning on just copying the whole Xinhua dictionary here (since every Chinese character has its own Unicode code point) and making the code replace the entries like:
val list = listOf("banana丨", "1","apple丩","3","banana丨","2")
But how do I print them in that order?
P.S. Even as a Chinese speaker I don't know most of the words in the Xinhua dictionary, lol, so there are more than enough.
Assuming that you have the following input list, as shown in your question, where each word is always followed by its specific order:
val list = listOf("banana", "1","apple","3","banana","2")
You could do the following:
1. Create a data class that defines one entry in your raw input list
data class WordEntry(val word: String, val order: Int)
2. Map over your raw input list by using the windowed and map methods
val dictionary = list.windowed(2, 2).map { WordEntry(it.first(), it.last().toInt()) }
Here, windowed(2, 2) creates windows of size 2 with a step of 2, so the raw input list is consumed as consecutive, non-overlapping pairs. This only works if every word in the raw input list is immediately followed by its specific order; if that layout is ever broken, the pairs will be wrong.
3. Sort the transformed dictionary by the order property
val sortedDictionary = dictionary.sortedBy { it.order }
Edit: You can also sort by any other property. Just pass another property to the lambda expression of sortedBy (e.g. sortedBy { it.word } if you want to sort it by the word property)
4. Finally, you can print out your sorted dictionary
val outputStr = sortedDictionary.joinToString("\n") { "${it.word} = ${it.order}" }
print(outputStr)
Output:
banana = 1
banana = 2
apple = 3
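The same pair-then-sort approach can be sketched in Python for comparison (this is an illustration in a different language, not a replacement for the Kotlin code; the function name `pair_and_sort` is made up). It relies on the same assumption that every word is immediately followed by its order:

```python
def pair_and_sort(flat):
    """Pair each word with the number that follows it, then sort by that number."""
    words = flat[0::2]   # items at even positions: the words
    orders = flat[1::2]  # items at odd positions: the order values
    pairs = [(word, int(order)) for word, order in zip(words, orders)]
    return sorted(pairs, key=lambda pair: pair[1])


items = ["banana", "1", "apple", "3", "banana", "2"]
for word, order in pair_and_sort(items):
    print(f"{word} = {order}")
# banana = 1
# banana = 2
# apple = 3
```

Converting the order to an integer before sorting matters: sorting the raw strings would place "10" before "2".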

How to find correct min / max values of a list in Perl 6

New to Perl 6, trying to figure out what I'm doing wrong here. The problem is a simple checksum that takes the difference of the max and min values for each row in the CSV.
The max and min values it returns are completely wrong. For the first row in the csv, it returns the max as 71, and the min as 104, which is incorrect.
Here's the link to the repo for reference, and the link to the corresponding problem.
#!/usr/bin/env perl6
use Text::CSV;

sub checksum {
    my $sum = 0;
    my @data = csv(in => "input.csv");
    for @data -> @value {
        $sum += (max @value) - (min @value);
    }
    say $sum;
}

checksum
I assume your input contains numbers, but since CSV is a text format, the input is being read in as strings. min and max are operating based on string sorting, so max("4", "10") is 4 but max("04", "10") is 10. To solve this, you can either cast each element to Numeric (int, floating point, etc.) before you get the min/max:
@input.map(*.Numeric).max
or pass a conversion function to min and max so each element is parsed as a number as it's compared:
@input.max(*.Numeric)
The first solution is better for your case, since the second solution is an ephemeral conversion, converting internally but still returning a string. Final note: in normal code I would write +* or { +$_ } to mean "treat X as a number", but in this case I prefer being explicit: .Numeric.
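The string-versus-number effect is not specific to Perl 6. A minimal Python sketch with a made-up sample row shows the same three behaviours: comparing raw strings, converting first, and converting only for the comparison. The function name `checksum` mirrors the question, but the data is an assumption:

```python
row = ["4", "10", "104", "71"]  # values as they arrive from a CSV: all strings

# String comparison goes character by character, so "71" beats "104".
print(max(row), min(row))              # 71 10

# Converting first gives numeric results, which the checksum needs.
numbers = [int(cell) for cell in row]
print(max(numbers), min(numbers))      # 104 4

# Converting only for the comparison picks the right element but returns a string.
print(repr(max(row, key=int)))         # '104'


def checksum(rows):
    """Sum of (max - min) per row, converting every cell to int first."""
    total = 0
    for cells in rows:
        values = [int(cell) for cell in cells]
        total += max(values) - min(values)
    return total


print(checksum([row]))                 # 100
```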