Mule 4.4 DataWeave append counter to file

I am reading in a file (see below). The example file has 13 rows.
A|doe|chemistry|100|A|
B|shea|maths|90|A|
C|baba|physics|80|B|
D|doe|chemistry|100|A|
E|shea|maths|90|A|
F|baba|physics|80|B|
G|doe|chemistry|100|A|
H|shea|maths|90|A|
I|baba|physics|80|B|
J|doe|chemistry|100|A|
K|shea|maths|90|A|
L|baba|physics|80|B|
M|doe|chemistry|100|A|
Then I iterate over these rows using a For Each scope (batch size 5) and call a REST API.
Depending on the REST API response (success or failure), I write the payloads to the respective success/error files.
I have mocked the called API so that the first batch of 5 records will fail and the rest of the records will succeed.
While writing to the success/error files I am using the following transformation:
%dw 2.0
output application/csv quoteValues=true, header=false, separator="|"
---
payload
All of this works fine.
Success log file:
"F"|"baba"|"physics"|"80"|"B"
"G"|"doe"|"chemistry"|"100"|"A"
"H"|"shea"|"maths"|"90"|"A"
"I"|"baba"|"physics"|"80"|"B"
"J"|"doe"|"chemistry"|"100"|"A"
"K"|"shea"|"maths"|"90"|"A"
"L"|"baba"|"physics"|"80"|"B"
"M"|"doe"|"chemistry"|"100"|"A"
Error log file:
"A"|"doe"|"chemistry"|"100"|"A"
"B"|"shea"|"maths"|"90"|"A"
"C"|"baba"|"physics"|"80"|"B"
"D"|"doe"|"chemistry"|"100"|"A"
"E"|"shea"|"maths"|"90"|"A"
Now what I want to do is prepend the original row/line number to each row in these files, so that when this goes to production, whoever is monitoring the files can easily correlate them with the original file.
So, as an example, for the error log file (the first batch failed, which is rows 1 to 5), I want to add these numbers to each of the rows:
"1"|"A"|"doe"|"chemistry"|"100"|"A"
"2"|"B"|"shea"|"maths"|"90"|"A"
"3"|"C"|"baba"|"physics"|"80"|"B"
"4"|"D"|"doe"|"chemistry"|"100"|"A"
"5"|"E"|"shea"|"maths"|"90"|"A"
I am not sure what I should write in DataWeave to achieve this.

Inside the For Each scope, you have access to the counter vars.counter (or whatever name you've chosen, since it's configurable).
You will need to iterate over each chunk of records to add the position to each one. You can use something like:
%dw 2.0
output application/csv quoteValues=true, header=false, separator="|"
var batchSize = 5
---
payload map (
    {
        counter: batchSize * (vars.counter - 1) + ($$ + 1)
    } ++ $
)
Or if you prefer to use the update function (though note that this will add the record counter as the last column instead):
%dw 2.0
output application/csv quoteValues=true, header=false, separator="|"
var batchSize = 5
---
payload map (
    $ update {
        case .counter! -> batchSize * (vars.counter - 1) + ($$ + 1)
    }
)
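For illustration only, here is a minimal sketch (with made-up field names, not the actual file payload) of why this variant puts the counter in the last column: the ! modifier on .counter upserts the field because it does not exist yet, and the upserted key is appended after the existing ones:
%dw 2.0
output application/json
---
// Hypothetical record; "counter" is not present, so the ! modifier inserts it
{ id: "A", student: "doe", subject: "chemistry" } update {
    case .counter! -> 1
}
// Result: { "id": "A", "student": "doe", "subject": "chemistry", "counter": 1 }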
Remember to set the batchSize variable in this code to the same value you're using in the For Each scope (better still, parameterise it in one place).
Edit 1 -
Clarification: the - 1 is needed because the For Each counter starts at 1, while the + 1 is needed because the $$ index from map is zero-based.
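As a quick sanity check of that arithmetic, here is a small sketch that hardcodes vars.counter = 2 (the second batch) and replaces $$ with the index of a plain range; it only shows the numbers the formula produces, not flow code:
%dw 2.0
output application/json
var batchSize = 5
---
// Second batch: in-batch indices 0..4 map to original rows 6..10
(0 to 4) map (batchSize * (2 - 1) + ($ + 1))
// Result: [6, 7, 8, 9, 10]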

Just another workaround, to simplify things without using any external variables. The script is split into two parts: the first is for the Error group and the second is for the Success group.
%dw 2.0
output application/csv quoteValues=true, header=false, separator="|"
// Will be used for creating a counter for the Error group
var errorIdx = 1
// Will be used for creating a counter for the Success group
var successIdx = 6
---
// errorItems: the first 5 rows
(payload[0 to 4] map (items, idx) -> ({"0": idx + errorIdx} ++ items))
++
// successItems: row 6 and the remaining items
(payload[5 to -1] map (items, idx) -> ({"0": idx + successIdx} ++ items))
DataWeave Inline Variables:
errorIdx is the starting value for the error counter
successIdx is the starting value for the success counter
This will extract from index 0 to 4 element:
payload[0 to 4]
This will extract from index 5 to remaining elements:
payload[5 to -1]
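If it helps to see the range selectors in isolation, here is a minimal sketch (using a made-up rows array rather than the actual CSV payload) of how [0 to 4] and [5 to -1] split an array:
%dw 2.0
output application/json
var rows = ["A", "B", "C", "D", "E", "F", "G"]
---
{
    // First five elements (indices 0 to 4)
    firstFive: rows[0 to 4],
    // Index 5 through the last element (-1)
    remaining: rows[5 to -1]
}
// Result: { "firstFive": ["A","B","C","D","E"], "remaining": ["F","G"] }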

Related

Get array from 1 to number of columns of csv in nextflow

One of my processes outputs a single CSV file. I want to create an array channel from 1 to the number of columns. For example:
My output
my_out_ch.view() -> test.csv
Assume test.csv has 11 columns. Now I want to create a channel which gives me:
1,2,3,4,5,6,7,8,9,10,11
How could I get this? I have tried with splitText operator as below without luck:
my_out_ch.splitText(by:1,limit:1)
But it only gives me the column names. There is a parameter elem, but I am not sure whether elem could give me the array, nor how to use it. Any help?
You could use the splitCsv operator to parse the CSV file. Then create an intRange using the map operator. Either call collect() to emit a java.util.ArrayList or call join() to emit a string. For example:
params.input_tsv = 'test.tsv'
Channel.fromPath( params.input_tsv )
| splitCsv( sep: '\t', limit: 1 )
| map { (1..it.size()).join(',') }
| view()
Results:
1,2,3,4,5,6,7,8,9,10,11

Multiply all values in a %hash and return a %hash with the same structure

I have some JSON stored in a database column that looks like this:
pokeapi=# SELECT height FROM pokeapi_pokedex WHERE species = 'Ninetales';
-[ RECORD 1 ]------------------------------------------
height | {"default": {"feet": "6'07\"", "meters": 2.0}}
As part of a 'generation' algorithm I'm working on, I'd like to take this value into a %hash, multiply it by (0.9..1.1).rand (to allow for a 'natural' 10% variance in the height), and then create a new %hash with the same structure. My select-height method looks like this:
method select-height(:$species, :$form = 'default') {
    my %heights = $.data-source.get-height(:$species, :$form);
    my %height = %heights * (0.9..1.1).rand;
    say %height;
}
Which actually calls my get-height routine to get the 'average' heights (in both metric and imperial) for that species.
method get-height (:$species, :$form) {
    my $query = dbh.prepare(qq:to/STATEMENT/);
        SELECT height FROM pokeapi_pokedex WHERE species = ?;
        STATEMENT
    $query.execute($species);
    my %height = from-json($query.row);
    my %heights = self.values-or-defaults(%height, $form);
    return %heights;
}
However, I'm given the following error on execution (I assume because I'm trying to multiply the hash as a whole rather than the individual elements of the hash):
$ perl6 -I lib/ examples/height-weight.p6
{feet => 6'07", meters => 2}
Odd number of elements found where hash initializer expected:
Only saw: 1.8693857987465123e0
in method select-height at /home/kane/Projects/kawaii/p6-pokeapi/lib/Pokeapi/Pokemon/Generator.pm6 (Pokeapi::Pokemon::Generator) line 22
in block <unit> at examples/height-weight.p6 line 7
Is there an easier (and working) way of doing this without duplicating my code for each element? :)
Firstly, there is an issue with the logic of your code. Initially, you are getting a hash of values, "feet": "6'07\"", "meters": 2.0, parsed out of JSON, with meters being a number and feet being a string. Next, you are trying to multiply it by a random value... And while that will work for a number, it won't for a string. Perl 6 allomorphs actually allow you to do that: say "5" * 3 will return 15, but a feet-and-inches pattern like 6'07" is complex enough for Perl 6 to not naturally understand it.
So you likely need to convert it before processing, and convert it back afterwards.
The second thing is the exact line that leads to the error you are observing.
Consider this:
my %a = a => 5;
%a = %a * 10 => 5; # %a becomes a hash with a single value of 10 => 5
# It happens because when a Hash is used in math ops, its size is used as a value
# Thus, if you have a single value, it'll become 1 * 10, thus 10
# And for %a = a => 1, b => 2; %a * 5 will be evaluated to 10
%a = %a * 10; # error, the key is passed, but not a value
To work directly on hash values, you want to use the map method and process every pair, for example: %a .= map({ .key => .value * (0.9..1.1).rand }).
Of course, it can be golfed or written in another manner, but the main issue is resolved this way.
You've accepted @Takao's answer. That solution requires manually digging into %hash to get to the leaf hashes/lists and then applying map.
Given that your question's title mentions "return ... same structure" and the body includes what looks like a nested structure, I think it's important there's an answer providing some idiomatic solutions for automatically descending into and duplicating a nested structure:
my %hash = :a{:b{:c,:d}}
say my %new-hash = %hash».&{ (0.9 .. 1.1) .rand }
# {a => {b => {c => 1.0476391741359872, d => 0.963626602773474}}}
# Update leaf values of original `%hash` in-place:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
# Same effect:
%hash »*=» (0.9..1.1).rand;
# Same effect:
%hash.deepmap: { $_ = (0.9..1.1).rand }
Hyperops (eg ») iterate one or two data structures to get to their leaves and then apply the op being hypered:
say %hash».++ # in-place increment leaf values of `%hash` even if nested
.&{ ... } calls the closure in braces using method call syntax. Combining this with a hyperop one can write:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
Another option is .deepmap:
%hash.deepmap: { $_ = (0.9..1.1).rand }
A key difference between hyperops and deepmap is that the compiler is allowed to iterate data structures and run hyperoperations in parallel in any order whereas deepmap iteration always occurs sequentially.

Clean up code and keep null values from crashing read.csv.sql

I am using read.csv.sql to conditionally read in data (my data set is extremely large, so this was the solution I chose to filter it and reduce its size prior to reading it in). I was running into memory issues when reading in the full data and then filtering it, so it is important that I use the conditional read to bring in only the subset rather than the full data set.
Here is a small data set so my problem can be reproduced:
write.csv(iris, "iris.csv", row.names = F)
library(sqldf)
csvFile <- "iris.csv"
I am finding that the notation you have to use with read.csv.sql is extremely awkward. The following is how I am reading in the file:
# Step 1 (Assume these values are coming from UI)
spec <- 'setosa'
petwd <- 0.2
# Add quotes and make comma-separated:
spec <- toString(sprintf("'%s'", spec))
petwd <- toString(sprintf("'%s'", petwd))
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql = 'select * from file where
       "Species" in ($spec)
       and "Petal.Width" in ($petwd)',
     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
My main problem is that if any of the values above (from the UI) are null, then it won't read in the data properly, because this chunk of code is all hard coded.
I would like to change this so that Step 1 checks which values are null and does not filter on them, and then read.csv.sql filters the corresponding columns only for the non-null values.
Note: I am reusing the code from this similar question within this question.
UPDATE
I want to clear up what I am asking. This is what I am trying to do:
If a field, say spec, comes through as NA (meaning the user did not pick an input), then I want it to filter as such (defaulting to spec == EVERY SPEC):
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql = 'select * from file where
       "Petal.Width" in ($petwd)',
     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
Since spec is NA, if you try to filter/read in a file matching spec == NA, it will read in an empty data set because there are no NA values in my data, hence breaking the code and program. Hope this clears it up.
There are several problems:
some of the simplifications provided in the link in the question were not followed.
spec is a scalar so one can just use '$spec'
petwd is a numeric scalar and SQL does not require quotes around numbers so just use $petwd
the question states you want to handle empty fields but not how, so we have used csvfix to map them to -1 and also strip off quotes. (Alternately, let them through and handle it in R: empty numerics will come through as 0 and empty character fields will come through as zero-length character fields.)
you can use [...] in place of "..." in SQL
The code below worked for me in both Windows and Ubuntu Linux with the bash shell.
library(sqldf)
spec <- 'setosa'
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file where [Species] = '$spec' and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix map -smq -fv "" -tv -1'
)
Update
Regarding the update at the end of the question, it was clarified that the NA could be in spec, as opposed to being in the data being read in, and that if spec is NA then the condition involving spec should be regarded as TRUE. In that case, just expand the SQL where condition to handle that, as follows.
spec <- NA
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file
where ('$spec' == 'NA' or [Species] = '$spec') and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix echo -smq'
)
The above will return all rows for which Petal.Width is 0.2.

[KDB+/Q]: Apply list of functions over data sequentially (pipe)

In kdb+/q, how to pipe data through a sequential list of functions so that output of previous step is the input to next step?
For example:
q)t:([]sym:`a`c`b;val:1 3 2)
q)`sym xkey `sym xasc t / how to achieve the same result as this?
I presume some variation of over or / could work:
?? over (xasc;xkey)
Bonus: how to achieve the same in a way where t is piped in from the right-hand side (in the spirit of left-of-right reading of the q syntax)?
(xasc;xkey) ?? t
how to pipe data through a sequential list of functions so that output of previous step is the input to next step?
You can use the little-known composition operator. For example:
q)f:('[;])over(2+;3*;neg)
q)f 1    / same as 2+3*neg 1
-1
If you want to use the left of right syntax, you will have to define your own verb:
q).q.bonus:{(('[;])over x)y}
q)(2+;3*;neg)bonus 1
-1
Use a lambda on the left as well as the over adverb (a form of recursion).
The dot (.) form of apply is also used, to apply the function to the table and the column:
{.[y;(z;x)]}/[t;(xasc;xkey);`sym]
sym| val
---| ---
a  | 1
b  | 2
c  | 3

How to get the number of words per line in pig?

I'm trying to figure out how many words there are per line in a file in Pig. I've gotten as far as loading and splitting:
raw = load file;
words = FOREACH raw GENERATE TOKENIZE(*);
which gets me a bag of tuples, each containing a word. Then when I go to count these items:
counts = FOREACH words GENERATE COUNT(*);
I get an error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
...
Caused by: java.lang.NullPointerException
Is that because some of the lines have an empty bag? Or is there something else I'm doing wrong?
If it is a problem with an empty bag, then you can try something like this (not tested):
raw = load file;
words = FOREACH raw GENERATE TOKENIZE(*) as tokenized_words;
counts = FOREACH words GENERATE ((tokenized_words IS NULL OR IsEmpty(tokenized_words)) ? 0 : COUNT(tokenized_words)) as total_count;
Here we write an if-else (bincond) expression to check whether tokenized_words is null or empty; if so, we assign zero, otherwise the total count.
Can you try like this?
input
Hi hello how are you
this is apache pig
works
like a charm
Pigscript:
A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE TOKENIZE(line);
C = FOREACH B GENERATE COUNT($0);
DUMP C;
Output:
(5)
(4)
(1)
()
(3)