Precedence in multiple DataWeave functions - mule

I'm going through the Mule Dev 1 course and am stumped between module content and what I'm seeing in practice.
The module content states that:
"When using a series of functions, the last function in the chain is executed first."
So
flights orderBy $.price filter ($.availableSeats > 30)
would "filter then orderBy".
However, I'm seeing that this statement:
payload.flights orderBy $.price filter $.price < 500 groupBy $.destination
actually does NOT execute groupBy first. In fact, placing the groupBy anywhere else throws an error (since the schema of the output after groupBy is changed).
Any thoughts on why the module states that the last function is executed first when that clearly does not seem to be the case?
Thanks!

The precedence is the same for all of these functions (orderBy, groupBy, etc.).
So it will first do the orderBy by price, then filter by price, and finally groupBy destination.
This is the same for DW 1 (Mule 3.x) and DW 2 (Mule 4.x). The difference between these two versions of DW is that in DW 1 these were all language operators, whereas in DW 2 they are just functions called using infix notation. This means that you can write the same expression using prefix notation:
filter(
orderBy(flights, (value, index) -> value.price),
(value, index) -> value.availableSeats > 30)
Just to show you this is the AST of this expression.
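As a rough illustration of the left-to-right evaluation described in this answer, here is a minimal Python sketch that models the chain orderBy, then filter, then groupBy. It is not DataWeave: the helper names (order_by, filter_items, group_by) and the sample flights are made up for the example.

```python
# Hypothetical sample data; the field names follow the question above.
flights = [
    {"destination": "SFO", "price": 400},
    {"destination": "LAX", "price": 750},
    {"destination": "SFO", "price": 300},
    {"destination": "CLE", "price": 450},
]

def order_by(items, key):
    """Model of orderBy: return the items sorted by the given key."""
    return sorted(items, key=key)

def filter_items(items, predicate):
    """Model of filter: keep only the items that satisfy the predicate."""
    return [item for item in items if predicate(item)]

def group_by(items, key):
    """Model of groupBy: the output is a mapping, no longer a list."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups

# Prefix form of: flights orderBy price filter (price < 500) groupBy destination.
# The innermost call (order_by) runs first and the outermost (group_by) runs last.
result = group_by(
    filter_items(order_by(flights, lambda f: f["price"]), lambda f: f["price"] < 500),
    lambda f: f["destination"],
)
```

Because group_by returns a mapping rather than a list, moving it earlier in the chain would hand the wrong shape to the functions after it, which matches the error the question describes.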

Related

DataWeave and Case Sensitivity

Can I turn off case sensitivity in DataWeave?
Two different requests are returning responses where the first contains a node called CDATA while the other contains a node called CData. In DataWeave is there a way to treat these as equal or do I need to have separate code statements such as payload.Data.CDATA and payload.Data.CData? If things were case insensitive I could have a single statement such as payload.data.cdata.
Thanks in advance,
Terry
It appears that I need two different statements.
payload.Data.*CDATA map $.@SeqId when payload.Data? and payload.Data.CDATA? and payload.Data.CDATA.@SeqId?
payload.Data.*CData map $.@SeqId when payload.Data? and payload.Data.CData? and payload.Data.CData.@SeqId?
No, but you can create a function like the following to select a key while ignoring case.
It filters an object by the given key (mapObject, comparing keys using lower) and then gets the values from the resulting object (with pluck).
%function selectIgnoreCase(obj, keyName)
obj mapObject ((v, k) -> k match {
x when (lower x) == keyName -> {(k): v},
default -> {}
}) pluck $
And you'd use it like this:
selectIgnoreCase(payload.Data, "cdata")
Note: With Mule 4 (and DW 2) syntax for this would be a little bit better.
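For readers who want to check the logic outside Mule, the same idea can be sketched in Python. This is only a model of the approach, not DataWeave. The name select_ignore_case and the sample objects are hypothetical, and unlike the DataWeave function above it also lower-cases the requested key.

```python
def select_ignore_case(obj, key_name):
    """Return the values of every key in obj that matches key_name, ignoring case."""
    wanted = key_name.lower()
    return [value for key, value in obj.items() if key.lower() == wanted]

# Two hypothetical responses that differ only in the casing of the node name.
first_response = {"CDATA": {"SeqId": "1"}, "Other": "x"}
second_response = {"CData": {"SeqId": "2"}, "Other": "y"}

first_match = select_ignore_case(first_response, "cdata")
second_match = select_ignore_case(second_response, "cdata")
```

Like the pluck-based version, this returns a list, so a caller gets every matching key rather than a single value.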

ERROR: function regexp_matches(jsonb, unknown) does not exist in Tableau but works elsewhere

I have a column called "Bakery Activity" whose values are all JSONs that look like this:
{"flavors": [
{"d4js95-1cc5-4asn-asb48-1a781aa83": "chocolate"},
{"dc45n-jnsa9i-83ysg-81d4d7fae": "peanutButter"}],
"degreesToCook": 375,
"ingredients": {
"d4js95-1cc5-4asn-asb48-1a781aa83": [
"1nemw49-b9s88e-4750-bty0-bei8smr1eb",
"98h9nd8-3mo3-baef-2fe682n48d29"]
},
"numOfPiesBaked": 1,
"numberOfSlicesCreated": 6
}
I'm trying to extract the number of pies baked with a regex function in Tableau. Specifically, this one:
REGEXP_EXTRACT([Bakery Activity], '"numOfPiesBaked":"?([^\n,}]*)')
However, when I try to throw this calculated field into my text table, I get an error saying:
ERROR: function regexp_matches(jsonb, unknown) does not exist;
Error while executing the query
Worth noting is that my data source is PostgreSQL, which Tableau regex functions support; not all of my entries have numOfPiesBaked in them; when I run this in a simulator I get the correct extraction (actually, I get "numOfPiesBaked": 1" but removing the field name is a problem for another time).
What might be causing this error?
In short: Wrong data type, wrong function, wrong approach.
REGEXP_EXTRACT is obviously an abstraction layer of your client (Tableau), which is translated to regexp_matches() for Postgres. But that function expects text input. Since there is no assignment cast for jsonb -> text (for good reasons) you have to add an explicit cast to make it work, like:
SELECT regexp_matches("Bakery Activity"::text, '"numOfPiesBaked":"?([^\n,}]*)')
(The second argument can be an untyped string literal; Postgres function type resolution can derive the suitable data type text.)
Modern versions of Postgres also have regexp_match() returning a single row (unlike regexp_matches), which would seem like the better translation.
But regular expressions are the wrong approach to begin with.
Use the simple json/jsonb operator ->>:
SELECT "Bakery Activity"->>'numOfPiesBaked';
Returns '1' in your example.
If you know the value to be a valid integer, you can cast it right away:
SELECT ("Bakery Activity"->>'numOfPiesBaked')::int;
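To see why reading the key directly is simpler than pattern matching, here is a small Python sketch that parses the sample document from the question and looks up the key. It only models the idea behind the ->> operator and is not how Postgres implements it. The function name pies_baked is made up for the example.

```python
import json

# Sample document copied from the question above.
bakery_activity = """
{"flavors": [
  {"d4js95-1cc5-4asn-asb48-1a781aa83": "chocolate"},
  {"dc45n-jnsa9i-83ysg-81d4d7fae": "peanutButter"}],
 "degreesToCook": 375,
 "ingredients": {
  "d4js95-1cc5-4asn-asb48-1a781aa83": [
   "1nemw49-b9s88e-4750-bty0-bei8smr1eb",
   "98h9nd8-3mo3-baef-2fe682n48d29"]
 },
 "numOfPiesBaked": 1,
 "numberOfSlicesCreated": 6
}
"""

def pies_baked(raw_json):
    """Return numOfPiesBaked as an int, or None when the key is absent."""
    value = json.loads(raw_json).get("numOfPiesBaked")
    return None if value is None else int(value)

pies = pies_baked(bakery_activity)
# Rows without the key yield None, much as ->> yields NULL for a missing key.
missing = pies_baked('{"degreesToCook": 375}')
```

A key lookup does not depend on whitespace, key order, or quoting in the stored document, which is where a regular expression tends to break.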
I found an easier way to handle JSONB data in Tableau.
First, make a calculated field from the JSONB field and convert it to a string using the str([FIELD_name]) function.
Then, on top of that calculated field, make another calculated field that uses this function:
REGEXP_EXTRACT([String_Field_Name], '"Key_to_be_extracted":"?([^\n,}]*)')
The required key-value pair will form the second calculated field.

Difference between sequential and combined predicates

In Selenium I have written two XPath expressions, and both of them retrieve the same result.
//a[@role='tab'][text()=' Assets']
//a[@role='tab' and text()=' Assets']
Do both of them have the same meaning?
In most cases a[b][c] has exactly the same effect as a[b and c]. There are two exceptions to be aware of:
They are not equivalent if either predicate is numeric, or has a dependency on position() or last() (I call these positional predicates). For example a[@x][1] selects the first a element that has an @x attribute, while a[1][@x] selects the first a element provided it has an @x attribute (and selects nothing otherwise). By contrast a[1 and @x] converts the integer 1 to the boolean true(), so it just means a[@x].
There may be differences in behaviour if evaluation of b or c fails with a dynamic error. The precise rules here depend on which version of XPath you are using, and to be honest the rules leave implementations some leeway, but you need to exercise care if you want to be sure that in the event of b being false, c is not evaluated. (This hardly matters in XPath 1.0 because very few expressions throw dynamic errors.)
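The positional-predicate exception can be modelled with plain Python lists. This is a sketch of the semantics described above, not an XPath engine. The sample elements and the helper has_x are hypothetical.

```python
# Three sibling "a" elements in document order; only the last two carry an x attribute.
elements = [
    {"name": "first", "attrs": {}},
    {"name": "second", "attrs": {"x": "1"}},
    {"name": "third", "attrs": {"x": "2"}},
]

def has_x(element):
    """Model of the predicate [@x]: true when the element has an x attribute."""
    return "x" in element["attrs"]

# a[@x][1]: filter by @x first, then take the first of the survivors.
filter_then_first = [e for e in elements if has_x(e)][:1]

# a[1][@x]: take the first element, then keep it only if it has @x.
first_then_filter = [e for e in elements[:1] if has_x(e)]

# a[1 and @x]: the number 1 becomes true(), so this is the same as a[@x].
boolean_and = [e for e in elements if True and has_x(e)]

names_filter_then_first = [e["name"] for e in filter_then_first]
names_first_then_filter = [e["name"] for e in first_then_filter]
names_boolean_and = [e["name"] for e in boolean_and]
```

With this data the three forms select one element, no elements, and two elements respectively, so the order of the predicates matters as soon as one of them is positional.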
When you add square brackets ([]) to an XPath expression you are adding a condition, so
the first expression adds two conditions,
which produces the same result as combining the conditions with and.
Normally you wouldn't use the first form, because it is less readable,
mainly because in other languages this syntax represents a matrix:
// return a random m-by-n matrix with values between 0 and 1
public static double[][] random(int m, int n) {
See tutorial:
5 XPaths with predicates
A predicate is an expression that can be true or false
It is appended within [...] to a given location path and will refine results
More than one predicate can be appended to and within (!) a location path
The first one chains two predicates: it checks whether a[@role='tab'] is true, and then it proceeds to [text()=' Assets'].
The second one just uses an and operator, so it validates that both conditions are true.

UDF in Spark SQL DSL

I am trying to use the DSL instead of pure SQL in Spark SQL jobs, but I cannot get my UDF to work.
sqlContext.udf.register("subdate",(dateTime: Long)=>dateTime.toString.dropRight(6))
This doesn't work
rdd1.toDF.join(rdd2.toDF).where("subdate(rdd1(date_time)) === subdate(rdd2(dateTime))")
I would also like to add another join condition, as in this working pure SQL:
val results=sqlContext.sql("select * from rdd1 join rdd2 on rdd1.id=rdd2.id and subdate(rdd1.date_time)=subdate(rdd2.dateTime)")
Thanks for your help
The SQL expression you pass to the where method is incorrect for at least a few reasons:
=== is a Column method, not a valid SQL equality operator. You should use a single equals sign (=).
Bracket notation (table(column)) is not a valid way to reference columns in SQL. In this context it will be recognized as a function call. SQL uses dot notation (table.column).
Even if it were valid, neither rdd1 nor rdd2 is a valid table alias.
Since it looks like the column names are unambiguous, you could simply use the following code:
df1.join(df2).where("subdate(date_time) = subdate(dateTime)")
If that weren't the case, dot syntax wouldn't work without providing aliases first. See for example Usage of spark DataFrame "as" method
Moreover, registering UDFs makes sense mostly when you use raw SQL all the way. If you want to use the DataFrame API, it is better to use the UDF directly:
import org.apache.spark.sql.functions.udf
val subdate = udf((dateTime: Long) => dateTime.toString.dropRight(6))
val df1 = rdd1.toDF
val df2 = rdd2.toDF
df1.join(df2, subdate($"date_time") === subdate($"dateTime"))
or if column names were ambiguous:
df1.join(df2, subdate(df1("date_time")) === subdate(df2("date_time")))
Finally for simple functions like this it is better to compose built-in expressions than create UDFs.
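As a hedged illustration of that last point: assuming date_time is a non-negative integer with more than six digits (for example, a timestamp in microseconds), dropping the last six characters of its string form gives the same digits as integer division by 1,000,000. In Spark that division would be written with built-in column expressions instead of a UDF. The Python sketch below only checks the arithmetic, and its function names and sample value are made up.

```python
def subdate_string(date_time: int) -> str:
    """Mirror of the Scala UDF: drop the last six characters of the decimal string."""
    return str(date_time)[:-6]

def subdate_arithmetic(date_time: int) -> int:
    """Arithmetic equivalent, valid under the assumption stated above."""
    return date_time // 1_000_000

# Hypothetical sample value with thirteen digits.
sample = 1457000123456
as_string = subdate_string(sample)
as_number = subdate_arithmetic(sample)
```

The two differ at the edges: for values with six digits or fewer the string version returns an empty string while the arithmetic version returns 0, so the equivalence holds only under the stated assumption.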

Are these two statements equivalent? (from NerdDinner tutorial)

Basically I want to know if the two statements below are ultimately exactly the same. The NerdDinner tutorial states that IQueryable<> objects won't query the database until we attempt to access/iterate over the data or call ToList on it. So aside from returning the same exact items do the two statements below also PERFORM the same as far as querying the database is concerned? If I had a million records, would one be better than the other?
I have the following statement:
return from party in entities.Parties
where party.PartyDate > DateTime.Now
orderby party.PartyDate
select party;
is that the same as:
return entities.Parties.Where(p => p.PartyDate > DateTime.Now);
They will perform exactly the same, yes - aside from the ordering. With a very slight change, they will compile into exactly the same code:
// Query expression
return from party in entities.Parties
where party.PartyDate > DateTime.Now
orderby party.PartyDate
select party;
// Extension method syntax
return entities.Parties
.Where(party => party.PartyDate > DateTime.Now)
.OrderBy(party => party.PartyDate);
Note that as well as adding the OrderBy, I've also changed the name of the lambda expression parameter to match the range variable in the query expression.
Effectively the compiler transforms the first block into the second block before applying all the normal compilation steps. You can think of query expression support as being a bit like a preprocessor step.
I wrote about this in more detail in my Edulinq blog series, in Part 41: How Query Expressions Work.
YES. They are both the same, only the first one orders the results whereas the second one does no ordering.
The first statement converted to lambda form would look like this:
return entities.Parties
.Where(p => p.PartyDate > DateTime.Now)
.OrderBy(p => p.PartyDate)
.Select(p => p);
As far as performance is concerned, the query comprehension syntax is converted to the lambda syntax by the compiler; it's simply syntactic sugar. They perform equally because they're compiled to the same expression.
Yes, except that the top one orders the results. Apart from that, performance is pretty much on a par.
The first one is LINQ query syntax; the second one is method syntax with a lambda expression.