Compare a subset array with a master array in Mule 4 (DataWeave 2.0)

I have a fixed master array: ['a', 'b', 'c', 'd']. This will be used as the base when comparing input arrays (which may be subsets of the master array).
I receive input arrays in various combinations that should satisfy the following scenarios:
['a', 'c'] should return true — the input can be a subset of the master set
['a', 'b', 'd', 'c'] should return true — there are no order restrictions, and the input can equal the master set
['a', 'b', 'c', 'd', 'e'] should return false — the input can't contain additional elements
['e', 'f'] should return false — no matching elements found
and finally:
['a'] should return true — the input can be a subset containing a single element, but that single element must always be 'a'
['b','c','d'] should return false — every input array must contain at least the element 'a'

So what you need to do is basically check that the input contains the master's first element ('a') and that all of its elements are present in the master array.
%dw 2.0
output application/json
import * from dw::core::Arrays
var test = ['a', 'b', 'c', 'd']
var value = ['a']
---
(value contains test[0]) and (value every ((item) -> test contains item))

%dw 2.0
output application/json
var mainset = ['a', 'b', 'c', 'd']
var subset = ['a', 'c']
---
{
    isSubset: isEmpty(subset -- mainset) and contains(subset, 'a')
}
Here subset -- mainset removes from subset every element that appears in mainset, so an empty result means all elements of subset exist in mainset; the contains check then enforces the mandatory 'a'.

Related

Pandas: select multiple rows or default with new API

I need to retrieve multiple rows (labels may be repeated) and, if a label does not exist in the index, get a default value. An example with a Series:
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
labels = ['a', 'd', 'f']
result = s.loc[labels]
result = result.fillna(my_default_value)
Now I'm using a DataFrame; the equivalent with names is:
df = pd.DataFrame({
    "Person": {
        "name_1": "Genarito",
        "name_2": "Donald Trump",
        "name_3": "Joe Biden",
        "name_4": "Pablo Escobar",
        "name_5": "Dalai Lama"
    }
})
default_value = 'No name'
names_to_retrieve = ['name_1', 'name_2', 'name_8', 'name_3']
result = df.loc[names_to_retrieve]
result = result.fillna(default_value)
Both examples throw a warning saying:
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.
The documentation for this deprecation says you should use .reindex(), but it also says that it won't work with duplicates...
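For reference, a minimal sketch reproducing that reindex limitation (the exact error message varies across pandas versions):
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
# Raises because the index contains duplicate labels, e.g.
# ValueError: cannot reindex on an axis with duplicate labels
s.reindex(['a', 'd', 'f'])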
Is there any way to work without warnings and duplicated indexes?
Thanks in advance
Let's try merge:
result = (pd.DataFrame({'label': labels})
            .merge(s.to_frame(name='x'), left_on='label',
                   right_index=True, how='left')
            .set_index('label')['x'])
Output:
label
a 0.0
a 1.0
d NaN
f NaN
Name: x, dtype: float64
How about:
on_values = s.loc[s.index.intersection(labels).unique()]
off_values = pd.Series(default_value, index=pd.Index(labels).difference(s.index))
result = pd.concat([on_values, off_values])
Check isin with append
out = s[s.index.isin(labels)]
out = out.append(pd.Series(index=list(set(labels) - set(s.index)), dtype='float').fillna(0))
out
Out[341]:
a 0.0
a 1.0
d 0.0
f 0.0
dtype: float64
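Note that Series.append was deprecated and then removed in pandas 2.0, so on recent versions the same idea needs pd.concat; a minimal sketch (filling with 0 like the original, swap in default_value as needed):
out = s[s.index.isin(labels)]
missing = pd.Series(0.0, index=list(set(labels) - set(s.index)))
out = pd.concat([out, missing])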
You can write a simple function to handle the labels present in the index and the labels missing from it separately, then join the two results. When in_order is True, the function ensures that if you specify labels = ['d', 'a', 'f'], the output is ordered ['d', 'a', 'f'].
def reindex_with_dup(s, labels, fill_value=np.nan, in_order=True):
    # s can be a pd.Series or a pd.DataFrame
    labels = pd.Series(labels)
    s1 = s.loc[labels[labels.isin(s.index)]]
    if isinstance(s, pd.Series):
        s2 = pd.Series(fill_value, index=labels[~labels.isin(s.index)])
    if isinstance(s, pd.DataFrame):
        s2 = pd.DataFrame(fill_value, index=labels[~labels.isin(s.index)],
                          columns=s.columns)
    s = pd.concat([s1, s2])
    if in_order:
        s = s.loc[labels.drop_duplicates()]
    return s
reindex_with_dup(s, ['d', 'a', 'f'], fill_value='foo')
#d foo
#a 0
#a 1
#f foo
#dtype: object
This retains the .loc behavior that if your index is duplicated and your labels are duplicated it duplicates the selection:
reindex_with_dup(s, ['d', 'a', 'a', 'f', 'f'], fill_value='foo')
#d foo
#a 0
#a 1
#a 0
#a 1
#f foo
#f foo
#dtype: object

Capitalize the first element of a list when unpacking

I'm unable to capitalize the first element of the list when printing.
let = ['a', 'b', 'c', 'd', 'e']
count = 5
for x in range(5):
    print(*let[0:count])
    count -= 1
In this example I don't know how to make 'a' print as 'A'.
You can't change the print call itself, but you can change your list so that the first element is capitalized:
let[0] = let[0].upper()
If for some reason you can't modify the initial list, make a copy with let2 = let.copy() and work on that (a plain let2 = let would only create a second reference to the same list).
I would probably write it this way:
let = ['a', 'b', 'c', 'd', 'e']
for i in range(5, 0, -1):
    print(let[0].capitalize(), *let[1:i])

New dataframe creation within loop and append of the results to the existing dataframe

I am trying to create conditional subsets of rows and columns from a DataFrame and append them to existing dataframes that match the structure of the subsets. Each new subset would be stored in one of the smaller dataframes, and the names of these smaller dataframes would need to be dynamic. Below is an example.
# Sample data
df = pd.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [4,5,6,4,3,4,6], 'c': [1,2,2,4,2,1,7],
                   'd': [4,4,2,2,3,5,6], 'e': [1,3,3,4,2,1,7], 'f': [1,1,2,2,1,5,6]})

# Function to apply to create the subsets of data - I would need to apply a
# function like this to many combinations of columns
def f1(df, input_col1, input_col2):
    # Subset of rows
    t = df[df[input_col1] >= 3]
    # Subset of columns
    t = t[[input_col1, input_col2]]
    t = t.sort_values([input_col1], ascending=False)
    return t
# I want to create 3 different dataframes t1, t2, and t3, but I would like to
# create them in the loop - not via individual function calls.
# These individual calls are just examples of what I am trying to achieve via the loop:
# t1 = f1(df, 'a', 'b')
# t2 = f1(df, 'c', 'd')
# t3 = f1(df, 'e', 'f')
# These are empty dataframes to which I would like to append the resulting
# subsets of data
column_names = ['col1', 'col2']
g1 = pd.DataFrame(np.empty(0, dtype=[('col1', 'f8'), ('col2', 'f8')]))
g2 = pd.DataFrame(np.empty(0, dtype=[('col1', 'f8'), ('col2', 'f8')]))
g3 = pd.DataFrame(np.empty(0, dtype=[('col1', 'f8'), ('col2', 'f8')]))
list1 = ['a', 'c', 'e']
list2 = ['b', 'd', 'f']
t = {}
g = {}
# This is what I want in the end - I would like to call the function inside of
# the loop, create new dataframes dynamically and then append them to the
# existing dataframes, but I am getting errors. Is it possible to do?
for c in range(1, 4, 1):
    for i, j in zip(list1, list2):
        t['t' + str(c)] = f1(df, i, j)
        g['g' + str(c)] = g['g' + str(c)].append(t['t' + str(c)], ignore_index=True)
I guess you want to create t1, t2, and t3 dynamically.
You can use globals().
g1 = pd.DataFrame(np.empty(0, dtype=[('a', 'f8'), ('b', 'f8')]))
g2 = pd.DataFrame(np.empty(0, dtype=[('c', 'f8'), ('d', 'f8')]))
g3 = pd.DataFrame(np.empty(0, dtype=[('e', 'f8'), ('f', 'f8')]))
list1 = ['a', 'c', 'e']
list2 = ['b', 'd', 'f']
for c in range(1, 4, 1):
    globals()['t' + str(c)] = f1(df, list1[c-1], list2[c-1])
    globals()['g' + str(c)] = globals()['g' + str(c)].append(globals()['t' + str(c)])
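As a side note, globals() is easy to get wrong; a sketch of the same loop using the t and g dictionaries the question already defines (and pd.concat, since DataFrame.append was removed in pandas 2.0):
t = {}
g = {}
for c, (i, j) in enumerate(zip(list1, list2), start=1):
    t['t' + str(c)] = f1(df, i, j)
    # Start from an empty frame on the first pass, then append the new subset
    g['g' + str(c)] = pd.concat([g.get('g' + str(c), pd.DataFrame()), t['t' + str(c)]],
                                ignore_index=True)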

tensorflow string_split on batch data

From the official TensorFlow doc, it says:
For example: N = 2, source[0] is 'hello world' and source[1] is 'a b c', then the output will be:
st.indices = [0, 0;
              0, 1;
              1, 0;
              1, 1;
              1, 2]
st.shape = [2, 3]
st.values = ['hello', 'world', 'a', 'b', 'c']
What if I want something like [['hello', 'world'], ['a','b','c']], how can I get this?
Thanks.
Use tf.map_fn to map your batch onto the function tf.string_split.
https://www.tensorflow.org/api_docs/python/tf/map_fn
The map function will split your batch along the first dimension (your batch size, N as referenced by the documentation in your question) and pass each sample to tf.string_split individually, which returns ['hello', 'world'] and ['a', 'b', 'c'] respectively. The map function will then recombine the individual results into an array, which gives [['hello', 'world'], ['a', 'b', 'c']] as desired.
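As a side note, in TF 2.x you can get the nested structure directly: tf.strings.split applied to a batch of strings returns a RaggedTensor, so no map_fn is needed. A minimal sketch:
import tensorflow as tf

source = tf.constant(['hello world', 'a b c'])
# Rows can have different numbers of tokens, hence the ragged result
tokens = tf.strings.split(source)
print(tokens.to_list())  # [[b'hello', b'world'], [b'a', b'b', b'c']]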

Selecting data from a dataframe based on a tuple

Suppose I have the following dataframe
from pandas import DataFrame

df = DataFrame({'vals': [1, 2, 3, 4],
                'ids': ['a', 'b', 'a', 'n']})
I want to select all the rows whose (vals, ids) pairs are in the list
[(1, 'a'), (3, 'f')]
I have tried using boolean indexing like so
to_search = {'vals': [1, 3],
             'ids': ['a', 'f']}
df.isin(to_search)
I expect only the first row to match, but I get the first and the third rows:
ids vals
0 True True
1 True False
2 True True
3 False False
Is there any way to match exactly the values at a particular index instead of matching any value?
You might create a DataFrame for what you want to match, and then merge it:
In [32]: df2 = DataFrame([[1,'a'],[3,'f']], columns=['vals', 'ids'])
In [33]: df.merge(df2)
Out[33]:
ids vals
0 a 1
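An alternative sketch, if you'd rather keep boolean indexing: build one (vals, ids) tuple per row and test membership on the whole tuple (isin over-matches because it tests each column independently):
import pandas as pd

pairs = [(1, 'a'), (3, 'f')]
# One tuple per row; isin then checks whole-row membership
mask = pd.Series(list(zip(df['vals'], df['ids']))).isin(pairs)
df[mask.values]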