Given a conditional expression cE = ConditionalExpression[ Value, Condition ] how can I extract the Condition of cE?
I tried indexing, but that didn't help.
Maybe it is interesting to give a cleaner version that you can use in more complex situations.
Consider ConditionalExpression official documentation example
In[]:= ce = Integrate[x^n, {x, 0, 1}]
with the following output:
1
Out[]= ConditionalExpression[-----, Re[n] > -1]
1 + n
To extract the condition Re[n] > -1 you can use:
In[]:= FirstCase[ce, ConditionalExpression[_, c_] :> c, Missing[], {0,-1}]
which prints:
Out[]= Re[n] > -1
In your comment you mentionned a nested expression, with the previous approach this will work too. For instance:
In[]:= FirstCase[{{5, 6, ce, 1}}, ConditionalExpression[_, c_] :> c, Missing[], {0,-1}]
still returns
Out[]= Re[n] > -1
If the pattern is not found, the command gently returns Missing[]. For instance with Sin[6]:
In[]:= FirstCase[Sin[6], ConditionalExpression[_, c_] :> c, Missing[], {0,-1}]
the output is:
Out[]= Missing[]
Related
I have a particular use case for which I have not found the solution in the Snakemake documentation.
Let's say in a given pipeline I have a portion with 3 rules a, b and c which will run for N samples.
Those rules handle large amount of data and for reasons of local storage limits I do not want those rules to execute at the same time. For instance rule a produces the large amount of data then rule c compresses and export the results.
So what I am looking for is a way to chain those 3 rules for 1 sample/wildcard, and only then execute those 3 rules for the next sample. All of this to make sure the local space is available.
Thanks
I agree that this is problem that Snakemake still has no solution for. However you may have a workaround.
rule all:
input: expand("a{sample}", sample=[1, 2, 3])
rule a:
input: "b{sample}"
output: "a{sample}"
rule b:
input: "c{sample}"
output: "b{sample}"
rule c:
input:
lambda wildcards: f"a{wildcards.sample-1}"
output: "c{sample}"
That means that the rule c for sample 2 wouldn't start before the output for rule a for sample 1 is ready. You need to add a pseudo output a0 though or make the lambda more complicated.
So building on Dmitry Kuzminov's answer, the following can work (both with numbers as samples and strings).
The execution order will be a3 > b3 > a1 > b1 > a2 > b2.
I used a different sample order to show it can be made different from the sample list.
samples = [1, 2, 3]
sample_order = [3, 1, 2]
def get_previous(wildcards):
if wildcards.sample != sample_order[0]: # if different from a3 in this case
previous_sample = sample_order[sample_order.index(wildcards.sample) - 1]
return f'b_out_{previous_sample}'
else: # if is the first sample in the order i.e. a3
return #here put dummy file always present e.g. the file containing those rules or the Snakemake
rule all:
expand("b_out_{S}", S=sample)
rule a:
input:
"a_in_{sample}",
get_previous
output:
"a_out_{sample}"
rule b:
input:
"a_out_{sample}"
output:
"b_out_{sample}"
I am trying to construct a complex query in Django by adding Q objects based on a list of user inputs:
from django.db.models import Q
q = Q()
expressions = [
{'operator': 'or', 'field': 'f1', 'value': 1},
{'operator': 'or', 'field': 'f2', 'value': 2},
{'operator': 'and', 'field': 'f3', 'value': 3},
{'operator': 'or', 'field': 'f4', 'value': 4},
]
for item in expressions:
if item['operator'] == 'and':
q.add(Q(**{item['field']:item['value']}), Q.AND )
elif item['operator'] == 'or':
q.add(Q(**{item['field']:item['value']}), Q.OR )
Based on this I am expecting to get a query with the following where condition:
f1 = 1 or f2 = 2 and f3 = 3 or f4 = 4
which, based on the default operator precedence will be executed as
f1 = 1 or (f2 = 2 and f3 = 3) or f4 = 4
however, I am getting the following query:
((f1 = 1 or f2 = 2) and f3 = 3) or f4 = 4
It looks like the Q() object forces the conditions to be evaluated in the order they were added.
Is there a way that I can keep the default SQL precedence? Basically I want to tell the ORM not to add parenthesis in my conditions.
Seems that you are not the only one with a similar problem. (edited due to #hynekcer 's comment)
A workaround would be to "parse" the incoming parameters into a list of Q() objects and create your query from that list:
from operator import or_
from django.db.models import Q
query_list = []
for item in expressions:
if item['operator'] == 'and' and query_list:
# query_list must have at least one item for this to work
query_list[-1] = query_list[-1] & Q(**{item['field']:item['value']})
elif item['operator'] == 'or':
query_list.append(Q(**{item['field']:item['value']}))
else:
# If you find yourself here, something went wrong...
Now the query_list contains the individual queries as Q() or the Q() AND Q() relationships between them.
The list can be reduce()d with the or_ operator to create the remaining OR relationships and used in a filter(), get() etc. query:
MyModel.objects.filter(reduce(or_, query_list))
PS: Although Kevin's answer is clever, using eval() is considered a bad practice and should be avoided.
Since SQL precedence is the same as Python precedence when it comes to AND, OR, and NOT, you should be able to achieve what you want by letting Python parse the expression.
One quick-and-dirty way to do it would be to construct the expression as a string and let Python eval() it.
from functools import reduce
ops = ["&" if item["operator"] == "and" else "|" for item in expressions]
qs = [Q(**{item["field"]: item["value"]}) for item in expressions]
q_string = reduce(
lambda acc, index: acc + " {op} qs[{index}]".format(op=ops[index], index=index),
range(len(expressions)),
"Q()"
) # equals "Q() | qs[0] | qs[1] & qs[2] | qs[3]"
q_expression = eval(q_string)
Python will parse this expression according to its own operator precedence, and the resulting SQL clause will match your expectations:
f1 = 1 or (f2 = 2 and f3 = 3) or f4 = 4
Of course, using eval() with user-supplied strings would be a major security risk, so here I'm constructing the Q objects separately (in the same way you did) and just referring to them in the eval string. So I don't think there are any additional security implications of using eval() here.
I am given a set of substrings. I need to find the count of occurrence of all those substrings in a particular column in a dataframe. The relevant datframe would look like this
training['concat']
0 svAxu$paxArWAn
1 xvAxaSa$varRANi
2 AxAna$xurbale
3 go$BakwAH
4 viXi$Bexena
5 nIwi$kuSalaM
6 lafkA$upamam
7 yaSas$lipsoH
8 kaSa$AGAwam
9 hewumaw$uwwaram
10 varRa$pUgAn
My set of substrings is a dictionary, where the keys are the substrings and values are the probabilities with which they occur
reg = {'anuBavAn':0.35, 'a$piwra':0.2 ...... 'piwra':0.7, 'pa':0.03, 'a':0.0005}
#The length of dicitioanry is 2000
Particularly I need to find those substrings which occur more than twice
I have written the following code that performs the task. Is there a more elegant pythonic way or panda specific way to achieve the same as the current implementation is taking quite some time to execute.
elites = dict()
for reg_pat in reg_:
count = 0
eliter = len(training[training['concat'].str.contains(reg_pat)]['concat'])
if eliter >=3:
elites[reg_pat] = reg_[reg_pat]
You can use apply instead str.contains, it is faster:
reg_ = {'anuBavAn':0.35, 'a$piwra':0.2, 'piwra':0.7, 'pa':0.03, 'a':0.0005}
elites = dict()
for reg_pat in reg_:
if training['concat'].apply(lambda x: reg_pat in x).sum() >= 3:
elites[reg_pat] = reg_[reg_pat]
print (elites)
{'a': 0.0005}
Hopefully I have interpreted your question correctly. I'm inclined to stay away from regex here (in fact, I've never used it in conjunction with pandas), but it's not wrong, strictly speaking. In any case, I find it hard to believe that any regex operations are faster than a simple in check, but I could be wrong on that.
for substr in reg:
totalStringAppearances = training.apply((lambda string: substr in string))
totalStringAppearances = totalStringAppearances.sum()
if totalStringAppearances > 2:
reg[substr] = totalStringAppearances / len(training)
else:
# do what you want to with the very rare substrings
Some gotchas:
If you wanted something like a substring 'a' in 'abcdefa' to return 2, then this will not work. It merely checks for existence of the substring in each string.
Inside the apply(), I am using a potentially unreliable exploitation of booleans. See this question for more details.
Post-edit: Jezrael's answer is more complete as it uses the same variable names. But, in a simple case, regarding regex vs. apply and in, I validate his claim, and my presumption:
I have a ByteTensor and want to grab the indices where there is a 1. In numpy, I could do something like
a = np.array([1,0,1,0,1])
return np.where(a)
which would return (array([0, 2, 4]),). Is this functionality defined in Torch?
(In my particular case, I want to use these indices to index into several different Tensor objects, but it'd be nice to know how to do this in general.)
You can use torch.nonzero, e.g.:
> a = torch.ByteTensor{1,0,1,0,1}
> print(torch.nonzero(a))
1
3
5
[torch.LongTensor of size 3x1]
If you really need to find the 1-s only you can chain a logical operator:
> a = torch.ByteTensor{1,2,1,6,1}
> a:eq(1):nonzero()
The error is in the function below, I'm trying to generate 2 measures of entropy (the latter removes all events with <5 frequency).
My error:
ERROR 1200: Cannot expand macro 'TOTUPLE'. Reason: Macro must be defined before expansion.
Which is weird, because TOTUPLE is a built-in function. Other pig scripts use TOTUPLE with no problems.
Code:
define dual_entropies (search, field) returns entropies {
summary = summary_total($search, $field);
entr1 = count_sum_entropy(summary, $field);
summary = filter summary by events >= 5L;
entr2 = count_sum_entropy(summary, $field);
$entropies = TOTUPLE(entr1, entr2);
};
Note that entr1 and entr2 are both single numbers, not vectors of numbers - I suspect that's part of the issue.
I ran into similar confusions. I'm not sure if it's true in general but Pig only liked TOTUPLE when it's part of a FOREACH operation. I worked around by doing group by all, which returns a bag with a single tuple in it, followed by a FOREACH .. GENERATE such as:
B = group A ALL;
C = foreach B generate 'x', 2, TOTUPLE('a', 'b', 'c');
dump C;
...
(x,2,(hi,2,3))
Perhaps this will help