I have a client which depends on a third party API, and I need to combine two calls into one record (or be able to display content from both calls in one view).
I have no experience with Elm, but understand there is a concept of Task which may be helpful. Collecting data to two different models is also something I have considered, but it seems to give me some problems displaying the data.
Here are the resources I want to combine
data1 = [{a, b, c1:{x1,y1}, c2:{x2,y2}}, ...]
data2 = [{x1, y1, z1, q1}, {x2, y2, z2, q2}, ...]
I want to display data1 with the enhanced c variable from data2.
Expected result
data = [{a, b, c1:{x1,y1,z1,q1}, c2:{x2,y2,z2,q2}}, ...]
Is combining the resources on a local server and then fetching the data as one model the easiest choice?
I have implemented some NGS data analysis workflows with Nextflow. I used "paired-end" channels (the fromFilePairs method) for some of my workflow processes. After multiple workflow executions I ran into a problem I wasn't expecting: my sample IDs would sometimes be mixed up, resulting in inaccurate outputs for the processes where it happened. I think this is related to the non-deterministic input channels issue (https://www.nextflow.io/blog/2019/troubleshooting-nextflow-resume.html).
Let's suppose I apply my workflow to these paired-end files: sample1_R{1,2}.fastq, sample2_R{1,2}.fastq
process Step1 {
    input:
    tuple pair_ID, file(A) from channelA
    tuple pair_ID, file(B) from channelB
    tuple pair_ID, file(C) from channelC
    ...
}
For this kind of process, with more than one "tuple pair_ID" input, the pair_ID values (= my sample names) can get mixed up: the process may end up using input files A and B from sample1 and input file C from sample2, instead of files A, B and C all from the same pair_ID (key = only sample1 or only sample2).
I hit this issue of randomly mixed input files (which affected the outputs) after several workflow executions: after using -resume when an error occurred, but also after full successful workflow runs.
In order to have the same key (pair_ID) across the input files emitted by each of the 3 channels, I used the join operator:
process Step1 {
    input:
    tuple pair_ID, file(A), file(B), file(C) from channelA.join(channelB).join(channelC)
    ...
}
This operator seems to make everything work as expected: I don't see any mix-up in my sample IDs or in my final outputs. In the docs (https://www.nextflow.io/docs/latest/operator.html?highlight=join#join), join seems to be intended for two channels only, so I am unsure whether I am using it correctly for 3 channels.
Is my method using join legitimate? Or does it still have some flaws?
Is there a better way to correct my issue?
If I cannot be sure that this method avoids any mix-up in my sample IDs, I might change to another workflow management system such as Snakemake, but I would really like to solve this issue and continue using Nextflow.
Thank you in advance, and don't hesitate to ask if something isn't clear!
As you have discovered, you should avoid using the same variable name (pair_ID) more than once in your input block. Using the same variable name does not guarantee the inputs will be joined up using this key. I imagine that whatever value you get for pair_ID from one input channel will just get clobbered by the pair_ID you get from one of your other input channels. You have also discovered that when you declare two or more input channels, the overall input ordering may not be consistent across multiple executions (for example, when using -resume).
To join two or more channels with a common key, you can simply use the join operator:
join
The join operator creates a channel that joins together the items
emitted by two channels for which exists a matching key. The key is
defined, by default, as the first element in each item emitted.
Note that the join operator creates (returns) a new channel. Therefore, this:
joined = channelA.join(channelB).join(channelC)
Is functionally the same as:
temp = channelA.join(channelB)
joined = temp.join(channelC)
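To make the chaining concrete, here is a plain-Python sketch (not Nextflow code) of key-based joining. The function join_by_key and the sample values are made up for illustration; they only mimic the default behaviour quoted above, where the first element of each item is the key and items without a match are dropped.

```python
def join_by_key(left, right):
    """Join two lists of tuples on their first element, ignoring emission order."""
    # Index the right-hand items by key so the order they arrive in is irrelevant.
    right_by_key = {item[0]: item[1:] for item in right}
    # Keep only left-hand items that have a partner, and append the partner's values.
    return [item + right_by_key[item[0]] for item in left if item[0] in right_by_key]


# Hypothetical channel contents; B and C arrive in a different order than A.
channel_a = [("sample1", "A1"), ("sample2", "A2")]
channel_b = [("sample2", "B2"), ("sample1", "B1")]
channel_c = [("sample2", "C2"), ("sample1", "C1")]

# Chaining two joins, as in channelA.join(channelB).join(channelC).
joined = join_by_key(join_by_key(channel_a, channel_b), channel_c)
print(joined)
```

Because the result of the first join is itself keyed on the first element, the second join can reuse the same key, which is why chaining works for three (or more) channels.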
I want to create two ipywidgets sliders, say one with value x, the other with value 1-x. When I change one slider, the other one should be automatically changed accordingly. I am trying to use observe for the callback. I see that I might use owner and description to identify which slider was modified, but I don't think description is supposed to be used for this purpose. After all, description should not need to be unique in the first place. I wonder if I am missing something here.
from ipywidgets import widgets
x=0.5
a=widgets.FloatSlider(min=0,max=1,description='a',value=x)
b=widgets.FloatSlider(min=0,max=1,description='b',value=1-x)
display(a,b)
def on_value_change(change):
    if str(change['owner']).split("'")[1]=='a':
        exec('b.value='+str(1-change['new']))
    else:
        exec('a.value='+str(1-change['new']))
a.observe(on_value_change,names='value')
b.observe(on_value_change,names='value')
I'm a beginner with widgets, but I ran into the same question earlier and couldn't find a solution. I pieced together several sources and came up with something that seems to work.
Here's a model example of two sliders, representing a and b, that stay consistent with '100 = a + b':
from ipywidgets import widgets
caption = widgets.Label(value='If 100 = a + b :')
a, b = widgets.FloatSlider(description='a (=100-b)'),\
       widgets.FloatSlider(description='b (= 100-a)')
def my_func1(a):
    # b as function of a
    return (100 - a)
def my_func2(b):
    # a as function of b
    return (100 - b)
l1 = widgets.dlink((a, 'value'), (b, 'value'),transform=my_func1)
l2 = widgets.dlink((b, 'value'), (a, 'value'),transform=my_func2)
display(caption, a, b)
To explain, as best I understand it: the key was to set up a directional link in each direction between the two sliders, and to provide a transform function for the math in each direction.
For example:
l1 = widgets.dlink((a, 'value'), (b, 'value'),transform=my_func1)
In words, widgets.dlink((a, 'value'), (b, 'value'), transform=my_func1) says "the value of a is the input used to determine the value of b (the output)" and "the function giving b in terms of a is my_func1".
With the links described, you just need to define the aforementioned functions.
The function pertaining to the directional link l1 is:
def my_func1(a): # defining b as function of a
    return (100 - a)
Likewise (but in reverse), l2 is the 'vice versa' to l1, and my_func2 the 'vice versa' to my_func1.
I found this easier to learn from than the fairly common approach of attaching a listener (a.observe or b.observe), which passes details about the slider state (e.g. its values) to a callback in a dictionary-like change argument that you then have to index to do the assignments.
Good luck, hope that helps! More info at https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html#Linking-Widgets
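One thing that may look worrying is that two opposite links could trigger each other forever. As far as I understand, that does not happen because observers are only notified when a value actually changes. The following is a toy model in plain Python, not ipywidgets code: ToySlider and toy_dlink are made-up names that only imitate that behaviour so the mechanism can be followed step by step.

```python
class ToySlider:
    """Toy stand-in for a slider: observers are called only when the value changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        if new == self._value:
            return  # nothing changed, so nobody is notified; this ends the ping-pong
        self._value = new
        for callback in self._observers:
            callback(new)

    def observe(self, callback):
        self._observers.append(callback)


def toy_dlink(source, target, transform):
    """One-directional link: target always becomes transform(source value)."""
    def push(new_value):
        target.value = transform(new_value)

    target.value = transform(source.value)  # synchronise once when the link is made
    source.observe(push)


a = ToySlider(30.0)
b = ToySlider(0.0)
toy_dlink(a, b, lambda v: 100 - v)  # b follows a
toy_dlink(b, a, lambda v: 100 - v)  # a follows b

a.value = 75.0  # moving a updates b, which tries to set a back to 75 and stops there
print(a.value, b.value)
b.value = 10.0  # the same happens in the other direction
print(a.value, b.value)
```

The two transforms have to be inverses of each other for this to settle after one round trip, which is the case for 100 - a and 100 - b.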
Is there an equivalent of TTree::AddFriend() in uproot?
I have 2 parallel trees in 2 different files which I need to read with uproot.iterate, using interpretations (set via the 'branches' option of uproot.iterate).
Maybe I can do that by manually obtaining several iterators from iterate() calls on the files, and then calling next() on each iterator... but maybe there's a simpler way akin to AddFriend?
Thanks for any hint!
edit: I'm not sure I've been clear, so here are a few more details. My question is not about the usage of arrays, but about how to read them from different files. Here's a mockup of what I'm doing:
# I will fill this array and give it as input to my DNN
# it's very big so I will fill it in place
bigarray = ndarray( (2,numentries),...)
# get a handle on a tree, just to be able to build interpretations :
t0 = .. first tree in input_files
interpretations = dict(
a=t0['a'].interpretation.toarray(bigarray[0]),
b=t0['b'].interpretation.toarray(bigarray[1]),
)
# iterate with :
uproot.iterate( input_files, treename,
branches = interpretations )
So what if a and b belong to 2 trees in 2 different files ?
In array-based programming, friends are implicit: you can JOIN any two columns after the fact—you don't have to declare them as friends ahead of time.
In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if one of these is jagged (like jets.phi) and the other is not (like met.phi), you're still fine because the non-jagged array will be broadcast to match the jagged one.
Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.
If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with different rows of the other. This is a classic database-style JOIN and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documentation; there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
The following issue discusses many of these things, though the users there had to join sets of files rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find more hints there on how to get started.
https://github.com/scikit-hep/uproot/issues/314
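As a rough illustration of the key-based JOIN mentioned above, here is a minimal Pandas sketch. The column name "event" and all values are invented; in practice the key would be whatever branch identifies a row in both trees, and the frames would be filled from the arrays read from each file.

```python
import pandas as pd

# Hypothetical contents of the two trees, deliberately stored in different orders.
tree_a = pd.DataFrame({"event": [3, 1, 2], "a": [30.0, 10.0, 20.0]})
tree_b = pd.DataFrame({"event": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

# Inner join on the key column: rows are matched by "event", not by position.
merged = pd.merge(tree_a, tree_b, on="event", how="inner")

# Sort by the key so the result has a predictable order.
merged = merged.sort_values("event").reset_index(drop=True)
print(merged)
```

After the merge, each row holds the a and b values that belong to the same key, regardless of the order in which the two files were written.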
This is how I have "friended" (befriended?) two TTrees in different files with uproot/awkward.
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    for field in array2.fields:
        array1 = awkward.with_field(array1, getattr(array2, field), where=field)
    # array1 now has branch "a" and "b"
    print(array1.a)
    print(array1.b)
Alternatively, if it is acceptable to "name" the trees:
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
    # zippedArray now has branches "tree1.a" and "tree2.b"
    print(zippedArray.tree1.a)
    print(zippedArray.tree2.b)
Of course, you can use array1 and array2 together without merging them like this. But if you have already written code that expects only one Array, this can be useful.
We have 5 different types of nodes in our database. The largest has ~290k nodes, the smallest only ~3k. Each node type has an id field, and they are all indexed. I am using py2neo to build relationships, but it is very slow (~2 relationships inserted per second).
I use pandas to read a relationship CSV, then iterate over each row to create a relationship wrapped in a transaction. I tried batching 10k creation statements into one transaction, but it does not seem to improve the speed much.
Below is the code:
df = pd.read_csv(r"C:\relationship.csv",dtype = datatype, skipinitialspace=True, usecols=fields)
df.fillna('',inplace=True)
def f(node_1 ,rel_type, node_2):
    try:
        tx = graph.begin()
        tx.evaluate('MATCH (a {node_id:$label1}),(b {node_id:$label2}) MERGE (a)-[r:'+rel_type+']->(b)',
                    parameters = {'label1': node_1, 'label2': node_2})
        tx.commit()
    except Exception as e:
        print(str(e))
for index, row in df.iterrows():
    if(index%1000000 == 0):
        print(index)
    try:
        f(row["node_1"],row["rel_type"],row["node_2"])
    except:
        print("error index: " + str(index))
Can someone help me understand what I did wrong here? Thanks!
You state that there are "5 different types of nodes" (which I interpret to mean 5 node labels, in neo4j terminology). And, furthermore, you state that their id properties are already indexed.
But your f() function is not generating a Cypher query that uses the labels at all, and neither does it use the id property. In order to take advantage of your indexes, your Cypher query has to specify the node label and the id value.
Since there is currently no efficient way to parameterize the label when performing a MATCH, the following version of the f() function generates a Cypher query that has hardcoded labels (as well as a hardcoded relationship type):
def f(label_1, id_1, rel_type, label_2, id_2):
    try:
        tx = graph.begin()
        tx.evaluate(
            'MATCH' +
            '(a:' + label_1 + '{id:$id1}),' +
            '(b:' + label_2 + '{id:$id2}) ' +
            'MERGE (a)-[r:'+rel_type+']->(b)',
            parameters = {'id1': id_1, 'id2': id_2})
        tx.commit()
    except Exception as e:
        print(str(e))
The code that calls f() will also have to be changed to pass in both the label names and the id values for a and b. Hopefully, your df rows will contain that data (or enough info for you to derive that data).
If your aim is better performance, then you will need to consider a different pattern for loading these, i.e. batching. You're currently running one Cypher MERGE statement for each relationship and wrapping that in its own transaction in a separate function call.
Batching these, by sending multiple statements per transaction or per function call, will reduce the number of network hops and should improve performance.
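One common way to batch is to send many relationship pairs as a single list parameter and let Cypher's UNWIND iterate over it. The sketch below only builds the queries and parameters, so it runs without a database. The helper name build_batches, the row layout and the example labels are assumptions for illustration. Labels and relationship types are still interpolated into the query text, so they must come from trusted input.

```python
from collections import defaultdict


def build_batches(rows, batch_size=10_000):
    """Yield (query, parameters) pairs, one per batch of relationships.

    rows: iterable of (label_1, id_1, rel_type, label_2, id_2) tuples.
    Rows are grouped by (label_1, rel_type, label_2) because those parts
    cannot be passed as query parameters and must be part of the query text.
    """
    grouped = defaultdict(list)
    for label_1, id_1, rel_type, label_2, id_2 in rows:
        grouped[(label_1, rel_type, label_2)].append({"id1": id_1, "id2": id_2})

    for (label_1, rel_type, label_2), pairs in grouped.items():
        query = (
            "UNWIND $pairs AS pair "
            f"MATCH (a:{label_1} {{id: pair.id1}}), (b:{label_2} {{id: pair.id2}}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
        # Split each group into chunks so that no single request gets too large.
        for start in range(0, len(pairs), batch_size):
            yield query, {"pairs": pairs[start:start + batch_size]}


# Hypothetical rows; in the question these would be derived from the DataFrame.
rows = [
    ("Person", 1, "WORKS_AT", "Company", 10),
    ("Person", 2, "WORKS_AT", "Company", 10),
    ("Person", 3, "WORKS_AT", "Company", 11),
    ("Person", 1, "KNOWS", "Person", 2),
]
batches = list(build_batches(rows, batch_size=2))
for query, parameters in batches:
    print(len(parameters["pairs"]), query)
    # With a live connection, each batch could be sent with something like
    # graph.run(query, parameters)
```

With this layout the number of round trips depends on the number of batches rather than the number of relationships. The batch size of 10,000 is only a starting point to tune.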
In my experiment, I am presenting images (faces) that are different across 2 dimensions: face identity and emotion.
There are 5 faces displaying 5 different emotional expressions; making 25 unique stimuli in total. These only need to be presented once (so 25 trials).
After I present one of the faces, the next face has to differ in either the emotion OR the identity, but not both: the other dimension must stay the same.
Example:
Face 1, emotion 1 -> face 3, emotion 1 -> face 3, emotion 4 -> ... etc.
1: Is PsychoPy up to this task? I have mostly worked with the Builder so far, except for some data-logging code, but I'd be happy to get more experienced with the Coder.
My hunch is that I would need to add two columns to the list of trials, one for identity and one for emotion. Then use the getEarlierTrial call somehow, but I pretty much get lost at this point.
2: Would anyone be willing to point me in the right direction please?
Many thanks in advance.
This is difficult to implement in Builder's normal mode of operation, which is to drive trials from a fixed list of conditions. Although the order of rows can be randomised across subjects, the pairings of values across columns remain constant.
The standard answer to this is what you allude to in your comment above: in code, shuffle the conditions file at the beginning of each experiment, so each subject is in essence having their trials driven by a unique conditions file.
You seem happy to do this in Matlab. That would work fine, as this stuff can be done before PsychoPy even launches. But it could also very easily be implemented in Python code. That way you could do everything in PsychoPy, and in this case there would be no need to abandon Builder. You'd just insert a code component with some code to run at the beginning of the experiment that customises a conditions file.
You'll need to create three lists, not two, i.e. you also need a list of pseudo-random choices to alternate between preserving either face or emotion from trial to trial. If you make that choice fully randomly, the choices will be unbalanced and you'll exhaust one of the attributes before the other.
from numpy.random import shuffle
# make a list of 25 dictionaries of unique face/emotion pairs:
pairsList = []
for face in ['1', '2', '3', '4', '5']:
    for emotion in ['1', '2', '3', '4', '5']:
        pairsList.append({'faceNum': face, 'emotionNum': emotion})
shuffle(pairsList)
# a list of whether to alternate between preserving face or emotion across trials:
attributes = ['faceNum', 'emotionNum'] * 12 # length 24
shuffle(attributes)
# need to create an initial selection before cycling through the
# next 24 randomised but balanced choices:
pair = pairsList.pop()
currentFace = pair['faceNum']
currentEmotion = pair['emotionNum']
images = ['face_' + currentFace + '_emotion_' + currentEmotion + '.jpg']
for attribute in attributes:
    if attribute == 'faceNum':
        selection = currentFace
    else:
        selection = currentEmotion
    # find another pair with the same selected attribute:
    for pair in pairsList:
        if pair[attribute] == selection:
            # store the combination for this trial:
            currentFace = pair['faceNum']
            currentEmotion = pair['emotionNum']
            images.append('face_' + currentFace + '_emotion_' + currentEmotion + '.jpg')
            # remove this combination so it can't be used again
            pairsList.remove(pair)
            # stop at the first match so each trial adds exactly one image
            break
images.reverse()
print(images)
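Because the search above takes the first matching pair it finds, it can occasionally reach a point where no unused pair shares the selected attribute, leaving fewer than 25 images. A small check like the following sketch could be run on the result, regenerating the list if it fails. The function name is_valid_sequence is made up, and it assumes the 'face_N_emotion_M.jpg' naming used above.

```python
import re


def is_valid_sequence(images, expected_length=25):
    """True if all stimuli are unique and each step changes exactly one attribute."""
    # Extract (face, emotion) from each file name; a malformed name raises an error.
    pairs = [
        re.fullmatch(r"face_(\d+)_emotion_(\d+)\.jpg", name).groups()
        for name in images
    ]
    if len(pairs) != expected_length or len(set(pairs)) != len(pairs):
        return False
    # Between neighbours, exactly one of face or emotion must stay the same.
    return all(
        (face_1 == face_2) != (emotion_1 == emotion_2)
        for (face_1, emotion_1), (face_2, emotion_2) in zip(pairs, pairs[1:])
    )


# The short example sequence from the question passes the check.
example = ["face_1_emotion_1.jpg", "face_3_emotion_1.jpg", "face_3_emotion_4.jpg"]
print(is_valid_sequence(example, expected_length=3))
```

In practice the generation code could be wrapped in a loop that repeats until the check returns True for the full 25-trial list.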
Then just write the images list to a single column .csv file to use as a conditions file.
Remember to set the loop in Builder to be in a fixed order, not randomised, as the list itself has the randomisation built in.
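Writing the list out as a conditions file could look like the sketch below, which uses Python's standard csv module. The column header "imageFile", the helper name write_conditions and the temporary output path are placeholders. The header is whatever variable name the image component in Builder refers to.

```python
import csv
import os
import tempfile


def write_conditions(images, path, column="imageFile"):
    """Write one image file name per row under a single header column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header row: the variable name used in Builder
        writer.writerows([name] for name in images)


# Example with a short, made-up list; the real list has 25 entries.
example_images = [
    "face_1_emotion_1.jpg",
    "face_3_emotion_1.jpg",
    "face_3_emotion_4.jpg",
]
example_path = os.path.join(tempfile.gettempdir(), "conditions.csv")
write_conditions(example_images, example_path)
```

Because the file is written in the order of the list, the fixed-order loop setting mentioned above preserves the constrained sequence.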