Cannot read property 'getChildren' of undefined - file-upload

I have an XML file with 4000 child elements, and within each of those are several child entries. I'm using p5.js: I load the file with loadXML(), and once it has loaded I run the code below, after first checking that xmlWx exists.
airportData = xmlWx.getChild('data').getChildren('METAR');
stations = xmlWx.getChild('data').getChildren('station_id');
latitudes = xmlWx.getChild('data').getChildren('latitude');
longitudes = xmlWx.getChild('data').getChildren('longitude');
I intermittently get
Cannot read property 'getChildren' of undefined
Here is a data example:
<METAR>
<raw_text>KIFP 091948Z 22004KT 10SM BKN080 BKN100 18/M04 A2994</raw_text>
<station_id>KIFP</station_id>
<observation_time>2021-02-09T19:48:00Z</observation_time>
<latitude>35.15</latitude>
<longitude>-114.57</longitude>
</METAR>
With 4000 entries, it's possible that some of them are missing fields, but the error goes away if I keep restarting the sketch. Is there a way to check for this? It seems to stop processing when the error is hit.

Not all of the expected children were present in every entry, so getChild() sometimes returned undefined; checking each lookup for undefined before calling getChildren() or getContent(), and skipping incomplete entries, avoids the error.


Issues trying to read a two-dimensional array leaf contained in a ROOT Tree in PyROOT

I'm stuck on a problem using PyROOT: I'm not able to read a leaf of a tree which is a two-dimensional array of float values. The relevant tree looks like this:
root [1] TTree tr = *(TTree*)g->Get("tevent_2nd_integral")
root [2] tr.Print()
*Tree    :tevent_2nd_integral: Event packet tree 2nd GTUs integral           *
*Entries :    57344 : Total =       548967602 bytes  File Size =  412690067 *
*        :          : Tree compression factor =   1.33                       *
*Br    7 :photon_count_data : photon_count_data[1][1][48][48]/F              *
*Entries :    57344 : Total  Size=  530758073 bytes  File Size  =  411860735 *
*Baskets :    19121 : Basket Size=      32000 bytes  Compression=   1.29     *
...
The leaf in question is photon_count_data[1][1][48][48]. I actually have several ROOT files, and I tried both building a chain and merging them with hadd (e.g. hadd file.root `ls /path/*.root`).
I tried several approaches, shown below, and each ran into a different problem: in one case the NumPy array that should hold the 48x48 values for each event was not created at all; in the others nothing was written, or the values were strange (including negative values, which are not possible here).
My code is the following:
# calling the root file after using hadd to merge all files
# (path, XROOT and nEntries_pixel are defined elsewhere)
rootFile = path + "merge.root"
f = XROOT.TFile(rootFile, 'read')
tree = f.Get('tevent_2nd_integral')

# making a chain
PDMchain = TChain("tevent_2nd_integral")
for filename in sorted(os.listdir(path)):
    if filename.endswith('.root') and ("CPU_RUN_MAIN" in filename):
        PDMchain.Add(filename)

pdm_counts = []

# First method: a python class holding the ROOT leaves
leaves = tree.GetListOfLeaves()

# define dynamically a python class containing root Leaves objects
class PyListOfLeaves(dict):
    pass

# create an instance
pyl = PyListOfLeaves()
for i in range(0, leaves.GetEntries()):
    leaf = leaves.At(i)
    name = leaf.GetName()
    # add the leaf dynamically as an attribute of my class
    pyl.__setattr__(name, leaf)

for iev in range(0, nEntries_pixel):
    tree.GetEntry(iev)
    pdm_counts.append(pyl.photon_count_data.GetValue())

# Second method: the Draw method
count = tree.Draw("photon_count_data", "", "")
pdm_counts.append(np.array(np.frombuffer(tree.GetV1(), dtype=np.float64, count=count)))

# Third method: the ROOT buffer method
for event in PDMchain:
    pdm_data_for_this_event = event.photon_count_data
    pdm_data_for_this_event.SetSize(2304)  # expose all 48*48 floats of the ROOT buffer
    pdm_counts.append(np.array(pdm_data_for_this_event, copy=True))
With the Python class method, pdm_counts is filled with just the first element of photon_count_data.
With the Draw method, I get a segmentation violation or a strange kernel issue.
With the ROOT buffer method, I do get back a list containing all 2304 (48x48) values, but they are completely different from those in photon_count_data, i.e. negative values or values that are orders of magnitude off.
Could you tell me where I'm going wrong, or whether there is a more elegant and quicker way to do this? Thanks in advance.
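As a side note on the first method: TLeaf's GetValue() takes an element index and defaults to 0, which would explain why only the first value of each entry shows up. A rough sketch of reading the whole leaf that way (the flat size 48*48 = 2304 is assumed from the photon_count_data[1][1][48][48] shape):

# Read all 2304 floats of photon_count_data per entry through the leaf index
leaf = tree.GetLeaf("photon_count_data")
for iev in range(tree.GetEntries()):
    tree.GetEntry(iev)
    values = [leaf.GetValue(i) for i in range(48 * 48)]
    pdm_counts.append(np.array(values).reshape(48, 48))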
I actually found the solution and would like to share it in case anyone needs it in the future. The third method shown above,
for event in PDMchain:
    pdm_data_for_this_event = event.photon_count_data
    pdm_data_for_this_event.SetSize(2304)  # expose all 48*48 floats of the ROOT buffer
    pdm_counts.append(np.array(pdm_data_for_this_event, copy=True))
works, but unfortunately I was using Spyder to inspect the data, and for some reason it showed strange values which were not right. So... don't rely on Spyder for this!
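For completeness, a minimal sketch of that buffer approach with each event reshaped back to 48x48 (the shape is assumed from photon_count_data[1][1][48][48]):

import numpy as np

pdm_counts = []
for event in PDMchain:
    buf = event.photon_count_data        # flat ROOT buffer for this event
    buf.SetSize(48 * 48)                 # expose all 2304 floats to Python
    pdm_counts.append(np.array(buf, copy=True).reshape(48, 48))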
Moreover another method works fine:
from root_pandas import read_root
data = read_root('merge.root', 'tevent_2nd_integral', columns=['cpu_packet_time', 'photon_count_data'])
Cheers!

Pymongo: insert_many() gives "TypeError: document must be instance of dict" for list of dicts

I haven't been able to find any relevant solutions to my problem when googling, so I thought I'd try here.
I have a program where I parse through folders for a certain kind of trace file and then save the results in a MongoDB database, like so:
posts = function(source_path)

client = pymongo.MongoClient()
db = client.database
collection = db.collection
insert = collection.insert_many(posts)

def function(...):
    ....
    post = parse(trace)
    posts.append(post)
    return posts

def parse(...):
    ....
    post = {'Thing1': thing,
            'Thing2': other_thing,
            etc}
    return post
However, when I get to "insert = collection.insert_many(posts)", it returns an error:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
According to the debugger, "posts" is a list of about 1000 dicts, which should be valid input according to all of my research. If I construct a smaller list of dicts and call insert_many(), it works flawlessly.
Does anyone know what the issue may be?
Some more debugging revealed the issue to be that the "parse" function sometimes returned None rather than a dict. Easily fixed.
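For example, a minimal guard (reusing the names from the snippets above) would be to drop the None results before calling insert_many:

# Keep only the documents that parse() actually produced
posts = [post for post in posts if post is not None]
if posts:  # avoid calling insert_many() with an empty list
    collection.insert_many(posts)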

How to use Bioproject ID, for example, PRJNA12997, in biopython?

I have an Excel file listing more than 2000 organisms, each with an associated BioProject ID (like PRJNA12997). The idea is to use these IDs to get the sequences for a later multiple alignment with five other sequences that I have in a text file.
Can anyone help me understand how I can do this using Biopython? At least the part that uses the BioProject ID.
You can first get the info using Bio.Entrez:
from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
# This call to efetch fails sometimes with a 400 error.
handle = Entrez.efetch(db="bioproject", id="PRJNA12997")
I've been trying, and Entrez.read(handle) doesn't seem to work, but if you do record_xml = handle.read() you'll get the XML entry for this record. From this XML you can get the numeric ID for the BioProject, in this case 12997.
handle = Entrez.esearch(db="nuccore", term="12997[BioProject]")
search_results = Entrez.read(handle)
Now you can efetch from your search results. At this point you should use Biopython to parse whatever you get back from the efetch step, choosing an appropriate rettype (see http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/):
for result in search_results["IdList"]:
    entry = Entrez.efetch(db="nuccore", id=result, rettype="fasta")
    this_seq_in_fasta = entry.read()
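Since the end goal is a multiple alignment, one way to collect everything is to append each fetched record to a single FASTA file. A minimal sketch building on the snippets above (the filename sequences.fasta is just an assumed example, not from the original post):

# Append every fetched record to one FASTA file for the later alignment
with open("sequences.fasta", "a") as out:
    for result in search_results["IdList"]:
        entry = Entrez.efetch(db="nuccore", id=result, rettype="fasta", retmode="text")
        out.write(entry.read())
        entry.close()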

Using deepstream List for tens of thousands of unique values

I wonder whether it's a good or bad idea to use deepstream record.getList for storing a lot of unique values, for example emails or any other unique identifiers. The main purpose is to be able to answer quickly whether we already have, say, a user with a given email (email in use), or to find another record by a specific unique field.
I ran a few experiments today and hit two problems:
1) When I tried to populate the list with a few thousand values I got
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
and my deepstream server went down. I was able to work around it by giving the server's Node process more memory with this flag:
--max-old-space-size=5120
It doesn't look like a proper fix, but it did let me build a list with more than 5000 items.
2) That wasn't enough for my tests, so I pre-created a list with 50000 items by writing the data directly into the RethinkDB table, and then got another issue when getting or modifying the list:
RangeError: Maximum call stack size exceeded
I was able to fix it with another flag:
--stack-size=20000
It helps, but I believe it's only a matter of time before one of these errors shows up in production once the list grows large enough. I don't really know whether this is a Node.js, JavaScript, deepstream or RethinkDB issue, but all of this makes me think I'm using deepstream Lists the wrong way. Please let me know. Thank you in advance!
Whilst you can use lists to store arrays of strings, they are actually intended as collections of record names: the actual data would be stored in the records themselves, and the list would only manage the order of those records.
Having said that, there are two open GitHub issues to improve performance for very long lists, by sending more efficient deltas and by introducing a pagination option.
The results regarding memory are interesting though, and definitely something that needs to be handled more gracefully. In the meantime you can drastically improve performance by combining updates into one:
var myList = ds.record.getList( 'super-long-list' );
// Sends 10.000 messages
for( var i = 0; i < 10000; i++ ) {
myList.addEntry( 'something-' + i );
}
// Sends 1 message
var entries = [];
for( var i = 0; i < 10000; i++ ) {
entries.push( 'something-' + i );
}
myList.setEntries( entries );

Can I use the power of Generics to solve my issue

I have a weird issue. I am loading 1,000 invoice objects, header first and then details, in my DAL. I am using VB.NET on this project. I am able to get the invoice headers just fine, but when I get to loading the details for each invoice I get a timeout from SQL Server. I increased the timeout to 5 minutes, but still the same thing. If I reduce the invoice count to 200, it works fine.
Here is what I am doing
'I already loaded the invoice headers. I am now iterating each invoice to get its details
For Each invoice As Invoice In invoices
    drInvoiceItems = DBSqlHelperFactory.ExecuteReader(CONNECTION_STRING, CommandType.StoredProcedure, "dbo.getinvoiceitem", _
                                                      New SqlParameter("@invoicenumber", invoice.InvoiceNumber))
    While drInvoiceItems.Read()
        invoice.LineItems.Add(New InvoiceLine(drInvoiceItems("id"), drInvoiceItems("inv_id"), drInvoiceItems("prodid"), drInvoiceItems("name"), drInvoiceItems("barcode"), drInvoiceItems("quantity"), drInvoiceItems("costprice")))
    End While
Next
Return invoices
I am aware that I am firing 1,000 connections at the database because of the iteration. Can't I load all the line items with one SELECT statement and then do something like
For Each invoice As Invoice In invoices
    invoice.Items.Add(invoiceItems.Find(Function(i as InvoiceItem),i.InvoiceNumber = invoice.InvoiceNumber))
Next
I get the error below when using the lambda function above:
Error 1 Value of type 'System.Collections.Generic.List(Of BizComm.InvoiceLine)' cannot be converted to 'BizComm.InvoiceLine'. C:\Projects\BizComm\InvoiceDAL.vb 75 35 BizComm
One thing I have done when iterating through items in the past is use the same Connection object for all the necessary read activities. It seems to greatly enhance performance.
I'd also look at the database to see whether the dbo.getinvoiceitem procedure can be improved, or if another procedure can be written which will give you all the line items for a group of invoices (perhaps by date or customer/vendor) rather than just one header at a time. Then you can more effectively apply your iteration over the invoice collection and add the lines to the headers.
You can also check whether there is an effective index on the column that the @invoicenumber parameter references.
From your code, it looks like you are not closing the connections and datareaders. See if you can place your connections and datareaders in a USING statement:
Using con As New SqlConnection(connectionString)
    ....
End Using
The DBSqlHelperFactory opens a connection, but can't close it since the connection is needed after its return. I'd modify the code, so that you open one connection and pass it to DBSqlHelperFactory as a parameter.
To quickly pick up these issues, I always debug with:
Max Pool Size=1;
added to the end of the connection string. That will quickly throw an error any time you forget to close a connection.
Why load the InvoiceItems beforehand? Can't you load them on demand?
That is, when you need the items, call a method on the Invoice instance (myInvoice.GetItems).
EDIT: It would help to understand the full picture of what you are trying to do.
Is it really necessary to fetch all the invoices as well?
Why not select all the line items for all the invoices you need in a single query, then split the results up among the invoice objects?
Re: how do I map between collections?
One implementation could be: create 1,000 anemic Invoice objects and put them in a Dictionary that maps Id to Invoice. Then, when you select the line items, include the invoice id in the query, look up the anemic invoice in the dictionary, and add the line to it.