Samtools pysam mate - samtools

I am using pysam to do some data mining on .bam files. I want to check if a read has a mapped mate. The command
mate = samfile.mate(read1)
throws an error if the mate is not mapped, so if I do
if samfile.mate(read1): ...
that throws an error, too. Any other way to check if the read has a mapped mate?
Thanks.

AlignedSegment.mate_is_unmapped should work for you. See docs for pysam
if not read1.mate_is_unmapped:
    mate = samfile.mate(read1)
    ...
Alternatively, you could just catch the exception and move on, but relying on exception handling for normal program flow is not ideal.
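A minimal sketch that combines both suggestions: check the flags first and keep the exception handler only as a fallback. The helper name mapped_mate is hypothetical. It assumes samfile behaves like an open, indexed pysam.AlignmentFile, read behaves like a pysam.AlignedSegment, and mate() raises ValueError when no mate can be returned.

```python
def mapped_mate(samfile, read):
    """Return the mate of `read`, or None if it has no mapped mate.

    Hypothetical helper: `samfile` is expected to behave like an open,
    indexed pysam.AlignmentFile and `read` like a pysam.AlignedSegment.
    """
    # Unpaired reads have no mate at all; checking the flags is cheap.
    if not read.is_paired or read.mate_is_unmapped:
        return None
    try:
        return samfile.mate(read)
    except ValueError:
        # Fallback for cases the flags do not cover, for example when the
        # mate was filtered out of this particular file.
        return None
```

With this, `mate = mapped_mate(samfile, read1)` followed by `if mate is not None: ...` replaces the bare call from the question.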

Related

Python: If error occurs anywhere, do specific line of code

I have a script I'm trying to write to process a large amount of data. There is, of course, potential for errors. In the script I need to connect to databases. If the script encounters an error, the code never reaches the point where the connection to the database is terminated. I'd like to have something in my Python code that will recognize that an error has occurred, no matter where, and if nothing else at least close those databases. Does something like this exist? I know I can use try/except, but wouldn't that only work if I know exactly where I could get the error? I'm basically looking for a catch-all to close my databases in the event an error occurs in a location I didn't anticipate.
To run certain cleanup code even if there is an error, use the finally block:
try:
    # do stuff, possible exception
except:
    # run this if exception
finally:
    # always run this, even if exception
Reference: https://docs.python.org/3/tutorial/errors.html#defining-clean-up-actions
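Applied to the database case from the question, this could look like the sketch below. It uses the standard-library sqlite3 module only as a stand-in for whatever database is actually in use; the function name process and the items table are made up for illustration.

```python
import sqlite3


def process(conn, rows):
    """Insert rows and return how many were written.

    The connection is closed whether or not an error occurs.
    """
    inserted = 0
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS items (value INTEGER)")
        for row in rows:
            # int() raises ValueError on bad input, standing in for an
            # unexpected failure somewhere in the middle of the work.
            conn.execute("INSERT INTO items VALUES (?)", (int(row),))
            inserted += 1
        conn.commit()
    finally:
        # Runs on success and on any exception; the exception still
        # propagates to the caller afterwards.
        conn.close()
    return inserted
```

No except clause is needed just to get the cleanup: try/finally alone closes the connection and lets the original error surface. For this pattern, `with contextlib.closing(conn):` is an equivalent, shorter spelling.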

WebSphere wsadmin testConnection error message

I'm trying to write a script to test all DataSources of a WebSphere Cell/Node/Cluster. While this is possible from the Admin Console, a script is better for certain audiences.
So I found the following article from IBM https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/txml_testconnection.html which looks promising as it describes exactly what I need.
After having a basic script like:
ds_ids = AdminConfig.list("DataSource").splitlines()
for ds_id in ds_ids:
    AdminControl.testConnection(ds_id)
I experienced some undocumented behavior. Contrary to the article above, the testConnection function does not always return a String, but may also throw an exception.
So I simply use a try-catch block:
try:
    AdminControl.testConnection(ds_id)
except: # it actually is a com.ibm.ws.scripting.ScriptingException
    exc_type, exc_value, exc_traceback = sys.exc_info()
Now when I print exc_value, this is what I get:
com.ibm.ws.scripting.ScriptingException: com.ibm.websphere.management.exception.AdminException: javax.management.MBeanException: Exception thrown in RequiredModelMBean while trying to invoke operation testConnection
Now this error message is always the same no matter what's wrong. I tested authentication errors, missing WebSphere Variables and missing driver classes.
While the Admin Console prints reasonable messages, the script keeps printing the same meaningless message.
The weird thing is that as long as I don't catch the exception and the script simply exits with the error, a descriptive error message is shown.
Accessing the Java exception's cause via exc_value.getCause() gives None.
I've also had a look at the DataSource MBeans, but as they only exist if the servers are started, I quickly gave up on them.
I hope someone knows how to access the error messages I see when not catching the Exception.
thanks in advance
After all the research and testing, AdminControl seems to be nothing more than a convenience facade to some of the commonly used MBeans.
So I tried invoking the Test Connection Service directly (like in the Java example here: https://www.ibm.com/support/knowledgecenter/en/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cdat_testcon.html):
# the exception class named in the error message above
from com.ibm.ws.scripting import ScriptingException
ds_id = AdminConfig.list("DataSource").splitlines()[0]
# other queries may be 'process=server1' or 'process=dmgr'
ds_cfg_helpers = __wat.AdminControl.queryNames("WebSphere:process=nodeagent,type=DataSourceCfgHelper,*").splitlines()
try:
    # invoke MBean method directly
    warning_cnt = __wat.AdminControl.invoke(ds_cfg_helpers[0], "testConnection", ds_id)
    if warning_cnt == "0":
        print "success"
    else:
        print "%s warning(s)" % warning_cnt
except ScriptingException as exc:
    # get to the root of all evil ignoring exception wrappers
    exc_cause = exc
    while exc_cause.getCause():
        exc_cause = exc_cause.getCause()
    print exc_cause
This works the way I hoped for. The downside is that the code gets much more complicated if one needs to test DataSources that are defined on all kinds of scopes (Cell/Node/Cluster/Server/Application).
I don't need this so I left it out, but I still hope the example is useful to others too.

File: 0: Unexpected from Google BigQuery load job

I have a compressed JSON file (900 MB, newline-delimited). Loading it into a new table with the bq command fails:
e.g.
bq load --project_id=XXX --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values mtdataset.mytable gs://xxx/data.gz schema.json
Waiting on bqjob_r3ec270ec14181ca7_000001461d860737_1 ... (1049s) Current status: DONE
BigQuery error in load operation: Error processing job 'XXX:bqjob_r3ec270ec14181ca7_000001461d860737_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 0: Unexpected. Please try again.
Why the error?
I tried again with --max_bad_records, but still got no useful error message:
bq load --project_id=XXX --source_format=NEWLINE_DELIMITED_JSON --ignore_unknown_values --max_bad_records 2 XXX.test23 gs://XXX/20140521/file1.gz schema.json
Waiting on bqjob_r518616022f1db99d_000001461f023f58_1 ... (319s) Current status: DONE
BigQuery error in load operation: Error processing job 'XXX:bqjob_r518616022f1db99d_000001461f023f58_1': Unexpected. Please try again.
I also cannot find any useful message in the console.
To BigQuery team, can you have a look using the job ID?
As far as I know, there are two error sections on a job. One is the error result, which is what you see now. The second should be a stream of errors. The second one is important because it can contain errors even when the job itself succeeds.
You can also set --max_bad_records=3 on the bq tool. Check here for more params: https://developers.google.com/bigquery/bq-command-line-tool
You probably have an error that occurs on every line, so try loading a small sample of this big file first.
There is also an open feature request to improve the error message; you can star (vote for) this ticket: https://code.google.com/p/google-bigquery-tools/issues/detail?id=13
This answer will be picked up by the BQ team, so for them I am sharing this: we need an endpoint where we can query, based on a job ID, the state or the stream of errors. Getting a full list of errors would help a lot when debugging BQ jobs. This could be easy to implement.
I looked up this job in the BigQuery logs, and unfortunately, there isn't any more information than "failed to read" somewhere after about 930 MB have been read.
I've filed a bug that we're dropping important error information in one code path and submitted a fix. However, this fix won't be live until next week, and all that will do is give us more diagnostic information.
Since this is repeatable, it isn't likely a transient error reading from GCS. That means one of two problems: we have trouble decoding the .gz file, or there is something wrong with that particular GCS object.
For the first issue, you could try decompressing the file and re-uploading it as uncompressed. While it may sound like a pain to send gigabytes of data over the network, the good news is that the import will be faster since it can be done in parallel (we can't import a compressed file in parallel since it can only be read sequentially).
For the second issue (which is somewhat less likely) you could try downloading the file yourself to make sure you don't get errors, or try re-uploading the same file and seeing if that works.
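Once the file has been downloaded, both checks (is the archive readable to the end, and is every line valid JSON) can be done locally. Below is a rough sketch using only the Python standard library; the function name check_gzip_ndjson and the cap of five reported errors are arbitrary choices, and it says nothing about whether the rows match schema.json.

```python
import gzip
import json
import zlib


def check_gzip_ndjson(path, max_errors=5):
    """Read a gzip-compressed, newline-delimited JSON file end to end.

    Returns (lines_read, errors), where errors is a list of
    (line_number, message) tuples with at most max_errors JSON errors.
    """
    errors = []
    lines_read = 0
    try:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for lines_read, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # blank lines are ignored by this check
                try:
                    json.loads(line)
                except json.JSONDecodeError as exc:
                    errors.append((lines_read, str(exc)))
                    if len(errors) >= max_errors:
                        break
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # A truncated or corrupt archive fails while reading,
        # before the JSON parser ever sees the data.
        errors.append((lines_read + 1, "read failed: %s" % exc))
    return lines_read, errors
```

If this reads the whole file without a "read failed" entry, the archive itself is probably fine and the problem is more likely in individual rows or on the GCS side.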

Return key in process

The code was run as:
u = subprocess.Popen(['process','abc','def','','ghi','jkl'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
It doesn't work; the following error occurred:
ValueError: I/O operation on closed file
I suggest you try pexpect; it is far better suited to this task (in fact, it is a tool built for this kind of task).
You can also browse through the examples and see what its usage looks like.
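If pexpect is not an option and the process only needs a single Return key, plain subprocess may be enough. The code that raised the error is not shown in the question, so this is only a guess at the cause: one common way to get "I/O operation on closed file" is to use the pipes again after communicate() has already closed them. The child command below is a stand-in written for this sketch, not the 'process' binary from the question.

```python
import subprocess
import sys

# Stand-in child process (an assumption for this sketch): it waits for one
# line on stdin, i.e. for the Return key, then reports what it received.
CHILD = [
    sys.executable,
    "-c",
    "import sys; line = sys.stdin.readline(); print('got %r' % line)",
]


def run_and_press_return(argv):
    """Start argv, send a single newline, and collect all output."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() writes the input, closes stdin, and waits for exit.
    # Using proc.stdin again after this raises
    # "ValueError: I/O operation on closed file".
    out, err = proc.communicate(input="\n")
    return proc.returncode, out, err
```

Calling `run_and_press_return(CHILD)` sends one newline and returns the exit code with the captured stdout and stderr. For anything more interactive than a single keypress, pexpect remains the better fit.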

Vb2012 admin permission

I have coded up a cleaner-type program but am getting a huge error, with permissions (I think).
The error message is printed out like so:
An unhandled exception of type 'System.UnauthorizedAccessException' occurred in mscorlib.dll
Additional information: Access to the path 'C:\Windows\CSC\v2.0.6' is denied.
And it says that the failing part of the code is this line:
For Each fi In DirectroyInfos.GetFiles(filter)
But here is the full block of code.
For Each fi In DirectroyInfos.GetFiles(filter)
    Try
        file_count = file_count + 1
        file_size = CULng(file_size + fi.Length)
        FilesToDelete.Add(fi.FullName)
    Catch ex As UnauthorizedAccessException
        'There's really no pretty way to handle this exception
    Catch ex As FileNotFoundException
        'There's really no pretty way to handle this exception
    End Try
Next
I think it's some sort of permission problem. I have Windows 7 and have noticed there are a lot of "run as admin" problems. I think it's trying to remove or gain access to a file which it does not have permission to access.
Is there any way to fix this? Am I missing something in my code?
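As you said in your question, you don't have write access to that file.
Either skip files you don't have access to, or run the application as admin.
Note that the exception is thrown by GetFiles on the For Each line, which sits outside your Try block, so neither Catch clause ever sees it.
There appears to be a limitation in the Win32 API that will skip all files in the folder if you do not have access to one of them.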
Check out this solution from Microsoft Connect:
How to: Iterate Through a Directory Tree (C# Programming Guide)
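The general idea of walking the tree one directory at a time, and handling the access error per directory instead of around one big enumeration call, is sketched below in Python rather than VB, purely to show the structure. The function name collect_files and its return shape are made up for this illustration.

```python
import os


def collect_files(root):
    """Walk root by hand, skipping entries that cannot be accessed.

    Returns (paths, total_size, skipped): the readable file paths, their
    combined size in bytes, and the paths that had to be skipped.
    """
    paths, skipped = [], []
    total_size = 0
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except (PermissionError, FileNotFoundError):
            # Only this one directory is skipped; the walk continues.
            skipped.append(current)
            continue
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total_size += entry.stat(follow_symlinks=False).st_size
                    paths.append(entry.path)
            except OSError:
                # The entry vanished or became unreadable mid-walk.
                skipped.append(entry.path)
    return paths, total_size, skipped
```

The point is where the try sits: around each directory listing, so one protected folder costs only that folder, not the whole scan.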