Querying NCBI for a sequence via Biopython

How can I query NCBI for sequences given a chromosome's Genbank identifier, and start and stop positions using Biopython?
CP001665 NAPP TILE 6373 6422 . + . cluster=9;
CP001665 NAPP TILE 6398 6447 . + . cluster=3;
CP001665 NAPP TILE 6423 6472 . + . cluster=3;
CP001665 NAPP TILE 6448 6497 . + . cluster=3;
CP001665 NAPP TILE 7036 7085 . + . cluster=10;
CP001665 NAPP TILE 7061 7110 . + . cluster=3;
CP001665 NAPP TILE 7073 7122 . + . cluster=3;

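from Bio import Entrez
from Bio import SeqIO

# NCBI asks for a contact address with every Entrez request.
Entrez.email = "sample@example.org"

# Fetch the whole GenBank record as plain text.
handle = Entrez.efetch(db="nuccore",
                       id="CP001665",
                       rettype="gb",
                       retmode="text")
whole_sequence = SeqIO.read(handle, "genbank")
handle.close()

# A SeqRecord slices like a list (0-based, end-exclusive).
print(whole_sequence[6373:6422])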
Once you know the ID and the database to fetch from, use Entrez.efetch to get a handle to the record. Specify the return type (rettype="gb") and the mode (retmode="text") to get a handle to file-like data.
Then pass this handle to SeqIO, which returns a SeqRecord object. One nice feature of SeqRecords is that they can be sliced cleanly, like lists. Once you have the start and end positions, the print statement above returns:
ID: CP001665.1
Name: CP001665
Description: Escherichia coli 'BL21-Gold(DE3)pLysS AG', complete genome.
Number of features: 0
Seq('GCGCTAACCATGCGAGCGTGCCTGATGCGCTACGCTTATCAGGCCTACG', IUPACAmbiguousDNA())
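
The tile lines in the question look like GFF records, whose start and end columns are 1-based and inclusive, whereas Python slices are 0-based and end-exclusive. Under that assumption, whole_sequence[6373:6422] returns 49 bases and skips the first base of the 50-base tile. A minimal sketch of the conversion, using a plain string as a stand-in for the downloaded record (the helper name tile_to_slice is hypothetical):

```python
def tile_to_slice(gff_line):
    """Return (seqid, slice) for one whitespace-separated GFF-style line.

    Assumes columns 4 and 5 hold 1-based, inclusive start and end positions.
    """
    fields = gff_line.split()
    seqid, start, end = fields[0], int(fields[3]), int(fields[4])
    # 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
    return seqid, slice(start - 1, end)


line = "CP001665 NAPP TILE 6373 6422 . + . cluster=9;"
seqid, tile = tile_to_slice(line)

# Stand-in for the downloaded sequence; a SeqRecord can be sliced the same way.
fake_sequence = "ACGT" * 2000
fragment = fake_sequence[tile]
print(seqid, tile.start, tile.stop, len(fragment))
```

With the real record, whole_sequence[tile] would then cover the full tile.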

Probably something similar to this?
from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch(db="genome", id="56", rettype="fasta")
You'll need to determine the right database and query for it. I'd suggest using the Assembly Advanced Search Builder to build your query and see if you can go around the problem that way:
http://www.ncbi.nlm.nih.gov/assembly/advanced

To download nucleotide or protein sequences you do not need Biopython or BioPerl; Python's standard library (urllib2 in Python 2, urllib.request in Python 3) is enough. Here gi_list contains NCBI GI IDs.
from urllib.request import urlopen  # Python 3 replacement for urllib2

gi_list = ['440906003', '440909279', '440901052']
for gi in gi_list:
    url = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&sendto=on&log$=seqview&db=protein&dopt=fasta&sort=&val=' + gi + '&from=begin&to=end&maxplex=1'
    resp = urlopen(url)
    # read() returns bytes; decode before printing the FASTA text
    page = resp.read().decode()
    print(page, end='')


SPARQLWrapper - Connecting RDFLib with Fuseki Server

I am new to SPARQL and Fuseki. I started a Fuseki server with
fuseki-server --update --mem /address_act
to create a dataset.
I have a graph containing many triples, and I want to add them to the dataset via SPARQL Update. Here is the code that loads the triples into a graph and tries to save them to the dataset through the Fuseki server:
import requests
import rdflib
import re
from rdflib import ConjunctiveGraph, Graph, Literal, URIRef
from rdflib.plugins.stores import sparqlstore
query_endpoint = 'http://localhost:3030/address_act/query'
update_endpoint = 'http://localhost:3030/address_act/update'
store = sparqlstore.SPARQLUpdateStore()
store.open((update_endpoint, update_endpoint))
default_graph = URIRef('http://example.org/default-graph')
ng = Graph(store, identifier=default_graph)
g = Graph()
g1 = Graph()
for i in range(1,2):
    url = 'http://gnafld.net/address/?per_page=3&page=' + str(i)
    g.parse(url)
    page = g.query("""SELECT ?subject
                      WHERE {
                        ?subject a <http://gnafld.net/def/gnaf#Address>.
                      }""")
    for row in page:
        ad_info = requests.get(row.subject).content
        g1.parse(data=ad_info, format='turtle')
    #print('The number of triples in Graph: {}'.format(len(g1)))
ng.update(
    u'INSERT DATA { %s }' % g1.serialize(format='turtle')
)
I also tried a second approach using SPARQLWrapper:
import requests
import rdflib
import re
from rdflib import ConjunctiveGraph, Graph, Literal, URIRef
from rdflib.plugins.stores import sparqlstore
from SPARQLWrapper import SPARQLWrapper
query_endpoint = 'http://localhost:3030/address_act/query'
update_endpoint = 'http://localhost:3030/address_act/update'
store = sparqlstore.SPARQLUpdateStore()
store.open((update_endpoint, update_endpoint))
default_graph = URIRef('http://example.org/default-graph')
g = Graph()
g1 = Graph(identifier=default_graph)
for i in range(1,2):
    url = 'http://gnafld.net/address/?per_page=3&page=' + str(i)
    g.parse(url)
    page = g.query("""SELECT ?subject
                      WHERE {
                        ?subject a <http://gnafld.net/def/gnaf#Address>.
                      }""")
    for row in page:
        ad_info = requests.get(row.subject).content
        g1.parse(data=ad_info, format='turtle')
    #print('The number of triples in Graph: {}'.format(len(g1)))
for s,p,o in g1:
    queryStringUpload = 'INSERT DATA {GRAPH <http://example.org/default-graph> {%s %s %s}}' %(s,p,o)
    sparql = SPARQLWrapper('http://localhost:3030/address_act/update')
    sparql.setQuery(queryStringUpload)
    sparql.method = 'POST'
    sparql.query()
When I run the two scripts above, the last statement of each, ng.update(u'INSERT DATA { %s }' % g1.serialize(format='turtle')) in the first and sparql.query() in the second, raises an error. I am sure the triples exist in the graph, but the update fails. The first script gives:
QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Error 400: Lexical error at line 11, column 59. Encountered: "\\\'" (39), after : "b"\n\n\nFuseki - version 3.7.0 (Build date: 2018-04-05T11:04:59+0000)\n'
and the second gives:
QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Error 400: Line 1, column 56: Unresolved prefixed name: http:\n\n\nFuseki - version 3.7.0 (Build date: 2018-04-05T11:04:59+0000)\n'
It seems that the SPARQL update operation does not work. Is there a syntax mistake that prevents the triples from being inserted? Any idea how to solve this? I would be grateful for any help.
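
Both error messages are consistent with the update string not being valid SPARQL. In the first, the serialized Turtle appears to be a bytes object, so the text b'...' (along with any @prefix lines) ends up inside INSERT DATA. In the second, the terms are interpolated without angle brackets or quotes, so http: is read as a prefix. A minimal sketch of building the update text one triple at a time, using plain Python tuples as stand-ins for rdflib terms (the helpers format_term and build_insert are hypothetical, and only IRIs and plain string literals are handled):

```python
def format_term(term):
    """Render a stand-in term as SPARQL text.

    Assumption: a term is ("iri", value) or ("literal", value).
    """
    kind, value = term
    if kind == "iri":
        return "<%s>" % value
    if kind == "literal":
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return '"%s"' % escaped
    raise ValueError("unsupported term kind: %r" % kind)


def build_insert(triples, graph_iri):
    """Return an INSERT DATA update that adds the triples to one named graph."""
    lines = ["%s %s %s ." % tuple(format_term(t) for t in triple)
             for triple in triples]
    return "INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph_iri, "\n".join(lines))


triples = [
    (("iri", "http://example.org/address/1"),
     ("iri", "http://www.w3.org/2000/01/rdf-schema#label"),
     ("literal", 'Flat "A", 1 Example Street')),
]
update = build_insert(triples, "http://example.org/default-graph")
print(update)
```

With rdflib itself, each term's n3() method produces this kind of text, so the loop in the second script could interpolate s.n3(), p.n3() and o.n3() instead of the bare terms.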

Custom SPARQL functions in rdflib

What is a good way to hook a custom SPARQL function into rdflib?
I have been looking around in rdflib for an entry point for custom functions. I found no dedicated entry point, but rdflib.plugins.sparql.CUSTOM_EVALS might be a place to add one.
So far I have made an attempt with the code below. It seems "dirty" to me: I am calling a "hidden" function (_eval), and I am not sure I got all the argument updating correct. Beyond the custom_eval.py example code (which forms the basis for my code), I found little other code or documentation about CUSTOM_EVALS.
import rdflib
from rdflib.plugins.sparql.evaluate import evalPart
from rdflib.plugins.sparql.sparql import SPARQLError
from rdflib.plugins.sparql.evalutils import _eval
from rdflib.namespace import Namespace
from rdflib.term import Literal
NAMESPACE = Namespace('//custom/')
LENGTH = rdflib.term.URIRef(NAMESPACE + 'length')
def customEval(ctx, part):
    """Evaluate custom function."""
    if part.name == 'Extend':
        cs = []
        for c in evalPart(ctx, part.p):
            if hasattr(part.expr, 'iri'):
                # A function
                argument = _eval(part.expr.expr[0], c.forget(ctx, _except=part.expr._vars))
                if part.expr.iri == LENGTH:
                    e = Literal(len(argument))
                else:
                    raise SPARQLError('Unhandled function {}'.format(part.expr.iri))
            else:
                e = _eval(part.expr, c.forget(ctx, _except=part._vars))
                if isinstance(e, SPARQLError):
                    raise e
            cs.append(c.merge({part.var: e}))
        return cs
    raise NotImplementedError()
QUERY = """
PREFIX custom: <%s>
SELECT ?s ?length WHERE {
    BIND("Hello, World" AS ?s)
    BIND(custom:length(?s) AS ?length)
}
""" % (NAMESPACE,)
rdflib.plugins.sparql.CUSTOM_EVALS['exampleEval'] = customEval
for row in rdflib.Graph().query(QUERY):
    print(row)
So first off, I want to thank you for showing how you implemented a new SPARQL function.
Secondly, by using your code I was able to create a SPARQL function that evaluates two strings by using the Levenshtein distance. It has been really insightful and I wish to share it for it holds additional documentation that could help other developers creating their own custom SPARQL functions.
# Imports needed to introduce a new SPARQL function
import rdflib
from rdflib.plugins.sparql.evaluate import evalPart
from rdflib.plugins.sparql.sparql import SPARQLError
from rdflib.plugins.sparql.evalutils import _eval
from rdflib.namespace import Namespace
from rdflib.term import Literal
# Import for the custom function calculation
from Levenshtein import distance as levenshtein_distance # python-Levenshtein==0.12.2
def SPARQL_levenshtein(ctx:object, part:object) -> object:
    """
    The first two variables retrieved from a SPARQL query are compared using the Levenshtein distance.
    The distance value is then stored in a Literal object and added to the query results.
    Example:
        Query:
            PREFIX custom: //custom/ # Note: this part references the custom function
            SELECT ?label1 ?label2 ?levenshtein WHERE {
                BIND("Hello" AS ?label1)
                BIND("World" AS ?label2)
                BIND(custom:levenshtein(?label1, ?label2) AS ?levenshtein)
            }
        Retrieve:
            ?label1 ?label2
        Calculation:
            levenshtein_distance(?label1, ?label2) = distance
        Output:
            Save distance in Literal object.
    :param ctx: <class 'rdflib.plugins.sparql.sparql.QueryContext'>
    :param part: <class 'rdflib.plugins.sparql.parserutils.CompValue'>
    :return: <class 'rdflib.plugins.sparql.processor.SPARQLResult'>
    """
    # This part holds the basic implementation for adding new functions
    if part.name == 'Extend':
        cs = []
        # Information is retrieved, stored and passed through a generator
        for c in evalPart(ctx, part.p):
            # Checks whether the expression carries an internationalized resource identifier (IRI),
            # which is how a custom function call is recognized.
            if hasattr(part.expr, 'iri'):
                # From here the real calculations begin.
                # First we get the variable arguments, for example ?label1 and ?label2
                argument1 = str(_eval(part.expr.expr[0], c.forget(ctx, _except=part.expr._vars)))
                argument2 = str(_eval(part.expr.expr[1], c.forget(ctx, _except=part.expr._vars)))
                # Here it checks if it can find our levenshtein IRI (example: //custom/levenshtein)
                # Please note that IRI and URI are almost the same.
                # The IRI is defined further down with the following:
                # namespace = Namespace('//custom/')
                # levenshtein = rdflib.term.URIRef(namespace + 'levenshtein')
                if part.expr.iri == levenshtein:
                    # After finding the correct path for the custom SPARQL function the evaluation can begin.
                    # Here the Levenshtein distance is calculated using ?label1 and ?label2 and stored as a Literal object.
                    # This object is then stored as an output value of the SPARQL query (example: ?levenshtein)
                    evaluation = Literal(levenshtein_distance(argument1, argument2))
                # Standard error handling and return statements
                else:
                    raise SPARQLError('Unhandled function {}'.format(part.expr.iri))
            else:
                evaluation = _eval(part.expr, c.forget(ctx, _except=part._vars))
                if isinstance(evaluation, SPARQLError):
                    raise evaluation
            cs.append(c.merge({part.var: evaluation}))
        return cs
    raise NotImplementedError()
namespace = Namespace('//custom/')
levenshtein = rdflib.term.URIRef(namespace + 'levenshtein')
query = """
PREFIX custom: <%s>
SELECT ?label1 ?label2 ?levenshtein WHERE {
    BIND("Hello" AS ?label1)
    BIND("World" AS ?label2)
    BIND(custom:levenshtein(?label1, ?label2) AS ?levenshtein)
}
""" % (namespace,)
# Save custom function in custom evaluation dictionary.
rdflib.plugins.sparql.CUSTOM_EVALS['SPARQL_levenshtein'] = SPARQL_levenshtein
for row in rdflib.Graph().query(query):
    print(row)
To answer your question: "What is a good way to hook a custom SPARQL function into rdflib?"
I'm currently developing a class that handles RDF data, and I believe it might be best to register the function in the class's __init__ method.
For example:
class ClassName():
    """DOCSTRING"""
    def __init__(self):
        """DOCSTRING"""
        # Save custom function in custom evaluation dictionary.
        rdflib.plugins.sparql.CUSTOM_EVALS['SPARQL_levenshtein'] = SPARQL_levenshtein
Please note that this SPARQL function only works for the endpoint on which it is implemented. Even though the SPARQL syntax in the query is correct, the function cannot be used in queries sent to databases like DBpedia, because the DBpedia endpoint does not support this custom function (yet).
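
The answer above relies on the third-party python-Levenshtein package. Where that dependency is unwanted, a pure-Python function could be swapped in for levenshtein_distance. What follows is a minimal sketch of the standard dynamic-programming edit distance, not the library's implementation:

```python
def levenshtein(a, b):
    """Return the edit distance between a and b (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("Hello", "World"))
```

For the two labels used in the example query, this prints 4.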

Is there a syntax error in this iCalendar event code?

I have a 'Subscribed Calendar' on my iPhone which fetches calendar events in the iCalendar format from a URL. The calendar works fine except that it does not show the following event. Is there a reason why? All other events show fine. I suspect a problem with the event's formatting or syntax, but I can't find anything that might be causing it.
BEGIN:VEVENT
SUMMARY:Meet with tenant
DESCRIPTION:Notes: Meter readings\, SoC images\, post box key\, finalise Let Procedure.\nLocation: Apartment X X Woodland Road\, Bebington\, Wirral\, CHXX XXX\nEmployee: Michael Le Brocq\nStatus: Confirmed\nOriginally Arranged: 07/09/16 12:18:43 by Lucy Christian\nLast Updated: 12/09/16 15:57:05 by Michael Le Brocq\n
UID:2432
STATUS:CONFIRMED
DTSTART:20160914T160000
DTEND:20160914T151500
LAST-MODIFIED:20160912T155705
LOCATION:Apartment 5 20 Woodland Road\, Bebington\, Wirral\, CH42 4NT
END:VEVENT
Code used to generate the calendar events:
<?php
require_once('../inc/app_top_cron.php');
if (!empty($_GET)) {
    // define and escape each GET as a variable
    foreach ($_GET as $key => $value) {
        if (!empty($value)) {
            ${$key} = mysqli_real_escape_string($con, PrepareInput($value));
        }
    }
}
// company details
$company_details_query = mysqli_query($con, "SELECT company_id, company_trading_name FROM company WHERE company_token = '" . $company . "' LIMIT 1") or die(mysqli_error($con));
$company_details = mysqli_fetch_array( $company_details_query );
// the iCal date format. Note: a Z on the end would indicate a UTC timestamp; this format has none.
define('DATE_ICAL', 'Ymd\THis');
// max line length is 75 chars. New line is \r\n
$output = "BEGIN:VCALENDAR
METHOD:PUBLISH
VERSION:2.0
PRODID:-//Property Software//Calendar//EN
CALSCALE:GREGORIAN
X-WR-CALNAME:" . $company_details['company_trading_name'] . " Calendar" . "
\r\n";
$sql = "SELECT ce.*, ces.calendar_event_status_name
        FROM calendar_event ce
        INNER JOIN calendar_event_status ces
        on ce.calendar_event_status = ces.calendar_event_status_id
        WHERE ce.calendar_event_company_id = '" . $company_details['company_id'] . "'";
$calendar_event_query = mysqli_query($con, $sql) or die(mysqli_error($con));
while($row = mysqli_fetch_array( $calendar_event_query )) {
    $calendar_event_subject = str_replace(",","\,", $row['calendar_event_subject']);
    $calendar_event_description = str_replace(",","\,", $row['calendar_event_description']);
    $calendar_event_description = str_replace("\r\n","\\n", $calendar_event_description);
    $calendar_event_location = str_replace(",","\,", $row['calendar_event_location']);
    // loop through events
    $output .=
"BEGIN:VEVENT
SUMMARY:" . $calendar_event_subject . "
DESCRIPTION:" . $calendar_event_description . "
UID:" . $row["calendar_event_id"] . "
STATUS:" . $row["calendar_event_status_name"] . "
DTSTART:" . date(DATE_ICAL, strtotime($row["calendar_event_start"])) . "
DTEND:" . date(DATE_ICAL, strtotime($row["calendar_event_end"])) . "
LAST-MODIFIED:" . date(DATE_ICAL, strtotime($row["calendar_event_date_updated"])) . "
LOCATION:" . $calendar_event_location . "
END:VEVENT
";
}
// close calendar
$output .= "END:VCALENDAR";
echo $output;
mysqli_close($con);
?>
This:
$calendar_event_description = str_replace("\r\n","\\n", $calendar_event_description);
You're taking \r\n (carriage return + newline) and turning it into a literal \ character followed by an n. That's not a newline (one character), it's TWO characters. In iCalendar that pair is the escape for a line break inside a text value such as DESCRIPTION, so it is valid there, but it does not end the content line.
And as per my comments above, don't build multiline strings by concatenation. It makes the code hard to read and hard to debug. Use a heredoc instead:
$output = <<<EOL
BEGIN:VEVENT
SUMMARY: {$calendar_event_subject}
DESCRIPTION: {$calendar_event_description}
UID: {$row["calendar_event_id"]}
etc...
EOL;
Note the lack of any " or . - making for a much more compact and easy-to-follow code block. If you need to change the line breaks afterwards, because your system uses something different than what your code editor is embedding, you can do that with a simple str_replace() after finishing building the string.
The icalendar standard requires \r\n line breaks between lines. You can validate the icalendar output using the validator at http://icalendar.org/validator.html
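
As a rough illustration of the points above, here is a Python sketch (the thread's code is PHP, and the function names here are hypothetical) that escapes text values, joins content lines with \r\n, and checks that an event ends after it starts. It assumes the escaping rules of RFC 5545, where backslash, semicolon and comma are backslash-escaped and a line break inside a value is written as the two characters \n. Note that the sample event in the question has a DTEND of 15:15, which is earlier than its DTSTART of 16:00; that may be worth ruling out as the reason it is not shown.

```python
from datetime import datetime

ICAL_FORMAT = "%Y%m%dT%H%M%S"  # same layout as the question's DATE_ICAL


def escape_text(value):
    """Escape a string for use as an iCalendar TEXT value."""
    value = value.replace("\\", "\\\\")  # escape backslashes first
    value = value.replace(";", "\\;").replace(",", "\\,")
    # Any real line break becomes the two characters backslash + n.
    return value.replace("\r\n", "\\n").replace("\n", "\\n").replace("\r", "\\n")


def build_event(uid, summary, description, start, end):
    """Return one VEVENT as a CRLF-terminated string."""
    if end <= start:
        raise ValueError("DTEND must be later than DTSTART")
    lines = [
        "BEGIN:VEVENT",
        "UID:%s" % uid,
        "SUMMARY:" + escape_text(summary),
        "DESCRIPTION:" + escape_text(description),
        "DTSTART:" + start.strftime(ICAL_FORMAT),
        "DTEND:" + end.strftime(ICAL_FORMAT),
        "END:VEVENT",
    ]
    return "\r\n".join(lines) + "\r\n"


event = build_event(
    uid=2432,
    summary="Meet with tenant",
    description="Notes: Meter readings, keys\r\nStatus: Confirmed",
    start=datetime(2016, 9, 14, 15, 15),
    end=datetime(2016, 9, 14, 16, 0),
)
print(event)
```

This is not a complete event generator: it leaves out line folding at 75 characters and other properties a validator may ask for, such as DTSTAMP.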

Rename screenshots taken on failure in PHPUnit Selenium

PHPUnit has an option to take a screenshot upon a Selenium test case failure. However, the screenshot filename generated is a hash of something - I don't know what exactly. While the test result report allows me to match a particular failed test case with a screenshot filename, this is troublesome to use.
If I could rename the screenshot to use the message from the failed assert as well as a timestamp for instance, it makes the screenshots much easier to cross-reference. Is there any way to rename the generated screenshot filename at all?
You could try something like this (it works with Selenium2):
protected function tearDown() {
    $status = $this->getStatus();
    if ($status == \PHPUnit_Runner_BaseTestRunner::STATUS_ERROR || $status == \PHPUnit_Runner_BaseTestRunner::STATUS_FAILURE) {
        $file_name = sys_get_temp_dir() . '/' . get_class($this) . ':' . $this->getName() . '_' . date('Y-m-d_H:i:s') . '.png';
        file_put_contents($file_name, $this->currentScreenshot());
    }
}
Also turn off the built-in screenshot capture:
protected $captureScreenshotOnFailure = FALSE;
I ended up using a modified version of @sectus's answer:
public function onNotSuccessfulTest(Exception $e) {
    $file_name = '/' . date('Y-m-d_H-i-s') . ' ' . $this->getName() . '.png';
    file_put_contents($this->screenshotPath . $file_name, base64_decode($this->captureEntirePageScreenshotToString()));
    parent::onNotSuccessfulTest($e);
}
Although the conditional check in tearDown() works fine, based on "Extending phpunit error message" I decided to go with onNotSuccessfulTest(), as it seemed cleaner.
The filename could not contain colons (:); otherwise I got an error message from file_put_contents: failed to open stream: Protocol error
The function currentScreenshot also did not exist, so I ended up taking the screenshot in a different way according to http://www.devinzuczek.com/2011/08/taking-a-screenshot-with-phpunit-and-selenium-rc/.
Another method I played around with, as I still wanted to use $this->screenshotUrl and $this->screenshotPath for convenient configuration:
I overrode takeScreenshot from https://github.com/sebastianbergmann/phpunit-selenium/blob/master/PHPUnit/Extensions/SeleniumTestCase.php
protected function takeScreenshot() {
    if (!empty($this->screenshotPath) &&
        !empty($this->screenshotUrl)) {
        $file_name = '/' . date('Y-m-d_H-i-s') . ' ' . $this->getName() . '.png';
        file_put_contents($this->screenshotPath . $file_name, base64_decode($this->captureEntirePageScreenshotToString()));
        // $file_name already starts with '/' and ends with '.png'
        return 'Screenshot: ' . $this->screenshotUrl . $file_name . "\n";
    } else {
        return '';
    }
}

Compiling a week's worth of tweets automatically?

I'd like to be able to run a script that parses a Twitter page and compiles a list of tweets for a given time period, one week to be exact. Ideally it should return the results as an HTML list that could then be posted on a blog. Like here:
http://www.perezfox.com/2009/07/12/the-week-in-tweet-for-2009-07-12/
I'm sure there's a script out there that could do it, unless the guy does it manually (that would be a big pain!). If there is such a script forgive my ignorance.
Thanks.
Use the Twitter search API. For instance, this query returns my tweets between 2009-07-10 and 2009-07-17:
http://search.twitter.com/search.atom?q=from:tormodfj&since=2009-07-10&until=2009-07-17
For anyone who's interested, I hacked together a quick PHP parser that takes the XML output of the above feed and turns it into a nice list. If you post a lot of tweets, it's sensible to use the rpp parameter so that your feed doesn't get clipped at 15 results. The maximum is 100. So by putting this URL into NetNewsWire (or an equivalent feed reader):
http://search.twitter.com/search.atom?q=from:yourTwitterAccountHere&since=2009-07-13&until=2009-07-19&rpp=100
and saving the XML to a local file (links.xml here), you can use this script:
<?php
$date = "";
$in = 'links.xml'; //tweets
file_exists($in) ? $xml = simplexml_load_file($in) : die ('Failed to open xml data.');
foreach($xml->entry as $item)
{
    $newdate = date("dS F", strtotime($item->published));
    if ($date == "")
    {
        echo "<h2>$newdate</h2>\n<ul>\n";
    }
    elseif ($newdate != $date)
    {
        echo "</ul>\n<h2>$newdate</h2>\n<ul>\n";
    }
    echo "<li>\n<p>" . $item->content ." *</p>\n</li>\n";
    $date = $newdate;
}
echo "</ul>\n";
?>
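
For readers who prefer Python, the same grouping can be sketched with the standard library. This is an illustrative equivalent, not a port of the script above: the function name tweets_to_html is hypothetical, the sample feed is made up, and it assumes each Atom entry carries a published timestamp and a plain-text title.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from html import escape
from itertools import groupby

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def tweets_to_html(atom_xml):
    """Group feed entries by day and return them as HTML headings and lists."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        published = entry.findtext(ATOM + "published")
        day = datetime.strptime(published[:10], "%Y-%m-%d").date()
        entries.append((day, entry.findtext(ATOM + "title") or ""))

    parts = []
    # groupby merges consecutive entries with the same day, in feed order.
    for day, group in groupby(entries, key=lambda item: item[0]):
        parts.append("<h2>%s</h2>" % day.strftime("%d %B"))
        parts.append("<ul>")
        parts.extend("<li><p>%s</p></li>" % escape(text) for _, text in group)
        parts.append("</ul>")
    return "\n".join(parts)


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><published>2009-07-13T09:00:00Z</published><title>first &amp; second</title></entry>
  <entry><published>2009-07-13T08:00:00Z</published><title>earlier that day</title></entry>
  <entry><published>2009-07-12T20:00:00Z</published><title>previous day</title></entry>
</feed>"""

html_list = tweets_to_html(SAMPLE)
print(html_list)
```

Unlike the PHP version, this escapes the tweet text before writing it into the list, which suits plain-text titles but would need changing if the feed's HTML content were used instead.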