How to get the latest papers from pubmed - api

This is a bit of a specific question, but somebody must have done this before. I would like to get the latest papers from pubmed. Not papers about a certain subjects, but all of them. I thought to query depending on modification date (mdat). I use biopython.py and my code looks like this
handle = Entrez.egquery(mindate='2015/01/10',maxdate='2017/02/19',datetype='mdat')
results = Entrez.read(handle)
for row in results["eGQueryResult"]:
if row["DbName"]=="nuccore":
print(row["Count"])
However, this results in zero papers. If I add term='cancer' I get heaps of papers. So the query seems to need the term keyword... but I want all papers, not papers on a certain subjects. Any ideas how to do this?
thanks
carl

term is a required parameter, so you can't omit it in your call to Entrez.egquery.
If you need all the papers within a specified timeframe, you will probably need a local copy of MEDLINE and PubMed Central:
For MEDLINE, this involves getting a license. For PubMed Central, you
can download the Open Access subset without a license by ftp.

EDIT for python3. The idea is that the latest pubmed id is the same thing as the latest paper (which I'm not sure is true). Basically does a binary search for the latest PMID, then gives a list of the n most recent. This does not look at dates, and only returns PMIDs.
There is an issue however where not all PMIDs exist, for example https://pubmed.ncbi.nlm.nih.gov/34078719/ exists, https://pubmed.ncbi.nlm.nih.gov/34078720/ does not (retraction?), and https://pubmed.ncbi.nlm.nih.gov/34078721/ exists. This ruins the binary search since it can't know if it's found a PMID that hasn't been used yet, or if it has found one that has previously existed.
CODE:
import urllib
def pmid_exists(pmid):
url_stem = 'https://www.ncbi.nlm.nih.gov/pubmed/'
query = url_stem+str(pmid)
try:
request = urllib.request.urlopen(query)
return True
except urllib.error.HTTPError:
return False
def get_latest_pmid(guess = 27239557, _min_guess=None, _max_guess=None):
#print(_min_guess,'<=',guess,'<=',_max_guess)
if _min_guess and _max_guess and _max_guess-_min_guess <= 1:
#recursive base case, this guess must be the largest PMID
return guess
elif pmid_exists(guess):
#guess PMID exists, search for larger ids
_min_guess = guess
next_guess = (_min_guess+_max_guess)//2 if _max_guess else guess*2
else:
#guess PMID does not exist, search for smaller ids
_max_guess = guess
next_guess = (_min_guess+_max_guess)//2 if _min_guess else guess//2
return get_latest_pmid(next_guess, _min_guess, _max_guess)
#Start of program
n = 5
latest_pmid = get_latest_pmid()
most_recent_n_pmids = range(latest_pmid-n, latest_pmid)
print(most_recent_n_pmids)
OUTPUT:
[28245638, 28245639, 28245640, 28245641, 28245642]

Related

How to list all topics created by me

How can I get a list of all topics that I created?
I think it should be something like
%SEARCH{ "versions[-1].info.author = '%USERNAME%" type="query" web="Sandbox" }%
but that returns 0 results.
With "versions[-1]" I get all topics, and with "info.author = '%USERNAME%'" a list of the topics where the last edit was made by me. Having a list of all topics where any edit was made by me would be fine, too, but "versions.info.author = '%USERNAME%'" again gives 0 results.
I’m using Foswiki-1.0.9. (I know that’s quite old.)
The right syntax would be
%SEARCH{ "versions[-1,info.author='%USERNAME%']" type="query" web="Sandbox"}%
But that's not performing well, i.e. on your old Foswiki install.
Better is to install DBCacheContrib and DBCachePlugin and use
%DBQUERY{"createauthor='%WIKINAME%'"}%
This plugin caches the initial author in a way it does not have to retrieve the information from the revision system for every topic under consideration during query time.

Sentence segmentation and dependency parser

I’m pretty new to python (using python 3) and spacy (and programming too). Please bear with me.
I have three questions where two are more or less the same I just can’t get it to work.
I took the “syntax specific search with spacy” (example) and tried to make different things work.
My program currently reads txt and the normal extraction
if w.lower_ != 'music':
return False
works.
My first question is: How can I get spacy to extract two words?
For example: “classical music”
With the previous mentioned snippet I can make it extract either classical or music. But if I only search for one of the words I also get results I don’t want like.
Classical – period / era
Or when I look for only music
Music – baroque, modern
The second question is: How can I get the dependencies to work?
The example dependency with:
elif w.dep_ != 'nsubj': # Is it the subject of a verb?
return False
works fine. But everything else I tried does not really work.
For example, I want to extract sentences with the word “birthday” and the dependency ‘DATE’. (so the dependency is an entity)
I got
if d.ent_type_ != ‘DATE’:
return False
To work.
So now it would look like:
def extract_information(w,d):
if w.lower_ != ‘birthday’:
return False
elif d.ent_type_ != ‘DATE’:
return False
else:
return True
Does something like this even work?
If it works the third question would be how I can filter sentences for example with a DATE. So If the sentence contains a certain word and a DATE exclude it.
Last thing maybe, I read somewhere that the dependencies are based on the “Stanford typed dependencies manual”. Is there a list which of those dependencies work with spacy?
Thank you for your patience and help :)
Before I get into offering some simple suggestions to your questions, have you tried using displaCy's visualiser on some of your sentences?
Using an example sentence 'John's birthday was yesterday', you'll find that within the parsed sentence, birthday and yesterday are not necessarily direct dependencies of one another. So searching based on the birthday word having a dependency of a DATE type entity, might not be yield the best of results.
Onto the first question:
A brute force method would be to look for matching subsequent words after you have parsed the sentence.
doc = nlp(u'Mary enjoys classical music.')
for (i,token) in enumerate(doc):
if (token.lower_ == 'classical') and (i != len(doc)-1):
if doc[i+1].lower_ == 'music':
print 'Target Acquired!'
If you're unsure of what enumerate does, look it up. It's the pythonic way of using python.
To questions 2 and 3, one simple (but not elegant) way of solving this is to just identify in a parsed sentence if the word 'birthday' exists and if it contains an entity of type 'DATE'.
doc = nlp(u'John\'s birthday was yesterday.')
for token in doc:
if token.lower_ == 'birthday':
for entities in doc.ents:
if entities.label_ == 'DATE':
print 'Found ya!'
As for the list of dependencies, I presume you're referring to the Part-Of-Speech tags. Check out the documentation on this page.
Good luck! Hope that helped.

How to get professions on world of warcraft vanilla's addon?

I'm creating an AddOn for a private server of World of Warcraft 1.12.1/Classic/Vanilla and I need to check the user's professions.
The information I got was the APIs GetProfessions() and GetProfessionInfo() but I can't find out how to use them.
I wanna have a variable for each profession.
It's something like this:
prof1, prof2, archaeology, fishing, cooking, firstAid = GetProfessions()
Profession1 = GetProfessionInfo(prof1)
Profession2 = GetProfessionInfo(prof2)
Profession3 = GetProfessionInfo(archaeology)
Profession4 = GetProfessionInfo(fishing)
Profession5 = GetProfessionInfo(cooking)
Profession6 = GetProfessionInfo(firstAid)
Quick glance shows there are no special tradeskill functions in API in 1.12.1. AFAIR professions were just regular entries in the spellbook back then. As such you can iterate over spellbook with GetSpellName and check that either first return matches name of known profession or second return matches name of a known profession rank.
Additional info on each profession can be retrieved with GetTradeSkillLine, but only when this profession is opened in tradeskill window (i.e. window where you see list of items to craft).
If I'm understanding this correctly, GetProfessions() returns a table. You could always try a different way around the problem, like so:
professions = GetProfessions()
Profession1 = GetProfessionInfo(professions[1])
Profession2 = GetProfessionInfo(professions[2])
Profession3 = GetProfessionInfo(professions[3])
Profession4 = GetProfessionInfo(professions[4])
Profession5 = GetProfessionInfo(professions[5])
Profession6 = GetProfessionInfo(professions[6])
I'm not sure if this will solve your problem, but I figured I could weigh in my opinion. I have never done anything with World of Warcraft.

How to use Bioproject ID, for example, PRJNA12997, in biopython?

I have an Excel file in which are given more then 2000 organisms, where each one of them has a Bioproject ID associated (like PRJNA12997). The idea is to use these IDs to get the sequence for a later multiple alignment with other five sequences that I have in a text file.
Can anyone help me understand how I can do this using biopython? At least the part with the bioproject ID.
You can first get the info using Bio.Entrez:
from Bio import Entrez
Entrez.email = "Your.Name.Here#example.org"
# This call to efetch fails sometimes with a 400 error.
handle = Entrez.efetch(db="bioproject", id="PRJNA12997")
I've been trying, and Entrez.read(handle) doesn't seems to work. But if you do record_xml = handle.read() you'll get the XML entry for this record. In this XML you can get the ID for the organism, in this case 12997.
handle = Entrez.esearch(db="nuccore", term="12997[BioProject]")
search_results = Entrez.read(handle)
Now you can efecth from your search results. At this point you should use Biopython to parse whatever you will get in the efetch step, playing with the rettype http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
for result in search_results["IdList"]:
entry = Entrez.efetch(db="nuccore", id=result, rettype="fasta")
this_seq_in_fasta = entry.read()

pseudo randomization in loop PsychoPy

I know other people have asked similar questions in past but I am still stuck on how to solve the problem and was hoping someone could offer some help. Using PsychoPy, I would like to present different images, specifically 16 emotional trials, 16 neutral trials and 16 face trials. I would like to pseudo randomize the loop such that there would not be more than 2 consecutive emotional trials. I created the experiment in Builder but compiled a script after reading through previous posts on pseudo randomization.
I have read the previous posts that suggest creating randomized excel files and using those, but considering how many trials I have, I think that would be too many and was hoping for some help with coding. I have tried to implement and tweak some of the code that has been posted for my experiment, but to no avail.
Does anyone have any advice for my situation?
Thank you,
Rae
Here's an approach that will always converge very quickly, given that you have 16 of each type and only reject runs of more than two emotion trials. #brittUWaterloo's suggestion to generate trials offline is very good--this what I do myself typically. (I like to have a small number of random orders, do them forward for some subjects and backwards for others, and prescreen them to make sure there are no weird or unintended juxtapositions.) But the algorithm below is certainly safe enough to do within an experiment if you prefer.
This first example assumes that you can represent a given trial using a string, such as 'e' for an emotion trial, 'n' neutral, 'f' face. This would work with 'emo', 'neut', 'face' as well, not just single letters, just change eee to emoemoemo in the code:
import random
trials = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
while 'eee' in ''.join(trials):
random.shuffle(trials)
print trials
Here's a more general way of doing it, where the trial codes are not restricted to be strings (although they are strings here for illustration):
import random
def run_of_3(trials, obj):
# detect if there's a run of at least 3 objects 'obj'
for i in range(2, len(trials)):
if trials[i-2: i+1] == [obj] * 3:
return True
return False
tr = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
while run_of_3(tr, 'e'):
random.shuffle(tr)
print tr
Edit: To create a PsychoPy-style conditions file from the trial list, just write the values into a file like this:
with open('emo_neu_face.csv', 'wb') as f:
f.write('stim\n') # this is a 'header' row
f.write('\n'.join(tr)) # these are the values
Then you can use that as a conditions file in a Builder loop in the regular way. You could also open this in Excel, and so on.
This is not quite right, but hopefully will give you some ideas. I think you could occassionally get caught in an infinite cycle in the elif statement if the last three items ended up the same, but you could add some sort of a counter there. In any case this shows a strategy you could adapt. Rather than put this in the experimental code, I would generate the trial sequence separately at the command line, and then save a successful output as a list in the experimental code to show to all participants, and know things wouldn't crash during an actual run.
import random as r
#making some dummy data
abc = ['f']*10 + ['e']*10 + ['d']*10
def f (l1,l2):
#just looking at the output to see how it works; can delete
print "l1 = " + str(l1)
print l2
if not l2:
#checks if second list is empty, if so, we are done
out = list(l1)
elif (l1[-1] == l1[-2] and l1[-1] == l2[0]):
#shuffling changes list in place, have to copy it to use it
r.shuffle(l2)
t = list(l2)
f (l1,t)
else:
print "i am here"
l1.append(l2.pop(0))
f(l1,l2)
return l1
You would then run it with something like newlist = f(abc[0:2],abc[2:-1])