How to use atomselect to specify atomNumbers/Index position in a pdb file for MDAnalysis? - mdanalysis

After reading through the documentation I still have not been able to figure out a way to select one specific atom in my pdb file with MDAnalysis.
For my project I am essentially pulling a chloride through a membrane channel in an applied electric field. I want to use MDAnalysis to specifically select this one chloride that is being pushed through the channel. Unfortunately this is not the only chloride in the same system or even segment so selecting a resname or segment isn't going to be useful in identifying which chloride I would like to track. Essentially with MDAnalysis I am going to be running a calculation that uses the trajectory of that one specific chloride to calculate the displacement charge function. I am having difficulty writing code to tell MDAnalysis that I want that specific chloride.
The way I have distinguished this in NAMD was by using an index number but I cannot find a way in their documentation to use this number within MDAnalysis and would really appreciate if someone has experience using the pdb index number rather than segnames and ids, or linking the relevant page to the documentation if it exists.
Thanks!

Unfortunately this is not yet documented, but as of 2.0.0dev0 you can also use the id keyword in addition to index and bynum. This might be more straightforward, as id corresponds directly to the number in the "serial" column in the PDB format. For example:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB
u = mda.Universe(PDB)
atom_with_serial_10_in_pdb = u.select_atoms("id 10")[0]
or
atom_with_serial_10_in_pdb = u.atoms[u.atoms.ids == 10][0]

You can also select atoms by their position in the atom list:
bynum 7:10, which is a 1-based, inclusive index range
index 7, which is a 0-based index
For example, atomi7 = universe.select_atoms("index 7")
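To make the difference between the three keywords concrete, here is a minimal sketch in plain Python (MDAnalysis is not needed to run it). The three ATOM records and the helper name serial_of are made up for illustration; the only format fact used is that the serial number sits in columns 7-11 of a PDB ATOM record.

```python
# Hypothetical PDB fragment; the serials deliberately do not start at 1
# and are not contiguous, as often happens in a real system.
PDB_LINES = [
    "ATOM    101  CLA CLA X   1       1.000   2.000   3.000  1.00  0.00",
    "ATOM    102  CLA CLA X   2       4.000   5.000   6.000  1.00  0.00",
    "ATOM    105  CLA CLA X   3       7.000   8.000   9.000  1.00  0.00",
]

def serial_of(pdb_line):
    """Return the atom serial number (columns 7-11 of an ATOM/HETATM record)."""
    return int(pdb_line[6:11])

serials = [serial_of(line) for line in PDB_LINES]

for index, serial in enumerate(serials):
    bynum = index + 1  # bynum is just the 0-based index shifted by one
    print(f"index {index} | bynum {bynum} | id {serial}")
```

Under the behaviour described in the answers above, the third atom here would be reached with "index 2", "bynum 3", or "id 105"; only the last one is tied to the number written in the file.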
Relevant documentation:

Reading Fortran binary file in Python

I'm having trouble reading an unformatted F77 binary file in Python.
I've tried the scipy.io.FortranFile method and the numpy.fromfile method, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part -- there's nothing better than having an idiot moment and then washing your hands of it...
The data, bcube1, have dimensions 101x101x101x3, and is r*8 type. There are 3090903 entries in total. They are written using the following statement (not my code, copied from source).
open (unit=21, file=bendnm, status='new'
. ,form='unformatted')
write (21) bcube1
close (unit=21)
I can successfully read it in IDL using the following (also not my code, copied from colleague):
bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun
The returned data (bcube) is double precision, with dimensions 101x101x101x3, so the header information for the file is aware of its dimensions (not flattened).
Now I try to get the same effect using Python, but no luck. I've tried the following methods.
In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')
which returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained but it remains indivisible by 8.
Alternatively, using fromfile results in no errors but returns one more value than is in the array (a footer perhaps?) and the individual array values are wildly wrong (should all be of order unity).
In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030, 4.97284231e-020, -1.06514594e+299, ...,
8.97359707e-029, 6.79921640e-316, -1.79102266e-037])
I've tried using byteswap to see if this makes the floating point values more reasonable but it does not.
It seems to me that the np.fromfile method is very close to working but there must be something wrong with the way it's reading the header information. Can anyone suggest how I can figure out what should be in the header file that allows IDL to know about the array dimensions and datatype? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?
I played a bit around with it, and I think I have an idea.
How Fortran stores unformatted data is not standardized, so you have to play a bit around with it, but you need three pieces of information:
The Format of the data. You suggest that is 64-bit reals, or 'f8' in python.
The type of the header. That is an unsigned integer, but you need the length in bytes. If unsure, try 4.
The header usually stores the length of the record in bytes, and is repeated at the end.
Then again, it is not standardized, so no guarantees.
The endianness, little or big.
Technically for both header and values, but I assume they're the same.
NumPy defaults to the machine's native byte order, which is little-endian on most hardware, so if that were the correct setting for your data, I think you would have already solved it.
When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big_endian, and you have a 4-byte unsigned integer header, you need this:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')
When you read the data, you need the data type of the values. Again, assuming big_endian, you want type >f8:
vals = ff.read_reals('>f8')
Look here for a description of the syntax of the data type.
If you have control over the program that writes the data, I strongly suggest you write them into data streams, which can be more easily read by Python.
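The big-endian, 4-byte-header guess can be checked against the two numbers reported in the question using only the standard library. This is a sketch under one assumption: the file holds a single record framed by a 4-byte length marker before and after the data (the common layout, not a guarantee).

```python
import struct

n_values = 101 * 101 * 101 * 3   # number of real*8 entries in bcube1
payload_bytes = n_values * 8     # size of the record body in bytes

# The length marker as a big-endian writer would store it ...
marker = struct.pack(">I", payload_bytes)
# ... and what a little-endian reader makes of the same four bytes.
misread = struct.unpack("<I", marker)[0]
print(misread)

# Whole file = marker + payload + marker. np.fromfile with the default
# float64 dtype simply chops those bytes into 8-byte items.
file_bytes = 4 + payload_bytes + 4
items = file_bytes // 8
print(items)
```

The first number printed is the "Size obtained" from the FortranFile error message, and the second is the length of the array returned by np.fromfile, so both symptoms are consistent with a big-endian file with 4-byte record markers, i.e. '>u4' for the header and '>f8' for the values.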
Fortran has record demarcations which are poorly documented, even in binary files.
So every write to an unformatted file:
integer*4 Test1
real*4 Matrix(3,3)
open(78,form='unformatted')
write(78) Test1
write(78) Matrix
close(78)
Should ultimately be padded on both sides by np.int32 values. (I've seen references that this tells you the record length, but haven't verified personally.)
The above could be read in Python via numpy as:
input_file = open(file_location,'rb')
datum = np.dtype([('P1',np.int32),('Test1',np.int32),('P2',np.int32),('P3',np.int32),('MatrixT',(np.float32,(3,3))),('P4',np.int32)])
data = np.fromfile(input_file,datum)
Which should fully populate the data array with the individual data sets of the format above. Do note that numpy expects data to be packed in C format (row major) while Fortran format data is column major. For square matrix shapes like that above, this means getting the data out of the matrix requires a transpose as well, before using. For non square matrices, you will need to reshape and transpose:
Matrix = np.transpose(data[0]['MatrixT'])
Transposing your 4-D data structure is going to need to be done carefully. You might look into SciPy for automated ways to do so; the SciPy package seems to have Fortran related utilities which I have not fully explored.
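For the non-square case, the reordering can be spelled out by hand. The sketch below uses only the standard library and an in-memory stand-in for a file, assuming little-endian data and 4-byte record markers; read_record and the 2x3 test matrix are hypothetical names and values chosen for illustration.

```python
import io
import struct

def read_record(stream, byteorder="<"):
    """Read one sequential unformatted record framed by 4-byte length markers."""
    (length,) = struct.unpack(byteorder + "i", stream.read(4))
    payload = stream.read(length)
    (trailer,) = struct.unpack(byteorder + "i", stream.read(4))
    if trailer != length:
        raise ValueError("leading and trailing record markers differ")
    return payload

# Stand-in for `write(78) Matrix` with real*4 Matrix(2,3), Matrix(i,j) = 10*i + j.
# Fortran stores the array column by column, so the first index varies fastest.
nrows, ncols = 2, 3
column_major = [10.0 * i + j for j in range(1, ncols + 1) for i in range(1, nrows + 1)]
body = struct.pack("<%df" % len(column_major), *column_major)
marker = struct.pack("<i", len(body))
stream = io.BytesIO(marker + body + marker)

flat = struct.unpack("<%df" % (nrows * ncols), read_record(stream))
# Element (i, j), 0-based, sits at flat[i + j * nrows] in column-major order.
matrix = [[flat[i + j * nrows] for j in range(ncols)] for i in range(nrows)]
print(matrix)
```

With NumPy the same reordering is a single call, flat_array.reshape((nrows, ncols), order='F'), and the idea carries over to the 4-D case by passing the full Fortran shape.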

Accessing a .fits file and plotting its columns

I'm trying to access a .fits file and plotting two columns (out of many!).
I used pyfits to access the file, and
plt.plotfile('3XMM_DR5.fits', delimiter=' ', cols=(0, 1), names=('x-axis','y-axis'))
but that's not working. Are there any alternatives? And is there any way to open the file using python? In order to access the data table
According to the docs from matplotlib for plotfile:
Note: plotfile is intended as a convenience for quickly plotting data from flat files; it is not intended as an alternative interface to general plotting with pyplot or matplotlib.
This isn't very clear. I think by "flat files" it just means CSV data or something--this function isn't used very much in my experience, and it certainly doesn't know anything about FITS files, which are seldom used outside astronomy. You mentioned in your post that you did something with PyFITS, but that isn't demonstrated anywhere in your question.
PyFITS, incidentally, has been deprecated for several years now, and its functionality is integrated into Astropy.
You can open a table from a FITS file with astropy.table.Table.read:
from astropy.table import Table
table = Table.read('3XMM_DR5.fits')
then access the columns with square bracket notation like:
plt.plot(table['whatever the x axis column is named'], table['y axis column name'])

CHM/HHP: maximum length of variable names in [ALIAS] section

What is the maximum length of variable names in the [ALIAS] section of HHP files?
I_AM_WONDERING_ABOUT_THE_MAXIMUM_LENGTH_OF_THIS_STRING_RIGHT_HERE=this-is-some-really-helpful-html-file.html
I have found a CHM/HHP specification right here:
https://www-user.tu-chemnitz.de/~heha/viewchm.php/hs/chmspec.chm/hhp.html
That page only talks about the length of the overall line, though (and not about the length of the variable name). Very specific question, I know. Still, someone may be able to point me somewhere.
As far as I know this has never been asked before, and I have never heard of any limitation. I think that is because nobody has used long variable names in this place so far.
The purpose of the two files, e.g. alias.h and map.h, is to ease coordination between developer and help author. The mapping file links an ID to the map number; typically the developer can create it easily and pass it to the help author. The help author then creates an alias file linking the IDs to the topic names. That was the idea years (decades) ago from Ralph Walden (ex-Microsoft).
Please note that HTMLHelp is about 20 years old, and these context ID strings inside an alias.h file were derived from WinHelp, the predecessor of HTMLHelp.
You'll find some further information at Creating Context-Sensitive Help for Applications.
In general I'd recommend using IDs with a fixed format for better legibility, as shown below:
;-------------------------------------------------------------
; alias.h file example for HTMLHelp (CHM)
; www.help-info.de
;
; All IDH's > 10000 for better format
; last edited: 2006-07-09
;---------------------------------------------------
IDH_90001=index.htm
IDH_10000=Context-sensitive_example\contextID-10000.htm
IDH_10010=Context-sensitive_example\contextID-10010.htm
IDH_20000=Context-sensitive_example\contextID-20000.htm
IDH_20010=Context-sensitive_example\contextID-20010.htm
I'd recommend to use less than 1024 bytes per line.
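If you want to enforce that rule of thumb, a short script can flag offending entries before the project is compiled. This is only a sketch: the 1024-byte figure is the recommendation above rather than a documented limit, check_alias_lines is a hypothetical helper, and byte length is measured in UTF-8 here, which may differ from the encoding your help compiler expects.

```python
MAX_LINE_BYTES = 1024  # rule of thumb from the answer above, not a documented limit

def check_alias_lines(text, limit=MAX_LINE_BYTES):
    """Return (line_number, byte_length) for alias entries at or over the limit."""
    too_long = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue  # skip blank lines and comments
        if "=" not in stripped:
            continue  # not an ID=topic entry
        size = len(line.encode("utf-8"))
        if size >= limit:
            too_long.append((number, size))
    return too_long

# Two normal lines followed by one deliberately oversized alias entry.
sample = "; alias.h example\nIDH_90001=index.htm\n" + "IDH_" + "X" * 1100 + "=topic.htm\n"
problems = check_alias_lines(sample)
print(problems)
```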

Extracting Data from an Area file

I am trying to extract information at a specific location (lat,lon) from different satellite images. These images were given to me in the AREA format and I cooked up a simple Jython script to extract temperature values like so.
While the script works, here is a small snippet from it that prints out the data value at a point.
from edu.wisc.ssec.mcidas import AreaFile as af
url="adde://localhost/imagedata?&PORT=8113&COMPRESS=gzip&USER=idv&PROJ=0& VERSION=1&DEBUG=false&TRACE=0&GROUP=FL&DESCRIPTOR=8712C574&BAND=2&LATLON=29.7276 -85.0274 E&PLACE=ULEFT&SIZE=1 1&UNIT=TEMP&MAG=1 1&SPAC=4&NAV=X&AUX=YES&DOC=X&DAY=2012002 2012002&TIME=&POS=0&TRACK=0"
a=af(url);
value=a.getData();
print value
array([[I, [array([I, [array('i', [2826, 2833, 2841, 2853])])])
So what does this mean?
Please excuse me if the question seems trivial, while I am comfortable with python I am really new to dealing with scientific data.
Note
Here is a link to the entire script.
After asking around, I found out that Area objects return data in multiples of four. So the very first value is what I am looking for.
Grabbing the value is as simple as :
ar[0][0][0]
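The indexing is easier to see on a plain nested list with the same shape as the printed output. This is a sketch only: the list below is a stand-in built from the four numbers shown in the question, using the name value from the snippet, and it does not call the AreaFile API.

```python
# Stand-in for the three-level nested int array printed in the question.
value = [[[2826, 2833, 2841, 2853]]]

# Peel off one level of nesting per index; the innermost list holds the
# four values that come back together.
first = value[0][0][0]
print(first)
```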

Controlling Doxygen's LaTeX output for making PDF documentation

I'm using Doxygen to generate documentation for my code. I need to make a PDF version of this and using Doxygen's LaTeX output appears to be the way to do it.
However I've run into a number of annoying problems, and not knowing anything about LaTeX previously haven't really got much of an idea on how to approach them, and the countless references for LaTeX related things are not much help...
I worked out how to create a custom style in a .sty file and how to get Doxygen to use it. After a lot of searching I found out how to set the page margins etc. through this, and I'm guessing that perhaps this is the file I want for doing the other things I want, but I can't seem to find any commands for doing what I want :(
The table of contents at the start of the document contains a lot of items I'd rather it didn't, as it makes the contents very long. Is there some way to limit the contents to just, say, the first two levels, rather than having entries for every single individual function, variable, etc.? I'd quite like to keep all the bookmarks, however. I did try the "COMPACT_LATEX" option, but as well as removing items on the contents pages, it removed the bookmarks and the member lists at the start of each section, which I do really want to keep.
Is there a way to change the order of things, like putting the full class description at the start of the section, rather than after all the members and attributes?
Wow, that's kind of evil of Doxygen.
Okay, to get around the tocdepth counter problem, add the following line to your .sty file:
\AtBeginDocument{\setcounter{tocdepth}{2}}% or whatever level you want
You can set the PDF bookmarks depth to a separate value:
% requires you \usepackage{hyperref} first
\hypersetup{
bookmarksdepth = section, % or whatever level you want
}
Also note that if you have a list of figures/tables, the tocdepth must be at least 2 for them to show up.
I don't see any way of rearranging those items within the LaTeX files---Doxygen just barfs them out there, so we can't do much. You'll have to poke around the Doxygen documentation to see if there's any way to specify the order I guess. (Here's hoping!)
You're so close.
Googling on "latex contents level" brought me to LaTeX - customizing the depth of the table of contents for different parts of the thesis which suggests
\setcounter{tocdepth}{n}
where n starts at zero for only the highest level division. This is presumably defined in all the default styles, but is worth a try in doxygen.
You could write a Perl/Awk script to simply delete the unwanted lines from the table of contents. For the file burble.tex, Latex will generate the file burble.toc, which will contain lines such as:
\contentsline {subsection}{Class F rewrites}{38}
\contentsline {subsection}{Class M rewrites}{39}
\contentsline {section}{\numberline {7}Definition and properties of the translation}{44}
\contentsline {paragraph}{Well-formedness}{54}
Simple regexes will identify which levels each line belongs to, and you can filter the file based on that. Once you have the table of contents the way you want it, insert \nofiles in the appropriate place (the style sheet?), which means that Latex will read the auxiliary files but not overwrite them.
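The filtering step could look like the following, written in Python rather than Perl or Awk. It is a minimal sketch: filter_toc and the KEEP set are hypothetical names, and the sample lines are the ones quoted above rather than a real burble.toc.

```python
import re

# Sectioning levels to keep; anything deeper is dropped. Adjust to taste.
KEEP = {"part", "chapter", "section"}
CONTENTSLINE = re.compile(r"\\contentsline\s*\{(\w+)\}")

def filter_toc(lines, keep=KEEP):
    """Drop contentsline entries whose level is not in `keep`; pass other lines through."""
    kept = []
    for line in lines:
        match = CONTENTSLINE.match(line)
        if match and match.group(1) not in keep:
            continue  # an entry at an unwanted level
        kept.append(line)
    return kept

toc = [
    r"\contentsline {subsection}{Class F rewrites}{38}",
    r"\contentsline {subsection}{Class M rewrites}{39}",
    r"\contentsline {section}{\numberline {7}Definition and properties of the translation}{44}",
    r"\contentsline {paragraph}{Well-formedness}{54}",
]
filtered = filter_toc(toc)
print("\n".join(filtered))
```

In practice you would read the lines from the .toc file, write the filtered lines back, and then add \nofiles as described above so the next LaTeX run does not regenerate the file.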