Using fileinput.input() to read gzip files

Using fileinput.input() to read gzip files - gzip

I'm using fileinput to read some large data:
import gzip
import fileinput
f=gzip.open('/scratch/try.fastq.gz','r')
for line in fileinput.input(f):
print line
However I got errors like:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/share/lib/python2.6/fileinput.py", line 253, in next
line = self.readline()
File "/share/lib/python2.6/fileinput.py", line 345, in readline
self._file = open(self._filename, self._mode)
IOError: [Errno 2] No such file or directory: '#HWI-ST150_0129:2:1:13466:2247#0/1\n'
Cannot fileinput take file object as input? Then how to use fileinput to deal with gzip file?
thx

Nope, the first argument to fileinput.input should be a list of filenames. What you want can be achieved with
for line in gzip.open('/scratch/try.fastq.gz')
print line
fileinput exists to support the idiom where a program reads from a list of files, probably supplied on the command line, or standard input if no files have been specified. If you still want to use it, even though it's useless in your example, you should do
for line in fileinput(['/scratch/try.fastq.gz'], openhook=gzip.open):
print line

As other sources have said, the value for openhook must be a function, but that doesn't mean you can't call a function to return a function. For example, if you want to support multiple different types of incoming files you could write something like this:
import fileinput
import gzip
def get_open_handler(compressed):
if deciding_data:
# mode comes in as 'r' by defualt, but that means binary to `gzip`
return lambda file_name, mode: gzip.open(file_name, mode='rt')
else:
# the default mode of 'r' means text for `open`
return open
# get args here
for line in fileinput.input(args.files, openhook=get_open_handler(args.compressed))
print(line)
As you can see, we are calling a function from openhook, but that function returns another function. In this case, we are fixing the mode of gzip.open, but we can do anything we want, including using functools.partial to bind some values to a function so that when the default filename and mode get passed to the function assigned to openhook, the function will do what you want.

Related

Incorrect context when trying to reload text

in order to develop my script in an external IDE, I'm trying to automatically reload a text in blender when it is modified from outside of Blender. I think I've got the idea of how it works, but there must be one tiny detail that completely blocks me from achieving my goal.
My script is the following:
import bpy
import sys
ctx = bpy.context.copy()
for area in ctx['screen'].areas:
if area.type == 'TEXT_EDITOR':
textEditor = area
for text in bpy.data.texts:
if text.name == 'main.py':
textEditor.spaces[0].text = text
bpy.ops.text.reload()
Output:
>>> exec(bpy.data.texts['reload.py'].as_string())
Traceback (most recent call last):
File "<blender_console>", line 1, in <module>
File "<string>", line 13, in <module>
File "/Applications/Blender.app/Contents/Resources/3.3/scripts/modules/bpy/ops.py", line 113, in __call__
ret = _op_call(self.idname_py(), None, kw)
RuntimeError: Operator bpy.ops.text.reload.poll() failed, context is incorrect
I cannot get what part of the context is correct. I found no solution on the Internet, even the doc recommends reading the source code to understand what might be the cause of this error, but I couldn't find it anyway in the source code.
Can someone tell me what I am missing here please?

How to have multi-page table row cells converted properly using rst2pdf Shpinx?

I have a table of which one-row cell has so much data that it could span multiple pages in the finally-generated PDF file. rst2pdf ungracefully fails when I feed it my file, with the following output:
[ERROR] pdfbuilder.py:161 Failed to build doc
Traceback (most recent call last):
File "/home/pwng/.local/lib/python3.9/site-packages/rst2pdf/pdfbuilder.py", line 158, in write
docwriter.write(doctree, destination)
File "/usr/lib/python3/dist-packages/docutils/writers/__init__.py", line 78, in write
self.translate()
File "/home/pwng/.local/lib/python3.9/site-packages/rst2pdf/pdfbuilder.py", line 697, in translate
createpdf.RstToPdf(
File "/home/pwng/.local/lib/python3.9/site-packages/rst2pdf/createpdf.py", line 689, in createPdf
pdfdoc.multiBuild(elements)
File "/usr/lib/python3/dist-packages/reportlab/platypus/doctemplate.py", line 1167, in multiBuild
self.build(tempStory, **buildKwds)
File "/usr/lib/python3/dist-packages/reportlab/platypus/doctemplate.py", line 1080, in build
self.handle_flowable(flowables)
File "/home/pwng/.local/lib/python3.9/site-packages/rst2pdf/createpdf.py", line 859, in handle_flowable
self.handle_frameEnd()
File "/usr/lib/python3/dist-packages/reportlab/platypus/doctemplate.py", line 726, in handle_frameEnd
self.handle_pageEnd()
File "/usr/lib/python3/dist-packages/reportlab/platypus/doctemplate.py", line 668, in handle_pageEnd
raise LayoutError(ident)
reportlab.platypus.doctemplate.LayoutError: More than 10 pages generated without content - halting layout. Likely that a flowable is too large for any frame.
FAILED
build succeeded.
and make latexpdf produce undesirable output depicted in the following screenshot.
Is there a way to remedy this problem using either latexpdf or rst2pdf? Ideally, I would like a solution that works for both spaced text (i.e. space-separated words) and consecutive, wrapped non-separated text.

This isn't the answer you want, but rst2pdf won't split the cell across pages, so if it doesn't fit onto the page, it won't be able to generate the document. The project (I'm a maintainer) is open to patches, in case you end up fixing it yourself. I'm not aware of a workaround, other than reformatting the content to be more printable.

How to open a remote .FTS.gz file with astropy.io.fits.open()?

Summary of a problem:
I am writing some code that checks the content of a FTS file header (data saved from a telescope) using astropy.io.fits. My problem is when I try to open .FTS.gz files instead of .FTS files on a remote server. When I open() a .FTS.gz I get errors, if I gunzip the .FTS.gz file, all is good. One of the errors suggest I have an END missing card. Searching online, I used a suggestion of using the ignore_missing_end=True argument in fits.open(), but then I get the next error. This next error suggests my FITS file is empty or corrupt, however it is not the case. I can open it with SAOImage DS9 without any problems, plus I have run this handy online tool called fitsverify which reports no errors in my file. If I download the offending file .FTS.gz and run a similar code to fits.open() this file locally, I get no errors at all. An example of an offending file (used in the code below) is now uploaded here.
The Astropy documentation says:
"Working with compressed files
The open() function will seamlessly open FITS files that have been compressed with gzip, bzip2 or pkzip. Note that in this context we’re talking about a fits file that has been compressed with one of these utilities - e.g. a .fits.gz file."
How do I open a remote .FTS.gz file without downloading it? I have hundreds of thousands of files like this, so downloading is not an option and it is not just one file that gives a problem, it is all of them.
Thanks,
Aina.
Code and errors:
CODE TO OPEN A REMOTE .FTS.gz FILE:
from astropy.io import fits
import paramiko
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.load_system_host_keys()
client.connect('myhostname', username='myusername', password='mypassword')
apath = '/path/to/folder/to/search'
apattern = '"RUN0001.FTS.gz"'
rawcommand = 'find {path} -name {pattern}'
command = rawcommand.format(path=apath, pattern=apattern)
stdin, stdout, stderr = client.exec_command(command)
filelist = stdout.read().splitlines()
for i in filelist:
sftp_client = client.open_sftp()
remote_file = sftp_client.open(i)
hdulist = fits.open(remote_file)
client.close()
ERROR:
Traceback (most recent call last):
File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
hdulist = fits.open(remote_file)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
lazy_load_hdus=lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1040, in _readfrom
read_one = hdulist._read_next_hdu()
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1135, in _read_next_hdu
hdu = _BaseHDU.readfrom(fileobj, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 329, in readfrom
**kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 394, in _readfrom_internal
header = Header.fromfile(data, endcard=not ignore_missing_end)
File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 450, in fromfile
padding)[1]
File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 519, in _from_blocks
raise IOError('Header missing END card.')
IOError: Header missing END card.
Process finished with exit code 1
CHANGING THE CODE ABOVE FOR ONE LINE ONLY:
hdulist = fits.open(remote_file, ignore_missing_end=True)
ERROR:
WARNING: VerifyWarning: Error validating header for HDU #0 (note: Astropy uses zero-based indexing).
Header size is not multiple of 2880: 7738429
There may be extra bytes after the last HDU or the file is corrupted. [astropy.io.fits.hdu.hdulist]
Traceback (most recent call last):
File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
hdulist = fits.open(remote_file, ignore_missing_end=True)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
lazy_load_hdus=lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1044, in _readfrom
raise IOError('Empty or corrupt FITS file')
IOError: Empty or corrupt FITS file
Process finished with exit code 1
CODE TO OPEN THE OFFENDING .FTS.gz FILE LOCALLY PRODUCES NO ERRORS:
import os
from astropy.io import fits
folderTosearch = "/path/to/folder/to/search/locally";
for root, dirs, files in os.walk(folderTosearch):
for file in files:
if file.endswith("RUN0001.FTS.gz"):
hdulist = fits.open(os.path.join(root, file))

This happens because the sftp call passes some variant of a file-like object (which has a .read() method that fits.open() will use.
The file like object, however, is still a gzip file. Astropy checks whether a file is zipped only for file names, that is, when the argument to fits.open() is a string (that happens to be a path). Astropy does not appear to test for the magic bytes that identify a byte stream as a gzip file. Oddly enough, it does do this verification when path strings are passed. Arguably, this may be a slight shortcoming in the astropy.io.fits module, but perhaps there's a reason for it.
(Disclaimer: the above conclusion is from scanning quickly through the relevant source code; I may have missed something. Hopefully people will correct me if so.)
One solution is to do the unzipping yourself. I've cobbled up the following:
from cStringIO import StringIO
import zlib
<...>
for i in filelist:
sftp_client = client.open_sftp()
remote_file = sftp_client.open(i)
decompressed = StringIO(
zlib.decompress(remote_file.read(), zlib.MAX_WBITS|32))
hdulist = fits.open(decompressed)
client.close()
Above, we're reading the full contents of the remote file (remote_file.read(), then uncompressing the contents. That results in a string, so we wrap it in a StringIO instance to make it a file-like object again, that we can pass to fits.open(). (For the zlib.MAX_WBITS|32 argument: see this answer.)
Alternatively, you can sftp the file to local disk, and then read the file (with the local filename) locally. The above just keeps everything in memory.

Preventing overwriting when using numpy.savetxt

Is there built error handling for prevent overwriting a file when using numpy.savetxt?
If 'my_file' already exists, and I run
numpy.savetxt("my_file", my_array)
I want an error to be generated telling me the file already exists, or asking if the user is sure they want to write to the file.

You can check if the file already exists before you write your data:
import os
if not os.path.exists('my_file'): numpy.savetxt('my_file', my_array)

You can pass instead of a filename a file handle to np.savetxt(), e.g.,
import numpy as np
a = np.random.rand(10)
with open("/tmp/tst.txt", 'w') as f:
np.savetxt(f,a)
So you could write a helper for opening the file.

Not in Numpy. I suggest writing to a namedTemporaryFile and checking if the destination file exists. If not, rename the file to a concrete file on the system. Else, raise an error.

Not an error handler, but it's possible to create a new version in the form of:
file
filev2
filev2v3
filev2v3v4
so that no file ever gets overwritten.
n=2
while os.path.exists(f'{file}.txt'):
file = file + f'v{n}'
n+=1

How to rewrite a specific frame in a traceback?

In python you can compile() string to be executed faster with exec(). But as soon as i use it, we lost information when an exception happen in the exec.
For example, here is a code snippet that calling a unknown method (for demo purposes):
code = 'my_unknown_method()'
bytecode = compile(code, '<string>', 'exec')
And later, i'm calling exec on that bytecode:
exec bytecode
The traceback showed is:
Traceback (most recent call last):
File "test.py", line 3, in <module>
exec bytecode
File "<string>", line 1, in <module>
NameError: name 'my_unknown_method' is not defined
The "exec()" frame is now obscure. I would like to have a better exception like:
Traceback (most recent call last):
File "test.py", line 3, in <module>
exec "my_unknown_method()"
File "<string>", line 1, in <module>
NameError: name 'my_unknown_method' is not defined
Any thoughts ?
Notes:
I don't want to use the second argument of compile (the filename)
I have tested to play with inspect and modify f_code on the frame, but it's readonly attribute.
EDIT: after looking more to sys.excepthook, i've seen that in python source code/traceback.c, when python want to show the line content, they are fopen() directly the file if found. No hook available at all to display our own content. The only way would be to create in real fake filename on disk ? Anyone ?
EDIT2: i checked some jinja2 debug code, and they are rewriting traceback too, but not for content. Would i need a custom except hook ? My concerns with it is since it's not in the traceback itself, if an user/module/whatever take the exception, the traceback will not contain valuable information.

You should have a look at this talk by Armin Ronacher. He's the author of Jinja2 and in this talk he explained, how he manipulates stack traces in Jinja2. If I remember correctly, he's using ctypes, to manipulate the data structures on the C level of Python. The talk was in my opinion the best talk of the whole Europython 2011 by the way.

After a deep search, it's not possible, CPython is using its own API to determine where the file is etc, and it cannot be patched in pure Python.

How about something like this:
import sys
def mkexec(code_str):
bc = compile(code, '<string>', 'exec')
def run():
try:
exec bc
except: # Yes I know a bare except
t, v, tb = sys.exc_info()
raise MyUsefullException("%s raised %s" % (code_str, v))
return run
exe = mkexec("some_unknown_something()")
exe()

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Using fileinput.input() to read gzip files - gzip

Related

Incorrect context when trying to reload text

How to have multi-page table row cells converted properly using rst2pdf Shpinx?

How to open a remote .FTS.gz file with astropy.io.fits.open()?

Preventing overwriting when using numpy.savetxt

How to rewrite a specific frame in a traceback?

Categories

Resources