Python pandas - Does read_csv keep file open? - pandas

When using the pandas read_csv() function, does it keep the file open, or does it close it (discarding the file descriptor)?
If it keeps it open, how do I close it after I finish using the dataframe?

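If you pass it an open file, it will keep it open (reading from the current position); if you pass a string, read_csv will open and close the file itself.
In Python, if you open a file but forget to close it, CPython will normally close it for you once the last reference to the file object goes away, which is typically when the function returns. This relies on reference counting and garbage collection, so other Python implementations may close the file later.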
def foo():
    f = open("myfile.csv", "w")
    ...
    f.close() # isn't actually needed
That is, if you call a Python function that opens a file, the file is automatically closed unless the file object is returned or otherwise kept alive.
Note: The preferred syntax is the with block, which closes f at the end of the block even if an exception is raised (f remains defined afterwards, but as a closed file):
def foo():
    with open("myfile.csv", "w") as f:
        ...
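To check whether a file handle is still open, you can look at its closed attribute. The following is a minimal sketch using only the standard library; the temporary path and the names open_inside and closed_after are made up for the illustration. The same check could be applied to a handle after passing it to read_csv.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "example.csv")

with open(csv_path, "w") as f:
    f.write("a,b\n1,2\n")
    open_inside = f.closed   # False: the handle is still open here

# The with block has closed the handle; the name f still exists
closed_after = f.closed      # True
```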

Related

I'm trying to use input to output to a filename

I'm trying to store the result of input() in a variable and use it as the filename of a new file.
The only examples I've seen are print(input)
I'm new to Python but trying to write a functional program
thanks
This is a good starting point for you:
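def create_file():
    fn = input('Enter file name: ').strip()
    try:
        # open the file for reading if it already exists
        file = open(fn, 'r')
    except IOError:
        # otherwise create it
        file = open(fn, 'w')
    return file  # the caller is responsible for closing it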
In Python you can use the open() function to create a file (assuming it will be a text file).
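It is documented in the built-in functions section of the Python documentation.
Using the value you read from the user to create the file (called filename here, since naming the variable input would shadow the built-in function), you could do it like so:
file = open(filename, 'w+')
This will give you a file object which you can write lines to using its write() method.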
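Putting the pieces together, here is a minimal sketch of creating a file from a name held in a variable. The function name create_file, its text parameter, and the temporary demo path are assumptions made for the example; in an interactive program the name would come from input() instead.

```python
import os
import tempfile


def create_file(filename, text=""):
    """Create (or truncate) `filename`, write `text` to it, and return the name."""
    with open(filename, "w") as f:
        f.write(text)
    return filename


# In a real program: filename = input("Enter file name: ").strip()
demo_name = os.path.join(tempfile.mkdtemp(), "notes.txt")
create_file(demo_name, "hello\n")

with open(demo_name) as f:
    demo_contents = f.read()
```

Using a with block here means the file is closed as soon as the text has been written.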

Is it possible to get the name of the file when opened with filedialog.askopenfile

I am trying to open a file with file = filedialog.askopenfile(initialdir='./'), but I also need to know the name of the opened file for other purposes. I know that if the user selects a file, file is not None; instead it is something like this:
<_io.TextIOWrapper name='/Users/u/Desktop/e/config.py' mode='r' encoding='US-ASCII'>
but _io.TextIOWrapper objects are not subscriptable.
Following a suggestion, I discovered that there is another function similar to filedialog.askopenfile, namely askopenfilename, which returns just the name of the file instead of opening it. If we also want to open the file, we can open and read it manually:
from tkinter import filedialog
file_name = filedialog.askopenfilename(initialdir='./')
if file_name:  # empty if the dialog was cancelled
    with open(file_name, 'r') as file:
        string = ''
        for line in file:
            string += line
    print(string)
Even though this works well, I still think tkinter should have provided this functionality with filedialog.askopenfile directly.
Navigating through the Python installation directory, we can find the filedialog.py file, which contains the definitions of both functions; they are very similar:
def askopenfilename(**options):
    "Ask for a filename to open"
    return Open(**options).show()
askopenfile
def askopenfile(mode = "r", **options):
    "Ask for a filename to open, and returned the opened file"
    filename = Open(**options).show()
    if filename:
        return open(filename, mode)
    return None
As we can see, the first returns the result of the call to the show function, whereas the second returns an opened file.
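Since askopenfile simply returns the result of open(filename, mode), the returned object should carry the path in its name attribute, as the repr in the question suggests. A minimal sketch, assuming a plain open() call standing in for the dialog (the temporary path is made up for the illustration):

```python
import os
import tempfile

# Hypothetical file standing in for whatever the user would pick in the dialog
path = os.path.join(tempfile.mkdtemp(), "config.py")
with open(path, "w") as f:
    f.write("# demo\n")

file = open(path, "r")   # stands in for filedialog.askopenfile(initialdir='./')
opened_name = file.name                   # full path passed to open()
base_name = os.path.basename(file.name)   # just the file name
file.close()
```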

Preventing overwriting when using numpy.savetxt

Is there built-in error handling to prevent overwriting a file when using numpy.savetxt?
If 'my_file' already exists, and I run
numpy.savetxt("my_file", my_array)
I want an error to be generated telling me the file already exists, or asking if the user is sure they want to write to the file.
You can check if the file already exists before you write your data:
import os
import numpy
if not os.path.exists('my_file'):
    numpy.savetxt('my_file', my_array)
Instead of a filename, you can pass a file handle to np.savetxt(), e.g.,
import numpy as np
a = np.random.rand(10)
with open("/tmp/tst.txt", 'w') as f:
    np.savetxt(f, a)
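So you could write a helper for opening the file that refuses to overwrite an existing one, for example by opening it in exclusive-creation mode ('x'), which raises FileExistsError if the file already exists.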
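One possible shape for such a helper is sketched below, relying on the exclusive-creation mode 'x' of the built-in open(). The name open_new and the temporary target path are assumptions made for the example; the resulting handle could be passed to np.savetxt in place of the plain write shown here.

```python
import os
import tempfile


def open_new(path):
    """Open `path` for writing; raise FileExistsError if it already exists."""
    return open(path, "x")


# Hypothetical target path for the demonstration
target = os.path.join(tempfile.mkdtemp(), "my_file")

with open_new(target) as f:
    f.write("1.0\n")   # np.savetxt(f, my_array) could go here instead

try:
    open_new(target)
    second_attempt = "overwrote"
except FileExistsError:
    second_attempt = "refused"
```

Unlike a separate os.path.exists() check, the existence test and the creation happen in a single step, so another process cannot create the file in between.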
Not in NumPy. I suggest writing to a tempfile.NamedTemporaryFile and then checking whether the destination file exists. If it does not, rename the temporary file to the destination name; otherwise, raise an error.
Not an error handler, but it's possible to create a new version in the form of:
file
filev2
filev2v3
filev2v3v4
so that no file ever gets overwritten.
import os
# file holds the base name without the .txt extension
n = 2
while os.path.exists(f'{file}.txt'):
    file = file + f'v{n}'
    n += 1
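A variant of the same idea that keeps the names shorter (stem_v2, stem_v3, ... rather than appending every suffix to the previous one) might look like the sketch below. The function name next_free_name, the underscore in the suffix, and the temporary directory are assumptions made for this example, not part of the answer above.

```python
import os
import tempfile


def next_free_name(stem, ext=".txt"):
    """Return stem+ext if unused, otherwise the first unused stem_vN+ext (N >= 2)."""
    candidate = f"{stem}{ext}"
    n = 2
    while os.path.exists(candidate):
        candidate = f"{stem}_v{n}{ext}"
        n += 1
    return candidate


# Demonstration in a scratch directory
stem = os.path.join(tempfile.mkdtemp(), "file")
first = next_free_name(stem)
open(first, "w").close()     # occupy file.txt
second = next_free_name(stem)
open(second, "w").close()    # occupy file_v2.txt
third = next_free_name(stem)
```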

Modify instance variables from another file?

Is it possible to modify an instance variable from another file?
What I want is to modify an instance variable inside File_1 from File_2.
For example:
# File_1.py
import File_2
class Main:
    def __init__(self):
        self.example = "Unmodified"
    def modify(self):
        File_2.modify()
main = Main()
main.modify()
# File_2.py
import File_1
def modify():
    File_1.main.example = "Modified"
This gives me the following output:
Traceback (most recent call last):
File "File_1.py", line 4, in <module>
import File_2
File "File_2.py", line 3, in <module>
import File_1
File "File_1.py", line 14, in <module>
main.modify()
File "File_1.py", line 11, in modify
File_2.modify()
AttributeError: 'module' object has no attribute 'modify'
Why?
EDIT (to explain better):
The instance of the Main class (in file 1) has a variable; what I want is to modify that variable from another file (file 2). I modified the code a little:
# File_1.py
import File_2
class Main:
    def __init__(self):
        self.example = "Unmodified"
    def modify(self):
        File_2.modify()
if __name__ == "__main__":
    main = Main()
    main.modify()
# File_2.py
def modify():
    # do some stuff
    # now I want to modify the example variable from the main class, but how?
    pass
Your code contains a cyclic import; take a look at "Python: Circular (or cyclic) imports" to see what I'm talking about.
Basically, the problem is that when the interpreter reaches this line:
File_2.modify()
File_2 is not completely loaded, meaning that the interpreter has not yet executed the lines:
def modify():
    File_1.main.example = "Modified"
because the earlier line sent it back into File_1:
import File_1
Besides this, your code seems quite strange. If you can provide more information about your real code, a better design might solve your problem.
Edit: You have to remove the cyclic imports. One way to do what you seem to need is to pass an argument to the File_2.modify(arg) function, and work on that:
# File_2
# !! do NOT import File_1 in this file
def modify(obj):
    obj.value += 7
But in your case you would have to pass the whole object (self) to the modify function, which is somewhat wasteful if only one value changes.
It would be better to do something like:
# File_1
import File_2
class Main:
    # ...
    def modify(self):
        self.value = File_2.modify(self.value)
# File_2
# !! do NOT import File_1 in this file
def modify(num):
    return num + 7
But once again, these are just examples. Since you're not showing your real code, we can't really tell you what is best in your case (maybe neither of the above) or help you very much.
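Applied to the example attribute from the question, the argument-passing approach might look like the following single-file sketch. The helper name modify_example is made up for the illustration; in the two-file layout it would live in File_2, which would not import File_1 at all.

```python
# Would live in File_2 (no import of File_1 needed)
def modify_example(obj):
    """Change the attribute on whatever object is passed in."""
    obj.example = "Modified"


# Would live in File_1, after `import File_2`
class Main:
    def __init__(self):
        self.example = "Unmodified"

    def modify(self):
        # In the two-file layout: File_2.modify_example(self)
        modify_example(self)


main = Main()
before = main.example
main.modify()
after = main.example
```

Because the function receives the instance as a parameter, it does not need to know which module created it, so the import only has to go one way.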
What does not work in Python is the "cross importing" you are trying to do.
When two files import each other, you get inconsistencies and undesirable side effects. In this case, when the main.modify() line runs while File_1 is being executed, it finds a not yet fully initialized File_2 module in memory, in which the modify function does not exist yet.
Reorder your code so that you don't have the cyclic imports, and it should work.
For example, if the file you import first (or run as the main module) is File_1, then inside File_2, instead of writing import File_1 on the first line, import it inside the modify function, like this:
# File 2
def modify():
    import File_1
    File_1.main.example = "Modified"
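N.B. Because such an import refers to a module that is already loaded in the interpreter, it just binds the existing module object to a name in the running scope. In other words, Python won't access the module file on disk each time the function is called. One caveat: a file run directly as a script is loaded under the name __main__, so import File_1 from another module loads a second copy of it; passing the object as an argument avoids that.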

Using fileinput.input() to read gzip files

I'm using fileinput to read some large data:
import gzip
import fileinput
f = gzip.open('/scratch/try.fastq.gz', 'r')
for line in fileinput.input(f):
    print line
However I got errors like:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/share/lib/python2.6/fileinput.py", line 253, in next
line = self.readline()
File "/share/lib/python2.6/fileinput.py", line 345, in readline
self._file = open(self._filename, self._mode)
IOError: [Errno 2] No such file or directory: '#HWI-ST150_0129:2:1:13466:2247#0/1\n'
Can't fileinput take a file object as input? If not, how can I use fileinput with a gzip file?
thx
No, the first argument to fileinput.input should be a list of filenames (or a single filename). What you want can be achieved with
for line in gzip.open('/scratch/try.fastq.gz'):
    print(line)
fileinput exists to support the idiom where a program reads from a list of files, probably supplied on the command line, or from standard input if no files have been specified. If you still want to use it, even though it gains you nothing in your example, you should do
for line in fileinput.input(['/scratch/try.fastq.gz'], openhook=gzip.open):
    print(line)
The module also provides fileinput.hook_compressed, which picks gzip or bz2 based on the file extension.
As other sources have said, the value for openhook must be a function, but that doesn't mean you can't call a function to return a function. For example, if you want to support multiple different types of incoming files you could write something like this:
import fileinput
import gzip
def get_open_handler(compressed):
    if compressed:
        # mode comes in as 'r' by default, but that means binary to `gzip`
        return lambda file_name, mode: gzip.open(file_name, mode='rt')
    else:
        # the default mode of 'r' means text for `open`
        return open
# get args here
for line in fileinput.input(args.files, openhook=get_open_handler(args.compressed)):
    print(line)
As you can see, we call a function to produce the value for openhook, and that function returns another function. In this case, we are fixing the mode of gzip.open, but we can do anything we want, including using functools.partial to bind some values to a function, so that when fileinput passes the filename and mode to the function assigned to openhook, it does what you want.
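For a self-contained illustration of the openhook mechanism, the sketch below writes a small gzip file to a temporary location and reads it back through fileinput. The names open_gzip_as_text and read_lines, the sample contents, and the UTF-8 encoding are assumptions made for the example.

```python
import fileinput
import gzip
import os
import tempfile


def open_gzip_as_text(file_name, mode):
    # fileinput passes its own mode ('r'); ignore it and force text mode,
    # because 'r' would mean binary to gzip.open
    return gzip.open(file_name, mode="rt", encoding="utf-8")


def read_lines(paths):
    """Read all lines from the given gzip files via fileinput."""
    with fileinput.input(files=paths, openhook=open_gzip_as_text) as stream:
        return [line.rstrip("\n") for line in stream]


# Hypothetical sample file for the demonstration
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    f.write("first\nsecond\n")

lines = read_lines([sample_path])
```

A named wrapper is used here rather than functools.partial(gzip.open, mode="rt"), because fileinput passes the mode positionally and that would clash with a mode already bound by keyword.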