How to use input() to set the filename when creating a file

I'm trying to store the value returned by input() in a variable and then create a file using that value as the filename.
The only examples I've seen are print(input).
I'm new to Python but trying to write a functional program.
Thanks

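This is a nice beginning for you:
def create_file():
    fn = input('Enter file name: ').strip()
    try:
        # open the file for reading if it already exists...
        file = open(fn, 'r')
    except IOError:  # FileNotFoundError is the more specific subclass in Python 3
        # ...otherwise create it
        file = open(fn, 'w')
    return file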

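In Python you can use the built-in open() function to create a file (assuming it will be a text file).
Its documentation is on the "Built-in Functions" page of the official Python docs.
Using your input value to create the file, you could do it like so:
fn = input('Enter file name: ')  # store the filename the user typed
file = open(fn, 'w+')
Note that you pass the value returned by input(), not the input function itself. This gives you a file object whose write() method you can use to write lines to the file.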
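A minimal sketch of the whole flow, assuming you want a text file and are happy to overwrite an existing one; create_file here is a hypothetical helper name, not a library function. The with block closes the file for you once writing is done.

```python
def create_file(file_name, lines):
    """Create (or overwrite) file_name and write each string in lines on its own line."""
    # 'w' creates the file if it is missing and truncates it if it already exists
    with open(file_name, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return file_name
```

You would call it as create_file(input('Enter file name: ').strip(), ['first line', 'second line']). If you would rather get an error than overwrite an existing file, open it with mode 'x' instead of 'w'.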

Related

How to read a .xlsx file extracted to the /tmp folder in Lambda

I have a password-protected zip file that contains a .xlsx file. When I use the ZipFile extractall function in my Lambda function, it stores the contents in a directory called '/tmp'. Below is the code I have written.
from zipfile import ZipFile

def extract_zip(input_zip):
    with ZipFile(input_zip) as zf:
        # extractall() writes the members under /tmp/ and returns None
        return zf.extractall(path='/tmp/', pwd=b'password')
My question is: how can I access the .xlsx file in the /tmp directory? Essentially, all I want to do is convert it into a DataFrame. I tried:
df = pd.read_excel('tmp')
You can read the file with pandas as you normally would; just pass the full path of the extracted file rather than the directory:
df = pd.read_excel('/tmp/filename.xlsx')
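If you don't know the workbook's name in advance, one option is to ask ZipFile.namelist() which members the archive holds and build the /tmp paths from that, since extractall() itself returns None. This is a sketch assuming the archive contains the .xlsx you want; extract_xlsx_paths is a hypothetical helper name.

```python
import os
from zipfile import ZipFile


def extract_xlsx_paths(zip_path, out_dir="/tmp", pwd=None):
    """Extract every member of zip_path into out_dir and return the paths of the .xlsx files.

    pwd is only needed for encrypted archives and must be bytes, e.g. b'password'.
    """
    with ZipFile(zip_path) as zf:
        # namelist() lists the member names exactly as they will be written to disk
        names = zf.namelist()
        zf.extractall(path=out_dir, pwd=pwd)
    return [os.path.join(out_dir, n) for n in names if n.lower().endswith(".xlsx")]
```

Usage: paths = extract_xlsx_paths(input_zip, pwd=b'password') followed by df = pd.read_excel(paths[0]).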

How to iterate over files, extract each file name and pass it to pandas logic

I have a folder called "before_manipulation".
It contains 3 CSV files named File_A.CSV, File_B.CSV and File_C.CSV.
Current path: c:/users/before_manipulation [File_A.CSV, File_B.CSV, File_C.CSV]
I need to perform a data manipulation on each of the files and, after the manipulation, save each one with the same file name in another directory.
Target path: C:/users/after_manipulation [File_A.CSV, File_B.CSV, File_C.CSV]
I have the logic for the data manipulation when there is only a single file, using a pandas DataFrame. When I have multiple files, how do I read each file and its name and pass them to my logic?
Pseudocode of how I would do it if there were only one file:
import pandas as pd
df = pd.read_csv('c:/users/before_manipulation/file_A.csv')
# ... do logic/manipulation
df.to_csv('c:/users/after_manipulation/file_A.csv')
Any help is appreciated.
You can use os.listdir(<path>) to get a list of the entries in a directory. If you omit <path>, it lists the current working directory.
You can then iterate over that list, passing each captured filename to the function you already have for the data manipulation. When saving, you can reuse the same filename to write into your desired directory.
In summary, the code would look something like this:
import os
import pandas as pd

in_dir = r'c:/users/before_manipulation/'
out_dir = r'c:/users/after_manipulation/'
files_to_run = os.listdir(in_dir)
for file in files_to_run:
    # listdir returns every entry, so skip anything that is not a CSV file
    if not file.lower().endswith('.csv'):
        continue
    print('Running {}'.format(in_dir + file))
    df = pd.read_csv(in_dir + file)
    # ...do your logic here to return the changed df you want to save
    df.to_csv(out_dir + file)
For this to work, every file in the directory would need to have the same shape, and you would need to apply the same logic to each file.
If that is not the case, you will need something like a dictionary that maps each file name to the manipulation it needs, and call the right one for each file.
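A sketch of that dictionary idea, assuming each manipulation takes a DataFrame and returns the changed one; fix_file_a, fix_file_b, MANIPULATIONS and manipulate are hypothetical names you would replace with your own logic.

```python
from typing import Callable, Dict


# Hypothetical per-file manipulations; replace the bodies with your own logic.
def fix_file_a(df):
    return df  # e.g. rename columns, drop rows, ...


def fix_file_b(df):
    return df


# Keys are lower-cased so 'File_A.CSV' and 'file_A.csv' both match.
MANIPULATIONS: Dict[str, Callable] = {
    "file_a.csv": fix_file_a,
    "file_b.csv": fix_file_b,
}


def manipulate(file_name, df):
    """Apply the manipulation registered for file_name, or return df unchanged."""
    handler = MANIPULATIONS.get(file_name.lower(), lambda d: d)
    return handler(df)
```

Inside the loop above, you would replace the single shared logic with df = manipulate(file, df).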
Assuming you have some logic that works for one file, I'd just put that logic into a function and run it in a for loop.
You'd end up with something like this:
directory = r'c:/users/before_manipulation'
files = ['file_A.CSV', 'File_B.CSV', 'File_C.CSV']
for file in files:
    somefunction(directory + '/' + file)  # somefunction holds your single-file logic
If you need more info on functions I'd check this out: https://www.w3schools.com/python/python_functions.asp
Using pathlib:
from pathlib import Path
import pandas as pd

your_dir = r'c:/users/before_manipulation'
new_dir = r'c:/users/after_manipulation'
files = list(Path(your_dir).glob('*.csv'))
for file in files:
    df = pd.read_csv(file)
    # .. your logic
    df.to_csv(Path(new_dir) / file.name, index=False)
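A slightly more defensive sketch of the same loop, assuming you want the output folder created if it is missing and want '.csv' and '.CSV' matched on every OS (Path.glob('*.csv') is case-sensitive on Linux and macOS). run_over_csvs and the process callback are hypothetical names; the callback receives the input and output paths so your pandas logic stays in one place.

```python
from pathlib import Path


def run_over_csvs(in_dir, out_dir, process):
    """Call process(in_path, out_path) for every .csv/.CSV file in in_dir; return the names handled."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    done = []
    for path in sorted(Path(in_dir).iterdir()):
        # suffix.lower() matches '.csv' and '.CSV' regardless of the filesystem
        if path.is_file() and path.suffix.lower() == ".csv":
            process(path, out / path.name)  # same file name, new directory
            done.append(path.name)
    return done
```

With pandas, process could be a function that does df = pd.read_csv(src), applies your logic, and then calls df.to_csv(dst, index=False).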

Is it possible to get the name of the file when opened with filedialog.askopenfile?

I am trying to open a file with file = filedialog.askopenfile(initialdir='./'), but I also need to know the name of the opened file for other purposes. I know that if the user cancels, file is None; otherwise it's something like this:
<_io.TextIOWrapper name='/Users/u/Desktop/e/config.py' mode='r' encoding='US-ASCII'>
but _io.TextIOWrapper objects are not subscriptable.
Following a suggestion, I discovered that there is another function similar to filedialog.askopenfile, askopenfilename, which instead of opening the file directly returns just its name. If we also want to open the file, we can open and read it manually:
from tkinter import filedialog

file_name = filedialog.askopenfilename(initialdir='./')
if file_name:  # empty when the dialog is cancelled
    with open(file_name, 'r') as file:
        string = file.read()  # whole contents as one string
    print(string)
Even if this is a good way, I still think tkinter should have provided this functionality directly with filedialog.askopenfile.
Navigating through the Python installation directory, we can find tkinter's filedialog.py file, which contains the definitions of both functions; they are very similar:
def askopenfilename(**options):
    "Ask for a filename to open"
    return Open(**options).show()

def askopenfile(mode = "r", **options):
    "Ask for a filename to open, and returned the opened file"
    filename = Open(**options).show()
    if filename:
        return open(filename, mode)
    return None
As we can see, the first returns the result of Open(**options).show(), i.e. the selected filename, whereas the second passes that filename to open() and returns the opened file object.
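Because askopenfile returns an ordinary object from open(filename, mode), the chosen path is already available as that object's .name attribute, which is what the repr above shows. A sketch, assuming you only need the path and the text contents; read_selected is a hypothetical helper, and it deliberately takes the file object so it does not import tkinter itself.

```python
def read_selected(file):
    """Return (path, contents) for the object returned by filedialog.askopenfile().

    askopenfile() returns None when the dialog is cancelled; otherwise it returns
    open(filename, mode), whose .name attribute is that filename.
    """
    if file is None:
        return None
    with file:  # close the file once it has been read
        return file.name, file.read()
```

Usage: result = read_selected(filedialog.askopenfile(initialdir='./')).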

Preventing overwriting when using numpy.savetxt

Is there built-in error handling to prevent overwriting a file when using numpy.savetxt?
If 'my_file' already exists, and I run
numpy.savetxt("my_file", my_array)
I want an error to be raised telling me the file already exists, or to be asked whether I'm sure I want to write to the file.
You can check whether the file already exists before you write your data:
import os
import numpy

if not os.path.exists('my_file'):
    numpy.savetxt('my_file', my_array)
Instead of a filename, you can also pass a file handle to np.savetxt(), e.g.:
import numpy as np
a = np.random.rand(10)
with open("/tmp/tst.txt", 'w') as f:
    np.savetxt(f, a)
So you could write a helper for opening the file.
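One such helper, sketched under the assumption that an exception is the behaviour you want: open() mode 'x' creates the file and raises FileExistsError if it already exists, and np.savetxt then writes to that handle. savetxt_no_overwrite is a hypothetical name.

```python
import numpy as np


def savetxt_no_overwrite(fname, arr, **kwargs):
    """Like np.savetxt, but raise FileExistsError instead of overwriting fname."""
    # 'x' = exclusive creation: the existence check and the file creation happen in one step
    with open(fname, "x") as f:
        np.savetxt(f, arr, **kwargs)
```

Compared with checking os.path.exists first, this leaves no window in which another process could create the file between the check and the write.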
Not in NumPy. I suggest writing to a tempfile.NamedTemporaryFile and then checking whether the destination file exists. If it doesn't, rename the temporary file to the destination path; otherwise, raise an error.
Not an error handler, but it's possible to create a new version in the form of:
file
filev2
filev2v3
filev2v3v4
so that no file ever gets overwritten.
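import os
import numpy as np
file = 'my_file'  # base name, without the .txt extension
n = 2
while os.path.exists(f'{file}.txt'):
    file = file + f'v{n}'
    n += 1
np.savetxt(f'{file}.txt', my_array)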

Using fileinput.input() to read gzip files

I'm using fileinput to read some large data:
import gzip
import fileinput
f = gzip.open('/scratch/try.fastq.gz', 'r')
for line in fileinput.input(f):
    print line
However, I got errors like:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/lib/python2.6/fileinput.py", line 253, in next
    line = self.readline()
  File "/share/lib/python2.6/fileinput.py", line 345, in readline
    self._file = open(self._filename, self._mode)
IOError: [Errno 2] No such file or directory: '#HWI-ST150_0129:2:1:13466:2247#0/1\n'
Can't fileinput take a file object as input? Then how can I use fileinput to deal with a gzip file?
Thanks
Nope, the first argument to fileinput.input should be a filename or a list of filenames. Passing the gzip file object made fileinput iterate over it and treat each decompressed line as a filename, which is the error you see. What you want can be achieved with:
for line in gzip.open('/scratch/try.fastq.gz'):
    print(line)
fileinput exists to support the idiom where a program reads from a list of files, typically supplied on the command line, or from standard input if no files are given. If you still want to use it, even though it's unnecessary in your example, you should do:
for line in fileinput.input(['/scratch/try.fastq.gz'], openhook=gzip.open):
    print(line)  # in Python 3, gzip.open in mode 'r' yields bytes lines
As other sources have said, the value for openhook must be a function, but that doesn't mean you can't call a function that returns a function. For example, if you want to support multiple different types of incoming files, you could write something like this:
import argparse
import fileinput
import gzip

def get_open_handler(compressed):
    if compressed:
        # mode comes in as 'r' by default, but that means binary to `gzip`
        return lambda file_name, mode: gzip.open(file_name, mode='rt')
    else:
        # the default mode of 'r' means text for `open`
        return open

# get args here, e.g. with argparse (these option names are illustrative)
parser = argparse.ArgumentParser()
parser.add_argument('files', nargs='*')
parser.add_argument('--compressed', action='store_true')
args = parser.parse_args()

for line in fileinput.input(args.files, openhook=get_open_handler(args.compressed)):
    print(line)
As you can see, we call a function to produce the value passed to openhook, and that function returns another function. In this case we are fixing the mode passed to gzip.open, but we can do anything we want, including using functools.partial to bind some values to a function, so that when fileinput passes the filename and mode to the function assigned to openhook, it does what you want.
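If compressed and plain files can be mixed in the same list, the hook can also decide per file instead of once for the whole run. A sketch, assuming compressed files are recognisable by a .gz extension; open_by_extension and read_all_lines are hypothetical names. The standard library also ships fileinput.hook_compressed for this, though older Python versions return bytes rather than str from it for compressed files.

```python
import fileinput
import gzip


def open_by_extension(filename, mode="r"):
    """openhook for fileinput: read *.gz files as text, open everything else normally.

    fileinput calls the hook as hook(filename, mode) with mode 'r'; gzip treats
    'r' as binary, so 'rt' is requested explicitly to get str lines.
    """
    if str(filename).endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename, mode)


def read_all_lines(paths):
    # FileInput chains the files together, calling the hook once per file
    with fileinput.FileInput(paths, openhook=open_by_extension) as lines:
        return [line.rstrip("\n") for line in lines]
```

For the command-line case above, you would pass args.files to read_all_lines, or use fileinput.input(args.files, openhook=open_by_extension) directly.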