Preventing overwriting when using numpy.savetxt

Is there built-in error handling to prevent overwriting a file when using numpy.savetxt?
If 'my_file' already exists, and I run
numpy.savetxt("my_file", my_array)
I want an error to be generated telling me the file already exists, or a prompt asking whether the user really wants to write to the file.

You can check whether the file already exists before you write your data:
import os
import numpy

if not os.path.exists('my_file'):
    numpy.savetxt('my_file', my_array)
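Note that a separate existence check is racy: another process could create the file between the check and the write. A minimal sketch of an atomic alternative, using Python's exclusive-creation open mode (this is standard open() behavior, not a numpy feature):
import numpy as np

my_array = np.random.rand(10)  # example data
# Mode 'x' makes open() raise FileExistsError if the file already exists,
# so the existence check and the write happen in a single step.
with open('my_file', 'x') as f:
    np.savetxt(f, my_array)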

Instead of a filename, you can pass a file handle to np.savetxt(), e.g.,
import numpy as np

a = np.random.rand(10)
with open("/tmp/tst.txt", 'w') as f:
    np.savetxt(f, a)
So you could write a helper for opening the file.
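A minimal sketch of such a helper that prompts before overwriting (the name open_checked is hypothetical, not an established API):
import os
import numpy as np

def open_checked(fname):
    # Hypothetical helper: ask the user before overwriting an existing file.
    if os.path.exists(fname):
        answer = input("%s exists; overwrite? [y/N] " % fname)
        if answer.lower() != 'y':
            raise FileExistsError("refusing to overwrite %s" % fname)
    return open(fname, 'w')

a = np.random.rand(10)
with open_checked("/tmp/tst.txt") as f:
    np.savetxt(f, a)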

Not in NumPy. I suggest writing to a tempfile.NamedTemporaryFile and then checking whether the destination file exists. If it doesn't, rename the temporary file to the destination; otherwise, raise an error.
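A minimal sketch of that approach (the function name is hypothetical, and the check-then-rename still leaves a small race window):
import os
import tempfile
import numpy as np

def savetxt_noclobber(dest, arr):
    # Write to a temporary file in the destination directory first,
    # so the final rename stays on the same filesystem.
    with tempfile.NamedTemporaryFile('w', delete=False,
                                     dir=os.path.dirname(dest) or '.') as tmp:
        np.savetxt(tmp, arr)
        tmp_name = tmp.name
    if os.path.exists(dest):
        os.remove(tmp_name)
        raise FileExistsError("%s already exists" % dest)
    os.rename(tmp_name, dest)

savetxt_noclobber('my_file', np.random.rand(10))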

Not an error handler, but it's possible to create a new versioned filename each time, in the form of:
file
filev2
filev2v3
filev2v3v4
so that no file ever gets overwritten.
import os

file = 'file'  # base name, without the .txt extension
n = 2
while os.path.exists(f'{file}.txt'):
    file = file + f'v{n}'
    n += 1
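A minimal sketch wrapping this into a helper for use with numpy.savetxt (the function name and the .txt suffix are assumptions for illustration):
import os
import numpy as np

def next_free_name(stem):
    # Append v2, v3, ... until the name is unused, reproducing the
    # file -> filev2 -> filev2v3 scheme shown above.
    n = 2
    while os.path.exists(f'{stem}.txt'):
        stem = stem + f'v{n}'
        n += 1
    return f'{stem}.txt'

np.savetxt(next_free_name('my_file'), np.random.rand(10))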

Related

pyunpack Archive.extractall hangs on password protected .rar file

I'm writing a python script to process some old archives that include many .rar files. I can successfully extract data from them using Archive.extractall from pyunpack, but haven't been able to find much documentation. The documentation link at https://pypi.org/project/pyunpack/0.1.2/ is broken. Everything was going well until the program hung on a specific .rar file. I looked at the file with 7zip, and it said it was password protected. I have no idea what the password is, so I would like the script to simply skip it. I already have extractall in a try block, but I guess there is no exception. How can I test if the file is password protected and not try to extract it if so? Here's the relevant portion of my code.
import os
import rarfile
from pyunpack import Archive

for subdir, dirs, files in os.walk(directoryToExtract):
    for file in files:
        if rarfile.is_rarfile(os.path.join(subdir, file)):
            print("Found rarfile " + os.path.join(subdir, file))
            try:
                Archive(os.path.join(subdir, file)).extractall(
                    os.path.join(subdir, file[0:len(file)-4]),
                    auto_create_dir=True)
            except Exception as e:
                print("ERROR: Couldn't unrar " + os.path.join(subdir, file))
                print(str(e))
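A hedged sketch of one way to skip protected archives: the rarfile library documents a RarFile.needs_password() method, which should let the script test for encryption before extracting (whether it catches every protection scheme is not guaranteed):
import os
import rarfile
from pyunpack import Archive

path = os.path.join(subdir, file)  # as in the loop above
with rarfile.RarFile(path) as rf:
    if rf.needs_password():
        # Encrypted archive: report it and move on instead of hanging.
        print("Skipping password-protected archive " + path)
    else:
        Archive(path).extractall(path[:-4], auto_create_dir=True)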

trouble with utf-8 with julia and jupyterlab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and save it as XLSX. When I try to read it in JupyterLab, I get the error that the file is not UTF-8 encoded, and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I try to open the file in Ubuntu (with LibreOffice) I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).

pandas.read_csv of a gzip file within a zipped directory

I would like to use pandas.read_csv to open a gzip file (.asc.gz) within a zipped directory (.zip). Is there an easy way to do this?
This code doesn't work:
csv = pd.read_csv(r'C:\folder.zip\file.asc.gz')  # can't find the file
This code does work (however, it requires me to unzip the folder, which I want to avoid because my dataset currently contains thousands of zipped folders):
csv = pd.read_csv(r'C:\folder\file.asc.gz')
Is there an easy way to do this? I have tried using a combination of zipfile.ZipFile and read_csv, but have been unsuccessful (I think partly because this is an ASCII file as well).
Maybe the following might help.
df = pd.read_csv('filename.gz', compression='gzip')
OR
import gzip

file = gzip.open('filename.gz', 'rb')
content = file.read()
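Neither snippet reaches inside the outer .zip, though. A minimal sketch that reads the gzipped member directly from the zip archive without unzipping anything to disk (the archive and member names are taken from the question):
import gzip
import zipfile
import pandas as pd

with zipfile.ZipFile(r'C:\folder.zip') as zf:
    # zf.open returns a file-like object for the member, and gzip.open
    # accepts a file object, so nothing is written to disk.
    with zf.open('file.asc.gz') as member:
        with gzip.open(member) as gz:
            csv = pd.read_csv(gz)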

load script from other file extension?

Is it possible to load a module from a file with an extension other than .lua?
require("grid.txt") results in:
module 'grid.txt' not found:
no field package.preload['grid.txt']
no file './grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt/init.lua'
no file '/usr/local/lib/lua/5.1/grid/txt.lua'
no file '/usr/local/lib/lua/5.1/grid/txt/init.lua'
no file './grid/txt.so'
no file '/usr/local/lib/lua/5.1/grid/txt.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file './grid.so'
no file '/usr/local/lib/lua/5.1/grid.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
I suspect that it's somehow possible to load the script into package.preload['grid.txt'] (whatever that is) before calling require?
It depends on what you mean by load.
If you want to execute the code in a file named grid.txt in the current directory, then just do dofile"grid.txt". If grid.txt is in a different directory, give a path to it.
If you want to use the path search that require performs, then add a template for .txt in package.path, with the correct path and then do require"grid". Note the absence of suffix: require loads modules identified by names, not by paths.
If you want require("grid.txt") to work should someone try that, then yes: you'll need to manually loadfile and run the script, and put whatever it returns (or whatever require is documented to return when the module doesn't return anything) into package.loaded["grid.txt"].
Alternatively, you could set a loader for just this entry into package.preload["grid.txt"] which finds and loads/runs the file. More generically, you could write a loader function, insert it into package.loaders, and let it do its job whenever it sees a "*.txt" module come its way.

how to get the latest file in the folder

I have written code that retrieves each file and the time it was created, but I just want to get the name of the latest file created. Please suggest how I can do that in Jython.
import os
import glob
import time

folder = 'C:/xml'
for folder in glob.glob(folder):
    for file in glob.glob(folder + '/*.xml'):
        stats = os.stat(file)
        print file, time.ctime(stats[8])
Thanks again for all your help.
I have re-modified the code as suggested, but I am not getting the right answer. Please suggest what mistake I am making.
import os
import glob
import time

folder = 'C:/xml'
for x in glob.glob(folder + "/*.xml"):
    (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(x)
    time1 = time.ctime(mtime)
    for z in glob.glob(folder + "/*.xml"):
        (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(z)
        time2 = time.ctime(mtime)
        if (time1 > time2):
            new_file = x
            new_time = time1
        else:
            new_file = z
            new_time = time2
print new_file, new_time
Use two variables to keep track of the name and time of the latest file found so far. Whenever you find a later file, update both variables. When your loop is done, the variables will contain the name and time of the latest file.
I'm not quite sure why you have two nested loops in your example code; if you're looking for all *.xml files in the given directory, you only need one loop.
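A minimal sketch of that single-loop approach in Jython/Python 2 (variable names are illustrative; it compares raw mtimes rather than ctime strings, since formatted time strings don't sort chronologically):
import os
import glob
import time

folder = 'C:/xml'
latest_file = None
latest_mtime = 0
for path in glob.glob(folder + '/*.xml'):
    mtime = os.stat(path)[8]
    if mtime > latest_mtime:
        # Found a newer file; update both tracking variables.
        latest_file = path
        latest_mtime = mtime
if latest_file is not None:
    print latest_file, time.ctime(latest_mtime)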
A Pythonic solution might be something like:
folder = "C:/xml"
print max((os.stat(x)[8], x) for x in glob.glob(folder+"/*.xml"))
If you choose the max() solution, be sure to consider the case where there are no *.xml files in your directory (max() raises a ValueError on an empty sequence).
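A minimal sketch of guarding that empty case:
import os
import glob

folder = "C:/xml"
files = glob.glob(folder + "/*.xml")
if files:
    # max over (mtime, name) pairs picks the newest file.
    mtime, latest = max((os.stat(x)[8], x) for x in files)
    print latest
else:
    print "no *.xml files in", folder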