I have received a file from a customer. The file is said to be SQL code (application/sql).
However, this turned out to be wrong: nothing could open it. It turns out it was secretly a .zip file. By renaming it to '.zip' and manually extracting it, I was able to get the files contained in it. I would like to do a similar process in Python.
So far I've renamed the file:
import os

file_name_zip = file_name.replace('.sql', '.zip')
os.rename(file_name, file_name_zip)
And I've tried extracting it:
import zipfile

zip_ref = zipfile.ZipFile(file_name_zip, 'r')
zip_ref.extractall(extracted_file)  # extracted_file is the target directory
However, this failed because
zipfile.BadZipFile: File is not a zip file
I've googled, and apparently this can sometimes be fixed using:
zip_file_name_2 = zip_file_name.replace('.zip', '2.zip')
os.system(f'zip -FF {zip_file_name} --out {zip_file_name_2}')
This required me to put in a bunch of settings, which I wasn't able to figure out. There must be a better way to go about this.
Does anybody know how to parse such an .sql file?
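For reference, a minimal sketch of the whole process without the rename step: zipfile does not care about the extension, and zipfile.is_zipfile() checks the file's magic bytes rather than its name, so a mislabeled archive can be detected and extracted directly. The file and directory names here are placeholders.

import zipfile

file_name = 'customer_file.sql'  # hypothetical: the mislabeled file as received

# is_zipfile() inspects the magic bytes (PK\x03\x04), not the extension
if zipfile.is_zipfile(file_name):
    with zipfile.ZipFile(file_name) as zip_ref:
        zip_ref.extractall('extracted')  # extract into a target directory
else:
    print('Not a zip archive; a repair tool such as zip -FF may be needed')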
Not entirely sure why our script isn't backing up; I suspect it may be something to do with Robocopy. I know robocopy is a System32 file, but looking at the script it's shown as part of another path, followed by the correct path, C:\Windows\System32.
So I'm looking at the two "copy" lines, lines 2 and 3. Is that normal formatting? Because robocopy is not in that path.
if exist "C:\windows\system32\robocopy.exe" Goto SkipPrep
copy"%logonserver%\netlogon\tools\robocopy.exe""C:\Windows\system32"
copy"%logonserver%\netlogon\tools\sleep.exe" "C:\Windows\system32\
cls
:SkipPrep
This is a .bat file.
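For comparison, here is a cleaned-up sketch of the same block, assuming the intent is simply to stage the two tools into System32 when robocopy is missing: cmd's copy needs a space before its source argument, and the second copy line is missing its closing quote.

if exist "C:\Windows\System32\robocopy.exe" goto SkipPrep
copy "%logonserver%\netlogon\tools\robocopy.exe" "C:\Windows\System32"
copy "%logonserver%\netlogon\tools\sleep.exe" "C:\Windows\System32"
cls
:SkipPrep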
I'm relatively new to NetLogo and already struggling ;)
I have the following problem: I want my program to open a folder, check a file in that folder, and afterwards remove that file from the folder. I figured the best way to do this is via a while loop, but I'm struggling to find the right syntax. Hope you all can help!
The command 'file-open' will open a file using the path provided (the string after file-open): e.g. file-open "C:\\Documents\\model-out.txt" will open a file titled model-out.txt in the Documents folder on the C: drive. (Note that backslashes must be escaped in NetLogo strings.)
You can then use 'file-read' or 'file-write' to read or write to the file respectively.
The command 'file-close' will close the file, which then can be deleted with 'file-delete'.
You can also check whether a file exists using the reporter file-exists?, e.g. if file-exists? "C:\\Documents\\model-out.txt" [ ... ], and if it reports true, the file can be deleted using file-delete.
Also check the command 'set-current-directory'.
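Putting those commands together, a minimal NetLogo sketch of the check-read-delete cycle (the path is hypothetical):

to check-and-remove
  let path "C:\\Documents\\model-out.txt"
  if file-exists? path [
    file-open path
    ; read the file line by line until the end
    while [not file-at-end?] [
      print file-read-line
    ]
    file-close
    file-delete path
  ]
end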
I would like to use pandas.read_csv to open a gzip file (.asc.gz) within a zipped directory (.zip). Is there an easy way to do this?
This code doesn't work:
csv = pd.read_csv(r'C:\folder.zip\file.asc.gz')  # can't find the file
This code does work (however, it requires me to unzip the folder, which I want to avoid because my dataset currently contains thousands of zipped folders):
csv = pd.read_csv(r'C:\folder\file.asc.gz')
Is there an easy way to do this? I have tried using a combination of zipfile.ZipFile and read_csv, but have been unsuccessful (I think partly because this is an ASCII file as well).
Maybe the following might help.

import pandas as pd

df = pd.read_csv('filename.gz', compression='gzip')

OR

import gzip

file = gzip.open('filename.gz', 'rb')
content = file.read()
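Neither snippet handles the enclosing .zip, though. Here is a minimal sketch for reading the .asc.gz member straight out of the zip without extracting anything to disk, assuming the inner file is delimited text that pandas can parse (paths and names are taken from the question):

import gzip
import zipfile

import pandas as pd

with zipfile.ZipFile(r'C:\folder.zip') as zf:
    with zf.open('file.asc.gz') as member:  # file-like handle on the compressed member
        with gzip.open(member) as gz:       # strip the gzip layer in memory
            csv = pd.read_csv(gz)           # pandas accepts a file-like object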
I'm sorry if this is basic and I missed something simple. I'm trying to run the code below to iterate through files in a folder and merge all files that start with a specific string into a dataframe. All the files sit in a data lake.
file_list=[]
path = "/dbfs/rawdata/2019/01/01/parent/"
files = dbutils.fs.ls(path)
for file in files:
    if file.name.startswith("CW"):
        file_list.append(file.name)
df = spark.read.load(path=file_list)
# check point
print("Shape: ", df.count(),"," , len(df.columns))
df.printSchema()
This looks fine to me, but apparently something is wrong here. I'm getting an error on this line:
files = dbutils.fs.ls(path)
Error message reads:
java.io.FileNotFoundException: File/6199764716474501/dbfs/rawdata/2019/01/01/parent does not exist.
The path, the files, and everything else definitely exist. I tried with and without the 'dbfs' part. Could it be a permissions issue? Something else? I googled for a solution but still can't get traction with this.
Make sure you actually have a folder named "dbfs". If your parent folder starts from "rawdata", the path should be "/rawdata/2019/01/01/parent" or "rawdata/2019/01/01/parent".
This error is thrown when the path is incorrect.
This is an old thread, but if someone is still looking for a solution:
It does require the path to be listed as:
"dbfs:/rawdata/2019/01/01/parent/"
I have a CSV file that has been tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through a Text File Input step.
I tried the path tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv".
But it sometimes fails to read, throwing this exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I am using Windows 8.1 and PDI 5.2.
Where might this be going wrong?
For reading a CSV from a compressed file, the "Text File Input" step in Pentaho Kettle only supports the first file inside the compressed archive (whether Zip or GZip). Check the Pentaho Wiki, in the compression section.
Now for your issue: try removing the wildcard entry, since only the first file inside the zip/gzip archive will be read (as explained above).
I have placed a sample showing how to read both zip and gzip files. Check it here.
Hope it helps :)