When I try to open a text file larger than 20 MB, I get the message: File <path> is too large (21.97MB). Where can I relax this restriction?
Found by inspecting the IntelliJ source code:
You have to edit the property idea.max.intellisense.filesize in idea.properties, located in idea_home/bin.
The maximum file size that will be loaded is max(20 MB, <value of idea.max.intellisense.filesize>).
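For example, to raise the limit to roughly 50 MB, you could add a line like the following to idea.properties (the value is interpreted in kilobytes; 50000 is just an illustrative number) and restart the IDE:
idea.max.intellisense.filesize=50000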
I'm reading the CSV file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and save it as XLSX. When I try to read it in JupyterLab, I get the error that the file is not UTF-8 encoded, and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body, DataFrame)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder, but when I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I open the file in Ubuntu (with LibreOffice), I do not see anything suspicious.
As I'm new to Julia, I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the DataFrame in UTF-8 (after saving the file to disk) with:
using StringEncodings  # provides the enc"utf-8" string macro
data = DataFrame(CSV.File(open(read, "data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab doesn't support opening .xlsx files by default (it isn't mentioned in the file formats list here, for example).
See also this similar question involving Python pandas (which says pretty much the same thing).
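If you want to confirm that the generated file itself is a valid .xlsx independently of JupyterLab's viewer, one quick check (a sketch, assuming pandas and openpyxl are installed, and reusing the path from the question) is:
import pandas as pd
# If this reads without error, the .xlsx written from Julia is fine and the
# "not UTF-8 encoded" pop-up is only about JupyterLab's built-in file viewer.
df = pd.read_excel("data/nba/nba19-20.XLSX")
print(df.head())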
df.to_csv("/path/to/destination.zip", compression="zip")
The above line will generate a file called destination.zip in the directory /path/to/.
Decompressing the ZIP file results in a directory structure path/to/destination.zip, where destination.zip is the CSV file.
Why is the path/to/ folder structure included in the compressed file? Is there any way to avoid this?
I was blown away by this; currently I'm writing the ZIP locally (destination.zip) and using os.rename to move it to the desired location. Is this a bug?
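One way around this, assuming a reasonably recent pandas (1.0 or later), is to pass a dict to compression and name the archive entry explicitly, so the ZIP contains a flat destination.csv instead of the path/to/ structure:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # placeholder data for illustration
# archive_name sets the name of the CSV entry inside the ZIP archive,
# so the output path is not replicated inside the file.
df.to_csv(
    "/path/to/destination.zip",
    compression={"method": "zip", "archive_name": "destination.csv"},
)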
I have received a file from a customer. The file is said to be
SQL code (application/sql)
However, this turned out to be wrong: nothing could open it. It turns out it was secretly a .zip file. By renaming it to '.zip' and manually extracting it, I was able to get the files contained in it. I would like to do a similar process in Python.
So far I've renamed the file:
file_name_zip = file_name.replace('.sql', '.zip')
os.rename(file_name, file_name_zip)
And I've tried extracting it:
zip_ref = zipfile.ZipFile(file_name_zip, 'r')
zip_ref.extractall(extracted_file)
However, this failed with:
zipfile.BadZipFile: File is not a zip file
I've googled, and apparently this can sometimes be fixed using:
zip_file_name_2 = zip_file_name.replace('.zip', '2.zip')
os.system(f'zip -FF {zip_file_name} --out {zip_file_name_2}')
This required me to put in a bunch of settings, which I wasn't able to figure out. There must be a better way to go about this.
Does anybody know how to parse such an .sql file?
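For what it's worth, zipfile does not care about the file extension, so the rename step isn't needed; what matters is whether the bytes really form a ZIP archive. A minimal sketch (the file name and extraction directory are placeholders) to check that before extracting:
import zipfile

file_name = "customer_file.sql"  # hypothetical path; use the real one here

if zipfile.is_zipfile(file_name):
    # The extension is irrelevant; ZipFile only looks at the contents.
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall("extracted")
else:
    # Peek at the first bytes to see what the file really is;
    # a genuine ZIP archive starts with b'PK'.
    with open(file_name, "rb") as f:
        print(f.read(4))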
I have a large video file stored in MongoDB GridFS.
I would like to read it and write it on my disk.
I can find the file in the database with:
file = grid_fs.find_one({"filename":'file_in_database.cin'})
This returns a GridOut object (gridfs.grid_file.GridOut at 0xa7b7be0).
I try to write the file on my disk with:
with open('file_from_database.cin', 'w') as f:
f.write(file.read())
The file gets written, but the size of the one downloaded from the database is slightly different from the original file's size:
05/15/2015 09:09 AM 65,585,808 file_from_database.cin
08/01/2007 01:08 PM 65,585,800 Original_file.cin
I checked the file in the database and the md5 field is the same as the original's, so the problem must be during the download or the writing.
I'm using Windows 7 64-bit and the 64-bit Anaconda distribution for Python 2.7.
Any help would be appreciated.
Update
I tried the same code with a JPEG image and I get the same problem: the image is stored correctly in the database, but when I fetch it and write it to disk the size is slightly different and I cannot open it.
03/20/2015 02:36 PM 5,422,339 original_image.JPG
05/15/2015 02:44 PM 5,438,750 image_from_database.JPG
Am I doing some simple mistake reading the gridout and writing to the disk?
Interestingly, if I open the image with:
PIL.Image.open(file)
I can open the image fine. Any ideas?
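One plausible explanation, given that this is on Windows, is that opening the output file in text mode ('w') translates '\n' bytes into '\r\n', which would account for the extra bytes. A minimal sketch of the same write in binary mode (using the file variable from the question):
# 'wb' keeps the bytes exactly as stored in GridFS; text mode ('w') on Windows
# rewrites newline bytes and changes the file size.
with open('file_from_database.cin', 'wb') as f:
    f.write(file.read())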
I have a CSV file that is tarred and gzipped, so I have test.tar.gz.
I would like to read the CSV file through the Text File Input step.
I tried tar:gz:file://C:/test/test.tar.gz!/test.tar! with a wildcard like ".*\.csv",
but it sometimes fails to read it.
It throws the exception:
org.apache.commons.vfs.FileNotFolderException:
Could not list the contents of
"tar:gz:file:///C:/test/test.tar.gz!/test.tar!/"
because it is not a folder.
I'm using Windows 8.1 and PDI 5.2.
Where might it be going wrong?
For reading a CSV from a compressed file, the "Text File Input" step in Pentaho Kettle only supports the first file inside the compressed archive (either a Zip or a GZip file). Check the Pentaho Wiki, in the compression section.
Now, for your issue, try removing the wildcard entry, since only the first file inside the zip/gzip archive will be read (as explained above).
I have placed a sample showing how to read both zip and gzip files. Check it here.
Hope it helps :)