I have a large video file stored in MongoDB GridFS.
I would like to read it and write it on my disk.
I can find the file in the database with:
file = grid_fs.find_one({"filename":'file_in_database.cin'})
I get back a GridOut object: gridfs.grid_file.GridOut at 0xa7b7be0
I try to write the file on my disk with:
with open('file_from_database.cin', 'w') as f:
    f.write(file.read())
The file gets written, but the size of the one downloaded from the database is slightly different from the size of the original file:
05/15/2015 09:09 AM 65,585,808 file_from_database.cin
08/01/2007 01:08 PM 65,585,800 Original_file.cin
I checked the file in the database and its md5 field matches the original, so the problem must happen during the download or the write.
I'm using Windows 7 64-bit and the Anaconda 64-bit distribution for Python 2.7.
Any help would be appreciated.
Update
I tried the same code with a JPEG image and I get the same problem: the image is stored correctly in the database, but when I retrieve it and write it to disk the size is slightly different and I cannot open it.
03/20/2015 02:36 PM 5,422,339 original_image.JPG
05/15/2015 02:44 PM 5,438,750 image_from_database.JPG
Am I making some simple mistake reading the GridOut and writing to disk?
Interestingly, if I open the image with:
PIL.Image.open(file)
I get the image fine. Any idea?
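For reference, a minimal sketch of the write step under the assumption that the size difference comes from Windows text-mode newline translation; the connection details (MongoClient(), the database name) are made up just to keep the sketch self-contained:

import gridfs
from pymongo import MongoClient

# hypothetical connection, only for illustration
db = MongoClient()['my_database']
grid_fs = gridfs.GridFS(db)

file = grid_fs.find_one({"filename": 'file_in_database.cin'})

# 'wb' writes the bytes untouched; plain 'w' on Windows rewrites \n as \r\n,
# which would account for a small size difference in a binary file
with open('file_from_database.cin', 'wb') as f:
    f.write(file.read())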
Related
I haven't used OpenVMS for 20+ years. It was my first OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as text files, so that they are readable.
No-one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the dat files are fixed length.
My first attempt was to use DATATRIEVE (DTR), but I get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT/FDL = nnnn.FDL to change the file from Relative to Sequential, but the file still seems to be unreadable.
Is there an easy way to stream an RMS index file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but I had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP using Filezilla to the server....
Another plan is to write a C application that reads a file and outputs it as strings..... or DCL too..... it doesn't have to be quick...
Any ideas?
As mentioned before, the simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT / FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in binary or ASCII mode with FTP.
MOSTLY, because this really only works well for a text-only source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps the bytes to integers, floating points, or dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC dictionaries.
To use such a definition you likely need a program that just reads records following the 'map' and writes/prints them as text. Normally that's not too hard, notably not when you can find an example program on the server to tweak.
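Purely as an illustration of that "read following the map" idea, a rough Python sketch, assuming the data has already been transferred to Windows as a plain stream of fixed-length records and assuming a made-up layout (a 32-bit integer, a 64-bit float and 20 text bytes); the real layout has to come from the COBOL/BASIC definition:

import struct

RECORD_LEN = 32          # assumed fixed record length
RECORD_FMT = '<id20s'    # int32 + float64 + 20 text bytes (made-up layout)

with open('test.DAT', 'rb') as src, open('test.TXT', 'w') as out:
    while True:
        rec = src.read(RECORD_LEN)
        if len(rec) < RECORD_LEN:
            break                          # stop at end of file / partial record
        num, val, text = struct.unpack(RECORD_FMT, rec)
        # drop non-ASCII bytes so the output stays plain text
        out.write('%d %g %s\n' % (num, val, text.decode('ascii', 'ignore').rstrip()))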
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.
I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and save it as XLSX. When I try to open it in JupyterLab I get an error saying the file is not UTF-8 encoded, and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I try to open the file in Ubuntu (with LibreOffice) I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).
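If you want to confirm that the .xlsx itself is fine (the complaint comes from JupyterLab's viewer, not from the file), a quick check with Python pandas, assuming pandas plus an Excel engine such as openpyxl are installed:

import pandas as pd

# read back the workbook written by XLSX.writetable; if this prints the
# first rows, the file is valid and only the JupyterLab viewer is missing
df = pd.read_excel('data/nba/nba19-20.XLSX')
print(df.head())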
After the simulation is finished, Dymola runs dsres2sdf.exe to convert the results to SDF format (if that option is enabled in the simulation setup's output tab).
Usually this runs smoothly, but sometimes it generates an SDF file that is very small (800 bytes) and empty.
Starting the dsres2sdf.exe manually from command line generates the same empty file.
I suspect that happens if the *.mat file is very large (> 1 GB).
Does anybody have a clue how to get a proper SDF file?
The SDF Editor and the SDF libraries for Python and MATLAB can read Dymola result files (*.mat) transparently (as if they were SDFs) and allow you to save them as *.sdf.
For example with Python:
import sdf
# load the Dymola result file
data = sdf.load('DoublePendulum.mat')
# re-save as SDF
sdf.save('DoublePendulum.sdf', data)
I know how to read the CSV into numpy and do it from a Python script, and that is good enough for my use case.
But since it has a GUI with data loading functionality, I was expecting it would just work for such a universal data format.
So I tried to go through the menu:
File > Load data > Open file
but when I select a simple CSV file generated with:
i=0; while [ "$i" -lt 10 ]; do echo "$i,$((2*i)),$((4*i))"; i=$((i+1)); done > main.csv
which contains:
0,0,0
1,2,4
2,4,8
3,6,12
4,8,16
5,10,20
6,12,24
7,14,28
8,16,32
9,18,36
an error popup shows on the GUI:
No suitable reader found for file /home/ciro/main.csv
Google led me to this interesting file in the source tree: https://github.com/enthought/mayavi/blob/e2569be1096be3deecb15f8fa8581a3ae3fb77d3/mayavi/tools/data_wizards/csv_loader.py but that just looks like an example of how to do it from a script.
Tested in Mayavi 4.6.2.
From the documentation
One needs to have some data or the other loaded before a Module or Filter may be used. Mayavi supports several data file formats most notably VTK data file formats. Alternatively, mlab can be used to load data from numpy arrays. For advanced information on data structures, refer to the Data representation in Mayavi section.
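For reference, a minimal sketch of the mlab route that quote mentions, assuming the main.csv generated above and that its three columns should be treated as x, y, z point coordinates:

import numpy as np
from mayavi import mlab

# load the three comma-separated columns written by the shell loop above
data = np.loadtxt('main.csv', delimiter=',')
x, y, z = data[:, 0], data[:, 1], data[:, 2]

# pass the numpy arrays to mlab instead of going through the GUI file dialog
mlab.points3d(x, y, z)
mlab.show()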
I've tested importing via the GUI on an Asus laptop with an Intel Core i7-4510U CPU @ 2.00 GHz and 8 GB of RAM, running Windows 10, both in and out of a Python virtualenv, and always got the same problem.
It all points to CSV files not being directly supported, so I had to find another workaround.
My favorite was to use a virtual environment and install mayavi, jupyterlab, PyQt5 and pandas in it.
Then, using PowerShell, start a Jupyter notebook (jupyter notebook) > Upload > Select the .csv. This imported a 1.25 GB .csv (153,543,233 rows x 3 columns) in around 20 s, which then became available for use.
I ran this command to load 11 files into a BigQuery table:
bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
I got this error:
Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.
I tried many times after that and still got the same error.
To debug what went wrong, I instead loaded each file one by one into the BigQuery table. For example:
/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
There are 11 files total and each ran fine.
Could someone please help? Is this a bug on the BigQuery side?
Thank you.
There was an error reading one of the files: gs://...part-m-00005.gz
Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.
It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.
When you load the files individually, BigQuery reads the header of the file and determines that it isn't actually compressed (despite having the suffix '.gz'), so it imports it as a normal flat file.
If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
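If it helps to pin down which objects are really gzipped, one quick check (a hypothetical helper, run against local copies of the part* files, e.g. after gsutil cp) is to look for the two-byte gzip magic number:

import glob

GZIP_MAGIC = b'\x1f\x8b'

# report which local copies of the part files carry the gzip signature
for path in sorted(glob.glob('part*')):
    with open(path, 'rb') as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    print('%s: %s' % (path, 'gzip' if is_gzip else 'not gzip'))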
Please let me know if you think this is not the case and I'll dig in some more.