Copying large BigQuery Tables to Google Cloud Storage and subsequent local download - google-bigquery

My goal is to save a BigQuery table locally so that I can perform some analyses on it. To save it locally, I tried to export it to Google Cloud Storage as a CSV file. Unfortunately the dataset is too big to be moved as one file, so it is split into many different files, looking like this:
exampledata.csv000000000000
exampledata.csv000000000001
...
Is there a way to put them back together in Google Cloud Storage, ideally as a single CSV file?
My approach was to download the pieces and reassemble them manually. Clicking on a file does not work, as it gets saved as a BIN file, and this is also very time consuming. Furthermore, I do not know how to put the pieces back together.
I also tried to fetch them via the gsutil command, and I was able to save them on my machine, but as zipped files. Unzipping them with WinRAR gives me exampledata.out files, which I do not know what to do with. I am also clueless about how to put them back together into one file.
How can I get the table onto my computer, as one file, and as a CSV?
The computer I am working with runs Ubuntu, but I need the data on a Google virtual machine running Windows Server 2012.

Try the following to merge all the files into one from the Windows command prompt:
copy *.cs* merged.csv
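If you would rather stitch the pieces back together inside Google Cloud Storage before downloading anything, gsutil's compose command can concatenate objects in the same bucket into a single object; it accepts at most 32 source objects per call, so a very large export may have to be composed in stages. A minimal sketch, assuming an uncompressed CSV export sitting in a bucket called gs://mybucket (bucket name, prefixes and local path are illustrative):
# concatenate the exported pieces into one object (same bucket required, max 32 sources per call)
gsutil compose gs://mybucket/exampledata.csv0000000000* gs://mybucket/exampledata_merged.csv
# copy the merged object down to the local machine (works the same from the Windows Server VM)
gsutil cp gs://mybucket/exampledata_merged.csv D:\exampledata_merged.csv
Note that if the export was created with a header row, each piece may carry its own copy of it, so whichever way you merge (compose or the copy command above) you may want to strip the duplicate header lines afterwards.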

I suggest saving the table as GZIP-compressed files; you can then easily download them from Google Cloud Storage, even though they arrive as BIN files. If you get these split files by exporting from BigQuery as follows:
Export Table -> CSV format, compression: GZIP, URI: file_name*
then you can combine them again with the steps below.
In Windows:
Append .zip to the end of each file name.
Use 7-Zip to extract the first .zip file, the one whose name ends in "...000000000000"; it will automatically detect all the remaining .zip files. This is just like the normal way of extracting a split .zip archive.
In Ubuntu:
I have not managed to decompress the files with the methods I found online. I will update the answer if I figure it out.
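A sketch of what should work on Ubuntu, assuming the export used GZIP compression as above: each exported piece is a standalone gzip stream, so giving it a .gz suffix and running zcat over all the pieces yields one merged CSV (the file names are illustrative):
# give each downloaded piece a .gz suffix so the gzip tools accept it
for f in exampledata.csv0000000000*; do mv "$f" "$f.gz"; done
# decompress every piece and concatenate the output into a single CSV
zcat exampledata.csv0000000000*.gz > exampledata_merged.csv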

Related

CSV/Pickle Files Too Large to Commit to GitHub Repo

I'm trying to commit a project I have been working on for a while that we have not yet uploaded to GitHub. Most of it is Python Pandas, where we do all our ETL work and save to CSV and pickle files that we then use for creating dashboards and running metrics on our data.
We are running into some issues with version control without using GitHub, so we want to get on top of that. I don't need version control on our CSV or pickle files, but I can't change the file paths or everything will break. When I try to make the initial commit to the repo, it won't let me because our pickle and CSV files are too big. Is there a way for me to commit the project without uploading the whole CSV/pickle files (the largest is ~10 GB)?
I have the following in my .gitignore file, but it still isn't letting me get around it. Thanks for any and all help!
*.csv
*.pickle
*.pyc
*.json
*.txt
__pycache__/MyScripts.cpython-38.pyc
.Git
.vscode/settings.json
*.pm
*.e2x
*.vim
*.dict
*.pl
*.xlsx
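One common reason a .gitignore seems to have no effect is that the large files were already added to the index (or to an earlier commit) before the ignore rules existed; Git keeps tracking files it already knows about. A minimal sketch of untracking them while leaving them on disk (the patterns are illustrative and match the ignore list above):
# remove the large files from Git's index only; they stay on disk and keep their paths
git rm -r --cached "*.csv" "*.pickle"
git commit -m "Stop tracking large CSV/pickle files"
If the files have already made it into a commit, the history still contains them and a push can be rejected anyway (GitHub refuses files over 100 MB), in which case you would also need to rewrite history (for example with git filter-repo) or move the large files to Git LFS.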

Making python-based .exe file accessible to anyone

I have used Spyder (Anaconda) to build a Python GUI app. The app can browse and load any time-series CSV file on the user's PC, perform a few statistical tests, and print the results to a txt file saved on the user's desktop.
Is it possible to upload the executable file to some repository so that others can try it out? For example, Google Earth Engine based apps can easily be shared via a link, and anyone with that link can access the app. Is there anything similar for my case?
This may not be the answer you're looking for, but you can upload the .exe to Google Drive and share it, so anyone can download it from the generated link.
File types: Users can upload any type of file, including executables
(for example, .exe or .vbs) and compressed files.
(source: Google Drive help documentation)
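Since the question mentions repositories: another option along the same lines is a GitHub release, which lets you attach binary assets such as an .exe to a tagged release so that anyone with the link can download it. A sketch using the GitHub CLI (the tag, file path and text are illustrative):
# create a tagged release in the current repository and attach the built executable as a downloadable asset
gh release create v1.0 dist/my_gui_app.exe --title "First release" --notes "Windows build of the GUI app"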

How do I fix Directus error code 3 when downloading stored files

I can store files to the specified storage location via the GUI. I can see the files are in the storage location.
When I try to download them using the GUI, I get this every time.
{"error":{"code":3,"message":"Unauthorized request","class":"Directus\\Exception\\UnauthorizedException","file":"\/var\/www\/directus\/src\/helpers\/app.php","line":287}}
When I try the links from the File library, I get the same error.
I found some old topics concerning a "_" project. I do not see any "_" entries in my project.php configuration.
Everyone has read permissions for the storage directory.
The rest of the system appears to run without error.
Check the storage folder on the server and what has been set there; it should match the default configuration. Then, if you want to access the 300x300 thumbnail, the URL should look like:
domain.com/public/uploads/Directus/generated/w300,h300,fcrop,q80/file-name.jpg

Downloading Output files from Google Colabortory

For some reason I am not able to download my output files from Colaboratory. I am able to upload input files, but I cannot download my output files separately. I am also using a Mac.
For information on various ways to download files from Colab, see https://colab.research.google.com/notebooks/io.ipynb.
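For example, one of the approaches described in that notebook is the files helper from the google.colab package; a minimal sketch, assuming the notebook has already written a file called output.csv to the runtime's filesystem (the file name is illustrative):
from google.colab import files

# triggers a browser download of a file that exists in the Colab runtime's filesystem
files.download('output.csv')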

Trying to only download files which got uploaded today [duplicate]

This question already has an answer here:
Download file with today's date in its name from remote server with WinSCP [closed]
Every day I download multiple .zip files from an SFTP server. Our client also uploads new .zip files to this SFTP server every day, but is not willing to delete the old files.
So I keep downloading the same files from the last few days plus the files that were uploaded today.
I have tried a lot but have not had any success.
This is my short script right now (which downloads way too many files and eats up my storage space):
open sftp://user:password@sftp-server.com/ -hostkey=*
synchronize local D:\Test\Download /sftp-server/PDF-files/
I couldn't find an option to download files per date, so maybe you can help me further.
Also important, the .zip files are named:
"name_clientname_YYYYMMDD_NumberOfUploads.zip"
I tried to add
*%TIMESTAMP#yyyymmdd%*.zip
at the end of the path of the files, but that didn't work out.
Don't use synchronise if you are deleting the old files from your local copy. Select files based on timestamp instead:
From the WinSCP site: How do I transfer new/modified files only?
The appropriate get syntax (close to what you tried) seems to be something like:
open sftp://user:password@sftp-server.com/ -hostkey=*
get -filemask="*.zip>today" /remote-folder/* D:\local-folder\
where the filemask constraint is as specified in: https://winscp.net/eng/docs/file_mask#size_time
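Since the file names already contain the upload date, another approach (essentially what the linked duplicate describes) is to match on the name with WinSCP's %TIMESTAMP% syntax instead of on the modification time; a sketch, reusing the remote and local paths from the question:
# download only the files whose name contains today's date in yyyymmdd format
open sftp://user:password@sftp-server.com/ -hostkey=*
get "/sftp-server/PDF-files/*%TIMESTAMP#yyyymmdd%*.zip" D:\Test\Download\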