Colab truncates folders with more than 200k files - google-colaboratory

Is there a maximum number of files allowed per folder when reading Google Drive from Colab? I create a folder from Colab with more than 200k files and run an "ls" command just after creation, and everything is ok. But every time I close the session and open it again (remounting Drive), the folder gets truncated: I can't read that quantity anymore (actually no more than 20k files are visible) and have to recreate/unzip the folder again. The folder contains images for training a DL model.

Update: I'm running drive.flush_and_unmount() from the notebook where I created the folder (without closing the session), and it is running smoothly. From another notebook I'm monitoring the quantity of files inside the folder (same folder, but from another notebook), and the count is slowly beginning to increase. So it looks like the solution is running drive.flush_and_unmount() to force a sync to Drive, though I'm not sure yet whether the folder will still be intact after closing the session and reopening it. Will let you know! At least it is progress.
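For reference, a minimal sketch of that flush step, assuming the images were written under the standard /content/drive mount point:

from google.colab import drive
drive.mount('/content/drive')

# ... create the folder and write the 200k+ training images under /content/drive/MyDrive/ ...

# force any locally buffered writes to finish syncing to Google Drive
drive.flush_and_unmount()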

Related

How to open multiple notebooks on the same google colab runtime?

Scenario:
You have a github repo you want to work on. The repo has the following file structure:
src
|
-- directory containing .py files
notebook1.ipynb
notebook2.ipynb
You head to colab and create a new empty notebook as the entry point between your repo and the google colab runtime.
On that empty colab notebook you add the following command to clone your github repo:
!git clone your_repo_address
Checking the Colab file explorer, we see that our repo and file structure have been copied to the Colab runtime.
So far so good. Now say you want to open notebook1.ipynb, execute its cells, and work on it.
How the hell do you do that?
Every time I try, the notebook opens in a file viewer without the possibility of executing its cells.
Why can't Colab work similarly to Jupyter? It's extremely cumbersome and time-wasting in this regard compared to Jupyter.

CSV/Pickle Files Too Large to Commit to GitHub Repo

I'm working on committing a project I have been working on for a while that we have not yet uploaded to GitHub. Most of it is Python/Pandas, where we do all our ETL work and save to CSV and pickle files, which we then use to create dashboards and run metrics on our data.
We are running into some issues with version control without using GitHub, so we want to get on top of that. I don't need version control on our CSV or pickle files, but I can't change the file paths or everything will break. When I try to make the initial commit to the repo, it won't let me because our pickle and CSV files are too big. Is there a way for me to commit the project without uploading the whole CSV/pickle files (the largest is ~10 GB)?
I have this in my .gitignore file, but it's still not letting me get around it. Thanks for any and all help!
*.csv
*.pickle
*.pyc
*.json
*.txt
__pycache__/MyScripts.cpython-38.pyc
.Git
.vscode/settings.json
*.pm
*.e2x
*.vim
*.dict
*.pl
*.xlsx
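One likely cause (an assumption, since the question doesn't show the actual git output): .gitignore only applies to untracked files, so anything already staged or committed stays in the repository. A minimal sketch of untracking those files while keeping them on disk:

git rm -r --cached .    # clear the index but keep the working files
git add .               # re-add everything; .gitignore patterns now take effect
git commit -m "Stop tracking ignored CSV/pickle files"

Note that GitHub also rejects any single file over 100 MB outright; for data files that genuinely need versioning, Git LFS is the usual route, though a ~10 GB file exceeds even GitHub's LFS per-file limits.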

Google Colab: mounted Drive but unable to read files

I am doing Stanford's course CS231n on deep learning and am using Google Colab.
The initialization code and all the files are given, so all I need to do is hit "run" on the given code.
I have followed the official instructions step by step and successfully mounted Google Drive, yet I get an error when trying to read the files:
"cp: cannot stat 'cs231n/assignments/assignment1/cs231n/': No such file or directory /content".
And then some more errors.
The files are located in my Drive under the path given as "FOLDERNAME".
The errors I get: [screenshot omitted]
How it should be: [screenshot omitted]
Official instructions:
https://cs231n.github.io/assignments2020/assignment1/
How can I solve it?
Thanks!
I was having exactly the same problem as you. Here is my solution:
1. Delete the folder named cs231n.
2. Follow the tutorial again.
3. Change the path %cd drive/My\ Drive into %cd drive/MyDrive.
As you can see, the default name of the folder is 'MyDrive' (without a space in the middle), not 'My Drive' (My\ Drive).
Then everything should run as you expect.
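A minimal sketch of the corrected sequence, assuming the folder layout recommended by the course:

from google.colab import drive
drive.mount('/content/drive')

# note: no space in 'MyDrive' on current Drive mounts
%cd /content/drive/MyDrive/cs231n/assignments/assignment1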
Cheers.
Based on the course instructions, it sounds like you need to expand the course archive in your Drive.
Quoting here:
Create a folder in your personal Google Drive and upload assignment1/
folder to the Drive folder. We recommend that you call the Google
Drive folder cs231n/assignments/ so that the final uploaded folder has
the path cs231n/assignments/assignment1/.
If you already did that, I'd check the Drive web UI to make sure that the paths line up with the course instructions, and to move the files around if there's a mismatch.
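A quick way to verify the layout from a notebook cell (assuming Drive is mounted at /content/drive):

# this should list the assignment files if the folder structure matches the instructions
!ls '/content/drive/MyDrive/cs231n/assignments/assignment1'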

Is there any way to load a local dataset folder directly from Google Drive to Google Colab?

See the image carefully [screenshot omitted]: I couldn't load a custom data folder from Google Drive to Google Colab, even though I mounted Google Drive. Instead of the MNIST dataset, I want to load my own image dataset folder. I have tried the PyDrive wrapper, but I need a simpler solution.
Suppose I have a dataset of images inside Google Drive. How do I load it into Google Colab?
from google.colab import drive
drive.mount('/content/gdrive')
then
with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
    f.write('Hello Google Drive!')
!cat /content/gdrive/My\ Drive/foo.txt
Here, instead of foo.txt, I have an image folder called Dog inside an ml-data folder, but I can't load it. How do I load it in Google Colab directly from Google Drive, just as I would from my local hard drive?
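For reference, a minimal sketch of reading that folder through the mount, assuming the paths from the question (ml-data/Dog under My Drive):

import os
from PIL import Image  # Pillow, used here just as an example image loader

folder = '/content/gdrive/My Drive/ml-data/Dog'
filenames = os.listdir(folder)
print(len(filenames), 'files found')

# open the first image as a sanity check
img = Image.open(os.path.join(folder, filenames[0]))
print(img.size)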
To load data directly from the local machine, you need to follow these steps:
Go to files [left side menu]
Click on upload to session storage
Select file(s) from your machine to upload
It will prompt something indicating that file(s) will be available for the current session only, click ok.
The file(s) will be uploaded in the directory. Click on it (left and right-click both work the same).
And then;
Copy the path and use it inside the pd.read_csv() function (see the sketch after these steps).
Note: After the session is terminated, files will be lost from colab session. To use it again, you'll need to upload it again.
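For example, a minimal sketch (the filename here is a hypothetical placeholder):

import pandas as pd

# path copied from the Colab file explorer after uploading
df = pd.read_csv('/content/my_data.csv')
df.head()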
Many times, we prefer to have all our data in a GitHub repository or in a Google Drive folder and fetch it from there.
Reading many files from Google Drive through Colab is going to be less performant and more unreliable than first copying a .zip or similar single file from Drive to the Colab VM, unzipping it outside the drive mount directory, and then using that local copy of the data.
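A minimal sketch of that copy-then-unzip approach (the archive name dogs.zip is hypothetical):

# copy one big archive from Drive to the VM's local disk, then extract it there
!cp '/content/gdrive/My Drive/ml-data/dogs.zip' /content/
!unzip -q /content/dogs.zip -d /content/dataset

# training code should now read from the fast local copy
!ls /content/dataset | head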

___jb_bak___ and ___jb_old___ files in PyCharm

When I got a PyCharm project from my colleague, I saw some backup files of the *.py files.
These files have the extensions *.___jb_old___ and *.___jb_bak___.
I opened them in Notepad++ and saw that they are identical backup copies of the corresponding *.py files.
I asked my colleague, but he didn't know what these are.
Why are there TWO identical backup files for each *.py file?
How can I configure PyCharm? We want to turn off these backups.
Google gave me nothing :(
You can disable "safe write":
Use "safe write" (save changes to a temporary file first): If this check box is selected, a changed file will be first saved to a temporary file; if the save operation is completed successfully, the original file is deleted, and the temporary file is renamed.
https://www.jetbrains.com/webstorm/help/system-settings.html
I had this problem in WebStorm when a script file was running while I was editing it in WebStorm. When I stopped the script and edited the file, everything was fine.
These are temporary files used by PyCharm to make sure your changes are not lost while editing files. It's safe to delete them manually; you will only lose very recent changes. IntelliJ IDEA works the same way as PyCharm.
How do you delete them?
Deleting a file on a file system requires two things: 1) you have permission, and 2) no program is using the file.
So make sure you have the 'w' permission and stop every program that is using the file; then you can remove it.
How do you know which program is using it?
Normally you should already know. But sometimes background programs (like CrashPlan or Google Drive sync) hold it quietly, and finding and killing all of them can be very tricky. The easiest way is to reboot your computer in safe mode, in which only a minimal set of drivers and services is loaded.
I spent two hours figuring out why I could not delete the temp file even though I had full permissions; a CrashPlan service was holding it in the background. This may not be your issue, but if you cannot delete the temp file, this may save you some time.
While JeremyWeir's solution probably does work, the real fix - imo - is to enable write permission on the directory.
Saving a file would only need write permission to that file itself. But with the "safe write", you need permission to create the file and rename it - which means you need write access to the directory.
In Linux this would be e.g. chmod ug+w DIR, if you want to give write access to user and group.
I had the exact same issue with PhpStorm after a system crash. The fix I found was to manually delete the *.___jb_old___ and *.___jb_bak___ files and reinstall PhpStorm.