Exporting Matplotlib Path objects? - matplotlib

I am currently working on a project with convex hulls (from scipy.spatial). I am transforming them into matplotlib.path.Path objects to make it easier to apply geometric logic to them.
The question is: is there some way or format to export and import these Path objects, so I don't have to recompute them every time I restart the kernel?
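One way to do this (a sketch, not an official export format) relies on the fact that a matplotlib.path.Path is fully described by its vertices and codes arrays, so those can be saved with NumPy and reloaded in a fresh session; the file name hull_path.npz below is a placeholder:

```python
import numpy as np
from matplotlib.path import Path

# Build an example Path (in practice this would come from your ConvexHull vertices).
verts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
codes = np.array([Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY],
                 dtype=np.uint8)
hull_path = Path(verts, codes)

# Export: save the two arrays that define the Path.
np.savez("hull_path.npz", vertices=hull_path.vertices, codes=hull_path.codes)

# Import: reconstruct the Path later without recomputing the hull.
data = np.load("hull_path.npz")
restored = Path(data["vertices"], data["codes"])
```

Pickling the Path object directly also works, but saving the raw arrays keeps the file readable by plain NumPy even if matplotlib versions change.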

Related

Grib2 data extraction with xarray and cfgrib very slow, how to improve the code?

The code takes about 20 minutes to load one month of each variable, with 168 time steps covering the 00 and 12 UTC cycles of each day. Saving to CSV takes even longer: it has been running for almost a day and still has not written a single station. How can I improve the code below?
Reading .grib files using xr.open_mfdataset() and cfgrib:
I can speak to the slowness of reading GRIB files with xr.open_mfdataset(). I had a similar task where I was reading many GRIB files into xarray and it was taking forever. Other people have experienced similar issues as well (see here).
According to the issue raised here, "cfgrib is not optimized to handle files with a huge number of fields even if they are small."
One thing that worked for me was converting as many of the individual GRIB files as I could into one (or several) NetCDF files, and then reading the newly created NetCDF file(s) into xarray instead. Here is a link showing several different methods for doing this. I went with the grib_to_netcdf command from the ecCodes tool.
In summary, I would start by converting your GRIB files to NetCDF, since xarray should then be able to read the data in a much more performant manner. After that you can focus on other optimizations further down in your code.
I hope this helps!

How can you implement transfer learning with Yolov4 on colab?

I am trying to follow this tutorial for using Yolov4 with transfer learning for a new object to detect: https://sandipanweb.wordpress.com/2022/01/17/custom-object-detection-with-transfer-learning-with-pre-trained-yolo-v4-model/. In trying to set up the initial file structure on Colab, I'm having to guess where files and folders are needed, and could use some help in identifying if this is the right file/folder structure. Or, if there is a more detailed tutorial or a way to understand what Yolov4 is going to need, that would be great.
There are several references to build/darknet/x64/data/. From what I can see when I explore the two downloads, this is really /content/build/darknet/x64/data/ in Colab. Is this a correct understanding?
Under the data folder, is Yolov4 looking for the train folder and the valid folder? That is, is it looking for /content/build/darknet/x64/data/train/ and /content/build/darknet/x64/data/valid/? I'm guessing the answer is yes.
Does Yolov4 need the _darknet.labels file along with all of the image and image-annotation files? I am guessing yes, because this is what's in the raccoon dataset.
The build/darknet/x64/data/train.txt and build/darknet/x64/data/valid.txt files are supposed to hold the names of the images, so I'm guessing those names include the .jpg extension, since the tutorial specifically refers to images. The reason I question this is that Yolov4 should also need the annotation file names, which the tutorial never mentions. If Yolov4 strips the .jpg and appends .txt to get the annotation file name, that's fine; but if it needs the file name without any extension so it can append either extension to access both files, then I didn't pick that up from the tutorial.
Any guidance would really be appreciated!
After working on this, here are some notes that have helped me get it to work:
After executing !git clone https://github.com/AlexeyAB/darknet/ change directories to /content/darknet before executing !wget -P build/darknet/x64/ https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137.
Although it may be obvious: to "change the Makefile to enable GPU and opencv", click the "Files" icon in Colab's left margin. In the darknet folder, open the Makefile (double-click to open) and change the entries for GPU, CUDNN, and OPENCV from 0 to 1. Press Ctrl-S to save the file, then run make to compile the darknet executable.
To get around the path names referenced in the tutorial, I used full path names. Also, since the files referenced disappear when the Colab runtime is terminated, I copied the files from folders in Drive to local folders, some of which I needed to create. Here is the code I used:
!mkdir /content/darknet/build/darknet/x64/data/train
!mkdir /content/darknet/build/darknet/x64/data/valid
!cp /content/drive/MyDrive/your_folder_name/obj.data /content/darknet/build/darknet/x64/data/obj.data
!cp /content/drive/MyDrive/your_folder_name/obj.names /content/darknet/build/darknet/x64/data/obj.names
!cp /content/drive/MyDrive/your_folder_name_with_train_images/* /content/darknet/build/darknet/x64/data/train/
!cp /content/drive/MyDrive/your_folder_name_with_valid_images/* /content/darknet/build/darknet/x64/data/valid/
!cp /content/drive/MyDrive/your_folder_name/train.txt /content/darknet/build/darknet/x64/data/train.txt
!cp /content/drive/MyDrive/your_folder_name/valid.txt /content/darknet/build/darknet/x64/data/valid.txt
!cp /content/drive/MyDrive/your_folder_name/yolov4_train.cfg /content/darknet/build/darknet/x64/cfg/yolov4_train.cfg
In the folders with the images, I also included a _darknet.labels file (which just contains the label of the single object you are detecting) and the annotation files (each line has a zero as the first entry, then the bounding-box coordinates as center-x, center-y, width, height, separated by spaces; see the tutorial for an example). This was the answer to my question 3.
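For a single-class dataset, those two kinds of files might look like this (illustrative values; the label, file name, and coordinates are placeholders):

```
_darknet.labels:
    raccoon

img_001.txt (one line per bounding box: class_id center_x center_y width height, all normalized to 0-1):
    0 0.48 0.53 0.30 0.41
```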
The train.txt and valid.txt files in my question 4 do work with just the .jpg file names.
The tutorial mentions that you need to change the number of filters and classes, but note this has to be changed in three locations.
Last, if you just want the prediction printed in Colab for an image, you can do this by modifying /content/darknet/darknet.py before running make. In the detect_image function, insert a print(sorted(predictions, key=lambda x: x[1])) before the return statement. When make runs, this change will be picked up.
If you have not copied files in Colab before, some permissions need to be set up in your first cells. Here are the two cells I run at the beginning (Colab will ask you to manually confirm that you are OK with the access being granted when you run them):
`from google.colab import drive
drive.mount('/content/drive')`
`# Authenticate and create the PyDrive client.
from google.colab import auth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)`

importing HDF5 files to bigquery

The title pretty much summarizes the situation. BigQuery accepts a number of data formats for uploads, but HDF5 does not appear to be one of them. What is the "canonical" way to upload an HDF5 file? (Obviously I could convert the file to a bunch of CSVs, but that seems clunky.)
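There doesn't seem to be an official canonical route, but one reasonable sketch is to read the HDF5 file with pandas and rewrite it as Parquet, which BigQuery does ingest natively. The file, key, and table names below are placeholders, and the commented part assumes the google-cloud-bigquery package is installed:

```python
import pandas as pd

def hdf5_to_parquet(hdf_path, key, parquet_path):
    """Read one dataset out of an HDF5 file and rewrite it as Parquet."""
    df = pd.read_hdf(hdf_path, key=key)
    df.to_parquet(parquet_path)
    return parquet_path

# Then load the Parquet file into BigQuery, e.g.:
# from google.cloud import bigquery
# client = bigquery.Client()
# job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
# with open("table.parquet", "rb") as f:
#     client.load_table_from_file(f, "my_dataset.my_table",
#                                 job_config=job_config).result()
```

Parquet also preserves column types, which the CSV round-trip would lose.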

Unable to open or view data with python

I have a NetCDF3 dataset that I need to get into a data frame (preferably pandas) for further manipulation, but I am unable to do so. Since the data is multi-hierarchical, I first read it into xarray and then convert it to pandas with the to_dataframe method. This has worked well on other data, but in this case it simply kills the kernel. I have tried converting it to a NetCDF4 file using ncks but still cannot open it. I have also tried accessing a single row or column of the xarray data structure just to view it, and that similarly kills the kernel. I produced this data, so it is probably not corrupted. Does anyone have any experience with this? The file size is about 890 MB, if that's relevant.
I am running python 2.7 on a Mac.
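The workflow described sketches out roughly like this; the chunks argument (which requires dask to be installed) keeps xarray lazy instead of materializing the whole file, and the path, dimension name, and function name are placeholders:

```python
import xarray as xr

def first_slices_to_frame(path, n_steps=10):
    """Open a NetCDF file lazily and convert only the first n_steps time
    steps to a pandas DataFrame, rather than the whole dataset at once."""
    ds = xr.open_dataset(path, chunks={"time": n_steps})  # dask-backed, lazy
    return ds.isel(time=slice(0, n_steps)).to_dataframe()
```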

Applying XGBOOST with large data set

I have a large dataset, approximately 5.3 GB in size, that I have stored using the bigmemory package in R. Please let me know how to apply XGBoost to this kind of data.
There is currently no support for this in xgboost. You could file an issue on the GitHub repo with respect to the R package.
Otherwise, you could try having it read from a file containing your data. The docs say you can point to a local data file. I am not sure about format restrictions or how it will be handled in RAM, but it is something to explore.