can't pickle weakref objects - serialization

I'm trying to use Spark to extract spatial data from a CSV and represent it in a folium map.
While using the foreach() function I get a PicklingError: Could not serialize object: TypeError: can't pickle weakref objects.
I understand, as described here, that not all types can be pickled; however, I need this folium Map type to be pickled in order to visualize my data.
The maintainers of folium answered me here that folium objects can't be pickled.
What should I do?
How can I make the folium objects serializable?
Is there an alternative to pickle that can serialize folium objects?
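One workaround I'm considering (a rough sketch; the "lat"/"lon" column names are placeholders for my actual schema) is to do the heavy lifting in Spark, collect only plain rows to the driver, and build the folium Map locally so it never has to be pickled:

import folium as fm
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("points.csv", header=True, inferSchema=True)
# Collect only small, picklable rows; the folium Map never leaves the driver.
rows = df.select("lat", "lon").collect()

m = fm.Map(location=[0, 0], zoom_start=2)
for r in rows:
    fm.CircleMarker([r["lat"], r["lon"]], radius=3).add_to(m)
m.save("map.html")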


Saving a huge file to Dropbox with Dask

I am trying to save a huge pandas dataframe to Dropbox, without success.
For the moment my code looks as follows:
import dropbox
import dask.dataframe as dd

dbx = dropbox.Dropbox("<access_token>")
dask_merge_bodytextknown5 = dd.from_pandas(merge_bodytextknown5, npartitions=10)
# I THINK THIS IS THE PROBLEMATIC LINE:
dbx.files_upload(dask_merge_bodytextknown5.to_csv(index=False, single_file=True).encode(), "/df_compl_emakg.csv")
Could you please help me with this?
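One pattern I have seen suggested (a sketch, untested here) is to let dask write the CSV to local disk first and then stream the file to Dropbox with an upload session, since a single files_upload call is limited to roughly 150 MB:

import os
import dropbox

# Write one CSV locally with dask, then upload it in chunks.
dask_merge_bodytextknown5.to_csv("df_compl_emakg.csv", index=False, single_file=True)

CHUNK = 8 * 1024 * 1024
dbx = dropbox.Dropbox("<access_token>")
size = os.path.getsize("df_compl_emakg.csv")
with open("df_compl_emakg.csv", "rb") as f:
    if size <= CHUNK:
        dbx.files_upload(f.read(), "/df_compl_emakg.csv")
    else:
        session = dbx.files_upload_session_start(f.read(CHUNK))
        cursor = dropbox.files.UploadSessionCursor(session_id=session.session_id, offset=f.tell())
        while size - f.tell() > CHUNK:
            dbx.files_upload_session_append_v2(f.read(CHUNK), cursor)
            cursor.offset = f.tell()
        dbx.files_upload_session_finish(
            f.read(CHUNK), cursor, dropbox.files.CommitInfo(path="/df_compl_emakg.csv"))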
Furthermore, I would like to reduce the size of the pandas df and was thinking about downcasting the strings. In detail, as you can see, I have a lot of string columns that pandas stores with the "object" dtype:
...
oecd_field object
oecd_subfield object
wosfield object
author float32
entity_id float32
affiliation2 float32
class object
foaf_name object
foundation_date float32
type_entities object
acronym object
pos#lat float32
pos#long float32
city_name object
city_lat float32
city_lon float32
state_name object
postcode object
country_name object
country_alpha2 object
country_alpha3 object
country_official_name object
...
I was wondering if something like:
df["col"] = df["col"].astype("|S")
for each of the object columns might reduce the memory usage of the dataframe.
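For reference, a tiny sketch on toy data of how the effect could be measured (the category dtype is an alternative worth comparing for low-cardinality columns):

import pandas as pd

s = pd.Series(["Germany", "France", "Italy"] * 100_000)  # toy stand-in for country_name
print(s.memory_usage(deep=True))                     # object dtype: one Python str per row
print(s.to_numpy().astype("S").nbytes)               # fixed-width bytes, what "|S" gives at the numpy level
print(s.astype("category").memory_usage(deep=True))  # integer codes + a small set of unique values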
Thank you

Use folium Map as holoviews DynamicMap

I have a folium.Map that contains custom HTML popups with clickable URLs. These popups open when clicking on the polygons of the map. This is a feature that doesn't seem to be achievable with holoviews.
My ideal example of the final application that I want to build with holoviews/geoviews is here, with the source code here, but I would like to exchange the main map for my folium Map and plot polygons instead of rasterized points. When I try to create the holoviews.DynamicMap from the folium.Map, holoviews complains (of course) that the data type "map" is not accepted. Is this somehow still possible?
I have found a notebook on GitHub where a holoviews plot is embedded in a folium map using a workaround that writes the HTML out and reads it back in, but it seems impossible to embed a folium map into holoviews such that other plots can be updated from this figure using Streams.
Here is some toy data (from here) for the datasets that I use. For simplicity, let's assume I just have point data instead of polygons:
import folium as fm
import pandas as pd

# Toy stand-in for the real point data (Latitude/Longitude columns assumed).
df = pd.DataFrame({"Latitude": [20.59, 28.61], "Longitude": [78.96, 77.21]})

def make_map():
    m = fm.Map(location=[20.59, 78.96], zoom_start=5)
    green_p1 = fm.map.FeatureGroup()
    for row in df.itertuples():
        green_p1.add_child(
            fm.CircleMarker(
                [row.Latitude, row.Longitude],
                radius=10,
                fill=True,
                fill_color="green",
                fill_opacity=0.7,
            )
        )
    m.add_child(green_p1)
    return m
If I understand it correctly, this now needs to be tweaked so that it can be passed as the first argument to a holoviews.DynamicMap:
hv.DynamicMap(make_map, streams=my_streams)
where my_streams are the streams through which other plots should be updated with the extent of the folium map.
Is that somehow possible or is my strategy wrong?
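For completeness, the HTML round-trip workaround from that notebook looks roughly like this (a sketch assuming panel is available; it only embeds the map, it does not link any Streams to the folium viewport):

from html import escape
import folium as fm
import panel as pn

pn.extension()

m = fm.Map(location=[20.59, 78.96], zoom_start=5)
html = m.get_root().render()  # self-contained HTML document for the map

# Embed via an iframe so folium's JavaScript still runs inside the panel app.
iframe = '<iframe srcdoc="{}" style="width:100%;height:400px;border:none"></iframe>'.format(escape(html))
pn.pane.HTML(iframe, height=420, sizing_mode="stretch_width").servable()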

How to load and convert .mat file into numpy 2D array?

I have data in a .mat file (observations and features) and I want to load it into a numpy 2D array. I don't want to convert it to CSV first and then load the CSV into numpy.
Use scipy's loadmat (API docs).
The docs should be sufficient to get you going, but make sure to read the notes.
There is also the io tutorial with some examples.
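A minimal sketch (the file name and the variable name "X" are placeholders for whatever your .mat file actually contains):

import numpy as np
from scipy.io import loadmat

mat = loadmat("data.mat")   # dict: MATLAB variable name -> numpy array
X = np.asarray(mat["X"])    # 2D array, observations x features
print(X.shape, X.dtype)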

Feature request: NumPy Reader

I have a large collection of NumPy arrays saved on disk. I would like to read them efficiently and concurrently with the training. I can't load them all into memory at once - the data set is too large.
Additionally, it would be nice to apply some user-defined transforms on the fly. It would also be nice to be able to read them from C++, not just Python.
I believe CNTK does not have this capability now, am I correct?
Currently, we don't have a built-in numpy reader. However, you have multiple options:
Read the numpy data in batches and feed them to the trainer; here is an example that reads images into a numpy array and feeds them to the trainer:
https://github.com/Microsoft/FERPlus
What is the data inside your numpy array? Can you convert it to a format readable by one of the CNTK readers?
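Here is a rough sketch of the first option; the model, the shapes, and the .npy file layout are only assumptions for illustration:

import numpy as np
import cntk as C

x = C.input_variable(784)
y = C.input_variable(10)
z = C.layers.Dense(10)(x)
loss = C.cross_entropy_with_softmax(z, y)
err = C.classification_error(z, y)
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, err), [C.sgd(z.parameters, lr)])

for path in ["batch_000.npy", "batch_001.npy"]:  # assumed file names on disk
    batch = np.load(path).astype(np.float32)     # assumed shape: (N, 794) = 784 features + 10 one-hot labels
    trainer.train_minibatch({x: batch[:, :784], y: batch[:, 784:]})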

How to create a pandas object from array with unknown rank?

Given any NumPy array, I want to create the best-matching pandas data structure out of pd.Series, pd.DataFrame, etc. Is there a built-in function for this? I guess so, but I couldn't find anything in the documentation.
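In case it helps, the hand-rolled fallback I have for now looks like this (a sketch; note that pandas has had no 3D structure since Panel was removed, so anything above 2D raises):

import numpy as np
import pandas as pd

def to_pandas(arr):
    # Wrap a numpy array in the closest-matching pandas structure.
    arr = np.asarray(arr)
    if arr.ndim == 0:
        return pd.Series([arr.item()])
    if arr.ndim == 1:
        return pd.Series(arr)
    if arr.ndim == 2:
        return pd.DataFrame(arr)
    raise ValueError("no pandas structure for a {}-D array".format(arr.ndim))

print(type(to_pandas(np.zeros(3))))       # Series
print(type(to_pandas(np.zeros((3, 2)))))  # DataFrame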