folium choropleth returns blank

I have a GeoDataFrame that plots nicely with GeoPandas but renders blank as a Choropleth map in Folium.
Folium 0.7.0
Geopandas 0.5.0
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 12]
radiosambab.plot('situacionpromedio', antialiased=False)
As GeoJSON,
radiosambab.__geo_interface__
returns
{'type': 'FeatureCollection',
'features': [{'id': '020130302',
'type': 'Feature',
'properties': {'situacionpromedio': 1.1173449839705998},
'geometry': {'type': 'Polygon',
'coordinates': (((-58.46738862003677, -34.53484761336359),
(-58.466080612615286, -34.53427219003239),
(-58.46379657486779, -34.53326322986549),
(-58.46165233386257, -34.530802575280035),
(-58.46133757821172, -34.530441540420355),
(-58.4588949370924, -34.527620828300144),
(-58.45884013885469, -34.52762641175383),
(-58.45875915687486, -34.527621382400326),
(-58.458732162886044, -34.52761970593736),
(-58.45867655438868, -34.52763422563783),
(-58.45856182767256, -34.52767203362345),
(-58.45850001004012, -34.52769515425145),
(-58.458440891778, -34.52771844249678),
(-58.45839257108904, -34.52774240132773),
(-58.45834357673059, -34.5277438516926),
...
Calling
radiosambab['situacionpromedio']
returns a Geoseries as expected:
COD_2010
020130302 1.117345
020131101 1.117371
020130104 1.161630
020130102 1.087263
020130101 1.268362
020120405 1.132843
020130107 1.085900
020130106 1.028195
020130109 1.056225
020130111 1.061627
020120407 1.138702
020120404 1.084368
020120402 1.078862
...
But, when invoking folium.Choropleth, it does not work:
m_2 = folium.Map(location=[-34.603722, -58.381592], tiles='openstreetmap', zoom_start=14)
folium.Choropleth(geo_data=radiosambab.__geo_interface__, data=radiosambab['situacionpromedio'], key_on='feature.id', fill_color='YlOrBr').add_to(m_2)
folium.LayerControl().add_to(m_2)
m_2
Returns a blank map (screenshot omitted).
Thanks!

The problem seems to be related to a lack of memory: the choropleth actually renders when I restrict the number of polygons, but fails above roughly 2,000 polygons.
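For illustration, a minimal sketch of that workaround, assuming the question's m_2 map and radiosambab GeoDataFrame (the 2,000-polygon cutoff is approximate):
# Workaround sketch: restrict the GeoDataFrame to roughly 2,000 polygons,
# since the full set exhausts memory during rendering.
subset = radiosambab.head(2000)
folium.Choropleth(geo_data=subset.__geo_interface__, data=subset['situacionpromedio'],
                  key_on='feature.id', fill_color='YlOrBr').add_to(m_2)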

I had a similar problem and solved it with:
var_geodataframe = var_geodataframe.to_crs(epsg=4326)
You may also want to use the current version of Folium and review your GeoPandas version.
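A minimal end-to-end sketch of this fix, assuming the question's radiosambab GeoDataFrame has a CRS defined (Folium expects WGS84 longitude/latitude, i.e. EPSG:4326):
import folium

# Reproject to EPSG:4326 before handing the data to Folium.
radiosambab = radiosambab.to_crs(epsg=4326)

m = folium.Map(location=[-34.603722, -58.381592], tiles='openstreetmap', zoom_start=14)
folium.Choropleth(
    geo_data=radiosambab.__geo_interface__,
    data=radiosambab['situacionpromedio'],
    key_on='feature.id',
    fill_color='YlOrBr',
).add_to(m)
folium.LayerControl().add_to(m)
m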

Break text files into multiple datasets at each \newline with Pandas

I have a text file, textfile.qdp:
Line to skip 1
Line to skip 2
Line to skip 3
1.25 0.649999976 2.24733017E-2 2.07460159E-3 3.01663446 1.89463757E-2 1.48296626E-2 2.98285842
2.0999999 0.199999988 7.33829737E-2 6.63989689E-3 3.48941302 3.8440533E-2 6.34965161E-3 3.44462299
2.5 0.200000048 0.118000358 8.37391801E-3 2.64556909 3.93543094E-2 6.16234308E-3 2.60005236
2.9000001 0.199999928 0.145619139 9.26280301E-3 2.56852388 4.85827066E-2 6.0398886E-3 2.51390147
3.29999995 0.200000048 0.167878062 9.94068757E-3 2.46484375 5.69529012E-2 6.81256084E-3 2.40107822
3.70000005 0.200000048 0.175842062 1.01562217E-2 2.28405786 6.24930188E-2 8.10874719E-3 2.21345592
4.10000038 0.200000048 0.181325018 1.03028165E-2 2.02467489 6.38177395E-2 1.2183371E-2 1.94867384
4.5 0.199999809 0.157546207 9.59824398E-3 1.76375055 6.11177757E-2 6.072836E-2 1.64190447
4.94999981 0.25 0.156071633 8.54758453E-3 1.51925421 5.52904457E-2 0.149736568 1.3142271
5.5 0.300000191 0.125403479 6.9860979E-3 1.52551162 4.61589135E-2 0.511757791 0.967594922
6.10000038 0.299999952 9.54503566E-2 6.10219687E-3 3.56054449 3.59460302E-2 2.85172343 0.672874987
6.86499977 0.464999914 5.7642214E-2 3.80936684E-3 4.10104704 2.42055673E-2 3.67026114 0.406580269
8.28999996 0.960000038 2.10143197E-2 1.60136714E-3 0.142320022 8.9181494E-3 6.96837786E-4 0.132705033
9.48999977 0.239999771 5.72929019E-3 1.6677354E-3 3.82030606E-2 2.56266794E-3 4.94769251E-4 3.51456255E-2
4.13999987 1.99999809E-2 2.47749758 4.67826687E-2 30.4350224 0.973279834 0.754008532 28.7077332
4.17999983 1.99999809E-2 2.44065595 4.64052781E-2 30.5456734 0.99132967 0.677066088 28.8772774
4.21999979 1.99999809E-2 2.4736743 4.67251018E-2 30.8877811 1.01084304 0.807663918 29.0692749
4.26000023 2.00002193E-2 2.48481822 4.68727946E-2 30.9508438 1.02374947 0.834929705 29.092165
4.30000019 1.99999809E-2 2.54010344 4.73690033E-2 31.119503 1.03903878 0.93061626 29.1498489
4.34000015 1.99999809E-2 2.49571872 4.69326451E-2 31.1599998 1.05370748 0.892735004 29.2135563
4.38000011 1.99999809E-2 2.58409572 4.77907397E-2 31.367794 1.06788957 1.05168498 29.2482204
4.42000008 1.99999809E-2 2.6437602 4.83172201E-2 31.5764256 1.08456028 1.1402396 29.3516254
4.46000004 1.99999809E-2 2.65394902 4.84031737E-2 31.5579567 1.09554553 1.1519351 29.3104763
4.5 1.99999809E-2 2.62269425 4.81106751E-2 31.644083 1.11161876 1.12954116 29.4029236
Each column is a different parameter value, and the text file includes several datasets that, in the end, I want to plot with, e.g., different colors. A blank line in the text file marks a new dataset.
import pandas as pd
import matplotlib as plt
names = ['e', 'de', 'y', 'y_err', 'total', 'model1', 'model2', 'model3']
for j in range(8):
    names.append('model%i' % j)
df = pd.read_table('textfile.qdp', skiprows=3, names=names, delimiter=' ', skip_blank_lines=True)
fig, ax = plt.subplots(figsize=(10, 6))
ax.errorbar(df.e, df.y, xerr=df.de, yerr=df.y_err, fmt='o', label='data') # Here I want to plot different dfs
How do I do that with Pandas? I think it is related to this question, where:
dfs = {
    k: pd.read_csv(pd.io.common.StringIO('\n'.join(dat)), delim_whitespace=True)
    for k, *dat in map(str.splitlines, open('my.csv').read().split('\n\n'))
}
but I am not sure how this translates to read_table (also, *dat raises a SyntaxError for me).
You could do it this way:
First the basics (note that you should import matplotlib.pyplot, not matplotlib, to access the .subplots function at the end):
import pandas as pd
import matplotlib.pyplot as plt
import io
names = ['e', 'de', 'y', 'y_err', 'total']
for j in range(8):
    names.append('model%i' % j)
Store your read_table arguments:
kwargs = {
    "names": names,
    "delimiter": ' ',
    "skip_blank_lines": True
}
Read and store the content of your file, skipping the rows you don't need:
skiprows = 3
with open('textfile.qdp') as f:
    collect_lines_as_list = f.readlines()
selected_lines = collect_lines_as_list[skiprows:]
content = "".join(selected_lines)  # Join all lines to get one string only
Then split the content on blank lines (that is, two successive line returns) and store each piece in a temporary StringIO to recreate an object that Pandas can handle. Using a dict comprehension (much the same as in the other answer you linked), you can collect all data in one shot:
dfs = {
    k: pd.read_table(io.StringIO(data), **kwargs)
    for k, data
    in enumerate(content.split('\n\n'))
}
Then plot your dataframes the way you want (by iterating through your dict values):
fig, axs = plt.subplots(len(dfs), figsize=(10, 6))
for ax, df in zip(axs, dfs.values()):
    ax.errorbar(df.e, df.y, xerr=df.de, yerr=df.y_err, fmt='o', label='data')
plt.show()
You will get something like this:
If you want to plot all data in the same subplot, proceed like this:
fig, ax = plt.subplots(figsize=(10, 6))
for df in dfs.values():
    ax.errorbar(df.e, df.y, xerr=df.de, yerr=df.y_err, fmt='o', label='data')
plt.show()

missing data in pandas profiling report

I am using Python 2.7 and Pandas Profiling to generate a report out of a dataframe. Following is my code:
import pandas as pd
import pandas_profiling
# the actual dataset is very large, just providing the two elements of the list
data = [{'polarity': 0.0, 'name': u'danesh bhopi', 'sentiment': 'Neutral', 'tweet_id': 1049952424818020353, 'original_tweet_id': 1049952424818020353, 'created_at': pd.Timestamp('2018-10-10 14:18:59'), 'tweet_text': u"Wouldn't mind aus 120 all-out but before that would like to see a Finch \U0001f4af #PakVAus #AUSvPAK", 'source': u'Twitter for Android', 'location': u'pune', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'DaneshBhope'}, {'polarity': 1.0, 'name': u'kamal Kishor parihar', 'sentiment': 'Positive', 'tweet_id': 1049952403980775425, 'original_tweet_id': 1049952403980775425, 'created_at': pd.Timestamp('2018-10-10 14:18:54'), 'tweet_text': u'#the_summer_game What you and Australia think\nPlay for\n win \nDraw\n or....! #PakvAus', 'source': u'Twitter for Android', 'location': u'chembur Mumbai ', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'kaluparihar1'}]
df = pd.DataFrame(data) #data is a python list containing python dictionaries
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("df_report.html")
A screenshot of part of the df_report.html file is below:
As you can see in the image, the Unique(%) field in all the variables is 0.0 although the columns have unique values.
Apart from this, the chart in the 'location' variable is broken. There is no bar for the values 22, 15, 4 and the only bar is for the maximum value only. This is happening in all the variables.
Any help would be appreciated.

journal quality kde plots with seaborn/pandas

I'm trying to do some comparative analysis for a publication. I came across seaborn and pandas and really like the ease with which I can create the analysis that I want. However, I find the manuals a bit scanty on the things I'm trying to understand about the example plots and how to modify the plots to my needs. I'm hoping for some advice here on how to get the plots I want. Perhaps pandas/seaborn is not what I need.
So, I would like to create subplots, (3,1) or (2,3), of the following figure:
Questions:
I would like the attached plot to have a title on the colorbar. I'm not sure if this is possible or exactly what is shown, i.e., is it relative frequency or occurrence or a percentage, etc.? How can I put an explanatory title on the colorbar (oriented vertically)?
The text is a nice addition. The pearsonr is the correlation, but I'm not sure what p is. My guess is that it is showing the lag, or is it? If so, how can I remove the p from the text?
I would like to make the same kind of figure for different variables and put it all in a subplot.
Here's the code I pieced together from the seaborn manual/examples and from other users here on SO (thanks guys).
import netCDF4 as nc
import pandas as pd
import xarray as xr
import numpy as np
import seaborn as sns
import pdb
import matplotlib.pyplot as plt
from scipy import stats, integrate
import matplotlib as mpl
import matplotlib.ticker as tkr
import matplotlib.gridspec as gridspec
sns.set(style="white")
sns.set(color_codes=True)
octp = [622.0, 640.0, 616.0, 731.0, 668.0, 631.0, 641.0, 589.0, 801.0,
828.0, 598.0, 742.0,665.0, 611.0, 773.0, 608.0, 734.0, 725.0, 716.0,
699.0, 686.0, 671.0, 700.0, 656.0,686.0, 675.0, 678.0, 653.0, 659.0,
682.0, 674.0, 684.0, 679.0, 704.0, 624.0, 727.0,739.0, 662.0, 801.0,
633.0, 896.0, 729.0, 659.0, 741.0, 510.0, 836.0, 720.0, 685.0,430.0,
833.0, 710.0, 799.0, 534.0, 532.0, 605.0, 519.0, 850.0, 357.0, 858.0,
497.0,404.0, 456.0, 448.0, 836.0, 462.0, 381.0, 499.0, 673.0, 642.0,
641.0, 458.0, 809.0,562.0, 742.0, 732.0, 710.0, 658.0, 533.0, 811.0,
853.0, 856.0, 785.0, 659.0, 697.0,654.0, 673.0, 707.0, 711.0, 423.0,
751.0, 761.0, 638.0, 576.0, 538.0, 596.0, 718.0,843.0, 640.0, 647.0,
692.0, 599.0, 607.0, 537.0, 679.0, 712.0, 612.0, 641.0, 665.0,658.0,
722.0, 656.0, 656.0, 742.0, 505.0, 688.0, 805.0]
cctp = [482.0, 462.0, 425.0, 506.0, 500.0, 464.0, 486.0, 473.0, 577.0,
735.0, 390.0, 590.0,464.0, 417.0, 722.0, 410.0, 679.0, 680.0, 711.0,
658.0, 687.0, 621.0, 643.0, 690.0,630.0, 661.0, 608.0, 658.0, 624.0,
646.0, 651.0, 634.0, 612.0, 636.0, 607.0, 539.0,706.0, 614.0, 706.0,
401.0, 720.0, 746.0, 511.0, 700.0, 453.0, 677.0, 637.0, 605.0,454.0,
733.0, 535.0, 725.0, 668.0, 513.0, 470.0, 589.0, 765.0, 596.0, 749.0,
462.0,469.0, 514.0, 511.0, 789.0, 647.0, 324.0, 555.0, 670.0, 656.0,
786.0, 374.0, 757.0,645.0, 744.0, 708.0, 497.0, 654.0, 288.0, 705.0,
703.0, 446.0, 675.0, 440.0, 652.0,589.0, 542.0, 661.0, 631.0, 343.0,
585.0, 632.0, 591.0, 602.0, 365.0, 535.0, 663.0,561.0, 448.0, 582.0,
591.0, 535.0, 475.0, 422.0, 599.0, 594.0, 569.0, 576.0, 622.0,483.0,
539.0, 515.0, 621.0, 443.0, 435.0, 502.0, 443.0]
cctp = pd.Series(cctp, name='CTP [hPa]')
octp = pd.Series(octp, name='CTP [hPa]')
formatter = tkr.ScalarFormatter(useMathText=True)
formatter.set_scientific(True)
formatter.set_powerlimits((-2, 2))
g = sns.jointplot(cctp, octp, kind="kde", size=8, space=0.2, cbar=True,
                  n_levels=50, cbar_kws={"format": formatter})
# add a line x=y
x0, x1 = g.ax_joint.get_xlim()
y0, y1 = g.ax_joint.get_ylim()
lims = [max(x0, y0), min(x1, y1)]
g.ax_joint.plot(lims, lims, ':k')
plt.show()
plt.savefig('test_fig.png')
I know I'm asking a lot here. So I put the questions in order of priority.
1: To set the colorbar label, you can add the label key to the cbar_kws dict:
cbar_kws={"format": formatter, "label": 'My colorbar'}
2: To change the stats label, you need to first slightly modify the stats.pearsonr function to only return the first value, instead of the (pearsonr, p) tuple:
pr = lambda a, b: stats.pearsonr(a, b)[0]
Then, you can change that function using jointplot's stat_func kwarg:
stat_func=pr
and finally, you need to change the annotation to get the label right:
annot_kws={'stat':'pearsonr'})
Putting that all together:
pr = lambda a, b: stats.pearsonr(a, b)[0]
g = sns.jointplot(cctp, octp, kind="kde", size=8, space=0.2, cbar=True,
                  n_levels=50, cbar_kws={"format": formatter, "label": 'My colorbar'},
                  stat_func=pr, annot_kws={'stat': 'pearsonr'})
3: I don't think it's possible to put everything in a subplot with jointplot. Happy to be proven wrong there, though.
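That said, if you can live without the marginal distributions, a rough sketch of a workaround is to draw each joint KDE with sns.kdeplot on an ordinary subplot grid. Here the question's single (cctp, octp) pair is repeated as a stand-in for your other variable pairs:
# Workaround sketch: bivariate KDEs in a regular subplot grid, without
# jointplot's marginal axes. Replace the repeated pair with your own pairs.
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for ax, (x, y) in zip(axs, [(cctp, octp)] * 3):
    sns.kdeplot(x, y, n_levels=50, shade=True, cbar=True, ax=ax,
                cbar_kws={"format": formatter, "label": 'My colorbar'})
    lims = [max(ax.get_xlim()[0], ax.get_ylim()[0]),
            min(ax.get_xlim()[1], ax.get_ylim()[1])]
    ax.plot(lims, lims, ':k')  # x = y reference line
plt.show()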

ImageNet index to Wordnet 3.0 synsets

Working with ImageNet ResNet-50 in Caffe, the prediction gives a 1000-dimensional vector. Is there an easy way to translate the indices of this vector to WordNet 3.0 synset identifiers? For instance, that index 415, 'bakery, bakeshop, bakehouse', is "n02776631"?
I note that a similar question, Get ImageNet label for a specific index in the 1000-dimensional output tensor in torch, has been asked about the human-readable label associated with an index, and an answer pointed to an index-to-label mapping available at this URL: https://gist.github.com/maraoz/388eddec39d60c6d52d4
From the human-readable label I suppose it is possible to find the WordNet synset identifier via the label-to-synset mapping on this page: http://image-net.org/challenges/LSVRC/2015/browse-synsets but I am wondering whether this has already been done?
The mapping seems to be straightforward with the data from https://gist.github.com/maraoz/388eddec39d60c6d52d4 and http://image-net.org/challenges/LSVRC/2015/browse-synsets:
{0: {'id': '01440764-n',
'label': 'tench, Tinca tinca',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01440764-n'},
1: {'id': '01443537-n',
'label': 'goldfish, Carassius auratus',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01443537-n'},
2: {'id': '01484850-n',
'label': 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01484850-n'},
...
See https://gist.github.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372 for full file.
I have not checked thoroughly whether this mapping is actually correct.
This mapping was constructed with:
import ast
from lxml import html
import requests
from pprint import pprint
url_index = ('https://gist.githubusercontent.com/maraoz/'
             '388eddec39d60c6d52d4/raw/'
             '791d5b370e4e31a4e9058d49005be4888ca98472/gistfile1.txt')
url_synsets = "http://image-net.org/challenges/LSVRC/2014/browse-synsets"
index_to_label = ast.literal_eval(requests.get(url_index).content)
elements = html.fromstring(requests.get(url_synsets).content).xpath('//a')
label_to_synset = {}
for element in elements:
    href = element.attrib['href']
    if href.startswith('http://imagenet.stanford.edu/synset?wnid='):
        label_to_synset[element.text] = href[42:]
index_to_synset = {
    k: {
        'id': label_to_synset[v] + '-n',
        'label': v,
        'uri': "http://wordnet-rdf.princeton.edu/wn30/{}-n".format(
            label_to_synset[v])
    }
    for k, v in index_to_label.items()}
pprint(index_to_synset)
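As a quick sanity check against the example from the question, assuming the mapping was built as above, index 415 should resolve to the bakery synset (written '02776631-n' here, i.e. ImageNet's n02776631):
# Look up the question's example: index 415, the bakery synset.
pprint(index_to_synset[415])
# {'id': '02776631-n',
#  'label': 'bakery, bakeshop, bakehouse',
#  'uri': 'http://wordnet-rdf.princeton.edu/wn30/02776631-n'}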

tensorflow record with float numpy array

I want to create TensorFlow records to feed my model. So far I use the following code to store a uint8 NumPy array in TFRecord format:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)
    writer = tf.python_io.TFRecordWriter(filename)
    image_raw = image.tostring()
    map_raw = map.tostring()
    label_raw = label.tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))
    writer.write(example.SerializeToString())
    writer.close()
which I read back with this example code:
features = tf.parse_single_example(example, features={
    'image_raw': tf.FixedLenFeature([], tf.string),
    'map_raw': tf.FixedLenFeature([], tf.string),
    'label_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT * params.IMAGE_WIDTH * 3)
image = tf.reshape(image, (params.IMAGE_HEIGHT, params.IMAGE_WIDTH, 3))
map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT * params.MAP_WIDTH * params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT, params.MAP_WIDTH, params.MAP_DEPTH))
label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)
and that's working fine. Now I want to do the same with my array "map" being a float NumPy array instead of uint8, and I could not find examples of how to do it. I tried the function _floats_feature, which works if I pass a scalar to it, but not with arrays. With uint8 the serialization can be done by the method tostring(). How can I serialize a float NumPy array, and how can I read that back?
FloatList and BytesList expect an iterable. So you need to pass it a list of floats. Remove the extra brackets in your _floats_feature, i.e.
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.ones((3,)).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
features {
  feature {
    key: "bytes"
    value {
      float_list {
        value: 1.0
        value: 1.0
        value: 1.0
      }
    }
  }
}
I will expand on Yaroslav's answer.
Int64List, BytesList and FloatList expect an iterable of the underlying elements (a repeated field). In your case you can use a list as that iterable.
You mentioned: it works if I pass a scalar to it, but not with arrays. And this is expected, because when you pass a scalar, your _floats_feature creates an array of one float element in it (exactly as expected). But when you pass an array you create a list of arrays and pass it to a function which expects a list of floats.
So just remove construction of the array from your function: float_list=tf.train.FloatList(value=value)
I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:
If map.ravel() was used to feed a map of dimensions [x, y, z] into _floats_feature:
features = {
    ...
    'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
    ...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
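For completeness, a sketch of the matching write side under that same assumption: the [x, y, z] float32 array is flattened with ravel() before being stored, so FixedLenFeature([x, y, z]) can restore the shape at parse time (writer is a tf.python_io.TFRecordWriter as in the question):
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

# Flatten the [x, y, z] array into a 1-D sequence of floats for FloatList.
example = tf.train.Example(features=tf.train.Features(feature={
    'map': _floats_feature(map.ravel())
}))
writer.write(example.SerializeToString())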
Yaroslav's example failed when an N-D array was the input:
numpy_arr = np.ones((3,3)).astype(np.float)
I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
First of all, many thanks to Yaroslav and Salvador for their enlightening answers.
In my experience, their methods only work when the input is a 1D NumPy array of shape (n,). When the input is a NumPy array with two or more dimensions, the following error appears:
def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
    _float_feature(numpy_arr)}))
print(example)
TypeError: array([[0., 1., 2.],
[3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float
So, I'd like to expand on Tsuan's answer: flatten the input before it is fed into the TF example. The modified code is as follows:
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
    _floats_feature(numpy_arr)}))
print(example)
In addition, ndarray.flatten() always returns a copy, whereas ndarray.ravel() may return a view, which can make flatten() the safer choice here.
Use tfrmaker, a TFRecord utility package. You can install the package with pip:
pip install tfrmaker
Then you could create tfrecords like this:
from tfrmaker import images
# mapping label names with integer encoding.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}
# specifying data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"
# create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)
# info contains a list of information (path: relative path, size: number of images in the tfrecord) about the created tfrecords
print(info)
The package also has some cool features like:
dynamic resizing
splitting tfrecords into optimal shards
splitting tfrecords into training, validation and testing sets
counting the number of images in tfrecords
asynchronous tfrecord creation
NOTE: This package currently supports image datasets that are organised as directories with class names as subdirectory names.