Folium choropleth defaults to filled rather than empty - pandas

I am following this tutorial:
https://python-graph-gallery.com/292-choropleth-map-with-folium/
I have zcta-level geojson for all of the state of New York from here:
https://github.com/OpenDataDE/State-zip-code-GeoJSON/raw/master/ny_new_york_zip_codes_geo.min.json
I am seeking to make a choropleth restricted to some values in Harlem:
harlem = ["10026", "10027", "10030", "10037", "10039", "10029", "10035"]
df2 = df[df.zcta.isin(harlem)]
df2['C_pct'] = df2.C/df2.C.sum()
df2.head()
     C   zcta  C_pct
89  40  10026    0.4
90  40  10027    0.4
91  20  10030    0.2
However, when I do this, it seems to default to coloring all zip codes not in my dataset as if they were 100%, rather than as if they were 0%.
I can see from related questions that I am keying my "key_on" field correctly; in the data it comes from the zcta given in "features.properties.ZCTA5CE10".
My call is the following:
ny_geo = 'ny_new_york_zip_codes_geo.min.json'
m.choropleth(
geo_data=ny_geo,
name='choropleth',
data=df2,
columns=['zcta', 'C_pct'],
key_on='feature.properties.ZCTA5CE10', # this matches the geojson
fill_color='YlGn',
reset=True,
legend_name='Configurations on 20170203'
)
And yet non-Harlem zips have color, even though there is no data for them.
How can I fix this? E.g., why are Brooklyn and Queens green instead of empty?
(I understand there is updated syntax that eschews the choropleth function, but I also wasn't able to express this in terms of the pure geojson() function: the error I had there was a key error on "id", which I was never calling, so I assume I misused a default keyword somewhere.)


How to get all the substates from a country in OSMNX?

What would be the code to easily get all the states (second subdivisions) of a country?
The pattern from OSMNX is, more or less:
division      admin_level
country       2
region        3
state         4
city          8
neighborhood  10
For an example, to get all the neighborhoods from a city:
import pandas as pd
import geopandas as gpd
import osmnx as ox
place = 'Rio de Janeiro'
tags = {'admin_level': '10'}
gdf = ox.geometries_from_place(place, tags)
Wouldn't the same apply if one wants the states of a country?
place = 'Brasil'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
I'm not even sure whether this snippet works at all, because I let it run for 4 hours and it never finished. Maybe the package isn't made for downloading big chunks of data, or there's a more efficient solution than ox.geometries_from_place() for this task, or there's more information I could add to the tags. Help is appreciated.
OSMnx can potentially get all the states or provinces from some country, but this isn't a use case it's optimized for, and your specific use creates a few obstacles. You can see your query reproduced on Overpass Turbo.
You're using the default query area size, so it's making thousands of requests
Brazil's bounding box intersects portions of overseas French territory, which in turn pulls in all of France (spanning the entire globe)
OSMnx uses an r-tree to filter the final results, but globe-spanning results make this index perform very slowly
OSMnx can acquire geometries either via the geometries module (as you're doing) or via the geocode_to_gdf function in the geocoder module. You may want to try the latter if it fits your use case, as it's far more efficient.
With that in mind, if you must use the geometries module, you can try a few things to improve performance. First off, adjust the max query area size so you're downloading everything with a single API request. You're downloading relatively few entities, so the huge query area should still be OK within the timeout interval. The "intersecting overseas France" and "globe-spanning r-tree" problems are harder to solve. But as a demonstration, here's a simple example with Uruguay instead. It takes about 20 seconds to run everything on my machine:
import osmnx as ox
ox.settings.log_console = True
ox.settings.max_query_area_size = 25e12
place = 'Uruguay'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
gdf = gdf[gdf["is_in:country"] == place]
gdf.plot()

@animate changes the scope of variables

I am struggling with a weird error. I'm new to Julia so maybe I don't understand something.
Consider the code:
using Plots;
xyz = 1
anim = @animate for (j, iterated_variable) in enumerate(1:10)
xyz = xyz
plot(1,1)
end
it will yield the error "UndefVarError: xyz not defined"
while
xyz = 1
anim = @animate for (j, iterated_variable) in enumerate(1:10)
print(xyz)
plot(1,1)
end
will run and (oddly enough) print exactly:
111111
1111
where the digits 1,2,7,8,9,10 are printed in monospace and the others in the regular font.
Removing the @animate macro makes the code do what you expect it to do.
xyz = 1
for (j, iterated_variable) in enumerate(1:10)
print(xyz)
plot(1,1)
end
and it will output
1111111111.
This error is quite frustrating I must admit, especially since I really am starting to like Julia. Any idea of what is happening?
Julia 1.6.3, VSCodium 1.66.1, Julia language support v1.6.17, notebooks
(edited the question, there was a code mistake)
If by notebooks you mean Pluto.jl notebooks, and you run the first snippet all in one cell, then moving "using Plots" to a different cell should fix the issue.

SQL Server Product Configuration Schema

Summary - Our products are very configurable and have many valid options such as model, color, height, activationtype, etc.
This is not difficult to model; I have used an EAV model. What is tripping me up is the dependency metadata. Based on the product model, some colors may not be available, so the application dev only wants the colors valid for that model. One dependency is easy enough. However, oftentimes an attribute is only valid based on a combination of previously selected values.
If model = 123 and color = Blue and ShipToCountry = USA then height can be 54 to 154 inches
If model = 123 and color = Blue and ShipToCountry = Canada then height can be 54 to 175 inches
If model = 123 and color = Black and ShipToCountry = Canada then height can only be 96 inches
I have seen up to 5 dependencies used to dictate what the next attribute's valid list of values is.
Question - What would a schema look like to hold the dependency metadata? I have tried a cross-ref table that links possible value combinations based on dependencies. It works, but the SQL result set is messy. Hoping for a suggestion that pushes me in the right direction.
[Image: How we currently store data]
[Image: Visual of application to choose product options]
This previous post is close but I am not sure it solves my issue.
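To make the question's rule examples concrete, here is a minimal sketch, in Python rather than T-SQL for brevity, of the cross-ref idea: each rule row pairs a set of required (attribute, value) conditions with the range it unlocks. All names and the data shape are hypothetical illustrations, not a recommended final schema.

```python
# Hypothetical rule table: required (attribute, value) pairs -> allowed height range.
rules = [
    ({"model": "123", "color": "Blue",  "ship_to": "USA"},    (54, 154)),
    ({"model": "123", "color": "Blue",  "ship_to": "Canada"}, (54, 175)),
    ({"model": "123", "color": "Black", "ship_to": "Canada"}, (96, 96)),
]

def allowed_height(selection):
    """Return the (min, max) height range matching the current selections, if any."""
    for conditions, height_range in rules:
        # A rule applies only when every one of its conditions is satisfied,
        # which is how multi-attribute dependencies compose.
        if all(selection.get(attr) == value for attr, value in conditions.items()):
            return height_range
    return None

print(allowed_height({"model": "123", "color": "Blue", "ship_to": "Canada"}))  # (54, 175)
```

In a relational schema, the conditions list would become child rows of a rule header table, which is why the result set of a single flat cross-ref query tends to look messy.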

What does this openweather error "25.00 square degrees" mean

I am trying to identify the cause of this openweathermap api error. I could not find any references in the documentation.
This request is a SUCCESS:
https://api.openweathermap.org/data/2.5/box/city?bbox=-96.8466%2C37.0905747%2C-92.5829684%2C41.7686%2C9&appid=<apikey>
While this request FAILS
https://api.openweathermap.org/data/2.5/box/city?bbox=-96.5829684%2C26.5383355%2C-79.37923649999999%2C41.0905747%2C9&appid=<apikey>
response
{"cod":"400","message":"Requested area is larger than allowed for your account type (25.00 square degrees)"}
I know it has something to do with the bbox range, but I can't find any documentation. I am currently testing with the free subscription.
It basically means that you are fetching an area larger than what the free subscription allows. A 25°² area means a lat/lng rectangle whose width times height is at most 25 square degrees, for example:
An area ranging from 70 to 75 degrees North, and 0 to 5 degrees East (5° x 5° = 25°²)
75°N,0°E ---- 75°N,5°E
70°N,0°E ---- 70°N,5°E
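You can check this limit before calling the API. A small sketch (the helper name is mine, not part of the OpenWeatherMap client) that parses the bbox string from the question (lon-left, lat-bottom, lon-right, lat-top, zoom) and computes its area in square degrees:

```python
def bbox_area_sq_deg(bbox):
    """Area in square degrees of an OpenWeatherMap-style bbox string."""
    lon_left, lat_bottom, lon_right, lat_top, _zoom = map(float, bbox.split(","))
    return abs(lon_right - lon_left) * abs(lat_top - lat_bottom)

# The successful request from the question: under the 25 deg^2 free-tier limit.
ok = bbox_area_sq_deg("-96.8466,37.0905747,-92.5829684,41.7686,9")
# The failing request: roughly ten times over the limit.
bad = bbox_area_sq_deg("-96.5829684,26.5383355,-79.37923649999999,41.0905747,9")
print(round(ok, 2), round(bad, 2))  # 19.95 250.35
```

This matches the observed behavior: the first bbox covers about 19.95°² and succeeds, the second about 250.35°² and is rejected.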

Importing timeseries datasets to MATLAB (all values are displayed as NaN)

I am stuck trying to run an economic model using MATLAB - at the data importing part. For most of my code I'm using a freeware toolbox called IRIS.
I have a quarterly dataset with 14 variables and 160 datapoints. Essentially the dataset is a 161-row by 15-column table, including the dates (column 1) and the variable names (B1:O1).
The command used for uploading data on IRIS is
d = dbload('filename.csv')
but this isn't working: although MATLAB creates a 1x1 struct called d with fields under it (one for each variable), all cells display NaN (not a number).
Why is this happening?
I checked the tutorials on the IRIS toolbox website and tried loading a sample dataset from there using this command, but it leads to the same problem. Everywhere I checked, including MATLAB help, this seems to be the correct command to use with IRIS, but somehow it isn't working.
I also tried uploading the data directly using MATLAB functions and not IRIS. The command I'm using is:
d = dataset('XLSFile','filename.xls','ReadVarNames', true)
This works, and I can see all the variable names, but MATLAB can't read the dates. I tried xlsread and importdata as well, but they don't read the variable names. Is there any way for me to import the entire Excel sheet with both the variable names and the dates?
It would be best if I could get the IRIS command to work, since the rest of my code would be compatible with that.
The dataset looks somewhat like this:
           HO_GDP  HO_CPI  HO_CPI  HO_RS  HO_ER  HO_POIL ...
4/1/1970    82.33   85.01   55.00  99.87   08.77
7/1/1970    54.22    8.98   25.22  95.11   91.77
10/1/1970   85.41   85.00   85.22  95.34   55.00
1/1/1971    85.99     899    8.89   85.1
You can use the TEXTSCAN function to read the CSV file in MATLAB:
%# some options
numCols = 15; %# number of columns
opts = {'Delimiter',',', 'MultipleDelimsAsOne',true, 'CollectOutput',true};
%# open file for reading
fid = fopen('filename.csv','rt');
%# read header line
headers = textscan(fid, repmat('%s',1,numCols), 1, opts{:});
%# read rest of data rows
%# 1st column as string, the other 14 as floating point
data = textscan(fid, ['%s' repmat('%f',1,numCols-1)], opts{:});
%# close file
fclose(fid);
%# collect data
headers = headers{1};
data = [datenum(data{1},'mm/dd/yyyy') data{2}];
The result for the above sample you posted (assuming values are comma-separated):
>> headers
headers =
'HO_GDP' 'HO_CPI' 'HO_CPI' 'HO_RS' 'HO_ER' 'HO_POIL'
>> data
data =
7.1962e+05 82.33 85.01 55 99.87 8.77
7.1971e+05 54.22 8.98 25.22 95.11 91.77
7.198e+05 85.41 85 85.22 95.34 55
7.1989e+05 85.99 899 8.89 85.1 0
Note how in the last line of the code we convert the date column to serial date number, so that we can store the entire data in one numeric matrix. You can always go back to string representation of dates using DATESTR function:
>> datestr(data(:,1))
ans =
01-Apr-1970
01-Jul-1970
01-Oct-1970
01-Jan-1971