OSMnx - tags missing in graph edges

I'm trying to detect all the streets in a city where bikes can ride. To do this I want to include the footways that carry the tag "bicycle":"yes", but I can't find that tag in the data downloaded with OSMnx.
For example, the way with id 45031879, as XML downloaded directly from the OpenStreetMap website, appears like this:
<way id="45031879" visible="true" version="4" changeset="64616340" timestamp="2018-11-18T10:34:12Z" user="livmilan" uid="712033">
<nd ref="571102337"/>
...
<nd ref="1587102704"/>
<tag k="bicycle" v="yes"/> <=====
<tag k="highway" v="footway"/>
</way>
But in the edges downloaded with OSMnx via graph = ox.graph_from_place(place, network_type='all'), the same way appears like this:
{'osmid': 45031879, 'highway': 'footway', 'oneway': False, 'length': 22.818, 'geometry': <shapely.geometry.linestring.LineString object at 0x00000170F3F112C8>}
It appears that the bicycle information was lost. Is there a way to download the additional tags with OSMnx?
Thank you

Configure the useful_tags_way setting to add additional OSM way tags to retain as graph edge attributes:
import osmnx as ox
ox.config(log_console=True, use_cache=True,
          useful_tags_way=ox.settings.useful_tags_way + ['bicycle'])
place = 'Berkeley, CA, USA'
G = ox.graph_from_place(place, network_type='bike')
edges = ox.graph_to_gdfs(G, nodes=False)
'bicycle' in edges.columns  # True
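With the extra tag retained, here's a minimal sketch (assuming the G and edges objects from above) of how you might pick out the footways that explicitly allow bicycles:
# Keep only footways, then those explicitly tagged bicycle=yes.
# Note: on simplified graphs an edge attribute can be a list when
# several OSM ways were merged, so a robust filter may also need
# to handle list values.
footways = edges[edges['highway'] == 'footway']
bikeable = footways[footways['bicycle'] == 'yes']
print(len(bikeable), 'footways explicitly allow bicycles')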

How to get all the substates from a country in OSMnx?

What would be the code to easily get all the states (second subdivisions) of a country?
The pattern from OSMnx is, more or less:
division       admin_level
country        2
region         3
state          4
city           8
neighborhood   10
For example, to get all the neighborhoods of a city:
import pandas as pd
import geopandas as gpd
import osmnx as ox
place = 'Rio de Janeiro'
tags = {'admin_level': '10'}
gdf = ox.geometries_from_place(place, tags)
Wouldn't the same apply if one wants the states of a country?
place = 'Brasil'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
I'm not even sure whether this snippet works, because I let it run for 4 hours and it never finished. Maybe the package isn't made for downloading big chunks of data, or there's a more efficient solution than ox.geometries_from_place() for that task, or there's more information I could add to the tags. Help is appreciated.
OSMnx can potentially get all the states or provinces from some country, but this isn't a use case it's optimized for, and your specific use case creates a few obstacles. You can see your query reproduced on Overpass Turbo.
You're using the default query area size, so it's making thousands of requests.
Brazil's bounding box intersects portions of overseas French territory, which in turn pulls in all of France (spanning the entire globe).
OSMnx uses an r-tree to filter the final results, but globe-spanning results make this index perform very slowly.
OSMnx can acquire geometries either via the geometries module (as you're doing) or via the geocode_to_gdf function in the geocoder module. You may want to try the latter if it fits your use case, as it's far more efficient.
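For instance, here's a minimal sketch of the geocoder approach (the two state names are illustrative; you would pass your full list of states):
import osmnx as ox
# Geocode each state boundary by name via Nominatim.
states = ['Rio de Janeiro, Brazil', 'Minas Gerais, Brazil']
gdf = ox.geocode_to_gdf(states)
gdf.plot()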
With that in mind, if you must use the geometries module, you can try a few things to improve performance. First off, adjust the query area size so you're downloading everything with one single API request. You're downloading relatively few entities, so the huge query area should still be OK within the timeout interval. The "intersecting overseas France" and "globe-spanning r-tree" problems are harder to solve. But as a demonstration, here's a simple example with Uruguay instead. It takes twenty-some seconds to run everything on my machine:
import osmnx as ox
ox.settings.log_console = True
ox.settings.max_query_area_size = 25e12
place = 'Uruguay'
tags = {'admin_level': '4'}
gdf = ox.geometries_from_place(place, tags)
gdf = gdf[gdf["is_in:country"] == place]
gdf.plot()

Find frontage street polyline given a street address

I have an app whereby users select parcels of land based on shapefile information. How can I return the associated street polyline location (lat, long)? I want to be able to locate the center of the street and its extents in front of this parcel. Refer to the image below: the parcel is in blue and the street polyline I am interested in is in red.
If I could be pointed towards which Esri JavaScript method or class I could use, then I can figure out the rest.
Assuming that you have a Road FeatureLayer, what you could do is spatially query it using the parcel geometry. To do that you can use the queryFeatures method. You can add a buffer distance in order to get the roads that are around the parcel. Something like this:
let query = roadsFeatureLayer.createQuery();
query.geometry = parcelGeometry;  // the parcel polygon geometry
query.distance = 10;
query.units = "meters";
query.spatialRelationship = "intersects";
query.returnGeometry = true;
query.outFields = ["*"];
roadsFeatureLayer.queryFeatures(query)
  .then(function(response){
    // returns a feature set with the road features that
    // intersect the 10-meter buffered parcel polygon geometry
  });
Now, the result includes a set of roads. The decision of which one you need is another problem, as @seth-lutske mentioned in the comments.
ArcGIS JS API - FeatureLayer queryFeatures

Can I do any analysis on spacy display using NER?

When accessing this display in spaCy NER, can you add the found entities - in this case any tweets with GPE or LOC - to a new dataframe or do any further analysis on them? I thought once I got them into a list I could possibly use geopy to visualize them. Any thoughts?
colors = {'LOC': 'linear-gradient(90deg, #aa9cde, #dc9ce7)', 'GPE': 'radial-gradient(white, blue)'}
options = {'ents': ['LOC', 'GPE'], 'colors': colors}
spacy.displacy.render(doc, style='ent', jupyter=True, options=options)
The entities are accessible on the doc object. If you want to get all the ents in the doc object into a list, simply use doc.ents. For example:
import spacy
content = "Narendra Modi is the Prime Minister of India"
nlp = spacy.load('en_core_web_md')
doc = nlp(content)
print(doc.ents)
should output:
(Narendra Modi, India)
Say you want the text (or mention) of the entity and the label of the entity (say, PERSON, GPE, LOC, NORP, etc.); then you can get them as follows:
print([(ent, ent.label_) for ent in doc.ents])
should output:
[(Narendra Modi, 'PERSON'), (India, 'GPE')]
You should be able to use them in other places as you see fit.
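For instance, here's a rough sketch of the dataframe-plus-geopy idea from the question (the Nominatim user_agent string is arbitrary, and geocoding one mention at a time will be slow over many tweets):
import pandas as pd
from geopy.geocoders import Nominatim
# Collect the GPE/LOC entities from the doc into a dataframe.
rows = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ('GPE', 'LOC')]
df = pd.DataFrame(rows, columns=['entity', 'label'])
# Geocode each mention (Nominatim is just one possible geocoder).
geolocator = Nominatim(user_agent='ner-geocoding-demo')
df['location'] = df['entity'].apply(geolocator.geocode)
print(df)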

SpaCy: Set entity information for a token which is included in more than one span

I am trying to use SpaCy for entity context recognition in the world of ontologies. I'm a novice at using SpaCy and just playing around for starters.
I am using the ENVO Ontology as my 'patterns' list for creating a dictionary for entity recognition. In simple terms the data is an ID (CURIE) and the name of the entity it corresponds to along with its category.
Screenshot of my sample data:
The following is the workflow of my initial code:
Creating patterns and terms
# Set terms and patterns
terms = {}
patterns = []
for curie, name, category in envoTerms.to_records(index=False):
    if name is not None:
        terms[name.lower()] = {'id': curie, 'category': category}
        patterns.append(nlp(name))
Setup a custom pipeline
@Language.component('envo_extractor')
def envo_extractor(doc):
    matches = matcher(doc)
    spans = [Span(doc, start, end, label='ENVO') for match_id, start, end in matches]
    doc.ents = spans
    for i, span in enumerate(spans):
        span._.set("has_envo_ids", True)
        for token in span:
            token._.set("is_envo_term", True)
            token._.set("envo_id", terms[span.text.lower()]["id"])
            token._.set("category", terms[span.text.lower()]["category"])
    return doc
# Setter function for doc level
def has_envo_ids(self, tokens):
    return any([t._.get("is_envo_term") for t in tokens])
## EDIT: ################################################################
def resolve_substrings(matcher, doc, i, matches):
    # Get the current match and create a tuple of entity label, start and end.
    # Append the entity to the doc's entities. (Don't overwrite doc.ents!)
    match_id, start, end = matches[i]
    entity = Span(doc, start, end, label="ENVO")
    doc.ents += (entity,)
    print(entity.text)
#########################################################################
Implement the custom pipeline
nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
#### EDIT: Added 'on_match' rule ################################
matcher.add("ENVO", patterns, on_match=resolve_substrings)
nlp.add_pipe('envo_extractor', after='ner')
and the pipeline looks like this
[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x7fac00c03bd0>),
('tagger', <spacy.pipeline.tagger.Tagger at 0x7fac0303fcc0>),
('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x7fac02fe7460>),
('ner', <spacy.pipeline.ner.EntityRecognizer at 0x7fac02f234c0>),
('envo_extractor', <function __main__.envo_extractor(doc)>),
('attribute_ruler',
<spacy.pipeline.attributeruler.AttributeRuler at 0x7fac0304a940>),
('lemmatizer',
<spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x7fac03068c40>)]
Set extensions
# Set extensions to tokens, spans and docs
Token.set_extension('is_envo_term', default=False, force=True)
Token.set_extension("envo_id", default=False, force=True)
Token.set_extension("category", default=False, force=True)
Doc.set_extension("has_envo_ids", getter=has_envo_ids, force=True)
Doc.set_extension("envo_ids", default=[], force=True)
Span.set_extension("has_envo_ids", getter=has_envo_ids, force=True)
Now when I run the text 'tissue culture', it throws me an error:
nlp('tissue culture')
ValueError: [E1010] Unable to set entity information for token 0 which is included in more than one span in entities, blocked, missing or outside.
I know why the error occurred. It is because there are 2 entries for the 'tissue culture' phrase in the ENVO database as shown below:
Ideally I'd expect the appropriate CURIE to be tagged depending on the phrase that was present in the text. How do I address this error?
My SpaCy Info:
============================== Info about spaCy ==============================
spaCy version 3.0.5
Location *irrelevant*
Platform macOS-10.15.7-x86_64-i386-64bit
Python version 3.9.2
Pipelines en_core_web_sm (3.0.0)
It might be a little late by now but, complementing Sofie VL's answer a little bit, and for anyone who might still be interested in it, here is what I (another spaCy newbie, lol) have done to get rid of overlapping spans:
import spacy
from spacy.util import filter_spans
# [Code to obtain 'ents']...
# 'ents' should be a list of Span objects covering the overlapping
# matches, e.g. spans for "Carolina" and "North Carolina"
pat_orig = len(ents)
filtered = filter_spans(ents)  # THIS DOES THE TRICK
pat_filt = len(filtered)
doc.ents = filtered
print("\nCONVERSION REPORT:")
print("Original number of patterns:", pat_orig)
print("Number of patterns after overlapping removal:", pat_filt)
It is important to mention that I am using the most recent version of spaCy as of this date, v3.1.1. Additionally, this will work only if you actually do not mind overlapping spans being removed; if you do, then you might want to give this thread a look. More info regarding filter_spans here.
Best regards.
Since spaCy v3, you can use doc.spans to store entities that may be overlapping. This functionality is not supported by doc.ents.
So you have two options:
Implement an on_match callback that will filter out the results of the matcher before you use the result to set doc.ents. From a quick glance at your code (and the later edits), I don't think resolve_substrings is actually resolving conflicts? Ideally, the on_match function should check whether there are conflicts with existing ents, and decide which of them to keep.
Use doc.spans instead of doc.ents if that works for your use-case.
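As a rough sketch of the second option, adapted from the component in the question (the 'envo' span-group key is arbitrary, and matcher is assumed to be in scope as in your code):
from spacy.language import Language
from spacy.tokens import Span

@Language.component('envo_extractor')
def envo_extractor(doc):
    matches = matcher(doc)
    spans = [Span(doc, start, end, label='ENVO') for match_id, start, end in matches]
    # Span groups allow overlapping spans, unlike doc.ents,
    # so E1010 does not arise here.
    doc.spans['envo'] = spans
    return doc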

Generate set of Nouns and verbs from n different descriptions, list out descriptions that match a noun and verb

I'm new to NLP. I have a dataset with columns for app name and its description. The data looks like this:
app1, description1 (some information of app1, how it works)
app2, description2
.
.
app(n), description(n)
From these descriptions I need to generate a limited set of nouns and verbs. In the final application, when we pair a noun and a verb from this list, the output should be the list of apps whose descriptions satisfy that noun+verb pair.
I don't have any idea where to start; can you please guide me? Thank you.
The task of finding the morpho-syntactic category of words in a sentence is called part-of-speech (or PoS) tagging.
In your case, you probably also need to tokenize your text first.
To do so, you can use nltk, spacy, or the Stanford NLP tagger (among other tools).
Note that depending on the model you use, there can be several labels for nouns (singular nouns, plural nouns, proper nouns) and verbs (depending on the tense and person).
Example with NLTK:
import nltk
description = "This description describes apps with words."
tokenized_description = nltk.word_tokenize(description)
tagged_description = nltk.pos_tag(tokenized_description)
#tagged_description:
# [('This', 'DT'), ('description', 'NN'), ('describes', 'VBZ'), ('apps', 'RP'), ('with', 'IN'), ('words', 'NNS'), ('.', '.')]
# map the tags to a smaller set of tags
universal_tags_description = [(word, nltk.map_tag("wsj", "universal", tag)) for word, tag in tagged_description]
# universal_tags_description:
# [('This', 'DET'), ('description', 'NOUN'), ('describes', 'VERB'), ('apps', 'PRT'), ('with', 'ADP'), ('words', 'NOUN'), ('.', '.')]
filtered = [(word, tag) for word, tag in universal_tags_description if tag in {'NOUN', 'VERB'}]
# filtered:
# [('description', 'NOUN'), ('describes', 'VERB'), ('words', 'NOUN')]
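From there, here is a rough sketch of the pairing step described in the question (the app names and descriptions are illustrative, and you would probably also want to lemmatize so that 'describes' and 'describe' count as the same verb):
from collections import defaultdict
import nltk

# Illustrative data: app name -> description.
apps = {'app1': 'This app describes words.',
        'app2': 'This app draws maps.'}

# Index each app by every (noun, verb) pair in its description.
index = defaultdict(set)
for app, description in apps.items():
    tagged = nltk.pos_tag(nltk.word_tokenize(description))
    universal = [(word.lower(), nltk.map_tag('wsj', 'universal', tag)) for word, tag in tagged]
    nouns = {word for word, tag in universal if tag == 'NOUN'}
    verbs = {word for word, tag in universal if tag == 'VERB'}
    for noun in nouns:
        for verb in verbs:
            index[(noun, verb)].add(app)

# Look up the apps matching a noun+verb pair.
print(index[('words', 'describes')])  # e.g. {'app1'}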