Google Maps Geocoding API address correction hierarchy

I'm trying to understand in what order, and with what precedence, the Geocoding API processes the components of the address passed to it.
Here is the example that prompted the question. The correct address is:
2608 N Ocean Bv
Myrtle Beach, SC 29577
Running that through the API works with no problems:
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+myrtle+beach+sc+29577&sensor=false
However, take this typoed version of the address:
2608 N Ocean Bv
Mrytle Beach, NC 29577
The city is misspelled and the state is wrong; the street number, street name, and ZIP code are correct. "Mrytle Beach" does not exist anywhere, and in particular not in NC.
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+mrytle+beach+nc+29577&sensor=false
Google comes back with:
2608 N Ocean Bv
North Myrtle Beach, SC 29582
Now, that is a valid address. But why did Google decide that was the address I was looking for?
If you remove the incorrect state, and don't replace it with anything:
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+mrytle+beach+29577&sensor=false
Google returns the correct address, with the city spelling corrected. So it seems that the state takes precedence over the ZIP code; however, North Myrtle Beach does not exist in NC either.
I suspect that omitting the city and state would eliminate most of this issue, but I'd like to understand why, if possible. Thanks.
Edit:
After some further experimenting, it seems that Google looks for a city match as the highest priority, then the state, and ignores everything else. In this case:
1. It can't find a city called "Mrytle Beach" anywhere in the world.
2. So it starts in NC and looks for the closest match to the street address, if there is one.
3. The closest match to NC is the one in North Myrtle Beach.
If you change the state in my example above from NC to FL, the more southern Myrtle Beach match is closer to Florida than the more northern North Myrtle Beach address, and that is what Google returns.
I'm trying to understand the reasoning behind this. It seems that this sort of logic should be close to a last resort, or at least applied after making use of the ZIP code passed in, which the API appears not to use at all.
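One way to make the ZIP code count, instead of relying on the free-text parser to weigh it, is the Geocoding API's components filter, which takes pipe-separated key:value pairs such as postal_code and country. Below is a minimal sketch of building such a request, assuming you drop the city and state from the free text as suggested above. YOUR_API_KEY is a placeholder (current versions of the API require a key), and build_geocode_url is a hypothetical helper name; the result for this particular address is not shown here.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(street, postal_code, country="US", api_key="YOUR_API_KEY"):
    """Build a Geocoding API URL that pins the ZIP code via component filtering.

    The city and state are deliberately left out of the free-text address,
    so a typo in them cannot steer the match. api_key is a placeholder.
    """
    params = {
        "address": street,
        # Pipe-separated key:value pairs, e.g. postal_code:29577|country:US
        "components": f"postal_code:{postal_code}|country:{country}",
        "key": api_key,
    }
    return f"{GEOCODE_URL}?{urlencode(params)}"


url = build_geocode_url("2608 N Ocean Bv", "29577")
print(url)
```

urlencode percent-encodes the colons and the pipe, which is what the HTTP request needs.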

Related

Problem with getting the locator of a dropdown in Selenium

Please take a look at the page https://www.kayak.co.in/stays. When I type a location, say New Delhi, into the Location text box, a dropdown appears listing the options to select from. However, when I try to inspect that element in DevTools by clicking it, the dropdown disappears and I can't obtain the locator. I tried sending the locator along with the Enter key, i.e. SendKeys(locator + Keys.Enter), to mimic the UI behavior, but even that fails, which is blocking my script. Please let me know how to proceed.
As mentioned, I can't get the dropdown locator, and Keys.Enter doesn't work either.
Here's a full SeleniumBase script (https://github.com/seleniumbase/SeleniumBase) that will do all that for you. Install it with pip install seleniumbase, then run the script with python or pytest. The sleeps are only there so that you can see what's happening. Otherwise it runs very fast!
from seleniumbase import BaseCase

if __name__ == "__main__":
    # Lets you run the file with "python"; it hands the file over to pytest.
    from pytest import main
    main([__file__])


class KayakTest(BaseCase):
    def test_kayak_search(self):
        self.open("https://www.kayak.co.in/stays")
        # Type the destination into the search box.
        self.type(
            'input[placeholder*="Enter a city"]',
            "New Delhi, National Capital Territory of India"
        )
        self.sleep(1)
        # A trailing "\n" presses Enter, accepting the autocomplete suggestion.
        self.add_text('input[placeholder*="Enter a city"]', "\n")
        self.sleep(1)
        # Click two day cells in the calendar (check-in and check-out).
        self.click("div.ATGJ-monthWrapper div:nth-of-type(2) div:nth-of-type(2) div:nth-of-type(23)")
        self.sleep(1)
        self.click("div.ATGJ-monthWrapper div:nth-of-type(2) div:nth-of-type(2) div:nth-of-type(27)")
        self.sleep(1)
        # Click the search button icon.
        self.click("svg.c8LPF-icon")
        self.sleep(5)
Here is the full XPath. Feel free to remove the parts you don't need.
//body[@id='k4-r'][@class='keel kl kl-override HotelsSearch react react-st wide a11y-focus-outlines wide-fd en_IN horizon'][@style='display: block;']/div[@id='RaKq'][@class='Common-Page-StandardPage Base-Search-LandingPage Base-Search-SearchPage Hotels-Search-HotelSearchPage cur_inr vm-fd dual-view']/div[@id='e6vK'][@class='Common-Layout-StandardBody withFooter']/main[@id='e6vK-pageContent'][@class='pageContent withDrawer moved wide ']/div[@id='RaKq-fd'][@class='SearchPage__FrontDoor']/div[@id='dbfN'][@class='Base-Frontdoor-FrontDoor Hotels-Frontdoor-HotelFrontDoor']/div[@id='VQMg'][@class='Common-Frontdoor-HarmonizedFrontDoorContent']/div[@class='coverPhotoContainer splash'][@data-search-form-container]/div[@id='x0Lx'][@class='_kkL _jcQ _ia2']/div[@id='dbfN-primary'][@class='primary-content']/section/div[@class='keel-container s-t-bp']/div[@class='form-section'][@id='main-search-form']/div[@class='form-container']/div[@id='c2Y_4'][@class='Ui-Searchforms-Hotels-Components-HotelSearchForm-container ']/div[@class='HPw7 HPw7-pres-default HPw7-pres-responsive HPw7-pres-dark HPw7-pres-rooms-guests HPw7-pres-wide-dates']/div[@class='HPw7-form-fields-and-submit']/div[@class='HPw7-form-fields']/div[@class='HPw7-destination']/div[@class='BCcW']/div[@class='k_my k_my-mod-theme-mcfly-search k_my-mod-radius-base k_my-mod-size-large k_my-mod-font-size-base k_my-mod-spacing-default k_my-mod-text-overflow-ellipsis k_my-mod-state-default']/input[@type='text'][@value][@class='k_my-input'][@tabindex='0'][@placeholder='Enter a city, hotel, airport, address or landmark'][@aria-autocomplete='list'][@aria-haspopup='listbox']
Generated by https://github.com/sputnick-dev/retrieveCssOrXpathSelectorFromTextOrNode
Simplified:
//input[@type='text'][@value][@placeholder='Enter a city, hotel, airport, address or landmark'][@aria-haspopup='listbox']
Or, after typing New Delhi:
//input[@type='text'][@value='Near Indira Gandhi Airport New Delhi India, New Delhi, National Capital Territory of India, India'][@class='k_my-input'][@tabindex='0'][@placeholder='Enter a city, hotel, airport, address or landmark'][@aria-autocomplete='list'][@aria-haspopup='listbox'][@aria-activedescendant='1070140428_hotel_Near Indira Gandhi Airport New Delhi India, New Delhi, National Capital Territory of India, India']
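XPath attribute tests are written with @ (for example [@aria-haspopup='listbox']). If you want to sanity-check the simplified locator without launching a browser, Python's standard xml.etree.ElementTree supports this small XPath subset (attribute presence and attribute equality predicates). The markup below is a hypothetical, heavily trimmed mock, not Kayak's real page, whose class names change over time. In a real Selenium script you would pass the //input... form to driver.find_element(By.XPATH, ...) instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the search form (not Kayak's real markup).
MOCK_FORM = """
<div class="HPw7-destination">
  <input type="text" value="" class="k_my-input"
         placeholder="Enter a city, hotel, airport, address or landmark"
         aria-autocomplete="list" aria-haspopup="listbox"/>
  <input type="text" value="" class="other-input" placeholder="Dates"/>
</div>
"""

# The simplified locator, made relative (".//") because ElementTree
# does not accept absolute paths when searching from an element.
LOCATOR = (
    ".//input[@type='text'][@value]"
    "[@placeholder='Enter a city, hotel, airport, address or landmark']"
    "[@aria-haspopup='listbox']"
)

root = ET.fromstring(MOCK_FORM.strip())
matches = root.findall(LOCATOR)
for element in matches:
    # Only the destination input has every attribute the locator requires.
    print(element.get("class"))
```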

Odd results from Google Reverse Geocode API

Recently I've been occasionally receiving odd results from the Google Reverse Geocode API.
For example, when looking up the address for coordinates 43.2379396, -72.44746565, the result I get is:
6HQ3+52 Springfield, VT, USA
In another case, looking up 43.703563, -72.209753 returns:
PQ3R+C3 Hanover, NH, USA
Does anyone know what the first 7 characters of the returned address represent? When I receive this type of result, it's always 4 alphanumeric characters followed by a plus sign and then 2 more alphanumeric characters.
After some additional research, I found that these are Plus Code addresses, a relatively new feature in Google Maps. They are used for places that don't have a street address, and they are somewhat similar to what3words addresses.
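To detect these results in code: a shortened Plus Code is 4 characters from the Open Location Code alphabet (23456789CFGHJMPQRVWX), a +, then at least 2 more characters, followed by a locality. Below is a minimal sketch that splits such a formatted address, assuming the code always comes first, as in the two examples above; split_plus_code is a hypothetical helper name. Reverse-geocoding responses also carry a plus_code object (global_code and compound_code), which may be a more robust thing to check than the address string.

```python
import re

# Character set used by Plus Codes (Open Location Code).
PLUS_CODE_ALPHABET = "23456789CFGHJMPQRVWX"

# Short code: 4 characters, '+', 2 or more characters, then the locality.
SHORT_CODE_RE = re.compile(
    rf"^([{PLUS_CODE_ALPHABET}]{{4}}\+[{PLUS_CODE_ALPHABET}]{{2,}})\s+(.*)$"
)


def split_plus_code(formatted_address):
    """Return (plus_code, locality), or (None, formatted_address) if no code leads."""
    match = SHORT_CODE_RE.match(formatted_address)
    if match is None:
        return None, formatted_address
    return match.group(1), match.group(2)


print(split_plus_code("6HQ3+52 Springfield, VT, USA"))
print(split_plus_code("PQ3R+C3 Hanover, NH, USA"))
```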

Geocoding returns two possible locations, but ranks the wrong address as most accurate and the right address as least accurate

Using https://www.google.ca/maps and the Geocoding API gives the same results.
On https://www.google.ca/maps, searching for:
143 GARRISON CIR , RED DEER, AB , Canada
returns two results:
143 Garrison PL
143 Garrison Cir
Using the API reveals that it considers the first result ('... Pl') more accurate than the second ('... Cir'), even though the second is clearly closer to the original address used in the search, since it contains 'Cir'.
Using:
https://maps.googleapis.com/maps/api/geocode/xml?address=143%20GARRISON%20CIR%20%2C%20RED%20DEER%2C%20AB%20%2C%20Canada
reveals that the first result's accuracy (location_type) is:
ROOFTOP
and the second result's accuracy is:
RANGE_INTERPOLATED (not as accurate)
WHY???
Interestingly, if I include the postal code in the full address (which I verified with Canada Post as correct):
'143 GARRISON CIR , RED DEER, AB T4P0P5, Canada'
I get no results from either method!
again... WHY???
The RANGE_INTERPOLATED result means that there is no exact street address feature in the Google database, so the service estimates where the address lies by interpolating between two precise points on the street. That is probably why the exact ROOFTOP result is ranked higher than the interpolated one, especially given that the coordinates of both results are very close to each other:
https://google-developers.appspot.com/maps/documentation/utils/geocoder/#q%3D143%2520GARRISON%2520CIR%2520%252C%2520RED%2520DEER%252C%2520AB%2520%252C%2520Canada
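If you need the circle rather than Google's top-ranked place, one client-side workaround (an assumption, not an API feature) is to scan the results for the street you actually asked for and fall back to Google's ordering otherwise. Each result's precision is reported in geometry.location_type. The response below is a made-up, trimmed example shaped like the API's JSON, not the real response for this address, and pick_result is a hypothetical helper.

```python
# Made-up response shaped like Geocoding API JSON (only the fields used here).
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {"formatted_address": "143 Garrison Pl, Red Deer, AB, Canada",
         "geometry": {"location_type": "ROOFTOP"}},
        {"formatted_address": "143 Garrison Cir, Red Deer, AB, Canada",
         "geometry": {"location_type": "RANGE_INTERPOLATED"}},
    ],
}


def pick_result(response, expected_street):
    """Prefer a result whose address contains expected_street.

    Falls back to Google's first (highest-ranked) result; returns None
    when the status is not OK (e.g. ZERO_RESULTS).
    """
    if response.get("status") != "OK":
        return None
    results = response["results"]
    wanted = expected_street.lower()
    for result in results:
        if wanted in result["formatted_address"].lower():
            return result
    return results[0]


best = pick_result(SAMPLE_RESPONSE, "Garrison Cir")
print(best["formatted_address"], best["geometry"]["location_type"])
```

The trade-off is that you may knowingly accept a less precise, interpolated point.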
To fix this, you should report the missing address to Google using the Send feedback mechanism:
https://support.google.com/maps/answer/3094045
Also note that the interpolated result for the address has a different postal code, T4N 3M4. Moreover, if I search for the postal code T4P 0P5, I get back only the postal code prefix T4P:
https://google-developers.appspot.com/maps/documentation/utils/geocoder/#q%3D%26options%3Dtrue%26in_country%3DCA%26in_postal_code%3DT4P%25200P5
That means the postal code T4P 0P5 is also missing from the Google database, and you should report it as well.
Because the postal code is missing, you get ZERO_RESULTS for the complete string 143 GARRISON CIR , RED DEER, AB T4P0P5, Canada:
https://google-developers.appspot.com/maps/documentation/utils/geocoder/#q%3D143%2520GARRISON%2520CIR%2520%252C%2520RED%2520DEER%252C%2520AB%2520T4P0P5%252C%2520Canada%26options%3Dtrue
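Until Google adds the postal code, a client can retry without it when the full string returns ZERO_RESULTS. Below is a minimal sketch, assuming a hypothetical fetch callable that sends an address string to the Geocoding API and returns the parsed JSON as a dict; the stub in the usage part only stands in for the real HTTP call.

```python
def geocode_with_fallback(fetch, street, locality, postal_code, country):
    """Geocode the full address; on ZERO_RESULTS, retry without the postal code.

    `fetch` is a hypothetical callable: address string in, parsed Geocoding
    API JSON (a dict with a "status" key) out.
    """
    response = fetch(f"{street}, {locality} {postal_code}, {country}")
    if response.get("status") == "ZERO_RESULTS":
        # The postal code may be unknown to Google; drop it and try again.
        response = fetch(f"{street}, {locality}, {country}")
    return response


# Stub standing in for the real HTTP call, for illustration only.
calls = []


def fake_fetch(address):
    calls.append(address)
    if "T4P0P5" in address:
        return {"status": "ZERO_RESULTS", "results": []}
    return {"status": "OK", "results": [{"formatted_address": address}]}


result = geocode_with_fallback(fake_fetch, "143 Garrison Cir", "Red Deer, AB", "T4P0P5", "Canada")
print(result["status"], calls)
```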
As you mentioned, the same behavior is reproducible on maps.google.com: there are two options for the address, with Garrison Pl as the first item and Garrison Cir as the second. That confirms this is a data issue rather than an API issue.
I hope this clears up your doubt.

BS4: grabbing information from something you've already parsed

Hey, this was sort of explained to me before, but I'm having trouble applying the same thing to an almost identical page.
import requests
from bs4 import BeautifulSoup

page = 'http://www.imdb.com/genre/action/?ref_=gnr_mn_ac_mp'
soup = BeautifulSoup(requests.get(page).content, "lxml")
table = soup.find_all("table", {"class": "results"})
for item in list(table):
    for info in item.contents[1::2]:
        info.a.extract()
        link = info.a['href']
        print(link)
        name = info.text.strip()
        print(name)
The code above tries to capture the link to each film's page, which is in the a tag inside info. The text of that tag is the film's name, but instead I get all of the text. Is there any way to get just the name?
Thanks in advance!
You just need to pull the text from the anchor tag inside the td with the class title:
In [15]: from bs4 import BeautifulSoup
In [16]: import requests
In [17]: url = "http://www.imdb.com/genre/action/?ref_=gnr_mn_ac_mp"
In [18]: soup = BeautifulSoup(requests.get(url).content, "lxml")
In [19]: for td in soup.select("table.results td.title"):
   ....:     print(td.a.text)
   ....:
X-Men: Apocalypse
Warcraft
Captain America: Civil War
The Do-Over
Teenage Mutant Ninja Turtles: Out of the Shadows
The Angry Birds Movie
The Nice Guys
Batman v Superman: Dawn of Justice
Suicide Squad
Deadpool
Gods of Egypt
Zootopia
13 Hours: The Secret Soldiers of Benghazi
Now You See Me 2
The Brothers Grimsby
Hardcore Henry
Monster Trucks
Independence Day: Resurgence
Star Trek Beyond
The Legend of Tarzan
Deepwater Horizon
X-Men: Days of Future Past
Star Wars: The Force Awakens
X-Men: First Class
The 5th Wave
Pretty much all the data you would want is inside the td with the title class.
So if you also want the outline, all you need is the text from span.outline:
In [24]: for td in soup.select("table.results td.title"):
   ....:     print(td.a.text)
   ....:     print(td.select_one("span.outline").text)
   ....:
X-Men: Apocalypse
With the emergence of the world's first mutant, Apocalypse, the X-Men must unite to defeat his extinction level plan.
Warcraft
The peaceful realm of Azeroth stands on the brink of war as its civilization faces a fearsome race of...
Captain America: Civil War
Political interference in the Avengers' activities causes a rift between former allies Captain America and Iron Man.
The Do-Over
Two down-on-their-luck guys decide to fake their own deaths and start over with new identities, only to find the people they're pretending to be are in even deeper trouble.
Teenage Mutant Ninja Turtles: Out of the Shadows
As Shredder joins forces with mad scientist Baxter Stockman and henchmen Bebop and Rocksteady to take over the world, the Turtles must confront an even greater nemesis: the notorious Krang.
The Angry Birds Movie
Find out why the birds are so angry. When an island populated by happy, flightless birds is visited by mysterious green piggies, it's up to three unlikely outcasts - Red, Chuck and Bomb - to figure out what the pigs are up to.
The Nice Guys
A mismatched pair of private eyes investigate the apparent suicide of a fading porn star in 1970s Los Angeles.
Batman v Superman: Dawn of Justice
Fearing that the actions of Superman are left unchecked, Batman takes on the Man of Steel, while the world wrestles with what kind of a hero it really needs.
Suicide Squad
A secret government agency recruits imprisoned supervillains to execute dangerous black ops missions in exchange for clemency.
Deadpool
A former Special Forces operative turned mercenary is subjected to a rogue experiment that leaves him with accelerated healing powers, adopting the alter ego Deadpool.
Gods of Egypt
Mortal hero Bek teams with the god Horus in an alliance against Set, the merciless god of darkness, who has usurped Egypt's throne, plunging the once peaceful and prosperous empire into chaos and conflict.
Zootopia
In a city of anthropomorphic animals, a rookie bunny cop and a cynical con artist fox must work together to uncover a conspiracy.
13 Hours: The Secret Soldiers of Benghazi
During an attack on a U.S. compound in Libya, a security team struggles to make sense out of the chaos.
Now You See Me 2
The Four Horsemen resurface and are forcibly recruited by a tech genius to pull off their most impossible heist yet.
The Brothers Grimsby
A new assignment forces a top spy to team up with his football hooligan brother.
Hardcore Henry
Henry is resurrected from death with no memory, and he must save his wife from a telekinetic warlord with a plan to bio-engineer soldiers.
Monster Trucks
Looking for any way to get away from the life and town he was born into, Tripp (Lucas Till), a high school senior...
Independence Day: Resurgence
Two decades after the first Independence Day invasion, Earth is faced with a new extra-Solar threat. But will mankind's new space defenses be enough?
Star Trek Beyond
The USS Enterprise crew explores the furthest reaches of uncharted space, where they encounter a mysterious new enemy who puts them and everything the Federation stands for to the test.
The Legend of Tarzan
Tarzan, having acclimated to life in London, is called back to his former home in the jungle to investigate the activities at a mining encampment.
Deepwater Horizon
A story set on the offshore drilling rig Deepwater Horizon, which exploded during April 2010 and created the worst oil spill in U.S. history.
X-Men: Days of Future Past
The X-Men send Wolverine to the past in a desperate effort to change history and prevent an event that results in doom for both humans and mutants.
Star Wars: The Force Awakens
Three decades after the defeat of the Galactic Empire, a new threat arises. The First Order attempts to rule the galaxy and only a ragtag group of heroes can stop them, along with the help of the Resistance.
X-Men: First Class
In 1962, the United States government enlists the help of Mutants with superhuman abilities to stop a malicious dictator who is determined to start World War III.
The 5th Wave
Four waves of increasingly deadly alien attacks have left most of Earth decimated. Cassie is on the run, desperately trying to save her younger brother.
For the runtime, use td.select_one("span.runtime").text, and so on.
Just like how you got the link by doing
info.a['href']
You can also get the title of the movie by doing
info.a['title']
Hopefully this is what you're looking for!
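Putting both answers together: info.text returns the text of the whole row, whereas reading the text from the anchor itself returns just the name, and selecting td.title first means there is no need to extract() the image link out of the tree. The markup below is a hypothetical, trimmed mock modeled on the selectors used above (table.results, td.title, span.outline, plus a title attribute on the anchor); it is not IMDb's real page, which has since changed. It uses the built-in html.parser, so lxml isn't required, and extract_films is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical mock of the results table (not IMDb's real markup).
HTML = """
<table class="results">
  <tr>
    <td class="image"><a href="/title/tt0000001/"><img alt="poster"/></a></td>
    <td class="title">
      <a href="/title/tt0000001/" title="Example Movie One">Example Movie One</a>
      <span class="year_type">(2016)</span>
      <span class="outline">An example outline.</span>
    </td>
  </tr>
  <tr>
    <td class="title">
      <a href="/title/tt0000002/" title="Example Movie Two">Example Movie Two</a>
      <span class="outline">Another example outline.</span>
    </td>
  </tr>
</table>
"""


def extract_films(html):
    """Return (name, link, outline) tuples without modifying the parse tree."""
    soup = BeautifulSoup(html, "html.parser")
    films = []
    for td in soup.select("table.results td.title"):
        anchor = td.a  # first <a> inside the title cell
        outline = td.select_one("span.outline")
        films.append((
            anchor.get_text(strip=True),  # just the anchor's text, not the whole cell
            anchor["href"],
            outline.get_text(strip=True) if outline else None,
        ))
    return films


for name, link, outline in extract_films(HTML):
    print(name, link, outline)
```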

VIN (Vehicle Identification Number): how to figure out the WMI part?

I am trying to figure out how to break down a vehicle's VIN.
There is an explanation of how a VIN is built (http://en.wikipedia.org/wiki/Vehicle_Identification_Number#Components_of_the_VIN), but it fails to explain what to do with manufacturers that only have 2 characters assigned instead of 3.
If I understand it correctly, every VIN must be 17 characters long, and the first 3 characters are the WMI (World Manufacturer Identifier).
There is also a list of WMIs on the same page, but some manufacturers have only 2 characters in that list, not 3.
How do I read such a VIN? Will it be only 16 characters long, or how do I recognize whether a WMI is 2 or 3 characters?
For example, Nissan has WMI = JN, which is only 2 characters.
Two Nissan VINs that I know are valid:
JN1UC4E26F9001391 and JNKCP0106TT541680
How can I tell whether, for these two VINs, only the first 2 characters should be read and used as the WMI?
In your examples, it is in fact 3 characters that set the WMI:
JN1UC4E26F9001391
NISSAN MOTOR COMPANY, LTD. JN1,1N4 3995 RESEARCH PARK DRIVE ANN ARBOR MI 48104 PASSENGER CAR SEE MEMO 7/29/1986
JNKCP0106TT541680
NISSAN RESEARCH & DEVELOPMENT JN1,JNK 750 17TH STREET, N.W. WASHINGTON DC 20006 PASSENGER CAR ALL 1/13/1992
1st character: identifies the country in which the vehicle was manufactured. For example: U.S.A. (1 or 4), Canada (2), Mexico (3), Japan (J), Korea (K), England (S), Germany (W), Italy (Z).
2nd character: identifies the manufacturer. For example: Audi (A), BMW (B), Buick (4), Cadillac (6), Chevrolet (1), Chrysler (C), Dodge (B), Ford (F), GM Canada (7), General Motors (G), Honda (H), Jaguar (A), Lincoln (L), Mercedes Benz (D), Mercury (M), Nissan (N), Oldsmobile (3), Pontiac (2 or 5), Plymouth (P), Saturn (8), Toyota (T), VW (V), Volvo (V).
3rd character: identifies the vehicle type or manufacturing division.
Per VIN descriptor
This link contains the most current WMI information for all manufacturers; this is where I got my data:
WMI for all manufacturers
In any case, it is not a matter of a 2- or 3-character WMI; the positions are:
1: country
2: manufacturer
3: type
This gets a little tricky, because if the 2nd character is "N", that is not necessarily Nissan; for example, 1N9 = NOMAD CUSTOM CYCLES INC.
Let me know if that helps or if you find a place to get VDS information
Your wiki page seems pretty clear that it's either a 3-character WMI, or a 2-character WMI followed by a 9.
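Following the Wikipedia article the question links: the WMI always occupies positions 1-3, and when the third character is 9 the manufacturer is a small-volume one whose identifier continues in positions 12-14. Below is a minimal sketch, assuming that rule; split_vin is a hypothetical helper that only splits a VIN into its sections, without looking up manufacturer names or validating the check digit. The 1N9 VIN in the usage is made up for illustration.

```python
import re

# VINs use digits and capital letters except I, O and Q.
VIN_RE = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")


def split_vin(vin):
    """Split a 17-character VIN into WMI, VDS and VIS (assumes the '9' rule above)."""
    vin = vin.strip().upper()
    if not VIN_RE.match(vin):
        raise ValueError(f"not a well-formed 17-character VIN: {vin!r}")
    wmi = vin[0:3]   # positions 1-3: World Manufacturer Identifier
    vds = vin[3:9]   # positions 4-9: Vehicle Descriptor Section
    vis = vin[9:17]  # positions 10-17: Vehicle Identifier Section
    manufacturer_id = wmi
    if wmi[2] == "9":
        # Small-volume maker: positions 12-14 complete the manufacturer code.
        manufacturer_id = wmi + vin[11:14]
    return {"wmi": wmi, "vds": vds, "vis": vis, "manufacturer_id": manufacturer_id}


print(split_vin("JN1UC4E26F9001391"))
print(split_vin("JNKCP0106TT541680"))
print(split_vin("1N9AB1234C5678901"))  # made-up small-manufacturer VIN
```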
There is an abundance of libraries out there, on GitHub and elsewhere, designed to decode VINs. "Do not do a thing already done."