Differentiate API responses between hamlets and streets without a number
We intend to use the Google Maps Geocoding API on our website. We want to use the Google Places Autocomplete API and convert the address into (X,Y) coordinates with the Geocoder API.
Yet, in France, some postal addresses are formatted with a hamlet or a small village and the city (and those are correct addresses) instead of the classical "number, street and city".
The geocoding response for this kind of address is:
partial_match = empty
location type = GEOMETRIC_CENTER or RANGE_INTERPOLATED
type = route
But this is the same response as a street name with no street number.
Is there any way to differentiate the API responses between "hamlets" and "street without number"?
Thanks for any insight.
[EDIT] Here is an example of my issue:
Hamlet query : http://maps.googleapis.com/maps/api/geocode/json?address=La+Croix+Fay-de-Bretagne+France&sensor=false
Street without number query : http://maps.googleapis.com/maps/api/geocode/json?address=avenue+Charles+Couchoud+Nantes+France&sensor=false
Both return the same location_type:
"location_type" : "GEOMETRIC_CENTER"
Short answer - your Hamlet example is actually resolving a street: Map Link
To answer your general question (I am guessing this also happens elsewhere, in cases where the result is not a street): the issue is how to tell how the Geocoding API determined the partial match. If you get a route, it should mean that a street was returned (with or without a number). A hamlet should show up as a sublocality or a neighborhood, if not its own locality (here in Canada this is what I have seen):
neighborhood
sublocality
*Note though that these are NOT valid mailing addresses in Canada, which makes my examples a little different than your specified criteria.
Whether the route is an exact match or a guess is where the location_type values come in:
From the API docs:
"ROOFTOP" indicates that the returned result is a precise geocode for which we have location information accurate down to street address precision.
"RANGE_INTERPOLATED" indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
"GEOMETRIC_CENTER" indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
"APPROXIMATE" indicates that the returned result is approximate.
I would expect that you would see GEOMETRIC_CENTER more often for neighborhood and sublocality and RANGE_INTERPOLATED more often for roads - but that is a guess, and I think it depends more on how much data they have on the location and how the algorithm ends up calculating the point.
When you do have a Hamlet returning as a route I do not believe there is any way to tell just from the Geocoding API returned data that this is what happened. The only solution I could think of is to handle those cases in code within your application based on a list of known problems - but I can see this being very problematic and labour intensive.
This perhaps speaks to the hamlet concept being a postal / administrative one which does not translate to mapping data at all? I tried to find La Croix here and had no luck differentiating: http://www.laposte.com/find_a_post_code/find
One more note: If the Geocoding API had resolved the Hamlet as well as the street you should have had multiple results to pick from, e.g. Yarmouth Note the types list at the end tells you the type of results returned as well if you have multiple results: "types" : [ "locality", "political" ]
Update
I needed some more examples for me and decided to look for ones which would also benefit your question. I found this list of French Hamlets. Trying a few I got these results:
Blessey: sublocality
Brétigny: sublocality
Hautacam: Either fails to be found altogether or is a park and shows as a locality
La Mongie: sublocality
Ham/Clergy: locality So looks like there may be some which come up as localities - perhaps a size threshold - many of the sublocalities are incredibly small.
I think you can safely assume that if something evaluates as a route your result is for a street and not a hameaux <-- (and to think I did poorly in French Immersion here in Canada). Also if you get a route and a sublocality in the results it is a safe bet which is which.
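The rule of thumb above (a route means a street; a sublocality, neighborhood or locality means a place such as a hamlet) can be sketched in a few lines of Python. This is only an illustration working on an already-parsed response: the function names, the PLACE_TYPES set and the sample data are my own assumptions, and the sample is hand-written, not real API output.

```python
# Result "types" that usually denote a populated place rather than a street.
# The grouping is an assumption based on the observations above.
PLACE_TYPES = {"locality", "sublocality", "neighborhood"}


def classify_result(result):
    """Return 'street', 'place' or 'other' for one Geocoding API result dict."""
    types = set(result.get("types", []))
    if "route" in types or "street_address" in types:
        return "street"
    if types & PLACE_TYPES:
        return "place"
    return "other"


def classify_response(response):
    """Classify every result in a parsed Geocoding API JSON response."""
    return [classify_result(r) for r in response.get("results", [])]


# Hand-written sample shaped like a Geocoding API response (not real output).
sample = {
    "status": "OK",
    "results": [
        {"types": ["route"],
         "geometry": {"location_type": "GEOMETRIC_CENTER"}},
        {"types": ["sublocality", "political"],
         "geometry": {"location_type": "APPROXIMATE"}},
    ],
}

print(classify_response(sample))  # ['street', 'place']
```

Note that this cannot detect a hamlet that Google has indexed as a route; it only separates the result types when both are present.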
Finally, if you know the address part you have is a sublocality and NOT a street, and wish to specify this in the address, I do not believe Google's API allows that. There are other services that do, like ESRI. I have not used them and do not know how the licensing works in detail, but it appears to have both a free and a subscription service, much like Google.
And just for completeness - I think I found La Croix using the ESRI site in the "address": "La Croix, L'Île-d'Yeu" entry from that list - downside is that it didn't recognize Fay-de-Bretagne and give a single answer.
Related
Euro <=> Dollar conversion endpoint naming convention
I have a question about a simple REST API endpoint. The endpoint can accept a value expressed in EURO and return the corresponding value in DOLLAR; conversely, it can accept a value in DOLLAR and return the value in EURO. I would like to know how I should name this endpoint to respect REST API endpoint naming conventions and best practices. So far, I have thought about:
- convert-euro-dollar (probably bad because it uses a verb)
- euro-dollar (good option?)
Thanks in advance!
I would like to know how I should name this endpoint to respect REST API endpoint naming conventions and best practices. REST doesn't care what naming conventions you use for your resource identifiers. (Hint: URL shorteners work.) See Tilkov 2014. The motivation for choosing "good" resource identifiers is much the same as the motivation for choosing "good" variable names -- the machines don't care, therefore you have extra degrees of freedom that you can use to make things easy for some people. Possible people you might want to make things easy for: folks looking at resource identifiers in their browser history, operators looking at identifiers in HTTP access logs, writers trying to document the API, etc. https://www.merriam-webster.com/dictionary/put Verbs are fine; notice that this URL works exactly the way that you and your browser expect it to, even though the identifier includes a HTTP method.
As suggested by https://stackoverflow.com/a/48692503/19060474 and https://stackoverflow.com/a/10883810/19060474, I would go with one of
GET /dollar/from-euro
GET /euro/to-dollar
GET /currency/usd/from/eur
GET /currency/eur/to/usd
as long as you stay consistent. Keep in mind that you should be able to easily deduce from the endpoint what it will likely do. So you should make clear in which direction the conversion will be performed. With euro-dollar or convert-euro-dollar this is not clearly expressed, because one cannot determine whether the endpoint expects dollar (which dollar, by the way? There are quite a few variants like USD, AUD, CAD, ...) and converts to EUR, or vice versa. I also suggest you consider using currency codes from the ISO 4217 standard to avoid ambiguity. You can find some of them at https://www.iban.com/currency-codes.
Be aware that answers to this are opinion based, because there is no REST constraint on URI design. All you need is to follow the URI standard, which tells you that the path is hierarchical and the query is non-hierarchical, and that's all. Even that part is somewhat flexible.
As for URI design conventions, I like to describe the operation first and convert it into a verb and a noun. After that I choose the HTTP method for the verb, try to describe the rest of it with a noun, attach that second noun to the first one and convert it to a URI template. So I like to name my resources with nouns.
The endpoint can accept a value expressed in EURO then returns the corresponding value in DOLLAR, conversely it can accept a value in DOLLAR and return the value in EURO.
Here the operation name would be convertEuroToDollarOrDollarToEuro. I think either we have two operations here, convertEuroToDollar and convertDollarToEuro, or we need a more general operation name, something like convertCurrency, restricted to the supported currencies, which are Euro and Dollar. Here I would either use POST /conversion to create a new conversion or GET /conversion to read the conversion result.
POST /currency/conversion {"fromCurrency": "EUR", "toCurrency": "USD", "amount": 100}
POST /currency/conversion {"fromCurrency": "USD", "toCurrency": "EUR", "amount": 100}
GET /conversion/{amount}/{fromCurrency}/to/{toCurrency}
GET /conversion/100/EUR/to/USD
GET /conversion/100/USD/to/EUR
GET /currency/conversion?from={fromCurrency}&to={toCurrency}&amount={amount}
GET /currency/conversion?from=EUR&to=USD&amount=100
GET /currency/conversion?from=USD&to=EUR&amount=100
If your service meets the HATEOAS constraint, then this kind of URI structure matters only from the service developer's perspective, because it is relatively easy to figure out the HTTP methods and URI templates for the endpoints and bind them to controller methods.
From the service consumer's or REST client's perspective, what matters here is the operation name, which is convertCurrency, and its parameters: fromCurrency, toCurrency, amount. You need to add these to the documentation and, if your actual MIME type allows it, attach the metadata to the hyperlink which represents this operation. So at least do something like:
{ method: "GET", uri: "/conversion/{amount}/{fromCurrency}/to/{toCurrency}", type: "convertCurrency" }
A more advanced solution would describe the documentation of the convertCurrency operation in a machine-readable way. For example Hydra does this: https://www.hydra-cg.com/ and maybe HAL forms can be another solution: https://rwcbook.github.io/hal-forms/ .
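To make the binding between a URI template and a controller method concrete, here is a minimal Python sketch of the GET /conversion/{amount}/{fromCurrency}/to/{toCurrency} variant. Everything in it is an assumption for illustration: the convert_currency name, the regular expression, and especially the fixed RATES table, which a real service would replace with a lookup.

```python
import re
from decimal import Decimal

# Hypothetical fixed rates for illustration only.
RATES = {
    ("EUR", "USD"): Decimal("1.10"),
    ("USD", "EUR"): Decimal("0.90"),
}

# Mirrors the URI template /conversion/{amount}/{fromCurrency}/to/{toCurrency},
# with ISO 4217 style three-letter codes.
ROUTE = re.compile(
    r"^/conversion/(?P<amount>\d+(?:\.\d+)?)"
    r"/(?P<src>[A-Z]{3})/to/(?P<dst>[A-Z]{3})$"
)


def convert_currency(path):
    """Return the converted amount for a conversion path, or None if unsupported."""
    match = ROUTE.match(path)
    if match is None:
        return None  # the path does not fit the template
    rate = RATES.get((match["src"], match["dst"]))
    if rate is None:
        return None  # currency pair not supported
    return Decimal(match["amount"]) * rate


print(convert_currency("/conversion/100/EUR/to/USD"))  # 110.00
print(convert_currency("/conversion/100/EUR/to/GBP"))  # None
```

Because the direction is part of the path, a single handler covers both EUR to USD and USD to EUR without ambiguity.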
What is the correct query parameter for "match-all" in a REST filter when there is a default?
If a REST API endpoint gets all, then filters are easy. Lack of a filter means "get all results".
GET /persons - gets all persons
GET /persons?name=john - gets all persons with name of "john"
But if there is a default, then I need some way to explicitly not set the default filter. Continuing the above example, if each person has a state, "approved" or "pending", and if my system is set such that if I do not explicitly specify a state, it will return all "approved":
GET /persons - gets all approved persons, because defaults to state=approved
GET /persons?state=approved - same thing, gets all approved persons
GET /persons?state=pending - gets all pending persons
How do I get all persons? What if there are 10 possible states? Or 100? I can think of a few ways:
GET /persons?state=any - but then I can never use the actual state any
GET /persons?state=* - would work, but feels strange? Or is it?
GET /persons?state= - some URL parsing libraries would complain about a blank query parameter, and does this not imply "state is empty" as opposed to "state is anything"?
How do I say in my GET, "override the default for the state to be anything"?
Maybe this could work for you: GET /persons?state - gets all persons that have a state name, no matter which value GET /persons?state= - gets all persons that have an empty value for the state name You probably don’t need to differentiate between these two situations, so you could use either one for getting all persons with the state name (I just think that the variant without = is more beautiful). FWIW, the application/x-www-form-urlencoded format (i.e., typically used in HTML forms) doesn’t differ between an empty and no value. As far as the URI standard is concerned, this name-value pair syntax in the query component is only a convention anyway, so you can use whichever syntax/semantics you wish.
I don't think there is one answer to this question. As long as you document well that the default state is approved, I don't think it matters to the clients whether you pass any, * etc. All of your proposals are fine except the last one. I don't think that is a good one. If I were designing the API I would use all and keep this as a standard. I would also recommend using paging for all endpoints that return a list of elements. I use offset and limit as paging query parameters. In my API I return 20 elements by default if the client hasn't specified other paging criteria.
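As a rough sketch of the "reserved sentinel" approach, the server-side resolution of the state filter could look like this in Python. The names DEFAULT_STATE, MATCH_ALL and resolve_state_filter are made up for this example; the only library call is urllib.parse.parse_qs from the standard library.

```python
from urllib.parse import parse_qs

DEFAULT_STATE = "approved"
MATCH_ALL = "all"  # reserved sentinel: must never be used as a real state


def resolve_state_filter(query_string):
    """Return the state to filter on, or None when every state should match."""
    # keep_blank_values=True keeps "state=" and a bare "state" instead of
    # silently dropping them.
    params = parse_qs(query_string, keep_blank_values=True)
    if "state" not in params:
        return DEFAULT_STATE  # no filter given: fall back to the default
    value = params["state"][0]
    if value == MATCH_ALL:
        return None  # explicit override of the default
    return value


print(resolve_state_filter(""))               # approved
print(resolve_state_filter("state=pending"))  # pending
print(resolve_state_filter("state=all"))      # None
```

With this parser a bare "state" and "state=" both come back as an empty string, which matches the remark above that form encoding does not distinguish the two; whether an empty string should mean "empty state" or "anything" is still a documentation decision.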
How to get the user country always in English?
In my app I have a system with geolocation that compares the user's country with a string in my code. Here is something similar:
if (UserLocation.country == @"Switzerland") { //country is Switzerland }
This system works fine if the user language of the device is English. Some users have reported that it isn't working and that they have the device in German or Italian. How can I force the device to give me the string of the country always in English?
It is generally not ideal to use natural language (i.e., the language we speak and write as humans) to store information that is inherently symbolic. You found a good example yourself that arises when your software travels, as it does with modern devices used world-wide. Another reason could be that even within the same language there may be several widely-used ways to refer to the same "object" or topic. Just think about America, which can be referred to as America, US, USA and probably a couple of other names, too. Or the UK, which is referred to as both the United Kingdom (UK) and Great Britain (GB). Or even for some simple object properties: Is it color or colour? I am sure that more examples can be found for other languages - especially if they are spoken in more than one country. On top of that, natural language is also prone to misspellings, typing mistakes, use of abbreviations and changes to spelling standards over time.
Hence, a better solution would be to use a language-independent method. The commonly accepted practice for country names is to use ISO country codes ("CH" for Switzerland, "DE" for Germany, etc., in case you use the 2-letter versions.) In your database (which can be as simple as a plist file) you would create a field called for instance countrycode of length 2 characters which should take a value of "US", "CH", "DE", "FR", etc. following the ISO 3166 standard. You can also use 3-letter ISO country codes, but it seems that the most commonly used is the 2-letter one. Then, for UI purposes you would keep another table, which could be a .strings file for localization with the natural language names of the countries:
US = United States;
DE = Germany;
CH = Switzerland;
/* etc. */
On the screen you can translate to natural language:
NSString *isoCountryCode = @"CH"; // Or however you want to set it
NSString *countryName = NSLocalizedString(isoCountryCode, @"Default string if country name not found.");
// Now you can set the country name in human-readable form in the UI
myCountryNameLabel.text = countryName;
In the rest of your code, i.e., the code that handles the data before showing it to the user in the UI, you only use the ISO codes to distinguish countries. Some people refer to this part of the code as the business logic, and as a simple example it could look something like this:
if ( [UserLocation.countryCode isEqualToString: @"CH"] ) {
    // Do things needed for Switzerland
}
Internationalization is an extensive topic, and you can find Apple's documentation on it here for iOS. The OS X version is here - but they will be largely identical.
I used Google Places and made a request to this URL: "https://maps.googleapis.com/maps/api/geocode/json?latlng=\(userLocation.latitude),\(userLocation.longitude)&key=\(self.googlePlacesApiKey)&language=en&result_type=country" so it always gives me the results in English, independently of the user's language or locale settings. You can change the information you receive with the result_type parameter. Full guide here: https://developers.google.com/maps/documentation/geocoding/intro#Viewports
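Once such a response has been fetched and parsed, comparing on the country code rather than the name avoids the language problem entirely. Below is a small Python sketch of pulling the code out of the parsed JSON. The country_code helper and the sample dictionary are my own illustration (the sample is hand-written, not real output), and it assumes the country component's short_name carries the two-letter code, as it normally does.

```python
def country_code(response):
    """Return the country short_name from a parsed geocoding response, or None."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            # The country entry is the component whose types include "country".
            if "country" in component.get("types", []):
                return component.get("short_name")
    return None


# Hand-written sample shaped like a response to a result_type=country request.
sample = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {
                    "long_name": "Switzerland",
                    "short_name": "CH",
                    "types": ["country", "political"],
                }
            ],
            "types": ["country", "political"],
        }
    ],
}

if country_code(sample) == "CH":
    print("country is Switzerland")
```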
How to change Geo information to real name
Is there a way to convert Geo information to real place name? For example, I take a photo on "137-159 New Montgomery St San Francisco, CA 94105", the Geotagging information is : geo:37.786971,-122.399677 when I type the geo in google map, it can show the place name to me. Does google provide API to get it? Thanks in advance.
Yes, the Google Maps API GeoCoder GClientGeocoder.getLocations() provides reverse geocoding as well: The term geocoding generally refers to translating a human-readable address into a point on the map. The process of doing the converse, translating a point into a human-readable address, is known as reverse geocoding. The GClientGeocoder.getLocations() method supports both standard and reverse geocoding. If you pass this method a GLatLng object instead of a String address, the geocoder will perform a reverse lookup and return a structured JSON object of the closest addressable location. Note that the closest addressable location may be some distance from the original latitude and longitude values of the query, if the supplied GLatLng is not an exact match for any addressable locations.
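For a geotag stored as a geo: string, the first step is to split it into latitude and longitude. The Python sketch below does that and builds a reverse geocoding request URL. Note the assumptions: it targets the Geocoding web service and its latlng parameter rather than the GClientGeocoder JavaScript class quoted above, the helper names are invented, and YOUR_KEY is a placeholder.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def parse_geo_uri(uri):
    """Split a 'geo:lat,lng' string into a (lat, lng) tuple of floats."""
    scheme, _, rest = uri.partition(":")
    if scheme != "geo":
        raise ValueError("not a geo URI: %r" % uri)
    # Drop any ';' parameters, then ignore an optional third (altitude) value.
    coords = rest.split(";")[0].split(",")
    return float(coords[0]), float(coords[1])


def reverse_geocode_url(uri, api_key):
    """Build a reverse geocoding request URL for a geo: string."""
    lat, lng = parse_geo_uri(uri)
    query = urlencode({"latlng": "%s,%s" % (lat, lng), "key": api_key})
    return GEOCODE_URL + "?" + query


print(parse_geo_uri("geo:37.786971,-122.399677"))
print(reverse_geocode_url("geo:37.786971,-122.399677", "YOUR_KEY"))
```

The formatted address of the first result in the response is then the "real name" of the place.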
Is there a service that can tell whether a given name is Male or Female? [duplicate]
I am looking for a library or database that can provide guesses about whether a person is male or female based on his or her name or nickname. Something like john => "M", mary => "F", alex => "A", #ambiguous I am looking for something that supports names other than English names (such as Japanese, Indian, etc.). Before I get another answer along the lines of "you are going to offend people by assuming their sex/gender" let me be clear, my application does not interact with anyone. It does not send emails or contact anyone in any way. There are no users to ask. In many cases, the person in question is dead, and the only information I have is name, birth date, and date of death. The reason I want to know the sex of the individual is to make the grammar of the output nicer and to aid in possible searches that may come later.
gender.c is an open source C program that does a good job. It comes with data for 44568 first names from all around the world. There is good documentation and a description of the file format (basically plain text) so it should not be too difficult to read it from your own application. Here is what the author says: A few words on quality of data The dictionary of first names has been prepared with utmost care. For example, the Turkish, Indian and Korean names in this dictionary have all been independently classified by several native speakers. I also took special care to list only those names which can currently be found. The lesson from this? Any modifications should be done very cautiously (and they must also adhere to the sorting required by the search algorithm). For example, knowing that "Sascha" is a boy's name in Germany, the author never assumed the English "Sasha" to be a girl's name. Knowing that "Jan" is a boy's name in Germany, I never assumed it to be also a English short form of "Janet". Another case in point is the name "Esra". This is a boy's name in Germany, but a girl's name in Turkey. The program calculates a probability for the name being male or female. It can do so with the name alone as input, or with the name and country of origin, which gives significantly better results. You can download it from the website of the German computer magazine c't 40 000 Namen. The article is in German but don't worry, all documentation is in English. Here is the direct ftp link 0717-182.zip if you are not interested in the article. The zip file contains the source code, a Windows executable, the database and the documentation.
The gender of a name is something that cannot be inferred programmatically in the general case. You need a name database. Here is a free name database from the US Census Bureau. EDIT: The link to the 2010 name data is dead, but there are working links and libraries in the comments.
"I tell ya, life ain't easy for a boy named 'Sue.'" ...So, why make it any harder? If you need to know the sex, just ask... Otherwise, don't worry about it.
I've built a free API that gives a probabilistic guess on the gender based on a first name. Instead of using any of the above-mentioned approaches, I use a huge dataset of profiles from social networks to provide a probabilistic guess along with a certainty factor. It also supports optional filtering by country or language IDs. It's getting better by the day as more profiles are added to the dataset. It's free to use at http://genderize.io
ONE thing you should consider is using a tool that takes demographics into account, as naming conventions rely heavily on this. Example:
http://api.genderize.io?name=kim
{"name":"kim","gender":"female","probability":"0.89","count":1440}
http://api.genderize.io?name=kim&country_id=dk
{"name":"kim","gender":"male","probability":"0.95","count":44,"country_id":"dk"}
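To map such a response onto the M / F / A scheme from the question, one option is to treat anything below a chosen probability as ambiguous. A minimal Python sketch using the two example payloads above follows; the classify function and the 0.9 threshold are arbitrary choices of mine, not part of the API.

```python
import json


def classify(payload, threshold=0.9):
    """Map a genderize.io-style JSON payload to 'M', 'F' or 'A' (ambiguous)."""
    data = json.loads(payload)
    gender = data.get("gender")
    # The examples above give probability as a string, so convert it;
    # a missing or null value counts as 0.
    probability = float(data.get("probability") or 0)
    if gender is None or probability < threshold:
        return "A"
    return "M" if gender == "male" else "F"


worldwide = '{"name":"kim","gender":"female","probability":"0.89","count":1440}'
denmark = ('{"name":"kim","gender":"male","probability":"0.95",'
           '"count":44,"country_id":"dk"}')

print(classify(worldwide))  # A  (0.89 is below the 0.9 threshold)
print(classify(denmark))    # M
```

The count field could be used in the same way to reject guesses based on too few profiles.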
Here are two oddball approaches that may not even work, and likely wouldn't work en masse without violating the terms of a license: Use the Facebook API (which I know virtually nothing about, it may not even be possible) to perform two searches: one for FB male users with that first name, and one for female. Use the two numbers to decide the probability of gender. Much looser but more scalable, use the Google API and search for the name plus the gender-specific pronouns, and compare the numbers. For instance, there are 592,000,000 results for searching for "Richard his" (not as a phrase), but only 179,000,000 for "Richard her".
Given your stated constraints, your best option is to re-phrase whatever it is you're writing to be gender-neutral unless you know what gender they want to be called in each instance. If writing in English, remember that singular “they” is grammatically fine as a gender-neutral third-person singular pronoun. A good example is the title of this question. As is currently: … mapping a person's name to his or her sex? That would be less awkward if written: … mapping a person's name to their sex?
It's also poor practice to assume that users must be male or female. There are a small but significant number of "intersex" people, most of whom are heartily sick of not having a box to tick.. bignose: interesting on the "singular they". I didn't realize it had such a long history.
It's not a service, but a little app with a database: http://www.codeproject.com/KB/cpp/genderizer.aspx And this tool is in german: http://www.faq-o-matic.net/2011/06/01/zu-einem-vornamen-das-geschlecht-finden/ And another one in VB: http://www.vbarchiv.net/tipps/tipp_1925-geschlecht-anhand-des-vornamens-ermitteln.html I think in combination with some "Most used firstname in 2011" lists you should be able to build something decent.
The python package SexMachine will do that for you. Given any first name it returns if it's male, female or unisex. It relies on the data from the gender.c program by Jorg Michael.
The only thing you'll get from trying to automate it is a bunch of unhappy users. From that census data: JAMES, JOHN, ROBERT, MICHAEL, WILLIAM, DAVID, RICHARD, CHARLES, JOSEPH, THOMAS, CHRISTOPHER, DANIEL, PAUL, MARK, DONALD, GEORGE, KENNETH, STEVEN, EDWARD, BRIAN, RONALD, ANTHONY, KEVIN, JASON, MATTHEW, GARY, TIMOTHY, JOSE, LARRY, JEFFREY, FRANK, SCOTT, ERIC, STEPHEN, ANDREW, RAYMOND, GREGORY, JOSHUA, JERRY, DENNIS, WALTER, PATRICK, PETER, HAROLD, HENRY, CARL, ARTHUR, RYAN, JOE, JUAN, JACK, ALBERT, JUSTIN, TERRY, GERALD, KEITH, SAMUEL, WILLIE, LAWRENCE, ROY, BRANDON, ADAM, FRED, BILLY, LOUIS, JEREMY, AARON, RANDY, EUGENE, CARLOS, RUSSELL, BOBBY, VICTOR, MARTIN, JESSE, SHAWN, CLARENCE, SEAN, CHRIS, JOHNNY, JIMMY, ANTONIO, TONY, LUIS, MIKE, DALE, CURTIS, NORMAN, ALLEN, GLENN, TRAVIS, LEE, MELVIN, KYLE, FRANCIS, JESUS, RAY, JOEL, EDDIE, TROY, ALEXANDER, MARIO, FRANCISCO, MICHEAL, OSCAR, JAY, ALEX, JON, RONNIE, TOMMY, LEON, LEO, WESLEY, DEAN, DAN, LEWIS, COREY, MAURICE, VERNON, ROBERTO, CLYDE, SHANE, SAM, LESTER, CHARLIE, TYLER, GENE, BRETT, ANGEL, LESLIE, CECIL, ANDRE, ELMER, GABRIEL, MITCHELL, ADRIAN, KARL, CORY, CLAUDE, JAMIE, JESSIE, CHRISTIAN, LONNIE, CODY, JULIO, KELLY, JIMMIE, JORDAN, JAIME, CASEY, JOHNNIE, SIDNEY, JULIAN, DARYL, VIRGIL, MARSHALL, PERRY, MARION, TRACY, RENE, FREDDIE, AUSTIN, JACKIE, JOEY, EVAN, DANA, DONNIE, SHANNON, ANGELO, SHAUN, LYNN, CAMERON, BLAKE, KERRY, JEAN, IRA, RUDY, BENNIE, ROBIN, LOREN, NOEL, DEVIN, KIM, GUADALUPE, CARROLL, SAMMY, MARTY, TAYLOR, ELLIS, DALLAS, LAURENCE, DREW, JODY, FRANKIE, PAT, MERLE, TERRELL, DARNELL, TOMMIE, TOBY, VAN, COURTNEY, JAN, CARY, SANTOS, AUBREY, MORGAN, LOUIE, STACY, MICAH, BILLIE, LOGAN, DEMETRIUS, ROBBIE, KENDALL, ROYCE, MICKEY, DEVON, ASHLEY, CAREY, SON, MARLIN, ALI, SAMMIE, MICHEL, RORY, KRIS, AVERY, ALEXIS, GERRY, STACEY, CARMEN, SHELBY, RICKIE, BOBBIE, OLLIE, DENNY, DION, ODELL, MARY, COLBY, HOLLIS, KIRBY, CRUZ, MERRILL, LANE, CLEO, BLAIR, NUMBERS, CLAIR, BERNIE, JOAN, DOMINIQUE, TRISTAN, JAME, 
GALE, LAVERNE, ALVA, STEVIE, ERIN, AUGUSTINE, YOUNG, JOHNIE, ARIEL, DUSTY, LINDSEY, TRACEY, SCOTTIE, SANDY, SYDNEY, GAIL, DORIAN, LAVERN, REFUGIO, IVORY, ANDREA, SANG, DEON, CAROL, YONG, BERRY, TRINIDAD, SHIRLEY, MARIA, CHANG, ROSARIO, DANNIE, FRANCES, THANH, CONNIE, TORY, LUPE, DEE, SUNG, CHI, QUINN, MINH, THEO, LOU, CHUNG, VALENTINE, JAMEY, WHITNEY, SOL, CHONG, PARIS, OTHA, LACY, DONG, ANTONIA, KELLEY, CARROL, SHAYNE, VAL, JUDE, BRITT, HONG, LEIGH, GAYLE, JAE, NICKY, LESLEY, MAN, KASEY, JEWELL, PATRICIA, LAUREN, ELISHA, MICHAL, LINDSAY, and JEWEL are all names that work for both males and females. If a girl's name is Robert and everyone, including your software, keeps on calling her a man, she'd be rather pissed.
Although databases are probably the most practical solution, if you want to have some fun maybe you could try writing a neural net (or using a neural net library) that takes in the name and outputs one of those 3 options (F,M,A). You could train it using the datasets that exist in the databases suggested by other answers, as well as with any other data you have. This solution would allow you to handle names not specifically categorised previously, and also handle different languages. You might want to pass the language (if you know it) as an input to the neural net as well. I don't know that I can say neural nets (or any other machine learning) would do a good job of categorising though.
It's culture/region dependent: take Andrea, which for Italians is only masculine, while in Sweden it is a female name and Andreas is for men; Shawn is ambiguous in English. If a language has declension, like Latin or Russian, the final letters will change according to grammatical rules. Another source of ambiguity is family names identical to personal names. In my opinion it's impossible to solve in general.
The idea will clearly not work in most languages. However, if you could tell the nationality beforehand you could have more luck. In most Slavic languages (e.g. Russian, Polish, Bulgarian) you could safely assume that all surnames ending with -va, -cha, -ska (-a in general) are feminine, while -v, -ch, -shi are masculine. In fact any surname has a feminine and a masculine form depending on the ending. The same names used in other countries (e.g. US) might use only the masculine form though. The same could be said for first names (-a, -ya are feminine) but it is not 100% accurate. But in general you would hardly get a library that is sufficiently accurate.
I haven't used it, but IBM has a Global Name Analytics library (for a price!) that seems pretty comprehensive.
The Z Directory (at vettrasoft.com) has a C-language function that works something like so:
void func()
{
    char c = z_guess_sex_byfirstname ("Lon");
    switch(c)
    {
        case 'M': std::cout << "It's a boy!\n"; break;
        case 'F': std::cout << "It's a girl!\n"; break;
        case 'B': std::cout << "this name is for both sexes\n"; break;
        case '?': std::cout << "sex unknown sorry\n"; break;
    }
}
It's database driven; the table has something like 10,000+ names I think, but you need to download and install the Z Directory (which includes many other topo items like countries, geographical landmarks, airports, states, area codes, postal-zip codes, etc. along with C++ functions and objects to access the data). However, the names are very English-language oriented. The table is a work in progress and is gradually updated.
Name-gender maps can work but in multicultural countries it's more like guessing. I can give you one example: Marian in Polish is a typical masculine name, whereas the same name in Great Britain is a female name. In the era of people immigrating all over the world, I'm not sure such database would be very accurate. Good luck!
Some cultures have unisex names - like mine. What do you do then? I think the answer is plain and simple - don't assume - you could cause offence. Just ask if it's needed, otherwise use gender neutrality.
Well, not anymore. IBM patented that idea a while ago. So if you're looking for any level of flexibility (something other than a list of names), you'll either have to (gasp!) ask the user, or simply pay IBM for the rights :) In any case, such autodetection is annoying for many people who have gender-ambiguous names, or even just mean parents. Let's not make this any harder for them.
It's not free, but this is a nice library that I have used before: NetGender for .NET allows you to quickly and easily build Name Verification, Parsing and Gender Determination into your custom applications. Accurately verify whether a particular field contains a valid individual or company. NetGender uses a 100,000+, ethnically diverse, Name Dictionary in combination with an 8,000+ Company Name Dictionary to ensure precise gender determination. http://www.softwarecompany.com/dotnet/netgender.htm
It's interesting that you say you have birth date. That could help. I've seen databases of histories of name popularity. In the film Splash (1984), it was funny that Daryl Hannah's character chooses the name "Madison" from a Madison Avenue street sign, because obviously "Madison" is not a girl's name. 24 years later, Madison is the 4th most popular name for girl babies! Name history from the gov't. (Check out Mary's sad decline through the last 100 years.) When I wrote to the White House as a child, Richard Nixon (or, perhaps a secretary) responded to me with some photos of the historic place, addressed to "Miss Rhett Anderson." "Miss Rhett?" It doesn't even make sense! Can we REALLY not tell the difference between Clark Gable's Rhett (with a mustache, in Gone With The Wind!) and Vivien Leigh's Scarlett? I shall never forgive him, despite Neil Young's assurance that "even Richard Nixon has got soul."
I'm pretty sure no such service could exist with an acceptable level of accuracy. Here are the problems which I think are insurmountable: There are plenty of names which are for both men and women. There's a lot of different names in this world, even if you only consider one country. There is the "A Boy Named Sue" issue, raised so eloquently by Johnny Cash :-)
Check out http://genderchecker.com/
You can have a look at my Python gender detection project https://github.com/muatik/genderizer It tries to detect authors' genders by looking at their names and/or sample text of theirs (for example tweets). It also supports MongoDB and memcached for performance.
This is not really a programming problem - it comes down to getting a probability table. AFAIK there are no public databases in distilled forms. You could either build this from census data, or buy the data from someone. For example, this is someone who sells the probability table for Canada.
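If you go the build-it-yourself route, distilling raw counts into a probability table is only a few lines. Here is a Python sketch under stated assumptions: the input is a list of (name, sex, count) records, which is my own simplification of what census-style files contain, and the counts below are made up, not real figures.

```python
from collections import defaultdict


def build_table(records):
    """Turn (name, sex, count) records into a {NAME: P(male)} table."""
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for name, sex, count in records:
        totals[name.upper()][sex] += count
    return {
        name: counts["M"] / (counts["M"] + counts["F"])
        for name, counts in totals.items()
        if counts["M"] + counts["F"] > 0  # skip names with no observations
    }


# Made-up counts for illustration, not real census figures.
records = [
    ("John", "M", 990), ("John", "F", 10),
    ("Mary", "F", 500),
    ("Kim", "M", 40), ("Kim", "F", 60),
]

table = build_table(records)
print(table["JOHN"], table["MARY"], table["KIM"])  # 0.99 0.0 0.4
```

Names whose probability sits near 0.5 are the ones to report as ambiguous.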
IMHO, it is generally a bad idea to determine sex from an individual's name. A lot of names are intersexual (good grief, is this even a word ?? :-), and also they may be one sex in one culture and another in another. A few stupid examples, just a few that came to mind (from my part of the world, CE):
Vanja - female, in eastern countries from here, mostly male
Alex - intersex (short for Sandra, female, and Sandro, male)
Robin - in western cultures, can be both
In some parts of the world, a person's sex can be determined by looking at how the name ends. For example, Marija, Sandra, Ivana, Petra, Sara, Lucija, Ana - you can see that most of these female names end in "ja" or "ra". There are other examples as well. Still, I think it's better just to ask the user for sex.
Got this from hacker news discussion about this
I know of no such service. You can perhaps find the data you are looking for, however. The US government publishes data about the prevalence of names and the gender of the person they're attached to. The Social Security Administration has such a page, and the census may as well, but I haven't taken the time to look. Perhaps other world governments do similar things.
I know of no such service, however .. you could start with a raw list of person names or guess gender according to some rules (e.g. -o => male, -ela, -a => female). In some countries (e.g. Germany) the name a person can be given is limited by law - maybe there are some publications concerning that matter which could be harvested (but I don't know of any at the moment).
What I would do is make a hack which takes the name and searches for it through the Facebook API, then looks at the resulting users and counts how many of them are female or male. You can then return a percentage. Not so insurmountable anymore. :)
Just ask people, and if they are nice they will give you their 'M's or 'F's , and if they are not then give'em an 'A' .