How to use http://www.census.gov API to pull data - api

I am trying to query data from http://www.census.gov using their API.
I want to get the population of a particular city in the US by using the city name and the US state code.
Given that I already have the key, what other parameters do I add to the URL below to get the population?
http://api.census.gov/data/2010/sf1?key=<my key>
Any assistance will be greatly appreciated.

Judging from your query URI, you wish to access population data from the 2010 Census Summary File. You would add the GET parameters get and for to your query. Example:
http://api.census.gov/data/2010/sf1?key=b48301d897146e8f8efd9bef3c6eb1fcb864cf&get=P0010001&for=state:06
The population tables, as given in the get parameter, are identified with a "P", and you can use the for parameter to further narrow down your scope. Examples of valid criteria formatted as URIs can be found here...
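As a rough sketch of how the get and for parameters fit together, the query URL could be assembled like this in Python. The helper name build_census_url and the placeholder key YOUR_KEY are made up for illustration; only the endpoint, the variable P0010001 and the state:06 filter come from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.census.gov/data/2010/sf1"

def build_census_url(api_key, variables, geography):
    """Return a query URL for the 2010 SF1 endpoint (hypothetical helper).

    variables: names for the `get` parameter, e.g. ["P0010001"]
    geography: value for the `for` parameter, e.g. "state:06"
    """
    params = {
        "key": api_key,
        "get": ",".join(variables),
        "for": geography,
    }
    # safe=":,*" leaves the colon, comma and wildcard unescaped so the URL stays readable
    return BASE_URL + "?" + urlencode(params, safe=":,*")

url = build_census_url("YOUR_KEY", ["P0010001"], "state:06")
print(url)
```

Swapping the geography argument is how you would narrow or widen the scope; the valid values are whatever the API's own examples list.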
EDIT: It seems that for a finer-grained search such as cities, you're going to need to use the government's cumbersome FIPS (Federal Information Processing Standard) codes (after converting lat/lon regions to their coding system)... I've found this resource that should be helpful, specifically points 5 through 7, but it seems mega complex...
Another alternative I found is the USA Today census API. It seems that they mirror the data from the census, and they do have endpoints available with data granularity at the city level... Check it out here...

There is no need to use the API; the data is available as CSV here: http://www.census.gov/popest/data/cities/totals/2012/SUB-EST2012.html

Related

How to deal with missing data? Info will be used for data visualization

How does everyone deal with missing values in a dataframe? I created a dataframe by using a Census Web API to get the data. The 'GTCBSA' variable provides the city information, which I need for my visualization (Plotly and Dash), and I found that there are a lot of missing values in the data. Do I just leave them blank and continue with my data visualization? The following is my variable:
Example data for 2004 = https://api.census.gov/data/2004/cps/basic/jun?get=GTCBSA,PEFNTVTY&for=state:*
Variable description = https://api.census.gov/data/2022/cps/basic/jan/variables/GTCBSA.json
There are different ways of dealing with missing data depending on the use case and the type of data that is missing. For example, for a near-continuous stream of timeseries signals data with some missing values, you can attempt to fill the missing values based on nearby values by performing some type of interpolation (linear interpolation, for example).
However, in your case, the missing values are cities and the rows are all independent (each row is a different respondent). As far as I can tell, you don't have any way to reasonably infer the city for the rows where the city is missing so you'll have to drop these rows from consideration.
I am not an expert in the data collection method(s) used by the US census, but from this source, it seems like there are multiple methods used so I can see how it might be possible that the city of the respondent isn't known (the online tool might not be able to obtain the city of the respondent, or perhaps the respondent declined to state their city). Missing data is a very common issue.
However, before dropping all of the rows with missing cities, you might do a brief check to see if there is any pattern (e.g. are the rows with missing cities predominantly from one state?). If you are doing any state-level analysis, you could keep the rows with missing cities.
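A minimal sketch of that pattern check, using only the standard library. The sample rows are made up, but they follow the shape the Census API returns (a header row followed by data rows). Which values count as "missing" is an assumption here (MISSING_CODES), so check the variable description for the real code.

```python
from collections import Counter

# Made-up sample in the API's response shape: header row first, then data rows.
sample = [
    ["GTCBSA", "PEFNTVTY", "state"],
    ["31080", "57", "06"],
    ["0", "57", "06"],
    ["0", "303", "06"],
    ["35620", "57", "36"],
    ["0", "57", "36"],
]

# Assumption: these values mean "city not identified".
MISSING_CODES = {None, "", "0"}

def missing_share_by_state(rows, city_col="GTCBSA", state_col="state"):
    """Return {state: fraction of rows whose city value is missing}."""
    header, data = rows[0], rows[1:]
    city_idx = header.index(city_col)
    state_idx = header.index(state_col)
    total = Counter()
    missing = Counter()
    for row in data:
        state = row[state_idx]
        total[state] += 1
        if row[city_idx] in MISSING_CODES:
            missing[state] += 1
    return {state: missing[state] / total[state] for state in total}

shares = missing_share_by_state(sample)
print(shares)

# Rows that can still be used for city-level charts.
with_city = [row for row in sample[1:] if row[0] not in MISSING_CODES]
print(len(with_city), "rows have a city")
```

If one state shows a much higher share than the others, dropping the rows would skew any comparison between states, which is worth knowing before you plot.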

Sorting the response from the Foursquare Places API re:two word name?

We are trying to query the Foursquare API for a two-word name:
Cava Grill in Gaithersburg, MD
We are trying this via:
https://api.foursquare.com/v2/venues/search?intent=checkin&query=cava%20grill&near=gaithersburg,%20md&limit=1&oauth_token=SEB14NBLGO4HMFTOXQX0JZTSVGM41ENNKE0X1RXHCI5XP3P5&v=20150420
(don't worry ... this is the public API key from the FS page)
Two odd behaviors:
Even though we are explicitly searching for the Cava Grill in Gaithersburg, MD ... the Bethesda, MD one comes up first in the results (odd, why??)
Chipotle Mexican Grill shows up in this result set ... we suppose because of the word "Grill"
So ...
a. anyone know why the Bethesda one would show up higher in the result set? (Should we just narrow the radius tighter?)
b. anyone know if we can look for the "entire query" vs. each word in the query?
Results are queried and sorted differently based on your intent. If you're looking for a specific venue, I suggest changing your intent from checkin to match. Browse may also be a good choice, depending on future search params.
Here's the nutshell on the intents:
intent=checkin returns a list of venues where the user is most likely located
intent=browse returns a list of most relevant venues for a requested region, not biased by distance from a central point.
intent=match returns a single result that, with high confidence, is the corresponding foursquare venue for the query-based request
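To make it easy to compare the intents, here is a small sketch that builds the same search URL with a different intent value. The function name build_search_url and the placeholder token are made up; the endpoint and parameter names are the ones from the question's URL.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(query, near, intent, oauth_token, version="20150420"):
    """Build a venue search URL for the given intent (hypothetical helper)."""
    params = {
        "intent": intent,
        "query": query,
        "near": near,
        "oauth_token": oauth_token,
        "v": version,
    }
    # urlencode escapes spaces and commas in the query and near values
    return SEARCH_URL + "?" + urlencode(params)

for intent in ("checkin", "browse", "match"):
    print(build_search_url("cava grill", "gaithersburg, md", intent, "YOUR_TOKEN"))
```

Running the three URLs side by side should show how differently the results are ranked for the same query.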
I hope this helps

Venue Search API Inaccurate Using "Near" Instead of "LL"

When I search venues via API I get different results than the foursquare website. For example I'm looking for a venue named "Nopalito" near "San Francisco, CA". I'm under the impression this should return relevant matches:
https://api.foursquare.com/v2/venues/search?query=Nopalito&intent=match&near=San%20Francisco%2C%20CA
I only receive one result for a venue named "Invocation". However, when I run a similar query via foursquare.com website I get what I'd expect:
https://foursquare.com/search?tab=venueResults&q=Nopalito&lat=&lng=&near=San+Francisco%2Cca&source=q
The website search yields two venues named Nopalito in San Francisco, CA.
Seems like a very basic query with a limited number of potential results. What's up? Am I missing something obvious here?
At first glance, I'd suggest dropping intent=match, since it makes for a very restrictive query. The purpose of intent=match (from https://developer.foursquare.com/docs/venues/search ) is described as:
"Finds venues that are are nearly-exact matches for the given query and ll. This is helpful when trying to correlate an existing place database with foursquare's. It is highly sensitive to the provided location. The results will be sorted best match first, taking distance and spelling mistakes/variations into account."
I'd recommend intent=browse for this type of query.

Adding custom data to GapMinder

Does anyone have any experience adding their own data to GapMinder, the really cool software that Hans Rosling uses in his TED talks? I have an array of objects in JSON that would be easy to show in moving bubbles. This would be really cool.
I can see that my Ubuntu box has what looks like data in /opt/Gapminder Desktop/share/assets/graphs/world, but I would need to figure out:
How to add a measure to a graph
How to add a data series
How to set the time range of the data
Identify the measures to follow at each time step
and so on.
Just for the record: if you want to use Gapminder with your own dataset, you have to convert your data into a format suitable for Gapminder. More specifically, looking in assets/graphs/world, you will have to:
Edit the file overview.xml, which contains the tree structure of all the indicators (just copy/paste an entry and specify your own data);
Convert your data copying the structure of the xml files in that directory (this is the tricky part): you can specify some metadata in the preamble, and then specify your own data series, with something like:
<t1 m="i20,50.0,99.0,1992" d="90.0, ... ,50.0, ..."/> where i20 is the country id, which is followed by the minima and maxima of the series, and the year it refers to.
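A rough sketch of generating such an element from a list of values. This only mirrors the single example above; the format is not documented anywhere I know of, so the comma separator inside d, the number formatting, and the reading of the year as the first year of the series are all assumptions. make_series_element is a made-up name.

```python
from xml.sax.saxutils import quoteattr

def make_series_element(country_id, first_year, values):
    """Build a <t1 .../> element: m holds id, min, max, year; d holds the values.

    Assumption: values are listed one per year, starting at first_year.
    """
    minimum, maximum = min(values), max(values)
    meta = f"{country_id},{minimum},{maximum},{first_year}"
    data = ",".join(str(v) for v in values)
    # quoteattr adds the surrounding quotes and escapes XML special characters
    return f"<t1 m={quoteattr(meta)} d={quoteattr(data)}/>"

print(make_series_element("i20", 1992, [90.0, 75.5, 50.0, 99.0]))
```

Comparing the output against an existing file in the same directory before loading it into Gapminder would be a sensible sanity check.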
In my humble opinion, Gapminder is a great app but it definitely needs more work on integration with other datasets. Way better to use Google Motion Chart as you did, or MooGraph (site and doc), which is unfortunately not as great as Gapminder.
@Stefano
The information you provided is very valuable. Is a detailed specification of the XML files containing the data available somewhere?
Anyway, just to enrich your response, I also found that:
overview.xml file
The link between Nations and their IDs is in this file
The structure of the menus for the selection of the indicators is also in the same file (at the bottom) under the section <indicatorCategorization>
The structure of the datafile XML
For each line, the year represents the first year of the series, and then the values follow, one per year, comma separated.
Grazie,
Max
I ended up using the Google Motion Chart API. I ended up with this.

Which parts of an address should be required?

Say I am storing addresses in a DB table, in this fairly common break down:
address_street_line_1,
address_street_line_2,
address_city,
address_state,
address_zip,
address_country_id
(Note: I have read the questions on splitting down further, street type, house number, etc. and for this application I think it would unnecessarily complicate things.)
To work best with international users, which of these fields should NOT be required?
I'm thinking this:
address_street_line_1 REQUIRED
address_city REQUIRED
address_country_id REQUIRED
Should I require state or zip?
Thanks!
Xavier
You can probably only require one field: country.
But what you should really be doing is making the logic dependent on country. Take a look at Address Formats by Country for a comprehensive list. That isn't just about required fields either. It's also about correct formatting. A US address might be:
8031 Main Street
Springfield OH 12345
USA
whereas in Switzerland:
Bodenstr. 173
8043 Zürich
Schweiz
Note: the street numbers and post codes are in the "reverse" order for Switzerland (compared to what English speaking countries use).
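To illustrate what "logic dependent on country" could look like for formatting, here is a minimal sketch covering just the two layouts above. The function name and the style codes are made up, and a real implementation would need many more layouts than these two.

```python
def format_address(street, city, postal_code, country_name, style, state=""):
    """Format an address in one of two illustrative layouts: "US" or "CH"."""
    if style == "US":
        # City, state and zip share a line, with the zip last
        locality = " ".join(part for part in (city, state, postal_code) if part)
    elif style == "CH":
        # Post code comes before the city
        locality = f"{postal_code} {city}"
    else:
        raise ValueError(f"no layout defined for {style!r}")
    return "\n".join([street, locality, country_name])

print(format_address("8031 Main Street", "Springfield", "12345", "USA", "US", state="OH"))
print(format_address("Bodenstr. 173", "Zürich", "8043", "Schweiz", "CH"))
```

The street line is passed through as typed, so the house number order within it stays whatever the user entered.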
Also, your data types need to be broad enough to cover data used in other countries. Zip/post code should absolutely not be a numeric type. For example, "EC2R 8AH" is a valid UK postcode.
That goes back to this principle: if you don't perform arithmetic on it, it's not a numeric type. It's text.
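A quick illustration of what goes wrong with a numeric type. The codes are only sample values, and as_number is a throwaway helper for the demo.

```python
postcodes = ["02134", "EC2R 8AH", "8043"]

def as_number(code):
    """Try to store a post code as an integer; return None if it cannot be parsed."""
    try:
        return int(code)
    except ValueError:
        return None

for code in postcodes:
    print(repr(code), "->", as_number(code))
```

The first code loses its leading zero, the second cannot be stored at all, and only the third survives unchanged.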
Also, try not to call it Zip Code to end users. That's a US-only term. Pretty much everywhere else it's called a Postcode, Post code or Postal Code. Also note that UK postal codes are alphanumeric and include a space.
Not all countries even use postal codes, for example they were rarely used in New Zealand prior to 2006 or so. I think Ireland doesn't use them at all.
If you're truly international, city-states such as Singapore don't actually need a City field.
In the user interface, you can (and perhaps should) make the postcode required for countries where you already know it's required, since that isn't likely to change. And, if you make the UI dynamic enough, you can call it "Zip code" if the selected country is the United States, "Postal code" for Canada, "Postcode" for the UK, etc.
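A small sketch of that kind of lookup. The labels come from the paragraph above; the set of countries where the field is treated as required is an assumption for illustration, and the names are made up.

```python
# Labels per ISO country code, taken from the examples above.
POSTCODE_LABELS = {"US": "Zip code", "CA": "Postal code", "GB": "Postcode"}

# Assumption: countries where we choose to make the field mandatory.
POSTCODE_REQUIRED = {"US", "CA", "GB"}

def postcode_field(country_code):
    """Return the label and required flag for the post code input."""
    return {
        "label": POSTCODE_LABELS.get(country_code, "Postal code"),
        "required": country_code in POSTCODE_REQUIRED,
    }

print(postcode_field("US"))
print(postcode_field("NZ"))
```

Unknown countries fall back to a neutral label and an optional field, which errs on the side of not blocking the user.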
How about making none required? If the user wants to be contacted they'll enter enough information. Or, enter a single text field and let them enter free form information. They know better than you what fields are required for postal deliveries to make it to their door.
I would say everything except street_line_2 and state. Also, think of 'zip' as a postal code rather than a zip code: as you can tell from the variety of formats across countries, this field should have a pretty open format.
Even in the U.S., most of the address is not required. A large fraction of U.S. zip codes are allocated to various businesses and organizations - any mail to one of those zips will be delivered the same regardless of the rest of the address. For instance:
General Electric
Schenectady, NY 12345
Internal Revenue Service
Ogden, UT 84201-0027
The city and state are nice, but the mail will probably get delivered without.
The best way that I have found to solve this problem is by abstracting the logic in your application layer, and not the persistence layer. One of the cleanest/simplest ways I've seen this done is by passing the user's data in a value object (creating a common interface that's easy to validate against) to a validator with the current country code, which makes sure all the required attributes are set properly in the value object for that locale. Assuming it passes validation, pass the value object along to the persistence side of your application for storage.
The key here is the value object - you're creating a common interface that multiple pieces of your application can talk to, validate, and read/write from. You can then also use that same value object when displaying the address: have your persistence layer get the information, put it in the value object, pass it to a factory with the current locale which returns the desired address format, and send that output to the front end.
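A minimal sketch of the value object plus validator idea, assuming a frozen dataclass as the value object. The per-country rules in REQUIRED_FIELDS are hypothetical and only there to show the mechanism; the class and function names are made up as well.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Address:
    """Value object shared by validation, persistence and display code."""
    street_line_1: Optional[str] = None
    street_line_2: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postal_code: Optional[str] = None
    country: Optional[str] = None

# Hypothetical rules for illustration only; real rules need research per country.
REQUIRED_FIELDS = {
    "US": ("street_line_1", "city", "state", "postal_code"),
    "NZ": ("street_line_1", "city"),
    "SG": ("street_line_1", "postal_code"),
}
DEFAULT_REQUIRED = ("street_line_1",)

def missing_fields(address, country_code):
    """Return the required field names that are empty for this country."""
    required = REQUIRED_FIELDS.get(country_code, DEFAULT_REQUIRED)
    return [name for name in required if not (getattr(address, name) or "").strip()]

nz = Address(street_line_1="1 Example Road", city="Wellington", country="NZ")
print(missing_fields(nz, "NZ"))
print(missing_fields(nz, "US"))
```

An empty list means the address can be handed to the persistence side; anything else can be reported back to the user field by field.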
There are no states in New Zealand, so it should definitely be optional. So I think you have the right answer in your question.
If you are not going to do any specific lookup, like searching by postal code or by city, I'd say combine the whole address into a single field. This way you will support the different address formats of different countries.
You will also support address oddities.
If you fear that the requirements are going to change, you could store the address as an XML field. Modern databases like SQL Server 2005 and 2008 can have an index on an XML node inside an XML column as long as you are using a schema.
It all comes down to requirements. If the client needs to group the data inside a grid by country, then you need a country column.
Making fields required is always a tradeoff. If the person doesn't want to fill in the info then they won't -- they'll put in a period, or garbage to get past the "required field" nanny.
I only require street_address_1 in my apps. Also, for the US and many countries, you can buy the mapping between the postal/zip code and the canonical city/state. It's not expensive. (The mapping between individual street addresses and zip is much more expensive.)
For the US, see http://www.usps.com/ncsc/addressinfo/citystate.htm
If you're including an Ajax web interface, ask for the country first, then the post code. If in the US, then use Ajax to fetch and fill in the city/state for the user from the zip.
Some non-US countries, e.g. the UK, can have 3 lines of street address if you're asking people to fill in their "preferred address". E.g.:
Mirassou
High Street
Old Town
City, Bucks postal_code
(You can register a building's name with the post office as an alternative to its street number.)
Larry
Actually, city isn't even required in the US.
Many people have rural addresses on state and county roads. Publication 28 at the postal service web site has details. Different companies end up using the "city" field to store other information. This also applies to military base addresses.
Publication 28 link