Is there a publicly available list of the US States in machine readable form? - sql

Where can I find a list of the US States in a form for importing into my database?
SQL would be ideal, otherwise CSV or some other flat file format is fine.
Edit: Complete with the two letter state codes

I needed this a few weeks ago and put it on my blog as SQL and tab-delimited text. The data was sourced from Wikipedia in early January, so it should be up to date.
US States: http://www.john.geek.nz/index.php/2009/01/sql-tips-list-of-us-states/
I use the World's Simplest Code Generator if I need to add columns or remove some of the fields - http://secretgeek.net/wscg.asp
I've also done Countries of the world and International Dialling Codes too.
Countries: http://www.john.geek.nz/index.php/2009/01/sql-tips-list-of-countries/
IDCs: http://www.john.geek.nz/index.php/2009/01/sql-tips-list-of-international-dialling-codes-idcs/
Edit: New: Towns and cities of New Zealand

Depending on why you need the states, it is worth keeping in mind that there are more than 50 valid state codes. For someone deployed outside the USA, it is annoying to come across websites that do not allow address entry with perfectly valid state codes like AE and AP. A better resource would be USPS.
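As a rough illustration of that point, here is a minimal Python sketch of an address-form check that accepts the military and territory codes as well as the 50 states. The function name is_valid_state_code and the grouping into sets are assumptions made for this example; the codes themselves are the two-letter USPS abbreviations.

```python
# Two-letter USPS abbreviations, grouped by category.
# The names STATES, TERRITORIES, MILITARY and is_valid_state_code are
# hypothetical, chosen only for this sketch.
STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)
DISTRICT = {"DC"}
TERRITORIES = {"AS", "FM", "GU", "MH", "MP", "PW", "PR", "VI"}
MILITARY = {"AA", "AE", "AP"}  # Armed Forces Americas / Europe etc. / Pacific

VALID_CODES = STATES | DISTRICT | TERRITORIES | MILITARY


def is_valid_state_code(code: str) -> bool:
    """Return True if code is one of the abbreviations listed above."""
    return code.strip().upper() in VALID_CODES


if __name__ == "__main__":
    for sample in ("CA", "ae", "AP", "ZZ"):
        print(sample, is_valid_state_code(sample))
```

Validating against the union of all four sets, rather than only the 50 states, is what keeps addresses such as APO/FPO ones from being rejected.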

Cut and paste these into Notepad and then import them. It should be easy enough, since there are only 50 after all:
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming

Out of interest: As there are only 50 and they rarely change, couldn't you just manually create such a list from a source and put it on a public webspace?

In response to @cspoe7's astute observation, here is a query with all valid states and their abbreviations according to USPS. I have them sorted here by category (official US states, District of Columbia, US territories, military "states") and then alphabetically.
INSERT INTO State (Name, Abbreviation)
VALUES
('Alabama','AL'), -- States
('Alaska','AK'),
('Arizona','AZ'),
('Arkansas','AR'),
('California','CA'),
('Colorado','CO'),
('Connecticut','CT'),
('Delaware','DE'),
('Florida','FL'),
('Georgia','GA'),
('Hawaii','HI'),
('Idaho','ID'),
('Illinois','IL'),
('Indiana','IN'),
('Iowa','IA'),
('Kansas','KS'),
('Kentucky','KY'),
('Louisiana','LA'),
('Maine','ME'),
('Maryland','MD'),
('Massachusetts','MA'),
('Michigan','MI'),
('Minnesota','MN'),
('Mississippi','MS'),
('Missouri','MO'),
('Montana','MT'),
('Nebraska','NE'),
('Nevada','NV'),
('New Hampshire','NH'),
('New Jersey','NJ'),
('New Mexico','NM'),
('New York','NY'),
('North Carolina','NC'),
('North Dakota','ND'),
('Ohio','OH'),
('Oklahoma','OK'),
('Oregon','OR'),
('Pennsylvania','PA'),
('Rhode Island','RI'),
('South Carolina','SC'),
('South Dakota','SD'),
('Tennessee','TN'),
('Texas','TX'),
('Utah','UT'),
('Vermont','VT'),
('Virginia','VA'),
('Washington','WA'),
('West Virginia','WV'),
('Wisconsin','WI'),
('Wyoming','WY'),
('District of Columbia','DC'),
('American Samoa','AS'), -- Territories
('Federated States of Micronesia','FM'),
('Guam','GU'),
('Marshall Islands','MH'),
('Northern Mariana Islands','MP'),
('Palau','PW'),
('Puerto Rico','PR'),
('Virgin Islands','VI'),
('Armed Forces Africa','AE'), -- Armed Forces
('Armed Forces Americas','AA'),
('Armed Forces Canada','AE'),
('Armed Forces Europe','AE'),
('Armed Forces Middle East','AE'),
('Armed Forces Pacific','AP');
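Since the question also asks for CSV, here is a minimal Python sketch of loading rows like these into a table and exporting them as a flat file. It uses an in-memory SQLite database purely for illustration; the column types, the helper names build_state_table and export_csv, and the four-row subset are assumptions, not part of the original answer. Note that Abbreviation cannot be declared unique here, because several Armed Forces entries share the code AE.

```python
import csv
import io
import sqlite3

# A small subset of the rows above; the full list would be inserted the same way.
ROWS = [
    ("Alabama", "AL"),
    ("District of Columbia", "DC"),
    ("Puerto Rico", "PR"),
    ("Armed Forces Pacific", "AP"),
]


def build_state_table(conn, rows):
    """Create the State table and insert (Name, Abbreviation) pairs."""
    conn.execute(
        "CREATE TABLE State (Name TEXT NOT NULL, Abbreviation TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO State (Name, Abbreviation) VALUES (?, ?)", rows
    )


def export_csv(conn):
    """Return the State table as CSV text with a header row, sorted by name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", "Abbreviation"])
    writer.writerows(
        conn.execute("SELECT Name, Abbreviation FROM State ORDER BY Name")
    )
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
build_state_table(conn, ROWS)
csv_text = export_csv(conn)
print(csv_text)
```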

If you need to memorize them, let Wakko help you :)

You can download a lot of lists on http://www.freebase.com/ .

http://www.geonames.org/export/
The GeoNames geographical database is available for download free of charge under a Creative Commons Attribution license. It contains over eight million geographical names and consists of 6.5 million unique features, of which 2.2 million are populated places and 1.8 million are alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.
The data is accessible free of charge through a number of webservices and a daily database export.

You could use Google Sets to make a list of all states, as well as lists of more or less anything.

If you only need a SQL Server script with 52 rows (the 50 states plus the District of Columbia and Puerto Rico), you can use the following query:
INSERT INTO
States ( StateName )
VALUES
( 'Alabama'),
( 'Alaska'),
( 'Arizona'),
( 'Arkansas'),
( 'California'),
( 'Colorado'),
( 'Connecticut'),
( 'Delaware'),
( 'District of Columbia'),
( 'Florida'),
( 'Georgia'),
( 'Hawaii'),
( 'Idaho'),
( 'Illinois'),
( 'Indiana'),
( 'Iowa'),
( 'Kansas'),
( 'Kentucky'),
( 'Louisiana'),
( 'Maine'),
( 'Maryland'),
( 'Massachusetts'),
( 'Michigan'),
( 'Minnesota'),
( 'Mississippi'),
( 'Missouri'),
( 'Montana'),
( 'Nebraska'),
( 'Nevada'),
( 'New Hampshire'),
( 'New Jersey'),
( 'New Mexico'),
( 'New York'),
( 'North Carolina'),
( 'North Dakota'),
( 'Ohio'),
( 'Oklahoma'),
( 'Oregon'),
( 'Pennsylvania'),
( 'Puerto Rico'),
( 'Rhode Island'),
( 'South Carolina'),
( 'South Dakota'),
( 'Tennessee'),
( 'Texas'),
( 'Utah'),
( 'Vermont'),
( 'Virginia'),
( 'Washington'),
( 'West Virginia'),
( 'Wisconsin'),
( 'Wyoming');

I'm just gonna put this list of the United States in bash/Linux format (lowercase, no spaces, pipe-delimited) here so I can save someone some time:
alabama|alaska|arizona|arkansas|california|colorado|connecticut|delaware|florida|georgia|hawaii|idaho|illinois|indiana|iowa|kansas|kentucky|louisiana|maine|maryland|massachusetts|michigan|minnesota|mississippi|missouri|montana|nebraska|nevada|newhampshire|newjersey|newmexico|newyork|northcarolina|northdakota|ohio|oklahoma|oregon|pennsylvania|rhodeisland|southcarolina|southdakota|tennessee|texas|utah|vermont|virginia|washington|westvirginia|wisconsin|wyoming
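A pipe-delimited string like this can be used directly as a regular-expression alternation or split into a list. Below is a minimal Python sketch of both uses. It is only an illustration: STATE_PATTERN holds a shortened copy of the string above (the full string would be pasted in its place), and the helper names normalize and is_state are hypothetical.

```python
import re

# Shortened copy of the pipe-delimited list above; an assumption for brevity.
STATE_PATTERN = "alabama|alaska|newyork|northcarolina|wyoming"

# Use 1: split into a plain list of names.
STATE_NAMES = STATE_PATTERN.split("|")

# Use 2: compile as an alternation; fullmatch() is used below so that
# partial names such as "new" do not match.
STATE_RE = re.compile(f"(?:{STATE_PATTERN})")


def normalize(name: str) -> str:
    """Lower-case and drop whitespace so 'New York' becomes 'newyork'."""
    return re.sub(r"\s+", "", name).lower()


def is_state(name: str) -> bool:
    """Return True if the normalized name is in the pipe-delimited list."""
    return STATE_RE.fullmatch(normalize(name)) is not None


if __name__ == "__main__":
    for sample in ("New York", "north carolina", "new", "Ontario"):
        print(sample, is_state(sample))
```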

Related

Best way to add info/description to my items?

I made a geo game a while back where the player has to guess an item from an image (what I call an item is basically a SQL row). For example, the bot sends the flag of the Netherlands and you have to type "Netherlands" to win.
Items can be the flag of a country, a capital city, a French department...
I made an info tab that gives information about an item (i.e. region, former name, capital city, etc.).
What I would like to do is store this information properly. I don't know whether I should store it in files such as JSON, because I would also like to give stats (win rate per region, number of games played per region, etc.).
Also, the attributes are not fixed: some items have regions, capital cities and so on, and some don't.
Item examples:
(For a flag)
Column        Attribute
ID            1
Name          United Kingdom
Former name   United Kingdom of Great Britain and Northern Ireland
Code          GB
Continent     Europe
Subregion     Northern Europe
Capital city  London
...
(For a U.S. State)
Column        Attribute
ID            1
Name          Arizona
Capital city  Phoenix
Largest city  Phoenix
...
Neither solution (adding every attribute as a column, or storing JSON) is the proper way.
I think the best design is to have a key-value table.
CREATE TABLE tableName (ID INT, [Key] SYSNAME, [Value] NVARCHAR(MAX)) -- any text type works for [Value]
And the data will look like:
ID  Key           Value
1   Name          Arizona
1   Capital City  Phoenix
1   Largest City  Phoenix
2   Name          United Kingdom
2   Former name   United Kingdom of Great Britain and Northern Ireland
Most valuable benefit: no storage is wasted on columns that would be NULL for a large share of the rows.
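To make the key-value idea concrete, here is a minimal Python sketch using an in-memory SQLite database. It is an illustration under assumptions, not the answerer's code: the table name item_attribute, the column names, and the helper load_item are hypothetical, and SQLite stands in for whatever database the game actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (item, attribute) pair; items only get rows for the
# attributes they actually have, so nothing is stored as NULL.
conn.execute(
    """
    CREATE TABLE item_attribute (
        item_id    INTEGER NOT NULL,
        attr_key   TEXT    NOT NULL,
        attr_value TEXT    NOT NULL,
        PRIMARY KEY (item_id, attr_key)
    )
    """
)

conn.executemany(
    "INSERT INTO item_attribute VALUES (?, ?, ?)",
    [
        (1, "Name", "Arizona"),
        (1, "Capital City", "Phoenix"),
        (1, "Largest City", "Phoenix"),
        (2, "Name", "United Kingdom"),
        (2, "Former name",
         "United Kingdom of Great Britain and Northern Ireland"),
    ],
)


def load_item(conn, item_id):
    """Collect all attributes of one item into a dict for the info tab."""
    cur = conn.execute(
        "SELECT attr_key, attr_value FROM item_attribute WHERE item_id = ?",
        (item_id,),
    )
    return dict(cur.fetchall())


print(load_item(conn, 1))
print(load_item(conn, 2))
```

The composite primary key keeps each attribute unique per item. Statistics such as win rate per region would still live in their own tables keyed by item ID.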

Updating names in a column with slight differences

I have a column where there are slight differences in each name, but I want to use only one of the names. I have been using an update query, but is there an easier way than running update queries one by one? Examples below. There are multiple versions of each name and I only want to keep the first one.
Name
NORTH HOSP
NORTH HOSPITAL
NORTHERN W HOSP
NORTHERN WEST HOSPITAL
NORTHERN W HOSPITAL
BROTHERS MED CTR
BROTHERS MEDICAL CENTER
HEALTHALLY HOSP
HEALTH ALLY HOSPITAL
Current code example:
update #tablename
set name='NORTH HOSP'
where name in ('NORTH HOSPITAL')
update #tablename
set name='HEALTHALLY HOSP'
where name in ('HEALTH ALLY HOSPITAL')

Figure out average for multiple subsets of rows

I have a csv file with data on store sales for each province, including the store ID. I've already figured out how to get a list of the provinces with the most sales, and a list of the stores with the most sales, but now I need to calculate: 1) The average store sales for each province and 2) The best-selling store in each province and then 3) The difference between them. The data looks like this:
>>> store_sales
                               sales
store_num province
1396      ONTARIO          223705.21
1891      ONTARIO           71506.85
4823      MANITOBA         114692.70
4861      MANITOBA            257.69
6905      ONTARIO           19713.24
6973      ONTARIO          336392.25
7104      BRITISH COLUMBIA  32233.31
7125      BRITISH COLUMBIA  11873.71
7167      BRITISH COLUMBIA  87488.70
7175      BRITISH COLUMBIA  14096.53
7194      BRITISH COLUMBIA   6327.60
7238      ALBERTA            1958.75
7247      ALBERTA            6231.31
7269      ALBERTA             451.56
7296      ALBERTA          184410.04
7317      SASKATCHEWAN      43491.55
8142      ONTARIO          429871.74
8161      ONTARIO            6479.71
9604      ONTARIO           20823.49
9609      ONTARIO             148.02
9802      ALBERTA           54101.00
9807      ALBERTA          543703.84
I was able to get there by using the following:
import pandas as pd
df = pd.read_csv('/path/to/sales.csv')
store_sales = df.groupby(['store_num', 'province']).agg({'sales': 'sum'})
I think 3) is probably pretty simple but for 1) I'm not sure how to apply an average to subsets of specific rows (I imagine it involves using 'groupby') and for 2) although I was able to generate a list of the best-selling stores, I'm uncertain as to how I could display a single top store for each province (although something tells me it should be simpler than it seems).
For (1), group the per-store totals in store_sales by province and take the mean:
store_sales.groupby("province")["sales"].mean()
For (2), apply a different function to the same groupby:
store_sales.groupby("province")["sales"].max()
For (3), the difference is (2) minus (1):
store_sales.groupby("province")["sales"].max() - store_sales.groupby("province")["sales"].mean()
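The three steps can be put together in one small runnable sketch. The figures are a subset of the store totals shown in the question; the column names avg_store_sales, best_store_sales, difference and best_store_num are hypothetical, and using idxmax to recover which store is the best seller is an addition beyond the answer above, which only returns the top sales value.

```python
import pandas as pd

# Subset of the per-store totals from the question (one row per store).
store_sales = pd.DataFrame(
    {
        "store_num": [4823, 4861, 7104, 7125, 7167, 7175, 7194, 7317],
        "province": [
            "MANITOBA", "MANITOBA",
            "BRITISH COLUMBIA", "BRITISH COLUMBIA", "BRITISH COLUMBIA",
            "BRITISH COLUMBIA", "BRITISH COLUMBIA",
            "SASKATCHEWAN",
        ],
        "sales": [
            114692.70, 257.69,
            32233.31, 11873.71, 87488.70, 14096.53, 6327.60,
            43491.55,
        ],
    }
)

by_province = store_sales.groupby("province")["sales"]

summary = pd.DataFrame(
    {
        "avg_store_sales": by_province.mean(),   # (1) average per province
        "best_store_sales": by_province.max(),   # (2) top store's sales
    }
)
# (3) gap between the best store and the provincial average
summary["difference"] = summary["best_store_sales"] - summary["avg_store_sales"]

# idxmax gives the row label of the top store in each province, which lets
# us look up the store number itself.
best_rows = store_sales.loc[by_province.idxmax()]
summary["best_store_num"] = best_rows.set_index("province")["store_num"]

print(summary)
```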

Select top three records grouping by two factors

I am trying to identify the three records with the highest values grouped by two factors. I realize this question is similar to "PostgreSQL: select top three in each group", but I can't figure out how to generalize from that example, which includes a single factor, to two factors. I have searched Stack Overflow for an answer beyond the one listed above and can't find one, but perhaps I'm not searching for the correct terms.
Briefly, I'm connecting to a table with the following schema
city, country, value
I only have a single row per city/country combination, but the number of city entries per country varies. For example, I have a few dozen cities for Canada, a hundred for the United States, but only two for Uzbekistan.
What I want as output is a table with the same schema, but containing only the rows with the three highest values among the cities within each country. For example, if Canada has the cities and values of
{Canada, toronto, 100}, {Canada, vancouver, 80},
{Canada, montreal,112}, {Canada, calgary, 109},
{Canada, edmonton, 76}, {Canada, winnipeg, 73},
and the United States has the entries of
{{us, nyc, 104}, {us, chicago, 87},
{us, boston, 98}, {us, seattle, 105},
{us, sanfran, 88}, {us, minneapolis, 84},
{us, miami, 103}, {us, houston, 112},
{us, dallas, 78}, {us, tucson, 83}}
and Uzbekistan has the entries of
{uzbekistan, qarshi, 95}, {uzbekistan, gluiston, 101}
What I would like as output would be
Canada, montreal, 112
Canada, calgary, 109
Canada, toronto, 100
us, houston, 112
us, seattle, 105
us, nyc, 104
uzbekistan, gluiston, 101
uzbekistan, qarshi, 95
I've tried the following query
SELECT logincity, logincountry, VAL
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY logincountry, logincity ORDER BY
val DESC) AS Row_ID
FROM a_table)
WHERE Row_ID < 4
ORDER BY logincity
But I end up with more than three cities per country.
Can someone help me out?
Thanks Stack Overflow!
I think you only need partition by logincountry
SELECT logincity, logincountry, VAL
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY logincountry
ORDER BY val DESC) AS Row_ID
FROM a_table ) T
WHERE Row_ID < 4
ORDER BY logincity
TIP: You will probably spot the problem if you include Row_ID in the SELECT:
SELECT logincity, logincountry, VAL, Row_ID
In your query every Row_ID is 1, because each (country, city) partition contains a single row.
TIP 2: You want the top 3 cities for each country, so the only partition you need is the country. The linked question is therefore the right answer: top 3 of each group, where the group in this case is the country.
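To check the corrected query against the sample data from the question, here is a minimal Python sketch that runs it on an in-memory SQLite database. SQLite is only a stand-in for the asker's database and needs version 3.25 or later for window functions; the country and city spellings are normalized to lower case for the example, and the final ORDER BY is changed to country and value so the result is easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a_table (logincity TEXT, logincountry TEXT, val INTEGER)"
)

# Sample rows from the question as (city, country, value).
rows = [
    ("toronto", "canada", 100), ("vancouver", "canada", 80),
    ("montreal", "canada", 112), ("calgary", "canada", 109),
    ("edmonton", "canada", 76), ("winnipeg", "canada", 73),
    ("nyc", "us", 104), ("chicago", "us", 87),
    ("boston", "us", 98), ("seattle", "us", 105),
    ("sanfran", "us", 88), ("minneapolis", "us", 84),
    ("miami", "us", 103), ("houston", "us", 112),
    ("dallas", "us", 78), ("tucson", "us", 83),
    ("qarshi", "uzbekistan", 95), ("gluiston", "uzbekistan", 101),
]
conn.executemany("INSERT INTO a_table VALUES (?, ?, ?)", rows)

# Partition by country only, so row numbers restart per country and
# count cities from the highest value down.
TOP3_SQL = """
SELECT logincountry, logincity, val
FROM (
    SELECT logincity, logincountry, val,
           ROW_NUMBER() OVER (PARTITION BY logincountry
                              ORDER BY val DESC) AS row_id
    FROM a_table
) AS t
WHERE row_id < 4
ORDER BY logincountry, val DESC
"""

top3 = conn.execute(TOP3_SQL).fetchall()
for row in top3:
    print(row)
```

Countries with fewer than three cities, such as Uzbekistan here, simply return all the rows they have.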

Joining multiple fields between the same tables

I have a table called 'Resources' that looks like this:
Country         City     Street       Headcount
UK              Halifax  High Street  20
United Kingdom  Oxford   High Street  30
Canada          Halifax  North St     40
Because of the nature of the location fields, I need to map them to a single 'Address' field, and so I also have the following table called 'Addresses':
Country         City     Street       Address
UK              Halifax  High Street  High Street, Halifax, UK
Canada          Halifax  North St     North Street, Halifax, Canada
United Kingdom  Oxford   High Street  High Street, Oxford, UK
(In reality the Address field does add information rather than just combining what is already there.)
I am currently using the following SQL to produce the query:
SELECT Resources.Country, Resources.City, Resources.Street, Addresses.Address,
Resources.Headcount
FROM Resources
INNER JOIN Addresses ON Resources.Country = Addresses.Country
AND Resources.City = Addresses.City
AND Resources.Street = Addresses.Street
This works for me, but I am worried because I have not seen people use this many ANDs in a single join elsewhere, so I don't know if it is a bad idea. (This is a simplified version; I may need up to 8 ANDs in a single join in another case.) Is this the best way to approach the problem, or is there a better solution?
Thanks
Joining on multiple columns is fine. You don't have to "fear" this.
As for "a better way": I would suggest creating some table variables, putting some sample data in them, and posting that T-SQL (DDL and DML) here. Then you can get some possible alternatives. At present, the "is there a better way" part of your question is too vague to answer.
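In that spirit, here is a minimal, self-contained Python sketch that recreates the two sample tables in an in-memory SQLite database and runs the three-column join from the question. SQLite stands in for the asker's database, and the composite primary key on Addresses is an assumption added for the example: it guarantees at most one address per (Country, City, Street), so the join cannot duplicate Headcount rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Resources (
        Country TEXT, City TEXT, Street TEXT, Headcount INTEGER
    );
    CREATE TABLE Addresses (
        Country TEXT, City TEXT, Street TEXT, Address TEXT,
        PRIMARY KEY (Country, City, Street)  -- assumption: one address per key
    );
    """
)

conn.executemany(
    "INSERT INTO Resources VALUES (?, ?, ?, ?)",
    [
        ("UK", "Halifax", "High Street", 20),
        ("United Kingdom", "Oxford", "High Street", 30),
        ("Canada", "Halifax", "North St", 40),
    ],
)
conn.executemany(
    "INSERT INTO Addresses VALUES (?, ?, ?, ?)",
    [
        ("UK", "Halifax", "High Street", "High Street, Halifax, UK"),
        ("Canada", "Halifax", "North St", "North Street, Halifax, Canada"),
        ("United Kingdom", "Oxford", "High Street", "High Street, Oxford, UK"),
    ],
)

# The join from the question: all three columns must match.
JOIN_SQL = """
SELECT r.Country, r.City, r.Street, a.Address, r.Headcount
FROM Resources AS r
INNER JOIN Addresses AS a
        ON r.Country = a.Country
       AND r.City    = a.City
       AND r.Street  = a.Street
ORDER BY r.Headcount
"""

joined = conn.execute(JOIN_SQL).fetchall()
for row in joined:
    print(row)
```

Matching on all three columns is what keeps Halifax, UK apart from Halifax, Canada; dropping any one of the conditions would pair rows that belong to different places.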