How to fix 'relation "us_lex" does not exist' using standardize_address? - sql

I'm trying to parse a column of ~175,000 street names. Some of them are just one word (Jamaicaway), and some are multiple words (St. Edwards Pl). I want just the street body (Jamaicaway and St. Edwards, respectively).
I found the address_standardizer package and installed it, but when I run the example below I get the error relation "us_lex" does not exist.
SELECT house_num, name, suftype, city, country, state, unit
FROM standardize_address('us_lex', 'us_gaz', 'us_rules',
'One Devonshire Place, PH 301, Boston, MA 02109');
I'd expect to get back just "Devonshire," but I'm getting the error instead. There doesn't seem to be much about this on the package page. Any insight?

You also have to run:
CREATE EXTENSION address_standardizer_data_us;
This creates the us_lex, us_gaz, and us_rules data tables that the standardize_address call refers to.

Related

How to display only searched string in column in postgresql

I want to display only the searched string from a table. For example, this is my table:
Table name: guidelines
id content
1 An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
2 There were a lot of people getting accused on this particular town.
If I use a search query to look for "accused", it returns the full content:
SELECT content FROM "guidelines" WHERE "content" ILIKE '%accused%';
Result:
content
An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
There were a lot of people getting accused on this particular town.
How can I get only the first matching string followed by the data after it in the column? For example, this is my goal:
content
Accused, often as “the accused”, refers to th...
accused on this particular to...
Update: I changed the table and column names to make them easier to tell apart.
In PostgreSQL, you can do that with the position function and the substring function. See the following query as an example:
SELECT
id,
substring(content, position ('accused' in content)) as matched
FROM
guidelines
WHERE
content LIKE '%accused%'
Try this:
SELECT substring(content from '%#"accused%#"%' for '#') from guidelines;
Each #" marks a boundary of the part to return: # is the escape character declared by the trailing for '#', and it has to be followed by a double quote.
The pattern must match the whole string, and the function returns whatever matches between the two markers. In this case that is accused followed by the rest of the string.
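Neither query above shortens the result the way the goal output does (a fixed-length excerpt ending in "..."). As a rough illustration of that logic outside the database, here is a minimal Python sketch. The function name snippet and the width parameter are made up for this example, and the match is case-insensitive, like ILIKE in the question.

```python
import re
from typing import Optional


def snippet(text: str, term: str, width: int = 30) -> Optional[str]:
    """Return the text from the first match of `term` onward, cut to `width` characters.

    Hypothetical helper for illustration only; it is not part of PostgreSQL.
    """
    # re.escape makes the search term literal; IGNORECASE mimics ILIKE.
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None  # row would be filtered out by the WHERE clause
    tail = text[match.start():]
    # Append "..." only when something was actually cut off.
    return tail if len(tail) <= width else tail[:width] + "..."


row = "There were a lot of people getting accused on this particular town."
print(snippet(row, "accused", width=26))
```

The same idea in SQL would be a substring call with an explicit length on top of the position lookup.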

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times but still here I am. Data processing location seems consistent (dataset, US; query: US) and I am using backticks & long format in the FROM clause
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is: why does the first query work while the second one doesn't?
You need to understand the different parts of the name inside the backticks:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
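Therefore, the shorter format you are looking for is austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare). Note that a two-part name is resolved inside your current project, so this short form only works when the query's default project is bigquery-public-data; otherwise keep the full three-part name.
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare, which is not the case. That is why the error reports the dataset as glassy-droplet-347618:bigquery-public-data.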
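To make the name resolution concrete, here is a small Python sketch of how a backticked path splits into project, dataset, and table. It is a simplified model written for this answer, not BigQuery's actual parser or client API; the function resolve_table_path is hypothetical, and project IDs that themselves contain dots are ignored.

```python
def resolve_table_path(path: str, default_project: str) -> dict:
    """Split a backticked table path into project, dataset, and table.

    Simplified illustration: a three-part name is project.dataset.table,
    a two-part name is dataset.table inside the default project.
    """
    parts = path.strip("`").split(".")
    if len(parts) == 3:
        project, dataset, table = parts
    elif len(parts) == 2:
        project = default_project
        dataset, table = parts
    else:
        raise ValueError("expected dataset.table or project.dataset.table")
    return {"project": project, "dataset": dataset, "table": table}


# The failing query from the question, run from the asker's own project:
print(resolve_table_path("`bigquery-public-data.austin_bikeshare`",
                         "glassy-droplet-347618"))
```

Under this model the two-part name from the failing query yields the dataset bigquery-public-data in project glassy-droplet-347618, which is the dataset the error message says was not found.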

(Neo4j) Carry variable over subsequent queries

I am trying to carry over a variable through 2 subsequent queries. It seems like WITH only helps carry over the variable to the next query, but not any before that. Suggestions?
This is an example of what I am trying to do:
Person nodes contain information on publishers, writers and editors (e.g. name, gender, etc.)
Story nodes contain data on Story (e.g. title, publish date, etc.)
IN relationships have categories: created, edited, published.
Return editor-publishers who have edited stories published by another editor-publisher (assume no duplicate Person names):
1. Find all Persons who have edited at least one story and have also published at least one story.
2. Find the list of stories published by the editor-publishers from step 1.
3. Among all editors of the stories from step 2, return those who are also in the list from step 1.
MATCH (EditorPublisher:Person)-[:IN{category: "published"}]->(:Story) // 1
WHERE (EditorPublisher:Person)-[:IN{category: "edited"}]->(:Story)
WITH COLLECT(EditorPublisher.name) as EditorPublisher_list
MATCH (EditorPublisher_stories:Story)<-[:IN{category: "published"}]-(publisher:Person) // 2
WHERE publisher.name in EditorPublisher_list
WITH EditorPublisher_list // throws error EditorPublisher_list variable not found
WITH COLLECT(EditorPublisher_stories.title) as EditorPublisher_stories_list
MATCH (epe:Person)-[contribution:PLAYED]-(eps:Movie) // 3
WHERE epe.name in EditorPublisher_list
AND eps IN EditorPublisher_stories_list
RETURN epe.name
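Never mind, I got it to work. WITH does keep the variables if I don't rename them; a variable that is not listed in a WITH clause is dropped, so it has to be repeated in every WITH that comes before its last use.
I just had to do WITH return.nodes, and refer to return.nodes in the subsequent queries instead of using IN [return.nodes.list].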

Matching an element in a column, to others in the same column

I have columns taken from Excel as a dataframe. The columns are as follows:
HolidayTourProvider|Packages|Meals|Accommodation|LocalTravelVehicle|Cancellationfee
HolidayTourProvider holds a handful of company names.
The features offered in each package (Meals, Accommodation, etc.) are mostly the same across providers, even though one company may call its package "Saver" and another "Budget". Most columns hold Yes/No values, except LocalTravelVehicle, which holds car names (Ford Taurus, Jeep Cherokee, etc.), and Cancellationfee, which holds integers.
I need to write a function like
match(HolidayTP,Package)
where the user can give input like
match(AdventureLife, Luxury)
Then I need to return all the packages from other Holiday Tour Providers that have features similar to Luxury, no matter what name they give the package ('Semi Lux', 'Comfort', etc.).
I want to keep a counter of matching features for each package and display all the packages whose count reaches 3 or 4.
This is my first Python code, and I am stuck here.
fb is the total df I exported to
def mapHol(HTP, PACKAGE):
    mfb = (fb['HTP']== HTP)&(fb['package']== package)
    B = fb[mfb]
    for i in fb[i]:
        for j in B[j]:
            if fb[i]==B[j]:
                count+=1
I don't know how to proceed. Please help me; this is my first major project, and I started it on my own.
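One possible way to structure the counting described above, as a hedged sketch rather than a finished solution: it works on plain dictionaries (with pandas, fb.to_dict("records") should give rows in this shape), uses the column names from the header line above, and the sample rows and the name similar_packages are invented for illustration.

```python
# Feature columns compared between packages (names taken from the header line).
FEATURES = ["Meals", "Accommodation", "LocalTravelVehicle", "Cancellationfee"]

# Hypothetical sample data, only to make the sketch runnable.
rows = [
    {"HolidayTourProvider": "AdventureLife", "Packages": "Luxury", "Meals": "Yes",
     "Accommodation": "Yes", "LocalTravelVehicle": "Jeep Cherokee", "Cancellationfee": 100},
    {"HolidayTourProvider": "SunTrips", "Packages": "Comfort", "Meals": "Yes",
     "Accommodation": "Yes", "LocalTravelVehicle": "Jeep Cherokee", "Cancellationfee": 150},
    {"HolidayTourProvider": "SunTrips", "Packages": "Budget", "Meals": "No",
     "Accommodation": "No", "LocalTravelVehicle": "Ford Taurus", "Cancellationfee": 50},
    {"HolidayTourProvider": "GoFar", "Packages": "Semi Lux", "Meals": "Yes",
     "Accommodation": "Yes", "LocalTravelVehicle": "Ford Taurus", "Cancellationfee": 100},
]


def similar_packages(rows, provider, package, min_matches=3):
    """Return (provider, package, match_count) for other providers' similar packages."""
    # Find the reference package chosen by the user.
    target = next((r for r in rows
                   if r["HolidayTourProvider"] == provider and r["Packages"] == package),
                  None)
    if target is None:
        return []
    results = []
    for r in rows:
        if r["HolidayTourProvider"] == provider:
            continue  # only compare against other providers
        # Count how many feature columns have exactly the same value.
        count = sum(1 for col in FEATURES if r[col] == target[col])
        if count >= min_matches:
            results.append((r["HolidayTourProvider"], r["Packages"], count))
    return results


print(similar_packages(rows, "AdventureLife", "Luxury"))
```

Exact equality is the simplest notion of "similar"; a numeric column such as Cancellationfee might be better compared within a tolerance, which this sketch does not attempt.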

VK API - return city names in latin

I am using the VK API to get a list of cities in a specific country. Does anyone know how to show Russian cities (which are in Cyrillic) in Latin script?
Example of JSON response:
http://api.vk.com/method/places.getCities?lang=en&country_id=1&count=1000&need_all=1
I am trying to check whether a city exists, but if someone enters the city name in Latin script, the check works only in some cases: for example, Vladivostok is Владивосток, but Moscow is Москва.
I found one solution that works for me: first get all city IDs with the places.getCities (database.getCities) method, and then call database.getCitiesById with the saved city IDs, like the following:
database.getCitiesById?lang=en&city_ids=1,2,123
In a request to this API method you can specify the language you need with the "lang" parameter (e.g. lang=en) and pass up to 1k comma-separated city IDs (e.g. city_ids=1,2,123,...).
database.getCitiesById official documentation
As an alternative, you can get the static list of Russian cities here. This JSON array contains the slug name of each city in English. It's a free list, so anyone can edit and add correct information for any localities.
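As a small illustration of the two-step approach, here is a Python sketch that only builds the database.getCitiesById request URLs, splitting the saved IDs into batches of at most 1000. It sends nothing over the network. The base URL is taken from the example request in the question, the function name cities_by_id_urls is made up, and a real call may need further parameters (such as an access token or API version) that are not shown here.

```python
from urllib.parse import urlencode

API_BASE = "http://api.vk.com/method/"  # base URL as used in the question's example
MAX_IDS_PER_CALL = 1000  # limit mentioned in the answer above


def cities_by_id_urls(city_ids, lang="en"):
    """Build one database.getCitiesById URL per batch of up to 1000 city IDs."""
    urls = []
    for start in range(0, len(city_ids), MAX_IDS_PER_CALL):
        batch = city_ids[start:start + MAX_IDS_PER_CALL]
        # safe="," keeps the commas readable instead of encoding them as %2C.
        query = urlencode(
            {"lang": lang, "city_ids": ",".join(str(i) for i in batch)},
            safe=",",
        )
        urls.append(API_BASE + "database.getCitiesById?" + query)
    return urls


print(cities_by_id_urls([1, 2, 123]))
```

Each resulting URL can then be fetched with whatever HTTP client you already use, and the returned names matched against the user's input.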