How to store data from Google Ngram API?

I need to store the data presented in the graphs on the Google Ngram website. For example, I want to store the occurrences of "it's" as a percentage from 1800-2008, as presented in the following link: https://books.google.com/ngrams/graph?content=it%27s&year_start=1800&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cit%27s%3B%2Cc0.
The data I want is the data you're able to scroll over on the graph. How can I extract this for about 140 different terms (e.g. "it's", "they're", "she's", etc.)?

econpy wrote a nice little module in Python that you can use through a command-line interface.
For your "it's" example, you would need to type this command in a terminal / windows console:
python getngrams.py it's -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
This will automatically save the query result in a CSV file named after your query parameters.
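Since you have about 140 terms, you can drive the same command in a loop. Here is a minimal sketch in Python, assuming getngrams.py is in the working directory and the other parameters stay as above (each run saves its own CSV, as described):
import subprocess

terms = ["it's", "they're", "she's"]  # ...extend to your ~140 terms
for term in terms:
    subprocess.run(
        ['python', 'getngrams.py', term,
         '-startYear=1800', '-endYear=2008',
         '-corpus=eng_2009', '-smoothing=3'],
        check=True)  # raise if a query fails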

econpy's package, mentioned in HugoMailhot's answer above, no longer works (as of 2021) and seems unmaintained.
Here's an updated version, with some improvements for easier integration into Python code:
https://gitlab.com/cpbl/google-ngrams
You can call this from the command line (as with econpy's version) to create a CSV file, e.g.
getngrams.py it's -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
or call it from Python to get (and plot) the data directly, e.g.:
from getngrams import ngrams
df = ngrams('bells and whistles -startYear=1900 -endYear=2018 -smoothing=2')
df.plot()
The xkcd functionality is still there too.
(Issues, bug-fix pull requests, etc. are welcome there.)
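For the ~140 terms in the original question, the Python interface makes batching straightforward. A minimal sketch, assuming ngrams() accepts the same query string as the CLI and returns a pandas DataFrame of yearly values, as the df.plot() example above suggests:
import pandas as pd
from getngrams import ngrams

terms = ["it's", "they're", "she's"]  # ...extend to your ~140 terms
frames = {term: ngrams(term + ' -startYear=1800 -endYear=2008 -smoothing=3')
          for term in terms}
combined = pd.concat(frames, axis=1)  # one block of columns per term
combined.to_csv('all_terms.csv')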

BFxForward() in the Bloomberg Python API

I've used https://github.com/691175002/BLPInterface as a wrapper for the terribly documented (and unsupported by Bloomberg Help) Bloomberg Python API. I use it to pull price histories, etc.
Lately I've needed to pull FX forward values for specific dates. In Excel I do that as =BFxForward("usdjpy", J10, "BidOutright"), where J10 is a date.
I would like to pull this information via the Bloomberg Python API (or better yet, with the BLPInterface wrapper), but it's not clear how to do it. I've seen someone ask a similar question for a .NET implementation, but the only answer cited page 207 of a developer's guide. Every developer guide I can find for Bloomberg is well under 200 pages, and none of them mention pulling FX forward values.
Can anyone point me at some examples or resources to build on?
It does take some finding, to be sure, but I tracked it down via the Bloomi Terminal. The way I found the information is as follows (for future reference):
Type DAPI in the Bloomberg Terminal
Choose 'Additional Resources' in the left hand panel
Choose 'Help Page for DAPI' in the right hand panel, and a window pops up
Choose 'Constructing Formulas' in the left hand panel
Choose 'FX Broken Dates Forwards Syntax' in the right hand panel
Or paste this link into Bloomi:
{LPHP DAPI:0:1 2277846 }
There are a lot of different examples and options (FX fwds are not my area of expertise), but simply using this format for the ticker seems to work:
ccy1/ccy2 mm/dd/yy Curncy
and then the field PX_BID. You can try this in a BDP call in Excel, for example:
=BDP("EUR/GBP 08/08/22 Curncy","PX_BID")
When it comes to Python, perhaps try the xbbg Python package (other wrappers are available): it does a good job of hiding the intricacies of the low-level API. Here's a code sample using xbbg that pulls back the forward FX rate from the example:
from xbbg import blp
from datetime import datetime

# Build the broken-date forward ticker in the 'ccy1/ccy2 mm/dd/yy Curncy' format
ccy1 = 'EUR'
ccy2 = 'GBP'
fwdDate = datetime(2022, 8, 8)
ticker = '{}/{} {} Curncy'.format(ccy1, ccy2, fwdDate.strftime('%m/%d/%y'))

# Reference-data request for the bid price of the forward
df = blp.bdp(ticker, 'PX_BID')
print(df)
Output:
px_bid
EUR/GBP 08/08/22 Curncy 0.85344
EDIT: Looking at the OP's choice of Bloomi wrapper, the xbbg call could possibly be replaced by:
blp.referenceRequest(ticker, 'PX_BID')

How can I run Neural Machine Translation with Attention in Google Colab with a different language pair?

I want to use a different language pair with the example provided on the TensorFlow website; the Google Colab notebook only uses Spanish-English:
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/text/nmt_with_attention.ipynb
I tried changing the link to the spa-eng data that the notebook downloads, but that didn't help.
How can I try a different language pair without setting up Colab locally? The page does mention at the end that I can try a different language set.
The final note on using a different dataset refers to this website which includes tab-delimited files.
You mainly need to change the values in this cell according to the link to the zip file you need.
# Download the file
path_to_zip = tf.keras.utils.get_file(
    'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
    extract=True)
path_to_file = os.path.dirname(path_to_zip) + "/spa-eng/spa.txt"
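For example, to try tab-delimited French-English pairs instead, the cell might become something like the following (the URL and the extracted file name are assumptions; check the archive contents after downloading):
# Download the file (hypothetical French-English example)
path_to_zip = tf.keras.utils.get_file(
    'fra-eng.zip', origin='http://www.manythings.org/anki/fra-eng.zip',
    extract=True)
path_to_file = os.path.dirname(path_to_zip) + "/fra.txt"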
You can try other datasets from:
OPUS
WMT
However, in these corpora the source and target are in two separate files, so you have to adjust the code that extracts pairs: instead of split('\t'), it should open the two files and read the source and target line by line, as in the sketch below.
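Here is a minimal sketch of that adjustment, reusing the notebook's preprocess_sentence helper (the file names are placeholders for whichever corpus files you download):
def create_dataset_from_files(src_path, tgt_path, num_examples=None):
    # Pair up aligned lines from the two corpus files
    with open(src_path, encoding='utf-8') as src_f, \
         open(tgt_path, encoding='utf-8') as tgt_f:
        pairs = [(preprocess_sentence(s.strip()), preprocess_sentence(t.strip()))
                 for s, t in zip(src_f, tgt_f)]
    return zip(*pairs[:num_examples])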

How to adapt tf.contrib.data.TextLineDataset for text from other sources?

For example, if my text data comes from a database, how can I get one line/doc (as a database record) using the same mechanism (subclassing Dataset so that the pipeline described here still works) as TextLineDataset?
Looking at the source code of TextLineDataset, I find that make_dataset_resource() seems to be an important method to implement. But I can't find the actual code that yields a line from a file, as the docstring of TextLineDataset says: "A Dataset comprising lines from one or more text files."
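One workaround that avoids subclassing entirely: tf.data.Dataset.from_generator (available in recent TensorFlow versions) can wrap any iterator that yields records. A minimal sketch, assuming a SQLite table named documents with a body column (both names hypothetical):
import sqlite3
import tensorflow as tf

def db_lines():
    # Yield one document (one "line") per database record
    conn = sqlite3.connect('corpus.db')
    for (body,) in conn.execute('SELECT body FROM documents'):
        yield body
    conn.close()

dataset = tf.data.Dataset.from_generator(
    db_lines, output_types=tf.string, output_shapes=())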

Automatic test data generation

I need to prepare sample test data with 5 million rows of different employees. It should contain relevant fields like:
First Name
Last Name
Address-1
Address-2
Zip code
State
County
Country
...etc
Is there any tool that I can use to generate this?
I have found the site http://www.generatedata.com/ to be good for this kind of thing: it can generate many different field types and output in a number of formats that can either be read by your code (e.g. CSV) or easily translated into code using your favorite Unix text-manipulation tools.
Either try a webservice, like:
http://www.generatedata.com/
http://www.mockaroo.com/
or try one of the following utils for fake data generation:
PHP "Faker" - https://github.com/fzaninotto/Faker
Perl's Data::Faker - http://metacpan.org/pod/Data::Faker
ruby "faker" - http://faker.rubyforge.org/
http://paulthedutchman.nl/datagenerator/
I would like to suggest a modern PHP fake data generator that can also fake whole entities.
Fakerino: https://github.com/niklongstone/Fakerino
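If you'd rather script it (5 million rows may be slow through a web form), Python also has a "Faker" package in the same family as the ones above. A minimal sketch with the question's fields (Faker's en_US provider has no county, so city() stands in for it):
import csv
from faker import Faker  # pip install Faker

fake = Faker()
with open('employees.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['First Name', 'Last Name', 'Address-1', 'Address-2',
                     'Zip code', 'State', 'County', 'Country'])
    for _ in range(5_000_000):  # writes row by row, so memory stays flat
        writer.writerow([fake.first_name(), fake.last_name(),
                         fake.street_address(), fake.secondary_address(),
                         fake.zipcode(), fake.state_abbr(),
                         fake.city(),  # stand-in for county
                         fake.country()])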

TYPO3 Version 6.0 import from CSV or DB query

I have a CSV file which contains these columns: Timestamp, Author, Title and Content.
Now I would like to import this CSV into TYPO3, so that I can display a list of posts containing these attributes.
If the above is not possible, is there a way to write manual SQL queries, so that I can insert content into TYPO3 directly?
I have tried many extensions for importing CSV (wil_import, rs_impory, external_import...), but none of them work!
I have installed wil_import, for example, but it does not show anything.
Do I need to make any changes anywhere else, like configuration or something?
You could use phpMyAdmin's CSV import functionality. It works reliably.
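If you prefer the manual-SQL route, you can also script the import. A minimal sketch in Python, assuming a table created by your own extension (tx_myext_posts and its columns are made-up names; TYPO3 rows typically also need pid, tstamp and crdate set):
import csv
import time
import pymysql  # pip install pymysql; adjust the credentials to your install

conn = pymysql.connect(host='localhost', user='typo3',
                       password='secret', db='typo3')
cur = conn.cursor()
now = int(time.time())
with open('posts.csv', newline='') as f:
    for timestamp, author, title, content in csv.reader(f):
        cur.execute(
            'INSERT INTO tx_myext_posts'
            ' (pid, tstamp, crdate, post_date, author, title, content)'
            ' VALUES (%s, %s, %s, %s, %s, %s, %s)',
            (1, now, now, timestamp, author, title, content))
conn.commit()
conn.close()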
I had the same problem once, and my day was saved thanks to François Suter's (a Core Team member) extensions: svconnector and svconnector_csv. I can really recommend them.