How to programmatically search archive of CNN headlines? - api

Is screen scraping the best way to accomplish this?
I can't find a CNN API, but are there third-party APIs that allow you to access archives of CNN headlines, perhaps indirectly through past RSS feeds?
I only need to access the last 60 days' worth of headlines.
Thanks!

CNN introduced developer API keys, but you have to apply for them. I was able to get one for a project. You are limited in the number of queries you can make.
Here's the link

You can get the RSS feeds here: http://www.cnn.com/services/rss/ You may also want to look at using FeedBurner. To process the CNN RSS feeds programmatically, take a look at a library such as Pattern for Python.
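As a rough illustration of the RSS approach, here is a minimal sketch that uses only the Python standard library rather than Pattern. It assumes the feed is plain RSS 2.0 with `title` and `pubDate` elements on each item; the function name `recent_headlines` and the sample feed are made up for the example. Keep in mind that a feed usually lists only its most recent items, so covering 60 days would probably mean polling it regularly and storing the results yourself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_headlines(rss_xml, now, max_age_days=60):
    """Return (published, title) pairs no older than max_age_days.

    Assumes an RSS 2.0 document whose items carry <title> and <pubDate>.
    """
    cutoff = now - timedelta(days=max_age_days)
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title")
        pub_date = item.findtext("pubDate")
        if not title or not pub_date:
            continue  # skip items without a title or a date
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Treat dates without a time zone as UTC (an assumption).
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            results.append((published, title.strip()))
    return sorted(results, reverse=True)  # newest first


# Made-up feed content, standing in for a downloaded RSS document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample feed</title>
<item><title>Recent headline</title><pubDate>Mon, 10 Jun 2024 12:00:00 GMT</pubDate></item>
<item><title>Old headline</title><pubDate>Mon, 01 Jan 2024 12:00:00 GMT</pubDate></item>
</channel></rss>"""

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
headlines = recent_headlines(SAMPLE_FEED, now)
for published, title in headlines:
    print(published.date(), title)
```

In real use you would download the feed text first (for example with `urllib.request`) and pass `datetime.now(timezone.utc)` as `now`.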

Related

Google Document AI Labeling Task

Hi everyone, I am fairly new to the Google Cloud console. I am trying to customize a Google Document AI model so that it learns to extract different sections of a document as various pieces of data. As you can see in the image, the model fails to train, and I have been running the Labeling Task for several days now without seeing any progress. Can you please tell me the right way to customize a Google Document AI model?
I tried manually labeling the different sections of the document. It took me a while, so I only labeled around 20 documents for the test and training datasets, which I think is why the model did not train. I then decided to use the Labeling Task as an alternative to labeling the dataset manually.
Here is the information about Labeling Tasks for Document AI
https://cloud.google.com/document-ai/docs/workbench/label-documents#labeling-tasks
Labeling Tasks use Human-in-the-Loop to have human labelers label documents for training data or production review. You can either set up your own labeling team or apply for access to the Google-Managed Workforce.
However, it doesn't seem like this is the correct course of action for what you are trying to do, since the labeling has already been completed.
Could you provide more clarification on what you are trying to accomplish with the Custom Document Extractor?
Note: Some of the error messages that are output from Document AI Workbench are not very descriptive (e.g. Internal Error Occurred) but the product development team is working to surface more helpful errors when possible.

Need advice on tracking objects in hand using TensorFlow

I am trying to develop a model that detects the items a user picks up from a basket. Is this achievable using TensorFlow? My doubt is this: since the basket would contain the same kinds of items the user picks up (say, fruits), is it possible to report in real time the product that is in the user's hand (the products picked up by the user) rather than the items in the basket? Please advise on what would be a good starting point to achieve this.
I have read about and watched general object detection methods using TensorFlow and various models, but nothing seems to deal with a similar problem, or I am unable to relate it to mine. If there are any tutorials on achieving this, links to them would be very helpful. Thanks in advance. Please bear with me if my question is naive; I am still a newbie at ML and TensorFlow.
It is achievable using TensorFlow. First consider how you would approach the problem. Since you need to distinguish between objects inside the basket, you will need an artificial neural network (ANN) with some basic classification and object detection.
Here are some links to get you started:
https://www.tensorflow.org/tutorials/keras/basic_classification
https://www.edureka.co/blog/tensorflow-object-detection-tutorial/
Good luck!

What is needed for a recommendation engine based on word/text input

I'm new to machine learning (AI) technology. I'm developing a messenger app for Android/iOS in which I would like to recommend a product from a relatively small product portfolio to users, based on the text, words, or conversation they write.
Example 1:
In case the user of the messenger writes a sentence including the words "wine", "dinner", "date", the AI should recommend a bottle of wine to the user.
Example 2:
In case the user of the app writes that he has drunk a good coffee this morning, the AI should recommend a mug to the user.
Example 3:
In case the user writes something about a cute boy she met the day before, the AI should recommend a "teddy bear" to the user.
I have been a software developer for almost 20 years, with experience in developing C/C++/Java based applications (Android and iOS apps) as well as some experience with Google Cloud Platform. ML/AI technology is completely new to me. Okay, I know the basics (input data is needed to train the ML/AI system, etc.), but I wonder if there is already a framework that could help me develop a system that solves the use cases described above.
I would appreciate it if you could give me some hints on where and how to start.
Thank you and regards
It is definitely possible to implement such an application. If you want to do it in Google Cloud, you will need some understanding of TensorFlow.
First of all, I recommend doing the Machine Learning Crash Course for a good introduction to machine learning and to start familiarizing yourself with TensorFlow. Afterwards, I recommend taking a look at the TensorFlow tutorials, which will give you a more practical introduction to TensorFlow and include various examples of building, training, and testing models.
Once you are familiar with TensorFlow, you can move on to learning how to run jobs in the Machine Learning Engine; you can start by following the quickstart. The documentation includes detailed guides on how to use ml-engine, plus multiple samples and tutorials.
Since I believe that your application falls into the recommender system category, here you can see an example model, in Google Cloud ML Engine, that recommends items to users based on their previous searches. In your case, you would have to build a model that recommends items to users based on the previous words in their sentences.
The second option, in case you don't want to go through the hassle of building a new model from scratch, would be to use the Google Cloud Natural Language API, which you can think of as a set of pre-trained models built on Google's (incredibly big) data. In your case, I believe that its content classification would help you achieve what your application intends to do. However, the outputs (which you can see here) are limited to what the model was trained to do and might not be specific enough for your application. Still, it is an easy solution, and you can use this API to extract labels/information and send them as input to another model.
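To make the last idea concrete, here is a minimal sketch of how classifier labels could be mapped onto a small product portfolio. It does not call any real API: the `(category, confidence)` input shape, the category names, the `recommend` helper, and the threshold are all assumptions made for illustration, so check the actual classifier output before relying on them.

```python
# Hypothetical mapping from category prefixes to items in the portfolio.
# The category names are illustrative, not a verified list of API labels.
CATEGORY_TO_PRODUCT = {
    "/Food & Drink/Beverages/Alcoholic Beverages": "bottle of wine",
    "/Food & Drink/Beverages/Coffee & Tea": "mug",
    "/People & Society/Family & Relationships": "teddy bear",
}


def recommend(categories, min_confidence=0.5):
    """Return a product for the most confident mapped category, or None.

    `categories` is a list of (category_name, confidence) pairs, an
    assumed shape for the output of a text classifier.
    """
    ranked = sorted(categories, key=lambda pair: pair[1], reverse=True)
    for name, confidence in ranked:
        if confidence < min_confidence:
            break  # everything after this is even less confident
        for prefix, product in CATEGORY_TO_PRODUCT.items():
            if name.startswith(prefix):
                return product
    return None


# Example with made-up classifier output for a message about coffee.
print(recommend([("/Food & Drink/Beverages/Coffee & Tea", 0.9),
                 ("/Arts & Entertainment", 0.6)]))
```

A lookup table like this is only a baseline; if the labels turn out to be too coarse, they could instead be fed as features into a model trained on your own data, as suggested above.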
I hope that these links provide you with some foundations on what is possible to do with Tensorflow in the ML Engine, and are useful to you.

What Tensorflow API to use for Seq2Seq

This year Google produced 5 different packages for seq2seq:
seq2seq (claimed to be general purpose but inactive)
nmt (active but supposed to be just about NMT probably)
legacy_seq2seq (clearly legacy)
contrib/seq2seq (not complete probably)
tensor2tensor (similar purpose, also active development)
Which package is actually worth using for an implementation? They all seem to take different approaches, but none of them seems stable enough.
I also had a headache over the same issue: which framework to choose? I want to implement OCR using an encoder-decoder with attention. I tried to implement it using legacy_seq2seq (it was the main library at that time), but the whole process was hard to understand, and it certainly should not be used any more.
https://github.com/google/seq2seq: to me it looks like an attempt to provide a command-line training script so that you don't have to write your own code. If you want to train a translation model, this should work, but in other cases it may not (as with my OCR), because there is not enough documentation and there are too few users.
https://github.com/tensorflow/tensor2tensor: this is very similar to the implementation above, but it is maintained and you can add more of your own code, e.g. for reading your own dataset. The basic usage is again translation, but it also enables tasks such as image captioning, which is nice. So if you want to try a ready-to-use library and your problem is text->text or image->text, you could try this. It should also work for OCR. I'm just not sure whether there is enough documentation for each case (such as using a CNN as the feature extractor).
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/seq2seq: unlike the above, this is just a pure library, which can be useful when you want to build a seq2seq model yourself using TF. It has functions for adding attention, sequence loss, etc. In my case I chose this option because it gives me much more freedom in choosing each step of the framework: I can choose the CNN architecture, the RNN cell type, a bidirectional or unidirectional RNN, the type of decoder, etc. But you will need to spend some time getting familiar with all the ideas behind it.
https://github.com/tensorflow/nmt: another translation framework, based on the tf.contrib.seq2seq library.
From my perspective you have two options:
If you want to test an idea very quickly and be sure that you are using very efficient code, use the tensor2tensor library. It should help you get early results or even a very good final model.
If you want to do research and are not sure exactly what the pipeline should look like, or you want to learn about the ideas behind seq2seq, use the tf.contrib.seq2seq library.
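For readers who want to see what the attention step in an encoder-decoder actually computes before picking a library, here is a minimal pure-Python sketch of dot-product attention for a single decoder step. It is not the tf.contrib.seq2seq API; the function names and the toy vectors are made up for illustration, and real implementations operate on batched tensors with learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Each encoder state is scored by its dot product with the decoder
    state; the context is the weighted average of the encoder states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Toy example: two encoder states, decoder state aligned with the first.
weights, context = dot_attention([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

In an OCR setup like the one described above, the encoder states would come from the CNN/RNN feature extractor and the decoder state from the RNN that emits characters.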

Number of results google (or other) search programmatically

I am making a little personal project.
Ideally, I would like to run a Google search programmatically and get the count of results. (My goal is to compare the result counts for a large number (100,000+) of different phrases.)
Is there a free way to run a web search and compare the popularity of different texts, using Google, Bing, or anything else? (The source is not really important.)
I tried Google, but it seems that I can only make 10 free requests per day.
Bing is more permissive (5000 free requests per month).
Are there other tools or ways to get the number of results for a particular sentence for free?
Thanks in advance.
There are several things you're going to need if you're seeking to create a simple search engine.
First of all, you should read up on where the field of information retrieval started, with G. Salton's paper, or at least read the wiki page on the vector space model. This will require learning at least some undergraduate linear algebra; I suggest Gilbert Strang's MIT video lectures for this.
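To give a feel for the vector space model, here is a minimal sketch that represents texts as raw term-count vectors and ranks documents by cosine similarity to a query. The helper names and sample documents are made up, and whitespace tokenization with raw counts is a simplifying assumption; real systems typically add weighting such as TF-IDF.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(doc)), doc)
              for doc in documents]
    return sorted(scored, reverse=True)


docs = ["cnn news headlines archive", "apple pie recipe"]
for score, doc in rank("news archive", docs):
    print(round(score, 3), doc)
```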
You can then move on to the Brin/Page PageRank paper, which lays out the original concept behind the hyperlink matrix and quickly calculating eigenvectors for ranking, or read the wiki page.
You may also be interested in looking at the code for Apache Lucene.
To get into contemporary search techniques, you need calculus and regression analysis in order to learn machine learning and deep learning, as the current Google search has moved away from PageRank and utilizes these. This is partly due to how link farming enabled people to artificially engineer search results, and partly due to the huge amount of metadata that modern browsers and web servers allow to be collected.
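The eigenvector calculation mentioned here is commonly approximated by power iteration. Below is a minimal sketch on a tiny hand-made link graph; the graph, the fixed iteration count, and the handling of pages without outgoing links are assumptions made for illustration, and 0.85 is simply a commonly used damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Every
    linked page is assumed to appear as a key of `links` as well.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: a -> b -> c -> a, plus d, which links to a but has no inlinks.
toy_links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 4))
```

Page `d` ends up with the lowest score because nothing links to it, which is the basic intuition behind link-based ranking.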
EDIT:
For the web crawler portion only, I'd recommend WebSPHINX. I used it in my senior research in college, in conjunction with Lucene.