Training IBM Watson Discovery Service for FAQ customer support

I'm trying to build an FAQ customer-support agent with the Watson Discovery Service. I upload FAQ documents to the service, but when I query it, it returns the full document content as a result.
An earlier answer says that question-and-answer pairs should be uploaded as separate documents, but an FAQ document contains many such pairs.
So is it possible to train the Watson Discovery Service to improve its results for the FAQ format?

If you want Discovery to split your FAQ into Question/Answer pairs automatically on ingestion, look into the Segmentation feature: https://console.bluemix.net/docs/services/discovery/building.html#performing-segmentation
You could also try using Passage Retrieval: https://console.bluemix.net/docs/services/discovery/query-parameters.html#passages
(I am an IBM Employee)
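If you would rather not rely on those features, a third option is the one hinted at in the question: pre-split the FAQ yourself and upload each question/answer pair as its own small document. A minimal sketch (not Discovery-specific; it assumes each question sits on its own line and ends with a question mark):

```python
def split_faq(text):
    """Split a plain-text FAQ into (question, answer) pairs.

    Assumes each question sits on its own line and ends with '?';
    everything up to the next question is treated as its answer.
    """
    pairs = []
    question, answer_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("?"):
            if question is not None:
                pairs.append((question, " ".join(answer_lines).strip()))
            question, answer_lines = line, []
        elif question is not None and line:
            answer_lines.append(line)
    if question is not None:
        pairs.append((question, " ".join(answer_lines).strip()))
    return pairs
```

Each resulting pair can then be uploaded to Discovery as a separate document, so a query matches one answer rather than the whole FAQ.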

Related

Table recognition using the Google Vision API

I use the OCR function (DOCUMENT_TEXT_DETECTION) of the Google Vision API to process various medical documents, some of which contain tables. According to Google's documentation there is a special BlockType for tables (https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#blocktype); however, I cannot get this kind of block in a response from Google, even when the presence of a table is obvious. Does anybody know the reason for this behavior? Do I need to use special options or methods to detect tables with the Google Vision API? An example of a table I tried to process:
I found your question about tables in the Google Vision API on the Google forum. The short answer: tables (as a BlockType) are not supported at the moment (10/21/2021), but there is a feature request with minor priority: Google Vision API Issue Tracker
I would recommend using Document AI instead: Document AI. I checked, and it returned meta-information about tables.
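To expand on the Document AI suggestion, here is a rough sketch of pulling table cells out of a PDF with the Document AI Python client. The project, location, and processor values are placeholders, and the call requires the google-cloud-documentai package plus GCP credentials; treat the structure-walking as a starting point, not a definitive implementation:

```python
def anchor_text(full_text, text_anchor):
    """Resolve a Document AI text anchor (offsets into the document's
    full text) to the substring it points at."""
    return "".join(
        full_text[int(seg.start_index):int(seg.end_index)]
        for seg in text_anchor.text_segments
    )

def extract_tables(project_id, location, processor_id, pdf_bytes):
    """Run a PDF through a Document AI processor and return each table
    as a list of rows of cell strings."""
    from google.cloud import documentai_v1 as documentai

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path(project_id, location, processor_id)
    result = client.process_document(
        request=documentai.ProcessRequest(
            name=name,
            raw_document=documentai.RawDocument(
                content=pdf_bytes, mime_type="application/pdf"),
        )
    )
    doc = result.document

    tables = []
    for page in doc.pages:
        for table in page.tables:
            rows = [
                [anchor_text(doc.text, cell.layout.text_anchor)
                 for cell in row.cells]
                for row in list(table.header_rows) + list(table.body_rows)
            ]
            tables.append(rows)
    return tables
```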

How to build a Google Analytics 'collect'-like API using Google Cloud services

I'm trying to build a data-collection web endpoint. The use case is similar to the Google Analytics collect API: I want to add this endpoint (GET method) to every page on the website and, on page load, collect page info through the API.
I'm thinking of doing this with Google Cloud services such as Endpoints and BigQuery (for storing the data). I don't want to host it on dedicated servers; otherwise I would end up doing a lot of work managing and monitoring the service.
Please suggest how I can achieve this with Google Cloud services, or point me in the right direction if my idea is wrong.
I suggest focusing on deciding where you want your code to run. There are several GCP options that don't require dedicated servers:
Google App Engine
Cloud Functions/Firebase Functions
Cloud Run (new!)
Look here to see which support Cloud Endpoints.
All of these products can support running code that takes the data from the request and sends it to the BigQuery API.
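As an illustration of that last point, here is a minimal sketch of such an endpoint as an HTTP-triggered Cloud Function writing to BigQuery. The parameter names (dp, dt, cid) loosely mirror GA's collect parameters, and the table ID is a placeholder, not part of any real schema:

```python
from datetime import datetime, timezone

def build_row(args):
    """Turn the request's query parameters into a BigQuery row dict.

    Assumed parameters: dp = document path, dt = document title,
    cid = client ID.
    """
    return {
        "page": args.get("dp", ""),
        "title": args.get("dt", ""),
        "client_id": args.get("cid", ""),
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def collect(request):
    """HTTP Cloud Function entry point (Flask-style request object)."""
    from google.cloud import bigquery  # available in the Cloud Functions runtime

    client = bigquery.Client()
    # insert_rows_json streams rows without a load job; the table is a placeholder
    errors = client.insert_rows_json("my-project.analytics.pageviews",
                                     [build_row(request.args)])
    return ("", 500 if errors else 204)
```

Returning 204 with an empty body keeps the beacon cheap for the page that fires it.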
There are various ways of achieving what you want. David's answer is absolutely valid, but I would like to introduce Stackdriver Custom Metrics to the discussion.
Custom metrics are similar to regular Stackdriver Monitoring metrics, but you create your own time series (Stackdriver lingo, described here) to keep track of whatever you want, and clients can send in their data through an API.
You could achieve the same thing with a compute solution (Google Cloud Functions, for example) and a database (Google Cloud Bigtable, for example) plus your own logic, but Custom Metrics is a ready-built solution that includes dashboards and alerting policies while being more managed.

Intelligent web crawler using machine learning

I am building an e-commerce site.
Problem statement:
I want to crawl web pages to get the product name, images, and product specifications/features, and store them in my database.
Input to the machine learning algorithm:
A web page with HTML content
Output expected from the machine learning algorithm:
It should automatically detect whether the page is a product-details page
If it is, it should recognize the product category
Then it should parse the product name and specifications
Question
Which algorithm would be suitable for this problem?
Can anyone suggest a proper approach to follow?
I'm not an expert in machine learning or natural language processing, but my gut feeling says it is very difficult to fully implement this as an ML product.
So first check whether your targeted e-commerce sites provide some kind of API for extracting data. If such APIs are available, use them; that will be far easier than using ML.
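Where no API exists, it can also be worth measuring how far a crude heuristic baseline gets you before reaching for ML: product pages tend to share markup vocabulary (price, cart, SKU) in their class and id attributes. A sketch using only the standard library, with the hint list purely illustrative:

```python
from html.parser import HTMLParser

class ProductSignals(HTMLParser):
    """Count attribute values that look like product-page markup.

    The hint list is illustrative; tune it against real pages from
    your target sites.
    """
    HINTS = ("price", "add-to-cart", "add_to_cart", "sku", "product")

    def __init__(self):
        super().__init__()
        self.hits = 0

    def handle_starttag(self, tag, attrs):
        for _, value in attrs:
            if value and any(h in value.lower() for h in self.HINTS):
                self.hits += 1

def looks_like_product_page(html, threshold=2):
    """Heuristic: a page with enough product-ish class/id values."""
    parser = ProductSignals()
    parser.feed(html)
    return parser.hits >= threshold
```

If the heuristic proves too brittle, the same signals (attribute values, visible text) make reasonable features for a standard text classifier trained on hand-labeled pages.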

How many times can you tag an image with the Alchemy API?

I would like to ask about the Alchemy API on Bluemix. Using the Alchemy API, how many times can you tag an image?
Is it only twice?
As you can read here, in March IBM announced the acquisition of AlchemyAPI and integrated AlchemyAPI's deep learning technology into the Watson platform, exposing these capabilities to the developer community through IBM Bluemix.
For this reason, AlchemyAPI on Bluemix has essentially the same features it had before. That means you have a certain number of REST API calls included in your plan.
Bluemix provides the following plans:
IBM AlchemyAPI Free Plan: 1,000 API Events per day per Bluemix organization
IBM AlchemyAPI Standard: a pay-per-use plan in which you are charged per API Event
Among the REST APIs, the AlchemyVision ones are what you need to tag images, and you can call them as many times as your plan allows. All you need is an API key, which you obtain after choosing a plan.

Book Database Data

I am creating a database that will contain information about books (title, author, description, edition, etc.). Is there any way I can download book data from the web so that I can insert it into my database? I want the database to hold information on 500 to 1,000 books. The database is in SQL Server.
The best-known book content APIs are from Amazon.com and, more recently, Google (content from Amazon's API can often be seen republished on the web):
Amazon Content API program
Google Book Search API
To see what is allowed with the Amazon API please see:
Amazon Content API agreement
Some excellent info on other providers can be found here:
Code4lib: Using Book Data Providers to Improve Services to Patrons
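To illustrate the Google Book Search route: the volumes endpoint can be queried without an API key for small request volumes. A sketch that maps results to a flat record you could then insert into SQL Server; the column names are assumptions based on the fields listed in the question:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def volume_to_record(volume):
    """Map a Google Books API 'volume' item to a flat record."""
    info = volume.get("volumeInfo", {})
    return {
        "title": info.get("title", ""),
        "authors": ", ".join(info.get("authors", [])),
        "description": info.get("description", ""),
        "publisher": info.get("publisher", ""),
        "published": info.get("publishedDate", ""),
        "isbn": next((i["identifier"]
                      for i in info.get("industryIdentifiers", [])
                      if i.get("type") == "ISBN_13"), ""),
    }

def search_books(query, max_results=40):
    """Fetch volumes from the Google Books API (real public endpoint;
    unauthenticated requests are rate-limited)."""
    url = "https://www.googleapis.com/books/v1/volumes?" + urlencode(
        {"q": query, "maxResults": max_results})
    with urlopen(url) as resp:
        data = json.load(resp)
    return [volume_to_record(v) for v in data.get("items", [])]
```

A few dozen queries like `search_books("subject:fiction")` would comfortably cover the 500 to 1,000 rows mentioned in the question, subject to the API's terms of use.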
You could source data from websites like Amazon.com; they have APIs for this. However, you may not be able to redistribute the data without permission from Amazon. Nothing stops you from getting public-domain data from a government organization such as the Library of Congress; they probably have it somewhere.
If you are just looking for sample data, you can use the pubs database that ships with MS SQL Server.
The Internet Archive offers Open Library bulk exports in a variety of formats (primarily MARC). If you are looking for records for specific books, you might look at LibraryThing's API.