Sample data for incident triage / incident assignment - training-data

I am looking for sample data for incident triage/incident assignment to train my model. There are references to open-source Firefox and Eclipse bug dumps in articles published on the internet, e.g.:
https://is.muni.cz/th/374278/fi_m/thesis.pdf
But I couldn't find any links to those dumps by googling.
I did get hold of a dump available on the Kaggle website:
https://www.kaggle.com/monika11/bug-triagingbug-assignment/version/1/data
But this dump doesn't include a resolution for each incident. If anyone has any pointers in this regard, kindly share them.
-- Subramanian S.
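For context, the kind of model this data would feed can be sketched as a simple text classifier over issue text and assignee labels. In the sketch below, the file name and the "summary"/"assignee" column names are hypothetical placeholders, not fields from the Kaggle dump above.

    # Minimal sketch of an assignee-prediction (triage) model.
    # "bug_dump.csv", "summary" and "assignee" are placeholder names.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    bugs = pd.read_csv("bug_dump.csv").dropna(subset=["summary", "assignee"])

    X_train, X_test, y_train, y_test = train_test_split(
        bugs["summary"], bugs["assignee"], test_size=0.2, random_state=42
    )

    # TF-IDF over the bug summaries feeding a linear classifier per assignee.
    model = make_pipeline(
        TfidfVectorizer(max_features=20000),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))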

Related

Google BigQuery Cymbal Group data sets

This is not so much a programming question as it is a public data set question. Please let me know if there is a more appropriate venue to ask this.
I have been trying to find out more information about this data set:
https://console.cloud.google.com/marketplace/product/cymbal/cymbal
About Cymbal: Google Cloud's demo brand
Cymbal Group
Synthetic datasets across industries showcasing Google Cloud.
I cannot see it when I use the Explorer to browse bigquery-public-data. I can see the cymbal_investments dataset, but not the one described above.
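One way to double-check what is actually exposed under bigquery-public-data is to list its datasets programmatically. Here is a minimal sketch with the BigQuery Python client, assuming google-cloud-bigquery is installed and application-default credentials (with a default project) are configured:

    # List datasets in the public project and look for anything Cymbal-related.
    # Assumes google-cloud-bigquery is installed and default credentials/project
    # are configured.
    from google.cloud import bigquery

    client = bigquery.Client()
    for dataset in client.list_datasets("bigquery-public-data"):
        if "cymbal" in dataset.dataset_id.lower():
            print(dataset.dataset_id)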
I am especially interested in the Retail Subsidiaries data such as:
Cymbal Superstore — An American superstore and grocer with a multinational presence.
Cymbal Shops — An American retail chain selling homewares, electronics, and clothing.
Cymbal Direct — An online direct-to-consumer Chicago-based footwear and apparel retailer.
Please let me know if you can point me to the right dataset.
Thanks for any suggestions.

What is needed for a recommendation engine based on word/text input

I'm new to machine learning (AI). I'm developing a messenger app for Android/iOS where I would like to recommend a product from a relatively small product portfolio to users, based on their texts/words/conversations.
Example 1:
In case the user of the messenger writes a sentence including the words "wine", "dinner", "date", the AI should recommend a bottle of wine to the user.
Example 2:
In case the user of the app writes that he has drunk a good coffee this morning, the AI should recommend a mug to the user.
Example 3:
In case the user writes something about a cute boy she met the other day, the AI should recommend a "teddy bear" to the user.
I have been a software developer for almost 20 years, with experience developing C/C++/Java-based applications (Android and iOS apps) as well as some experience with Google Cloud Platform. ML/AI technology is completely new to me. I know the basics (input data is needed to train the ML/AI system, etc.), but I wonder if there is already a framework that could help me develop a system which solves the use cases described above.
I would appreciate it if you could give me some hints on where and how to start.
Thank you and regards
It is definitely possible to implement such an application; if you want to do it in Google Cloud, you will need some understanding of TensorFlow.
First of all, I recommend doing the Machine Learning Crash Course for a good introduction to machine learning and to start familiarizing yourself with TensorFlow. Afterwards, I recommend taking a look at the TensorFlow tutorials, which give a more practical introduction to TensorFlow and include various examples of building/training/testing models.
Once you are familiar with TensorFlow, you can move on to learning how to run jobs in the ML Engine; you can start by following the quickstart. The documentation includes detailed guides on how to use the ml-engine, plus multiple samples and tutorials.
Since I believe your application falls into the recommender-system category, here you can see an example model, running on Google Cloud ML Engine, that recommends items to users based on their previous searches. In your case, you would have to build a model that recommends items to users based on the words in their sentences.
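As a rough illustration of that direction (not the linked ML Engine sample itself), a text-to-product model can be sketched in a few lines of Keras; the example sentences and the two product labels below are invented for illustration.

    # Toy sketch: map short chat texts to product suggestions.
    # The training sentences and product labels are invented examples.
    import tensorflow as tf

    texts = ["planning a dinner date tonight", "had a great coffee this morning"]
    labels = [0, 1]  # 0 = wine, 1 = mug

    vectorize = tf.keras.layers.TextVectorization(
        max_tokens=10000, output_sequence_length=20
    )
    vectorize.adapt(texts)

    model = tf.keras.Sequential([
        vectorize,
        tf.keras.layers.Embedding(10000, 32),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # one unit per product
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(tf.constant(texts), tf.constant(labels), epochs=5, verbose=0)

    print(model.predict(tf.constant(["wine and dinner with a date"])))

In practice you would need many more labelled (text, product) pairs, but the overall shape of the model stays the same.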
The second option, in case you don't want to go through the hassle of building a new model from scratch, is to use the Google Cloud Natural Language API, which you can think of as a set of pre-trained models built on Google's (incredibly big) data. In your case, I believe the content classification API would help you achieve what your application intends to do. However, the outputs (which you can see here) are limited to what the model was trained on and might not be specific enough for your application. Still, it is an easy solution, and you can also use this API to extract labels/information and feed them as input to another model.
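A minimal call to that API, assuming the google-cloud-language Python client is installed and credentials are configured, could look like this (the sample sentence is made up, and the classifier needs a reasonable amount of text to work on):

    # Classify a chat message with the Cloud Natural Language content
    # classification endpoint. Very short texts may be rejected by the API.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=(
            "We are planning a romantic dinner date for Saturday evening and "
            "were wondering which bottle of wine would pair well with the meal."
        ),
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.classify_text(request={"document": document})
    for category in response.categories:
        # Categories come from Google's fixed taxonomy, e.g. "/Food & Drink/...".
        print(category.name, category.confidence)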
I hope these links give you a foundation for what is possible with TensorFlow in the ML Engine and prove useful to you.

Any Call Center conversation log dataset?

I am analyzing several Sentiment Analysis algorithms to classify and prioritize call center calls.
I have been trying to look for this type of data on the web, but found nothing.
Ideally I would like to have several two-way conversations, preferably from the banking or insurance industry.
The idea is to process this data in order to see whether the customer is angry and needs a fast reply, or whether the matter is less urgent.
Any help is greatly appreciated.
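For what it's worth, the kind of processing described above can be sketched with NLTK's VADER sentiment analyzer; the customer turns below are invented, and the -0.5 threshold is just an arbitrary starting point:

    # Score customer turns and flag very negative ones as urgent.
    # The transcript lines are invented placeholders.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()

    customer_turns = [
        "This is unacceptable, I am extremely angry about these hidden charges!",
        "Thanks, that answers my question.",
    ]
    for turn in customer_turns:
        score = analyzer.polarity_scores(turn)["compound"]  # -1 (negative) to +1
        print("URGENT" if score < -0.5 else "normal", round(score, 2), turn)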

Where can I find a large sample of computer languages for Naive Bayesian analysis

I am trying to analyse online code and want to use Bayesian classification. However, I need a fair amount of pre-classified code as sample data.
Maybe the twenty or so top languages?
Does anyone know of such a corpus?
There was a data set on Kaggle with questions from Stack Overflow where the objective was to guess the tags related to each question. That could require guessing the language of code samples (or just looking for keywords):
https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction
Another possibility is searching through GitHub, since all that code is free and open.
Stack Overflow itself shares a dump of all user-contributed posts (anonymized).
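Once a labelled corpus is in hand from any of those sources, the Naive Bayes step itself is small. Here is a minimal sketch with scikit-learn using character n-grams; the three snippets and labels are tiny invented placeholders for real training data:

    # Minimal Naive Bayes language classifier over character n-grams.
    # The snippets/labels stand in for a real pre-classified corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    snippets = [
        "def add(a, b):\n    return a + b",
        'public static void main(String[] args) { System.out.println("hi"); }',
        '#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }',
    ]
    languages = ["python", "java", "c"]

    model = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MultinomialNB(),
    )
    model.fit(snippets, languages)
    print(model.predict(["import os\nfor path in os.listdir('.'):\n    print(path)"]))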

Visualization data gathering for learning

I'm just starting to take an interest in visualization and I'd like to know where I can get my hands on some data, preferably real-world, to see what queries and graphics I can draw from it. It's more of a personal exercise to create some pretty-looking representations of that data.
After seeing this, I wondered where the data came from and what else could be done with Wikipedia. Is there any way I can obtain data from, say, Wikipedia?
Also, could anyone recommend any good books? I don't trust the user reviews on the Amazon website :-)
You can download the raw Wikipedia data from http://download.wikimedia.org. There are many different views of the data available. The English Wikipedia is by far the largest database, and there isn't a current full dump available, but one is in progress. It will probably take months to finish and be available for download.
The most recent one was 18 GB compressed, which uncompressed to something like 2.5 TB.
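If you do go the dump route, the pages-articles XML can be streamed without unpacking it first. Here is a small Python sketch; the filename is a placeholder for whichever dump you download, and a smaller wiki such as Simple English is a gentler starting point than the full English one:

    # Stream page titles out of a compressed MediaWiki dump without
    # decompressing it to disk. The filename is a placeholder.
    import bz2
    import xml.etree.ElementTree as ET

    count = 0
    with bz2.open("simplewiki-latest-pages-articles.xml.bz2", "rb") as f:
        for _, elem in ET.iterparse(f):
            # Tags are namespaced ({http://www.mediawiki.org/xml/export-...}title),
            # so match on the suffix rather than a specific schema version.
            if elem.tag.endswith("}title"):
                count += 1
                if count <= 5:
                    print(elem.text)
            elem.clear()  # drop parsed content to limit memory while streaming
    print("pages seen:", count)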
A fantastic book is The Visual Display of Quantitative Information by Edward Tufte.