How to build a recommendation system for e-commerce

I would like to build a recommendation system for e-commerce. I'd like to suggest products to a customer based on the products they have viewed. The products viewed by this customer would be compared with the products viewed by other customers, and as a result I would show the customer the best matches. Moreover, I would also like to boost products that other customers have bought while they were viewing products similar to those viewed by the current customer.
I've tried Amazon Machine Learning, but it isn't designed for this kind of problem.
I've also tried to build it with Spark's machine learning library, but most of the examples describe cases based on film ratings, and I don't really know how to adapt them to my case.
So I'd like to ask whether you know of any recommendation systems I could use for this, or whether you know how to use the Spark ML engine to build this kind of recommendation system?
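For the Spark route, the film-rating ALS examples carry over almost directly: treat product views as implicit feedback and give purchases a higher confidence weight. Here is a minimal PySpark sketch of that idea; the event schema, the file path and the 5x purchase boost are assumptions you would adapt to your data:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("ecommerce-reco").getOrCreate()

# Hypothetical event log with one row per user/product interaction:
# columns user_id (int), product_id (int), event ('view' or 'purchase').
events = spark.read.parquet("events.parquet")  # assumed path and schema

# Views count once; purchases get a larger confidence weight (5x is an
# arbitrary starting point to tune, not an established constant).
weighted = events.selectExpr(
    "user_id",
    "product_id",
    "CASE WHEN event = 'purchase' THEN 5.0 ELSE 1.0 END AS weight")
ratings = (weighted.groupBy("user_id", "product_id")
                   .sum("weight")
                   .withColumnRenamed("sum(weight)", "rating"))

# implicitPrefs=True tells ALS to treat 'rating' as a confidence level
# rather than an explicit score, which matches view/purchase data.
als = ALS(userCol="user_id", itemCol="product_id", ratingCol="rating",
          implicitPrefs=True, rank=50, regParam=0.1, alpha=40.0)
model = als.fit(ratings)

# Top 10 product suggestions per user.
model.recommendForAllUsers(10).show(truncate=False)
```

With implicit feedback there is no "rating" to predict, so the weights only express how confident you are that a user likes an item; the purchase boost is exactly the kind of tuning knob you described.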

Related

Choosing a chat-bot framework for a data science research project and understanding the hidden costs of development and rollout

The question is about using a chat-bot framework in a research study, where one would like to measure the improvement of a rule-based decision process over time.
For example, we would like to understand how to improve the process of medical condition identification (and treatment) using the minimal set of guided questions and patient interaction.
Medical conditions can be formulated into workflow rules by doctors. A possible technical approach for such a study would be to develop an app or website that patients can access, where they ask free-text questions that a predefined rule-based chat-bot will address. During the study, a doctor will monitor the collected data and improve the rules and the possible responses (and also provide new responses when the workflow has reached a dead end). We plan to collect the conversations and apply machine learning to generate an improved workflow tree (and questions) over time; however, the plan is to do all data analysis and processing offline, and there is no intention of building a full product.
This is a low-budget academic study. The PhD student has good development skills and data science knowledge (Python) and will be accompanied by a fellow student who will work on the engineering side. One of the conversational-AI options recommended for data scientists was RASA.
I have spent the last few days reading about and playing with several chat-bot solutions (RASA, Botpress), and I also looked at Dialogflow and read tons of comparison material, which only makes the choice more challenging.
From the sources on the internet it seems that RASA might be a better fit for data science projects; however, it would be great to get a sense of the real learning curve and how fast one can expect to have a working bot, especially one whose rules have to be continuously updated.
A few things to clarify: we do have data to generate the questions, and we are in touch with doctors to improve their quality. It seems we need a way to present participants with multiple choices, not just free text, and provide answers. Being on the research side, there is no need to align with any specific big provider (i.e. Google, Amazon or Microsoft) unless it has a benefit; the important considerations are time, money and flexibility. We would like to have a working approach within a few weeks (and continuously improve it); the whole experiment will run for no more than 3-4 months. We also need to be able to extract all the data. Finally, we are not sure which channel is best for such a study (WhatsApp? a website? something else?) and what complexities are involved.
Any thoughts on the challenges and considerations of dealing with chat-bots would be valuable.
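One low-cost way to gauge the learning curve before committing to RASA or Botpress is to prototype the guided multiple-choice workflow in plain Python first. A minimal sketch, with all node names, questions and answers invented for illustration:

```python
# A framework-free prototype of a guided, multiple-choice workflow tree.
# Every node name, question and answer below is an invented placeholder.
TREE = {
    "start": {
        "question": "What is your main concern?",
        "choices": {"1": ("Headache", "headache"), "2": ("Fever", "fever")},
    },
    "headache": {
        "question": "How long has it lasted?",
        "choices": {"1": ("Under a day", "end_rest"),
                    "2": ("Several days", "end_doctor")},
    },
    "fever": {
        "question": "Is the temperature above 39 C?",
        "choices": {"1": ("Yes", "end_doctor"), "2": ("No", "end_rest")},
    },
    "end_rest": {"answer": "Rest and drink fluids."},
    "end_doctor": {"answer": "Please see a doctor."},
}

def run(log):
    node = "start"
    while "answer" not in TREE[node]:
        print(TREE[node]["question"])
        for key, (label, _) in TREE[node]["choices"].items():
            print(f"  {key}. {label}")
        choice = input("> ").strip()
        if choice not in TREE[node]["choices"]:
            # A dead end in the workflow: in the study, this is where the
            # monitoring doctor would add a new rule or response.
            print("Please pick one of the listed options.")
            continue
        log.append((node, choice))  # raw transcript for offline analysis
        node = TREE[node]["choices"][choice][1]
    print(TREE[node]["answer"])

if __name__ == "__main__":
    conversation_log = []
    run(conversation_log)
```

If a prototype like this covers the study's needs, a framework mainly buys you channel integrations (website widget, WhatsApp, etc.) and NLU for the free-text parts; that is where the real comparison effort should go.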

Dispatch Planner Optimization Problem using Google OR-Tools for Open VRP

I have been assigned a project related to VRP in which I have a fleet of trucks scattered in different locations that need to pick up and deliver goods from and to different locations.
Google's OR-Tools examples are built around a depot, but in my case there is no starting or ending depot, and I need to create a planner that can find optimised routes for all the vehicles under different constraints.
I'm struggling to find a solution to this problem; any help would be appreciated.
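One common way to model an open VRP with OR-Tools is to keep the depot machinery but make it free: give each vehicle its real current location as its start node, and let every route end at a dummy node whose travel cost to and from everywhere is zero. A minimal sketch with an invented distance matrix follows; pickup/delivery pairing could then be layered on with routing.AddPickupAndDelivery and the usual dimension constraints:

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

# Invented 5-node instance: node 0 is a dummy "virtual depot" whose
# distance to and from every real node is zero, so routes may end anywhere.
dist = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 4, 7],
    [0, 9, 0, 5, 3],
    [0, 4, 5, 0, 6],
    [0, 7, 3, 6, 0],
]
num_vehicles = 2
starts = [1, 2]  # each truck starts where it currently sits
ends = [0, 0]    # all routes "end" at the zero-cost dummy node

manager = pywrapcp.RoutingIndexManager(len(dist), num_vehicles, starts, ends)
routing = pywrapcp.RoutingModel(manager)

def distance_cb(from_index, to_index):
    return dist[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit = routing.RegisterTransitCallback(distance_cb)
routing.SetArcCostEvaluatorOfAllVehicles(transit)

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = (
    routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)

solution = routing.SolveWithParameters(params)
if solution:
    for v in range(num_vehicles):
        index = routing.Start(v)
        route = []
        while not routing.IsEnd(index):
            route.append(manager.IndexToNode(index))
            index = solution.Value(routing.NextVar(index))
        print(f"truck {v}: {route}")  # the dummy end node is not printed
```

Because the dummy node costs nothing, the solver effectively optimises open routes while the library still sees a well-formed start/end structure.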

OptaPlanner Right Tool for Scheduling of Manufacturing Orders

Would you consider OptaPlanner to be the right tool for the planning of manufacturing operations with multiple level routings (final product, subassembly1, subassembly2, subassembly11, subassembly12, ...)?
We are talking about several thousands of manufacturing orders with 10-20 operations each.
It looks like project job scheduling, I know. I'm just concerned about the amount of data and the ability to find an optimal solution in a reasonable amount of time...
Are there real world examples for this problem domain and OptaPlanner out there?
See the project job scheduling example. That's not our easiest or prettiest example, but it works and you can make it pretty.
As for scaling, if it does end up being a problem (I doubt it for only 1k entities), there are plenty of power-tweaking options (multithreaded solving, partitioned search, ...).

How to build a statistical model to determine whether a website content update boosted sales, given that sales have natural growth

The data available includes ERP data for real order quantities and revenue, as well as Adobe online analytics data for add-to-cart events and online revenue.
I was asked to determine whether a content update impacts sales, so that we have some proof before rolling out similar updates to all content. However, sales increase naturally over time. How do we build a model that excludes the natural sales increase and provides statistical proof of an increase/decrease caused by the update?
Thanks,
If I understand this correctly, I can think of two possible solutions:
If the natural growth is predictable, you should be able to clean it out by approximation. For instance, if you have a steady 2% monthly sales growth (this can easily be extracted from the ERP), you can roughly subtract it from the results of the updated site; the details of the approach depend greatly on how precise you want the model to be (see the sketch below).
Perform A/B testing of the site. In this case you'll get the real figures, but it requires involving your web team.
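For the first option, an interrupted-time-series regression makes the idea concrete: fit a time trend over the whole series plus a dummy variable that switches on at the update date, and read the update's effect from the dummy's coefficient. A sketch with statsmodels, where the file name, column names and update date are assumptions (seasonality would need extra terms):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily sales export from the ERP: columns 'date' and 'sales'.
df = pd.read_csv("daily_sales.csv", parse_dates=["date"]).sort_values("date")
df["t"] = np.arange(len(df))                           # natural-growth trend
df["post"] = (df["date"] >= "2020-06-01").astype(int)  # 1 after the update (assumed date)

# sales = baseline + growth * t + update_effect * post
model = smf.ols("sales ~ t + post", data=df).fit()
print(model.summary())
# A positive, statistically significant 'post' coefficient is evidence of
# a lift beyond the natural growth that 't' already accounts for.
```

This is only as good as the trend assumption; if growth is seasonal or non-linear, A/B testing (the second option) gives a much cleaner causal answer.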

Content-based reco system in neo4j for large dataset

I am trying to make a book recommendation website. I have crawled some book sites and have around 15 million separate books in the DB, which is in neo4j.
Now for some genres, like mystery and thriller, there are at least a million books. I have to make a top-20 list of recommendations. My current approach is to:
get the books
run a similarity comparison (vector cosine or Pearson's)
sort and display
These steps are expensive and slow, not at all good for a real-time system. I thought of keeping a sorted list per genre by linking neo4j to a traditional DB and getting the top entries from that DB via neo4j, but that is also slow (it takes a few tens of seconds). Is there a simpler and more intuitive way to do this? Any ideas will help.
It would be good to know what other criteria you would like to base your recommendations on, e.g. how exactly you measure similarities between books. I'm assuming it's not purely genre based.
One approach we have been taking with these dense nodes (such as your genres, or cities people live in, etc.) is to find recommendations first based on some other criteria, then boost the relevance score of the recommendation if it is connected to the correct dense node. Such a query is much more performant.
For example, when recommending 20 people you should be friends with, I'd find 100 candidates based on all other criteria and then boost the scores of candidates living in the same location as the user we're recommending for. That's 100 single-hop traversals, which will be very quick.
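Expressed in Cypher (run here through the Python driver), that candidates-then-boost pattern could look roughly like this; the Book/Genre labels, the SIMILAR_TO/IN_GENRE relationship types and the 0.2 boost are illustrative guesses, not a prescribed schema:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # assumed credentials

# Take the 100 best pre-scored candidates, then boost those that share the
# target genre, and only then cut down to the top 20.
QUERY = """
MATCH (b:Book {id: $book_id})-[s:SIMILAR_TO]->(cand:Book)
WITH cand, s.score AS score
ORDER BY score DESC LIMIT 100
OPTIONAL MATCH (cand)-[:IN_GENRE]->(g:Genre {name: $genre})
RETURN cand.id AS id,
       score + CASE WHEN g IS NULL THEN 0.0 ELSE 0.2 END AS boosted
ORDER BY boosted DESC LIMIT 20
"""

with driver.session() as session:
    for record in session.run(QUERY, book_id=42, genre="mystery"):
        print(record["id"], record["boosted"])
```

The dense genre node is touched only 100 times via single-hop OPTIONAL MATCHes, never traversed from, which is what keeps the query fast.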
Have a look at this recent webinar recording, you may find some inspiration in it.
Regarding similarity measures, these may need to be pre-computed, linking similar books together with SIMILAR_TO relationships. Such pre-computation could be done using the GraphAware Framework's Runtime, which executes this background computation only during quiet periods, so it does not interfere with your regular transactional processing. Look at the NodeRank module, which computes PageRank in Neo4j during quiet periods.
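As a rough picture of what that offline pre-computation could do, independent of the GraphAware Runtime itself: compute top-k cosine similarities per batch (e.g. per genre) and write them back as SIMILAR_TO relationships. All ids, vectors and property names below are placeholders:

```python
import numpy as np
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # assumed credentials

def top_k_similar(vectors, k=20):
    """Cosine similarity of every book against every other in the batch.
    Fine per genre batch; far too expensive for all 15M books at once."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -1.0)  # never link a book to itself
    return np.argsort(-sims, axis=1)[:, :k], sims

book_ids = [1, 2, 3]            # placeholder ids from one genre batch
vectors = np.random.rand(3, 8)  # placeholder feature vectors
neighbours, sims = top_k_similar(vectors, k=2)

with driver.session() as session:
    for i, row in enumerate(neighbours):
        for j in row:
            session.run(
                "MATCH (a:Book {id: $a}), (b:Book {id: $b}) "
                "MERGE (a)-[s:SIMILAR_TO]->(b) SET s.score = $score",
                a=book_ids[i], b=book_ids[int(j)], score=float(sims[i, j]))
```

Once the SIMILAR_TO links exist, the online query above never has to compute a similarity at request time, which is what makes the top-20 lookup real-time.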