Guess if a user will make or not a conversion - tensorflow

My friends,
In the past couple of years I read a lot about AI with JS and some libraries like TensorFlow. I have great interest in the subject but never used it on a serious project. However, after struggling a lot with linear regression to solve an optimization problem I have, I think that finally I will get much better results, with greater performance, using AI. I work for 12 years with web development and lots of server side, but never worked with any AI library, so please, have a little patience with me if I say something stupid!
My problem is this: every user that visits our platform (website) we save the Hour, Day of the week, if the device requesting the page was a smartphone or computer... and such of the FIRST access the user made. If the user keeps visiting other pages, we dont care, we only save the data of the FIRST visit. And if the user anytime does something that we consider a conversion, we assign that conversion to the record of the first access that user made. So we have almost 3 millions of lines like this:
SESSION HOUR DAY_WEEK DEVICE CONVERSION
9847 7 MONDAY SMARTPHONE NO
2233 13 TUESDAY COMPUTER YES
5543 19 SUNDAY COMPUTER YES
3721 8 FRIDAY SMARTPHONE NO
1849 12 SUNDAY COMPUTER NO
6382 0 MONDAY SMARTPHONE YES
What I would like to do is this: next time a user visits our platform, we wanna know the probability of that user making a conversion. If a user access now, our website, depending on their device, day of week, hour... we wanna know the probability of that user making a future conversion. With that, we can show very specific messages to the user while he is using our platform and a different price model according to that probability.
CURRENTLY we are using a liner regression, and it predicts if the user will make a conversion with an accuracy of around 30%. It's pretty low but so far, it's the best we got it, and this linear regression generates almost 18% increase in conversions when we use it to show specific messages/prices to that specific user compaired to when we dont use it. SO, with a 30% accuracy our linear regression already provides 18% better conversions (and with that, higher revenues and so on).
In case you are curious, our linear regression model works like this: we generate a linear equation to every first user access on our system with variables that our system tries to find in order to minimize error sqr(expected value - value). Using the data above, our model would generate these equations below (SUNDAY = 0, MONDAY = 1...COMPUTER = 0, SMARTPHONE = 1... CONVERSION YES = 1 and NO = 0)
A*7 + B*1 + C*1 = 0
A*13 + B*2 + C*0 = 1
A*19 + B*0 + C*0 = 1
A*8 + B*6 + C*1 = 0
A*12 + B*0 + C*1 = 0
A*0 + B*1 + C*0 = 1
So, our system find the best A, B and C that generates the minimizes error. How can we do that with AI? If possible, it would be nice if we could use TensorFlow or anything with JS! I know there are several AI models, and I have no idea which one would best fit what we need!

Related

How to speed up construction phase whilst having an trivial overlapping constraint

We're are trying to put together a proof of concept planning constraints solver using OptaPlanner. However the construction phase seems slow for even a trivial set of constraints i.e. assign to one User with no overlapping Tasks for that User.
Problem overview:
We are assigning Tasks to Users
Only one Task can be assigned to User
The Tasks can be variable length: 1-16 hours
Users can only do one Task at a time
Users have 8 hours per day
We are using the Time Grain pattern - 1 grain = 1 hour.
See constraints configuration below.
This works fine (returns in a 20 seconds) for a small number of Users and Tasks e.g. 30 Users / 1000 Tasks but when we start scaling up the performance rapidly drops off. Simply increasing the number of Users without increasing the number of Tasks (300 Users / 1000 Tasks) increases the solve time to 120 seconds.
But we hope to scale up to 300 Users / 10000 Tasks and incorporate much more elaborate constraints.
Is there a way to optimise the constraints/configuration?
Constraint constraint1 = constraintFactory.forEach(Task.class)
.filter(st -> st.getUser() == null)
.penalize("Assign Task", HardSoftLongScore.ONE_HARD);
Constraint constraint2 = constraintFactory.forEach(Task.class)
.filter(st -> st.getStartDate() == null)
.penalize("Assign Start Date", HardSoftLongScore.ONE_HARD);
Constraint constraint3 = constraintFactory
.forEachUniquePair(Task.class,
equal(Task::getUser),
overlapping(st -> st.getStartDate().getId(),
st -> st.getStartDate().getId() + st.getDurationInHours()))
.penalizeLong("Crew conflict", HardSoftLongScore.ONE_HARD,
(st1, st2) -> {
int x1 = st1.getStartDate().getId() > st2.getStartDate().getId() ? st1.getStartDate().getId(): st2.getStartDate().getId();
int x2 = st1.getStartDate().getId() + st1.getDurationInHours() < st2.getStartDate().getId() + st2.getDurationInHours() ?
st1.getStartDate().getId() + st1.getDurationInHours(): st2.getStartDate().getId() + + st2.getDurationInHours();
return Math.abs(x2-x1);
});
constraint1 and constraint2 seem redundant to me. The Construction Heuristic phase will initialize all planning variables (automatically, without being penalized for not doing so) and Local Search will never set a planning variable to null (unless you're optimizing an over-constrained problem).
You should be able to remove constraint1 and constraint2 without impact on the solution quality.
Other than that, it seems you have two planning variables (Task.user and Task.startDate). By default, in each CH step, both variables of a selected entity are initialized "together". That means OptaPlanner looks for the best initial pair of values for that entity in the Cartesian product of all users and all time grains. This scales poorly.
See the Scaling construction heuristics chapter to learn how to change that default behavior and for other ways how to make Construction Heuristic algorithms scale better.

Optaplanner:Add Dynamic visits without changing the already created visits

I am saving the best solution into the DB, and we display that on the web page. I am looking for some solution where a user can add more visits, but that should not change already published trips.
I have checked the documentation and found ProblemFactChange can be used, but only when the solver is already running.
In my case solver is already terminated and the solution is also published. Now I want to add more visits to the vehicle without modifying the existing visits of the Vehicle. Is this possible with Optaplanner? if yes any example of documentation would be very helpful.
You can use PlanningPin annotation for avoiding unwanted changes.
Optaplanner - Pinned planning entities
If you're not looking for pinning (see Ismail's excellent answer), take a look at the OptaPlanner School Timetabling example, which allows adding lessons between solver runs. The lessons simply get stored in the database and then get loaded when the solver starts.
The difficulty with VRP is the chained model complexity (we're working on an alternative): If you add a visit X between A and B, then make sure that afterwards A.next = X, B.previous = X, X.previous = A, X.next = B and X.vehicle = A.vehicle. Not the mention the arrival times etc.
My suggestion would be to resolve what is left after the changes have been introduced. Let's say you are you visited half of your destinations (A -> B -> C) but not yet (C - > D -> E) when two new possible destinations (D' and E') are introduced. Would not this be the same thing as you are starting in C and trying plan for D, D', E and E'? The solution needs to be updated on the progress though so the remainder + changes can be input to the next solution.
Just my two cent.

Sentinel 1 data gaps in swath overlap (not sequential scenes) in Google Earth Engine

I am working on a project using the Sentinel 1 GRD product in Google Earth Engine and I have found a couple examples of missing data, apparently in swath overlaps in the descending orbit. This is not the issue discussed here and explained on the GEE developers forum. It is a much larger gap and does not appear to be the product of the terrain correction as explained for that other issue.
This gap seems to persist regardless of year changes in the date range or polarization. The gap is resolved by changing the orbit filter param from 'DESCENDING' to 'ASCENDING', presumably because of the different swaths or by increasing the date range. I get that increasing the date range increases revisits and thus coverage but is this then just a byproduct of the orbital geometry? ie it takes more than the standard temporal repeat to image that area? I am just trying to understand where this data gap is coming from.
Code example:
var geometry = ee.Geometry.Polygon(
[[[-123.79472413785096, 46.20720039434629],
[-123.79472413785096, 42.40398120362418],
[-117.19194093472596, 42.40398120362418],
[-117.19194093472596, 46.20720039434629]]], null, false)
var filtered = ee.ImageCollection('COPERNICUS/S1_GRD').filterDate('2019-01-01','2019-04-30')
.filterBounds(geometry)
.filter(ee.Filter.eq('orbitProperties_pass', 'DESCENDING'))
.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
.filter(ee.Filter.eq('instrumentMode', 'IW'))
.select(["VV","VH"])
print(filtered)
var filtered_mean = filtered.mean()
print(filtered_mean)
Map.addLayer(filtered_mean.select('VH'),{min:-25,max:1},'filtered')
You can view an example here: https://code.earthengine.google.com/26556660c352fb25b98ac80667298959

offline GTFS connection from A to B

The project I am working on involves static offline GTFS data in a mobile app. All the GTFS data is available inside realm-objects (or SQLite if needed).
Now, I would like to establish all train- or bus-connections from A to B (starting after a certain departure-time).
How do I query the GTFS-data in order to get a connection from A to B ???
I reealized to get all trips leaving from A.
I realized to get all station-names along that trip including times.
But I find it very hard to get the connection information between two locations A and B. What SQL queries do i have to set up in order to get that information ?
Any help appreciated !
If you just want to dynamically calculate shortest travel routes between static hubs, in an offline application, determined like the following image, you can use the following formula:
   (Source)
Here's the pseudo code:
1 function Dijkstra(Graph, source):
2
3 create vertex set Q
4
5 for each vertex v in Graph: // Initialization
6 dist[v] ← INFINITY // Unknown distance from source to v
7 prev[v] ← UNDEFINED // Previous node in optimal path from source
8 add v to Q // All nodes initially in Q (unvisited nodes)
9
10 dist[source] ← 0 // Distance from source to source
11
12 while Q is not empty:
13 u ← vertex in Q with min dist[u] // Node with the least distance
14 // will be selected first
15 remove u from Q
16
17 for each neighbor v of u: // where v is still in Q.
18 alt ← dist[u] + length(u, v)
19 if alt < dist[v]: // A shorter path to v has been found
20 dist[v] ← alt
21 prev[v] ← u
22
23 return dist[], prev[]
(Source)
Alright, I'll admit so far my answer was a tad facetious...
I suspect you don't realize just how complicated it is to do what you're asking, especially in an offline environment. Companies like Google and Microsoft have spent millions on research with huge teams of data scientists.
If this is something you are serious about, I'd encourage you to start with a 10×10 grid and work on the logic of getting from "Point A → Point B" when you start adding barriers in random places (this simulating roads beginning & ending). Recently I was surprised how complicated a seemingly-simple, somewhat-related Stack Overflow "pipe sizes conversion" question that I answered had become.
If you didn't have the "offline" condition, I would've suggested looking into getting a [free] API Key for Google Web Services Directions API. Basically you tell it a where Points A & B are, and it gives you detailed route information, via transit or other methods.
Another of Google's many API's that could be helpful is the Snap to Roads API, which turns partial or error-ridden paths into driveable ones.
The study Route Logic and Shortest Path Logic is actually really fascinating stuff, and there are some amazing resources to learn about the related theories (see below), including a video from Google explaining how they went about it.
...but unfortunately this isn't something that's going to be accomplished with a simple SQL Query.
Actual Resources:
YouTube : Google Tech Talk: Fast Route Planning
Wikipedia : Shortest path problem
Wikipedia : Dijkstra's algorithm
...and some slightly Lighter Reading:
Wikipedia : Travelling Salesman Problem
Why UPS drivers don’t turn left and you probably shouldn’t either
Google Web Services : Directions API Developer's Guide
Stack Overflow : Shortest Route of converters between two different pipe sizes?
Wikipedia : Seven Bridges of Königsberg

options for questions in Watson conversation api

I need to get the available options for a certain question in Watson conversation api?
For example I have a conversation app and in some cases Y need to give the users a list to select an option from it.
So I am searching for a way to get the available reply options for a certain question.
I can't answer to the NPM part, but you can get a list of the top 10 possible answers by setting alternate_intents to true. For example.
{
"context":{
"conversation_id":"cbbea7b5-6971-4437-99e0-a82927607079",
"system":{
"dialog_stack":["root"
],
"dialog_turn_counter":1,
"dialog_request_counter":1
}
},
"alternate_intents":true,
"input":{
"text":"Is it hot outside?"
}
}
This will return at most the top ten answers. If there is a limited number of intents it will only show them.
Part of your JSON response will have something like this:
"intents":[{
"intent":"temperature",
"confidence":0.9822100598134365
},
{
"intent":"conditions",
"confidence":0.017789940186563623
}
This won't get you the output text though from the node. So you will need to have your answer store elsewhere to cross reference.
Also be aware that just because it is in the list, doesn't mean it's a valid answer to give the end user. The confidence level needs to be taken into account.
The confidence level also does not work like a normal confidence. You need to determine your upper and lower bounds. I detail this briefly here.
Unlike earlier versions of WEA, the confidence is relative to the
number of intents you have. So the quickest way to find the lowest
confidence is to send a really ambiguous word.
These are the results I get for determining temperature or conditions.
treehouse = conditions / 0.5940327076534431
goldfish = conditions / 0.5940327076534431
music = conditions / 0.5940327076534431
See a pattern?🙂 So the low confidence level I will set at 0.6. Next
is to determine the higher confidence range. You can do this by mixing
intents within the same question text. It may take a few goes to get a
reasonable result.
These are results from trying this (C = Conditions, T = Temperature).
hot rain = T/0.7710267712183176, C/0.22897322878168241
windy desert = C/0.8597747113239446, T/0.14022528867605547
ice wind = C/0.5940327076534431, T/0.405967292346557
I purposely left out high confidence ones. In this I am going to go
with 0.8 as the high confidence level.