How to make Cohort User Retention analysis from User Login Data in Tableau? - data-visualization

I have this data which contain two tables:
User Table
UserID
DateOfJoining
Events Table
EventID
UserID
AnonymousID
EventName
EventRequestedOn
RequestUserAgent
RequestURL
I'm kind of new to Tableau. I'm not understanding what filters to apply to get a cohort retention analysis like the one below. Could you help me with this.
I explored around the filters and came up with the visualization below, but the data doesn't seem to be right to me. Could you tell me what is going wrong. The use case is to prepare a visualization so that I can understand how to retain users in what areas for which events.

Related

SQL - How do I expand a dataset to do a cohort analysis?

This is my first post so apologies if something is posted incorrectly - please let me know and I will fix it.
I am trying to build a SQL query in bigQuery that creates a cohort analysis so I can see how many customers have been retained over time by the month they joined (their cohort).
I work in insurance, so we have data on customers, when they joined, and any time they changed the policy (e.g., when they added a car to their coverage), but I do not have it laid out as every month of premium. The data is as follows:
Data vs how I need the data
Do you know how I could fill in the missing months?

SQL Server 2012 Managment Studio Basic SELECT queries

I am new to SQL Server. I have been assigned to do some simple queries to start off, then eventually move on to more complex queries.
I have spent a lot of time on this website: http://www.w3schools.com and I understand it, I think, but then when I go back to my company's database, I find myself searching from many, many, different tables with different information.
For example, a table would say [Acct_Name] and the query comes back with not the correct account name (s) that I need. Any advice that you think might help me? Thank you.
It sounds like you are looking to limit your results to specific accounts. There are many ways to go about this, so no one will be able to give you a all encompassing answer but if you are looking to just pull a single account
SELECT * FROM (your table name) WHERE Acct_Name = 'the account name'
The * means you are selecting all columns in the table and your WHERE clause is where you set your search conditionals, like account name or by account ID. If you had a account creation date, you could get all accounts created on or before a date like this
SELECT * FROM (your table name) WHERE Created < '2016-06-01 00:00:00'
Replace the column name 'Created' with the column that holds the date field of account creation
Learning the WHERE clause and what you can do there to limit your results will get you on a solid footing to start, from there you will want to learn JOINs and how to link tables by primary keys.
Code academy has some great tutorials https://www.codecademy.com/learn/learn-sql

Managing PerformancePoint Filters With Slowly Changing Dimensions

Just a bit of background info:
I have dimension table which uses SCD2 to track user changes in our company (team changes, job title changes etc) See example below:
I've built an Analysis Services Cube and created all the necessary hierarchy's for the dimensions and it works well when navigating and drilling down through the fact table.
The problem I have is with the filters on the PerformancePoint dashboard. As I'm using the User Dimension table with it's multiple instances of users it's showing duplicates up in the list. I can understand why as the surrogate ID is being referenced on the Dimension. But if I choose the first instance of the A-team I will see all their sales for a particular period and if I choose the second instance I will see all their sales for a different period.
What is the best way to handle this type of behavior? Ideally I'd like to see a distinct list of teams in alphabetical order and when I choose the team name it shows all of their data over time.
I've considered using MDX query filters but I'd like to see if there's anything I haven't thought about.
I realise this isn't an easy and quick question but any help would be appreciated!
The answer was simple after having a trawl through my User Dimension table on the Cube.
Under my user dimension I added 2 duplicate attributes to my attributes list ("Team Filter" is a copy of "Team", "User Filter" a copy of "User Name") these will be used only for filtering the dashboard.
Under the attribute properties for each duplicate I then set AttributeHierarchyOptimizedState to "Not Optimized", I also set their AttributeHierarchyVisible to false as I'd shown the two duplicate attributes in the hierarchy window in the middle.
Deploy your Cube to the server and go in to PerformancePoint. Create a new MDX Filter (this image shows the finished filter)
This is the code I used, it only shows dimension members which have a fact against them (reduces the list a considerable amount) and by using allmembers at the dimension it also gives me the option to show "All" at the top of the list.
Deploy the new filters and now you can see the distinct list of users and teams, works perfectly and selects every instance (regardless of the SCD2 row)

MySQL joins for friend feed

I'm currently logging all actions of users and want to display their actions for the people following them to see - kind of like Facebook does it for friends.
I'm logging all these actions in a table with the following structure:
id - PK
userid - id of the user whose action gets logged
actiondate - when the action happened
actiontypeid - id of the type of action (actiontypes stored in a different table - i.e. following other users, writing on people's profiles, creating new content, commenting on existing content, etc.)
objectid - id of the object they just created (i.e. comment id)
onobjectid - id of the object they did the action to (i.e. id of the content that they commented on)
Now the problem is there are several types of actions that get logged (actiontypeid).
What would be the best way of retrieving the data to display to the user?
The easiest way out would be gabbing the people the user follows dataset and then just go from there and grab all other info from the other tables (i.e. the names of the users the people you're following just started following, names of the user profiles they wrote on, etc.). This however would create a a huge amount of small queries and trips to the database in a while loop. Not a good idea.
I could use joins to retrieve everything in one massive data set, but how would I know where to grab the data from in just one query? - there's different types of actions that require me to look into several different tables to retrieve data, based on the actiontypeid...
i.e. To get User X is now following User Y I'd have to get my data (User Y's username) from the followers table, whereas User X commented on content Y would need me to look in the content table to get the content's title and URL.
Any tips are welcome, thanks!
Consider creating several views for different actiontypeids. Union them to have one full history.

Design of Databases for storing the details of the recurrent occurrence of an event

I need to implement a feature similar to the one provided by Microsoft Outlook to make your meeting appointment recurrent. I am trying to figure out the optimized Database design that I will be requiring for implementing this feature.
The requirement is something like that each run or task entered by the user will also be applicable for scheduling like a recurrent event - weekly, monthly or yearly. Could you please suggest me the Database model - table structure (with constraints) for storing these details in the DB which can be afterwards accessed by the program to do the appropriate task. Screenshots for some of the possible scheduler details can be found at the following link.
We have a mysql DB running at the backend for storing these details. As soon as the user submits a request, a request id with the details of the request is stored in the table and then a action corresponding to it is taken by the program. More clarification would be like that the users intent is to run a sql script,getting the values and then performing statistical analysis to it. But as the oracle reference DB is dynamically updated by many users, he wants to run it in a recurrent manner and get the analysis done. Note that the mysql db and the ref DB are different.
Please let me know if you require any other details.!
I would suggest storing the details of the first occurence in one table (scheduled tasks) and then the recurance (recurring tasks) details in another.
I might also then be tempted to update the scheduled task table with the next occurance as each task is completed.
As for the Table layout, a rough sketch would be as follows:
[ScehduledTasks]
TaskId (Primary Key)
Description and Details etc...
Start Datetime
End Datetime
[RecurringTasks]
TaskId (Foreign Key)
Frequency : Daily, Weekly, Monthly or Yearly.
DayNo : What Day to run on (1-7 for weekly, 1-31 for monthly, 1-365 for yearly)
Interval : Every x weeks, months etc.
WeekOfMonth : first, second, third... etc If populated then DayNo specifies the day of the week.
MonthOfYear : 1-12.
EndDatetime : The last date to perform
Occurences : The number of times to perform. If this and the previous value are null then perform for ever.
Obvious certain fields would be blank depending on how the task was set up, but I think the above covers all you would need to emulate the tasks in Outlook.