SQL GROUPING SETS averages with multiple many-to-many dimensions - sql

I have a table of data with the following:
User,Platform,Dt,Activity_Flag,Total_Purchases
1,iOS,05/05/2016,1,1
1,Android,05/05/2016,1,2
2,iOS,05/05/2016,1,0
2,Android,05/05/2016,1,2
3,iOS,05/05/2016,1,1
3,Android,06/05/2016,1,3
1,iOS,06/05/2016,1,2
4,Android,06/05/2016,1,2
1,Android,06/05/2016,1,0
3,iOS,07/05/2016,1,2
2,iOS,08/05/2016,1,0
I want to do a GROUPING SETS (Platform,Dt,(Platform,Dt),()) aggregation to be able to find for each combination of Platform and Dt the following:
Total Purchases
Total Unique Users
Average Purchases per User per Day
The first two are simple as these can be achieved via a sum(Total_Purchases) and count(distinct user) respectively.
The problem I have is with the last metric. The result set should look like this but I don't know how to get the last column to be calculated correctly:
Platform,Dt,Total_Purchases,Total_Unique_Users,Average_Purchases_Per_User_Per_Day
Android,05/05/2016,4,2,2.0
iOS,05/05/2016,2,3,0.7
Android,06/05/2016,5,3,1.7
iOS,06/05/2016,2,1,2.0
iOS,07/05/2016,2,1,2.0
iOS,08/05/2016,0,1,0.0
,05/05/2016,6,3,2.0
,06/05/2016,7,3,2.3
,07/05/2016,2,1,2.0
,08/05/2016,0,1,0.0
Android,,9,4,1.8
iOS,,6,3,1.2
,,15,4,1.6
For the first ten rows, the average purchases per user per day is a simple division of the first two metric columns (Total_Purchases / Total_Unique_Users), because each of these rows covers a single date. For the final three rows that division does not give the desired result, because they span several days: the metric has to be computed for each day in turn, and those daily values then averaged to get the overall per-day amount.
If this isn't clear please let me know and I'll be happy to explain better. This is my first post on this site!
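One way to read the requirement, sketched in Python rather than SQL so the arithmetic is easy to check: aggregate every grouping set in one pass, and for any group that spans several days take the mean of its per-day ratios instead of dividing the grand totals. The function name `grouping_sets` is hypothetical, and the sketch assumes a day only counts for a group when that group has at least one row on that day.

```python
from collections import defaultdict
from itertools import product

# Rows copied from the question: (user, platform, dt, total_purchases).
ROWS = [
    (1, "iOS", "05/05/2016", 1),
    (1, "Android", "05/05/2016", 2),
    (2, "iOS", "05/05/2016", 0),
    (2, "Android", "05/05/2016", 2),
    (3, "iOS", "05/05/2016", 1),
    (3, "Android", "06/05/2016", 3),
    (1, "iOS", "06/05/2016", 2),
    (4, "Android", "06/05/2016", 2),
    (1, "Android", "06/05/2016", 0),
    (3, "iOS", "07/05/2016", 2),
    (2, "iOS", "08/05/2016", 0),
]


def grouping_sets(rows):
    """Emulate GROUPING SETS (Platform, Dt, (Platform, Dt), ())."""
    # Key is (platform, dt); None marks a dimension that is rolled up.
    totals = defaultdict(int)
    users = defaultdict(set)
    for user, platform, dt, purchases in rows:
        for key in product((platform, None), (dt, None)):
            totals[key] += purchases
            users[key].add(user)

    def per_user(key):
        return totals[key] / len(users[key])

    result = {}
    for platform, dt in list(totals):
        if dt is not None:
            # Single-day group: plain division is already "per user per day".
            avg = per_user((platform, dt))
        else:
            # Multi-day group: average the daily ratios of the same platform level.
            daily = [per_user(k) for k in totals if k[0] == platform and k[1] is not None]
            avg = sum(daily) / len(daily)
        key = (platform, dt)
        result[key] = (totals[key], len(users[key]), round(avg, 1))
    return result


summary = grouping_sets(ROWS)
```

On the question's rows this gives 1.8 for Android, 1.2 for iOS and 1.6 overall, which matches the last three rows of the expected output. In SQL the same shape would typically need a per-day subquery feeding an outer AVG, since a single-level aggregate cannot average a ratio of two aggregates.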

Related

Creating custom timestamp buckets bigquery

I have an hourly (timestamp) dataset of events from the past month.
I would like to check the performance of events that occurred between certain hours, group them together and average the results.
For example: AVG income of the hours 23:00-02:00 per user:
Given the data set below, I'd like to sum the coloured rows and then average them (the result should be 218).
I tried NTILE, but it couldn't divide the data properly while ignoring the irrelevant hours.
Is there a good way to create these custom buckets using SQL?
[image: dataset]
From the description I'm not exactly sure how you want to aggregate. If you provide an example dataset, I can update the answer.
However you can easily achieve this with an AVG and IF statement.
AVG(IF(EXTRACT(HOUR FROM timestamp_field) BETWEEN 0 AND 4, value, NULL)) AS avg_value
Using the above you can then group by either day or month to get the aggregation level you want.
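The same conditional average can be sketched in Python for a bucket that wraps past midnight, such as the 23:00-02:00 window from the question. This is only an illustration under assumptions: the helper names `in_bucket` and `bucket_average` and the sample rows are hypothetical, and the end hour is treated as inclusive to mirror BETWEEN.

```python
from datetime import datetime


def in_bucket(hour, start_hour, end_hour):
    """True if hour lies in [start_hour, end_hour], allowing a wrap past midnight."""
    if start_hour <= end_hour:
        return start_hour <= hour <= end_hour
    return hour >= start_hour or hour <= end_hour


def bucket_average(events, start_hour, end_hour):
    """Average the values whose hour is in the bucket; None when no row matches."""
    # Rows outside the bucket are skipped, like the NULL branch of AVG(IF(...)).
    values = [v for ts, v in events if in_bucket(ts.hour, start_hour, end_hour)]
    return sum(values) / len(values) if values else None


# Hypothetical hourly rows: (timestamp, income).
events = [
    (datetime(2021, 3, 1, 22), 500),
    (datetime(2021, 3, 1, 23), 100),
    (datetime(2021, 3, 2, 0), 200),
    (datetime(2021, 3, 2, 1), 300),
    (datetime(2021, 3, 2, 2), 400),
    (datetime(2021, 3, 2, 3), 900),
]

night_avg = bucket_average(events, 23, 2)  # uses hours 23, 0, 1 and 2
```

In SQL the wrap-around test would become an OR of two hour comparisons inside the IF, since a single BETWEEN cannot span midnight.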

Tableau combining rows with the same info

I have a dashboard in Tableau which shows different payments received: the amount, the date the payment was received, and a calculated field which shows the number of days since the payment was received.
However, many payments are identical, with the same amount and received on the same day, so Tableau collapses them into one row and sums the days since receipt in the final column. For example, five payments of £5.50, each received on 1st January, show as below (as of 01/02/2018):
Column 1 Column 2 Column 3
£5.50 01/01/2018 155
But I need a separate row for each payment. Does anyone know how to stop Tableau doing this, or of a workaround?
Many thanks.
You could try using the RANK_UNIQUE function.
First of all, in the Analysis Menu, uncheck Aggregate Measures.
Then, starting from this data:
You can get this result:
Additionally, you may want to hide Rank from the rows by simply not showing its header.
Is this something close to what you're looking for?
EDIT/UPDATE
In order to get all values and not just those for the top rows, move Rank to the very beginning of the shelf:

Query to find average stock ... with a twist

We are trying to calculate average stock from a movements table in a single SQL statement.
So far we have had no problem with what we thought was a standard approach: instead of adding up the daily stock and dividing by the number of days (we don’t have daily stock), we simply add up (movement quantity * remaining days):
select sum(quantity*(END_DATE-move_date))/(END_DATE-START_DATE)
from move_table
where move_date<=END_DATE
This is a simplified example, in real life we already take care of the initial stock at the starting date. Let’s say there are no movements prior to start_date.
Quantity sign depends on move type (sale, purchase, inventory, etc).
Of course this is done grouping by product, warehouse, ... but you get the idea.
It works as expected and the calculation is correct.
But (there is always a “but”), our customer doesn’t like counting days on which there is no stock (all stock sold out). So he doesn’t want
Sum of (daily_stock) / number_of_days (which is what we calculate using different math)
Instead, he would like
Sum of (daily stock) / number_of_days_in_which_stock_is_not_zero
We could certainly do this in any programming language without much effort, but I was wondering how to do it in plain SQL, and I wasn’t able to come up with a solution.
Any suggestions?
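To make the equivalence behind the question's formula concrete, here is a small Python check under stated assumptions: the movements are hypothetical, there are none before START, END is treated as exclusive, and a movement counts towards the stock of its own day. The function names are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical movements: (move_date, signed quantity).
MOVES = [
    (date(2024, 1, 1), 10),   # purchase
    (date(2024, 1, 4), -10),  # sale that empties the stock
    (date(2024, 1, 8), 6),    # purchase
]
START, END = date(2024, 1, 1), date(2024, 1, 11)  # 10-day window, END exclusive


def weighted_average(moves, start, end):
    """The question's formula: sum(quantity * (END - move_date)) / (END - START)."""
    total = sum(q * (end - d).days for d, q in moves if d <= end)
    return total / (end - start).days


def daily_stock(moves, start, end):
    """End-of-day stock for every day in [start, end)."""
    days = [start + timedelta(days=n) for n in range((end - start).days)]
    return [sum(q for d, q in moves if d <= day) for day in days]


stock = daily_stock(MOVES, START, END)
plain_average = sum(stock) / len(stock)
```

Both routes give 4.8 here. The weighted formula never materialises the daily values, which is why it cannot tell how many of those days had zero stock.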
Consider creating a new table called something like Stock_EndOfDay_History that has the following columns.
stock#
date
stock_count_eod
This table would get a new row for each stock item at the start of a new day for the prior day. Rows could then be purged from this table once the applicable date value went outside the date window of interest.
To get the "number_of_days_in_which_stock_is_not_zero", use this.
SELECT COUNT(*) AS 'Not_Zero_Stock_Days' FROM Stock_EndOfDay_History
WHERE stock# = <stock#_value>
AND stock_count_eod <> 0
AND <date_window_clause>
Other approaches might just add a new column to the existing stock table to maintain a cumulative count of "number_of_days_in_which_stock_is_not_zero". But inevitably, questions will be asked about how the non-zero stock days count was calculated. The new table approach will answer those questions better than the new column approach.
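As a rough illustration of how such a history table would feed the customer's metric, the Python sketch below divides the summed end-of-day stock by the number of days with non-zero stock. The rows, the `stock_id` values and the function name are hypothetical, and the date window is assumed to be inclusive at both ends.

```python
from datetime import date

# Hypothetical rows of Stock_EndOfDay_History: (stock_id, day, stock_count_eod).
HISTORY = [
    ("A", date(2024, 1, 1), 10),
    ("A", date(2024, 1, 2), 10),
    ("A", date(2024, 1, 3), 0),
    ("A", date(2024, 1, 4), 0),
    ("A", date(2024, 1, 5), 6),
    ("B", date(2024, 1, 1), 3),
]


def average_over_nonzero_days(history, stock_id, start, end):
    """Sum of daily stock divided by the number of days with non-zero stock."""
    counts = [c for s, d, c in history if s == stock_id and start <= d <= end]
    nonzero_days = sum(1 for c in counts if c != 0)
    # With no non-zero day in the window the ratio is undefined, so return None.
    return sum(counts) / nonzero_days if nonzero_days else None


avg_a = average_over_nonzero_days(HISTORY, "A", date(2024, 1, 1), date(2024, 1, 5))
```

For stock "A" this divides 26 by 3 rather than by 5, which is the behaviour the customer asked for.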

How to handle monthly and yearly values

I have a fact table that holds what are, more or less, sales goals. The ETL process that populates it generates 12 "weighted" values in separate rows, one per month. Each row, however, also includes a field that holds the yearly value. I do this with unpivot. This all works. Now I'm trying to get at this data in the cube with an SSRS report. The problem seems to be that I can query and see results that include either the yearly goal values or the monthly weighted values, but not both in the same set.
[update for fact table details]
My Fact table looks something like this:
FK_Account
FK_User
Target
Projected
GoalYear
FK_DateKey
FK_Dept
MonthlyWeightedTarget
MonthlyWeightedProjected
When I load this fact table via the ETL, I get the date key associated with each monthly value (MonthlyWeightedTarget). That will be 12 separate records, but each one will have the same yearly value. I'm not including next year's value as a separate column, because there are separate records already associated with that year.
Basically, the users define a set of goals associated with a given year. Then I apply a "weighting" to generate 12 separate "monthly" records, which total up to the yearly target goal. I hope this makes sense.
What I need to see is something like this result:
Account Name
YTDgoal
YearGoal
NextYrGoal
I created a calculated member for NextYrGoal, but now I'm not sure I even need it.
What would be a good approach for handling the above (getting the YTD, yearly and next-year values)?
If I were getting at these values with T-SQL, I would sum the monthly values and just include the associated yearly and next year's values, grouping by account, year goal and next-year goal.
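The aggregation described in that last sentence can be sketched in Python to show the intended arithmetic. Everything here is an assumption for illustration: the row layout is a reduced version of the fact table, the account name and figures are invented, and the even 1/12 weighting stands in for the real weights, which the question does not give.

```python
from collections import defaultdict


def monthly_rows(account, year, target):
    """Hypothetical fact rows: (account, goal_year, month, target, monthly_weighted_target)."""
    # Each of the 12 monthly rows repeats the same yearly Target.
    return [(account, year, m, target, target / 12) for m in range(1, 13)]


FACTS = monthly_rows("Acme", 2024, 1200) + monthly_rows("Acme", 2025, 2400)


def goal_summary(facts, year, through_month):
    """Per account: (YTD goal, year goal, next-year goal)."""
    ytd = defaultdict(float)
    year_goal = {}
    for account, goal_year, month, target, weighted in facts:
        # The yearly value is repeated on every row, so record it once, never sum it.
        year_goal[(account, goal_year)] = target
        if goal_year == year and month <= through_month:
            ytd[account] += weighted
    return {
        account: (ytd[account], year_goal[(account, year)], year_goal.get((account, year + 1)))
        for account in ytd
    }


summary = goal_summary(FACTS, 2024, 3)
```

The point of the sketch is the asymmetry: the monthly measure is additive, while the yearly measure has to be taken once per account and year (for example with MAX or a non-additive aggregation), otherwise it would be counted 12 times.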

Sql Queries for finding the sales trend

Suppose I have a table that holds all the billing records. Now I want to see the sales trend for a user over a given time period, grouped into 3-day intervals. What should the SQL query be?
Please help, otherwise I am gone...
I can only give a vague suggestion based on the question. You may want a derived column with a standardised date (as in the MS date format, just a number per day) to which you apply integer division by 3 (equivalently, subtract the day number modulo 3), so that all days in the same 3-day period get the same value. You can then group and aggregate over this column to get the values for each 3-day period. To display the date nicely you would have to multiply back and convert the column as well.
Again, I'm not sure of the specifics, but I think this general idea could get you a result. It may well not be the best way, so it would help to add more detail to the question.
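A minimal Python sketch of that bucketing idea, assuming the billing records reduce to (date, amount) pairs and that periods are counted from a chosen start date. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def three_day_totals(bills, start, period_days=3):
    """Sum amounts per consecutive period of period_days, counted from start."""
    totals = defaultdict(float)
    for day, amount in bills:
        # Integer division puts days 0-2 in bucket 0, days 3-5 in bucket 1, and so on.
        bucket = (day - start).days // period_days
        # Multiply back to recover the first date of the bucket for display.
        totals[start + timedelta(days=bucket * period_days)] += amount
    return dict(sorted(totals.items()))


# Hypothetical billing records: (bill_date, amount).
bills = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 3), 5.0),
    (date(2024, 1, 4), 7.0),
    (date(2024, 1, 9), 2.0),
]
trend = three_day_totals(bills, date(2024, 1, 1))
```

In SQL the equivalent grouping key would be the day difference from the start date divided by 3 with integer division; the exact date-difference function depends on the database.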