I have a table with sales information at the transaction level. We want to institute a new model where we compensate sales reps if a customer makes a purchase after more than a year of dormancy. To figure out how much this would have cost historically, I want to add a column with a flag for whether or not each purchase was the buyer's first in the past 365 days. What I'd like to do is a row count in PowerPivot of all sales made by that customer in the past 365 days, wrapped in an IF to set the result to 0 or 1.
Example:
Order Date | Buyer | First Purchase in Year?
------- | ------- | -------
1/1/2015 | 1 | 1
1/2/2015 | 2 | 1
2/1/2015 | 1 | 0
4/1/2015 | 2 | 0
3/1/2016 | 2 | 1
5/1/2017 | 2 | 1
Any assistance would be greatly appreciated.
Excellent business use case! It's quite relevant in the business world.
To break this down for you, I will create 3 columns: 2 with some calculations, and 1 with the result. Once you understand how I did this, you can combine all 3 column formulas into a single column for your dataset, if you like.
Here's a picture of the results:
Here are the 3 columns that I created:
Last Purchase - in order to run this calculation, you need to know when the buyer made their last purchase.
CALCULATE(MAX([Order Date]),FILTER(Table1,[Order Date]<EARLIER([Order Date]) && [Buyer]=EARLIER([Buyer])))
Days Since Last Purchase - now you can compare the Last Purchase date to the current Order Date.
DATEDIFF([Last Purchase],[Order Date],DAY)
First Purchase in 1 Year - finally, the results column. This simply checks to see if it has been more than 365 days since the last purchase OR if the last purchase column is blank (which means it was the first purchase), and creates the flag you want.
IF([Days Since Last Purchase]>365 || ISBLANK([Days Since Last Purchase]),1,0)
Now, you can easily combine the logic of these 3 columns into a single column and get what you want. Hope this helps!
One note I wanted to add is that for this type of analysis it's not a wise move to do row counts as you had originally suggested, as your dataset can easily expand later on (what if you wanted to add more attribute columns?) and then you would have problems. So this solution that I shared with you is much more robust.
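For anyone who wants to sanity-check the logic outside PowerPivot, here is a minimal Python sketch of the same three steps (last purchase, days since last purchase, flag). The sample rows and the function name first_purchase_flags are made up for illustration and are not taken from the original workbook.

```python
from datetime import date

# Hypothetical sample rows: (order date, buyer)
orders = [
    (date(2015, 1, 1), 1),
    (date(2015, 1, 2), 2),
    (date(2015, 2, 1), 1),
    (date(2015, 4, 1), 2),
    (date(2016, 3, 1), 1),
]

def first_purchase_flags(rows, window_days=365):
    """Return 1 for a buyer's first purchase in more than window_days, else 0."""
    flags = []
    for order_date, buyer in rows:
        # Step 1 (Last Purchase): latest earlier order by the same buyer, if any
        earlier = [d for d, b in rows if b == buyer and d < order_date]
        last_purchase = max(earlier) if earlier else None
        # Step 2 (Days Since Last Purchase): None plays the role of BLANK
        days_since = (order_date - last_purchase).days if last_purchase else None
        # Step 3 (flag): no earlier purchase, or a gap longer than the window
        flags.append(1 if days_since is None or days_since > window_days else 0)
    return flags

print(first_purchase_flags(orders))  # [1, 1, 0, 0, 1]
```

As in the DAX version, a gap of exactly 365 days is not flagged, because the comparison is strictly greater than 365.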
I am working on a workforce analysis project, and I did some CASE WHEN conditional calculations in Google Data Studio. However, after successfully creating the new fields, I couldn't do further calculations based on the fields I created.
Based on my raw data, I generated start_headcount, new_hires, terminated, and end_headcount by applying CASE WHEN conditional calculations. However, I failed at the next step: calculating the turnover rate and the retention rate.
The formula for Turnover rate is
terms/((start_headcount+end_headcount)/2)
for retention is
end_headcount/start_headcount
However, the result is wrong. Part of my table is as below:
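Supervisor | sheadcount | newhire | terms | eheadcount | turnover | Retention
------- | ------- | ------- | ------- | ------- | ------- | -------
A | 1 | 3 | 1 | 3 | 200% | 0%
B | 6 | 2 | 2 | 6 | 200% | 500%
C | 6 | 1 | 3 | 4 | 600% | 300%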
So the result is wrong. The turnover rate for A should be 1/((1+3)/2)=50%; For B should be 2/((6+6)/2)=33.33%.
I don't know why it is going wrong. Can anyone help?
For example, I wrote below for start_headcount for each employee
CASE
WHEN Last Hire Date<'2018-01-01' AND Termination Date>= '2018-01-01'
OR Last Hire Date<'2018-01-01' AND Termination Date IS NULL
THEN 1
ELSE 0
END
This means that an employee who meets the above condition gets a 1. The employees are then grouped under a supervisor. I think this might be why the summed turnover rate is wrong: it is calculated on each record and then summed up, rather than calculated on the grouped data.
Most likely you are trying to do both steps within the same query, so newly created fields like start_headcount are not yet visible within the same SELECT statement. Instead, put the first calculation in a subquery, as in the example below:
#standardSQL
SELECT *, terms/((start_headcount+end_headcount)/2) AS turnover
FROM (
<query for your first step>
)
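To illustrate the order of operations the subquery enforces (sum the per-employee flags first, then apply the rate formulas to the sums), here is a small Python sketch. The employee-level rows are hypothetical, chosen so that supervisor A adds up to the totals in the question (start 1, terms 1, end 3); the function name rates_by_supervisor is also just for illustration.

```python
from collections import defaultdict

# Hypothetical per-employee flags: (supervisor, start_headcount, terms, end_headcount)
rows = [
    ("A", 1, 0, 1),
    ("A", 0, 0, 1),
    ("A", 0, 0, 1),
    ("A", 0, 1, 0),
]

def rates_by_supervisor(rows):
    """Aggregate the flags per supervisor, then compute the rates on the totals."""
    totals = defaultdict(lambda: [0, 0, 0])
    for supervisor, start, terms, end in rows:
        totals[supervisor][0] += start
        totals[supervisor][1] += terms
        totals[supervisor][2] += end

    result = {}
    for supervisor, (start, terms, end) in totals.items():
        average_headcount = (start + end) / 2
        # Guard against empty groups so the division cannot fail
        turnover = terms / average_headcount if average_headcount else None
        retention = end / start if start else None
        result[supervisor] = (turnover, retention)
    return result

print(rates_by_supervisor(rows))  # {'A': (0.5, 3.0)}
```

The turnover of 0.5 matches the 50% the question expects for supervisor A; the retention value simply follows the end_headcount/start_headcount formula as given.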
I have a dashboard in Tableau which shows different payments received - the amount, the date the payment was received, and a calculated field which shows the number days since the payment was received.
However, a lot of payments are the same, with the same amount, and received on the same day; so Tableau collapses these together, and adds the total days since the payments were received together in the final column, i.e. five lots of £5.50, each received on 1st January shows as below (as of 01/02/2018)
Column 1 Column 2 Column 3
£5.50 01/01/2018 155
But I need separate rows for each. Does anyone know how to stop tableau doing this, or of a workaround?
Many thanks.
You could try using RANK_UNIQUE function.
First of all, in the Analysis Menu, uncheck Aggregate Measures.
Then, starting from this data:
You can get this result:
Additionally, you may want to hide Rank from the rows by not showing its header.
Is this something close to what you're looking for?
EDIT/UPDATE
In order to get all values and not just for the top rows, just move the Rank at the very beginning of the shelf:
My dataset provides a monthly snapshot of customer accounts. Below is a very simplified version:
Date_ID | Acc_ID
------- | -------
20160430| 1
20160430| 2
20160430| 3
20160531| 1
20160531| 2
20160531| 3
20160531| 4
20160531| 5
20160531| 6
20160531| 7
20160630| 4
20160630| 5
20160630| 6
20160630| 7
20160630| 8
Customers can open or close their accounts, and I want to calculate the number of 'new' customers every month. The number of 'exited' customers will also be helpful if this is possible.
So in the above example, I should get the following result:
Month | New Customers
------- | -------
20160430| 3
20160531| 4
20160630| 1
Basically I want to compare distinct account numbers in the selected and previous month, any that exist in the selected month and not previous are new members, any that were there last month and not in the selected are exited.
I've searched but I can't seem to find any similar problems, and I hardly know where to start myself - I've tried using CALCULATE and FILTER along with DATEADD to filter the data to get two months, and then count the unique values. My PowerPivot skills aren't up to scratch to solve this on my own however!
Getting the new users is relatively straightforward - I'd add a calculated column (called customer_type, the name the measure below refers to) which counts rows for that user in earlier months; if none exist, they are a new user:
=IF(CALCULATE(COUNTROWS(data),
FILTER(data, [Acc_ID] = EARLIER([Acc_ID])
&& [Date_ID] < EARLIER([Date_ID]))) = BLANK(),
"new",
"existing")
Once this is in place you can simply write a measure for new_users:
=CALCULATE(COUNTROWS(data), data[customer_type] = "new")
Getting the cancelled users is a little harder because it means you have to be able to look backwards to the prior month - none of the time intelligence stuff in PowerPivot will work out of the box here as you don't have a true date column.
It's nearly always good practice to have a separate date table in your PowerPivot models and it is a good way to solve this problem - essentially the table should be 1 record per date with a unique key that can be used to create a relationship. Perhaps post back with a few more details.
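As a cross-check of the "new" logic, here is a rough Python sketch that applies the same rule as the calculated column (an account is new in the first snapshot where it appears) to the sample data from the question. The function name new_accounts_per_month is only illustrative.

```python
# (Date_ID, Acc_ID) rows from the simplified dataset in the question
snapshots = [
    (20160430, 1), (20160430, 2), (20160430, 3),
    (20160531, 1), (20160531, 2), (20160531, 3), (20160531, 4),
    (20160531, 5), (20160531, 6), (20160531, 7),
    (20160630, 4), (20160630, 5), (20160630, 6), (20160630, 7),
    (20160630, 8),
]

def new_accounts_per_month(rows):
    """Count accounts that have no row in any earlier snapshot."""
    seen = set()
    counts = {}
    for date_id in sorted({d for d, _ in rows}):
        accounts = {a for d, a in rows if d == date_id}
        counts[date_id] = len(accounts - seen)  # not seen in an earlier month
        seen |= accounts
    return counts

print(new_accounts_per_month(snapshots))
# {20160430: 3, 20160531: 4, 20160630: 1}
```

This reproduces the expected result table from the question (3, 4, 1).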
This is an alternative method to Jacob's which also works. It avoids creating a calculated column, but I actually find the calculated column useful to use as a flag against other measures.
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, LASTDATE('Dates'[Date])
)
) - CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATESBETWEEN(
'Dates'[Date], 0, FIRSTDATE('Dates'[Date]) - 1
)
)
It basically uses the dates table to make a distinct count of all Acc_ID from the beginning of time until the first day of the period of time selected, and subtracts that from the distinct count of all Acc_ID from the beginning of time until the last day of the period of time selected. This is essentially the number of new distinct Acc_ID, although you can't work out which Acc_ID's these are using this method.
I could then calculate 'exited accounts' by taking the previous month's total as 'existing accounts':
=CALCULATE(
DISTINCTCOUNT('Accounts'[Acc_ID]),
DATEADD('Dates'[Date], -1, MONTH)
)
Then adding the 'new accounts', and subtracting the 'total accounts':
=DISTINCTCOUNT('Accounts'[Acc_ID])
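The arithmetic (previous month's total, plus new accounts, minus current total) can be sketched in Python as follows. This is only an illustration using the sample data from the question, and it assumes 'new' means present this month but not in the previous month; the function name exited_accounts is hypothetical.

```python
# Distinct Acc_ID values per month, from the sample data in the question
april = {1, 2, 3}
may = {1, 2, 3, 4, 5, 6, 7}
june = {4, 5, 6, 7, 8}

def exited_accounts(previous, current):
    """previous total + new accounts - current total."""
    new = current - previous  # in the current month but not the previous one
    return len(previous) + len(new) - len(current)

print(exited_accounts(april, may))  # 0
print(exited_accounts(may, june))   # 3
```

The result equals the size of the set difference previous - current (accounts 1, 2 and 3 for June). Note that the identity only balances if 'new' is counted relative to the previous month; an account that closed and later reopened would not be counted as new by a measure that looks back to the beginning of time.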
The data I am working with is oil and gas production data. The production table uniquely identifies each well and contains a time series of production values. I want to be able to calculate a column that contains the month number occurrence of production for every well in the production table. This needs to be a calculation, so I can graph the production for various wells based on the production month, not the calendar month. (I want to compare well performance across wells over the life of wells.) Also note that there could be gaps in the production data so you can't depend on having twelve months of sequential production for each well.
I tried using the answer in this post (RankValues), but the calculation would never finish. I have over 4 million rows of production data.
In the table shown below, the values shown in ProdMonth is what I need to calculate based on their time occurrence shown in ProdDate. This needs to be performed as a row calculation for each unique WellId
Thanks.
WellID ProdDate ProdMonth
1 12/1/2011 1
1 1/1/2012 2
1 2/1/2012 3
1 3/1/2012 4
… … …
1 11/1/2012 12
2 3/1/2014 1
2 4/1/2014 2
2 5/1/2014 3
2 6/1/2014 4
2 7/1/2014 5
… … …
2 2/1/2015 12
I would create a new date table that has a row for each day (the granularity of your data). I would then add the ProdMonth column to that table. This will ensure you have dates for all days (even if there are gaps in the well reporting data). Then you can use a relationship between the well production data and the Date table on the ProdDate field. Then if you pull in the ProdMonth from the date table, you'll have a list of all of the ProdMonths (hint: you may need to select 'show values with no data' on the field right-click menu in the fields well). Then if you add WellID to the same visualization, you should be able to see which wells were active in which ProdMonth. If WellID is a number, you might need to use the 'do not summarize' feature on the WellID to get the result you desire.
I posted this question on PowerPivotPro, and Tom Allan provided the DAX formula I needed. The first step was to calculate a field that concatenated year and month (YearMonth). I then used the RANKX function as follows:
= RANKX ( FILTER ( Data, [WellID] = EARLIER ( [WellID] ) ), [YearMonth], , 1, DENSE )
That did the trick and performed fairly quickly on 12mm rows.
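For readers who want to see what the dense rank per well produces, here is a minimal Python sketch of the same idea: rank each well's distinct YearMonth values in ascending order, so gaps in production do not leave gaps in the month number. The sample rows and the function name production_month are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (WellID, YearMonth); well 1 has a gap (no 201202)
rows = [
    (1, 201112),
    (1, 201201),
    (1, 201203),
    (2, 201403),
    (2, 201404),
]

def production_month(rows):
    """Dense ascending rank of YearMonth within each well."""
    months_by_well = defaultdict(set)
    for well_id, year_month in rows:
        months_by_well[well_id].add(year_month)

    # Map each well's sorted distinct months to 1, 2, 3, ...
    rank = {
        well_id: {ym: i for i, ym in enumerate(sorted(months), start=1)}
        for well_id, months in months_by_well.items()
    }
    return [rank[well_id][year_month] for well_id, year_month in rows]

print(production_month(rows))  # [1, 2, 3, 1, 2]
```

Because the months are sorted once per well, this does roughly one sort per well rather than comparing every row with every other row.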
I've been trying for a while and I'm just about to give up. I need to prepare a report that displays Item Numbers, the line they were produced on, and their production date, among other things. So, as you would imagine, each row contains a line number, item number, production date, and information regarding the number of items planned and produced for that entry.
I need to group the rows by line first, that was simple enough, afterwards, I need to group them by week, that also worked like a charm, except the dates were not really in order after this. I would need to apply a sort but by day this time. This works well but it's the next step that causes the most trouble. I also need to group the runs of items produced. For example:
Day - Item
Day 1 - Item A
Day 2 - Item A
Day 3 - Item A
These would be grouped with a footer counting the number of items produced for those consecutive entries. However, sometimes production looks like this:
Day - Item
Day 1 - Item B
Day 2 - Item B
Day 3 - Item A
Day 3 - Item B
I don't see a way to have the items ordered in a particular way that they can be grouped since I'm already ordering/sorting them by date because the date order is messed up by the first group. If I'm to group items at that point I would get one group header/footer per row, meaning it's not working at all.
My client suggests I make it so that when Access "notices the item number changes it gives a total". While that's wonderful in words, it implies that the rows should be sorted by item number and date. He will produce item A for three days, then produce item B for 2 days but part of the problem is that sometimes he will produce A for two and a half days and start B on that third day (following A) so if it's ordered by date, it may put one row above the other since they are on the same day. To my knowledge there is no real way to have Access to just "know" which products are produced first so as to group them after the item number changes. Of course it can keep the order they were entered in but if I ever need them sorted, that order will be lost.
I'm not sure if this is at all possible with this kind of table structure. If not, can anyone suggest an alternative table structure? Or perhaps there's a way to have the first group by to not mess up the dates, which would allow me to remove the sort by date (although I'm not sure that it would work even if I could do that).
@Steve Kass
Day - Item
Day 1 - Item B
Day 2 - Item B
Day 3 - Item B
Day 3 - Item A
Day 3 - Item C
Day 4 - Item A
Day 5 - Item C
This is how it's laid out in his Excel sheet:
Day - Item
Day 1 - Item B
Day 2 - Item C
Day 3 - Item C
Day 3 - Item A
Day 4 - Item A
Day 4 - Item D
Day 5 - Item D
I've picked letters that represent the alphabetical order of the actual item numbers.
@Abe Miessler, Query so far:
SELECT Planned.Line,
Planned.[Production Date],
Items.[Item Number],
Items.[Bottles/Pallet],
Planned.PQ1,
Planned.AQ1,
Planned.PQ2,
Planned.AQ2,
Planned.PQ3,
Planned.AQ3
FROM Items
INNER JOIN Planned
ON Items.ID = Planned.ItemID;
@David-W-Fenton: Well, I'm being asked to have a production summary per run. A run would be described as consecutive production of the same product. Products are produced on one of two lines and there can be multiple entries per day. The report must be grouped first by line so that each group shows entries for that line. That was done with a simple grouping. Within each line grouping I'm required to separate entries by week. Now, within each week, the days are not appearing in order. If the days are not in order we will not see a run, simply because a run will most likely happen on consecutive days. One product will be produced for 3 days in a row, for example; if these days are mixed up with the other days of the week, there will not be a consecutive, identifiable run. To have the entries in each week be in the correct order (by day) I applied a sort. What I've noticed is that after applying this sort each entry is handled as a separate "group" but without a header/footer. This results in not being able to group by product number afterwards, since each entry is within its own "group".
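To make the idea of a run concrete (a total whenever the item changes), here is a rough Python sketch. It assumes the rows are already in true production order, which is exactly the information that a sort by date alone cannot supply when two items share a day; the rows, quantities and the function name run_totals are all hypothetical.

```python
from itertools import groupby

# Hypothetical rows already in production order: (day, item, quantity produced)
rows = [
    (1, "B", 10), (2, "B", 10), (3, "B", 5),
    (3, "A", 5), (3, "C", 2), (4, "A", 10), (5, "C", 10),
]

def run_totals(rows):
    """Total the quantity for each run of consecutive rows with the same item."""
    totals = []
    # groupby starts a new group every time the key (the item) changes
    for item, run in groupby(rows, key=lambda row: row[1]):
        run = list(run)
        first_day, last_day = run[0][0], run[-1][0]
        totals.append((item, first_day, last_day, sum(q for _, _, q in run)))
    return totals

for total in run_totals(rows):
    print(total)
# ('B', 1, 3, 25)
# ('A', 3, 3, 5)
# ('C', 3, 3, 2)
# ('A', 4, 4, 10)
# ('C', 5, 5, 10)
```

The sketch only works because each row carries its position in the sequence implicitly; in a table, that would mean storing something like an entry sequence number alongside the date.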
I think you're asking for something impossible. But just in case you aren't, please let us know what order you want if these are your rows:
Day - Item
Day 1 - Item B
Day 2 - Item B
Day 3 - Item A
Day 3 - Item B
Day 3 - Item C
Day 4 - Item A
Day 5 - Item C
You say in a comment that you started with this:
Group by=>line
Group by=>week
Group by=>product number
...but it didn't work "because after grouping by week, they're grouped by week but within the week they're no longer ordered." So you (correctly) added a sorting group, thus:
Group by=>line
Group by=>week
Sort by=>day
Group by=>product number
But you say:
"Now it's in order and you can see consecutive days with the same products but grouping results in each row being grouped separately."
Where are the controls displaying the data? In the detail or in the group/sort header? It makes all the difference in the world. To display all records, you use the DETAIL. To show summary data, you use the HEADER. It sounds to me like you're putting your controls in the header instead of the detail.
Can you take a screenshot of your report in design view and insert it into your question? Without it, I don't see how to get any further.