Rank in powerpivot - powerpivot

In PowerPivot, I'm having trouble ranking the rows of Table 1 by Sales within each Year. I want a result like this:
Year Store Sales **Rank**
2013 A 200 3
2013 B 250 2
2013 C 300 1
2014 A 350 2
2014 B 300 3
2014 C 400 1
Which rank function could I use to have this rank result?
Thanks in advance.

Tran,
Probably the smartest way to go is to use the 'X' functions (iterators such as RANKX). They can be a bit tricky and non-intuitive, yet they are extremely powerful.
First, create a simple measure to calculate the total sales:
TotalSales:=SUM(Stores[Sales])
Then, use this formula below to calculate the rank (per store per year):
Rank:=RANKX(ALL(Stores[Store]), [TotalSales])
That should do what you are looking for. Once those two measures are ready, create a new pivot table, drag Year and Store onto the Rows pane, and add the required values.
ALL(Stores[Store]) removes the Store filter that each pivot row applies while leaving the Year filter in place, so every store is ranked against all stores within the same year.
The result should match the Rank column shown in the question.
Hope this helps.
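To make the effect of ALL(Stores[Store]) concrete, here is a minimal Python sketch (not DAX) of the same ranking logic applied to the sample data from the question. The function name rank_within_year is made up for illustration, and the tie handling (tied stores share a rank, the next rank is skipped) is assumed to mirror RANKX's default behaviour.

```python
from collections import defaultdict

# Sample rows from the question: (year, store, sales).
rows = [
    (2013, "A", 200), (2013, "B", 250), (2013, "C", 300),
    (2014, "A", 350), (2014, "B", 300), (2014, "C", 400),
]

def rank_within_year(rows):
    """Rank each store by sales against all stores in the same year (1 = highest)."""
    sales_by_year = defaultdict(list)
    for year, _store, sales in rows:
        sales_by_year[year].append(sales)
    ranked = []
    for year, store, sales in rows:
        # Rank = 1 + number of stores in the same year with strictly higher sales.
        rank = 1 + sum(1 for other in sales_by_year[year] if other > sales)
        ranked.append((year, store, sales, rank))
    return ranked

for row in rank_within_year(rows):
    print(*row)
```

Grouping by year before comparing plays the role of the Year filter that stays in place, while comparing against every store in that year plays the role of ALL(Stores[Store]).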

Related

DB2 sum over dynamic batch of rows

I'm working on a project that involves building an automated tool for our pricing team to look at the effects of their pricing changes on demand. I'm writing in Python and using SQL to query our DB2 data sources.
The idea is to let the pricing team tell the tool which line they want to check and the week in which they made a price change (week_id, in the form 202219 in case it is needed). The program then calculates the number of completed weeks between runtime and the price change to determine how many weeks before and after the price change to return for comparison (this variable is called deltaWeek).
My thought right now is to use a CTE to calculate the total demand by week, then I want to reference that CTE to gather week_id batches the size of deltaWeek and SUM() the total quantity for each batch.
I have the CTE query working, and the output of the query is good, and assuming the week of the price change was week 12, deltaWeek = 6, and the quantity is all the same (which it isn't in reality, but it makes it easy) a condensed output looks like this (it excludes the week of the price change on purpose)
ROW_NUM WEEK_ID QUANTITY
1 201906 10
2 201907 10
3 201908 10
4 201909 10
5 201910 10
6 201911 10
7 201913 10
8 201914 10
9 201915 10
10 201916 10
11 201917 10
12 201918 10
Is there a way in DB2 to reference this CTE and return something that would look like this
BATCH QUANTITY
1 60
2 60
where BATCH 1 represents SUM(QUANTITY) for ROW_NUM 1-6 FROM WEEKLY_TOTALS_CTE and BATCH 2 is similar for ROW_NUM 7-12
More generally, because deltaWeek (and thus the number of weeks in any given batch) depends on when the tool is run, I need to total rows 1 to deltaWeek, then deltaWeek+1 to deltaWeek*2, and so on. I have working Python functions that build SQL templates from parameters, so I can pass deltaWeek into the query if I can figure out the logic to make this query work.
If this is a terrible idea to try to make work, I guess I can just run the query using pd.read_sql and then use iloc[] to do the batch aggregation, but I feel like it should be able to be done all in the query, maybe?
Thank you for any help/reference.
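One way the batching described above could be sketched is with integer division on ROW_NUM: (ROW_NUM - 1) / deltaWeek + 1 maps rows 1 to deltaWeek onto batch 1, the next deltaWeek rows onto batch 2, and so on. The example below runs that expression against an in-memory SQLite table purely so it is self-contained; the table name weekly_totals and the function batch_totals are hypothetical, and DB2 may need the expression repeated in GROUP BY instead of the alias.

```python
import sqlite3

def batch_totals(weekly_rows, delta_week):
    """Sum quantity over consecutive batches of delta_week rows, keyed by batch number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weekly_totals (row_num INTEGER, week_id INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO weekly_totals VALUES (?, ?, ?)", weekly_rows)
    query = """
        SELECT (row_num - 1) / ? + 1 AS batch,  -- integer division groups the rows
               SUM(quantity) AS quantity
        FROM weekly_totals
        GROUP BY batch
        ORDER BY batch
    """
    result = conn.execute(query, (delta_week,)).fetchall()
    conn.close()
    return result

# Sample data from the question: week 201912 (the price change) is excluded.
weeks = list(range(201906, 201912)) + list(range(201913, 201919))
weekly_rows = [(i, week, 10) for i, week in enumerate(weeks, start=1)]

print(batch_totals(weekly_rows, 6))
```

Because the batch number is derived only from ROW_NUM and deltaWeek, the same query template works for any batch size passed in at runtime.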

Calculation for month number in time series data

The data I am working with is oil and gas production data. The production table uniquely identifies each well and contains a time series of production values. I want to be able to calculate a column that contains the month number occurrence of production for every well in the production table. This needs to be a calculation, so I can graph the production for various wells based on the production month, not the calendar month. (I want to compare well performance across wells over the life of wells.) Also note that there could be gaps in the production data so you can't depend on having twelve months of sequential production for each well.
I tried using the answer in this post (RankValues), but the calculation would never finish. I have over 4 million rows of production data.
In the table shown below, the values in ProdMonth are what I need to calculate, based on the time order of the dates in ProdDate. This needs to be performed as a row calculation for each unique WellID.
Thanks.
WellID ProdDate ProdMonth
1 12/1/2011 1
1 1/1/2012 2
1 2/1/2012 3
1 3/1/2012 4
… … …
1 11/1/2012 12
2 3/1/2014 1
2 4/1/2014 2
2 5/1/2014 3
2 6/1/2014 4
2 7/1/2014 5
… … …
2 2/1/2015 12
I would create a new date table that has a row for each day (the granularity of your data). I would then add the ProdMonth column to that table. This ensures you have dates for all days, even if there are gaps in the well reporting data. Then you can create a relationship between the well production data and the date table on the ProdDate field. If you then pull in ProdMonth from the date table, you'll have a list of all of the ProdMonths (hint: you may need to select 'show values with no data' on the field's right-click menu in the fields well). If you add WellID to the same visualization, you should be able to see which wells were active in which ProdMonth. If WellID is a number, you might need to use the 'do not summarize' feature on WellID to get the result you desire.
I posted this question on PowerPivotPro, and Tom Allan provided the DAX formula I needed. The first step was to calculate a field that concatenates year and month (YearMonth). Then I used the RANKX function as follows:
= RANKX ( FILTER ( Data, [WellID] = EARLIER ( [WellID] ) ), [YearMonth], , 1, DENSE )
That did the trick and performed fairly quickly on 12 million rows.
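For readers less familiar with DAX, the formula amounts to a dense, ascending rank of YearMonth within each WellID. A rough Python sketch of that idea follows; the function name production_month_numbers and the sample dates are illustrative only, and this is not how the DAX engine evaluates the expression.

```python
from collections import defaultdict
from datetime import date

def production_month_numbers(rows):
    """Return (well_id, prod_date, prod_month) with a dense rank of year-month per well."""
    months_by_well = defaultdict(set)
    for well_id, prod_date in rows:
        months_by_well[well_id].add((prod_date.year, prod_date.month))
    # Sort each well's distinct months once, then number them 1, 2, 3, ...
    rank_by_well = {
        well_id: {ym: i for i, ym in enumerate(sorted(months), start=1)}
        for well_id, months in months_by_well.items()
    }
    return [
        (well_id, prod_date, rank_by_well[well_id][(prod_date.year, prod_date.month)])
        for well_id, prod_date in rows
    ]

# Hypothetical sample: well 2 has a gap (no production in April 2014).
sample = [
    (1, date(2011, 12, 1)), (1, date(2012, 1, 1)), (1, date(2012, 2, 1)),
    (2, date(2014, 3, 1)), (2, date(2014, 5, 1)),
]
for row in production_month_numbers(sample):
    print(*row)
```

With dense ranking, a gap in the data does not leave a gap in the numbering: the month after a missing month simply gets the next number.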

SSRS How to Compare Columns to First Column in Group

I'm trying to create what seems like it should be a pretty simple matrix report, and I'm hoping someone can help. I have a dataset that returns sales region, date, and sales amount. The requirement is to compare sales for the various time periods to the current date. I'm looking to get my matrix to look something like this:
Region CurrentSales Date2Sales CurrentVSDate2 Date3Sales CurrentVSDate3
1 1000 1500 -500 800 200
2 1200 1000 200 900 300
3 1500 1100 400 1400 100
I can get the difference from one column to the next, but I need all columns to reference the CurrentSales column. Any help would be greatly appreciated.
Currently my dataset pulls in a date, region, product, and sales amount. I then have three parameters: CurrentDate, PreviousMonth, and PreviousQuarter. The regions and products are my row groups and the dates are the column groups. Next I added a column inside the group with the following expression: =Sum(Fields!SalesAmount.Value)-Previous(Sum(Fields!SalesAmount.Value),"BookingDate"). I know this isn't correct because it compares each value to the previous date in the column group, and I need the comparison to be against the first date in the column group.
Example:
Using Expressions you can:
=iif(Sum(Fields!SalesAmount.Value)= Previous(Sum(Fields!Date2Sales.Value)),
=iif(Sum(Fields!EndBalance.Value)=0, Nothing, Sum(Fields!EndBalance.Value))
You can also use Switch.
The easiest way to get this result is probably in your query. Add a field to every row returned, for example CurrentSales, and use a correlated subquery to populate it with the right value for comparison. Then your comparison can be as simple as =Fields!Sales.Value - Fields!CurrentSales.Value or similar.
There are some ways to do this at the report level, but they are more of a pain: my current favorite of those is to use custom code embedded in the report. Another approach is to use Aggregates of aggregates.
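The idea of carrying the current-period value alongside every other period can be sketched outside SSRS as well. The Python below is only an illustration of the arithmetic, using the numbers from the desired matrix in the question; the function name compare_to_current and the period labels are assumptions, not SSRS features.

```python
from collections import defaultdict

def compare_to_current(rows, current_period):
    """Return {region: {period: current_sales - period_sales}} for each non-current period."""
    totals = defaultdict(lambda: defaultdict(int))
    for region, period, amount in rows:
        totals[region][period] += amount
    diffs = {}
    for region, by_period in totals.items():
        # Every period in the region is compared to the same current-period total.
        current = by_period.get(current_period, 0)
        diffs[region] = {
            period: current - amount
            for period, amount in by_period.items()
            if period != current_period
        }
    return diffs

# (region, period, sales amount), taken from the matrix in the question.
rows = [
    (1, "Current", 1000), (1, "Date2", 1500), (1, "Date3", 800),
    (2, "Current", 1200), (2, "Date2", 1000), (2, "Date3", 900),
    (3, "Current", 1500), (3, "Date2", 1100), (3, "Date3", 1400),
]
print(compare_to_current(rows, "Current"))
```

This is the same comparison the correlated subquery enables: once each row knows its region's current sales, the difference no longer depends on the neighbouring column.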

SQL YTD for previous years and this year

Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries, one for YTD previous year and one for this year YTD.
The only way I know how to do this is as 2 separate queries with WHERE clauses. I would prefer not to have to run the query twice.
I'd like one column called DatePeriod, populated with 2011 YTD and 2012 YTD. It would be even better if I could get 2011 YTD, 2012 YTD, 2011 Total, and 2012 Total, though I'm guessing that is 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up, I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel but the table has millions of rows so always best to reduce the size of my output.
I currently group by week, month, and year and UNION ALL the results so it's one output. But that means I have three queries accessing the same table, which is a large pain and very slow. I can live with that, but now I also need a YTD, so it's either one more query or, ideally, a way to add it to the yearly query:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
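The same conditional-sum idea can produce a yearly total and a YTD figure in a single pass over the table. The sketch below uses Python with an in-memory SQLite database so it runs on its own; the table name calls, the function yearly_and_ytd, and the sample rows (chosen to reproduce the totals in the question's desired output) are all assumptions, and the date functions would need translating for MS SQL.

```python
import sqlite3

def yearly_and_ytd(rows, as_of_month_day):
    """rows: (ISO date, call volume); as_of_month_day: 'MM-DD' cutoff applied to every year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (call_date TEXT, call_volume INTEGER)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    query = """
        SELECT strftime('%Y', call_date) AS yr,
               SUM(call_volume) AS total_calls,
               SUM(CASE WHEN strftime('%m-%d', call_date) <= ?
                        THEN call_volume ELSE 0 END) AS ytd_calls
        FROM calls
        GROUP BY yr
        ORDER BY yr
    """
    result = conn.execute(query, (as_of_month_day,)).fetchall()
    conn.close()
    return result

# Made-up daily rows that add up to the totals shown in the question.
sample = [
    ("2011-01-15", 5), ("2011-03-10", 7), ("2011-09-01", 28),
    ("2012-02-01", 15), ("2012-08-20", 30),
]
for yr, total, ytd in yearly_and_ytd(sample, "03-31"):
    print(f"{yr} Total | {total}")
    print(f"{yr} YTD | {ytd}")
```

Each year comes back as one row with two sums, so the table is scanned once; reshaping into the DatePeriod | Sum_Calls layout can then happen in the calling program.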

Is this SQL the most efficient way

We have a table that converts SAT scores into ACT scores for a given year. If the conversion changes in the future, we would add the new scores along with the year in which they change. We need to pass in a year and an SAT score and return the correct ACT score.
sample data with three rows would be
act sat year
28 1010 1998
29 1010 2012
30 1010 2015
If I pass in an SAT score of 1010 and a year of 2014, I should get back an ACT score of 29.
I wrote the following SQL statement that works.
select act,
RANK() OVER(ORDER BY year DESC)
from keessattbl
where sat = 1010 and INT(year) <= 2014
FETCH FIRST ROW ONLY
Is this the most efficient way to handle this?
Thanks in advance, Doug
Another option would be to use the following:
select k1.*
from keessattbl k1
where k1.sat = 1010
  and k1.year = (select max(k2.year)
                 from keessattbl k2
                 where k2.sat = k1.sat
                   and k2.year <= 2014)
You will need to check which one is more efficient. If year (and possibly sat) is indexed, then both are probably quite fast.
But you will need to look at the execution plan (or simply time the statements) to find out.
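As a quick sanity check that the two formulations return the same row, here is a small Python sketch against an in-memory SQLite copy of the sample data. SQLite's LIMIT 1 stands in for DB2's FETCH FIRST ROW ONLY, the function names are made up for illustration, and the sketch says nothing about how either query performs on DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keessattbl (act INTEGER, sat INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO keessattbl VALUES (?, ?, ?)",
    [(28, 1010, 1998), (29, 1010, 2012), (30, 1010, 2015)],
)

def act_by_order(conn, sat, year):
    """Latest conversion at or before `year`, using ORDER BY plus a one-row limit."""
    row = conn.execute(
        "SELECT act FROM keessattbl WHERE sat = ? AND year <= ? "
        "ORDER BY year DESC LIMIT 1",
        (sat, year),
    ).fetchone()
    return row[0] if row else None

def act_by_subquery(conn, sat, year):
    """Same lookup, using a correlated MAX(year) subquery."""
    row = conn.execute(
        "SELECT k1.act FROM keessattbl k1 WHERE k1.sat = ? AND k1.year = "
        "(SELECT MAX(k2.year) FROM keessattbl k2 "
        " WHERE k2.sat = k1.sat AND k2.year <= ?)",
        (sat, year),
    ).fetchone()
    return row[0] if row else None

print(act_by_order(conn, 1010, 2014), act_by_subquery(conn, 1010, 2014))
```

Both functions return None when no conversion exists at or before the requested year, which is a case worth deciding on explicitly in the real query.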
I would say "Sure." Is it not performing well?
Also, most DBMSs have some way to get the first row of a result set, so you aren't tied to DB2 unless you want to be.
If you are not sure whether it's the most efficient way to write the query, you can check by running an EXPLAIN on it. Write the query another way, run an EXPLAIN on that, and compare the costs. IBM provides the IBM Data Studio product for free; you can right-click on your SQL and select Visual Explain to get the results in the GUI.