MDX - Top X Sales People by Total Sales for Each Date - ssas

I'm trying to do this, but with MDX in my cube:
select
*
from
(
select
Date, SalesPerson, TotalSales, row_number() over(partition by Date order by TotalSales desc) as Num
from SalesFact as ms
) as x
where
Num <= 5
order by
Date, SalesPerson, Num desc
Let's say I have a cube with these dimensions:
Date (Year, Month, Date) - date is always 1st of month
SalesPerson
The fact table has three columns - Date, SalesPerson, TotalSales - ie, the amount that person sold in that month.
I want, for each month, to see the top 5 sales people, and each of their TotalSales. The top 5 sales people can be different from one month to the next.
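For reference, the SQL behaviour being asked for (a top N per date via row_number) can be tried out with Python's built-in sqlite3 module. This is only a sketch: the sample rows and the helper name `top_n_per_date` are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows standing in for SalesFact (names and amounts are invented).
rows = [
    ("2009-02-01", "Ann", 500), ("2009-02-01", "Bob", 300), ("2009-02-01", "Cy", 100),
    ("2009-03-01", "Ann", 50), ("2009-03-01", "Bob", 700), ("2009-03-01", "Cy", 400),
]

def top_n_per_date(rows, n):
    """Return (Date, SalesPerson, TotalSales, Num) for the n best sellers of each date."""
    con = sqlite3.connect(":memory:")
    con.execute("create table SalesFact (Date text, SalesPerson text, TotalSales int)")
    con.executemany("insert into SalesFact values (?, ?, ?)", rows)
    result = con.execute(
        """
        select Date, SalesPerson, TotalSales, Num
        from (
            select Date, SalesPerson, TotalSales,
                   row_number() over (partition by Date order by TotalSales desc) as Num
            from SalesFact
        ) as x
        where Num <= ?
        order by Date, Num
        """,
        (n,),
    ).fetchall()
    con.close()
    return result

top2 = top_n_per_date(rows, 2)
```

With this data the top two differ between the months (Ann and Bob in February, Bob and Cy in March), which is the behaviour the MDX query needs to reproduce.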
I am able to get the results for one month, using a query that looks like this:
select
[Measures].[TotalSales] on columns,
(
subset
(
order
(
[SalesPerson].children,
[Measures].[TotalSales],
bdesc
),
0,
5
)
) on rows
from
Hypercube
where
(
[Date].[Date].&[2009-03-01T00:00:00]
)
What I'm after is a query that puts Date and SalesPerson on rows, and TotalSales on columns.
I want to see over time each month, and for each month, the top 5 sales people, and how much they sold.
When I try to do it this way, it doesn't seem to filter / group the sales people by each date (get top 5 for each date). The values returned are all over the place and include very low and null values. Notably, the SalesPerson list is the same for each date, even though TotalSales varies a lot.
select
[Measures].[TotalSales] on columns,
(
[Date].[Hierarchy].[Date].members,
subset
(
order
(
[SalesPerson].children,
[Measures].[TotalSales],
bdesc
),
0,
5
)
) on rows
from
Hypercube
It seems that everything inside "subset" needs to be filtered by the current [Date].[Hierarchy].[Date], but using CurrentMember gives a crossjoin / axis error:
select
[Measures].[TotalSales] on columns,
(
[Date].[Hierarchy].[Date].members,
subset
(
order
(
([SalesPerson].children, [Date].[Hierarchy].CurrentMember),
[Measures].[TotalSales],
bdesc
),
0,
5
)
) on rows
from
Hypercube
Error: Executing the query ... Query (3, 2) The Hierarchy hierarchy
is used more than once in the Crossjoin function.
Execution complete
I've tried several variations of the last query with no luck.
Hopefully the answers will be helpful to others new to MDX as well.

I eventually found out how to do what I was looking for. The solution revolves around the Generate function: I started from the basic example on MSDN and replaced its dimensions and measure with the ones in my cube, which got me going in the right direction.
From http://msdn.microsoft.com/en-us/library/ms145526.aspx
Is there a better way?
Also, be wary of refactoring sets into the with block. Doing so seems to change when the set is evaluated (its scope), and that changes the results.
with
set
Dates as
{
[Date].[Hierarchy].[Date].&[2009-02-01T00:00:00],
[Date].[Hierarchy].[Date].&[2009-03-01T00:00:00],
[Date].[Hierarchy].[Date].&[2009-04-01T00:00:00]
}
select
Measures.[TotalSales]
on columns,
generate
(
Dates,
topcount
(
[Date].Hierarchy.CurrentMember
*
[SalesPerson].Children,
5,
Measures.[TotalSales]
)
)
on rows
from
Hypercube
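A rough way to picture the difference between the failing query and the Generate version, written as plain Python rather than MDX. This is a sketch under assumptions: the cell values are invented, and `static_top` is only my reading of what happens when the ordered set is evaluated once outside the date context.

```python
from collections import defaultdict

# Hypothetical cell values: (date, sales_person) -> TotalSales.
sales = {
    ("2009-02", "Ann"): 500, ("2009-02", "Bob"): 300, ("2009-02", "Cy"): 100,
    ("2009-03", "Ann"): 50, ("2009-03", "Bob"): 700, ("2009-03", "Cy"): 400,
}
dates = ["2009-02", "2009-03"]
people = ["Ann", "Bob", "Cy"]

def static_top(n):
    """Rank people once over all dates, then repeat that same list for every date."""
    totals = defaultdict(int)
    for (_, person), value in sales.items():
        totals[person] += value
    best = sorted(people, key=lambda p: totals[p], reverse=True)[:n]
    return [(d, p) for d in dates for p in best]

def generate_top(n):
    """Re-rank people inside each date, in the spirit of Generate(Dates, TopCount(...))."""
    rows = []
    for d in dates:
        best = sorted(people, key=lambda p: sales.get((d, p), 0), reverse=True)[:n]
        rows.extend((d, p) for p in best)
    return rows
```

`static_top(2)` returns the same two people for both months, matching the symptom described in the question, while `generate_top(2)` picks the top two separately for each month.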

Related

SQL: How to create supplemental time-series records "out of thin air" from existing records

Suppose I have a table CUSTEVENTS listing customers active in certain months. I now want to consider a customer as active in a month even if it was only active in one of the prior two months.
Simple example, the data might start as:
MONTH_ENDING  CUSTNUM
2022-10-31    72378
2022-11-30    72378
It should be transformed into the following, given the expanded definition of active:
MONTH_ENDING  CUSTNUM
2022-10-31    72378
2022-11-30    72378
**2022-12-31  72378**
**2023-01-31  72378**
I'm trying to arrive at the simplest / most elegant way to get there. I could certainly explode out the data using a time series reference table which would list all the pairs of MONTH_ENDING and "additional" MONTH_ENDING values that "count". Or perhaps I could UNION three subqueries that take MONTH_ENDING, add_months(MONTH_ENDING,1) and add_months(MONTH_ENDING,2). But maybe there's something even more concise that doesn't involve multiple unioned queries or an instrumental time-mapping table.
I happen to be using Teradata but I'm not sure I care about platform-specificity; if there's a Teradata-only approach that works, I'll gladly take it.
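To pin down the expected result independently of any SQL dialect, the "three unioned subqueries" idea can be sketched in Python. The helper names `add_months_month_end` and `expand_active` are made up here, and the helper always lands on the last day of the target month, which is an assumption about how MONTH_ENDING should behave.

```python
import calendar
from datetime import date

def add_months_month_end(d, n):
    """Return the last day of the month that lies n months after d's month."""
    index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(index, 12)
    month = month0 + 1
    return date(year, month, calendar.monthrange(year, month)[1])

def expand_active(events, carry=2):
    """events: iterable of (month_ending, custnum). Returns sorted distinct pairs,
    each event also counting for the following `carry` months."""
    expanded = {
        (add_months_month_end(month_ending, k), custnum)
        for month_ending, custnum in events
        for k in range(carry + 1)
    }
    return sorted(expanded)

events = [(date(2022, 10, 31), 72378), (date(2022, 11, 30), 72378)]
active = expand_active(events)
```

The set comprehension plays the role of UNION (duplicates collapse), so the two input rows become the four rows for October 2022 through January 2023.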
The general approach is to first calculate the "Last" event time for a given customer, which is handled by something like
LAG(EVENT_DT) OVER (PARTITION BY CUSTNUM ORDER BY EVENT_DT)
The next concept is islands. You want to calculate that an island begins if the event happened after {your window} has elapsed from the prior one. Vice versa to calculate the island's end.
You can actually find some great online articles about this classic problem: Gaps and Islands problem.
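To make the island idea concrete, here is a small Python sketch of the same rule: a new island starts whenever more than two month boundaries separate an event from the previous one. The function names and sample dates are invented for illustration.

```python
from datetime import date

def month_index(d):
    """Number months consecutively so that differences count month boundaries."""
    return d.year * 12 + d.month

def islands(event_dates, max_gap_months=2):
    """Group dates into (start, end) islands; a gap larger than max_gap_months
    between consecutive events starts a new island."""
    result = []
    for d in sorted(event_dates):
        if result and month_index(d) - month_index(result[-1][1]) <= max_gap_months:
            result[-1][1] = d          # still inside the window: extend the island
        else:
            result.append([d, d])      # first event, or gap too large: new island
    return [tuple(island) for island in result]

sample = [
    date(2022, 1, 31), date(2022, 2, 28), date(2022, 4, 30),
    date(2022, 9, 30), date(2022, 10, 31),
]
found = islands(sample)
```

Here January through April form one island (the February to April gap is exactly two months, so it does not break the island), and September to October form a second one.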
If you understand CTEs, you can probably follow it through this example code I wrote. The first CTE is there simply to let you add a condition (instead of 1=1) for the events you care about.
WITH CTE_CONDITION AS (
SELECT
EVENT_DT AS dtm,
CUSTNUM
FROM
My_First_Table
WHERE
1 = 1
AND EVENT_DT is not null
),
CTE_LAGGED AS (
SELECT
dtm,
CUSTNUM,
LAG(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS previous_datetime,
LEAD(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS next_datetime,
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
CTE_CONDITION.dtm
) AS island_location
FROM
CTE_CONDITION
),
CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_start_datetime,
island_location AS island_start_location
FROM
CTE_LAGGED
WHERE
(
DATEDIFF(MONTH, previous_datetime, dtm) > 2
OR CTE_LAGGED.previous_datetime IS NULL
)
),
CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_end_datetime,
island_location AS island_end_location
FROM
CTE_LAGGED
WHERE
DATEDIFF(MONTH, dtm, next_datetime) > 2
OR CTE_LAGGED.next_datetime IS NULL
)
SELECT
CTE_ISLAND_START.CUSTNUM,
CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime,
DATEDIFF(
MONTH, CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime
) AS ISLAND_DURATION_MONTH,
(
SELECT
COUNT(*)
FROM
CTE_LAGGED
WHERE
CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime
AND CTE_ISLAND_END.island_end_datetime
AND CTE_LAGGED.CUSTNUM = CTE_ISLAND_START.CUSTNUM
) AS island_row_count
FROM
CTE_ISLAND_START
INNER JOIN CTE_ISLAND_END ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number
AND CTE_ISLAND_START.CUSTNUM = CTE_ISLAND_END.CUSTNUM
I wrote this into a Rasgo template using Snowflake syntax, but only minor adjustments should be needed to get this to work in Teradata.
Once you have this result, it tells you the periods of activity that include the 2 month window. You can then take a calendar table with one row per month-begin and check, for each date, whether the customer was "active" based on whether that date falls into one of these active ranges.

Add missing months with values from previous month

I need to use this SQL query for a piece of software that expects the time in a particular format, hence the Time column. However, I also need the query to insert the months that are missing, using the value from the previous month. This is the query I currently have.
SELECT [accountnumber],SUM([postingamount]) AS Amount, [accountingdate],
convert(varchar(4),year(accountingdate))+'M'+ Format(DATEPART( MONTH, accountingdate) , '00')
AS [Time]
FROM [7 GL Detail MACL]
where [accountingdate]>='2019-01-01'
GROUP BY [accountingdate],[postingamount],[accountnumber]
Current Results
Expected Results
Since you didn't specify which RDBMS you're using, I can't guarantee that this logic will run as-is, because every system uses slightly different SQL syntax.
I used the Rasgo datespine function to generate this SQL, as the logic is quite complex to wrap your head around, and tested it on Snowflake.
The main differences between Snowflake and other systems are: DATEADD and TABLE (GENERATOR())
In case you can't modify this to work in your system, here are the basic steps which you'll want to follow:
1. Select unique accountnumbers
2. Select unique dates (month beginnings?). This is where Snowflake uses GENERATOR, but other systems might actually have a Calendar table you can select from
3. Cross Join (cartesian join) these to create every possible combination of accountnumber and date
4. Outer Join #3 to your data (you might have to truncate your date to month-begin)
5. Filter out rows that don't apply. For instance, you might have just inserted a row for 1/1/2019 for an account that didn't even begin until 12/12/2020.
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) as INTERVAL_ID,
DATEADD('MONTH', (INTERVAL_ID - 1), '2019-01-01'::timestamp_ntz) as SPINE_START,
DATEADD('MONTH', INTERVAL_ID, '2019-01-01'::timestamp_ntz) as SPINE_END
FROM TABLE (GENERATOR(ROWCOUNT => 42))
),
GROUPS AS (
SELECT
accountnumber,
MIN(DESIRED_INTERVAL) AS LOCAL_START,
MAX(DESIRED_INTERVAL) AS LOCAL_END
FROM [7 GL Detail MACL]
GROUP BY
accountnumber
),
GROUP_SPINE AS (
SELECT
accountnumber,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM GROUPS G
CROSS JOIN LATERAL (
SELECT
SPINE_START, SPINE_END
FROM GLOBAL_SPINE S
WHERE S.SPINE_START >= G.LOCAL_START
)
)
SELECT
G.accountnumber AS GROUP_BY_accountnumber,
GROUP_START,
GROUP_END,
T.*
FROM GROUP_SPINE G
LEFT JOIN {{ your_table }} T
ON DESIRED_INTERVAL >= G.GROUP_START
AND DESIRED_INTERVAL < G.GROUP_END
AND G.accountnumber = T.accountnumber;
You were also doing an aggregation step, but I figure once you get this complicated part down, you can figure out how to finally aggregate it the way you want it.
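If the SQL is hard to port, the numbered steps listed earlier, plus the "use the previous month's value" requirement from the question, can be sketched in plain Python. The function names and sample amounts are invented, and the input is assumed to be already aggregated to one amount per account and month.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def fill_missing_months(amounts, end):
    """amounts: {(account, (year, month)): amount}.
    Returns (account, time, amount) rows with gaps carried forward."""
    accounts = sorted({account for account, _ in amounts})          # step 1
    rows = []
    for account in accounts:
        # step 5: never emit months before the account's first real row
        first = min(ym for acct, ym in amounts if acct == account)
        last_value = None
        for ym in month_range(first, end):                          # steps 2 and 3
            # step 4: use the real row if present, else keep the previous value
            last_value = amounts.get((account, ym), last_value)
            rows.append((account, "%dM%02d" % ym, last_value))
    return rows

sample = {("1000", (2019, 1)): 100.0, ("1000", (2019, 3)): 250.0, ("2000", (2019, 2)): 40.0}
filled = fill_missing_months(sample, end=(2019, 3))
```

Account 1000 gets a 2019M02 row carrying January's 100.0, and account 2000 starts only in 2019M02 because it has no earlier data.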

Multiple sum subqueries for percentage

I need help with the following problem: I want to write a query that computes multiple sums and then uses those sums to get a percentage: percentage = s1/(s1+s2).
I have as input the following data:
Orders shipping date, Nb of orders that have arrived late, Nb of orders that have arrived on time
What I want as output: The percentage of orders that have arrived late and orders that have arrived on time.
I want another column in the table that will have the percentage using SQL.
Concrete example:
*On 2022/01/04 **10:00 AM** I have 3 orders late and 4 orders on time => 7 orders in total. Percentage = 3/7 late, 4/7 on time.
*At 2022/01/04 **11:00 AM** I have 5 orders late and 6 orders on time => 11 orders in total, but this entry is summed with the previous entry, so: 5+3 orders late, 4+6 orders on time, 18 orders in total => percentage = 8/18 late, 10/18 on time.
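The arithmetic in the example above can be checked with a few lines of Python. This is just a sketch of the running totals, with the two entries hard-coded and the function name `running_shares` made up.

```python
from fractions import Fraction

# (time, late orders, on-time orders) for each entry, taken from the example above.
entries = [("10:00", 3, 4), ("11:00", 5, 6)]

def running_shares(entries):
    """Return (time, late share, on-time share) using cumulative totals."""
    late_total = 0
    on_time_total = 0
    shares = []
    for stamp, late, on_time in entries:
        late_total += late
        on_time_total += on_time
        total = late_total + on_time_total
        shares.append((stamp, Fraction(late_total, total), Fraction(on_time_total, total)))
    return shares

shares = running_shares(entries)
```

Fraction reduces automatically, so the 11:00 shares are stored as 4/9 and 5/9, which equal 8/18 and 10/18.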
To sum the order counts of all previous entries with status "LATE" together with the current one, I wrote the following SQL:
(sum1=s1)
SELECT s1.EventDate, (
SELECT SUM(s2.NbOfOrders)
FROM OrderShipmentStats s2
WHERE s2.EventDate <= s1.EventDate AND s2.Status='LATE'
) AS cnt
FROM OrderShipmentStats s1
GROUP BY s1.EventDate, s1.Status
The same kind of SQL was written for "On Time" and it works. What I need to do now is combine the values from the two queries and, depending on whether the status is late or on time, compute s1/(s1+s2) or s2/(s1+s2).
My problem is that I do not know how to do this formula in a single query using those 2 subqueries, any help would be great.
Picture with Table
Above there is the link with the picture containing how the table looks(I am new so I am not allowed to embed a photo).
The percentage column is the one I will add and there are lines pointing towards how that is calculated.
I created the table based on your image and added a few rows to it.
The query shows the total order count per hour, the running total per status, and the running grand total, as in your image.
The table setup and the query look like this:
create table OrderShipmentsStats
(
EventDate datetime not null,
Status varchar(10) not null,
OrdersCount int not null
)
insert into OrderShipmentsStats
values
('2022-01-04T10:00:00','Late',3),
('2022-01-04T10:00:00','On Time',4),
('2022-01-04T11:00:00','Late',5),
('2022-01-04T11:00:00','On Time',6),
('2022-01-04T12:00:00','Late',1),
('2022-01-04T12:00:00','On Time',2)
SELECT
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
GrandStatusTotal,
-- on the line below, multiplying by 1.0 forces decimal division, so we get a fraction such as 0.45 or 0.123
-- but we want the actual percentage, like 15% or 50%; to obtain it, just multiply by 100
cast(1.0 * o.StatusTotal / o.GrandStatusTotal as decimal(5,3)) * 100 as Percentage
from
(
select
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
SUM(TotalPerHour) over (partition by Status order by EventDate asc) as GrandStatusTotal
from
(
select
EventDate,
Status,
OrdersCount,
Sum(OrdersCount) over (partition by EventDate order by EventDate asc) as TotalPerHour,
SUM(OrdersCount) over (partition by Status order by EventDate asc) as StatusTotal
from OrderShipmentsStats
) as t
) as o
order by EventDate, Status
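As a cross-check, the same running percentages can be reproduced with Python's sqlite3 module. Note that this sketch uses a shorter, single-level variant of the query (two window sums instead of nested subqueries), which is my own simplification and not the query above; it assumes SQLite 3.25+ for window functions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table OrderShipmentsStats (EventDate text, Status text, OrdersCount int)")
con.executemany(
    "insert into OrderShipmentsStats values (?, ?, ?)",
    [
        ("2022-01-04T10:00:00", "Late", 3), ("2022-01-04T10:00:00", "On Time", 4),
        ("2022-01-04T11:00:00", "Late", 5), ("2022-01-04T11:00:00", "On Time", 6),
        ("2022-01-04T12:00:00", "Late", 1), ("2022-01-04T12:00:00", "On Time", 2),
    ],
)

# With ORDER BY and no explicit frame, rows sharing the same EventDate are peers,
# so the second sum accumulates both statuses up to and including the current hour.
query = """
select EventDate, Status,
       sum(OrdersCount) over (partition by Status order by EventDate) as StatusTotal,
       sum(OrdersCount) over (order by EventDate) as GrandTotal
from OrderShipmentsStats
order by EventDate, Status
"""
result = [
    (event_date, status, status_total, grand_total, round(100.0 * status_total / grand_total, 1))
    for event_date, status, status_total, grand_total in con.execute(query)
]
con.close()
```

For the 11:00 rows this yields 8 of 18 late and 10 of 18 on time, matching the example in the question.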

Get sum of previous 6 values including the group

I need to sum up the values for the last 7 days, so it should be the current day plus the previous 6. This should happen for each row, i.e. in each row the column value would be current + previous 6.
The case:
(Note: I will calculate the hours by summing up the seconds.)
I tried using the query below:
select SUM([drivingTime]) OVER(PARTITION BY driverid ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from [f.DriverHseCan]
The problem I face is that I have to group by driver and asset for a date.
In the above case, the driving time should be summed up first, and then its previous 6 rows should be taken.
I can't do this using rank() because I need these rows as well, since I have to show them in the report.
I tried doing this in both SSRS and SQL.
In short, it is the total driving time for the current day plus the 6 previous days.
Try the following query
SELECT
s.date
, s.driverid
, s.assetid
, s.drivingtime
, SUM(s2.drivingtime) AS total_drivingtime
FROM f.DriverHseCan s
JOIN (
SELECT date,driverid, SUM(drivingtime) drivingtime
FROM f.DriverHseCan
GROUP BY date,driverid
) AS s2
ON s.driverid = s2.driverid AND s2.date BETWEEN DATEADD(d,-6,s.date) AND s.date
GROUP BY
s.date
, s.driverid
, s.assetid
, s.drivingtime
If you have week start/end dates, there could be better performing alternatives to solve your problem, e.g. use the week number in SSRS expressions rather than do the self join on SQL server
I think aggregation does what you want:
select sum(sum([drivingTime])) over (partition by driverid
order by date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
from [f.DriverHseCan]
group by driverid, date
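The aggregate-then-window idea can be tried out with Python's sqlite3 module. This sketch uses a subquery for the daily totals instead of nesting sum(sum(...)), and the table contents and the function name `rolling_totals` are invented; it assumes SQLite 3.25+.

```python
import sqlite3

# Hypothetical data: one driver, eight days, 10 seconds per day on asset A,
# plus 5 extra seconds on asset B on the first day.
days = ["2024-01-%02d" % d for d in range(1, 9)]
rows = [(d, 1, "A", 10) for d in days] + [("2024-01-01", 1, "B", 5)]

def rolling_totals(rows, preceding=6):
    """Sum drivingTime per driver and date first, then add up the current row
    and the `preceding` earlier daily rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table DriverHseCan (date text, driverid int, assetid text, drivingTime int)")
    con.executemany("insert into DriverHseCan values (?, ?, ?, ?)", rows)
    query = """
        select driverid, date,
               sum(day_total) over (partition by driverid order by date
                                    rows between %d preceding and current row) as rolling
        from (select driverid, date, sum(drivingTime) as day_total
              from DriverHseCan
              group by driverid, date) as d
        order by driverid, date
    """ % int(preceding)
    result = con.execute(query).fetchall()
    con.close()
    return result

rolling = rolling_totals(rows)
```

One caveat: ROWS counts rows, not calendar days, so if a driver has no row for some day the window reaches back more than 7 days. In that case a date-range join such as the one in the earlier answer is the safer choice.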
I guess you need to use CROSS APPLY.
Something like the following?
SELECT driverID,
date,
CA.Last6DayDrivingTime
FROM YourTable YT
CROSS APPLY
(
-- sum the driving time of the same driver over the current day and the 6 days before it
SELECT SUM(YT2.drivingTime) AS Last6DayDrivingTime
FROM YourTable YT2
WHERE YT2.driverID = YT.driverID
AND YT2.date BETWEEN DATEADD(DAY,-6,YT.date) AND YT.date
) CA
Edit:
As you commented that CROSS APPLY slows down the query, another option is to pre-calculate the weekly values in a temp table or a CTE and then use them in your main query.

Why would the query show data from the wrong month?

I have a query:
;with date_cte as(
SELECT r.starburst_dept_name,r.monthly_past_date as PrevDate,x.monthly_past_date as CurrDate,r.starburst_dept_average - x.starburst_dept_average as Average
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
) r
JOIN
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
Where month(monthly_past_date) > month(DATEADD(m,-2,monthly_past_date))
) x
ON r.starburst_dept_name = x.starburst_dept_name AND r.rowid = x.rowid+1
Where r.starburst_dept_name is NOT NULL
)
Select *
From date_cte
Order by Average DESC
While doing some testing, I altered some column data to see why the query gives me certain information. I don't know why, when I run the query, it returns a row with a date from January (row 4) that should not be there, as in the picture below:
The database has more rows with the exact same date '2014-01-25 00:00:00.000', so I'm not sure why it would pick only that row and compare the average.
Before running the query, I did alter the column in that row and change the date, but I'm not sure whether that has anything to do with it.
UPDATE:
I have added the SQL Fiddle.
What I would like to get is the average from last month minus the average from two months ago.
It was actually working until I altered the data.
I made the changes to test a certain situation, which led me to learn that the query has flaws.
Based on your SQL Fiddle, the following query keeps rows from earlier than two months ago from showing up in the join.
SELECT
thismonth.starburst_dept_name
,lastmonth.monthtly_past_date [PrevDate]
,thismonth.monthtly_past_date [CurrDate]
,thismonth.starburst_dept_average - lastmonth.starburst_dept_average as Average
FROM dbo.cse_reports thismonth
inner join dbo.cse_reports lastmonth on
thismonth.starburst_dept_name = lastmonth.starburst_dept_name
AND month(DATEADD(MONTH,-1,thismonth.monthtly_past_date))=month(lastmonth.monthtly_past_date)
WHERE MONTH(thismonth.monthtly_past_date)=month(DATEADD(MONTH,-1,GETDATE()))
Order by thismonth.starburst_dept_average - lastmonth.starburst_dept_average DESC
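One way to see why the filter in the original query does not restrict rows to recent months is to evaluate its predicate, month(d) > month(DATEADD(m,-2,d)), for every month. The Python below is only a sketch of that predicate; the helper names are made up and this is my reading of the problem, not something confirmed in the thread.

```python
from datetime import date

def add_months(d, n):
    """Return the first day of the month n months away from d (only the month matters here)."""
    index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, 1)

def passes_filter(d):
    """Python version of: month(d) > month(DATEADD(m, -2, d))."""
    return d.month > add_months(d, -2).month

kept_months = [m for m in range(1, 13) if passes_filter(date(2014, m, 1))]
```

The predicate compares each row only with itself, so it is true for every date from March through December and false for January and February, regardless of today's date. Because the row numbers in the filtered subquery are assigned after that filter, they may no longer line up with the unfiltered side of the join, which could explain the unexpected pairing. The query in the answer avoids this by anchoring the comparison to GETDATE().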