20-year SQL pro, new to MDX.
Trying to create a measure to get sales for products 30, 60, 90 days etc. after launch, but I want to exclude incomplete time periods. Here is what it would be in SQL:
select p.ProductName, sum(s.sales) as [60DaySales]
from dimProduct p
join factSales s on p.productkey = s.productkey
join dimCalendar c on s.orderDateKey = c.CalendarKey
where datediff(dd, p.LaunchDate, c.[Date]) between 31 and 62
and exists (select 1 from factSales s2
            join dimCalendar c2 on s2.orderDateKey = c2.CalendarKey
            where s2.productkey = p.productkey
              and datediff(dd, p.LaunchDate, c2.[Date]) > 62)
group by p.ProductName
Basically I only want to show '60DaySales' for products that also have sales beyond 62 days.
I have this MDX which gets me the time period:
sum(
    filter(
        [Sales].[Days Since Launch].members,
        [Sales].[Days Since Launch].membervalue > 30 AND
        [Sales].[Days Since Launch].membervalue < 63
    ),
    [Measures].[SalesBase]
)
but I'm not sure how to exclude items with no sales beyond 62 days. I've tried some combinations of iif(exists.. ) and nonempty but no luck...
I'd add extra columns rather than calculated measures. I had a similar task (sales for 30, 60, 90 days, but measured from the customer's first sale date). The best way is to add columns to your sales measure (fact) table:
sales30 = iif(dateadd(day,30,p.LaunchDate) >= c.Date, sales, null),
sales60 = iif(dateadd(day,60,p.LaunchDate) >= c.Date, sales, null),
sales90 = iif(dateadd(day,90,p.LaunchDate) >= c.Date, sales, null)
Tasks like "sales for 30 days per product" are doable via MDX, but they are performance killers on big dimensions. SQL Server does this better thanks to its set-based, parallel engine, while MDX isn't good at heavy calculations like these, so I am not even providing the MDX code.
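To make that concrete, here is a minimal sketch of the ETL query that could populate those columns, reusing the table and column names from the question's first query; the join conditions are assumptions based on the schema shown there:

-- Hedged sketch: per-window sales columns computed at load time.
-- Table/column names come from the question; adjust to your schema.
SELECT s.productkey,
       s.orderDateKey,
       s.sales,
       IIF(c.[Date] <= DATEADD(day, 30, p.LaunchDate), s.sales, NULL) AS sales30,
       IIF(c.[Date] <= DATEADD(day, 60, p.LaunchDate), s.sales, NULL) AS sales60,
       IIF(c.[Date] <= DATEADD(day, 90, p.LaunchDate), s.sales, NULL) AS sales90
FROM factSales s
JOIN dimProduct  p ON p.productkey  = s.productkey
JOIN dimCalendar c ON c.CalendarKey = s.orderDateKey;

Each salesNN column then becomes a plain SUM measure in the cube, and the "has sales beyond 62 days" check can stay on the SQL side as well.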
I'm attempting to determine the YoY growth by month, 2017 to 2018, for number of Company bookings per property.
I've tried casting and windowed functions but am not obtaining the correct result.
Example Table 1: Bookings
BookID Amnt BookType InDate OutDate PropertyID Name Status
-----------------------------------------------------------------
789555 $1000 Company 1/1/2018 3/1/2018 22111 Wendy Active
478141 $1250 Owner 1/1/2017 2/1/2017 35825 John Cancelled
There are only two book types (Company, Owner) and two booking statuses (Active, Cancelled).
Example Table 2: Properties
Property ID State Property Start Date Property End Date
---------------------------------------------------------------------
33111 New York 2/3/2017
35825 Michigan 7/21/2016
The Property End Date is blank when the company still owns it.
Example Table 3: Months
Start of Month End of Month
-------------------------------------------
1/1/2018 1/31/2018
The previous developer created this table which includes a row for each month from 2015-2020.
I've tried many iterations of my current code and can't even come close.
Desired Outcome
I need to find the YoY growth by month, 2017 to 2018, for number of Company bookings per property. The stakeholder has requested the output to have the below columns:
Month Name Bookings_Per_Property_2017 Bookings_Per_Property_2018 YoY
-----------------------------------------------------------------------
The number of Company bookings per property in a month should be calculated by counting the total number of active Company bookings made in a month divided by the total number of properties active in the month.
Here is a solution that should be close to what you need. It works by:
• LEFT JOINing the three tables; the important part is to properly check the overlaps between the date ranges in months (StartOfMonth, EndOfMonth), bookings (InDate, OutDate), and properties (PropertyStartDate, PropertyEndDate); you can have a look at this reference post for a general discussion of how to do that efficiently
• aggregating by month and using conditional COUNT(DISTINCT ...) to count the number of properties and bookings in each month and year; the logic implicitly relies on the fact that this aggregate function ignores NULL values. Since we are using LEFT JOINs, we also need to handle the possibility of a 0 denominator.
Notes:
• you did not provide expected results, so this cannot be tested
• you did not explain how to compute the YoY column, so I left it out; I assume you can easily derive it from the other columns (a hedged sketch follows the query)
Query:
SELECT
MONTH(m.StartOfMonth) AS [Month],
    1.0 * COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2017 THEN b.BookID END)
        / NULLIF(COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2017 THEN p.PropertyID END), 0)
        AS Bookings_Per_Property_2017,   -- 1.0 * forces decimal division (both counts are integers)
    1.0 * COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2018 THEN b.BookID END)
        / NULLIF(COUNT(DISTINCT CASE WHEN YEAR(m.StartOfMonth) = 2018 THEN p.PropertyID END), 0)
        AS Bookings_Per_Property_2018
FROM months m
LEFT JOIN bookings b
ON m.StartOfMonth <= b.OutDate
AND m.EndOfMonth >= b.InDate
AND b.status = 'Active'
AND b.BookType = 'Company'
LEFT JOIN properties p
    ON m.StartOfMonth <= COALESCE(p.PropertyEndDate, m.StartOfMonth)
AND m.EndOfMonth >= p.PropertyStartDate
GROUP BY MONTH(m.StartOfMonth)
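Since the question did not define the YoY formula, here is only a hedged sketch of how it could be derived from the two columns, wrapping the query above in a CTE; the percentage-growth formula and the name per_month are assumptions:

-- Hedged sketch: derive YoY as (2018 - 2017) / 2017; adjust to the stakeholder's definition.
WITH per_month AS (
    -- stand-in row; in practice, replace this CTE body with the full query above
    SELECT 1 AS [Month],
           0.50 AS Bookings_Per_Property_2017,
           0.65 AS Bookings_Per_Property_2018
)
SELECT [Month],
       Bookings_Per_Property_2017,
       Bookings_Per_Property_2018,
       (Bookings_Per_Property_2018 - Bookings_Per_Property_2017)
           / NULLIF(Bookings_Per_Property_2017, 0) AS YoY
FROM per_month;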
I am trying to work out how many field engineers work more than 48 hours per week on average over a 17-week period (by law you cannot average more than 48 hours per week over any 17-week period).
I managed to run the query for one engineer, but when I run it without the engineer filter it is very slow.
I need to get the count of engineers working over 48 hours and the count under 48 hours, and then get the average time worked per week.
Note: I am doing a UNION on SPICEMEISTER & SMARTMEISTER because they are our old and new databases.
• How many field engineers go over the 48 hours
• How many field engineers are under the 48 hours
• What is the average time worked per week for engineers
SELECT DS_Date,
       TechPersNo
FROM (
    SELECT DS_Date,
           TechPersNo
    FROM (
        SELECT DISTINCT
               SMDS.EPL_DAT AS DS_Date,
               EN.pers_no   AS TechPersNo
        FROM [SpiceMeister].[FS_OTBE].[EngPayrollNumbers] EN
        INNER JOIN [SmartMeister].[Main].[PlusDailyKopf] SMDS
            ON RIGHT(CAST(EN.[TechnicianID] AS CHAR(10)), 5) = SMDS.PRPO_TECHNIKERNR
        WHERE SMDS.EPL_DAT >= '2017-01-01'
          AND SMDS.EPL_DAT < '2018-03-01'

        UNION ALL

        SELECT DISTINCT
               SPDS.DailySummaryDate AS DS_Date,
               EN.pers_no            AS TechPersNo
        FROM [SpiceMeister].[FS_OTBE].[EngPayrollNumbers] EN
        INNER JOIN [SpiceMeister].[FS_DS_BO].[DailySummaryHeader] SPDS
            ON EN.TechnicianID = SPDS.TechnicianID
        WHERE SPDS.DailySummaryDate >= '2018-03-01'
    ) AS Techa
    WHERE TechPersNo = 850009
) Tech
CROSS APPLY ...
This runs fast with the single-engineer filter in place.
The slowness is definitely due to the use of CROSS APPLY with a correlated subquery. This forces the computation on a per-row basis and prevents SQL Server from optimizing anything.
This seems more like it should be a GROUP BY query, but I can see why you had trouble constructing it, on account of the complex cumulative calculation: you need output by person and by date, but the average involves not the date in question alone but a date range ending on that date.
What I would do first is write a common query to capture your base data from the two datasets. That's what the dailySummaries common table expression does below. Then I would join dailySummaries onto itself, matching by employee and the required date range, and from that group by employee and date, aggregating over the range.
with
    dailySummaries as (
        select techPersNo = en.pers_no,
               ds_date    = smds.epl_dat,
               dtDif      = datediff(minute, smds.abfahrt_zeit, smds.rueck_zeit)
        from spiceMeister.fs_otbe.engPayrollNumbers en
        join smartMeister.main.plusDailyKopf smds
            on right(cast(en.technicianid as char(10)), 5) = smds.prpo_technikernr
        where smds.epl_dat < '2018-03-01'

        union all

        select en.pers_no,
               spds.dailySummaryDate,
               datediff(minute,
                   iif(spds.leaveHome < spds.workStart, spds.leaveHome, spds.workStart),
                   iif(spds.arrivehome > spds.workEnd, spds.arrivehome, spds.workEnd)
               )
        from spiceMeister.fs_otbe.engPayrollNumbers en
        join spiceMeister.fs_ds_bo.dailySummaryHeader spds
            on en.technicianID = spds.technicianID
        where spds.dailySummaryDate >= '2018-03-01'
    )
select ds.ds_date,
       ds.techPersNo,
       AvgHr = convert(real, sum(dsPrev.dtDif)) / (60 * 17)
from dailySummaries ds
left join dailySummaries dsPrev
    on dsPrev.techPersNo = ds.techPersNo
    and dsPrev.ds_date between dateadd(day, -118, ds.ds_date) and ds.ds_date  -- 119-day (17-week) window, inclusive
where ds.ds_date >= '2017-01-01'
group by ds.ds_date,
         ds.techPersNo
order by ds.ds_date
I may have gotten a thing or two wrong in translation but you get the idea.
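To get the headline numbers the question asked for (engineers over vs. under 48 hours, and the overall average), one hedged option is to wrap the query above and classify each engineer by their worst 17-week window; the name weeklyAvgs and the use of MAX as the classifier are assumptions:

-- Hedged sketch: classify engineers by their peak 17-week weekly average.
WITH weeklyAvgs AS (
    -- stand-in row; in practice, replace this CTE body with the query above
    SELECT CAST('2017-06-01' AS date) AS ds_date, 850009 AS techPersNo, 43.5 AS AvgHr
)
SELECT COUNT(CASE WHEN maxAvgHr > 48 THEN 1 END)  AS engineersOver48,
       COUNT(CASE WHEN maxAvgHr <= 48 THEN 1 END) AS engineersUnder48,
       AVG(maxAvgHr)                              AS avgWeeklyHours
FROM (
    SELECT techPersNo, MAX(AvgHr) AS maxAvgHr
    FROM weeklyAvgs
    GROUP BY techPersNo
) perEngineer;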
In the future, post a more minimal example. The union of the two datasets from the separate databases is not central to the problem you were trying to ask about, much of the date filtering isn't core to the question, and the special casting in your join logic is not important. These things cloud the real issue.
I am trying to calculate the sum of volume for the last thirty days for a set of stocks on particular days in the table important_stock_dates. The table all_stock_dates contains the same stocks but with trading volume for all dates, not just the particular days.
Sample data
all_stock_dates
stockid, date, volume
0231245, 20060314, 153
0231245, 20060315, 154
2135411, 20060314, 23
important_stock_dates
stockid, date, thirtydaysprior
0231245, 20060314, 20060130
0231245, 20060315, 20060201
2135411, 20060314, 20060130
My code
create table sum_trading_volume as
select a.stockid, a.date, sum(b.volume) as thirty_day_volume
from important_stock_dates a, all_stock_dates b
where b.date<a.date AND b.date ge a.thirtydaysprior
group by a.stockid, a.date;
Desired outcome
A table with all the observations from important_stock_dates that also has the sum of the volume from the previous 30 days based on matching stockid and dates in all_stock_dates.
Problem
The problem I'm running into is that important_stock_dates has 15 million observations and all_stock_dates has 350 million. It uses up a few hundred gigabytes of swap file running this code (maxes out the hard drive) then aborts. I can't see how to optimize the code. I couldn't find a similar problem on StackOverflow or Google.
Presumably, the query that you want joins on stockid:
create table sum_trading_volume as
select isd.stockid, isd.date, sum(asd.volume) as thirty_day_volume
from important_stock_dates isd join
all_stock_dates asd
on isd.stockid = asd.stockid and
asd.date < isd.date and asd.date >= isd.thirtydaysprior
group by isd.stockid, isd.date;
Assuming that is the logic you want, it will probably run to completion: the join on stockid keeps each important date from being matched against every row for every other stock.
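If it is still slow, an index that matches the join and range predicate may help; this is a hedged suggestion, since the question doesn't say which database engine is in use and the exact syntax varies:

-- Hedged sketch: composite index for the stockid equality plus the date range scan;
-- listing volume as the trailing column makes the index covering for this query.
create index ix_all_stock_dates on all_stock_dates (stockid, date, volume);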
I have a pretty huge table with columns date, account, amount, etc., e.g.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount for each account for each month. Since there may or may not be a record for any given account on a single day, and I have a separate table of holidays from 2011-2014, I am summing up the amount for each account within a month and dividing it by the number of business days in that month. Note that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want a record for each date that appears in the original table, e.g.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I do this efficiently? Thank you!
This approach uses subquery factoring, which other RDBMS flavours call common table expressions (CTEs). The attraction here is that we can pass the output of one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays, and also dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is days 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
You have 40 months of data, and that data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small, active part).
Next, I would like to define a minimal period: the smallest date range that is interesting to the business.
It might be a year, month, day, hour, etc. Do you expect questions like "what was the average for that account between 19:00 and 12am yesterday?"
I will assume that the answer is DAY.
Then:
I will calculate sum(amount) and count(*) for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day.
And I will save day, account, total amount, and count in a TABLE.
If there are later modifications to the cold body, you delete and reload the affected days in that table.
For the hot tail there are multiple strategies:
1. Do the same as above (same process, easy to support)
2. Always calculate on the fly
3. Use a materialized view as a middle ground between 1 and 2
The cold-body table totalc could also be implemented as a materialized view, but if the data never changes there is no need to rebuild it; a hedged sketch of it follows below.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all subsequent calculations.
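For concreteness, here is a minimal sketch of the cold-body aggregate, reusing the question's table and column names; the table name totalc and the cold/hot cutoff date are assumptions:

-- Hedged sketch: daily per-account totals for the cold body.
create table totalc as
select account,
       tb.date     as d,
       sum(amount) as total_amount,
       count(*)    as tx_count
from mytable tb
where tb.date < date '2014-04-01'   -- assumed cold/hot boundary
group by account, tb.date;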
I'm trying to do a JOIN query to analyze some stocks. In my first table called top10perday, I list 10 stocks per day that I have chosen to "buy" the next day and sell the following day:
date symbol
07-Aug-08 PM
07-Aug-08 HNZ
07-Aug-08 KFT
07-Aug-08 MET
...
08-Aug-08 WYE
08-Aug-08 XOM
08-Aug-08 SGP
08-Aug-08 JNJ
For instance, for record #1:
the date of the record is 07-Aug-08
I want to buy a share of PM stock on the next trading day after 07-Aug-08 (which is 08-Aug-08)
I want to sell that share of PM stock two trading days after 07-Aug-08, which turns out to be 11-Aug-08
My stock prices are in a table called prices, which looks like this:
date symbol price
07-Aug-08 PM 54.64
08-Aug-08 PM 55.21
11-Aug-08 PM 55.75
12-Aug-08 PM 55.95
... many more records with trading day, symbol, price
I want to do a JOIN so that my result set looks like this:
date symbol price-next-day price-two-days
07-Aug-08 PM 55.21 55.75
...
list one record per date and symbol in top10perday.
I have tried doing something like:
SELECT top10perday.date, top10perday.symbol, Min(prices.date) AS MinOfdate
FROM prices INNER JOIN top10perday ON prices.symbol = top10perday.symbol
GROUP BY top10perday.date, top10perday.symbol
HAVING (((Min(prices.date))>[date]));
I have tried many variations of this, but I'm clearly not on the right path, because the result set just includes 10 rows as of the earliest date shown in my top10perday table.
I am using Microsoft Access. Thanks in advance for your help! :-)
This syntax worked in Access 2003:
SELECT t10.Date, t10.Symbol, p1.date, p1.price, p2.date, p2.price
FROM
(top10perday AS t10
LEFT JOIN prices AS p1
ON t10.Symbol = p1.symbol)
INNER JOIN prices AS p2 ON t10.Symbol = p2.symbol
WHERE (
((p1.date)=((Select Min([date]) as md
from prices
where [date]>t10.[Date] and symbol = t10.symbol
))
) AND ((p2.date)=((Select Min([date]) as md
from prices
where [date]>p1.[Date] and symbol = t10.symbol)
))
);
The idea is to get the first (minimum) date that is greater than the date in the preceding table: top10perday for p1, and p1 for p2.
This should just be a join between three copies of the prices table. The problem is that you need to join to the next trading day, and that's a slightly trickier problem, since it's not always the next calendar day; we end up with a more complex situation, particularly as some days are skipped because of holidays.
If it weren't Access, you could use ROW_NUMBER() to order your prices by date (with a separate sequence per stock symbol).
WITH OrderedPrices AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) AS RowNum
FROM Prices
)
SELECT orig.*, next_day.price, two_days.price
FROM OrderedPrices orig
JOIN
OrderedPrices next_day
ON next_day.symbol = orig.symbol AND next_day.RowNum = orig.RowNum + 1
JOIN
OrderedPrices two_days
ON two_days.symbol = orig.symbol AND two_days.RowNum = orig.RowNum + 2
;
But you're using Access, so I don't think you have ROW_NUMBER().
Instead, you could have a table which lists the dates, having a TradingDayNumber... then use that to facilitate your join.
SELECT orig.*, next_day.price, two_days.price
FROM Prices orig
JOIN
TradingDays d0
ON d0.date = orig.date
JOIN
TradingDays d1
ON d1.TradingDayNum = d0.TradingDayNum + 1
JOIN
TradingDays d2
ON d2.TradingDayNum = d0.TradingDayNum + 2
JOIN
Prices next_day
ON next_day.symbol = orig.symbol AND next_day.date = d1.date
JOIN
Prices two_days
ON two_days.symbol = orig.symbol AND two_days.date = d2.date
But obviously you'll need to construct your TradingDays table...
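Here is a hedged sketch of one way to build it from the distinct dates already in Prices, in Access-flavoured SQL (the table and column names follow the answer above; the correlated count can be slow on large tables, but it only has to run once):

SELECT d.[date],
       (SELECT COUNT(*)
        FROM (SELECT DISTINCT [date] FROM Prices) AS earlier
        WHERE earlier.[date] <= d.[date]) AS TradingDayNum
INTO TradingDays
FROM (SELECT DISTINCT [date] FROM Prices) AS d;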
Rob
My guess is:
SELECT top10perday.date, top10perday.symbol, MIN(pnd.price) AS PriceNextDay, MIN(ptd.price) AS PriceTwoDays
FROM top10perday
LEFT OUTER JOIN prices AS pnd ON (pnd.symbol = top10perday.symbol AND pnd.date > top10perday.date)
LEFT OUTER JOIN prices AS ptd ON (ptd.symbol = top10perday.symbol AND ptd.date > pnd.date)
GROUP BY top10perday.date, top10perday.symbol
HAVING ((pnd.date = Min(pnd.date) AND ptd.date = Min(ptd.date));
It's just a shot in the dark, but my reasoning is: list all the stocks you want (top10perday) and, for each stock, get the price (if it exists) with the minimum date after its date to populate PriceNextDay, and the price with the minimum date after PriceNextDay to populate PriceTwoDays. The performance may stink, but test it and see if it works; later we can try to improve it.
Edited to include Rob Farley's comment.
I'm not a guru on this transformation, but I can point you at an idea. Try using a pivot on the date column for each symbol in your query, from one date to another. This should give you a table with one column per date, holding the price on each day, and it should do this for every stock symbol you have over the given period.
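In Access specifically, the pivot idea maps to a crosstab query; here is a hedged sketch, with the date range taken from the sample data:

TRANSFORM First(p.price)
SELECT p.symbol
FROM prices AS p
WHERE p.[date] BETWEEN #8/7/2008# AND #8/12/2008#
GROUP BY p.symbol
PIVOT p.[date];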
Based on what you're trying to graph, though, I think it would be interesting to look at the VWSP, not just the spot price of your trades, if you're trying to plot stock performance.