I am trying to translate a query that I wrote in T-SQL to DAX and am having a tough time figuring out how to use a subquery in DAX.
Here is my source SQL code:
SELECT [DayOfWeekNumber], [DayOfWeek], Ratio = AVG(100 * Opened / Sent)
FROM (
    SELECT [DayOfWeekNumber], [DayOfWeek], e.Schedule_ID
         , COUNT(e.Opened) AS Opened
         , SUM(e.NoOfEmailSent) AS Sent
    FROM Events e
    JOIN dim_date d ON d.ID_Date = e.ID_Date
    WHERE e.AccountNumber = 1
    GROUP BY d.[DayOfWeek], [DayOfWeekNumber], e.Schedule_ID
    HAVING SUM(e.NoOfEmailSent) > COUNT(e.Opened)
) OBSHour
GROUP BY [DayOfWeekNumber], [DayOfWeek]
ORDER BY [DayOfWeekNumber]
The purpose of this query is to calculate the open ratio (number of opened emails vs. number of emails sent) by first computing Opened / Sent for each email schedule and weekday, and then averaging those ratios for each weekday.
For example, if the data looks like this for a Sunday:
Schedule_ID Open Sent Ratio
123 10 100 .1
125 2 10 .2
129 1 4 .25
Then the final ratio for Sunday will be (.1 + .2 + .25) / 3 ≈ .18.
I appreciate your help.
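One way to express this in DAX, as a minimal sketch: assuming the same Events and dim_date tables, the SUMMARIZE/FILTER/AVERAGEX pattern below stands in for the subquery, the HAVING clause, and the outer AVG, with the AccountNumber predicate applied through CALCULATE. Slice the measure by dim_date[DayOfWeek] to get one average per weekday.
OpenRatio :=
CALCULATE (
    AVERAGEX (
        FILTER (
            SUMMARIZE (
                Events,
                dim_date[DayOfWeekNumber],
                dim_date[DayOfWeek],
                Events[Schedule_ID],
                "Opened", COUNT ( Events[Opened] ),    -- Count(e.opened)
                "Sent", SUM ( Events[NoOfEmailSent] )  -- Sum(e.NoOfEmailSent)
            ),
            [Sent] > [Opened]                          -- the HAVING clause
        ),
        100 * [Opened] / [Sent]                        -- per-schedule ratio
    ),
    Events[AccountNumber] = 1                          -- WHERE e.AccountNumber = 1
)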
Related
I have a dataset of parts, price per part, and month. I am accessing this data via a live connection to a SQL Server database. This database gets updated monthly with new prices for each part. What I would like to do is graph one year of price data for the ten parts whose prices changed the most over the last month (either as a percentage of last month's price or as a total change in dollars.)
Since my database connection is live, ideally Tableau would grab the new price data each month, updating the top ten parts whose prices changed for the new period. I don't want to have to manually change the months or use a stored procedure, if possible.
part price date
110 167.66 2018-12-01 00:00:00.000
113 157.82 2018-12-01 00:00:00.000
121 99.16 2018-12-01 00:00:00.000
133 109.82 2018-12-01 00:00:00.000
137 178.66 2018-12-01 00:00:00.000
138 154.99 2018-12-01 00:00:00.000
143 67.32 2018-12-01 00:00:00.000
149 103.82 2018-12-01 00:00:00.000
113 167.34 2018-11-01 00:00:00.000
121 88.37 2018-11-01 00:00:00.000
133 264.02 2018-11-01 00:00:00.000
Create a calculated field called Recent_Price as
if DateDiff('month', [date], Today()) <= 1 then [price] end. This returns the price for recent records and null for older records. You might need to tweak the condition based on your details, or use an LOD calc to always get the last two values regardless of today's date.
Create a calculated field called Price_Change as Max([Recent_Price]) - Min([Recent_Price]). Note you can't tell from this whether the change was positive or negative, just its magnitude.
Make sure part is a discrete dimension. Drag it to the Filter Shelf. Set the filter to show the Top N parts by Price_Change.
It's not hard to extend this to include the sign of the price change, or to convert it to a percentage; see the sketch below. Hint: you'll probably need a pair of calcs like the Recent_Price calc above to select prices for specific months.
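For illustration, a hedged sketch of such a pair of calculated fields (the field names here are hypothetical, and the DateDiff conditions may need the same tweaking as Recent_Price):
// This_Month_Price: price only for the current month's records
if DateDiff('month', [date], Today()) = 0 then [price] end
// Last_Month_Price: price only for the prior month's records
if DateDiff('month', [date], Today()) = 1 then [price] end
// Signed_Price_Change: keeps the sign of the change
Max([This_Month_Price]) - Max([Last_Month_Price])
// Pct_Price_Change: change as a fraction of last month's price
(Max([This_Month_Price]) - Max([Last_Month_Price])) / Max([Last_Month_Price])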
You haven't provided any sample data, but you could follow something like this:
;WITH top_parts AS (
    -- select the top 10 parts based on some criteria
    SELECT TOP 10 parts.[id], parts.[name]
    FROM parts
    ORDER BY <most changed>
)
SELECT price.[date], p.[name], price.[price]
FROM top_parts p
INNER JOIN part_price price ON p.[id] = price.[part_id]
ORDER BY price.[date]
Use a CTE to get your top parts.
Select from the CTE, join to the price table to get the prices for each part.
Order the prices or bucketize them into months.
Feed it to your graph.
It will be something like this for just one month. If you need the whole year, you have to specify clearly what exactly you want to see:
;WITH cte as (
    SELECT TOP 10 m0.Part
         , Diff = ABS(m0.Price - m1.Price)
         , DiffPrc = ABS(m0.Price - m1.Price) / m1.Price
    FROM Parts as m0
    INNER JOIN (SELECT MaxDate = MAX([Date]) FROM Parts) as md
        ON md.MaxDate = m0.[Date]
    INNER JOIN Parts as m1
        ON m0.Part = m1.Part
        AND DATEADD(MONTH, -1, md.MaxDate) = m1.[Date]
    ORDER BY ABS(m0.Price - m1.Price) DESC
    -- Top 10 by percentage:
    -- ORDER BY ABS(m0.Price - m1.Price) / m1.Price DESC
)
SELECT *
FROM Parts as p
INNER JOIN cte ON cte.Part = p.Part
-- Input from the user; you decide in which format the last-month date will be passed.
-- In other words, @InputLastMonth is a parameter of the proc.
-- Suppose it is passed in yyyy-MM-dd format.
DECLARE @InputLastMonth date = '2018-12-31'

-- To get the last one year of data, declare a local variable which is not passed:
DECLARE @From date = DATEADD(day, 1, DATEADD(month, -12, @InputLastMonth))
DECLARE @TopN int = 10 -- requirement

--SELECT @InputLastMonth, @From

SELECT TOP (@TopN) parts, ChangePrice
FROM
(
    SELECT parts, ABS(MAX(price) - MIN(price)) AS ChangePrice
    FROM dbo.Table1
    WHERE dates >= @From AND dates <= @InputLastMonth
    GROUP BY parts
) t4
ORDER BY ChangePrice DESC
By "changed most", I understand the following: suppose there is one part, 'Part1', whose price was 100 in the first month and changed to 1000 in the last month. On the other hand, 'Part2' changed several times during the same period, but its final change was only 12. In other words, Part1 changed only twice but the difference was huge, while Part2 changed several times but each difference was small. So Part1 will be preferred. The second thing is that the change can be negative as well as positive. Correct me if I have not understood your requirement.
I have a pretty huge table with columns date, account, amount, etc., e.g.:
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount for each account for each month. Since there may or may not be a record for a given account on a single day, and I have a separate table of holidays from 2011-2014, I am summing up the amount for each account within a month and dividing it by the number of business days in that month. Notice that there are very likely to be record(s) on weekends/holidays, so I need to exclude them from the calculation. Also, I want to have a record for each of the dates available in the original table, e.g.:
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like this:
select
    date,
    account,
    sum(amount / days_mon) over (partition by account, last_day(date))
from (
    select
        date,
        -- there are more calculations to get the account numbers,
        -- so this subquery is necessary
        account,
        amount,
        -- this is a list of month-end dates for which the number of
        -- business days in that month is 19; similar below.
        case when last_day(date) in ('','',...,'') then 19
             when last_day(date) in ('','',...,'') then 20
             when last_day(date) in ('','',...,'') then 21
             when last_day(date) in ('','',...,'') then 22
             when last_day(date) in ('','',...,'') then 23
        end as days_mon
    from mytable tb
    inner join lookup_businessday_list busi
        on tb.date = busi.date)
So how can I accomplish the above efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another. Find out more.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number varies depending on the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
You have 40 months of data, and this data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small and active part).
Next, I would like to define a minimal period: the smallest date range that is interesting to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 19:00 and 12 am yesterday?"
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and the count for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day,
and I will save the day, account, total amount, and count in a TABLE.
If there are later modifications to the cold body, you delete and reload the affected day in that table.
For the hot tail there might be multiple strategies:
1. Do the same as above (same process, clear to support).
2. Always calculate on the fly.
3. Use a materialized view as a compromise between 1 and 2.
The cold-body totals table could also be implemented as a materialized view, but if the data never changes, there is no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all subsequent calculations.
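A minimal sketch of the cold-body daily aggregate table under these assumptions (the table and column names follow the question; the cold/hot cutoff date is hypothetical):
create table daily_totals as
select trunc(tb.date)  as day
     , tb.account
     , sum(tb.amount)  as total_amount
     , count(*)        as txn_count
from   mytable tb
where  tb.date < date '2014-01-01'  -- assumed cold/hot boundary
group  by trunc(tb.date), tb.account;
Monthly averages then scan daily_totals (one row per account per active day) instead of the raw transactions.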
I am having a performance issue with a set of SQL statements used to generate the current month's statement in real time.
Customers purchase goods using points from an online system, and a statement containing "open_balance", "point_earned", "point_used", "current_balance" should be generated.
The following shows the shortened schema:
//~200k records
customer: {account_id:string, create_date:timestamp, bill_day:int} // 14 fields in total
//~250k records per month, kept for 6 months
history_point: {point_id:long, account_id:string, point_date:timestamp, point:int} // 9 fields in total
//each customer has a maximum of 12 past statements kept
history_statement: {account_id:string, open_date:date, close_date:date, open_balance:int, point_earned:int, point_used:int, close_balance:int} // 9 fields in total
On every bill day, the view should automatically create a new month statement.
i.e. if bill_day is 15, then transactions done on or after 16 Dec 2013 00:00:00 should belong to the new bill cycle of 16 Dec 2013 00:00:00 - 15 Jan 2014 23:59:59.
I tried the approach described below:
1. Calculate the last close day for each account (in a materialized view, so that it updates only after a new customer or a past month's statement is inserted into history_statement).
2. Generate a record for each customer for each month that I need to calculate (also in a materialized view).
3. Sieve the point records for only those within the dates that I will calculate (this takes ~0.1s only).
4. Join 2 with 3 to obtain the points earned and used for each customer for each month.
5. Join 4 with itself, on date less than open date, to sum up the open and close balances.
6a. Select from 5, where the open date is less than 1 month old, as the current balance (these are not closed yet, and the points reflect what each customer owns now).
6b. All the statements are obtained by a union of history_statement and 5.
On a development server, the average response time (200K customers, 1.5M transactions in the current month) is ~3s, which is pretty slow for a web application; on the testing server, where resources are likely to be shared, the average response time (200K customers, ~200k transactions per month for 8 months) is 10-15s.
Does anyone have an idea for a better approach to the query, or a way to speed it up?
Related SQL:
2: IV_STCLOSE_2_1_T (materialized view)
3: IV_STCLOSE_2_2_T (~0.15s)
SELECT ACCOUNT_ID, POINT_DATE, POINT
FROM history_point
WHERE point_date >= (
SELECT MIN(open_date)
FROM IV_STCLOSE_2_1_t
)
4: IV_STCLOSE_3_T (~1.5s)
SELECT p0.account_id, p0.open_date, p0.close_date, COALESCE(SUM(DECODE(SIGN(p.point),-1,p.point)),0) AS point_used, COALESCE(SUM(DECODE(SIGN(p.point),1,p.point)),0) AS point_earned
FROM iv_stclose_2_1_t p0
LEFT JOIN iv_stclose_2_2_t p
ON p.account_id = p0.account_id
AND p.point_date >= p0.open_date
AND p.point_date < p0.close_date + INTERVAL '1' DAY
GROUP BY p0.account_id, p0.open_date, p0.close_date
5: IV_STCLOSE_4_T (~3s)
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
SELECT t1.account_id AS STAT_ACCOUNT_ID, t1.open_date, t1.close_date, t1.open_balance, t1.point_earned AS point_earn, t1.point_used , t1.open_balance + t1.point_earned + t1.point_used AS close_balance
FROM (
SELECT v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used, COALESCE(sum(v2.point_used + v2.point_earned),0) AS OPEN_BALANCE
FROM t v1
LEFT JOIN t v2
ON v1.account_id = v2.account_id
AND v1.OPEN_DATE > v2.OPEN_DATE
GROUP BY v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used
) t1
It turns out that, in IV_STCLOSE_4_T,
WITH t AS (SELECT * FROM IV_STCLOSE_3_T)
is problematic.
At first I thought WITH t AS would be faster, as IV_STCLOSE_3_T is only evaluated once, but it apparently forced materialization of the whole IV_STCLOSE_3_T, generating over 200k records even though I only ever need at most 12 of them, for a single customer, at any time.
With the above statement removed and account_id appropriately indexed, the cost dropped from over 500k to less than 500.
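For illustration, a sketch of the rewritten IV_STCLOSE_4_T body with the CTE removed, so the optimizer can push an account_id predicate down into IV_STCLOSE_3_T (where exactly to index account_id is an assumption, since the base-table DDL isn't shown):
SELECT t1.account_id AS stat_account_id, t1.open_date, t1.close_date, t1.open_balance,
       t1.point_earned AS point_earn, t1.point_used,
       t1.open_balance + t1.point_earned + t1.point_used AS close_balance
FROM (
    -- reference the view directly instead of materializing it through a CTE
    SELECT v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used,
           COALESCE(SUM(v2.point_used + v2.point_earned), 0) AS open_balance
    FROM iv_stclose_3_t v1
    LEFT JOIN iv_stclose_3_t v2
        ON v1.account_id = v2.account_id
        AND v1.open_date > v2.open_date
    GROUP BY v1.account_id, v1.open_date, v1.close_date, v1.point_earned, v1.point_used
) t1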
This is my table structure
Id Date Candidates
1 2013-04-07 16
2 2013-04-27 12
3 2013-10-22 13
4 2013-10-08 1
5 2013-10-24 9
6 2012-07-11 14
7 2012-07-14 5
I want a dynamic query for finding the maximum number of candidates recruited. I want the result based on the maximum number of candidates year-wise, month-wise, and date-wise. How do I write a query for these criteria in SQL? Please, can anyone help me?
For example, if I need it on a month basis, then for the above data the highest number of candidates recruited is in month 04, with a total of 28; the output would be Month: April, Candidates: 28. I need as output the period in which the maximum number of candidates was recruited for the above data.
A common table expression can get the date details, and then you'll just need a regular GROUP BY to get the correct result:
WITH cte AS (
SELECT SUBSTRING(CONVERT(VARCHAR, Date, 112), 1, 4) year, candidates
FROM recruited
)
SELECT TOP 1 year best_year, SUM(candidates) candidates FROM cte
GROUP BY year ORDER BY candidates DESC;
An SQLfiddle to test with.
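Since the expected output in the question is on a month basis, here is a hedged variation of the same pattern (assuming the same recruited table) that groups on the month part instead:
WITH cte AS (
  SELECT MONTH([Date]) [month], candidates
  FROM recruited
)
SELECT TOP 1 [month] best_month, SUM(candidates) candidates
FROM cte
GROUP BY [month]
ORDER BY candidates DESC;
For the sample data this returns month 4 with 28 candidates, matching the expected output.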
Goal:
Combine two queries I currently run.
Have the WEEK from query 1 be a filtering criterion for query 2.
Query 1:
----------------------------------------------------
-- ************************************************
-- Accounts Receivable (WEEKLY) snapshot
-- ************************************************
----------------------------------------------------
SELECT
TRUNC(TX.ORIG_POST_DATE,'WW') AS WEEK,
SUM(TX.AMOUNT) AS OUTSTANDING
FROM
TX
WHERE
--Transaction types
(TX.DETAIL_TYPE = 'Charges' OR
TX.DETAIL_TYPE = 'Payments' OR
TX.DETAIL_TYPE = 'Adjustments')
GROUP BY
TRUNC(tx.ORIG_POST_DATE,'WW')
ORDER BY
TRUNC(tx.ORIG_POST_DATE,'WW')
Output Query 1:
WEEK OUTSTANDING
1/1/2012 18203.95
1/8/2012 17605
1/15/2012 19402.33
1/22/2012 18693.45
1/29/2012 19100
Query 2:
----------------------------------------------------
-- ************************************************
-- Weekly Charge AVG over previous 13 weeks based on WEEK above
-- ************************************************
----------------------------------------------------
SELECT
sum(tx.AMOUNT)/91
FROM
TX
WHERE
--Post date
TX.ORIG_POST_DATE <= WEEK AND
TX.ORIG_POST_DATE >= WEEK-91 AND
--Charges
(TX.DETAIL_TYPE = 'Charge')
Output Query 2:
thirteen_Week_Avg
1890.15626
Desired Output
WEEK OUTSTANDING Thirteen_Week_Avg
1/1/2012 18203.95 1890.15626
1/8/2012 17605 1900.15626
1/15/2012 19402.33 1888.65132
1/22/2012 18693.45 1905.654
1/29/2012 19100 1900.564
Note that the Thirteen_Week_Avg covers the 13 weeks prior to the WEEK field, so it changes each week as the averaging window moves forward.
Also, what tutorials do you know of that I could read to better understand the solution to this type of question?
Try using an analytic function such as:
select WEEK, sum(OUTSTANDING) as OUTSTANDING, THIRTEEN_WEEK_AVG
from (select trunc(TX.ORIG_POST_DATE, 'WW') as WEEK
,AMOUNT as OUTSTANDING
,avg(
TX.AMOUNT)
over (order by trunc(TX.ORIG_POST_DATE, 'WW')
range numtodsinterval(7 * 13, 'day') preceding)
as THIRTEEN_WEEK_AVG
from TX
where (TX.DETAIL_TYPE = 'Charges'
or TX.DETAIL_TYPE = 'Payments'
or TX.DETAIL_TYPE = 'Adjustments'))
group by WEEK, THIRTEEN_WEEK_AVG
order by WEEK
An introduction to analytic functions can be found here. And how NUMTODSINTERVAL works is here.
My first thought is that this is best handled by a stored procedure that sets up two cursors, one for each of the queries, where each cursor takes a week parameter. You could have the first cursor output the week and outstanding amount, loop through however many times you need, and move back one week each time through; then pass that week to the thirteen-week-average cursor and let it output the average amount.
If you just want it on the screen you can use dbms_output.put_line. If you want to write it to a file such as a CSV, then you need to set up a file handle and all the associated plumbing to create/open/write/save the file.
O'Reilly has a pretty good PL/SQL book that explains procs and cursors well.
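A minimal PL/SQL sketch of that idea, assuming the TX table and columns from the question (the cursor reuses query 1, the inner SELECT reuses query 2, and the output format is an assumption):
DECLARE
  -- query 1: one row per week
  CURSOR weekly_cur IS
    SELECT TRUNC(tx.orig_post_date, 'WW') AS week, SUM(tx.amount) AS outstanding
    FROM tx
    WHERE tx.detail_type IN ('Charges', 'Payments', 'Adjustments')
    GROUP BY TRUNC(tx.orig_post_date, 'WW')
    ORDER BY TRUNC(tx.orig_post_date, 'WW');
  v_avg NUMBER;
BEGIN
  FOR r IN weekly_cur LOOP
    -- query 2, parameterized by the week from query 1
    SELECT SUM(tx.amount) / 91
      INTO v_avg
      FROM tx
     WHERE tx.orig_post_date <= r.week
       AND tx.orig_post_date >= r.week - 91
       AND tx.detail_type = 'Charge';
    DBMS_OUTPUT.PUT_LINE(TO_CHAR(r.week, 'MM/DD/YYYY') || '  '
                         || r.outstanding || '  ' || v_avg);
  END LOOP;
END;
/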