Measure over one dimension conditional on value of another dimension - mdx

I am new to MDX and have a problem I have been struggling with.
I essentially have the table below (with, obviously, more dimensions behind the scenes in my cube). If an ID was new yesterday, it appears in the cube with date T-1 and Category "New"; it also has a different set of values for date T and Category "Old". If it was Old yesterday, it will be Old today and have a different value.
I can get the sum of values for "New" IDs on the T-1 slice, but I cannot get the sum of values on the Today (T) slice for the IDs which were "New" yesterday (and are therefore Old today) - i.e. the value today of the IDs which were new yesterday.
| ID | Category | Date | Value |
|----|----------|------|-------|
| 1  | New      | T-1  | a     |
| 1  | Old      | T    | b     |
| 2  | Old      | T-1  | c     |
| 2  | Old      | T    | d     |
| 3  | New      | T-1  | e     |
| 3  | Old      | T    | f     |
If the data were in a flat table I think I could query it with SQL, but I'm a bit stumped when it comes to the MDX. Thanks.
EDIT:
The MDX below gives me the sum of the "Old" IDs. I think I somehow need to create a new measure which looks at the PrevMember of ID along the date slice and checks whether it was New, and if so assigns that ID's current measure to this new measure - but I do not know how to do it...
SELECT
NON EMPTY (
[Date].Members
) ON 0
,
NON EMPTY (
[Measures].[Value.SUM]
) ON 1
FROM [MyCube]
WHERE (
[Category].&[Old]
)
EDIT2:
Using SouravA's answer, I have got my code working with the code below.
WITH
SET NewIDs AS
INTERSECT(NonEmpty([ID].Children, CROSSJOIN([Date].&[T-1], [Category].&[Old]))
, NonEmpty([ID].Children, CROSSJOIN([Date].&[T], [Category].&[EOD]))
)
SELECT
NON EMPTY([Dim2].Children
) ON 0
,
NON EMPTY([NewIDs]
) ON 1
FROM [MyCube]
WHERE [Measures].[Value]

Does this help? (I've added comments in the code below.)
WITH
SET NewYesterdayOldToday AS // All IDs which were New yesterday but are Old today
INTERSECT(
    NonEmpty(
        [YourTable].[Id].[All].MEMBERS,
        [YourTable].[Category].&[Old] * [YourTable].[Date].&[T]
    ),
    NonEmpty(
        [YourTable].[Id].[All].MEMBERS,
        [YourTable].[Category].&[New] * [YourTable].[Date].&[T-1]
    )
)
MEMBER [Measures].[TotalValue] AS // Sum of today's values over those IDs
SUM(
    NewYesterdayOldToday,
    ([YourTable].[Date].&[T], [Measures].[Value.SUM])
)
SELECT [Measures].[TotalValue] ON 0
FROM [MyCube]
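As a sanity check, the same set logic can be sketched in plain Python (hypothetical data; the letter values a..f from the table are replaced with numbers so the sum is verifiable):

```python
# Rows: (id, category, date, value) -- mirrors the question's table
rows = [
    (1, "New", "T-1", 10),  # a
    (1, "Old", "T",   11),  # b
    (2, "Old", "T-1", 20),  # c
    (2, "Old", "T",   21),  # d
    (3, "New", "T-1", 30),  # e
    (3, "Old", "T",   31),  # f
]

# The two NonEmpty sets: ids that were New at T-1, and ids that are Old at T
new_yesterday = {i for (i, cat, d, v) in rows if cat == "New" and d == "T-1"}
old_today     = {i for (i, cat, d, v) in rows if cat == "Old" and d == "T"}

# INTERSECT of the two sets
new_yesterday_old_today = new_yesterday & old_today   # ids 1 and 3

# The SUM over that set at date T: today's value of the ids that were new yesterday
total = sum(v for (i, cat, d, v) in rows
            if i in new_yesterday_old_today and d == "T")
print(new_yesterday_old_today, total)  # {1, 3} 42
```

With the hypothetical numbers, ids 1 and 3 were New at T-1, so the answer is b + f (here 11 + 31 = 42).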

Related

How to sum the minutes of each activity in Postgresql?

The column "activitie_time_enter" holds the times.
The columns "activitie_still" and "activitie_walking" indicate the two types of activity.
Table example:
activitie_time_enter | activitie_still | activitie_walking
17:30:20             | Still           |
17:31:32             | Still           |
17:32:24             |                 | Walking
17:33:37             |                 | Walking
17:34:20             | Still           |
17:35:37             | Still           |
17:45:13             | Still           |
17:50:23             | Still           |
17:51:32             |                 | Walking
What I need is to sum up the total minutes for each activity separately.
Any suggestions or solution?
First calculate the duration of each activity (the with CTE), then do a conditional sum.
with t as
(
select
*, lead(activitie_time_enter) over (order by activitie_time_enter) - activitie_time_enter as duration
from _table
)
select
sum (duration) filter (where activitie_still = 'Still') as total_still,
sum (duration) filter (where activitie_walking = 'Walking') as total_walking
from t;
/** Result:
total_still|total_walking|
-----------+-------------+
00:19:16| 00:01:56|
*/
BTW, do you really need two columns (activitie_still and activitie_walking)? A single activity column with those values would do, and would allow more activities (Running, Sleeping, Working etc.) without having to change the table structure.
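For intuition, the lead()-then-conditional-sum logic can be sketched in Python over the sample rows (a minimal stand-in for the Postgres query, not the query itself; it uses the single-activity-column layout suggested above):

```python
from datetime import datetime, timedelta

# Sample rows as (time, activity) -- one activity column
rows = [
    ("17:30:20", "Still"), ("17:31:32", "Still"),
    ("17:32:24", "Walking"), ("17:33:37", "Walking"),
    ("17:34:20", "Still"), ("17:35:37", "Still"),
    ("17:45:13", "Still"), ("17:50:23", "Still"),
    ("17:51:32", "Walking"),
]
times = [datetime.strptime(t, "%H:%M:%S") for t, _ in rows]

totals = {"Still": timedelta(), "Walking": timedelta()}
# lead(): the duration of row i is the gap to row i+1; the last row gets none,
# exactly as lead() returns NULL for the final row
for i in range(len(rows) - 1):
    totals[rows[i][1]] += times[i + 1] - times[i]

print(totals["Still"], totals["Walking"])  # 0:19:16 0:01:56
```

This reproduces the 00:19:16 / 00:01:56 result shown above.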

Dividing sum results

I'm really sorry as this was probably answered before, but I couldn't find something that solved the problem.
In this case, I'm trying to get the result of dividing two sums in the same column.
| Id | month | budget | sales |
| -- | ----- | ------ | ----- |
| 1 | jan | 1000 | 800 |
| 2 | jan | 1000 | 850 |
| 1 | feb | 1200 | 800 |
| 2 | feb | 1100 | 850 |
What I want is to get the % of completion for each id and month (example: get 0.8 or 80% in a fifth column for id 1 in jan).
I have something like
sel
id,
month,
sum (daily_budget) as budget,
sum (daily_sales) as sales,
budget/sales over (partition by 1,2) as efectivenes
from sales
group by 1,2
I know I'm doing this wrong, but I'm kinda new with SQL and can't find the way :|
Thanks!
This should do it
CAST(ROUND(SUM(daily_sales) * 100.00 / SUM(daily_budget), 1) AS DECIMAL(5,2)) AS Effectiveness
I'm new at SQL too but maybe I can help. Try this? (Grouping by both id and month, and dividing the two sums directly - no window function needed.)
sel
id,
month,
sum (daily_budget) as budget,
sum (daily_sales) as sales,
sum (daily_sales) / sum (daily_budget) as efectivenes
from sales
group by id, month
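To make the intended arithmetic concrete, here is a small Python sketch of the per-(id, month) aggregation (the daily rows are hypothetical, chosen so the jan sums match the question's table):

```python
from collections import defaultdict

# Hypothetical daily rows: (id, month, daily_budget, daily_sales)
daily = [
    (1, "jan", 500, 400), (1, "jan", 500, 400),   # sums to budget 1000, sales 800
    (2, "jan", 1000, 850),
    (1, "feb", 1200, 800),
    (2, "feb", 1100, 850),
]

# GROUP BY id, month: accumulate the two sums per group
sums = defaultdict(lambda: [0, 0])        # (id, month) -> [budget, sales]
for i, m, b, s in daily:
    sums[(i, m)][0] += b
    sums[(i, m)][1] += s

# effectiveness = sum(sales) / sum(budget), as a ratio (0.8 == 80 %)
eff = {k: round(s / b, 2) for k, (b, s) in sums.items()}
print(eff[(1, "jan")])  # 0.8
```

The key point is that both sums are computed within the same (id, month) group before dividing.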
If you want to ALTER your table so that it contains a fifth column in which the result of budget/sales is automatically calculated, all you need to do is add the formula to an auto-generated column. The example below is based on MySQL Workbench.
Open MySQL Workbench.
Find the table you wish to modify in the Navigator pane, right-click on it and select "Alter Table".
Add a new column to the table. Make sure you select the NN (Not Null) and G (Generated Column) check boxes.
In the Default/Expression field, simply enter the expression budget / sales.
The next time you run a query, you should see the column generated and populated with the calculated results. If you simply want an SQL statement to do the same from the console, it will be something like this: ALTER TABLE YOUR_TABLE_NAME ADD result FLOAT AS (budget / sales);

Writing a String comparison function in bigquery

I am trying to write a BigQuery UDF to compare a list of strings with other lists of strings.
Basically, I would like to know how many new users we have per week, and how many of these new users kept visiting our website in later weeks. For that I created a query which gives me a string of all emails per week (with GROUP_CONCAT) and saved it as a table. Now I need to know how to compare each week's collection of emails with the others.
At the end, I would like to have a table like this :
+-------+--------+--------+--------+--------+-----+
|       | week 1 | week 2 | week 3 | week 4 | ... |
+-------+--------+--------+--------+--------+-----+
| week1 | 17     | 7      | 5      | 9      | ... |
| week2 |        | 19     | 13     | 8      | ... |
| week3 |        |        | 24     | 15     | ... |
+-------+--------+--------+--------+--------+-----+
Just to give you an idea to play with
SELECT
CONCAT('week', STRING(prev)) AS WEEK,
SUM(IF(next=19, authors, 0)) AS week19,
SUM(IF(next=20, authors, 0)) AS week20,
SUM(IF(next=21, authors, 0)) AS week21,
SUM(IF(next=22, authors, 0)) AS week22,
SUM(IF(next=23, authors, 0)) AS week23
FROM (
SELECT prev, next, COUNT(author) AS authors
FROM (
SELECT
prev_week.week_created AS prev,
next_week.week_created AS next,
prev_week.author AS author
FROM (
SELECT
WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
author
FROM [fh-bigquery:reddit_posts.2016_05]
GROUP BY 1,2
) next_week
LEFT JOIN (
SELECT
WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
author
FROM [fh-bigquery:reddit_posts.2016_05]
GROUP BY 1,2
) AS prev_week
ON prev_week.author = next_week.author
HAVING prev <= next
)
GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1
This is the closest to what you asked that I can think of.
Meantime, please note - BigQuery is tailored less for report design and more for data crunching. So I think that creating the matrix/pivot within BigQuery (the outer select) is not the best fit - it can be done in your reporting tool. But calculating all the prev|next|count pairs (the inner select) is definitely suitable for BigQuery.
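The inner prev|next|count calculation can be illustrated with a tiny Python sketch (hypothetical weeks and authors, not the actual Reddit data):

```python
from collections import defaultdict

# Distinct (week, author) activity pairs -- stand-in for the grouped subquery
activity = {("w1", "a"), ("w1", "b"),
            ("w2", "a"), ("w2", "c"),
            ("w3", "a"), ("w3", "b")}

authors_by_week = defaultdict(set)
for w, a in activity:
    authors_by_week[w].add(a)
weeks = sorted(authors_by_week)

# Self-join on author with HAVING prev <= next:
# count authors active in week `p` who are also active in week `n`
pairs = {(p, n): len(authors_by_week[p] & authors_by_week[n])
         for p in weeks for n in weeks if p <= n}
print(pairs[("w1", "w1")], pairs[("w1", "w2")], pairs[("w1", "w3")])  # 2 1 2
```

Each row of `pairs` corresponds to one cell of the retention matrix; the pivot into columns (the outer select) is just a presentation step.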

DAX Query to get average of a column within the same table

We have a table named MetricsTable which simply has columns A1 and Group.
We want to add a calculated column AvgA1 to this table which calculates the average of column A1 filtered by the value of Group. What should our DAX query be? The point is that we want to calculate the average from the values within the same table.
| id | A1 | Group  | AvgA1 |
| -- | -- | ------ | ----- |
| 1  | 20 | Group1 | 20    |
| 2  | 10 | Group2 | 30    |
| 3  | 50 | Group2 | 30    |
| 4  | 30 | Group2 | 30    |
| 5  | 35 | Group3 | 35    |
Regards
Likely you should use a measure and put that measure into a pivot table's 'Values' section:
AverageA1:=
AVERAGE( Metrics[A1] )
Then it will be updated based on filter and slicer selections in the pivot table, and subtotaled appropriately across various dimension categories.
If it strictly needs to be a column in the table for reasons not enumerated in your question, then the following will work:
AverageA1 =
CALCULATE(
AVERAGE( Metrics[A1] )
,ALLEXCEPT( Metrics, Metrics[Group] )
)
CALCULATE() takes an expression and a list of 0-N arguments to modify the filter context in which that expression is evaluated.
ALLEXCEPT() takes a table, and a list of 1-N fields from which to preserve context. The current (row) context in the evaluation of this column definition is the value of every field on that row. We remove the context from ALL fields EXCEPT those named in arguments 2-N of ALLEXCEPT(). Thus we preserve the row context of [Group], and calculate an average across the table where that [Group] is the same as in the current context.
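The net effect of preserving only the [Group] context is an ordinary group-wise average; a small Python sketch over the question's table:

```python
from collections import defaultdict

# (id, A1, Group) -- the question's MetricsTable
rows = [(1, 20, "Group1"), (2, 10, "Group2"),
        (3, 50, "Group2"), (4, 30, "Group2"), (5, 35, "Group3")]

# Average A1 per Group (what CALCULATE + ALLEXCEPT evaluates per row)
by_group = defaultdict(list)
for _, a1, grp in rows:
    by_group[grp].append(a1)
group_avg = {g: sum(v) / len(v) for g, v in by_group.items()}

# The calculated column: each row gets its group's average
avg_a1 = [group_avg[grp] for _, _, grp in rows]
print(avg_a1)  # [20.0, 30.0, 30.0, 30.0, 35.0]
```

This reproduces the AvgA1 column in the question's table (Group2: (10 + 50 + 30) / 3 = 30).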

yet another date gap-fill SQL puzzle

I'm using Vertica, which precludes me from using CROSS APPLY, unfortunately. And apparently there's no such thing as a CTE in Vertica.
Here's what I've got:
t:
day | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
Note that on the first day, the delta is equal to the metric value.
I'd like to fill in the gaps, like this:
t_fill:
day | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-02 | 1 | 10 | 0 -- a delta of 0
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
I've thought of a way to do this day by day, but what I'd really like is a solution that works in one go.
I think I could get something working with LAST_VALUE, but I can't come up with the right JOIN statements that will let me properly partition and order on each id's day-by-day history.
edit:
assume I have a table like this:
calendar:
day
------------
2011-01-01
2011-01-02
...
that can be involved with joins. My intent would be to maintain the date range in calendar to match the date range in t.
edit:
A few more notes on what I'm looking for, just to be specific:
In generating t_fill, I'd like to exactly cover the date range in t, as well as any dates that are missing in between. So a correct t_fill will start on the same date and end on the same date as t.
t_fill has two properties:
1) once an id appears on some date, it will always have a row for each later date. This is the gap-filling implied in the original question.
2) Should no row for an id ever appear again after some date, the t_fill solution should merrily generate rows with the same metric value (and 0 delta) from the date of that last data point up to the end date of t.
A solution might backfill earlier dates up to the start of the date range in t. That is, for any id that appears after the first date in t, rows between the first date in t and the first date for the id will be filled with metric=0 and d_metric=0. I don't prefer this kind of solution, since it has a higher growth factor for each id that enters the system. But I could easily deal with it by selecting into a new table only rows where metric!=0 and d_metric!=0.
This is about what Jonathan Leffler proposed, but in old-fashioned low-level SQL (without fancy CTEs, window functions, or aggregating subqueries):
SET search_path='tmp'
DROP TABLE ttable CASCADE;
CREATE TABLE ttable
( zday date NOT NULL
, id INTEGER NOT NULL
, metric INTEGER NOT NULL
, d_metric INTEGER NOT NULL
, PRIMARY KEY (id,zday)
);
INSERT INTO ttable(zday,id,metric,d_metric) VALUES
('2011-12-01',1,10,10)
,('2011-12-03',1,12,2)
,('2011-12-04',1,15,3)
;
DROP TABLE ctable CASCADE;
CREATE TABLE ctable
( zday date NOT NULL
, PRIMARY KEY (zday)
);
INSERT INTO ctable(zday) VALUES
('2011-12-01')
,('2011-12-02')
,('2011-12-03')
,('2011-12-04')
;
CREATE VIEW v_cte AS (
SELECT t.zday,t.id,t.metric,t.d_metric
FROM ttable t
JOIN ctable c ON c.zday = t.zday
UNION
SELECT c.zday,t.id,t.metric, 0
FROM ctable c, ttable t
WHERE t.zday < c.zday
AND NOT EXISTS ( SELECT *
FROM ttable nx
WHERE nx.id = t.id
AND nx.zday = c.zday
)
AND NOT EXISTS ( SELECT *
FROM ttable nx
WHERE nx.id = t.id
AND nx.zday < c.zday
AND nx.zday > t.zday
)
)
;
SELECT * FROM v_cte;
The results:
zday | id | metric | d_metric
------------+----+--------+----------
2011-12-01 | 1 | 10 | 10
2011-12-02 | 1 | 10 | 0
2011-12-03 | 1 | 12 | 2
2011-12-04 | 1 | 15 | 3
(4 rows)
I am not a Vertica user, but if you do not want to use its native support for gap filling, here you can find a more generic SQL-only solution to do so.
If you want to use something like a CTE, how about using a temporary table? Essentially, a CTE is a view for a particular query.
Depending on your needs, you can make the temporary table transaction- or session-scoped.
I'm still curious to know why gap-filling with constant-interpolation wouldn't work here.
Given the complete calendar table, it is doable, though not exactly trivial. Without the calendar table, it would be a lot harder.
Your query needs to be stated moderately precisely, which is usually half the battle in any issue with 'how to write the query'. I think you are looking for:
For each date in Calendar between the minimum and maximum dates represented in T (or other stipulated range),
For each distinct ID represented in T,
Find the metric for the given ID for the most recent record in T on or before the date.
This gives you a complete list of dates with metrics.
You then need to self-join two copies of that list with dates one day apart to form the deltas.
Note that if some ID values don't appear at the start of the date range, they won't show up.
With that as guidance, you should be able to get going, I believe.
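The steps above can be sketched in Python over the sample data from t (a minimal illustration, not Vertica SQL; the delta self-join is folded into a single forward pass per id):

```python
from datetime import date, timedelta

# t: (day, id) -> metric, the sparse input table
t = {(date(2011, 12, 1), 1): 10,
     (date(2011, 12, 3), 1): 12,
     (date(2011, 12, 4), 1): 15}

start, end = min(d for d, _ in t), max(d for d, _ in t)
ids = {i for _, i in t}

t_fill = []  # (day, id, metric, d_metric)
for i in sorted(ids):
    last = None                      # most recent metric seen for this id
    d = start
    while d <= end:
        if (d, i) in t:              # real data point: delta vs previous value
            metric = t[(d, i)]
            t_fill.append((d, i, metric, metric - (last or 0)))
            last = metric
        elif last is not None:       # gap after first appearance: carry value, delta 0
            t_fill.append((d, i, last, 0))
        d += timedelta(days=1)

for row in t_fill:
    print(row)
```

This reproduces t_fill exactly: 2011-12-02 is filled with metric 10 and delta 0, and on the first day the delta equals the metric value. Ids that first appear mid-range simply start on their first date (no backfill), matching the preference stated in the question.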