DAX: Is the value in one column the same this month as it was last? - powerpivot

I need to create a calculated column. I have a list of items with serial #s, and those items are assigned to someone each month. I need to know (0/1) whether the owner of that item this month is the same as the owner of that item last month. (So I can create a measure to average how many are changing owners month-to-month.)
Basically, I'm trying to achieve the last column:
Month       ItemID  Owner  Same Owner as Prev Mth
2015/01/31  A1      Al
2015/01/31  A2      Bob
2015/01/31  A3      Carl
2015/02/28  A1      Al     1
2015/02/28  A2      Carl   0
2015/02/28  A3      Carl   1
2015/03/31  A1      Bob    0
2015/03/31  A2      Bob    0
2015/03/31  A3      Bob    0
2015/04/30  A1      Bob    1
2015/04/30  A2      Bob    1
2015/04/30  A3      Al     0
I tried a CALCULATE(MAX([Owner]), FILTER(tbl, DATEADD([Month],-1,MONTH)=EARLIER([Month])), FILTER(tbl, [ItemID]=EARLIER([ItemID])))
But Max doesn't work on text fields. So I am kind of stumped. I know this shouldn't be that hard...

Date logic is almost always an issue of modeling rather than clever functions.
You will need a date table with a monotonically incremented integer id for months. I typically refer to this as MonthSequential or MonthIndex depending on the intended audience for the model. This field simply increments by 1 for each month in the date table without wrapping at year boundaries. Thus if the first month in your model is January, 2014, that month will have MonthSequential=1. February, 2014 has MonthSequential=2, and so on to December, 2014 with MonthSequential=12. January, 2015 has MonthSequential=13.
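If your date table doesn't already carry such a column, a minimal sketch of one way to derive it as a calculated column (assuming a DimDate table with a [Date] column; adapt the names to your model):
MonthSequential =
    ( YEAR ( DimDate[Date] ) - YEAR ( MIN ( DimDate[Date] ) ) ) * 12
        + MONTH ( DimDate[Date] )
        - MONTH ( MIN ( DimDate[Date] ) )
        + 1
In a calculated column, MIN ( DimDate[Date] ) sees the whole table, so this anchors the index at 1 for the earliest month and keeps incrementing across year boundaries.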
This allows very simple arithmetic to identify any month or range of months an arbitrary amount of time from the current month. Once you have this field in your date dimension (and your Items[Month] field related to your DimDate[Date] field), life gets pretty easy:
SameOwnerPreviousMonth =
IF (
    CALCULATE (
        VALUES ( Items[Owner] )
        ,FILTER (
            ALLEXCEPT ( Items, Items[ItemID] )
            ,RELATED ( DimDate[MonthSequential] ) =
                EARLIER ( RELATED ( DimDate[MonthSequential] ) ) - 1
        )
    ) = Items[Owner]
    ,1
    ,0
)
There's some funkiness here with row context, which I will explain.
Any calculated column is defined by some formula. That formula is evaluated in the row context of the table. What happens is a row-by-row iteration through the table. The formula you provide is evaluated once per row and that creates the value for that calculated column.
This being said, the storage engine and formula engine behind DAX have no concept of a row ordering. This means that any formula we define for a calculated column must provide its own ordering or reference to another row if we need to do that.
So, what do we do to find the owner in the previous month? Well, we need to look through the entire Items table and find the row which has the same [ItemId] and falls in the month immediately prior to the month on the current row. Our [MonthSequential] makes finding a date in the previous month trivial, and DAX offers many context-manipulating functions to preserve or eliminate context.
Note: I will refer to function arguments positionally, with the first argument to a function indicated by (1).
Let's step through the solution. We'll ignore the IF() because that is trivial. The meat of the formula lies in the CALCULATE() which identifies the [Owner] in the previous month:
CALCULATE (
    VALUES ( Items[Owner] )
    ,FILTER (
        ALLEXCEPT ( Items, Items[ItemID] )
        ,RELATED ( DimDate[MonthSequential] ) =
            EARLIER ( RELATED ( DimDate[MonthSequential] ) ) - 1
    )
)
CALCULATE() evaluates arguments (2)-(n) first, to create a new filter context. That filter context is then used to evaluate the expression in (1).
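For instance, a hedged mini-example against the sample table: CALCULATE ( COUNTROWS ( Items ), Items[Owner] = "Al" ) first narrows the filter context to rows where [Owner] is "Al", and only then evaluates COUNTROWS ( Items ) against that narrowed table.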
FILTER() iterates row-by-row through the table provided in (1) and evaluates the boolean expression in (2) for each row in (1). It returns a table made up of the subset of rows of (1) for which (2) evaluates to true. Since we are already iterating through the entire Items table in evaluating our calculated column, we end up with two sets of row context. The outer row context is the iteration through the whole table. The inner row context is the iteration through (1) of our filter. The outer row context affects the inner, and we must modify/remove select portions of the outer context as needed.
The table we iterate over is ALLEXCEPT(Items, Items[ItemId]). ALLEXCEPT() strips out all context, except for the fields named. On any given row in our outer context, we preserve the value of Items[ItemId] and strip all other context ([Month] and [Owner], along with any other fields you've not named in your sample data). This gives us a table for our FILTER() made up of every row in Items which shares the [ItemId] of the current row in the outer filter context. This subset table becomes the generator of our inner row context.
Now we're iterating over FILTER()'s (1), explained above. RELATED() allows us to call out to get a value from another table related to the current one. We grab the [MonthSequential] value that is tied to the current row in our inner row context. We want to find the month that is immediately prior to the current month in the outer row context. To refer to a value in the outer row context, we need to escape the inner.
EARLIER() allows us to escape the current (inner) row context and refer to the last valid (outer) row context. This can happen through arbitrary levels of nesting of contexts. Luckily, we only have two. EARLIER(RELATED(DimDate[MonthSequential])) finds the [MonthSequential] value of the current row in the outer context. We simply subtract 1 from that to get the prior month (and since we're using [MonthSequential], we have no need to implement any logic to handle wrapping around year boundaries).
Thus the context in which we evaluate VALUES(Items[Owner]) is that subset of our Items table where [ItemId] is equal to the current row in our outer row context, and the value of [MonthSequential] is one less than the current row in the outer row context. VALUES() returns the list of values which make up the column reference inside. In this case, since every [ItemId] is associated with only a single [Owner] in any given month, that list is only a single value which can be implicitly cast to a scalar value and represented in our calculated column.
Our IF() simply tests this [Owner] value against that of the current row in the outer row context and returns a 1 or 0 as appropriate.
This will break if you have a single [ItemId] which has multiple distinct [Owner]s in a given month.
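If your data can hit that case, one hedged workaround (an untested sketch that simply repeats the same FILTER) is to count the distinct prior-month owners first and only compare when there is exactly one:
SameOwnerPreviousMonth =
IF (
    CALCULATE (
        DISTINCTCOUNT ( Items[Owner] )
        ,FILTER (
            ALLEXCEPT ( Items, Items[ItemID] )
            ,RELATED ( DimDate[MonthSequential] ) =
                EARLIER ( RELATED ( DimDate[MonthSequential] ) ) - 1
        )
    ) = 1
    ,IF (
        CALCULATE (
            VALUES ( Items[Owner] )
            ,FILTER (
                ALLEXCEPT ( Items, Items[ItemID] )
                ,RELATED ( DimDate[MonthSequential] ) =
                    EARLIER ( RELATED ( DimDate[MonthSequential] ) ) - 1
            )
        ) = Items[Owner]
        ,1
        ,0
    )
    ,0
)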
Model diagram: (image not available)

Related

How to concatenate hidden cube dimension and measure with an MDX query?

I have a cube with three (relevant) dimensions (quarter, element and qualifier). The measure of interest is score, which is numeric.
The qualifier dimension is sparse, i.e. for each unique combination of quarter and element, the measure score is reported only for one qualifier, for the other members of the qualifier dimension the measure score is blank. Which qualifier is 'active' entirely depends on the quarter and element.
I want to build a table with quarter members as columns and element members as rows. The cell values should be the concatenation of the score measure and the name of that qualifier member (string) for which the score at the relevant intersection of quarter, element and qualifier is not blank.
To make matters more complex, one of the member names of the qualifier needs to be replaced with blanks in the table. There are four distinct members, which should be renamed in the table as follows: the member names '+', '-', '' stay as they are, while the name 'No Qualifier' should become blank, i.e. ''.
Below is an example of the structure I would like to get (note that there should be no '+' or '-' in case the corresponding score is reported in either the qualifier members '' or 'No Qualifier'):
           2021Q4  2020Q4  2019Q4
Element 1  3       2+      2
Element 2  1       1       1
Element 3  2       3       2+
Element 4  2-      2-      3
I suppose I would need to create a calculated member, but so far I can only get the concatenation and replacement to work with the CurrentMember function if I include the qualifier dimension explicitly on either rows or columns, which is not a feasible output structure. Also, changing the underlying cube is not possible (it is maintained by IT).
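The rough shape I have in mind is something like this (dimension, hierarchy, and measure names are simplified guesses at my cube's structure, and I am not sure it is valid MDX):
WITH MEMBER [Measures].[Score With Qualifier] AS
    IIF(
        IsEmpty([Measures].[Score]),
        NULL,
        CStr([Measures].[Score]) +
        IIF(
            // pick the one qualifier whose score is not blank at this intersection
            Filter(
                [Qualifier].[Qualifier].[Qualifier].Members,
                NOT IsEmpty([Measures].[Score])
            ).Item(0).Name = "No Qualifier",
            "",   // 'No Qualifier' should be rendered as blank
            Filter(
                [Qualifier].[Qualifier].[Qualifier].Members,
                NOT IsEmpty([Measures].[Score])
            ).Item(0).Name
        )
    )
SELECT
    {[Quarter].[Quarter].Members} ON COLUMNS,
    {[Element].[Element].Members} ON ROWS
FROM [MyCube]
WHERE ([Measures].[Score With Qualifier])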
I am new to cubes and MDX and have already spent two days trying to figure this out by myself. Since this is a work project, I am starting to panic. Any help would really be appreciated!

Finding the last 4, 3, 2, 1 months consecutive order drops among clients based on drop variance

Here I have a query that finds the drop percentage for a bunch of clients based on the orders they have received (i.e., it finds the percentage difference in orders between the current month and the previous month). What I want to achieve is a field where I can see the clients who had a 4-month continuous drop, a 3-month drop, a 2-month drop, and a 1-month drop.
I know it can only be achieved by comparing the last 4 months using the LAG function or subqueries. Can you please help me out with this one? I would appreciate it very much.
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from 
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
 year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders) 
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
    d.id, d.customers 
     from configer d 
    ) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1' 
GROUP BY Month1, sa.customers, c.id,  year1, 
     c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here as there are several issues with the syntax before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter: DATE_PART('mon', current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG has three parameters. (1) The column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row you should retrieve a value from according to the partition and order context and (3) the value the function should return as a default (in case of the first row in the partition). As such, you don't need NULLIF. So, to get the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as a default in the calculation of what is currently aliased variance, you will almost certainly run into a div/0 error either now or, worse, when you least expect it in the future. You should protect against that with some CASE logic or, better, provide a more appropriate default value or, even better, calculate the LAG with the default 0, then filter out the 0 rows before doing the calculation.
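For example, a sketch of the CASE-style guard (prev_orders is a hypothetical alias for the LAG result, computed in an inner query so it can be referenced by name):
CASE
    WHEN prev_orders = 0 THEN NULL  -- first month for the customer: no meaningful variance
    ELSE (orders - prev_orders)::NUMERIC(10,2) / prev_orders
END AS pct_change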
Depending on the engine, you may not be able to use column aliases in the GROUP BY. The safe approach is to reference each field that is not participating in an aggregate, whether through direct mention (sa.date) or via the originating expression (DATE_PART('mon',sa.date))
Your date should be '2021-04-01'
All in all, without sample data, expected results for that sample data, and a query free of syntax errors, it is a tall order to offer advice on the problem that is any more specific than:
Build the source of the calculation as a completely separate query first. Calculate the LAG in that source query. Only when you've run that source query and verified that the LAG is producing the correct result should you then wrap it as a sub-query or CTE (which Redshift does support), at which point you can filter out the rows with a zero in the denominator (the first month of orders for each customer).
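A minimal sketch of that structure, reusing the posted table and column names (illustrative only; "order" is quoted because it is a reserved word):
WITH monthly AS (
    SELECT
        sa.customers,
        DATE_PART('year', sa.date) AS year1,
        DATE_PART('mon', sa.date)  AS month1,
        SUM(sa."order")            AS orders
    FROM stats sa
    WHERE sa.date >= '2021-04-01'
    GROUP BY sa.customers, DATE_PART('year', sa.date), DATE_PART('mon', sa.date)
),
with_lag AS (
    SELECT
        monthly.*,
        LAG(orders, 1, 0) OVER (PARTITION BY customers
                                ORDER BY year1, month1) AS prev_orders
    FROM monthly
)
SELECT
    customers,
    year1,
    month1,
    orders,
    (orders - prev_orders)::NUMERIC(10,2) / prev_orders AS pct_change
FROM with_lag
WHERE prev_orders <> 0;  -- drop each customer's first month before dividing
Run the inner CTEs on their own first and eyeball prev_orders before trusting the final division.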
Good luck!

Modify Postgres query to use generate_series for overall summation over each of several consecutive range intervals

I'm still quite new with SQL, coming from an ORM-centric environment, so please be patient with me.
Provided with a table in the form of:
CREATE TABLE event (id int, order_dates tsrange, flow int);
INSERT INTO event VALUES
(1,'[2021-09-01 10:55:01,2021-09-04 15:16:01)',50),
(2,'[2021-08-15 20:14:27,2021-08-18 22:19:27)',36),
(3,'[2021-08-03 12:51:47,2021-08-05 11:28:47)',41),
(4,'[2021-08-17 09:14:30,2021-08-20 13:57:30)',29),
(5,'[2021-08-02 20:29:07,2021-08-04 19:19:07)',27),
(6,'[2021-08-26 02:01:13,2021-08-26 08:01:13)',39),
(7,'[2021-08-25 23:03:25,2021-08-27 03:22:25)',10),
(8,'[2021-08-12 23:40:24,2021-08-15 08:32:24)',26),
(9,'[2021-08-24 17:19:59,2021-08-29 00:48:59)',5),
(10,'[2021-09-01 02:01:17,2021-09-02 12:31:17)',48); -- etc
the query below does the following:
(here, 'the range' is from 2021-08-03T00:00:00 to 2021-08-04T00:00:00)
For each event that overlaps with the range
Trim the Lower and Upper timestamp values of order_dates to the bounds of the range
Multiply the remaining duration of each applicable event by the event.flow value
Sum all of the multiplied values for a final single value output
Basically, I get all of the events that overlap the range, but only calculate the total value based on the portion of each event that is within the range.
SELECT SUM("total_value")
FROM
(SELECT (EXTRACT(epoch
FROM (LEAST(UPPER("event"."order_dates"), '2021-08-04T00:00:00'::timestamp) - GREATEST(LOWER("event"."order_dates"), '2021-08-03T00:00:00'::timestamp)))::INTEGER * "event"."flow") AS "total_value"
FROM "event"
WHERE "event"."order_dates" && tsrange('2021-08-03T00:00:00'::timestamp, '2021-08-04T00:00:00'::timestamp, '[)')
GROUP BY "event"."id",
GREATEST(LOWER("event"."order_dates"), '2021-08-03T00:00:00'::timestamp),
LEAST(UPPER("event"."order_dates"), '2021-08-04T00:00:00'::timestamp),
EXTRACT(epoch
FROM (LEAST(UPPER("event"."order_dates"), '2021-08-04T00:00:00'::timestamp) - GREATEST(LOWER("event"."order_dates"), '2021-08-03T00:00:00'::timestamp)))::INTEGER, (EXTRACT(epoch
FROM (LEAST(UPPER("event"."order_dates"), '2021-08-04T00:00:00'::timestamp) - GREATEST(LOWER("event"."order_dates"), '2021-08-03T00:00:00'::timestamp)))::INTEGER * "event"."flow")) subquery
The DB<>Fiddle demonstrating this: https://www.db-fiddle.com/f/jMBtKKRS33Qf2FEoY5EdPA/1
This query started out as a complex set of django annotations and aggregation, and I have simplified it to remove the parts not necessary for this question.
So with the above I get a single total value over the input range (in this case a 1-day range).
But I want to be able to use generate_series to perform this same overall summation for each of several consecutive range intervals,
e.g.: query for the total during each of the following ranges:
['2021-08-01T00:00:00', '2021-08-02T00:00:00')
['2021-08-02T00:00:00', '2021-08-03T00:00:00')
['2021-08-03T00:00:00', '2021-08-04T00:00:00')
['2021-08-04T00:00:00', '2021-08-05T00:00:00')
This is somewhat related to my previous question here, but since the timestamps for the queried range are used in so many places within the query, I'm pretty lost for how to do this.
Any help/direction will be appreciated.
This should get you started: https://www.db-fiddle.com/f/qm4F7qqWZMrtXtMejimVJr/1.
Basically what I did was to prepare the ranges with a CTE up front, then select from that table expression with a CROSS JOIN LATERAL of your original query. Next, I replaced all occurrences of the 2021-08-03 timestamp literal with lower(target_range) and of the 2021-08-04 literal with upper(target_range), then added a GROUP BY of target_range. Note that only those ranges that overlap at least one row in the input will appear in the output; change the cross join to a LEFT JOIN to always see your input ranges in the output, even if the value is null. (If so, ON TRUE is fine for the join condition, since the filtering already happens in the WHERE of the inner subquery.)
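In outline, the reworked query looks something like this (a sketch in the same spirit as the fiddle, not its exact contents; the day range is illustrative):
WITH ranges AS (
    SELECT tsrange(ts, ts + interval '1 day', '[)') AS target_range
    FROM generate_series('2021-08-01T00:00:00'::timestamp,
                         '2021-08-04T00:00:00'::timestamp,
                         interval '1 day') AS g(ts)
)
SELECT
    r.target_range,
    SUM(EXTRACT(epoch FROM (
            LEAST(UPPER(e.order_dates), UPPER(r.target_range))
            - GREATEST(LOWER(e.order_dates), LOWER(r.target_range))
        ))::integer * e.flow) AS total_value
FROM ranges r
CROSS JOIN LATERAL (
    SELECT *
    FROM event
    WHERE event.order_dates && r.target_range
) e
GROUP BY r.target_range
ORDER BY r.target_range;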

SAP BO - how to get 1/0 distinct values per week in each row

The problem I am trying to solve is getting an SAP Business Objects query to calculate a variable for me, because calculating it in a large Excel file crashes the process.
I have a bunch of columns with daily/weekly data. I would like to get a "1" for the first instance of a Name/Person/certain identifier within a single week and "0" for all the rest.
So for example if item "Glass" was sold 5 times in week 4 in this variable/column first sale will get "1" and next 4 sales will get "0". This will allow me to have the number of distinct items being sold in a particular week.
I am aware there are Count and Count distinct functions in Business Objects, however I would prefer to have this 1/0 system for the entire raw table of data because I am using it as a source for a whole dashboard and there are lots of metrics where distinct will be part/slicer for.
The way I was doing it previously is with this Excel formula: =IF(SUMPRODUCT(($A$2:$A5000=$A2)*($G$2:$G5000=$G2))>1,0,1)
This does the trick: it gives a "1" for the first instance of a value in column G appearing within a certain value range in column A (column A is the week) and gives "0" when the same value reappears for the same week value in column A. It gives "1" again when the week value changes.
Since it compares 2 cells in each row against the entire columns of data, this tends to crash as the data gets bigger.
So far I have been unable to emulate this in Business Objects, and I think I have exhausted my abilities and googling.
Could anyone share their two cents on this please?
Assuming you have an object in the query that uniquely identifies a row, you can do this in a couple of simple steps.
Let's assume your query contains the following objects:
Sale ID
Name
Person
Sale Date
Week #
Price
etc.
You want to show a 1 for the first occurrence of each Name/Week #.
Step 1: Create a variable with the following definition. Let's call it [FirstOne]
=Min([Sale ID]) In ([Name];[Week #])
Step 2: In the report block, add a column with the following formula:
=If [FirstOne] = [Sale ID] Then 1 Else 0
This should produce a 1 in the row that represents the first occurrence of Name within a Week #. If you then wanted to show a 1 on the first occurrence of Name/Person/Week #, you could just modify the [FirstOne] variable accordingly:
=Min([Sale ID]) In ([Name];[Person];[Week #])
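Using the question's "Glass" example (Sale IDs here are illustrative), the two steps behave like this:
Sale ID  Name   Week #  [FirstOne]  Flag
101      Glass  4       101         1
102      Glass  4       101         0
103      Glass  4       101         0
104      Glass  4       101         0
105      Glass  4       101         0
106      Glass  5       106         1
[FirstOne] repeats the minimum Sale ID within each Name/Week # group, so only the row that owns that minimum gets a 1.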
I think you want logic around row_number():
select t.*,
       (case when 1 = row_number() over (partition by name, person, week, identifier
                                         order by ??)
             then 1 else 0
        end) as new_indicator
from t;
Note the ??. SQL tables represent unordered sets. There is no "first" row in a table or group of rows, unless a column specifies that ordering. The ?? is for such a column (perhaps a date/time column, perhaps an id).
If you only want one row to be marked, you can put anything there, such as order by (select null) or order by week.

MS SQL 2000 - How to efficiently walk through a set of previous records and process them in groups. Large table

I'd like to ask about one thing. I have a table in the DB. It has 2 columns and looks like this:
Name  bilance
Jane  +3
Jane  -5
Jane   0
Jane  -8
Jane  -2
Paul  -1
Paul   2
Paul   9
Paul   1
...
I have to walk through this table, and when I find a record with a different "name" (than the previous row), I process all rows with the previous "name". (If I step onto the first Paul row, I process all the Jane rows.)
The processing goes like this:
Now I work only with the Jane records and walk through them one by one. On each record I stop and compare it with all previous Jane rows, one by one.
The task is to summarize the "bilance" column (in the scope of the actual person) where the values have different signs.
Summary:
I loop through this table at 3 levels in parallel (nested loops):
1st level = search for changes of "name" column
2nd level = if change was found, get all rows with previous "name" and walk through them
3rd level = on each row stop and walk through all previous rows with current "name"
Can this be solved only using CURSOR and FETCHING, or is there some smoother solution?
My real table has 30,000 rows and 1,500 people, and if I do the logic in PHP it takes long minutes and then times out. So I would like to rewrite it in MS SQL 2000 (no other DB is allowed). Are cursors a fast solution, or is it better to use something else?
Thank you for your opinions.
UPDATE:
There are lots of questions about my "summarization". The problem is a little more difficult than I explained; I simplified it just to describe my algorithm.
Each row of my table contains many more columns. The most important is month; that is why there are several rows for each person, one per month.
The "bilances" are workers' "working overtime" and "arrear hours", and I need to summarize the + and - bilances to neutralize them using values from previous months. I want to end up with as many zeroes as possible. The table must keep all its rows as they are; only the bilance values change.
Example:
Row (Jane -5) will be netted against row (Jane +3): instead of +3 I will get 0, and instead of -5 I will get -2, because I used the -5 to reduce the +3.
Next row (Jane 0) won't be affected
Next row (Jane -8) can not be used, because all previous bilances are negative
etc.
You can sum all the values per name using a single SQL statement:
select
    name,
    sum(bilance) as bilance_sum
from
    my_table
group by
    name
order by
    name
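For the sample rows above, that returns (a quick manual check):
Jane  -12
Paul   11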
On the face of it, it sounds like this should do what you want:
select Name, sum(bilance)
from table
group by Name
order by Name
If not, you might need to elaborate on how the Names are sorted and what you mean by "summarize".
I'm not sure what you mean by this line... "The task is to sumarize "bilance" column (in the scope of actual person) if they have different signs".
But, it may be possible to use a group by query to get a lot of what you need.
select name,
       case when bilance < 0 then 'negative'
            when bilance >= 0 then 'positive'
       end as sign,
       count(*)
from my_table
group by name,
         case when bilance < 0 then 'negative'
              when bilance >= 0 then 'positive'
         end
That might not be perfect syntax for the case statement, but it should get you really close.
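With the sample data above, that would give (another quick manual check; note that 0 lands in the 'positive' bucket):
Jane  negative  3
Jane  positive  2
Paul  negative  1
Paul  positive  3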