Get the SUM of a value partitioned by a column as a measure

I have data like this:
date        serial  val
2021-08-17  A       1
2021-08-17  B       0
2021-08-18  A       0
2021-08-18  B       1
2021-08-19  A       1
2021-08-19  B       1
What I want is:
date        serial  val  sum
2021-08-17  A       1    1
2021-08-17  B       0    0
2021-08-18  A       0    1
2021-08-18  B       1    1
2021-08-19  A       1    2
2021-08-19  B       1    2
I found some examples that do this with a calculated column:
https://community.powerbi.com/t5/Desktop/Sum-over-partition-by-order-by-in-DAX/td-p/215856
https://community.powerbi.com/t5/Desktop/Sum-over-partition-by/m-p/442516
However, as far as I know a calculated column does not react to slicers, so I need to implement it as a measure.
As an example, if I move the time slicer to 2021-08-18, the outcome should be:
date        serial  val  sum
2021-08-18  A       0    0
2021-08-18  B       1    1
2021-08-19  A       1    1
2021-08-19  B       1    2
Is this somehow possible with a dynamic measure?

You can achieve this with the following measure:
Dynamic Running Total =
CALCULATE(
    SUM('Table 1'[val]),
    FILTER(
        ALLSELECTED('Table 1'[date]),
        ISONORAFTER('Table 1'[date], MAX('Table 1'[date]), DESC)
    )
)
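To see what this measure is expected to return, the following Python sketch mimics the logic outside Power BI. It assumes the visual supplies one (date, serial) pair per row and that the slicer simply restricts the set of visible dates. The function name `running_total` and the `selected_from` parameter are hypothetical and only illustrate the idea; this is not how the DAX engine evaluates the expression.

```python
from datetime import date

# Sample rows from the question: (date, serial, val)
rows = [
    (date(2021, 8, 17), "A", 1),
    (date(2021, 8, 17), "B", 0),
    (date(2021, 8, 18), "A", 0),
    (date(2021, 8, 18), "B", 1),
    (date(2021, 8, 19), "A", 1),
    (date(2021, 8, 19), "B", 1),
]


def running_total(rows, selected_from=None):
    """Return (date, serial, val, sum) tuples for the dates kept by the slicer.

    selected_from is a hypothetical stand-in for the slicer's lower bound,
    i.e. the set of dates that ALLSELECTED would expose to the measure.
    """
    visible = [r for r in rows if selected_from is None or r[0] >= selected_from]
    result = []
    for d, serial, val in visible:
        # Sum val for the same serial over every visible date up to and including d.
        total = sum(v for d2, s2, v in visible if s2 == serial and d2 <= d)
        result.append((d, serial, val, total))
    return result


full = running_total(rows)
sliced = running_total(rows, selected_from=date(2021, 8, 18))
```

With no slicer the `sum` column matches the first desired table in the question; with the slicer starting at 2021-08-18 the totals restart from that date, as in the second table.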

Related

Query to get First Value and Second value with Filter

I have the following need but I am not able to get an effective query:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
1   2021-10-15  28      3       R     2021-10-15  2021-10-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
3   2021-12-15  30      3       R     2021-12-15  2021-12-15
4   2022-01-15  31      3       R     2022-01-15  2022-01-15
5   2022-02-15  32      3       R     2022-02-15  2022-02-15
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
7   2022-04-15  34      0       R     1900-01-01  2022-04-15
8   2022-05-15  35      0       R     1900-01-01  2022-05-15
9   2022-06-15  36      0       R     1900-01-01  2022-06-15
10  2022-07-15  37      3       R     2022-07-15  2022-07-15
With the data in the table above, I need the following result:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
I need to list the first row with STATUS = 0 that appears after a row with STATUS = 3, and then the next time this happens after another STATUS = 3 row, ordered from the most recent date to the oldest. In this case, 2022-03-15 is the most recent date and 2021-11-15 is the older date on which a STATUS = 0 row appears right after a STATUS = 3 row.
My query only works for finding STATUS = 3, but I need the same behaviour for STATUS = 0:
with TopDates as
(
    select row_number() over (order by DT desc) as Row, *
    from DBO.TABLE
    WHERE DT < GETDATE()
      AND DT_PAY <> '1900-01-01'
      AND STATUS = '3'
)
select
    TB.ID
    ,TB.DATE
    ,TB.PARCEL
    ,TB.STATUS
    ,TB.DT_PAY
    ,TB.DT
from TopDates TB
where Row <= 2

Just add an OR clause in there?
Or am I not understanding you correctly?
with TopDates as
(
    select row_number() over (order by DT desc) as Row, *
    from DBO.TABLE
    -- AND binds tighter than OR; the parentheses only make that grouping explicit
    WHERE (DT < GETDATE()
           AND DT_PAY <> '1900-01-01'
           AND STATUS = '3')
       OR STATUS = '0'
)
select
    TB.ID
    ,TB.DATE
    ,TB.PARCEL
    ,TB.STATUS
    ,TB.DT_PAY
    ,TB.DT
from TopDates TB
where Row <= 2
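On the sample data, a plain OR keeps every STATUS = 0 row, so the two most recent rows are not the expected IDs 6 and 2. As a hedged illustration of the rule stated in the question (a STATUS = 0 row whose immediately preceding row, ordered by DT, has STATUS = 3), here is a minimal Python sketch over the sample data. The function name `first_zero_after_three` is hypothetical; in SQL the same comparison against the previous row is typically written with the LAG window function, where the database supports it.

```python
# Sample rows from the question: (ID, DT, STATUS); other columns omitted for brevity.
rows = [
    (1, "2021-10-15", 3),
    (2, "2021-11-15", 0),
    (3, "2021-12-15", 3),
    (4, "2022-01-15", 3),
    (5, "2022-02-15", 3),
    (6, "2022-03-15", 0),
    (7, "2022-04-15", 0),
    (8, "2022-05-15", 0),
    (9, "2022-06-15", 0),
    (10, "2022-07-15", 3),
]


def first_zero_after_three(rows, limit=2):
    """Return up to `limit` STATUS 0 rows whose previous row (by DT) has STATUS 3,
    most recent first."""
    ordered = sorted(rows, key=lambda r: r[1])  # ISO dates sort correctly as strings
    hits = [
        cur
        for prev, cur in zip(ordered, ordered[1:])
        if prev[2] == 3 and cur[2] == 0
    ]
    hits.sort(key=lambda r: r[1], reverse=True)
    return hits[:limit]


result = first_zero_after_three(rows)
```

On this data `result` contains the rows with ID 6 and ID 2, in that order, which is the result the question asks for.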

Period and Quarter Sequence

I'm trying to find a way to build a sequence for date periods and quarters (not sure if this is the correct term).
Basically this will help people to navigate dates based on weeks, periods, and quarters once I join this to our sales data. For example, if I just want to know the sales from last week, I could just use WHERE WeekSequence = -1... Another example is, a manager wants to get the sales data for the past quarter, I could just use WHERE QuarterSequence = -1... something like that.
My current table:
WeekStartDate WeekEndDate CurrentWeek Period Quarter WeekSequence
----------------------------------------------------------------------
2020-08-03 2020-08-09 0 2 1 -5
2020-08-10 2020-08-16 0 2 1 -4
2020-08-17 2020-08-23 0 2 1 -3
2020-08-24 2020-08-30 0 2 1 -2
2020-08-31 2020-09-06 0 2 1 -1
2020-09-07 2020-09-13 1 3 1 0
2020-09-14 2020-09-20 0 3 1 1
2020-09-21 2020-09-27 0 3 1 2
2020-09-28 2020-10-04 0 3 1 3
2020-10-05 2020-10-11 0 4 2 4
2020-10-12 2020-10-18 0 4 2 5
What I want it to look like (highlighted):

If I understand correctly, just use window functions:
select t.*,
       (period -
        max(case when currentweek = 1 then period end) over ()
       ) as periodsequence,
       (quarter -
        max(case when currentweek = 1 then quarter end) over ()
       ) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
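The arithmetic behind the window functions can be checked with a small Python sketch on the sample table. It assumes exactly one row has CurrentWeek = 1 and that Period and Quarter numbers do not wrap around within the table (for example at a year boundary); the function name `add_sequences` is hypothetical.

```python
# Sample rows from the question: (WeekStartDate, CurrentWeek, Period, Quarter)
weeks = [
    ("2020-08-03", 0, 2, 1),
    ("2020-08-10", 0, 2, 1),
    ("2020-08-17", 0, 2, 1),
    ("2020-08-24", 0, 2, 1),
    ("2020-08-31", 0, 2, 1),
    ("2020-09-07", 1, 3, 1),
    ("2020-09-14", 0, 3, 1),
    ("2020-09-21", 0, 3, 1),
    ("2020-09-28", 0, 3, 1),
    ("2020-10-05", 0, 4, 2),
    ("2020-10-12", 0, 4, 2),
]


def add_sequences(weeks):
    """Append (PeriodSequence, QuarterSequence) relative to the current week."""
    # Equivalent of max(case when currentweek = 1 then ... end) over ()
    current_period = max(p for _, cur, p, _ in weeks if cur == 1)
    current_quarter = max(q for _, cur, _, q in weeks if cur == 1)
    return [
        (start, cur, p, q, p - current_period, q - current_quarter)
        for start, cur, p, q in weeks
    ]


sequenced = add_sequences(weeks)
```

Rows in the current period get 0, earlier periods get negative numbers and later ones positive, so a filter such as PeriodSequence = -1 selects the previous period.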

Calculate day's difference between successive pandas dataframe rows with condition

I have a dataframe as following:
Company Date relTweet GaplastRel
XYZ 3/2/2020 1
XYZ 3/3/2020 1
XYZ 3/4/2020 1
XYZ 3/5/2020 1
XYZ 3/5/2020 0
XYZ 3/6/2020 1
XYZ 3/8/2020 1
ABC 3/9/2020 0
ABC 3/10/2020 1
ABC 3/11/2020 0
ABC 3/12/2020 1
The relTweet displays whether the tweet is relevant (1) or not (0).
I need to find the difference in days (GaplastRel) between successive rows for each company, with the condition that the previous tweet must be a relevant tweet (i.e. relTweet = 1). E.g. for the first record GaplastRel should be 0. For the 2nd record, GaplastRel should be 1, as the last relevant tweet was made one day ago.
Below is the example of needed output:
Company Date relTweet GaplastRel
XYZ 3/2/2020 1 0
XYZ 3/3/2020 1 1
XYZ 3/4/2020 1 1
XYZ 3/5/2020 1 1
XYZ 3/5/2020 0 1
XYZ 3/6/2020 1 1
XYZ 3/8/2020 1 2
ABC 3/9/2020 0 0
ABC 3/10/2020 1 0
ABC 3/11/2020 0 1
ABC 3/12/2020 1 2
Following is my code:
import numpy as np
import pandas as pd
dataDf['Date'] = pd.to_datetime(dataDf['Date'], format='%m/%d/%Y')
dataDf['GaplastRel'] = (dataDf.groupby('Company', group_keys=False)
                        .apply(lambda g: g['Date'].diff().replace(0, np.nan).ffill()))
This code gives the difference in days between successive rows for each company without considering the relTweet = 1 condition. I am not sure how to apply the condition.
Following is the output of the above code:
Company Date relTweet GaplastRel
XYZ 3/2/2020 1 NaT
XYZ 3/3/2020 1 1 days
XYZ 3/4/2020 1 1 days
XYZ 3/5/2020 1 1 days
XYZ 3/5/2020 0 0 days
XYZ 3/6/2020 1 1 days
XYZ 3/8/2020 1 2 days
ABC 3/9/2020 0 NaT
ABC 3/10/2020 1 1 days
ABC 3/11/2020 0 1 days
ABC 3/12/2020 1 1 days

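Think about it differently: sometimes merge_asof is a better fit than groupby.
# Relevant tweets only; these are the candidate "previous relevant tweet" rows
df1 = df.loc[df['relTweet'] == 1, ['Company', 'Date']]
# For each row, attach the latest relevant tweet date strictly before it, per company
# (merge_asof requires both frames to be sorted by Date)
df = pd.merge_asof(df, df1.assign(Date1=df1.Date), by='Company', on='Date', allow_exact_matches=False)
# Gap in days; rows with no earlier relevant tweet get 0
df['GaplastRel'] = (df.Date - df.Date1).dt.days.fillna(0)
df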
Out[31]:
Company Date relTweet Date1 GaplastRel
0 XYZ 2020-03-02 1 NaT 0.0
1 XYZ 2020-03-03 1 2020-03-02 1.0
2 XYZ 2020-03-04 1 2020-03-03 1.0
3 XYZ 2020-03-05 1 2020-03-04 1.0
4 XYZ 2020-03-05 0 2020-03-04 1.0
5 XYZ 2020-03-06 1 2020-03-05 1.0
6 XYZ 2020-03-08 1 2020-03-06 2.0
7 ABC 2020-03-09 0 NaT 0.0
8 ABC 2020-03-10 1 NaT 0.0
9 ABC 2020-03-11 0 2020-03-10 1.0
10 ABC 2020-03-12 1 2020-03-10 2.0
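To make the matching rule explicit, here is a plain-Python sketch of the same lookup without pandas: for each row, find the latest relevant tweet of the same company dated strictly before it (the effect of allow_exact_matches=False) and fall back to 0 when there is none. The function name `gap_since_last_relevant` is hypothetical, and the quadratic scan is only meant for illustration on small data.

```python
from datetime import date

# Sample rows from the question: (Company, Date, relTweet)
tweets = [
    ("XYZ", date(2020, 3, 2), 1),
    ("XYZ", date(2020, 3, 3), 1),
    ("XYZ", date(2020, 3, 4), 1),
    ("XYZ", date(2020, 3, 5), 1),
    ("XYZ", date(2020, 3, 5), 0),
    ("XYZ", date(2020, 3, 6), 1),
    ("XYZ", date(2020, 3, 8), 1),
    ("ABC", date(2020, 3, 9), 0),
    ("ABC", date(2020, 3, 10), 1),
    ("ABC", date(2020, 3, 11), 0),
    ("ABC", date(2020, 3, 12), 1),
]


def gap_since_last_relevant(tweets):
    """Days since the latest relevant tweet strictly before each row, else 0."""
    gaps = []
    for company, day, _ in tweets:
        # Dates of relevant tweets from the same company that are strictly earlier.
        earlier = [d for c, d, rel in tweets if c == company and rel == 1 and d < day]
        gaps.append((day - max(earlier)).days if earlier else 0)
    return gaps


gaps = gap_since_last_relevant(tweets)
```

The resulting list lines up with the GaplastRel column in the output above, including the last ABC row, whose gap is measured from 2020-03-10 because the tweet on 2020-03-11 is not relevant.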

Transposing SQLite rows and columns with average per hour

I have a table in SQLite called param_vals_breaches that looks like the following:
id param queue date_time param_val breach_count
1 c a 2013-01-01 00:00:00 188 7
2 c b 2013-01-01 00:00:00 156 8
3 c c 2013-01-01 00:00:00 100 2
4 d a 2013-01-01 00:00:00 657 0
5 d b 2013-01-01 00:00:00 23 6
6 d c 2013-01-01 00:00:00 230 12
7 c a 2013-01-01 01:00:00 100 0
8 c b 2013-01-01 01:00:00 143 9
9 c c 2013-01-01 01:00:00 12 2
10 d a 2013-01-01 01:00:00 0 1
11 d b 2013-01-01 01:00:00 29 5
12 d c 2013-01-01 01:00:00 22 14
13 c a 2013-01-01 02:00:00 188 7
14 c b 2013-01-01 02:00:00 156 8
15 c c 2013-01-01 02:00:00 100 2
16 d a 2013-01-01 02:00:00 657 0
17 d b 2013-01-01 02:00:00 23 6
18 d c 2013-01-01 02:00:00 230 12
I want to write a query that will show me a particular queue (e.g. "a") with the average param_val and breach_count for each param on an hour by hour basis. So transposing the data to get something that looks like this:
Results for Queue A
Hour 0 Hour 0 Hour 1 Hour 1 Hour 2 Hour 2
param avg_param_val avg_breach_count avg_param_val avg_breach_count avg_param_val avg_breach_count
c xxx xxx xxx xxx xxx xxx
d xxx xxx xxx xxx xxx xxx
Is this possible? I'm not sure how to go about it. Thanks!

SQLite does not have a PIVOT function, but you can use an aggregate function with a CASE expression to turn the rows into columns:
select param,
       avg(case when time = '00' then param_val end) AvgHour0Val,
       avg(case when time = '00' then breach_count end) AvgHour0Count,
       avg(case when time = '01' then param_val end) AvgHour1Val,
       avg(case when time = '01' then breach_count end) AvgHour1Count,
       avg(case when time = '02' then param_val end) AvgHour2Val,
       avg(case when time = '02' then breach_count end) AvgHour2Count
from
(
    select param,
           strftime('%H', date_time) time,
           param_val,
           breach_count
    from param_vals_breaches
    where queue = 'a'
) src
group by param;
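The query can be tried end to end with Python's built-in sqlite3 module. The sketch below loads a subset of the sample rows into an in-memory database and runs the same conditional-aggregation pattern; the alias `hour` and the added `order by param` are small assumptions made here for readability and deterministic output, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table param_vals_breaches ("
    "id integer, param text, queue text, date_time text, "
    "param_val integer, breach_count integer)"
)
# Queue 'a' rows from the question plus one queue 'b' row to exercise the filter.
sample = [
    (1, "c", "a", "2013-01-01 00:00:00", 188, 7),
    (2, "c", "b", "2013-01-01 00:00:00", 156, 8),
    (4, "d", "a", "2013-01-01 00:00:00", 657, 0),
    (7, "c", "a", "2013-01-01 01:00:00", 100, 0),
    (10, "d", "a", "2013-01-01 01:00:00", 0, 1),
    (13, "c", "a", "2013-01-01 02:00:00", 188, 7),
    (16, "d", "a", "2013-01-01 02:00:00", 657, 0),
]
conn.executemany("insert into param_vals_breaches values (?, ?, ?, ?, ?, ?)", sample)

query = """
select param,
       avg(case when hour = '00' then param_val end)    as AvgHour0Val,
       avg(case when hour = '00' then breach_count end) as AvgHour0Count,
       avg(case when hour = '01' then param_val end)    as AvgHour1Val,
       avg(case when hour = '01' then breach_count end) as AvgHour1Count,
       avg(case when hour = '02' then param_val end)    as AvgHour2Val,
       avg(case when hour = '02' then breach_count end) as AvgHour2Count
from (
    select param, strftime('%H', date_time) as hour, param_val, breach_count
    from param_vals_breaches
    where queue = ?
) src
group by param
order by param
"""
rows = conn.execute(query, ("a",)).fetchall()
```

Each CASE expression yields NULL for rows outside its hour, and avg ignores NULLs, so every output column averages only the rows of one hour.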

Update a Field/Column based on Current and Previous Record Value

I need help updating a field/column "IsLatest" based on a comparison between the current and previous record. I'm using CTE syntax and can get the current and previous record, but I'm unable to update "IsLatest", which should depend on the "Value" field of the current and previous record.
Example
Current Output
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 1
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 0
2010-01-02 00:00:00.000 1 30 1
Dates Customer Value IsLatest
2010-01-01 00:00:00.000 1 12 0
2010-01-02 00:00:00.000 1 30 0
2010-01-03 00:00:00.000 1 13 1
Expected Final Output
Dates Customer Value ValueSetId IsLatest
2010-01-01 00:00:00.000 1 12 12 0
2010-01-01 00:00:00.000 1 12 13 0
2010-01-01 00:00:00.000 1 12 14 0
2010-01-02 00:00:00.000 1 30 12 0
2010-01-02 00:00:00.000 1 30 13 0
2010-01-02 00:00:00.000 1 30 14 0
2010-01-03 00:00:00.000 1 13 12 0
2010-01-03 00:00:00.000 1 13 13 0
2010-01-03 00:00:00.000 1 13 14 0
2010-01-04 00:00:00.000 1 14 12 0
2010-01-04 00:00:00.000 1 14 13 0
2010-01-04 00:00:00.000 1 14 14 1

;WITH a AS
(
    SELECT
        Dates, Customer, Value,
        row_number() over (partition by customer order by Dates desc, ValueSetId desc) rn
    FROM #Customers
)
SELECT Dates, Customer, Value, case when rn = 1 then 1 else 0 end IsLatest
FROM a
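The effect of the row_number ordering can be reproduced in a few lines of Python on the rows from the expected output. This is only a sketch: the function name `flag_latest` is hypothetical, and unlike row_number it would flag every row of a tie rather than exactly one.

```python
# Rows from the expected output: (Dates, Customer, Value, ValueSetId)
rows = [
    (dates, 1, value, set_id)
    for dates, value in [
        ("2010-01-01", 12),
        ("2010-01-02", 30),
        ("2010-01-03", 13),
        ("2010-01-04", 14),
    ]
    for set_id in (12, 13, 14)
]


def flag_latest(rows):
    """Append IsLatest = 1 to the row with the greatest (Dates, ValueSetId) per customer."""
    latest = {}
    for dates, customer, _, set_id in rows:
        key = (dates, set_id)  # ISO date strings compare chronologically
        if customer not in latest or key > latest[customer]:
            latest[customer] = key
    return [
        (d, c, v, s, 1 if (d, s) == latest[c] else 0)
        for d, c, v, s in rows
    ]


flagged = flag_latest(rows)
```

Only the 2010-01-04 row with ValueSetId 14 ends up with IsLatest = 1, matching the expected final output in the question.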
