So I have data like this:
Date EMPLOYEE_ID HEADCOUNT TERMINATIONS
1/31/2011 1 1 0
2/28/2011 1 1 0
3/31/2011 1 1 0
4/30/2011 1 1 0
...
1/31/2012 1 1 0
2/28/2012 1 1 0
3/31/2012 1 1 0
1/31/2011 2 1 0
2/28/2011 2 1 0
3/31/2011 2 1 0
4/30/2011 2 0 1
1/31/2011 3 1 0
2/28/2011 3 1 0
3/31/2011 3 1 0
4/30/2011 3 1 0
...
1/31/2012 3 1 0
2/28/2012 3 1 0
3/31/2012 3 1 0
And I want to sum up the headcount, but each EMPLOYEE_ID should only be counted once in the sum. From the data you can see that employee_id 1 occurs many times in the table, but I only want to add its headcount once. For example, if I rolled up on year, I might get a report using this query:
with member [Measures].[Distinct HeadCount] as
??? how do I define this???
select { [Date].[YEAR].children } on ROWS,
{ [Measures].[Distinct HeadCount] } on COLUMNS
from [someCube]
It would produce this output:
YEAR Distinct HeadCount
2011 3
2012 2
Any ideas how to do this with MDX? Is there a way to control which row is used in the sum for each employee?
You can use an expression like this:
WITH MEMBER [Measures].[Distinct HeadCount] AS
Sum(NonEmpty('the set of the employee ids', 'all the dates of the current year (i.e. [Date].[YEAR].CurrentMember)'), [Measures].[HeadCount])
If you want a more generic expression you can use this:
WITH MEMBER [Measures].[Distinct HeadCount] AS
Sum(NonEmpty('the set of the employee ids',
Descendants(Axis(0).Item(0).Item(0).Hierarchy.CurrentMember, Axis(0).Item(0).Item(0).Hierarchy.CurrentMember.Level, LEAVES)),
IIf(IsLeaf(Axis(0).Item(0).Item(0).Hierarchy.CurrentMember),
[Measures].[HeadCount],
NULL))
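Outside the cube, the rollup the question asks for can be cross-checked with a short pandas sketch (pandas is a swapped-in tool here, not part of the MDX answer). It assumes, which the question does not state, that an employee counts once per year if any of its rows in that year has HEADCOUNT = 1, so it takes the per-employee maximum before summing. The rows are the sample rows listed above without the `...` gaps, with the January rows of employees 2 and 3 read as 1/31/2011 so the totals come out as the expected 3 and 2.

```python
import pandas as pd

# Sample rows from the question (Date, EMPLOYEE_ID, HEADCOUNT), "..." gaps omitted.
# Assumption: the January rows of employees 2 and 3 are dated 1/31/2011.
rows = [
    ("1/31/2011", 1, 1), ("2/28/2011", 1, 1), ("3/31/2011", 1, 1), ("4/30/2011", 1, 1),
    ("1/31/2012", 1, 1), ("2/28/2012", 1, 1), ("3/31/2012", 1, 1),
    ("1/31/2011", 2, 1), ("2/28/2011", 2, 1), ("3/31/2011", 2, 1), ("4/30/2011", 2, 0),
    ("1/31/2011", 3, 1), ("2/28/2011", 3, 1), ("3/31/2011", 3, 1), ("4/30/2011", 3, 1),
    ("1/31/2012", 3, 1), ("2/28/2012", 3, 1), ("3/31/2012", 3, 1),
]
df = pd.DataFrame(rows, columns=["Date", "EMPLOYEE_ID", "HEADCOUNT"])

# Dates are m/d/yyyy strings; the year is the last "/"-separated part.
df["YEAR"] = df["Date"].str.split("/").str[-1].astype(int)

# Collapse each employee to one value per year (max: counted if active in any month),
# then sum those per-employee values for the year.
distinct_headcount = (
    df.groupby(["YEAR", "EMPLOYEE_ID"])["HEADCOUNT"].max()
      .groupby(level="YEAR").sum()
)
print(distinct_headcount)
```

The choice of aggregate is what controls "which row is used": taking each employee's last row of the year instead of the maximum would drop employee 2 from 2011, because its April 2011 headcount is 0.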
I'm seeking some help after spending a lot of time searching without success, so I decided to post this here. I'm rather new to SQL, so any help is greatly appreciated. I've tried a few functions (e.g. GROUP BY, BETWEEN) but can't seem to get it right.
On the PrestoSQL server, I have a table, as shown below, starting with the columns Date, ID and COVID. For each ID, I would like to create a column EverCOVIDBefore that looks back at all past dates of the COVID column to see whether there was ever COVID = 1, as well as another column, COVID_last_2_mth, that checks whether there was COVID = 1 within the past 2 months.
(Highlighted columns are my expected outcomes)
Link to dataset: https://drive.google.com/file/d/1Sc5Olrx9g2A36WnLcCFMU0YTQ3-qWROU/view?usp=sharing
You can do:
select *,
max(covid) over(partition by id order by date) as ever_covid_before,
max(covid) over(partition by id order by date
range between interval '2 month' preceding and current row)
as covid_last_two_months
from t
Result:
date id covid ever_covid_before covid_last_two_months
----------- --- ------ ------------------ ---------------------
2020-01-15 1 0 0 0
2020-02-15 1 0 0 0
2020-03-15 1 1 1 1
2020-04-15 1 0 1 1
2020-05-15 1 0 1 1
2020-06-15 1 0 1 0
2020-01-15 2 0 0 0
2020-02-15 2 1 1 1
2020-03-15 2 0 1 1
2020-04-15 2 0 1 1
2020-05-15 2 0 1 0
2020-06-15 2 1 1 1
See the running example at db<>fiddle.
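As a cross-check of the two window definitions outside the database, here is a pandas sketch (a swapped-in tool, not part of the Presto setup) using the sample rows from the result table above rather than the linked dataset. The running maximum per id gives the "ever" column. For the 2-month column, pandas' offset-based rolling windows require a fixed frequency and calendar months are not fixed, so the sketch loops over rows with `pd.DateOffset(months=2)`, keeping both bounds inclusive to mirror `range between interval '2 month' preceding and current row`. The helper name `max_within_months` is hypothetical.

```python
import pandas as pd

# Sample rows taken from the result table: two ids, one row per month in 2020.
dates = [f"2020-{m:02d}-15" for m in range(1, 7)]
df = pd.DataFrame({
    "date": pd.to_datetime(dates * 2),
    "id": [1] * 6 + [2] * 6,
    "covid": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Running max per id: once covid has been 1, it stays 1.
df["ever_covid_before"] = df.groupby("id")["covid"].cummax()

def max_within_months(frame: pd.DataFrame, months: int) -> list:
    """Max covid over same-id rows dated in [date - months, date], bounds inclusive."""
    out = []
    for _, row in frame.iterrows():
        start = row["date"] - pd.DateOffset(months=months)
        window = frame[(frame["id"] == row["id"])
                       & (frame["date"] >= start)
                       & (frame["date"] <= row["date"])]
        out.append(int(window["covid"].max()))
    return out

df["covid_last_two_months"] = max_within_months(df, months=2)
print(df)
```

The loop is quadratic per id, which is fine for checking a sample but not meant for large tables.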
I have a table whose data I need to unpivot to get aggregated counts.
Source table:
primary_id sys_1 sys_2 sys3_ sy5 sys100
newa889 0 1 0 1 0
den7899 1 1 1 1 0
geo8988 1 1 1 1 0
atla8766 0 1 0 1 1
chic7898 0 1 0 0 1
Desired output:
sys_name count(primary_id) flag_0_or_1
sys_1 129999 0
sys_1 544545 1
sys_2 23333 0
sys_2 23322323 1
sys3_ 332233 0
sys3_ 323232 1
sy5 32332 0
sy5 32323 1
I'm looking to transpose (unpivot) the data and get the counts of 0s and 1s for each sys_ column.
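The question does not say which database is involved, so as an engine-neutral illustration of the unpivot-then-count idea, here is a pandas sketch (an assumption, not the poster's setup) over the five sample rows: `melt` turns each sys column into rows, and grouping by (column name, value) counts the 0s and 1s. The helper name `unpivot_counts` is hypothetical.

```python
import pandas as pd

# The five sample rows from the question.
source = pd.DataFrame({
    "primary_id": ["newa889", "den7899", "geo8988", "atla8766", "chic7898"],
    "sys_1":  [0, 1, 1, 0, 0],
    "sys_2":  [1, 1, 1, 1, 1],
    "sys3_":  [0, 1, 1, 0, 0],
    "sy5":    [1, 1, 1, 1, 0],
    "sys100": [0, 0, 0, 1, 1],
})

def unpivot_counts(df: pd.DataFrame) -> pd.DataFrame:
    # Unpivot: one row per (primary_id, sys column), with the 0/1 value in "flag".
    long = df.melt(id_vars="primary_id", var_name="sys_name", value_name="flag")
    # Count rows (one per primary_id) for every (sys_name, flag) pair.
    return long.groupby(["sys_name", "flag"]).size().reset_index(name="count")

counts = unpivot_counts(source)
print(counts)
```

Pairs that never occur (in this sample, sys_2 has no 0s) simply do not appear in the output; they would need a reindex if a 0-count row is wanted.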
I have a dataframe with ID and date (and the calculated day difference between consecutive rows for the same ID):
ID date day_difference
1 27/06/2019 0
1 28/06/2019 1
1 29/06/2019 1
1 01/07/2019 2
1 02/07/2019 1
1 03/07/2019 1
1 05/07/2019 2
2 27/06/2019 0
2 28/06/2019 1
2 29/06/2019 1
2 01/08/2019 33
2 02/08/2019 1
2 03/08/2019 1
2 04/08/2019 1
I would like to group it by ID and calculate the total duration, with one condition: if the day difference is bigger than 30 days, reuse that ID but create a new group that starts counting the duration again from that day, after the 30-day gap.
Desired result
ID Duration
1 8
2 3
2 4
Thanks.
You can do:
(df.groupby(['ID', df.day_difference.gt(30).cumsum()])
.agg(ID=('ID','first'), Duration=('ID','count'))
.reset_index(drop=True)
)
Output:
ID Duration
0 1 7
1 2 3
2 2 4
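A self-contained version of the same grouping trick, with the sample frame built inline. It swaps the answer's named aggregation for `size()` but keeps the key idea: `gt(30).cumsum()` bumps a counter at every gap over 30 days, so each (ID, block) pair identifies one group. Like the answer, Duration here is the number of rows in the group; the helper column name `block` is illustrative.

```python
import pandas as pd

# Sample frame from the question (dates kept as dd/mm/yyyy strings; only the gaps matter here).
df = pd.DataFrame({
    "ID": [1] * 7 + [2] * 7,
    "date": ["27/06/2019", "28/06/2019", "29/06/2019", "01/07/2019", "02/07/2019",
             "03/07/2019", "05/07/2019", "27/06/2019", "28/06/2019", "29/06/2019",
             "01/08/2019", "02/08/2019", "03/08/2019", "04/08/2019"],
    "day_difference": [0, 1, 1, 2, 1, 1, 2, 0, 1, 1, 33, 1, 1, 1],
})

# Every gap over 30 days increments the running counter, so rows after a gap get a new block number.
block = df["day_difference"].gt(30).cumsum().rename("block")

# One output row per (ID, block); Duration is the row count of that group.
result = (
    df.groupby(["ID", block])
      .size()
      .reset_index(name="Duration")
      .drop(columns="block")
)
print(result)
```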
I have a view with the columns:
WeekNo, MerchantId, Transactions
Let's say that a SELECT query returns the following results:
TrnWeek AgencyId WeeklyTrn
1 110008 1
2 110008 2
3 110008 2
1 110045 4
3 110065 4
3 110124 1
1 110153 1
1 110155 3
2 110163 1
2 110165 1
Making a pivot (with a stored procedure that creates the columns dynamically), I get TrnWeek as columns, with the following result:
[1] [2] [3]
1 1 1
1 0 0
1 0 0
1 0 0
0 1 1
0 1 0
0 0 1
What I want to get is a "matrix" as follows:
TrnWeek 1 2 3
1 4 1 1
2 0 2 1
3 0 0 1
in which I calculate how many merchants performed a transaction in the first week (position 1,1), how many of them also performed a transaction in the second week (position 1,2), how many performed their first transaction in the second week (position 2,2), etc.
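The matrix is a cohort table: each merchant belongs to the row of the week of its first transaction, and each cell counts how many merchants of that cohort were active in a given week. A pandas sketch of that logic (pandas stands in for the SQL Server procedure here, purely as an illustration), taking the pivoted 0/1 rows above as input, reproduces the desired matrix:

```python
import pandas as pd

# Pivoted 0/1 rows from the question: one row per merchant, one column per TrnWeek.
activity = pd.DataFrame(
    [[1, 1, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1]],
    columns=[1, 2, 3],
)
# Merchants with no transaction at all would have no first week; drop them.
activity = activity[activity.any(axis=1)]

# idxmax returns the first column holding the row maximum, i.e. the first active week.
first_week = activity.idxmax(axis=1)

# Row = week of first transaction, column = week counted; cells sum the 0/1 flags.
# reindex adds an all-zero row for any week in which no merchant started.
matrix = (
    activity.groupby(first_week).sum()
            .reindex(activity.columns, fill_value=0)
            .rename_axis("TrnWeek")
)
print(matrix)
```

In SQL the same shape usually comes from computing each merchant's MIN(week) first and then pivoting a count of active merchants by (first week, week).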
I'd like to update approximately the first X rows in a table, but I always want to update all rows with a matching MyID value at the same time. So if my table has:
MyID Transaction Amount Date Status
1 1 2 02/08/2016 0
1 1 4 02/08/2016 0
2 4 1 02/08/2016 0
2 3 2 02/08/2016 0
3 10 1 02/08/2016 0
3 6 4 02/08/2016 0
I want to update Status to 1 on approximately the first 5 rows, but I don't want to split up rows with matching MyID values. How can I do that? Updating either the first 4 or the first 6 rows would be fine in this example.
Here is one method:
update t
set status = 1
where myId in (select top 5 MyId from t order by MyId);
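The query works in two steps: the subquery collects the MyID values found in the first 5 rows (ordered by MyId), and the outer `where myId in (...)` then updates every row carrying one of those IDs, so a group is never split. For the sample table the first 5 rows contain IDs 1, 2 and 3, so all 6 rows are updated. A pandas sketch of the same logic (an illustration only; `mark_first_groups` is a hypothetical helper) makes the effect of the row limit easy to try:

```python
import pandas as pd

# Minimal copy of the sample table (Date omitted; it is the same on every row).
table = pd.DataFrame({
    "MyID": [1, 1, 2, 2, 3, 3],
    "Transaction": [1, 1, 4, 3, 10, 6],
    "Amount": [2, 4, 1, 2, 1, 4],
    "Status": [0, 0, 0, 0, 0, 0],
})

def mark_first_groups(df: pd.DataFrame, n: int, key: str = "MyID", col: str = "Status") -> pd.DataFrame:
    out = df.copy()
    # IDs that occur in the first n rows ordered by key (like SELECT TOP n key ... ORDER BY key).
    first_ids = out.sort_values(key, kind="stable").head(n)[key].unique()
    # Update every row carrying one of those IDs, so no ID is split.
    out.loc[out[key].isin(first_ids), col] = 1
    return out

print(mark_first_groups(table, 5))
```

With `n=3` the first rows contain IDs 1 and 2, so 4 rows are updated; the result always rounds up to the end of the last ID touched.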