Max value in pandas based on descending windows - pandas

I have some experience with pandas, but I cannot figure out the following.
I have several weeks of timestamped data, with multiple records per day.
I want to add a column that shows, for each record, the maximum value among the remaining records of that day.
So if 5 records remain in a particular day, I need the max of the next 5 records; after that, the max of the next 4 records, and so on.
I have tried groupby, but it does not seem to do the trick.
Can somebody help me out?
exampledata

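This is not the fastest approach, but you can try this:
dt['mvalue'] = dt.sort_values('datetime', ascending=False).groupby('date')['value'].cummax()
It simply takes a cumulative max over the reverse-sorted series within each day. (DataFrame.sort has been removed from pandas; sort_values is its replacement.)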
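For anyone who wants to try it, here is a minimal sketch of the reverse cumulative max on made-up data. The column names datetime, date and value follow the answer; the sample values and the extra next_max column are my own assumptions. Note that the reverse cummax includes the current record; if "remaining" should mean strictly later records, a grouped shift on top is one possible variation.

```python
import pandas as pd

# Made-up sample: two days with a few records each (already sorted by time).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 12:00", "2024-01-01 15:00",
        "2024-01-02 09:00", "2024-01-02 12:00",
    ]),
    "value": [3, 7, 5, 4, 2],
})
df["date"] = df["datetime"].dt.date

# Max of the current and all later records of the same day:
# sort newest-first, take a running max per day, and let pandas
# align the result back to the original rows by index.
df["mvalue"] = (
    df.sort_values("datetime", ascending=False)
      .groupby("date")["value"]
      .cummax()
)

# Optional variation (assumption): max of strictly later records only.
# Relies on df being sorted by datetime ascending; the last record of
# each day has nothing after it and gets NaN.
df["next_max"] = df.groupby("date")["mvalue"].shift(-1)

print(df[["datetime", "value", "mvalue", "next_max"]])
```

The assignment works without re-sorting because the result of cummax keeps the original index labels.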

Related

Postgres SQL count events with minimum and maximum time gap

I have a Postgres database table of 'events' (i.e. rows with a timestamp column). I want to count the number of events that are separated by more than a specified minimum time gap and by less than or equal to a specified maximum time gap.
For example, if there is an event on 6 consecutive days but I specify a minimum time gap of 2 days, I only want to register a count of 1 for those 6 events.
At the same time, if I specify a maximum time gap of 30 days, if two events are 30 days apart I want to register a count of 2 for the pair, but if they are 31 days apart I want to register a count of 0.
The accepted answer to the following post gives a method for counting events and satisfying the 'maximum gap' requirement using the Postgres generate_series function:
Best way to count rows by arbitrary time intervals
Maybe it's possible to modify the suggested solution to also satisfy the 'minimum gap' requirement. Can anyone advise on how I can accomplish this? Thanks.

SQL LAG function

I tried using the LAG function to calculate the value of previous weeks, but there are gaps in the data due to the fact that certain weeks are missing.
This is the table:
The problem is that the LAG function takes the previous week found in the table. But I would like the value to be zero if that row is not the immediately preceding week.
This is what I would like it to be:
I'm open to any solutions.
Thank you in advance
Your example data is baffling. You have multiple rows per time frame. The first column looks like a string, which doesn't really make sense for the comparison.
So, let me answer based on a simpler data model. The answer is to use range. If you had an integer column that specified the time frame:
ordering sales
1 10
2 20
3 30
5 50
Then you would phrase this as:
select max(sales) over (order by ordering range between 1 preceding and 1 preceding)
This would return the value from the "previous" row as defined by the first column. The value would be in a separate column, not a separate row.
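The same idea (look up the value at "ordering minus 1" instead of the physically previous row) can be sketched in pandas. This is only an illustration on the small table above; the column name prev_sales and the fill value of 0 for missing periods are assumptions taken from what the question asks for, and the lookup assumes ordering is unique.

```python
import pandas as pd

# The simplified table from the answer: period 4 is missing.
df = pd.DataFrame({"ordering": [1, 2, 3, 5], "sales": [10, 20, 30, 50]})

# Lookup table from period number to sales (requires unique periods).
by_period = df.set_index("ordering")["sales"]

# For each row, fetch the sales of period (ordering - 1).
# Periods that do not exist come back as NaN and are replaced by 0.
df["prev_sales"] = (df["ordering"] - 1).map(by_period).fillna(0).astype(int)

print(df)
```

Row 5 gets 0 rather than 30 because period 4 does not exist, which is the behaviour a plain LAG (or shift) would not give.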

Power pivot ytd calculation

Ok, I have watched many videos and read all sorts and I think I am nearly there, but must be missing something. In the data model I am trying to add the ytd calc to my product_table. I don't have unique dates in the product_table in column a and also they are weekly dates. I have all data for 2018 for each week of this year in set rows of 20, incrementing by one week every 20 rows. E.g. rows 1-20 are 01/01/2018, rows 21-40 are 07/01/2018, and so on.
Whilst I say they are in set rows of 20, this is only an example. Some weeks have more or fewer than 20, so I can't use the row count function.
Between columns c and h I have a bunch of other categories such as customer age, country etc. so there isn't a unique identifier. Do I need one for this to work? Column i is the sales column with the numbers. What I would like is a new column which gives me a ytd number for each row of data which all has unique criteria between a and h. Week 1 ytd is not going to be any different. For the next 20 rows I want it to add week1 sales to week2 sales, effectively giving me the ytd.
I could sumproduct this easily in the data set, but I don't want to do that. I want to use DAX to save space, etc.
I have a date_table which does have unique dates in the main_date column. All my date columns are formatted as date in the data model.
I have tried:
=calculate(products[sales],datesytd(date_table[main_date]))
This simply replicates the numbers in the sales column rather than giving me a YTD as required. I also tried:
=calculate(sum(products[sales]) ,datesytd(date_table[main_date]))
I don't know if what I am trying to do is possible. All the youtube clips don't seem to have the same issues I am having but I think they have unique dates in their data sets.
I'd love to upload the data, but it's work stuff on a work computer, so I can't really. I hope I've painted the picture clearly.
Resolved. After googling "sumif dax", I found a response from Mike Honey that I adapted to get what I need. I needed to add the filter and earlier functions to my equation, and it ended up like this:
calculate (sum(products[sales]),
filter (sales, sales[we_date] <= earlier(sales[we_date])),
filter (sales, sales[year] = earlier(sales[year])),
filter (sales, sales[customer] = earlier(sales[customer])))
There are three other filter sections I had to add, but this now gives me the YTD I needed.
I hope this helps anyone else.
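For comparison, the logic of that formula (sum every row of the same year and customer whose week-ending date is on or before the current row's) can be sketched in pandas. This is not DAX and not the poster's actual model: the column names we_date, year, customer and sales are borrowed from the formula, the data is made up, and only one of the matching criteria (customer) is included.

```python
import pandas as pd

# Made-up weekly data; week-ending dates repeat across rows.
sales = pd.DataFrame({
    "we_date": pd.to_datetime([
        "2018-01-01", "2018-01-01",
        "2018-01-08", "2018-01-08", "2018-01-08",
        "2018-01-15",
    ]),
    "customer": ["A", "B", "A", "A", "B", "A"],
    "sales": [10, 5, 20, 2, 1, 30],
})
sales["year"] = sales["we_date"].dt.year

keys = ["year", "customer"]  # extend with the other matching columns

# 1) Total per key and week, 2) running total within each key,
# 3) attach the running total back to every original row.
weekly = sales.groupby(keys + ["we_date"], as_index=False)["sales"].sum()
weekly = weekly.sort_values("we_date")
weekly["ytd"] = weekly.groupby(keys)["sales"].cumsum()

sales = sales.merge(
    weekly[keys + ["we_date", "ytd"]], on=keys + ["we_date"], how="left"
)
print(sales)
```

Rows that share the same week and criteria get the same YTD value, which matches what the filter/earlier pattern produces.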

Tableau combining rows with the same info

I have a dashboard in Tableau which shows different payments received: the amount, the date the payment was received, and a calculated field which shows the number of days since the payment was received.
However, a lot of payments are the same, with the same amount, and received on the same day; so Tableau collapses these together, and adds the total days since the payments were received together in the final column, i.e. five lots of £5.50, each received on 1st January shows as below (as of 01/02/2018)
Column 1 Column 2 Column 3
£5.50 01/01/2018 155
But I need separate rows for each. Does anyone know how to stop tableau doing this, or of a workaround?
Many thanks.
You could try using RANK_UNIQUE function.
First of all, in the Analysis Menu, uncheck Aggregate Measures.
Then, starting from this data:
You can get this result:
Additionally, you may want to hide Rank from the rows by unchecking Show Header for it.
Is this something close to what you're looking for?
EDIT/UPDATE
In order to get all values and not just for the top rows, just move the Rank at the very beginning of the shelf:
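Outside Tableau, the effect being described (identical rows collapsing into one mark until a unique rank is added to the level of detail) can be reproduced with a small pandas sketch. This only illustrates the idea and is not Tableau syntax; the column names and the row_id helper are hypothetical, and the data mirrors the five £5.50 payments from the question.

```python
import pandas as pd

# Five identical payments of 5.50 received on 1 January 2018.
payments = pd.DataFrame({
    "amount": [5.50] * 5,
    "received": pd.to_datetime(["2018-01-01"] * 5),
})
as_of = pd.Timestamp("2018-02-01")
payments["days_since"] = (as_of - payments["received"]).dt.days

# Grouping on amount and date alone collapses the rows and sums the days.
collapsed = payments.groupby(
    ["amount", "received"], as_index=False
)["days_since"].sum()

# Adding a unique per-row number (the role the rank plays) keeps them apart.
payments["row_id"] = payments.groupby(["amount", "received"]).cumcount() + 1
separate = payments.groupby(
    ["amount", "received", "row_id"], as_index=False
)["days_since"].sum()

print(collapsed)  # one row, days_since summed across the five payments
print(separate)   # five rows, one per payment
```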

SQL GROUPING SETS averages with multiple many-to-many dimensions

I have a table of data with the following:
User,Platform,Dt,Activity_Flag,Total_Purchases
1,iOS,05/05/2016,1,1
1,Android,05/05/2016,1,2
2,iOS,05/05/2016,1,0
2,Android,05/05/2016,1,2
3,iOS,05/05/2016,1,1
3,Android,06/05/2016,1,3
1,iOS,06/05/2016,1,2
4,Android,06/05/2016,1,2
1,Android,06/05/2016,1,0
3,iOS,07/05/2016,1,2
2,iOS,08/05/2016,1,0
I want to do a GROUPING SETS (Platform,Dt,(Platform,Dt),()) aggregation to be able to find for each combination of Platform and Dt the following:
Total Purchases
Total Unique Users
Average Purchases per User per Day
The first two are simple as these can be achieved via a sum(Total_Purchases) and count(distinct user) respectively.
The problem I have is with the last metric. The result set should look like this but I don't know how to get the last column to be calculated correctly:
Platform,Dt,Total_Purchases,Total_Unique_Users,Average_Purchases_Per_User_Per_Day
Android,05/05/2016,4,2,2.0
iOS,05/05/2016,2,3,0.7
Android,06/05/2016,5,3,1.7
iOS,06/05/2016,2,1,2.0
iOS,07/05/2016,2,1,2.0
iOS,08/05/2016,0,1,0.0
,05/05/2016,6,3,2.0
,06/05/2016,7,3,2.3
,07/05/2016,2,1,2.0
,08/05/2016,0,1,0.0
Android,,9,4,1.8
iOS,,6,3,1.2
,,15,4,1.6
For the first ten rows, the average purchases per user per day is a simple division of the two preceding columns (Total_Purchases divided by Total_Unique_Users), because each of these rows covers a single date only. But in the final 3 rows, that division does not give the desired result. This is because the metric has to be computed for each day in turn and then averaged across days to get the overall per-day amount.
If this isn't clear please let me know and I'll be happy to explain better. This is my first post on this site!
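To make the intended calculation concrete, here is a minimal pandas sketch of one reading of it: compute purchases per unique user at the daily level first, then average those daily ratios over the days in each group. This is not SQL and not a GROUPING SETS solution; the function name avg_per_user_per_day is hypothetical, and the two-step interpretation is an assumption based on the description above.

```python
import pandas as pd

rows = [
    (1, "iOS", "05/05/2016", 1, 1), (1, "Android", "05/05/2016", 1, 2),
    (2, "iOS", "05/05/2016", 1, 0), (2, "Android", "05/05/2016", 1, 2),
    (3, "iOS", "05/05/2016", 1, 1), (3, "Android", "06/05/2016", 1, 3),
    (1, "iOS", "06/05/2016", 1, 2), (4, "Android", "06/05/2016", 1, 2),
    (1, "Android", "06/05/2016", 1, 0), (3, "iOS", "07/05/2016", 1, 2),
    (2, "iOS", "08/05/2016", 1, 0),
]
df = pd.DataFrame(
    rows, columns=["User", "Platform", "Dt", "Activity_Flag", "Total_Purchases"]
)

def avg_per_user_per_day(data, dims):
    """Average over days of (purchases / unique users) within each dims group."""
    # Daily level: the requested dimensions plus the date (no duplicates).
    daily_keys = list(dict.fromkeys(list(dims) + ["Dt"]))
    daily = data.groupby(daily_keys).agg(
        purchases=("Total_Purchases", "sum"),
        users=("User", "nunique"),
    ).reset_index()
    daily["ratio"] = daily["purchases"] / daily["users"]
    if dims:
        return daily.groupby(list(dims))["ratio"].mean()
    return daily["ratio"].mean()

by_platform = avg_per_user_per_day(df, ["Platform"])
overall = avg_per_user_per_day(df, [])
print(by_platform.round(1))
print(round(overall, 1))
```

When Dt is already one of the dimensions, each group holds a single day, so the result reduces to the plain division described for the first ten rows.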