I have data in three columns: account_id, date paid, and amount. What I need is to decrement the values in column C by one per account. If another payment was made, add the new payment plus the last decremented value. After trying analytical queries and simple calculations, I feel that there is no other way but to do it with JS. Have you had a similar use case?
I have a table that has each transaction along with a field that shows how many units were cancelled in the order. If I filter the table on cancelled_units > 0, I can pull all transactions that were cancelled. There is also detailed date information for each transaction, but I think I only need the date. I need to create a rate calculation of total cancelled orders / total orders to get the cancellation rate, and then spread that out across every week for the past 12 months. I was thinking of maybe using a CASE statement with some sort of counter in place? Also, I am using Databricks, so maybe there is some built-in function or operator that would make this easier. Appreciate you taking a look at my question.
From the context provided, you have a data frame with the list of transactions. It is also clear that there is a transaction column, a timestamp indicating when the order was placed, and the number of units cancelled in each transaction. So, when you filter your data frame on the condition cancelled_units > 0 and count the number of rows, you get the number of cancelled orders.
Using Spark in Databricks:
Now, to find the rate of cancellation (cancelled_orders / total_orders) for every week in the past 12 months: I was able to find a way that lets you calculate the rate of cancellation in a PARTICULAR YEAR (not the past 12 months), so gather all the records for a given year first.
Since the timestamp indicating when the order was placed is already available in the data frame, we can use it to find out in which week of the year the transaction was made. You can do this as follows (similar syntax for both PySpark and Spark with Scala).
df.withColumn("order_placed_week",date_format(col("transaction_date"), "w")).show()
Here transaction_date is a timestamp (or date) column. If it is stored as a string instead, parse it first with to_date(), which lets you specify the format the string is in; note the uppercase MM for the month (lowercase mm means minutes):
df.withColumn("order_placed_week", date_format(to_date("transaction_date", "dd/MM/yyyy"), "w")).show()
The functions that need to be imported are:
For PySpark:
from pyspark.sql.functions import to_date, date_format, col
Reference: https://www.datasciencemadesimple.com/get-month-year-and-quarter-from-date-in-pyspark/
For Spark with Scala:
import org.apache.spark.sql.functions._
Reference: https://sparkbyexamples.com/spark/spark-how-to-get-a-day-and-week-of-year/
After completing this process, you can use the resulting data frame, with its order_placed_week column, to get the cancellation rate. Get the count of orders for each week number, and then the count of orders with cancelled units, using groupBy and filter. Dividing count_of_cancelled by total_count for each week gives the desired result, as sketched below.
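For illustration, here is a minimal Spark SQL sketch of that final step, runnable in a Databricks notebook. It assumes the transactions are exposed as a table or temporary view named transactions with columns transaction_date and cancelled_units, and it uses weekofyear() rather than the "w" pattern; the table name and the specific year are assumptions, not part of the original question.

-- Weekly cancellation rate within one (assumed) year
SELECT weekofyear(transaction_date)                               AS order_placed_week,
       COUNT(*)                                                   AS total_orders,
       COUNT(CASE WHEN cancelled_units > 0 THEN 1 END)            AS cancelled_orders,
       COUNT(CASE WHEN cancelled_units > 0 THEN 1 END) / COUNT(*) AS cancellation_rate
FROM transactions
WHERE year(transaction_date) = 2021   -- substitute the year of interest
GROUP BY weekofyear(transaction_date)
ORDER BY order_placed_week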
I have a dashboard in Tableau which shows different payments received - the amount, the date the payment was received, and a calculated field which shows the number of days since the payment was received.
However, a lot of payments are the same amount and received on the same day, so Tableau collapses these together and adds up the days since each payment was received in the final column; i.e. five lots of £5.50, each received on 1st January, show as below (as of 01/02/2018):
Column 1 Column 2 Column 3
£5.50 01/01/2018 155
But I need separate rows for each. Does anyone know how to stop Tableau doing this, or of a workaround?
Many thanks.
You could try using the RANK_UNIQUE function.
First of all, in the Analysis Menu, uncheck Aggregate Measures.
Then, starting from this data:
You can get this result:
Additionally, you may want to hide Rank from the Rows shelf by simply not showing its header.
Is this something close to what you're looking for?
EDIT/UPDATE
To get all the values, and not just those for the top rows, just move Rank to the very beginning of the shelf:
We are trying to calculate average stock from a movements table in a single SQL statement.
So far, no problem with what we thought was a standard approach: instead of adding up the daily stock and dividing by the number of days (we don't have daily stock), we simply add up (movements * remaining days):
select sum(quantity*(END_DATE-move_date))/(END_DATE-START_DATE)
from move_table
where move_date<=END_DATE
This is a simplified example; in real life we already take care of the initial stock at the starting date. Let's say there are no movements prior to start_date.
The sign of quantity depends on the move type (sale, purchase, inventory, etc.).
Of course this is done grouping by product, warehouse, ... but you get the idea.
It works as expected and the calculation is correct.
But (there is always a "but"), our customer doesn't like counting days when there is no stock (all stock sold out). So, he doesn't like
Sum of (daily_stock) / number_of_days (which is what we calculate, using different math)
Instead, he would like
Sum of (daily stock) / number_of_days_in_which_stock_is_not_zero
For sure we can do this in any programming language without much effort, but I was wondering how to do it using plain SQL ... and wasn't able to come up with a solution.
Any suggestion?
Consider creating a new table, called something like Stock_EndOfDay_History, with the following columns:
stock#
date
stock_count_eod
This table would get a new row for each stock item at the start of each new day, recording the prior day's closing count. Rows could then be purged from this table once their date falls outside the date window of interest.
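A rough sketch of the kind of end-of-day snapshot job this implies, assuming a hypothetical Stock table with a current_stock_count column (the table, the column, and the date arithmetic are assumptions and depend on your DBMS):

-- Hypothetical nightly job: record yesterday's closing count for every stock item
INSERT INTO Stock_EndOfDay_History (stock#, date, stock_count_eod)
SELECT stock#,
       CURRENT_DATE - 1,        -- the prior day; adjust for your DBMS
       current_stock_count      -- assumed column holding the live stock level
FROM Stock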
To get the "number_of_days_in_which_stock_is_not_zero", use this.
SELECT COUNT(*) AS Not_Zero_Stock_Days
FROM Stock_EndOfDay_History
WHERE stock# = <stock#_value>
AND <date_window_clause>
AND stock_count_eod > 0
Other approaches might attempt to just add a new column to the existing stock table to maintain a cumulative sum of the "number_of_days_in_which_stock_is_not_zero". But inevitably, questions will be asked about how the non-zero stock days count was calculated, and this new table approach will answer those questions better than the new column approach.
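For completeness, a minimal sketch of how the customer's preferred average could then be expressed against this history table, dividing by the non-zero-stock days instead of all days (the stock# grouping and the date placeholder follow the query above; everything else is an assumption):

-- Average stock counting only the days on which stock was not zero
SELECT stock#,
       SUM(stock_count_eod) * 1.0
         / NULLIF(COUNT(CASE WHEN stock_count_eod > 0 THEN 1 END), 0) AS avg_stock_nonzero_days
FROM Stock_EndOfDay_History
WHERE <date_window_clause>
GROUP BY stock#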
I have a table of data with the following:
User,Platform,Dt,Activity_Flag,Total_Purchases
1,iOS,05/05/2016,1,1
1,Android,05/05/2016,1,2
2,iOS,05/05/2016,1,0
2,Android,05/05/2016,1,2
3,iOS,05/05/2016,1,1
3,Android,06/05/2016,1,3
1,iOS,06/05/2016,1,2
4,Android,06/05/2016,1,2
1,Android,06/05/2016,1,0
3,iOS,07/05/2016,1,2
2,iOS,08/05/2016,1,0
I want to do a GROUPING SETS (Platform,Dt,(Platform,Dt),()) aggregation to be able to find for each combination of Platform and Dt the following:
Total Purchases
Total Unique Users
Average Purchases per User per Day
The first two are simple, as they can be achieved via sum(Total_Purchases) and count(distinct user) respectively (sketched below).
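For reference, a sketch of those two metrics with the grouping sets described above; the table name activity is an assumption, and the exact GROUPING SETS syntax varies slightly by dialect (older Hive requires the grouped columns to be listed before GROUPING SETS):

-- Total purchases and unique users per grouping set (assumed table name: activity)
-- "User" may need quoting depending on the dialect
SELECT Platform,
       Dt,
       SUM(Total_Purchases)  AS Total_Purchases,
       COUNT(DISTINCT User)  AS Total_Unique_Users
FROM activity
GROUP BY GROUPING SETS (Platform, Dt, (Platform, Dt), ())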
The problem I have is with the last metric. The result set should look like this but I don't know how to get the last column to be calculated correctly:
Platform,Dt,Total_Purchases,Total_Unique_Users,Average_Purchases_Per_User_Per_Day
Android,05/05/2016,4,2,2.0
iOS,05/05/2016,2,3,0.7
Android,06/05/2016,5,3,1.7
iOS,06/05/2016,2,1,2.0
iOS,07/05/2016,2,1,2.0
iOS,08/05/2016,0,1,0.0
,05/05/2016,6,3,2.0
,06/05/2016,7,3,2.3
,07/05/2016,1,1,1.0
,08/05/2016,1,1,1.0
Android,,9,4,1.8
iOS,,6,3,1.2
,,15,4,1.6
For the first ten rows, getting the average purchases per user per day is a simple division of the two preceding columns, as the dimensions in these rows represent a single date only. But when we look at the final three rows, we see that this division is not the way to achieve the desired result, because the metric needs to take an average for each day in turn to get the overall per-day amount.
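To make that last point concrete, here is a rough sketch of the "average of the daily rates" idea for the Platform-only rows; it is not a full GROUPING SETS solution, and the table name activity is an assumption. It computes each day's purchases per user first, then averages those daily figures per platform.

-- Daily rate per platform and day, then averaged per platform
SELECT Platform,
       AVG(day_rate) AS Average_Purchases_Per_User_Per_Day
FROM (
    SELECT Platform,
           Dt,
           SUM(Total_Purchases) * 1.0 / COUNT(DISTINCT User) AS day_rate
    FROM activity
    GROUP BY Platform, Dt
) daily
GROUP BY Platform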
If this isn't clear please let me know and I'll be happy to explain better. This is my first post on this site!
By default, SUM adds up all the column values. But in my case, I have a report which is grouped by Name. A name can have a single offer with multiple start dates, so the report has to display an entry for each different start date, i.e. the same name, offer, and players, with only the date differing. So, for example, when you sum up the players, only one entry per name should be taken into account, because even though there are multiple start dates, the other values are the same and duplicated.
The expected result should look like this:
The offer cost of $10 refers to the same $10, so it should be added only once; similarly for players, etc. But I need the display as shown above, where each entry is shown.
How to solve this?
If all you want to do is avoid aggregating the value in the group total row, as in your example, just remove the aggregation from the expression, i.e. change:
=Sum(Fields!Players.Value)
to:
=Fields!Players.Value
This just returns the first Players value in the Scope - since it's the same value for every row this should be fine.
If you need to further aggregate this value to something like a grand total row, you have a couple of options.
For 2008R2 and above, you can use nested aggregates as an expression in the report - something like:
=Sum(Max(Fields!Players.Value,"MyGroup"))
For 2008 and below, you will need to add the aggregate value to each row in the Dataset and use this without aggregation in the report as required.
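For the pre-2008R2 route, one way to add the aggregate to each row in the dataset query itself is a window function; the following is only a sketch, assuming a SQL Server source and hypothetical table and column names:

-- Carry the per-Name player count on every row so the report can use it
-- without further aggregation (table name OfferTable is an assumption)
SELECT Name,
       Offer,
       StartDate,
       Players,
       MAX(Players) OVER (PARTITION BY Name) AS PlayersPerName
FROM OfferTable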
I haven't worked with SSRS much, but if this were a regular SQL query you would have to group by date range.
Try adding the start date column and check whether you can add another group on top of what you already have.
It would be useful if you could provide more details here, like the table schema you use for retrieving the data.