I have a table of customers, and I want to find the month in which each customer met or exceeded a certain number of requests.
The table has a customer_id and a timestamp for each request.
What I am looking for is the month (or day) on which the customer met or exceeded 10,000 requests. I've tried to get a running total in place, but it just isn't working for me. I've left it in the code (commented out) in case someone knows how I can do this.
What I have is the following:
SELECT
    customer_id
    , DATE_TRUNC(CAST(TIMESTAMP_MILLIS(created_timestamp) AS DATE), MONTH) as cMonth
    , COUNT(created_timestamp) as searchCount
    -- , SUM(COUNT(DISTINCT created_timestamp)) OVER (ROWS UNBOUNDED PRECEDING) as RunningTotal2
FROM customer_requests.history.all
GROUP BY distributor_id, cMonth
ORDER BY 2 ASC, 1 DESC;
The representation I am after is something like this.
customer   requests   cMonth       totalRequests
cust1      6000       2017-10-01    6000
cust1      4001       2017-11-01   10001
cust2      4000       2017-10-01    4000
cust2      4000       2017-11-01    8000
cust2      4000       2017-12-01   12000
cust2      3000       2017-12-01    3000
cust2      3000       2017-12-01    6000
cust2      3000       2017-12-01    9000
cust2      3000       2017-12-01   12000
Assuming SQL Server, try this (adjusting the cutoff at the top to get the number of transactions you need; right now it looks for the thousandth transaction per customer).
Note that this will not return customers who have not exceeded your cutoff, and assumes that each transaction has a unique date (or is issued a sequential ID number to break ties if there can be ties on date).
DECLARE @cutoff INT = 1000;

WITH CTE AS (
    SELECT customer_id,
           transaction_ID,
           transaction_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY transaction_date, transaction_ID) AS RN,
           COUNT(transaction_ID) OVER (PARTITION BY customer_id) AS TotalTransactions
    FROM #test
)
SELECT DISTINCT
       customer_id,
       transaction_date AS CutoffTransactionDate,
       TotalTransactions
FROM CTE
WHERE RN = @cutoff;
How it works:
ROW_NUMBER assigns a unique sequential identifier to each of a customer's transactions, in the order in which they were made. The windowed COUNT tells you the total number of transactions the customer made (again assuming one record per transaction; otherwise you would need to calculate this separately, since DISTINCT doesn't work inside a windowed COUNT).
The outer SELECT then returns the 1,000th row (or however many you specify) for each customer, along with its date and that customer's total.
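For anyone who wants to run this, here is a minimal sketch of a #test temp table matching the columns the CTE references; the rows are made up, and @cutoff would need to be lowered (say, to 2) for such a tiny sample:

-- Hypothetical sample data so the query above can be executed as-is
CREATE TABLE #test (
    customer_id      INT,
    transaction_ID   INT,
    transaction_date DATE
);

INSERT INTO #test (customer_id, transaction_ID, transaction_date)
VALUES (1, 1, '2017-10-01'),
       (1, 2, '2017-10-02'),
       (2, 3, '2017-10-01');
-- with @cutoff = 2, only customer 1 is returned
-- (CutoffTransactionDate 2017-10-02, TotalTransactions 2)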
This is my solution:
SELECT customerid,
       SUM(requests) AS sumDay,
       created_timestamp
FROM yourTable
GROUP BY customerid,
         created_timestamp
HAVING SUM(requests) >= 10000;
It's pretty simple: you group according to your needs, sum up the requests, and keep only the rows that satisfy the HAVING clause.
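A minimal, self-contained sketch of what this returns, using made-up sample rows (Postgres-style VALUES for the sample data; the table and column names follow the query above):

WITH yourTable (customerid, created_timestamp, requests) AS (
    VALUES ('cust1', DATE '2017-10-01', 6000),
           ('cust1', DATE '2017-11-01', 4001),
           ('cust2', DATE '2017-12-01', 12000)
)
SELECT customerid,
       SUM(requests) AS sumDay,
       created_timestamp
FROM yourTable
GROUP BY customerid, created_timestamp
HAVING SUM(requests) >= 10000;
-- only (cust2, 12000, 2017-12-01) meets the HAVING condition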
If you want a cumulative sum, you can use window functions. In Standard SQL, this looks like:
SELECT customer_id,
       DATE_TRUNC(CAST(TIMESTAMP_MILLIS(created_timestamp) AS DATE), MONTH) AS cMonth,
       COUNT(*) AS searchCount,
       SUM(COUNT(*)) OVER (PARTITION BY customer_id
                           ORDER BY MIN(created_timestamp)) AS runningtotal
FROM customer_requests.history.all
GROUP BY customer_id, cMonth
ORDER BY 2 ASC, 1 DESC;
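If the goal is specifically the first month in which a customer reaches 10,000 requests, one option (a sketch, untested against the real table) is to wrap the query above and keep only the month in which the running total crosses the threshold:

SELECT customer_id, cMonth, searchCount, runningtotal
FROM (
    SELECT customer_id,
           DATE_TRUNC(CAST(TIMESTAMP_MILLIS(created_timestamp) AS DATE), MONTH) AS cMonth,
           COUNT(*) AS searchCount,
           SUM(COUNT(*)) OVER (PARTITION BY customer_id
                               ORDER BY MIN(created_timestamp)) AS runningtotal
    FROM customer_requests.history.all
    GROUP BY customer_id, cMonth
) x
-- the running total reaches 10000 this month but had not before this month's requests
WHERE runningtotal >= 10000
  AND runningtotal - searchCount < 10000;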
Related
The query is simple, but I am facing issues implementing the SQL logic. Here's the setup: suppose I have records like
Phoneno   Company   Date       Amount
83838     xyz       20210901   100
87337     abc       20210902   500
47473     cde       20210903   600
The expected output is the past 7 days' progress as a running average of the amount for each date (the current date and the 6 days before it):
Date       amount   avg
20210901   100      100
20210902   500      300
20210903   600      400
I tried:
Select date, amount,
       (Select avg(lg)
        from (Select case when lag(amount) over (order by NULL) IS NULL
                          then amount
                          else lag(amount) over (order by NULL)
                     end as lg
              from table
              where date >= t.date - 7)
       ) as avg
From table t;
But I am getting wrong average values. Could anyone please help?
Note: I've tried it without lag too, and that also produces the wrong averages.
You could use a self-join to group the dates:
select distinct
a.dt,
b.dt as preceding_dt, --just for QA purpose
a.amt,
b.amt as preceding_amt,--just for QA purpose
avg(b.amt) over (partition by a.dt) as avg_amt
from t a
join t b on a.dt-b.dt between 0 and 6
group by a.dt, b.dt, a.amt, b.amt; --to dedupe the data after the join
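To experiment with this, here is a minimal setup sketch assuming Oracle (which the a.dt - b.dt date arithmetic relies on); dt and amt are this answer's simplified names for the question's Date and Amount columns:

-- Sample rows from the question, reduced to the columns the queries use
CREATE TABLE t (dt DATE, amt NUMBER);

INSERT INTO t (dt, amt) VALUES (DATE '2021-09-01', 100);
INSERT INTO t (dt, amt) VALUES (DATE '2021-09-02', 500);
INSERT INTO t (dt, amt) VALUES (DATE '2021-09-03', 600);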
If you want to make your correlated subquery approach work, you don't really need the lag.
select dt,
amt,
(select avg(b.amt) from t b where a.dt-b.dt between 0 and 6) as avg_lg
from t a;
If you don't have multiple rows per date, this gets even simpler
select dt,
amt,
avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
Also, the condition DATE >= t.date - 7 that you used is open on one side, meaning it qualifies many dates that should not have been qualified.
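For reference, a sketch of the same correlated subquery with the window bounded explicitly on both sides (equivalent to the a.dt - b.dt BETWEEN 0 AND 6 form above):

select dt,
       amt,
       (select avg(b.amt)
        from t b
        -- closed on both sides: only the current date and the 6 days before it qualify
        where b.dt between a.dt - 6 and a.dt) as avg_lg
from t a;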
You can use an analytic function with a windowing clause to get your results:
SELECT DISTINCT BillingDate,
       AVG(amount) OVER (ORDER BY BillingDate
                         RANGE BETWEEN TO_DSINTERVAL('6 00:00:00') PRECEDING
                                   AND TO_DSINTERVAL('0 00:00:00') FOLLOWING) AS RUNNING_AVG
FROM accounts
ORDER BY BillingDate;
Here is a DBFiddle showing the query in action (LINK)
Display pymtmode and the total number of payments, for payments made before the year 2015 where the total number of payments is more than 1, from the data given:
ORDERID  QUOTATIONID  QTYORDERED  ORDERDATE  STATUS     PYMTDATE   DELIVEREDDATE  AMOUNTPAID  PYMTMODE
O1001    Q1002        100         30-OCT-14  Delivered  05-NOV-14  05-NOV-14      140000      Cash
O1003    Q1003        50          15-DEC-14  Delivered  18-DEC-14  20-DEC-14      310000      Cash
O1004    Q1006        100         15-DEC-14  Delivered  25-DEC-14  30-DEC-14      80000       Cheque
O1005    Q1002        50          30-JAN-15  Delivered  01-FEB-15  03-FEB-15      70000       Cheque
O1006    Q1008        75          20-FEB-15  Delivered  22-FEB-15  23-FEB-15      161250      Cash
I've tried the code below to fetch the year, select only values before the year 2015, and group by year:
SELECT pymtmode, COUNT(*) as pymtcount
FROM orders
GROUP BY to_char(pymtdate, 'Year')
HAVING to_char(pymtdate,'Year')<2015 AND count(*)>1
I've learned that GROUP BY columns/functions should also be mentioned in the SELECT statement, but this question and its expected result don't seem to fit that rule. Clarity with a basic explanation would help.
Expected Result
PYMTMODE PYMTCOUNT
Cash 2
Thanks!
Your expected result needs pymtmode to be selected, so you must GROUP BY pymtmode and not GROUP BY to_char(pymtdate, 'Year'), because you don't need results for each year, right?
Also, the condition to_char(pymtdate, 'Year') < 2015 might as well be put in a WHERE clause, to restrict the rows before aggregation:
SELECT pymtmode, COUNT(*) AS pymtcount
FROM orders
WHERE EXTRACT(YEAR FROM pymtdate) < 2015
GROUP BY pymtmode
HAVING COUNT(*) > 1
I strongly recommend using direct date comparisons for this purpose:
SELECT o.pymtmode, COUNT(*) as pymtcount
FROM orders o
WHERE o.pymtdate < DATE '2015-01-01'
GROUP BY o.pymtmode
HAVING COUNT(*) > 1;
Notes:
You want to do the filtering before the aggregation, not afterwards; I think that is the source of your confusion (see the sketch after these notes).
A direct comparison to dates makes it easier for the optimizer to generate the best execution plan.
Learning how to use table aliases (the o) is a good habit.
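To make the first note concrete, here is a sketch of what filtering after aggregation would have to look like: the date test gets folded into the aggregate itself, so every row is grouped before anything is discarded, which is why the WHERE version above is preferable.

-- Filtering after aggregation: the pre-2015 test is pushed into the aggregate,
-- so all rows are grouped first and only then discarded
SELECT o.pymtmode,
       COUNT(CASE WHEN o.pymtdate < DATE '2015-01-01' THEN 1 END) AS pymtcount
FROM orders o
GROUP BY o.pymtmode
HAVING COUNT(CASE WHEN o.pymtdate < DATE '2015-01-01' THEN 1 END) > 1;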
SELECT pymtmode, COUNT(pymtmode) AS pymtcount
FROM orders
WHERE EXTRACT(YEAR FROM TO_DATE(pymtdate)) < 2015
GROUP BY pymtmode
HAVING COUNT(pymtmode) > 1;
Transaction Table
No  Date        Amount
1   06-07-2017  1000
2   06-07-2017  1500
3   08-07-2017  2000
4   09-07-2017  2000
5   09-07-2017  2000
6   09-07-2017  2000
Is it possible to achieve this result with a single query (no query loop)?
No  Date        Total Amount
1   06-07-2017  2500
2   08-07-2017  2000
3   09-07-2017  6000
Are you looking for Group By?
select Trunc("Date"), -- we have to put ".." since Date is a Keyword in Oracle
sum(Amount) as "Total Amount"
from MyTable
group by Trunc("Date")
order by Trunc("Date")
Edit: it seems that the Date field contains both date and time, so the time part should be truncated with Trunc when aggregating (see the comments).
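A tiny illustration of what Trunc does here (Oracle syntax; the timestamp value is made up):

-- TRUNC strips the time portion, so rows from the same calendar day group together
SELECT TRUNC(TO_DATE('06-07-2017 14:35:12', 'DD-MM-YYYY HH24:MI:SS')) AS d
FROM dual;
-- d = 06-07-2017 00:00:00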
For the exact results:
select row_number() over (order by date) as No,
date, sum(amount) as "Total Amount"
from t
group by date
order by date;
Note: In Oracle, the date data type can contain a time component -- and this might not be visible in the output. If so, the aggregation doesn't do what you expect. If this is the case, then:
select row_number() over (order by trunc(date)) as No,
trunc(date) as date, sum(amount) as "Total Amount"
from t
group by trunc(date)
order by trunc(date);
I am trying to calculate a running total of "cab fare earned by a driver on a particular day". This was originally tested on Netezza, and now I am trying to code it in Spark SQL.
However, for two rows with the same ((driver, day) --> fare) structure, if the 'fare' value is identical, the running_total column always shows the final sum! When all the fares are distinct, it is calculated perfectly. Is there any way to achieve this (in ANSI SQL or a Spark DataFrame) without using rowsBetween(start, end)?
Sample data:
driver_id   date_id      fare
10001       2017-07-27    500
10001       2017-07-27    500
10001       2017-07-30    500
10001       2017-07-30   1500
The SQL query I ran to calculate the running total:
select driver_id, date_id, fare,
       sum(fare) over (partition by date_id, driver_id
                       order by date_id, fare) as run_tot_fare
from trip_info
order by 2
Result:
driver_id   date_id      fare   run_tot_fare
10001       2017-07-27    500   1000    -- showing the final total; expecting 500
10001       2017-07-27    500   1000
10001       2017-07-30    500    500    -- no problem here
10001       2017-07-30   1500   2000
If anybody can kindly let me know what I am doing wrong, and whether it is achievable without using ROWS UNBOUNDED PRECEDING / rowsBetween(b, e), I would highly appreciate it. Thanks in advance.
The traditional solution in SQL is to use range instead of rows:
select driver_id, date_id, fare,
       sum(fare) over (partition by date_id, driver_id
                       order by date_id, fare
                       range between unbounded preceding and current row
                      ) as run_tot_fare
from trip_info
order by 2;
Absent that, two levels of window functions or an aggregation and join:
select driver_id, date_id, fare,
       max(run_tot_fare_temp) over (partition by date_id, driver_id) as run_tot_fare
from (select driver_id, date_id, fare,
             sum(fare) over (partition by date_id, driver_id
                             order by date_id, fare
                            ) as run_tot_fare_temp
      from trip_info ti
     ) ti
order by 2;
(The max() assumes the fares are never negative.)
I've been mulling over this problem for a couple of hours now with no luck, so I thought people on SO might be able to help :)
I have a table with data regarding processing volumes at stores. The first three columns shown below can be queried from that table. What I'm trying to do is add a fourth column that's basically a flag indicating whether a store has processed >= $150, and if so, displays the corresponding date. The way this works is that the first instance where the store has surpassed $150 is the date that gets displayed. Subsequent processing volumes don't count after the first instance of the activated date is hit. For example, for store 4, there's just one instance of the activated date.
store_id   sales_volume   date         activated_date
------------------------------------------------------
2          5              03/14/2012
2          125            05/21/2012
2          30             11/01/2012   11/01/2012
3          100            02/06/2012
3          140            12/22/2012   12/22/2012
4          300            10/15/2012   10/15/2012
4          450            11/25/2012
5          100            12/03/2012
Any insights as to how to build out this fourth column? Thanks in advance!
The solution starts by calculating the cumulative sales. Then, you want the activation date only when the cumulative sales first pass the $150 level. This happens when adding the current sales amount pushes the cumulative amount over the threshold. The following case expression handles this:
select t.store_id, t.sales_volume, t.date,
(case when 150 > cumesales - t.sales_volume and 150 <= cumesales
then date
end) as ActivationDate
from (select t.*,
sum(sales_volume) over (partition by store_id order by date) as cumesales
from t
) t
If you have an older version of Postgres that does not support cumulative sum, you can get the cumulative sales with a subquery like:
(select sum(sales_volume) from t t2 where t2.store_id = t.store_id and t2.date <= t.date) as cumesales
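For completeness, a sketch (untested) of that subquery dropped into the full query, keeping the same table and column names as above:

select t.store_id, t.sales_volume, t.date,
       (case when 150 > cumesales - t.sales_volume and 150 <= cumesales
             then t.date
        end) as ActivationDate
from (select t.*,
             -- correlated subquery replaces the cumulative window function
             (select sum(t2.sales_volume)
              from t t2
              where t2.store_id = t.store_id
                and t2.date <= t.date) as cumesales
      from t
     ) t;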
Variant 1
You can LEFT JOIN to a table that calculates the first date surpassing the $150 limit per store:
SELECT t.*, b.activated_date
FROM tbl t
LEFT JOIN (
SELECT store_id, min(thedate) AS activated_date
FROM (
SELECT store_id, thedate
,sum(sales_volume) OVER (PARTITION BY store_id
ORDER BY thedate) AS running_sum
FROM tbl
) a
WHERE running_sum >= 150
GROUP BY 1
) b ON t.store_id = b.store_id AND t.thedate = b.activated_date
ORDER BY t.store_id, t.thedate;
The calculation of the first day has to be done in two steps, since the window function accumulating the running sum has to be applied in a separate SELECT.
Variant 2
Another window function instead of the LEFT JOIN. May or may not be faster. Test with EXPLAIN ANALYZE.
SELECT *
,CASE WHEN running_sum >= 150 AND thedate = first_value(thedate)
OVER (PARTITION BY store_id, running_sum >= 150 ORDER BY thedate)
THEN thedate END AS activated_date
FROM (
SELECT *
,sum(sales_volume)
OVER (PARTITION BY store_id ORDER BY thedate) AS running_sum
FROM tbl
) b
ORDER BY store_id, thedate;
->sqlfiddle demonstrating both.
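For local testing, a minimal Postgres-style sketch of tbl with the question's rows, using the thedate column name these queries assume:

-- Sample data from the question (dates converted to ISO format)
CREATE TABLE tbl (store_id int, sales_volume numeric, thedate date);

INSERT INTO tbl (store_id, sales_volume, thedate) VALUES
  (2,   5, '2012-03-14'),
  (2, 125, '2012-05-21'),
  (2,  30, '2012-11-01'),
  (3, 100, '2012-02-06'),
  (3, 140, '2012-12-22'),
  (4, 300, '2012-10-15'),
  (4, 450, '2012-11-25'),
  (5, 100, '2012-12-03');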