How can I generate the latest for an aggregate?

Hey stackoverflow community,
I have a Sales table; a hypothetical sample is shown below.
Customer Revenue State Date
David $100 NY 2016-01-01
David $500 NJ 2016-01-03
Fred $200 CA 2016-01-01
Fred $200 CA 2016-01-02
I'm writing a simple query of revenue generated by customer. The output returns as such:
David $600
Fred $400
What I want to do now is add a column for the latest purchase date.
Desired result:
David $600 2016-01-03
Fred $400 2016-01-02
I would like to keep the SQL code as clean as possible. I also want to avoid doing a JOIN to a new query as this query can start to get complex. Any ideas as to how to do so?

You should sum revenues in your group and get the maximum of dates.
Something like this:
SELECT
Customer, SUM(Revenue) as RevenueSum, MAX([Date]) as [Date]
FROM Sales
GROUP BY Customer
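
For anyone who wants to try this without a database server, here is a minimal sketch that runs the same GROUP BY query on the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption, since the question does not name its database), stores the revenue as plain integers, and calls the output column LatestDate, which is a name chosen only for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the question's engine is unknown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Customer TEXT, Revenue INTEGER, State TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("David", 100, "NY", "2016-01-01"),
        ("David", 500, "NJ", "2016-01-03"),
        ("Fred", 200, "CA", "2016-01-01"),
        ("Fred", 200, "CA", "2016-01-02"),
    ],
)

# One row per customer: total revenue plus the most recent purchase date.
# ISO-8601 date strings sort chronologically, so MAX() works on the text column.
totals = conn.execute(
    """
    SELECT Customer, SUM(Revenue) AS RevenueSum, MAX(Date) AS LatestDate
    FROM Sales
    GROUP BY Customer
    ORDER BY Customer
    """
).fetchall()

for row in totals:
    print(row)
```

No join or subquery is needed, because both values are plain aggregates over the same group.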

I think it's what you need
select Customer,sum(Revenue), max(Date) from Sales group by Customer

One way to get the SUM of Revenue and also get the information from the record with the MAX Date is to use the ROW_NUMBER() and SUM() windowed functions.
The SUM() OVER() will apply the sum for the Customer to each row and the ROW_NUMBER() OVER() will give each row an order number by Customer and Date DESC.
Put this in a subquery and select only the records with Row_Number of 1 (max date)
SELECT [Customer],
[Revenue],
[State],
[Date]
FROM (SELECT [Customer],
SUM([Revenue]) OVER (PARTITION BY [Customer]) [Revenue],
[State],
[Date],
ROW_NUMBER() OVER (PARTITION BY [Customer] ORDER BY [Date] DESC) Rn
FROM Sales
) t
WHERE t.Rn = 1
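
The point of this variant is that it can also return non-aggregated columns (such as State) from the latest row. Below is a small sketch of the same idea, again using Python's sqlite3 module as an assumed stand-in engine; it needs SQLite 3.25 or newer for window functions. The alias TotalRevenue is introduced here only to avoid reusing the column name.

```python
import sqlite3

# In-memory SQLite database (assumed stand-in; requires SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Customer TEXT, Revenue INTEGER, State TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("David", 100, "NY", "2016-01-01"),
        ("David", 500, "NJ", "2016-01-03"),
        ("Fred", 200, "CA", "2016-01-01"),
        ("Fred", 200, "CA", "2016-01-02"),
    ],
)

# SUM() OVER attaches the customer total to every row;
# ROW_NUMBER() numbers each customer's rows from newest to oldest.
# Keeping Rn = 1 leaves the newest row, including its State.
latest = conn.execute(
    """
    SELECT Customer, TotalRevenue, State, Date
    FROM (SELECT Customer,
                 SUM(Revenue) OVER (PARTITION BY Customer) AS TotalRevenue,
                 State,
                 Date,
                 ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Date DESC) AS Rn
          FROM Sales) t
    WHERE t.Rn = 1
    ORDER BY Customer
    """
).fetchall()

for row in latest:
    print(row)
```

For David this returns the NJ row, since that is the state on his most recent purchase.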

Related

how to avoid sum(sum()) when writing this postgres query with window functions?

Runnable query example at https://www.db-fiddle.com/f/ssrpQyyajYdZkkkAJBaYUp/0
I have a postgres table of sales; each row has a sale_id, product_id, salesperson, and price.
I want to write a query that returns, for each (salesperson, product_id) tuple with at least one sale:
The total of price for all of the sales made by that salesperson for that product (call this product_sales).
The total of price over all of that salesperson's sales (call this total_sales).
My current query is as follows, but I feel silly writing sum(sum(price)). Is there a more standard/idiomatic approach?
select
salesperson,
product_id,
sum(price) as product_sales,
sum(sum(price)) over (partition by salesperson) as total_sales
from sales
group by 1, 2
order by 1, 2
Writing sum(price) instead of sum(sum(price)) yields the following error:
column "sales.price" must appear in the GROUP BY clause or be used in an aggregate function
UPDATES
See this response for a nice approach using a WITH clause. I feel like I ought to be able to do this without a subquery or WITH.
Just stumbled on this response to a different question which proposes both sum(sum(...)) and a subquery approach. Perhaps these are the best options?

You can use a Common Table Expression to simplify the query and do it in two steps.
For example:
with
s as (
select
salesperson,
product_id,
sum(price) as product_sales
from sales
group by salesperson, product_id
)
select
salesperson,
product_id,
product_sales,
sum(product_sales) over (partition by salesperson) as total_sales
from s
order by salesperson, product_id
Result:
salesperson product_id product_sales total_sales
------------ ----------- -------------- -----------
Alice 1 2000 5400
Alice 2 2200 5400
Alice 3 1200 5400
Bobby 1 2000 4300
Bobby 2 1100 4300
Bobby 3 1200 4300
Chuck 1 2000 4300
Chuck 2 1100 4300
Chuck 3 1200 4300
See running example at DB Fiddle.
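
As a quick check that the CTE form and the sum(sum(price)) form return the same rows, here is a sketch using Python's sqlite3 module instead of Postgres (an assumption made so that it runs without a server; SQLite 3.25+ is needed for window functions). The five sales rows are made up for the illustration and are not the data from the fiddle.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, salesperson TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Alice", 1000),
        (2, 1, "Alice", 1000),
        (3, 2, "Alice", 2200),
        (4, 1, "Bobby", 2000),
        (5, 2, "Bobby", 1100),
    ],
)

# Original form: the inner sum() is the GROUP BY aggregate,
# the outer sum() OVER is a window function over the grouped rows.
nested = conn.execute(
    """
    select salesperson, product_id,
           sum(price) as product_sales,
           sum(sum(price)) over (partition by salesperson) as total_sales
    from sales
    group by salesperson, product_id
    order by salesperson, product_id
    """
).fetchall()

# CTE form: aggregate first, then apply the window function to the result.
with_cte = conn.execute(
    """
    with s as (
        select salesperson, product_id, sum(price) as product_sales
        from sales
        group by salesperson, product_id
    )
    select salesperson, product_id, product_sales,
           sum(product_sales) over (partition by salesperson) as total_sales
    from s
    order by salesperson, product_id
    """
).fetchall()

print(nested == with_cte)
for row in with_cte:
    print(row)
```

Both queries do the same two steps; the CTE only gives the intermediate per-product totals a name.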

You can try the below -
select * from
(
select
salesperson,
product_id,
sum(price) over(partition by salesperson,product_id) as product_sales,
sum(price) over(partition by salesperson) as total_sales,
row_number() over(partition by salesperson,product_id order by sale_id) as rn
from sales s
)A where rn=1

Counting unique combinations up until a date - per month

I am looking into a table with transaction data of a two-sided platform, where you have buyers and sellers. I want to know the total number of unique combinations of buyers and sellers. Let's say Abe buys from Brandon in January: that's 1 combination. If Abe buys from Cece in February, that makes 2, but if Abe then buys from Brandon again, it's still 2.
My solution was to use the DENSE_RANK() function:
WITH
combos AS (
SELECT
t.buyerid, t.sellerid,
DENSE_RANK() OVER (ORDER BY t.buyerid, t.sellerid) AS combinations
FROM transactions t
WHERE t.transaction_date < '2018-05-01'
)
SELECT
MAX(combinations) AS total_combinations
FROM combos
This works fine. Each new combo gets a higher rank, and if you select the MAX of that result, you know the amount of unique combos.
However, I want to know this total amount of unique combos on a per month basis. The problem here is that if I group per transaction month, it only counts the unique combos in that month. In the example of Abe, it would be a unique combo in January, and then another combo in the next month, because that's how grouping works in SQL.
Example:
transaction_date buyerid sellerid
2018-01-03 3828 219
2018-01-08 2831 123
2018-02-10 3828 219
The output of DENSE_RANK() named combinations over all these rows is:
transaction_date buyerid sellerid combinations
2018-01-03 3828 219 2
2018-01-08 2831 123 1
2018-02-10 3828 219 2
And therefore, when selecting the MAX of combinations, you know the number of unique buyer/seller combos, which is 2 here.
However, I would like to see a running total of unique combos up until each start of the month, for all months until now. But, when we group on month, it would go like this:
transaction_date buyerid sellerid month combinations
2018-01-03 3828 219 jan 2
2018-01-08 2831 123 jan 1
2018-02-10 3828 219 feb 1
While I actually would want an output like:
month total_combinations_at_month_start
jan 0
feb 2
mar 2
How should I solve this? I've tried to find help on all kinds of window functions, but no luck until now. Thanks!

Here is one method:
WITH combos AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid ORDER BY t.transaction_date) as combo_seqnum,
ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid, date_trunc('month', t.transaction_date) ORDER BY t.transaction_date) as combo_month_seqnum
FROM transactions t
WHERE t.transaction_date < '2018-05-01'
)
SELECT 'Overall' as which, COUNT(*)
FROM combos
WHERE combo_seqnum = 1
UNION ALL
SELECT to_char(transaction_date, 'YYYY-MM'), COUNT(*)
FROM combos
WHERE combo_month_seqnum = 1
GROUP BY to_char(transaction_date, 'YYYY-MM');
This puts the results in separate rows. If you want a cumulative number and number per month:
SELECT to_char(transaction_date, 'YYYY-MM'),
SUM( (combo_month_seqnum = 1)::int ) as uniques_in_month,
SUM(SUM( (combo_seqnum = 1)::int )) OVER (ORDER BY to_char(transaction_date, 'YYYY-MM')) as uniques_through_month
FROM combos
GROUP BY to_char(transaction_date, 'YYYY-MM')
Here is a rextester illustrating the solution.
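
To see the numbers on the three sample rows from the question, here is a sketch of the same first-occurrence idea run through Python's sqlite3 module (an assumption, used in place of Postgres; SQLite 3.25+ is needed). Two things are adapted for SQLite: strftime('%Y-%m', ...) replaces date_trunc/to_char, and the ::int casts are dropped because SQLite comparisons already yield 0 or 1. The aggregation is also split into a second CTE named monthly (a name made up here) instead of nesting SUM(SUM(...)).

```python
import sqlite3

# In-memory SQLite database standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_date TEXT, buyerid INTEGER, sellerid INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2018-01-03", 3828, 219),
        ("2018-01-08", 2831, 123),
        ("2018-02-10", 3828, 219),
    ],
)

per_month = conn.execute(
    """
    WITH combos AS (
        SELECT strftime('%Y-%m', transaction_date) AS month,
               -- 1 on the first ever transaction of a buyer/seller pair
               ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid
                                  ORDER BY transaction_date) AS combo_seqnum,
               -- 1 on the first transaction of the pair within each month
               ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid,
                                               strftime('%Y-%m', transaction_date)
                                  ORDER BY transaction_date) AS combo_month_seqnum
        FROM transactions
    ),
    monthly AS (
        SELECT month,
               SUM(combo_month_seqnum = 1) AS uniques_in_month,
               SUM(combo_seqnum = 1) AS new_combos
        FROM combos
        GROUP BY month
    )
    SELECT month,
           uniques_in_month,
           -- running total of pairs seen for the first time
           SUM(new_combos) OVER (ORDER BY month) AS uniques_through_month
    FROM monthly
    ORDER BY month
    """
).fetchall()

for row in per_month:
    print(row)
```

Note that uniques_through_month counts pairs seen up to the end of each month. The question asks for the count at the start of the month, which would be the previous month's value.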

SQL: COUNT the number of purchases between the first purchase and the following 10 months

Every customer has a different first purchase date. I want to COUNT the number of purchases each customer makes in the 10 months following their first purchase.
Sample table:
TransactionID  Client_name  PurchaseDate  Revenue
11             John Lee     10/13/2014    327
12             John Lee     9/15/2015     873
13             John Lee     11/29/2015    1,938
14             Rebort Jo    8/18/2013     722
15             Rebort Jo    5/21/2014     525
16             Rebort Jo    2/4/2015      455
17             Rebort Jo    3/20/2016     599
18             Tina Pe      10/8/2014     213
19             Tina Pe      6/10/2016     3,494
20             Tina Pe      8/9/2016      411
My code below just uses the ROW_NUMBER function to identify the first purchase, but I don't know how to do the calculation. Or is there a better way to do it?
SELECT client_name,
purchasedate,
Dateadd(month, 10, purchasedate) TenMonth,
Row_number()
OVER (
partition BY client_name
ORDER BY client_name) RM
FROM mytable

You might try something like this - I assume you're using SQL Server from the presence of DATEADD() and the fact that you're using a window function (ROW_NUMBER()):
WITH myCTE AS (
SELECT TransactionID, Client_name, PurchaseDate, Revenue
, MIN(PurchaseDate) OVER ( PARTITION BY Client_name ) AS min_PurchaseDate
FROM myTable
)
SELECT Client_name, COUNT(*)
FROM myCTE
WHERE PurchaseDate <= DATEADD(month, 10, min_PurchaseDate)
GROUP BY Client_name
Here I'm creating a common table expression (CTE) with all the data, including the date of first purchase, then I grab a count of all the purchases within a 10-month timeframe.
Hope this helps.
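
Here is a rough way to check the counts on the sample data, using Python's sqlite3 module rather than SQL Server (an assumption, so it runs anywhere with SQLite 3.25+). Two adaptations: the dates are rewritten in ISO format so they compare correctly as text, and SQLite's date(x, '+10 months') stands in for DATEADD(month, 10, x). The alias purchases is a name chosen for this example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (TransactionID INTEGER, Client_name TEXT, PurchaseDate TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (11, "John Lee", "2014-10-13", 327),
        (12, "John Lee", "2015-09-15", 873),
        (13, "John Lee", "2015-11-29", 1938),
        (14, "Rebort Jo", "2013-08-18", 722),
        (15, "Rebort Jo", "2014-05-21", 525),
        (16, "Rebort Jo", "2015-02-04", 455),
        (17, "Rebort Jo", "2016-03-20", 599),
        (18, "Tina Pe", "2014-10-08", 213),
        (19, "Tina Pe", "2016-06-10", 3494),
        (20, "Tina Pe", "2016-08-09", 411),
    ],
)

counts = conn.execute(
    """
    WITH myCTE AS (
        SELECT Client_name, PurchaseDate,
               -- first purchase date of the client, repeated on every row
               MIN(PurchaseDate) OVER (PARTITION BY Client_name) AS min_PurchaseDate
        FROM myTable
    )
    SELECT Client_name, COUNT(*) AS purchases
    FROM myCTE
    WHERE PurchaseDate <= date(min_PurchaseDate, '+10 months')
    GROUP BY Client_name
    ORDER BY Client_name
    """
).fetchall()

for row in counts:
    print(row)
```

The count includes the first purchase itself; subtract 1 if only the follow-up purchases are wanted.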

Give this a whirl ... Subquery to get the min purchase date, then LEFT JOIN to the main table to have a WHERE clause for the ten month date range, then count.
SELECT mt.Client_name, COUNT(mt.PurchaseDate) as PurchaseCountFirstTenMonths
FROM myTable mt
LEFT JOIN (
SELECT Client_name, MIN(PurchaseDate) as MinPurchaseDate FROM myTable GROUP BY Client_name) mtmin
ON mt.Client_name = mtmin.Client_name
WHERE mt.PurchaseDate >= mtmin.MinPurchaseDate AND mt.PurchaseDate <= DATEADD(month, 10, mtmin.MinPurchaseDate)
GROUP BY mt.Client_name
ORDER BY mt.Client_name
btw I'm guessing there's some kind of ClientID involved, as nine character full name runs the risk of duplicates.

Multiple Top 1 based on count

I have a table with multiple customers and multiple transaction dates.
Cust_ID Trans_Date
------- ----------
C01 2012-02-18
C01 2012-02-27
C01 2012-03-09
C02 2012-02-15
C02 2012-03-09
C03 2012-03-30
C01 2013-01-14
C02 2013-02-21
C03 2013-01-15
C03 2013-03-07
I want to find the customer with the most transactions in each year, and the number of transactions for that customer.
Below is the result I am expecting.
Year Cust_ID nTrans
---- ------- ------
2012 C01 3
2013 C03 2
Can anybody help with the script? SQL Server version 2012.
Thanking you in advance,
Thomas

This is the "greatest N per group" problem. It's usually solved with row_number().
;WITH CTE AS (
SELECT YEAR(Trans_Date) Year,
Cust_ID,
COUNT(*) as nTrans,
ROW_NUMBER() OVER (PARTITION BY YEAR(Trans_Date) ORDER BY COUNT(*) DESC) rn
FROM Table
GROUP BY YEAR(Trans_Date),
Cust_ID
)
SELECT Year,
Cust_ID,
nTrans
FROM CTE
WHERE rn = 1
ORDER BY Year
Strictly speaking, the ROW_NUMBER() here isn't ordered in a deterministic way. As written, if there's a tie in the count, the query just returns one Cust_ID, but there's no guarantee which ID will be returned. It should either be ORDER BY COUNT(*) DESC, Cust_ID to make the results consistent, or you should use RANK() or DENSE_RANK() to allow for ties.
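
To illustrate the tie-friendly variant, here is a sketch that uses RANK() on the sample data from the question. It runs on Python's sqlite3 module instead of SQL Server 2012 (an assumption; SQLite 3.25+ is required), so strftime('%Y', ...) replaces YEAR(). The table name Transactions and the CTE names counts and ranked are made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Cust_ID TEXT, Trans_Date TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [
        ("C01", "2012-02-18"), ("C01", "2012-02-27"), ("C01", "2012-03-09"),
        ("C02", "2012-02-15"), ("C02", "2012-03-09"), ("C03", "2012-03-30"),
        ("C01", "2013-01-14"), ("C02", "2013-02-21"),
        ("C03", "2013-01-15"), ("C03", "2013-03-07"),
    ],
)

top_per_year = conn.execute(
    """
    WITH counts AS (
        -- transactions per customer per year
        SELECT CAST(strftime('%Y', Trans_Date) AS INTEGER) AS Year,
               Cust_ID,
               COUNT(*) AS nTrans
        FROM Transactions
        GROUP BY Year, Cust_ID
    ),
    ranked AS (
        -- RANK() gives tied customers the same rank, so ties are all kept
        SELECT Year, Cust_ID, nTrans,
               RANK() OVER (PARTITION BY Year ORDER BY nTrans DESC) AS rnk
        FROM counts
    )
    SELECT Year, Cust_ID, nTrans
    FROM ranked
    WHERE rnk = 1
    ORDER BY Year, Cust_ID
    """
).fetchall()

for row in top_per_year:
    print(row)
```

With this data there are no ties, so each year yields one row; if two customers shared the top count in a year, both would be returned.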

I haven't had the chance to test it but your solution should look something like this:
SELECT TOP (1) WITH TIES YEAR(Trans_Date) AS Year, Cust_ID, COUNT(*) AS nTrans
FROM Transactions
GROUP BY YEAR(Trans_Date), Cust_ID
ORDER BY RANK() OVER (PARTITION BY YEAR(Trans_Date) ORDER BY COUNT(*) DESC);
Have a look at Group by functions in SQL.

You can use the max() function on the column to find the largest value in the column. In this case you can apply max(nTrans)
eg:
SELECT MAX(column_name) FROM table_name;

Querying table with group by and sum

I have the following table called Orders
Order | Date | Total
------------------------------------
34564 | 03/05/2015| 15.00
77456 | 01/01/2001| 3.00
25252 | 02/02/2008| 4.00
34564 | 03/04/2015| 7.00
I am trying to select the distinct orders, sum the total, and group by order #. The problem is that it shows two records for 34564 because they have different dates. How can I sum repeated orders and pick only the max(date), but still sum the total of the two instances?
I.e., the desired result:
Order | Date | Total
------------------------------------
34564 | 03/05/2015| 22.00
77456 | 01/01/2001| 3.00
25252 | 02/02/2008| 4.00
Tried:
SELECT DISTINCT Order, Date, SUM(Total)
FROM Orders
GROUP BY Order, Date
Of course the above won't work, as you can see, but I am not sure how to achieve what I intend.

SELECT [order], MAX(date) AS date, SUM(total) AS total
FROM Orders o
GROUP BY [order]

You can use the MAX aggregate function to choose the latest Date to appear from each Order group:
SELECT [Order], MAX(Date) AS Date, SUM(Total) AS Total
FROM Orders
GROUP BY [Order]

Simplest query should be:
SELECT [Order], MAX(Date), SUM(Total)
FROM Orders
GROUP BY [Order]

You can use SUM and MAX together:
SELECT
[Order],
[Date] = MAX([Date]),
Total = SUM(Total)
FROM tbl
GROUP BY [Order]
A word of advice, please refrain from using reserved words like Order and Date for your columns and table names.

Just replace Date with MAX(Date) in your SELECT clause and drop Date from the GROUP BY.
Try this:
SELECT [Order], MAX(Date), SUM(Total)
FROM Orders
GROUP BY [Order]
