Finding a date with the largest sum - sql

I have a database of transactions, accounts, profit/loss, and date. I need to find the dates which the largest profit occurs by account. I have already found a way to find these actually max/min values but I can't seem to be able to pull the actual date from it. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.

Try something like this to get the min and max amount for each customer and the date it happened.
WITH max_amount as (
SELECT accountnum, max(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
),
min_amount as (
SELECT accountnum, min(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
)
SELECT t.accountnum, ma.amount, ma.date, mi.amount, ma.date
FROM table t
JOIN max_amount ma
ON ma.accountnum = t.accountnum
JOIN min_amount mi
ON mi.accountnum = t.accountnum
If you want the data for just this year you could add a where clause to the end of the statement
WHERE t.date > '02-Jan-13'

The easiest way to do this is using window/analytic functions. These are ANSI standard and most databases support them (MySQL and Access being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = min_amount then date end) as max_amount_date,
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by accountnum, min_amount, max_amount;
order by accountnum
The subquery calculates the minimum and maximum amount for each account, using min() as a window function. The outer query selects these values. It then uses conditional aggregation to get the first date when each of those values occurred.

;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1

Related

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020 if someone bought a product in category_A in 2016 first, it will be counted as a new uesr in 2016 in category_A although this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as bleow
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate)
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from mindate) yr, category,
countdistinctif(user_id, date = mindate) num_new,
countdistinct(user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from mindate)
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
Getting the total count and cumulative count is straight forward. To get the cumulative distinct count, use lag to check if the email had a row with a previous date, and set the flag to 0 so it would be ignored during a running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
DEMO HERE
Here is a solution that does not uses sum over, neither lag... And does produces the correct results.
Hence it could appear as simpler to read and to maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

SQL: Dividing daily data by a monthly index

I have daily transaction data that is a product of this query:
SELECT transaction_date ,
Merchant,
Amount
into transaction.table
FROM source.table
WHERE (DESCRIPTION iLIKE '%Criteria%')
The field transaction_date is in the format of DATE (yyyy-MM-dd).
What I would like to do is take each row/transaction in transaction.table and divide Amount by a value tied to its RESPECTIVE month (this is key) contained in a separate table called Calendar.
The separate table called Calendar is queried from the same source.table as below:
select month,count(*) as distinct_month
into source.Calendar
from
(
select Population, to_char(optimized_transaction_date, 'YYYY-MM') as month
FROM source.table
group by Population, to_char(optimized_transaction_date, 'YYYY-MM')
)
group by month
My goal is to get a value for each day: Amount / distinct_month.
The key part is matching the daily data (transaction_date) in the first query with the monthly data in the second query (month).
Note that month from second query is a varchar whereas transact_date in first query is DATE.
I think you want something like this:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT transaction_date, Merchant, Amount, Description,
(Amount / count(distinct population) over (partition by to_char(transaction_date, 'YYYY-MM')
) as newval
FROM source.table
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
You only need the subquery because the total is calculated over all the data, without the filter condition.
EDIT:
Oops, I forgot that Postgres doesn't support COUNT(DISTINCT) as a window function. So do:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT t.*,
(Amount / SUM( (seqnum = 1)::int) OVER (partition by to_char(transaction_date, 'YYYY-MM') )
) as newval
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY partition by to_char(transaction_date, 'YYYY-MM'), population ORDER BY population) as seqnum
FROM source.table t
) t
) t
WHERE DESCRIPTION iLIKE '%Criteria%';

SQL: aggregation (group by like) in a column

I have a select that group by customers spending of the past two months by customer id and date. What I need to do is to associate for each row the total amount spent by that customer in the whole first week of the two month time period (of course it would be a repetition for each row of one customer, but for some reason that's ok ). do you know how to do that without using a sub query as a column?
I was thinking using some combination of OVER PARTITION, but could not figure out how...
Thanks a lot in advance.
Raffaele
Query:
select customer_id, date, sum(sales)
from transaction_table
group by customer_id, date
If it's a specific first week (e.g. you always want the first week of the year, and your data set normally includes January and February spending), you could use sum(case...):
select distinct customer_id, date, sum(sales) over (partition by customer_ID, date)
, sum(case when date between '1/1/15' and '1/7/15' then Sales end)
over (partition by customer_id) as FirstWeekSales
from transaction_table
In response to the comments below; I'm not sure if this is what you're looking for, since it involves a subquery, but here's my best shot:
select distinct a.customer_id, date
, sum(sales) over (partition by a.customer_ID, date)
, sum(case when date between mindate and dateadd(DD, 7, mindate)
then Sales end)
over (partition by a.customer_id) as FirstWeekSales
from transaction_table a
left join
(select customer_ID, min(date) as mindate
from transaction_table group by customer_ID) b
on a.customer_ID = b.customer_ID

Get Last Record From Each Month

Unfortunately SQL doesn't come to me very easily. I have two tables, a Loan table and a LoanPayments table.
LoanPayments Table:
ID (Primary Key), LoanID (matches an ID on loan table), PaymentDate, Amount, etc.
I need a sql statement that can give me the last payment entered on each month (if there is one). My current statement isn't giving me the results. There is also the problem that sometimes there will be a tie for the greatest date in that month, so I need to be able to deal with that too (my idea was to select the largest ID in the case of a tie).
This is what I have so far (I know it's wrong but I don't know why.):
SELECT lp.ID, lp.LoanID, lp.PaymentDate
FROM LoanPayments lp
WHERE lp.PaymentDate in (
SELECT DISTINCT MAX(PaymentDate) as PaymentDate
FROM LoanPayments
WHERE IsDeleted = 0
AND ReturnDate is null
GROUP BY YEAR(PaymentDate), Month(PaymentDate)
)
AND CAST(PaymentDate as date) >= CAST(DATEADD(mm, -24, GETDATE()) as date)
The last part is just filtering it so I only get the last 24 months of payments. Thanks for your help and for taking the time to help me with this issue.
You could use the ROW_NUMBER() function here:
SELECT *
FROM (SELECT lp.ID, lp.LoanID, lp.PaymentDate
, ROW_NUMBER() OVER (PARTITION BY YEAR(PaymentDate), Month(PaymentDate) ORDER BY PaymentDate DESC) 'RowRank'
FROM LoanPayments lp
)sub
WHERE RowRank = 1
That's just the most recent PaymentDate for each month, if you wanted it by LoanID you'd add LoanID to the PARTITION BY list. If you were interested in preserving ties you could use RANK() instead of ROW_NUMBER()
STEP 1: Use a windowing function to add a column that holds that max PaymentDate by month
SELECT
ID,
LoanID,
PaymentDate,
MAX(PaymentDate) OVER(PARTITION BY YEAR(PaymentDate), MONTH(PaymentDate)) AS MaxPaymentDate,
ROW_NUMBER() OVER(PARTITION BY PaymentDate ORDER BY ID) AS TieBreaker
FROM LoanPayments
WHERE IsDeleted = 0
AND ReturnDate is null
STEP 2: Filter those results to just the rows you want
SELECT ID,LoanID,PaymentDate
FROM (
SELECT
ID,
LoanID,
PaymentDate,
MAX(PaymentDate) OVER(PARTITION BY YEAR(PaymentDate), MONTH(PaymentDate)) AS MaxPaymentDate,
ROW_NUMBER() OVER(PARTITION BY PaymentDate ORDER BY ID) AS TieBreaker
FROM LoanPayments
WHERE IsDeleted = 0
AND ReturnDate is null
) t1
WHERE PaymentDate = MaxPaymentDate AND TieBreaker = 1
This method is more efficient than doing a self-join.