SQL get the first date the condition exists - sql

I have a collections table:
subscription_id, transaction_date_est, start_date_est, end_date_est, transaction_id, invoice_number, user_id, transaction_amount, plan_code, transaction_type
I have users who purchased a subscription; for each subscription I generate a subscription ID. Each subscription has one or more invoices.
For each subscription, I am trying to use a window function to find when the user's accumulated amount first passed $100.
My Expected results: subscription_id, transaction_date
I tried:
SELECT subscription_id, MAX(date) transaction_date
FROM (SELECT subscription_id,
SUM(usd_price) LAG (partition by subscription_id
ORDER BY first_transaction_date ASC) AS total
GROUP BY(current_period_started_at)
HAVING BY total >= 100
ORDER BY usd_price)
but I can't extract the first date on which the user passed $100.

I think you want:
select c.*
from (
select c.*, sum(usd_price) over(partition by subscription_id order by transaction_date) sum_usd_price
from collections c
) c
where sum_usd_price >= 100 and sum_usd_price - usd_price < 100
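The running-total filter above can be tried locally. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a SQLite build with window function support (3.25 or later); the sample rows and the reduced three-column table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down version of the collections table.
conn.execute(
    "CREATE TABLE collections ("
    "subscription_id INTEGER, transaction_date TEXT, usd_price REAL)"
)
conn.executemany(
    "INSERT INTO collections VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 40.0),
        (1, "2020-02-01", 40.0),
        (1, "2020-03-01", 40.0),  # running total reaches 120 here
        (1, "2020-04-01", 40.0),
        (2, "2020-01-15", 30.0),  # subscription 2 never reaches 100
        (2, "2020-02-15", 30.0),
    ],
)

# Keep only the row where the running total first reaches 100:
# the total is >= 100 now, but was < 100 before this row was added.
crossing_rows = conn.execute(
    """
    SELECT subscription_id, transaction_date
    FROM (
        SELECT c.*,
               SUM(usd_price) OVER (PARTITION BY subscription_id
                                    ORDER BY transaction_date) AS sum_usd_price
        FROM collections c
    ) AS c
    WHERE sum_usd_price >= 100
      AND sum_usd_price - usd_price < 100
    ORDER BY subscription_id
    """
).fetchall()

print(crossing_rows)  # [(1, '2020-03-01')]
```

Subscriptions that never reach the threshold simply produce no row, so an outer join back to the subscription list would be needed if they should appear with a NULL date.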

Related

PostgreSQL Query To Obtain Value that Occurs more than once in 12 months

I have the following query to return the number of users that booked a flight at least twice, but I need to identify those which have booked a flight more than once in the range of 12 months
SELECT COUNT(*)
FROM sales
WHERE customer in
(
SELECT customer
FROM sales
GROUP BY customer
HAVING COUNT(*) > 1
)
You would use window functions. The simplest method is lag():
select count(distinct customer)
from (select s.*,
lag(date) over (partition by customer order by date) as prev_date
from sales s
) s
where prev_date > s.date - interval '12 month';
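As a rough way to check the lag() approach, here is a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). SQLite has no interval type, so the sketch swaps in date(..., '-12 months') for the PostgreSQL interval arithmetic; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("alice", "2018-01-10"),
        ("alice", "2018-06-10"),  # five months after her previous booking
        ("bob", "2017-01-10"),
        ("bob", "2019-01-10"),    # two years after his previous booking
        ("carol", "2019-03-01"),  # only one booking
    ],
)

# Compare each booking with the same customer's previous booking.
repeat_customers = conn.execute(
    """
    SELECT COUNT(DISTINCT customer)
    FROM (
        SELECT s.*,
               LAG(s.date) OVER (PARTITION BY s.customer
                                 ORDER BY s.date) AS prev_date
        FROM sales s
    ) AS s
    WHERE prev_date > date(s.date, '-12 months')
    """
).fetchone()[0]

print(repeat_customers)  # 1
```

Only consecutive bookings need to be compared: if any two bookings fall within 12 months of each other, so do two adjacent ones.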
At the cost of a self-join, @AdrianKlaver's answer can adapt to any 12-month period.
SELECT COUNT(DISTINCT customer) FROM
(SELECT s1.customer
FROM sales s1
JOIN sales s2
ON s1.customer = s2.customer
AND s1.ticket_id <> s2.ticket_id
AND s2.date_field BETWEEN s1.date_field AND (s1.date_field + interval '1 year')
-- each joined row is already a pair of distinct tickets at most a year apart
GROUP BY s1.customer) AS subquery;
A stab at it with a made up date field:
SELECT COUNT(*)
FROM sales
WHERE customer in
(
SELECT customer
FROM sales
WHERE date_field BETWEEN '01/01/2019' AND '12/31/2019'
GROUP BY customer
HAVING COUNT(*) > 1
)

Add columns to SQL query and filter by min(date) and sum(price)

I am trying to generate a list of users whose first purchase was in December 2018 and who have spent over 100 dollars since then. I can generate the list of users, but I can't determine what their first purchase was or pull in other columns. The problem appears to be that the columns I want to include are neither grouped nor aggregated. I'm new to SQL, so I'm hoping someone can point me in the right direction.
Here's my code to generate the list I want to add more columns to:
select billing_address.name, contact_email, min(processed_at) as First_Purchase_Date, sum(total_price) as Total_Revenue
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
FROM `table.orders`
) orders -- identify duplicate rows
WHERE instance = 1
group by contact_email, billing_address.name
having min(processed_at) between '2019-01-01 00:00:00 UTC' and '2019-02-01 00:00:00 UTC' and sum(total_price) > 100
order by sum(total_price) desc
Is there some way I can modify this to pull each user's purchase from this list into a separate row and include more columns? So I'd pull in each user (and ALL of their purchases) who has a min(processed_at) in December 2018 AND their sum(total_price) > 100? something like this:
SELECT contact_email, billing_address, line_items, min(processed_at), sum(total_price) OVER (PARTITION BY contact_email)
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
FROM `table.orders`
) orders -- identify duplicate rows
WHERE instance = 1
However, the sum(total_price) doesn't work in this case and I can't filter by min(processed_at). Can someone guide me in the right direction?
I think you should use window functions instead of aggregation. You can compute the date of the first purchase and the total amount spent on the fly in a subquery, without aggregating (your original group by columns become the partition columns of the window functions). Then you can use this information to filter in the outer query.
This should get you close to what you want:
select o.*
from (
select
o.*,
min(processed_at) over(partition by contact_email, billing_address) min_processed_at,
sum(total_price) over(partition by contact_email, billing_address) sum_total_price
from (
select
o.*,
row_number() over(partition by id) instance
from orders o
) o
where instance = 1
) o
where
min_processed_at between '2019-01-01 00:00:00 UTC' and '2019-02-01 00:00:00 UTC'
and sum_total_price > 100
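To see how window functions keep every order row while still filtering on per-customer values, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later). It filters on each customer's first purchase date, skips the duplicate-removal step, and uses an invented, simplified orders table partitioned by contact_email only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified orders table.
conn.execute(
    "CREATE TABLE orders (contact_email TEXT, processed_at TEXT, total_price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a@example.com", "2019-01-05", 60.0),
        ("a@example.com", "2019-03-01", 70.0),   # first order in window, total 130
        ("b@example.com", "2018-11-01", 200.0),  # first order is too early
        ("b@example.com", "2019-01-10", 50.0),
        ("c@example.com", "2019-01-20", 30.0),   # total is too small
    ],
)

# The window functions attach per-customer values to every order row,
# so the outer query can filter on them without collapsing rows.
qualifying_orders = conn.execute(
    """
    SELECT contact_email, processed_at, total_price
    FROM (
        SELECT o.*,
               MIN(processed_at) OVER (PARTITION BY contact_email) AS min_processed_at,
               SUM(total_price)  OVER (PARTITION BY contact_email) AS sum_total_price
        FROM orders o
    ) AS o
    WHERE min_processed_at >= '2019-01-01'
      AND min_processed_at <  '2019-02-01'
      AND sum_total_price > 100
    ORDER BY processed_at
    """
).fetchall()

print(qualifying_orders)
# [('a@example.com', '2019-01-05', 60.0), ('a@example.com', '2019-03-01', 70.0)]
```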
Your question was a bit unclear as you did not provide much detail about your input tables or your expected output, so this is a guess.
The following query gets all transactions from users who meet the criteria:
-- BigQuery StandardSQL
with ordered_orders as (
--rank each ID by processed_at date first to last
select *, row_number() over(partition by id order by processed_at asc) as rn
from `table.orders`
),
first_criteria as (
-- select IDs where first processed_at date is in 2018-12
select id, processed_at as first_order_date
from ordered_orders
where rn = 1
and extract(year from processed_at) = 2018
and extract(month from processed_at) = 12
),
second_criteria as (
-- further select IDs who meet first criteria and have a total of > 100
select id, sum(total_price) as total_revenue
from ordered_orders
inner join first_criteria using(id)
group by id
having total_revenue > 100
),
orders_with_criteria as (
-- get all orders for users who meet both criteria
select ordered_orders.* except(rn), first_order_date, total_revenue
from ordered_orders
inner join first_criteria using(id)
inner join second_criteria using(id)
)
-- select any fields you want
select * from orders_with_criteria
I prefer liberal use of CTEs in cases like this to keep the logic clear.
I also wouldn't be surprised if this query doesn't work as you intend. I think it is highly doubtful that the ID column in your orders table refers to the customer id, which is what you/we are partitioning on. Depending on who set up your tables, id probably refers to the order id. If you have a customer_id (or account #, etc), then I would use that instead of id in the query.
No need to use row_number() in BigQuery for this:
SELECT billing_address.name, contact_email,
MIN(processed_at) as First_Purchase_Date,
SUM(total_price) as Total_Revenue,
ARRAY_AGG(o ORDER BY processed_at LIMIT 1) as first_order
FROM `table.orders` o
-- "instance" exists only in the ROW_NUMBER() subquery from the question, so it cannot be filtered on here
GROUP BY contact_email, billing_address.name
HAVING MIN(processed_at) >= '2019-01-01 00:00:00 UTC' AND
MIN(processed_at) < '2019-02-01 00:00:00 UTC' AND
SUM(total_price) > 100
ORDER BY SUM(total_price) desc;
This returns the entire first order as a struct. You can select specific columns, if you prefer.

Tag first observation for a user over products

I have a data table with the columns user_id, Product, and purchase_date (a date column in standard yyyy-mm-dd form).
Users in this table purchase multiple items of the same product (and different products) in the same month, so I needed to be able to capture the first time that they bought a particular product and then count each product by month.
I did it with the following:
SELECT yr, mo, COUNT(DISTINCT CASE WHEN Product = 'product_a'
THEN user_id END) AS product_a
FROM
(
SELECT YEAR(min(purchase_date)) AS yr, MONTH(min(purchase_date)) AS mo,
DAY(min(purchase_date)) AS dy, user_id, Product
FROM daily_purchases
GROUP BY user_id, Product
) b
GROUP BY yr, mo
ORDER BY yr, mo
This seems to work fine and capture what I am looking for. Does anyone have any suggestions - or is this the most appropriate way to go about it? Thanks!
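The approach in the question (first purchase per user and product, then a distinct count per month) can be checked on toy data. This is a sketch in Python's sqlite3 module; SQLite has no YEAR() or MONTH() functions, so strftime() stands in for them, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_purchases (user_id TEXT, Product TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO daily_purchases VALUES (?, ?, ?)",
    [
        ("u1", "product_a", "2020-01-05"),
        ("u1", "product_a", "2020-01-20"),  # repeat purchase, same month
        ("u1", "product_a", "2020-02-01"),  # repeat purchase, later month
        ("u2", "product_a", "2020-02-10"),
        ("u1", "product_b", "2020-02-03"),
    ],
)

# Inner query: month of each user's first purchase of each product.
# Outer query: count users per month whose first product_a purchase fell there.
first_buyers = conn.execute(
    """
    SELECT yr, mo,
           COUNT(DISTINCT CASE WHEN Product = 'product_a' THEN user_id END) AS product_a
    FROM (
        SELECT strftime('%Y', MIN(purchase_date)) AS yr,
               strftime('%m', MIN(purchase_date)) AS mo,
               user_id, Product
        FROM daily_purchases
        GROUP BY user_id, Product
    ) AS b
    GROUP BY yr, mo
    ORDER BY yr, mo
    """
).fetchall()

print(first_buyers)  # [('2020', '01', 1), ('2020', '02', 1)]
```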
I don't have any source data to test with, but here is how I would approach it. It may need some tweaking:
Select yr, mo, Product, COUNT(*) as firstBuyers, SUM(cntBuys) as cntBuys From
(
SELECT YEAR(purchase_date) as yr, MONTH(purchase_date) as mo, user_id, Product,
-- firstBuy = 1 marks each user's first purchase of each product
ROW_NUMBER() over (partition by user_id, Product order by purchase_date) as firstBuy
,COUNT(*) over (partition by user_id, Product) as cntBuys
FROM daily_purchases
) a
Where a.firstBuy = 1
Group by a.yr, a.mo, a.Product

Days Since Last Help Ticket was Filed

I am trying to create a report to show me the last date a customer filed a ticket.
Customers can file dozens of tickets. I want to know when the last ticket was filed and show how many days it's been since they have done so.
The fields I have are:
Customer,
Ticket_id,
Date_Closed
All from the Same table "Tickets"
I'm thinking I want to do a ranking of tickets by min date? I tried this query to grab something but it's giving me all the tickets from the customer. (I'm using SQL in a product called Domo)
select * from (select *, rank() over (partition by "Ticket_id"
order by "Date_Closed" desc) as date_order
from tickets ) zd
where date_order = 1
This should be simple enough,
SELECT customer,
MAX (date_closed) last_date,
ROUND((SYSDATE - MAX (date_closed)),0) days_since_last_ticket_logged
FROM tickets
GROUP BY customer
select zd.Customer, datediff(day, zd.date_closed, current_date) as days_since_last_tkt
from
(select *, rank() over (partition by Customer order by "Date_Closed" desc) as date_order
from tickets) zd
where zd.date_order = 1
Or you can simply do
select customer, datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer
To select other fields
select t.*
from tickets t
join (select customer, max(Date_closed) as mxdate,
datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer) tt
on t.customer = tt.customer and tt.mxdate = t.date_closed
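The group-by version above can be sketched in Python's sqlite3 module. SQLite has no datediff(), so the sketch uses julianday() subtraction instead, and it passes a fixed "today" value as a parameter so the result is reproducible; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (customer TEXT, ticket_id INTEGER, date_closed TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("acme", 1, "2020-01-01"),
        ("acme", 2, "2020-01-20"),
        ("globex", 3, "2020-01-10"),
    ],
)

# One row per customer: latest close date and whole days elapsed since then.
# :today is a stand-in for current_date so the output does not change over time.
last_tickets = conn.execute(
    """
    SELECT customer,
           MAX(date_closed) AS last_date,
           CAST(julianday(:today) - julianday(MAX(date_closed)) AS INTEGER)
               AS days_since_last_tkt
    FROM tickets
    GROUP BY customer
    ORDER BY customer
    """,
    {"today": "2020-01-31"},
).fetchall()

print(last_tickets)
# [('acme', '2020-01-20', 11), ('globex', '2020-01-10', 21)]
```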
I would do this with a simple sub-query to select the last closed date for the customer. Then compare this to today with datediff() to get the number of days since last closed.
Select
LastTicket.Customer,
LastTicket.LastClosedDate,
DateDiff(day,LastTicket.LastClosedDate,getdate()) as DaysSinceLastClosed
From
(select
tickets.customer,
max(tickets.Date_Closed) as LastClosedDate
from tickets
Group By tickets.Customer) as LastTicket
Based on the responses this is what I did:
select "Customer",
Max("date_closed") "last_date",
round(datediff(DAY, CURRENT_DATE, max("date_closed")), 0) as "Closed_date"
from tickets
group by "Customer"
ORDER BY "Customer"

Finding a date with the largest sum

I have a database of transactions, accounts, profit/loss, and date. I need to find the dates on which the largest profit occurs for each account. I have already found a way to get the actual max/min values, but I can't pull the corresponding date. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.
Try something like this to get the min and max amount for each customer and the date it happened.
WITH max_amount as (
SELECT accountnum, max(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
),
min_amount as (
SELECT accountnum, min(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
)
SELECT t.accountnum, ma.amount, ma.date, mi.amount, mi.date
FROM table t
JOIN max_amount ma
ON ma.accountnum = t.accountnum
JOIN min_amount mi
ON mi.accountnum = t.accountnum
If you want the data for just this year you could add a where clause to the end of the statement
WHERE t.date > '02-Jan-13'
The easiest way to do this is using window/analytic functions. These are ANSI standard and most databases support them (Access and MySQL versions before 8.0 being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = max_amount then date end) as max_amount_date
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by t.accountnum, min_amount, max_amount
order by t.accountnum;
The subquery calculates the minimum and maximum amount for each account, using min() and max() as window functions. The outer query selects these values. It then uses conditional aggregation to get the first date when each of those values occurred.
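Here is a small sketch of that window-plus-conditional-aggregation pattern in Python's sqlite3 module (assuming SQLite 3.25 or later). The table name ledger and the sample rows are made up, and the date filter is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ledger" is a hypothetical stand-in for the unnamed table in the question.
conn.execute("CREATE TABLE ledger (accountnum INTEGER, amount INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        (1, -50, "2013-02-01"),
        (1, 200, "2013-03-01"),
        (1, 200, "2013-04-01"),  # the maximum occurs twice; the earlier date wins
        (2, 10, "2013-02-10"),
    ],
)

# The window functions attach each account's min and max to every row; the CASE
# expressions then keep only dates on rows whose amount equals that min or max.
extremes = conn.execute(
    """
    SELECT t.accountnum, min_amount, max_amount,
           MIN(CASE WHEN amount = min_amount THEN date END) AS min_amount_date,
           MIN(CASE WHEN amount = max_amount THEN date END) AS max_amount_date
    FROM (
        SELECT l.*,
               MIN(amount) OVER (PARTITION BY accountnum) AS min_amount,
               MAX(amount) OVER (PARTITION BY accountnum) AS max_amount
        FROM ledger l
    ) AS t
    GROUP BY t.accountnum, min_amount, max_amount
    ORDER BY t.accountnum
    """
).fetchall()

print(extremes)
# [(1, -50, 200, '2013-02-01', '2013-03-01'), (2, 10, 10, '2013-02-10', '2013-02-10')]
```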
;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1