SQL: Dividing daily data by a monthly index - sql

I have daily transaction data that is a product of this query:
SELECT transaction_date ,
Merchant,
Amount
into transaction.table
FROM source.table
WHERE (DESCRIPTION iLIKE '%Criteria%')
The field transaction_date is in the format of DATE (yyyy-MM-dd).
What I would like to do is take each row/transaction in transaction.table and divide Amount by a value tied to its RESPECTIVE month (this is key) contained in a separate table called Calendar.
The separate table called Calendar is queried from the same source.table as below:
select month,count(*) as distinct_month
into source.Calendar
from
(
select Population, to_char(optimized_transaction_date, 'YYYY-MM') as month
FROM source.table
group by Population, to_char(optimized_transaction_date, 'YYYY-MM')
)
group by month
My goal is to get a value for each day: Amount / distinct_month.
The key part is matching the daily data (transaction_date) in the first query with the monthly data in the second query (month).
Note that month from the second query is a varchar, whereas transaction_date in the first query is a DATE.

I think you want something like this:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT transaction_date, Merchant, Amount, Description,
(Amount / count(distinct population) over (partition by to_char(transaction_date, 'YYYY-MM'))
) as newval
FROM source.table
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
You only need the subquery because the total is calculated over all the data, without the filter condition.
EDIT:
Oops, I forgot that Postgres doesn't support COUNT(DISTINCT) as a window function. So do:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT t.*,
(Amount / SUM( (seqnum = 1)::int) OVER (partition by to_char(transaction_date, 'YYYY-MM') )
) as newval
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY to_char(transaction_date, 'YYYY-MM'), population ORDER BY population) as seqnum
FROM source.table t
) t
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
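To try the ROW_NUMBER() trick on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for Postgres here, so to_char(..., 'YYYY-MM') becomes strftime('%Y-%m', ...) and the ::int cast is dropped, since SQLite already evaluates a comparison to 0 or 1. The table name source_table and the sample rows are made up for illustration, and the sketch assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; dates are stored as ISO-8601 text.
conn.executescript("""
CREATE TABLE source_table (
    transaction_date TEXT, merchant TEXT, amount REAL,
    description TEXT, population TEXT
);
INSERT INTO source_table VALUES
    ('2020-01-05', 'A', 10.0, 'Criteria x', 'p1'),
    ('2020-01-20', 'B', 20.0, 'other',      'p2'),
    ('2020-01-21', 'C',  6.0, 'Criteria y', 'p1'),
    ('2020-02-03', 'A',  9.0, 'Criteria z', 'p1'),
    ('2020-02-04', 'B',  5.0, 'other',      'p1');
""")

query = """
SELECT transaction_date, merchant, amount, newval
FROM (SELECT t.*,
             -- seqnum = 1 marks the first row of each (month, population) pair,
             -- so the windowed SUM is the number of distinct populations per month
             amount / SUM(seqnum = 1) OVER (PARTITION BY month) AS newval
      FROM (SELECT s.*,
                   strftime('%Y-%m', transaction_date) AS month,
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y-%m', transaction_date), population
                       ORDER BY population) AS seqnum
            FROM source_table s
           ) t
     ) u
WHERE description LIKE '%Criteria%'  -- the filter is applied after the window is computed
ORDER BY transaction_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sample, January has two distinct populations across all rows (p1 and p2), even though only p1 survives the description filter, so the January amounts are divided by 2. February has one population, so its amount is unchanged.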

Related

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, over 2015-2020, if someone first bought a product in category_A in 2016, they are counted as a new user in 2016 in category_A, even though this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get the result as below.
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate), category
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
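A small runnable sketch of the two-level aggregation, using Python's built-in sqlite3 module rather than BigQuery. That substitution is an assumption made only so the example can run locally: extract(year from ...) becomes strftime('%Y', ...), dates are stored as ISO text, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the columns the query needs are included.
conn.executescript("""
CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT);
INSERT INTO table_1 VALUES
    (1, 'A', '2015-03-01'),
    (1, 'A', '2016-05-01'),
    (1, 'B', '2016-02-01'),
    (2, 'A', '2016-07-01'),
    (2, 'B', '2016-08-01'),
    (3, 'A', '2015-09-09');
""")

new_users = conn.execute("""
SELECT strftime('%Y', mindate) AS yr, category, COUNT(*) AS num_new
FROM (SELECT user_id, category, MIN(date) AS mindate  -- first purchase per user and category
      FROM table_1
      GROUP BY user_id, category) t
GROUP BY strftime('%Y', mindate), category             -- then count users per first-purchase year
ORDER BY yr, category
""").fetchall()
print(new_users)
```

User 1 bought in category A in both 2015 and 2016 but is counted only once, in 2015, because the inner query keeps just the first purchase date.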
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from date) yr, category,
count(distinct if(date = mindate, user_id, null)) num_new,
count(distinct user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from date), category
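Here is a hedged sketch of the window-function variant, again run through Python's sqlite3 module as a stand-in for BigQuery. It assumes the intended result is grouped by the year of each purchase, so that a returning user counts toward num_total but not num_new. The conditional distinct count is written with CASE, which is portable; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows.
conn.executescript("""
CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT);
INSERT INTO table_1 VALUES
    (1, 'A', '2015-03-01'),
    (1, 'A', '2016-05-01'),
    (1, 'B', '2016-02-01'),
    (2, 'A', '2016-07-01'),
    (2, 'B', '2016-08-01'),
    (3, 'A', '2015-09-09');
""")

counts = conn.execute("""
SELECT strftime('%Y', date) AS yr, category,
       -- a user is new in the year that contains their first purchase in the category
       COUNT(DISTINCT CASE WHEN date = mindate THEN user_id END) AS num_new,
       COUNT(DISTINCT user_id) AS num_total
FROM (SELECT date, user_id, category,
             MIN(date) OVER (PARTITION BY user_id, category) AS mindate
      FROM table_1) t
GROUP BY strftime('%Y', date), category
ORDER BY yr, category
""").fetchall()
print(counts)
```

In 2016, category A has two active users, but only user 2 is new, since user 1 first bought in that category in 2015.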
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

First user by category

How can I count, by year, the new users for each category who bought in the category for the first time? For instance, for 2015-2020 by year, if someone bought for the first time in 2015, they are counted as a new user in 2015 but not in 2016-2020.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get the result as below.
You’ll want to start with a sub query to get the first date each user purchased in the category. This is a pretty straightforward group by problem:
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category;
Next, you can use Postgres’s date_trunc function to group by year and category, using your first query as a sub query:
select
category,
date_trunc('year', first_category_purchase) as first_purchase_year,
count(*) as num_new_users
from (
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category
) a
group by 1, 2;
In Postgres, one method is group by after a distinct on:
select date, count(*) as num_new_users
from (select distinct on (user_id, category) t.*
from t
order by user_id, category, date asc
) d
group by date
order by date;
If date is really a date and not a year, then you need something like to_char() or date_trunc() to convert it to a year.

BigQuery RATIO_TO_REPORT for all data no partition

I want to calculate the ratio of a specific field. I know that in legacy SQL I can use the RATIO_TO_REPORT function, e.g.:
SELECT
month,
RATIO_TO_REPORT(totalPoint) over (partition by month)
FROM (
SELECT
format_datetime('%Y-%m', ts) AS month,
SUM(point) AS totalPoint
FROM
`userPurchase`
GROUP BY
month
ORDER BY
month )
But I want the ratio calculated over all the data, without a partition, e.g. (this code does not work):
SELECT
month,
RATIO_TO_REPORT(totalPoint) over (partition by "all"),
# RATIO_TO_REPORT(totalPoint) over (partition by null)
FROM (
SELECT
format_datetime('%Y-%m', ts) AS month,
SUM(point) AS totalPoint
FROM
`userPurchase`
GROUP BY
month
ORDER BY
month )
It doesn't work. How can I do the same thing? Thanks!
Assuming the rest of the code is correct, just omit the PARTITION BY part:
RATIO_TO_REPORT(totalPoint) OVER ()
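An empty OVER () makes the whole result set a single window. SQLite has no RATIO_TO_REPORT, so the sketch below (Python's built-in sqlite3, an assumption made just to have something runnable) spells the same ratio out as value / SUM(value) OVER (). The table contents are invented, and format_datetime is replaced by SQLite's strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows.
conn.executescript("""
CREATE TABLE userPurchase (ts TEXT, point INTEGER);
INSERT INTO userPurchase VALUES
    ('2020-01-03 10:00:00', 10),
    ('2020-01-17 12:30:00', 15),
    ('2020-02-02 09:15:00', 75);
""")

ratios = conn.execute("""
SELECT month,
       -- OVER () with no PARTITION BY: the sum runs over every row of the subquery
       totalPoint * 1.0 / SUM(totalPoint) OVER () AS ratio
FROM (SELECT strftime('%Y-%m', ts) AS month, SUM(point) AS totalPoint
      FROM userPurchase
      GROUP BY strftime('%Y-%m', ts)) m
ORDER BY month
""").fetchall()
print(ratios)
```

The ratios add up to 1 across all months, which is the behaviour the question asks for.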

Tag first observation for a user over products

I have a data table with the columns user_id, Product, and purchase_date, a date column in standard yyyy-mm-dd form.
Users in this table purchase multiple items of the same product (and different products) in the same month, so I needed to be able to capture the first time that they bought a particular product and then count each product by month.
I did it with the following:
SELECT yr, mo, COUNT(DISTINCT(CASE WHEN Product = 'product_a'
THEN user_id END)) AS product_a
FROM
(
SELECT YEAR(min(purchase_date)) AS yr, MONTH(min(purchase_date)) AS mo,
DAY(min(purchase_date)) AS dy, user_id, Product
FROM daily_purchases
GROUP BY user_id, Product
) b
GROUP BY yr, mo
ORDER BY yr, mo
This seems to work fine and capture what I am looking for. Does anyone have any suggestions - or is this the most appropriate way to go about it? Thanks!
I don't have any source data to test ... but here is how I would approach it ... this may need some tweaking as I have not tested it with source data:
SELECT a.yr, a.mo, a.user_id, a.Product, a.cntBuys
FROM
(
SELECT YEAR(purchase_date) AS yr, MONTH(purchase_date) AS mo, user_id, Product,
ROW_NUMBER() OVER (PARTITION BY user_id, Product ORDER BY purchase_date) AS firstBuy
,COUNT(*) OVER (PARTITION BY user_id, Product) AS cntBuys
FROM daily_purchases
) a
WHERE a.firstBuy = 1
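As a rough check of the ROW_NUMBER() idea, the sketch below keeps only each user's first purchase of each product and then counts those first purchases per month. It runs on Python's built-in sqlite3 module, which is an assumption for the sake of a runnable example: YEAR() and MONTH() become strftime calls, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows.
conn.executescript("""
CREATE TABLE daily_purchases (user_id INTEGER, product TEXT, purchase_date TEXT);
INSERT INTO daily_purchases VALUES
    (1, 'product_a', '2020-01-03'),
    (1, 'product_a', '2020-01-20'),
    (1, 'product_b', '2020-02-01'),
    (2, 'product_a', '2020-02-10'),
    (2, 'product_a', '2020-03-01'),
    (3, 'product_a', '2020-01-15');
""")

first_buys = conn.execute("""
SELECT strftime('%Y', purchase_date) AS yr,
       strftime('%m', purchase_date) AS mo,
       product,
       COUNT(*) AS first_time_buyers
FROM (SELECT d.*,
             -- 1 for the earliest purchase of each (user, product) pair
             ROW_NUMBER() OVER (PARTITION BY user_id, product
                                ORDER BY purchase_date) AS first_buy
      FROM daily_purchases d) a
WHERE first_buy = 1
GROUP BY strftime('%Y', purchase_date), strftime('%m', purchase_date), product
ORDER BY yr, mo, product
""").fetchall()
print(first_buys)
```

Repeat purchases (user 1 on 2020-01-20, user 2 on 2020-03-01) get a row number above 1 and drop out before the count.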

Finding a date with the largest sum

I have a database of transactions, accounts, profit/loss, and date. I need to find the dates on which the largest profit occurs, by account. I have already found a way to find the actual max/min values, but I can't seem to pull the actual date from it. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.
Try something like this to get the min and max amount for each customer and the date it happened.
WITH t AS (
SELECT accountnum, amount, date
FROM table
),
max_amount AS (
SELECT accountnum, max(amount) AS amount
FROM t
GROUP BY accountnum
),
min_amount AS (
SELECT accountnum, min(amount) AS amount
FROM t
GROUP BY accountnum
)
SELECT ma.accountnum, ma.amount AS max_amount, tmax.date AS max_date, mi.amount AS min_amount, tmin.date AS min_date
FROM max_amount ma
JOIN min_amount mi
ON mi.accountnum = ma.accountnum
JOIN t tmax
ON tmax.accountnum = ma.accountnum AND tmax.amount = ma.amount
JOIN t tmin
ON tmin.accountnum = mi.accountnum AND tmin.amount = mi.amount
If you want the data for just this year, add a WHERE clause to the first CTE (t), so that both the extremes and their dates come from the filtered rows:
WHERE date > '02-Jan-13'
The easiest way to do this is using window/analytic functions. These are ANSI standard and most databases support them (MySQL before version 8.0 and Access being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = max_amount then date end) as max_amount_date
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by accountnum, min_amount, max_amount
order by accountnum;
The subquery calculates the minimum and maximum amount for each account, using min() and max() as window functions. The outer query selects these values. It then uses conditional aggregation to get the first date when each of those values occurred.
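The window-plus-conditional-aggregation pattern can be tried out with the sketch below, which uses Python's built-in sqlite3 module. SQLite is only a stand-in for the asker's database, the table name transactions and its rows are hypothetical, and dates are stored as ISO text so that the string comparison in the WHERE clause orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; account 1 hits its maximum twice.
conn.executescript("""
CREATE TABLE transactions (accountnum INTEGER, amount INTEGER, date TEXT);
INSERT INTO transactions VALUES
    (1,  50, '2013-02-01'),
    (1, -20, '2013-03-01'),
    (1,  50, '2013-04-01'),
    (2,   5, '2013-01-10'),
    (2,  70, '2013-05-05'),
    (2,  -3, '2013-06-01');
""")

extremes = conn.execute("""
SELECT accountnum, min_amount, max_amount,
       -- CASE yields NULL for non-matching rows, so MIN picks the earliest matching date
       MIN(CASE WHEN amount = min_amount THEN date END) AS min_amount_date,
       MIN(CASE WHEN amount = max_amount THEN date END) AS max_amount_date
FROM (SELECT t.*,
             MIN(amount) OVER (PARTITION BY accountnum) AS min_amount,
             MAX(amount) OVER (PARTITION BY accountnum) AS max_amount
      FROM transactions t
      WHERE date > '2013-01-02') x
GROUP BY accountnum, min_amount, max_amount
ORDER BY accountnum
""").fetchall()
print(extremes)
```

Account 1 reaches its maximum of 50 on two dates, and the conditional MIN reports the earlier one.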
;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1