First users by categories in BigQuery

How can I count the new and existing users by category and year?
For instance, over 2015-2020, if someone first bought a product in category_A in 2016, they should be counted as a new user in category_A in 2016, even if they bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as below

One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate), category
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
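To try the two-level aggregation without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the function name new_users_by_year are made up for illustration, and strftime('%Y', ...) stands in for extract(year from ...), which SQLite does not have.

```python
import sqlite3

# Hypothetical sample rows (user_id, category, date); not from the question.
ROWS = [
    (1, "A", "2015-03-01"),
    (1, "A", "2016-07-15"),
    (1, "B", "2016-02-10"),
    (2, "A", "2016-05-20"),
    (2, "B", "2017-01-05"),
]


def new_users_by_year(rows):
    """Count first-time buyers per (year, category) with two levels of aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
    con.executemany("INSERT INTO table_1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT strftime('%Y', mindate) AS yr, category, COUNT(*) AS num_new
        FROM (
            -- inner level: first purchase date per user and category
            SELECT user_id, category, MIN(date) AS mindate
            FROM table_1
            GROUP BY user_id, category
        ) AS t
        -- outer level: count those first purchases per year and category
        GROUP BY yr, category
        ORDER BY yr, category
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(new_users_by_year(ROWS))
```

User 1 is counted as new in category B in 2016 even though they already bought in category A in 2015, which is the behaviour the question asks for.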
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from date) yr, category,
count(distinct case when date = mindate then user_id end) num_new,
count(distinct user_id) num_total
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from date), category
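The window-function variant can be checked the same way. This is only a sketch on made-up rows, run through Python's sqlite3 module (window functions need SQLite 3.25 or later, which is an assumption about your Python build). It groups by the year of each purchase, so a returning user counts towards the total but not towards the new users.

```python
import sqlite3

# Hypothetical sample rows (user_id, category, date); not from the question.
ROWS = [
    (1, "A", "2015-03-01"),
    (1, "A", "2016-07-15"),  # user 1 returns in 2016
    (2, "A", "2016-05-20"),
    (3, "A", "2016-09-09"),
    (3, "A", "2016-11-30"),  # second purchase in the same year
]


def new_and_total_by_year(rows):
    """Per (year, category): users whose first purchase falls in that year, and all buyers."""
    con = sqlite3.connect(":memory:")  # assumes SQLite >= 3.25 for window functions
    con.execute("CREATE TABLE table_1 (user_id INTEGER, category TEXT, date TEXT)")
    con.executemany("INSERT INTO table_1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT strftime('%Y', date) AS yr, category,
               COUNT(DISTINCT CASE WHEN date = mindate THEN user_id END) AS num_new,
               COUNT(DISTINCT user_id) AS num_total
        FROM (
            -- attach each user's first purchase date in the category to every row
            SELECT date, user_id, category,
                   MIN(date) OVER (PARTITION BY user_id, category) AS mindate
            FROM table_1
        ) AS t
        GROUP BY yr, category
        ORDER BY yr, category
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(new_and_total_by_year(ROWS))
```

The number of existing users for a year is then num_total minus num_new.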

Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

Related

Top 2 per month in SQL

I have this dataset, which has dates and products for cities:
CREATE TABLE my_table (
the_id varchar(5) NOT NULL,
the_date timestamp NOT NULL,
the_city varchar(5) NOT NULL,
the_product varchar(1) NOT NULL
);
INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00','LISBO','A'),
('VIS02', '2019-05-04 12:00:00','EVORA','A'),
('VIS03', '2019-05-05 18:00:00','LISBO','B'),
('VIS04', '2019-05-06 18:30:00','PORTO','B'),
('VIS05', '2019-05-15 12:05:00','PORTO','C'),
('VIS06', '2019-06-02 18:06:00','EVORA','C'),
('VIS07', '2019-06-02 18:07:00','PORTO','A'),
('VIS08', '2019-06-04 18:08:00','EVORA','B'),
('VIS09', '2019-06-07 18:09:00','LISBO','B'),
('VIS10', '2019-06-09 18:10:00','LISBO','D'),
('VIS11', '2019-06-12 18:11:00','EVORA','D'),
('VIS12', '2019-06-15 18:12:00','LISBO','E'),
('VIS13', '2019-06-15 18:13:00','EVORA','F'),
('VIS14', '2019-06-18 18:14:00','PORTO','G'),
('VIS15', '2019-06-23 18:15:00','LISBO','A'),
('VIS16', '2019-06-25 18:16:00','LISBO','A'),
('VIS17', '2019-06-27 18:17:00','LISBO','F'),
('VIS18', '2019-06-27 18:18:00','LISBO','A'),
('VIS19', '2019-06-28 18:19:00','LISBO','A'),
('VIS20', '2019-06-30 18:20:00','EVORA','D'),
('VIS21', '2019-07-01 18:21:00','EVORA','D'),
('VIS22', '2019-07-04 18:30:00','EVORA','D'),
('VIS23', '2019-07-04 18:31:00','EVORA','B'),
('VIS24', '2019-07-06 18:40:00','EVORA','K'),
('VIS25', '2019-07-12 18:50:00','EVORA','G'),
('VIS26', '2019-07-15 18:00:00','PORTO','C'),
('VIS27', '2019-07-18 18:00:00','PORTO','C'),
('VIS28', '2019-07-25 18:00:00','PORTO','B'),
('VIS29', '2019-07-30 18:00:00','PORTO','M');
And I want the top two per month. The expected result should be:
month product count
2019-05 A 2
2019-05 B 2
2019-06 A 5
2019-06 D 3
2019-07 C 2
2019-07 D 2
But I'm not quite sure how to group by month. Please, any help will be greatly appreciated.

First, you can use to_char(the_date,'YYYY-MM') to get the year and month in the right format.
Next, you can use count(*) to group by the month and product, and row_number() to give a sequence number to each row in the groups.
SELECT to_char(the_date,'YYYY-MM') as month,
the_product as product,
count(*) as p_count,
row_number() over (partition by to_char(the_date,'YYYY-MM') order by count(*) desc) as seq
FROM my_table
group by month, product
Last, you can wrap that in an outer query to select just the columns and rows that you want.
SELECT month, product, p_count as count
FROM (
SELECT to_char(the_date,'YYYY-MM') as month,
the_product as product,
count(*) as p_count,
row_number() over (partition by to_char(the_date,'YYYY-MM') order by count(*) desc) as seq
FROM my_table
group by month, product
) as foo
where foo.seq <= 2;
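For anyone who wants to experiment with the ranking step, here is a rough sketch in Python's sqlite3 module (SQLite 3.25 or later assumed for row_number()). The rows are a made-up subset rather than the question's data, strftime('%Y-%m', ...) replaces to_char(), and the aggregation and ranking are split into two CTEs. The extra tie-break on product is my own addition: without one, row_number() picks arbitrarily among products with equal counts, which matters for the question's data because three products tie in 2019-07.

```python
import sqlite3

# Hypothetical rows (the_date, the_product); a made-up subset, not the question's data.
ROWS = [
    ("2019-05-02 09:00:00", "A"),
    ("2019-05-04 12:00:00", "A"),
    ("2019-05-05 18:00:00", "A"),
    ("2019-05-06 18:30:00", "B"),
    ("2019-05-15 12:05:00", "B"),
    ("2019-05-20 12:05:00", "C"),
    ("2019-06-02 18:06:00", "C"),
    ("2019-06-02 18:07:00", "C"),
    ("2019-06-04 18:08:00", "D"),
]


def top_two_per_month(rows):
    """Return the two most frequent products for each month."""
    con = sqlite3.connect(":memory:")  # assumes SQLite >= 3.25 for window functions
    con.execute("CREATE TABLE my_table (the_date TEXT, the_product TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    query = """
        WITH counts AS (
            -- step 1: count rows per month and product
            SELECT strftime('%Y-%m', the_date) AS month,
                   the_product AS product,
                   COUNT(*) AS p_count
            FROM my_table
            GROUP BY month, product
        ),
        ranked AS (
            -- step 2: number the products within each month, biggest count first
            SELECT month, product, p_count,
                   ROW_NUMBER() OVER (
                       PARTITION BY month
                       ORDER BY p_count DESC, product
                   ) AS seq
            FROM counts
        )
        -- step 3: keep the first two per month
        SELECT month, product, p_count
        FROM ranked
        WHERE seq <= 2
        ORDER BY month, seq
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(top_two_per_month(ROWS))
```

If tied products should all be returned instead of cut off at two rows, rank() or dense_rank() can be used in place of row_number().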

You can use aggregation and window functions:
select mp.*
from (select date_trunc('month', the_date) as yyyymm,
the_product, count(*) as cnt,
row_number() over (partition by date_trunc('month', the_date) order by count(*) desc) as seqnum
from my_table
group by yyyymm, the_product
) mp
where seqnum <= 2;

In PostgreSQL, I believe you can extract any part of a timestamp using the EXTRACT function.
e.g.:
SELECT the_date, EXTRACT(MONTH from the_date) as MONTH FROM my_table
the_date MONTH
'2019-08-05' 8
That said, you can then group by month and product, and order by the count:
SELECT EXTRACT(MONTH from the_date) as month, the_product, count(*) FROM my_table
GROUP BY EXTRACT(MONTH from the_date), the_product
ORDER BY count(*) DESC
LIMIT 2
Note that LIMIT 2 applies to the whole result set, so this returns the two most frequent month/product pairs overall rather than the top two for each month; a per-month limit needs a window function such as row_number(). I don't have a database to test the query, but it might give you a good start.

First user by category

How can I count, by year, the new users for each category, i.e. those who bought in the category for the first time? For instance, over 2015-2020, if someone bought for the first time in 2015, they are counted as a new user in 2015 but not in 2016-2020.
Table_1 (Columns: product_name, date, category, sales, user_id)
Want to get the result as below

You’ll want to start with a sub query to get the first date each user purchased in the category. This is a pretty straightforward group by problem:
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category;
Next, you can use Postgres’s date_trunc function to group by year and category, using your first query as a sub query:
select
category,
date_trunc('year', first_category_purchase),
count(*)
from (
select
user_id,
category,
min(date) as first_category_purchase
from my_table
group by user_id, category
) a
group by 1, 2;

In Postgres, one method is group by after a distinct on:
select date, count(*) as num_new_users
from (select distinct on (user_id, category) t.*
from t
order by user_id, category, date asc
) d
group by date
order by date;
If date is really a date and not a year, then you need something like to_char() or date_trunc() to convert it to a year.

SQL: Dividing daily data by a monthly index

I have daily transaction data that is a product of this query:
SELECT transaction_date ,
Merchant,
Amount
into transaction.table
FROM source.table
WHERE (DESCRIPTION iLIKE '%Criteria%')
The field transaction_date is in the format of DATE (yyyy-MM-dd).
What I would like to do is take each row/transaction in transaction.table and divide Amount by a value tied to its RESPECTIVE month (this is key) contained in a separate table called Calendar.
The separate table called Calendar is queried from the same source.table as below:
select month,count(*) as distinct_month
into source.Calendar
from
(
select Population, to_char(optimized_transaction_date, 'YYYY-MM') as month
FROM source.table
group by Population, to_char(optimized_transaction_date, 'YYYY-MM')
)
group by month
My goal is to get a value for each day: Amount / distinct_month.
The key part is matching the daily data (transaction_date) in the first query with the monthly data in the second query (month).
Note that month from the second query is a varchar whereas transaction_date in the first query is a DATE.

I think you want something like this:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT transaction_date, Merchant, Amount, Description,
(Amount / count(distinct population) over (partition by to_char(transaction_date, 'YYYY-MM'))
) as newval
FROM source.table
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
You only need the subquery because the total is calculated over all the data, without the filter condition.
EDIT:
Oops, I forgot that Postgres doesn't support COUNT(DISTINCT) as a window function. So do:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT t.*,
(Amount / SUM( (seqnum = 1)::int) OVER (partition by to_char(transaction_date, 'YYYY-MM') )
) as newval
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY to_char(transaction_date, 'YYYY-MM'), population ORDER BY population) as seqnum
FROM source.table t
) t
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
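A different way to get the same number, closer to the Calendar table in the question, is to compute the distinct count per month in a derived table and join it back on the month string. Below is a sketch of that alternative using Python's sqlite3 module. The table name source_table, the sample rows and the function name are made up, strftime('%Y-%m', ...) stands in for to_char(), and SQLite's LIKE (case-insensitive for ASCII) stands in for iLIKE.

```python
import sqlite3

# Hypothetical rows (transaction_date, merchant, amount, population, description).
ROWS = [
    ("2020-01-05", "M1", 100, "P1", "Criteria x"),
    ("2020-01-10", "M2", 50, "P2", "other"),
    ("2020-01-20", "M3", 30, "P2", "has criteria"),
    ("2020-02-03", "M1", 80, "P1", "Criteria y"),
]


def amount_per_monthly_index(rows):
    """Divide each matching transaction by its month's distinct population count."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE source_table ("
        "transaction_date TEXT, merchant TEXT, amount INTEGER, "
        "population TEXT, description TEXT)"
    )
    con.executemany("INSERT INTO source_table VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT s.transaction_date, s.merchant,
               s.amount * 1.0 / c.distinct_month AS newval  -- * 1.0 avoids integer division
        FROM source_table AS s
        JOIN (
            -- monthly index, computed over ALL rows (before the description filter)
            SELECT strftime('%Y-%m', transaction_date) AS month,
                   COUNT(DISTINCT population) AS distinct_month
            FROM source_table
            GROUP BY month
        ) AS c
          ON c.month = strftime('%Y-%m', s.transaction_date)
        WHERE s.description LIKE '%Criteria%'
        ORDER BY s.transaction_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(amount_per_monthly_index(ROWS))
```

Formatting the date as a 'YYYY-MM' string on both sides of the join is what lines up the DATE column with the varchar month.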

Tag first observation for a user over products

I have a data table with the columns user_id, Product and purchase_date, a date column in standard yyyy-mm-dd form.
Users in this table purchase multiple items of the same product (and different products) in the same month, so I needed to be able to capture the first time that they bought a particular product and then count each product by month.
I did it with the following:
SELECT yr, mo, COUNT(DISTINCT CASE WHEN Product = 'product_a'
THEN user_id END) AS product_a
FROM
(
SELECT YEAR(min(purchase_date)) AS yr, MONTH(min(purchase_date)) AS mo,
DAY(min(purchase_date)) AS dy, user_id, Product
FROM daily_purchases
GROUP BY user_id, Product
) b
GROUP BY yr, mo
ORDER BY yr, mo
This seems to work fine and capture what I am looking for. Does anyone have any suggestions - or is this the most appropriate way to go about it? Thanks!

I don't have any source data to test ... but here is how I would approach it ... this may need some tweaking as I have not tested it with source data:
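Select [yr], [mo], [user_id], [Product], [firstBuy], [cntBuys] From
(
-- yr and mo are derived here because daily_purchases only has purchase_date
SELECT YEAR(purchase_date) as [yr], MONTH(purchase_date) as [mo], [user_id], [Product],
ROW_NUMBER() over (partition by user_id, Product order by purchase_date) as [firstBuy]
,COUNT(*) over (partition by user_id, Product) as [cntBuys]
FROM daily_purchases
) a
Where a.firstBuy = 1
Order by a.yr, a.mo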

Finding a date with the largest sum

I have a database of transactions, accounts, profit/loss, and date. I need to find the dates on which the largest profit occurs for each account. I have already found a way to get the actual max/min values, but I can't seem to pull the date along with them. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.

Try something like this to get the min and max amount for each customer and the date it happened.
WITH max_amount as (
SELECT accountnum, max(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
),
min_amount as (
SELECT accountnum, min(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
)
SELECT t.accountnum, ma.amount, ma.date, mi.amount, mi.date
FROM table t
JOIN max_amount ma
ON ma.accountnum = t.accountnum
JOIN min_amount mi
ON mi.accountnum = t.accountnum
If you want the data for just this year you could add a where clause to the end of the statement
WHERE t.date > '02-Jan-13'

The easiest way to do this is using window/analytic functions. These are ANSI standard and most databases support them (MySQL before version 8.0 and Access being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = max_amount then date end) as max_amount_date
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by accountnum, min_amount, max_amount
order by accountnum;
The subquery calculates the minimum and maximum amount for each account, using min() and max() as window functions. The outer query selects these values. It then uses conditional aggregation to get the first date when each of those values occurred.
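Here is a small sketch of that pattern that can be run locally with Python's sqlite3 module (SQLite 3.25 or later assumed for window functions). The table name transactions, the sample rows and the function name are hypothetical, and the dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

# Hypothetical rows (accountnum, amount, date); not from the question.
ROWS = [
    (1, -50, "2013-02-01"),
    (1, 200, "2013-03-01"),
    (1, 200, "2013-04-01"),  # same maximum again, on a later date
    (1, 999, "2012-12-31"),  # excluded by the date filter
    (2, 10, "2013-05-05"),
    (2, -5, "2013-06-06"),
]


def min_max_with_dates(rows):
    """Per account: min and max amount plus the first date each one occurred."""
    con = sqlite3.connect(":memory:")  # assumes SQLite >= 3.25 for window functions
    con.execute("CREATE TABLE transactions (accountnum INTEGER, amount INTEGER, date TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    query = """
        SELECT accountnum, min_amount, max_amount,
               -- conditional aggregation: earliest date on which each extreme occurred
               MIN(CASE WHEN amount = min_amount THEN date END) AS min_amount_date,
               MIN(CASE WHEN amount = max_amount THEN date END) AS max_amount_date
        FROM (
            -- window functions attach the per-account extremes to every row
            SELECT t.*,
                   MIN(amount) OVER (PARTITION BY accountnum) AS min_amount,
                   MAX(amount) OVER (PARTITION BY accountnum) AS max_amount
            FROM transactions AS t
            WHERE date > '2013-01-02'
        ) AS w
        GROUP BY accountnum, min_amount, max_amount
        ORDER BY accountnum
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(min_max_with_dates(ROWS))
```

Swapping the outer MIN(...) for MAX(...) would return the latest date instead of the first when an extreme value occurs more than once.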

;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1
