SQL Group By for quarterly dates - sql

transaction_date is in a date format.
What I'm actually trying to output is the COUNT DISTINCT of Unique_ID by quarter (i.e., how many times did a Unique_Id appear in a given quarter).
SELECT transaction_date ,
UNIQUE_ID,
FROM panel
WHERE (some criteria = 'x')
GROUP BY UNIQUE_ID

try this :
SELECT datepart(quarter,transaction_date),
count(distinct UNIQUE_ID) as cnt
FROM panel
WHERE (some criteria = 'x')
GROUP BY datepart(quarter,p.transaction_date)
but the count(distinct) will do a sort so it will take you a lot of time. so you can distinct it first in the table then do the count
SELECT datepart(quarter,p.transaction_date),
count(p.UNIQUE_ID) as cnt
FROM (select distinct transaction_date as transaction_date, UNIQUE_ID
from panel) as p
WHERE (some criteria = 'x')
GROUP BY datepart(quarter,p.transaction_date)

I'd use date_trunc:
select
date_trunc ('quarter', transaction_date), count (distinct unique_id)
from panel
where criteria = 'x'
group by 1
This presupposes that when you say "by quarter" that 1Q2015 is different than 1Q2014.

SELECT DATEPART(QUARTER, transaction_date) ,
COUNT(DISTINCT UNIQUE_ID),
FROM panel
GROUP BY transaction_date

Related

How to pull a list of all visitor_ids that generated more than $500 combined in their first two sessions in the month of January 2020?

Tables:
Sessions
session_ts
visitor_id
vertical
session_id
Transactions
session_ts
session_id
rev_bucket
revenue
Currently have the following query (using SQLite):
SELECT
s.visitor_id,
sub.session_id,
month,
year,
total_rev,
CASE
WHEN (row_num IN (1,2) >= total_rev >= 500) THEN 'Yes'
ELSE 'No' END AS High_Value_Transactions,
sub.row_num
FROM
sessions s
JOIN
(
SELECT
s.visitor_id,
t.session_id,
strftime('%m',t.session_ts) as month,
strftime('%Y',t.session_ts) as year,
SUM(t.revenue) as total_rev,
row_number() OVER(PARTITION BY s.visitor_id ORDER BY s.session_ts) as row_num
FROM
Transactions t
JOIN
sessions s
ON
s.session_id = t.session_id
WHERE strftime('%m',t.session_ts) = '01'
AND strftime('%Y',t.session_ts) = '2020'
GROUP BY 1,2
) sub
ON
s.session_id = sub.session_id
WHERE sub.row_num IN (1,2)
ORDER BY 1
I'm having trouble identifying the first two sessions that combine for $500.
Open to any feedback and simplifying of query. Thanks!
You can use window functions and aggregation:
select visitor_id, sum(t.revenue) total_revenue
from (
select
s.visitor_id,
t.revenue,
row_number() over(partition by s.visitor_id order by t.session_ts) rn
from transactions t
inner join sessions s on s.session_id = t.session_id
where t.session_ts >= '2020-01-01' and t.session_ts < '2020-02-01'
) t
where rn <= 2
group by visitor_id
having sum(t.revenue) >= 500
The subquery joins the two tables, filters on the target month (note that using half-open interval predicates is more efficient than applying date functions on the date column), and ranks each row within groups of visits of the same customer.
Then, the outer query filters on the first two visits per visitor, aggregates by visitor, computes the corresponding revenue, and filters it with a having clause.

SQL count new values only with partition by - running count with no duplicates

Based on table below in Presto I need a column for all new 'rid'. What I managed to do is the same what I can achieve with partition by but it's not exactly what I'm looking for (db<>fiddle demo).
Goal is to have many groupings counts but I think this should describe problem sufficiently.
I need data truncated by days and column for new users every day as shown at example below. In simple words - if value repeats don't count it. I've tried to find correlation between this and relational division problem but I just stuck.
You could use row_number() to rank the records of each rid by time; then you can aggregate and count in only the top record per group.
select
date_trunc(day, t.time) dy,
count(*) rid_count,
sum(case when t.rn = 1 then 1 else 0 end) new_rid_count
from (
select
t.*
row_number() over(partition by t.rid order by t.time) rn
from mytable t
) t
group by date_trunc(day, t.time)
I think of this as two levels of aggregation. The inner one to get the earliest date. The outer to aggregate:
select first_day, count(*)
from (select rid, date_trunc('day', min(time))::date as first_day
from orders o
group by rid
) r
group by 1

Find rows with similar date values

I want to find customers where for example, system by error registered duplicates of an order.
It's pretty easy, if reg_date is EXACTLY the same but I have no idea how to implement it in query to count as duplicate if for example there was up to 1 second difference between transactions.
select * from
(select customer_id, reg_date, count(*) as cnt
from orders
group by 1,2
) x where cnt > 1
Here is example dataset:
https://www.db-fiddle.com/f/m6PhgReSQbVWVZhqe8n4mi/0
CUrrently only customer's 104 orders are counted as duplicates because its reg_date is identical, I want to count also orders 1,2 and 4,5 as there's just 1 second difference
demo:db<>fiddle
SELECT
customer_id,
reg_date
FROM (
SELECT
*,
reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second' as is_duplicate
FROM
orders
) s
WHERE is_duplicate
Use the lag() window function. It allows to have a look hat the previous record. With this value you can do a diff and filter the records where the diff time is more than one second.
Try this following script. This will return you day/customer wise duplicates.
SELECT
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy') reg_date,
customer_id,
count(*) as cnt
FROM orders
GROUP BY
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy'),
customer_id
HAVING count(*) >1

refering to field out of subeselects scope

I'm working on a piece of SQL at the moment and i need to retrieve every row of a dataset with a median and an average aggregated in it.
Example
i have the following set
ID;month;value
and i would like to retrieve something like :
ID;month;value;average for this month;median for this month
without having to group by my result.
So it would be something like :
SELECT ID,month,value,
(SELECT AVG(value) FROM myTable) as "myAVG"
FROM myTable
but i would need that average to be the average for that month specifically. So, rows where the month="January" will have the average and median for "January" etc ...
Issue here is that i did not find a way to refer to the value of month in my subquery
(SELECT AVG(value) FROM myTable)
Does someone have a clue?
P.S: It's a redshift database i'm working on.
You would need to select all rows from the table, and do a left join with a select statement that does group by month. This way, you would get every row, and the group by results with them for that month.
Something like this:
SELECT * FROM myTable a
LEFT JOIN
(
SELECT Month, Sum(value being summed) as mySum
FROM myTable
GROUP BY Month
) b
ON a.Month = b.Month
Helpful?
with myavg as
(SELECT month, AVG(value) as avgval FROM myTable group by month)
, mymed as
(select month, median(value) as medval from myTable group by month)
select ID, month, value, ma.avgval, mm.medval
from mytable m left join myavg ma
on m.month = ma.month
left join mymed mm
on m.month = mm.month
You can use a cte to do this. However, you need a group by on month, as you are calculating an aggregate value.
In Redshift you can use Window Function.
select month,
avg(value) over
(PARTITION BY month rows unbounded preceding) as avg
from myTable
order by 1;

Finding a date with the largest sum

I have a database of transactions, accounts, profit/loss, and date. I need to find the dates which the largest profit occurs by account. I have already found a way to find these actually max/min values but I can't seem to be able to pull the actual date from it. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.
Try something like this to get the min and max amount for each customer and the date it happened.
WITH max_amount as (
SELECT accountnum, max(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
),
min_amount as (
SELECT accountnum, min(amount) amount, date
FROM TABLE
GROUP BY accountnum, date
)
SELECT t.accountnum, ma.amount, ma.date, mi.amount, ma.date
FROM table t
JOIN max_amount ma
ON ma.accountnum = t.accountnum
JOIN min_amount mi
ON mi.accountnum = t.accountnum
If you want the data for just this year you could add a where clause to the end of the statement
WHERE t.date > '02-Jan-13'
The easiest way to do this is using window/analytic functions. These are ANSI standard and most databases support them (MySQL and Access being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = min_amount then date end) as max_amount_date,
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by accountnum, min_amount, max_amount;
order by accountnum
The subquery calculates the minimum and maximum amount for each account, using min() as a window function. The outer query selects these values. It then uses conditional aggregation to get the first date when each of those values occurred.
;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1