Oracle: Count of New Users from Transactions table - sql

I have below table
Table: Sales
I Need to write Oracle SQL query to identify new customers each day,
when a customer (Customer-X) do 1 new transaction on a specific date(Date-X), customer is considered as new customer only for this date(Date-X), if same customer (Customer-X) makes other transactions after that(Date-Y), then this customer is not considered new customer for any date bigger than Date-X
this customer is only considered new Customer on Date-X and counted 1 time only
I also need to generate the summary result
which is the count of new customers for each day

You can use window functions:
select s.*,
(case when row_number() over (partition by user_id order by s_date) = 1
then 'Yes' else 'No'
end) as new_customer
from sales s;
You can then aggregate for the summary:
select s_date, sum(case when new_customer = 'Yes' then 1 else 0 end)
from (select s.*,
(case when row_number() over (partition by user_id order by s_date) = 1
then 'Yes' else 'No'
end) as new_customer
from sales s
) s
group by s_date

Related

Find the count by conditioned group

I have a table on hand:
order_id, order_status (2=completed, 1=canceled), client_id, fee
I need to find how many client completed 1 order, and how many client completed more than 1 order.
I tried case as below but no clue:
select client_id, count(case order_status when '2' then 'completed' end) as cnt_completed, sum(fee)
from order
where cnt_completed = 1;
Are there any straightforward ways to get the numbers of clients who completed 1 order and numbers of clients who completed more than 1 order, along with the sum of fee for each group? Thanks.
Use two levels of aggregation to get this broken out by the number of orders:
select cnt, count(*), sum(sum_fee)
from (select client_id, count(*) as cnt, sum(fee) as sum_fee
from order o
where order_status = '2'
group by client_id
) o
group by cnt;
For just two rows, you can use:
select (case when cnt = 1 then 1 else 2 end), count(*), sum(sum_fee)
from (select client_id, count(*) as cnt, sum(fee) as sum_fee
from order o
where order_status = '2'
group by client_id
) o
group by (case when cnt = 1 then 1 else 2 end);

Advanced SQL with window function

I have Table a(Dimension table) and Table B(Fact table) stores transaction shopper history.
Table a : shopped id(surrogate key) created for unique combination(any of column 2,colum3,column4 repeated it will have same shopper id)
Table b is transaction data.
I am trying to identify New customers and repeated customers for each week, expected output is below.
I am thinking following SQL Statement
Select COUNT(*) OVER (PARTITION BY shopperid,weekdate) as total_new_shopperid for Repeated customer,
for Identifying new customer(ie unique) in same join condition, I am stuck on window function..
thanks,
Sam
You can use the DENSE_RANK analytical function along with aggregate function as follows:
SELECT WEEK_DATE,
COUNT(DISTINCT CASE WHEN DR = 1 THEN SHOPPER_ID END) AS TOTAL_NEW_CUSTOMER,
SUM(CASE WHEN DR = 1 THEN AMOUNT END) AS TOTAL_NEW_CUSTOMER_AMT,
COUNT(DISTINCT CASE WHEN DR > 1 THEN SHOPPER_ID END) AS TOTAL_REPEATED_CUSTOMER,
SUM(CASE WHEN DR > 1 THEN AMOUNT END) AS TOTAL_REPEATED_CUSTOMER_AMT
FROM
(
select T.*,
DENSE_RANK() OVER (PARTITION BY SHOPPER_ID ORDER BY WEEK_DATE) AS DR
FROM YOUR_TABLE T);
GROUP BY WEEK_DATE;
Cheers!!
Tejash's answer is fine (and I'm upvoting it).
However, Oracle is quite efficient with aggregation, so two levels of aggregation might have better performance (depending on the data):
select week_date,
sum(case when min_week_date = week_date then 1 else 0 end) as new_shoppers,
sum(case when min_week_date = week_date then amount else 0 end) as new_shopper_amount,
sum(case when min_week_date > week_date then 1 else 0 end) as returning_shoppers,
sum(case when min_week_date > week_date then amount else 0 end) as returning_amount
from (select shopper_id, week_date,
sum(amount) as amount,
min(week_date) over (partition by shopper_id) as min_week_date
from t
group by shopper_id, week_date
) sw
group by week_date
order by week_date;
Note: If this has better performance, it is probably due to the elimination of count(distinct).

SQL - Dividing aggregated fields, very new to SQL

I have list of line items from invoices with a field that indicates if a line was delivered or picked up. I need to find a percentage of delivered items from the total number of lines.
SALES_NBR | Total | Deliveryrate
1 = Delivered 0 = picked up from FULFILLMENT.
SELECT SALES_NBR,
COUNT (ITEMS) as Total,
SUM (case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
(SELECT delivered/total) as Deliveryrate
FROM Invoice_table
WHERE STORE IN '0123'
And SALE_DATE >='2020-02-01'
And SALE_DATE <='2020-02-07'
Group By SALES_NBR, Deliveryrate;
My query executes but never finishes for some reason. Is there any easier way to do this? Fulfillment field does not contain any NULL values.
Any help would be appreciated.
I need to find a percentage of delivered items from the total number of lines.
The simplest method is to use avg():
select SALES_NBR,
avg(fulfillment) as delivered_ratio
from Invoice_table
where STORE = '0123' and
SALE_DATE >='2020-02-01' and
SALE_DATE <='2020-02-07'
group by SALES_NBR;
I'm not sure if the group by sales_nbr is needed.
If you want to get a "nice" query, you can use subqueries like this:
select
qry.*,
qry.delivered/qry.total as Deliveryrate
from (
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered
from Invoice_table
where STORE IN '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR
) qry;
But I think this one, even being ugglier, could perform faster:
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
sum(case when FULFILLMENT = '1' then 1 else 0 end)/count(ITEMS) as Deliveryrate
from Invoice_table
where STORE IN '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id

Hive rolling sum of data over date

I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is as shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help me with this if some one has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by d.event_dt
order by d.event_dt;
What this is doing is creating a derived table with rows for all accounts and dates. This has the status on certain days, but not all days.
The cumulative max for last_status_timestamp calculates the most recent timestamp that has a valid status. This is then joined back to the table to get the status on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().