How to add the sum of all values in SQL as a separate column

My table currently looks like this:
Partner Date Ad Unit Revenue
App 1/1/2020 x 10
App 1/1/2020 y 3
I need an additional column with the sum of all revenue for the day, so that it looks like the following:
Partner Date Ad Unit Revenue Total Revenue
App 1/1/2020 x 10 13
App 1/1/2020 y 3 13
App 1/2/2020 x 2 6
App 1/2/2020 y 4 6
I have tried the following code, but in the output it no longer breaks the data by Ad Unit, which is what I want...
SELECT
`Date`,
`Ad Unit`,
`Partner`,
`Revenue`,
sum(`Revenue`) as `Total Revenue`
from
`master_table`
group by
`Date`
And the output now is
Partner Date Ad Unit Revenue Total Revenue
App 1/1/2020 x 10 13
How can I group the data so I have it broken down by Ad Unit and have a column for totals at the same time that is grouped by date?

You can use window functions:
SELECT mt.*, SUM(revenue) OVER (PARTITION BY date) as total_revenue
FROM `master_table` mt;
This might help you understand them.
You can also do this with a correlated subquery:
SELECT mt.*,
(SELECT SUM(mt2.revenue)
FROM master_table mt2
WHERE mt2.date = mt.date
) as total_revenue
FROM master_table mt;
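As a rough way to check that both queries give the same per-day totals, here is a minimal sketch that runs them against the sample data from the question. It uses SQLite through Python's sqlite3 module as a stand-in for the asker's database (window functions need SQLite 3.25 or later), and the snake_case column names and ISO dates are assumptions made for the sketch, not the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for the real engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_table (partner TEXT, date TEXT, ad_unit TEXT, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO master_table VALUES (?, ?, ?, ?)",
    [
        ("App", "2020-01-01", "x", 10),
        ("App", "2020-01-01", "y", 3),
        ("App", "2020-01-02", "x", 2),
        ("App", "2020-01-02", "y", 4),
    ],
)

# Window function: every row is kept, and the daily total is attached to each one.
window_rows = conn.execute(
    """
    SELECT partner, date, ad_unit, revenue,
           SUM(revenue) OVER (PARTITION BY date) AS total_revenue
    FROM master_table
    ORDER BY date, ad_unit
    """
).fetchall()

# Correlated subquery: same result, computed once per outer row.
subquery_rows = conn.execute(
    """
    SELECT mt.partner, mt.date, mt.ad_unit, mt.revenue,
           (SELECT SUM(mt2.revenue)
            FROM master_table mt2
            WHERE mt2.date = mt.date) AS total_revenue
    FROM master_table mt
    ORDER BY mt.date, mt.ad_unit
    """
).fetchall()

for row in window_rows:
    print(row)
```

Neither query uses GROUP BY, which is why the Ad Unit breakdown survives: the rows are not collapsed, only annotated with the total.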

Related

Get Decile of Values and # of records between deciles Presto SQL

I have a table that looks like this
User ID Income
1 4.00
2 5.00
1 7.00
3 10.00
4 80.00
1 40.00
5 7.00
6 4.00
I need a Presto SQL query that breaks the range of "Income" {eg.4.00-80.00} up into deciles irrespective of frequency of that "Income" value. I also need the # of unique "User ID" that falls beneath that decile (eg. 10th percentile -> X users, 20th percentile Y users).
You can calculate the decile for each user-income, and then join the cte with itself (to account for the repeated users, since count distinct over() is not allowed in Presto).
WITH user_deciles_cte AS(
SELECT user_id,
NTILE(10) OVER (ORDER BY income) AS deciles
FROM table
),
join_users_and_deciles_cte AS(
SELECT DISTINCT dec.deciles,
users.user_id
FROM user_deciles_cte dec
LEFT JOIN user_deciles_cte users
ON users.deciles <= dec.deciles
)
SELECT deciles,
COUNT(DISTINCT user_id) AS users
FROM join_users_and_deciles_cte
GROUP BY 1
ORDER BY 1 ASC
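To see what this kind of query returns on the sample data, here is a minimal sketch using SQLite via Python's sqlite3 module instead of Presto. The table name `incomes` is made up for the sketch, and `user_id` is added to the ORDER BY as a tie-breaker (an assumption) so that equal incomes land in a deterministic bucket. The self-join is written slightly more compactly than in the answer but follows the same idea: for each decile, count the distinct users in that decile or any lower one.

```python
import sqlite3

# SQLite stands in for Presto here (assumption); both support NTILE().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incomes (user_id INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO incomes VALUES (?, ?)",
    [(1, 4.0), (2, 5.0), (1, 7.0), (3, 10.0), (4, 80.0), (1, 40.0), (5, 7.0), (6, 4.0)],
)

decile_counts = conn.execute(
    """
    WITH user_deciles AS (
        SELECT user_id,
               NTILE(10) OVER (ORDER BY income, user_id) AS decile
        FROM incomes
    )
    -- Join each decile to all rows at or below it, then count distinct users.
    SELECT d.decile, COUNT(DISTINCT u.user_id) AS users
    FROM (SELECT DISTINCT decile FROM user_deciles) AS d
    JOIN user_deciles AS u ON u.decile <= d.decile
    GROUP BY d.decile
    ORDER BY d.decile
    """
).fetchall()

for decile, users in decile_counts:
    print(decile, users)
```

With only eight rows, NTILE(10) produces buckets 1 through 8. Note also that NTILE splits by row count rather than into equal-width income ranges, so it may not match the "irrespective of frequency" requirement exactly.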

Subquery in BigQuery (JOIN on same Table)

I have a BigQuery table with this data
client spent balance date
A 20 500 2022-01-01
A 10 490 2022-01-02
A 50 440 2022-01-03
B 200 1000 1995-07-09
B 300 700 1998-08-11
B 100 600 2002-04-17
C 2 100 2021-01-04
C 10 90 2021-06-06
C 70 20 2021-10-07
I need the latest balance of each client based on the date:
client spent balance date
A 50 440 2022-01-03
B 100 600 2002-04-17
C 70 20 2021-10-07
DISTINCT does not work here the way it does in SQL, and GROUP BY client does not work either, because grouping forces me to use COUNT, SUM, etc. on the other columns.
For just one client I use:
SELECT balance FROM `table` WHERE client = "A" ORDER BY date DESC LIMIT 1
But how can I get this data for every client in just one statement?
I tried with subselect
SELECT client,
(SELECT balance FROM `table` WHERE client = tb.client ORDER BY date DESC LIMIT 1) AS bal
FROM `table` AS tb;
and got the error:
Correlated subqueries that reference other tables are not supported
unless they can be de-correlated, such as by transforming them into an
efficient JOIN.
I don’t know how to make a JOIN out of this subquery to make it work.
Hope you have an idea.
Use the query below:
select * from your_table
qualify 1 = row_number() over(partition by client order by date desc)
Applied to the sample data in your question, this returns the latest row for each client.
Have you tried using the ROW_NUMBER window function?
select client, spent, balance, date
from (
select client, spent, balance, date
, ROW_NUMBER() OVER (PARTITION BY client ORDER BY date DESC) AS row_num -- adding row number, starting from latest date
from table
)
where row_num = 1 -- filter out only the latest date
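Here is a minimal sketch of the ROW_NUMBER approach run against the sample data from the question. It uses SQLite via Python's sqlite3 module rather than BigQuery, so it relies on the subquery form (SQLite has no QUALIFY), and the table name `balances` is an assumption made for the sketch.

```python
import sqlite3

# SQLite stands in for BigQuery here (assumption); the subquery form is portable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (client TEXT, spent INTEGER, balance INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?, ?)",
    [
        ("A", 20, 500, "2022-01-01"),
        ("A", 10, 490, "2022-01-02"),
        ("A", 50, 440, "2022-01-03"),
        ("B", 200, 1000, "1995-07-09"),
        ("B", 300, 700, "1998-08-11"),
        ("B", 100, 600, "2002-04-17"),
        ("C", 2, 100, "2021-01-04"),
        ("C", 10, 90, "2021-06-06"),
        ("C", 70, 20, "2021-10-07"),
    ],
)

latest = conn.execute(
    """
    SELECT client, spent, balance, date
    FROM (
        SELECT client, spent, balance, date,
               -- number each client's rows starting from the latest date
               ROW_NUMBER() OVER (PARTITION BY client ORDER BY date DESC) AS row_num
        FROM balances
    )
    WHERE row_num = 1
    ORDER BY client
    """
).fetchall()

for row in latest:
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why the TEXT column works for ordering in this sketch.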

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in BigQuery which contains order_date (DATE) and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I am trying to count distinct customer_id values between a month of the previous year and the same month of the current year: for example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01. I also want to count the customers who did not buy in the same period of the following year: 2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01.
The output I expect:
order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
The next periods should not include the previous ones.
I tried the code below, but it behaves differently:
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values over a window of time, and for that I recommend using window functions; the BigQuery documentation explains how they work.
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT customer_id) OVER last_12_mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW last_12_mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;

Firebird Query- Return first row each group

In a Firebird database with a table "Sales", I need to select the first sale of each customer. See below a sample that shows the table and the desired result of the query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tried the code from "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
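For comparison, here is a minimal sketch that runs both the MIN-join and the "no earlier sale" anti-join against the sample data, using SQLite via Python's sqlite3 module instead of Firebird. The timestamps are rewritten as ISO strings (reading the question's dates as dd/mm/yy, which is an assumption), and the index name is made up.

```python
import sqlite3

# SQLite stands in for Firebird here (assumption); both queries are plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"),
        (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"),
        (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"),
        (6, 22, "2016-01-10 08:45"),
    ],
)
# Composite index suggested in the answer; the name is hypothetical.
conn.execute("CREATE INDEX idx_sales_customer_date ON sales (customerid, dthrsale)")

# Approach 1: join each sale to the minimum sale date of its customer.
join_min_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT x.id
        FROM sales x
        JOIN (SELECT customerid, MIN(dthrsale) AS first_sale
              FROM sales
              GROUP BY customerid) p
          ON p.customerid = x.customerid AND p.first_sale = x.dthrsale
        ORDER BY x.id
        """
    )
]

# Approach 2: keep the sales for which no earlier sale by the same customer exists.
anti_join_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT s1.id
        FROM sales s1
        LEFT JOIN sales s2
          ON s2.customerid = s1.customerid AND s2.dthrsale < s1.dthrsale
        WHERE s2.id IS NULL
        ORDER BY s1.id
        """
    )
]

print(join_min_ids, anti_join_ids)
```

Both approaches return more than one row for a customer whose earliest timestamp occurs twice; the sample data has no such ties.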
In Firebird 3, get the first row for each customer by minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id , min(sales_date)
, id, total --what about other columns
from SALES
group by customer_id
As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID

Mondrian MDX Last Element Aggregation

In the telco industry it is very important to know what a customer's status was at some point in time (end of week, month, etc.).
So, I have an SCD Type II dimension with: customer_tk, customerID, status, date.
We use it in custom reports to find the state on a given day (example):
Date = '2015-10-01'
Group Active Terminated Suspended Order
------------------------------------------------------
Group1 25 2 2 8
Group2 45 8 0 12
Group3 15 18 5 2
Group4 65 2 1 29
This is pivoted from query:
SELECT * FROM dim_customer
INNER JOIN (SELECT max(customer_tk) as maxId, customerId FROM dim_customer WHERE date<='2015-10-01' GROUP BY customerId) as maxCust
ON dim_customer.customer_tk = maxCust.maxId
And it works perfectly (date is parameter from report).
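To make the as-of-date logic concrete, here is a minimal sketch of the same join with the date passed as a parameter, run in SQLite via Python's sqlite3 module. The three dimension rows are hypothetical and invented only for illustration, and the sketch assumes, as the query above does, that customer_tk grows with each new version of a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_tk INTEGER, customerId INTEGER, status INTEGER, date TEXT)"
)
# Hypothetical SCD Type II rows: customer 29841 changes status in 2015.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [
        (1, 29841, 2, "2014-03-01"),
        (2, 28425, 2, "2014-05-01"),
        (3, 29841, 8, "2015-06-01"),
    ],
)


def status_as_of(connection, as_of_date):
    """Return (customerId, status) for the latest version of each customer on or before as_of_date."""
    return connection.execute(
        """
        SELECT d.customerId, d.status
        FROM dim_customer d
        INNER JOIN (SELECT MAX(customer_tk) AS maxId, customerId
                    FROM dim_customer
                    WHERE date <= ?
                    GROUP BY customerId) AS maxCust
          ON d.customer_tk = maxCust.maxId
        ORDER BY d.customerId
        """,
        (as_of_date,),
    ).fetchall()


print(status_as_of(conn, "2015-10-01"))
print(status_as_of(conn, "2014-12-31"))
```

Moving the date back to the end of 2014 drops the 2015 version of customer 29841, so the earlier status is reported instead.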
I want to put it in a cube, but how do I create this type of join? I need a cumulative count of customers.
I tried with MDX Tail(filter(... )) expressions but didn't manage to get correct numbers.
So, basically, with no filters, it should return status = 8 for customer 29841 and status = 2 for customer 28425.
But if I choose year = 2014, it should return status = 2 for both customers.
Thanks