Get Decile of Values and # of records between deciles Presto SQL - sql

I have a table that looks like this:
User ID Income
1 4.00
2 5.00
1 7.00
3 10.00
4 80.00
1 40.00
5 7.00
6 4.00
I need a Presto SQL query that breaks the range of "Income" (e.g. 4.00-80.00) into deciles, irrespective of how often each "Income" value occurs. I also need the number of unique "User ID" values that fall at or below each decile (e.g. 10th percentile -> X users, 20th percentile -> Y users).

You can calculate the decile for each user-income row, and then join the CTE with itself (to account for repeated users, since COUNT(DISTINCT ...) OVER () is not allowed in Presto).
WITH user_deciles_cte AS (
    SELECT user_id,
           NTILE(10) OVER (ORDER BY income) AS deciles
    FROM your_table  -- placeholder; "table" itself is a reserved word
),
join_users_and_deciles_cte AS (
    -- pair every decile with all users seen in that decile or a lower one
    SELECT DISTINCT dec.deciles,
           users.user_id
    FROM user_deciles_cte dec
    LEFT JOIN user_deciles_cte users
        ON users.deciles <= dec.deciles
)
SELECT deciles,
       COUNT(DISTINCT user_id) AS users
FROM join_users_and_deciles_cte
GROUP BY 1
ORDER BY 1 ASC
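NTILE(10) puts roughly the same number of rows in each bucket, so its boundaries follow the frequency of the income values. If the deciles are instead meant to be equal-width slices of the income range (4.00 to 80.00 in the sample), the arithmetic can be sketched in Python as below. This is only an illustration on the question's sample data, not the answer's method; the names rows, edges and users_at_or_below are made up for the example.

```python
# Sample rows from the question: (user_id, income).
rows = [(1, 4.00), (2, 5.00), (1, 7.00), (3, 10.00),
        (4, 80.00), (1, 40.00), (5, 7.00), (6, 4.00)]

lo = min(income for _, income in rows)
hi = max(income for _, income in rows)

# Upper edge of each equal-width decile of the income range [lo, hi].
edges = [lo + (hi - lo) * k / 10 for k in range(1, 11)]

# Distinct users with at least one income at or below each edge.
users_at_or_below = [
    len({user for user, income in rows if income <= edge})
    for edge in edges
]

for k, (edge, n_users) in enumerate(zip(edges, users_at_or_below), start=1):
    print(f"{k * 10}th percentile of range (<= {edge:.2f}): {n_users} users")
```

On the sample data five users already fall in the lowest band, and the sixth only appears at the top edge (80.00).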

Related

I have a table of calls data. I want to count the unique accounts called each day and take a cumulative sum of unique accounts on a monthly basis

I have a table with two columns: one holds an account number and the other a date. Sample data is given below.
Date account
9/8/2020 555
9/8/2020 666
9/8/2020 777
9/8/2020 888
9/9/2020 555
9/9/2020 999
9/10/2020 555
9/10/2020 222
9/10/2020 333
9/11/2020 666
9/11/2020 111
I would like to calculate the number of unique accounts called every day and accumulate it over the month. For example, if account number 555 is called on 8 Sept, 9 Sept and 20 Sept, it should be counted only once in the cumulative sum. The result should look like this:
Date Cumulative unique accounts called so far this month
9/8/2020 4
9/9/2020 5
9/10/2020 7
9/11/2020 8
Thank you in advance for your help.
You can do this with aggregation and window functions. First, get the first date for each account in each month, then aggregate and accumulate:
select min_date,
       count(*) as as_of_date,
       sum(count(*)) over (partition by year(min_date), month(min_date)
                           order by min_date
                          ) as cumulative_unique_count
from (select account, min(date) as min_date
      from t
      group by account, year(date), month(date)
     ) t
group by min_date;
You can try the below -
with cte as
(
    select date, count(*) as total from
    (
        select date, account, row_number() over(partition by account order by date) as rn
        from tablename
    ) A where rn = 1 group by date
)
select date, sum(total) over(order by date) as cum_sum
from cte
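The logic both answers rely on (find each account's first call date in the month, count new accounts per date, then accumulate) can be checked against the sample data with a short Python sketch. It assumes the dates are in M/D/YYYY format; the variable names are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Sample calls from the question: (date, account).
calls = [
    ("9/8/2020", 555), ("9/8/2020", 666), ("9/8/2020", 777), ("9/8/2020", 888),
    ("9/9/2020", 555), ("9/9/2020", 999),
    ("9/10/2020", 555), ("9/10/2020", 222), ("9/10/2020", 333),
    ("9/11/2020", 666), ("9/11/2020", 111),
]

# Step 1: first call date of each account within each month.
first_seen = {}
for date_text, account in calls:
    day = datetime.strptime(date_text, "%m/%d/%Y").date()
    key = (account, day.year, day.month)
    if key not in first_seen or day < first_seen[key]:
        first_seen[key] = day

# Step 2: count the accounts first seen on each date.
new_per_day = Counter(first_seen.values())

# Step 3: running total that restarts at the start of every month.
cumulative = {}
running, current_month = 0, None
for day in sorted(new_per_day):
    if (day.year, day.month) != current_month:
        running, current_month = 0, (day.year, day.month)
    running += new_per_day[day]
    cumulative[day.isoformat()] = running

print(cumulative)
```

As in the SQL versions, a date on which no new account appears would not get a row of its own.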

how to add sum of all the values in sql as a separate column

My table currently looks like this:
Partner Date Ad Unit Revenue
App 1/1/2020 x 10
App 1/1/2020 y 3
I need the additional column with sum of all revenue for the day so it looks like the following
Partner Date Ad Unit Revenue Total Revenue
App 1/1/2020 x 10 13
App 1/1/2020 y 3 13
App 1/2/2020 x 2 6
App 1/2/2020 y 4 6
I have tried the following code, but in the output it no longer breaks the data by Ad Unit, which is what I want...
SELECT
`Date`,
`Ad Unit`,
`Partner`,
`Revenue`,
sum(`Revenue`) as `Total Revenue`
from
`master_table`
group by
`Date`
And the output now is
Partner Date Ad Unit Revenue Total Revenue
App 1/1/2020 x 10 13
How can I group the data so I have it broken down by Ad Unit and have a column for totals at the same time that is grouped by date?
You can use window functions:
SELECT mt.*, SUM(revenue) OVER (PARTITION BY date) as total_revenue
FROM `master_table` mt;
You can also do this with a correlated subquery:
SELECT mt.*,
(SELECT SUM(mt2.revenue)
FROM master_table mt2
WHERE mt2.date = mt.date
) as total_revenue
FROM master_table mt;
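To see the correlated-subquery version run end to end, here is a small sketch using Python's built-in sqlite3 module with the four rows from the desired output. SQLite stands in for the asker's database, and the column name ad_unit (instead of Ad Unit) is an assumption made to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_table (partner TEXT, date TEXT, ad_unit TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO master_table VALUES (?, ?, ?, ?)",
    [
        ("App", "1/1/2020", "x", 10),
        ("App", "1/1/2020", "y", 3),
        ("App", "1/2/2020", "x", 2),
        ("App", "1/2/2020", "y", 4),
    ],
)

# Every row is kept; the subquery adds the total for that row's date.
result = conn.execute(
    """
    SELECT mt.partner, mt.date, mt.ad_unit, mt.revenue,
           (SELECT SUM(mt2.revenue)
            FROM master_table mt2
            WHERE mt2.date = mt.date) AS total_revenue
    FROM master_table mt
    ORDER BY mt.date, mt.ad_unit
    """
).fetchall()

for row in result:
    print(row)
conn.close()
```

Because there is no GROUP BY, the rows stay broken down by ad unit while each one carries its day's total.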

Deciling by partitions in Teradata SQL

I have a table in Teradata which contains Sales Information per store pertaining to each region.
StoreID RegionID Sales
1 A 200
2 A 150
3 A 210
4 B 400
5 B 420
How can I find out the stores in top 2 deciles by sales for each region?
There's the QUANTILE function, but it is old, deprecated syntax. The top 2 deciles are the top 20 percent, and you can simply use PERCENT_RANK for this:
QUALIFY
PERCENT_RANK()
OVER (PARTITION BY RegionID
ORDER BY Sales DESC) <= 0.2
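PERCENT_RANK is defined as (rank - 1) / (rows in partition - 1), so the filter above can be traced by hand. The Python sketch below reproduces that arithmetic per region on the sample data; it is only an illustration of what the QUALIFY clause keeps, and the variable names are made up.

```python
from collections import defaultdict

# Sample rows from the question: (store_id, region_id, sales).
stores = [(1, "A", 200), (2, "A", 150), (3, "A", 210),
          (4, "B", 400), (5, "B", 420)]

by_region = defaultdict(list)
for store_id, region_id, sales in stores:
    by_region[region_id].append((store_id, sales))

top_stores = []
for region_id, members in sorted(by_region.items()):
    n = len(members)
    for store_id, sales in members:
        # RANK() with ORDER BY Sales DESC: 1 + number of stores selling more.
        rank = 1 + sum(1 for _, other in members if other > sales)
        # PERCENT_RANK() = (rank - 1) / (n - 1); 0 for a single-row partition.
        pct_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        if pct_rank <= 0.2:
            top_stores.append((region_id, store_id, pct_rank))

print(top_stores)
```

With so few stores per region in the sample, only the best-selling store of each region passes the 0.2 cutoff.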

SQL Server : count types with totals by date change

I need to count a value (M_Id) at each change of a date (RS_Date) and create a column grouped by the RS_Date that has an active total from that date.
So the table is:
Ep_Id Oa_Id M_Id M_StartDate RS_Date
--------------------------------------------
1 2001 5 1/1/2014 1/1/2014
1 2001 9 1/1/2014 1/1/2014
1 2001 3 1/1/2014 1/1/2014
1 2001 11 1/1/2014 1/1/2014
1 2001 2 1/1/2014 1/1/2014
1 2067 7 1/1/2014 1/5/2014
1 2067 1 1/1/2014 1/5/2014
1 3099 12 1/1/2014 3/2/2014
1 3099 14 2/14/2014 3/2/2014
1 3099 4 2/14/2014 3/2/2014
So my goal is like
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 10
If M_StartDate = RS_Date I need to count the M_Id, and then for each RS_Date that is not equal to the start date I need to count the M_Id and add that to the M_StartDate count, then count the next RS_Date and add that to the last active count.
I can get the basic counts with something like
(Case when M_StartDate <= RS_Date
then [m_Id] end) as Test.
But I am stuck as how to get to the result I want.
Any help would be greatly appreciated.
Brian
-added in response to comments
I am using Server Ver 10
If using SQL Server 2012+ you can use ROWS with the analytic/window functions:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT *,SUM(CT) OVER(ORDER BY RS_Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Run_CT
FROM cte
If stuck using something prior to 2012 you can use:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT a.RS_Date
,SUM(b.CT) AS Run_CT
FROM cte a
LEFT JOIN cte b
ON a.RS_Date >= b.RS_Date
GROUP BY a.RS_Date
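Both queries above do the same two things: count distinct M_Id values per RS_Date, then add those counts up in date order. A short Python sketch of that arithmetic on the question's sample rows (dates assumed to be M/D/YYYY, variable names made up) reproduces the requested result:

```python
from datetime import datetime

# Sample rows from the question: (M_Id, RS_Date).
rows = [(5, "1/1/2014"), (9, "1/1/2014"), (3, "1/1/2014"), (11, "1/1/2014"),
        (2, "1/1/2014"), (7, "1/5/2014"), (1, "1/5/2014"),
        (12, "3/2/2014"), (14, "3/2/2014"), (4, "3/2/2014")]

# The CTE: COUNT(DISTINCT M_Id) per RS_Date.
ids_per_date = {}
for m_id, rs_date in rows:
    day = datetime.strptime(rs_date, "%m/%d/%Y").date()
    ids_per_date.setdefault(day, set()).add(m_id)

# The outer query: add up the counts in RS_Date order.
running_totals = []
total = 0
for day in sorted(ids_per_date):
    total += len(ids_per_date[day])
    running_totals.append((day.isoformat(), total))

print(running_totals)
```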
You need a cumulative sum, which is easy in SQL Server 2012 using windowed aggregate functions. Based on your description this will return the expected result:
SELECT Ep_Id, RS_Date,
       SUM(COUNT(*))
       OVER (PARTITION BY Ep_Id
             ORDER BY RS_Date
             ROWS UNBOUNDED PRECEDING) AS Active
FROM tab
GROUP BY Ep_Id, RS_Date
It looks like you want something like this:
SELECT
RS_Date,
SUM(c) OVER (PARTITION BY M_StartDate ORDER BY RS_Date ROWS UNBOUNDED PRECEDING)
FROM
(
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
) counts
The inline view computes the counts of distinct M_Id values within each (M_StartDate, RS_Date) group (distinctness enforced only within the group), and the outer query uses the analytic version of SUM() to add up the counts within each M_StartDate.
Note that this particular query will not exactly reproduce your example results. It will instead produce:
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 8
3/2/2014 2
This is on account of some rows in your example data with RS_Date 3/2/2014 having a later M_StartDate than others. If this is not what you want then you need to clarify the question, which currently seems a bit inconsistent.
Unfortunately, ORDER BY and ROWS framing for windowed aggregates such as SUM() OVER are not available until SQL Server 2012. In SQL Server 2008 (version 10), the job is messier. It could be done like this:
WITH gc AS (
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
)
SELECT
RS_Date,
(
SELECT SUM(c)
FROM gc gc2
WHERE gc2.M_StartDate = gc.M_StartDate AND gc2.RS_Date <= gc.RS_Date
) AS Active
FROM gc
If you are using SQL Server 2012 or newer, LAG lets you read a value from a previous row, which can help when building a running total by hand.
https://msdn.microsoft.com/en-us/library/hh231256(v=sql.110).aspx

SQL - select top xx% rows

I have a table, sales, which is ordered by descending TotalSales
user_id | TotalSales
----------------------
4 10
2 1.5
5 0.99
3 0.5
1 0.33
What I would like to do is find the percentage of the sum of all sales that the xx% most important sales represent.
For example if I wanted to do it for top 40% sales, here I would get (10+1.5)/(10+1.5+0.99+0.5+0.33)= 86%
But right now I haven't been able to select "top xx% rows".
Edit: DB management system can be MySQL or Vertica or Hive
select sum(TotalSales) as s from sales where TotalSales >= x
select sum(TotalSales) as b from sales
Your result is s/b,
where x is the TotalSales cutoff that corresponds to the percentage you choose each time.
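The calculation the question describes (sum of the top xx% of rows divided by the sum of all rows) can be sketched in Python on the sample data. The function name top_share is hypothetical, and rounding the row count up is an assumption, since the question does not say how a fractional number of rows should be handled.

```python
import math

# Sample rows from the question: (user_id, TotalSales).
sales = [(4, 10), (2, 1.5), (5, 0.99), (3, 0.5), (1, 0.33)]

def top_share(rows, fraction):
    """Share of total sales held by the top `fraction` of rows."""
    amounts = sorted((amount for _, amount in rows), reverse=True)
    # Rounding up the row count is an assumption; the question does not say.
    top_n = math.ceil(len(amounts) * fraction)
    return sum(amounts[:top_n]) / sum(amounts)

share = top_share(sales, 0.40)
print(f"{share:.0%}")
```

For the top 40% this takes the two largest rows, (10 + 1.5) / 13.32, which matches the 86% worked out in the question.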