Oracle SQL Create PDF from Data

I am trying to create a Probability Density Function (PDF) from data in an Oracle SQL table through a SQL query. Consider the below table:
Name | Spend
--------------
Anne | 110
Phil | 40
Sue | 99
Jeff | 190
Stan | 80
Joe | 90
Ben | 100
Lee | 85
Now if I want to create a PDF from that data I need to count the number of customers whose spend falls within a certain bucket (between 0 and 50, between 50 and 100, and so on). An example graph would look something like this (forgive my poor ascii art):
5|
4| *
3| *
2| * *
1|* * * *
|_ _ _ _
5 1 1 2
0 0 5 0
0 0 0
So the axes are:
X-Axis: the spend buckets
Y-Axis: the number of customers
I am currently using the Oracle SQL CASE function to determine whether the spend falls within the bucket and then summing the number of customers that do. However, this is taking forever, as there are a couple of million records.
Any idea on how to do this effectively?
Thanks!

You can try using the WIDTH_BUCKET function.
select bucket, count(name)
from (select name, spend,
             WIDTH_BUCKET(spend, 0, 200, 4) bucket
      from mytable
     )
group by bucket
order by bucket;
Here I have divided the range 0 to 200 into 4 buckets, and the function assigns a bucket number to each value. You can group by this bucket and count how many records fall in each bucket.
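For the sample data above, here is roughly what the inner query produces (a sketch; WIDTH_BUCKET splits the range into four lower-inclusive buckets of width 50):
select name, spend, WIDTH_BUCKET(spend, 0, 200, 4) bucket
from mytable;
-- bucket 1 = [0,50), 2 = [50,100), 3 = [100,150), 4 = [150,200)
-- Anne  110  3
-- Phil   40  1
-- Sue    99  2
-- Jeff  190  4
-- Stan   80  2
-- Joe    90  2
-- Ben   100  3
-- Lee    85  2
-- grouping then yields counts 1, 4, 2, 1 per bucket, matching the ascii graph.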
You can even display the actual bucket range.
-- min_value, max_value and buckets below are placeholders for your literals
-- (0, 200 and 4 in the example above)
select bucket,
       cast(min_value + (bucket-1) * (max_value-min_value)/buckets as varchar2(10))
       || '-'
       || cast(min_value + bucket * (max_value-min_value)/buckets as varchar2(10)) as bucket_range,
       count(name) c
from (select name,
             spend,
             WIDTH_BUCKET(spend, min_value, max_value, buckets) bucket
      from mytable)
group by bucket
order by bucket;

SELECT SUM(y_axis) y_axis, -- sum the per-spend counts; COUNT(*) here would count distinct spend values, not customers
       x_axis
FROM
  (SELECT COUNT(*) y_axis,
          CASE
            WHEN spend <= 50 THEN 50
            WHEN spend < 100 AND spend > 50 THEN 100
            WHEN spend < 150 AND spend >= 100 THEN 150
            WHEN spend < 200 AND spend >= 150 THEN 200
          END x_axis
   FROM your_table
   GROUP BY spend
  )
GROUP BY x_axis;
y_axis x_axis
-----------------
4 100
1 50
1 200
2 150
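Since the original concern was performance over a couple of million rows, a single-pass variant (a sketch of my own, not from the answer above) groups directly on the CASE expression and avoids the inner aggregation entirely; Oracle requires repeating the expression in the GROUP BY:
SELECT COUNT(*) y_axis,
       CASE
         WHEN spend <= 50 THEN 50
         WHEN spend < 100 THEN 100
         WHEN spend < 150 THEN 150
         WHEN spend < 200 THEN 200
       END x_axis
FROM your_table
GROUP BY CASE
           WHEN spend <= 50 THEN 50
           WHEN spend < 100 THEN 100
           WHEN spend < 150 THEN 150
           WHEN spend < 200 THEN 200
         END;
CASE branches evaluate in order, so the redundant lower-bound checks can be dropped.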

Related

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department-level score table based on a deeper, product-url-level score table.
Dates are not consecutive.
Not all urls get score updates on the same day (they are independent of each other).
dist_url should be a running count distinct (cumulative count distinct).
dist urls and urls score >=30 are both count distinct.
What I have now is:
Date url Store Dept Page Score
10/1 a US A X 10
10/1 b US A X 30
10/1 c US A X 60
10/4 a US A X 20
10/4 d US A X 60
10/6 b US A X 22
10/9 a US A X 40
10/9 e US A X 10
And this is the output I want:
Date Store Dept Page dist urls urls score >=30
10/1 US A X 3 2
10/4 US A X 4 3
10/6 US A X 4 2
10/9 US A X 5 2
I think dist_url can be done using a window function, I'm just not sure of the query.
My current query is below, but it's wrong, since it is not a cumulative count distinct:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.
Based on your question, you seem to want two levels of logic: each url contributes 1 on the first date it appears (and, separately, on the first date it scores at least 30); a running sum of those flags then gives the cumulative distinct counts:
select date, store, dept, page,
       sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
       sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
       from t
       group by store, dept, page, url
      ) union all
      (select store, dept, page, url, min(date) as date, 0, 1
       from t
       where score >= 30
       group by store, dept, page, url
      )
     ) t
group by date, store, dept, page;
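For the sample data, the inner UNION ALL boils down to these rows (a sketch of the first-appearance logic, not actual output):
-- first appearances (start = 1):        10/1 a, 10/1 b, 10/1 c, 10/4 d, 10/9 e
-- first dates with score >= 30
-- (start_30 = 1):                       10/1 b, 10/1 c, 10/4 d, 10/9 a
-- summing per date and accumulating gives dist urls 3, 4, 5
-- and urls >= 30 of 2, 3, 4; note that no row comes out for 10/6,
-- since no url first appears (or first scores >= 30) on that day.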
I don't understand how your query is related to your question, and try as I might, I don't get your output either. But I think you can avoid the UNION SELECTs - does this do what you expect?
NULLs don't figure in COUNT DISTINCTs, so you can combine an aggregate expression with an OLAP one. And Vertica has named windows to increase readability:
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms

Those who listened to more than 10 mins each month in the last 6 months

I'm trying to figure out the count of users who listened to more than 10 mins each month in the last 6 months
We have this event: Song_stopped_listen and one attribute is session_progress_ms
Now I'm trying to see the monthly evolution of the count of this cohort over the last 6 months.
I'm using BigQuery and this is the query I tried, but I feel that something is off semantically, though I can't put my finger on it:
SELECT
CONCAT(CAST(EXTRACT(YEAR FROM DATE (timestamp)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (timestamp)) AS STRING)) AS date
,SUM(absl.session_progress_ms/(1000*60*10)) as total_10_ms, COUNT(DISTINCT u.id) as total_10_listeners
FROM ios.song_stopped_listen as absl
LEFT JOIN ios.users u on absl.user_id = u.id
WHERE absl.timestamp > '2018-05-01'
Group by 1
HAVING(total_10_ms > 1)
Please help figure out what I'm doing wrong here.
Thank you.
Data sample:
user_id | session_progress_ms | timestamp
1 | 10000 | 2017-10-10 14:34:25.656 UTC
What I want to have:
Month-year | Count of users who listened to more than 10 mins
2018-5     | 500
2018-6     | 600
2018-7     | 300
2018-8     | 5100
2018-9     | 4500
2018-10    | 1500
2018-11    | 1500
2018-12    | 2500
Use multiple levels of aggregation:
select user_id
from (select ssl.user_id, timestamp_trunc(timestamp, month) as mon,
             sum(ssl.session_progress_ms/(1000*60)) as total_minutes
      from ios.song_stopped_listen as ssl
      where date(ssl.timestamp) < date_trunc(current_date, month) and
            date(ssl.timestamp) >= date_sub(date_trunc(current_date, month), interval 6 month)
      group by 1, 2
     ) u
where total_minutes >= 10
group by user_id
having count(*) = 6;
To get the count, just use this as a subquery with count(*).
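A minimal sketch of that wrapping, reusing the query above:
select count(*) as monthly_10min_listeners
from (select user_id
      from (select ssl.user_id, timestamp_trunc(timestamp, month) as mon,
                   sum(ssl.session_progress_ms/(1000*60)) as total_minutes
            from ios.song_stopped_listen as ssl
            where date(ssl.timestamp) < date_trunc(current_date, month) and
                  date(ssl.timestamp) >= date_sub(date_trunc(current_date, month), interval 6 month)
            group by 1, 2
           ) u
      where total_minutes >= 10
      group by user_id
      having count(*) = 6
     );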

SQL Query to Bucket Table Items

I am trying to bucket values within my table by the range they fall in, for example, if my table is the following:
course_name | current enrollment
course_1 | 10
course_2 | 200
course_3 | 500
I get the following result:
enrollment_range | courses
10 | 1
100 | 1
500 | 1
So far, I have the following:
SELECT
CASE
WHEN courses.current_enrollment >= 500 THEN 500
WHEN courses.current_enrollment >= 250 THEN 250
WHEN courses.current_enrollment >= 100 THEN 100
WHEN courses.current_enrollment >= 50 THEN 50
WHEN courses.current_enrollment >= 30 THEN 30
WHEN courses.current_enrollment >= 10 THEN 10
END enrollment_range, count() AS total
FROM courses
GROUP BY enrollment_range
ORDER BY enrollment_range ASC
but I end up with an extra result that is the total number of courses I have, so I get something like the following:
enrollment_range | courses
10 | 1
100 | 1
500 | 1
| 3
In your SQL, you should use a GROUP BY in the count. In SQL Server, I can produce the correct result using the following script:
SELECT
  CASE
    WHEN current_enrollment >= 500 THEN 500
    WHEN current_enrollment >= 250 THEN 250
    WHEN current_enrollment >= 100 THEN 100
    WHEN current_enrollment >= 50 THEN 50
    WHEN current_enrollment >= 30 THEN 30
    WHEN current_enrollment >= 10 THEN 10
  END AS enrollment_range, t.course_name, t.count
FROM courses
JOIN (SELECT COUNT(course_name) AS count, course_name
      FROM courses
      GROUP BY course_name) t
  ON courses.course_name = t.course_name
The extra result was the count of courses that did not fall within the specified brackets, in this case, courses with enrollment below 10.
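If those sub-10 courses should simply be excluded, a minimal sketch building on the asker's own query (assuming that is the desired behavior) filters them out before bucketing; the CASE is repeated in the GROUP BY so it also works in dialects that don't allow grouping by an alias:
SELECT
  CASE
    WHEN current_enrollment >= 500 THEN 500
    WHEN current_enrollment >= 250 THEN 250
    WHEN current_enrollment >= 100 THEN 100
    WHEN current_enrollment >= 50 THEN 50
    WHEN current_enrollment >= 30 THEN 30
    WHEN current_enrollment >= 10 THEN 10
  END enrollment_range,
  COUNT(*) AS total
FROM courses
WHERE current_enrollment >= 10 -- drop courses below the lowest bracket
GROUP BY CASE
           WHEN current_enrollment >= 500 THEN 500
           WHEN current_enrollment >= 250 THEN 250
           WHEN current_enrollment >= 100 THEN 100
           WHEN current_enrollment >= 50 THEN 50
           WHEN current_enrollment >= 30 THEN 30
           WHEN current_enrollment >= 10 THEN 10
         END
ORDER BY enrollment_range ASC;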

SQL Summation with more than one Condition

I have a table like this
Link PeriodID Debit Credit Project
1 49 - 200 1
1 49 200 - 2
1 49 100 0
1 50 50 - 1
2 49 - 600 0
I want a script to sum the debit and credit per link per period disregarding project.
so the answer should look like
Link PeriodID TotalDebit TotalCredit
1 49 300 200
1 50 50 -
2 49 - 600
I have more than 60 PeriodIDs and more than 100 Links.
Please assist in writing such a script.
Use a Group by with aggregate functions.
SELECT Link,
       PeriodID,
       SUM(Debit) AS TotalDebit,
       SUM(Credit) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodID;
SUM ignores NULL values, but if every value in a group is NULL the total itself comes back NULL. You can modify the query like this to return 0 instead:
SELECT Link,
       PeriodID,
       SUM(COALESCE(Debit, 0)) AS TotalDebit,
       SUM(COALESCE(Credit, 0)) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodID;

SQL Query to continuously bucket data

I have a table as follows:
Datetime | ID | Price | Quantity
2013-01-01 13:30:00 1 139 25
2013-01-01 13:30:15 2 140 25
2013-01-01 13:30:30 3 141 15
Supposing that I wish to end up with a table like this, which buckets the data into quantities of 50 as follows:
Bucket_ID | Max | Min | Avg |
1 140 139 139.5
2 141 141 141
Is there a simple query to do this? Data will constantly be added to the first table; it would be nice if it could somehow avoid recalculating the completed buckets of 50 and instead automatically start averaging the next incomplete bucket. Ideas appreciated! Thanks
You may try this solution, where "number" plays the role of Quantity in your table. It should work even if "number" is bigger than 50 (relying, though, on the fact that avg(number) < 50).
select bucket_id,
       max(price),
       min(price),
       avg(price)
from (select price,
             bucket_id,
             (select sum(t2.number) from test t2 where t2.id <= t1.id) as accumulated
      from test t1
      join (select rowid as bucket_id,
                   50 * rowid as bucket
            from test) buckets
        on (buckets.bucket - 50) < accumulated
       and buckets.bucket > (accumulated - number))
group by bucket_id;
You can have a look at this fiddle http://sqlfiddle.com/#!7/4c63c/1 if it is what you want.
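On databases with window functions, the same idea can be expressed without the correlated subquery. A sketch of that alternative (my addition, not part of the original answer; column names taken from the question's table):
select bucket_id,
       max(price),
       min(price),
       avg(price)
from (select price,
             -- running total of quantity in time order; every full 50 units
             -- closes a bucket, so dividing by 50 and rounding up numbers them
             ceil(sum(quantity) over (order by datetime
                                      rows between unbounded preceding
                                           and current row) / 50.0) as bucket_id
      from mytable) t
group by bucket_id
order by bucket_id;
As with the original query, a row that straddles a 50-unit boundary is assigned wholly to the bucket in which its cumulative total ends.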