SQL Query to Bucket Table Items - sql

I am trying to bucket the values in my table by the range they fall in. For example, if my table is the following:
course_name | current_enrollment
course_1 | 10
course_2 | 200
course_3 | 500
I want to get the following result:
enrollment_range | courses
10 | 1
100 | 1
500 | 1
So far, I have the following:
SELECT
CASE
WHEN courses.current_enrollment >= 500 THEN 500
WHEN courses.current_enrollment >= 250 THEN 250
WHEN courses.current_enrollment >= 100 THEN 100
WHEN courses.current_enrollment >= 50 THEN 50
WHEN courses.current_enrollment >= 30 THEN 30
WHEN courses.current_enrollment >= 10 THEN 10
END AS enrollment_range, COUNT(*) AS courses
FROM courses
GROUP BY enrollment_range
ORDER BY enrollment_range ASC
but I end up with an extra result that is the total number of courses I have, so I get something like the following:
enrollment_range | courses
10 | 1
100 | 1
500 | 1
| 3

In your SQL, the count should come from a grouped subquery. On SQL Server, I can produce the correct result with the following script:
SELECT
CASE
WHEN current_enrollment >= 500 THEN 500
WHEN current_enrollment >= 250 THEN 250
WHEN current_enrollment >= 100 THEN 100
WHEN current_enrollment >= 50 THEN 50
WHEN current_enrollment >= 30 THEN 30
WHEN current_enrollment >= 10 THEN 10
END as enrollment_range, t.course_name, t.count
FROM courses
join
( select Count(course_name) as count,course_name FROM courses group by course_name ) t
on courses.course_name = t.course_name

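The extra row was the count of courses that did not fall into any of the specified brackets, in this case courses with enrollment below 10. For those rows no WHEN branch matches, so the CASE expression returns NULL and they are grouped together under an empty enrollment_range.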
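To make the NULL bucket concrete, here is a small sketch that runs the question's query from Python against an in-memory SQLite database. SQLite only stands in for whatever database the question uses, and the three courses with enrollment below 10 (course_4 to course_6) are made-up sample rows. It shows one possible fix, assuming the small courses should simply be left out: filter them with WHERE before grouping.

```python
import sqlite3

# The question's bucketing query; {where} is a placeholder for an optional filter.
BUCKET_QUERY = """
SELECT CASE
         WHEN current_enrollment >= 500 THEN 500
         WHEN current_enrollment >= 250 THEN 250
         WHEN current_enrollment >= 100 THEN 100
         WHEN current_enrollment >= 50 THEN 50
         WHEN current_enrollment >= 30 THEN 30
         WHEN current_enrollment >= 10 THEN 10
       END AS enrollment_range,
       COUNT(*) AS courses
FROM courses
{where}
GROUP BY enrollment_range
ORDER BY enrollment_range ASC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_name TEXT, current_enrollment INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [
        ("course_1", 10),
        ("course_2", 200),
        ("course_3", 500),
        # Hypothetical rows below the lowest bracket: no WHEN branch matches them.
        ("course_4", 3),
        ("course_5", 7),
        ("course_6", 0),
    ],
)

# Without a filter, the unmatched rows show up as a NULL (None) bucket.
unfiltered = conn.execute(BUCKET_QUERY.format(where="")).fetchall()

# Filtering out rows below the lowest bracket removes that bucket.
filtered = conn.execute(
    BUCKET_QUERY.format(where="WHERE current_enrollment >= 10")
).fetchall()

print(unfiltered)  # [(None, 3), (10, 1), (100, 1), (500, 1)]
print(filtered)    # [(10, 1), (100, 1), (500, 1)]
```

If the small courses should be reported rather than dropped, adding an ELSE 0 branch to the CASE would give them a labelled bucket instead.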

Related

Oracle SQL: how to call created columns (alias) for pivot tables in a subquery

This is my first question in this community. It has helped me a lot before, so thank you all for being out there!
I have a problem with Oracle PL/SQL. I'm trying to create a pivot table that counts the number of people in a given salary range. I want cities as rows and salary ranges as columns. My problem arises when I select a column alias in the pivot query.
In table A I have one row per employee with their salary, and in table B I have their city. The two tables are linked by a key column named id_dpto. First, I join both tables, selecting employee names, salaries, and cities. Second, I use CASE WHEN to create the salary ranges (less than 1000, and between 1000 and 2500 dollars) and give the result the column alias SALARY_RANGE. Up to here everything is fine and the code runs perfectly.
My problem is in the third step. I use a subquery and the PIVOT clause to count by city and salary_range, but when I select the alias it doesn't work; the error message is "F"."SALARY_RANGE": invalid identifier. What is the proper way to select a created column (salary_range) in a pivot query? I've tried both with and without the F after the FROM subquery.
Initial data:
| Name | salary | city |
| ---- | ------ | ------ |
|john | 999 | NY |
|adam | 500 | NY |
|linda | 1500 | NY |
|Matt | 2000 | London |
|Joel | 1500 | London |
Desired result:
| city | salary less than 1000 | salary between 1000 and 2500 |
| ---- | ---- | ---- |
| NY | 2 | 1 |
| London | 0 | 2 |
My code:
SELECT F.SALARY_RANGE, F.CITY
FROM (SELECT A.NAMES,
A.SALARY,
B.CITY,
CASE
WHEN SALARY < 1000 THEN 'LESS THAN 1000'
WHEN SALARY < 2500 THEN 'BETWEEN 1000 AND 2500'
END AS SALARY_RANGE FROM EMPLOYEES A
LEFT JOIN XXX B ON A.ID_DPTO = B.ID_DPTO) F
PIVOT
(COUNT(SALARY_RANGE)
FOR SALARY_RANGE IN ('LESS THAN 1000', 'BETWEEN 1000 AND 2500')
)
Thanks for helping me!
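I think you should use SELECT * and exclude SALARY and NAMES from the subquery. PIVOT replaces the SALARY_RANGE column with one column per value listed in the IN clause, so SALARY_RANGE no longer exists in the outer query. Every subquery column that is not used in the PIVOT clause acts as an implicit grouping column, so keeping NAMES and SALARY would group by them as well instead of giving one row per city: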
SELECT *
FROM (SELECT B.CITY,
CASE
WHEN SALARY < 1000 THEN
'LESS THAN 1000'
WHEN SALARY < 2500 THEN
'BETWEEN 1000 AND 2500'
END AS SALARY_RANGE
FROM EMPLOYEES A
LEFT JOIN XXX B
ON A.ID_DPTO = B.ID_DPTO) F
PIVOT(COUNT(SALARY_RANGE)
FOR SALARY_RANGE IN('LESS THAN 1000', 'BETWEEN 1000 AND 2500'))
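As a cross-check of what the pivot should return, the same counts can be computed with conditional aggregation, which is a different technique from Oracle's PIVOT clause but gives the same shape of result. The sketch below runs it from Python on an in-memory SQLite database. The table name departments and the id_dpto values 1 and 2 are assumptions made for the example (the question only calls the second table XXX and does not show its keys).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (names TEXT, salary INTEGER, id_dpto INTEGER);
    CREATE TABLE departments (id_dpto INTEGER, city TEXT);  -- stands in for XXX
    INSERT INTO departments VALUES (1, 'NY'), (2, 'London');
    INSERT INTO employees VALUES
        ('john', 999, 1), ('adam', 500, 1), ('linda', 1500, 1),
        ('Matt', 2000, 2), ('Joel', 1500, 2);
    """
)

# COUNT ignores NULLs, so each CASE without an ELSE counts only matching rows.
pivot_rows = conn.execute(
    """
    SELECT b.city,
           COUNT(CASE WHEN a.salary < 1000 THEN 1 END) AS less_than_1000,
           COUNT(CASE WHEN a.salary >= 1000 AND a.salary < 2500 THEN 1 END)
               AS between_1000_and_2500
    FROM employees a
    LEFT JOIN departments b ON a.id_dpto = b.id_dpto
    GROUP BY b.city
    ORDER BY b.city DESC
    """
).fetchall()

print(pivot_rows)  # [('NY', 2, 1), ('London', 0, 2)]
```

The explicit GROUP BY b.city plays the role that the leftover subquery columns play implicitly in the PIVOT version.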

Need to pick up the SUM OF Tax Amount for the highest Sequence number Per Year, Per SSN, Per employer

Consider Employee table:
Employerid ssn year Seqnumber q1taxamt q2taxamt q3taxamt q4taxamt
1004 101 2013 1 2000 0 0 0
1004 101 2013 2 2000 100 0 0
1004 101 2013 3 2000 100 200 0
1004 101 2013 4 2000 100 200 300
1004 102 2013 1 3000 0 0 0
1004 102 2013 2 3000 200 0 0
1004 102 2013 3 3000 200 300 0
1004 102 2013 4 3000 200 300 400
1004 102 2013 5 3000 200 300 400
The transformation rule is that we need to pick the row with the highest Seqnumber for each ssn, per year and per Employerid, and sum the amounts from those rows.
For example, for Employerid 1004, sum(q1taxamt) is 2000 + 3000 = 5000.
The logic is that ssn 101 has a highest Seqnumber of 4 and ssn 102 has a highest Seqnumber of 5, so we need to pick the values from those rows for that Employerid.
Example:
q1taxamt: 2000 + 3000 = 5000
q4taxamt: 300 + 400 = 700
The output must be:
Employerid YEAR q1taxamt q2taxamt q3taxamt q4taxamt
1004 2013 5000 300 500 700
The query below generates the wrong result:
Select
Sum(E1.q1taxamt) q1taxamt,
Sum(E1.q2taxamt) q2taxamt,
Sum(E1.q3taxamt) q3taxamt,
Sum(E1.q4taxamt) q4taxamt,
E1.Employerid,
E1.YEAR
from Employee E1
join
(
select
E.Employerid,
MAX(E.seqnumber) seqnumber,
E.YEAR
from Employee E
group by E.Employerid,E.SSn,E.year
)E2
on E1.Employerid=E2.Employerid
AND E1.YEAR=E2.YEAR
and E1.seqnumber=E2.seqnumber
Just use row_number():
select e.*
from (select e.*,
row_number() over (partition by E.Employerid, E.SSn, E.year
order by e.seqnumber desc
) as seqnum
from Employee e
) e
where seqnum = 1;
For best performance, you want an index that covers the partitioning columns followed by the ordering column: Employee(EmployerId, SSN, year, seqnumber desc).
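The row_number() query returns the latest row per employer, SSN and year; to get the totals the question asks for, those rows still have to be summed. A minimal sketch of that extra step, run from Python on an in-memory SQLite database loaded with the question's sample rows (SQLite is only a stand-in here, and it needs version 3.25 or later for window functions):

```python
import sqlite3

# The question's sample rows:
# employerid, ssn, year, seqnumber, q1taxamt, q2taxamt, q3taxamt, q4taxamt
ROWS = [
    (1004, 101, 2013, 1, 2000, 0, 0, 0),
    (1004, 101, 2013, 2, 2000, 100, 0, 0),
    (1004, 101, 2013, 3, 2000, 100, 200, 0),
    (1004, 101, 2013, 4, 2000, 100, 200, 300),
    (1004, 102, 2013, 1, 3000, 0, 0, 0),
    (1004, 102, 2013, 2, 3000, 200, 0, 0),
    (1004, 102, 2013, 3, 3000, 200, 300, 0),
    (1004, 102, 2013, 4, 3000, 200, 300, 400),
    (1004, 102, 2013, 5, 3000, 200, 300, 400),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employerid INTEGER, ssn INTEGER, year INTEGER, "
    "seqnumber INTEGER, q1taxamt INTEGER, q2taxamt INTEGER, "
    "q3taxamt INTEGER, q4taxamt INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Inner query: rank rows so the highest seqnumber per (employer, ssn, year) is 1.
# Outer query: keep only those rows and sum them per employer and year.
totals = conn.execute(
    """
    SELECT employerid, year,
           SUM(q1taxamt), SUM(q2taxamt), SUM(q3taxamt), SUM(q4taxamt)
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY employerid, ssn, year
                                    ORDER BY seqnumber DESC) AS seqnum
          FROM employee e) latest
    WHERE seqnum = 1
    GROUP BY employerid, year
    """
).fetchall()

print(totals)  # [(1004, 2013, 5000, 300, 500, 700)]
```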
You are missing the SSN join predicate between E1 and E2, which is why you are getting the wrong result. I think this might be faster than the row_number() method.
Select
Sum(E1.q1taxamt) q1taxamt,
Sum(E1.q2taxamt) q2taxamt,
Sum(E1.q3taxamt) q3taxamt,
Sum(E1.q4taxamt) q4taxamt,
E1.Employerid,
E1.YEAR
from Employee E1
join
(
select
E.Employerid,
E.SSn,
MAX(E.seqnumber) seqnumber,
E.YEAR
from Employee E
group by E.Employerid,E.SSn,E.year
)E2
on E1.Employerid=E2.Employerid
AND E1.YEAR=E2.YEAR
AND E1.SSN = E2.SSN --Here
and E1.seqnumber=E2.seqnumber
group by E1.Employerid, E1.YEAR
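To see what the missing SSN predicate does to the numbers, the sketch below runs the join twice from Python on an in-memory SQLite database holding the question's sample rows: once without the predicate and once with it. SQLite is only a stand-in for the asker's database, and the {ssn_predicate} placeholder is a device of this example. Without the predicate, ssn 102's row with seqnumber 4 also matches the maximum that belongs to ssn 101, so it is counted in addition to its own latest row.

```python
import sqlite3

# The question's sample rows:
# employerid, ssn, year, seqnumber, q1taxamt, q2taxamt, q3taxamt, q4taxamt
ROWS = [
    (1004, 101, 2013, 1, 2000, 0, 0, 0),
    (1004, 101, 2013, 2, 2000, 100, 0, 0),
    (1004, 101, 2013, 3, 2000, 100, 200, 0),
    (1004, 101, 2013, 4, 2000, 100, 200, 300),
    (1004, 102, 2013, 1, 3000, 0, 0, 0),
    (1004, 102, 2013, 2, 3000, 200, 0, 0),
    (1004, 102, 2013, 3, 3000, 200, 300, 0),
    (1004, 102, 2013, 4, 3000, 200, 300, 400),
    (1004, 102, 2013, 5, 3000, 200, 300, 400),
]

JOIN_QUERY = """
SELECT e1.employerid, e1.year,
       SUM(e1.q1taxamt), SUM(e1.q2taxamt), SUM(e1.q3taxamt), SUM(e1.q4taxamt)
FROM employee e1
JOIN (SELECT employerid, ssn, year, MAX(seqnumber) AS seqnumber
      FROM employee
      GROUP BY employerid, ssn, year) e2
  ON e1.employerid = e2.employerid
 AND e1.year = e2.year
 {ssn_predicate}
 AND e1.seqnumber = e2.seqnumber
GROUP BY e1.employerid, e1.year
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employerid INTEGER, ssn INTEGER, year INTEGER, "
    "seqnumber INTEGER, q1taxamt INTEGER, q2taxamt INTEGER, "
    "q3taxamt INTEGER, q4taxamt INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Without the SSN predicate: three rows match instead of two.
without_ssn = conn.execute(JOIN_QUERY.format(ssn_predicate="")).fetchall()

# With the SSN predicate: exactly one latest row per SSN.
with_ssn = conn.execute(
    JOIN_QUERY.format(ssn_predicate="AND e1.ssn = e2.ssn")
).fetchall()

print(without_ssn)  # [(1004, 2013, 8000, 500, 800, 1100)]
print(with_ssn)     # [(1004, 2013, 5000, 300, 500, 700)]
```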

Calculating the percentage of a GROUP_BY with a WHERE statement

Let's say I have a table of orders with revenue and status columns. I want to group the orders by revenue group (in increments of 10) and get the percentage of orders in each revenue group that have their status column set to 1. I thought a window function was the way to go, but the WHERE clause restricts the rows, so I end up with only the rows where status = 1.
The end result would look something like: 10 | 76%, 20 | 50% etc.
SELECT CASE
WHEN revenue between 1 and 10 then 10
WHEN revenue between 10 and 20 then 20
WHEN revenue between 20 and 30 then 30
WHEN revenue between 30 and 40 then 40
WHEN revenue between 40 and 50 then 50
else 60
END as revgroup,
COUNT(*) / CAST(SUM(count(*)) over (partition by CASE
WHEN revenue between 1 and 10 then 10
WHEN revenue between 10 and 20 then 20
WHEN revenue between 20 and 30 then 30
WHEN revenue between 30 and 40 then 40
WHEN revenue between 40 and 50 then 50
else 60 END) as float) as percentage
from "order"
where "order".status = 1
group by revgroup
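The PARTITION BY clause is unnecessary in your case: GROUP BY has already produced one row per revenue group, so partitioning by the same expression leaves a single row in each partition. Use an empty OVER () instead: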
SELECT CASE
WHEN revenue between 1 and 10 then 10
WHEN revenue between 10 and 20 then 20
WHEN revenue between 20 and 30 then 30
WHEN revenue between 30 and 40 then 40
WHEN revenue between 40 and 50 then 50
else 60
END as revgroup,
COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () as percentage
from "order"
where "order".status = 1
group by revgroup
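If the goal is the share of status = 1 orders inside each revenue group, as the question describes, one alternative is to drop the WHERE clause and use conditional aggregation, so that orders with other statuses stay in the denominator. This is a different technique from the window-function query above. The sketch runs it from Python on an in-memory SQLite database; the six orders are made-up sample data, and the CASE is shortened to two non-overlapping groups to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "order" (revenue INTEGER, status INTEGER)')
conn.executemany(
    'INSERT INTO "order" VALUES (?, ?)',
    # Hypothetical orders: (revenue, status)
    [(5, 1), (8, 0), (9, 1), (7, 1), (15, 1), (18, 0)],
)

# AVG over a 1.0/0.0 flag is the fraction of rows in the group with status = 1.
shares = conn.execute(
    """
    SELECT CASE
             WHEN revenue BETWEEN 1 AND 10 THEN 10
             WHEN revenue BETWEEN 11 AND 20 THEN 20
             ELSE 60
           END AS revgroup,
           AVG(CASE WHEN status = 1 THEN 1.0 ELSE 0.0 END) AS share_with_status_1
    FROM "order"
    GROUP BY revgroup
    ORDER BY revgroup
    """
).fetchall()

print(shares)  # [(10, 0.75), (20, 0.5)]
```

Multiplying the share by 100 gives the percentage format shown in the question.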

SQL Summation with more than one Condition

I have a table like this:
Link PeriodID Debit Credit Project
1 49 - 200 1
1 49 200 - 2
1 49 100 - 0
1 50 50 - 1
2 49 - 600 0
I want a script that sums the debit and credit per link per period, disregarding project.
The answer should look like this:
Link PeriodID TotalDebit TotalCredit
1 49 300 200
1 50 50 -
2 49 - 600
I have more than 60 PeriodIDs and more than 100 links.
Please help me write such a script.
Use GROUP BY with aggregate functions.
SELECT Link,
PeriodID,
SUM(Debit) AS TotalDebit,
SUM(Credit) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodID;
SUM skips NULL values, but it returns NULL for a group in which every value is NULL. If you want 0 in that case, wrap the columns in COALESCE:
SELECT Link,
PeriodID,
SUM(COALESCE(Debit,0)) AS TotalDebit,
SUM(COALESCE(Credit,0)) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodID;
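The difference between the two queries can be checked on the question's data. The sketch below runs both from Python on an in-memory SQLite database, assuming that the dashes in the question's table are NULLs; the table name ledger is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (Link INTEGER, PeriodID INTEGER, "
    "Debit INTEGER, Credit INTEGER, Project INTEGER)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?)",
    [
        # None is stored as NULL (the dashes in the question's table).
        (1, 49, None, 200, 1),
        (1, 49, 200, None, 2),
        (1, 49, 100, None, 0),
        (1, 50, 50, None, 1),
        (2, 49, None, 600, 0),
    ],
)

# Plain SUM: NULLs are skipped, but an all-NULL group sums to NULL.
plain = conn.execute(
    "SELECT Link, PeriodID, SUM(Debit), SUM(Credit) "
    "FROM ledger GROUP BY Link, PeriodID ORDER BY Link, PeriodID"
).fetchall()

# SUM over COALESCE: all-NULL groups come back as 0 instead.
zero_filled = conn.execute(
    "SELECT Link, PeriodID, SUM(COALESCE(Debit, 0)), SUM(COALESCE(Credit, 0)) "
    "FROM ledger GROUP BY Link, PeriodID ORDER BY Link, PeriodID"
).fetchall()

print(plain)        # [(1, 49, 300, 200), (1, 50, 50, None), (2, 49, None, 600)]
print(zero_filled)  # [(1, 49, 300, 200), (1, 50, 50, 0), (2, 49, 0, 600)]
```

The plain version reproduces the dashes in the desired output, so which one to use depends on whether an empty total should read as blank or as 0.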

Oracle SQL Create PDF from Data

I am trying to create a probability density function (PDF) from data in an Oracle table using a SQL query. Consider the table below:
Name | Spend
--------------
Anne | 110
Phil | 40
Sue | 99
Jeff | 190
Stan | 80
Joe | 90
Ben | 100
Lee | 85
To create a PDF from that data, I need to count the number of customers whose spend falls within each bucket (for example between 0 and 50, or between 50 and 100). An example graph would look something like this (forgive my poor ASCII art):
5|
4|       *
3|       *
2|       *    *
1|  *    *    *    *
 |____________________
   50   100  150  200
So the axes are:
X-axis: the buckets
Y-axis: the number of customers
I am currently using the Oracle SQL CASE expression to determine whether the spend falls within a bucket and then summing the number of customers that do. However, this is taking forever, as there are a couple of million records.
Any idea how to do this efficiently?
Thanks!
You can try the WIDTH_BUCKET function.
select bucket , count(name)
from (select name, spend,
WIDTH_BUCKET(spend, 0, 200, 4) bucket
from mytable
)
group by bucket
order by bucket;
Here I have divided the range 0 to 200 into 4 buckets. The function assigns a bucket number to each value. You can group by this bucket number and count how many records fall in each bucket.
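For readers who want to see which bucket number each value gets, the sketch below re-implements the equal-width bucketing in plain Python and applies it to the question's data. It is an illustration of the arithmetic, not Oracle's implementation: the helper name width_bucket is reused for convenience, and it assumes low < high and follows the convention that values below the range get bucket 0 and values at or above the upper bound get bucket count + 1.

```python
from collections import Counter


def width_bucket(value, low, high, count):
    """Return the 1-based equal-width bucket of value in [low, high).

    Values below low map to 0 and values at or above high map to count + 1.
    Assumes low < high.
    """
    if value < low:
        return 0
    if value >= high:
        return count + 1
    # Floor division picks the zero-based bucket; add 1 to make it 1-based.
    return int((value - low) * count // (high - low)) + 1


# The question's sample data: name -> spend
spend = {
    "Anne": 110, "Phil": 40, "Sue": 99, "Jeff": 190,
    "Stan": 80, "Joe": 90, "Ben": 100, "Lee": 85,
}

# Count customers per bucket, as the GROUP BY bucket query does.
histogram = sorted(Counter(width_bucket(s, 0, 200, 4) for s in spend.values()).items())

print(histogram)  # [(1, 1), (2, 4), (3, 2), (4, 1)]
```

Bucket 2 covers spends from 50 up to (but not including) 100, which is why Ben's spend of exactly 100 lands in bucket 3.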
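You can even display the actual bucket range. In the query below, min_value, max_value and buckets stand for the lower bound, the upper bound and the number of buckets (for example 0, 200 and 4, as above):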
select bucket,
cast(min_value + ((bucket-1) * (max_value-min_value)/buckets) as varchar2(10))
||'-'
||cast(min_value + ((bucket) * (max_value-min_value)/buckets) as varchar2(10)),
count(name) c
from (select name,
spend,
WIDTH_BUCKET(spend, min_value, max_value, buckets) bucket
from mytable)
group by bucket
order by bucket;
SELECT COUNT(*) y_axis,
x_axis
FROM
-- map each spend to the upper bound of its bucket
(SELECT
CASE
WHEN spend <= 50 THEN 50
WHEN spend < 100 AND spend > 50 THEN 100
WHEN spend < 150 AND spend >= 100 THEN 150
WHEN spend < 200 AND spend >= 150 THEN 200
END x_axis
FROM your_table
)
GROUP BY x_axis;
y_axis x_axis
-----------------
4 100
1 50
1 200
2 150