I'm analyzing some code that uses empty OVER clauses in the context of COUNT().
Example:
SELECT
ROW_NUMBER() OVER (ORDER BY Priority DESC) AS RowID,
CAST((COUNT(*) OVER() / #pagesize) AS Int) AS TotalPages,
I'm trying to understand why the empty OVER clause is being used here.
There are other standard select elements below the two lines listed above, and when I remove the empty OVER clause from the second line (TotalPages), I get errors like this:
Column 'TableA.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
As soon as I put the OVER() back, the error is gone.
My understanding of the OVER clause is very limited... I feel like I understand what's going on in the RowID line, but the TotalPages line just baffles me.
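OVER() turns an aggregate such as COUNT into a window (analytic) function, and what you put inside it defines the partitions of your result set. An empty OVER() means a single partition covering the whole result set.
That is, COUNT(*) OVER() returns, on every row, the total number of rows in the result set. Without OVER(), COUNT(*) is an ordinary aggregate that collapses the rows into groups, so every non-aggregated column in the select list must appear in a GROUP BY clause, which is the error you are seeing. With OVER() the rows are not collapsed, so no GROUP BY is needed.
See the documentation: http://msdn.microsoft.com/en-us/library/ms189461.aspx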
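To see what the two lines in the question return, here is a minimal sketch using Python's built-in sqlite3 module (assuming its bundled SQLite is 3.25 or newer, which added window functions). The table name TableA and the Priority column come from the error message; the Id column, the 12 rows of data and the page size of 5 are made-up stand-ins for the real table and #pagesize. It also shows that, assuming #pagesize is an integer, the division truncates, so a round-up variant is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the real table: 12 rows, only Priority matters.
conn.execute("CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Priority INTEGER)")
conn.executemany("INSERT INTO TableA (Id, Priority) VALUES (?, ?)",
                 [(i, i % 4) for i in range(1, 13)])

PAGE_SIZE = 5  # hypothetical value standing in for #pagesize

rows = conn.execute(
    """
    SELECT
        ROW_NUMBER() OVER (ORDER BY Priority DESC)    AS RowID,
        COUNT(*) OVER ()                              AS TotalRows,  -- same on every row
        CAST(COUNT(*) OVER () / ? AS INTEGER)         AS TotalPages, -- 12 / 5 truncates to 2
        (COUNT(*) OVER () + ? - 1) / ?                AS TotalPagesRoundedUp
    FROM TableA
    """,
    (PAGE_SIZE, PAGE_SIZE, PAGE_SIZE),
).fetchall()

for row in rows:
    print(row)  # every row carries TotalRows 12, TotalPages 2, rounded-up 3
```

Every row keeps its own RowID, while the window COUNT repeats the total on each row, which is exactly what a paging query needs.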
Say our table is employees:
+-----------+-------+---------+
| badge_num | name | surname |
+-----------+-------+---------+
| 1 | John | Smith |
| 2 | Mark | Pence |
| 3 | Steve | Smith |
| 4 | Bob | Smith |
+-----------+-------+---------+
Running
SELECT surname, COUNT(*)
FROM employees
GROUP BY surname;
we'll get:
+---------+----------+
| surname | COUNT(*) |
+---------+----------+
| Smith | 3 |
| Pence | 1 |
+---------+----------+
While running
SELECT surname, COUNT(*) OVER()
FROM employees
GROUP BY surname;
we'll get:
+---------+-----------------+
| surname | COUNT(*) OVER() |
+---------+-----------------+
| Smith | 2 |
| Pence | 2 |
+---------+-----------------+
In the second case, the window function is evaluated after GROUP BY, so on each row we are counting the rows of the whole grouped result (two surnames), not the rows within each group.
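The two queries above can be reproduced with Python's sqlite3 module (a sketch assuming SQLite 3.25+ for window-function support), using the same employees rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (badge_num INTEGER, name TEXT, surname TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "John", "Smith"), (2, "Mark", "Pence"),
    (3, "Steve", "Smith"), (4, "Bob", "Smith"),
])

# Plain aggregate: one count per surname group.
per_group = dict(conn.execute(
    "SELECT surname, COUNT(*) FROM employees GROUP BY surname"))

# Window aggregate: runs after GROUP BY, so it counts the grouped rows.
over_all = dict(conn.execute(
    "SELECT surname, COUNT(*) OVER () FROM employees GROUP BY surname"))

print(per_group)  # Smith -> 3, Pence -> 1
print(over_all)   # Smith -> 2, Pence -> 2
```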
To summarize, the OVER clause can be used with ranking functions (RANK, ROW_NUMBER, DENSE_RANK...), aggregate functions (AVG, MAX, MIN, SUM, etc.) and analytic functions (FIRST_VALUE, LAST_VALUE, and a few others).
Let's look at the basic syntax of the OVER clause:
OVER (
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROWS or RANGE clause> ]
)
PARTITION BY:
It divides the rows into partitions that share the same values, and the function is computed separately within each partition. When we don't specify PARTITION BY, the entire result set is treated as a single partition.
ORDER BY:
It defines the logical order of the rows within each partition.
ROWS or RANGE:
It limits which rows of the partition (the window frame) are considered when performing the calculation for the current row.
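As a small illustration of PARTITION BY versus no partition, here is a sketch with Python's sqlite3 module (assuming SQLite 3.25+), using the twelve-row salary dataset from the example that follows. SUM(Salary) OVER (PARTITION BY Gender) gives one total per gender, while OVER () gives the grand total:

```python
import sqlite3

# The twelve rows of the salary dataset used in the examples.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

by_id = {
    emp_id: (gender_total, grand_total)
    for emp_id, gender_total, grand_total in conn.execute("""
        SELECT Id,
               SUM(Salary) OVER (PARTITION BY Gender) AS gender_total,  -- per partition
               SUM(Salary) OVER ()                    AS grand_total    -- one partition
        FROM employees
    """)
}

print(by_id[4])  # Pam (Female): (25000, 62000)
print(by_id[1])  # Mark (Male): (37000, 62000)
```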
Let's take an example:
Here is my dataset:
Id Name Gender Salary
----------- -------------------------------------------------- ---------- -----------
1 Mark Male 5000
2 John Male 4500
3 Pavan Male 5000
4 Pam Female 5500
5 Sara Female 4000
6 Aradhya Female 3500
7 Tom Male 5500
8 Mary Female 5000
9 Ben Male 6500
10 Jodi Female 7000
11 Tom Male 5500
12 Ron Male 5000
Let me run a few scenarios to see how the results change, going from the most explicit syntax to the simplest one.
Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
Look at the sum_sal column. Here I am ordering by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
We are not using PARTITION BY, so the entire data set is treated as one partition, ordered by salary.
The important part is UNBOUNDED PRECEDING AND CURRENT ROW: for each row, the sum is calculated from the first row of the partition up to the current row.
But look at the rows with salary 5000. For Pavan one might expect 17000, and for Mark 22000. Because we are using RANGE, however, rows with the same ORDER BY value (peers) are treated as one logical group: the frame for each of them extends to the last row with salary 5000 (Ron), so the sum is calculated up to there and the same value, 32000, is assigned to every row with salary 5000.
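The peer behaviour of RANGE can be checked with Python's sqlite3 module (a sketch assuming SQLite 3.25+). Because peers share one frame, the RANGE result is the same whatever order the tied rows come back in:

```python
import sqlite3

# The twelve rows of the example dataset.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# RANGE frame: CURRENT ROW includes every peer (same Salary), so ties share one sum.
range_sum = dict(conn.execute("""
    SELECT Id,
           SUM(Salary) OVER (ORDER BY Salary
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM employees
"""))

for emp_id in (3, 1, 8, 12):  # the four rows with Salary 5000
    print(emp_id, range_sum[emp_id])  # each one is 32000
```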
Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 17000
1 Mark Male 5000 22000
8 Mary Female 5000 27000
12 Ron Male 5000 32000
11 Tom Male 5500 37500
7 Tom Male 5500 43000
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
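With ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, rows with the same salary are not grouped together: the frame ends at the current physical row, so each row gets its own running total from the first row up to itself, unlike RANGE. Note that the order of tied rows (the four 5000 salaries, for example) is not guaranteed unless the ORDER BY includes a tiebreaker such as Id.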
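Here is a sketch of the ROWS version with Python's sqlite3 module (assuming SQLite 3.25+). Id is added to the ORDER BY as a tiebreaker, which is my addition rather than part of the query above; it makes the running totals repeatable, so the values for the tied 5000 rows follow Id order and may differ from the output shown above:

```python
import sqlite3

# The twelve rows of the example dataset.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# ROWS frame: each row sums from the first row up to itself, ties included one by one.
rows_sum = dict(conn.execute("""
    SELECT Id,
           SUM(Salary) OVER (ORDER BY Salary, Id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM employees
"""))

for emp_id in (1, 3, 8, 12):  # the 5000 salaries, in Id order
    print(emp_id, rows_sum[emp_id])  # 17000, 22000, 27000, 32000
```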
Select *,SUM(salary) Over(order by salary) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
These results are the same as
Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
That is because Over(order by salary) is just a shortcut for Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
So whenever we specify ORDER BY without ROWS or RANGE, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default.
Note: this applies only to functions that actually accept ROWS/RANGE. For example, ROW_NUMBER and a few others don't accept ROWS/RANGE, and in that case this doesn't come into the picture.
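The default-frame rule can be checked with Python's sqlite3 module (a sketch assuming SQLite 3.25+; SQLite documents the same default frame as SQL Server here): the shortcut and the explicit RANGE form produce identical sums for every row.

```python
import sqlite3

# The twelve rows of the example dataset.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# ORDER BY alone: the frame defaults to RANGE ... CURRENT ROW.
default_frame = dict(conn.execute(
    "SELECT Id, SUM(Salary) OVER (ORDER BY Salary) FROM employees"))

# The same frame spelled out explicitly.
explicit_range = dict(conn.execute("""
    SELECT Id, SUM(Salary) OVER (ORDER BY Salary
                                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM employees
"""))

print(default_frame == explicit_range)  # True
```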
So far we have seen that an OVER clause with ORDER BY uses a ROWS/RANGE frame, with syntax like RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
which calculates from the first row up to the current row. But what if we want the value computed over the entire partition (from the first row to the last row) on every row? Here is the query for that:
Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
Instead of CURRENT ROW, I am specifying UNBOUNDED FOLLOWING, which tells the engine to extend the frame to the last row of the partition for every row.
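The effect of moving the frame end from CURRENT ROW to UNBOUNDED FOLLOWING can be seen in this sketch with Python's sqlite3 module (assuming SQLite 3.25+). Only the cheapest and the most expensive rows are inspected for the running sum, because their positions do not depend on how ties are ordered:

```python
import sqlite3

# The twelve rows of the example dataset.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# Frame ends at the current row: a running total.
running = dict(conn.execute("""
    SELECT Id, SUM(Salary) OVER (ORDER BY Salary
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM employees"""))

# Frame ends at the last row of the partition: the full total on every row.
whole = dict(conn.execute("""
    SELECT Id, SUM(Salary) OVER (ORDER BY Salary
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM employees"""))

print(running[6], running[10])  # 3500 62000 (first and last rows by salary)
print(set(whole.values()))      # {62000}
```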
Now, to your question: what does OVER() with empty parentheses mean?
For an aggregate like SUM, it gives the same result as Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), except that no ordering is implied at all.
With nothing inside the parentheses, the whole result set is treated as a single partition, and the calculation runs from the first record to the last record of that partition for every row.
Select *,Sum(salary) Over() as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
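A quick check of that equivalence with Python's sqlite3 module (a sketch assuming SQLite 3.25+). The last query is the pattern from the original question: COUNT(*) OVER () puts the total row count on every row.

```python
import sqlite3

# The twelve rows of the example dataset.
EMPLOYEES = [
    (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
    (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
    (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
    (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
    (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
    (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Id INTEGER, Name TEXT, Gender TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# Empty OVER(): one partition, the frame is the whole partition.
empty_over = dict(conn.execute("SELECT Id, SUM(Salary) OVER () FROM employees"))

# Explicit full frame for comparison.
full_frame = dict(conn.execute("""
    SELECT Id, SUM(Salary) OVER (ORDER BY Salary
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM employees"""))

# The question's pattern: the total row count repeated on each row.
row_count = dict(conn.execute("SELECT Id, COUNT(*) OVER () FROM employees"))

print(empty_over == full_frame)  # True
print(set(row_count.values()))   # {12}
```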
I created a video on this topic, if you are interested:
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s
Thanks,
Pavan Kumar Aryasomayajulu
HTTP://xyzcoder.github.io
I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in each race. I want to retain the discrete event scoring, but also rank each racer in the series. For example, consider a sub-query that returns this:
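+-----------+------------+--------------+-------------+---------+
| License # | Rider Name | Total Points | Race Points | Race ID |
+-----------+------------+--------------+-------------+---------+
| 123       | Joe        | 25           | 5           | 567     |
| 123       | Joe        | 25           | 12          | 234     |
| 123       | Joe        | 25           | 8           | 987     |
| 456       | Ahmed      | 20           | 12          | 567     |
| 456       | Ahmed      | 20           | 8           | 234     |
+-----------+------------+--------------+-------------+---------+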
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
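+-------+-----------+------------+--------------+-------------+---------+
| Place | License # | Rider Name | Total Points | Race Points | Race ID |
+-------+-----------+------------+--------------+-------------+---------+
| 1     | 123       | Joe        | 25           | 5           | 567     |
| 1     | 123       | Joe        | 25           | 12          | 234     |
| 1     | 123       | Joe        | 25           | 8           | 987     |
| 2     | 456       | Ahmed      | 20           | 12          | 567     |
| 2     | 456       | Ahmed      | 20           | 8           | 234     |
+-------+-----------+------------+--------------+-------------+---------+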
But if I use rank() and order by "Total Points", I get:
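+-------+-----------+------------+--------------+-------------+---------+
| Place | License # | Rider Name | Total Points | Race Points | Race ID |
+-------+-----------+------------+--------------+-------------+---------+
| 1     | 123       | Joe        | 25           | 5           | 567     |
| 1     | 123       | Joe        | 25           | 12          | 234     |
| 1     | 123       | Joe        | 25           | 8           | 987     |
| 4     | 456       | Ahmed      | 20           | 12          | 567     |
| 4     | 456       | Ahmed      | 20           | 8           | 234     |
+-------+-----------+------------+--------------+-------------+---------+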
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties between different racers, I want there to be gaps in the rank (e.g. if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this I think would be to issue two queries, one with the "duplicate" racers eliminated, and then a second one where I can retain the individual race data, which I need for the points break down display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racers
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racers r on (r.license, r.rider) = (p.license, p.rider);
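Here is a runnable sketch of the same two-step CTE with Python's sqlite3 module (assuming SQLite 3.25+). The table name racers and its column names are assumptions based on the answer, the rows are the sample data from the question, and the join condition is written with AND instead of the row-value comparison, which behaves the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the per-race rows from the question.
conn.execute("""CREATE TABLE racers (
    license INTEGER, rider TEXT, total_points INTEGER,
    race_points INTEGER, race_id INTEGER)""")
conn.executemany("INSERT INTO racers VALUES (?, ?, ?, ?, ?)", [
    (123, "Joe", 25, 5, 567), (123, "Joe", 25, 12, 234), (123, "Joe", 25, 8, 987),
    (456, "Ahmed", 20, 12, 567), (456, "Ahmed", 20, 8, 234),
])

rows = conn.execute("""
    WITH riders AS (   -- one row per rider, so race rows cannot create ties
        SELECT DISTINCT license, rider, total_points FROM racers
    ), places AS (     -- RANK() keeps gaps for real ties between riders
        SELECT license, rider, RANK() OVER (ORDER BY total_points DESC) AS place
        FROM riders
    )
    SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
    FROM places p
    JOIN racers r ON r.license = p.license AND r.rider = p.rider
    ORDER BY p.place, r.race_id
""").fetchall()

for row in rows:
    print(row)  # Joe's three races with place 1, then Ahmed's two with place 2
```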
Let's say I have three tables:
Employees:
PID NAME WAGE
---------- -------------------- ----------
10234 Able 8
11567 Baker 9
3289 George 10
88331 Alice 11
Employee_made:
PID SID QUANTITY HOURS
---------- ---------- ---------- ----------
10234 11 24 3
10234 12 6 1
10234 13 24 1
10234 21 6 1
10234 23 4 1
10234 31 48 6
11567 23 4 1
11567 31 1 1
88331 11 6 1
Sandwich:
SID PRICE NAME
---------- ---------- ------------------------------
12 2 hamburger on wheat
13 2 cheese burger
21 1.75 fish burger on rye
23 1.75 fish burger on wheat
31 3 veggie burger on wheat
11 2 hamburger on rye
I need to list all the employees who have made ALL the different sandwiches, and display their names and PID. What I've gotten so far is:
Select E.name, E.pid
From employees E, employee_made EM, sandwich S
Where E.pid = EM.pid
Which tells me the matching PIDs from the employees and employee_made tables. What I'm not sure about is how to display the employees who have made ALL the sandwiches, i.e. matching not just any SID in the employee_made table, but ALL of them.
First, never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
You can approach this by counting the number of distinct sandwiches made by each employee and then comparing that to the total count of sandwiches:
select em.pid
from employee_made em
group by em.pid
having count(distinct em.sid) = (select count(*) from sandwich);
This gives the pid of the employee. I'll let you figure out how to bring in the employee name (hint: in, exists, and join could all be used).
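For reference, here is one possible way to bring in the name (a join against the grouped result), sketched with Python's sqlite3 module and the three sample tables from the question. It is only one of the options mentioned in the hint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (pid INTEGER, name TEXT, wage INTEGER)")
conn.execute("CREATE TABLE employee_made (pid INTEGER, sid INTEGER, quantity INTEGER, hours INTEGER)")
conn.execute("CREATE TABLE sandwich (sid INTEGER, price REAL, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (10234, "Able", 8), (11567, "Baker", 9), (3289, "George", 10), (88331, "Alice", 11)])
conn.executemany("INSERT INTO employee_made VALUES (?, ?, ?, ?)", [
    (10234, 11, 24, 3), (10234, 12, 6, 1), (10234, 13, 24, 1), (10234, 21, 6, 1),
    (10234, 23, 4, 1), (10234, 31, 48, 6), (11567, 23, 4, 1), (11567, 31, 1, 1),
    (88331, 11, 6, 1)])
conn.executemany("INSERT INTO sandwich VALUES (?, ?, ?)", [
    (12, 2, "hamburger on wheat"), (13, 2, "cheese burger"),
    (21, 1.75, "fish burger on rye"), (23, 1.75, "fish burger on wheat"),
    (31, 3, "veggie burger on wheat"), (11, 2, "hamburger on rye")])

rows = conn.execute("""
    SELECT e.pid, e.name
    FROM employees e
    JOIN (
        -- employees whose distinct sandwich count equals the number of sandwiches
        SELECT em.pid
        FROM employee_made em
        GROUP BY em.pid
        HAVING COUNT(DISTINCT em.sid) = (SELECT COUNT(*) FROM sandwich)
    ) AS all_makers ON all_makers.pid = e.pid
""").fetchall()

print(rows)  # [(10234, 'Able')]
```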
The query I'm trying to answer is 'How many sales above or equal to 60 has each person made?'
My table (sales$):
SaleID name salevalue
1 Steve 100
2 John 50
3 Ellen 25
4 Steve 100
5 Mary 60
6 Mary 80
7 John 70
8 Mary 55
9 Steve 65
10 Ellen 120
11 Ellen 30
12 Ellen 40
13 John 40
14 Mary 60
15 Steve 50
My code is:
select name,
COUNT(*) as 'sales above 60'
from Sales$
group by salevalue, name
having salevalue >= 60;
Which gives:
Ellen 1
John 1
Mary 2
Mary 1
Steve 1
Steve 2
The information is correct in that Mary and Steve both have 3 sales, but to filter on salevalue in the HAVING clause I had to add it to the GROUP BY, which splits them into separate rows.
Any ideas? I'm sure I've just taken a wrong turning.
You can use conditional aggregation for this:
select name,
COUNT(case when salevalue >= 60 then 1 end) as 'sales above 60'
from Sales$
group by name
This way COUNT will take into consideration only records having salevalue >= 60.
I've swapped the HAVING statement for a WHERE and achieved the desired result:
select name, count(*) 'sales above 60'
from sales$
where salevalue >=60
group by name
(Lightbulb moment after posting)
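Both fixes can be compared with Python's sqlite3 module, using the sample sales data (the table is called sales here, since the $ in sales$ is a SQL Server naming detail). The threshold is a parameter so the two queries can also be run at a level some people never reach. At 60 they agree, but at 100 the WHERE version drops the people with no qualifying sales, while conditional aggregation reports them with 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (SaleID INTEGER, name TEXT, salevalue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "Steve", 100), (2, "John", 50), (3, "Ellen", 25), (4, "Steve", 100),
    (5, "Mary", 60), (6, "Mary", 80), (7, "John", 70), (8, "Mary", 55),
    (9, "Steve", 65), (10, "Ellen", 120), (11, "Ellen", 30), (12, "Ellen", 40),
    (13, "John", 40), (14, "Mary", 60), (15, "Steve", 50),
])

def conditional(threshold):
    # COUNT ignores NULLs, and the CASE yields NULL for rows below the threshold.
    return dict(conn.execute(
        "SELECT name, COUNT(CASE WHEN salevalue >= ? THEN 1 END) "
        "FROM sales GROUP BY name", (threshold,)))

def filtered(threshold):
    # WHERE removes rows before grouping, so names with no match disappear.
    return dict(conn.execute(
        "SELECT name, COUNT(*) FROM sales WHERE salevalue >= ? GROUP BY name",
        (threshold,)))

print(conditional(60) == filtered(60))  # True
print(conditional(100))  # Ellen 1, John 0, Mary 0, Steve 2
print(filtered(100))     # Ellen 1, Steve 2
```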
I am trying to create a view that records the selected attributes for all Computer Science majors.
This is my query to create a view:
DROP VIEW CS_grade_report;
CREATE VIEW CS_grade_report AS
SELECT Student.student_id AS "ID",
student_name AS "Name",
course_number AS "Course #",
credit AS "Credit",
grade AS Grade
FROM Student, Class, Enrolls
WHERE major = 'CSCI'
AND Student.student_id = Enrolls.student_id
AND Class.schedule_num = Enrolls.schedule_num;
SELECT *
FROM CS_grade_report;
And this is what is generated:
ID Name Course # Credit GR
------ ------------------------- -------- ---------- --
600000 John Smith CSCI3200 4 B+
600000 John Smith CSCI3700 3 C
600000 John Smith SPAN1004 3 A-
600000 John Smith CSCI4300 3 A+
600001 Andrew Tram MUSC2406 2 A+
600001 Andrew Tram SPAN1004 3 A
600001 Andrew Tram CSCI3700 3 B-
600002 Jane Doe CSCI4200 3 D+
600003 Michael Jordan CSCI4300 3 A+
600004 Tiger Woods MUSC1000 1 A
600007 Dominique Davis CSCI4300 3 F
ID Name Course # Credit GR
------ ------------------------- -------- ---------- --
600009 Will Smith CSCI3200 4 A
600010 Papa Johns CSCI3200 4 B
600011 John Doe CSCI3200 4 C
600012 Jackie Chan CSCI3200 4 D
600013 Some Guy CSCI3200 4 E
16 rows selected.
I am assuming this is output from sqlplus. Its PAGESIZE setting controls how many lines are printed per page, and the column headings are repeated at the start of each page. If you only want to see the headings once, set it to a large enough value before running your SELECT statement:
set pagesize 500
(or whatever size you want)
There are many command options for sqlplus. This link is a good cheat-sheet.
As documented here, the maximum number of terms in a compound SELECT statement is SQLITE_MAX_COMPOUND_SELECT.
How can we get this value when we query an SQLite database with an SQL command?
e.g.
select SQLITE_MAX_COMPOUND_SELECT from SomeTableSomeWhere
or
select someFunctionForThisValue()
Well, I can't answer your question directly, but SQLITE_MAX_COMPOUND_SELECT defaults to 500. You can work around it by using SELECT with LIMIT and OFFSET to fetch all the data of a large table in chunks (hope this is what you want).
for example:
ID NAME AGE ADDRESS SALARY
---------- ---------- ---------- ---------- ----------
1 Paul 32 California 20000.0
2 Allen 25 Texas 15000.0
3 Teddy 23 Norway 20000.0
4 Mark 25 Rich-Mond 65000.0
5 David 27 Texas 85000.0
6 Kim 22 South-Hall 45000.0
First, count the rows:
sqlite> SELECT count(*) FROM COMPANY;
Then select a limited number of rows with an offset, moving the offset forward until the whole table has been fetched:
sqlite> SELECT * FROM COMPANY LIMIT 3 OFFSET 2;
ID NAME AGE ADDRESS SALARY
---------- ---------- ---------- ---------- ----------
3 Teddy 23 Norway 20000.0
4 Mark 25 Rich-Mond 65000.0
5 David 27 Texas 85000.0
Reference: http://www.tutorialspoint.com/sqlite/sqlite_limit_clause.htm
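The count-then-page approach above can be sketched with Python's sqlite3 module and the COMPANY sample rows. The helper fetch_in_chunks and the chunk size are hypothetical. ORDER BY ID is added because, without an ORDER BY, SQLite does not guarantee that successive LIMIT/OFFSET pages come back in a consistent order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE COMPANY (
    ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)""")
conn.executemany("INSERT INTO COMPANY VALUES (?, ?, ?, ?, ?)", [
    (1, "Paul", 32, "California", 20000.0), (2, "Allen", 25, "Texas", 15000.0),
    (3, "Teddy", 23, "Norway", 20000.0), (4, "Mark", 25, "Rich-Mond", 65000.0),
    (5, "David", 27, "Texas", 85000.0), (6, "Kim", 22, "South-Hall", 45000.0),
])

def fetch_in_chunks(conn, chunk_size):
    """Hypothetical helper: read the whole table one LIMIT/OFFSET page at a time."""
    total = conn.execute("SELECT count(*) FROM COMPANY").fetchone()[0]
    rows = []
    for offset in range(0, total, chunk_size):
        rows.extend(conn.execute(
            "SELECT * FROM COMPANY ORDER BY ID LIMIT ? OFFSET ?",
            (chunk_size, offset)).fetchall())
    return rows

# The single page from the example: skip 2 rows, take 3.
page = conn.execute("SELECT ID FROM COMPANY ORDER BY ID LIMIT 3 OFFSET 2").fetchall()
print(page)  # [(3,), (4,), (5,)]

all_rows = fetch_in_chunks(conn, 4)  # two pages: 4 rows, then 2 rows
print([row[0] for row in all_rows])  # [1, 2, 3, 4, 5, 6]
```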