SQL UNION query with order by giving syntax error on "(" - sql

I'm trying to select the 2 oldest females and the 2 oldest males using 1 query. The union keeps giving me a syntax error near "(". Both queries work independently, but after the union I get the error.
-- create a table
CREATE TABLE students (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
gender TEXT NOT NULL,
age INTEGER NOT NULL
);
-- insert some values
INSERT INTO students VALUES (1, 'Ryan', 'M', 23);
INSERT INTO students VALUES (2, 'Joanna', 'F', 22);
INSERT INTO students VALUES (3, 'Alex', 'F', 25);
INSERT INTO students VALUES (4, 'Ted', 'M', 21);
INSERT INTO students VALUES (5, 'June', 'F', 26);
INSERT INTO students VALUES (6, 'Rose', 'F', 24);
INSERT INTO students VALUES (7, 'Jack', 'M', 25);
-- select * from students;
SELECT * FROM
(SELECT name FROM students WHERE GENDER = 'F' ORDER BY age DESC LIMIT 2)
UNION
(SELECT name FROM students WHERE GENDER = 'M' ORDER BY age DESC LIMIT 2);

Your online compiler is running SQLite, not MySQL!
Execute select sqlite_version(); and you will see the output is '3.31.1'.
Use this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY gender ORDER BY age DESC) rn
FROM students
)
SELECT name
FROM cte
WHERE rn <= 2;
This code is correct for SQLite.
PS. Add ORDER BY if needed.
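A minimal sketch of the ROW_NUMBER() approach, run through Python's built-in sqlite3 module so it can be tried without an online compiler. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions); the ORDER BY at the end is an addition to make the output deterministic.

```python
import sqlite3

# In-memory database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "gender TEXT NOT NULL, age INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Ryan", "M", 23), (2, "Joanna", "F", 22), (3, "Alex", "F", 25),
     (4, "Ted", "M", 21), (5, "June", "F", 26), (6, "Rose", "F", 24),
     (7, "Jack", "M", 25)],
)

# Number the rows inside each gender from oldest to youngest,
# then keep only the first two of each partition.
top_two = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY gender ORDER BY age DESC) AS rn
        FROM students
    )
    SELECT name, gender, age
    FROM cte
    WHERE rn <= 2
    ORDER BY gender, age DESC
    """
).fetchall()
print(top_two)
```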

In SQLite, a query that has its own ORDER BY clause cannot be combined directly with UNION, so each of the two queries must be wrapped as a subquery of an outer SELECT. A final ORDER BY can then be applied to the result set of the union; ordering by gender puts all Fs at the top because 'F' sorts alphabetically before 'M':
SELECT * FROM (SELECT * FROM students WHERE gender = 'F' ORDER BY age DESC LIMIT 2)
UNION
SELECT * FROM (SELECT * FROM students WHERE gender = 'M' ORDER BY age DESC LIMIT 2)
ORDER BY gender, age;
See the demo.
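To check both behaviours locally, the sketch below runs the question's original query and the wrapped version through Python's sqlite3 module. The variable names are only for illustration, and the exact error text may differ between SQLite versions, so only the fact that an error occurs is recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Ryan", "M", 23), (2, "Joanna", "F", 22), (3, "Alex", "F", 25),
     (4, "Ted", "M", 21), (5, "June", "F", 26), (6, "Rose", "F", 24),
     (7, "Jack", "M", 25)],
)

# The question's query: a bare parenthesised SELECT after UNION.
broken = (
    "SELECT * FROM "
    "(SELECT name FROM students WHERE gender = 'F' ORDER BY age DESC LIMIT 2) "
    "UNION "
    "(SELECT name FROM students WHERE gender = 'M' ORDER BY age DESC LIMIT 2)"
)
try:
    conn.execute(broken)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)  # SQLite rejects the "(" that follows UNION

# The answer's form: each ordered/limited query is a subquery of its own SELECT.
fixed = (
    "SELECT * FROM (SELECT * FROM students WHERE gender = 'F' ORDER BY age DESC LIMIT 2) "
    "UNION "
    "SELECT * FROM (SELECT * FROM students WHERE gender = 'M' ORDER BY age DESC LIMIT 2) "
    "ORDER BY gender, age"
)
names = [row[1] for row in conn.execute(fixed)]
print(broken_error)
print(names)
```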

Related

Union two queries ordered by newid

I have a table that stores employees (id, name, and gender). I need to randomly get two men and two women.
CREATE TABLE employees
(
id INT,
name VARCHAR (10),
gender VARCHAR (1),
);
INSERT INTO employees VALUES (1, 'Mary', 'F');
INSERT INTO employees VALUES (2, 'Jake', 'M');
INSERT INTO employees VALUES (3, 'Ryan', 'M');
INSERT INTO employees VALUES (4, 'Lola', 'F');
INSERT INTO employees VALUES (5, 'Dina', 'F');
INSERT INTO employees VALUES (6, 'Paul', 'M');
INSERT INTO employees VALUES (7, 'Tina', 'F');
INSERT INTO employees VALUES (8, 'John', 'M');
My attempt is the following:
SELECT TOP 2 *
FROM employees
WHERE gender = 'F'
ORDER BY NEWID()
UNION
SELECT TOP 2 *
FROM employees
WHERE gender = 'M'
ORDER BY NEWID()
But it doesn't work since I can't put two order by in the same query.
Why not just use row_number()? One method without a subquery is:
SELECT TOP (4) WITH TIES e.*
FROM employees e
WHERE gender IN ('M', 'F')
ORDER BY ROW_NUMBER() OVER (PARTITION BY gender ORDER BY newid());
This is slightly less performant than using ROW_NUMBER() in a subquery.
Or, a fun method would use APPLY:
select e.*
from (values ('M'), ('F')) v(gender) cross apply
(select top (2) e.*
from employees e
where e.gender = v.gender
order by newid()
) e;
You cannot put an ORDER BY in the combinable query (the first one) of the UNION. However, you can use ORDER BY if you convert each one into a table expression.
For example:
select *
from (
SELECT TOP 2 *
FROM employees
WHERE gender = 'F'
ORDER BY newid()
) x
UNION ALL
select *
from (
SELECT TOP 2 *
FROM employees
WHERE gender = 'M'
ORDER BY newid()
) y
Result:
id name gender
--- ----- ------
5 Dina F
4 Lola F
2 Jake M
3 Ryan M
See running example at SQL Fiddle.
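The answers above are written for SQL Server (TOP, NEWID(), CROSS APPLY). As a rough way to experiment with the same idea without a SQL Server instance, here is a sketch that uses SQLite through Python's sqlite3 module, with random() standing in for NEWID() and a ROW_NUMBER() filter standing in for TOP 2. That substitution is an assumption of this example, not something the answers propose.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name VARCHAR(10), gender VARCHAR(1))")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Mary", "F"), (2, "Jake", "M"), (3, "Ryan", "M"), (4, "Lola", "F"),
     (5, "Dina", "F"), (6, "Paul", "M"), (7, "Tina", "F"), (8, "John", "M")],
)

# Shuffle the rows inside each gender and keep the first two of each.
rows = conn.execute(
    """
    SELECT id, name, gender
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY gender ORDER BY random()) AS rn
        FROM employees
    ) AS shuffled
    WHERE rn <= 2
    """
).fetchall()

# Which people are picked changes from run to run; the per-gender count does not.
per_gender = Counter(gender for _, _, gender in rows)
print(rows, per_gender)
```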

SQL : retrieving data by rank of variables

I have a dataset like this
student_id, course_id, grade
1, 1, 2
1, 2, 5
1, 3, 5
2, 3, 5
2, 1, 2
3, 1, 1
3, 2, 4
I created a schema for this on sqlfiddle.com like below:
CREATE TABLE enrollments(
STUDENT_ID INT NOT NULL,
COURSE_ID INT NOT NULL,
GRADE INT NOT NULL
);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(1, 1, 2);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(1, 2, 5);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(1, 3, 5);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(2, 3, 5);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(2, 1, 2);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(3, 1, 1);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(3, 2, 4);
INSERT INTO enrollments
(STUDENT_ID,COURSE_ID,GRADE) VALUES
(3, 3, 4);
Now here is what I want:
A query that returns the table with columns student_id, course_id, grade and which contains only the rows of the table corresponding to the highest grade each student was able to achieve across any of his/her courses.
If a student achieves the same highest grade in multiple courses, then only display the row corresponding to the course with the lowest course_id. Sort the output by student_id.
So I wrote the following query:
select STUDENT_ID, COURSE_ID, GRADE
from
(
select STUDENT_ID, rank() over(PARTITION BY STUDENT_ID ORDER BY GRADE Desc)
as grade_rank,
rank() over(PARTITION BY STUDENT_ID ORDER BY COURSE_ID asc) as course_rank
from enrollments
) as ss
where grade_rank=1 and course_rank=1
I want to test if this is the right logic on sqlfiddle but it throws an error for the query
ERROR: column "course_id" does not exist Position: 20
The schema has been successfully created there.
Is something wrong with this, and how can I test whether the logic is correct? If the logic is wrong, please point out the error in the code.
Thanks
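You have to select the columns in the inner query too if you want to select them in the outer query. Additionally, you have to use a single RANK() whose ORDER BY covers both columns; two separate ranks filtered with grade_rank = 1 AND course_rank = 1 only match when the best grade happens to be in the course with the lowest course_id.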
SELECT STUDENT_ID,
COURSE_ID,
GRADE
FROM (SELECT STUDENT_ID,
COURSE_ID,
GRADE,
rank() OVER (PARTITION BY STUDENT_ID
ORDER BY GRADE DESC,
COURSE_ID ASC) R
FROM ENROLLMENTS) SS
WHERE R = 1;
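The corrected query can be checked against the question's eight inserted rows with a short sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than on SQL Fiddle, and the outer ORDER BY is added here because the question asks for output sorted by student_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (student_id INT NOT NULL, course_id INT NOT NULL, "
    "grade INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 2, 5), (1, 3, 5), (2, 3, 5),
     (2, 1, 2), (3, 1, 1), (3, 2, 4), (3, 3, 4)],
)

# One RANK() per student: best grade first, lowest course_id breaks ties.
best = conn.execute(
    """
    SELECT student_id, course_id, grade
    FROM (
        SELECT student_id, course_id, grade,
               RANK() OVER (PARTITION BY student_id
                            ORDER BY grade DESC, course_id ASC) AS r
        FROM enrollments
    ) AS ss
    WHERE r = 1
    ORDER BY student_id
    """
).fetchall()
print(best)
```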

Distinct with where condition

I have a table as below:
I want to perform a distinct on city, but if a city is duplicated, return the row that has the maximum ref_id. The result should contain all the columns.
Test data:
DECLARE @t_temp TABLE (ID smallint,
name varchar(10),
city varchar(10),
ref_id smallint);
INSERT INTO @t_temp
VALUES
(1, 'xyz', 'a', 101),
(2, 'pqr', 'a', 102),
(3, 'ijk', 'a', 103),
(4, 'abc', 'b', 104),
(5, 'ahg', 'c', 10);
Actual query:
SELECT ID
, name
, city
, ref_id
FROM (SELECT *
, ROW_NUMBER() OVER (PARTITION BY city ORDER BY ref_id DESC) Ranking
FROM @t_temp) base
WHERE Ranking = 1;
Result:
ID name city ref_id
------ ---------- ---------- ------
3 ijk a 103
4 abc b 104
5 ahg c 10
Basically, what I'm doing is assigning a 'ranking' to all your records, partitioned by city and ordered by ref_id descending, and then retaining only the "number one" record. This is an alternative to what Rahul proposed, which is also a valid solution to your problem. The only difference between the two is that Rahul's example returns multiple records if several rows share the same city and the same highest ref_id, whereas the solution above returns only a single record. To get the same behavior as Rahul's query, change ROW_NUMBER() to RANK() or DENSE_RANK().
Try this:
Select tb1.* from Table1 as tb1
inner join (
Select city, Max(ref_id) as 'ref_id' from Table1 group by city
) as tb2
on tb1.city = tb2.city and tb1.ref_id = tb2.ref_id
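The difference between the two answers only shows up when a city has a tie on its highest ref_id. The sketch below uses SQLite via Python's sqlite3 module and adds one hypothetical row, (6, 'dup', 'b', 104), which is not in the original test data, purely to create such a tie. The table name t_temp is also just a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_temp (id INT, name TEXT, city TEXT, ref_id INT)")
conn.executemany(
    "INSERT INTO t_temp VALUES (?, ?, ?, ?)",
    [(1, "xyz", "a", 101), (2, "pqr", "a", 102), (3, "ijk", "a", 103),
     (4, "abc", "b", 104), (5, "ahg", "c", 10),
     (6, "dup", "b", 104)],  # hypothetical row: ties with id 4 in city 'b'
)

def keep_first(window_fn):
    """Rows ranked first per city, using ROW_NUMBER or RANK as the window function."""
    sql = f"""
        SELECT id, name, city, ref_id
        FROM (SELECT *, {window_fn}() OVER (PARTITION BY city ORDER BY ref_id DESC) AS ranking
              FROM t_temp) AS base
        WHERE ranking = 1
    """
    return conn.execute(sql).fetchall()

with_row_number = keep_first("ROW_NUMBER")  # exactly one row per city
with_rank = keep_first("RANK")              # every row tied for the maximum

# The join on MAX(ref_id) behaves like RANK: ties are all returned.
with_join = conn.execute(
    """
    SELECT tb1.*
    FROM t_temp AS tb1
    INNER JOIN (SELECT city, MAX(ref_id) AS ref_id FROM t_temp GROUP BY city) AS tb2
        ON tb1.city = tb2.city AND tb1.ref_id = tb2.ref_id
    """
).fetchall()
print(len(with_row_number), len(with_rank), len(with_join))
```

With ROW_NUMBER(), which of the two tied rows survives is not defined unless the ORDER BY is extended with a tiebreaker such as id.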

How to get numeric range between row value SQL

I have a table which shows Grades and percentages.
Now I want to run query on table which fetch Grade between these percentages.
Example if a student get 72% I want to show the Grade as C.
How to get Grade from table?
Please refer this table picture:
Drop Table Grades
Drop Table Students
Create Table Students (Name Varchar(200), Percentage Numeric(5,2))
Insert Students Values ('John', 0.00)
Insert Students Values ('Jane', 38.00)
Insert Students Values ('Joe', 45.00)
Insert Students Values ('Greg', 50.00)
Insert Students Values ('Buck', 55.00)
Insert Students Values ('Harold', 60.00)
Insert Students Values ('Jack', 65.00)
Insert Students Values ('Bill', 68.00)
Insert Students Values ('Gerald', 75.00)
Insert Students Values ('Steve', 79.00)
Insert Students Values ('Walter', 85.00)
Insert Students Values ('Mike', 92.00)
Insert Students Values ('Mary', 100.00)
Insert Students Values ('Mark', 101.00)
Select * From Students
Create Table Grades (Grade Char(2), Percentage Numeric(5,2))
Go
Insert Grades Values ('A*', 101.00)
Insert Grades Values ('A', 85.00)
Insert Grades Values ('B', 75.00)
Insert Grades Values ('C', 65.00)
Insert Grades Values ('D', 55.00)
Insert Grades Values ('E', 45.00)
Insert Grades Values ('F', 0.00)
Select S.*, G.Grade
From
(
Select *, IsNull(Lead(Percentage) Over (Order By Percentage), (Select Max(Percentage)+.01 From Grades)) NextPercentage
From Grades ) G
Join Students S On S.Percentage >= G.Percentage And S.Percentage < G.NextPercentage
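The LEAD() query above turns each grade threshold into a half-open range [Percentage, NextPercentage) and joins students into it. A sketch of the same idea on SQLite through Python's sqlite3 module follows; COALESCE replaces the SQL Server specific IsNull, only a handful of the students are loaded, and the snake_case column alias is a choice made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, percentage REAL)")
conn.execute("CREATE TABLE grades (grade TEXT, percentage REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("John", 0), ("Joe", 45), ("Bill", 68), ("Gerald", 75), ("Mary", 100), ("Mark", 101)],
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [("A*", 101), ("A", 85), ("B", 75), ("C", 65), ("D", 55), ("E", 45), ("F", 0)],
)

# Each grade covers [percentage, next_percentage); the top grade gets an
# artificial upper bound just above its own threshold.
rows = conn.execute(
    """
    SELECT s.name, g.grade
    FROM (
        SELECT grade, percentage,
               COALESCE(LEAD(percentage) OVER (ORDER BY percentage),
                        (SELECT MAX(percentage) + 0.01 FROM grades)) AS next_percentage
        FROM grades
    ) AS g
    JOIN students AS s
        ON s.percentage >= g.percentage AND s.percentage < g.next_percentage
    ORDER BY s.percentage
    """
).fetchall()
grade_by_student = dict(rows)
print(grade_by_student)
```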
Filtering with Percentage <= the score in the WHERE clause, ordering by Percentage DESC, and taking the TOP 1 Grade will give the expected result:
CREATE TABLE #GradeMaster (Grade VARCHAR(2), Percentage DECIMAL(5,2))
INSERT INTO #GradeMaster
SELECT 'A*', 101 UNION
SELECT 'A', 85 UNION
SELECT 'B', 75 UNION
SELECT 'C', 65 UNION
SELECT 'D', 55 UNION
SELECT 'E', 45 UNION
SELECT 'F', 0
SELECT TOP 1 Grade
FROM #GradeMaster
WHERE Percentage <= 72
ORDER BY Percentage DESC
DROP TABLE #GradeMaster
select grade from table1 where percentage in (
select max(percentage) from table1 where 72 >= percentage);
You can substitute 72 for whatever score you like. There may be a way to do it without the 2 selects, but this should work.
You can use an ORDER BY with LIMIT 1:
select grade from my_table
where percentage <= 72
order by percentage desc
limit 1;
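As a quick check of the boundary behaviour, the lookup can be wrapped in a small helper and run on SQLite through Python's sqlite3 module. The function name grade_for and the table name grades are assumptions of this sketch; the thresholds are the ones from the Grades table earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade TEXT, percentage REAL)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [("A*", 101), ("A", 85), ("B", 75), ("C", 65), ("D", 55), ("E", 45), ("F", 0)],
)

def grade_for(conn, score):
    """Return the grade with the highest threshold that does not exceed score."""
    row = conn.execute(
        "SELECT grade FROM grades WHERE percentage <= ? "
        "ORDER BY percentage DESC LIMIT 1",
        (score,),
    ).fetchone()
    return row[0] if row else None  # None when score is below every threshold

print(grade_for(conn, 72))     # inside the C band
print(grade_for(conn, 65))     # exactly on the C threshold
print(grade_for(conn, 64.99))  # just below it
```

Using <= rather than < matters only for scores that sit exactly on a threshold, such as 65.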
Assuming there might also be a student table and assignment table ... I would think the lookup query would look something like this. The below will give you all students regardless of whether they have any graded assignments. Alternatively, you could join the student table directly if you have an overall grade already aggregated.
SELECT
S.*,
A.*,
G.grade
FROM
Student S
LEFT OUTER JOIN Assignment A ON S.Student_id = A.Student_id
LEFT OUTER JOIN Grade G ON G.Percentage = (SELECT MAX(G2.Percentage) FROM Grade G2 WHERE G2.Percentage <= A.Percentage)

How can I do a distinct sum?

I am trying to create a "score" statistic which is derived from the value of a certain column, calculated as the sum of a case expression. Unfortunately, the query structure needs to be a full outer join (this is simplified from the actual query, and the join structure survives from the original code), and thus the sum is incorrect, since each row may occur many times. I could group by the unique key; however, that breaks other aggregate functions that are in the same query.
What I really want to do is sum (case when ... distinct claim_id) which of course does not exist; is there an approach that will do what I need? Or does this have to be two queries?
This is on redshift, in case it matters.
create table t1 (id int, proc_date date, claim_id int, proc_code char(1));
create table t2 (id int, diag_date date, claim_id int);
insert into t1 (id, proc_date, claim_id, proc_code)
values (1, '2012-01-01', 0, 'a'),
(2, '2009-02-01', 1, 'b'),
(2, '2019-02-01', 2, 'c'),
(2, '2029-02-01', 3, 'd'),
(3, '2016-04-02', 4, 'e'),
(4, '2005-01-03', 5, 'f'),
(5, '2008-02-03', 6, 'g');
insert into t2 (id, diag_date, claim_id)
values (4, '2004-01-01', 20),
(5, '2010-02-01', 21),
(6, '2007-04-02', 22),
(5, '2011-02-01', 23),
(6, '2008-04-02', 24),
(5, '2012-02-01', 25),
(6, '2009-04-02', 26),
(7, '2002-01-03', 27),
(8, '2001-02-03', 28);
select id, sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end), count(distinct t1.claim_id) as proc_count, min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id order by id;
You can separate out your conditional aggregates into a cte or subquery and use OVER(PARTITION BY id) to get an id level aggregate without grouping, something like this:
with cte AS (SELECT *,sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) OVER(PARTITION BY id) AS Some_Sum
, min(proc_date) OVER(PARTITION BY id) as min_proc_date
FROM t1
)
select id
, Some_Sum
, count(distinct cte.claim_id) as proc_count
, min_proc_date
from cte
full outer join t2 using (id)
group by id,Some_Sum,min_proc_Date
order by id;
Demo: SQL Fiddle
Note that you'll have to add these aggregates to the GROUP BY in the outer query, and the fields in your PARTITION BY should match the t1 fields you previously used in GROUP BY, in this case just id, but if your full query had other t1 fields in the GROUP BY be sure to add them to the PARTITION BY
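To see the inflation and one way around it, here is a sketch on SQLite through Python's sqlite3 module. It differs from the answer above in two labeled ways: it uses LEFT JOIN instead of FULL OUTER JOIN (older SQLite builds lack the latter), and it pre-aggregates t1 per id with GROUP BY inside the CTE rather than using OVER(PARTITION BY id). The names score_case, naive and fixed are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INT, proc_date DATE, claim_id INT, proc_code CHAR(1))")
conn.execute("CREATE TABLE t2 (id INT, diag_date DATE, claim_id INT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [(1, "2012-01-01", 0, "a"), (2, "2009-02-01", 1, "b"), (2, "2019-02-01", 2, "c"),
     (2, "2029-02-01", 3, "d"), (3, "2016-04-02", 4, "e"), (4, "2005-01-03", 5, "f"),
     (5, "2008-02-03", 6, "g")],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(4, "2004-01-01", 20), (5, "2010-02-01", 21), (6, "2007-04-02", 22),
     (5, "2011-02-01", 23), (6, "2008-04-02", 24), (5, "2012-02-01", 25),
     (6, "2009-04-02", 26), (7, "2002-01-03", 27), (8, "2001-02-03", 28)],
)

score_case = """CASE proc_code WHEN 'a' THEN 5 WHEN 'b' THEN 10 WHEN 'c' THEN 15
                WHEN 'd' THEN 20 WHEN 'e' THEN 25 WHEN 'f' THEN 30 WHEN 'g' THEN 35 END"""

# Naive: the join repeats each t1 row once per matching t2 row, so SUM over-counts.
naive = dict(conn.execute(
    f"SELECT t1.id, SUM({score_case}) FROM t1 LEFT JOIN t2 ON t2.id = t1.id GROUP BY t1.id"
).fetchall())

# Pre-aggregated: the score is computed per id before the join can fan rows out.
fixed = {row[0]: row[1] for row in conn.execute(
    f"""
    WITH scores AS (SELECT id, SUM({score_case}) AS score FROM t1 GROUP BY id)
    SELECT s.id, s.score, COUNT(t2.claim_id) AS diag_count
    FROM scores AS s LEFT JOIN t2 ON t2.id = s.id
    GROUP BY s.id, s.score
    """
)}
print(naive[5], fixed[5])
```

Here id 5 has one t1 row worth 35 and three t2 rows, so the naive sum counts that 35 three times.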
You can use a subquery (grouped by id and claim_id) and then regroup:
with base as (
select id, avg(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) as value_proc,
t1.claim_id , min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id, t1.claim_id order by id, t1.claim_id)
select id, sum(value_proc), count(distinct claim_id) as proc_count, min(min_proc_date) as min_proc_date
from base
group by id
order by id;
Note that I suggest avg for the inner subquery. If you are sure that every row with the same claim_id has the same proc_code, you can use max or min instead, which also keeps the value an integer. If you are not sure, avg is preferable.