I have two tables: a candidate table with candidate ID as a main key and the second table is one of educations linking candidate ID with the school they went to.
I want to filter schools where there are 50 or more candidates from that school. I also want the candidate names too.
select candidates.first_name, candidates.last_name
from candidates
where candidates.id IN (select e.candidate_id, e.school_name, count(e.school_name)
from educations e
group by e.candidate_id, e.school_name
having count(e.school_name) >= 50)
I'm getting an error that says:
Subquery has too many columns
When you are using a subquery inside an IN condition, your subquery can only return a single column.
As Stu already said in the coment, a EXISTS would be faster than an IN clause
In ana IN your subselect only can return so many columns as a defined by the column name(s) before the IN
This example of a query is for MySQL, but it should work on any Databse system
and of course is simplified
CREATE tABLE candidates (id int, first_name varchar(10), last_name varchar(10))
INSERT INTO candidates VALUEs(1,'a','an'),(2,'b','bn')
Records: 2 Duplicates: 0 Warnings: 0
create TablE educations (id int, candidate_id int,school_name varchar(10))
INSERT INTO educations VALUES (1,1,'school A'),(2,1,'school B'),(3,1,'school C'),(4,1,'school D')
,(5,1,'school E'),(6,2,'school A'),(7,2,'school B'),(9,2,'school C')
Records: 8 Duplicates: 0 Warnings: 0
select candidates.first_name, candidates.last_name
from candidates
where EXISTS (select 1
from educations e
WHERE e.candidate_id = candidates.id
having count(e.school_name) >= 5)
first_name
last_name
a
an
fiddle
Related
I was trying to create alternate queries to get the required result just came up with the below code. The below code gives Robert as the result i want to understand how is the compiler evaluated the below code.
CREATE TABLE NAMES(Id integer PRIMARY KEY, Name text);
/* Create few records in this table */
INSERT INTO NAMES VALUES(1,'Tom');
INSERT INTO NAMES VALUES(2,'Lucy');
INSERT INTO NAMES VALUES(3,'Frank');
INSERT INTO NAMES VALUES(4,'Jane');
INSERT INTO NAMES VALUES(5,'Robert');
COMMIT;
/* Display all the records from the table */
SELECT * FROM NAMES a where 0 = (select count(Name) from NAMES b where b.Id > a.Id);
For each row in names it is checked whether the number of rows with a higher ID is 0. Only in that case is the row selected. This means only the row with the highest ID gets selected.
It is unnecessary to count, though. More typical would be a NOT EXISTS query:
SELECT *
FROM names
WHERE NOT EXISTS
(
SELECT NULL
FROM names name_with_higher_id
WHERE name_with_higher_id.id > names.id
);
Or a MAX query:
SELECT *
FROM names
WHERE id = (SELECT max(id) FROM names);
There are two tables:
Table education_data (list of countries with values by year per measured indicator).
create table education_data
(country_id int,
indicator_id int,
year date,
value float
);
Table indicators (list of all indicators):
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
I want to find the indicators for which the highest number of countries lack information entirely
i.e. max (count of missing indicators by country)
I have solved the problem in excel (by counting blanks in a pivot table by country)
pivot table with count for missing indicators by country
I haven't figured our yet the SQL query to return the same results.
I am able to return the number of missing indicators for a set country , read query below, but not for all countries.
SELECT COUNT(*)
FROM education_data AS edu
RIGHT JOIN indicators AS ind ON
edu.indicator_id = ind.id and country_id = 10
WHERE value IS NULL
GROUP BY country_id
I have tried with a cross join without success so far.
You will have to join on the contries as well, otherwise you can not tell if a contry has no entry in education_data at all:
create table countries(id serial primary key, name varchar);
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
create table education_data
(country_id int references countries,
indicator_id int references indicators,
year date,
value float
);
insert into countries values (1,'USA');
insert into countries values (2,'Norway');
insert into countries values (3,'France');
insert into indicators values (1,'foo','xxx');
insert into indicators values (2,'bar', 'yyy');
insert into education_data values(1,1,'01-01-2020',1.1);
SELECT count (c.id), i.id, i.name
FROM countries c JOIN indicators i ON (true) LEFT JOIN education_data e ON(c.id = e.country_id AND i.id = e.indicator_id)
WHERE indicator_id IS NULL
GROUP BY i.id;
count | id | name
-------+----+------
3 | 2 | bar
2 | 1 | foo
(2 rows)
I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)
That's a logical contradiction. The ...
count of missing indicators by country
.. cannot be pinned on any specific indicators, since those countries don't have an indicator.
The counts per country with "missing indicator" (i.e. indicator_id IS NULL):
SELECT country_id, count(*) AS ct_indicator_null
FROM education_data
WHERE indicator_id IS NULL
GROUP BY country_id
ORDER BY count(*) DESC;
Or, more generally, without valid indicator, which also includes rows where indicator_id has no match in table indicators:
SELECT country_id, count(*) AS ct_no_valid_indicator
FROM education_data e
WHERE NOT EXISTS (
SELECT FROM indicators i
WHERE i.id = e.indicator_id
)
GROUP BY country_id
ORDER BY count(*) DESC;
NOT EXISTS is one of four basic techniques that apply here (LEFT / RIGHT JOIN, like you tried being another one). See:
Select rows which are not present in other table
You mentioned a country table. Countries without any indicator entries in education_data are not included in the result above. To find those, too:
SELECT *
FROM country c
WHERE NOT EXISTS (
SELECT
FROM education_data e
JOIN indicators i ON i.id = e.indicator_id -- INNER JOIN this time!
WHERE e.country_id = c.id
);
Reports countries without valid indicator (none, or not valid).
If every country should have a valid indicator, after cleaning up existing data, consider:
1: adding an FOREIGN KEY constraint to disallow invalid entries in education_data.indicator_id.
2: setting education_data.indicator_id NOT NULL to also disallow NULL entries.
Or add a PRIMARY KEY on (country_id, indicator_id), which makes both columns NOT NULL automatically.
.. which brings you closer to a valid many-to-many implementation. See:
How to implement a many-to-many relationship in PostgreSQL?
I would like an efficient means of deriving groups of matching records across multiple fields. Let's say I have the following table:
CREATE TABLE cust
(
id INT NOT NULL,
class VARCHAR(1) NULL,
cust_type VARCHAR(1) NULL,
terms VARCHAR(1) NULL
);
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL);
What I am looking to get is the set of IDs for which matching values unify a set of records over the three fields (class, cust_type and terms), so that I can apply a unique ID to the group.
In the example, records 1-4 constitute one match group over the three fields, while records 5-6 form a separate match.
The following does the job:
SELECT
DISTINCT
a.id,
DENSE_RANK() OVER (ORDER BY max(b.class),max(b.cust_type),max(b.terms)) AS match_group
FROM cust AS a
INNER JOIN
cust AS b
ON
a.class = b.class
OR a.cust_type = b.cust_type
OR a.terms = b.terms
GROUP BY a.id
ORDER BY a.id
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 2
6 2
**But, is there a better way?** Running this query on a table of over a million rows is painful...
As Graham pointed out in the comments, the above query doesn't satisfy the requirements if another record is added that would group all the records together.
The following values should be grouped together in one group:
INSERT INTO cust
VALUES
(1,'A',NULL,'C'),
(2,NULL,'B','C'),
(3,'A','B',NULL),
(4,NULL,NULL,'C'),
(5,'D','E',NULL),
(6,'D',NULL,NULL),
(7,'D','B','C');
Would yield:
id match_group
-- -----------
1 1
2 1
3 1
4 1
5 1
6 1
...because the class value of D groups records 5, 6 and 7. The terms value of C matches records 1, 2 and 4 to that group, and cust_type value B ( or class value A) pulls in record 3.
Hopefully that all makes sense.
I don't think you can do this with a (recursive) Select.
I did something similar (trying to identify unique households) using a temporary table & repeated updates using following logic:
For each class|cust_type|terms get the minimum id and update that temp table:
update temp
from
(
SELECT
class, -- similar for cust_type & terms
min(id) as min_id
from temp
group by class
) x
set id = min_id
where temp.class = x.class
and temp.id <> x.min_id
;
Repeat all three updates until none of them updates a row.
For DB StudentInfo and Table Student as follows:
CREATE TABLE Student
(
ID INT PRIMARY KEY IDENTITY(1,1),
Name nvarchar(255)
)
and inserting values:
Insert Into Student Values ('Ashok')`
executing it 3 times, and
Insert Into Student Values ('Achyut')
executing it 2 times and total 5 rows of data are inserted into the table.
I want to display a result counting the result with the name having 'Ashok' & 'Achyut'.
Generally for single values count in a column I use:
SELECT Count(Name) AS NoOfStudentHavingNameAshok
FROM Student
WHERE Name = 'Ashok'
but how to display the NoOfStudentHavingNameAshok & NoOfStudentHavingNameAchyut what query should I run?
You should include name in the select and group by name.
SELECT name, Count(*)
From Student
group by name
You can put conditions inside your COUNT() function:
select count(case when Name = 'Ashok' then 'X' end) as NoOfStudentHavingNameAshok,
count(case when Name = 'Achyut' then 'X' end) as NoOfStudentHavingNameAchyut
from Student
I'm working with a Pro*C query, but this question should be general SQL. My research has been a dead end, but I think I'm missing something.
Suppose my server has an array of students' names, {"Alice","Charlie","Bob"}. I query the Student_Ids table for the students' ID numbers:
SELECT id_no FROM student_ids
WHERE student_name IN ('Alice','Charlie','Bob');
To simplify server-side processing, I want to sort the result set in the same order as the students' names. In other words, the result set would be {alice_id_no, charlie_id_no, bob_id_no} regardless of the actual ordering of the table or the behavior of any particular vendor's implementation.
The only solution I can think of is:
. . .
ORDER BY
CASE WHEN student_name='Alice' THEN 0
WHEN student_name='Charlie' THEN 1
WHEN student_name='Bob' THEN 2 END;
but that seems extremely messy and cumbersome when trying to dynamically generate/run this query.
Is there a better way?
UPDATE I gave a terrible example by pre-sorting the students' names. I changed the names to be deliberately unsorted. In other words, I want to sort the names in a non-ASC or DESC-friendly way.
UPDATE II Oracle, but for knowledge's sake, I am looking for more general solutions as well.
The ORDER BY expression you've given for your sample data is equivalent to ORDER BY student_name. Is that what you intended?
If you want a custom ordering that is not alphabetical, I think you might have meant something like this:
ORDER BY
CASE
WHEN student_name = 'Alice' THEN 0
WHEN student_name = 'Charlie' THEN 1
WHEN student_name = 'Bob' THEN 2
END;
You can use a derived table as well, that holds the names as well as the ordering you want. This way you only have to put the names in a single time:
SELECT S.id_no
FROM
student_ids AS S
INNER JOIN (
SELECT Name = 'Alice', Seq = 0 FROM DUAL
UNION ALL SELECT 'Bob', 2 FROM DUAL
UNION ALL SELECT 'Charlie', 1 FROM DUAL
) AS N
ON S.student_name = N.Name
ORDER BY
N.Seq;
You could also put them into a temp table, but in Oracle that could be somewhat of a pain.
Can you do this?
order by student_name
To do a custom sort, you only need one case:
ORDER BY (CASE WHEN student_name = 'Alice' THEN 1
WHEN student_name = 'Bob' THEN 2
WHEN student_name = 'Charlie' THEN 3
ELSE 4
END)
why not this :
SELECT id_no FROM student_ids
WHERE student_name IN ('Alice','Bob','Charlie')
ORDER BY student_name
You can ORDER BY any columns, not necessary those in SELECT list or WHERE clause
SELECT id_no
FROM student_ids
WHERE student_name IN ('Alice','Bob','Charlie)
ORDER BY id_no;
Add a table to hold the sort priorities then you can use the sort_priorities in whatever query you want (and easily update the priorities):
CREATE TABLE student_name_sort_priorities (
student_name VARCHAR2(30) CONSTRAINT student_name_sort_priority__pk PRIMARY KEY,
sort_priority NUMBER(10) CONSTRAINT student_name_sort_priority__nn NOT NULL
CONSTRAINT student_name_sort_priority__u UNIQUE
);
(If you want two values to be equivalently sorted then don't include the UNIQUE constraint.)
INSERT INTO student_name_sort_priorities VALUES ( 'Alice', 0 );
INSERT INTO student_name_sort_priorities VALUES ( 'Charlie', 2 );
INSERT INTO student_name_sort_priorities VALUES ( 'Bob', 1 );
Then you can join the sort priority table with the student_ids table and use the extra column to perform ordering:
SELECT id_no
FROM student_ids s
LEFT OUTER JOIN
student_name_sort_priorities p
ON (s.student_name = p.student_name)
ORDER BY
sort_priority NULLS LAST;
I've used a LEFT OUTER JOIN so that if a name is not contained on the student_name_sort_priorities table then it does not restrict the rows returned from the query; NULLS LAST is used in the ordering for a similar reason - any student names that aren't in the sort priorities table will return a NULL sort priority and be placed at the end of the ordering. If you don't want this then just use INNER JOIN and remove the NULLS LAST.
How about using a 'table of varchar' type like the build-in below:
TYPE dbms_debug_vc2coll is table of varchar2(1000);
test:
SQL> select customer_id, cust_first_name, cust_last_name from customers where cust_first_name in
2 (select column_value from table(sys.dbms_debug_vc2coll('Frederic','Markus','Dieter')));
CUSTOMER_ID CUST_FIRST_NAME CUST_LAST_NAME
----------- -------------------- --------------------
154 Frederic Grodin
149 Markus Rampling
152 Dieter Matthau
That seems to force the order, but that might just be bad luck. I'm not really a sql expert.
The execution plan for this uses 'collection iterator' instead of a big 'or' in the typical:
select customer_id, cust_first_name, cust_last_name from customers where cust_first_name in ('Frederic','Markus','Dieter');
hth, Hein.