SQL: difference between where in main body vs join clause - sql

I'm wondering why the following two queries give me slightly different result sets:
SELECT t.name, COUNT(e.id)
FROM event_type t
LEFT JOIN event e ON t.id = e.type_id AND e.start BETWEEN ? AND ?
GROUP BY t.name;
SELECT t.name, COUNT(e.id)
FROM event_type t
LEFT JOIN event e ON t.id = e.type_id
WHERE e.start BETWEEN ? AND ?
GROUP BY t.name;
I only moved the BETWEEN clause to the main body. Logically, it should not matter where it is applied, but the results say it does. Any suggestions? Thanks!
UPD: tried on MySQL 5.6
create table event_type
(
id int auto_increment primary key,
name varchar(100) not null,
constraint UNIQ_93151B825E237E06 unique (name)
) collate = utf8_unicode_ci;
create table event
(
id int auto_increment primary key,
type_id int null,
start datetime not null,
...
constraint FK_3BAE0AA7C54C8C93
foreign key (type_id) references event_type (id)
) collate = utf8_unicode_ci;
create index IDX_3BAE0AA7C54C8C93
on event (type_id);

Maybe it's hard to answer this question without some images, but I'll try.
Let's assume this is the event_type table
Id | Name
1 | First
2 | Second
Events table:
Id | TypeId | Start
5 | 1 | 2022-10-01
6 | 1 | 2022-10-10
So for this query:
SELECT t.name, COUNT(e.id)
FROM event_type t
LEFT JOIN event e ON t.id = e.type_id AND e.start BETWEEN '2022-10-01' AND '2022-10-05'
GROUP BY t.name;
The result will be:
Name | Count(e.id)
First | 1
Second | 0
But why? Because when the SQL engine evaluates the left join, it checks both the id and the start date. The intermediate result of the previous query looks like this:
Id | Name | Id | TypeId | Start
1 | First | 5 | 1 | 2022-10-01
2 | Second | null | null | null
That's it! When you put BETWEEN in the WHERE clause instead, the filter runs after the join, so the rows whose e.start is null are removed (null BETWEEN ... is never true). The "Second" row disappears and the final result is different.
I hope it's clear enough!
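The difference can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL 5.6, stores the dates as text, and keeps just the columns needed for the example, so those parts are assumptions rather than the original setup.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        type_id INTEGER REFERENCES event_type (id),
        start TEXT NOT NULL
    );
    INSERT INTO event_type (id, name) VALUES (1, 'First'), (2, 'Second');
    INSERT INTO event (id, type_id, start) VALUES
        (5, 1, '2022-10-01'),
        (6, 1, '2022-10-10');
""")

params = ("2022-10-01", "2022-10-05")

# Filter inside ON: event types without a matching event are kept
# with NULLs on the event side, so COUNT(e.id) is 0 for them.
on_rows = conn.execute("""
    SELECT t.name, COUNT(e.id)
    FROM event_type t
    LEFT JOIN event e ON t.id = e.type_id AND e.start BETWEEN ? AND ?
    GROUP BY t.name
    ORDER BY t.name
""", params).fetchall()

# Filter inside WHERE: it runs after the join, and the NULL-extended
# rows do not satisfy BETWEEN, so they are dropped.
where_rows = conn.execute("""
    SELECT t.name, COUNT(e.id)
    FROM event_type t
    LEFT JOIN event e ON t.id = e.type_id
    WHERE e.start BETWEEN ? AND ?
    GROUP BY t.name
    ORDER BY t.name
""", params).fetchall()

print(on_rows)     # [('First', 1), ('Second', 0)]
print(where_rows)  # [('First', 1)]
```

With the sample rows above, only the ON version keeps 'Second' with a count of 0.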

Related

Get user from table based on id

I have these Postgres tables:
create table deals_new
(
id bigserial primary key,
slip_id text,
deal_type integer,
timestamp timestamp,
employee_id bigint
constraint employee_id_fk
references common.employees
);
create table twap
(
id bigserial primary key,
deal_id varchar not null,
employee_id bigint
constraint fk_twap__employee_id
references common.employees,
status integer
);
create table employees
(
id bigint primary key,
account_id integer,
first_name varchar(150),
last_name varchar(150)
);
New table to query:
create table accounts
(
id bigint primary key,
account_name varchar(150) not null
);
I use this SQL query:
select d.*, t.id as twap_id
from common.deals_new d
left outer join common.twap t on
t.deal_id = d.slip_id and
d.timestamp between '11-11-2021' AND '11-11-2021' and
d.deal_type in (1, 2) and
d.quote_id is null
where d.employee_id is not null
order by d.timestamp desc, d.id
offset 10
limit 10;
How can I extend this SQL query to also search table employees by account_id and map the result to table accounts by id? I would also like to print accounts.account_name based on employees.account_id.
You need two joins to make this work: one join to get to the employees table, and one more to get to the accounts table.
select d.*, t.id as twap_id, a.account_name
from common.deals_new d
left outer join common.twap t on
t.deal_id = d.slip_id and
d.timestamp between '11-11-2021' AND '11-11-2021' and
d.deal_type in (1, 2) and
d.quote_id is null
join employees as e on d.employee_id = e.id
join accounts as a on a.id = e.account_id
where d.employee_id is not null
order by d.timestamp desc, d.id
offset 10
limit 10;
Note: I did not test this in a fiddle, so it could have a typo, but I think you get the idea.
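A minimal sketch of the two-join idea, run against SQLite through Python's sqlite3 module rather than Postgres. The tables are cut down to the columns involved and the sample rows are made up for illustration. It also shows one consequence of using inner joins: a deal whose employee has no account_id drops out, while a LEFT JOIN on accounts keeps it with a null account_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified versions of the tables in the question (assumption).
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        first_name TEXT,
        last_name TEXT
    );
    CREATE TABLE deals_new (
        id INTEGER PRIMARY KEY,
        slip_id TEXT,
        employee_id INTEGER REFERENCES employees (id)
    );
    -- Hypothetical sample data.
    INSERT INTO accounts VALUES (10, 'Acme');
    INSERT INTO employees VALUES (1, 10, 'Ann', 'Lee'), (2, NULL, 'Bob', 'Ray');
    INSERT INTO deals_new VALUES (100, 'S-1', 1), (101, 'S-2', 2), (102, 'S-3', NULL);
""")

# Two inner joins: deals -> employees -> accounts.
inner = conn.execute("""
    SELECT d.id, a.account_name
    FROM deals_new d
    JOIN employees e ON d.employee_id = e.id
    JOIN accounts a ON a.id = e.account_id
    WHERE d.employee_id IS NOT NULL
    ORDER BY d.id
""").fetchall()

# Same query, but accounts is left-joined so employees without an
# account still produce a row.
outer = conn.execute("""
    SELECT d.id, a.account_name
    FROM deals_new d
    JOIN employees e ON d.employee_id = e.id
    LEFT JOIN accounts a ON a.id = e.account_id
    WHERE d.employee_id IS NOT NULL
    ORDER BY d.id
""").fetchall()

print(inner)  # [(100, 'Acme')]
print(outer)  # [(100, 'Acme'), (101, None)]
```

Which variant is right depends on whether deals without an account should appear in the output at all.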

Optimise many-to-many join

I have three tables: groups and people and groups_people which forms a many-to-many relationship between groups and people.
Schema:
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT,
join_date TIMESTAMP
);
CREATE TABLE groups_people (
group_id INT REFERENCES groups(id),
person_id INT REFERENCES people(id)
);
When I want to query for the 10 people who most recently joined the group with id = 1:
WITH person_ids AS (SELECT person_id FROM groups_people WHERE group_id = 1)
SELECT * FROM people WHERE id = ANY(SELECT person_id FROM person_ids)
ORDER BY join_date DESC LIMIT 10;
The query needs to scan all of the people in the group and sort them before selecting. That will be slow if the group contains many people.
Is there any way to work around it?
Schema (re-)design to allow the same person to join multiple groups
Since you mentioned that the relationship between groups and people is many-to-many, I think you may want to move join_date from people to groups_people, because the same person can join different groups and each such event has its own join_date.
So I would change the schema to
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT --, -- change
-- join_date TIMESTAMP -- delete
);
CREATE TABLE groups_people (
group_id INT REFERENCES groups(id),
person_id INT REFERENCES people(id), -- change
join_date TIMESTAMP -- add
);
Query
select
p.id
, p.name
, gp.join_date
from
people as p
, groups_people as gp
where
p.id = gp.person_id
and gp.group_id=1
order by gp.join_date desc
limit 10
Disclaimer: The above query is in MySQL syntax (the question was originally tagged with MySQL)
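To illustrate the redesigned schema, here is a small sketch using Python's sqlite3 module (not MySQL or Postgres). The groups table is left out because the query does not need it, join_date is stored as text, and the sample rows are invented. The same person appears in two groups with a different join_date in each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    -- join_date now lives on the membership row, not on the person.
    CREATE TABLE groups_people (
        group_id INTEGER,
        person_id INTEGER REFERENCES people (id),
        join_date TEXT
    );
    -- Hypothetical sample data: Ann is a member of groups 1 and 2.
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO groups_people VALUES
        (1, 1, '2022-01-05'),
        (1, 2, '2022-03-01'),
        (2, 1, '2022-06-15'),
        (2, 3, '2022-02-10');
""")

latest = conn.execute("""
    SELECT p.id, p.name, gp.join_date
    FROM people AS p
    JOIN groups_people AS gp ON p.id = gp.person_id
    WHERE gp.group_id = 1
    ORDER BY gp.join_date DESC
    LIMIT 10
""").fetchall()

print(latest)  # [(2, 'Bob', '2022-03-01'), (1, 'Ann', '2022-01-05')]
```

Ann's later join date for group 2 does not affect her position in the group 1 listing.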
This seems much easier to write as a simple join with order by and limit:
select p.*
from people p join
groups_people gp
on p.id = gp.person_id
where gp.group_id = 1
order by gp.join_date desc
limit 10; -- or fetch first 10 rows only
Try rewriting using EXISTS
SELECT *
FROM people p
WHERE EXISTS (SELECT 1
FROM groups_people ps
WHERE p.id = ps.person_id and group_id = 1)
ORDER BY join_date DESC
LIMIT 10;

SQL - Find duplicates with equivalencies

I'm having trouble wrapping my mind around developing this SQL query. Given the following two tables:
ACADEMIC_HISTORY ( STUDENT_ID, TERM, COURSE_ID, COURSE_GRADE )
COURSE_EQUIVALENCIES ( COURSE_ID, COURSE_ID_EQUIVALENT )
What would be the best way to detect if students have taken the same (or an equivalent) course in the past with a passing grade (C or better)?
Example
Student #1 took the course ABC001 and received a grade of C. Ten years later, the course was renamed ABC011 and the appropriate entry was made in COURSE_EQUIVALENCIES. The student retook the course under this new name and received a grade of B. How can I construct a SQL query that will detect the duplicate courses and only count the first passing grade?
(The actual case is significantly more complicated, but this should get me started.)
Thanks in advance.
EDIT:
It's not even necessary to keep or discard any information. A query that simply shows classes with duplicates will be sufficient.
You could use something like:
SELECT
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVALENCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVALENT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID
http://sqlfiddle.com/#!3/d608f/20
Sorry, I originally posted this with a bug: it preferred the score of the actual course requested over any equivalencies. That is fixed now.
This only looks for one level of equivalencies, but maybe you want to enforce that as part of the data entry process: review all possible equivalencies and enter the valid ones.
EDIT: for the first pass of a qualifying course (using numbered terms):
SELECT TOP 1
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.TERM
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVALENCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVALENT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID, TERM
ORDER BY TERM ASC
http://sqlfiddle.com/#!3/fdded/6
(Note: TOP is T-SQL syntax; for MySQL you need LIMIT.)
The data (in LOWERCASE)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path='tmp';
CREATE TABLE academic_history
( student_id INTEGER NOT NULL
, course_id CHAR(6)
, course_grade CHAR(1)
, PRIMARY KEY(student_id,course_id)
);
INSERT INTO academic_history ( student_id,course_id,course_grade) VALUES
(1, 'ABC001' , 'C' )
, (1, 'ABC011' , 'B' )
, (2, 'ABC011' , 'A' )
;
CREATE TABLE course_equivalencies
( course_id CHAR(6)
, course_id_equivalent CHAR(6)
);
INSERT INTO course_equivalencies(course_id,course_id_equivalent) VALUES
( 'ABC011' , 'ABC001' )
;
The query:
-- EXPLAIN ANALYZE
WITH canon AS (
SELECT ah.student_id AS student_id
, ah.course_id AS course_id
, COALESCE (eq.course_id_equivalent,ah.course_id) AS course_id_equivalent
FROM academic_history ah
LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
, c.course_id_equivalent
, MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
GROUP BY h.student_id, c.course_id_equivalent
ORDER BY h.student_id, c.course_id_equivalent
;
The output:
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table tmp.academic_history
drop cascades to table tmp.course_equivalencies
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "academic_history_pkey" for table "academic_history"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 1
 student_id | course_id_equivalent | the_grade
------------+----------------------+-----------
          1 | ABC001               | B
          2 | ABC001               | A
(2 rows)
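Since the question's EDIT says a query that only shows the duplicates would be enough, the same canonical-course idea can be sketched with a HAVING filter. This is an illustration, not part of the answer above: it runs on SQLite through Python's sqlite3 module, treats grades A, B and C as passing, and, like the query above, only follows one level of equivalency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE academic_history (
        student_id INTEGER NOT NULL,
        course_id TEXT,
        course_grade TEXT,
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO academic_history VALUES
        (1, 'ABC001', 'C'), (1, 'ABC011', 'B'), (2, 'ABC011', 'A');
    CREATE TABLE course_equivalencies (course_id TEXT, course_id_equivalent TEXT);
    INSERT INTO course_equivalencies VALUES ('ABC011', 'ABC001');
""")

# Map every course to its canonical id, keep only passing grades, and
# report students who passed the same canonical course more than once.
duplicates = conn.execute("""
    SELECT ah.student_id,
           COALESCE(eq.course_id_equivalent, ah.course_id) AS canonical_course,
           COUNT(*) AS passing_attempts
    FROM academic_history ah
    LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
    WHERE ah.course_grade IN ('A', 'B', 'C')
    GROUP BY ah.student_id, COALESCE(eq.course_id_equivalent, ah.course_id)
    HAVING COUNT(*) > 1
    ORDER BY ah.student_id
""").fetchall()

print(duplicates)  # [(1, 'ABC001', 2)]
```

Student 2 passed only one course in the group, so only student 1 is reported.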

Oracle sql query running for (almost) forever

An application of mine is trying to execute a count(*) query which returns after about 30 minutes. What's strange is that the query is very simple and the tables involved are large, but not gigantic (10,000 and 50,000 records).
The query which takes 30 minutes is:
select count(*)
from RECORD r inner join GROUP g
on g.GROUP_ID = r.GROUP_ID
where g.BATCH_ID = 1 and g.ENABLED = 'Y'
The database schema is essentially:
create table BATCH (
BATCH_ID int not null,
[other columns]...,
CONSTRAINT PK_BATCH PRIMARY KEY (BATCH_ID)
);
create table GROUP (
GROUP_ID int not null,
BATCH_ID int,
ENABLED char(1) not null,
[other columns]...,
CONSTRAINT PK_GROUP PRIMARY KEY (GROUP_ID),
CONSTRAINT FK_GROUP_BATCH_ID FOREIGN KEY (BATCH_ID)
REFERENCES BATCH (BATCH_ID),
CONSTRAINT CHK_GROUP_ENABLED CHECK(ENABLED in ('Y', 'N'))
);
create table RECORD (
GROUP_ID int not null,
RECORD_NUMBER int not null,
[other columns]...,
CONSTRAINT PK_RECORD PRIMARY KEY (GROUP_ID, RECORD_NUMBER),
CONSTRAINT FK_RECORD_GROUP_ID FOREIGN KEY (GROUP_ID)
REFERENCES GROUP (GROUP_ID)
);
create index IDX_GROUP_BATCH_ID on GROUP(BATCH_ID);
I checked whether there are any blocks in the database and there are none. I also ran the following pieces of the query and all except the last two returned instantly:
select count(*) from RECORD -- 55,501
select count(*) from GROUP -- 11,693
select count(*)
from RECORD r inner join GROUP g
on g.GROUP_ID = r.GROUP_ID
-- 55,501
select count(*)
from GROUP g
where g.BATCH_ID = 1 and g.ENABLED = 'Y'
-- 3,112
select count(*)
from RECORD r inner join GROUP g
on g.GROUP_ID = r.GROUP_ID
where g.BATCH_ID = 1
-- 27,742 - took around 5 minutes to run
select count(*)
from RECORD r inner join GROUP g
on g.GROUP_ID = r.GROUP_ID
where g.ENABLED = 'Y'
-- 51,749 - took around 5 minutes to run
Can someone explain what's going on? How can I improve the query's performance? Thanks.
A coworker figured out the issue. It's because the table statistics weren't being updated and the last time the table was analyzed was a couple of months ago (when the table was essentially empty). I ran analyze table RECORD compute statistics and now the query is returning in less than a second.
I'll have to talk to the DBA about why the table statistics weren't being updated.
SELECT COUNT(*)
FROM RECORD R
LEFT OUTER JOIN GROUP G ON G.GROUP_ID = R.GROUP_ID
AND G.BATCH_ID = 1
AND G.ENABLED = 'Y'
Try that and let me know how it turns out. Not saying this IS the answer, but since I don't have access to a DB right now, I can't test it. Hope it works for ya.
An explain plan would be a good place to start.
See "Strange speed changes with sql query" for how to use the explain plan syntax (and the query to see the result).
If that doesn't show anything suspicious, you'll probably want to look at a trace.

MySQL SQL Subquery?

Given the following schema / data / output how would I format a SQL query to give the resulting output?
CREATE TABLE report (
id BIGINT AUTO_INCREMENT,
name VARCHAR(255) NOT NULL UNIQUE,
source VARCHAR(255) NOT NULL UNIQUE,
PRIMARY KEY(id)
) ENGINE = INNODB;
CREATE TABLE field (
id BIGINT AUTO_INCREMENT,
name VARCHAR(255) NOT NULL UNIQUE,
report_id BIGINT,
PRIMARY KEY(id)
) ENGINE = INNODB;
ALTER TABLE field ADD FOREIGN KEY (report_id) REFERENCES report(id) ON DELETE CASCADE;
reports:
id, name, source
1 report1 source1
2 report2 source2
3 report3 source3
4 report4 source4
field:
id, name, report_id
1 firstname 3
2 lastname 3
3 age 3
4 state 4
5 age 4
6 rank 4
Expected output for search term "age rank"
report_id, report_name, num_fields_matched
3 report3 1
4 report4 2
Thanks in advance!
This query will return all the reports with words you need.
SELECT *
FROM report r
INNER JOIN field f ON r.id = f.report_id
WHERE f.name IN ('age','rank')
You have to nest it. So the final query is:
SELECT a.id, a.name, COUNT(*)
FROM
(
SELECT r.id, r.name
FROM report r
INNER JOIN field f ON r.id = f.report_id
WHERE f.name
IN ('age', 'rank')
)a
GROUP BY a.id, a.name
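The nested query can be checked against the sample data with a short script. This is a sketch under a few assumptions: it uses Python's sqlite3 module instead of MySQL, it drops the UNIQUE constraint on field.name because the sample data contains 'age' twice, and splitting the search term on whitespace is my guess at how the words are obtained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        source TEXT NOT NULL UNIQUE
    );
    -- No UNIQUE on field.name here: the sample data repeats 'age' (assumption).
    CREATE TABLE field (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        report_id INTEGER REFERENCES report (id) ON DELETE CASCADE
    );
    INSERT INTO report VALUES
        (1, 'report1', 'source1'), (2, 'report2', 'source2'),
        (3, 'report3', 'source3'), (4, 'report4', 'source4');
    INSERT INTO field VALUES
        (1, 'firstname', 3), (2, 'lastname', 3), (3, 'age', 3),
        (4, 'state', 4), (5, 'age', 4), (6, 'rank', 4);
""")

# Turn the search term into one bound parameter per word.
words = "age rank".split()
placeholders = ", ".join("?" for _ in words)

rows = conn.execute(f"""
    SELECT a.id, a.name, COUNT(*) AS num_fields_matched
    FROM (
        SELECT r.id, r.name
        FROM report r
        INNER JOIN field f ON r.id = f.report_id
        WHERE f.name IN ({placeholders})
    ) a
    GROUP BY a.id, a.name
    ORDER BY a.id
""", words).fetchall()

print(rows)  # [(3, 'report3', 1), (4, 'report4', 2)]
```

The result matches the expected output listed in the question for the search term "age rank".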