SQL - Why does this evaluate to true? - sql

Consider this snippet of SQL:
CREATE TABLE product
(
id integer,
stock_quantity integer
);
INSERT INTO product (id, stock_quantity) VALUES (1, NULL);
INSERT INTO product (id, stock_quantity) VALUES (2, NULL);
SELECT *
FROM product
WHERE (id, stock_quantity) NOT IN ((1, 2), (2, 9));
I can't understand why it doesn't select anything. I'm using Postgres.
I would expect both rows to be returned, because I'd expect (1, NULL) and (2, NULL) to not be in ((1,2), (2, 9)).
If I replace NULL with 0, for example, it does return the two results.
Why was it designed to be this way? What am I missing?
Thanks!

Think of a NULL as missing data. Take a "Date of Birth" column as an example.
Consider a database with three people: one born in 1975, one born in 1990, and a third whose date of birth we don't know. The third person was certainly born, but we don't know her birth date yet.
Now, what if a query searched for people "not born in 1990"? That would return the first person only.
The second person was born in 1990, so she clearly cannot be selected.
For the third person we don't know the date of birth, so the condition is neither true nor false but unknown, and the query doesn't select her either. Does that make sense?
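Applied to the query in the question, the same reasoning can be traced step by step. The following is a sketch for PostgreSQL that reuses the product table from the question; the IS NOT TRUE rewrite at the end is one possible way to treat "unknown" as "not in the list", not the only one.

```sql
-- Rows are compared field by field. A NULL field makes the result unknown (NULL)
-- unless another field already proves the two rows different.
SELECT ROW(1, NULL::integer) = ROW(1, 2) AS cmp_first;   -- NULL: 1 = 1, but NULL = 2 is unknown
SELECT ROW(1, NULL::integer) = ROW(2, 9) AS cmp_second;  -- false: 1 = 2 is already false

-- IN behaves like a chain of ORs: NULL OR false is NULL, and NOT NULL is still NULL.
-- WHERE keeps only rows whose condition is true, so neither row is returned.

-- Possible rewrite: keep every row for which the IN test is not definitely true.
SELECT *
FROM product
WHERE ((id, stock_quantity) IN ((1, 2), (2, 9))) IS NOT TRUE;
-- returns both rows, (1, NULL) and (2, NULL)
```

Whether an unknown quantity should count as "not in the list" is a modelling decision; the rewrite only makes that choice explicit.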

Related

How to search an entry in a table and return the column name or index in PostgreSQL

I have a table representing a card deck with 4 cards that each have a unique ID. Now I want to look for a specific card ID in the table and find out which card in the deck it is.
card1   | card2   | card3   | card4
--------|---------|---------|--------
cardID1 | cardID2 | cardID3 | cardID4
If my table looked like this, for example, I would like to do something like:
SELECT column_name WHERE cardID3 IN (card1, card2, card3, card4)
Looking for an answer, I found this: SQL Server : return column names based on a record's value
but this doesn't seem to work for PostgreSQL.
SQL Server's cross apply is the SQL standard cross join lateral.
SELECT Cname
FROM decks
CROSS join lateral (VALUES('card1',card1),
('card2',card2),
('card3',card3),
('card4',card4)) ca (cname, data)
WHERE data = 3
Demonstration.
However, the real problem is the design of your table. In general, if you have col1, col2, col3... you should instead be using a join table.
create table cards (
id serial primary key,
value text
);
create table decks (
id serial primary key
);
create table deck_cards (
deck_id integer not null references decks,
card_id integer not null references cards,
position integer not null check(position > 0),
-- Can't have the same card in a deck twice.
unique(deck_id, card_id),
-- Can't have two cards in the same position twice.
unique(deck_id, position)
);
insert into cards(id, value) values (1, 'KH'), (2, 'AH'), (3, '9H'), (4, 'QH');
insert into decks values (1), (2);
insert into deck_cards(deck_id, card_id, position) values
(1, 1, 1), (1, 3, 2),
(2, 1, 1), (2, 4, 2), (2, 2, 3);
We've made sure a deck can't contain the same card twice, nor two cards in the same position.
-- Can't insert the same card.
insert into deck_cards(deck_id, card_id, position) values (1, 1, 3);
-- Can't insert the same position
insert into deck_cards(deck_id, card_id, position) values (2, 3, 3);
You can query a card's position directly.
select deck_id, position from deck_cards where card_id = 3
There is also no arbitrary limit on the number of cards in a deck; if you need one, you can enforce it with a trigger.
Demonstration.
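As a rough sketch of the trigger mentioned above, assuming PostgreSQL 11 or later and a hypothetical limit of 52 cards per deck (the function name, trigger name, and limit are illustrative and not part of the original answer):

```sql
-- Reject an insert when the target deck already holds the maximum number of cards.
create function check_deck_size() returns trigger as $$
begin
    -- 52 is an assumed limit; replace it with whatever your game requires.
    if (select count(*) from deck_cards where deck_id = new.deck_id) >= 52 then
        raise exception 'deck % already holds the maximum number of cards', new.deck_id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger deck_size_limit
    before insert on deck_cards
    for each row execute function check_deck_size();
```

On versions before 11, `execute procedure` takes the place of `execute function`. This sketch only covers inserts; an update that moves a card to another deck would need the same check, and concurrent inserts into one deck could still slip past the count unless the deck row is locked first.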
This is a rather bad idea. Column names belong to the database structure, not to the data. So you can select IDs and names stored as data, but you should not have to select column names. And actually a user using your app should not be interested in column names; they can be rather technical.
It would probably be a good idea to change the data model and store card names along with the IDs, but of course I don't know exactly how you want to work with your data.
Anyway, if you want to stick with your current database design, you can still select those names, by including them in your query:
select
case when card1 = 123 then 'card1'
when card2 = 123 then 'card2'
when card3 = 123 then 'card3'
when card4 = 123 then 'card4'
end as card_column
from cardtable
where 123 in (card1, card2, card3, card4);

Select rows so that two of the columns are separately unique

Table user_book describes every user's favorite books.
CREATE TABLE user_book (
user_id INT,
book_id INT,
FOREIGN KEY (user_id) REFERENCES user(id),
FOREIGN KEY (book_id) REFERENCES book(id)
);
insert into user_book (user_id, book_id) values
(1, 1),
(1, 2),
(1, 5),
(2, 2),
(2, 5),
(3, 2),
(3, 5);
I want to write a query (possibly a WITH clause that defines multiple statements, but not a procedure) that would try to distribute ONE favorite book to every user who has one or more favorite books.
Any ideas how to do it?
More details:
The distribution plan may be naive. i.e. it may look as if you went user after user and each time randomly gave the user whatever favorite book was still available if there was any, without considering what would be left for the remaining users.
This means that sometimes some books may not be distributed, and/or sometimes some users may not get any book (example 2). This can happen when the numbers of books and users are not equal, and/or due to the specific distribution order that you have used.
A book cannot be distributed to two different users (example 3).
Examples:
1. A possible distribution:
(1, 1)
(2, 2)
(3, 5)
2. A possible distribution (here user 3 got nothing, and book 1 was not distributed. That's acceptable):
(1, 2)
(2, 5)
3. An impossible distribution (both users 1 and 2 got book 2, that's not allowed):
(1, 2)
(2, 2)
(3, 5)
Similar questions that are not exactly this one:
How to select records without duplicate on just one field in SQL?
SQL: How do I SELECT only the rows with a unique value on certain column?
How to select unique records by SQL
The user_book table should also have a UNIQUE(user_id, book_id) constraint.
A simple solution like this returns a list in which each user gets zero or one book and each book is given to zero or one user:
WITH list AS (SELECT user_id, MIN(book_id) AS fav_book FROM user_book GROUP BY user_id)
SELECT fav_book, MIN(user_id) FROM list GROUP BY fav_book
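To see what this does with the sample rows from the question, here is the same query laid out with a trace in comments. The column aliases in the outer query are added for readability only.

```sql
-- Step 1 (the CTE "list"): each user's lowest-numbered favorite book.
--   user 1 -> book 1, user 2 -> book 2, user 3 -> book 2
-- Step 2 (outer query): for each chosen book, keep the lowest-numbered user.
--   book 1 -> user 1, book 2 -> user 2
WITH list AS (
    SELECT user_id, MIN(book_id) AS fav_book
    FROM user_book
    GROUP BY user_id
)
SELECT MIN(user_id) AS user_id, fav_book AS book_id
FROM list
GROUP BY fav_book;
-- result: (1, 1) and (2, 2)
```

User 3 gets nothing and book 5 stays undistributed even though user 3 listed it. The question allows such naive outcomes, but the result is deterministic rather than random.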

How do NOT EXISTS and correlated subqueries work internally

I would like to understand how NOT EXISTS works in a correlated subquery.
This query returns the patient who takes all the medications, but I don't understand why.
Could someone please explain what's happening in each step of execution of this query and which records are being considered and dismissed in each step.
create table medication
(
idmedic INT PRIMARY KEY,
name VARCHAR(20),
dosage NUMERIC(8,2)
);
create table patient
(
idpac INT PRIMARY KEY,
name VARCHAR(20)
);
create table prescription
(
idpac INT,
idmedic INT,
date DATE,
time TIME,
FOREIGN KEY (idpac) REFERENCES patient(idpac),
FOREIGN KEY (idmedic) REFERENCES medication(idmedic)
);
insert into patient (idpac, name)
values (1, 'joe'), (2, 'tod'), (3, 'ric');
insert into medication (idmedic, name, dosage)
values (1, 'tilenol', 0.01), (2, 'omega3', 0.02);
insert into prescription (idpac, idmedic, date, time)
values (1, 1, '2018-01-01', '20:00'), (1, 2, '2018-01-01', '20:00'),
(2, 2, '2018-01-01', '20:00');
select
pa.name
from
patient pa
where
not exists (select 1 from medication me
where not exists (select 1
from prescription pr
where pr.idpac = pa.idpac
and pr.idmedic = me.idmedic))
Your query is trying to find:
all the patients who TAKE ALL medications.
I have rewritten your script, to find
all the patients who have NOT TAKEN ANY medications.
-- This returns 1 row, patient ric:
-- all the patients who have not taken any medications
select
pa.name
from
patient pa
where
not exists (select 1 from medication me
where /**** not ****/ exists (select 1
from prescription pr
where pr.idpac = pa.idpac
and pr.idmedic = me.idmedic))
DEMO:
Here is a SQL Fiddle for it.
I think that this query will clarify the usage of EXISTS operator to you.
If not, try to think of sub-queries as JOINs and EXISTS/NOT EXISTS as WHERE conditions.
EXISTS operator is explained as "Specifies a subquery to test for the existence of rows".
You could also check the examples on learn.microsoft.com.
If you see a doubly nested "not exists" in a query that's usually an indication that relational division is being performed (google that, you'll find plenty of stuff).
Translating the query clause by clause into informal language yields something like :
get patients
for which there does not exist
a medication
for which there does not exist
a prescription for that patient to take that medication
which translates to
patients for which there is no medication they don't take.
which translates to
patients that take all medications.
Relational division is the relational algebra operator that corresponds to universal quantification in predicate logic.
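Relational division is often written with counting instead of a double NOT EXISTS. The following sketch uses the tables and sample data from the question and is an alternative formulation, not the asker's query:

```sql
-- A patient takes all medications when the number of distinct medications
-- prescribed to them equals the total number of medications.
select pa.name
from patient pa
join prescription pr on pr.idpac = pa.idpac
group by pa.idpac, pa.name
having count(distinct pr.idmedic) = (select count(*) from medication);
-- with the sample data this returns joe
```

The two forms differ in one edge case: if the medication table is empty, the double NOT EXISTS query returns every patient (there is no medication they fail to take), while the counting query returns none, because patients without prescriptions never survive the join.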

How can I intersect two ActiveRecord::Relations on an arbitrary column?

If I have a people table with the following structure and records:
drop table if exists people;
create table people (id int, name varchar(255));
insert into people values (1, "Amy");
insert into people values (2, "Bob");
insert into people values (3, "Chris");
insert into people values (4, "Amy");
insert into people values (5, "Bob");
insert into people values (6, "Chris");
I'd like to find the intersection of people with ids (1, 2, 3) and (4, 5, 6) based on the name column.
In SQL, I'd do something like this:
select
group_concat(id),
group_concat(name)
from people
group by name;
Which returns this result set:
id | name
----|----------
1,4 | Amy,Amy
2,5 | Bob,Bob
3,6 | Chris,Chris
In Rails, I'm not sure how to solve this.
My closest so far is:
a = Model.where(id: [1, 2, 3])
b = Model.where(id: [4, 5, 6])
a_results = a.where(name: b.pluck(:name)).order(:name)
b_results = b.where(name: a.pluck(:name)).order(:name)
a_results.zip(b_results)
This seems to work, but I have the following reservations:
Performance - is this going to perform well in the database?
Lazy enumeration - does calling #zip break lazy enumeration of records?
Duplicates - what will happen if either set contains more than one record for a given name? What will happen if a set contains more than one of the same id?
Any thoughts or suggestions?
Thanks
You can use your normal sql method to get this arbitrary column in ruby like so:
@people = People.select("group_concat(id) as somecolumn1, group_concat(name) as somecolumn2").group(:name)
For each record in @people you will now have somecolumn1/2 attributes.
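If the goal is to pair each record from the first set with the matching record from the second, a self-join lets the database do the work in a single statement. This is a sketch in plain SQL against the people table from the question; in Rails it could be run through something like find_by_sql or expressed with joins, which is left as an assumption here.

```sql
-- Pair people with ids 1-3 (alias a) with people with ids 4-6 (alias b) that share a name.
select a.id as a_id, b.id as b_id, a.name
from people a
join people b on b.name = a.name
where a.id in (1, 2, 3)
  and b.id in (4, 5, 6)
order by a.name;
-- with the sample data: (1, 4, Amy), (2, 5, Bob), (3, 6, Chris)
```

If either set holds several rows with the same name, the join returns every combination, which makes duplicates visible instead of silently misaligning them the way zip can.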

A Simple Sql Select Query

I know I am sounding dumb but I really need help on this.
I have a Table (let's say Meeting) which Contains a column Participants.
The Participants dataType is varchar(Max) and it stores Participant's Ids in comma separated form like 1,2.
Now my problem is that I am passing a parameter called @ParticipantsID into my stored procedure and want to do something like this:
Select Participants from Meeting where Participants in (@ParticipantsID)
Unfortunately I am missing something crucial here.
Can some one point that out?
I've been there before... I changed the DB design to have one record contain a single reference to the other table. If you can't change your DB structures and you have to live with this, I found this solution on CodeProject.
New Function
IF EXISTS(SELECT * FROM sysobjects WHERE ID = OBJECT_ID('UF_CSVToTable'))
DROP FUNCTION UF_CSVToTable
GO
CREATE FUNCTION UF_CSVToTable
(
@psCSString VARCHAR(8000)
)
RETURNS @otTemp TABLE(sID VARCHAR(20))
AS
BEGIN
DECLARE @sTemp VARCHAR(10)
WHILE LEN(@psCSString) > 0
BEGIN
SET @sTemp = LEFT(@psCSString, ISNULL(NULLIF(CHARINDEX(',', @psCSString) - 1, -1),
LEN(@psCSString)))
SET @psCSString = SUBSTRING(@psCSString,ISNULL(NULLIF(CHARINDEX(',', @psCSString), 0),
LEN(@psCSString)) + 1, LEN(@psCSString))
INSERT INTO @otTemp VALUES (@sTemp)
END
RETURN
END
GO
New Sproc
SELECT *
FROM
TblJobs
WHERE
iCategoryID IN (SELECT * FROM UF_CSVToTable(@sCategoryID))
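On newer servers a hand-written splitter may not be needed. The sketch below assumes SQL Server 2016 or later (database compatibility level 130 or higher), where the built-in STRING_SPLIT function is available, and applies it to the Meeting table from the question. The variable name and value are hypothetical.

```sql
DECLARE @ParticipantID int = 2;  -- hypothetical ID to look for

-- Split each row's comma-separated list and test for an exact match on one element.
SELECT m.Participants
FROM Meeting AS m
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(m.Participants, ',') AS s
    WHERE s.value = CAST(@ParticipantID AS varchar(20))
);
```

This assumes the IDs are stored without surrounding spaces. Like any string-based approach it has to scan and split every row, so it does not replace a properly normalized table.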
You would not typically organise your SQL database in quite this way. What you are describing are two entities (Meeting & Participant) that have a many-to-many relationship, i.e. a meeting can have zero or more participants, and a participant can attend zero or more meetings. To model this in SQL you would use three tables: a meeting table, a participant table and a MeetingParticipant table. The MeetingParticipant table holds the links between meetings & participants. So, you might have something like this (excuse any SQL syntax errors):
create table Meeting
(
MeetingID int,
Name varchar(50),
Location varchar(100)
)
create table Participant
(
ParticipantID int,
FirstName varchar(50),
LastName varchar(50)
)
create table MeetingParticipant
(
MeetingID int,
ParticipantID int
)
To populate these tables you would first create some Participants:
insert into Participant(ParticipantID, FirstName, LastName) values(1, 'Tom', 'Jones')
insert into Participant(ParticipantID, FirstName, LastName) values(2, 'Dick', 'Smith')
insert into Participant(ParticipantID, FirstName, LastName) values(3, 'Harry', 'Windsor')
and create a Meeting or two
insert into Meeting(MeetingID, Name, Location) values(10, 'SQL Training', 'Room 1')
insert into Meeting(MeetingID, Name, Location) values(11, 'SQL Training', 'Room 2')
and now add some participants to the meetings
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 1)
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 3)
Now you can select all the meetings and the participants for each meeting with
select m.MeetingID, p.ParticipantID, m.Location, p.FirstName, p.LastName
from Meeting m
join MeetingParticipant mp on m.MeetingID=mp.MeetingID
join Participant p on mp.ParticipantID=p.ParticipantID
the above should produce
MeetingID ParticipantID Location FirstName LastName
10 1 Room 1 Tom Jones
10 2 Room 1 Dick Smith
11 2 Room 2 Dick Smith
11 3 Room 2 Harry Windsor
If you want to find out all the meetings that "Dick Smith" is in you would write something like this
select m.MeetingID, m.Location
from Meeting m join MeetingParticipant mp on m.MeetingID=mp.MeetingID
where
mp.ParticipantID=2
and get
MeetingID Location
10 Room 1
11 Room 2
I have omitted important things like indexes, primary keys and missing attributes such as meeting dates, but it is clearer without all the goo.
Your table is not normalized. If you want to query for individual participants, they should be split into their own table, along the lines of:
Meeting
MeetingId primary key
Other stuff
Persons
PersonId primary key
Other stuff
Participants
MeetingId foreign key Meeting(MeetingId)
PersonId foreign key Persons(PersonId)
primary key MeetingId,PersonId
Otherwise, you have to resort to all sorts of trickery (what I call SQL gymnastics) to find out what you want. That trickery never scales well - your queries become slow very quickly as the table grows.
With a properly normalized database, the queries can remain fast well into the multi-millions of records (I work with DB2/z where we are used to truly huge tables).
There are sometimes valid reasons for reverting to second normal form (or even first) for performance, but that should be a carefully considered decision based on actual performance data. All databases should initially start off in 3NF.
SELECT * FROM Meeting WHERE Participants LIKE '%,12,%' OR Participants LIKE '12,%' OR Participants LIKE '%,12' OR Participants = '12'
where 12 is the ID you are looking for....
Ugly, what a nasty model.
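The separate LIKE patterns above can be folded into one by wrapping both the stored list and the search value in delimiters, so that the first, last, and only elements all look like middle elements. This is a sketch for SQL Server; the variable name and value are hypothetical, and it assumes the list contains no spaces.

```sql
DECLARE @ParticipantID int = 12;  -- hypothetical ID to look for

-- ',1,12,30,' LIKE '%,12,%' matches; ',112,30,' does not.
SELECT *
FROM Meeting
WHERE ',' + Participants + ',' LIKE '%,' + CAST(@ParticipantID AS varchar(20)) + ',%';
```

It is still a full scan with string manipulation on every row, so the remark about the model applies here too.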
If I understand your question correctly, you are trying to pass in a comma-separated list of participant IDs and see if it is in your list. This link lists several ways to do such a thing:
http://vyaskn.tripod.com/passing_arrays_to_stored_procedures.htm
codezy.blogspot.com
If you store the participant ids in a comma-separated list (as text) in the database, you cannot easily query it (as a list) using SQL. You would have to resort to string-operations.
You should consider changing your schema to use another table to map meetings to participants:
create table meeting_participants (
meeting_id integer not null , -- foreign key
participant_id integer not null
);
That table would have multiple rows per meeting (one for each participant).
You can then query that table for individual participants, or number of participants, and such.
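For instance, with the meeting_participants table defined above, the lookups mentioned could be sketched as follows (the ID values are placeholders):

```sql
-- All meetings a given participant attends.
select meeting_id
from meeting_participants
where participant_id = 12;

-- All participants of a given meeting.
select participant_id
from meeting_participants
where meeting_id = 1;

-- Number of participants per meeting.
select meeting_id, count(*) as participant_count
from meeting_participants
group by meeting_id;
```

None of these need string operations, and an index on participant_id or meeting_id can serve them directly.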
If participants is a separate data type you should be storing it as a child table of your meeting table. e.g.
MEETING
PARTICIPANT 1
PARTICIPANT 2
PARTICIPANT 3
Each participant would hold the meeting ID so you can do a query
SELECT * FROM participants WHERE meeting_id = 1
However, if you must store a comma separated list (for some external reason) then you can do a string search to find the appropriate record. This would be a very inefficient way to do a query though.
That is not the best way to store the information you have.
If it is all you have got then you need to be doing a contains (not an IN). The best answer is to have another table that links Participants to Meetings.
Try SELECT Meeting, Participants FROM Meeting WHERE CONTAINS(Participants, @ParticipantId) (note that CONTAINS requires a full-text index on the column).