Prolog to SQL: Any way to improve SQL code for unit tests and fix an edge case elegantly?

Inspired by this StackOverflow question:
Find mutual element in different facts in swi-prolog
We have the following
Problem statement
Given a database of "actors starring in movies"
(starsin is the relation linking actor "bob" to movie "a" for example)
starsin(a,bob).
starsin(c,bob).
starsin(a,maria).
starsin(b,maria).
starsin(c,maria).
starsin(a,george).
starsin(b,george).
starsin(c,george).
starsin(d,george).
And given a set of movies M, find those actors that starred in all the movies of M.
The question was initially for Prolog.
Prolog solution
In Prolog, an elegant solution involves the predicate setof/3, which collects possible
variable instantiations into a set (which is really a sorted list without duplicate values):
actors_appearing_in_movies(MovIn,ActOut) :-
setof(
Ax,
MovAx^(setof(Mx,starsin(Mx,Ax),MovAx), subset(MovIn,MovAx)),
ActOut
).
I won't go into details about this, but let's look at the test code, which is of interest here.
Here are five test cases:
actors_appearing_in_movies([],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a,b],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c,d],ActOut),permutation([george],ActOut),!.
A test is a call to the predicate actors_appearing_in_movies/2, which is given
the input list of movies (e.g. [a,b]) and which captures the resulting list of
actors in ActOut.
Subsequently, we just need to test whether ActOut is a permutation of the expected
set of actors, for example:
permutation([george, maria],ActOut)
"Is ActOut a list that is a permutation of the list [george,maria]?"
If that call succeeds (think: doesn't return with false), the test passes.
The terminal ! is the cut operator and is used to tell the Prolog engine to not
reattempt to find more solutions, because we are good at that point.
Note that for the empty set of movies, we get all the actors. This is arguably correct:
every actor stars in all the movies of the empty set (vacuous truth).
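To pin down the intended semantics, including the empty-set case, here is a plain-Python model of the Prolog predicate. It is only a sketch: a per-actor set stands in for the inner setof/3, Python's `<=` on sets stands in for subset/2, and backtracking is not modelled.

```python
# Facts from the problem statement as (movie, actor) pairs, like starsin/2.
STARSIN = [
    ("a", "bob"), ("c", "bob"),
    ("a", "maria"), ("b", "maria"), ("c", "maria"),
    ("a", "george"), ("b", "george"), ("c", "george"), ("d", "george"),
]


def actors_appearing_in_movies(movies_in):
    """Return the sorted list of actors who star in every movie of movies_in."""
    # Inner setof/3: the set of movies each actor appears in.
    movies_of = {}
    for movie, actor in STARSIN:
        movies_of.setdefault(actor, set()).add(movie)
    wanted = set(movies_in)
    # subset/2: keep the actor if every wanted movie is among theirs.
    # The empty set is a subset of every set, so [] selects all actors.
    # sorted() mirrors the ordered, duplicate-free list setof/3 produces.
    return sorted(actor for actor, movies in movies_of.items() if wanted <= movies)


if __name__ == "__main__":
    for case in ([], ["a"], ["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]):
        print(case, actors_appearing_in_movies(case))
```

The five calls in the `__main__` block correspond to the five Prolog test cases above.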
Now in SQL.
This problem is squarely in the domain of relational algebra, which is what SQL is built on, so let's have
a go at it. Here, I'm using MySQL.
First, set up the facts.
DROP TABLE IF EXISTS starsin;
CREATE TABLE starsin (movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL);
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Regarding the set of movies given as input, giving them in the form of a
(temporary) table sounds natural. In MySQL, "temporary tables" are local to the session. Good.
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
Approach:
The results can now be obtained by getting, for each actor, the intersection of the set of
movies denoted by movies_in and the set of movies in which an actor ever appeared
(created for each actor via the inner join), then counting (for each actor) whether the
resulting set has at least as many entries as the set movies_in.
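Wrap the query into a procedure for practical reasons.
A custom delimiter is needed here, so that the client does not end the CREATE PROCEDURE statement at the first ; inside its body: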
DELIMITER $$
DROP PROCEDURE IF EXISTS actors_appearing_in_movies;
CREATE PROCEDURE actors_appearing_in_movies()
BEGIN
SELECT
d.actor
FROM
starsin d, movies_in q
WHERE
d.movie = q.movie
GROUP BY
actor
HAVING
COUNT(*) >= (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Run it!
Problem A appears:
Is there a better way than editing and copy-pasting the table creation code,
issuing a CALL and checking the results "by hand"?
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
CALL actors_appearing_in_movies();
Empty set!
Problem B appears:
The above is not desired, I want "all actors", same as for the Prolog solution.
As I do not want to tack a weird edge case exception onto the code, my approach must
be wrong. Is there one which naturally covers this case but doesn't become too complex?
T-SQL and PostgreSQL one-liners are fine too!
The other test cases yield expected data:
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c"), ("d");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
+--------+
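The question's inner-join query can be reproduced with Python's built-in sqlite3 module, used here as a stand-in for MySQL (SQLite has no stored procedures, so the query is run directly; the helper names are illustrative). The sketch reproduces the results above and also Problem B: with an empty movies_in the inner join yields no rows at all, so HAVING has nothing to accept even though COUNT(*) >= 0 would be trivially true.

```python
import sqlite3

STARSIN = [
    ("a", "bob"), ("c", "bob"),
    ("a", "maria"), ("b", "maria"), ("c", "maria"),
    ("a", "george"), ("b", "george"), ("c", "george"), ("d", "george"),
]

# Same logic as the MySQL procedure; ORDER BY is added only for stable output.
INNER_JOIN_QUERY = """
    SELECT d.actor
    FROM starsin d
    JOIN movies_in q ON d.movie = q.movie
    GROUP BY d.actor
    HAVING COUNT(*) >= (SELECT COUNT(*) FROM movies_in)
    ORDER BY d.actor
"""


def run_inner_join(movies):
    """Load the facts and the given input movies, then run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL)")
        con.executemany("INSERT INTO starsin VALUES (?, ?)", STARSIN)
        con.execute("CREATE TEMP TABLE movies_in (movie TEXT NOT NULL)")
        con.executemany("INSERT INTO movies_in VALUES (?)", [(m,) for m in movies])
        return [row[0] for row in con.execute(INNER_JOIN_QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    print(run_inner_join(["a", "b"]))  # actors in both a and b
    print(run_inner_join([]))          # the empty-input edge case (Problem B)
```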

And given a set of movies M, find those actors that starred in all the movies of M.
I would use:
select si.actor
from starsin si
where si.movie in (<M>)
group by si.actor
having count(*) = <n>;
If you have to deal with an empty set, then you need a left join:
select a.actor
from actors a left join
starsin si
on a.actor = si.actor and si.movie in (<M>)
group by a.actor
having count(si.movie) = <n>;
<n> here is the number of movies in <M>.
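Here is the left-join version run against the same data in SQLite through Python's sqlite3. It is a sketch under assumptions: the answer presumes a separate actors table, which is replaced here by a derived table of the distinct actors in starsin. Every actor survives the outer join with at least one row, so an empty input leaves COUNT(si.movie) = 0 for every actor, which equals the input size of 0. The counting only works if (movie, actor) pairs are unique and movies_in has no duplicates, hence the primary keys.

```python
import sqlite3

STARSIN = [
    ("a", "bob"), ("c", "bob"),
    ("a", "maria"), ("b", "maria"), ("c", "maria"),
    ("a", "george"), ("b", "george"), ("c", "george"), ("d", "george"),
]

LEFT_JOIN_QUERY = """
    SELECT a.actor
    FROM (SELECT DISTINCT actor FROM starsin) AS a   -- stand-in for an actors table
    LEFT JOIN starsin si
      ON si.actor = a.actor
     AND si.movie IN (SELECT movie FROM movies_in)
    GROUP BY a.actor
    -- COUNT(si.movie) ignores the all-NULL row kept for actors with no match
    HAVING COUNT(si.movie) = (SELECT COUNT(*) FROM movies_in)
    ORDER BY a.actor
"""


def run_left_join(movies):
    con = sqlite3.connect(":memory:")
    try:
        # Primary keys keep the counts honest: no duplicate facts or inputs.
        con.execute("CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL, "
                    "PRIMARY KEY (movie, actor))")
        con.executemany("INSERT INTO starsin VALUES (?, ?)", STARSIN)
        con.execute("CREATE TEMP TABLE movies_in (movie TEXT PRIMARY KEY)")
        con.executemany("INSERT INTO movies_in VALUES (?)", [(m,) for m in movies])
        return [row[0] for row in con.execute(LEFT_JOIN_QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    print(run_left_join([]))          # every actor, as in the Prolog version
    print(run_left_join(["a", "b"]))
```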
Update: The second approach in extended form
create or replace temporary table
actor (actor char(20) primary key)
as select distinct actor from starsin;
select
a.actor,
si.actor,si.movie -- left in for documentation; si.movie is not aggregated, so ONLY_FULL_GROUP_BY rejects it
from
actor a left join starsin si
on a.actor = si.actor
and si.movie in (select * from movies_in)
group
by a.actor
having
count(si.movie) = (select count(*) from movies_in);
Then for empty movies_in:
+--------+-------+-------+
| actor | actor | movie |
+--------+-------+-------+
| bob | NULL | NULL |
| george | NULL | NULL |
| maria | NULL | NULL |
+--------+-------+-------+
and for this movies_in for example:
+-------+
| movie |
+-------+
| a |
| b |
+-------+
Here, movie is simply taken from one row of each group (the server may pick any row of the group):
+--------+--------+-------+
| actor | actor | movie |
+--------+--------+-------+
| george | george | a |
| maria | maria | a |
+--------+--------+-------+

The following solution involves counting and an UPDATE
Writeup here: A Simple Relational Database Operation
We are using MariaDB/MySQL SQL.
T-SQL or PL/SQL are more complete.
Manual page for CREATE TABLE
Manual page for CREATE PROCEDURE
Manual page for data types in MariaDB
Note that MariaDB procedures cannot take array ("vector") or table-valued parameters, so we have to work without them.
Enter facts as table:
CREATE OR REPLACE TABLE starsin
(movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL,
PRIMARY KEY (movie, actor));
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Enter a procedure that computes the solution and returns it as a result set:
DELIMITER $$
CREATE OR REPLACE PROCEDURE actors_appearing_in_movies()
BEGIN
-- collect all the actors
CREATE OR REPLACE TEMPORARY TABLE tmp_actor (actor CHAR(20) PRIMARY KEY)
AS SELECT DISTINCT actor from starsin;
-- table of "all actors x (input movies + '--' placeholder)"
-- (combinations that are needed for an actor to show up in the result)
-- and a flag indicating whether that combination shows up for real
CREATE OR REPLACE TEMPORARY TABLE tmp_needed
(actor CHAR(20),
movie CHAR(20),
actual TINYINT NOT NULL DEFAULT 0,
PRIMARY KEY (actor, movie))
AS
(SELECT ta.actor, mi.movie FROM tmp_actor ta, movies_in mi)
UNION
(SELECT ta.actor, "--" FROM tmp_actor ta);
-- SELECT * FROM tmp_needed;
-- Mark those (actor, movie) combinations which actually exist
-- with a numeric 1
UPDATE tmp_needed tn SET actual = 1 WHERE EXISTS
(SELECT * FROM starsin si WHERE
si.actor = tn.actor AND si.movie = tn.movie);
-- SELECT * FROM tmp_needed;
-- The result is the set of actors in "tmp_needed" which have as many
-- entries flagged "actual" as there are entries in "movies_in"
SELECT actor FROM tmp_needed GROUP BY actor
HAVING SUM(actual) = (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Testing
There is no ready-to-use unit testing framework for MariaDB, so we
"test by hand": we write a procedure whose output we check manually.
Variadic arguments and array data types don't exist, so let's accept up to 4 movies
as input (NULL meaning "no movie") and check the result manually.
DELIMITER $$
CREATE OR REPLACE PROCEDURE
test_movies(IN m1 CHAR(20),IN m2 CHAR(20),IN m3 CHAR(20),IN m4 CHAR(20))
BEGIN
CREATE OR REPLACE TEMPORARY TABLE movies_in (movie CHAR(20) PRIMARY KEY);
CREATE OR REPLACE TEMPORARY TABLE args (movie CHAR(20));
INSERT INTO args VALUES (m1),(m2),(m3),(m4); -- contains duplicates and NULLs
INSERT INTO movies_in (SELECT DISTINCT movie FROM args WHERE movie IS NOT NULL); -- clean
DROP TABLE args;
CALL actors_appearing_in_movies();
END$$
DELIMITER ;
The above passes all the manual tests, in particular:
CALL test_movies(NULL,NULL,NULL,NULL);
+--------+
| actor |
+--------+
| bob |
| george |
| maria |
+--------+
3 rows in set (0.003 sec)
For example, for CALL test_movies("a","b",NULL,NULL);
First, set up the table pairing every actor with every movie in the input set, plus the
"doesn't exist" movie represented by the placeholder --.
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 0 | bob | a |
| 0 | bob | b |
| 0 | george | -- |
| 0 | george | a |
| 0 | george | b |
| 0 | maria | -- |
| 0 | maria | a |
| 0 | maria | b |
+--------+--------+-------+
Then mark those rows with a 1 where the actor-movie combination actually exists in starsin.
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 1 | bob | a |
| 0 | bob | b |
| 0 | george | -- |
| 1 | george | a |
| 1 | george | b |
| 0 | maria | -- |
| 1 | maria | a |
| 1 | maria | b |
+--------+--------+-------+
Finally select an actor for inclusion in the solution if the SUM(actual) is equal to the
number of entries in the input movies table (it cannot be larger), as that means that the
actor indeed appears in all movies of the input movies table. In the special case where that
table is empty, the actor-movie combination table will only contain
+--------+--------+-------+
| actual | actor | movie |
+--------+--------+-------+
| 0 | bob | -- |
| 0 | george | -- |
| 0 | maria | -- |
+--------+--------+-------+
and thus all actors will be selected, which is what we want.
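The placeholder technique and the wish for repeatable tests (Problem A) can be combined outside the server. The sketch below ports the procedure's steps to SQLite through Python's sqlite3 and drives them with the five Prolog test cases. It is a translation under assumptions, not the MariaDB code itself: CREATE OR REPLACE becomes DROP/CREATE, the procedure becomes a Python function, INSERT OR IGNORE replaces the DISTINCT/NULL cleanup of test_movies(), and comparing Python sets plays the role of permutation/2.

```python
import sqlite3

STARSIN = [
    ("a", "bob"), ("c", "bob"),
    ("a", "maria"), ("b", "maria"), ("c", "maria"),
    ("a", "george"), ("b", "george"), ("c", "george"), ("d", "george"),
]


def actors_appearing_in_movies(con):
    """Port of the procedure: build tmp_needed, flag real pairs, then count."""
    con.executescript("""
        DROP TABLE IF EXISTS tmp_needed;
        CREATE TEMP TABLE tmp_needed (
            actor  TEXT NOT NULL,
            movie  TEXT NOT NULL,
            actual INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (actor, movie)
        );
        -- all actors x input movies, plus one '--' placeholder row per actor
        INSERT INTO tmp_needed (actor, movie)
            SELECT a.actor, mi.movie
            FROM (SELECT DISTINCT actor FROM starsin) AS a CROSS JOIN movies_in AS mi
            UNION
            SELECT DISTINCT actor, '--' FROM starsin;
        -- flag the (actor, movie) combinations that really exist
        UPDATE tmp_needed SET actual = 1
        WHERE EXISTS (SELECT 1 FROM starsin si
                      WHERE si.actor = tmp_needed.actor AND si.movie = tmp_needed.movie);
    """)
    rows = con.execute("""
        SELECT actor FROM tmp_needed GROUP BY actor
        HAVING SUM(actual) = (SELECT COUNT(*) FROM movies_in)
    """)
    return [r[0] for r in rows]


def run_case(movies):
    """Fresh in-memory database per test case, so cases cannot interfere."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE starsin (movie TEXT NOT NULL, actor TEXT NOT NULL, "
                    "PRIMARY KEY (movie, actor))")
        con.executemany("INSERT INTO starsin VALUES (?, ?)", STARSIN)
        con.execute("CREATE TEMP TABLE movies_in (movie TEXT PRIMARY KEY)")
        # INSERT OR IGNORE drops duplicate inputs.
        con.executemany("INSERT OR IGNORE INTO movies_in VALUES (?)", [(m,) for m in movies])
        return actors_appearing_in_movies(con)
    finally:
        con.close()


# The five Prolog test cases with their expected actor sets.
CASES = [
    ([], {"bob", "george", "maria"}),
    (["a"], {"bob", "george", "maria"}),
    (["a", "b"], {"george", "maria"}),
    (["a", "b", "c"], {"george", "maria"}),
    (["a", "b", "c", "d"], {"george"}),
]

if __name__ == "__main__":
    for movies, expected in CASES:
        got = set(run_case(movies))
        print("ok  " if got == expected else "FAIL", movies, sorted(got))
```

Adding a case is one more line in CASES, which removes the edit, copy-paste and CALL cycle.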

Related

Making combinations of attributes unique in PostgreSQL

There is an option in PostgreSQL to declare a constraint that makes multiple attributes of a table unique together:
UNIQUE (A, B, C)
Is it possible to take attributes from multiple tables and make their entire combination unique in some way?
Edit:
Table 1: List of Book
Attributes: ID, Title, Year, Publisher
Table 2: List of Author
Attributes: Name, ID
Table 3: Written By: Relation between Book and Author
Attributes: Book_ID, Author_ID
Now I have a situation where I don't want (Title, Year, Publisher, Authors) to be repeated anywhere in my database.
There are 4 possible solutions to this problem:
1. Add a column "authorID" to the table "book", as a foreign key. You can then add the UNIQUE constraint to the table "book".
2. Have a foreign key on the 2 columns (bookID, authorID) which references the table bookAuthor.
3. Create an insert trigger on the table "book" which checks whether the combination exists and does not insert if it does. You will find a working example of this option below.
Whilst working on this option I realised that the JOIN to WrittenBy must be done on Title and not ID. Otherwise we can record the same book as many times as we like just by using a new ID. The problem with using the title is that the slightest change in spelling or punctuation means that it is treated as a new title.
In the example, the 3rd insert has failed because the book already exists. In the 4th, I have left 2 spaces in "Tom Sawyer" and it is accepted as a different title.
Also, as we use a join to find the author, the real effect of our rule is exactly the same as if we had a UNIQUE constraint on the table Book on the columns Title, Year and Publisher. This means that all that I have coded is a waste of time.
We thus decide, after coding it, that this option is not effective.
4. Create a fourth table with the 4 columns and a UNIQUE constraint on all 4. This seems a heavy solution compared to option 1.
CREATE TABLE Book (
ID int primary key,
Title varchar(25),
Year int,
Publisher varchar(10) );
CREATE TABLE Author (
ID int primary key,
Name varchar(10)
);
CREATE TABLE WrittenBy(
Book_ID int primary key,
Titlew varchar(25),
Author_ID int
);
CREATE FUNCTION book_insert_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
DECLARE
authID INTEGER;
coun INTEGER;
BEGIN
IF pg_trigger_depth() <> 1 THEN
RETURN NEW;
END IF;
SELECT MAX(Author_ID) into authID
FROM WrittenBy w
WHERE w.Titlew = NEW.Title;
SELECT COUNT(*) INTO coun FROM
Book b LEFT JOIN WrittenBy w ON
b.Title = w.Titlew
WHERE NEW.year = b.year
AND NEW.title=b.title
AND NEW.publisher=b.publisher
AND authID = COALESCE(w.Author_ID,authID);
IF coun > 0 THEN
RETURN null; -- this means that we do not insert
ELSE
RETURN NEW;
END IF;
END;
$$
;
CREATE TRIGGER book_insert_trigger
BEFORE INSERT
ON Book
FOR EACH ROW
EXECUTE PROCEDURE book_insert_trigger_function();
INSERT INTO WrittenBy VALUES
(1,'Tom Sawyer',1),
(2,'Huckleberry Finn',1);
INSERT INTO Book VALUES (1,'Tom Sawyer',1950,'Classics');
INSERT INTO Book VALUES (2,'Huckleberry Finn',1950,'Classics');
INSERT INTO Book VALUES (3,'Tom Sawyer',1950,'Classics');
INSERT INTO Book VALUES (3,'Tom  Sawyer',1950,'Classics'); -- two spaces: accepted as a new title
SELECT *
FROM Book b
LEFT JOIN WrittenBy w on w.Titlew = b.Title
LEFT JOIN Author a on w.author_ID = a.ID;
id | title | year | publisher | book_id | titlew | author_id | id | name
-: | :--------------- | ---: | :-------- | ------: | :--------------- | --------: | ---: | :---
1 | Tom Sawyer | 1950 | Classics | 1 | Tom Sawyer | 1 | null | null
2 | Huckleberry Finn | 1950 | Classics | 2 | Huckleberry Finn | 1 | null | null
3 | Tom  Sawyer | 1950 | Classics | null | null | null | null | null
db<>fiddle here
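For comparison, option 1 can be sketched in SQLite through Python's sqlite3, used as a stand-in for PostgreSQL; the table layout and the author row are illustrative assumptions. Moving the author into the book table lets one UNIQUE constraint cover all four attributes with no trigger code, at the cost of allowing only one author per book. Like the trigger, it still treats a differently spaced title as a new book.

```python
import sqlite3


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            year      INTEGER NOT NULL,
            publisher TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES author(id),
            -- the whole (Title, Year, Publisher, Author) combination is unique
            UNIQUE (title, year, publisher, author_id)
        );
        INSERT INTO author VALUES (1, 'Mark Twain');  -- illustrative row
    """)
    return con


def try_insert(con, row):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        con.execute("INSERT INTO book VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    con = make_db()
    for row in [(1, "Tom Sawyer", 1950, "Classics", 1),
                (2, "Huckleberry Finn", 1950, "Classics", 1),
                (3, "Tom Sawyer", 1950, "Classics", 1),    # duplicate combination
                (4, "Tom  Sawyer", 1950, "Classics", 1)]:  # two spaces
        print(row[0], try_insert(con, row))
```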

How to efficiently insert tree-like data structure into postgres

Essentially, I want to efficiently store a tree-like data structure in a table with Postgres. Each row has an ID (auto-generated upon insert), a parent ID (referencing another row in the same table, possibly null), and some additional metadata. All of that data comes in at once, so I'm trying to store it all at once as efficiently as possible.
My current thought is to group all the data by which level of the tree they're at, and batch insert one level at a time. That way I can set parent IDs using the IDs generated from the previous level's inserts. This way the amount of batches is correlated with the number of levels in the tree.
This is probably "good enough", but I'm wondering if there's a better way to do this kind of thing? It still seems like a lot of back and forth and unnecessary logic to me, when I have the whole tree of data already in memory and structured correctly.
Let me show how I would do it if I had some information on who is whose child record.
In my case, I use a staging table containing the info as it comes from the source. The records have a char-based primary key id, and a self-referencing, nullable foreign key boss_id.
Here goes:
-- the input table with "business identifiers".
DROP TABLE IF EXISTS rec_input;
CREATE TABLE rec_input (
id CHAR(4)
, first_name VARCHAR(32)
, last_name VARCHAR(32)
, boss_id CHAR(4)
)
;
-- some data for it ...
INSERT INTO rec_input(id,first_name,last_name,boss_id)
SELECT 'A01','Arthur','Dent' ,NULL
UNION ALL SELECT 'A02','Ford','Prefect' ,'A01'
UNION ALL SELECT 'A03','Zaphod','Beeblebrox' ,'A01'
UNION ALL SELECT 'A04','Tricia','McMillan' ,'A01'
UNION ALL SELECT 'A05','Gag','Halfrunt' ,'A02'
UNION ALL SELECT 'A06','Prostetnic Vogon','Jeltz','A02'
UNION ALL SELECT 'A07','Lionel','Prosser' ,'A04'
UNION ALL SELECT 'A08','Benji','Mouse' ,'A04'
UNION ALL SELECT 'A09','Frankie','Mouse' ,'A04'
UNION ALL SELECT 'A10','Svlad','Cjelli' ,'A03'
;
-- create a lookup table. The surrogate key is created here.
DROP TABLE IF EXISTS lookup_help;
CREATE TABLE lookup_help (
sk SERIAL NOT NULL -- < here is the surrogate auto increment key
, id CHAR(4)
);
-- fill the lookup table
INSERT INTO lookup_help(id)
SELECT id FROM rec_input;
-- test query
SELECT * FROM lookup_help;
-- this is the target table, with auto increment
-- and matching surrogate foreign key.
DROP TABLE IF EXISTS rec;
CREATE TABLE rec (
sk INTEGER NOT NULL -- surrogate key
, id CHAR(4) -- "business id"
, first_name VARCHAR(32)
, last_name VARCHAR(32)
, boss_id CHAR(4) -- "business foreign key", not needed really
, boss_sk INTEGER -- internal foreign key
)
;
INSERT INTO rec
SELECT
l.sk -- from lookup table, inner joined
, i.id -- from input table
, i.first_name
, i.last_name
, i.boss_id
, b.sk -- from lookup table, left outer joined
FROM rec_input i
JOIN lookup_help l USING(id) -- for the main sk
LEFT JOIN lookup_help b ON i.boss_id=b.id -- for the foreign sk
;
-- test query
SELECT * FROM rec;
-- out sk | id | first_name | last_name | boss_id | boss_sk
-- out ----+------+------------------+------------+---------+---------
-- out 2 | A02 | Ford | Prefect | A01 | 1
-- out 3 | A03 | Zaphod | Beeblebrox | A01 | 1
-- out 4 | A04 | Tricia | McMillan | A01 | 1
-- out 6 | A06 | Prostetnic Vogon | Jeltz | A02 | 2
-- out 5 | A05 | Gag | Halfrunt | A02 | 2
-- out 10 | A10 | Svlad | Cjelli | A03 | 3
-- out 7 | A07 | Lionel | Prosser | A04 | 4
-- out 8 | A08 | Benji | Mouse | A04 | 4
-- out 9 | A09 | Frankie | Mouse | A04 | 4
-- out 1 | A01 | Arthur | Dent | |
-- out (10 rows)
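The two-step idea (generate keys in a lookup table, then resolve each row's parent key with a join) can be tried in SQLite through Python's sqlite3. This is a sketch under assumptions: SQLite's AUTOINCREMENT stands in for SERIAL and the column types are simplified to TEXT. The number of statements stays the same however deep the tree is, unlike the level-by-level batching described in the question.

```python
import sqlite3

# Input rows as (id, first_name, last_name, boss_id), taken from the answer.
REC_INPUT = [
    ("A01", "Arthur", "Dent", None),
    ("A02", "Ford", "Prefect", "A01"),
    ("A03", "Zaphod", "Beeblebrox", "A01"),
    ("A04", "Tricia", "McMillan", "A01"),
    ("A05", "Gag", "Halfrunt", "A02"),
    ("A06", "Prostetnic Vogon", "Jeltz", "A02"),
    ("A07", "Lionel", "Prosser", "A04"),
    ("A08", "Benji", "Mouse", "A04"),
    ("A09", "Frankie", "Mouse", "A04"),
    ("A10", "Svlad", "Cjelli", "A03"),
]


def load_tree():
    """Assign surrogate keys in a lookup table, then insert all rows in one statement."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE rec_input (id TEXT PRIMARY KEY, first_name TEXT,
                                last_name TEXT, boss_id TEXT);
        -- AUTOINCREMENT plays the role of PostgreSQL's SERIAL here
        CREATE TABLE lookup_help (sk INTEGER PRIMARY KEY AUTOINCREMENT,
                                  id TEXT NOT NULL UNIQUE);
        CREATE TABLE rec (sk INTEGER PRIMARY KEY, id TEXT, first_name TEXT,
                          last_name TEXT, boss_id TEXT, boss_sk INTEGER);
    """)
    con.executemany("INSERT INTO rec_input VALUES (?, ?, ?, ?)", REC_INPUT)
    # Step 1: one generated key per business id.
    con.execute("INSERT INTO lookup_help (id) SELECT id FROM rec_input ORDER BY id")
    # Step 2: resolve the row's own key and its parent's key by joining.
    con.execute("""
        INSERT INTO rec (sk, id, first_name, last_name, boss_id, boss_sk)
        SELECT l.sk, i.id, i.first_name, i.last_name, i.boss_id, b.sk
        FROM rec_input i
        JOIN lookup_help l ON l.id = i.id
        LEFT JOIN lookup_help b ON b.id = i.boss_id
    """)
    return con


if __name__ == "__main__":
    for row in load_tree().execute("SELECT * FROM rec ORDER BY sk"):
        print(row)
```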
Perhaps for your use case you could try a NoSQL database; querying such data could be far more efficient and faster. Maybe give it a shot.
For development you have options like Apache CouchDB, Redis, etc.

Result of query as column value

I've got three tables:
Lessons:
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
title text NOT NULL,
description text NOT NULL,
vocab_count integer NOT NULL
);
+----+------------+------------------+-------------+
| id | title | description | vocab_count |
+----+------------+------------------+-------------+
| 1 | lesson_one | this is a lesson | 3 |
| 2 | lesson_two | another lesson | 2 |
+----+------------+------------------+-------------+
Lesson_vocabulary:
CREATE TABLE lesson_vocabulary (
lesson_id integer REFERENCES lessons(id),
vocabulary_id integer REFERENCES vocabulary(id)
);
+-----------+---------------+
| lesson_id | vocabulary_id |
+-----------+---------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 4 |
+-----------+---------------+
Vocabulary:
CREATE TABLE vocabulary (
id integer PRIMARY KEY,
hiragana text NOT NULL,
reading text NOT NULL,
meaning text[] NOT NULL
);
Each lesson contains multiple vocabulary, and each vocabulary can be included in multiple lessons.
How can I get the vocab_count column of the lessons table to be calculated and updated whenever I add more rows to the lesson_vocabulary table? Is this possible, and how would I go about doing this?
Thanks
You can use SQL triggers to serve your purpose. This would be similar to a MySQL AFTER INSERT trigger which updates another table's column.
The trigger would look somewhat like this. I am using Oracle SQL; other implementations need some adjustments (PostgreSQL, for example, needs the body in a separate trigger function).
-- Statement-level trigger (no FOR EACH ROW): in Oracle, a row-level trigger that
-- queries lesson_vocabulary itself can fail with ORA-04091 (mutating table).
CREATE TRIGGER vocab_trigger
AFTER INSERT ON lesson_vocabulary
begin
for lesson_cur in (select LESSON_ID, COUNT(VOCABULARY_ID) voc_cnt from LESSON_VOCABULARY group by LESSON_ID) LOOP
update LESSONS
set VOCAB_COUNT = LESSON_CUR.VOC_CNT
where id = LESSON_CUR.LESSON_ID;
end loop;
END;
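Instead of recounting every lesson on each insert, a row-level trigger can adjust only the affected lesson's counter. The sketch below shows this in SQLite through Python's sqlite3; it is a stand-in under assumptions: the question's schema is PostgreSQL, the vocabulary table is omitted, and the DEFAULT 0 on vocab_count is added. Updates that move a row to another lesson_id are not handled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE lessons (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT NOT NULL,
    vocab_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE lesson_vocabulary (
    lesson_id     INTEGER NOT NULL REFERENCES lessons(id),
    vocabulary_id INTEGER NOT NULL
);
-- Adjust only the affected lesson's counter, one row at a time.
CREATE TRIGGER lv_after_insert AFTER INSERT ON lesson_vocabulary
FOR EACH ROW BEGIN
    UPDATE lessons SET vocab_count = vocab_count + 1 WHERE id = NEW.lesson_id;
END;
CREATE TRIGGER lv_after_delete AFTER DELETE ON lesson_vocabulary
FOR EACH ROW BEGIN
    UPDATE lessons SET vocab_count = vocab_count - 1 WHERE id = OLD.lesson_id;
END;
"""


def build():
    """Create the schema and load the question's sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO lessons (id, title, description) VALUES (?, ?, ?)",
        [(1, "lesson_one", "this is a lesson"), (2, "lesson_two", "another lesson")],
    )
    con.executemany(
        "INSERT INTO lesson_vocabulary VALUES (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4)],
    )
    return con


def counts(con):
    return dict(con.execute("SELECT id, vocab_count FROM lessons"))


if __name__ == "__main__":
    con = build()
    print(counts(con))
    con.execute("DELETE FROM lesson_vocabulary WHERE lesson_id = 2 AND vocabulary_id = 4")
    print(counts(con))
```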
It's better to create a view that calculates that (and get rid of the column in the lessons table):
select l.*, lv.vocab_count
from lessons l
left join (
select lesson_id, count(*)
from lesson_vocabulary
group by lesson_id
) as lv(lesson_id, vocab_count) on l.id = lv.lesson_id
If you really want to update the lessons table each time the lesson_vocabulary changes, you can run an UPDATE statement like this in a trigger:
update lessons l
set vocab_count = t.cnt
from (
select lesson_id, count(*) as cnt
from lesson_vocabulary
group by lesson_id
) t
where t.lesson_id = l.id;
I would recommend using a query for this information:
select l.*,
(select count(*)
from lesson_vocabulary lv
where lv.lesson_id = l.id
) as vocabulary_cnt
from lessons l;
With an index on lesson_vocabulary(lesson_id), this should be quite fast.
I recommend this over an update, because the data remains correct.
I recommend this over a trigger, because it is simpler.
I recommend this over a subquery with aggregation because it should be faster, particularly if you are filtering on the lessons table.
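The correlated-subquery approach can be checked in SQLite through Python's sqlite3. This is a sketch; the trimmed-down columns and the vocabulary-free lesson_three are assumptions for illustration. A lesson without vocabulary gets 0 from COUNT(*) in the subquery, whereas the left-join view above yields NULL for it.

```python
import sqlite3


def vocabulary_counts():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE lessons (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
            CREATE TABLE lesson_vocabulary (lesson_id INTEGER NOT NULL,
                                            vocabulary_id INTEGER NOT NULL);
            -- The index the answer recommends for the correlated subquery.
            CREATE INDEX lv_lesson_idx ON lesson_vocabulary (lesson_id);
            INSERT INTO lessons VALUES
                (1, 'lesson_one'), (2, 'lesson_two'), (3, 'lesson_three');
            INSERT INTO lesson_vocabulary VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4);
        """)
        rows = con.execute("""
            SELECT l.id, l.title,
                   (SELECT COUNT(*) FROM lesson_vocabulary lv
                    WHERE lv.lesson_id = l.id) AS vocabulary_cnt
            FROM lessons l
            ORDER BY l.id
        """)
        return rows.fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in vocabulary_counts():
        print(row)
```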

SQL Query: Search with list of tuples

I have the following table (simplified version) in SQL Server.
Table Events
-----------------------------------------------------------
| Room | User | Entered | Exited |
-----------------------------------------------------------
| A | Jim | 2014-10-10T09:00:00 | 2014-10-10T09:10:00 |
| B | Jim | 2014-10-10T09:11:00 | 2014-10-10T09:22:30 |
| A | Jill | 2014-10-10T09:00:00 | NULL |
| C | Jack | 2014-10-10T09:45:00 | 2014-10-10T10:00:00 |
| A | Jack | 2014-10-10T10:01:00 | NULL |
.
.
.
I need to create a query that returns person's whereabouts in given timestamps.
For an example: Where was (Jim at 2014-10-09T09:05:00), (Jim at 2014-10-10T09:01:00), (Jill at 2014-10-10T09:10:00), ...
The result set must contain the given User and Timestamp as well as the found room (if any).
------------------------------------------
| User | Timestamp | WasInRoom |
------------------------------------------
| Jim | 2014-10-09T09:05:00 | NULL |
| Jim | 2014-10-10T09:01:00 | A |
| Jill | 2014-10-10T09:10:00 | A |
The number of User-Timestamp tuples can be > 10 000.
The current implementation retrieves all records from Events table and does the search in Java code. I am hoping that I could push this logic to SQL. But how?
I am using the MyBatis framework to create SQL queries, so the tuples can be inlined into the query.
The basic query is:
select e.*
from events e
where e.user = 'Jim' and '2014-10-09T09:05:00' >= e.entered and ('2014-10-09T09:05:00' <= e.exited or e.exited is NULL) or
e.user = 'Jill' and '2014-10-10T09:10:00' >= e.entered and ('2014-10-10T09:10:00' <= e.exited or e.exited is NULL) or
. . .;
SQL Server can handle ridiculously large queries, so you can continue in this vein. However, if you have the name/time values in a table already (or it is the result of a query), then use a join:
select ut.*, e.*
from usertimes ut left join
events e
on e.user = ut.user and
ut.thetime >= e.entered and (ut.thetime <= e.exited or e.exited is null);
Note the use of a left join here. It ensures that all the original rows are in the result set, even when there are no matches.
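The join can be tried end to end with SQLite through Python's sqlite3, as a stand-in for SQL Server. Assumptions: the user column is renamed username, and timestamps are kept as ISO-8601 strings, which compare correctly as text because they all share one fixed format. The parameter tuples are bulk-inserted with executemany instead of being inlined into the SQL text.

```python
import sqlite3

# Sample rows from the question: (room, user, entered, exited); NULL exit = still inside.
EVENTS = [
    ("A", "Jim", "2014-10-10T09:00:00", "2014-10-10T09:10:00"),
    ("B", "Jim", "2014-10-10T09:11:00", "2014-10-10T09:22:30"),
    ("A", "Jill", "2014-10-10T09:00:00", None),
    ("C", "Jack", "2014-10-10T09:45:00", "2014-10-10T10:00:00"),
    ("A", "Jack", "2014-10-10T10:01:00", None),
]


def whereabouts(queries):
    """queries: list of (username, timestamp); returns (username, timestamp, room or None)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (room TEXT, username TEXT, entered TEXT, exited TEXT)")
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
        # One row per requested tuple; seq preserves the caller's order.
        con.execute("CREATE TEMP TABLE usertimes "
                    "(seq INTEGER PRIMARY KEY, username TEXT, thetime TEXT)")
        con.executemany("INSERT INTO usertimes (username, thetime) VALUES (?, ?)", queries)
        rows = con.execute("""
            SELECT ut.username, ut.thetime, e.room
            FROM usertimes ut
            LEFT JOIN events e
              ON e.username = ut.username
             AND ut.thetime >= e.entered
             AND (e.exited IS NULL OR ut.thetime <= e.exited)
            ORDER BY ut.seq
        """)
        return rows.fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in whereabouts([("Jim", "2014-10-09T09:05:00"),
                            ("Jim", "2014-10-10T09:01:00"),
                            ("Jill", "2014-10-10T09:10:00")]):
        print(row)
```

Thanks to the left join, a tuple with no matching event still appears, with room None.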
Answers from Jonas and Gordon got me on track, I think.
Here is query that seems to do the job:
-- [User] is bracketed because USER is a reserved keyword in T-SQL
CREATE TABLE #SEARCH_PARAMETERS([User] VARCHAR(16), "Timestamp" DATETIME)
INSERT INTO #SEARCH_PARAMETERS([User], "Timestamp")
VALUES
('Jim', '2014-10-09T09:05:00'),
('Jim', '2014-10-10T09:01:00'),
('Jill', '2014-10-10T09:10:00')
SELECT #SEARCH_PARAMETERS.*, Events.Room FROM #SEARCH_PARAMETERS
LEFT JOIN Events
ON #SEARCH_PARAMETERS.[User] = Events.[User] AND
#SEARCH_PARAMETERS."Timestamp" > Events.Entered AND
(Events.Exited IS NULL OR Events.Exited > #SEARCH_PARAMETERS."Timestamp")
DROP TABLE #SEARCH_PARAMETERS
By declaring a table valued parameter type for the (user, timestamp) tuples, it should be simple to write a table valued user defined function which returns the desired result by joining the parameter table and the Events table. See http://msdn.microsoft.com/en-us/library/bb510489.aspx
Since you are using MyBatis it may be easier to just generate a table variable for the tuples inline in the query and join with that.

Creating a query to find matching objects in a "join" table

I am trying to find an efficient query to find all matching objects in a "join" table.
Given an object Adopter that has many Pets, and Pets that have many Adopters through an AdopterPets join table, how could I find all of the Adopters that have the same Pets?
The schema is fairly normalized and looks like this.
TABLE Adopter
INTEGER id
TABLE AdopterPets
INTEGER adopter_id
INTEGER pet_id
TABLE Pets
INTEGER id
Right now the solution I am using loops through all Adopters and asks for their pets; any time we have a match, we store it away so we can use it later. But I am sure there has to be a better way using SQL.
One SQL solution I looked at was GROUP BY but it did not seem to be the right trick for this problem.
EDIT
To explain a little more of what I am looking for I will try to give an example.
+---------+ +------------------+ +------+
| Adptors | | AdptorsPets | | Pets |
|---------| +----------+-------+ |------|
| 1 | |adptor_id | pet_id| | 1 |
| 2 | +------------------+ | 2 |
| 3 | |1 | 1 | | 3 |
+---------+ |2 | 1 | +------+
|1 | 2 |
|3 | 1 |
|3 | 2 |
|2 | 3 |
+------------------+
When you asked the Adopter with the id of 1 for any other Adopters that have the same Pets, you would be returned id 3.
If you asked the same question for the Adopter with the id of 3, you would get id 1.
If you asked the same question again of the Adopter with id 2, you would be returned nothing.
I hope this helps clear things up!
Thank you all for the help, I used a combination of a few things:
SELECT adopter_id
FROM (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id)
AS pets
FROM adopters_pets
GROUP BY adopter_id
) AS grouped_pets
WHERE pets = array[1,2,3] -- array must be ordered
AND adopter_id <> current_adopter_id;
In the subquery I get pet_ids grouped by their adopter. The ordering of the pet_ids is key so that the results in the main query will not be order dependent.
In the main query I compare the results of the subquery to the pet ids of the adopter I am looking to match. For the purpose of this answer the pet_ids of the particular adopter are represented by [1,2,3]. I then make sure that that the adopter I am comparing to is not included in the results.
Let me know if anyone sees any optimizations or if there is a way to compare arrays where order does not matter.
I'm not sure if this is exactly what you're looking for but this might give you some ideas.
First I created some sample data:
create table adopter (id serial not null primary key, name varchar );
insert into adopter (name) values ('Bob'), ('Sally'), ('John');
create table pets (id serial not null primary key, kind varchar);
insert into pets (kind) values ('Dog'), ('Cat'), ('Rabbit'), ('Snake');
create table adopterpets (adopter_id integer, pet_id integer);
insert into adopterpets values (1, 1), (1, 2), (2, 1), (2,3), (2,4), (3, 1), (3,3);
Next I ran this query:
SELECT p.kind, array_agg(a.name) AS adopters
FROM pets p
JOIN adopterpets ap ON ap.pet_id = p.id
JOIN adopter a ON a.id = ap.adopter_id
GROUP BY p.kind
HAVING count(*) > 1
ORDER BY kind;
kind | adopters
--------+------------------
Dog | {Bob,Sally,John}
Rabbit | {Sally,John}
(2 rows)
In this example, for each pet I'm creating an array of all owners. The HAVING count(*) > 1 clause ensures we only show pets with shared owners (more than 1). If we leave this out we'll include pets that don't share owners.
UPDATE
@scommette: Glad you've got it working! I've refactored your working example a little bit below to:
use the @> operator. This checks whether one array contains the other and avoids the need to explicitly set the order
move the grouped_pets subquery to a CTE. This isn't the only solution, but it neatly allows you to both filter out the current_adopter_id and get the pets for that id
You might find it helpful to wrap this in a function.
WITH grouped_pets AS (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id) AS pets
FROM adopters_pets
GROUP BY adopter_id
)
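SELECT * FROM grouped_pets
WHERE adopter_id <> 3
-- @> alone also accepts adopters who own a superset of adopter 3's pets;
-- add "AND pets <@ (same subquery)" to require exactly the same set
AND pets @> (
SELECT pets FROM grouped_pets WHERE adopter_id = 3
);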
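An order-independent alternative that needs no arrays compares counts: adopters A and B own exactly the same pets when the number of pets they share equals both A's and B's pet count. The sketch runs in SQLite through Python's sqlite3 (a stand-in for PostgreSQL) on the question's sample data. It assumes (adopter_id, pet_id) pairs are unique, and an adopter with no pets never matches anyone.

```python
import sqlite3

# Sample data from the question's diagram: (adopter_id, pet_id).
ADOPTER_PETS = [(1, 1), (2, 1), (1, 2), (3, 1), (3, 2), (2, 3)]

SAME_PETS_QUERY = """
    WITH cnt AS (
        SELECT adopter_id, COUNT(*) AS n FROM adopterpets GROUP BY adopter_id
    )
    SELECT other.adopter_id
    FROM adopterpets me
    JOIN adopterpets other
      ON other.pet_id = me.pet_id AND other.adopter_id <> me.adopter_id
    JOIN cnt c_me    ON c_me.adopter_id = me.adopter_id
    JOIN cnt c_other ON c_other.adopter_id = other.adopter_id
    WHERE me.adopter_id = ?
    GROUP BY other.adopter_id, c_me.n, c_other.n
    -- shared pets = my pet count = their pet count  <=>  identical sets
    HAVING COUNT(*) = c_me.n AND c_me.n = c_other.n
    ORDER BY other.adopter_id
"""


def same_pets_as(adopter_id):
    con = sqlite3.connect(":memory:")
    try:
        # The primary key rules out duplicate pairs, which would skew the counts.
        con.execute("CREATE TABLE adopterpets (adopter_id INTEGER, pet_id INTEGER, "
                    "PRIMARY KEY (adopter_id, pet_id))")
        con.executemany("INSERT INTO adopterpets VALUES (?, ?)", ADOPTER_PETS)
        return [row[0] for row in con.execute(SAME_PETS_QUERY, (adopter_id,))]
    finally:
        con.close()


if __name__ == "__main__":
    for who in (1, 2, 3):
        print(who, same_pets_as(who))
```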
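If you're using Oracle then wm_concat could be useful here (note that wm_concat is undocumented and was removed in Oracle 12c; LISTAGG is the supported alternative):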
select pet_id, wm_concat(adopter_id) adopters
from AdopterPets
group by pet_id ;
--
-- Relational division 1.0
-- Show all people who own *exactly* the same (non-empty) set
-- of animals as I do.
--
-- Test data
CREATE TABLE adopter (id INTEGER NOT NULL primary key, fname varchar );
INSERT INTO adopter (id,fname) VALUES (1,'Bob'), (2,'Alice'), (3,'Chris');
CREATE TABLE pets (id INTEGER NOT NULL primary key, kind varchar);
INSERT INTO pets (id,kind) VALUES (1,'Dog'), (2,'Cat'), (3,'Pig');
CREATE TABLE adopterpets (adopter_id integer REFERENCES adopter(id)
, pet_id integer REFERENCES pets(id)
);
INSERT INTO adopterpets (adopter_id,pet_id) VALUES (1, 1), (1, 2), (2, 1), (2,3), (3,1), (3,2);
-- Show it to the world
SELECT ap.adopter_id, ap.pet_id
, a.fname, p.kind
FROM adopterpets ap
JOIN adopter a ON a.id = ap.adopter_id
JOIN pets p ON p.id = ap.pet_id
ORDER BY ap.adopter_id,ap.pet_id;
SELECT DISTINCT other.fname AS same_as_me
FROM adopter other
-- moi has *at least* one same kind of animal as toi
WHERE EXISTS (
SELECT * FROM adopterpets moi
JOIN adopterpets toi ON moi.pet_id = toi.pet_id
WHERE toi.adopter_id = other.id
AND moi.adopter_id <> toi.adopter_id
-- C'est moi!
AND moi.adopter_id = 1 -- 'Bob'
-- But moi should not own an animal that toi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets lnx
WHERE lnx.adopter_id = moi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets lnx2
WHERE lnx2.adopter_id = toi.adopter_id
AND lnx2.pet_id = lnx.pet_id
)
)
-- ... And toi should not own an animal that moi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets rnx
WHERE rnx.adopter_id = toi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets rnx2
WHERE rnx2.adopter_id = moi.adopter_id
AND rnx2.pet_id = rnx.pet_id
)
)
)
;
Result:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "adopter_pkey" for table "adopter"
CREATE TABLE
INSERT 0 3
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "pets_pkey" for table "pets"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 6
adopter_id | pet_id | fname | kind
------------+--------+-------+------
1 | 1 | Bob | Dog
1 | 2 | Bob | Cat
2 | 1 | Alice | Dog
2 | 3 | Alice | Pig
3 | 1 | Chris | Dog
3 | 2 | Chris | Cat
(6 rows)
same_as_me
------------
Chris
(1 row)