Select rows so that two of the columns are separately unique - sql

Table user_book describes every user's favorite books.
CREATE TABLE user_book (
  user_id INT,
  book_id INT,
  FOREIGN KEY (user_id) REFERENCES user(id),
  FOREIGN KEY (book_id) REFERENCES book(id)
);

INSERT INTO user_book (user_id, book_id) VALUES
  (1, 1),
  (1, 2),
  (1, 5),
  (2, 2),
  (2, 5),
  (3, 2),
  (3, 5);
I want to write a query (possibly with a WITH clause that defines multiple subqueries, but not a procedure) that would try to distribute ONE favorite book to every user who has one or more favorite books.
Any ideas how to do it?
More details:
The distribution plan may be naive, i.e. it may look as if you went user by user and each time randomly gave the user whatever favorite book was still available, if any, without considering what would be left for the remaining users.
This means that sometimes some books may not be distributed, and/or some users may not get any book (example 2). This can happen when the numbers of books and users are not equal, and/or because of the specific distribution order used.
A book cannot be distributed to two different users (example 3).
Examples:
1. A possible distribution:
(1, 1)
(2, 2)
(3, 5)
2. A possible distribution (here user 3 got nothing, and book 1 was not distributed. That's acceptable):
(1, 2)
(2, 5)
3. An impossible distribution (both users 1 and 2 got book 2, that's not allowed):
(1, 2)
(2, 2)
(3, 5)
Similar questions that are not exactly this one:
How to select records without duplicate on just one field in SQL?
SQL: How do I SELECT only the rows with a unique value on certain column?
How to select unique records by SQL

The user_book table should also have a UNIQUE(user_id, book_id) constraint.
A simple solution like this returns a list in which each user gets zero or one book and each book is given to zero or one user:
WITH list AS (
  SELECT user_id, MIN(book_id) AS fav_book
  FROM user_book
  GROUP BY user_id
)
SELECT fav_book, MIN(user_id) AS user_id
FROM list
GROUP BY fav_book;
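With the sample data above, list resolves to (1, 1), (2, 2), (3, 2), and the outer GROUP BY then returns:

fav_book | user_id
---------+---------
1        | 1
2        | 2

User 3 gets nothing and book 5 stays undistributed, which the question explicitly allows (compare example 2).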

Related

SQL - Why does this evaluate to true?

Consider this snippet of SQL:
CREATE TABLE product
(
  id integer,
  stock_quantity integer
);
INSERT INTO product (id, stock_quantity) VALUES (1, NULL);
INSERT INTO product (id, stock_quantity) VALUES (2, NULL);
SELECT *
FROM product
WHERE (id, stock_quantity) NOT IN ((1, 2), (2, 9));
I can't understand why it doesn't select anything. I'm using Postgres.
I would expect both rows to be returned, because I'd expect (1, NULL) and (2, NULL) to not be in ((1,2), (2, 9)).
If I replace NULL with 0, for example, it does return the two results.
Why was it designed to be this way? What am I missing?
Thanks!
Think of a null as missing data. For example, consider a database with a "Date of Birth" column and three people: one born in 1975, one born in 1990, and one whose date of birth we don't know. That third person was certainly born; we just don't know her birth date yet.
Now, what if a query searched for people "not born in 1990"? It would return the first person only.
The second person was born in 1990, so she clearly cannot be selected.
For the third person, we don't know the date of birth, so we can't say anything either way, and the query doesn't select her either. Does that make sense?
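A quick way to see this three-valued logic at work in Postgres: each of the following returns NULL (unknown) rather than true or false, and a WHERE clause only keeps rows where its condition is true:
SELECT NULL = 1;                          -- NULL, not false
SELECT NULL <> 1;                         -- NULL, not true
SELECT (1, NULL) <> (1, 2);               -- NULL: 1 <> 1 is false, NULL <> 2 is unknown
SELECT (1, NULL) NOT IN ((1, 2), (2, 9)); -- NULL, so the row from the question is filtered out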

Select parents with children in multiple places

I have two tables, boxes and things that partially model a warehouse.
A box may
contain a single thing
contain one or more boxes
be empty
There is only one level of nesting: a box may be a parent or a child, but not a grandparent.
I want to identify parent boxes that satisfy these criteria:
have children in more than one place
only child boxes associated with a quantity > 0 are to be considered
Using the example data, the box with id 2 should be selected, because it has children with quantities in two places. Box 1 should be rejected because all its children are in a single place, and box 3 should be rejected because, while it has children in two places, only one of those places has a positive quantity.
The query should work on all supported versions of PostgreSQL. Both tables contain around two million records.
Setup:
DROP TABLE IF EXISTS things;
DROP TABLE IF EXISTS boxes;
CREATE TABLE boxes (
  id serial primary key,
  box_id integer references boxes(id)
);

CREATE TABLE things (
  id serial primary key,
  box_id integer references boxes(id),
  place_id integer,
  quantity integer
);
INSERT INTO boxes (box_id)
VALUES (NULL), (NULL), (NULL), (1), (1), (2), (2), (3), (3);
INSERT INTO things (box_id, place_id, quantity)
VALUES (4, 1, 1), (5, 1, 1), (6, 2, 1), (7, 3, 1), (8, 4, 1), (9, 5, 0);
I have come up with this solution:
WITH parent_places AS (
  SELECT DISTINCT ON (b.box_id, t.place_id) b.box_id, t.place_id
  FROM boxes b
  JOIN things t ON b.id = t.box_id
  WHERE t.quantity > 0
)
SELECT box_id, COUNT(box_id)
FROM parent_places
GROUP BY box_id
HAVING COUNT(box_id) > 1;
but I'm wondering if I've missed a more obvious solution (or if my solution has any errors that I've overlooked).
DB Fiddle
The only way a box can have things in more than one place is when it contains several child boxes with things in them.
SELECT b2.box_id, COUNT(DISTINCT place_id)
FROM boxes b2
JOIN things t ON b2.id = t.box_id AND quantity > 0
WHERE b2.box_id IS NOT NULL
GROUP BY b2.box_id
HAVING COUNT(DISTINCT place_id) > 1;
I see no reason to use a CTE like in your example. I think you should use the simplest query that does the job.
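Since both tables hold around two million rows, a partial index on things is worth testing; this is just a sketch (the index name is made up), so verify it with EXPLAIN ANALYZE on your real data:
-- hypothetical partial index covering only the rows the query can use
CREATE INDEX things_box_place_positive_idx
  ON things (box_id, place_id)
  WHERE quantity > 0;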

Need help linking an ID column with another ID column for an insert

I'm generating a dataset from a stored procedure. There's an ID column from a different database that I need to link to my dataset. Going from one database to the other isn't the issue. The problem is I'm having trouble linking an ID column that's in one database and not the other.
There's a common ID between the two databases: CustomerEventID. I know that'll be my link, but here's the issue: CustomerEventID is a unique ID for a single encounter in which a customer purchases something.
The column I need to pull in is more granular: it holds a unique ID (CustomerPurchaseID) for each item purchased in the encounter. However, I only need to pull in the CustomerPurchaseID for the last item purchased in that encounter. Unfortunately there's no timestamp associated with the CustomerPurchaseID.
It's basically like the CustomerEventID is a unique ID for a customer's receipt, whereas the CustomerPurchaseID is a unique ID for each item purchased on that receipt.
What would be the best way to pull the last CustomerPurchaseID from the CustomerEventID (I only need the last item on the receipt in my dataset)? I'll be taking my stored procedure with the dataset (from database A), using an SSIS package to put the dataset into a table on database B, then inserting the CustomerPurchaseID into that table.
I'm not sure if it helps, but here's the query from the stored procedure that will be sent to the other database (the process will run every 2 weeks to send it to Database B):
SELECT ce.CustomerEventID,
       ce.CustomerName,
       ce.CustomerPhone,
       ce.CustomerEventDate
FROM CustomerEvent ce
WHERE DATEDIFF(d, ce.CustomerEventDate, GETDATE()) < 14
Thanks for taking the time to read this wall of text. :)
If the CustomerPurchaseID field is increasing (as you mentioned), you can ORDER BY that field DESC and pick the top row. This can be done with a subquery in the parent query, or with OUTER APPLY or CROSS APPLY if you need all the fields from the CustomerPurchase table. Check the example below.
declare @customerEvent table (
  CustomerEventID int not null primary key identity,
  EventDate datetime
)
declare @customerPurchase table (
  CustomerPurchaseID int not null primary key identity,
  CustomerEventID int,
  ItemID varchar(100)
)

insert into @customerEvent (EventDate)
values ('2018-01-01'), ('2018-01-02'), ('2018-01-03'), ('2018-01-04')

insert into @customerPurchase (CustomerEventID, ItemID)
values (1, 1), (1, 2), (1, 3)
     , (2, 3), (2, 4), (2, 10)
     , (3, 1), (3, 2)
     , (4, 1)

-- if you want all the fields from the CustomerPurchase table
select e.CustomerEventID
     , op.CustomerPurchaseID
from @customerEvent as e
outer apply (select top 1 p.*
             from @customerPurchase as p
             where p.CustomerEventID = e.CustomerEventID
             order by p.CustomerPurchaseID desc) as op

-- if you want only the last CustomerPurchaseID from the CustomerPurchase table
select e.CustomerEventID
     , (select top 1 CustomerPurchaseID
        from @customerPurchase as p
        where p.CustomerEventID = e.CustomerEventID
        order by CustomerPurchaseID desc) as LastCustomerPurchaseID
from @customerEvent as e
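A window-function sketch over the same sample tables gives the same result, and is handy if you later need more columns from the last row per event:
-- rank purchases within each event, newest first, and keep rank 1
select CustomerEventID, CustomerPurchaseID
from (
    select p.CustomerEventID, p.CustomerPurchaseID
         , row_number() over (partition by p.CustomerEventID
                              order by p.CustomerPurchaseID desc) as rn
    from @customerPurchase as p
) as ranked
where rn = 1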

SQL update variable number of rows

I am representing data from sports matches, where each match has any number of lineups (basically teams), and any number of players in each lineup.
I have the following tables/columns:
match (id, start_time)
match_lineup (id, match_id, score)
lineup_players (id, lineup_id, player_id) - where lineup_id is a foreign key on match_lineup.id
players (id, name)
My question is about updating the lineup_players table. The number of players associated with each lineup (lineup_id) is variable.
When the required UPDATE to lineup_players has the same number of rows (players) I can do the following:
WITH new_lineup (rn, new_player_id) AS (
  VALUES
    (1, 5), -- assuming there are 5 players on the old lineup and 5 on the new lineup
    (2, 6),
    (3, 4),
    (4, 8),
    (5, 7)
),
old_lineup AS (
  SELECT id, lineup_id, player_id, row_number() OVER (ORDER BY id) AS rn
  FROM lineup_players
  WHERE lineup_id = 10 -- update the lineup with lineup_id = 10
)
UPDATE lineup_players
SET player_id = new_lineup.new_player_id
FROM new_lineup
JOIN old_lineup USING (rn)
WHERE lineup_players.id = old_lineup.id;
This won't work when the number of rows (players) associated with the lineup (lineup_id) is changing on the update.
If the number of rows (players) associated with the lineup is increasing (i.e. there is an extra player on the lineup) then I need to insert the extra rows (players).
If the number of rows (players) associated with the lineup is decreasing (i.e. a player is being removed from the lineup) then I need to delete any extra rows.
I could accomplish this by simply deleting all rows with a given lineup_id and then inserting the new players again, but that seems kinda dirty.
I'm sure this is a solved problem in SQL but I haven't been able to find a solution online.
I'm using postgres if that makes any difference (could this be super easy to solve with UPSERT in 9.5?)
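For reference, the delete-and-reinsert fallback I mentioned would look something like this (player ids taken from the sample above, wrapped in a transaction so the lineup is never seen half-replaced):
BEGIN;
DELETE FROM lineup_players WHERE lineup_id = 10;
INSERT INTO lineup_players (lineup_id, player_id)
VALUES (10, 5), (10, 6), (10, 4), (10, 8), (10, 7);
COMMIT;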

How can I intersect two ActiveRecord::Relations on an arbitrary column?

If I have a people table with the following structure and records:
drop table if exists people;
create table people (id int, name varchar(255));
insert into people values (1, "Amy");
insert into people values (2, "Bob");
insert into people values (3, "Chris");
insert into people values (4, "Amy");
insert into people values (5, "Bob");
insert into people values (6, "Chris");
I'd like to find the intersection of people with ids (1, 2, 3) and (4, 5, 6) based on the name column.
In SQL, I'd do something like this:
select
group_concat(id),
group_concat(name)
from people
group by name;
Which returns this result set:
id | name
----|----------
1,4 | Amy,Amy
2,5 | Bob,Bob
3,6 | Chris,Chris
In Rails, I'm not sure how to solve this.
My closest so far is:
a = Model.where(id: [1, 2, 3])
b = Model.where(id: [4, 5, 6])
a_results = a.where(name: b.pluck(:name)).order(:name)
b_results = b.where(name: a.pluck(:name)).order(:name)
a_results.zip(b_results)
This seems to work, but I have the following reservations:
Performance - is this going to perform well in the database?
Lazy enumeration - does calling #zip break lazy enumeration of records?
Duplicates - what will happen if either set contains more than one record for a given name? What will happen if a set contains more than one of the same id?
Any thoughts or suggestions?
Thanks
You can embed your usual SQL in the relation to get these computed columns, like so:
@people = People.select("group_concat(id) AS somecolumn1, group_concat(name) AS somecolumn2").group(:name)
Each record in @people will now have somecolumn1 and somecolumn2 attributes.
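For reference, the SQL this relation generates is essentially the query from the question (assuming MySQL, since group_concat is MySQL-specific):
SELECT group_concat(id) AS somecolumn1,
       group_concat(name) AS somecolumn2
FROM people
GROUP BY name;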