SQL update variable number of rows - sql

I am representing data from sports matches, where each match has any number of lineups (basically teams), and any number of players in each lineup.
I have the following tables/columns:
match (id, start_time)
match_lineup (id, match_id, score)
lineup_players (id, lineup_id, player_id) - where lineup_id is a foreign key on match_lineup.id
players (id, name)
My question is about updating the lineup_players table. The number of players associated with each lineup (lineup_id) is variable.
When the required UPDATE to lineup_players has the same number of rows (players) I can do the following:
WITH new_lineup (rn, new_player_id) AS (
    VALUES
        (1, 5), -- assuming there are 5 players on the old lineup and 5 players on the new lineup
        (2, 6),
        (3, 4),
        (4, 8),
        (5, 7)),
old_lineup AS (
    SELECT id, lineup_id, player_id, row_number() OVER (ORDER BY id) AS rn
    FROM lineup_players
    WHERE lineup_id = 10) -- update the lineup with lineup_id = 10
UPDATE lineup_players
SET player_id = new_player_id
FROM new_lineup
JOIN old_lineup USING (rn)
WHERE lineup_players.id = old_lineup.id
This won't work when the number of rows (players) associated with the lineup (lineup_id) is changing on the update.
If the number of rows (players) associated with the lineup is increasing (i.e. there is an extra player on the lineup) then I need to insert the extra rows (players).
If the number of rows (players) associated with the lineup is decreasing (i.e. a player is being removed from the lineup) then I need to delete any extra rows.
I could accomplish this by simply deleting all rows with a given lineup_id and then inserting the new players again, but that seems kinda dirty.
I'm sure this is a solved problem in SQL but I haven't been able to find a solution online.
I'm using postgres if that makes any difference (could this be super easy to solve with UPSERT in 9.5?)
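For reference, a minimal sketch of the delete-then-reinsert approach mentioned above (the lineup_id = 10 and the player list are purely illustrative); wrapping it in a transaction keeps readers from ever seeing a half-replaced lineup:
-- Sketch of the "replace the whole lineup" approach; values are illustrative.
BEGIN;
DELETE FROM lineup_players WHERE lineup_id = 10;
INSERT INTO lineup_players (lineup_id, player_id)
VALUES (10, 5), (10, 6), (10, 4), (10, 8), (10, 7), (10, 9); -- new, longer lineup
COMMIT;
ON CONFLICT (the 9.5 upsert) would only help here if lineup_players had a unique constraint such as (lineup_id, player_id), which the schema above doesn't define.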

Related

How to search an entry in a table and return the column name or index in PostgreSQL

I have a table representing a card deck with 4 cards that each have a unique ID. Now I want to look for a specific card ID in the table and find out which card in the deck it is.
card1   | card2   | card3   | card4
--------+---------+---------+--------
cardID1 | cardID2 | cardID3 | cardID4
If my table looked like this, for example, I would like to do something like:
SELECT column_name WHERE cardID3 IN (card1, card2, card3, card4)
Looking for an answer I found this: SQL Server: return column names based on a record's value.
But this doesn't seem to work for PostgreSQL.
SQL Server's CROSS APPLY is the SQL-standard CROSS JOIN LATERAL.
SELECT cname
FROM decks
CROSS JOIN LATERAL (VALUES ('card1', card1),
                           ('card2', card2),
                           ('card3', card3),
                           ('card4', card4)) ca (cname, data)
WHERE data = 3
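The linked demonstration is not reproduced here; a minimal hypothetical setup to try the query against could look like this (the one-row, four-column layout is assumed from the question):
-- Hypothetical one-row decks table matching the question's layout;
-- with this data the query above returns 'card3'.
CREATE TABLE decks (card1 int, card2 int, card3 int, card4 int);
INSERT INTO decks VALUES (1, 2, 3, 4);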
However, the real problem is the design of your table. In general, if you have col1, col2, col3... you should instead be using a join table.
create table cards (
    id serial primary key,
    value text
);
create table decks (
    id serial primary key
);
create table deck_cards (
    deck_id integer not null references decks,
    card_id integer not null references cards,
    position integer not null check(position > 0),
    -- Can't have the same card in a deck twice.
    unique(deck_id, card_id),
    -- Can't have two cards in the same position twice.
    unique(deck_id, position)
);
insert into cards(id, value) values (1, 'KH'), (2, 'AH'), (3, '9H'), (4, 'QH');
insert into decks values (1), (2);
insert into deck_cards(deck_id, card_id, position) values
(1, 1, 1), (1, 3, 2),
(2, 1, 1), (2, 4, 2), (2, 2, 3);
We've made sure a deck can't contain the same card twice, nor two cards in the same position.
-- Can't insert the same card.
insert into deck_cards(deck_id, card_id, position) values (1, 1, 3);
-- Can't insert the same position
insert into deck_cards(deck_id, card_id, position) values (2, 3, 3);
You can query a card's position directly.
select deck_id, position from deck_cards where card_id = 3
And there is no arbitrary limit on the number of cards in a deck; if you need one, you can enforce it with a trigger, as sketched below.
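A minimal sketch of such a trigger, assuming a cap of four cards per deck (the limit and the names are illustrative, not part of the original answer):
-- Hypothetical trigger enforcing at most 4 cards per deck.
CREATE FUNCTION check_deck_size() RETURNS trigger AS $$
BEGIN
    IF (SELECT count(*) FROM deck_cards WHERE deck_id = NEW.deck_id) >= 4 THEN
        RAISE EXCEPTION 'deck % already has 4 cards', NEW.deck_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER deck_size_limit
    BEFORE INSERT ON deck_cards
    FOR EACH ROW EXECUTE FUNCTION check_deck_size(); -- EXECUTE PROCEDURE on Postgres 10 and older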
This is a rather bad idea. Column names belong to the database structure, not to the data. So you can select IDs and names stored as data, but you should not have to select column names. And actually a user using your app should not be interested in column names; they can be rather technical.
It would probably be a good idea if you changed the data model and stored card names along with the IDs, but of course I don't know exactly how you want to work with your data.
Anyway, if you want to stick with your current database design, you can still select those names, by including them in your query:
select
    case when card1 = 123 then 'card1'
         when card2 = 123 then 'card2'
         when card3 = 123 then 'card3'
         when card4 = 123 then 'card4'
    end as card_column
from cardtable
where 123 in (card1, card2, card3, card4);

Select rows so that two of the columns are separately unique

Table user_book describes every user's favorite books.
CREATE TABLE user_book (
    user_id INT,
    book_id INT,
    FOREIGN KEY (user_id) REFERENCES "user"(id), -- quoted because user is a reserved word in Postgres
    FOREIGN KEY (book_id) REFERENCES book(id)
);
insert into user_book (user_id, book_id) values
(1, 1),
(1, 2),
(1, 5),
(2, 2),
(2, 5),
(3, 2),
(3, 5);
I want to write a query (possibly a with clause that defines multiple statements ― but not a procedure) that would try to distribute ONE favorite book to every user who has one or more favorite books.
Any ideas how to do it?
More details:
The distribution plan may be naive, i.e. it may look as if you went user after user and each time randomly gave the user whatever favorite book was still available, if there was any, without considering what would be left for the remaining users.
This means that sometimes some books may not be distributed, and/or sometimes some users may not get any book (example 2). This can happen when the numbers of books and users are not equal, and/or due to the specific distribution order that you have used.
A book cannot be distributed to two different users (example 3).
Examples:
1. A possible distribution:
(1, 1)
(2, 2)
(3, 5)
2. A possible distribution (here user 3 got nothing, and book 1 was not distributed. That's acceptable):
(1, 2)
(2, 5)
3. An impossible distribution (both users 1 and 2 got book 2, that's not allowed):
(1, 2)
(2, 2)
(3, 5)
Similar questions that are not exactly this one:
How to select records without duplicate on just one field in SQL?
SQL: How do I SELECT only the rows with a unique value on certain column?
How to select unique records by SQL
The user_book table should also have a UNIQUE(user_id, book_id) constraint.
A simple solution like this returns a list in which each user gets zero or one book and each book is given to zero or one user:
WITH list AS (SELECT user_id, MIN(book_id) AS fav_book FROM user_book GROUP BY user_id)
SELECT fav_book, MIN(user_id) FROM list GROUP BY fav_book
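With the sample data this should yield (fav_book 1, user 1) and (fav_book 2, user 2): user 3 gets nothing and book 5 is not distributed, which is the kind of outcome example 2 allows.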

Select parents with children in multiple places

I have two tables, boxes and things, that partially model a warehouse.
A box may
contain a single thing
contain one or more boxes
be empty
There is only one level of nesting: a box may be a parent or a child, but not a grandparent.
I want to identify parent boxes that satisfy these criteria:
have children in more than one place
only child boxes associated with a quantity > 0 are to be considered
Using the example data, the box with id 2 should be selected, because it has children with quantities in two places. Box 1 should be rejected because all its children are in a single place, and box 3 should be rejected because, while it has children in two places, only one place has a positive quantity.
The query should work on all supported versions of PostgreSQL. Both tables contain around two million records.
Setup:
DROP TABLE IF EXISTS things;
DROP TABLE IF EXISTS boxes;
CREATE TABLE boxes (
    id serial primary key,
    box_id integer references boxes(id)
);
CREATE TABLE things (
    id serial primary key,
    box_id integer references boxes(id),
    place_id integer,
    quantity integer
);
INSERT INTO boxes (box_id)
VALUES (NULL), (NULL), (NULL), (1), (1), (2), (2), (3), (3);
INSERT INTO things (box_id, place_id, quantity)
VALUES (4, 1, 1), (5, 1, 1), (6, 2, 1), (7, 3, 1), (8, 4, 1), (9, 5, 0);
I have come up with this solution
WITH parent_places AS (
    SELECT DISTINCT ON (b.box_id, t.place_id) b.box_id, t.place_id
    FROM boxes b
    JOIN things t ON b.id = t.box_id
    WHERE t.quantity > 0
)
SELECT box_id, COUNT(box_id)
FROM parent_places
GROUP BY box_id
HAVING COUNT(box_id) > 1;
but I'm wondering if I've missed a more obvious solution (or if my solution has any errors that I've overlooked).
The only way a box can have things in different places is when it contains several boxes with things in them.
SELECT b2.box_id, COUNT(DISTINCT place_id)
FROM boxes b2
JOIN things t ON b2.id = t.box_id AND quantity > 0
WHERE b2.box_id IS NOT NULL
GROUP BY b2.box_id
HAVING COUNT(DISTINCT place_id) > 1;
I see no reason to use a CTE as in your example. I think you should use the simplest query that does the job.

Is it ok to DELETE/INSERT instead of DIFF/UPDATE when doing entity mapping?

Let's say I have an event, and I want to have people attending it.
When I create the event, I would do...
INSERT INTO event (eventName) VALUES ('some event'); -- eventId = 1
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES
(1, 1), -- Person 1
(1, 2), -- Person 2
(1, 3), -- Person 3
-- hundreds more...
;
Now, what if I want to remove Person 3 but add Person 7?
DELETE FROM eventPeopleMapping WHERE eventId = 1;
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES
(1, 1), -- Person 1
(1, 2), -- Person 2
(1, 7), -- Person 7
-- hundreds more...
;
Is this a good way to do it?
NOTE: This is for hundreds of people, changing often.
Comparing arrays and objects to find differences, and then hunting for the values in the database, is too cumbersome. This seems so simple, but I don't know if I am missing something.
The only drawback I see is A TON of mapping IDs, and them constantly changing.
Your method works, but it requires knowing all the people at the event. More commonly, you would delete only the row you want deleted and then insert only the new row:
DELETE FROM eventPeopleMapping
WHERE eventId = 1 AND personId = 3;
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES (1, 7); -- Person 7
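If you want to reconcile the whole list in one pass without deleting everything first, a rough sketch (assuming a UNIQUE (eventId, personId) constraint, which the question doesn't show) is to delete only the rows missing from the new list and upsert the rest:
-- Sketch: reconcile event 1 against a new attendee list; values are illustrative.
-- Assumes UNIQUE (eventId, personId) on eventPeopleMapping.
WITH new_people (personId) AS (
    VALUES (1), (2), (7)
)
DELETE FROM eventPeopleMapping
WHERE eventId = 1
  AND personId NOT IN (SELECT personId FROM new_people);

INSERT INTO eventPeopleMapping (eventId, personId)
VALUES (1, 1), (1, 2), (1, 7)
ON CONFLICT (eventId, personId) DO NOTHING;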

Clustering/Similarity between text cells in a Postgres aggregate

I've got a table that has a text column and some other identifying features. I want to be able to group by one of the features and find out whether the text values within each group are similar or not. I want to use this to determine if there are multiple groups in my data or a single group (with some possible bad spelling), so that I can provide a rough "confidence" value showing whether the aggregate represents a single group or not.
CREATE TABLE data_test (
    Id serial primary key,
    Name VARCHAR(70) NOT NULL,
    Job VARCHAR(100) NOT NULL);
INSERT INTO data_test
(Name, Job)
VALUES
('John', 'Astronaut'),
('John', 'Astronaut'),
('Ann', 'Sales'),
('Jon', 'Astronaut'),
('Jason', 'Sales'),
('Pranav', 'Sales'),
('Todd', 'Sales'),
('John', 'Astronaut');
I'd like to run a query that was something like:
select
Job,
count(Name),
Similarity_Agg(Name)
from data_test
group by Job;
and receive
Job        count  Similarity
Sales      4      0.1
Astronaut  4      0.9
Basically showing that Astronaut names are very similar (or, more likely in my data, all the rows are referring to a single astronaut) and the Sales names aren't (more people working in sales than in space). I see there is a Postgres Module that can handle comparing two strings but it doesn't seem to have any aggregate functions in it.
Any ideas?
One option is a self-join:
select
    d.job,
    count(distinct d.id) cnt,
    avg(similarity(d.name, d1.name)) avg_similarity -- similarity() is provided by pg_trgm
from data_test d
inner join data_test d1 on d1.job = d.job
group by d.job
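similarity() comes from the pg_trgm extension, so it has to be enabled once per database before the query above will run:
CREATE EXTENSION IF NOT EXISTS pg_trgm;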