How can I intersect two ActiveRecord::Relations on an arbitrary column?

If I have a people table with the following structure and records:
drop table if exists people;
create table people (id int, name varchar(255));
insert into people values (1, 'Amy');
insert into people values (2, 'Bob');
insert into people values (3, 'Chris');
insert into people values (4, 'Amy');
insert into people values (5, 'Bob');
insert into people values (6, 'Chris');
I'd like to find the intersection of people with ids (1, 2, 3) and (4, 5, 6) based on the name column.
In SQL, I'd do something like this:
select
group_concat(id),
group_concat(name)
from people
group by name;
Which returns this result set:
id | name
----|----------
1,4 | Amy,Amy
2,5 | Bob,Bob
3,6 | Chris,Chris
In Rails, I'm not sure how to solve this.
My closest so far is:
a = Model.where(id: [1, 2, 3])
b = Model.where(id: [4, 5, 6])
a_results = a.where(name: b.pluck(:name)).order(:name)
b_results = b.where(name: a.pluck(:name)).order(:name)
a_results.zip(b_results)
This seems to work, but I have the following reservations:
Performance - is this going to perform well in the database?
Lazy enumeration - does calling #zip break lazy enumeration of records?
Duplicates - what will happen if either set contains more than one record for a given name? What will happen if a set contains more than one of the same id?
Any thoughts or suggestions?
Thanks

You can use your normal SQL approach to get these arbitrary columns in Ruby, grouping by name like so:
@people = People.select("group_concat(id) as somecolumn1, group_concat(name) as somecolumn2").group(:name)
Each record in @people will now have somecolumn1 and somecolumn2 attributes.
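As an alternative to group_concat, the intersection can also be written as a self-join on name, which keeps each pair of ids in its own row. The following is only a sketch: it runs the SQL against an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real database, and the aliases a and b and the variable pairs are chosen here for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INT, name VARCHAR(255));
    INSERT INTO people VALUES
        (1, 'Amy'), (2, 'Bob'), (3, 'Chris'),
        (4, 'Amy'), (5, 'Bob'), (6, 'Chris');
""")

# Pair every row from the first id set with every row from the
# second id set that has the same name.
pairs = conn.execute("""
    SELECT a.id, b.id, a.name
    FROM people AS a
    JOIN people AS b ON a.name = b.name
    WHERE a.id IN (1, 2, 3)
      AND b.id IN (4, 5, 6)
    ORDER BY a.name, a.id, b.id
""").fetchall()

for left_id, right_id, name in pairs:
    print(left_id, right_id, name)
```

On the duplicates question: with a join like this, a name that appears several times in either set produces one row per combination, so duplicates show up as extra pairs rather than being silently dropped.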

Related

SQL - Why does this evaluate to true?

Consider this snippet of SQL:
CREATE TABLE product
(
id integer,
stock_quantity integer
);
INSERT INTO product (id, stock_quantity) VALUES (1, NULL);
INSERT INTO product (id, stock_quantity) VALUES (2, NULL);
SELECT *
FROM product
WHERE (id, stock_quantity) NOT IN ((1, 2), (2, 9));
I can't understand why it doesn't select anything. I'm using Postgres.
I would expect both rows to be returned, because I'd expect (1, NULL) and (2, NULL) to not be in ((1,2), (2, 9)).
If I replace NULL with 0, for example, it does return the two results.
Why was it designed to be this way? What am I missing?
Thanks!
Think of a NULL as missing data. For example, suppose we had a "Date of Birth" column.
Consider a database with three people: one born in 1975, one born in 1990, and a third whose date of birth we don't know. The third person was certainly born, but we don't know their birth date yet.
Now, what if a query searched for people "not born in 1990"? It would return only the first person.
The second person was born in 1990, so they clearly cannot be selected.
For the third person we don't know the date of birth, so we can't say whether the condition holds, and the query doesn't select them either. The same happens in your query: comparing a NULL stock_quantity with 2 or 9 yields unknown rather than true, so NOT IN never evaluates to true for those rows. Does it make sense?
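The birth-year example above can be tried out directly. This is a minimal sketch using an in-memory SQLite database through Python's sqlite3 module rather than Postgres; the person table and its rows are hypothetical, and the last query expands the question's row comparison for the row (1, NULL) by hand instead of using row-value syntax.

```python
import sqlite3

# SQLite stands in for Postgres here (assumption); the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT, birth_year INTEGER);
    INSERT INTO person VALUES ('first', 1975), ('second', 1990), ('third', NULL);
""")

# "Not born in 1990": the NULL row compares as unknown, so it is filtered out.
not_1990 = conn.execute(
    "SELECT name FROM person WHERE birth_year NOT IN (1990) ORDER BY name"
).fetchall()

# Including the unknown row has to be requested explicitly.
not_1990_or_unknown = conn.execute(
    "SELECT name FROM person "
    "WHERE birth_year IS NULL OR birth_year <> 1990 ORDER BY name"
).fetchall()

# (1, NULL) NOT IN ((1, 2), (2, 9)), written out as plain boolean logic.
expanded = conn.execute(
    "SELECT NOT ((1 = 1 AND NULL = 2) OR (1 = 2 AND NULL = 9))"
).fetchone()[0]

print(not_1990)             # only the person known not to be born in 1990
print(not_1990_or_unknown)  # that person plus the one with the missing year
print(expanded)             # None: the condition is unknown, not true
```

A WHERE clause keeps a row only when its condition is true, so a result of unknown is treated the same as false.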

Select rows so that two of the columns are separately unique

Table user_book describes every user's favorite books.
CREATE TABLE user_book (
user_id INT,
book_id INT,
FOREIGN KEY (user_id) REFERENCES user(id),
FOREIGN KEY (book_id) REFERENCES book(id)
);
insert into user_book (user_id, book_id) values
(1, 1),
(1, 2),
(1, 5),
(2, 2),
(2, 5),
(3, 2),
(3, 5);
I want to write a query (possibly a WITH clause that defines multiple statements, but not a procedure) that would try to distribute ONE favorite book to every user who has one or more favorite books.
Any ideas how to do it?
More details:
The distribution plan may be naive. i.e. it may look as if you went user after user and each time randomly gave the user whatever favorite book was still available if there was any, without considering what would be left for the remaining users.
This means that sometimes some books may not be distributed, and/or sometimes some users may not get any book (example 2). This can happen when the numbers of books and users are not equal, and/or due to the specific distribution order that you have used.
A book cannot be distributed to two different users (example 3).
Examples:
1. A possible distribution:
(1, 1)
(2, 2)
(3, 5)
2. A possible distribution (here user 3 got nothing, and book 1 was not distributed. That's acceptable):
(1, 2)
(2, 5)
3. An impossible distribution (both users 1 and 2 got book 2, that's not allowed):
(1, 2)
(2, 2)
(3, 5)
Similar questions that are not exactly this one:
How to select records without duplicate on just one field in SQL?
SQL: How do I SELECT only the rows with a unique value on certain column?
How to select unique records by SQL
The user_book table should also have a UNIQUE(user_id, book_id) constraint.
A simple solution like this returns a list in which each user gets zero or one book and each book is given to zero or one user:
WITH list AS (SELECT user_id, MIN(book_id) AS fav_book FROM user_book GROUP BY user_id)
SELECT fav_book, MIN(user_id) FROM list GROUP BY fav_book
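To see what this query does with the sample data, here is a small sketch that runs it in an in-memory SQLite database through Python's sqlite3 module (an assumption made so the example is self-contained; the foreign keys are left out because the user and book tables are not shown). The output columns are reordered to (user_id, book_id) to match the examples in the question.

```python
import sqlite3

# SQLite stand-in (assumption); foreign keys omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_book (
        user_id INT,
        book_id INT,
        UNIQUE (user_id, book_id)
    );
    INSERT INTO user_book (user_id, book_id) VALUES
        (1, 1), (1, 2), (1, 5), (2, 2), (2, 5), (3, 2), (3, 5);
""")

# Step 1: each user picks their lowest-numbered favorite book.
# Step 2: each picked book goes to the lowest-numbered user who picked it.
distribution = conn.execute("""
    WITH list AS (
        SELECT user_id, MIN(book_id) AS fav_book
        FROM user_book
        GROUP BY user_id
    )
    SELECT MIN(user_id) AS user_id, fav_book
    FROM list
    GROUP BY fav_book
    ORDER BY user_id
""").fetchall()

print(distribution)
```

With this data, users 2 and 3 both pick book 2 in the first step, so user 3 ends up with nothing and book 5 is never distributed. That is within the "naive" rules from the question, but it is a less complete distribution than example 1.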

How do I consolidate this table?

I have a problem that I will try to describe like this. I have a table in PostgreSQL like the one below
(here's what I have).
Now I'm trying to work out how to "merge" or "consolidate" this table so that it looks like this one (here's what I want to have).
The multiple rows come from having a different ID or a different value in some other column (but I don't need that information anymore, so I can get rid of it without any consequences).
Is there any function or trick that would give me the desired result?
What I have tried:
select "name"
, "array_agg" [1][1] as math_grade
, "array_agg" [2][2] as history_grade
, "array_agg" [3][3] as geography_grade
from (select "name"
, array_agg(array[math_grade,history_grade,geography_grade])
from temp1234
group by "name") as abc
Here is a example table:
create table temp1234 (id int
, name varchar(50)
, math_grade int
, history_grade int
, geography_grade int);
And example data:
insert into temp1234 values (1, 'John Smith', 3, null, null);
insert into temp1234 values (2, 'John Smith', null, 4, null);
insert into temp1234 values (3, 'John Smith', null, null, 3);
Best Regards
This will give you what you want for the sample data, but with more data you may find that this query doesn't cover everything you need. Please provide more data if you'd like more detailed help.
select min(id), name, max(math_grade), max(history_grade), max(geography_grade)
from temp1234
group by name
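The aggregate approach above can be checked with a short sketch. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption), and it adds a hypothetical second student, 'Jane Doe', with two math grades to show the kind of case where the query might not do what you need.

```python
import sqlite3

# SQLite stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp1234 (
        id INT,
        name VARCHAR(50),
        math_grade INT,
        history_grade INT,
        geography_grade INT
    );
    INSERT INTO temp1234 VALUES (1, 'John Smith', 3, NULL, NULL);
    INSERT INTO temp1234 VALUES (2, 'John Smith', NULL, 4, NULL);
    INSERT INTO temp1234 VALUES (3, 'John Smith', NULL, NULL, 3);
    -- Hypothetical extra rows: two different math grades for one student.
    INSERT INTO temp1234 VALUES (4, 'Jane Doe', 2, NULL, NULL);
    INSERT INTO temp1234 VALUES (5, 'Jane Doe', 5, NULL, NULL);
""")

# MAX ignores NULLs, so each column collapses to its non-NULL value(s).
consolidated = conn.execute("""
    SELECT MIN(id), name,
           MAX(math_grade), MAX(history_grade), MAX(geography_grade)
    FROM temp1234
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in consolidated:
    print(row)
```

For 'John Smith' the three rows merge cleanly into one. For 'Jane Doe' the query silently keeps only the higher math grade, which may or may not be the intended rule.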

H2 SQL database - INSERT if the record does not exist

I would like to initialize an H2 database, but I am not sure if the records exist. If they exist I don't want to do anything, but if they don't exist I would like to write the default values.
Something like this:
IF 'number of rows in ACCESSLEVELS' = 0
INSERT INTO ACCESSLEVELS VALUES
(0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP')
;
MERGE INTO ACCESSLEVELS
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
This updates existing rows and inserts rows that don't exist. If no key column is specified, the primary key columns are used to find the row.
If you do not name the columns, their values must be provided in the order defined in the table. If you prefer to name the columns, either to be more independent of their order in the table definition or to avoid having to provide values for all columns when that is not necessary or possible, list them explicitly:
MERGE INTO ACCESSLEVELS
(ID, LEVELNAME)
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
Note that you must include the key column ("ID" in this example) in the column list as well as in the KEY clause.
The following works for MySQL, PostgreSQL, and the H2 database:
drop table ACCESSLEVELS;
create table ACCESSLEVELS(id int, name varchar(255));
insert into ACCESSLEVELS select * from (
select 0, 'admin' union
select 1, 'SEO' union
select 2, 'sales director' union
select 3, 'manager' union
select 4, 'REP'
) x where not exists(select * from ACCESSLEVELS);
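To illustrate how the NOT EXISTS guard behaves when the statement runs more than once, here is a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption made for the sake of a self-contained example and says nothing about H2 itself; the names SEED and row_count are introduced here for illustration.

```python
import sqlite3

# SQLite stand-in (assumption); it is not one of the databases named above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCESSLEVELS (id INT, name VARCHAR(255))")

# Same statement as above: insert the defaults only if the table is empty.
SEED = """
    INSERT INTO ACCESSLEVELS SELECT * FROM (
        SELECT 0, 'admin' UNION
        SELECT 1, 'SEO' UNION
        SELECT 2, 'sales director' UNION
        SELECT 3, 'manager' UNION
        SELECT 4, 'REP'
    ) x WHERE NOT EXISTS (SELECT * FROM ACCESSLEVELS)
"""

conn.execute(SEED)  # table is empty, so the five rows are inserted
conn.execute(SEED)  # table is no longer empty, so nothing is inserted

row_count = conn.execute("SELECT COUNT(*) FROM ACCESSLEVELS").fetchone()[0]
print(row_count)
```

Note that the guard checks only whether the table is empty. If some of the default rows already exist, none of the missing ones are added, which is where the per-row approaches (MERGE, INSERT IGNORE) differ.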
To do this you can use MySQL Compatibility Mode in the H2 database. Starting with version 1.4.197, it supports the following syntax:
INSERT IGNORE INTO table_name VALUES ...
From this pull request:
INSERT IGNORE is not supported in Regular mode, you have to enable MySQL compatibility mode explicitly by appending ;MODE=MySQL to your database URL or by executing SET MODE MySQL statement.
From official site:
INSERT IGNORE is partially supported and may be used to skip rows with duplicate keys if ON DUPLICATE KEY UPDATE is not specified.
Here is another way:
CREATE TABLE target (C1 VARCHAR(255), C2 VARCHAR(255));
MERGE INTO target AS T USING (SELECT 'foo' C1, 'bar') AS S ON T.C1=S.C1
WHEN NOT MATCHED THEN
INSERT VALUES('foo', 'bar')
When a row in S matches one or more rows in T, do nothing. But when a row in S is not matched, insert it. See "MERGE USING" for more details:
https://www.h2database.com/html/commands.html#merge_using

Iterating through a social graph in a SQL database

I store simple social-graph information like so:
People ( PersonId bigint, Name nvarchar )
Relationships ( From bigint, To bigint, Title nvarchar )
So the data looks something like this:
People
1, John Smith
2, Joan Smith
3, Jack Smith
Relationships
1, 2, Spouse
1, 3, Parent
2, 3, Parent
Note that the titles of relationships are normalized: so there is no "husband" and "wife", only "spouse", which also avoids needing to create two separate relationships that form the same link, the same applies with "Parent" instead of "Son" or "Daughter".
The question is how you can iterate through an entire connected-graph (i.e. only return a single family), and, for example, find siblings without needing to create an explicit Sibling relationship entry. The nodes don't necessarily need to be returned in any particular order. I might also want to only return nodes that are at most N degrees away from a given start node.
I know you can do recursive SQL SELECT statements with some new tricks in recent SQL language versions, but this isn't necessarily a recursive operation because these relationships can express a cyclic non-directional graph (think if "Friend" was added as a relationship). How would you do that in SQL?
Very cool problem. While it's a social network graph, it is still a hierarchical problem, even though the hierarchy can logistically turn into a web of interconnections. In MSSQL you still want to use a WITH clause to do a recursive query; the only difference is that, due to the multiple interconnections, you need to ensure unique results, either with DISTINCT or by using an IN clause in the WHERE condition.
This works:
DECLARE @PersonID bigint;
SET @PersonID = 1;
WITH RecurseRelations (PersonID, OriginalPersonID)
AS
(
SELECT PersonID, PersonId OriginalPersonID
FROM People
UNION ALL
SELECT ToPersonID, RR.OriginalPersonID
FROM Relationships R
INNER JOIN
RecurseRelations RR
ON
R.FromPersonID = RR.PersonID
)
SELECT PersonId, Name
FROM People
WHERE PersonId IN
(
SELECT PersonID
FROM RecurseRelations
WHERE OriginalPersonID = @PersonID
)
Here's some test data with more relations than you had originally and a whole other family to make sure it's not picking up more than intended.
create table People ( PersonId bigint, Name nvarchar(200) );
create table Relationships ( FromPersonID bigint, ToPersonID bigint, Title nvarchar(200) );
insert into People values (1, 'John Smith');
insert into People values (2, 'Joan Smith');
insert into People values (3, 'Jack Smith');
insert into People values (4, 'Joey Smith');
insert into People values (9, 'Jaime Smith');
insert into People values (5, 'Edward Jones');
insert into People values (6, 'Emma Jones');
insert into People values (7, 'Eva Jones');
insert into People values (8, 'Eve Jones');
insert into Relationships values (1, 2, 'Spouse');
insert into Relationships values (1, 3, 'Parent');
insert into Relationships values (2, 3, 'Parent');
insert into Relationships values (3, 4, 'Child');
insert into Relationships values (2, 4, 'Child');
insert into Relationships values (4, 9, 'Child');
insert into Relationships values (5, 6, 'Spouse');
insert into Relationships values (5, 7, 'Parent');
insert into Relationships values (6, 7, 'Parent');
insert into Relationships values (5, 8, 'Child');
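The question also asks about non-directional links and about stopping at N degrees from a start node. The query above follows relationships only in the From-to-To direction, so here is a hedged sketch of one way to handle both points: treat every relationship as an edge in both directions and carry a depth counter that caps the recursion, which also keeps cycles from recursing forever. It runs on SQLite through Python's sqlite3 module rather than MSSQL (an assumption), uses part of the test data above, and the names within_degrees, edges and reach are made up for this example.

```python
import sqlite3

# SQLite stand-in for MSSQL (assumption), loaded with part of the test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PersonId INTEGER, Name TEXT);
    CREATE TABLE Relationships (FromPersonID INTEGER, ToPersonID INTEGER, Title TEXT);
    INSERT INTO People VALUES
        (1, 'John Smith'), (2, 'Joan Smith'), (3, 'Jack Smith'),
        (4, 'Joey Smith'), (9, 'Jaime Smith'),
        (5, 'Edward Jones'), (6, 'Emma Jones');
    INSERT INTO Relationships VALUES
        (1, 2, 'Spouse'), (1, 3, 'Parent'), (2, 3, 'Parent'),
        (3, 4, 'Child'), (2, 4, 'Child'), (4, 9, 'Child'),
        (5, 6, 'Spouse');
""")


def within_degrees(start_id, max_degrees):
    """Return (PersonId, Name, degrees) for everyone within max_degrees hops."""
    return conn.execute("""
        WITH RECURSIVE
        -- Every relationship counts in both directions.
        edges(a, b) AS (
            SELECT FromPersonID, ToPersonID FROM Relationships
            UNION
            SELECT ToPersonID, FromPersonID FROM Relationships
        ),
        -- UNION drops repeated (person, depth) rows; the depth cap
        -- guarantees termination even when the graph has cycles.
        reach(person_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.b, r.depth + 1
            FROM reach AS r
            JOIN edges AS e ON e.a = r.person_id
            WHERE r.depth < ?
        )
        SELECT p.PersonId, p.Name, MIN(reach.depth) AS degrees
        FROM reach
        JOIN People AS p ON p.PersonId = reach.person_id
        GROUP BY p.PersonId, p.Name
        ORDER BY p.PersonId
    """, (start_id, max_degrees)).fetchall()


print(within_degrees(9, 1))   # person 9 and their direct connections
print(within_degrees(1, 10))  # the whole Smith family, none of the Joneses
```

Passing a large max_degrees returns the whole connected family, while a small value limits the result to nearby nodes. The MIN(depth) in the final SELECT reports the shortest distance when a person can be reached by more than one path.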