Iterating through a social graph in a SQL database - sql

I store simple social-graph information like so:
People ( PersonId bigint, Name nvarchar )
Relationships ( From bigint, To bigint, Title nvarchar )
So the data looks something like this:
People
1, John Smith
2, Joan Smith
3, Jack Smith
Relationships
1, 2, Spouse
1, 3, Parent
2, 3, Parent
Note that relationship titles are normalized: there is no "Husband" or "Wife", only "Spouse", which also avoids creating two separate relationships that represent the same link. The same applies to "Parent" instead of "Son" or "Daughter".
The question is how you can iterate through an entire connected-graph (i.e. only return a single family), and, for example, find siblings without needing to create an explicit Sibling relationship entry. The nodes don't necessarily need to be returned in any particular order. I might also want to only return nodes that are at most N degrees away from a given start node.
I know you can do recursive SQL SELECT statements with some new tricks in recent SQL language versions, but this isn't necessarily a recursive operation because these relationships can express a cyclic, non-directional graph (imagine if "Friend" were added as a relationship). How would you do that in SQL?

Very cool problem. While it's a social network graph, it is still fundamentally a hierarchical problem, even though the hierarchy can in practice turn into a web of interconnections. In MSSQL you still want to use a WITH clause to do a recursive query; the only difference is that, because of the multiple interconnections, you need to ensure unique results, either with DISTINCT or by using an IN clause in the WHERE condition.
This works:
DECLARE @PersonID bigint;
SET @PersonID = 1;

WITH RecurseRelations (PersonID, OriginalPersonID)
AS
(
    -- Anchor: every person starts a traversal from themselves
    SELECT PersonId, PersonId AS OriginalPersonID
    FROM People
    UNION ALL
    -- Recursive step: follow outgoing relationship edges
    SELECT R.ToPersonID, RR.OriginalPersonID
    FROM Relationships R
    INNER JOIN RecurseRelations RR
        ON R.FromPersonID = RR.PersonID
)
SELECT PersonId, Name
FROM People
WHERE PersonId IN
(
    SELECT PersonID
    FROM RecurseRelations
    WHERE OriginalPersonID = @PersonID
);
Here's some test data with more relations than you had originally and a whole other family to make sure it's not picking up more than intended.
create table People ( PersonId bigint, Name nvarchar(200) );
create table Relationships ( FromPersonID bigint, ToPersonID bigint, Title nvarchar(200) );
insert into People values (1, 'John Smith');
insert into People values (2, 'Joan Smith');
insert into People values (3, 'Jack Smith');
insert into People values (4, 'Joey Smith');
insert into People values (9, 'Jaime Smith');
insert into People values (5, 'Edward Jones');
insert into People values (6, 'Emma Jones');
insert into People values (7, 'Eva Jones');
insert into People values (8, 'Eve Jones');
insert into Relationships values (1, 2, 'Spouse');
insert into Relationships values (1, 3, 'Parent');
insert into Relationships values (2, 3, 'Parent');
insert into Relationships values (3, 4, 'Child');
insert into Relationships values (2, 4, 'Child');
insert into Relationships values (4, 9, 'Child');
insert into Relationships values (5, 6, 'Spouse');
insert into Relationships values (5, 7, 'Parent');
insert into Relationships values (6, 7, 'Parent');
insert into Relationships values (5, 8, 'Child');
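The question also asks about limiting the traversal to nodes at most N degrees from the start node, and about non-directional relationships such as "Friend". A hedged sketch of one way to cover both (SQL Server syntax, using the column names from the test data above; the variable names and the degree cap are illustrative, not part of the original answer):
DECLARE @PersonID bigint = 1;
DECLARE @MaxDegrees int = 2;

WITH Reachable (PersonID, Depth) AS
(
    -- Anchor: the start node at depth 0
    SELECT @PersonID, 0
    UNION ALL
    -- Follow each relationship in both directions, one degree at a time.
    -- The depth cap also bounds any cycles created by symmetric edges.
    SELECT CASE WHEN R.FromPersonID = RE.PersonID
                THEN R.ToPersonID
                ELSE R.FromPersonID END,
           RE.Depth + 1
    FROM Relationships R
    INNER JOIN Reachable RE
        ON RE.PersonID IN (R.FromPersonID, R.ToPersonID)
    WHERE RE.Depth < @MaxDegrees
)
SELECT P.PersonId, P.Name
FROM People P
WHERE P.PersonId IN (SELECT PersonID FROM Reachable);
With @MaxDegrees = 1 this returns John, Joan, and Jack Smith; raising the cap eventually pulls in the rest of the Smith family but never the Jones family, and the IN filter de-duplicates the repeated visits the recursion produces.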

Related

How do I consolidate this table?

I have a problem that I will try to describe like this. I have a table in PostgreSQL (example DDL and data below).
Now I'm wrapping my head around how to "merge" or "consolidate" this table so that each name ends up on a single row with all of its grades filled in.
The multiple rows exist only because each row has a different id and a different grade column populated (I don't need that information anymore, so I can get rid of it without any consequences).
Is there any function or any trick that might bring me the desired result?
What I have tried:
select "name"
, "array_agg" [1][1] as math_grade
, "array_agg" [2][2] as history_grade
, "array_agg" [3][3] as geography_grade
from (select "name"
, array_agg(array[math_grade,history_grade,geography_grade])
from temp1234
group by "name") as abc
Here is an example table:
create table temp1234 (id int
, name varchar(50)
, math_grade int
, history_grade int
, geography_grade int);
And example data:
insert into temp1234 values (1, 'John Smith', 3, null, null);
insert into temp1234 values (2, 'John Smith', null, 4, null);
insert into temp1234 values (3, 'John Smith', null, null, 3);
Best Regards
This will give you what you want, but I suspect that with more data you will find this query doesn't cover everything you need. Please provide more data for more detailed help.
select min(id), name, max(math_grade), max(history_grade), max(geography_grade)
from temp1234
group by name
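If you would rather stay close to the array_agg idea from the question, a hedged alternative sketch (assumes PostgreSQL 9.4+ for the aggregate FILTER clause, and at most one non-null grade per subject per name, as in the sample data):
select name,
       (array_agg(math_grade)      filter (where math_grade      is not null))[1] as math_grade,
       (array_agg(history_grade)   filter (where history_grade   is not null))[1] as history_grade,
       (array_agg(geography_grade) filter (where geography_grade is not null))[1] as geography_grade
from temp1234
group by name;
Either way, the three sample rows collapse into a single 'John Smith' row with grades 3, 4 and 3.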

SAP HANA SQL Query to find all possible combinations between two columns

The target is to create all possible combinations of joining the two columns using SAP HANA SQL. Every article from the first column ('100','101','102','103') must appear in each combination of the result.
Sample Code
create table basis
(article Integer,
supplier VarChar(10) );
Insert into basis Values (100, 'A');
Insert into basis Values (101, 'A');
Insert into basis Values (101, 'B');
Insert into basis Values (101, 'C');
Insert into basis Values (102, 'D');
Insert into basis Values (103, 'B');
Result set
combination_nr;article;supplier
1;100;'A'
1;101;'A'
1;102;'D'
1;103;'B'
2;100;'A'
2;101;'B'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'C'
3;102;'D'
3;103;'B'
Now suppose we add one more row for article 102 with supplier 'A'. Then, according to the calculations below, the result set has 6 combinations (24 rows):
1;100;'A'
1;101;'A'
1;102;'A'
1;103;'B'
2;100;'A'
2;101;'A'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'B'
3;102;'A'
3;103;'B'
4;100;'A'
4;101;'B'
4;102;'D'
4;103;'B'
5;100;'A'
5;101;'C'
5;102;'A'
5;103;'B'
6;100;'A'
6;101;'C'
6;102;'D'
6;103;'B'
Calculations:
article 100: 1 supplier ('A')
article 101: 3 suppliers ('A','B','C')
article 102: 1 supplier ('D'), or 2 suppliers ('A','D') after the extra row
article 103: 1 supplier ('B')
unique articles: 4 (100,101,102,103)
Original data: 1 x 3 x 1 x 1 = 3 combinations x 4 articles = 12 rows
With the extra 102/'A' row: 1 x 3 x 2 x 1 = 6 combinations x 4 articles = 24 rows
How about:
select article,
       count(supplier) as nb_supplier,
       STRING_AGG(supplier, ',') as list_suppliers
from (select distinct article, supplier from basis)
group by article;
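For reference, the row counts from the Calculations section can also be derived in SQL as the product of the per-article supplier counts. A hedged sketch (assumes LN, EXP and ROUND are available, as in standard SQL and SAP HANA; the alias a is only there for portability):
select round(exp(sum(ln(nb_supplier)))) as nb_combinations,
       round(exp(sum(ln(nb_supplier)))) * count(*) as nb_result_rows
from (
    select article, count(distinct supplier) as nb_supplier
    from basis
    group by article
) a;
On the original sample data this gives 3 combinations and 12 rows; with the extra 102/'A' row it gives 6 and 24, matching the calculation above.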

Is it ok to DELETE/INSERT instead of DIFF/UPDATE when doing entity mapping?

Let's say I have an event, and I want to have people attending it.
When I create the event, I would do...
INSERT INTO event (eventName) VALUES ('some event'); -- eventId = 1
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES
(1, 1), -- Person 1
(1, 2), -- Person 2
(1, 3), -- Person 3
-- hundreds more...
;
Now, what if I want to remove Person 3 but add Person 7?
DELETE FROM eventPeopleMapping WHERE eventId = 1;
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES
(1, 1), -- Person 1
(1, 2), -- Person 2
(1, 7), -- Person 7
-- hundreds more...
;
Is this a good way to do it?
NOTE: This is for hundreds of people, changing often.
Comparing arrays and objects to find the differences, and then hunting for the corresponding values in the database, is too cumbersome. This seems so simple, but I don't know if I am missing something.
The only drawback I see is a TON of mapping IDs that are constantly changing.
Your method works, but it requires knowing all the people at the event. More commonly, you would delete only the row you want deleted and then insert only the new row:
DELETE FROM eventPeopleMapping
WHERE eventId = 1 AND personId = 3;
INSERT INTO eventPeopleMapping (eventId, personId)
VALUES (1, 7); -- Person 7
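If the application does hold the full desired attendee list, a hedged sketch of a set-based diff that only touches the changed rows (the list (1, 2, 7) is illustrative, and the VALUES derived-table syntax shown is SQL Server/PostgreSQL style, so it may need adjusting for your database):
-- Remove attendees that are no longer wanted
DELETE FROM eventPeopleMapping
WHERE eventId = 1
  AND personId NOT IN (1, 2, 7);

-- Add wanted attendees that are not mapped yet
INSERT INTO eventPeopleMapping (eventId, personId)
SELECT 1, wanted.personId
FROM (VALUES (1), (2), (7)) AS wanted(personId)
WHERE NOT EXISTS (
    SELECT 1
    FROM eventPeopleMapping m
    WHERE m.eventId = 1
      AND m.personId = wanted.personId
);
This keeps the mapping rows for unchanged attendees stable, which avoids the constant churn of mapping IDs mentioned in the question.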

How can I intersect two ActiveRecord::Relations on an arbitrary column?

If I have a people table with the following structure and records:
drop table if exists people;
create table people (id int, name varchar(255));
insert into people values (1, 'Amy');
insert into people values (2, 'Bob');
insert into people values (3, 'Chris');
insert into people values (4, 'Amy');
insert into people values (5, 'Bob');
insert into people values (6, 'Chris');
I'd like to find the intersection of people with ids (1, 2, 3) and (4, 5, 6) based on the name column.
In SQL, I'd do something like this:
select
group_concat(id),
group_concat(name)
from people
group by name;
Which returns this result set:
id | name
----|----------
1,4 | Amy,Amy
2,5 | Bob,Bob
3,6 | Chris,Chris
In Rails, I'm not sure how to solve this.
My closest so far is:
a = Model.where(id: [1, 2, 3])
b = Model.where(id: [4, 5, 6])
a_results = a.where(name: b.pluck(:name)).order(:name)
b_results = b.where(name: a.pluck(:name)).order(:name)
a_results.zip(b_results)
This seems to work, but I have the following reservations:
Performance - is this going to perform well in the database?
Lazy enumeration - does calling #zip break lazy enumeration of records?
Duplicates - what will happen if either set contains more than one record for a given name? What will happen if a set contains more than one of the same id?
Any thoughts or suggestions?
Thanks
You can use your normal SQL to get these aggregate columns in Ruby like so:
@people = People.select("group_concat(id) as somecolumn1, group_concat(name) as somecolumn2").group(:name)
Each record in @people will now have somecolumn1 and somecolumn2 attributes.
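A hedged usage sketch of the relation above (assumes a MySQL or SQLite backend, where group_concat exists, and that the select aliases become attribute readers as usual in ActiveRecord):
@people.each do |row|
  puts row.somecolumn1 # e.g. "1,4"
  puts row.somecolumn2 # e.g. "Amy,Amy"
end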

H2 SQL database - INSERT if the record does not exist

I would like to initialize an H2 database, but I am not sure whether the records already exist. If they exist I don't want to do anything, but if they don't exist I would like to write the default values.
Something like this:
IF 'number of rows in ACCESSLEVELS' = 0
INSERT INTO ACCESSLEVELS VALUES
(0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP')
;
MERGE INTO ACCESSLEVELS
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
This updates existing rows and inserts rows that don't exist. If no key column is specified, the primary key columns are used to find the row.
If you do not name the columns, values must be provided for every column in the order defined in the table. If you prefer to name the columns, either to be independent of their order in the table definition or to avoid providing values for columns where that is not necessary or possible, use:
MERGE INTO ACCESSLEVELS
(ID, LEVELNAME)
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
Note that you must include the key column ("ID" in this example) in the column list as well as in the KEY clause.
The following works for MySQL, PostgreSQL, and the H2 database:
drop table ACCESSLEVELS;
create table ACCESSLEVELS(id int, name varchar(255));
insert into ACCESSLEVELS select * from (
select 0, 'admin' union
select 1, 'SEO' union
select 2, 'sales director' union
select 3, 'manager' union
select 4, 'REP'
) x where not exists(select * from ACCESSLEVELS);
To do this you can use MySQL Compatibility Mode in the H2 database. Starting from version 1.4.197 it supports the following syntax:
INSERT IGNORE INTO table_name VALUES ...
From this pull request:
INSERT IGNORE is not supported in Regular mode; you have to enable MySQL compatibility mode explicitly by appending ;MODE=MySQL to your database URL or by executing the SET MODE MySQL statement.
From official site:
INSERT IGNORE is partially supported and may be used to skip rows with duplicate keys if ON DUPLICATE KEY UPDATE is not specified.
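A hedged sketch putting that together (assumes the ACCESSLEVELS table has ID as its primary key, so duplicate-key rows are skipped, and that MySQL mode is enabled as described above):
SET MODE MySQL;
INSERT IGNORE INTO ACCESSLEVELS VALUES
(0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
Without a primary key or unique index on ID there is no duplicate-key conflict to ignore, so the rows would simply be inserted again.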
Here is another way:
CREATE TABLE target (C1 VARCHAR(255), C2 VARCHAR(255));
MERGE INTO target AS T USING (SELECT 'foo' C1, 'bar' C2) AS S ON T.C1 = S.C1
WHEN NOT MATCHED THEN
    INSERT VALUES ('foo', 'bar');
When a row in S matches one or more rows in T, do nothing. But when a row in S is not matched, insert it. See "MERGE USING" for more details:
https://www.h2database.com/html/commands.html#merge_using