Find all connected groups of data (graphs) in a self joined table?

Given a self joined many to many data set, with a join table as below, how can I output the distinct graphs of connected data points along with some kind of identifier so that each cluster can be told apart?
Data:
CREATE TABLE Partner (MyID VARCHAR(1), PartnerID VARCHAR(1))
INSERT INTO Partner VALUES ('A', 'B')
INSERT INTO Partner VALUES ('B', 'A')
INSERT INTO Partner VALUES ('A', 'C')
INSERT INTO Partner VALUES ('C', 'A')
INSERT INTO Partner VALUES ('C', 'D')
INSERT INTO Partner VALUES ('D', 'C')
INSERT INTO Partner VALUES ('A', 'D')
INSERT INTO Partner VALUES ('D', 'A')
INSERT INTO Partner VALUES ('X', 'Y')
INSERT INTO Partner VALUES ('Y', 'X')
INSERT INTO Partner VALUES ('Z', 'X')
INSERT INTO Partner VALUES ('X', 'Z')
Diagram of record connections
As the linked image shows, I want A-B-C-D to be one group with its own group ID and X-Y-Z to be another group with a different ID. Ultimately I want to use this query to create a view.
Ideal Result:
MyID PartnerID GroupID
A B 1
B A 1
A C 1
C A 1
C D 1
D C 1
A D 1
D A 1
X Y 2
Y X 2
Z X 2
X Z 2
I have done a little research and found similar problems that use a recursive CTE to return groups of related nodes. However, nothing I have found works with my particular data set. The issue seems to be the circular nature of my data references, which either triggers a max recursion limit error or causes rows to be filtered out. A similar example that fails with my data is: https://www.sqlservercentral.com/Forums/FindPost520864.aspx
Can anyone point me in the right direction on this one? I may be missing something simple, but recursive CTEs are still a new and confusing concept to me.
Edit: This may simplify the problem: 2-4 nodes per group is the limit expected in this system. Anything more is invalid, and I aim to prevent those entries.
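One possible direction, offered as a minimal sketch rather than a tested SQL Server solution: let the recursive CTE collect (node, reachable node) pairs with UNION instead of UNION ALL, so that pairs already seen are discarded and the circular references cannot loop forever. The sketch below runs the idea against SQLite through Python's sqlite3 module, using the sample data from the question. The names reach, comp, seen and root are made up for illustration, and it assumes every link is stored in both directions, as in the sample. SQL Server only allows UNION ALL in the recursive part, so there the recursion would have to be bounded differently, for example with a depth counter based on the 2-4 node limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Partner (MyID VARCHAR(1), PartnerID VARCHAR(1))")
conn.executemany(
    "INSERT INTO Partner VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"),
     ("C", "D"), ("D", "C"), ("A", "D"), ("D", "A"),
     ("X", "Y"), ("Y", "X"), ("Z", "X"), ("X", "Z")],
)

# reach: every (node, reachable node) pair; UNION drops pairs that were
#        already produced, so the cycles in the data terminate.
# comp:  the smallest reachable node serves as the label of the group.
GROUP_QUERY = """
WITH RECURSIVE reach(node, seen) AS (
    SELECT MyID, MyID FROM Partner
    UNION
    SELECT r.node, p.PartnerID
    FROM reach r
    JOIN Partner p ON p.MyID = r.seen
),
comp(node, root) AS (
    SELECT node, MIN(seen) FROM reach GROUP BY node
)
SELECT p.MyID,
       p.PartnerID,
       -- number the group labels 1, 2, ... in alphabetical order
       (SELECT COUNT(DISTINCT c2.root) FROM comp c2 WHERE c2.root <= c.root) AS GroupID
FROM Partner p
JOIN comp c ON c.node = p.MyID
ORDER BY GroupID, p.MyID, p.PartnerID
"""

rows = conn.execute(GROUP_QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this assigns GroupID 1 to every row among A, B, C, D and GroupID 2 to every row among X, Y, Z.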

Related

Use extra columns in INSERT values list in ON CONFLICT UPDATE

I have an UPSERT query where I want to insert value y into column b, but if the row already exists I want to update it with value z.
INSERT INTO test (a,b)
select K.x,K.y
from (VALUES
('123', 4, 5),
('345', 2, 2)
) K(x,y,z)
ON CONFLICT (a) DO UPDATE
SET b = K.z;
How can I achieve this?
P.S.: A simple values list (without the select) did not work because the values list cannot have more columns than the insert targets.
In the SET part, you can only reference columns of the target table and the corresponding "values" through the excluded record. Neither of them has a column named z.
The only way I can think of is to put the values into a CTE and access the column z through a sub-query:
with data (x,y,z) as (
VALUES
(123, 4, 6),
(345, 2, 3)
)
INSERT INTO test (a,b)
select d1.x, d1.y
from data d1
ON CONFLICT (a) DO UPDATE
SET b = (select d2.z from data d2 where d2.x = excluded.a);
The above assumes that a is the primary (or unique) key of the table.
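The sub-query idea can be tried out locally. The following is only a sketch of an analogue in SQLite (run through Python's sqlite3 module), not the PostgreSQL statement itself: it assumes SQLite 3.24 or newer for upsert support, and it swaps the CTE for a hypothetical staging table named data. The WHERE true is there because SQLite needs a WHERE clause on the SELECT to parse ON CONFLICT unambiguously.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a TEXT PRIMARY KEY, b INTEGER)")
conn.execute("INSERT INTO test VALUES ('123', 0)")  # existing row that will conflict

# Hypothetical staging table standing in for the CTE of the answer.
conn.execute("CREATE TABLE data (x TEXT, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("123", 4, 6), ("345", 2, 3)],
)

conn.execute("""
INSERT INTO test (a, b)
SELECT d1.x, d1.y
FROM data d1
WHERE true
ON CONFLICT (a) DO UPDATE
SET b = (SELECT d2.z FROM data d2 WHERE d2.x = excluded.a)
""")

result = conn.execute("SELECT a, b FROM test ORDER BY a").fetchall()
print(result)
```

The row with a = '123' already existed, so it receives z (6); the row with a = '345' is new, so it receives y (2).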

SAP HANA SQL Query to find all possible combinations between two columns

The goal is to create all possible combinations of the two columns using SAP HANA SQL. Every article in the first column ('100','101','102','103') must appear in each combination of the result.
Sample Code
create table basis
(article Integer,
supplier VarChar(10) );
Insert into basis Values (100, 'A');
Insert into basis Values (101, 'A');
Insert into basis Values (101, 'B');
Insert into basis Values (101, 'C');
Insert into basis Values (102, 'D');
Insert into basis Values (103, 'B');
Result set
combination_nr;article;supplier
1;100;'A'
1;101;'A'
1;102;'D'
1;103;'B'
2;100;'A'
2;101;'B'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'C'
3;102;'D'
3;103;'B'
Suppose we add one more row for article 102 with supplier 'A'; the result set would then look like this.
Following the same calculation as given below (1x3x2x1 x 4), we would now have 24 result rows:
1;100;'A'
1;101;'A'
1;102;'A'
1;103;'B'
2;100;'A'
2;101;'A'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'B'
3;102;'A'
3;103;'B'
4;100;'A'
4;101;'B'
4;102;'D'
4;103;'B'
5;100;'A'
5;101;'C'
5;102;'A'
5;103;'B'
6;100;'A'
6;101;'C'
6;102;'D'
6;103;'B'
Calculations:
article 100: 1 supplier ('A')
article 101: 3 suppliers ('A','B','C')
article 102: 1 supplier ('D')
article 103: 1 supplier ('B')
unique articles: 4 (100,101,102,103)
1x3x1x1 x 4 = 12 (combination rows)
How about:
select article,
count(supplier) as nb_supplier,
STRING_AGG(supplier,',') as list_suppliers
from (select distinct article, supplier from basis)
group by article
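The query above only lists the suppliers per article; it does not yet enumerate the combinations. As a rough illustration of the requested result (in Python rather than SAP HANA SQL, so it is not a drop-in answer), the combinations are the Cartesian product of the per-article supplier lists. The function name enumerate_combinations is made up for this sketch.

```python
from itertools import product

basis = [
    (100, "A"),
    (101, "A"), (101, "B"), (101, "C"),
    (102, "D"),
    (103, "B"),
]

def enumerate_combinations(rows):
    """Return (combination_nr, article, supplier) rows, one supplier per article."""
    suppliers = {}
    for article, supplier in rows:
        suppliers.setdefault(article, set()).add(supplier)
    articles = sorted(suppliers)
    # One sorted supplier list per article; product() picks one entry from each.
    choices = product(*(sorted(suppliers[a]) for a in articles))
    result = []
    for nr, choice in enumerate(choices, start=1):
        for article, supplier in zip(articles, choice):
            result.append((nr, article, supplier))
    return result

combos = enumerate_combinations(basis)
for row in combos:
    print(";".join(str(value) for value in row))

# Adding supplier 'A' for article 102 doubles the number of combinations.
combos_extended = enumerate_combinations(basis + [(102, "A")])
```

With the original data this yields 3 combinations (12 rows); with the extra row for article 102 it yields 6 combinations (24 rows), in line with the calculation in the question.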

How to get end and start point of all lines of a hierarchical tree?

I have an SQLite DB with a bunch of different hierarchical trees (similar to the image below), where I only want to get the start and end points of each tree line.
The tree IDs (called ws_id) are shown in blue, the wanted start and end points in green, and the unwanted objects between start and end points in red.
Here is a data example with the same structure as the hierarchical tree above and a data structure similar to mine:
CREATE TABLE feat_link
(ws_id varchar(10), source_node varchar(10), target_node varchar(10));
INSERT INTO feat_link
VALUES ('b', '1', '36');
INSERT INTO feat_link
VALUES ('b', '1', '17');
INSERT INTO feat_link
VALUES ('b', '36', '21');
INSERT INTO feat_link
VALUES ('b', '2', '20');
INSERT INTO feat_link
VALUES ('b', '3', '37');
INSERT INTO feat_link
VALUES ('b', '37', '24');
As you can see, each source_node value is only matched with the next target_node value and not with the final node of a tree line. What I need is a matching (I think a recursive query) that first recognises which source_nodes are really the beginning of a tree (note that, for example, B is not expected) and which node is the last point of that line. The other value columns are not relevant.
Here is my expected result:
What we have tried so far are RECURSIVE queries. Here is an example, assuming that my data table above is called "feat_link":
WITH RECURSIVE target(x) AS (
SELECT (select 1 from feat_link)
UNION ALL
SELECT feat_link.target_node
FROM feat_link, target
WHERE feat_link.source_node=target.x
AND feat_link.source_node IS NOT NULL
and feat_link.ws_id = 'B'
)
select distinct x from target;
Do you have any ideas on how to improve the code, or even a better approach? We only sometimes get a result, and the results do not always seem to be correct.
First, enumerate all possible lines (wanted and unwanted) in the standard way:
WITH RECURSIVE lines(ws_id, source_node, target_node) AS (
-- start with all nodes that have no link to their start
SELECT ws_id, source_node, target_node
FROM feat_link
WHERE (ws_id, source_node) NOT IN (SELECT ws_id, target_node
FROM feat_link)
UNION ALL
SELECT l.ws_id, l.source_node, f.target_node
FROM feat_link f
JOIN lines l ON (f.ws_id, f.source_node) = (l.ws_id, l.target_node)
)
...
Then filter out all lines that are part of a longer line, i.e., that have a link from their end:
...
SELECT *
FROM lines
WHERE (ws_id, target_node) NOT IN (SELECT ws_id, source_node
FROM feat_link);
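Put together, the two steps can be checked against the sample data from the question. This is a small sketch using Python's sqlite3 module; it assumes an SQLite version with row-value support (3.15 or newer) and declares ws_id as text, since the sample rows store 'b' there. Note that NOT IN behaves unexpectedly if any of the compared columns contain NULL, which the sample data does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feat_link (ws_id TEXT, source_node TEXT, target_node TEXT)"
)
conn.executemany(
    "INSERT INTO feat_link VALUES (?, ?, ?)",
    [("b", "1", "36"), ("b", "1", "17"), ("b", "36", "21"),
     ("b", "2", "20"), ("b", "3", "37"), ("b", "37", "24")],
)

LINES_QUERY = """
WITH RECURSIVE lines(ws_id, source_node, target_node) AS (
    -- start with all nodes that have no link to their start
    SELECT ws_id, source_node, target_node
    FROM feat_link
    WHERE (ws_id, source_node) NOT IN (SELECT ws_id, target_node
                                       FROM feat_link)
    UNION ALL
    -- extend each line by one link, keeping the original start node
    SELECT l.ws_id, l.source_node, f.target_node
    FROM feat_link f
    JOIN lines l ON (f.ws_id, f.source_node) = (l.ws_id, l.target_node)
)
SELECT *
FROM lines
-- keep only lines whose end has no outgoing link
WHERE (ws_id, target_node) NOT IN (SELECT ws_id, source_node
                                   FROM feat_link)
ORDER BY source_node, target_node
"""

lines = conn.execute(LINES_QUERY).fetchall()
for line in lines:
    print(line)
```

For the sample rows this returns the start/end pairs 1-17, 1-21, 2-20 and 3-24; the intermediate nodes 36 and 37 no longer appear as endpoints.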

Find all rows with the same exact relations as provided in another table

Given these tables:
Table: Tests
Columns:
testID int PK
name nvarchar(128) UNIQUE NOT NULL
Table: [Test-Inputs]
Columns
inputsTableName nvarchar(128) UNIQUE PK
testID int PK FK
Temporary Table: ##TestSearchParams
Columns:
inputsTableName nvarchar(128) UNIQUE NOT NULL
I need to find Tests that have entries in Test-Inputs with inputsTableNames matching EXACTLY ALL of the entries in ##TestSearchParams; the resulting tests relationships must be exactly the ones listed in ##TestSearchParams.
Essentially I am finding tests with ONLY the given relationships, no more, no less. I am matching names with LIKE and wildcards, but that is a sidenote that I believe I can solve after the core logic is there for exact matching.
This is my current query:
Select *
From Tests As B
Where B.testID In (
Select ti
From (
Select (
Select Count(inputsTableName)
From [Test-Inputs]
Where [Test-Inputs].testID = B.testID
) - Count(Distinct i1) As delta,
ti
From (
Select [Test-Inputs].inputsTableName As i1,
[Test-Inputs].testID As ti
From ##TestSearchParams
Join [Test-Inputs]
On [Test-Inputs].inputsTableName Like ##TestSearchParams.inputsTableName
And B.testID = [Test-Inputs].testID
) As A
Group By ti
) As D
Where D.delta = 0
);
The current problem is that this seems to retrieve Tests with a match to ANY of the entries in ##TestSearchParams. I have tried several other queries before this, with varying levels of success. I have working queries for finding tests that match any of the parameters, all of the parameters, and none of the parameters -- I just can't get this query working.
Here are some sample table values:
Tests
1, Test1
2, Test2
3, Test3
[Test-Inputs]
Table1, 1
Table2, 2
Table1, 3
Table2, 3
TestSearchParams
Table1
Table2
The given values should only return (3, Test3)
Here's a possible solution that works by getting the complete set of TestInputs for each record in Tests, left-joining to the set of search parameters, and then aggregating the results by test and making two observations:
First, if a record from Tests includes a TestInput that is not among the search parameters, then that record must be excluded from the result set. We can check this by seeing if there is any case in which the left-join described above did not produce a match in the search parameters table.
Second, if a record from Tests satisfies the first condition, then we know that it doesn't have any superfluous TestInput records, so the only problem it could have is if there exists a search parameter that is not among its TestInputs. If that is so, then the number of records we've aggregated for that Test will be less than the total number of search parameters.
I have made the assumption here that you don't have Tests records with duplicate TestInputs, and that you likewise don't use duplicate search parameters. If those assumptions are not valid then this becomes more complicated. But if they are, then this ought to work:
declare @Tests table (testID int, [name] nvarchar(128));
declare @TestInputs table (testID int, inputsTableName nvarchar(128));
declare @TestSearchParams table (inputsTableName nvarchar(128));
-- Sample data.
--
-- testID 1 has only a subset of the search parameters.
-- testID 2 matches the search parameters exactly.
-- testID 3 has a superset of the search parameters.
--
-- Therefore the result set should include testID 2 only.
insert @Tests values
    (1, 'Table A'),
    (2, 'Table B'),
    (3, 'Table C');
insert @TestInputs values
    (1, 'X'),
    (2, 'X'),
    (2, 'Y'),
    (3, 'X'),
    (3, 'Y'),
    (3, 'Z');
insert @TestSearchParams values
    ('X'),
    ('Y');
declare @ParamCount int;
select @ParamCount = count(1) from @TestSearchParams;
select
    Tests.testID,
    Tests.[name]
from
    @Tests Tests
    inner join @TestInputs Inputs on Tests.testID = Inputs.testID
    left join @TestSearchParams Search on Inputs.inputsTableName = Search.inputsTableName
group by
    Tests.testID,
    Tests.[name]
having
    -- If a group includes any record where Search.inputsTableName is null, it means that
    -- the record in Tests has a TestInput that is not among the search parameters.
    sum(case when Search.inputsTableName is null then 1 else 0 end) = 0 and
    -- If a group includes fewer records than there are search parameters, it means that
    -- there exists some parameter that was not found among the Tests record's TestInputs.
    count(1) = @ParamCount;
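As a cross-check, the same HAVING logic can be run against the sample values from the question (Test1, Test2, Test3 with Table1 and Table2). This is a sketch in SQLite via Python's sqlite3 module rather than T-SQL: ordinary tables named TestInputs and TestSearchParams stand in for [Test-Inputs] and the temporary table, it uses plain equality instead of LIKE, and it relies on the same no-duplicates assumption as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tests (testID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE TestInputs (inputsTableName TEXT, testID INTEGER);
CREATE TABLE TestSearchParams (inputsTableName TEXT);

INSERT INTO Tests VALUES (1, 'Test1'), (2, 'Test2'), (3, 'Test3');
INSERT INTO TestInputs VALUES
    ('Table1', 1), ('Table2', 2), ('Table1', 3), ('Table2', 3);
INSERT INTO TestSearchParams VALUES ('Table1'), ('Table2');
""")

EXACT_MATCH_QUERY = """
SELECT t.testID, t.name
FROM Tests t
JOIN TestInputs i ON i.testID = t.testID
LEFT JOIN TestSearchParams s ON s.inputsTableName = i.inputsTableName
GROUP BY t.testID, t.name
HAVING
    -- no input of this test falls outside the search parameters
    SUM(CASE WHEN s.inputsTableName IS NULL THEN 1 ELSE 0 END) = 0
    -- and the test covers every search parameter
    AND COUNT(*) = (SELECT COUNT(*) FROM TestSearchParams)
"""

matches = conn.execute(EXACT_MATCH_QUERY).fetchall()
print(matches)
```

Only Test3 has exactly Table1 and Table2 as inputs, so it is the only row returned.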

How can I intersect two ActiveRecord::Relations on an arbitrary column?

If I have a people table with the following structure and records:
drop table if exists people;
create table people (id int, name varchar(255));
insert into people values (1, "Amy");
insert into people values (2, "Bob");
insert into people values (3, "Chris");
insert into people values (4, "Amy");
insert into people values (5, "Bob");
insert into people values (6, "Chris");
I'd like to find the intersection of people with ids (1, 2, 3) and (4, 5, 6) based on the name column.
In SQL, I'd do something like this:
select
group_concat(id),
group_concat(name)
from people
group by name;
Which returns this result set:
id | name
----|----------
1,4 | Amy,Amy
2,5 | Bob,Bob
3,6 | Chris,Chris
In Rails, I'm not sure how to solve this.
My closest so far is:
a = Model.where(id: [1, 2, 3])
b = Model.where(id: [4, 5, 6])
a_results = a.where(name: b.pluck(:name)).order(:name)
b_results = b.where(name: a.pluck(:name)).order(:name)
a_results.zip(b_results)
This seems to work, but I have the following reservations:
Performance - is this going to perform well in the database?
Lazy enumeration - does calling #zip break lazy enumeration of records?
Duplicates - what will happen if either set contains more than one record for a given name? What will happen if a set contains more than one of the same id?
Any thoughts or suggestions?
Thanks
You can use your normal sql method to get this arbitrary column in ruby like so:
@people = People.select("group_concat(id) as somecolumn1, group_concat(name) as somecolumn2").group(:name)
For each record in @people you will now have somecolumn1/2 attributes.
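If the pairs themselves are wanted rather than concatenated strings, a self-join on the name column is another option. The sketch below is plain SQL run through Python's sqlite3 module, not ActiveRecord code; it only illustrates the query shape, and the aliases a and b are chosen to mirror the two relations in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INT, name VARCHAR(255))")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Amy"), (2, "Bob"), (3, "Chris"),
     (4, "Amy"), (5, "Bob"), (6, "Chris")],
)

# Pair every row of the first id set with the rows of the second id set
# that share the same name.
PAIR_QUERY = """
SELECT a.id, b.id, a.name
FROM people a
JOIN people b ON a.name = b.name
WHERE a.id IN (1, 2, 3)
  AND b.id IN (4, 5, 6)
ORDER BY a.name, a.id, b.id
"""

pairs = conn.execute(PAIR_QUERY).fetchall()
for pair in pairs:
    print(pair)
```

If either set holds several records with the same name, this returns one row per matching pair, which makes the duplicate case from the question explicit instead of silently misaligning a zip.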