SAP HANA SQL Query to find all possible combinations between two columns - sql

The goal is to create all possible combinations of the two columns using SAP HANA SQL. Every article in the first column (100, 101, 102, 103) must appear in each combination.
Sample Code
create table basis
(article Integer,
supplier VarChar(10) );
Insert into basis Values (100, 'A');
Insert into basis Values (101, 'A');
Insert into basis Values (101, 'B');
Insert into basis Values (101, 'C');
Insert into basis Values (102, 'D');
Insert into basis Values (103, 'B');
Result set
combination_nr;article;supplier
1;100;'A'
1;101;'A'
1;102;'D'
1;103;'B'
2;100;'A'
2;101;'B'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'C'
3;102;'D'
3;103;'B'
Suppose we add one more row for article 102 with supplier 'A'. The result set then looks like this.
By the calculation given below, it now has 24 result rows (6 combinations of 4 articles each):
1;100;'A'
1;101;'A'
1;102;'A'
1;103;'B'
2;100;'A'
2;101;'A'
2;102;'D'
2;103;'B'
3;100;'A'
3;101;'B'
3;102;'A'
3;103;'B'
4;100;'A'
4;101;'B'
4;102;'D'
4;103;'B'
5;100;'A'
5;101;'C'
5;102;'A'
5;103;'B'
6;100;'A'
6;101;'C'
6;102;'D'
6;103;'B'
Calculations:
article 100: 1 supplier ('A')
article 101: 3 suppliers ('A','B','C')
article 102: 1 supplier ('D')
article 103: 1 supplier ('B')
unique articles: 4 (100,101,102,103)
1 x 3 x 1 x 1 = 3 combinations; 3 x 4 articles = 12 combination rows
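The calculation above is a Cartesian product over the supplier lists of the individual articles. A minimal sketch of that logic follows, written in Python rather than SAP HANA SQL, purely to make the expected output concrete. The function name `supplier_combinations` and the in-memory `basis` list are assumptions made for illustration.

```python
from itertools import product


def supplier_combinations(rows):
    """Return (combination_nr, article, supplier) tuples.

    Each combination picks exactly one supplier for every article.
    """
    suppliers = {}
    for article, supplier in rows:
        suppliers.setdefault(article, set()).add(supplier)

    articles = sorted(suppliers)
    choices = [sorted(suppliers[a]) for a in articles]

    result = []
    # product() yields one tuple per combination, one supplier per article.
    for nr, combo in enumerate(product(*choices), start=1):
        for article, supplier in zip(articles, combo):
            result.append((nr, article, supplier))
    return result


# Same rows as the INSERT statements in the question.
basis = [(100, 'A'), (101, 'A'), (101, 'B'), (101, 'C'), (102, 'D'), (103, 'B')]
combos = supplier_combinations(basis)

for row in combos:
    print(";".join(str(v) for v in row))
```

Adding the row (102, 'A') to `basis` doubles the number of combinations to 6, which gives the 24 rows listed in the question.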

How about:
select article,
count(supplier) as nb_supplier,
STRING_AGG(supplier,',') as list_suppliers
from (select distinct article, supplier from basis)
group by article

Related

SQL query for key value table with 1:n relation

I have a table in which I want to store images. Each image has arbitrary properties that I want to store in a key-value table.
The table structure looks like this
id | fk_picture_id | key | value
1 | 1 | camera | iphone
2 | 1 | year | 2001
3 | 1 | country | Germany
4 | 2 | camera | iphone
5 | 2 | year | 2020
6 | 2 | country | United States
Now I want a query to find all pictures taken with an iPhone. I could do something like this:
select
fk_picture_id
from
my_table
where
key = 'camera'
and
value = 'iphone';
This works without any problems. But as soon as I want to add another key to my query, I get stuck. Let's say I want all pictures taken with an iPhone in the year 2020. I cannot do something like
select
distinct(fk_picture_id)
from
my_table
where
(
key = 'camera'
and
value = 'iphone'
)
or
(
key = 'year'
and
value = '2020'
)
...because this selects the id 1, 4 and 5.
At the end I might have 20 - 30 different criteria to look for, so I don't think some sub-selects would work at the end.
I'm still in the design phase, which means I can still adjust the data model as well. But I can't think of any way to do this in a reasonable way - except to include the individual properties as columns in my main table.
A pattern you can consider here is to build a table of search parameters, then simply join this to your target table.
You would first create a temporary table with key and value columns then insert into it the search criteria values, any number of values you wish.
Using a CTE in place of a temporary table might look like:
with s as (
select 'camera' key, 'iphone' value
union all
select 'year', '2020'
)
select distinct t.fk_picture_id
from s
join t on t.key=s.key and t.value=s.value
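As written, the join returns a picture as soon as any single search row matches. To require every criterion, one option is to group by picture and compare the number of matches with the number of search rows. The sketch below runs that idea against SQLite through Python's `sqlite3` module. The table name `search_params` and the function `pictures_matching_all` are hypothetical, and it assumes that neither table contains duplicate (key, value) pairs for the same picture.

```python
import sqlite3


def pictures_matching_all(conn, criteria):
    """Return picture ids that match every (key, value) pair in criteria."""
    # Hypothetical temporary table holding the search criteria.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_params (key TEXT, value TEXT)")
    conn.execute("DELETE FROM search_params")
    conn.executemany("INSERT INTO search_params VALUES (?, ?)", criteria)
    rows = conn.execute(
        """
        SELECT t.fk_picture_id
        FROM search_params AS s
        JOIN my_table AS t ON t.key = s.key AND t.value = s.value
        GROUP BY t.fk_picture_id
        -- keep only pictures that matched as many rows as there are criteria
        HAVING COUNT(*) = (SELECT COUNT(*) FROM search_params)
        ORDER BY t.fk_picture_id
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, fk_picture_id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, "camera", "iphone"),
        (2, 1, "year", "2001"),
        (3, 1, "country", "Germany"),
        (4, 2, "camera", "iphone"),
        (5, 2, "year", "2020"),
        (6, 2, "country", "United States"),
    ],
)

both = pictures_matching_all(conn, [("camera", "iphone"), ("year", "2020")])
print(both)
```

The number of criteria can grow to 20 or 30 without changing the query, because only the contents of the search table change.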
The solution I found - thanks to this article
How to query data based on multiple 'tags' in SQL?
is that I made some changes to the database model
picture
id | name
1 | Picture 1
2 | Picture 2
And then I created a table for the tags
tag
id | tag
100 | Germany
101 | IPhone
102 | United States
And the cross table
picture_tag
fk_picture_id | fk_tag_id
1 | 100
1 | 101
2 | 101
2 | 102
For a better understanding of the datasets
Picture | Tagname
Picture 1 | Germany & Iphone
Picture 2 | United States & IPhone
Now I can use the following statement
SELECT *
FROM picture
INNER JOIN (
SELECT fk_picture_id
FROM picture_tag
WHERE fk_tag_id IN (100, 101)
GROUP BY fk_picture_id
HAVING COUNT(fk_tag_id) = 2
) AS picture_tag
ON picture.id = picture_tag.fk_picture_id;
The only thing I need to do before the query is to collect the IDs of the tags I want to search for and put the number of tags in the having count statement.
If someone needs the example data, here are the sql statements for the tables and data
create table picture (
id integer,
name char(100)
);
create table tag (
id integer,
tag char(100)
);
create table picture_tag (
fk_picture_id integer,
fk_tag_id integer
);
insert into picture values (1, 'Picture 1');
insert into picture values (2, 'Picture 2');
insert into tag values (100, 'Germany');
insert into tag values (101, 'iphone');
insert into tag values (102, 'United States');
insert into picture_tag values (1, 100);
insert into picture_tag values (1, 101);
insert into picture_tag values (2, 101);
insert into picture_tag values (2, 102);
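The two preparatory steps described above (collecting the tag IDs and passing their count to HAVING) can be wrapped in a small helper. This is a sketch against SQLite via Python's `sqlite3`, not necessarily the database used in the post. The function name `pictures_with_all_tags` is hypothetical, and it assumes tag names are unique in the tag table.

```python
import sqlite3


def pictures_with_all_tags(conn, tag_names):
    """Return (id, name) of pictures that carry every tag in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []

    # Step 1: collect the ids of the requested tags.
    marks = ",".join("?" * len(names))
    tag_ids = [row[0] for row in conn.execute(
        f"SELECT id FROM tag WHERE tag IN ({marks})", names)]
    if len(tag_ids) < len(names):
        return []  # at least one requested tag does not exist

    # Step 2: run the query from the post with the ids and their count.
    id_marks = ",".join("?" * len(tag_ids))
    return conn.execute(
        f"""
        SELECT picture.id, picture.name
        FROM picture
        INNER JOIN (
            SELECT fk_picture_id
            FROM picture_tag
            WHERE fk_tag_id IN ({id_marks})
            GROUP BY fk_picture_id
            HAVING COUNT(fk_tag_id) = ?
        ) AS matched ON picture.id = matched.fk_picture_id
        ORDER BY picture.id
        """,
        tag_ids + [len(tag_ids)],
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE picture (id INTEGER, name TEXT);
    CREATE TABLE tag (id INTEGER, tag TEXT);
    CREATE TABLE picture_tag (fk_picture_id INTEGER, fk_tag_id INTEGER);
    INSERT INTO picture VALUES (1, 'Picture 1'), (2, 'Picture 2');
    INSERT INTO tag VALUES (100, 'Germany'), (101, 'iphone'), (102, 'United States');
    INSERT INTO picture_tag VALUES (1, 100), (1, 101), (2, 101), (2, 102);
""")

print(pictures_with_all_tags(conn, ["Germany", "iphone"]))
```

Only placeholders are interpolated into the SQL text; the tag names and IDs themselves are passed as bound parameters.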

Find all connected groups of data (graphs) in a self joined table?

Given a self joined many to many data set, with join table as below. How can I output the different graphs of connected data points with some kind of filter so that each cluster can be identified?
Data:
CREATE TABLE Partner (MyID VARCHAR(1), PartnerID VARCHAR(1))
INSERT INTO Partner VALUES ('A', 'B')
INSERT INTO Partner VALUES ('B', 'A')
INSERT INTO Partner VALUES ('A', 'C')
INSERT INTO Partner VALUES ('C', 'A')
INSERT INTO Partner VALUES ('C', 'D')
INSERT INTO Partner VALUES ('D', 'C')
INSERT INTO Partner VALUES ('A', 'D')
INSERT INTO Partner VALUES ('D', 'A')
INSERT INTO Partner VALUES ('X', 'Y')
INSERT INTO Partner VALUES ('Y', 'X')
INSERT INTO Partner VALUES ('Z', 'X')
INSERT INTO Partner VALUES ('X', 'Z')
Diagram of record connections
As the linked image shows, I want A-B-C-D to form one group with its own ID and X-Y-Z another group with a different ID, so that I can ultimately use this query to create a view.
Ideal Result:
MyID PartnerID GroupID
A B 1
B A 1
A C 1
C A 1
C D 1
D C 1
A D 1
D A 1
X Y 2
Y X 2
Z X 2
X Z 2
I have done a little research and have found similar problems that use a CTE with recursion to return groups of related nodes. However, nothing that I have found works with my particular data set. The issue seems to be the circular nature of my data references, which causes either a max recursion limit error or rows being filtered out. A similar example that fails with my data is: https://www.sqlservercentral.com/Forums/FindPost520864.aspx
Can anyone point me in the right direction on this one? It seems like I could be missing something simple, but the idea of recursive CTEs is already a new and confusing concept to me.
Edit: This may simplify the problem: 2-4 nodes is the maximum group size expected in this system. Anything larger is invalid, and I aim to prevent those entries.
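One way to get past the cycles is to let the recursion discard rows it has already produced. The sketch below does that in SQLite (through Python's `sqlite3`), where a recursive CTE may use UNION instead of UNION ALL and therefore stops once no new (root, node) pair appears. This is an assumption about the engine: SQL Server requires UNION ALL in a recursive CTE, so the query does not port as is. The names `reach`, `label` and `group_key` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Partner (MyID TEXT, PartnerID TEXT)")
conn.executemany(
    "INSERT INTO Partner VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("C", "D"), ("D", "C"),
     ("A", "D"), ("D", "A"), ("X", "Y"), ("Y", "X"), ("Z", "X"), ("X", "Z")],
)

GROUP_QUERY = """
WITH RECURSIVE reach(root, node) AS (
    -- every node reaches itself
    SELECT MyID, MyID FROM Partner
    UNION  -- UNION (not UNION ALL) drops repeated pairs, so cycles terminate
    SELECT r.root, p.PartnerID
    FROM reach AS r
    JOIN Partner AS p ON p.MyID = r.node
),
label(node, group_key) AS (
    -- the smallest root that reaches a node identifies its group
    SELECT node, MIN(root) FROM reach GROUP BY node
)
SELECT p.MyID,
       p.PartnerID,
       -- number the groups 1, 2, ... in order of their key
       (SELECT COUNT(DISTINCT l2.group_key)
          FROM label AS l2
         WHERE l2.group_key <= l.group_key) AS GroupID
FROM Partner AS p
JOIN label AS l ON l.node = p.MyID
ORDER BY GroupID, p.MyID, p.PartnerID
"""

grouped = conn.execute(GROUP_QUERY).fetchall()
for row in grouped:
    print(row)
```

The sketch relies on every link being stored in both directions, as in the sample data. With one-directional rows, the recursive step would also have to follow PartnerID back to MyID.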

Find all rows with the same exact relations as provided in another table

Given these tables:
Table: Test
Columns:
testID int PK
name nvarchar(128) UNIQUE NOT NULL
Table: [Test-Inputs]
Columns
inputsTableName nvarchar(128) UNIQUE PK
testID int PK FK
Temporary Table: ##TestSearchParams
Columns:
inputsTableName nvarchar(128) UNIQUE NOT NULL
I need to find Tests that have entries in Test-Inputs with inputsTableNames matching EXACTLY ALL of the entries in ##TestSearchParams; the resulting tests relationships must be exactly the ones listed in ##TestSearchParams.
Essentially I am finding tests with ONLY the given relationships, no more, no less. I am matching names with LIKE and wildcards, but that is a sidenote that I believe I can solve after the core logic is there for exact matching.
This is my current query:
Select *
From Tests As B
Where B.testID In (
Select ti
From (
Select (
Select Count(inputsTableName)
From [Test-Inputs]
Where [Test-Inputs].testID = B.testID
) - Count(Distinct i1) As delta,
ti
From (
Select [Test-Inputs].inputsTableName As i1,
[Test-Inputs].testID As ti
From ##TableSearchParams
Join [Test-Inputs]
On [Test-Inputs].inputsTableName Like ##TableSearchParams.inputsTableName
And B.testID = [Test-Inputs].testID
) As A
Group By ti
) As D
Where D.delta = 0
);
The current problem is that this seems to retrieve Tests with a match to ANY of the entries in ##TableSearchParams. I have tried several other queries before this, with varying levels of success. I have working queries for finding tests that match any of the parameters, all of the parameters, and none of the parameters. I just can't get this query working.
Here are some sample table values:
Tests
1, Test1
2, Test2
3, Test3
[Test-Inputs]
Table1, 1
Table2, 2
Table1, 3
Table2, 3
TestSearchParams
Table1
Table2
The given values should only return (3, Test3)
Here's a possible solution that works by getting the complete set of TestInputs for each record in Tests, left-joining to the set of search parameters, and then aggregating the results by test and making two observations:
First, if a record from Tests includes a TestInput that is not among the search parameters, then that record must be excluded from the result set. We can check this by seeing if there is any case in which the left-join described above did not produce a match in the search parameters table.
Second, if a record from Tests satisfies the first condition, then we know that it doesn't have any superfluous TestInput records, so the only problem it could have is if there exists a search parameter that is not among its TestInputs. If that is so, then the number of records we've aggregated for that Test will be less than the total number of search parameters.
I have made the assumption here that you don't have Tests records with duplicate TestInputs, and that you likewise don't use duplicate search parameters. If those assumptions are not valid then this becomes more complicated. But if they are, then this ought to work:
declare @Tests table (testID int, [name] nvarchar(128));
declare @TestInputs table (testID int, inputsTableName nvarchar(128));
declare @TestSearchParams table (inputsTableName nvarchar(128));
-- Sample data.
--
-- testID 1 has only a subset of the search parameters.
-- testID 2 matches the search parameters exactly.
-- testID 3 has a superset of the search parameters.
--
-- Therefore the result set should include testID 2 only.
insert @Tests values
(1, 'Table A'),
(2, 'Table B'),
(3, 'Table C');
insert @TestInputs values
(1, 'X'),
(2, 'X'),
(2, 'Y'),
(3, 'X'),
(3, 'Y'),
(3, 'Z');
insert @TestSearchParams values
('X'),
('Y');
declare @ParamCount int;
select @ParamCount = count(1) from @TestSearchParams;
select
Tests.testID,
Tests.[name]
from
@Tests Tests
inner join @TestInputs Inputs on Tests.testID = Inputs.testID
left join @TestSearchParams Search on Inputs.inputsTableName = Search.inputsTableName
group by
Tests.testID,
Tests.[name]
having
-- If a group includes any record where Search.inputsTableName is null, it means that
-- the record in Tests has a TestInput that is not among the search parameters.
sum(case when Search.inputsTableName is null then 1 else 0 end) = 0 and
-- If a group includes fewer records than there are search parameters, it means that
-- there exists some parameter that was not found among the Tests record's TestInputs.
count(1) = @ParamCount;
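To check the same two conditions against the sample values from the question (where only Test3 should be returned), here is a minimal sketch that runs on SQLite through Python's `sqlite3` instead of SQL Server. The table names `TestInputs` and `TestSearchParams` are simplified stand-ins for [Test-Inputs] and the global temporary table, and the same no-duplicates assumption applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tests (testID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TestInputs (inputsTableName TEXT, testID INTEGER);
    CREATE TABLE TestSearchParams (inputsTableName TEXT);

    INSERT INTO Tests VALUES (1, 'Test1'), (2, 'Test2'), (3, 'Test3');
    INSERT INTO TestInputs VALUES
        ('Table1', 1), ('Table2', 2), ('Table1', 3), ('Table2', 3);
    INSERT INTO TestSearchParams VALUES ('Table1'), ('Table2');
""")

EXACT_MATCH = """
SELECT t.testID, t.name
FROM Tests AS t
JOIN TestInputs AS i ON i.testID = t.testID
LEFT JOIN TestSearchParams AS s ON s.inputsTableName = i.inputsTableName
GROUP BY t.testID, t.name
HAVING
    -- no input of this test falls outside the search parameters
    SUM(CASE WHEN s.inputsTableName IS NULL THEN 1 ELSE 0 END) = 0
    -- and the test covers as many inputs as there are parameters
    AND COUNT(*) = (SELECT COUNT(*) FROM TestSearchParams)
"""

exact = conn.execute(EXACT_MATCH).fetchall()
print(exact)
```

Test1 and Test2 each cover only one of the two parameters, so the count check removes them.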

H2 SQL database - INSERT if the record does not exist

I would like to initialize an H2 database, but I am not sure whether the records already exist. If they exist I don't want to do anything, but if they don't exist I would like to write the default values.
Something like this:
IF 'number of rows in ACCESSLEVELS' = 0
INSERT INTO ACCESSLEVELS VALUES
(0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP')
;
MERGE INTO ACCESSLEVELS
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
Updates existing rows, and insert rows that don't exist. If no key column is specified, the primary key columns are used to find the row.
If you do not name the columns, their values must be provided as defined in the table. If you prefer to name the columns to be more independent from their order in the table definition, or to avoid having to provide values for all columns when that is not necessary or possible:
MERGE INTO ACCESSLEVELS
(ID, LEVELNAME)
KEY(ID)
VALUES (0, 'admin'),
(1, 'SEO'),
(2, 'sales director'),
(3, 'manager'),
(4, 'REP');
Note that you must include the key column ("ID" in this example) in the column list as well as in the KEY clause.
The following works for MySQL, PostgreSQL, and the H2 database:
drop table ACCESSLEVELS;
create table ACCESSLEVELS(id int, name varchar(255));
insert into ACCESSLEVELS select * from (
select 0, 'admin' union
select 1, 'SEO' union
select 2, 'sales director' union
select 3, 'manager' union
select 4, 'REP'
) x where not exists(select * from ACCESSLEVELS);
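The effect of the NOT EXISTS guard is that the seed rows are written only while the table is still empty, so running the statement repeatedly is harmless. The sketch below demonstrates this with SQLite through Python's `sqlite3`. SQLite is an assumption chosen for a self-contained example, and the behaviour on H2 should be verified separately.

```python
import sqlite3

SEED = """
INSERT INTO ACCESSLEVELS
SELECT * FROM (
    SELECT 0 AS id, 'admin' AS name UNION
    SELECT 1, 'SEO' UNION
    SELECT 2, 'sales director' UNION
    SELECT 3, 'manager' UNION
    SELECT 4, 'REP'
) AS x
-- the guard is evaluated against the whole table: seed only when it is empty
WHERE NOT EXISTS (SELECT * FROM ACCESSLEVELS)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCESSLEVELS (id INT, name VARCHAR(255))")

conn.execute(SEED)  # first run: table is empty, five rows are inserted
conn.execute(SEED)  # second run: table is not empty, nothing happens

row_count = conn.execute("SELECT COUNT(*) FROM ACCESSLEVELS").fetchone()[0]
print(row_count)
```

This is an all-or-nothing check: if the table already holds even one row, none of the defaults are inserted, which differs from the per-row behaviour of MERGE.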
To do this you can use MySQL Compatibility Mode in H2 database. Starting from 1.4.197 version it supports the following syntax:
INSERT IGNORE INTO table_name VALUES ...
From this pull request:
INSERT IGNORE is not supported in Regular mode, you have to enable MySQL compatibility mode explicitly by appending ;MODE=MySQL to your database URL or by executing SET MODE MySQL statement.
From official site:
INSERT IGNORE is partially supported and may be used to skip rows with duplicate keys if ON DUPLICATE KEY UPDATE is not specified.
Here is another way:
CREATE TABLE target (C1 VARCHAR(255), C2 VARCHAR(255));
MERGE INTO target AS T USING (SELECT 'foo' C1, 'bar') AS S ON T.C1=S.C1
WHEN NOT MATCHED THEN
INSERT VALUES('foo', 'bar')
When a row in S matches one or more rows in T, do nothing. But when a row in S is not matched, insert it. See "MERGE USING" for more details:
https://www.h2database.com/html/commands.html#merge_using

Iterating through a social graph in a SQL database

I store simple social-graph information like so:
People ( PersonId bigint, Name nvarchar )
Relationships ( From bigint, To bigint, Title nvarchar )
So the data looks something like this:
People
1, John Smith
2, Joan Smith
3, Jack Smith
Relationships
1, 2, Spouse
1, 3, Parent
2, 3, Parent
Note that the titles of relationships are normalized: so there is no "husband" and "wife", only "spouse", which also avoids needing to create two separate relationships that form the same link, the same applies with "Parent" instead of "Son" or "Daughter".
The question is how you can iterate through an entire connected-graph (i.e. only return a single family), and, for example, find siblings without needing to create an explicit Sibling relationship entry. The nodes don't necessarily need to be returned in any particular order. I might also want to only return nodes that are at most N degrees away from a given start node.
I know you can do recursive SQL SELECT statements with some new tricks in recent SQL language versions, but this isn't necessarily a recursive operation because these relationships can express a cyclic non-directional graph (think if "Friend" was added as a relationship). How would you do that in SQL?
Very cool problem. While it's a social network graph, it is still a hierarchical problem, even though the hierarchy can logistically turn into a web of interconnections. In MSSQL you still want to use a WITH clause to do a recursive query, the only difference is that due to the multiple interconnections you need to ensure unique results, either with DISTINCT or by using an IN clause in the WHERE condition.
This works:
DECLARE @PersonID bigint;
SET @PersonID = 1;
WITH RecurseRelations (PersonID, OriginalPersonID)
AS
(
SELECT PersonID, PersonId OriginalPersonID
FROM People
UNION ALL
SELECT ToPersonID, RR.OriginalPersonID
FROM Relationships R
INNER JOIN
RecurseRelations RR
ON
R.FromPersonID = RR.PersonID
)
SELECT PersonId, Name
FROM People
WHERE PersonId IN
(
SELECT PersonID
FROM RecurseRelations
WHERE OriginalPersonID = @PersonID
)
Here's some test data with more relations than you had originally and a whole other family to make sure it's not picking up more than intended.
create table People ( PersonId bigint, Name nvarchar(200) );
create table Relationships ( FromPersonID bigint, ToPersonID bigint, Title nvarchar(200) );
insert into People values (1, 'John Smith');
insert into People values (2, 'Joan Smith');
insert into People values (3, 'Jack Smith');
insert into People values (4, 'Joey Smith');
insert into People values (9, 'Jaime Smith');
insert into People values (5, 'Edward Jones');
insert into People values (6, 'Emma Jones');
insert into People values (7, 'Eva Jones');
insert into People values (8, 'Eve Jones');
insert into Relationships values (1, 2, 'Spouse');
insert into Relationships values (1, 3, 'Parent');
insert into Relationships values (2, 3, 'Parent');
insert into Relationships values (3, 4, 'Child');
insert into Relationships values (2, 4, 'Child');
insert into Relationships values (4, 9, 'Child');
insert into Relationships values (5, 6, 'Spouse');
insert into Relationships values (5, 7, 'Parent');
insert into Relationships values (6, 7, 'Parent');
insert into Relationships values (5, 8, 'Child');
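The question also asks for nodes at most N degrees away and for relationships that may be cyclic and non-directional. A hedged sketch of one way to cover both follows. It runs on SQLite through Python's `sqlite3` rather than MSSQL, treats every relationship as an undirected edge, and bounds the recursion with a depth counter so that cycles cannot recurse forever. The CTE names `edges` and `walk` and the parameters `:start` and `:max_depth` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PersonId BIGINT, Name NVARCHAR(200));
    CREATE TABLE Relationships (FromPersonID BIGINT, ToPersonID BIGINT, Title NVARCHAR(200));
    INSERT INTO People VALUES
        (1, 'John Smith'), (2, 'Joan Smith'), (3, 'Jack Smith'),
        (4, 'Joey Smith'), (9, 'Jaime Smith'), (5, 'Edward Jones'),
        (6, 'Emma Jones'), (7, 'Eva Jones'), (8, 'Eve Jones');
    INSERT INTO Relationships VALUES
        (1, 2, 'Spouse'), (1, 3, 'Parent'), (2, 3, 'Parent'),
        (3, 4, 'Child'), (2, 4, 'Child'), (4, 9, 'Child'),
        (5, 6, 'Spouse'), (5, 7, 'Parent'), (6, 7, 'Parent'), (5, 8, 'Child');
""")

WITHIN_N = """
WITH RECURSIVE edges(a, b) AS (
    -- make every relationship traversable in both directions
    SELECT FromPersonID, ToPersonID FROM Relationships
    UNION
    SELECT ToPersonID, FromPersonID FROM Relationships
),
walk(PersonID, depth) AS (
    SELECT :start, 0
    UNION
    SELECT e.b, w.depth + 1
    FROM walk AS w
    JOIN edges AS e ON e.a = w.PersonID
    WHERE w.depth < :max_depth  -- the depth bound guarantees termination
)
SELECT p.PersonId, p.Name, MIN(w.depth) AS degrees
FROM walk AS w
JOIN People AS p ON p.PersonId = w.PersonID
GROUP BY p.PersonId, p.Name
ORDER BY degrees, p.PersonId
"""

near = conn.execute(WITHIN_N, {"start": 9, "max_depth": 2}).fetchall()
for row in near:
    print(row)
```

Choosing a `max_depth` at least as large as the number of people returns the whole connected family, since no shortest path can be longer than that.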