Querying based on a set of Named Attributes/Values - sql

I am working with a set of what is essentially Attribute/Value pairs (there's actually quite a bit more to this, but I'm simplifying for the sake of this question). Effectively you can think of the tables as such:
Entities (EntityID,AttributeName,AttributeValue) PK=EntityID,AttributeName
Targets (TargetID,AttributeName,AttributeValue) PK=TargetID,AttributeName
How would you query with SQL the set of EntityID,TargetID for which an Entity has all the attributes for a target as well as the corresponding value?
EDIT (DDL as requested):
CREATE TABLE Entities(
EntityID INTEGER NOT NULL,
AttributeName CHAR(50) NOT NULL,
AttributeValue CHAR(50) NOT NULL,
CONSTRAINT EntitiesPK PRIMARY KEY (EntityID,AttributeName)
);
CREATE TABLE Targets(
TargetID INTEGER NOT NULL,
AttributeName CHAR(50) NOT NULL,
AttributeValue CHAR(50) NOT NULL,
CONSTRAINT TargetsPK PRIMARY KEY (TargetID,AttributeName)
);

Okay, I think after several tries and edits, this solution finally works:
SELECT e1.EntityID, t1.TargetID
FROM Entities e1
JOIN Entities e2 ON (e1.EntityID = e2.EntityID)
CROSS JOIN Targets t1
LEFT OUTER JOIN Targets t2 ON (t1.TargetID = t2.TargetID
AND e2.AttributeName = t2.AttributeName
AND e2.AttributeValue = t2.AttributeValue)
GROUP BY e1.EntityID, t1.TargetID
HAVING COUNT(e2.AttributeValue) = COUNT(t2.AttributeValue);
Test data:
INSERT INTO Entities VALUES
-- exact same attributes, should match
(1, 'Foo1', '123'),
(1, 'Bar1', '123'),
-- same attributes but different values, should not match
(2, 'Foo2', '456'),
(2, 'Bar2', '456'),
-- more columns in Entities, should not match
(3, 'Foo3', '789'),
(3, 'Bar3', '789'),
(3, 'Baz3', '789'),
-- fewer columns in Entities, should match
(4, 'Foo4', '012'),
(4, 'Bar4', '012'),
-- same as case 1, should match Target 1
(5, 'Foo1', '123'),
(5, 'Bar1', '123'),
-- one attribute with different value, should not match
(6, 'A', 'one'),
(6, 'B', 'two');
INSERT INTO Targets VALUES
(1, 'Foo1', '123'),
(1, 'Bar1', '123'),
(2, 'Foo2', 'abc'),
(2, 'Bar2', 'abc'),
(3, 'Foo3', '789'),
(3, 'Bar3', '789'),
(4, 'Foo4', '012'),
(4, 'Bar4', '012'),
(4, 'Baz4', '012'),
(6, 'A', 'one'),
(6, 'B', 'twox');
Test results:
+----------+----------+
| EntityID | TargetID |
+----------+----------+
| 1 | 1 |
| 4 | 4 |
| 5 | 1 |
+----------+----------+
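For anyone who wants to try this without setting up a database server, here is a minimal sketch that loads the DDL and test data above into an in-memory SQLite database from Python and runs the same query. SQLite, the `matching_pairs` helper and the added ORDER BY (only there to make the row order deterministic) are my assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entities(
    EntityID INTEGER NOT NULL,
    AttributeName CHAR(50) NOT NULL,
    AttributeValue CHAR(50) NOT NULL,
    CONSTRAINT EntitiesPK PRIMARY KEY (EntityID, AttributeName)
);
CREATE TABLE Targets(
    TargetID INTEGER NOT NULL,
    AttributeName CHAR(50) NOT NULL,
    AttributeValue CHAR(50) NOT NULL,
    CONSTRAINT TargetsPK PRIMARY KEY (TargetID, AttributeName)
);
INSERT INTO Entities VALUES
    (1, 'Foo1', '123'), (1, 'Bar1', '123'),
    (2, 'Foo2', '456'), (2, 'Bar2', '456'),
    (3, 'Foo3', '789'), (3, 'Bar3', '789'), (3, 'Baz3', '789'),
    (4, 'Foo4', '012'), (4, 'Bar4', '012'),
    (5, 'Foo1', '123'), (5, 'Bar1', '123'),
    (6, 'A', 'one'), (6, 'B', 'two');
INSERT INTO Targets VALUES
    (1, 'Foo1', '123'), (1, 'Bar1', '123'),
    (2, 'Foo2', 'abc'), (2, 'Bar2', 'abc'),
    (3, 'Foo3', '789'), (3, 'Bar3', '789'),
    (4, 'Foo4', '012'), (4, 'Bar4', '012'), (4, 'Baz4', '012'),
    (6, 'A', 'one'), (6, 'B', 'twox');
""")

# Same query as above; ORDER BY is an addition for a stable result order.
QUERY = """
SELECT e1.EntityID, t1.TargetID
FROM Entities e1
JOIN Entities e2 ON (e1.EntityID = e2.EntityID)
CROSS JOIN Targets t1
LEFT OUTER JOIN Targets t2 ON (t1.TargetID = t2.TargetID
    AND e2.AttributeName = t2.AttributeName
    AND e2.AttributeValue = t2.AttributeValue)
GROUP BY e1.EntityID, t1.TargetID
HAVING COUNT(e2.AttributeValue) = COUNT(t2.AttributeValue)
ORDER BY e1.EntityID, t1.TargetID
"""


def matching_pairs(connection):
    """Return (EntityID, TargetID) pairs where every entity attribute matches."""
    return connection.execute(QUERY).fetchall()


print(matching_pairs(conn))  # [(1, 1), (4, 4), (5, 1)]
```

The HAVING clause works because COUNT skips NULLs: a group passes only when no e2 row is left without a matching t2 row.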
To respond to your comment, here is a query with the tables reversed:
SELECT e1.EntityID, t1.TargetID
FROM Targets t1
JOIN Targets t2 ON (t1.TargetID = t2.TargetID)
CROSS JOIN Entities e1
LEFT OUTER JOIN Entities e2 ON (e1.EntityID = e2.EntityID
AND t2.AttributeName = e2.AttributeName
AND t2.AttributeValue = e2.AttributeValue)
GROUP BY e1.EntityID, t1.TargetID
HAVING COUNT(e2.AttributeValue) = COUNT(t2.AttributeValue);
And here's the output, given the same input data above.
+----------+----------+
| EntityID | TargetID |
+----------+----------+
| 1 | 1 |
| 3 | 3 |
| 5 | 1 |
+----------+----------+

I like this kind of question, but I don't think it is unreasonable to expect the OP to provide at least CREATE scripts for the tables, and ideally some sample data.
I'd like to hear who agrees and who disagrees.

SELECT *
FROM (
SELECT eo.total,
(
SELECT COUNT(*)
FROM Entities e, Targets t
WHERE e.EntityID = eo.EntityID
AND t.TargetID = e.EntityID
AND t.AttributeName = e.AttributeName
AND t.AttributeValue = e.AttributeValue
) AS equal
FROM (
SELECT e.EntityID, COUNT(*) as total
FROM Entities e
GROUP BY
e.EntityID
) eo
)
WHERE total = equal

select distinct entityid,targetid
from entities ent
, targets tar
where not exists
( select attributename, AttributeValue
from targets tar2
where tar.targetid = tar2.targetid
minus
select attributename, AttributeValue
from entities ent2
where ent2.entityid = ent.entityid)
and not exists
( select attributename, AttributeValue
from entities ent2
where ent2.entityid = ent.entityid
minus
select attributename, AttributeValue
from targets tar2
where tar.targetid = tar2.targetid)
order by entityid,targetid
/
edit1:
If it is OK to have rows in the target table that have no match in the entities table, the solution simplifies to:
select distinct entityid,targetid
from entities ent
, targets tar
where not exists
( select attributename, AttributeValue
from entities ent2
where ent2.entityid = ent.entityid
minus
select attributename, AttributeValue
from targets tar2
where tar.targetid = tar2.targetid)
order by entityid,targetid
/
edit 2:
It is not easy to understand the exact requirements of the OP.
Here is a new select statement. I hope he will test all my select statements to understand the differences. I hope he has good test cases and knows what he wants.
select distinct entityid,targetid
from entities ent
, targets tar
where not exists
( select attributename, AttributeValue
from targets tar2
where tar.targetid = tar2.targetid
minus
select attributename, AttributeValue
from entities ent2
where ent2.entityid = ent.entityid)
order by entityid,targetid
/
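To compare the three variants side by side, here is a small sketch, assuming SQLite (which spells Oracle's MINUS as EXCEPT) driven from Python. The `pairs` helper, the two condition strings and the reduced data set (entities and targets 1, 3 and 4 from the test data earlier in the thread) are my own choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (entityid INTEGER, attributename TEXT, attributevalue TEXT);
CREATE TABLE targets (targetid INTEGER, attributename TEXT, attributevalue TEXT);
INSERT INTO entities VALUES
    (1, 'Foo1', '123'), (1, 'Bar1', '123'),
    (3, 'Foo3', '789'), (3, 'Bar3', '789'), (3, 'Baz3', '789'),
    (4, 'Foo4', '012'), (4, 'Bar4', '012');
INSERT INTO targets VALUES
    (1, 'Foo1', '123'), (1, 'Bar1', '123'),
    (3, 'Foo3', '789'), (3, 'Bar3', '789'),
    (4, 'Foo4', '012'), (4, 'Bar4', '012'), (4, 'Baz4', '012');
""")

# True when the target has no attribute/value pair the entity lacks.
TARGET_MINUS_ENTITY = """
    NOT EXISTS (
        SELECT attributename, attributevalue FROM targets tar2
        WHERE tar2.targetid = tar.targetid
        EXCEPT
        SELECT attributename, attributevalue FROM entities ent2
        WHERE ent2.entityid = ent.entityid)
"""

# True when the entity has no attribute/value pair the target lacks.
ENTITY_MINUS_TARGET = """
    NOT EXISTS (
        SELECT attributename, attributevalue FROM entities ent2
        WHERE ent2.entityid = ent.entityid
        EXCEPT
        SELECT attributename, attributevalue FROM targets tar2
        WHERE tar2.targetid = tar.targetid)
"""


def pairs(*conditions):
    """Run the cross join with the given NOT EXISTS conditions ANDed together."""
    sql = (
        "SELECT DISTINCT ent.entityid, tar.targetid "
        "FROM entities ent, targets tar WHERE "
        + " AND ".join(conditions)
        + " ORDER BY ent.entityid, tar.targetid"
    )
    return conn.execute(sql).fetchall()


exact_match = pairs(TARGET_MINUS_ENTITY, ENTITY_MINUS_TARGET)  # first query
entity_in_target = pairs(ENTITY_MINUS_TARGET)                  # edit1
target_in_entity = pairs(TARGET_MINUS_ENTITY)                  # edit 2
print(exact_match, entity_in_target, target_in_entity)
```

On this data the exact-match variant keeps only (1, 1), edit1 additionally accepts (4, 4), where the target has an extra attribute, and edit 2 additionally accepts (3, 3), where the entity has an extra attribute.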

Related

Rule based select with source tracking

Here is a sample dataset for which I am having trouble writing efficient SQL:
There is a target table T1 with 5 columns ID (primary key), NAME, CATEGORY, HEIGHT, LINEAGE
T1 gets data from 3 sources - source1, source2, source3
A map table defines the rule as to which column has to be picked in what order from which source
If a source has NULL value for a column, then check the next source to get the value - that's the rule
So the values for target table columns based on the rules are as below for ID = 1:
Name: A12, CATEGORY: T1, HEIGHT: 4, Lineage: S3-S1-S1
The values for target table columns based on the rules are as below for ID = 2:
NAME: B, CATEGORY: T22, HEIGHT: 5, Lineage: S3-S2-S1
The logic to merge into target should look like this:
Merge into Target T
using (select statement with rank based rules from 3 source tables) on
when matched then
when not matched then
Question: any suggestions on writing this Merge in an efficient way which also should update the Lineage in the merge?
First, the MAP table must have a column that will give priority to the mapping.
Then you should PIVOT this table.
The next step is to combine UNION ALL of all source tables.
And finally, we can join all and select our values with the FIRST_VALUE function.
Having such a result, you can substitute it in MERGE.
Structure and sample data for testing:
CREATE OR REPLACE TABLE SOURCE1 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE2 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE3 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE MAP (
PRIORITY int,
SOURCE_COLUMN string,
SOURCE_TABLE string);
INSERT INTO SOURCE1 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A', 'T1', 4),
(2, 'B', 'T2', 5),
(3, 'C', 'T3', 6);
INSERT INTO SOURCE2 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A1', 'T1', 4.4),
(2, 'B1', 'T22', 6),
(3, NULL, 'T3', 7.2);
INSERT INTO SOURCE3 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A12', 'T21', NULL),
(2, 'B', NULL, 6),
(3, 'C3', 'T3', NULL);
INSERT INTO MAP (PRIORITY, SOURCE_COLUMN, SOURCE_TABLE)
VALUES (1, 'NAME', 'SOURCE3'),
(2, 'NAME', 'SOURCE1'),
(3, 'NAME', 'SOURCE2'),
(1, 'CATEGORY', 'SOURCE2'),
(2, 'CATEGORY', 'SOURCE3'),
(3, 'CATEGORY', 'SOURCE1'),
(1, 'HEIGHT', 'SOURCE1'),
(2, 'HEIGHT', 'SOURCE2'),
(3, 'HEIGHT', 'SOURCE3');
And my suggestion for a solution:
WITH _MAP AS (
SELECT *
FROM MAP
PIVOT (MAX(SOURCE_TABLE) FOR SOURCE_COLUMN IN ('NAME', 'CATEGORY', 'HEIGHT')) AS p(PRIORITY, NAME, CATEGORY, HEIGHT)
), _SRC AS (
SELECT 'SOURCE1' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE1
UNION ALL
SELECT 'SOURCE2' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE2
UNION ALL
SELECT 'SOURCE3' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE3
)
SELECT DISTINCT _SRC.ID,
FIRST_VALUE(_SRC.NAME) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) AS NAME,
FIRST_VALUE(_SRC.CATEGORY) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) AS CATEGORY,
FIRST_VALUE(_SRC.HEIGHT) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY) AS HEIGHT,
REPLACE(FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY), 'SOURCE', 'S') AS LINEAGE
FROM _SRC
LEFT JOIN _MAP AS MN ON _SRC.SOURCE_TABLE = MN.NAME AND _SRC.NAME IS NOT NULL
LEFT JOIN _MAP AS MC ON _SRC.SOURCE_TABLE = MC.CATEGORY AND _SRC.CATEGORY IS NOT NULL
LEFT JOIN _MAP AS MH ON _SRC.SOURCE_TABLE = MH.HEIGHT AND _SRC.HEIGHT IS NOT NULL;
Result:
+----+------+----------+--------+----------+
| ID | NAME | CATEGORY | HEIGHT | LINEAGE |
+----+------+----------+--------+----------+
| 1 | A12 | T1 | 4 | S3-S2-S1 |
| 2 | B | T22 | 5 | S3-S2-S1 |
| 3 | C3 | T3 | 6 | S3-S2-S1 |
+----+------+----------+--------+----------+
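If the rule itself is unclear, it may help to see it outside SQL. The following is only a sketch in plain Python of "take the first non-NULL value in priority order and remember where it came from", using the sample data and MAP priorities from this answer. The names `SOURCES`, `PRIORITY` and `resolve`, and the "?" marker for a column that is NULL in every source, are made up for illustration.

```python
# Source rows keyed by source name, then by ID (sample data from above).
SOURCES = {
    "SOURCE1": {
        1: {"NAME": "A", "CATEGORY": "T1", "HEIGHT": 4},
        2: {"NAME": "B", "CATEGORY": "T2", "HEIGHT": 5},
        3: {"NAME": "C", "CATEGORY": "T3", "HEIGHT": 6},
    },
    "SOURCE2": {
        1: {"NAME": "A1", "CATEGORY": "T1", "HEIGHT": 4.4},
        2: {"NAME": "B1", "CATEGORY": "T22", "HEIGHT": 6},
        3: {"NAME": None, "CATEGORY": "T3", "HEIGHT": 7.2},
    },
    "SOURCE3": {
        1: {"NAME": "A12", "CATEGORY": "T21", "HEIGHT": None},
        2: {"NAME": "B", "CATEGORY": None, "HEIGHT": 6},
        3: {"NAME": "C3", "CATEGORY": "T3", "HEIGHT": None},
    },
}

# MAP table: for each column, the sources in priority order.
PRIORITY = {
    "NAME": ["SOURCE3", "SOURCE1", "SOURCE2"],
    "CATEGORY": ["SOURCE2", "SOURCE3", "SOURCE1"],
    "HEIGHT": ["SOURCE1", "SOURCE2", "SOURCE3"],
}


def resolve(row_id):
    """Pick, per column, the first non-None value by priority and record its source."""
    result = {"ID": row_id}
    lineage = []
    for column, sources in PRIORITY.items():
        result[column] = None
        tag = "?"  # hypothetical marker: no source had a value for this column
        for source in sources:
            value = SOURCES[source].get(row_id, {}).get(column)
            if value is not None:
                result[column] = value
                tag = source.replace("SOURCE", "S")
                break
        lineage.append(tag)
    result["LINEAGE"] = "-".join(lineage)
    return result


for row_id in (1, 2, 3):
    print(resolve(row_id))
```

This reproduces the rows of the result table above, which is a quick way to sanity-check the MAP priorities before wiring the SELECT into the MERGE.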

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can I select the top count of TABLE1 (most used departure id and most used arrival id) and group it with the respective ICAO code from TABLE2, so that from the provided example data I get:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what I try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
In SQL Server Management Studio, click the "Include Actual Execution Plan" button on the toolbar (or press Ctrl+M) before you execute the query to include the actual execution plan.
In the graphical execution plan, each icon represents an operation performed by the SQL Server engine, and the arrows represent data flows. Data flows from right to left, so the result is the leftmost icon.
try this one:
select
(select icao
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select icao
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures
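If you don't have SQL Server at hand, the two-subquery idea can be checked with a quick sketch in SQLite from Python. Note the assumptions: SQLite has no TOP, so LIMIT 1 is used instead, and the table and column names follow the dbo.Icao / dbo.Flight example above without the schema prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Icao (
    id INTEGER NOT NULL PRIMARY KEY,
    icao TEXT NOT NULL
);
CREATE TABLE Flight (
    dep_id INTEGER NOT NULL REFERENCES Icao(id),
    arr_id INTEGER NOT NULL REFERENCES Icao(id)
);
INSERT INTO Icao (id, icao) VALUES
    (1, 'LPPT'), (2, 'LPFR'), (3, 'LPMA'), (4, 'LPPR'),
    (5, 'LLGB'), (6, 'LEPA'), (7, 'LEMD');
INSERT INTO Flight (dep_id, arr_id) VALUES
    (1, 2), (6, 2), (6, 2), (6, 2), (6, 2), (3, 2),
    (3, 2), (3, 2), (3, 4), (3, 4), (3, 6), (3, 6);
""")

# One scalar subquery per column; LIMIT 1 plays the role of TOP 1.
QUERY = """
SELECT
    (SELECT I.icao
     FROM Flight AS F
     INNER JOIN Icao AS I ON I.id = F.arr_id
     GROUP BY I.icao
     ORDER BY COUNT(*) DESC
     LIMIT 1) AS most_arrivals,
    (SELECT I.icao
     FROM Flight AS F
     INNER JOIN Icao AS I ON I.id = F.dep_id
     GROUP BY I.icao
     ORDER BY COUNT(*) DESC
     LIMIT 1) AS most_departures
"""

most_arrivals, most_departures = conn.execute(QUERY).fetchone()
print(most_arrivals, most_departures)  # LPFR LPMA
```

If two airports tie for first place, which one is returned is not defined by this query; add a tie-breaker to the ORDER BY if that matters.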

query to count number of unique relations

I have 3 tables:
t_user (id, name)
t_user_deal (id, user_id, deal_id)
t_deal (id, title)
Multiple users can be linked to the same deal. (I'm using Oracle, but it should be similar; I can adapt it.)
How can I get all the users (name), each with the number of unique users he made a deal with?
Let me explain with some data:
t_user:
id, name
1, joe
2, mike
3, John
t_deal:
id, title
1, deal number 1
2, deal number 2
t_user_deal:
id, user_id, deal_id
1, 1, 1
2, 2, 1
3, 1, 2
4, 3, 2
the result I expect:
user_name, number of unique user he made a deal with
Joe, 2
Mike, 1
John, 1
I've tried this, but I didn't get the expected result:
SELECT tu.name,
count(tu.id) AS nbRelations
FROM t_user tu
INNER JOIN t_user_deal tud ON tu.id = tud.user_id
INNER JOIN t_deal td ON tud.deal_id = td.id
WHERE
(
td.id IN
(
SELECT DISTINCT td.id
FROM t_user_deal tud2
INNER JOIN t_deal td2 ON tud2.deal_id = td2.id
WHERE tud.id <> tud2.user_id
)
)
GROUP BY tu.id
ORDER BY nbRelations DESC
thanks for your help
This should get you the result
SELECT id1, count(id2),name
FROM (
SELECT distinct tud1.user_id id1 , tud2.user_id id2
FROM t_user_deal tud1, t_user_deal tud2
WHERE tud1.deal_id = tud2.deal_id
and tud1.user_id <> tud2.user_id) tab, t_user tu
WHERE tu.id = id1
GROUP BY id1,name
Something like
select name, NVL (i.ud, 0) ud from t_user join (
SELECT user_id, count(*) ud from t_user_deal group by user_id) i on t_user.id = i.user_id
where i.ud > 0
Unless I'm missing something here. It actually sounds like your question references having a second user in the t_user_deal table. The model you've described here doesn't include that.
PostgreSQL example:
create table t_user (id int, name varchar(255)) ;
create table t_deal (id int, title varchar(255)) ;
create table t_user_deal (id int, user_id int, deal_id int) ;
insert into t_user values (1, 'joe'), (2, 'mike'), (3, 'john') ;
insert into t_deal values (1, 'deal 1'), (2, 'deal 2') ;
insert into t_user_deal values (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2) ;
And the query.....
SELECT
name, COUNT(DISTINCT deal_id)
FROM
t_user INNER JOIN t_user_deal ON (t_user.id = t_user_deal.user_id)
GROUP BY
user_id, name ;
The DISTINCT might not be necessary (in the COUNT(), that is). Depends on how clean your data is (e.g., no duplicate rows!)
Here's the result in PostgreSQL:
name | count
------+-------
joe | 2
mike | 1
john | 1
(3 rows)
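Since the question asks for the number of distinct other users rather than the number of deals, here is a sketch of the self-join idea (t_user_deal joined to itself on deal_id) that counts partners directly. It uses SQLite from Python purely for convenience; the aliases `a` and `b` and the `partners` column name are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_user (id INTEGER, name TEXT);
CREATE TABLE t_deal (id INTEGER, title TEXT);
CREATE TABLE t_user_deal (id INTEGER, user_id INTEGER, deal_id INTEGER);
INSERT INTO t_user VALUES (1, 'joe'), (2, 'mike'), (3, 'john');
INSERT INTO t_deal VALUES (1, 'deal 1'), (2, 'deal 2');
INSERT INTO t_user_deal VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2);
""")

# a = the user's own deal rows, b = other users' rows on the same deal.
QUERY = """
SELECT u.name, COUNT(DISTINCT b.user_id) AS partners
FROM t_user u
JOIN t_user_deal a ON a.user_id = u.id
JOIN t_user_deal b ON b.deal_id = a.deal_id
                  AND b.user_id <> a.user_id
GROUP BY u.id, u.name
ORDER BY u.id
"""

partner_counts = conn.execute(QUERY).fetchall()
print(partner_counts)  # [('joe', 2), ('mike', 1), ('john', 1)]
```

COUNT(DISTINCT b.user_id) means two users who share several deals still count once for each other. Users who never shared a deal are dropped by the inner joins; switch to LEFT JOINs if they should appear with 0.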

Find rows with same ID and have a particular set of names

EDIT:
I have a table with 3 columns like so.
ID NAME REV
1 A 0
1 B 0
1 C 0
2 A 1
2 B 0
2 C 0
3 A 1
3 B 1
I want to find the ID which has a particular set of names and for which the REV is the same across them.
example:
Edit2: GBN's solution would have worked perfectly, but I do not have access to create new tables, so the added constraint is that no new tables can be created.
if input = A,B then output is 3
if input = A ,B,C then output is 1 and not 1,2 since the rev level differs in 2.
The simplest way is to compare a COUNT per ID with the number of elements in your list:
SELECT
ID
FROM
MyTable
WHERE
NAME IN ('A', 'B', 'C')
GROUP BY
ID
HAVING
COUNT(*) = 3;
Note: ORDER BY isn't needed and goes after the HAVING if needed
Edit, with question update. In MySQL, it's easier to use a separate table for search terms
DROP TABLE IF EXISTS gbn;
CREATE TABLE gbn (ID INT, `name` VARCHAR(100), REV INT);
INSERT gbn VALUES (1, 'A', 0);
INSERT gbn VALUES (1, 'B', 0);
INSERT gbn VALUES (1, 'C', 0);
INSERT gbn VALUES (2, 'A', 1);
INSERT gbn VALUES (2, 'B', 0);
INSERT gbn VALUES (2, 'C', 0);
INSERT gbn VALUES (3, 'A', 0);
INSERT gbn VALUES (3, 'B', 0);
DROP TABLE IF EXISTS gbn1;
CREATE TABLE gbn1 ( `name` VARCHAR(100));
INSERT gbn1 VALUES ('A');
INSERT gbn1 VALUES ('B');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
INSERT gbn1 VALUES ('C');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
Edit 2, without extra table, use a derived (inline) table:
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
(SELECT 'A' AS `name`
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
) gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = 3 -- matches number of elements in gbn1 derived table
AND MIN(gbn.REV) = MAX(gbn.REV);
Similar to gbn, but allowing for the possibility of duplicate ID/Name combinations:
SELECT ID
FROM MyTable
WHERE NAME IN ('A', 'B', 'C')
GROUP BY ID
HAVING COUNT(DISTINCT NAME) = 3;
Okay, I solved my problem! I modified gbn's logic to do it without a search table, using the IN clause.
One flaw with checking MAX(REV) = MIN(REV) shows up with data like this:
ID NAME REV
1 A 0
1 B 1
1 A 1
When I use a query like
SELECT ID FROM MyTable
WHERE NAME IN ('A', 'B')
GROUP BY ID
HAVING COUNT(*) = 2
AND MIN(REV) = MAX(REV);
it will not return ID 1, because the min and max differ and the count is 3.
So I simply added another column to the GROUP BY, and the final query is
SELECT ID FROM MyTable
WHERE NAME IN ('A', 'B')
GROUP BY ID, REV
HAVING COUNT(*) = 2
AND MIN(REV) = MAX(REV);
Thanks to all who helped!
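For the question as originally posed (an ID whose names are exactly the input set, all at the same REV), here is a sketch that needs no extra table. It runs on SQLite from Python with the sample rows from the question. The helper name `ids_with_exact_names` is made up, and the sketch assumes that (ID, NAME) pairs are unique and that the input names are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (ID INTEGER, NAME TEXT, REV INTEGER);
INSERT INTO MyTable VALUES
    (1, 'A', 0), (1, 'B', 0), (1, 'C', 0),
    (2, 'A', 1), (2, 'B', 0), (2, 'C', 0),
    (3, 'A', 1), (3, 'B', 1);
""")


def ids_with_exact_names(connection, names):
    """Return IDs whose NAME set equals `names` and whose REV values all agree."""
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT ID
        FROM MyTable
        GROUP BY ID
        HAVING COUNT(*) = ?                          -- no extra names
           AND SUM(NAME IN ({placeholders})) = ?     -- every wanted name present
           AND MIN(REV) = MAX(REV)                   -- one REV level only
        ORDER BY ID
    """
    params = [len(names), *names, len(names)]
    return [row[0] for row in connection.execute(sql, params)]


print(ids_with_exact_names(conn, ["A", "B"]))       # [3]
print(ids_with_exact_names(conn, ["A", "B", "C"]))  # [1]
```

SUM(NAME IN (...)) relies on SQLite evaluating the comparison to 0 or 1; on other engines a CASE WHEN ... THEN 1 ELSE 0 END inside the SUM does the same job.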

Am I using GROUP_CONCAT properly?

I'm selecting properties and joining them to mapping tables where they get mapped to filters such as location, destination, and property type.
My goal is to grab all the properties and then LEFT JOIN them to the tables, and then basically get data that shows all the locations, destinations a property is attached to and the property type itself.
Here's my query:
SELECT p.slug AS property_slug,
p.name AS property_name,
p.founder AS founder,
IF (p.display_city != '', display_city, city) AS city,
d.name AS state,
type
GROUP_CONCAT( CONVERT(subcategories_id, CHAR(8)) ) AS foo,
GROUP_CONCAT( CONVERT(categories_id, CHAR(8)) ) AS bah
FROM properties AS p
LEFT JOIN destinations AS d ON d.id = p.state
LEFT JOIN regions AS r ON d.region_id = r.id
LEFT JOIN properties_subcategories AS sc ON p.id = sc.properties_id
LEFT JOIN categories_subcategories AS c ON c.subcategory_id = sc.subcategories_id
WHERE 1 = 1
AND p.is_active = 1
GROUP BY p.id
Before I do the GROUP BY and GROUP_CONCAT my data looks like this:
id name type category_id subcategory_id state
--------------------------------------------------------------------------
1 The Hilton Hotel 1 1 2 7
1 The Hilton Hotel 1 1 3 7
1 The BlaBla Resort 2 2 5 7
After the GROUP BY and GROUP_CONCAT it becomes...
id name type category_id subcategory_id state
--------------------------------------------------------------------------
1 The Hilton Hotel 1 1, 1 2, 3 7
1 The BlaBla Resort 2 1 3 7
Is this the preferred way of grabbing all the possible mappings for the property in one go, with GROUP_CONCAT into a CSV like this?
Using this data, I can render something like...
<div class="property" categories="1" subcategories="2,3">
<h2>{property_name}</h2>
<span>{property_location}</span>
</div>
Then I use JavaScript to show or hide properties: if the user clicks an anchor that has, say, a subcategory="2" attribute, it hides each .property that doesn't have 2 in its subcategories attribute value.
I believe you want something like this:
CREATE TABLE property (id INT NOT NULL PRIMARY KEY, name TEXT);
INSERT
INTO property
VALUES
(1, 'Hilton'),
(2, 'Astoria');
CREATE TABLE category (id INT NOT NULL PRIMARY KEY, property INT NOT NULL);
INSERT
INTO category
VALUES
(1, 1),
(2, 1),
(3, 2);
CREATE TABLE subcategory (id INT NOT NULL PRIMARY KEY, category INT NOT NULL);
INSERT
INTO subcategory
VALUES
(1, 1),
(2, 1),
(3, 2),
(5, 3),
(6, 3),
(7, 3);
SELECT id, name,
CONCAT(
'{',
(
SELECT GROUP_CONCAT(
'"', c.id, '": '
'[',
(
SELECT GROUP_CONCAT(sc.id ORDER BY sc.id SEPARATOR ', ' )
FROM subcategory sc
WHERE sc.category = c.id
),
']' ORDER BY c.id SEPARATOR ', ')
FROM category c
WHERE c.property = p.id
), '}')
FROM property p;
which would output this:
1 Hilton {"1": [1, 2], "2": [3]}
2 Astoria {"3": [5, 6, 7]}
The last field is a properly formed JSON which maps category id's to the arrays of subcategory id's.
You should add DISTINCT, and possibly ORDER BY:
GROUP_CONCAT(DISTINCT CONVERT(subcategories_id, CHAR(8))
ORDER BY subcategories_id) AS foo,
GROUP_CONCAT(DISTINCT CONVERT(categories_id, CHAR(8))
ORDER BY categories_id) AS bah
It's "de-normalized", if you want to call it that. Whether that's the best representation for rendering is another question, but I think it's fine. Some may say it's a hack, but I guess it's not too bad.
By the way, a comma seems to be missing after the "type".
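To see why DISTINCT matters here, a small sketch: the join fans out one row per subcategory, so the category id repeats in the plain aggregate. This uses SQLite's group_concat from Python rather than MySQL's GROUP_CONCAT, so the MySQL-only ORDER BY inside the aggregate is left out. The cut-down tables `props`, `prop_sub` and `cat_sub` are hypothetical stand-ins for the question's schema, and giving the second property id 2 is my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE props (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE prop_sub (prop_id INTEGER, sub_id INTEGER);
CREATE TABLE cat_sub (cat_id INTEGER, sub_id INTEGER);
INSERT INTO props VALUES (1, 'The Hilton Hotel'), (2, 'The BlaBla Resort');
INSERT INTO prop_sub VALUES (1, 2), (1, 3), (2, 5);
INSERT INTO cat_sub VALUES (1, 2), (1, 3), (2, 5);
""")

QUERY = """
SELECT p.name,
       group_concat(c.cat_id)           AS cats_raw,  -- repeats per joined row
       group_concat(DISTINCT c.cat_id)  AS cats,
       group_concat(DISTINCT ps.sub_id) AS subs
FROM props AS p
LEFT JOIN prop_sub AS ps ON ps.prop_id = p.id
LEFT JOIN cat_sub AS c ON c.sub_id = ps.sub_id
GROUP BY p.id, p.name
"""

# Map each property name to its (cats_raw, cats, subs) strings.
by_name = {name: rest for name, *rest in conn.execute(QUERY)}
print(by_name)
```

For the Hilton row the plain aggregate yields "1,1" while the DISTINCT one yields "1". The order of the subcategory ids is not guaranteed in SQLite, which is exactly what the ORDER BY inside MySQL's GROUP_CONCAT takes care of.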