Selecting the two most common attribute pairings from an Entity-Attribute Table? - sql

I have a simple Entity-Attribute table in my database that records whether an Entity has some Attribute: a row (Entity, Attribute) exists exactly when it does.
I want to find out, among all the Entities with exactly two Attributes, which Attribute pairs are the most common.
For example, if my table looked like:
+--------+-----------+
| Entity | Attribute |
+--------+-----------+
| Bob | A |
| Sally | B |
| Terry | C |
| Bob | B |
| Sally | A |
| Terry | D |
| Larry | C |
+--------+-----------+
I would want it to return
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| C | D | 1 |
+-------------+-------------+-------+
I currently have a short query that looks like:
WITH TwoAtts AS (
SELECT entity
FROM entity_attribute
GROUP BY entity
HAVING COUNT(attribute) = 2
)
SELECT t1.attribute, t2.attribute, COUNT(t1.entity)
FROM entity_attribute t1
JOIN entity_attribute t2
ON t1.entity = t2.entity
WHERE t1.entity IN (SELECT entity FROM TwoAtts)
AND t1.attribute != t2.attribute
GROUP BY t1.attribute, t2.attribute
ORDER BY COUNT(t1.entity) DESC
but it only produces "duplicate" results like:
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| B | A | 2 |
| D | C | 1 |
| C | D | 1 |
+-------------+-------------+-------+
In a sense, I would like to run an unordered DISTINCT / set operator over the two attribute columns, but I am not sure how to achieve this in SQL.

Hmmm, I think you want two levels of aggregation, with some filtering:
select attribute_1, attribute_2, count(*)
from (select min(ea.attribute) as attribute_1, max(ea.attribute) as attribute_2
from entity_attribute ea
group by entity
having count(*) = 2
) aa
group by attribute_1, attribute_2;
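To try the answer locally, here is a small sketch that runs the same query against the sample data. It uses Python's built-in sqlite3 module; SQLite is an assumption, since the question doesn't name a database. The table name entity_attribute comes from the answer, and the ORDER BY is an addition so that the most common pair comes first. HAVING COUNT(*) = 2 assumes there are no duplicate (entity, attribute) rows. If duplicates are possible, COUNT(DISTINCT attribute) = 2 would be safer.

```python
import sqlite3

def most_common_pairs(rows):
    """Count unordered attribute pairs among entities with exactly two attributes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entity_attribute (entity TEXT, attribute TEXT)")
    con.executemany("INSERT INTO entity_attribute VALUES (?, ?)", rows)
    # Inner query: one row per two-attribute entity, with the pair in canonical
    # (smaller, larger) order so that (A, B) and (B, A) collapse together.
    # Outer query: count how many entities share each canonical pair.
    query = """
        SELECT attribute_1, attribute_2, COUNT(*) AS cnt
        FROM (SELECT MIN(ea.attribute) AS attribute_1,
                     MAX(ea.attribute) AS attribute_2
              FROM entity_attribute ea
              GROUP BY ea.entity
              HAVING COUNT(*) = 2) aa
        GROUP BY attribute_1, attribute_2
        ORDER BY cnt DESC, attribute_1, attribute_2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

sample = [("Bob", "A"), ("Sally", "B"), ("Terry", "C"), ("Bob", "B"),
          ("Sally", "A"), ("Terry", "D"), ("Larry", "C")]
print(most_common_pairs(sample))  # [('A', 'B', 2), ('C', 'D', 1)]
```

Taking MIN and MAX per entity is what makes the pair unordered: both (A, B) and (B, A) are reported as (A, B). Larry has only one attribute, so the HAVING clause drops him.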

Related

Joining a table on two columns only joins it on a single column

How do I correctly join a table on two columns? My issue is that the result is not correct, as it only joins on a single column.
This question started off as this other question: SQL query returns product of results instead of sum. I am creating a new question because there is another issue I am trying to solve.
I join a table of materials on a table which contains multiple supply and disposal movements. Each movement references a material id. I would like to join the material on each movement.
My query:
SELECT supply_material_refer, disposal_material_refer, material_id, material_name
FROM "construction_sites"
JOIN projects ON construction_sites.project_refer = projects.project_id
JOIN addresses ON construction_sites.address_refer = addresses.address_id
cross join lateral ( select *
from (select row_number() over () as rn, *
from supplies
where supplies.supply_project_refer = projects.project_id) as supplies
full join (select row_number() over () as rn, *
from disposals
where disposals.disposal_project_refer = projects.project_id
) as disposals
on (supplies.rn = disposals.rn)
) as combined
LEFT JOIN materials material ON combined.disposal_material_refer = material.material_id
OR combined.supply_material_refer = material.material_id
WHERE (projects.project_name = 'Project 15')
ORDER BY construction_site_id asc;
The result of the query:
+-----------------------+-------------------------+-------------+---------------+
| supply_material_refer | disposal_material_refer | material_id | material_name |
+-----------------------+-------------------------+-------------+---------------+
| 1 | 1 | 1 | Materialtest |
| 2 | 1 | 1 | Materialtest |
| 2 | 1 | 2 | Dirt |
| 1 | 1 | 1 | Materialtest |
| 2 | 1 | 1 | Materialtest |
| 2 | 1 | 2 | Dirt |
| 1 | (null) | 1 | Materialtest |
| 4 | (null) | 4 | Stones |
+-----------------------+-------------------------+-------------+---------------+
An example line I have issues with:
+------------------------+-------------------------+-------------+---------------+
| supply_material_refer | disposal_material_refer | material_id | material_name |
+------------------------+-------------------------+-------------+---------------+
| 2 | 1 | 1 | Materialtest |
+------------------------+-------------------------+-------------+---------------+
A preferred output would look like this:
+------------------------+----------------------+-------------------------+------------------------+
| supply_material_refer | supply_material_name | disposal_material_refer | disposal_material_name |
+------------------------+----------------------+-------------------------+------------------------+
| 2 | Dirt | 1 | Materialtest |
+------------------------+----------------------+-------------------------+------------------------+
I have created a sqlfiddle with dummy data: http://www.sqlfiddle.com/#!17/863d78/2
To my understanding, the solution would be to have separate supply_material and disposal_material columns for the material names. I do not know how to achieve this, though...
Thanks for any help!
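The usual way to get separate supply and disposal names is to join materials twice, once per reference column, each under its own alias. The single join with OR matches a material row for either reference, so a movement like (2, 1) comes back once per matching material. Below is a minimal sketch of the two-alias approach. It assumes a plain table named combined stands in for the lateral subquery, filled with illustrative rows taken from the output above. It runs on SQLite through Python's sqlite3. In the PostgreSQL query, the two LEFT JOINs would replace the single LEFT JOIN ... OR ... clause.

```python
import sqlite3

def movements_with_names():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE materials (material_id INTEGER PRIMARY KEY, material_name TEXT)")
    # Hypothetical stand-in for the 'combined' lateral subquery: one row per
    # supply/disposal pairing, with rn preserving the original order.
    con.execute("""CREATE TABLE combined (
        rn INTEGER PRIMARY KEY,
        supply_material_refer INTEGER,
        disposal_material_refer INTEGER)""")
    con.executemany("INSERT INTO materials VALUES (?, ?)",
                    [(1, "Materialtest"), (2, "Dirt"), (4, "Stones")])
    con.executemany("INSERT INTO combined VALUES (?, ?, ?)",
                    [(1, 1, 1), (2, 2, 1), (3, 1, None), (4, 4, None)])
    # Join materials twice: alias sm resolves the supply reference and alias dm
    # resolves the disposal reference. LEFT JOIN keeps rows whose reference is NULL.
    query = """
        SELECT c.supply_material_refer,
               sm.material_name AS supply_material_name,
               c.disposal_material_refer,
               dm.material_name AS disposal_material_name
        FROM combined c
        LEFT JOIN materials sm ON c.supply_material_refer = sm.material_id
        LEFT JOIN materials dm ON c.disposal_material_refer = dm.material_id
        ORDER BY c.rn
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows

for row in movements_with_names():
    print(row)
```

Each movement now yields exactly one row. The (2, 1) movement comes back as (2, 'Dirt', 1, 'Materialtest'), which matches the preferred output.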

Join Lookup from 1 table to multiple columns

How do I link one table to multiple columns in another table without using multiple JOINs?
Below is my scenario:
I have a table User with Id and Name:
User
+---------+------------+
| Id | Name |
+---------+------------+
| 1 | John |
| 2 | Mike |
| 3 | Charles |
+---------+------------+
and a table Product with many columns; just focus on the two columns CreateBy and ModifiedBy:
+------------+-----------+-------------+
| product_id | CreateBy | ModifiedBy |
+------------+-----------+-------------+
| 1 | 1 | 3 |
| 2 | 1 | 3 |
| 3 | 2 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 3 |
+------------+-----------+-------------+
With normal JOINs, I need two of them:
SELECT p.Product_id,
u1.Name AS CreateByName,
u2.Name AS ModifiedByName
FROM Product p
JOIN [User] u1 ON p.CreateBy = u1.Id
JOIN [User] u2 ON p.ModifiedBy = u2.Id;
to get this result:
+------------+---------------+-----------------+
| product_id | CreateByName | ModifiedByName |
+------------+---------------+-----------------+
| 1 | John | Charles |
| 2 | John | Charles |
| 3 | Mike | Charles |
| 4 | Mike | John |
| 5 | Mike | Charles |
+------------+---------------+-----------------+
How do I avoid joining twice?
I'm using MS SQL Server, but I'm open to any SQL dialect; I'm asking out of curiosity, to learn.
Your current design/approach is acceptable, I think, and the need for two joins is a function of there being two user ID columns. Each of the two columns requires a separate join.
For fun, here is a table design which you may consider if you really want to have to perform only one join:
+------------+-----------+-------------+
| product_id | user_id | type |
+------------+-----------+-------------+
| 1 | 1 | created |
| 2 | 1 | created |
| 3 | 2 | created |
| 4 | 2 | created |
| 5 | 2 | created |
| 1 | 3 | modified |
| 2 | 3 | modified |
| 3 | 3 | modified |
| 4 | 1 | modified |
| 5 | 3 | modified |
+------------+-----------+-------------+
Now you can get away with just a single join followed by an aggregation:
SELECT
p.product_id,
MAX(CASE WHEN p.type = 'created' THEN u.Name END) AS CreateByName,
MAX(CASE WHEN p.type = 'modified' THEN u.Name END) AS ModifiedByName
FROM Product p
INNER JOIN [User] u
ON p.user_id = u.Id
GROUP BY
p.product_id;
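Here is a sketch comparing the two designs on the question's sample data. It uses SQLite through Python's sqlite3, which is an assumption, since the question uses SQL Server. The table names users and product_user are hypothetical. users sidesteps the fact that USER is a reserved word in SQL Server, and product_user holds the answer's one-row-per-role layout. product_user is built from the original columns, so both queries see the same data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (Id INTEGER PRIMARY KEY, Name TEXT)")
con.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
            "CreateBy INTEGER, ModifiedBy INTEGER)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "John"), (2, "Mike"), (3, "Charles")])
con.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, 1, 3), (2, 1, 3), (3, 2, 3), (4, 2, 1), (5, 2, 3)])

# Hypothetical table in the answer's layout: one row per (product, user, role).
con.execute("""CREATE TABLE product_user AS
    SELECT product_id, CreateBy AS user_id, 'created' AS type FROM product
    UNION ALL
    SELECT product_id, ModifiedBy, 'modified' FROM product""")

# Original design: one join per user-id column.
two_joins = con.execute("""
    SELECT p.product_id, u1.Name, u2.Name
    FROM product p
    JOIN users u1 ON p.CreateBy = u1.Id
    JOIN users u2 ON p.ModifiedBy = u2.Id
    ORDER BY p.product_id""").fetchall()

# Alternative design: a single join, then MAX(CASE ...) pivots the two roles
# back into columns (MAX ignores the NULLs the CASE produces for the other role).
one_join = con.execute("""
    SELECT pu.product_id,
           MAX(CASE WHEN pu.type = 'created'  THEN u.Name END) AS CreateByName,
           MAX(CASE WHEN pu.type = 'modified' THEN u.Name END) AS ModifiedByName
    FROM product_user pu
    JOIN users u ON pu.user_id = u.Id
    GROUP BY pu.product_id
    ORDER BY pu.product_id""").fetchall()

print(one_join == two_joins)  # True
```

Both queries return the same five rows. The single-join version trades the second join for a GROUP BY, which is why the answer expects it to perform no better.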
Note that I don't recommend this approach at all. It is much cleaner to use your current approach and use two joins. Joins can fairly easily be optimized using one or more indices. The above aggregation approach would probably not perform as well as what you already have.
If you use natural keys instead of surrogates, you won't need to join at all.
I don't know how you tell your products apart in the real world, but for the example I will assume you have a UPC:
CREATE TABLE [User]
(Name VARCHAR(20) PRIMARY KEY);
CREATE TABLE Product
(UPC CHAR(12) PRIMARY KEY,
CreatedBy VARCHAR(20) REFERENCES [User](Name),
ModifiedBy VARCHAR(20) REFERENCES [User](Name)
);
Now your query is a simple select, and you also enforce uniqueness of your user names as a bonus, and don't need additional indexes.
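Here is a minimal sketch of the natural-key design, run in SQLite via Python's sqlite3. The table name users and the UPC values are hypothetical. One difference: SQLite only enforces REFERENCES constraints after PRAGMA foreign_keys = ON, whereas SQL Server enforces them by default.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
con.execute("CREATE TABLE users (Name VARCHAR(20) PRIMARY KEY)")
con.execute("""CREATE TABLE product (
    UPC        CHAR(12) PRIMARY KEY,
    CreatedBy  VARCHAR(20) REFERENCES users(Name),
    ModifiedBy VARCHAR(20) REFERENCES users(Name))""")
con.executemany("INSERT INTO users VALUES (?)", [("John",), ("Mike",), ("Charles",)])
con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("000000000001", "John", "Charles"),  # hypothetical UPCs
    ("000000000002", "Mike", "John"),
])

# No join needed: the user names are stored directly in the product rows.
rows = con.execute(
    "SELECT UPC, CreatedBy, ModifiedBy FROM product ORDER BY UPC").fetchall()

# The foreign key still rejects names that do not exist in users.
try:
    con.execute("INSERT INTO product VALUES ('000000000003', 'Nobody', 'John')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The trade-off is that renaming a user means updating every product row that stores the name, unless the foreign keys are declared with ON UPDATE CASCADE.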
Try it...
HTH
A join is the best approach, but if you are looking for an alternative, you can use inline (correlated) subqueries in the SELECT list:
SELECT P.PRODUCT_ID,
(SELECT U.[NAME] FROM #USER U WHERE U.ID = P.CreateBy) AS CREATED_BY,
(SELECT U.[NAME] FROM #USER U WHERE U.ID = P.ModifiedBy) AS MODIFIED_BY
FROM #PRODUCT P
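Here is a sketch of the inline-subquery version on the question's data, using SQLite via Python's sqlite3 rather than SQL Server temp tables (an assumption for portability). The table names users and product, and the extra product 6 with an unknown creator id 99, are hypothetical. Each subquery must return at most one row, which holds here because users.Id is the primary key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (Id INTEGER PRIMARY KEY, Name TEXT)")
con.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
            "CreateBy INTEGER, ModifiedBy INTEGER)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "John"), (2, "Mike"), (3, "Charles")])
con.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, 1, 3), (2, 1, 3), (3, 2, 3), (4, 2, 1), (5, 2, 3),
                 (6, 99, 1)])  # hypothetical row whose creator id has no user

# Each scalar subquery is correlated with the outer row p: it looks up one
# name per user-id column instead of joining the users table twice.
rows = con.execute("""
    SELECT p.product_id,
           (SELECT u.Name FROM users u WHERE u.Id = p.CreateBy)   AS CreateByName,
           (SELECT u.Name FROM users u WHERE u.Id = p.ModifiedBy) AS ModifiedByName
    FROM product p
    ORDER BY p.product_id""").fetchall()

for row in rows:
    print(row)
```

Unlike the inner joins in the question, a missing user yields NULL (None in Python) instead of dropping the product. This behaves like two LEFT JOINs rather than two JOINs.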

SQL convert column headers to row values

I have a table that looks like this:
+--------+-----------+------------+-----------+
| Group# | Person A | Person B | Person C |
+--------+-----------+------------+-----------+
| 1 | yes | no | no |
| 2 | no | yes | yes |
| 3 | yes | yes | yes |
+--------+-----------+------------+-----------+
I want to use a SQL query on this data that will return the Group# in one column and the column header in the second column when the value = yes. The result I want would look like this for the above table:
+-----------+----------+
| Group# | Person |
+-----------+----------+
| 1 | Person A |
| 2 | Person B |
| 2 | Person C |
| 3 | Person A |
| 3 | Person B |
| 3 | Person C |
+-----------+----------+
*Note that in contrast to my example, my actual data has many more columns than rows.
Thank you.
In my opinion, the best approach is a lateral join. But the most general method is simply union all:
select group#, 'personA' as person
from t
where personA = 'yes'
union all
select group#, 'personB' as person
from t
where personB = 'yes'
union all
select group#, 'personC' as person
from t
where personC = 'yes';
In answer to your next question . . . yes, you have to explicitly list the columns. However, you can use a SQL query on the metadata tables to generate the query you really want. And then execute that query.
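Here is a sketch of that metadata-driven approach, using SQLite via Python's sqlite3 as an assumption. PRAGMA table_info supplies the column names; in SQL Server or PostgreSQL, INFORMATION_SCHEMA.COLUMNS plays the same role. The code builds one SELECT per person column and glues them together with UNION ALL. The table name t follows the answer. The helper names quote_ident, quote_literal, and build_unpivot_sql are hypothetical.

```python
import sqlite3

def quote_ident(name):
    # Double-quote an identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'

def quote_literal(text):
    # Single-quote a string literal, doubling any embedded single quotes.
    return "'" + text.replace("'", "''") + "'"

def build_unpivot_sql(con, table, key_col):
    """Generate a UNION ALL query with one branch per non-key column."""
    cols = [row[1] for row in con.execute(f"PRAGMA table_info({quote_literal(table)})")]
    branches = [
        f"SELECT {quote_ident(key_col)} AS grp, {quote_literal(col)} AS person "
        f"FROM {quote_ident(table)} WHERE {quote_ident(col)} = 'yes'"
        for col in cols if col != key_col
    ]
    return "\nUNION ALL\n".join(branches) + "\nORDER BY grp, person"

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t ("Group#" INTEGER, "Person A" TEXT, '
            '"Person B" TEXT, "Person C" TEXT)')
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "yes", "no", "no"),
    (2, "no", "yes", "yes"),
    (3, "yes", "yes", "yes"),
])

sql = build_unpivot_sql(con, "t", "Group#")
print(sql)  # the generated query, one branch per person column
rows = con.execute(sql).fetchall()
print(rows)
```

The column names come from the catalog rather than user input, but quoting them still matters here because names like "Group#" and "Person A" are not plain identifiers.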

PostgreSQL select all from one table and join count from table relation

I have two tables, post_categories and posts. I'm trying to select * from post_categories, but also return a computed column with the number of posts that use each post category.
Posts
| id | name | post_category_id |
| 1 | test | 1 |
| 2 | nest | 1 |
| 3 | vest | 2 |
| 4 | zest | 3 |
Post Categories
| id | name |
| 1 | cat_1 |
| 2 | cat_2 |
| 3 | cat_3 |
Basically, I'm trying to do this without subqueries and with joins instead. Something like this, but in real psql.
select * from post_categories some-type-of-join posts, count(*)
Resulting in this, ideally.
| id | name | count |
| 1 | cat_1 | 2 |
| 2 | cat_2 | 1 |
| 3 | cat_3 | 1 |
Your help is greatly appreciated :D
You can use a derived table that contains the counts per post_category_id and left join it to the post_categories table:
select p.*, coalesce(t1.p_count, 0) as count
from post_categories p
left join (
select post_category_id, count(*) p_count
from posts
group by post_category_id
) t1 on t1.post_category_id = p.id
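Here is a sketch of the derived-table approach on the question's data, run in SQLite via Python's sqlite3 (an assumption; the question targets PostgreSQL, where the same SQL works). The extra category cat_4 with no posts is hypothetical. It is there to show what the LEFT JOIN plus COALESCE buys you. The alias post_count is also hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE post_categories (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, post_category_id INTEGER)")
con.executemany("INSERT INTO post_categories VALUES (?, ?)",
                [(1, "cat_1"), (2, "cat_2"), (3, "cat_3"),
                 (4, "cat_4")])  # hypothetical category with no posts
con.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                [(1, "test", 1), (2, "nest", 1), (3, "vest", 2), (4, "zest", 3)])

# The derived table t1 counts posts per category; the LEFT JOIN keeps categories
# that t1 has no row for, and COALESCE turns their NULL count into 0.
rows = con.execute("""
    SELECT p.id, p.name, COALESCE(t1.p_count, 0) AS post_count
    FROM post_categories p
    LEFT JOIN (SELECT post_category_id, COUNT(*) AS p_count
               FROM posts
               GROUP BY post_category_id) t1
      ON t1.post_category_id = p.id
    ORDER BY p.id""").fetchall()
print(rows)  # [(1, 'cat_1', 2), (2, 'cat_2', 1), (3, 'cat_3', 1), (4, 'cat_4', 0)]
```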
select post_categories.id, post_categories.name , count(posts.id)
from post_categories
inner join posts
on post_category_id = post_categories.id
group by post_categories.id, post_categories.name
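This second query uses an inner join, so a category with no posts at all drops out of the result. Below is a sketch comparing it with a LEFT JOIN variant on the same data. It uses SQLite via Python's sqlite3 and assumes a hypothetical extra category cat_4 with no posts. With the LEFT JOIN, COUNT(posts.id) counts only matched rows, so cat_4 gets 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE post_categories (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, post_category_id INTEGER)")
con.executemany("INSERT INTO post_categories VALUES (?, ?)",
                [(1, "cat_1"), (2, "cat_2"), (3, "cat_3"),
                 (4, "cat_4")])  # hypothetical category with no posts
con.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                [(1, "test", 1), (2, "nest", 1), (3, "vest", 2), (4, "zest", 3)])

def count_posts(join_kind):
    # join_kind is "INNER JOIN" or "LEFT JOIN"; everything else is identical.
    return con.execute(f"""
        SELECT post_categories.id, post_categories.name, COUNT(posts.id)
        FROM post_categories
        {join_kind} posts ON posts.post_category_id = post_categories.id
        GROUP BY post_categories.id, post_categories.name
        ORDER BY post_categories.id""").fetchall()

inner = count_posts("INNER JOIN")  # cat_4 is missing
left = count_posts("LEFT JOIN")    # cat_4 appears with a count of 0
print(inner)
print(left)
```

Counting posts.id rather than * matters in the LEFT JOIN version: COUNT(*) would count the NULL-extended row and report 1 for cat_4.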

How to apply a SUM operation without grouping the results in SQL?

I have a table like this one:
+----+---------+----------+
| id | group | value |
+----+---------+----------+
| 1 | GROUP A | 0.641028 |
| 2 | GROUP B | 0.946927 |
| 3 | GROUP A | 0.811552 |
| 4 | GROUP C | 0.216978 |
| 5 | GROUP A | 0.650232 |
+----+---------+----------+
If I perform the following query:
SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`;
I, obviously, get:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 4 | 0.216977506875992 |
+----+-------------------+
But I need a table like this one:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 3 | 2.10281205177307 |
| 4 | 0.216977506875992 |
| 5 | 2.10281205177307 |
+----+-------------------+
where each group's sum is repeated on every row belonging to that group.
Is there a way to obtain this result without using multiple (nested) queries?
It depends on your database server: in Postgres/Oracle I'd use window functions. In MySQL... not possible AFAIK.
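Here is a sketch of the window-function approach on the question's sample data. It uses SQLite via Python's sqlite3, which is an assumption and needs SQLite 3.25 or later for window functions. The same SUM(...) OVER (PARTITION BY ...) syntax works in PostgreSQL, Oracle, and MySQL 8.0 and later. The alias group_sum is hypothetical, and the column is stored as REAL, so sums are rounded for display.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is double-quoted (MySQL would use backticks).
con.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, "group" TEXT, value REAL)')
con.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "GROUP A", 0.641028),
    (2, "GROUP B", 0.946927),
    (3, "GROUP A", 0.811552),
    (4, "GROUP C", 0.216978),
    (5, "GROUP A", 0.650232),
])

# SUM(...) OVER (PARTITION BY ...) computes each group's total but, unlike
# GROUP BY, keeps every input row, so the total repeats on each row of the group.
rows = con.execute("""
    SELECT id, SUM(value) OVER (PARTITION BY "group") AS group_sum
    FROM test
    ORDER BY id""").fetchall()

for row_id, group_sum in rows:
    print(row_id, round(group_sum, 6))
```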
Perhaps you can fake it like this:
SELECT a.id, SUM(b.value) AS `sum`
FROM test AS a
JOIN test AS b ON a.`group` = b.`group`
GROUP BY a.id, b.`group`;
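The self-join trick can be checked against the question's sample data in SQLite via Python's sqlite3. That is an assumption: the question uses MySQL, where the answer's backticks apply instead of double quotes. group_sum is a hypothetical alias.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, "group" TEXT, value REAL)')
con.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "GROUP A", 0.641028),
    (2, "GROUP B", 0.946927),
    (3, "GROUP A", 0.811552),
    (4, "GROUP C", 0.216978),
    (5, "GROUP A", 0.650232),
])

# Each row a is paired with every row b in the same group; grouping by a.id
# then sums the whole group's values once per original row.
rows = con.execute("""
    SELECT a.id, SUM(b.value) AS group_sum
    FROM test AS a
    JOIN test AS b ON a."group" = b."group"
    GROUP BY a.id, b."group"
    ORDER BY a.id""").fetchall()

for row_id, group_sum in rows:
    print(row_id, round(group_sum, 6))
```

A group of k rows produces k * k joined rows before aggregation. The join therefore gets expensive for large groups, which is where window functions are the better fit.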
No, there isn't, AFAIK. You will have to use a join like this:
SELECT t.`id`, tsum.`sum` AS `sum`
FROM `test` AS t
JOIN (SELECT `group`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`) AS tsum
ON tsum.`group` = t.`group`;