Find duplicate combinations - sql

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}

If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute < c.id_product_attribute and
c2.combo = c.combo
);
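The idea of reducing each combination to one canonical, sorted key can be tried out without a PostgreSQL server. Below is a minimal Python sketch using the standard-library sqlite3 module and the sample rows from the question. The function name find_duplicate_combos and the snake_case table name are assumptions made for illustration, and the key is built in Python from rows the query returns already sorted, rather than with string_agg:

```python
import sqlite3
from itertools import groupby

# Sample rows from the question: (id_product_attribute, id_attribute).
ROWS = [(1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
        (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5)]


def find_duplicate_combos(conn):
    """Return ids whose attribute combination already occurred under a smaller id."""
    rows = conn.execute(
        "SELECT id_product_attribute, id_attribute "
        "FROM product_attribute_combinations "
        "ORDER BY id_product_attribute, id_attribute"
    )
    seen = set()
    duplicates = []
    for pa_id, group in groupby(rows, key=lambda row: row[0]):
        # Canonical key: the sorted attribute ids joined into one string.
        combo = ",".join(str(attr) for _, attr in group)
        if combo in seen:
            duplicates.append(pa_id)
        seen.add(combo)
    return duplicates


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_attribute_combinations "
    "(id_product_attribute INTEGER, id_attribute INTEGER)"
)
conn.executemany("INSERT INTO product_attribute_combinations VALUES (?, ?)", ROWS)
duplicates = find_duplicate_combos(conn)
print(duplicates)  # [7, 8]
```

Ids 7 and 8 are reported because their keys ("1,4" and "1,5") were already produced by ids 1 and 2.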

Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
Sorting can be inlined into the aggregate function so we don't need a subquery for the sort (as @Gordon already showed). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
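To see the per-product behaviour on more than one product, here is a small Python sketch using the standard-library sqlite3 module. It is not the PostgreSQL query above: the attribute lists are compared in Python instead of with array_agg. The snake_case table names, the function name duplicates_per_product and the second product (ids 9 and 10) are hypothetical additions for illustration:

```python
import sqlite3
from collections import defaultdict

# (id, id_product): ids 1-8 are from the question, 9 and 10 are hypothetical.
PRODUCT_ATTRIBUTES = [(i, 1) for i in range(1, 9)] + [(9, 2), (10, 2)]
# (id_product_attribute, id_attribute)
COMBINATIONS = [(1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
                (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5),
                (9, 1), (9, 4), (10, 1), (10, 5)]


def duplicates_per_product(conn):
    """Map each product to the ids that repeat an earlier combination of that product."""
    rows = conn.execute(
        "SELECT pa.id_product, pa.id, pac.id_attribute "
        "FROM product_attribute pa "
        "JOIN product_attribute_combinations pac "
        "  ON pac.id_product_attribute = pa.id "
        "ORDER BY pa.id_product, pa.id, pac.id_attribute"
    )
    combos = defaultdict(list)  # (id_product, id) -> sorted attribute ids
    for id_product, pa_id, id_attribute in rows:
        combos[(id_product, pa_id)].append(id_attribute)

    seen = set()
    result = defaultdict(list)
    for (id_product, pa_id), attrs in sorted(combos.items()):
        key = (id_product, tuple(attrs))
        if key in seen:  # same product, same attribute set, smaller id exists
            result[id_product].append(pa_id)
        seen.add(key)
    return dict(result)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_attribute (id INTEGER, id_product INTEGER)")
conn.execute(
    "CREATE TABLE product_attribute_combinations "
    "(id_product_attribute INTEGER, id_attribute INTEGER)"
)
conn.executemany("INSERT INTO product_attribute VALUES (?, ?)", PRODUCT_ATTRIBUTES)
conn.executemany("INSERT INTO product_attribute_combinations VALUES (?, ?)", COMBINATIONS)
per_product = duplicates_per_product(conn)
print(per_product)  # {1: [7, 8]}
```

Product 2 does not appear in the result: id 9 has the same attributes as id 1, but they belong to different products, so it is not counted as a duplicate.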
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.

Related

MS SQL query to get all entries occurring multiple times in a table where some column value doesn't have entries

I tried to formulate the title as well as I could. My case is as follows.
I have a table:
venue_id | style_id | is_main
1 | 1 | 1
1 | 2 | 0
1 | 3 | 0
2 | 5 | 0
2 | 8 | 0
2 | 9 | 0
3 | 3 | 1
4 | 4 | 1
4 | 6 | 0
5 | 7 | 0
5 | 8 | 0
5 | 9 | 0
I need to get only those venue IDs which occur more than once and have no entry where is_main is true.
So the result should contain venue_ids 2 and 5.
I would be grateful for any suggestion as to what such a query might look like.
Thanks in advance.
UPD: in my case, with is_main being a BIT value, the answer would be:
select venue_id
from table
group by venue_id
having cast(max(cast(is_main as INT)) AS BIT) = 0 and
count(*) >= 2;
You seem to want:
select venue_id
from t
group by venue_id
having max(is_main) = 0 and
count(*) >= 2;
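As a quick check of the HAVING logic against the sample data, here is a minimal Python sketch using the standard-library sqlite3 module. The table name venue_style is an assumption (the question does not name the table), and is_main is stored as a plain integer, so no BIT cast is needed here:

```python
import sqlite3

# Sample rows from the question: (venue_id, style_id, is_main).
ROWS = [(1, 1, 1), (1, 2, 0), (1, 3, 0), (2, 5, 0), (2, 8, 0), (2, 9, 0),
        (3, 3, 1), (4, 4, 1), (4, 6, 0), (5, 7, 0), (5, 8, 0), (5, 9, 0)]

QUERY = """
SELECT venue_id
FROM venue_style
GROUP BY venue_id
HAVING MAX(is_main) = 0   -- no row of the venue is flagged as main
   AND COUNT(*) >= 2      -- the venue occurs more than once
ORDER BY venue_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE venue_style (venue_id INTEGER, style_id INTEGER, is_main INTEGER)"
)
conn.executemany("INSERT INTO venue_style VALUES (?, ?, ?)", ROWS)
venues = [row[0] for row in conn.execute(QUERY)]
print(venues)  # [2, 5]
```

Venue 3 is excluded by both conditions, while venues 1 and 4 are excluded only because one of their rows has is_main = 1.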
You can use this one:
SELECT DISTINCT v.venue_id FROM venue v
LEFT OUTER JOIN (SELECT DISTINCT venue_id FROM venue WHERE is_main=1) m
ON v.venue_id = m.venue_id
WHERE m.venue_id IS NULL
If you have many thousands of rows, it would be better to create a secondary table or a materialized view to be used in place of the nested SELECT.

How to count records using an l-tree

I have two tables, tickets and categories. The categories table has 3 columns of interest: id, name and path. The data looks like this:
id | Name | Path
------------------
1 | ABC | 1
2 | DEF | 1.2
3 | GHI | 1.2.3
4 | JKL | 4
5 | MNO | 4.5
6 | PQR | 4.5.6
9 | STU | 4.5.9
Note that the path column is an l-tree. What this is meant to represent is that the category with id=2 is a subcategory of id=1 and that id=3 is a subcategory of id=2.
In my tickets table, there's a column called category_id which refers to the id column in my categories table. Each ticket can have up to one category assigned to it (category_id may be null).
I'm trying to count all the tickets for each category.
Suppose my tickets table looks like this:
ticket_id | ticket_title | category_id
1 | A | 1
2 | B | 2
3 | C | 3
4 | D | 5
5 | F | 5
6 | G | 6
7 | H | 9
I would like to output:
category_id | count
1 | 3
2 | 2
3 | 1
4 | 4
5 | 4
6 | 1
9 | 1
I've found that I can get all of the tickets which belong to a given category with the following query: select * from tickets where category_id in (select id from categories where path ~ '*.1.*'); (although now that I'm writing this question I'm not convinced this is correct).
I've also attempted to perform the ticket-count-by-category problem and I came up with:
SELECT
categories.id as cid,
COUNT(*) as tickets_count
FROM tickets
LEFT JOIN categories ON tickets.category_id = categories.id
GROUP BY cid;
which outputs the following:
c_id | count
1 | 1
2 | 1
3 | 1
5 | 2
6 | 1
9 | 1
I'm not very good at SQL. Is it possible to achieve what I want?
Try this:
WITH tickets_per_path AS (
SELECT
c.path AS path,
count(*) AS count
FROM tickets t INNER JOIN categories c ON (t.category_id = c.id)
GROUP BY c.path)
SELECT
c.id,
sum(tickets_per_path.count) AS count
FROM categories c LEFT JOIN tickets_per_path ON (c.path @> tickets_per_path.path)
GROUP BY c.id
ORDER BY c.id;
Which yields the following result:
id| count
1 | 3
2 | 2
3 | 1
4 | 4
5 | 4
6 | 1
9 | 1
It roughly works like this:
The WITH clause computes the number of tickets per path (without including the count of tickets of descendant paths).
The second SELECT joins the categories table with the precomputed tickets_per_path view, but instead of an equi-join on path, it joins by testing whether a record in the left table (categories) is an ancestor of the record in the right-side table (using the @> operator). Then it groups by category id and sums up the ticket counts by category, including the descendant counts.
You are close, but you need a more general join:
SELECT c.id as cid, COUNT(t.ticket_id) as tickets_count
FROM categories c LEFT JOIN
categories tc
ON tc.path::text || '.' LIKE c.path::text || '.%' LEFT JOIN
tickets t
ON t.category_id = tc.id
GROUP BY c.id;
The '.' in the comparison is just so 1.100 doesn't match 1.1.
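The prefix test with a trailing '.' can be tried without the ltree extension. The following is a rough Python sketch using the standard-library sqlite3 module, assuming path is stored as plain text and that the prefix comparison is made between category paths (the alias d for the descendant category is an assumption of this sketch):

```python
import sqlite3

# Sample rows from the question.
CATEGORIES = [(1, "ABC", "1"), (2, "DEF", "1.2"), (3, "GHI", "1.2.3"),
              (4, "JKL", "4"), (5, "MNO", "4.5"), (6, "PQR", "4.5.6"),
              (9, "STU", "4.5.9")]
TICKETS = [(1, "A", 1), (2, "B", 2), (3, "C", 3), (4, "D", 5),
           (5, "F", 5), (6, "G", 6), (7, "H", 9)]

QUERY = """
SELECT c.id, COUNT(t.ticket_id) AS tickets_count
FROM categories c
LEFT JOIN categories d                       -- d: c itself or any descendant of c
       ON d.path || '.' LIKE c.path || '.%'
LEFT JOIN tickets t
       ON t.category_id = d.id
GROUP BY c.id
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER, name TEXT, path TEXT)")
conn.execute(
    "CREATE TABLE tickets (ticket_id INTEGER, ticket_title TEXT, category_id INTEGER)"
)
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", CATEGORIES)
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", TICKETS)

counts = conn.execute(QUERY).fetchall()
print(counts)  # [(1, 3), (2, 2), (3, 1), (4, 4), (5, 4), (6, 1), (9, 1)]

# The trailing '.' keeps path 1.100 from being treated as a descendant of 1.1.
sibling_matches = conn.execute("SELECT '1.100.' LIKE '1.1.%'").fetchone()[0]
print(sibling_matches)  # 0
```

COUNT(t.ticket_id) is used instead of COUNT(*) so that a category without any tickets in its subtree would be reported with 0 rather than 1.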

Why isn't this returning unique combinations of these attributes?

When using the following query:
with neededSkills(SkillCode) as (
select distinct SkillCode
from job natural join hasprofile natural join requires_skill
where job_code = '1'
minus
select skillcode
from person natural join hasskill
where id = '1'
)
select distinct
taughtin.c_code as c,
count(taughtin.skillcode) as s,
ti.c_code as cc,
count(ti.skillcode) as ss
from taughtin, taughtin ti
where taughtin.c_code <> ti.c_code
and taughtin.skillcode <> ti.skillcode
and taughtin.skillcode in (select skillcode from neededskills)
and ti.skillcode in (select skillcode from neededskills)
group by (taughtin.c_code, ti.c_code)
order by (taughtin.c_code);
It returns:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
2 | 1 | 1 | 1
3 | 1 | 1 | 1
5 | 1 | 1 | 1
I would expect it to return only lines where the combination of C and CC was not already used. Do I misunderstand how group by works? How would I achieve this result?
I am trying to have it return:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
I use Oracle SQLPlus.
You're grouping on the combination of taughtin.c_code and ti.c_code, which are separate columns in the context of the query (even though they are the same column in the schema). A pair of 1, 2 is not the same as a pair of 2, 1; the values may be the same but the sources are not.
If you want to get the combinations one way but not the other, then the simplest thing is to always make one value larger than the other; instead of:
where taughtin.c_code <> ti.c_code
use:
where ti.c_code > taughtin.c_code
Though it would be better to use ANSI joins for the main query too, and I'm not a fan of natural joins. You also don't need either distinct; the first may eliminate duplicates, but they don't logically matter if you're only using the temporary result set for in().
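The difference between <> and > in a self-join is easy to see on a tiny table. This is a minimal Python sketch using the standard-library sqlite3 module rather than Oracle; the three rows of taughtin are made-up sample data and the aliases t1 and t2 are assumptions for illustration:

```python
import sqlite3

# Hypothetical sample data: (c_code, skillcode), one skill per course.
ROWS = [(1, "sql"), (2, "python"), (3, "java")]

PAIR_QUERY = """
SELECT t1.c_code, t2.c_code
FROM taughtin t1
JOIN taughtin t2
  ON t2.c_code {op} t1.c_code
 AND t2.skillcode <> t1.skillcode
ORDER BY t1.c_code, t2.c_code
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taughtin (c_code INTEGER, skillcode TEXT)")
conn.executemany("INSERT INTO taughtin VALUES (?, ?)", ROWS)

# '<>' keeps both (1, 2) and (2, 1); '>' keeps each unordered pair once.
both_orders = conn.execute(PAIR_QUERY.format(op="<>")).fetchall()
one_order = conn.execute(PAIR_QUERY.format(op=">")).fetchall()
print(both_orders)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(one_order)    # [(1, 2), (1, 3), (2, 3)]
```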

distinct rows with group by

I have one table:
id_object | version | document
------------------------------
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
I want to show only one row per object, with the max version and its document. I have tried the following:
Select Distinct
id_object ,
Max(version),
document
From
prods
Group By
id_object, document
and I get this result
1 | 1 | 1
1 | 2 | 2
2 | 1 | 3
2 | 2 | 1
2 | 3 | 2
1 | 1 | 3
As you can see, I'm getting the entire table. My question is, why?
Since you group by id_object and document, you won't get your desired result. That is because document is different for each version.
select x.id_object,
x.maxversion as version,
p.document
from
(
Select id_object, Max(version) as maxversion
From prods
Group By id_object
) x
inner join prods p on p.id_object = x.id_object
and p.version = x.maxversion
You first have to select the id_object with the max(version). That can be joined with the actual data to get the correct document.
You have to do that because you can't select columns that are not in your group by clause, unless you use an aggregate function on them (like max() for instance).
(MySQL can select non aggregated columns, but please avoid that since the outcome is not always clear or even predictable)
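Both effects can be reproduced on the sample data. Here is a small Python sketch using the standard-library sqlite3 module: the first query shows why grouping by id_object and document returns every row, the second applies the join against the per-object maximum. The ORDER BY clauses are additions to make the output order predictable:

```python
import sqlite3

# Sample rows from the question: (id_object, version, document).
ROWS = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 1), (2, 3, 2), (1, 1, 3)]

GROUPED_BY_DOCUMENT = """
SELECT id_object, MAX(version), document
FROM prods
GROUP BY id_object, document
"""

LATEST_PER_OBJECT = """
SELECT x.id_object, x.maxversion AS version, p.document
FROM (SELECT id_object, MAX(version) AS maxversion
      FROM prods
      GROUP BY id_object) x
INNER JOIN prods p
        ON p.id_object = x.id_object
       AND p.version = x.maxversion
ORDER BY x.id_object
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prods (id_object INTEGER, version INTEGER, document INTEGER)")
conn.executemany("INSERT INTO prods VALUES (?, ?, ?)", ROWS)

# Every (id_object, document) pair is unique, so each row is its own group.
grouped = conn.execute(GROUPED_BY_DOCUMENT).fetchall()
print(len(grouped))  # 6

latest = conn.execute(LATEST_PER_OBJECT).fetchall()
print(latest)  # [(1, 2, 2), (2, 3, 2)]
```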
select prods.id_object, version, document
from prods inner join
(select id_object, max(version) as ver
from prods
group by id_object) tmp on prods.id_object = tmp.id_object and prods.version = tmp.ver
Query:
SELECT p.id_object,
p.version,
p.document
FROM prods p
WHERE p.version = (SELECT Max(version)
FROM prods
WHERE id_object = p.id_object)
Result:
| ID_OBJECT | VERSION | DOCUMENT |
----------------------------------
| 1 | 2 | 2 |
| 2 | 3 | 2 |

Select rows that end with the given parts in a join query?

I'm trying to create a complex SQL query (at least for me) but really don't know where to start.
Basically, I have a character object, and each character can be made of several parts. For example:
character table:
id | character
--------------
1 | 你
2 | 是
3 | 有
character_parts table:
id | character_id | part_id
---------------------------
1 | 1 | 4
2 | 1 | 9
3 | 1 | 5
4 | 2 | 2
5 | 2 | 34
6 | 2 | 43
7 | 3 | 21
8 | 3 | 16
9 | 3 | 41
10 | 3 | 43
So from that I know that:
Character 1 is made of parts 4, 9, 5
Character 2 is made of parts 2, 34, 43
Character 3 is made of parts 21, 16, 41, 43
Now what I would like to do is select all the characters that end with the specified parts.
For example, if I select all the characters that end with "16, 41, 43", I'll get Character 3. If I select all the characters that end with "43", I'll get Characters 2 and 3.
I assume I need to build some query with subqueries, but I'm not sure how to start, or if it can be done at all. So far I'm just selecting everything that includes the required part IDs and then doing the comparison programmatically, but this is too slow because I'm selecting way more than needed.
Any suggestion?
You could try group_concat function:
http://www.sqlite.org/lang_aggfunc.html
SELECT group_concat(part_id) FROM character_parts WHERE character_id=1
The query should return 4,9,5.
The problem is that the order used by group_concat is arbitrary:
Sqlite group_concat ordering
So, assuming you have a field position that defines the order of the parts, we can update the query like this:
SELECT group_concat(part_id) FROM (SELECT part_id FROM character_parts WHERE character_id=1 ORDER BY position ASC)
The query will now return the parts 4,9,5 in exactly the defined order.
Now that we have this value, we can search through it like a regular string.
If we want to find all values ending with a certain string, we could use the LIKE operator.
Finally, the query would look like this:
SELECT character_id, parts_concat FROM (
SELECT character_id, group_concat(part_id) AS parts_concat FROM (
SELECT character_id, part_id FROM character_parts ORDER BY position ASC
) GROUP BY character_id
) parts
WHERE parts_concat LIKE '%,9,5'
The query might look like this:
select * from character where id in (select character_id from character_parts where character_id = 'required no' AND character_id = 'required no' AND character_id = 'required no')
//required number is the part_id you want to specify.
Add another column, from_end that counts from the end, say:
from_end | id | character_id | part_id
--------------------------------------
2 | 1 | 1 | 4
1 | 2 | 1 | 9
0 | 3 | 1 | 5
2 | 4 | 2 | 2
1 | 5 | 2 | 34
0 | 6 | 2 | 43
3 | 7 | 3 | 21
2 | 8 | 3 | 16
1 | 9 | 3 | 41
0 | 10 | 3 | 43
Then you can do:
SELECT p0.character_id
FROM character_parts AS p0
JOIN character_parts AS p1 USING (character_id)
JOIN character_parts AS p2 USING (character_id)
WHERE p0.from_end = 0 AND p1.from_end = 1 AND p2.from_end = 2
AND p0.part_id = 43 AND p1.part_id = 41 AND p2.part_id = 16
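The self-join above is written for a suffix of exactly three parts. As a variation on the same from_end idea, the sketch below replaces the fixed number of joins with a grouped count so the suffix length can vary. It is a minimal Python example using the standard-library sqlite3 module; the function name characters_ending_with is hypothetical, and it assumes each (character_id, from_end) pair occurs only once:

```python
import sqlite3

# Rows from the table above: (from_end, id, character_id, part_id).
ROWS = [(2, 1, 1, 4), (1, 2, 1, 9), (0, 3, 1, 5),
        (2, 4, 2, 2), (1, 5, 2, 34), (0, 6, 2, 43),
        (3, 7, 3, 21), (2, 8, 3, 16), (1, 9, 3, 41), (0, 10, 3, 43)]


def characters_ending_with(conn, suffix):
    """Return character ids whose last parts equal `suffix`, in order."""
    if not suffix:
        raise ValueError("suffix must contain at least one part id")
    # Pair each wanted part with its distance from the end: last part -> 0.
    wanted = list(enumerate(reversed(suffix)))
    conditions = " OR ".join("(from_end = ? AND part_id = ?)" for _ in wanted)
    params = [value for pair in wanted for value in pair]
    query = (
        "SELECT character_id FROM character_parts "
        f"WHERE {conditions} "
        "GROUP BY character_id "
        "HAVING COUNT(*) = ? "  # every position of the suffix must match
        "ORDER BY character_id"
    )
    return [row[0] for row in conn.execute(query, params + [len(wanted)])]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE character_parts "
    "(from_end INTEGER, id INTEGER, character_id INTEGER, part_id INTEGER)"
)
conn.executemany("INSERT INTO character_parts VALUES (?, ?, ?, ?)", ROWS)

print(characters_ending_with(conn, [16, 41, 43]))  # [3]
print(characters_ending_with(conn, [43]))          # [2, 3]
```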
SELECT c.character
FROM character_parts cp
JOIN character c
ON c.id = cp.character_id
WHERE cp.part_id = 43
AND cp.id =
(
SELECT id
FROM character_parts cpo
WHERE cpo.character_id = cp.character_id
ORDER BY
id DESC
LIMIT 1
)
Create the following indexes:
character_parts (part_id, character_id, id)
character_parts (character_id, id)
for this to work fast.
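The "last part" check can be tried on the sample data with a short Python sketch using the standard-library sqlite3 module. It assumes that a higher id within a character means a later part, and that the correlated subquery is limited to one row; the index names are hypothetical, and the final ORDER BY is added only to make the output order predictable:

```python
import sqlite3

CHARACTERS = [(1, "你"), (2, "是"), (3, "有")]
# Sample rows from the question: (id, character_id, part_id).
PARTS = [(1, 1, 4), (2, 1, 9), (3, 1, 5),
         (4, 2, 2), (5, 2, 34), (6, 2, 43),
         (7, 3, 21), (8, 3, 16), (9, 3, 41), (10, 3, 43)]

QUERY = """
SELECT c."character"
FROM character_parts cp
JOIN "character" c
  ON c.id = cp.character_id
WHERE cp.part_id = ?
  AND cp.id = (SELECT cpo.id              -- id of the character's last part
               FROM character_parts cpo
               WHERE cpo.character_id = cp.character_id
               ORDER BY cpo.id DESC
               LIMIT 1)
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "character" (id INTEGER PRIMARY KEY, "character" TEXT)')
conn.execute(
    "CREATE TABLE character_parts "
    "(id INTEGER PRIMARY KEY, character_id INTEGER, part_id INTEGER)"
)
conn.executemany('INSERT INTO "character" VALUES (?, ?)', CHARACTERS)
conn.executemany("INSERT INTO character_parts VALUES (?, ?, ?)", PARTS)
# The two indexes suggested above.
conn.execute("CREATE INDEX idx_parts_part ON character_parts (part_id, character_id, id)")
conn.execute("CREATE INDEX idx_parts_char ON character_parts (character_id, id)")

ending_with_43 = [row[0] for row in conn.execute(QUERY, (43,))]
print(ending_with_43)  # ['是', '有']
```

This form only checks the final part; a longer suffix such as "16, 41, 43" needs one of the other approaches.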