Why isn't this returning unique combinations of these attributes? - sql

When using the following query:
with neededSkills(SkillCode) as (
select distinct SkillCode
from job natural join hasprofile natural join requires_skill
where job_code = '1'
minus
select skillcode
from person natural join hasskill
where id = '1'
)
select distinct
taughtin.c_code as c,
count(taughtin.skillcode) as s,
ti.c_code as cc,
count(ti.skillcode) as ss
from taughtin, taughtin ti
where taughtin.c_code <> ti.c_code
and taughtin.skillcode <> ti.skillcode
and taughtin.skillcode in (select skillcode from neededskills)
and ti.skillcode in (select skillcode from neededskills)
group by (taughtin.c_code, ti.c_code)
order by (taughtin.c_code);
It returns:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
2 | 1 | 1 | 1
3 | 1 | 1 | 1
5 | 1 | 1 | 1
I would expect it to return only lines where the combination of C and CC was not already used. Do I misunderstand how group by works? How would I achieve this result?
I am trying to have it return:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
I use Oracle SQLPlus.

You're grouping on the combination of taughtin.c_code and ti.c_code, which are separate columns in the context of the query (even though they are the same column in the schema). A pair of 1, 2 is not the same as a pair of 2, 1; the values may be the same but the sources are not.
If you want to get the combinations one way but not the other, the simplest thing is to always make one value larger than the other; instead of:
where taughtin.c_code <> ti.c_code
use:
where ti.c_code > taughtin.c_code
Though it would be better to use ANSI joins for the main query too, and I'm not a fan of natural joins. You also don't need either distinct; the first may eliminate duplicates, but they don't logically matter if you're only using the temporary result set for in(), and the second is redundant because group by already returns one row per group.
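A minimal sketch of the difference, run against an in-memory SQLite database rather than Oracle. The table contents are made up for illustration (course 1 teaches skill 10; courses 2, 3 and 5 teach skill 20), and the neededSkills filter and the counts are left out so that only the pairing logic remains:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taughtin (c_code INTEGER, skillcode INTEGER)")
# Hypothetical data, not taken from the question.
conn.executemany(
    "INSERT INTO taughtin VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (5, 20)],
)

# Same self join twice; only the comparison between the two c_code columns changes.
query = """
    SELECT t.c_code, ti.c_code
    FROM taughtin t
    JOIN taughtin ti
      ON ti.c_code {op} t.c_code
     AND ti.skillcode <> t.skillcode
    ORDER BY t.c_code, ti.c_code
"""

both_directions = conn.execute(query.format(op="<>")).fetchall()
one_direction = conn.execute(query.format(op=">")).fetchall()

print(both_directions)  # every pair appears twice, once in each order
print(one_direction)    # each pair appears once, smaller c_code first
```

With `<>` the pair (1, 2) also shows up as (2, 1); with `>` only the ordering where the second course code is larger survives.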

Related

Find number of rows identical on some, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
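As a rough check of the self join, here is a sketch that loads the four sample rows from the question into an in-memory SQLite database (an assumption made for convenience; the question targets PostgreSQL) and uses the `<` variant so each pair is reported once. Note that task_id is text, so `<` compares strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (project_id TEXT, task_id TEXT, ref_id TEXT, ref_value TEXT)"
)
# The four sample rows from the question.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("1", "1", "1", "1"),
        ("1", "1", "1", "2"),
        ("1", "2", "1", "1"),
        ("1", "2", "1", "2"),
    ],
)

pairs = conn.execute("""
    SELECT d1.task_id, d2.task_id, COUNT(*)
    FROM data d1
    JOIN data d2
      ON d1.project_id = d2.project_id
     AND d1.ref_id = d2.ref_id
     AND d1.ref_value = d2.ref_value
     AND d1.task_id < d2.task_id      -- keep each pair in one order only
    GROUP BY d1.task_id, d2.task_id
    ORDER BY d1.task_id, d2.task_id
""").fetchall()

print(pairs)
```

Replacing `<` with `<>` brings back the symmetric row for the pair in the opposite order.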
This is how I understand your question. This assumes there are only two tasks for the same combination.
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I added more columns to make clear which elements are the same:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |
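To see why the two-task assumption matters, here is a small sketch with hypothetical data in which three tasks share one project/ref combination. It runs on an in-memory SQLite database, and the table is called data here rather than Table1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (project_id TEXT, task_id TEXT, ref_id TEXT, ref_value TEXT)"
)
# Hypothetical data: tasks 1, 2 and 3 all share the same combination.
conn.executemany(
    "INSERT INTO data VALUES ('1', ?, '1', '1')", [("1",), ("2",), ("3",)]
)

# MIN/MAX approach: one row per combination, so task 2 never appears.
min_max = conn.execute("""
    SELECT MIN(task_id), MAX(task_id), COUNT(*)
    FROM data
    GROUP BY project_id, ref_id, ref_value
    HAVING MIN(task_id) <> MAX(task_id)
""").fetchall()

# Self join approach: one row per conflicting pair of tasks.
self_join = conn.execute("""
    SELECT d1.task_id, d2.task_id
    FROM data d1
    JOIN data d2
      ON d1.project_id = d2.project_id
     AND d1.ref_id = d2.ref_id
     AND d1.ref_value = d2.ref_value
     AND d1.task_id < d2.task_id
    ORDER BY d1.task_id, d2.task_id
""").fetchall()

print(min_max)
print(self_join)
```

The grouped query reports only the smallest and largest task, while the self join lists all three pairs.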

how to select unique records from a table based on a column which has distinct values in another column

I have below table SUBJ_SKILLS which has records like
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
1 | 1 | Maths | R
1 | 2 | 101 | U
2 | 1 | BehaviourialTech | U
3 | 2 | Maths | R
4 | 1 | RegionalLANG | U
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
6 | 2 | Science | R
7 | 1 | 101 | U
7 | 3 | Physics | R
..
..
I am trying to retrieve records like below (i.e. single teacher who taught multiple different subjects)
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
7 | 1 | 101 | U
7 | 3 | Physics | R
1 | 1 | Maths | R
1 | 2 | 101 | U
Here, the line numbers are unique and are never renumbered. For example, TCHR_ID 5 once taught Physics (LINE_NBR=1), but that row was removed later, so the remaining LINE_NBR values stay as they are.
I also have a lookup table (SUBJ_LKUP) for subjects and their categories/types, like below ('R' for a regular subject and 'U' for a unique subject):
SUBJ | SUBJ_TYPE
----------------- | ------------
Maths | R
Physics | R
ForeignLANG | U
101 | U
Science | R
BehaviourialTech | U
RegionalLANG | U
My approach was to create a table of teachers who have 2 records, and then run another query against the base table (SUBJ_SKILLS) and the new table to filter out distinct records. I came up with the queries below.
Query-1:
create table tchr_with_2_subj as select SS.TCHR_ID
from SUBJ_SKILLS SS, SUBJ_LKUP SL
where SS.SUBJ = SL.SUBJ
and SL.SUBJ_TYPE IN ('R', 'U') AND SS.TCHR_ID IN
(select SS.TCHR_ID from SUBJ_SKILLS SS)
GROUP BY SS.TCHR_ID HAVING COUNT(*) = 2)
Query-2:
select SS.TCHR_ID from SUBJ_SKILLS SS, tchr_with_2_subj tw2s
where SS.TCHR_ID = tw2s.TCHR_ID
GROUP BY SS.TCHR_ID,SS.SUBJ_TYPE HAVING COUNT(*) > 1)
Question:
1)'IN' condition in Query-1 is causing problems and pulling wrong records.
2) Is there a better way to write query to pull matching records using a single query (i.e. instead of creating a table)
Could someone help me on this pls.
For the answer to your original question, I would use window functions:
select ss.*
from (select ss.*,
min(subj) over (partition by tchr_id) as mins,
max(subj) over (partition by tchr_id) as maxs
from SUBJ_SKILLS ss
) ss
where mins <> maxs;
It is unclear how the subject type fits in, but if you need to include that, similar logic will work.
Your second table can be obtained from your first table with:
select ss.*
from
subj_skills as ss
inner join (
select tchr_id
from subj_skills
group by tchr_id
having count(*) > 1
) as mult on mult.tchr_id=ss.tchr_id;
I'd use analytic functions here, something like:
select tchr_id, line_nbr, subj, SUBJ_TYPE
from (select count(distinct subj) over (partition by tchr_id) as grp_cnt,
s.*
from subj_skills s)
where grp_cnt > 1
If you need to filter out invalid records, you can do it in the inner query. If a teacher cannot teach the same subject multiple times (the req 'multiple different subjects' can be translated to 'multiple subjects'), then I'd rather use count(*) instead of count(distinct subj).
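The count(*) versus count(distinct subj) distinction can be sketched as follows. This runs on an in-memory SQLite database, which does not accept DISTINCT inside a window aggregate, so a grouped subquery stands in for the analytic function. Teacher 8, who has Maths listed twice, is a made-up row added to the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subj_skills (tchr_id INTEGER, line_nbr INTEGER, subj TEXT, subj_type TEXT)"
)
rows = [
    (1, 1, "Maths", "R"), (1, 2, "101", "U"),
    (2, 1, "BehaviourialTech", "U"),
    (3, 2, "Maths", "R"),
    (4, 1, "RegionalLANG", "U"),
    (5, 3, "ForeignLANG", "U"), (5, 4, "Maths", "R"),
    (6, 2, "Science", "R"),
    (7, 1, "101", "U"), (7, 3, "Physics", "R"),
    (8, 1, "Maths", "R"), (8, 2, "Maths", "R"),  # hypothetical: same subject twice
]
conn.executemany("INSERT INTO subj_skills VALUES (?, ?, ?, ?)", rows)


def teachers(having):
    """Return teacher ids whose group satisfies the given HAVING condition."""
    sql = f"""
        SELECT DISTINCT ss.tchr_id
        FROM subj_skills ss
        JOIN (SELECT tchr_id
              FROM subj_skills
              GROUP BY tchr_id
              HAVING {having}) m
          ON m.tchr_id = ss.tchr_id
        ORDER BY ss.tchr_id
    """
    return [r[0] for r in conn.execute(sql)]


by_rows = teachers("COUNT(*) > 1")
by_subjects = teachers("COUNT(DISTINCT subj) > 1")

print(by_rows)      # includes teacher 8, who has two rows for one subject
print(by_subjects)  # only teachers with at least two different subjects
```

If repeated subjects cannot occur in the real table, both conditions select the same teachers.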

Find duplicate combinations

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}
If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute > c.id_product_attribute and
c2.combo = c.combo
);
Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
Sorting can be inlined into the aggregate function so we don't need a subquery for the sort (like @Gordon already provided). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.
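The underlying idea (build one canonical attribute set per instance, then flag any instance whose set was already seen for the same product) can be sketched outside the database. This is plain Python over the sample rows from the question, not a translation of either query, and the variable names are made up:

```python
from collections import defaultdict

# (id, id_product) rows from ProductAttribute
product_attribute = [(i, 1) for i in range(1, 9)]
# (id_product_attribute, id_attribute) rows from ProductAttributeCombinations
combinations = [
    (1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
    (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5),
]

# Collect the attribute set of every instance; a set ignores row order.
attrs = defaultdict(set)
for pa_id, attr_id in combinations:
    attrs[pa_id].add(attr_id)

seen = set()                     # (id_product, attribute set) pairs met so far
duplicates = defaultdict(list)   # id_product -> ids repeating an earlier set
for pa_id, product_id in sorted(product_attribute):
    key = (product_id, frozenset(attrs[pa_id]))
    if key in seen:
        duplicates[product_id].append(pa_id)
    else:
        seen.add(key)

print(dict(duplicates))
```

Walking the instances in ascending id order plays the role of the `id < c.id` condition: only the later copies are reported.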

Selecting column from one table and count from another

t1
id | name | include
-------------------
1 | foo | true
2 | bar | true
3 | bum | false
t2
id | some | table_1_id
-------------------------
1 | 42 | 1
2 | 43 | 1
3 | 42 | 2
4 | 44 | 1
5 | 44 | 3
Desired output:
name | count(some)
------------------
foo | 3
bar | 1
What I have currently from looking through other solutions here:
SELECT a.name,
COUNT(r.some)
FROM t1 a
JOIN t2 r on a.id=r.table_1_id
WHERE a.include = 'true'
GROUP BY a.id,
r.some;
but that seems to get me
name | count(r.some)
--------------------
foo | 1
foo | 1
bar | 1
foo | 1
I'm no sql expert (I can do simple queries) so I'm googling around as well but finding most of the solutions I find give me this result. I'm probably missing something really easy.
Just remove the second column from the group by clause
SELECT a.name,
COUNT(r.some)
FROM t1 a
JOIN t2 r on a.id=r.table_1_id
WHERE a.include = 'true'
GROUP BY a.name
Columns that you pass to an aggregate function such as sum() or count() should be left out of the group by clause. Only list the columns whose distinct values you want as output rows.
This is because grouping on multiple columns puts rows in the same group only when all of those column values are the same.
See this link for more info: Using group by on multiple columns
Actually, in your case, whenever two rows have the same some, their table_1_id differs (and vice versa), so no rows are grouped together and every row is displayed individually.
If the entries are like,
id | some | table_1_id
-------------------------
1 | 42 | 1
2 | 43 | 1
3 | 42 | 2
4 | 42 | 1
Then the output would have been.,
name | count
------------------
foo | 2 (for 42)
foo | 1 (for 43)
bar | 1 (for 42)
Actually, if you want to group on one column as Juergen said, you could remove r.some from the group by clause.
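Here is a sketch that runs both groupings on the question's sample data in an in-memory SQLite database. The column names some and include are double-quoted as a precaution, and an ORDER BY is added so the output order is predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT, "include" TEXT);
    CREATE TABLE t2 (id INTEGER, "some" INTEGER, table_1_id INTEGER);
    INSERT INTO t1 VALUES (1, 'foo', 'true'), (2, 'bar', 'true'), (3, 'bum', 'false');
    INSERT INTO t2 VALUES (1, 42, 1), (2, 43, 1), (3, 42, 2), (4, 44, 1), (5, 44, 3);
""")

# Same query twice; only the GROUP BY column list changes.
template = """
    SELECT a.name, COUNT(r."some")
    FROM t1 a
    JOIN t2 r ON a.id = r.table_1_id
    WHERE a."include" = 'true'
    GROUP BY {cols}
    ORDER BY a.name
"""

per_pair = conn.execute(template.format(cols='a.id, r."some"')).fetchall()
per_name = conn.execute(template.format(cols="a.name")).fetchall()

print(per_pair)  # one row per (id, some) pair, each with a count of 1
print(per_name)  # one row per name with the total count
```

Grouping by the pair produces four rows because no two joined rows share both values; grouping by name alone collapses them into the desired totals.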

SQL query for counting combinations and including entries that don't exist

This is a simplified version of the table I am dealing with which is Orders
+-------------------+------------------+---------------+
| Order_Base_Number | Order_Lot_Number | Other Cols... |
+-------------------+------------------+---------------+
| 1 | 3 | |
| 1 | 3 | |
| 1 | 4 | |
| 1 | 4 | |
| 1 | 4 | |
| 1 | 5 | |
| 2 | 3 | |
| 2 | 5 | |
| 2 | 9 | |
| 2 | 10 | |
+-------------------+------------------+---------------+
What I want to do is to get a count for the unique entries based on Base and Lot numbers. I have two sets of numbers: one is a set of Base numbers and the other is a set of Lot numbers.
For example, let's say the two sets are Base in (1,2,3) and Lot in (3,4,20).
I am looking for an SQL query that can return all the possible combination of (Base,Lot) from the two sets with a count that shows how many times the combination was found in the table. My problem is that I want to include all the possible combinations and if a combination is not in the Orders table, I want the count to show zero. So, the output I am looking for is something like this.
+------+-----+-----------+
| Base | Lot | Frequency |
+------+-----+-----------+
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 20 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
| 2 | 20 | 0 |
| 3 | 3 | 0 |
| 3 | 4 | 0 |
| 3 | 20 | 0 |
+------+-----+-----------+
I tried a lot of queries but never got close to this and not even sure if it can be done. Right now I am figuring out the combinations on the client side and hence I am performing too many queries to get the frequencies.
Perhaps the clearest way is to start with the lists as CTEs:
with bases as (
select 1 as base from dual union all
select 2 as base from dual union all
select 3 as base from dual
),
lots as (
select 3 as lot from dual union all
select 4 as lot from dual union all
select 20 as lot from dual
)
select b.base, l.lot, count(o.Order_Base_Number) as Frequency
from bases b cross join lots l left outer join
Orders o
on o.Order_Base_Number = b.base and o.Order_Lot_Number = l.lot
group by b.base, l.lot
Note that this makes the cross join explicit, purposely not using the , for a Cartesian product.
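A runnable sketch of the same pattern, using an in-memory SQLite database and the sample rows from the question. SQLite has no dual table, so the list CTEs use bare SELECTs; the column names follow the Orders table shown in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (Order_Base_Number INTEGER, Order_Lot_Number INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 3), (1, 3), (1, 4), (1, 4), (1, 4), (1, 5),
     (2, 3), (2, 5), (2, 9), (2, 10)],
)

frequencies = conn.execute("""
    WITH bases AS (
        SELECT 1 AS base UNION ALL SELECT 2 UNION ALL SELECT 3
    ),
    lots AS (
        SELECT 3 AS lot UNION ALL SELECT 4 UNION ALL SELECT 20
    )
    SELECT b.base, l.lot, COUNT(o.Order_Base_Number) AS frequency
    FROM bases b
    CROSS JOIN lots l                  -- every (base, lot) combination
    LEFT JOIN Orders o                 -- unmatched combinations keep NULLs
      ON o.Order_Base_Number = b.base
     AND o.Order_Lot_Number = l.lot
    GROUP BY b.base, l.lot
    ORDER BY b.base, l.lot
""").fetchall()

for row in frequencies:
    print(row)
```

Counting a column from Orders rather than using COUNT(*) is what turns the unmatched combinations into 0, since COUNT ignores the NULLs produced by the outer join.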
The first part of this query could also be written as something like the following (assuming that each base and lot has at least one record in the table):
with bases as (
select distinct Order_Base_Number as base
from Orders -- or some other table, perhaps Bases ?
where Order_Base_Number in (1, 2, 3)
),
lots as (
select distinct Order_Lot_Number as lot
from Orders -- or some other table, perhaps Lots ?
where Order_Lot_Number in (3, 4, 20)
)
. . .
This is more succinct, but might result in a less efficient query.
What you need in the innermost subquery is called CROSS JOIN, which gets cartesian products (all possible combinations) of records. That's what you get when you have neither JOIN..ON condition nor WHERE:
SELECT Base.Id as baseid, Lot.Id as lotid FROM Bases Base, Lots Lot
Now put it into subquery and LEFT JOIN to the rest of your stuff:
SELECT ... FROM
(SELECT Base.Id as baseid, Lot.Id as lotid
FROM Bases Base, Lots Lot) baseslots
LEFT JOIN Orders ON Order_Base_Number = baseid
AND Order_Lot_Number = lotid ....
With this LEFT JOIN, you'll get NULL for nonexistent combinations. Use COALESCE (or something like this) to turn them into 0.
I don't have Oracle to test it but this is what I would do:
CREATE TABLE pairs AS
(
SELECT DISTINCT Base.Order_Base_Number, Lot.Order_Lot_Number
FROM ORDERS Base
CROSS JOIN ORDERS Lot
);
CREATE TABLE counts AS
(
SELECT Order_Base_Number, Order_Lot_Number, Count(*) AS C
FROM ORDERS
GROUP BY Order_Base_Number, Order_Lot_Number
);
SELECT P.Order_Base_Number, P.Order_Lot_Number, COALESCE(C.C,0) AS "Count"
FROM Pairs P
LEFT JOIN counts C ON P.Order_Base_Number = C.Order_Base_Number
AND P.Order_Lot_Number = C.Order_Lot_Number