3 table join with a calculated % - sql

I have the following tables:
POLLS
id | name | colour
-------------------------
1 | first poll | orange
2 | second poll | blue
3 | third poll | green
QUESTIONS
id | poll_id | what_to_ask
---------------------------
1 | 1 | How nice is stackoverflow?
2 | 1 | Why do you think that?
3 | 2 | What do you like to eat?
CHOICES
id | question_id | choice_text
------------------------------------
1 | 1 | Very nice
2 | 1 | Moderately nice
3 | 1 | Evil
4 | 2 | Etc.
ANSWERS
id | choice_id | poll_id | question_id
--------------------------------------
1 | 1 | 1 | 1
2 | 1 | 1 | 1
3 | 2 | 1 | 1
4 | 3 | 1 | 1
5 | 1 | 1 | 1
I'm trying to pull back the following:
A row for each choice
poll information appended to the row (this will duplicate, and it's ok)
question information appended to the row (this will duplicate, and it's ok)
% of each answer (from all answers for a particular poll)
My query (below) works well if there is an answer per choice, but if there is only one answer (and perhaps 3 choices), it'll only bring back one row.
How can I bring back every choice for a poll, along with the number of answers and the % of answers for each choice?
SELECT Count(a.choice_id) AS answer_count,
q.what_to_ask,
c.id,
c.choice_text,
COALESCE (Count(a.choice_id) * 100.0 / NULLIF((SELECT Count(*)
FROM answers
WHERE poll_id = 2), 0), 0
) AS percentage
FROM choices c
RIGHT OUTER JOIN answers a
ON a.choice_id = c.id
INNER JOIN polls p ON p.id = a.poll_id
INNER JOIN questions q ON c.question_id = q.id
WHERE p.id = 2
GROUP BY c.id, q.what_to_ask, c.choice_text

I would change the join between choices and answers to a LEFT JOIN:
SELECT Count(a.choice_id) AS answer_count,
q.what_to_ask,
c.id,
c.choice_text,
COALESCE (Count(a.choice_id) * 100.0 / NULLIF((SELECT Count(*)
FROM answers
WHERE poll_id = 2), 0), 0
) AS percentage
FROM choices c
INNER JOIN questions q ON c.question_id = q.id
INNER JOIN polls p ON p.id = q.poll_id -- reach polls through questions, not answers, so unanswered choices are kept
LEFT JOIN answers a
ON a.choice_id = c.id
WHERE p.id = 2
GROUP BY c.id, q.what_to_ask, c.choice_text
With a RIGHT JOIN, the query keeps every row of answers and returns a choice only when at least one answer references it, so choices without answers never appear.
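To see the effect on the sample data, here is a minimal sketch using Python's built-in sqlite3 module (an assumption, since the question does not name a database). It reaches polls through questions rather than through answers, so the WHERE clause cannot discard choices that have no answers, and it uses poll 1 because that is the only poll with answers in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE polls     (id INTEGER PRIMARY KEY, name TEXT, colour TEXT);
CREATE TABLE questions (id INTEGER PRIMARY KEY, poll_id INTEGER, what_to_ask TEXT);
CREATE TABLE choices   (id INTEGER PRIMARY KEY, question_id INTEGER, choice_text TEXT);
CREATE TABLE answers   (id INTEGER PRIMARY KEY, choice_id INTEGER, poll_id INTEGER, question_id INTEGER);
INSERT INTO polls VALUES (1,'first poll','orange'),(2,'second poll','blue'),(3,'third poll','green');
INSERT INTO questions VALUES (1,1,'How nice is stackoverflow?'),(2,1,'Why do you think that?'),(3,2,'What do you like to eat?');
INSERT INTO choices VALUES (1,1,'Very nice'),(2,1,'Moderately nice'),(3,1,'Evil'),(4,2,'Etc.');
INSERT INTO answers VALUES (1,1,1,1),(2,1,1,1),(3,2,1,1),(4,3,1,1),(5,1,1,1);
""")

QUERY = """
SELECT c.id, c.choice_text, q.what_to_ask,
       COUNT(a.choice_id) AS answer_count,
       COALESCE(COUNT(a.choice_id) * 100.0 /
                NULLIF((SELECT COUNT(*) FROM answers WHERE poll_id = :poll), 0), 0) AS percentage
FROM choices c
INNER JOIN questions q ON q.id = c.question_id
INNER JOIN polls p ON p.id = q.poll_id      -- poll comes from the question, not the answer
LEFT JOIN answers a ON a.choice_id = c.id   -- keeps choices nobody picked
WHERE p.id = :poll
GROUP BY c.id, c.choice_text, q.what_to_ask
ORDER BY c.id
"""

rows = conn.execute(QUERY, {"poll": 1}).fetchall()
for row in rows:
    print(row)
```

Choice 4 ('Etc.') has no answers, yet it still comes back with a count of 0 and a percentage of 0.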

Related

How to count records using an l-tree

I have two tables, tickets and categories. The categories table has 3 columns of interest: id, name and path. The data looks like this:
id | Name | Path
------------------
1 | ABC | 1
2 | DEF | 1.2
3 | GHI | 1.2.3
4 | JKL | 4
5 | MNO | 4.5
6 | PQR | 4.5.6
9 | STU | 4.5.9
Note that the path column is an l-tree. What this is meant to represent is that the category with id=2 is a subcategory of id=1 and that id=3 is a subcategory of id=2.
In my tickets table, there's a column called category_id which refers to the id column in my categories table. Each ticket can have up to one category assigned to it (category_id may be null).
I'm trying to count all the tickets for each category.
Suppose my tickets table looks like this:
ticket_id | ticket_title | category_id
1 | A | 1
2 | B | 2
3 | C | 3
4 | D | 5
5 | F | 5
6 | G | 6
7 | H | 9
I would like to output:
category_id | count
1 | 3
2 | 2
3 | 1
4 | 4
5 | 4
6 | 1
9 | 1
I've found that I can get all of the tickets which belong to a given category with the following query: select * from tickets where category_id in (select id from categories where path ~ '*.1.*'); (although now that I'm writing this question I'm not convinced this is correct).
I've also attempted to perform the ticket-count-by-category problem and I came up with:
SELECT
categories.id as cid,
COUNT(*) as tickets_count
FROM tickets
LEFT JOIN categories ON tickets.category_id = categories.id
GROUP BY cid;
which outputs the following:
c_id | count
1 | 1
2 | 1
3 | 1
5 | 2
6 | 1
9 | 1
I'm not very good at SQL. Is it possible to achieve what I want?
Try this:
WITH tickets_per_path AS (
SELECT
c.path AS path,
count(*) AS count
FROM tickets t INNER JOIN categories c ON (t.category_id = c.id)
GROUP BY c.path)
SELECT
c.id,
sum(tickets_per_path.count) AS count
FROM categories c LEFT JOIN tickets_per_path ON (c.path @> tickets_per_path.path)
GROUP BY c.id
ORDER BY c.id;
Which yields the following result:
id| count
1 | 3
2 | 2
3 | 1
4 | 4
5 | 4
6 | 1
9 | 1
It roughly works like this:
the WITH clause computes the number of tickets per path (without including the tickets of descendant paths).
the second SELECT joins the categories table with the precomputed tickets_per_path result, but instead of an equi-join on path, it joins by testing whether a row of the left table (categories) is an ancestor of, or equal to, a row of the right table (using the @> operator). It then groups by category id and sums the ticket counts, so each category's total includes its descendants' counts.
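The two steps can be traced outside the database. The following is only an illustrative sketch in plain Python: `is_ancestor_or_self` is a hypothetical helper that mimics the label-wise prefix test the ltree `@>` operator performs, and the data is the question's sample.

```python
from collections import Counter

# category id -> path, from the question's sample data
categories = {1: "1", 2: "1.2", 3: "1.2.3", 4: "4", 5: "4.5", 6: "4.5.6", 9: "4.5.9"}
# category_id of each ticket, in ticket_id order
ticket_categories = [1, 2, 3, 5, 5, 6, 9]


def is_ancestor_or_self(ancestor: str, path: str) -> bool:
    """Hypothetical stand-in for ltree's @>: compare whole labels, not characters."""
    a, p = ancestor.split("."), path.split(".")
    return p[:len(a)] == a


# Step 1 (the WITH clause): tickets assigned directly to each path.
tickets_per_path = Counter(categories[c] for c in ticket_categories)

# Step 2 (the outer SELECT): for each category, sum the counts of every
# path it is an ancestor of, including itself.
counts = {
    cid: sum(n for path, n in tickets_per_path.items() if is_ancestor_or_self(cpath, path))
    for cid, cpath in categories.items()
}
print(counts)
```

Category 4 has no tickets of its own, but it still picks up the four tickets filed under 4.5, 4.5.6 and 4.5.9.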
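You are close, but you need a more general join, one that compares paths rather than ids:
SELECT c.id as cid, COUNT(t.ticket_id) as tickets_count
FROM categories c LEFT JOIN
categories d -- d is c itself or any descendant of c
ON d.path::text || '.' LIKE c.path::text || '.%' LEFT JOIN
tickets t
ON t.category_id = d.id
GROUP BY c.id;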
The '.' in the comparison is just so 1.100 doesn't match 1.1.
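A small sketch of the trailing-dot idea, run in SQLite through Python's sqlite3 module. Two assumptions go beyond the answer: paths are stored as plain text (SQLite has no ltree type), and the prefix test is applied to categories.path by joining categories to itself, since tickets only carry a category_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, path TEXT);
CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, ticket_title TEXT, category_id INTEGER);
INSERT INTO categories VALUES (1,'ABC','1'),(2,'DEF','1.2'),(3,'GHI','1.2.3'),
                              (4,'JKL','4'),(5,'MNO','4.5'),(6,'PQR','4.5.6'),(9,'STU','4.5.9');
INSERT INTO tickets VALUES (1,'A',1),(2,'B',2),(3,'C',3),(4,'D',5),(5,'F',5),(6,'G',6),(7,'H',9);
""")

# Without the trailing dot, '1.100' looks like a descendant of '1.1'; with it, it does not.
naive, dotted = conn.execute(
    "SELECT '1.100' LIKE '1.1' || '%', '1.100' || '.' LIKE '1.1' || '.%'"
).fetchone()

counts = conn.execute("""
SELECT c.id, COUNT(t.ticket_id) AS tickets_count
FROM categories c
LEFT JOIN categories d ON d.path || '.' LIKE c.path || '.%'  -- d is c or a descendant of c
LEFT JOIN tickets t ON t.category_id = d.id
GROUP BY c.id
ORDER BY c.id
""").fetchall()
print(naive, dotted)
print(counts)
```

Counting `t.ticket_id` instead of `*` keeps a category with no tickets anywhere in its subtree at 0 rather than 1.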

how to select unique records from a table based on a column which has distinct values in another column

I have below table SUBJ_SKILLS which has records like
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
1 | 1 | Maths | R
1 | 2 | 101 | U
2 | 1 | BehaviourialTech | U
3 | 2 | Maths | R
4 | 1 | RegionalLANG | U
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
6 | 2 | Science | R
7 | 1 | 101 | U
7 | 3 | Physics | R
..
..
I am trying to retrieve records like below (i.e. single teacher who taught multiple different subjects)
TCHR_ID | LINE_NBR | SUBJ | SUBJ_TYPE
--------| ------- | ---------- | ----------
5 | 3 | ForeignLANG | U
5 | 4 | Maths | R
7 | 1 | 101 | U
7 | 3 | Physics | R
1 | 1 | Maths | R
1 | 2 | 101 | U
Line numbers are unique and are never renumbered. For example, TCHR_ID 5 once taught Physics as LINE_NBR=1, but that row was removed later, and the remaining LINE_NBR values stayed as they were.
I also have a lookup table (SUBJ_LKUP) for subjects and their categories/types, like below ('R' for Regular subject and 'U' for Unique subject):
SUBJ | SUBJ_TYPE
----------------- | ------------
Maths | R
Physics | R
ForeignLANG | U
101 | U
Science | R
BehaviourialTech | U
RegionalLANG | U
My approach was to create a table holding the teachers that have exactly 2 records, then run a second query against the base table (SUBJ_SKILLS) and the new table to filter out the distinct records. I came up with the queries below.
Query-1:
create table tchr_with_2_subj as select SS.TCHR_ID
from SUBJ_SKILLS SS, SUBJ_LKUP SL
where SS.SUBJ = SL.SUBJ
and SL.SUBJ_TYPE IN ('R', 'U') AND SS.TCHR_ID IN
(select SS.TCHR_ID from SUBJ_SKILLS SS
GROUP BY SS.TCHR_ID HAVING COUNT(*) = 2)
Query-2:
select SS.TCHR_ID from SUBJ_SKILLS SS, tchr_with_2_subj tw2s
where SS.TCHR_ID = tw2s.TCHR_ID
GROUP BY SS.TCHR_ID,SS.SUBJ_TYPE HAVING COUNT(*) > 1
Question:
1)'IN' condition in Query-1 is causing problems and pulling wrong records.
2) Is there a better way to write query to pull matching records using a single query (i.e. instead of creating a table)
Could someone please help me with this?
For the answer to your original question, I would use window functions:
select ss.*
from (select ss.*,
min(subj) over (partition by tchr_id) as mins,
max(subj) over (partition by tchr_id) as maxs
from SUBJ_SKILLS ss
) ss
where mins <> maxs;
It is unclear how the subject type fits in, but if you need to include that, similar logic will work.
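As a quick check of the min/max idea against the sample rows, here is a sketch that runs the same query in SQLite via Python's sqlite3 module. SQLite is an assumption (the question does not name a database), and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subj_skills (tchr_id INTEGER, line_nbr INTEGER, subj TEXT, subj_type TEXT)")
conn.executemany(
    "INSERT INTO subj_skills VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Maths", "R"), (1, 2, "101", "U"),
        (2, 1, "BehaviourialTech", "U"),
        (3, 2, "Maths", "R"),
        (4, 1, "RegionalLANG", "U"),
        (5, 3, "ForeignLANG", "U"), (5, 4, "Maths", "R"),
        (6, 2, "Science", "R"),
        (7, 1, "101", "U"), (7, 3, "Physics", "R"),
    ],
)

# A teacher has more than one distinct subject exactly when the smallest
# and largest subject names in their partition differ.
rows = conn.execute("""
SELECT tchr_id, line_nbr, subj, subj_type
FROM (SELECT ss.*,
             MIN(subj) OVER (PARTITION BY tchr_id) AS mins,
             MAX(subj) OVER (PARTITION BY tchr_id) AS maxs
      FROM subj_skills ss) ss
WHERE mins <> maxs
ORDER BY tchr_id, line_nbr
""").fetchall()
for row in rows:
    print(row)
```

Only the rows of teachers 1, 5 and 7 come back, which matches the desired output in the question.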
Your second table can be obtained from your first table with:
select ss.*
from
subj_skills as ss
inner join (
select tchr_id
from subj_skills
group by tchr_id
having count(*) > 1
) as mult on mult.tchr_id=ss.tchr_id;
I'd use analytic functions here, something like:
select tchr_id, line_nbr, subj, SUBJ_TYPE
from (select count(distinct subj) over (partition by tchr_id) as grp_cnt,
s.*
from subj_skills s)
where grp_cnt > 1
If you need to filter out invalid records, you can do it in the inner query. If a teacher cannot teach the same subject multiple times (so the requirement 'multiple different subjects' reduces to 'multiple subjects'), then I'd rather use count(*) instead of count(distinct subj).
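The difference between the two counts only shows up when a teacher has the same subject on more than one line. The sketch below illustrates that with a hypothetical teacher 8 (not in the question's data) who has Maths twice. It uses GROUP BY ... HAVING rather than the analytic form, because SQLite, used here through Python's sqlite3 module, rejects DISTINCT inside a window aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subj_skills (tchr_id INTEGER, line_nbr INTEGER, subj TEXT, subj_type TEXT)")
conn.executemany(
    "INSERT INTO subj_skills VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Maths", "R"), (1, 2, "101", "U"),
        (2, 1, "BehaviourialTech", "U"),
        (3, 2, "Maths", "R"),
        (4, 1, "RegionalLANG", "U"),
        (5, 3, "ForeignLANG", "U"), (5, 4, "Maths", "R"),
        (6, 2, "Science", "R"),
        (7, 1, "101", "U"), (7, 3, "Physics", "R"),
        # Hypothetical rows: the same subject listed twice for one teacher.
        (8, 1, "Maths", "R"), (8, 2, "Maths", "R"),
    ],
)


def teachers_having(condition):
    """Return teacher ids whose group satisfies the given HAVING condition."""
    sql = f"SELECT tchr_id FROM subj_skills GROUP BY tchr_id HAVING {condition} ORDER BY tchr_id"
    return [row[0] for row in conn.execute(sql)]


by_rows = teachers_having("COUNT(*) > 1")                  # counts lines
by_subjects = teachers_having("COUNT(DISTINCT subj) > 1")  # counts different subjects
print(by_rows, by_subjects)
```

Teacher 8 passes the `COUNT(*)` filter but not the `COUNT(DISTINCT subj)` one, so `COUNT(*)` is only safe when such repeats cannot occur.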

Find duplicate combinations

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}
If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute > c.id_product_attribute and
c2.combo = c.combo
);
Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
Sorting can be inlined into the aggregate function so we don't need a subquery for the sort (like @Gordon already provided). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
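The "same set of attributes as an instance with a smaller ID" rule can be checked by hand on the sample data. This is only an illustrative sketch in plain Python, not a replacement for the SQL; the variable names are made up for the example.

```python
from collections import defaultdict

# ProductAttribute: id -> id_product (sample data has a single product)
product_attribute = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}
# ProductAttributeCombinations: (id_product_attribute, id_attribute)
combinations = [
    (1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
    (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5),
]

# Collect the attribute set of every product attribute
# (the role array_agg with a sort plays in the SQL).
attribute_sets = defaultdict(set)
for pa_id, attr_id in combinations:
    attribute_sets[pa_id].add(attr_id)

first_seen = {}                 # (id_product, attribute set) -> smallest id with that set
duplicates = defaultdict(list)  # id_product -> ids repeating an earlier combination
for pa_id in sorted(product_attribute):
    key = (product_attribute[pa_id], frozenset(attribute_sets[pa_id]))
    if key in first_seen:
        duplicates[key[0]].append(pa_id)
    else:
        first_seen[key] = pa_id

print(dict(duplicates))
```

Using a set as the key plays the same role as sorting before aggregating in SQL: the order in which the attributes were stored must not affect the comparison.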
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.

Why isn't this returning unique combinations of these attributes?

When using the following query:
with neededSkills(SkillCode) as (
select distinct SkillCode
from job natural join hasprofile natural join requires_skill
where job_code = '1'
minus
select skillcode
from person natural join hasskill
where id = '1'
)
select distinct
taughtin.c_code as c,
count(taughtin.skillcode) as s,
ti.c_code as cc,
count(ti.skillcode) as ss
from taughtin, taughtin ti
where taughtin.c_code <> ti.c_code
and taughtin.skillcode <> ti.skillcode
and taughtin.skillcode in (select skillcode from neededskills)
and ti.skillcode in (select skillcode from neededskills)
group by (taughtin.c_code, ti.c_code)
order by (taughtin.c_code);
It returns:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
2 | 1 | 1 | 1
3 | 1 | 1 | 1
5 | 1 | 1 | 1
I would expect it to return only lines where the combination of C and CC was not already used. Do I misunderstand how group by works? How would I achieve this result?
I am trying to have it return:
C | S | CC | SS
----|----|----|----
1 | 1 | 2 | 1
1 | 1 | 3 | 1
1 | 1 | 5 | 1
I use Oracle SQLPlus.
You're grouping on the combination of taughtin.c_code and ti.c_code, which are separate columns in the context of the query (even though they are the same column in the schema). A pair of 1, 2 is not the same as a pair of 2, 1; the values may be the same but the sources are not.
If you want to get the combinations one way but not the other, then the simplest thing is to always make one value larger than the other; instead of:
where taughtin.c_code <> ti.c_code
use:
where ti.c_code > taughtin.c_code
Though it would be better to use ANSI joins for the main query too, and I'm not a fan of natural joins. You also don't need either distinct: the first may eliminate duplicates, but they don't logically matter if you're only using the temporary result set for in(), and the second is redundant because group by already returns one row per group.
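The effect of swapping `<>` for `>` is easy to see on a tiny self-join. The sketch below runs in SQLite through Python's sqlite3 module rather than Oracle, and the four taughtin rows are hypothetical, chosen so that the first result has the same shape as the output in the question. The neededSkills filter is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taughtin (c_code INTEGER, skillcode TEXT)")
# Hypothetical data: course 1 teaches skill A; courses 2, 3 and 5 teach skill B.
conn.executemany("INSERT INTO taughtin VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "B"), (5, "B")])


def course_pairs(pair_condition):
    """Self-join taughtin, pairing courses that teach different skills."""
    sql = f"""
    SELECT t.c_code, ti.c_code
    FROM taughtin t
    JOIN taughtin ti ON {pair_condition} AND t.skillcode <> ti.skillcode
    ORDER BY t.c_code, ti.c_code
    """
    return conn.execute(sql).fetchall()


both_orders = course_pairs("t.c_code <> ti.c_code")  # every pair appears twice
one_order = course_pairs("ti.c_code > t.c_code")     # each pair appears once
print(both_orders)
print(one_order)
```

With `<>` every pair shows up in both orders; with `>` only the ordering where the second course code is larger survives.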

Left Join on Associative Table

I have three tables
Prospect -- holds prospect information
id
name
projectID
Sample data for Prospect
id | name | projectID
1 | p1 | 1
2 | p2 | 1
3 | p3 | 1
4 | p4 | 2
5 | p5 | 2
6 | p6 | 2
Conjoint -- holds conjoint information
id
title
projectID
Sample data
id | title | projectID
1 | color | 1
2 | size | 1
3 | qual | 1
4 | color | 2
5 | price | 2
6 | weight | 2
There is an associative table that holds the conjoint values for the prospects:
ConjointProspect
id
prospectID
conjointID
value
Sample Data
id | prospectID | conjointID | value
1 | 1 | 1 | 20
2 | 1 | 2 | 30
3 | 1 | 3 | 50
4 | 2 | 1 | 10
5 | 2 | 3 | 40
There are one or more prospects and one or more conjoints in their respective tables. A prospect may or may not have a value for each conjoint.
I'd like an SQL statement that extracts all conjoint values for each prospect of a given project, displaying NULL where the ConjointProspect table has no value for a given conjoint and prospect.
Something along the lines of this for projectID = 1
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | NULL
2 | 3 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
I've tried using an inner join on the prospect and conjoint tables and then a left join on the ConjointProspect, but somewhere I'm getting a cartesian products for prospect/conjoint pairs that don't make any sense (to me)
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
INNER JOIN conjoint c ON p.projectID = c.projectid
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
WHERE p.projectID = 1
ORDER BY p.id, c.id
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
Guidance is very much appreciated!!
This should work for you. First build the Cartesian product of all prospects and conjoints within the project as a derived table in the FROM clause, then LEFT JOIN it to ConjointProspect on both the prospect and the conjoint. You can change or drop columns from the result as needed, but the join produces the rows you are expecting:
SELECT
PJ.*,
CJP.Value
FROM
( SELECT
P.ID ProspectID,
P.Name,
P.ProjectID,
CJ.Title,
CJ.ID ConJointID
FROM
Prospect P,
ConJoint CJ
where
P.ProjectID = 1
AND P.ProjectID = CJ.ProjectID
ORDER BY
1, 4
) PJ
LEFT JOIN conjointProspect cjp
ON PJ.ProspectID = cjp.prospectID
AND PJ.ConjointID = cjp.conjointid
ORDER BY
PJ.ProspectID,
PJ.ConJointID
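The same idea can be written without the derived table: pair prospects and conjoints on projectID, then LEFT JOIN ConjointProspect on both keys. A minimal sketch with the question's sample data, run in SQLite through Python's sqlite3 module (an assumption, since the question does not name a database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prospect (id INTEGER PRIMARY KEY, name TEXT, projectID INTEGER);
CREATE TABLE conjoint (id INTEGER PRIMARY KEY, title TEXT, projectID INTEGER);
CREATE TABLE conjointProspect (id INTEGER PRIMARY KEY, prospectID INTEGER, conjointID INTEGER, value INTEGER);
INSERT INTO prospect VALUES (1,'p1',1),(2,'p2',1),(3,'p3',1),(4,'p4',2),(5,'p5',2),(6,'p6',2);
INSERT INTO conjoint VALUES (1,'color',1),(2,'size',1),(3,'qual',1),(4,'color',2),(5,'price',2),(6,'weight',2);
INSERT INTO conjointProspect VALUES (1,1,1,20),(2,1,2,30),(3,1,3,50),(4,2,1,10),(5,2,3,40);
""")

rows = conn.execute("""
SELECT p.id AS prospectID, c.id AS conjointID, cp.value
FROM prospect p
INNER JOIN conjoint c ON c.projectID = p.projectID   -- every prospect/conjoint pair in the project
LEFT JOIN conjointProspect cp
       ON cp.prospectID = p.id
      AND cp.conjointID = c.id                       -- match on BOTH keys
WHERE p.projectID = 1
ORDER BY p.id, c.id
""").fetchall()
for row in rows:
    print(row)
```

Matching on prospectID alone is what multiplies the rows in the question's attempt; the second condition on conjointID ties each value to its own prospect/conjoint pair and leaves NULL where none exists.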
Your cartesian product is a result of joining by project id: in your sample data there are 3 prospects with a project id of 1 and 3 conjoints with a project id of 1. Joining based on project id should then result in 9 rows of data, which is what you're getting. It looks like you really need to join via the conjointprospects table, as that is what holds the mapping between prospects and conjoints.
What if you try something like:
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
RIGHT JOIN conjoint c ON cp.conjointID = c.id
WHERE p.projectID = 2
ORDER BY p.id, c.id
Not sure if that will work, but it seems like conjointprospects needs to be at the center of your join in order to correctly map prospects to conjoints.