PostgreSQL searching the query result based on defined conditions - sql

I am a beginner and I am working on a quite big query which returns about 50k of rows.
I've spent a big number of hours trying to figure out the issue and it looks that there is a gap in my knowledge and I would be really greatful if you could help me.
In order to show you the main idea I decided to simplify and split the data. I am presenting the relevant tables here:
*company*
+----+----------+---------------+----------------+
| ID | name | classificaton | special_number |
+----+----------+---------------+----------------+
| 1 | companyX | 309 | 242 |
+----+----------+---------------+----------------+
*branch*
+----+---------------+-------+
| ID | name | color |
+----+---------------+-------+
| 1 | environmental | green |
| 2 | navy | blue |
+----+---------------+-------+
*company_branch*
+------------+-----------+
| ID_company | ID_branch |
+------------+-----------+
| 1 | 1 |
| 1 | 2 |
+------------+-----------+
Ok as we have all the needed data presented I need to create a query which will select all the companies along with the main color of the branches they are working in.
A companyX can belong to more than one branch but I need to show only the main branch which can be calculated based on the three conditions below:
*if classification = 309 and special_number is even then show the relevant color and go the next company (ignore the next conditions)
*if classification = 209 and special_number is even then show the relevant color and go the next company (ignore the next condition)
*else show as grey
I created a query like that: (I know that the case phrase is not correct but I am keeping it as it shows in a better way what I am trying to accomplish)
SELECT c.ID, c.name, b.color, c.classification, c.special_number,
CASE
WHEN c.classification = 309 AND c.special_number % 2 = 0 THEN b.color
WHEN c.classification = 209 AND c.special_number % 2 = 0 THEN b.color
ELSE 'grey'
END AS 'case'
FROM company c INNER JOIN company_branch cb ON c.ID = cb.ID_company
INNER JOIN branch b ON b.ID = cb.ID_branch
then I get the following result
*result*
+----+----------+-------+----------------+----------------+------+
| ID | name | color | classification | special_number | case |
+----+----------+-------+----------------+----------------+------+
| 1 | companyX | green | 309 | 242 | green|
| 1 | companyX | blue | 309 | 242 | blue |
+----+----------+-------+----------------+----------------+------+
The problem is that if any company belongs to more than one branch then I always get many colors... what I would like to get is a list of companies with only one color of the main branch they are working in.
can you guys help the newbie ?

I think you want something like this:
SELECT DISTINCT ON (c.id) c.ID, c.name,
COALESCE(b.color, 'grey') as color
c.classification, c.special_number,
FROM company c LEFT JOIN
company_branch cb
ON c.ID = cb.ID_company LEFT JOIN
branch b
ON b.ID = cb.ID_branch
ORDER BY c.Id,
(CASE WHEN c.classification = 309 AND c.special_number % 2 = 0 THEN 1
WHEN c.classification = 209 AND c.special_number % 2 = 0 THEN 2
ELSE 3
END);
DISTINCT ON is a (useful) Postgres extension that returns one row per item(s) in parentheses. The returned row is based on the GROUP BY. The first key(s) in the ORDER BY need to be the item(s). The following specifies what "first" means.
I switched the joins to being outer joins, so all companies are in the result set.

Related

Include zero counts when grouping by multiple columns and setting filters

I have a table (tbl) containing category (2 categories), impact (3 impacts), company name and date for example:
category | impact | company | date | number
---------+----------+---------+-----------|
Animal | Critical | A | 12/31/1999|1
Book | Critical | B | 12/31/2000|2
Animal | Minor | C | 12/31/2001|3
Book | Minor | D | 12/31/2002|4
Animal | Medium | E | 1/1/2003 |5
I want to get the count of records for each category and impact and be able to add rows with zero count and also be able to filter by company and date.
In the example result set below, the count result is 1 for category = Animal and company = A. The rest is 0 records and only the Critical and Medium impacts appear
category | impact | count
---------+----------+-------
Animal | Critical | 1
Animal | Medium | 0
I've looked at the responses to similar questions by using joins however, adding a WHERE clause doesn't include the zero records.
I also tried doing outer joins but it doesn't produce desired output. For example
select a.impact, b.category, ISNULL(count(b.impact), 0) from tbl a
left outer join tbl b
on b.number = a.number
and (a.category = 'Animal' and a.company in ('A'))
group by a.impact, b.category
produces
impact | category | count
---------+------------+--------
Medium | NULL | 0
Medium | Animal | 1
Critical | NULL | 0
Minor | NULL | 0
but the desired output should be
category | impact | count
---------+----------+-------
Animal | Critical | 1
Animal | Medium | 0
Animal | Minor | 0
Any help will be appreciated. Answers to associated questions don't have filtering so I will appreciate if someone can help with a query to produce desired output.
You need a master table with all the possible combinations of Categories and Impacts for this. Then Left join your table with the master and do the aggregation. Something like below
;WITH CAT
AS
(
SELECT
category
FROM Tbl
GROUP BY category
),
IMP
AS
(
SELECT
Impact
FROM Tbl
GROUP BY Impact
),MST
AS
(
SELECT
*
FROM CAT
CROSS JOIN IMP
)
SELECT
MST.category,
MST.Impact,
COUNT(T.Number)
FROM MST
LEFT JOIN Tbl T
ON MST.category = T.category
AND MST.Impact = T.Impact
AND T.Company = 'A'
WHERE MST.Category = 'Animal' GROUP BY MST.category,
MST.Impact

Can i merge the result of two queries?

I'm trying to retrieve some statistics from my database, to be concrete i'm look to show how many todo's is completed vs the total of a checklist.
The structure is as follows
A category has many Cards, has many checklists, has many assessments.
I can get the amount of assessments or completed assessments with the following query.
SELECT count(a.id) AS completed_count, a.checklist_id, ca.category_id
FROM assessments a
JOIN checklists ch ON ch.id = a.checklist_id
JOIN cards ca ON ca.id = ch.card_id
WHERE a.complete
GROUP BY a.checklist_id, ca.category_id;
This will give me something like this.
completed_count | checklist_id | category_id
-----------------+--------------+-------------
2 | 3 | 2
1 | 2 | 2
2 | 5 | 3
I could then do a query, to get the total amount, by removing the WHERE a.complete, and write some code that matches the two results.
But what i really want, is a result like this.
completed_amount | total_amount | checklist_id | category_id
------------------+--------------+--------------+-------------
2 | 2 | 3 | 2
1 | 1 | 2 | 2
2 | 2 | 5 | 3
I just can't wrap my head around, how i can achieve that.
I think conditional aggregation does what you want:
SELECT sum( (a.complete)::int ) AS completed_count,
count(*) as total_count
a.checklist_id, ca.category_id
FROM assessments a JOIN
checklists ch
ON ch.id = a.checklist_id JOIN
cards ca
ON ca.id = ch.card_id
GROUP BY a.checklist_id, ca.category_id;

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

Multiple select or distinct

I am new to sql so looking for a little help - got the first part down however I am having issues with the second part.
I got the three tables tied together. First I needed to tie tblPatient.ID = tblPatientVisit.PatientID together to eliminate dups which works
Now I need to take those results and eliminate dups in the MRN but my query is only returning one result which is WRONG - LOL
Query
select
tblPatient.id,
tblPatient.firstname,
tblPatient.lastname,
tblPatient.dob,
tblPatient.mrn,
tblPatientSmokingScreenOrder.SmokeStatus,
tblPatientVisit.VisitNo
from
tblPatient,
tblPatientSmokingScreenOrder,
tblPatientVisit
Where
tblPatient.ID = tblPatientVisit.PatientID
and tblPatientVisit.ID = tblPatientSmokingScreenOrder.VisitID
and tblPatient.ID in(
Select Distinct
tblPatient.mrn
From
tblPatient
where
isdate(DOB) = 1
and Convert(date,DOB) <'12/10/2000'
and tblPatientVisit.PatientType = 'I')
Actual Results:
ID | firstName | LastName | DOB | MRN | SmokeStatus | VisitNO
12 | Test Guy | Today | 12/12/1023 | 0015396 | Never Smoker | 0013957431
Desired Results:
90 | BOB | BUILDER | 02/24/1974 | 0015476 | Former Smoker | 0015476001
77 | DORA | EXPLORER | 06/04/1929 | 0015463 | Never Smoker | 0015463001
76 | MELODY | VALENTINE | 09/17/1954 | 0015461 | Current | 0015461001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Current | 0015415001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Never Smoker | 0015415001
32 | STRAWBERRY | SHORTCAKE | 07/06/1945 | 0015415 | Former Smoker | 0015415001
12 | Test Guy | Today | 12/12/1023 | 0015345 | Never Smoker | 0013957431
Anyone have any suggestions on how I go down to the next level and get all the rows with one unique MRN. From the data above I should have 5 in my list. Any help would be appreciated.
Thanks
If I had to guess -- The only thing that looks odd (but maybe it's OK) is that you're comparing patient.ID from your parent query to patient.mrn in the subquery.
Beyond that -- things to check:
(1)
Do you get all your patients with the inner query?
Select Distinct
tblPatient.mrn
From
tblPatient
where
isdate(DOB) = 1
and Convert(date,DOB) <'12/10/2000'
and tblPatientVisit.PatientType = 'I'
(2)
What is your patient type for the missing records? (Your filtering it to tblPatientVisit.PatientType = 'I' -- do the missing records have that patient type as well?)
Perhaps you need to invert the logic here. As in,
Select Distinct
patients.mrn1
From (select
tblPatient.id as id1,
tblPatient.firstname as firstname1,
tblPatient.lastname as lastname1,
tblPatient.dob as DOB1,
tblPatient.mrn as mrn1,
tblPatientSmokingScreenOrder.SmokeStatus as SmokeStatus1,
tblPatientVisit.VisitNo as VisitNo1,
tblPatientVisit.PatientType as PatientType1,
from
tblPatient,
tblPatientSmokingScreenOrder,
tblPatientVisit
Where
tblPatient.ID = tblPatientVisit.PatientID
and tblPatientVisit.ID = tblPatientSmokingScreenOrder.VisitID
) as patients
where
isdate(patients.DOB1) = 1
and Convert(date,patients.DOB1) <'12/10/2000'
and patients.PatientType1 = 'I');
Cleaned it up a bit and I think they were right. MRN wont match patient id, at least not from your example data. You should not need an inner query. This query should give you what you want.
SELECT DISTINCT
p.id,
p.firstname,
p.lastname,
p.dob,
p.mrn,
s.SmokeStatus,
v.VisitNo
FROM
tblPatient p
JOIN tblPatientVisit v ON p.id = v.patientId
JOIN tblPatientSmokingScreenOrder s ON v.id = s.visitId
WHERE
isdate(p.DOB) = 1
AND CONVERT(date,p.DOB) <'12/10/2000'
AND v.PatientType = 'I'

join on three tables? Error in phpMyAdmin

I'm trying to use a join on three tables query I found in another post (post #5 here). When I try to use this in the SQL tab of one of my tables in phpMyAdmin, it gives me an error:
#1066 - Not unique table/alias: 'm'
The exact query I'm trying to use is:
select r.*,m.SkuAbbr, v.VoucherNbr from arrc_RedeemActivity r, arrc_Merchant m, arrc_Voucher v
LEFT OUTER JOIN arrc_Merchant m ON (r.MerchantID = m.MerchantID)
LEFT OUTER JOIN arrc_Voucher v ON (r.VoucherID = v.VoucherID)
I'm not entirely certain it will do what I need it to do or that I'm using the right kind of join (my grasp of SQL is pretty limited at this point), but I was hoping to at least see what it produced.
(What I'm trying to do, if anyone cares to assist, is get all columns from arrc_RedeemActivity, plus SkuAbbr from arrc_Merchant where the merchant IDs match in those two tables, plus VoucherNbr from arrc_Voucher where VoucherIDs match in those two tables.)
Edited to add table samples
Table arrc_RedeemActivity
RedeemID | VoucherID | MerchantID | RedeemAmt
----------------------------------------------
1 | 2 | 3 | 25
2 | 6 | 5 | 50
Table arrc_Merchant
MerchantID | SkuAbbr
---------------------
3 | abc
5 | def
Table arrc_Voucher
VoucherID | VoucherNbr
-----------------------
2 | 12345
6 | 23456
So ideally, what I'd like to get back would be:
RedeemID | VoucherID | MerchantID | RedeemAmt | SkuAbbr | VoucherNbr
-----------------------------------------------------------------------
1 | 2 | 3 | 25 | abc | 12345
2 | 2 | 5 | 50 | def | 23456
The problem was you had duplicate table references - which would work, except for that this included table aliasing.
If you want to only see rows where there are supporting records in both tables, use:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
JOIN arrc_Merchant m ON m.merchantid = r.merchantid
JOIN arrc_Voucher v ON v.voucherid = r.voucherid
This will show NULL for the m and v references that don't have a match based on the JOIN criteria:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
LEFT JOIN arrc_Merchant m ON m.merchantid = r.merchantid
LEFT JOIN arrc_Voucher v ON v.voucherid = r.voucherid