PSQL not counting all entries - sql

I am writing a count query which counts attendances at events over a few years. However there is a column named status in which it has extra information about attendance. Apology is listed when the person did not attend and thus should be excluded. status has various entries including group's letters, attendance information, guest status, as well as NULL entries.
The problem is that not all of the attendances are counted in full. They are counted in full when the status NOT LIKE 'Apology' is removed.
How can I get it to count all of the entries? Rather than a limited selection of 364 out of 1583.
I am using psql v9.3.11 in pgAdmin III v1.18.1.
The following is the relevant code:
SELECT b.attendances,
COUNT(b.attendances)
FROM (
SELECT COUNT(a.event_id) as attendances
FROM (SELECT DISTINCT event_id AS event_id,
member_pers_id AS member_pers_id
FROM event_attendance
WHERE status NOT LIKE 'Apology') a
GROUP BY member_pers_id
ORDER BY attendances) b
GROUP BY attendances
ORDER BY attendances ASC

I suspect the NOT LIKE is filtering out additional values, such as NULL. If so, then test for this explicitly.
Also, it makes no sense to do so many aggregations. One is sufficient:
SELECT COUNT(a.event_id) as attendances
FROM (SELECT DISTINCT event_id, member_pers_id
FROM event_attendance
WHERE status NOT LIKE 'Apology' OR status IS NULL
) a
And I'm not sure if the SELECT DISTINCT is necessary

Related

Use Common Table Expression to get difference between two sets of data

I'm trying to use a common table expression to find the differences between two queries I wrote. The first query returns how many patients belong to each ROOMID(each ID represent a specific room).
Second query I have is how many patients that belong to each ROOMId have surgery operated on them. PatientID represent each patient.
select roomID, count(distinct patientID) as totalinsurgery
from data with (nolock)
where ptprocess = 'surgery'
group by clientid, batchid
Second query:
select CAroomid, sum(patientsinroom) as patientsinroom
from data
group by caroomid
So the idea behind is try to get the 'difference' in result of the two query. So how many patients in the room went to surgery. What is the best way to use common table expression to get the result?
So how many patients in the room went to surgery.
I suspect you just want conditional aggregation:
select roomId,
count(distinct case when ptprocess = 'surgery' then patientID end) as num_surgery
count(distinct patientID) as total
from data
group by roomId;
Note: I have no idea why you are using count(distinct). Can a patient really occur more than one time in a room?

Why do I get extra rows in LEFT JOIN when joining to an ID and TIMESTAMP column?

I have a table that contains multiple registration periods (date and time for the start of the registration, as well as date and time for when that instance of registration ends). For each row (registration period), there is a status column that contains the status at the end of the registration period. I was trying to get the status associated with the most recent end date of registration per a given ID. I've used a window function to get the most recent end date of interest per ID, and then I wanted to LEFT JOIN on ID and end date to get the status from the same table on which I used the window function. There should really just be one just one combination for a given end date and status per ID, but somehow I get more rows that what's in the left table.
Like I mentioned earlier, my approach was to use a window function to get MAX(end_date) per ID and some other column, let's call it enrollment_number. Then use LEFT JOIN on this table and its parent table to bring in status associated with that date only. Later, I'd like to use the result of this join to bring in the status associated with the end date into other tables where I need it.
WITH
my_first_test AS
(
SELECT my_id,
enrollment_number,
MAX(end_date_of_enrollment) OVER (partition by my_id, enrollment_number) AS end_date_enrolled
FROM enrollments
)
SELECT mft.my_id, mft.end_date_enrolled, e.status
FROM my_first_test AS mft
LEFT JOIN enrollments AS e
ON mft.my_id = e.my_id AND mft.end_date_enrolled = e.end_date_enrolled;
The CTE returns 42917 rows, same number of rows as in the enrollments table, which it should be if I understand it correctly.
Then, I LEFT JOIN enrollments, to bring in information from the status column also contained in the enrollments table. The LEFT JOIN is done on my_id and end_date_enrolled.
I expect 42917 rows in the resulting table, because my_id and end_date_enrolled together should be unique. However, I get slightly more rows in my final table - 44408. I was wondering if the StackOverflow community would be able to help me solve this mystery. I am using SQL in AWS Redshift.
You have duplicates in enrollments. You can find them with aggregation:
SELECT my_id, end_date_enrolled, COUNT(*)
FROM enrollments AS e
GROUP BY my_id, end_date_enrolled
HAVING COUNT(*) > 1;

Identifying data with different occurrences

How to find the animals with different test dates?
If you want to find animals which have 2 or more distinct dates then you can use COUNT along with DISTINCT:
SELECT animals_key, anim_ident
FROM jt3
GROUP BY animals_key, anim_ident
HAVING COUNT(DISTINCT test_date) > 1
If you want to also include test date information, then you can join the above query to the original table.

How does GROUP BY use COUNT(*)

I have this query which finds the number of properties handled by each staff member along with their branch number:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo=p.staffNo
GROUP BY s.branchNo, s.staffNo
The two relations are:
Staff{staffNo, fName, lName, position, sex, DOB, salary, branchNO}
PropertyToRent{propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo}
How does SQL know what COUNT(*) is referring to? Why does it count the number of properties and not (say for example), the number of staff per branch?
This is a bit long for a comment.
COUNT(*) is counting the number of rows in each group. It is not specifically counting any particular column. Instead, what is happening is that the join is producing multiple properties, because the properties are what cause multiple rows for given values of s.branchNo and s.staffNo.
It gets even a little more "confusing" if you include a column name. The following would all typically return the same value:
COUNT(*)
COUNT(s.branchNo)
COUNT(s.staffNo)
COUNT(p.propertyNo)
With a column name, COUNT() determines the number of rows that do not have a NULL value in the column.
And finally, you should learn to use proper, explicit join syntax in your queries. Put join conditions in the on clause, not the where clause:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s JOIN
PropertyForRent p
ON s.staffNo = p.staffNO
GROUP BY s.branchNo, s.staffNo;
GROUP BY clauses partition your result set. These partitions are all the sql engine needs to know - it simply counts their sizes.
Try your query with only count(*) in the select part.
In particular, COUNT(*) does not produce the number of distinct rows/columns in your result set!
Some people might think that count(*) really count all the columns, however the sql optimizer is smarter than that.
COUNT(*) returns the number of rows in a specified table without getting rid of duplicates. Which mean that you can't use Distinct with count(*)
Count(*) will return the cardinality (elements in table) of the specified mapping.
What you have to remember is that when using count over a specific column, null won't be allowed while count(*) will allow null in the rows as it could be any field.
How does SQL know what COUNT(*) is referring to?
I'm pretty sure, however not 100% sure as I can't find in doc, that the sql optimizer simply do a count on the primary key (not null) instead of trying to handle null in rows.

Can peewee nest SELECT queries such that the outer query selects on an aggregate of the inner query?

I'm using peewee2.1 with python3.3 and an sqlite3.7 database.
I want to perform certain SELECT queries in which:
I first select some aggregate (count, sum), grouping by some id column; then
I then select from the results of (1), aggregating over its aggregate. Specifically, I want to count the number of rows in (1) that have each aggregated value.
My database has an 'Event' table with 1 record per event, and a 'Ticket' table with 1..N tickets per event. Each ticket record contains the event's id as a foreign key. Each ticket also contains a 'seats' column that specifies the number of seats purchased. (A "ticket" is really best thought of as a purchase transaction for 1 or more seats at the event.)
Below are two examples of working SQLite queries of this sort that give me the desired results:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
SELECT seat_tot, count(1) AS seat_tot_n FROM
(SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id)
GROUP BY seat_tot
But using Peewee, I don't know how to select on the inner query's aggregate (count or sum) when specifying the outer query. I can of course specify an alias for that aggregate, but it seems I can't use that alias in the outer query.
I know that Peewee has a mechanism for executing "raw" SQL queries, and I've used that workaround successfully. But I'd like to understand if / how these queries can be done using Peewee directly.
I posted the same question on the peewee-orm Google group. Charles Leifer responded promptly with both an answer and new commits to the peewee master. So although I'm answering my own question, obviously all credit goes to him.
You can see that thread here: https://groups.google.com/forum/#!topic/peewee-orm/FSHhd9lZvUE
But here's the essential part, which I've copied from Charles' response to my post:
I've added a couple commits to master which should make your queries
possible
(https://github.com/coleifer/peewee/commit/22ce07c43cbf3c7cf871326fc22177cc1e5f8345).
Here is the syntax,roughly, for your first example:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
ev_tix = SQL('ev_tix') # the name of the alias.
(Ticket
.select(ev_tix, fn.count(ev_tix).alias('ev_tix_n'))
.from_(
Ticket.select(fn.count(Ticket.id).alias('ev_tix')).group_by(Ticket.event))
.group_by(ev_tix))
This yields the following SQL:
SELECT ev_tix, count(ev_tix) AS ev_tix_n FROM (SELECT Count(t2."id")
AS ev_tix FROM "ticket" AS t2 GROUP BY t2."event_id")
GROUP BY ev_tix