SELECT COUNT(*) - return 0 along with grouped fields if there are no matching rows - sql

I have the following query:
SELECT employee,department,count(*) AS sum FROM items
WHERE ((employee = 1 AND department = 2) OR
(employee = 3 AND department = 4) OR
(employee = 5 AND department = 6) OR
([more conditions with the same structure]))
AND available = true
GROUP BY employee, department;
If there are no items for a pair "employee-department", then the query returns nothing. I'd like it to return zero instead:
employee | department | sum
---------+------------+--------
1 | 2 | 0
3 | 4 | 12
5 | 6 | 1234
EDIT1
Looks like this is not possible, as Matthew PK explains in his answer to a similar question. I was mistakenly assuming Postgres could somehow extract the missing values from the WHERE clause.
EDIT2
It is possible with some skills. :) Thanks to Erwin Brandstetter!

Not possible? Challenge accepted. :)
WITH x(employee, department) AS (
VALUES
(1::int, 2::int)
,(3, 4)
,(5, 6)
-- ... more combinations
)
SELECT x.employee, x.department, count(i.employee) AS ct
FROM x
LEFT JOIN items i ON i.employee = x.employee
AND i.department = x.department
AND i.available
GROUP BY x.employee, x.department;
This will give you exactly what you are asking for. If employee and department aren't integer, cast to the matching type.
Per comment from @ypercube: count() needs to be on a non-null column of items, so we get 0 for non-existing criteria, not 1.
Also, pull up additional criteria into the LEFT JOIN condition (i.available in this case), so you don't exclude non-existing criteria.
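The two points above (count a column of items rather than count(*), and keep the extra filter inside the LEFT JOIN condition) can be checked with a small sketch. It runs the same query shape through Python's sqlite3 module rather than Postgres, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: the pair (1, 2) has no *available* item.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (employee INTEGER, department INTEGER, available INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(3, 4, 1), (3, 4, 1), (5, 6, 1), (5, 6, 0), (1, 2, 0)],
)

query = """
WITH x(employee, department) AS (
    VALUES (1, 2), (3, 4), (5, 6)
)
SELECT x.employee, x.department,
       count(i.employee) AS ct,       -- counts only matched item rows
       count(*)          AS ct_star   -- also counts the NULL-extended row
FROM x
LEFT JOIN items i ON i.employee = x.employee
                 AND i.department = x.department
                 AND i.available
GROUP BY x.employee, x.department
ORDER BY x.employee, x.department
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the pair (1, 2), ct is 0 while ct_star is 1, which is why the count has to target a column of items.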
Performance
Addressing additional question in comment.
This should perform very well. With longer lists of criteria, (LEFT) JOIN is probably the fastest method.
If you need it as fast as possible, be sure to create a multicolumn index like:
CREATE INDEX items_some_name_idx ON items (employee, department);
If (employee, department) is the PRIMARY KEY, or you have a UNIQUE constraint on the two columns, that does the trick, too.

select employee, department,
count(
(employee = 1 and department = 2) or
(employee = 3 and department = 4) or
(employee = 5 and department = 6) or
null
) as sum
from items
where available = true
group by employee, department;

Building on Erwin's join suggestion, this one really works:
with x(employee, department) as (
values (1, 2)
)
select
coalesce(i.employee, x.employee) as employee,
coalesce(i.department, x.department) as department,
count(available or null) as ct
from
x
full join
items i on
i.employee = x.employee
and
i.department = x.department
group by 1, 2
order by employee, department

Related

SQL query (Postgres) how to answer that?

I have a table with company IDs (not unique) and an attribute (let's call it status ID). The status can range from 1 to 18 (many-to-many; only the row ID is unique).
Now I need to get the companies that only have rows with status 1 or 18. If a company has any other status as well (let's say 3), it should not be returned.
The data is stored as row id, some meta data, company id and one status id, the example below is AFTER I ran a group by query.
So as an example if I do group by and string agg, I am getting these values:
Company ID Status
1 1,9,12,18
2 12,13,18
3 1
4 8
5 18
So in this case I need to return only 3 and 5.
You should fix your data model. Here are some reasons:
Storing numbers in strings is BAD.
Storing multiple values in a string is BAD.
SQL has poor string processing capabilities.
Postgres offers many ways to store multiple values -- a junction table, arrays, and JSON come to mind.
For your particular problem, how about an explicit comparison?
where status in ('1', '18', '1,18', '18,1')
You can group by companyid and set 2 conditions in the having clause:
select companyid
from tablename
group by companyid
having
sum((status in (1, 18))::int) > 0
and
sum((status not in (1, 18))::int) = 0
Or with EXCEPT:
select companyid from tablename
except
select companyid from tablename
where status not in (1, 18)
See the demo.
Results:
> | companyid |
> | --------: |
> | 3 |
> | 5 |
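As a rough check of the two queries above, here is a sketch that loads the question's sample data into an in-memory SQLite database through Python. The `::int` casts are dropped because SQLite already evaluates `IN` to 0 or 1; table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (companyid INTEGER, status INTEGER)")

# Sample data from the question, one row per (company, status).
sample = {1: [1, 9, 12, 18], 2: [12, 13, 18], 3: [1], 4: [8], 5: [18]}
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(c, s) for c, statuses in sample.items() for s in statuses],
)

# Variant 1: conditional sums in the HAVING clause.
having_rows = conn.execute("""
    SELECT companyid
    FROM tablename
    GROUP BY companyid
    HAVING sum(status IN (1, 18)) > 0
       AND sum(status NOT IN (1, 18)) = 0
    ORDER BY companyid
""").fetchall()

# Variant 2: all companies EXCEPT those with any other status.
except_rows = conn.execute("""
    SELECT companyid FROM tablename
    EXCEPT
    SELECT companyid FROM tablename WHERE status NOT IN (1, 18)
    ORDER BY companyid
""").fetchall()

print(having_rows, except_rows)
```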
You can utilize group by and having. ie:
select *
from myTable
where statusId in (1,18)
and companyId in (select companyId
from myTable
group by companyId
having count(distinct statusId) = 1);
EDIT: If you meant to include those who have 1,18 and 18,1 too, then you could use array_agg instead:
select *
from t t1
inner join
(select companyId, array_agg(statusId) as statuses
from t
group by companyId
) t2 on t1.companyid = t2.companyid
where array[1,18] @> t2.statuses;
EDIT: If you meant to get back only companyIds without the rest of columns and data:
select companyId
from t
group by companyId
having array[1,18] @> array_agg(statusId);

Join results of two queries in SQL and produce a result given some condition

I've never used SQL until now, so please bear with me. I have a table of departments:
I have written two queries as follows:
-- nbr of staff associated with each dept.
SELECT count(departmentId) as freq
FROM staff
GROUP BY departmentId
-- nbr of students associated with each dept.
SELECT count(departmentId) as freq
FROM StudentAssignment
GROUP BY departmentId
These produce the following two tables:
For each department id 1 to 5, I need to divide the studentFreq by the staffFreq and show the department id and description if the result is greater than 2.
If the staffFreq i.e. number of staff, for a department id is zero then I need to show that department id and description too.
So for example, in this case I want to produce a table with the department ids of 1, 4 and 5 and their corresponding descriptions: Computing, Classics and Mechanical Engineering.
Computing because 7 / 2 > 2. Classics and ME because 0 staff are assigned to those depts.
One method is a left join, starting with the departments table:
SELECT d.*,
s.freq as num_staff, sa.freq as num_students,
sa.freq * 1.0 / s.freq as student_staff_ratio
FROM departments d LEFT JOIN
(SELECT departmentId, count(*) as freq
FROM staff
GROUP BY departmentId
) s
ON s.departmentId = d.departmentId LEFT JOIN
(SELECT departmentId, count(*) as freq
FROM StudentAssignment
GROUP BY departmentId
) sa
ON sa.departmentId = d.departmentId;
Notes:
This returns missing values as NULL rather than 0. You can assign 0 instead using COALESCE(): COALESCE(s.freq, 0) as num_staff.
SQL Server does integer division, so 7 / 2 = 3, not 3.5. I think you would typically want the fractional component.
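To show how the left joins and COALESCE() combine with the filter the question asks for (ratio above 2, or no staff at all), here is a sketch run against SQLite from Python. The original tables are not shown in the question, so the column names, the row counts and the department names "Physics" and "History" are assumptions chosen to match the stated example (7 students to 2 staff in Computing, no staff in Classics and Mechanical Engineering).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (departmentId INTEGER, description TEXT);
    CREATE TABLE staff (departmentId INTEGER);
    CREATE TABLE StudentAssignment (departmentId INTEGER);
""")
# Hypothetical data, consistent with the example in the question.
conn.executemany("INSERT INTO departments VALUES (?, ?)", [
    (1, "Computing"), (2, "Physics"), (3, "History"),
    (4, "Classics"), (5, "Mechanical Engineering"),
])
conn.executemany("INSERT INTO staff VALUES (?)",
                 [(1,)] * 2 + [(2,)] * 3 + [(3,)] * 1)
conn.executemany("INSERT INTO StudentAssignment VALUES (?)",
                 [(1,)] * 7 + [(2,)] * 4 + [(3,)] * 2 + [(4,)] * 1)

rows = conn.execute("""
    SELECT d.departmentId, d.description
    FROM departments d
    LEFT JOIN (SELECT departmentId, count(*) AS freq
               FROM staff GROUP BY departmentId) s
           ON s.departmentId = d.departmentId
    LEFT JOIN (SELECT departmentId, count(*) AS freq
               FROM StudentAssignment GROUP BY departmentId) sa
           ON sa.departmentId = d.departmentId
    WHERE COALESCE(s.freq, 0) = 0                      -- no staff at all
       OR COALESCE(sa.freq, 0) * 1.0 / s.freq > 2      -- fractional ratio
    ORDER BY d.departmentId
""").fetchall()
print(rows)
```

A department with no staff has s.freq as NULL, so the division yields NULL rather than a divide-by-zero error, and the first condition picks the row up.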

Selecting a row based on column value

I have a select statement in SQL. The select statement is selecting a licenseNo and a LicenseID. Basically, I want it to return the LicenseNo depending on which LicenseTypeID it is.
For example, I want it to return the LicenseNo if the LicenseTypeID = 6 first, then if there is no ID that equals 6, return the LicenseNo where the LicenseTypeID = 5 and so on.
Right now, I have a join that is causing multiple LicenseNos to be returned because there are multiple LicenseTypeIDs. I only want it to return the LicenseNo and row in which the ID of 6 takes precedence, then 5, then 4 and so on. It looks something like this right now:
Select a.Name,
a.addressNo,
b.LicenseNo,
LicenseTypeID
from addressbook a
join licenses b
on a.addressNo = b.addressNo
Returns
111 CompanyA 1234 6
111 CompanyA 2222 4
So I only want it to return the first row, and if that ID (6) doesn't exist, I want it to return the second row, with ID 4.
You need a subselect to determine the maximum license type ID for each address:
select
a.name,
a.addressno,
l.licenseno,
l.licensetypeid
from addressbook a
join licenses l on l.addressno = a.addressno
where l.licensetypeid =
(
select max(licensetypeid)
from licenses
where licenses.addressno = a.addressno
);
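A minimal sketch of the "highest LicenseTypeID per address" idea, run against SQLite from Python. The CompanyA rows come from the question; the CompanyB rows are hypothetical and only there to show a case where type 6 is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addressbook (addressNo INTEGER, Name TEXT);
    CREATE TABLE licenses (addressNo INTEGER, LicenseNo INTEGER,
                           LicenseTypeID INTEGER);
""")
conn.executemany("INSERT INTO addressbook VALUES (?, ?)",
                 [(111, "CompanyA"), (222, "CompanyB")])
conn.executemany("INSERT INTO licenses VALUES (?, ?, ?)", [
    (111, 1234, 6), (111, 2222, 4),   # from the question
    (222, 3333, 4), (222, 4444, 5),   # hypothetical: no type 6 here
])

rows = conn.execute("""
    SELECT a.Name, a.addressNo, l.LicenseNo, l.LicenseTypeID
    FROM addressbook a
    JOIN licenses l ON l.addressNo = a.addressNo
    WHERE l.LicenseTypeID = (SELECT max(l2.LicenseTypeID)
                             FROM licenses l2
                             WHERE l2.addressNo = a.addressNo)
    ORDER BY a.addressNo
""").fetchall()
print(rows)
```

If two licenses at one address share the highest type ID, this form returns both rows; a tie-breaker would be needed to force a single row.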
Try this.
SELECT * FROM
(SELECT ROW_NUMBER() OVER
(PARTITION BY a.addressNo ORDER BY b.LicenseTypeID DESC) AS rn,
a.Name,
a.addressNo,
b.LicenseNo,
b.LicenseTypeID
from addressbook a
join licenses b
on a.addressNo = b.addressNo) AS t WHERE rn = 1

write a query to identify discrepancy

I have a table with student IDs and student names. There have been issues with assigning unique student IDs to students, so I want to find the duplicates.
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display count of duplicate records.
For example, "Duplicate_Count" would display "3" for Student ID = 1 and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId, this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold dupcount, this query can be used in an update statement to update that column in the table
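The join-back approach can be tried on the question's sample data with the sketch below, which uses SQLite through Python. The table name `students` is an assumption, since the answer only uses the placeholder `Table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "students" is a stand-in name for the placeholder "Table" above.
conn.execute("CREATE TABLE students (StudentId INTEGER, StudentName TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [
    (1, "Jack"), (1, "John"), (1, "Bill"), (2, "Amanda"), (2, "Molly"),
    (3, "Ron"), (4, "Matt"), (5, "James"), (6, "Kathy"), (6, "Will"),
])

rows = conn.execute("""
    SELECT aa.StudentId, aa.StudentName, bb.DupCount
    FROM students AS aa
    JOIN (SELECT StudentId, count(*) AS DupCount
          FROM students
          GROUP BY StudentId) AS bb
      ON aa.StudentId = bb.StudentId
    ORDER BY aa.StudentId, aa.StudentName
""").fetchall()
for row in rows:
    print(row)
```

Every student row keeps its name and gains the per-ID count, so IDs that are unique show a count of 1.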
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by @HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.

SQL select value if no corresponding value exists in another table

I have a database which tries to achieve point-in-time information by having a master table and a history table which records when fields in the other table will change or did change, e.g.
Table: Employee
Id | Name | Department
-----------------------------
0 | Alice | 1
1 | Bob | 1
Table: History
ChangeDate | Field | RowId | NewValue
---------------------------------------------
05/05/2009 | Department | 0 | 2
That records that employee 0 (Alice) will move to department 2 on 05/05/2009.
I want to write a query to determine the employee's department on a particular date. So it needs to:
Find the first history record for that field and employee before given date
If none exists then default to the value currently in the master employee table.
How can I do this? My intuition is to select the first row of a result set which has all suitable history records reverse ordered by date and with the value in the master table last (so it's only the first result if there are no suitable history records), but I don't have the required SQL-fu to achieve this.
Note: I am conscious that this may not be the best way to implement this system - I am not able to change this in the short term - though if you can suggest a better way to implement this I'd be glad to hear it.
SELECT COALESCE (
(
SELECT newValue
FROM history
WHERE field = 'Department'
AND rowID = ID
AND changeDate =
(
SELECT MAX(changedate)
FROM history
WHERE field = 'Department'
AND rowID = ID
AND changeDate <= '01/01/2009'
)
), department)
FROM employee
WHERE id = @id
In both Oracle and MS SQL, you can also use this:
SELECT COALESCE(newValue, department)
FROM (
SELECT e.*, h.*,
ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY changeDate DESC) AS rn
FROM employee e
LEFT OUTER JOIN
history h
ON field = 'Department'
AND rowID = ID
AND changeDate <= '01/01/2009'
WHERE e.id = @id
) q
WHERE rn = 1
Note, though, that ROWID is reserved word in Oracle, so you'll need to rename this column when porting.
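The "latest history record on or before the date, else the master value" logic can be sketched as follows, using SQLite from Python. This is an illustration under assumptions, not a port of either query above: dates are stored as ISO strings so that string comparison orders them correctly, `ORDER BY ... LIMIT 1` replaces the MAX() subquery, the helper `department_on` is hypothetical, and the second history row is invented to show that the most recent change wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (Id INTEGER, Name TEXT, Department INTEGER);
    CREATE TABLE history (ChangeDate TEXT, Field TEXT,
                          RowId INTEGER, NewValue INTEGER);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(0, "Alice", 1), (1, "Bob", 1)])
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", [
    ("2009-05-05", "Department", 0, 2),   # from the question
    ("2009-08-01", "Department", 0, 3),   # hypothetical later change
])


def department_on(emp_id, as_of):
    """Return the employee's department as of the ISO date `as_of`."""
    row = conn.execute("""
        SELECT COALESCE(
            (SELECT h.NewValue
             FROM history h
             WHERE h.Field = 'Department'
               AND h.RowId = e.Id
               AND h.ChangeDate <= ?
             ORDER BY h.ChangeDate DESC
             LIMIT 1),
            e.Department)                 -- fall back to the master table
        FROM employee e
        WHERE e.Id = ?
    """, (as_of, emp_id)).fetchone()
    return row[0]


print(department_on(0, "2009-01-01"))  # before any change
print(department_on(0, "2009-06-01"))  # after the first change
print(department_on(0, "2009-09-01"))  # after the later change
print(department_on(1, "2009-06-01"))  # no history for Bob
```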
This should work:
select iif(history.newvalue is null, employee.department, history.newvalue)
as Department
from employee left outer join history on history.RowId = employee.Id
and history.field = 'Department'
and history.changedate < '2008-05-20' -- (i.e. given date)
and history.changedate = (select max(h1.changedate) from history h1
where h1.RowId = history.RowId and h1.field = 'Department'
and h1.changedate < '2008-05-20')