SQL for COUNT and GROUP BY - sql

This is my first time posting a question on this site :) and I'm stuck. I have an Access database that I connect to through ODBC.
The database has 3 tables:
nombres : (-idnombre, -nombre)
rutas : (-idruta, ruta)
fechas : (-idfecha, idruta, idnombre, fecha)
Sample data :
nombres:(1,nombreA),(2,nombreB)
Rutas:(1,rutaA),(2,RutaB)
fechas:(1,1,1,28/06/2013), (2,2,1,28/06/2013), (3,2,2,28/06/2013),(4,2,2,28/06/2013)
I need this output (the third field is a count):
rutaA - nombreA - 1 time
rutaA - nombreB - 0 times
rutaB - nombreA - 1 time
rutaB - nombreB - 2 times
My SQL is:
SELECT rutas.ruta, nombres.nombre, Count(fechas.idruta) AS CuentaDeidruta
FROM rutas INNER JOIN
(nombres INNER JOIN fechas ON nombres.idnombre = fechas.idnombre)
ON rutas.idruta = fechas.idruta
GROUP BY rutas.ruta, nombres.nombre;
It works, but it does not show the zero counts, so my output is:
rutaA - nombreA - 1 time
rutaB - nombreA - 1 time
rutaB - nombreB - 2 times
I tried a LEFT JOIN, but I got errors.

I think you're looking for the Cartesian product of rutas and nombres, because in your sample data no row in fechas links rutaA with nombreB. Something like this:
SELECT t.ruta, t.nombre, Count(fechas.idruta) AS CuentaDeidruta
FROM (
SELECT rutas.idruta, rutas.ruta, nombres.idnombre, nombres.nombre
FROM rutas, nombres
) t
LEFT JOIN fechas ON
t.idnombre = fechas.idnombre AND fechas.idruta = t.idruta
GROUP BY t.ruta, t.nombre;
SQL Fiddle demo (it is for SQL Server, but should port over to MS Access pretty closely).
Also note the LEFT JOIN, which ensures you get the zero counts you were missing.
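To check the idea against the sample data from the question, here is a minimal sketch that runs the same cross join plus LEFT JOIN through Python's built-in sqlite3 module. SQLite stands in for Access here (an assumption made only so the example is runnable), the dates are stored as ISO strings, and an ORDER BY is added so the output order is fixed.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nombres (idnombre INTEGER PRIMARY KEY, nombre TEXT);
    CREATE TABLE rutas   (idruta   INTEGER PRIMARY KEY, ruta   TEXT);
    CREATE TABLE fechas  (idfecha  INTEGER PRIMARY KEY, idruta INTEGER,
                          idnombre INTEGER, fecha TEXT);
    INSERT INTO nombres VALUES (1, 'nombreA'), (2, 'nombreB');
    INSERT INTO rutas   VALUES (1, 'rutaA'), (2, 'rutaB');
    INSERT INTO fechas  VALUES (1, 1, 1, '2013-06-28'), (2, 2, 1, '2013-06-28'),
                               (3, 2, 2, '2013-06-28'), (4, 2, 2, '2013-06-28');
""")

# Every (ruta, nombre) pair first, then LEFT JOIN the facts onto the pairs.
# COUNT(column) skips NULLs, so pairs without a match count as 0.
query = """
    SELECT t.ruta, t.nombre, COUNT(fechas.idruta) AS CuentaDeidruta
    FROM (SELECT rutas.idruta, rutas.ruta, nombres.idnombre, nombres.nombre
          FROM rutas, nombres) AS t
    LEFT JOIN fechas
           ON t.idnombre = fechas.idnombre AND t.idruta = fechas.idruta
    GROUP BY t.ruta, t.nombre
    ORDER BY t.ruta, t.nombre
"""
rows = conn.execute(query).fetchall()
for ruta, nombre, veces in rows:
    print(ruta, "-", nombre, "-", veces)
```

The zero appears because COUNT(fechas.idruta) counts only non-NULL values; COUNT(*) would report 1 for the unmatched pair instead.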

SQL - Subquery/JOINS/Having - wrong record

I have a problem with one SQL exercise:
In which places, examiner with the surname 'Muryjas' examined more than 2 students?
We have 3 tables:
- Places (Id_place, Place_name)
- Exams (Id_examiner,Id_student, Id_place)
- Examiners (Id_examiner, Examiner_name)
SELECT p.Place_name
FROM Places p
INNER JOIN Exams e ON e.Id_place = p.Id_place
INNER JOIN Examiners ee ON ee.Id_examiner = e.Id_examiner
WHERE ee.Examiner_name = 'Muryjas'
GROUP BY p.Place_name
HAVING COUNT(e.Id_student) > 2
Expected result: 2 records.
Actual result: I get 3 records, one of which is wrong.
Should I use a subquery? I have no idea how to implement that. Any suggestions?
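Count each student only once, so that a student examined several times in the same place does not inflate the count:
HAVING COUNT(DISTINCT e.Id_student) > 2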
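The difference between the two HAVING clauses can be seen in a small sketch. The data below is invented for illustration (the place names, the second examiner and all exam rows are assumptions, not from the exercise), and SQLite via Python's sqlite3 module is used only to make it runnable. In PlaceB the examiner sees the same student twice, which is the kind of row that can produce an extra record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Places    (Id_place INTEGER PRIMARY KEY, Place_name TEXT);
    CREATE TABLE Examiners (Id_examiner INTEGER PRIMARY KEY, Examiner_name TEXT);
    CREATE TABLE Exams     (Id_examiner INTEGER, Id_student INTEGER, Id_place INTEGER);
    -- Hypothetical sample data
    INSERT INTO Places VALUES (1, 'PlaceA'), (2, 'PlaceB'), (3, 'PlaceC');
    INSERT INTO Examiners VALUES (1, 'Muryjas'), (2, 'Other');
    INSERT INTO Exams VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 1),             -- PlaceA: 3 different students
        (1, 1, 2), (1, 1, 2), (1, 2, 2),             -- PlaceB: student 1 examined twice
        (1, 4, 3), (1, 5, 3), (1, 6, 3), (1, 7, 3),  -- PlaceC: 4 different students
        (2, 8, 2), (2, 9, 2), (2, 10, 2);            -- another examiner, filtered out
""")

TEMPLATE = """
    SELECT p.Place_name
    FROM Places p
    INNER JOIN Exams e      ON e.Id_place = p.Id_place
    INNER JOIN Examiners ee ON ee.Id_examiner = e.Id_examiner
    WHERE ee.Examiner_name = 'Muryjas'
    GROUP BY p.Place_name
    HAVING COUNT({expr}) > 2
    ORDER BY p.Place_name
"""

def places(count_expr):
    """Run the query with the given expression inside COUNT()."""
    return [name for (name,) in conn.execute(TEMPLATE.format(expr=count_expr))]

plain = places("e.Id_student")              # counts exam rows
distinct = places("DISTINCT e.Id_student")  # counts different students
print(plain)
print(distinct)
```

With this data the plain COUNT returns three places and the DISTINCT version returns two, because PlaceB has three exam rows but only two different students.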

SQL Cross Join and IIF(ISNULL) - Not Working as Expected

I'm writing a query with a report file, an employee file and a report distribution file. I would like a list of employee names, each with every report name and a 0 if they don't get the report and a 1 if they do get the report.
select distinct ut.user_name
, ut.emailaddress
, r.name
, iif(ISNULL(rd.employeeid,0)=0, 0,1) AS currentreport
from US_usertable ut
cross join DL_reports r
left outer join DL_Reptdistrib rd
on ut.employeeID = rd.employeeid
For each user, I get a complete set of reports - so the cross join works - but either they get a 1 for all the reports or a 0 for all their reports. I don't understand why this query isn't working. Kindly help if you can. Thanks in advance!!!
I think you are missing a join condition on the report:
select ut.user_name, ut.emailaddress, r.name,
(case when rd.employeeid is null then 0 else 1 end) as currentreport
from US_usertable ut cross join
DL_reports r left outer join
DL_Reptdistrib rd
on ut.employeeID = rd.employeeid and r.?? = rd.??;
The ?? is for the field used to identify the report. I might guess that it is reportID.
Note: I switched the syntax to standard SQL. IIF() is in SQL Server for compatibility with MS Access (why Microsoft didn't put the ANSI standard case in MS Access is beyond me). I also replaced ISNULL() with the ANSI standard IS NULL.
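A small sketch of why the second join condition matters, using Python's sqlite3 module. The column name reportid is a guess (the real name is the ?? above), and the users, reports and distribution rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE US_usertable (employeeID INTEGER, user_name TEXT, emailaddress TEXT);
    CREATE TABLE DL_reports (reportid INTEGER, name TEXT);          -- reportid is assumed
    CREATE TABLE DL_Reptdistrib (employeeid INTEGER, reportid INTEGER);
    INSERT INTO US_usertable VALUES (1, 'ann', 'ann@example.com'),
                                    (2, 'bob', 'bob@example.com');
    INSERT INTO DL_reports VALUES (1, 'Sales'), (2, 'Stock');
    INSERT INTO DL_Reptdistrib VALUES (1, 1);  -- only ann gets the Sales report
""")

TEMPLATE = """
    SELECT ut.user_name, r.name,
           CASE WHEN rd.employeeid IS NULL THEN 0 ELSE 1 END AS currentreport
    FROM US_usertable ut
    CROSS JOIN DL_reports r
    LEFT OUTER JOIN DL_Reptdistrib rd
           ON ut.employeeID = rd.employeeid {extra}
    ORDER BY ut.user_name, r.name
"""

# Joining on the employee alone: one distribution row marks all of ann's reports.
employee_only = conn.execute(TEMPLATE.format(extra="")).fetchall()
# Joining on the employee and the report: only the matching pair gets a 1.
employee_and_report = conn.execute(
    TEMPLATE.format(extra="AND r.reportid = rd.reportid")).fetchall()
print(employee_only)
print(employee_and_report)
```

The first result reproduces the all-ones-or-all-zeros symptom from the question; the second marks only the (ann, Sales) pair.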

Pentaho parametrized datasource

I am trying to build a parametrized data source (SQL query over JNDI).
The query of my data source is:
SELECT ${param_interval}(dim_date.date), count(docs_fact.id) as docs_count
FROM rel_docs_dates
left join docs_fact on rel_docs_dates.doc_id = docs_fact.id
left join dim_date on rel_docs_dates.date_id = dim_date.id
The parameter ${param_interval} can take two values, MONTH and DAY, and as far as I checked, it gets the correct values.
But when I try to preview my dashboard I get the warning "error processing component".
Note that this query (see below) works fine:
SELECT MONTH(dim_date.date), count(docs_fact.id) as docs_count, ${param_interval} as tmp_fiel
FROM rel_docs_dates
left join docs_fact on rel_docs_dates.doc_id = docs_fact.id
left join dim_date on rel_docs_dates.date_id = dim_date.id
Can somebody tell me where the mistake is? Or is this way of using parameters in a data source perhaps not supported?
I finally found a solution. It isn't what I would like to have, but it works, and that is the most important thing.
I rewrote my query with a CASE construct and, importantly, changed the type of my parameter from string to numeric (string doesn't work :( ). Now my query looks like this:
SELECT
case ${param_interval}
when 1 then MONTH(dim_date.date)
when 2 then DAY(dim_date.date)
end
,count(docs_fact.id) as fact_count
FROM rel_docs_dates
left join docs_fact on rel_docs_dates.doc_id = docs_fact.id
left join dim_date on rel_docs_dates.date_id = dim_date.id
where dim_date.date > LAST_DAY(DATE_SUB(CURDATE(), INTERVAL ${param_period} MONTH))
AND dim_date.date < LAST_DAY(DATE_SUB(CURDATE(), INTERVAL 0 MONTH))
group by
case ${param_interval}
when 1 then MONTH(dim_date.date)
when 2 then DAY(dim_date.date)
end
order by YEAR(dim_date.date), MONTH(dim_date.date)
Maybe it will help somebody else.
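One plausible reading of why the first attempt failed (an assumption, not confirmed in the thread) is that the parameter reaches the database as a value, and a value cannot stand in for a function name. The CASE workaround sidesteps this because the parameter is only ever compared as a value. The sketch below shows the same pattern with Python's sqlite3 module. SQLite has no MONTH() or DAY(), so strftime() is swapped in, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE docs_fact (id INTEGER PRIMARY KEY);
    CREATE TABLE rel_docs_dates (doc_id INTEGER, date_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO dim_date VALUES (1, '2013-05-10'), (2, '2013-06-10'), (3, '2013-06-20');
    INSERT INTO docs_fact VALUES (1), (2), (3), (4);
    INSERT INTO rel_docs_dates VALUES (1, 1), (2, 2), (3, 3), (4, 3);
""")

# :interval is bound as a number: 1 selects the month, 2 selects the day.
QUERY = """
    SELECT CASE :interval
               WHEN 1 THEN CAST(strftime('%m', dim_date.date) AS INTEGER)
               WHEN 2 THEN CAST(strftime('%d', dim_date.date) AS INTEGER)
           END AS bucket,
           COUNT(docs_fact.id) AS docs_count
    FROM rel_docs_dates
    LEFT JOIN docs_fact ON rel_docs_dates.doc_id = docs_fact.id
    LEFT JOIN dim_date  ON rel_docs_dates.date_id = dim_date.id
    GROUP BY bucket
    ORDER BY bucket
"""
by_month = conn.execute(QUERY, {"interval": 1}).fetchall()
by_day = conn.execute(QUERY, {"interval": 2}).fetchall()
print(by_month)
print(by_day)
```

The same SQL text serves both groupings; only the bound number changes.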

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should really cause it to take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1s, and it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct values instead of plain counts; see the question "Django annotate() multiple times causes wrong answers".
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 = 24 rows, one for each combination of tickets. The DISTINCT part removes the duplicates from each count.
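The 2*3*4 arithmetic can be reproduced with a minimal sketch in Python's sqlite3 module. The tables are cut down to the columns the joins need and the rows are invented to match the example (one user, 2 captured, 3 assigned, 4 watched).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY,
                                 capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY,
                                          user_id INTEGER, ticket_id INTEGER);
    INSERT INTO auth_user VALUES (1);
    -- 2 tickets captured by user 1, 3 other tickets assigned to user 1
    INSERT INTO tickets_ticket VALUES (1, 1, NULL), (2, 1, NULL),
                                      (3, NULL, 1), (4, NULL, 1), (5, NULL, 1);
    -- user 1 watches 4 tickets
    INSERT INTO tickets_ticket_watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

# Same join shape as the generated query, reduced to the essentials.
JOINS = """
    FROM auth_user
    LEFT OUTER JOIN tickets_ticket ON auth_user.id = tickets_ticket.capturer_id
    LEFT OUTER JOIN tickets_ticket T3 ON auth_user.id = T3.responsible_id
    LEFT OUTER JOIN tickets_ticket_watchers
                 ON auth_user.id = tickets_ticket_watchers.user_id
    GROUP BY auth_user.id
"""
plain = conn.execute(
    "SELECT COUNT(tickets_ticket.id), COUNT(T3.id),"
    " COUNT(tickets_ticket_watchers.ticket_id)" + JOINS).fetchone()
distinct = conn.execute(
    "SELECT COUNT(DISTINCT tickets_ticket.id), COUNT(DISTINCT T3.id),"
    " COUNT(DISTINCT tickets_ticket_watchers.ticket_id)" + JOINS).fetchone()
print(plain)     # every count sees all 24 joined rows
print(distinct)  # duplicates collapse back to the real counts
```

In Django the equivalent is passing distinct=True to each Count(), for example Count('tickets_captured', distinct=True). Note that DISTINCT corrects the numbers but the database still has to build the multiplied row set, so on its own it may not remove the slowdown.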
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count IS NOT NULL OR C2.assigned_tickets__count IS NOT NULL -- also works (I think with better performance)
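A runnable sketch of the pre-aggregation approach, again with Python's sqlite3 module and invented rows (the usernames and ticket data are assumptions). Each subquery yields at most one row per user, so the joins cannot multiply rows. COALESCE is an addition here, not part of the query above: without it, users with no matching rows get NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY,
                                 capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY,
                                          user_id INTEGER, ticket_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 1, 1);
    INSERT INTO tickets_ticket_watchers VALUES (1, 3, 1), (2, 3, 2), (3, 1, 3);
""")

query = """
    SELECT auth_user.username,
           COALESCE(C1.tickets_captured__count, 0),
           COALESCE(C2.assigned_tickets__count, 0),
           COALESCE(C3.tickets_watched__count, 0)
    FROM auth_user
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
               FROM tickets_ticket GROUP BY capturer_id) AS C1
           ON auth_user.id = C1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
               FROM tickets_ticket GROUP BY responsible_id) AS C2
           ON auth_user.id = C2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
               FROM tickets_ticket_watchers GROUP BY user_id) AS C3
           ON auth_user.id = C3.user_id
    WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
    ORDER BY auth_user.username
"""
result = conn.execute(query).fetchall()
print(result)
```

carol only watches tickets, so both WHERE conditions evaluate to NULL for her and she is filtered out, matching the HAVING clause of the original query.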

SQL subquery / select subset of data from separate table

I'm writing an SQL query to extract the printing usage for individual cartridges. I've got the main body of the query down as below but I'm having trouble selecting some specific data to do with the meter readings stored in a separate table.
The below query lists cartridges put into printers with the date they were activated and the date they were deactivated. I would then like to use a MeterReadings table to see what the usage was over that period using the ActivatedDate and DeactivatedDate based on the DeviceID. What I have so far is below
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
MAX(ConsumableReadings.ReadingDate) as DeactivatedDate,
MIN(ConsumableReadings.ReadingDate) AS ActivatedDate,
ConsumableReadings.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN MeterReadings ON DeviceConsumables.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > MIN(ConsumableReadings.ReadingDate)
AND MeterReadings.ScanDateTime < MAX(ConsumableReadings.ReadingDate)
GROUP BY devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
ConsumableReadings.ChangedDate, ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
This is currently generating the error "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference."
The calculated fields ActivatedDate and DeactivatedDate are the date ranges I will require. I want to use the case statement to select MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono) for black and white or MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour) for colour. This would effectively give me the usage as readings can only go upwards. This would hopefully give me the starting point usage with the MIN and ending point usage with the MAX for the specific DeviceID.
As shown above I'm joining on my MeterReadings table on DeviceID.
I can't figure out how to get the MeterReadings for device x between y and z (where x is DeviceID, y is ActivatedDate and z is DeactivatedDate) so I can then add a calculated column into the case statement. Any help appreciated.
-- Edit
For brevity I won't add the whole schema here, but this should be enough.
Devices - list of all known devices
  DeviceID
  DeviceDescription
  lots of extra fields that describe the device
DeviceConsumables - what devices use what consumables
  ConsumableID
  DeviceID - foreign key to Devices
  ConsumableVariantID - foreign key to ConsumableVariant
ConsumableVariant - list of all the consumable variants there are
  ConsumableVariantID
  Type - 3 here indicates toner, which is what I'm interested in
ConsumableReadings
  ReadingID - PK
  ConsumableID - foreign key to DeviceConsumables
  ReadingDate - last time a reading was taken
  ChangedDate - last time a new cartridge was inserted
MeterReadings
  ReadingID - PK, unrelated to the PK of ConsumableReadings
  DeviceID
  ScanDateTime - time the usage scan was taken
  TotalMono - total mono at scan time
  TotalColour - total colour at scan time
Well, you have to break your query into nested queries. The query below is not tested, so it may have some syntax problems, but it shows a way to get what you are looking for:
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
A.DeactivatedDate,
A.ActivatedDate,
A.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN
(
SELECT D.DeviceID,
MAX(CR.ReadingDate) as DeactivatedDate,
MIN(CR.ReadingDate) AS ActivatedDate,
CR.ChangedDate
FROM Devices D
LEFT JOIN DeviceConsumables DC ON D.DeviceID = DC.DeviceID
LEFT JOIN ConsumableReadings CR ON DC.ConsumableID = CR.ConsumableID
WHERE D.DeviceID = '24'
GROUP BY D.DeviceID,
CR.ChangedDate
) AS A ON DeviceConsumables.DeviceID = A.DeviceID
LEFT JOIN MeterReadings ON A.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > A.ActivatedDate
AND MeterReadings.ScanDateTime < A.DeactivatedDate
GROUP BY Devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
A.DeactivatedDate, A.ActivatedDate, A.ChangedDate, -- non-aggregated SELECT columns must be grouped
ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
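The nested-query idea can be tried on a cut-down version of the schema. The sketch below uses Python's sqlite3 module with only the columns needed for the date-range logic. All rows are invented for illustration, and it assumes one cartridge's readings share a ChangedDate. The inner query computes each cartridge's activation window; the outer query then picks the meter readings inside that window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DeviceConsumables (ConsumableID INTEGER, DeviceID INTEGER);
    CREATE TABLE ConsumableReadings (ConsumableID INTEGER, ReadingDate TEXT,
                                     ChangedDate TEXT);
    CREATE TABLE MeterReadings (DeviceID INTEGER, ScanDateTime TEXT,
                                TotalMono INTEGER);
    -- Hypothetical sample data
    INSERT INTO DeviceConsumables VALUES (7, 24);
    -- two cartridges in the same slot, told apart by ChangedDate
    INSERT INTO ConsumableReadings VALUES
        (7, '2013-01-01', '2013-01-01'), (7, '2013-01-31', '2013-01-01'),
        (7, '2013-02-01', '2013-02-01'), (7, '2013-02-28', '2013-02-01');
    INSERT INTO MeterReadings VALUES
        (24, '2013-01-05', 100), (24, '2013-01-20', 400),
        (24, '2013-02-05', 500), (24, '2013-02-20', 650);
""")

query = """
    SELECT A.DeviceID, A.ActivatedDate, A.DeactivatedDate,
           MAX(MR.TotalMono) - MIN(MR.TotalMono) AS PrintingDiff
    FROM (SELECT DC.DeviceID, CR.ChangedDate,
                 MIN(CR.ReadingDate) AS ActivatedDate,
                 MAX(CR.ReadingDate) AS DeactivatedDate
          FROM DeviceConsumables DC
          JOIN ConsumableReadings CR ON DC.ConsumableID = CR.ConsumableID
          GROUP BY DC.DeviceID, CR.ChangedDate) AS A
    LEFT JOIN MeterReadings MR
           ON MR.DeviceID = A.DeviceID
          AND MR.ScanDateTime > A.ActivatedDate
          AND MR.ScanDateTime < A.DeactivatedDate
    GROUP BY A.DeviceID, A.ActivatedDate, A.DeactivatedDate
    ORDER BY A.ActivatedDate
"""
usage = conn.execute(query).fetchall()
for row in usage:
    print(row)
```

Because the aggregates live in the derived table, the date comparison no longer needs MIN() or MAX() in a WHERE clause, which is what triggered the original error. The range test sits in the ON clause so that a cartridge with no meter readings still appears, with a NULL difference.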
First, I'd add to your output the ColourID so you know if you are reading the Mono or Colour values. Second, I believe if you remove the ConsumableID from your group by clause, it should work. ConsumableID rows have one date, and if you include that in your group by, you'll never be able to get a max and min, therefore the difference.
Your problem is in your join statement.
Change the following line:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableVariants.Type
To something like:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableTypes.Type
(or whatever table you are joining to).