Query Shows multiple listings of same info - sql

I'm using this query to find all the machines on the network (using Dell KACE) that have an expired warranty according to their service tag.
However, when I run the query, some of the machines are listed twice but should only be listed once.
Here is an example of the output where machine example3 is correctly listed but example1 is listed twice.
#  Machine Name  Service Tag
1  example1      abcd123
2  example1      abcd123
3  example3      abcd124
Code:
SELECT
M.NAME AS MACHINE_NAME, M.CS_MODEL AS MODEL, DA.SERVICE_TAG,
DA.SHIP_DATE,M.USER_LOGGED AS LAST_LOGGED_IN_USER, DW.SERVICE_LEVEL_CODE,
DW.SERVICE_LEVEL_DESCRIPTION, DW.END_DATE AS EXPIRATION_DATE
FROM
DELL_WARRANTY DW
JOIN
DELL_ASSET DA ON (DW.SERVICE_TAG = DA.SERVICE_TAG)
JOIN
MACHINE M
ON (M.BIOS_SERIAL_NUMBER = DA.PARENT_SERVICE_TAG OR M.BIOS_SERIAL_NUMBER = DA.SERVICE_TAG)
LEFT JOIN
DELL_WARRANTY DW2 ON DW2.SERVICE_TAG=DW.SERVICE_TAG and DW2.END_DATE > NOW()
WHERE
M.CS_MANUFACTURER LIKE '%dell%'
AND
M.BIOS_SERIAL_NUMBER!=''
AND
DA.DISABLED != 1
AND
DW.END_DATE < NOW()
AND
DW2.SERVICE_TAG IS NULL;
Any ideas on how to make computers with the same machine name and service tags only output once? Thanks.

Let me make the assumption that you have a reasonable data model and reasonably populated data. That means that the duplicates are not coming from inappropriate data stored in the database.
Your query (formatted so I can read it) is:
SELECT M.NAME AS MACHINE_NAME, M.CS_MODEL AS MODEL, DA.SERVICE_TAG,
DA.SHIP_DATE, M.USER_LOGGED AS LAST_LOGGED_IN_USER, DW.SERVICE_LEVEL_CODE,
DW.SERVICE_LEVEL_DESCRIPTION, DW.END_DATE AS EXPIRATION_DATE
FROM DELL_WARRANTY DW JOIN
DELL_ASSET DA
ON DW.SERVICE_TAG = DA.SERVICE_TAG JOIN
MACHINE M
ON M.BIOS_SERIAL_NUMBER = DA.PARENT_SERVICE_TAG OR
M.BIOS_SERIAL_NUMBER = DA.SERVICE_TAG LEFT JOIN
DELL_WARRANTY DW2
ON DW2.SERVICE_TAG = DW.SERVICE_TAG and DW2.END_DATE > NOW()
WHERE M.CS_MANUFACTURER LIKE '%dell%' AND
M.BIOS_SERIAL_NUMBER <> '' AND
DA.DISABLED <> 1 AND
DW.END_DATE < NOW() AND
DW2.SERVICE_TAG IS NULL;
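The suspect join is the one on MACHINE, because it has an OR: a single machine can match both its own asset row (through SERVICE_TAG) and another asset row whose PARENT_SERVICE_TAG is the machine's serial number, resulting in multiple very similar rows. Duplicates can also come from DELL_WARRANTY itself: if a service tag has more than one expired warranty entry (for example, different service levels), each entry joins to the same asset and machine and produces its own row.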
If your concern is specifically about machine names and service tags (the two columns you highlighted in the question), then you can fix those duplicates by ending the query with:
group by M.NAME, DA.SERVICE_TAG
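(This assumes that you are using MySQL, based on the syntax of your query. Most other databases would require aggregation functions around the rest of the columns in the SELECT list. Even in MySQL, versions 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, which will typically reject this query unless the other columns are wrapped in an aggregate such as MAX() or ANY_VALUE(); without that mode, the values returned for the non-grouped columns are indeterminate.)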
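To see how the near-duplicates arise and how the GROUP BY collapses them, here is a minimal sketch using Python's built-in sqlite3 module. The tables are cut down to the columns the joins need, the sample rows are invented, and a fixed date string stands in for NOW(). It assumes example1's service tag has two expired warranty entries, which is one of the causes described above.

```python
import sqlite3

# Hypothetical, cut-down versions of the KACE tables with invented sample rows.
SCHEMA = """
CREATE TABLE MACHINE (NAME TEXT, BIOS_SERIAL_NUMBER TEXT);
CREATE TABLE DELL_ASSET (SERVICE_TAG TEXT, PARENT_SERVICE_TAG TEXT, DISABLED INTEGER);
CREATE TABLE DELL_WARRANTY (SERVICE_TAG TEXT, SERVICE_LEVEL_CODE TEXT, END_DATE TEXT);
INSERT INTO MACHINE VALUES ('example1', 'abcd123'), ('example3', 'abcd124');
INSERT INTO DELL_ASSET VALUES ('abcd123', NULL, 0), ('abcd124', NULL, 0);
-- abcd123 has two expired warranty entries, abcd124 has one
INSERT INTO DELL_WARRANTY VALUES
    ('abcd123', 'NBD', '2015-01-01'),
    ('abcd123', 'PRO', '2016-01-01'),
    ('abcd124', 'NBD', '2015-06-01');
"""

# Same joins as the question; :today replaces NOW() so the output is stable.
BASE = """
SELECT M.NAME, DA.SERVICE_TAG
FROM DELL_WARRANTY DW
JOIN DELL_ASSET DA ON DW.SERVICE_TAG = DA.SERVICE_TAG
JOIN MACHINE M ON M.BIOS_SERIAL_NUMBER = DA.PARENT_SERVICE_TAG
               OR M.BIOS_SERIAL_NUMBER = DA.SERVICE_TAG
LEFT JOIN DELL_WARRANTY DW2
       ON DW2.SERVICE_TAG = DW.SERVICE_TAG AND DW2.END_DATE > :today
WHERE DA.DISABLED <> 1
  AND DW.END_DATE < :today
  AND DW2.SERVICE_TAG IS NULL
"""


def expired_machines():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    params = {"today": "2020-01-01"}
    # One output row per matching warranty row, so example1 appears twice.
    raw = con.execute(BASE + " ORDER BY M.NAME", params).fetchall()
    # Grouping on the two columns of interest collapses the duplicates.
    grouped = con.execute(
        BASE + " GROUP BY M.NAME, DA.SERVICE_TAG ORDER BY M.NAME", params
    ).fetchall()
    con.close()
    return raw, grouped


if __name__ == "__main__":
    raw, grouped = expired_machines()
    print(raw)      # example1 listed twice
    print(grouped)  # each machine listed once
```

Because only grouped columns are selected here, the sketch also runs unchanged under MySQL's ONLY_FULL_GROUP_BY mode (with NOW() restored).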

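Try adding DISTINCT right after SELECT. Note that DISTINCT applies to the whole row, not just to M.NAME, so rows that differ in any selected column (SERVICE_LEVEL_CODE, for example) will still be listed separately.
Or try changing the joins, for example: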
SELECT
M.NAME AS MACHINE_NAME, M.CS_MODEL AS MODEL, DA.SERVICE_TAG,
DA.SHIP_DATE,M.USER_LOGGED AS LAST_LOGGED_IN_USER, DW.SERVICE_LEVEL_CODE,
DW.SERVICE_LEVEL_DESCRIPTION, DW.END_DATE AS EXPIRATION_DATE
FROM
DELL_WARRANTY DW
INNER JOIN
DELL_ASSET DA ON (DW.SERVICE_TAG = DA.SERVICE_TAG)
INNER JOIN
MACHINE M
ON (M.BIOS_SERIAL_NUMBER = DA.PARENT_SERVICE_TAG OR M.BIOS_SERIAL_NUMBER = DA.SERVICE_TAG)
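LEFT JOIN -- must stay an outer join: DW2.SERVICE_TAG IS NULL below keeps only tags with no unexpired warranty
DELL_WARRANTY DW2 ON DW2.SERVICE_TAG=DW.SERVICE_TAG and DW2.END_DATE > NOW()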
WHERE
M.CS_MANUFACTURER LIKE '%dell%'
AND
M.BIOS_SERIAL_NUMBER!=''
AND
DA.DISABLED != 1
AND
DW.END_DATE < NOW()
AND
DW2.SERVICE_TAG IS NULL;
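Note that the DW2 join must be a LEFT JOIN for this query to return anything: DW2.SERVICE_TAG IS NULL is an anti-join test that only succeeds on rows where the outer join found no matching unexpired warranty, and an INNER JOIN never produces such rows. A minimal sqlite3 sketch with an invented, cut-down DELL_WARRANTY table (a fixed date again stands in for NOW()):

```python
import sqlite3


def expired_tags(join_kind):
    """Return service tags whose warranties have all expired, joining DW2 with join_kind."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE DELL_WARRANTY (SERVICE_TAG TEXT, END_DATE TEXT);
        INSERT INTO DELL_WARRANTY VALUES
            ('abcd123', '2015-01-01'),  -- only an expired entry
            ('abcd124', '2015-06-01'),  -- expired entry ...
            ('abcd124', '2030-06-01');  -- ... plus one still active
    """)
    # join_kind is a fixed keyword chosen by the caller, never user input.
    sql = f"""
        SELECT DW.SERVICE_TAG
        FROM DELL_WARRANTY DW
        {join_kind} DELL_WARRANTY DW2
            ON DW2.SERVICE_TAG = DW.SERVICE_TAG AND DW2.END_DATE > :today
        WHERE DW.END_DATE < :today
          AND DW2.SERVICE_TAG IS NULL
    """
    rows = con.execute(sql, {"today": "2020-01-01"}).fetchall()
    con.close()
    return [tag for (tag,) in rows]


print(expired_tags("LEFT JOIN"))   # ['abcd123']
print(expired_tags("INNER JOIN"))  # []
```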


Join 3 tables using column common to all 3 tables

I am a total SQL novice, so please bear with me. I have three tables that are set up in the following fashion:
date|country|Test 1|Test 2|Test 3|etc.
The data in the date and country columns are identical across the three tables, and the differences are in the data in the Test columns. I'd like to use Join to query one date column and the three corresponding Test columns from the three tables.
I'm planning on just re-building the table so that the Test columns in the other tables are additional columns in the one table, but I'd still like to know how to use Join in this way. This is what I have at the moment, although it's throwing an error saying that there's an error in the syntax of the FROM clause. It's worth noting that I'm running this query in VBA using an Access DB.
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM CountryRaw as r
INNER JOIN CountryPct as p ON p.CPctDate = r.CRDate
INNER JOIN CountryZ as z ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'
I came across something using SELECT COALESCE(r.CRDate, p.CPctDate, z.CZDate) to start, but I didn't get anywhere with that.
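MS Access requires extra parentheses: when a FROM clause contains more than one JOIN, the earlier joins have to be wrapped in parentheses before the next one is added. So try this: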
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM (CountryRaw as r INNER JOIN
CountryPct as p
ON p.CPctDate = r.CRDate
) INNER JOIN
CountryZ as z
ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'
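Here is a runnable sketch of the same three-table join, using Python's sqlite3 module with invented sample rows. SQLite does not need Access's parentheses, so they are left out; keep them when running the query in Access, i.e. FROM (CountryRaw AS r INNER JOIN CountryPct AS p ON ...) INNER JOIN CountryZ AS z ON .... As a design choice (an assumption, not part of the answer above), each table is joined on both the date and the Country column, so the country filter only has to be written once.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CountryRaw (CRDate TEXT, Country TEXT, Test REAL);
CREATE TABLE CountryPct (CPctDate TEXT, Country TEXT, Test REAL);
CREATE TABLE CountryZ (CZDate TEXT, Country TEXT, Test REAL);
INSERT INTO CountryRaw VALUES ('2014-01-01', 'US', 100), ('2014-01-01', 'UK', 80), ('2014-01-02', 'US', 110);
INSERT INTO CountryPct VALUES ('2014-01-01', 'US', 0.5), ('2014-01-01', 'UK', 0.4), ('2014-01-02', 'US', 0.6);
INSERT INTO CountryZ VALUES ('2014-01-01', 'US', 1.1), ('2014-01-01', 'UK', 0.9), ('2014-01-02', 'US', 1.3);
"""

# Join every table to CountryRaw on both shared columns (date and Country).
QUERY = """
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM CountryRaw AS r
INNER JOIN CountryPct AS p ON p.CPctDate = r.CRDate AND p.Country = r.Country
INNER JOIN CountryZ AS z ON z.CZDate = r.CRDate AND z.Country = r.Country
WHERE r.Country = 'US'
ORDER BY r.CRDate
"""


def us_tests():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(us_tests())  # one row per US date, with the three Test values side by side
```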

Sum up multiple values based off a single column

For context, I work in transportation. Also, I apologize for a poor title - I'm not exactly sure how to summarize my issue.
I am currently editing an existing report which returns a driver's ID, their name, when they were hired, and the total number of miles they have driven since they started at the company. It was brought to my attention that drivers who move within the company are assigned a different driver ID, and the miles driven under that ID are not counted towards their total. Using an example provided to me, I was able to confirm this scenario, as shown below:
DriverCode DriverName
----------- ----------------
WETDE Wethington,Dean
WETDEA Wethington,Dean
This is the query that gets the above (example driver is hardcoded at the moment):
select mpp.mpp_id as DriverCode,
mpp.mpp_lastfirst as DriverName
from manpowerprofile mpp
outer apply (select top 1 mpp_id
from manpowerprofile) as id
where mpp_firstname = 'Dean'
and mpp_lastname = 'Wethington'
This is the current query as it stands:
SELECT lh.lgh_driver1 as DriverCode
,m.mpp_lastfirst as DriverName
,m.mpp_hiredate as HireDate
,SUM(s.stp_lgh_mileage) as TotMiles
FROM stops s (nolock)
INNER JOIN legheader lh (nolock) on lh.lgh_number = s.lgh_number
INNER JOIN manpowerprofile m (nolock) on m.mpp_id = lh.lgh_driver1
/* OUTER APPLY ( SELECT top 1 mpp_id
FROM manpowerprofile) as id */
WHERE m.mpp_terminationdt > GETDATE()
AND m.mpp_id <> 'UNKNOWN'
AND lh.lgh_outstatus = 'CMP'
GROUP BY lh.lgh_driver1, m.mpp_lastfirst, m.mpp_hiredate
HAVING SUM(s.stp_lgh_mileage) > 850000
ORDER BY DriverCode DESC
What I'm looking to do is check whether a name exists twice and, if it does, add both of those driver codes' total miles together to return a single result for that driver. I'm still a fairly novice SQL developer and have only recently started to delve into databases.
My current train of thought was to use an outer apply, but I'm sure there's a better way to do this.
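As per your comment, this leaves off the driver code and hire date, because they could (and would) be different for the driver codes being combined.
Note that grouping by the name alone also merges two different drivers who happen to share the same name.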
SELECT
m.mpp_lastfirst as DriverName
,SUM(s.stp_lgh_mileage) as TotMiles
FROM
stops s (nolock)
INNER JOIN
legheader lh (nolock)
on lh.lgh_number = s.lgh_number
INNER JOIN
manpowerprofile m (nolock)
on m.mpp_id = lh.lgh_driver1
WHERE
m.mpp_terminationdt > GETDATE()
AND m.mpp_id <> 'UNKNOWN'
AND lh.lgh_outstatus = 'CMP'
GROUP BY
m.mpp_lastfirst
HAVING
SUM(s.stp_lgh_mileage) > 850000
ORDER BY
m.mpp_lastfirst DESC
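To check that grouping by name combines the mileage from both driver codes, here is a sketch in Python's sqlite3 with invented rows. The SQL Server-specific parts ((nolock) hints and the GETDATE() termination filter) are dropped because SQLite does not support them, and the tables contain only the columns the query uses. Neither WETDE (600,000 miles) nor WETDEA (300,000) clears the 850,000 threshold alone, but the combined total does.

```python
import sqlite3

# Cut-down tables with invented rows; two driver codes belong to the same person.
SCHEMA = """
CREATE TABLE manpowerprofile (mpp_id TEXT, mpp_lastfirst TEXT);
CREATE TABLE legheader (lgh_number INTEGER, lgh_driver1 TEXT, lgh_outstatus TEXT);
CREATE TABLE stops (lgh_number INTEGER, stp_lgh_mileage INTEGER);
INSERT INTO manpowerprofile VALUES
    ('WETDE', 'Wethington,Dean'), ('WETDEA', 'Wethington,Dean'), ('SMIJO', 'Smith,John');
INSERT INTO legheader VALUES (1, 'WETDE', 'CMP'), (2, 'WETDEA', 'CMP'), (3, 'SMIJO', 'CMP');
INSERT INTO stops VALUES (1, 500000), (1, 100000), (2, 300000), (3, 400000);
"""

QUERY = """
SELECT m.mpp_lastfirst AS DriverName,
       SUM(s.stp_lgh_mileage) AS TotMiles
FROM stops s
INNER JOIN legheader lh ON lh.lgh_number = s.lgh_number
INNER JOIN manpowerprofile m ON m.mpp_id = lh.lgh_driver1
WHERE m.mpp_id <> 'UNKNOWN'
  AND lh.lgh_outstatus = 'CMP'
GROUP BY m.mpp_lastfirst          -- one group per name, spanning all of its driver codes
HAVING SUM(s.stp_lgh_mileage) > 850000
ORDER BY m.mpp_lastfirst DESC
"""


def miles_by_driver_name():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(miles_by_driver_name())  # [('Wethington,Dean', 900000)]
```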

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) that fails is below. I've tried running it directly in SQL Server Management Studio and it works, but it takes 3.5 minutes.
It does look like it's grouping by a bunch of fields it doesn't need to, but I wouldn't have thought that should make it take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1 s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the tables, resulting in about 280 million rows and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
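SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct values instead of just counting (in Django, Count('tickets_captured', distinct=True) and likewise for the other two); see "Django annotate() multiple times causes wrong answers".
As for why the query works that way: it joins four table references together (auth_user, tickets_ticket twice, and tickets_ticket_watchers). So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2*3*4 = 24 rows for that user, one for each combination of tickets, and every plain COUNT sees 24. COUNT(DISTINCT ...) counts each ticket only once, which fixes the numbers, although the database still has to build the multiplied rows first.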
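The row multiplication is easy to reproduce on a toy dataset. This sketch uses Python's sqlite3 with invented rows and only the key columns of the three Django tables; it runs the same LEFT JOIN shape and compares plain COUNT with COUNT(DISTINCT ...).

```python
import sqlite3

SCHEMA = """
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO auth_user VALUES (1), (2);
-- user 1 captured tickets 1-2 and is responsible for tickets 3-5; user 2 the reverse
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
-- user 1 watches four tickets, user 2 watches none
INSERT INTO tickets_ticket_watchers (user_id, ticket_id) VALUES (1, 1), (1, 2), (1, 3), (1, 4);
"""

QUERY = """
SELECT u.id,
       COUNT(c.id), COUNT(r.id), COUNT(w.ticket_id),
       COUNT(DISTINCT c.id), COUNT(DISTINCT r.id), COUNT(DISTINCT w.ticket_id)
FROM auth_user u
LEFT JOIN tickets_ticket c ON u.id = c.capturer_id
LEFT JOIN tickets_ticket r ON u.id = r.responsible_id
LEFT JOIN tickets_ticket_watchers w ON u.id = w.user_id
GROUP BY u.id
ORDER BY u.id
"""


def ticket_counts():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in ticket_counts():
    print(row)
# (1, 24, 24, 24, 2, 3, 4)  -> 2 * 3 * 4 = 24 joined rows for user 1
# (2, 6, 6, 0, 3, 2, 0)     -> 3 * 2 * 1 = 6 joined rows for user 2 (one NULL watcher row)
```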
What about pre-aggregating each count in its own derived table, so the joined rows never multiply?
SELECT auth_user.*,
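COALESCE(C1.tickets_captured__count, 0) AS tickets_captured__count,
COALESCE(C2.assigned_tickets__count, 0) AS assigned_tickets__count,
COALESCE(C3.tickets_watched__count, 0) AS tickets_watched__count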
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count IS NOT NULL OR C2.assigned_tickets__count IS NOT NULL -- equivalent, since a derived count is at least 1 whenever its row exists (possibly better performance)
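Here is the pre-aggregated version run on a small invented dataset with Python's sqlite3, selecting only auth_user.id instead of auth_user.* to keep the output short. Each derived table has at most one row per user, so the outer joins cannot multiply rows. User 3 has no tickets at all and is filtered out by the WHERE clause.

```python
import sqlite3

SCHEMA = """
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO auth_user VALUES (1), (2), (3);
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
INSERT INTO tickets_ticket_watchers (user_id, ticket_id) VALUES (1, 1), (1, 2), (1, 3), (1, 4);
"""

# Each count is computed per user first, then joined once per user.
QUERY = """
SELECT auth_user.id,
       COALESCE(C1.tickets_captured__count, 0),
       COALESCE(C2.assigned_tickets__count, 0),
       COALESCE(C3.tickets_watched__count, 0)
FROM auth_user
LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
           FROM tickets_ticket GROUP BY capturer_id) AS C1
       ON auth_user.id = C1.capturer_id
LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
           FROM tickets_ticket GROUP BY responsible_id) AS C2
       ON auth_user.id = C2.responsible_id
LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
           FROM tickets_ticket_watchers GROUP BY user_id) AS C3
       ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
ORDER BY auth_user.id
"""


def pre_aggregated_counts():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(pre_aggregated_counts())  # [(1, 2, 3, 4), (2, 3, 2, 0)]
```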

SQL Need advice how to add timestamp to this query

I have this code:
select Users.phoneMac, Users.apMac, Locations.Lon, Locations.Lat
from Locations, Users
inner join (
select u.phoneMac, max(u.strenght) as most
from Users u, Locations l
where u.apMac = l.apMac
group by u.phoneMac
) as ij on ij.phoneMac=Users.phoneMac and Users.strenght = ij.most
where Locations.apMac = Users.apMac;
It worked fine for me, but when I added more data to the Users table, this query calculated results from all the data, and I want results from the latest data only. So I added a timestamp to the Users table.
Can you help me fix this code so that it first takes only the data from the latest timestamp for every user (Users.phoneMac; there can be more than one row of data for the same phoneMac) and then does the rest of the calculations?
You're already picking the max value of the "strenght" field and joining on that, so why not use the same approach again for your timestamp field? Something like:
SELECT Users.phoneMac, Users.apMac, Locations.Lon, Locations.Lat
FROM Locations
INNER JOIN Users
ON Users.apMac = Locations.apMac
INNER JOIN (
SELECT u.phoneMac, max(u.strenght) AS most
FROM Locations l
INNER JOIN Users u ON u.apMac = l.apMac
GROUP BY u.phoneMac) AS ij
ON ij.phoneMac = Users.phoneMac
AND Users.strenght = ij.most
INNER JOIN (
SELECT u2.phoneMac, max(u2.timestampfield) AS latest
FROM Locations l2
INNER JOIN Users u2 ON u2.apMac = l2.apMac
GROUP BY u2.phoneMac) AS ijk
ON ijk.phoneMac = Users.phoneMac
AND Users.timestampfield = ijk.latest;
(By the way, using the old join syntax with comma and the WHERE clause makes it harder to understand the logic, and occasionally makes the logic wrong. The new join syntax with ON is really a lot better.)
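Because the two subqueries above are evaluated independently, a phone is returned only when its strongest reading overall also belongs to its latest scan. If you want the strongest access point within the latest scan (latest data first, then the rest, as the question asks), restrict the rows to the latest timestamp before taking the maximum. The sketch below uses Python's sqlite3 with invented rows and assumes the new column is called timestampfield (the placeholder name used in the answer). It uses common table expressions (WITH), which need SQLite 3.8.3+ or MySQL 8.0+; on older MySQL versions the same subqueries can be inlined as derived tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Locations (apMac TEXT, Lon REAL, Lat REAL);
CREATE TABLE Users (phoneMac TEXT, apMac TEXT, strenght INTEGER, timestampfield TEXT);
INSERT INTO Locations VALUES ('ap1', 10.0, 50.0), ('ap2', 11.0, 51.0);
INSERT INTO Users VALUES
    ('phoneA', 'ap1', -40, '2014-05-01 10:00'),  -- strongest reading, but old
    ('phoneA', 'ap1', -70, '2014-05-01 11:00'),
    ('phoneA', 'ap2', -60, '2014-05-01 11:00'),  -- strongest in the latest scan
    ('phoneB', 'ap1', -50, '2014-05-01 09:00');
"""

# The answer's query: strongest overall AND latest, checked independently.
ANSWER = """
SELECT Users.phoneMac, Users.apMac, Locations.Lon, Locations.Lat
FROM Locations
INNER JOIN Users ON Users.apMac = Locations.apMac
INNER JOIN (SELECT u.phoneMac, max(u.strenght) AS most
            FROM Locations l INNER JOIN Users u ON u.apMac = l.apMac
            GROUP BY u.phoneMac) AS ij
        ON ij.phoneMac = Users.phoneMac AND Users.strenght = ij.most
INNER JOIN (SELECT u2.phoneMac, max(u2.timestampfield) AS latest
            FROM Locations l2 INNER JOIN Users u2 ON u2.apMac = l2.apMac
            GROUP BY u2.phoneMac) AS ijk
        ON ijk.phoneMac = Users.phoneMac AND Users.timestampfield = ijk.latest
ORDER BY Users.phoneMac
"""

# Latest scan first, then the strongest reading within that scan.
LATEST_FIRST = """
WITH latest AS (
    SELECT phoneMac, MAX(timestampfield) AS ts FROM Users GROUP BY phoneMac
),
recent AS (
    SELECT u.phoneMac, u.apMac, u.strenght
    FROM Users u
    JOIN latest l ON l.phoneMac = u.phoneMac AND u.timestampfield = l.ts
),
best AS (
    SELECT r.phoneMac, MAX(r.strenght) AS most
    FROM recent r JOIN Locations loc ON loc.apMac = r.apMac
    GROUP BY r.phoneMac
)
SELECT r.phoneMac, r.apMac, loc.Lon, loc.Lat
FROM recent r
JOIN Locations loc ON loc.apMac = r.apMac
JOIN best b ON b.phoneMac = r.phoneMac AND r.strenght = b.most
ORDER BY r.phoneMac
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print(run(ANSWER))        # phoneA is dropped: its strongest reading is not in its latest scan
print(run(LATEST_FIRST))  # phoneA -> ap2, phoneB -> ap1
```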

SQL subquery / select subset of data from separate table

I'm writing an SQL query to extract the printing usage for individual cartridges. I've got the main body of the query down as below but I'm having trouble selecting some specific data to do with the meter readings stored in a separate table.
The below query lists cartridges put into printers with the date they were activated and the date they were deactivated. I would then like to use a MeterReadings table to see what the usage was over that period using the ActivatedDate and DeactivatedDate based on the DeviceID. What I have so far is below
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
MAX(ConsumableReadings.ReadingDate) as DeactivatedDate,
MIN(ConsumableReadings.ReadingDate) AS ActivatedDate,
ConsumableReadings.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN MeterReadings ON DeviceConsumables.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > MIN(ConsumableReadings.ReadingDate)
AND MeterReadings.ScanDateTime < MAX(ConsumableReadings.ReadingDate)
GROUP BY devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
ConsumableReadings.ChangedDate, ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
This is currently generating the error "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference."
The calculated fields ActivatedDate and DeactivatedDate are the date ranges I will require. I want to use the case statement to select MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono) for black and white or MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour) for colour. This would effectively give me the usage as readings can only go upwards. This would hopefully give me the starting point usage with the MIN and ending point usage with the MAX for the specific DeviceID.
As shown above, I'm joining my MeterReadings table on DeviceID.
I can't figure out how to get the MeterReadings for device x between y and z (where x is DeviceID, y is ActivatedDate and z is DeactivatedDate) so I can then add a calculated column into the case statement. Any help appreciated.
-- Edit
For brevity I won't add the whole schema here, but the following should be enough:
Devices - list of all known devices
DeviceID
DeviceDescription
lots of extra fields that describe the device
DeviceConsumables - What devices use what consumables
ConsumableID
DeviceID - Foreign key to Devices
ConsumableVariantID - Foreign key to ConsumableVariants
ConsumableVariants - list of all the consumable variants there are
ConsumableVariantID
Type - 3 here indicates toner, what I'm interested in
ConsumableReadings
ReadingID - PK
ConsumableID - foreign key to DeviceConsumables
ReadingDate - last time a reading was taken
ChangedDate - last time a new cartridge was inserted
MeterReadings
ReadingID - PK not to do with PK of consumablereadings
DeviceID
ScanDateTime - time usage scan was taken
TotalMono - total mono at scan time
TotalColour - total colour at scan time
You have to break your query into nested queries. The query below is untested, so it may have some syntax problems, but it shows a way to get what you are looking for:
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
A.DeactivatedDate,
A.ActivatedDate,
A.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN
(
SELECT DC.ConsumableID,
D.DeviceID,
MAX(CR.ReadingDate) AS DeactivatedDate,
MIN(CR.ReadingDate) AS ActivatedDate,
CR.ChangedDate
FROM Devices D
LEFT JOIN DeviceConsumables DC ON D.DeviceID = DC.DeviceID
LEFT JOIN ConsumableReadings CR ON DC.ConsumableID = CR.ConsumableID
WHERE D.DeviceID = '24'
GROUP BY DC.ConsumableID,
D.DeviceID,
CR.ChangedDate
) AS A ON DeviceConsumables.ConsumableID = A.ConsumableID -- one activation window per cartridge, not per device
LEFT JOIN MeterReadings ON A.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > A.ActivatedDate
AND MeterReadings.ScanDateTime < A.DeactivatedDate
GROUP BY Devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
A.ActivatedDate, A.DeactivatedDate, A.ChangedDate, -- non-aggregated columns from A must be grouped
ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
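To make the nested-query idea concrete, here is a sketch in Python's sqlite3 with invented readings and a simplified schema (ColourID is placed directly on DeviceConsumables as a hypothetical shortcut instead of going through ConsumableVariants). It computes each cartridge's activation window in a derived table first, which is exactly what the error message asks for, and then joins MeterReadings on DeviceID plus that window. It uses BETWEEN, which also includes scans taken exactly on the activation and deactivation dates; the query above uses strict > and < instead.

```python
import sqlite3

# Hypothetical, simplified schema with invented readings for device 24.
SCHEMA = """
CREATE TABLE DeviceConsumables (ConsumableID INTEGER, DeviceID INTEGER, ColourID INTEGER);
CREATE TABLE ConsumableReadings (ConsumableID INTEGER, ReadingDate TEXT, ChangedDate TEXT);
CREATE TABLE MeterReadings (DeviceID INTEGER, ScanDateTime TEXT, TotalMono INTEGER, TotalColour INTEGER);
INSERT INTO DeviceConsumables VALUES (1, 24, 1), (2, 24, 2);  -- mono and colour toner
INSERT INTO ConsumableReadings VALUES
    (1, '2014-01-01', '2014-01-01'), (1, '2014-01-10', '2014-01-01'), (1, '2014-01-20', '2014-01-01'),
    (2, '2014-01-05', '2014-01-05'), (2, '2014-01-15', '2014-01-05');
INSERT INTO MeterReadings VALUES
    (24, '2014-01-01', 1000, 500), (24, '2014-01-05', 1200, 550),
    (24, '2014-01-15', 1500, 700), (24, '2014-01-20', 1800, 760),
    (24, '2014-01-25', 2000, 800);
"""

QUERY = """
SELECT A.ConsumableID, A.ActivatedDate, A.DeactivatedDate,
       CASE A.ColourID
           WHEN 1 THEN MAX(MR.TotalMono) - MIN(MR.TotalMono)
           ELSE MAX(MR.TotalColour) - MIN(MR.TotalColour)
       END AS PrintingDiff
FROM (
    -- Step 1: one activation window per cartridge, aggregated here
    SELECT DC.ConsumableID, DC.DeviceID, DC.ColourID,
           MIN(CR.ReadingDate) AS ActivatedDate,
           MAX(CR.ReadingDate) AS DeactivatedDate
    FROM DeviceConsumables DC
    JOIN ConsumableReadings CR ON CR.ConsumableID = DC.ConsumableID
    GROUP BY DC.ConsumableID, DC.DeviceID, DC.ColourID, CR.ChangedDate
) AS A
-- Step 2: the window is now an ordinary column, so it can go in a join condition
JOIN MeterReadings MR
  ON MR.DeviceID = A.DeviceID
 AND MR.ScanDateTime BETWEEN A.ActivatedDate AND A.DeactivatedDate
WHERE A.DeviceID = 24
GROUP BY A.ConsumableID, A.ActivatedDate, A.DeactivatedDate, A.ColourID
ORDER BY A.ConsumableID
"""


def cartridge_usage():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in cartridge_usage():
    print(row)  # (ConsumableID, ActivatedDate, DeactivatedDate, PrintingDiff)
```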
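First, I'd add ColourID to your output so you know whether you are reading the Mono or the Colour values. Second, I believe that if you remove ConsumableID from your GROUP BY clause, it should work. Each ConsumableID row has only one date, so if you include it in your GROUP BY, MAX and MIN will be the same value and you will never get a usable difference.
Your problem is in your join statement: the condition compares ConsumableVariants.Type with itself, so it is true for every row with a non-NULL Type and effectively cross-joins ConsumableTypes, multiplying your rows.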
Change the following line:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableVariants.Type
To something like:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableTypes.Type
(or whatever table you are joining to).
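A quick sqlite3 check of the difference, with invented rows. The ConsumableTypes table and its TypeName column are hypothetical, since the schema for that table isn't shown.

```python
import sqlite3


def join_row_counts():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ConsumableVariants (ConsumableVariantID INTEGER, Type TEXT);
        CREATE TABLE ConsumableTypes (Type TEXT, TypeName TEXT);
        INSERT INTO ConsumableVariants VALUES (1, '3'), (2, '3'), (3, '1');
        INSERT INTO ConsumableTypes VALUES ('1', 'Drum'), ('3', 'Toner');
    """)
    # Compares a column with itself: true for every non-NULL Type, so every type matches.
    wrong = con.execute("""
        SELECT COUNT(*) FROM ConsumableVariants
        LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableVariants.Type
    """).fetchone()[0]
    # Compares the two tables: each variant matches only its own type.
    right = con.execute("""
        SELECT COUNT(*) FROM ConsumableVariants
        LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableTypes.Type
    """).fetchone()[0]
    con.close()
    return wrong, right


print(join_row_counts())  # (6, 3): 3 variants x 2 types versus one type per variant
```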