Access SQL query without duplicate results - sql

I made a query and wanted to not have any duplicates but i got some times 3 duplicates and when i used DISTINCT or DISTINCTROW i got only 2 duplicates.
SELECT f.flight_code,
f.status,
a.airport_name,
a1.airport_name,
f.departing_date+f.departing_time AS SupposedDepartingTime,
f.landing_date+f.landing_time AS SupposedLandingTime,
de.actual_takeoff_date+de.actual_takeoff_time AS ActualDepartingTime,
SupposedLandingTime+(ActualDepartingTime-SupposedDepartingTime) AS ActualLandingTime
FROM
(((Flights AS f
LEFT JOIN Aireports AS a
ON a.airport_code = f.depart_ap)
LEFT JOIN Aireports AS a1
ON f.target_ap = a1.airport_code)
LEFT JOIN Irregular_Events AS ie
ON f.flight_code = ie.flight_code)
LEFT JOIN Delay_Event AS de
ON ie.IE_code = de.delay_code;
had to use LEFT JOIN because when i used INNER JOIN i missed some of the things i wanted to show because i wanted to see all the flights and not only the flights that got delayed or canceled.
This is the results when i used INNER JOIN, you can see only the flights that have the status "ביטול" or "עיכוב" and that is not what i wanted.
[the results with LEFT JOIN][2]
[2]: https://i.stack.imgur.com/cgE2G.png
and when i used DISTINCT where you see the rows with the NUMBER 6 on the first column it appear only two times
IMPORTANT!
I just checked my query and all the tables i use there and i saw my problem but dont know how to fix it!
in the table Irregular_Events i have more the one event for flights 3,6 and 8 and that is why when i use LEFT JOIN i see more even thou i use distinct, please give me some help!

Not entirely sure without seeing the table structure, but this might work:
SELECT f.flight_code,
f.status,
a.airport_name,
a1.airport_name,
f.departing_date+f.departing_time AS SupposedDepartingTime,
f.landing_date+f.landing_time AS SupposedLandingTime,
de.actual_takeoff_date+de.actual_takeoff_time AS ActualDepartingTime,
SupposedLandingTime+(ActualDepartingTime-SupposedDepartingTime) AS ActualLandingTime
FROM
((Flights AS f
LEFT JOIN Aireports AS a
ON a.airport_code = f.depart_ap)
LEFT JOIN Aireports AS a1
ON f.target_ap = a1.airport_code)
LEFT JOIN
(
SELECT
ie.flight_code,
de1.actual_takeoff_date,
de1.actual_takeoff_time
FROM
Irregular_Events ie
INNER JOIN Event AS de1
ON ie.IE_code = de1.delay_code
) AS de
ON f.flight_code = de.flight_code

It is hard to tell what is the problem with your query without any sample of the output, and without any description of the structure of your tables.
But your problem is that your are querying from the flights table, which [I assume] can be linked to multiple irregular_events, which can possibly also be linked to multiple delay_event.
If you want to get only one row per flight, you need to make sure your joins return only one row too. Maybe you can do it by adding one more condition to the join, or by adding a condition in a sub-query.
EDIT
You could try to add a GROUP BY to the query:
GROUP BY
f.flight_code,
f.status,
a.airport_name,
a1.airport_name;

Related

Use group by with sum in query

These 3 tables that you see in the image are related
Course table and coaching table and sales table
I want to make a report from this table on how much each coach has sold by each course period.
The query I created is as follows, but unfortunately it has a problem and I do not know where the problem is.
Please help me fix the problem
Thank you
SELECT
dbo.tblCustomersOrders.id, dbo.tblCustomersOrders.pid, dbo.tblPost.postTitle,
dbo.tblArticleAuthor.authorName, SUM(dbo.tblCustomersOrders.prodPrice) AS TotalBuys
FROM
dbo.tblPost
INNER JOIN
dbo.tblArticleAuthor ON dbo.tblPost.id = dbo.tblArticleAuthor.articleID
INNER JOIN
dbo.tblCustomersOrders ON dbo.tblPost.id = dbo.tblCustomersOrders.pid
GROUP BY dbo.tblCustomersOrders.pid
For this use, SUM() is an Aggregate Function, so you need to refer all the
fields that you want to get in your result set.
Example:
SELECT
dbo.tblCustomersOrders.id, dbo.tblCustomersOrders.pid, dbo.tblPost.postTitle,
dbo.tblArticleAuthor.authorName, SUM(dbo.tblCustomersOrders.prodPrice) AS TotalBuys
FROM dbo.tblPost
INNER JOIN
dbo.tblArticleAuthor ON dbo.tblPost.id = dbo.tblArticleAuthor.articleID
INNER JOIN
dbo.tblCustomersOrders ON dbo.tblPost.id = dbo.tblCustomersOrders.pid
GROUP BY dbo.tblCustomersOrders.id, dbo.tblCustomersOrders.pid,
dbo.tblPost.postTitle, dbo.tblArticleAuthor.authorName
But this query does not solve the need for your report.
If you just need to get "how much each coach has sold by each course" , you can try the query bellow.
SELECT
dbo.tblArticleAuthor.authorName, dbo.tblPost.postTitle,
SUM(dbo.tblCustomersOrders.prodPrice) AS TotalBuys
FROM dbo.tblPost
INNER JOIN
dbo.tblArticleAuthor ON dbo.tblPost.id = dbo.tblArticleAuthor.articleID
INNER JOIN
dbo.tblCustomersOrders ON dbo.tblPost.id = dbo.tblCustomersOrders.pid
GROUP BY dbo.tblArticleAuthor.authorName, dbo.tblPost.postTitle
If you need, send more details regarding the desired result.
Here you can find more information about SQL SERVER Aggregate Functions:
https://learn.microsoft.com/en-us/sql/t-sql/functions/aggregate-functions-transact-sql?view=sql-server-ver15
And here a quick example regarding SQL Aliases to build queries with a simple
and effective way:
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_alias_table
Per your description of the task, the problem is that you only GROUPed BY dbo.tblCustomersOrders.pid, which is the period's id I guess, but you also need to GROUP BY the coach, which is dbo.tblArticleAuthor.authorName, I guess again. Plus in the SELECT field list you can not use more columns only that are aggregated + GROUPed.

How to only select an attribute one time

This code shows me the results based on two tables
SELECT h.Destino, g.Fuente
FROM Hist_LDS h, gTronco g
WHERE h.Fecha='2020-10-28'
AND g.Tronco = h.Tronco
The Destinos are 19 and then it repeat, i need to only apeers one time, and count the time od appereance to. I need to create a new table when insted of the information in the variable Tronco I put the Destino and Fuente
What you are doing is a CROSS JOIN.
And what you probably want is a simple INNER JOIN. Not very clear from your question, also check LEFT/RIGHT JOIN.
SELECT h.Destino, g.Fuente
FROM Hist_LDS h
INNER JOIN gTronco g ON g.Tronco = h.Tronco
WHERE h.Fecha='2020-10-28'

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

Select DISTINCT values with 3 Inner joins

I have a few tables that I need to link together to get a specific value, here is my current query
SELECT DISTINCT bu.Email,iv.Code,ex.ExNumber
FROM Invoices AS iv
INNER JOIN Clients AS cs
ON iv.Code = cs.Code
INNER JOIN BusinessUser_ExNumbers AS ex
ON cs.ExNumber = ex.ExNumber
INNER JOIN BusinessUsers AS bu
ON ex.Userid = bu.Id
WHERE iv.BatchId = '74b43669-c80f-4b44-999c-a1dfe5695844' // Test value
Basically the links to my invoices are held in the invoice table, and what I need to do is send a link to the user who is specified as being the recipient of the invoice. The code works but it's not as 'distinct' as I would like. I dont want any rows with the same email and exnumber as any other row. Unfortunately if there is more than 1 iv.Code that ends up linking to a specific user I get multiple rows. I understand why this is but I can't think of a way to make that not happen.
Thanks!
EDIT: Sorry that was really stupid of me. I didn't need to select the iv.Code and that's what was causing the issue. That's probably why it didn't make much sense. Anyway since I posted the question I have tried Martin Smiths answer below and it works, so that is the answer to the question. If you don't need the iv.Code it can just be removed.
The below will return the MAX code in the event that there are multiple for a particular Email, Number combination.
SELECT bu.Email,
MAX(iv.Code) AS Code,
ex.ExNumber
FROM Invoices AS iv
INNER JOIN Clients AS cs
ON iv.Code = cs.Code
INNER JOIN BusinessUser_ExNumbers AS ex
ON cs.ExNumber = ex.ExNumber
INNER JOIN BusinessUsers AS bu
ON ex.Userid = bu.Id
WHERE iv.BatchId = '74b43669-c80f-4b44-999c-a1dfe5695844'
GROUP BY bu.Email,
ex.ExNumber