We started investigating our database, as it is the least scalable component in our infrastructure.
I checked the pg_stat_statements view of our PostgreSQL database with the following query:
SELECT userid, calls, total_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent, query
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
Every time, the same query is first in the list:
16386 | 21564 | 4077324.749363 | 1423094 | 99.9960264252721535 |
SELECT DISTINCT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
"auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
"auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id")
LEFT OUTER JOIN "auth_group" ON ("auth_user_groups"."group_id" = "auth_group"."id")
LEFT OUTER JOIN "auth_group_permissions" ON ("auth_group"."id" = "auth_group_permissions"."group_id")
LEFT OUTER JOIN "auth_user_user_permissions" ON ("auth_user"."id" = "auth_user_user_permissions"."user_id")
WHERE ("auth_group_permissions"."permission_id" = $1 OR "auth_user_user_permissions"."permission_id" = $2)
This looks like a permission check and, as I understand it, the result is cached at the request level.
I wonder whether someone has written a package to cache these checks in memcached, for instance, or has found another way to reduce the number of queries made to check those permissions.
I checked all the indexes and they seem correct. The query is a bit slow, mostly because we have a lot of permissions, but still, the number of calls is crazy.
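One possible shape for such a cache, as a minimal sketch: the `TTLCache` class, the `users_with_permission` function, and the 60-second lifetime are all hypothetical stand-ins (an in-process dictionary plays the role of memcached, and the function plays the role of the expensive `auth_user` query). Note that with any time-based cache, a permission change is not visible until the entry expires.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for memcached)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the database
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


db_calls = 0


def users_with_permission(permission_id):
    """Hypothetical stand-in for the expensive permission query."""
    global db_calls
    db_calls += 1
    return frozenset({1, 2, 3}) if permission_id == 42 else frozenset()


cache = TTLCache(ttl_seconds=60)


def cached_users_with_permission(permission_id):
    # The key includes the permission id so different checks do not collide.
    return cache.get_or_compute(
        ("users_with_perm", permission_id),
        lambda: users_with_permission(permission_id),
    )


first = cached_users_with_permission(42)
second = cached_users_with_permission(42)  # served from the cache
print(db_calls)
```

The second call never reaches `users_with_permission`, which is the effect being asked about: the call count drops even though the query itself is no faster.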
One of the tabs in the application is broken: when you open that tab, it keeps spinning and calling a particular service, and after some time it gets a 404 error. We created indexes on a few columns in the application DB to see if that would improve performance, but creating indexes didn't make much difference.
So we planned to rewrite the SQL query to improve the performance and fix it.
select distinct
editedUser.*
from
users editedUser,
relationship_ref editedUserRelationship,
users approvingUser,
user_role approvingUserRoles,
role_permission approvingUserRolePermissions,
account approvingUserAccount
where
approvingUser.user_id = 175263
and approvingUser.user_id = approvingUserRoles.user_id
and approvingUserRoles.role_id = approvingUserRolePermissions.role_id
and approvingUserRoles.user_role_status_id = 2
and editedUserRelationship.relationship_id =
editedUser.submitted_relationship_id
and (approvingUser.account_id = approvingUserAccount.account_id
or approvingUser.account_id is null)
and editedUser.review_status = 'R'
and approvingUserRolePermissions.permission_id =
editedUserRelationship.view_pending_permission_id;
It is taking nearly 6 minutes. Can someone please suggest how to use proper joins in this query? It has 36 columns and 30,000 records.
Rearranged in a style that is easier to read (to me at least), I get this:
select distinct ed.*
from users ap
join user_role ro
on ro.user_id = ap.user_id
join account ac
on ac.account_id = ap.account_id or ap.account_id is null
join role_permission rp
on rp.role_id = ro.role_id
join relationship_ref re
on re.view_pending_permission_id = rp.permission_id
join users ed
on ed.submitted_relationship_id = re.relationship_id
where ap.user_id = 175263
and ro.user_role_status_id = 2
and ed.review_status = 'R'
How many approving users have null account_id? For any of those, you retrieve all accounts in the system. Is that actually the business requirement? I'm not sure that it makes any sense. The query does not make any further use of that table, so perhaps you can remove the account join entirely.
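The effect of the `or ap.account_id is null` condition can be checked on a toy schema. This is a minimal sketch using Python's built-in sqlite3 module rather than the original database; the table contents are invented, and only the join condition is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE account (account_id INTEGER PRIMARY KEY);
    INSERT INTO account VALUES (1), (2), (3), (4), (5);
    INSERT INTO users VALUES (10, 2);     -- approver tied to one account
    INSERT INTO users VALUES (11, NULL);  -- approver with no account
""")

join_sql = """
    SELECT COUNT(*)
    FROM users ap
    JOIN account ac
      ON ac.account_id = ap.account_id OR ap.account_id IS NULL
    WHERE ap.user_id = ?
"""

# One matching account row for the approver that has an account_id ...
rows_with_account = conn.execute(join_sql, (10,)).fetchone()[0]
# ... but every account row for the approver whose account_id is NULL.
rows_null_account = conn.execute(join_sql, (11,)).fetchone()[0]
print(rows_with_account, rows_null_account)
```

With a NULL `account_id`, the row count is multiplied by the size of the `account` table before `DISTINCT` collapses it again, which is wasted work if no column of `account` is ever used.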
I have some trouble with this query (view).
I'm using SQL Server 2012.
With few records the query is fast, but after adding only 1000 records (Links) it becomes very slow (over 23 seconds).
I have to pick a random link for every host in the database, so I used ROW_NUMBER and PARTITION BY.
The Links table has 1000 records and the Hosts table has 10 records, but the query is still very slow.
Any advice on how to improve performance?
UPDATE: For each host I need to get random links (1, 2, or 3 of them, depending on Host.numLinksPerWork).
;WITH MyCte As
(
SELECT DISTINCT link.namUrl, host.uidHost, host.namHost AS Hostname,
[user].uidUser, usrProfile.UserName AS Username, host.numLinksPerWork,
referer.namUrl AS refererLink,host.Min, host.Max,ROW_NUMBER()
OVER (PARTITION BY host.numLinksPerWork, host.uidHost ORDER BY newid()) AS Number
FROM Links link JOIN
Users[user] ON [user].uidUser = link.codUser JOIN
Profile usrProfile ON usrProfile.UserId = [user].uidUser JOIN
Hosts host ON host.uidHost = link.codHost JOIN
Referers referer ON referer.codHost = host.uidHost JOIN
Referers referer2 ON referer.codUser = [user].uidUser
WHERE [user].flgBanned = 0
)
SELECT MyCte.uidHost, MyCte.uidUser, MyCte.namUrl, MyCte.refererLink,
MyCte.Hostname, MyCte.Username, MyCte.Min, MyCte.Max FROM MyCte
WHERE MyCte.Number <= MyCte.numLinksPerWork
Similar Diagram
I fixed the problem. The double join on the same table, which was of course wrong, slowed down the query:
Referers referer ON referer.codHost = host.uidHost JOIN
Referers referer2 ON referer.codUser = [user].uidUser
After removing the useless join, it's fast!
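Why that second join hurts can be reproduced on a small scale. The sketch below uses Python's sqlite3 module with invented rows and a cut-down schema (only `Links` and `Referers`, with the host and user ids taken directly from `Links`). The `ON` clause of the `referer2` join never mentions `referer2`, so each row is paired with every row of `Referers`. Folding both conditions into a single join is one possible reading of the fix, not necessarily the exact query the poster ended up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Links (namUrl TEXT, codHost INTEGER, codUser INTEGER);
    CREATE TABLE Referers (namUrl TEXT, codHost INTEGER, codUser INTEGER);
    INSERT INTO Links VALUES ('a', 1, 7), ('b', 1, 7), ('c', 2, 7);
    INSERT INTO Referers VALUES
        ('r1', 1, 7), ('r2', 2, 7), ('r3', 3, 8), ('r4', 4, 8);
""")

# Same shape as the original: the second ON clause only constrains "referer",
# so "referer2" is effectively cross-joined.
double_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Links link
    JOIN Referers referer  ON referer.codHost = link.codHost
    JOIN Referers referer2 ON referer.codUser = link.codUser
""").fetchone()[0]

# Assumed fix: a single join carrying both conditions.
single_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Links link
    JOIN Referers referer
      ON referer.codHost = link.codHost AND referer.codUser = link.codUser
""").fetchone()[0]

print(double_join_rows, single_join_rows)
```

Here the double join produces 12 rows where 3 are expected: the 3 matching rows multiplied by the 4 rows in `Referers`. On larger tables, `DISTINCT` and `ROW_NUMBER()` then have to process that inflated set.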
I'm very new to SQL, and still learning. I'm using a reporting tool called Solarwinds Orion, and I'm honestly not sure how specific the query I have written is to the program, so if there's anything in the query that's confusing, let me know and I'll try to figure out if it's specific to the program or not.
The problem with the query I'm running is that it times out after a very long time (maybe an hour) of running. The database I'm using is huge. Unfortunately I don't really know how huge, but I've been told it's huge.
Is there anything I am doing wrong that would have a huge performance impact?
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM
((NetflowApplicationSummary
INNER JOIN Nodes ON (NetflowApplicationSummary.NodeID = Nodes.NodeID))
INNER JOIN InterfaceTraffic ON (Nodes.NodeID = InterfaceTraffic.InterfaceID))
INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE
( InterfaceTraffic.DateTime > (GetDate()-30) )
AND
(Nodes.WANCircuit = 1)
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
EDIT: I ran COUNT() on each of my tables with the below result.
SELECT COUNT(*) FROM NetflowApplicationSummary # 50671011
SELECT COUNT(*) FROM Nodes # 898
SELECT COUNT(*) FROM InterfaceTraffic # 18000166
SELECT COUNT(*) FROM Interfaces # 3938
# Total : 68,676,013
I really have no idea if 68 million items is a huge database to be honest.
A couple of notes:
The INNER JOIN operator is associative, so get rid of those parentheses in the FROM clause and let the optimizer figure out the best join order.
You may have an implied cursor from the getdate() function being called for every row. Store the value in a local variable and compare to that.
The resulting SQL should look like this:
DECLARE @Date as datetime = getdate() - 30;
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM NetflowApplicationSummary
INNER JOIN Nodes ON NetflowApplicationSummary.NodeID = Nodes.NodeID
INNER JOIN InterfaceTraffic ON Nodes.NodeID = InterfaceTraffic.InterfaceID
INNER JOIN Interfaces ON Nodes.NodeID = Interfaces.NodeID
WHERE InterfaceTraffic.DateTime > @Date
AND Nodes.WANCircuit = 1
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
Also, make sure you have an index on table InterfaceTraffic with a leading field of DateTime. If this doesn't exist you may need to pay the penalty of a first time creation of it.
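To illustrate what an index with a leading DateTime column changes, here is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The planner and its output differ between the two engines, so this only shows the general idea. The index name `idx_traffic_datetime`, the `Bytes` column, and the sample rows are hypothetical. The cutoff is computed once and passed as a parameter, mirroring the local-variable advice above.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InterfaceTraffic (InterfaceID INTEGER, DateTime TEXT, Bytes INTEGER)"
)

# One invented row per day for a year, counting back from a fixed "now".
now = datetime(2024, 1, 31)
conn.executemany(
    "INSERT INTO InterfaceTraffic VALUES (?, ?, ?)",
    [(i % 10, (now - timedelta(days=i)).isoformat(), i) for i in range(365)],
)

cutoff = (now - timedelta(days=30)).isoformat()  # computed once, not per row
query = "SELECT COUNT(*) FROM InterfaceTraffic WHERE DateTime > ?"


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in rows)


plan_before = plan(query, (cutoff,))  # full scan of the table
conn.execute("CREATE INDEX idx_traffic_datetime ON InterfaceTraffic (DateTime)")
plan_after = plan(query, (cutoff,))   # range search on the new index

recent_rows = conn.execute(query, (cutoff,)).fetchone()[0]
print(plan_before)
print(plan_after)
```

On SQL Server the equivalent check is to compare the execution plans before and after creating the index.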
If this doesn't help, then you may need to post the execution plan where it can be inspected.
Out of interest, also perform a count() on all four tables and post that result, just so members here can make their own assessment of how big your database really is. It is amazing how many non-technical people still think a 1 or 10 GB database is huge, while I run that easily on my workstation!
I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should cause it to take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: the query says to join the four tables together. So say a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets: the join will return 2*3*4 = 24 rows, one for each combination of tickets, and a plain COUNT counts all 24 of them. Counting distinct values removes the duplicates.
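The 2*3*4 example can be reproduced directly. This is a minimal sketch on SQLite via Python's sqlite3 module, with an invented single-user data set and trimmed-down tables. It runs the same three LEFT OUTER JOINs once with plain `COUNT` and once with `COUNT(DISTINCT ...)`. On the Django side, `Count` accepts a `distinct=True` argument, which should produce the second form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
    INSERT INTO auth_user VALUES (1);
    -- 2 captured tickets (ids 1-2) and 3 assigned tickets (ids 3-5)
    INSERT INTO tickets_ticket VALUES
        (1, 1, NULL), (2, 1, NULL), (3, NULL, 1), (4, NULL, 1), (5, NULL, 1);
    -- 4 watched tickets
    INSERT INTO tickets_ticket_watchers VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

template = """
    SELECT COUNT({d} t1.id), COUNT({d} t3.id), COUNT({d} w.ticket_id)
    FROM auth_user u
    LEFT OUTER JOIN tickets_ticket t1 ON u.id = t1.capturer_id
    LEFT OUTER JOIN tickets_ticket t3 ON u.id = t3.responsible_id
    LEFT OUTER JOIN tickets_ticket_watchers w ON u.id = w.user_id
    GROUP BY u.id
"""

# Plain COUNT sees every combination of the joined rows.
plain_counts = conn.execute(template.format(d="")).fetchone()
# COUNT(DISTINCT ...) collapses the duplicates back to the real totals.
distinct_counts = conn.execute(template.format(d="DISTINCT")).fetchone()
print(plain_counts, distinct_counts)
```

The distinct version gives the right numbers, but the database still has to build all 24 combined rows first, so it fixes correctness more than speed.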
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)
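A quick way to sanity-check the pre-aggregation approach is to run it on a toy data set. The sketch below uses SQLite via Python's sqlite3 module with invented rows and shortened aliases (`c1`, `c2`, `c3`, and `n` are not from the original). Each derived table is grouped before the join, so each user matches at most one row per derived table and the counts never multiply. `COALESCE` is an addition here to show zeros instead of NULL for missing counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO tickets_ticket VALUES
        (1, 1, NULL), (2, 1, NULL), (3, NULL, 1), (4, NULL, 1), (5, NULL, 1);
    -- alice watches 4 tickets; bob only watches one and owns none
    INSERT INTO tickets_ticket_watchers VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 2, 1);
""")

rows = conn.execute("""
    SELECT u.username,
           COALESCE(c1.n, 0), COALESCE(c2.n, 0), COALESCE(c3.n, 0)
    FROM auth_user u
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS n
               FROM tickets_ticket GROUP BY capturer_id) c1
           ON u.id = c1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS n
               FROM tickets_ticket GROUP BY responsible_id) c2
           ON u.id = c2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS n
               FROM tickets_ticket_watchers GROUP BY user_id) c3
           ON u.id = c3.user_id
    -- NULL > 0 is not true, so users with no captured/assigned tickets drop out
    WHERE c1.n > 0 OR c2.n > 0
""").fetchall()
print(rows)
```

Only alice is returned, with counts 2, 3, and 4. Bob has no captured or assigned tickets, so both comparisons evaluate to NULL and the row is filtered out, matching the HAVING clause of the original query.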
The following SQL query returns no data for the LEFT JOIN in MS Access.
SELECT * FROM
(
SELECT Operation_Part.PPC,
Operation_Part.TargetOperationsPerHour as JPH,
Operation_Part.Misc1 as [JPh Alt 1],
STR(Operation_Part.SeqNr) as Sequence,
Operation_Part.idPart,
Operation_Part.idOperationPart,
Operation.OperationType as Operation,
tblOperationType.OperationType as [Operation Type]
FROM tblOperationType
RIGHT JOIN (Operation INNER JOIN Operation_Part ON Operation.idOperation = Operation_Part.idOperation)
ON tblOperationType.idOpType = Operation.OperationTID
WHERE Operation_Part.VsbLDq = 0
AND Operation_Part.idPart <> 0 AND Operation_Part.idPart = 1271)
AS [AA]
LEFT JOIN (SELECT Sum([Cptotal]) AS DownTime,
TransactionDetail.idPart,
STR(TransactionDetail.seq_number) as Sequence
FROM ([Transaction] INNER JOIN TransactionDetail ON [Transaction].idTransaction = TransactionDetail.idTransaction)
WHERE [Transaction].idTransactionType=29
AND TransactionDetail.WorkOrderNumber = 'PR23144'
GROUP BY TransactionDetail.idPart, STR(TransactionDetail.seq_number))
AS [EE]
ON AA.idPart = EE.idPart AND EE.Sequence=AA.Sequence
In SQL Server the query does return the downtime value of 1.08 as required (see pics below).
First select returns:
Second select returns:
MS Access result:
SQL server result:
How do I make it work in MS Access?
This is only a guess, but it may well have something to do with the nulls in the applicable columns of the rows you don't really want.
Suggest you change
SELECT Sum([Cptotal]) AS DownTime,
to
SELECT Sum(IIf(IsNull([CpTotal]), 0, [CpTotal])) AS DownTime
In Access I always use CStr(...) instead of Str(...)
Aside from this, painful though it may be, I'd suggest turning the left-joined component into a separate query or, if you don't use queries, building a temporary table with this data which is then left joined into the original query.