SQL query hangs on join

I've written an SQL query that produces a report of some stats for each Year-Week-Mine-Product.
It works exactly as desired except for one thing: trn.wid_date isn't the correct date to be using.
I should be using td.datetime_act_comp_dump. When I replace trn.wid_date with td.datetime_act_comp_dump, it doesn't give me any errors but seems to just hang indefinitely. I let it go for a while yesterday and it came back with ORA-01652 unable to extend temp segment by 128 in tablespace TEMP, though I haven't seen that error since.
I don't understand what might be causing that, considering that I'm able to successfully return MAX(td.datetime_act_comp_dump) in the query below.
select to_char(trn.wid_date, 'IYYY') as dump_year,
to_char(trn.wid_date-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
INNER JOIN widsys.v_consist_ore_detail vcon
USING (consist_id)
where trn.direction = 'N'
and to_char(trn.wid_date, 'IYYY') = 2009
and to_char(trn.wid_date-7/24, 'IW') = 25
group by to_char(trn.wid_date, 'IYYY'),
to_char(trn.wid_date-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2),
vcon.product_type_code
order by to_char(trn.wid_date-7/24, 'IW') DESC
Just in order to troubleshoot, starting from the query above, I've tried removing everything to do with vcon and replacing trn.wid_date with td.datetime_act_comp_dump. The effect is that it only reports on Year-Week-Mine rather than Year-Week-Mine-Product (see query below).
This new query actually executes rather than just hanging, but it returns a few odd results and isn't sufficient since it doesn't break things down by Product.
select to_char(td.datetime_act_comp_dump, 'IYYY') as dump_year,
to_char(td.datetime_act_comp_dump-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
--vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
--INNER JOIN widsys.v_consist_ore_detail vcon
--USING (consist_id)
where trn.direction = 'N'
and to_char(td.datetime_act_comp_dump, 'IYYY') = 2009
and to_char(td.datetime_act_comp_dump-7/24, 'IW') = 25
group by to_char(td.datetime_act_comp_dump, 'IYYY'),
to_char(td.datetime_act_comp_dump-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2)
--vcon.product_type_code
order by to_char(td.datetime_act_comp_dump-7/24, 'IW') DESC
Any advice on what might be going wrong?
Cheers,
Tommy

The only thing that I can think of without more information is that the datetime_act_comp_dump column of train_details isn't indexed and wid_date is. This sounds like a pretty normal performance issue where something is not indexed or the train and train_details tables are dramatically different sizes and your join is blowing up.
I'm not sure which DB you are using, but you might want to figure out how to run the query execution plan profiler and see what the difference between the two execution plans are. I suspect that the answer is going to be something structural or maybe that the concatenation in the join statement is causing some DB-specific problems.
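Given that the ORA-01652 error suggests Oracle, a minimal sketch of capturing a plan with the standard EXPLAIN PLAN / DBMS_XPLAN tools would be the following (shown here against a stripped-down probe of the suspect concatenation join; substitute your full report query):
EXPLAIN PLAN FOR
SELECT COUNT(*)                          -- probe only; swap in the slow report query
  FROM widsys.train trn
       INNER JOIN tpps.train_details td
       ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); -- show the plan just explained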

I managed to get it to run muuuuuuuch faster by creating a subquery for widsys tables and one for tpps tables. Then doing an implicit inner join on two columns instead of concatenating.
SELECT blah FROM (widsys subquery) w, (tpps subquery) t WHERE w.mine_code = t.mine_code and w.train_id = t.train_tpps_id
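Fleshed out a little, the rewrite had roughly this shape (a sketch reusing the table and column names from the original query; the real version still carries the consist join, the aggregates and the GROUP BY):
SELECT w.train_control_id,
       w.wid_date,
       t.datetime_act_comp_dump
FROM ( SELECT trn.train_tpps_id,
              trn.mine_code,
              trn.train_control_id,
              trn.wid_date
         FROM widsys.train trn
        WHERE trn.direction = 'N' ) w,
     ( SELECT td.train_id,
              td.mine_code,
              td.datetime_act_comp_dump
         FROM tpps.train_details td ) t
WHERE w.mine_code = t.mine_code
  AND w.train_tpps_id = t.train_id;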

Related

Order of Joins SQL Server

I ran across a curious issue earlier today while writing a multiple-JOIN query in MS SQL Server, and I am hoping somebody may be able to help me understand it better. Typically I like to join tables in the order I pull from them in the SELECT statement (it just seems easier and cleaner to me). The statement below works, but only because it is not in the order of the SELECT statement, and I don't know why. If I rewrote it to join VW_ORDER_HEADER --> VW_ORDER_ITEM --> VW_PO_ITEM in that order, it returns no results. I only discovered this works after designing it in Access and looking at the resulting SQL. I am not new to SQL Server, but at first and second glance I can't see why the order of the joins would make a difference... Thanks in advance
SELECT O.SALES_ORDER_NUMBER,
O.BILL_TO,
O.SHIP_TO,
O.SOLD_TO,
O.ORDER_DATE,
O.REQUESTED_DELIVERY_DATE,
O.VALID_TO_DATE,
O.OPEN_QUANTITY,
OI.MATERIAL,
PI.PO_NUMBER
FROM VW_PO_ITEM PI
JOIN VW_ORDER_ITEM OI ON PI.MATERIAL = OI.MATERIAL
JOIN VW_ORDER_HEADER O ON OI.SALES_ORDER_NUMBER = O.SALES_ORDER_NUMBER
WHERE OI.MATERIAL = 'BA7948'
AND O.OPEN_QUANTITY > 0
AND O.VALID_TO_DATE >= GETDATE()
ORDER BY O.SALES_ORDER_NUMBER,
O.ORDER_DATE ASC;
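For reference, a reconstruction of the reordered version described above, with the select list shortened, might look like this (the exact statement that returned no rows was not posted, so the join predicates here are assumed to be unchanged):
SELECT O.SALES_ORDER_NUMBER,
       OI.MATERIAL,
       PI.PO_NUMBER
FROM VW_ORDER_HEADER O
JOIN VW_ORDER_ITEM OI ON OI.SALES_ORDER_NUMBER = O.SALES_ORDER_NUMBER
JOIN VW_PO_ITEM PI ON PI.MATERIAL = OI.MATERIAL
WHERE OI.MATERIAL = 'BA7948'
  AND O.OPEN_QUANTITY > 0
  AND O.VALID_TO_DATE >= GETDATE()
ORDER BY O.SALES_ORDER_NUMBER, O.ORDER_DATE ASC;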

How can I optimize this SQL query? (Solarwinds Orion)

I'm very new to SQL, and still learning. I'm using a reporting tool called Solarwinds Orion, and I'm honestly not sure how specific the query I have written is to the program, so if there's anything in the query that's confusing, let me know and I'll try to figure out if it's specific to the program or not.
The problem with the query I'm running is that it times out after a very long time (maybe an hour) of running. The database I'm using is huge. Unfortunately I don't really know how huge, but I've been told it's huge.
Is there anything I am doing wrong that would have a huge performance impact?
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM
((NetflowApplicationSummary
INNER JOIN Nodes ON (NetflowApplicationSummary.NodeID = Nodes.NodeID))
INNER JOIN InterfaceTraffic ON (Nodes.NodeID = InterfaceTraffic.InterfaceID))
INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE
( InterfaceTraffic.DateTime > (GetDate()-30) )
AND
(Nodes.WANCircuit = 1)
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
EDIT: I ran COUNT(*) on each of my tables with the results below.
SELECT COUNT(*) FROM NetflowApplicationSummary # 50671011
SELECT COUNT(*) FROM Nodes # 898
SELECT COUNT(*) FROM InterfaceTraffic # 18000166
SELECT COUNT(*) FROM Interfaces # 3938
# Total : 68,676,013
I really have no idea if 68 million items is a huge database to be honest.
A couple of notes:
The INNER JOIN operator is associative, so get rid of those parentheses in the FROM clause and let the optimizer figure out the best join order.
You may have an implied cursor from the getdate() function being called for every row. Store the value in a local variable and compare to that.
The resulting SQL should look like this:
DECLARE @Date as datetime = getdate() - 30;
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM NetflowApplicationSummary
INNER JOIN Nodes ON NetflowApplicationSummary.NodeID = Nodes.NodeID
INNER JOIN InterfaceTraffic ON Nodes.NodeID = InterfaceTraffic.InterfaceID
INNER JOIN Interfaces ON Nodes.NodeID = Interfaces.NodeID
WHERE InterfaceTraffic.DateTime > @Date
AND Nodes.WANCircuit = 1
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
Also, make sure you have an index on table InterfaceTraffic with a leading field of DateTime. If this doesn't exist you may need to pay the penalty of a first time creation of it.
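If such an index is missing, a minimal sketch of creating it (the index name here is arbitrary) would be:
CREATE INDEX IX_InterfaceTraffic_DateTime
    ON InterfaceTraffic (DateTime);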
If this doesn't help, then you may need to post the execution plan where it can be inspected.
Out of interest, also perform a count() on all four tables and post that result, just so members here can make their own assessment of how big your database really is. It is amazing how many non-technical people still think a 1 or 10 GB database is huge, while I run that easily on my workstation!

SQL query gets data very slowly from different tables

I am writing a SQL query to get data from several different tables, but it is retrieving the data very slowly, taking over 2 minutes to complete.
What I am doing is:
1. Getting date differences, and based on the date difference, getting account numbers.
2. Comparing tables to get the exact data I need.
Here is my query:
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
Please help with how I can use joins, or any other approach, to get the data in much less time, say 5-10 seconds.
Since it appears you are getting EVERY ACCOUNT, and performance is slow, I would try creating a pre-query grouped by just account, then doing a single join to the other tables, something like...
select
T.Accountno,
T.MxDt,
datediff(MM, T.MxDt, '2011-6-30') as Diffs,
P.Name as POName
from
( select T1.AccountNo,
Max( T1.DateTxn ) MxDt
from AccountTxn_skd T1
group by T1.AccountNo ) T
JOIN Account_skd A
on T.AccountNo = A.AccountNo
JOIN POName P
on A.POCode = P.Code          -- GUESSING, as you didn't qualify alias.field
AND A.OfficeCode = P.GPOCode  -- in your query for these two fields
order by
len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, does it matter which service code was used? Otherwise, you would need to group by both the account AND the service code (in which case I would have added the service code to the pre-query and as a join condition to the account table as well).
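If the service code does matter, a sketch of that variant (reusing the same guessed join columns as above) might look like:
select T.AccountNo,
       T.ServiceCode,
       T.MxDt,
       datediff(MM, T.MxDt, '2011-6-30') as Diffs,
       P.Name as POName
from ( select T1.AccountNo,
              T1.ServiceCode,
              Max( T1.DateTxn ) MxDt
         from AccountTxn_skd T1
        group by T1.AccountNo, T1.ServiceCode ) T
JOIN Account_skd A
  on T.AccountNo = A.AccountNo
 and T.ServiceCode = A.ServiceCode
JOIN POName P
  on A.POCode = P.Code
 and A.OfficeCode = P.GPOCode
order by len(T.AccountNo) DESC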

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL, but I am having some slight conversion problems. This was a query of mine that samples data from four tables and spits it all out in one result.
I am at a loss as to how to convey this in PostgreSQL, and specifically in Django, but I am leaving that for another question, so bonus points if you can Django-fy it, but no worries if you just pure-SQL it.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the SELECT was where my first troubles began. It seems it's an IF true/false THEN stuff ELSE other stuff END IF, yet I can't get the syntax right. I tried to use Navicat's SQL builder, but it constantly wanted me to place everything I had selected into the GROUP BY, which I think is all kinds of wrong.
What I am looking for, in summary, is to make this MySQL query work in PostgreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes and I am still working on my ORDER BY so till then we're just gonna cop out. Thanks again!
Have a look at this link GROUP BY
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
You need to include all the select columns in the group by that are not part of the aggregate functions.
A few things:
Drop the backticks
Use a CASE expression instead of IF(): CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END
Change your timestampdiff to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created) (you will then need to count the number of hours in the resulting interval; it would be much easier to compare timestamps; see the sketch after this list)
Fix your GROUP BY and ORDER BY
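Putting the pieces together, one PostgreSQL-flavoured sketch of the hotness ORDER BY is shown below, here using EXTRACT(EPOCH ...) to get the hour count directly rather than DATE_TRUNC (table names as in the original MySQL query, and assuming links.created is a timestamp):
SELECT links.id,
       (SUM(votes.karma_delta) - 1)
       / POWER(EXTRACT(EPOCH FROM (now() - links.created)) / 3600 + 2, 1.5) AS hotness
FROM links
LEFT OUTER JOIN votes ON votes.link_id = links.id
GROUP BY links.id, links.created
ORDER BY hotness DESC
LIMIT 20;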
Try to replace the IF with a case;
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name, in the GROUP BY clause, every column or calculated column that you select outside an aggregate function.

Having difficulty combining JET SQL queries

Warning: Here be beginner SQL! Be gentle...
I have two queries that independently give me what I want from the relevant tables in a reasonably timely fashion, but when I try to combine the two in a (fugly) union, things quickly fall to bits and the query either gives me duplicate records, takes an inordinately long time to run, or refuses to run at all quoting various syntax errors at me.
Note: I had to create a 'dummy' table (tblAllDates) with a single field containing dates from 1 Jan 2008 as I need the query to return a single record from each day, and there are days in both tables that have no data. This is the only way I could figure to do this, no doubt there is a smarter way...
Here are the queries:
SELECT tblAllDates.date, SUM(tblvolumedata.STT)
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
SELECT tblAllDates.date, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date;
The best result I have managed is the following:
SELECT tblAllDates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date
UNION SELECT tblAllDates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
This gives me the VA and STT data I want, but in two records where I have data from both in a single day, like this:
date STT VA
28/07/2008 0 54020
28/07/2008 33812 0
29/07/2008 0 53890
29/07/2008 33289 0
30/07/2008 0 51780
30/07/2008 30456 0
31/07/2008 0 52790
31/07/2008 31305 0
What I'm after is the STT and VA data in single row per day. How might this be achieved, and how far am I away from a query that could be considered optimal? (don't laugh, I only seek to learn!)
You could put all of that into one query like so
SELECT
dates.date,
SUM(volume.STT) AS STT,
SUM(NZ(timesheet.batching)+NZ(timesheet.categorisation)+NZ(timesheet.CDT)+NZ(timesheet.CSI)+NZ(timesheet.destruction)+NZ(timesheet.extraction)+NZ(timesheet.indexing)+NZ(timesheet.mail)+NZ(timesheet.newlodgement)+NZ(timesheet.recordedDeliveries)+NZ(timesheet.retrieval)+NZ(timesheet.scanning)) AS VA
FROM
tblAllDates dates
LEFT JOIN tblvolumedata volume
ON dates.date = volume.date
LEFT JOIN tblTimesheetData timesheet
ON
dates.date = timesheet.date
GROUP BY dates.date;
I've put the dates table first in the FROM clause and then LEFT JOINed the two other tables.
The jet database can be funny with more than one join in a query, so you may need to wrap one of the joins in parentheses (I believe this is referred to as Bill's SQL!) - I would recommend LEFT JOINing the tables in the query builder and then taking the SQL code view and modifying that to add in the SUMs, GROUP BY, etc.
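For example, with the query above, the Jet-friendly nesting would look something like this (an untested sketch; the VA expression is shortened to two of the fields for readability, and the date fields may need square brackets since date is a reserved word):
SELECT dates.date,
       SUM(volume.STT) AS STT,
       SUM(NZ(timesheet.batching) + NZ(timesheet.scanning)) AS VA
FROM (tblAllDates AS dates
      LEFT JOIN tblvolumedata AS volume ON dates.date = volume.date)
      LEFT JOIN tblTimesheetData AS timesheet ON dates.date = timesheet.date
GROUP BY dates.date;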
EDIT:
Ensure that the date field in each table is indexed as you're joining each table on this field.
EDIT 2:
How about this -
SELECT date,
Sum(STT),
Sum(VA)
FROM
(SELECT tblAllDates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date
UNION SELECT tblAllDates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date
) AS sub
GROUP BY date;
Interestingly, when I ran my first statement against some test data, the figures for STT and VA had all been multiplied by 4 compared to the second statement. Very strange behaviour and certainly not what I expected.
The table of dates is the best way.
Combine the joins in the FROM clause. Something like this....
SELECT d.date,
a.value,
b.value
FROM tableOfDates d
RIGHT JOIN firstTable a
ON d.date = a.date
RIGHT JOIN secondTable b
ON d.date = b.date
Turn the SQL into views and join them on the dates.
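In Access terms that means saving each of the two aggregate queries as its own query object. Assuming they were saved as qrySTT and qryVA (hypothetical names), each exposing date plus its aliased total, the combining query might be:
SELECT dates.date,
       s.STT,
       v.VA
FROM (tblAllDates AS dates
      LEFT JOIN qrySTT AS s ON dates.date = s.date)
      LEFT JOIN qryVA AS v ON dates.date = v.date;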