Query time for a specific entity is 10000 times higher [closed] - sql

We have run into a problem: a SELECT with a filter on a certain id takes a very long time. For every other id it takes about 5 ms; for this one it takes 10 seconds.
Here are the EXPLAIN plans (left: the normal case, right: the slow one). It is exactly the same SQL query; the only difference is the value in 'where id = ...'.
It is striking that a filter is applied in the right-hand plan but, for some reason, not in the left-hand one, and that there is a huge 'rows removed' figure. A number that large can only come from multiplying the row counts of the joined tables. To repeat: the SQL query is exactly the same apart from the entity id, and the amount of data retrieved per entity is comparable.
One of the tables also uses a btree index. The only thing special about this id is that it comes after a gap in the numbering, e.g. 22, 23, 24, 30. But I was not able to reproduce the problem based on that.
Unfortunately, I cannot show the code, but I hope this information is enough to suggest something.
Update:
I found the reason. Postgres for some reason expects one of the tables to return only 1 row, when the real return is 10k+ rows, and therefore chooses the wrong algorithm. For other entity ids it estimates correctly and chooses better plans. Can anyone explain how Postgres estimates the number of rows for a plan? What could be the problem?
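One way to see where the estimate diverges is EXPLAIN (ANALYZE), which prints the planner's estimated row count next to the actual row count for every plan node. A sketch with hypothetical table names, not the original query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM parent_table p
JOIN child_table c ON c.parent_id = p.id
WHERE p.id = 30;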

If I understand correctly, your problem is the data histogram. We cannot help much more because you cannot provide example code. Briefly, one of your tables has an id column with heavily skewed data. For example: your table has 1 billion records and most ids have about 500 records each, yet a few ids have (say) 20 or 200 million records. If you search for those highly non-selective values, the database optimizer will not help you.
Check your data histogram!
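If you want to verify this, a rough sketch (assuming the skewed column is id on a hypothetical table named your_table): inspect the planner statistics in pg_stats and, if the sample looks too coarse, raise the statistics target for that column and re-analyze:
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'your_table' AND attname = 'id';

ALTER TABLE your_table ALTER COLUMN id SET STATISTICS 1000;
ANALYZE your_table;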

Related

Improve performance of deducting values of same table in SQL

for a metering project I use a simple SQL table in the following format
ID
Timestamp: dat_Time
Metervalue: int_Counts
Meterpoint: fk_MetPoint
While this works nicely in general, I have not found an efficient solution for one specific problem: there is one Meterpoint which is a submeter of another Meterpoint. I'd be interested in the delta of those two Meterpoints to get the remaining consumption. As the registration of counts is done by one device, I get datapoints for the various Meterpoints at the same Timestamp.
I found a solution using a subquery, but it appears to be not very efficient.
SELECT
A.dat_Time,
(A.int_Counts- (SELECT B.int_Counts FROM tbl_Metering AS B WHERE B.fk_MetPoint=2 AND B.dat_Time=A.dat_Time)) AS Delta
FROM tbl_Metering AS A
WHERE fk_MetPoint=1
How could I improve this query?
Thanks in advance
You can try using a window function instead:
SELECT m.dat_Time,
(m.int_counts - m.int_counts_2) as delta
FROM (SELECT m.*,
MAX(CASE WHEN fk_MetPoint = 2 THEN int_Counts END) OVER (PARTITION BY dat_Time) as int_counts_2
FROM tbl_Metering m
) m
WHERE fk_MetPoint = 1
From a query point of view, you should at a minimum change to a set-based approach instead of an inline sub-query for each row (a GROUP BY would do), but this is a good candidate for a windowing query, just as suggested by the "Great" Gordon Linoff.
However, if this is a metering project, then we can expect a high volume of records, if not now then certainly over time.
I would recommend you look into altering the input so that the delta is stored as its own first-class column. This moves much of the performance hit to the write process, which presumably will only ever occur once per record, whereas your select will be executed many times.
This can be performed using an INSTEAD OF trigger (a rough sketch appears at the end of this answer), or you could write it into the business logic. In a recent IoT project we computed and stored these additional properties with each inserted reading to greatly simplify many types of aggregate and analysis queries:
Id of the Previous sequential reading
Timestamp of the Previous sequential reading
Value Delta
Time Delta
Number of readings between this and the previous reading
The last one sounds close to your scenario, we were deliberately batching multiple sequential readings into a single record.
You could also process the received data into a separate table that includes this level of aggregation information, so as not to pollute the raw feed and to allow you to re-process it on demand.
You could redirect your analysis queries to this second table, which is now effectively a data warehouse of sorts.
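A rough sketch of such a write-time trigger (SQL Server syntax assumed; it also assumes ID is an identity column, that a new nullable int_Delta column has been added to tbl_Metering, and that the submeter reading for a timestamp is already present when the main reading arrives - adjust if both arrive in the same batch):
CREATE TRIGGER trg_Metering_ComputeDelta
ON tbl_Metering
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO tbl_Metering (dat_Time, int_Counts, fk_MetPoint, int_Delta)
    SELECT i.dat_Time,
           i.int_Counts,
           i.fk_MetPoint,
           -- delta only for the main meter; NULL when no submeter row exists yet
           CASE WHEN i.fk_MetPoint = 1
                THEN i.int_Counts - sub.int_Counts
           END
    FROM inserted AS i
    LEFT JOIN tbl_Metering AS sub
           ON sub.dat_Time = i.dat_Time
          AND sub.fk_MetPoint = 2;
END;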

tableUnavailable dependent upon size of search

I'm experiencing something rather strange with some queries that I'm performing in BigQuery.
Firstly, I'm using an externally backed table (csv.gz) with about 35 columns. The total data in the location is around 5 GB, with an average file size of 350 MB. The reason I'm doing this is that I continually add and remove data from the table on a rolling basis, to give me a view of the last 7 days of our activity.
When querying, if I perform something simple like:
select * from X limit 10
everything works fine. It continues to work fine if I increase the limit up to 1 million rows. As soon as I up the limit to ten million:
select * from X limit 10000000
I end up with a tableUnavailable error: "Something went wrong with the table you queried. Contact the table owner for assistance. (error code: tableUnavailable)"
Now, according to any literature on this, it usually results from using some externally owned table (I'm not). I can't find any other enlightening information for this error code.
Basically, if I do anything slightly complex on the data, I get the same result. There's a column called eventType that has maybe a couple hundred different values in the entire dataset. If I perform the following:
select eventType, count(1) from X group by eventType
I get the same error.
I'm getting the feeling that this might be related to limits on external tables? Can anybody clarify or shed any light on this?
Thanks in advance!
Doug

Limit result nested and get total number rows Laravel 4

I am having some trouble limiting the results of a relation while still getting the original number of rows. Here is my scenario:
I have posts, post content, and comments. I want to select all posts and limit the comments to 5, but I also need to know how many comments each post has.
$post = Post::with(array('contentPost','commentPost' => function($query){
$query->take(5);
}))->where('wall_id','=',$team_info->id)->get();
With this relation I limited the comments to 5, which works. But if I want to count all the comments with
$post->commentPost->count();
it shows me just 5 comments because I limited them. How can I get the real number of comments even though I limited them?
In your $post->commentPost->count() call, you're asking for the count() of the results associated with $post. Naturally, this will always be the number of rows returned under your query's constraints.
"Do it all in one place" is rarely the answer in development. Even if you find a way to make it happen, what happens in the future when you need to change this query? You end up with fragile code, prone to breakage.
Keep your relation as-is, with the 5-comment limit. Run a separate query for the count.
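A plain-SQL sketch of that separate count (hypothetical comments table and post_id column; in Laravel you would issue it through the query builder or an Eloquent count() call):
SELECT COUNT(*) AS comment_count
FROM comments
WHERE post_id = 42;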

Access & SQL Server: Number of uses since date aggregate problem - new reporting problem (solved aggregate issue)

BACKGROUND:
I've been trying to streamline the work involved in running a report in my program. Lately, I've had to supply a listing of the job numbers an instrument has been used on, along with the listing of items, for cost/benefit analysis - mostly to see how often an instrument has been used since it was last serviced/calibrated, and when anyone last used it. I was looking to integrate this into the query that helps generate the report, but I keep hitting a brick wall with the number of uses: I want that aggregate to count only uses since the date the instrument was last calibrated (a field in the same query). I can get it to give me the total number of uses in the system, but it will not accept the limitation that it should count only uses since the last calibration.
PROBLEM:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results or the dreaded 'aggregate missing' error (I don't remember the exact warning).
-- Edited to add 8/12/2011 # 16:09 --
An additional problem with the use of the Max aggregate has been found: instruments that have never been used are excluded by this query.
DETAILS:
Here is the query that does work so far:
SELECT
dbo_tblPOGaugeDetail.intGagePOID,
dbo_tblPOGaugeDetail.strGageDetailID,
dbo_Gage_Master.Description,
dbo_Gage_Master.Manufacturer,
dbo_Gage_Master.Model_No,
dbo_Gage_Master.Gage_SN,
dbo_Gage_Master.Unit_of_Meas,
dbo_Gage_Master.User_Defined,
dbo_Gage_Master.Calibration_Frequency,
dbo_Gage_Master.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank,
dbo_tblPOGaugeDetail.intGageCost,
dbo_Gage_Master.Last_Calibration_Date,
dbo_Gage_Master.Next_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate,
dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited,
dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair,
dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER,
dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail
INNER JOIN dbo_Gage_Master ON dbo_tblPOGaugeDetail.strGageDetailID = dbo_Gage_Master.Gage_ID)
INNER JOIN qryRCEquipmentLastUse ON dbo_Gage_Master.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY dbo_tblPOGaugeDetail.strGageDetailID;
But I can't seem to aggregate a count of uses (a Count(strCustomerJobNum)) from tblGageActivity, which has the following fields:
strGageID
strCustomerJobNum
datDateEntered
datTimeEntered
I tried to add a field to the query above to do a Count(strCustomerJobNum) where datDateEntered matched the Last_Calibration_Date from the calling query, but I got the 'missing aggregate' error. If I leave this condition out, it runs, but it lists every instrument ever sent out only if it has a usage count of at least one (not what I want at all, sadly).
I also want to make sure that an instrument with zero uses comes back as a zero, instead of my expected records minus the null results.
I hope someone out there can tell me where I am going wrong - I want to save the time I currently spend running an activity report in another program whenever I want to generate this report. Thanks in advance, and let me know if you need me to post more information.
-- Edited to add 08/15/2011 # 14:41 --
I managed to solve the Max() aggregate problem by creating a 'pure' first-step query that lists every instrument with its most recent usage date, named qryRCEquipmentLastUse.
qryRCEquipmentLastUse:
SELECT dbo.tblGageActivity.strGageID, Max(dbo.tblGageActivity.datDateEntered) AS datLastDateUsed
FROM dbo.tblGageActivity
GROUP BY dbo.tblGageActivity.strGageID;
Then I created a 'pure' listing of all instruments that have no usage at all as a query named qryRCEquipmentNeverUsed.
qryRCEquipmentNeverUsed:
SELECT dbo_Gage_Master.Gage_ID, NULL AS datLastDateUsed
FROM dbo_Gage_Master LEFT JOIN dbo_tblGageActivity ON dbo_Gage_Master.Gage_ID = dbo_tblGageActivity.strGageID
WHERE (((dbo_tblGageActivity.strGageID) Is Null));
NOTE: The NULL was inserted so that the third combining UNION query will not fail due to a mismatch in the number of fields being retrieved from the tables.
At last, I created a UNION query named qryCombinedUseEquipment to combine the two into a list:
qryCombinedUseEquipment:
SELECT *
FROM qryRCEquipmentLastUse
UNION SELECT *
FROM qryRCEquipmentNeverUsed;
Using this last union query to feed the Last Used date to the parent query works in datasheet view, but when the parent query is called from the report I get a blank report; so a nudge in the right direction would still be wonderfully appreciated.
APPENDIX
Same script as above, but with shorter table aliases (in case someone finds that clearer):
SELECT
gd.intGagePOID,
gd.strGageDetailID,
gm.Description,
gm.Manufacturer,
gm.Model_No,
gm.Gage_SN,
gm.Unit_of_Meas,
gm.User_Defined,
gm.Calibration_Frequency,
gm.Calibration_Frequency_UOM,
gd.bolGageLeavePriceBlank,
gd.intGageCost,
gm.Last_Calibration_Date,
gm.Next_Due_Date,
gd.bolGageEvaluate,
gd.bolGageExpedite,
gd.bolGageAccredited,
gd.bolGageCalibrate,
gd.bolGageRepair,
gd.bolGageReturned,
gd.bolGageBER,
gd.intTurnaroundDaysOut,
lu.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail gd
INNER JOIN dbo_Gage_Master gm ON gd.strGageDetailID = gm.Gage_ID)
INNER JOIN qryRCEquipmentLastUse lu ON gm.Gage_ID = lu.Gage_ID
ORDER BY gd.strGageDetailID;
Piece by piece...
First -- I suspect you're trying to answer too many questions at once (as evidenced by the 23 fields in your SELECT), which will make aggregation near-impossible. Start by narrowing the scope of the query -- what question is this query attempting to answer? (You can always write more queries to answer other questions... :-)
1) How many uses since last calibration?
2) How many uses since last ...use? (not sure what you mean by that -- maybe last sign-out, or last rental, etc.?)
Tip -- learn to use table aliases. Large queries are difficult to read, and worse with repeated full table names.
Ex.: dbo_tblPOGaugeDetail.intGagePOID becomes d.intGagePOID
Here's a sample that might get you started:
SELECT
d.strCustomerJobNum,
Max(d.last_calibration_date), -- not sure what you named that field
Count(d.strCustomerJobNum)
FROM
dbo_tblPOGaugeDetail d
GROUP BY
d.strCustomerJobNum
Does this work:
SELECT dbo_tblPOGaugeDetail.intGagePOID, dbo_tblPOGaugeDetail.strGageDetailID,
OuterGageMaster.Description, OuterGageMaster.Manufacturer, OuterGageMaster.Model_No,
OuterGageMaster.Gage_SN, OuterGageMaster.Unit_of_Meas, OuterGageMaster.User_Defined,
OuterGageMaster.Calibration_Frequency, OuterGageMaster.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank, dbo_tblPOGaugeDetail.intGageCost,
OuterGageMaster.Last_Calibration_Date, OuterGageMaster.Next_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate, dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited, dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair, dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER, dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered,
(Select Count(strCustomerJobNum)
FROM tblGageActivity WHERE
OuterGageMaster.Last_Calibration_Date=tblGageActivity.datDateEntered) As JobCount
FROM
(dbo_tblPOGaugeDetail INNER JOIN dbo_Gage_Master OuterGageMaster ON
dbo_tblPOGaugeDetail.strGageDetailID = OuterGageMaster.Gage_ID) INNER JOIN
qryRCEquipmentLastUse ON OuterGageMaster.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY
dbo_tblPOGaugeDetail.strGageDetailID;
or is that what you tried?
Summary Problem:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results, or the dreaded 'aggregate missing' error.
Solution:
I decided to leave the query driving the report alone, instead choosing to employ DLookup and DCount as appropriate: DLookup to retrieve the last used date from a query that provides the last used date of every instrument, and DCount for the number of uses an instrument has had since its last calibration.
Using the query described in the problem description, I am able to retrieve the last used date for all instruments. I used a =DLookup statement as the source for a text box on the report's subreport dealing with various items as such:
=IIf((DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]")) Is Null Or ([bolGageReturned]=True),"",DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]"))
This allows items that have never been used to return a NULL result, which will display as a blank text box.
The number of uses, however, would not feed off a query using =DCount (I tried; it would take over ten minutes to retrieve results, if it ever did). However, using the underlying activity table, I used the following statement:
=IIf([bolGageReturned],"","Used " & DCount("[dbo_tblGageActivity]![strGageID]","[dbo_tblGageActivity]","[dbo_tblGageActivity]![strGageID] = [strGageDetailID] And [dbo_tblGageActivity]![datDateEntered] Between [txtLastCalibrationDate] And date()") & " times since last calibration")
It retrieves the number of times the instrument has been used since it was last calibrated, excluding uses before that date or after today (some jobs are post-dated, strangely). Of course, this is SLOW (about thirty seconds for a large document with thirty or forty instruments).
Does anyone have a better solution for this, or will I have to take the performance hit? If no one has any better ideas, I will accept this as the answer after five days (8/21/2011).
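A hedged, set-based alternative to the per-row DCount (a sketch only, using the table and field names from the question; the non-equi join has to be edited in SQL view because Access's design grid cannot display it, and Count() of the activity field returns 0 for instruments with no matching activity rows):
SELECT gm.Gage_ID,
       Count(ga.strCustomerJobNum) AS UsesSinceCalibration
FROM dbo_Gage_Master AS gm
LEFT JOIN dbo_tblGageActivity AS ga
       ON (ga.strGageID = gm.Gage_ID
      AND ga.datDateEntered >= gm.Last_Calibration_Date
      AND ga.datDateEntered <= Date())
GROUP BY gm.Gage_ID;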

long running queries: observing partial results?

As part of a data analysis project, I will be issuing some long running queries on a mysql database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.
Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?
Thank you for any help : )
In the general case, a partial result cannot be produced. For example, if you have an aggregate function with a GROUP BY clause, then all the data has to be analysed before the first row is returned. A LIMIT clause will not help you, because it is applied after the output is computed. Maybe you can give concrete data and the SQL query?
One thing you may consider is sampling your tables down. This is good practice in data analysis in general to get your iteration speed up when you're writing code.
For example, suppose you have table-create privileges and some mega-huge table X with a numeric unique key unique_id and a data column data_value. In nearly any database,
create table sample_table as
select unique_id, data_value
from X
where mod(unique_id, <some_large_prime_number_like_1013>) = 1
will give you a random sample of data to work your queries out on, and you can inner join your sample_table against the other tables to improve the speed of testing / query results. Thanks to the sampling, your query results should be roughly representative of what you will get. Note: the number you're modding by has to be prime, otherwise it won't give a correct sample. The example above will shrink your table down to about 0.1% of the original size (0.0987% to be exact).
Most databases also have better sampling and random number methods than just using mod. Check the documentation to see what's available for your version.
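For example, in MySQL (which this question is about) a quick, though non-reproducible, alternative is to filter on RAND() - a sketch for roughly a 0.1% sample:
CREATE TABLE sample_table AS
SELECT unique_id, data_value
FROM X
WHERE RAND() < 0.001;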
Hope that helps,
McPeterson
It depends on what your query is doing. If it needs the whole result set before producing output - as might happen for queries with GROUP BY, ORDER BY, or HAVING clauses - then there is nothing to be done.
If, however, the reason for the delay is client-side buffering (which is the default mode), then that can be adjusted by setting mysql_use_result as an attribute of the database handle rather than the default mysql_store_result. This is true for the Perl and Java interfaces; in the C API you call mysql_use_result() instead of mysql_store_result() to fetch rows without buffering them all first.