Query to get the latest treatment for each machine - sql

Let me just start by saying that I don't care on which type of SQL I get an answer on. In reality I am creating my question in Kusto but the Kusto thread in stackoverflow is dead most of the time. This is just to give me an idea of on which way I could do this so I can then translate it somehow into Kusto.
I have a database called "MachineData" that looks something like this (but with hundreds of thousands of records)
What I want to do is get for each Machine the latest treatment that the machine has done. In other words, I want for each machine to get the most recent StartTime.
I thought about doing something where I say "Order by SerialNumber, StartTime" but because there are hundreds of thousands of records then my system can't do that without crashing because of all the amount of data there is, and also this approach will still show me all records for each Machine and what I want to do is just get the latest StartTime.
The other thing I thought about doing is something like this,
MachineData
| top 1 by SerialNumber, StartTime
but the "top" command on Kusto only accepts one parameter to order by.

Probably you're looking for GROUP BY and max():
SELECT SerialNumber, max(StartTime) as MostRecentStartTime
FROM MachineData
GROUP BY SerialNumber;

Related

Improve performance of deducting values of same table in SQL

for a metering project I use a simple SQL table in the following format
ID
Timestamp: dat_Time
Metervalue: int_Counts
Meterpoint: fk_MetPoint
While this works nicely in general I have not found an efficient solution for one specific problem: There is one Meterpoint which is a submeter of another Meterpoint. I'd be interested in the Delta of those two Meterpoints to get the remaining consumption. As the registration of counts is done by one device I get datapoints for the various Meterpoints at the same Timestamp.
I think I found a solution applying a subquery which appears to be not very efficient.
SELECT
A.dat_Time,
(A.int_Counts- (SELECT B.int_Counts FROM tbl_Metering AS B WHERE B.fk_MetPoint=2 AND B.dat_Time=A.dat_Time)) AS Delta
FROM tbl_Metering AS A
WHERE fk_MetPoint=1
How could I improve this query?
Thanks in advance
You can try using a window function instead:
SELECT m.dat_Time,
(m.int_counts - m.int_counts_2) as delta
FROM (SELECT m.*,
MAX(CASE WHEN fk.MetPoint = 2 THEN int_counts END) OVER (PARTITION BY dat_time) as int_counts_2
FROM tbl_Metering m
) m
WHERE fk_MetPoint = 1
From a query point of view, you should as a minimum change to a set-based approach instead of an inline sub-query for each row, using a group by as a minimum but it is a good candidate for a windowing query, just as suggested by the "Great" Gordon Linoff
However if this is a metering project, then we are going to expect a high volume of records, if not now, certainly over time.
I would recommend you look into altering the input such that delta is stored as it's own first class column, this moves much of the performance hit to the write process which presumably will only ever occur once for each record, where as your select will be executed many times.
This can be performed using an INSTEAD OF trigger or you could write it into the business logic, in a recent IoT project we computed or stored these additional properties with each inserted reading to greatly simplify many types of aggregate and analysis queries:
Id of the Previous sequential reading
Timestamp of the Previous sequential reading
Value Delta
Time Delta
Number of readings between this and the previous reading
The last one sounds close to your scenario, we were deliberately batching multiple sequential readings into a single record.
You could also process the received data into a separate table that includes this level of aggregation information, so as not to pollute the raw feed and to allow you to re-process it on demand.
You could redirect your analysis queries to this second table, which is now effectively a data warehouse of sorts.

NuoDB query statistics

I'm currently doing a research using a TPC-H table and I'm trying to get the running time for 3 queries in NuoDB:
SELECT * FROM LINEITEM
SELECT * FROM LINEITEM WHERE L_PARTKEY BETWEEN 1 AND 80000
SELECT * FROM LINEITEM WHERE L_PARTKEY BETWEEN 1 AND 80000 OR L_PARTKEY
BETWEEN 100001 AND 200000
The thing is whenever I run a query, the results come but I guess the database does not store all of the data in memory, but rather keeps its connection open (so when I scroll down through the data results, it refreshes with more data).
Thus, I cannot retrieve the query running time. I tried setting the MIN_QUERY_TIME to 1 so it would be caught in SYSTEM.QUERYSTATS but it does not appear there (although I find the query as open in SYSTEM.CONNECTIONS, which is why I think the database keeps an open connection rather than storing in-memory).
Is there any kind of solution to fetch all data(as in a select all) and get the query running time?
Is there another way to get this that I'm missing? When I do
SELECT COUNT(*) FROM LINEITEM
it does work and the query goes to SYSTEM.QUERYSTATS. I think it is because the database has to go through all rows in order to count and therefore the query finishes, whereas when I do a
SELECT *
it waits until I "ask" for more data.
I have been trying for days and I could not get into a solution. I even tried different 3rd party tools like DBVisualizer and SQL Workbench but they do not seem to give me the expected results.
I would be really glad if you could give me a hand or at least forward this e-mail to someone that might lead me to a possible solution.
Many thanks.

Order results by weighted variables?

I have a listing of ~10,000 apps and I'd like to order them by certain columns, but I want to give certain columns more "weight" than others.
For instance, each app has overall_ratings and current_ratings. If the app has a lot of overall_ratings, that's worth 1.5, but the number of current_ratings would be worth, say 2, since the number of current_ratings shows the app is active and currently popular.
Right now there are probably 4-6 of these variables I want to take into account.
So, how can I pull that off? In the query itself? After the fact using just Ruby (remember, there are over 10,000 rows that would need to be processed here)? Something else?
This is a Rails 3.2 app.
Sorting 10000 objects in plain Ruby doesn't seem like a good idea, specially if you just want the first 10 or so.
You can try to put your math formula in the query (using the order method from Active Record).
However, my favourite approach would be to create a float attribute to store the score and update that value with a before_save method.
I would read about dirty attributes so you only perform this scoring when some of you're criteria is updated.
You may also create a rake task that re-scores your current objects.
This way you would keep the scoring functionality in Ruby (you could test it easily) and you could add an index to your float attribute so database queries have better performance.
One attempt would be to let the DB do this work for you with some query like: (can not really test it because of laking db schema):
ActiveRecord::Base.connection.execute("SELECT *,
(2*(SELECT COUNT(*) FROM overall_ratings
WHERE app_id = a.id) +
1.5*(SELECT COUNT(*) FROM current_ratings
WHERE app_id = a.id)
AS rating FROM apps a
WHERE true HAVING rating > 3 ORDER BY rating desc")
Idea is to sum the number of ratings found for each current and overall rating with the subqueries for an specific app id and weight them as desired.

Access & SQL Server: Number of uses since date aggregate problem - new reporting problem (solved aggregate issue)

BACKGROUND:
I've been trying to streamline the work involved in running a report in my program. Lately, I've had to supply a listing of job numbers an instrument has been used on with the listing of items for cost/benefit analysis. Mostly to see how often an instrument is used since it was last serviced/calibrated and the last time anyone did use it. I was looking to integrate this into the query that helps generate the report - but I keep hitting a brick wall of sorts with the number of uses - since I want that aggregate to be based on the date the instrument was last calibrated (a field based in the same query). I can get it to give me the number of uses in the system total - but it will not accept the limitation that I want it to be only counting the times used since the last time it was calibrated
PROBLEM:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results, or the dreaded 'aggregate missing' error (don't remember the exact warning).
-- Edited to add 8/12/2011 # 16:09 --
An additional problem with the use of the Max aggregate has been found for instruments that have never been used being excluded by this query.
DETAILS:
Here is the query that does work so far:
SELECT
dbo_tblPOGaugeDetail.intGagePOID,
dbo_tblPOGaugeDetail.strGageDetailID,
dbo_Gage_Master.Description,
dbo_Gage_Master.Manufacturer,
dbo_Gage_Master.Model_No,
dbo_Gage_Master.Gage_SN,
dbo_Gage_Master.Unit_of_Meas,
dbo_Gage_Master.User_Defined,
dbo_Gage_Master.Calibration_Frequency,
dbo_Gage_Master.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank,
dbo_tblPOGaugeDetail.intGageCost,
dbo_Gage_Master.Last_Calibration_Date,
dbo_Gage_Master.Next_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate,
dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited,
dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair,
dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER,
dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail
INNER JOIN dbo_Gage_Master ON dbo_tblPOGaugeDetail.strGageDetailID = dbo_Gage_Master.Gage_ID)
INNER JOIN qryRCEquipmentLastUse ON dbo_Gage_Master.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY dbo_tblPOGaugeDetail.strGageDetailID;
But I can't seem to aggregate a count of Uses (making a Count(strCustomerJobNum)) from the tblGageActivity with the following fields:
strGageID
strCustomerJobNum
datDateEntered
datTimeEntered
I tried to add a field to the formerly listed query to do a Count(strCustomerJobNum) where datDateEntered matched the Last_Calibration_Date from the calling query - but I got the 'missing aggregate' error. If I leave this condition out - it will run - but will list every instrument ever sent out only if it's had a usage count of at least one (not what I want at all, sadly).
I also want to make sure that if I should get a zero uses count - I will get a zero back instead of my expected records minus the null results.
I hope someone out there can tell me where I am going wrong with this - I want to save the time I am currently spending running an activity report in another program whenever I want to generate this report. Thanks in advance, and let me know if you need me to post more information.
-- Edited to add 08/15/2011 # 14:41 --
I managed to solve the Max() aggregate problem by creating a 'pure' first-step query to get a listing of all instrument with most modern date as qryRCEquipmentUsed.
qryRCEquipmentLastUse:
SELECT dbo.tblGageActivity.strGageID, Max(dbo.tblGageActivity.datDateEntered) AS datLastDateUsed
FROM dbo.tblGageActivity
GROUP BY dbo.tblGageActivity.strGageID;
Then I created a 'pure' listing of all instruments that have no usage at all as a query named qryRCEquipmentNeverUsed.
qryRCEquipmentNeverUsed:
SELECT dbo_Gage_Master.Gage_ID, NULL AS datLastDateUsed
FROM dbo_Gage_Master LEFT JOIN dbo_tblGageActivity ON dbo_Gage_Master.Gage_ID = dbo_tblGageActivity.strGageID
WHERE (((dbo_tblGageActivity.strGageID) Is Null));
NOTE: The NULL was inserted so that the third combining UNION query will not fail due to a mismatch in the number of fields being retrieved from the tables.
At last, I created a UNION query named qryCombinedUseEquipment to combine the two into a list:
qryCombinedUseEquipment:
SELECT *
FROM qryRCEquipmentLastUse
UNION SELECT *
FROM qryRCEquipmentNeverUsed;
Using this last union query to feed the Last Used date to the parent query works in datasheet view, but when the parent query is called in the report - I get a blank report; so a nudge in the right direction would still be wonderfully appreciated.
APPENDIX
Same script as above, but with shorter table aliases (in case someone finds that clearer):
SELECT
gd.intGagePOID,
gd.strGageDetailID,
gm.Description,
gm.Manufacturer,
gm.Model_No,
gm.Gage_SN,
gm.Unit_of_Meas,
gm.User_Defined,
gm.Calibration_Frequency,
gm.Calibration_Frequency_UOM,
gd.bolGageLeavePriceBlank,
gd.intGageCost,
gm.Last_Calibration_Date,
gm.Next_Due_Date,
gd.bolGageEvaluate,
gd.bolGageExpedite,
gd.bolGageAccredited,
gd.bolGageCalibrate,
gd.bolGageRepair,
gd.bolGageReturned,
gd.bolGageBER,
gd.intTurnaroundDaysOut,
lu.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail gd
INNER JOIN dbo_Gage_Master gm ON gd.strGageDetailID = gm.Gage_ID)
INNER JOIN qryRCEquipmentLastUse lu ON gm.Gage_ID = lu.Gage_ID
ORDER BY gd.strGageDetailID;
Piece by piece...
First -- I suspect you're trying to answer too many questions at once (as evidenced by 23 fields in your SELECT), which will make aggregation near-impossible. Start by narrowing down the scope of the query -- What question is this query attempting to answer? (You can always make more queries to answer other questions... :-)
1) How many uses since last calibration?
2) How many uses since last ...use? (not sure what you mean by that -- maybe last sign-out, or last rental, etc.?)
Tip -- learn to use table aliases. Large queries are difficult to read; worse because of repeated table names.
1) Ex.: dbo_tbl_POGaugeDetail.intGagePOID becomes d.intGagePOID
Here's a sample that might get you started:
SELECT
d.strCustomerJobNum,
Max(d.last_calibration_date) -- not sure what you named that field
Count(d.strCustomerJobNum)
FROM
dbo_tblPOGaugeDetail d
GROUP BY
d.strCustomerJobNum
Does this work:
SELECT dbo_tblPOGaugeDetail.intGagePOID, dbo_tblPOGaugeDetail.strGageDetailID,
OuterGageMaster.Description, OuterGageMaster.Manufacturer, OuterGageMaster.Model_No,
OuterGageMaster.Gage_SN, OuterGageMaster.Unit_of_Meas, OuterGageMaster.User_Defined,
OuterGageMaster.Calibration_Frequency, OuterGageMaster.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank, dbo_tblPOGaugeDetail.intGageCost,
OuterGageMaster.Last_Calibration_Date, OuterGageMasterNext_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate, dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited, dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair, dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER, dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered,
(Select Count(strCustomerJobNum)
FROM tblGageActivity WHERE
OuterGageMaster.Last_Calibration_Date=tblGageActivity.datDateEntered) As JobCount
FROM
(dbo_tblPOGaugeDetail INNER JOIN dbo_Gage_Master OuterGageMaster ON
dbo_tblPOGaugeDetail.strGageDetailID = OuterGageMaster.Gage_ID) INNER JOIN
qryRCEquipmentLastUse ON OuterGageMaster.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY
dbo_tblPOGaugeDetail.strGageDetailID;
or is that what you tried?
Summary Problem:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results, or the dreaded 'aggregate missing' error.
Solution:
I decided to leave the query driving the report alone - instead choosing to employ the use of DLookup and DCount as appropriate to retrieve the last used date from a query that provides the last used date of all the instruments, and the number of uses an instrument has had since it's last calibration, using the aforementioned domain aggregates respectively.
Using the query described in the problem description, I am able to retrieve the last used date for all instruments. I used a =DLookup statement as the source for a text box on the report's subreport dealing with various items as such:
=IIf((DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]")) Is Null Or ([bolGageReturned]=True),"",DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]"))
This allows items that have never been used to return a NULL result, which will display as a blank text box.
The number of uses, however, would not feed off a query using =DCount (I tried, it would take over ten minutes to retrieve results, if it ever did). However, using the underlying activity table, I used the following statement:
=IIf([bolGageReturned],"","Used " & DCount("[dbo_tblGageActivity]![strGageID]","[dbo_tblGageActivity]","[dbo_tblGageActivity]![strGageID] = [strGageDetailID] And [dbo_tblGageActivity]![datDateEntered] Between [txtLastCalibrationDate] And date()") & " times since last calibration")
It would retrieve a number of times used since the instrument was last calibrated, but no uses that are before that or after today (some jobs are post dated, strangely). Of course, this is SLOW (about thirty seconds for a large document with thirty or forty instruments).
Does anyone else have a better solution for this, or will I have to take the performance hit? If no one has any better ideas, I will accept this as the answer after five days (8/21/2011) .

Ruby Rails Complex SQL with aggregate function and DayOfWeek

Rails 2.3.4
I have searched google, and have not found an answer to my dilemma.
For this discussion, I have two models. Users and Entries. Users can have many Entries (one for each day).
Entries have values and sent_at dates.
I want to query and display the average value of entries for a user BY DAY OF WEEK. So if a user has entered values for, say, the past 3 weeks, I want to show the average value for Sundays, Mondays, etc. In MySQL, it is simple:
SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = ? GROUP BY 1
That query will return between 0 and 7 records, depending upon how many days a user has had at least one entry.
I've looked at find_by_sql, but while I am searching Entry, I don't want to return an Entry object; instead, I need an array of up to 7 days and averages...
Also, I am concerned a bit about the performance of this, as we would like to load this to the user model when a user logs in, so that it can be displayed on their dashboard. Any advice/pointers are welcome. I am relatively new to Rails.
You can query the database directly, no need to use an actual ActiveRecord object. For example:
ActiveRecord::Base.connection.execute "SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at);"
This will give you a MySql::Result or MySql2::Result that you can then use each or all on this enumerable, to view your results.
As for caching, I would recommend using memcached, but any other rails caching strategy will work as well. The nice benefit of memcached is that you can have your cache expire after a certain amount of time. For example:
result = Rails.cache.fetch('user/#{user.id}/averages', :expires_in => 1.day) do
# Your sql query and results go here
end
This would put your results into memcached for one day under the key 'user//averages'. For example if you were user with id 10 your averages would be in memcached under 'user/10/average' and the next time you went to perform this query (within the same day) the cached version would be used instead of actually hitting the database.
Untested, but something like this should work:
#user.entries.select('DAYOFWEEK(sent_at) as day, AVG(value) as average').group('1').all
NOTE: When you use select to specify columns explicitly, the returned objects are read only. Rails can't reliably determine what columns can and can't be modified. In this case, you probably wouldn't try to modify the selected columns, but you can'd modify your sent_at or value columns through the resulting objects either.
Check out the ActiveRecord Querying Guide for a breakdown of what's going on here in a fairly newb-friendly format. Oh, and if that query doesn't work, please post back so others that may stumble across this can see that (and I can possibly update).
Since that won't work due to entries returning an array, we can try using join instead:
User.where(:user_id => params[:id]).joins(:entries).select('...').group('1').all
Again, I don't know if this will work. Usually you can specify where after joins, but I haven't seen select combined in there. A tricky bit here is that the select is probably going to eliminate returning any data about the user at all. It might make more sense just to eschew find_by_* methods in favor of writing a method in the Entry model that just calls your query with select_all (docs) and skips the association mapping.