SELECT MIN from a subset of data obtained through GROUP BY - sql

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id date_and_time entry_category
1 2017/01/20 12:00 type_1
2 2017/01/20 13:00 type_1
3 2017/01/20 12:00 type_2
4 2017/01/20 12:00 type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
However, I now want to find the LEAST RECENT of the datetimes returned by the query above. I will somehow need to use SELECT MIN(date_and_time), but how do I tell SQL to treat the output of my previous query as a "new table" to run a new SELECT query on? The output of the total query should be a single value; for the sample displayed above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but they don't seem to do the trick: they only rename existing columns or tables (or I'm misusing them). There are many questions out there that list the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but I now want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!

You can use your first query as a sub-query. This is what you describe as using the first query's output as the input for the second query. The result is a single row containing the minimum date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) AS date_and_time, entry_category
      FROM timeseries_table
      GROUP BY entry_category) AS a;
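As a quick check, the sub-query approach can be run against the sample rows from the question. The sketch below is only an illustration: it assumes an in-memory SQLite database driven from Python and stores the timestamps as text in the question's format, which happens to sort chronologically.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name an engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeseries_table ("
    "id INTEGER PRIMARY KEY, date_and_time TEXT, entry_category TEXT)"
)
conn.executemany(
    "INSERT INTO timeseries_table VALUES (?, ?, ?)",
    [
        (1, "2017/01/20 12:00", "type_1"),
        (2, "2017/01/20 13:00", "type_1"),
        (3, "2017/01/20 12:00", "type_2"),
        (4, "2017/01/20 12:00", "type_3"),
    ],
)

# Inner query: latest timestamp per category. Outer query: earliest of those.
query = """
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) AS date_and_time, entry_category
      FROM timeseries_table
      GROUP BY entry_category) AS a
"""
oldest_latest = conn.execute(query).fetchone()[0]
print(oldest_latest)  # 2017/01/20 12:00
```

The derived table needs the column alias (date_and_time) so that the outer MIN has a name to refer to.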

Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
TOP 1 returns a single row. If several categories tie for the earliest maximum, which one you get is arbitrary. To make the choice deterministic, include an additional sort key:
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;

Related

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the total value of the balance column for all days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for differences in result sets of the queries was on the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. Since you aggregate with a single condition, the FILTER condition can be converted to a basic WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
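The aggregate-then-window pattern can be tried on a few rows. This is a sketch under stated assumptions: it uses SQLite from Python rather than PostgreSQL, so date(time) stands in for DATE_TRUNC('DAY', time), the rows are made up for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (time TEXT, balance REAL, is_spent_column TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        # Hypothetical rows, not taken from the original dataset.
        ("2009-01-14 08:00:00", 10.0, None),
        ("2009-01-14 17:30:00", 5.0, None),
        ("2009-01-15 09:15:00", 2.5, "spent"),  # excluded by the WHERE clause
        ("2009-01-15 12:00:00", 7.5, None),
        ("2009-01-16 23:59:00", 1.0, None),
    ],
)

# Step 1 (inner query): one total per day.
# Step 2 (outer query): cumulative sum of those daily totals.
query = """
SELECT daily,
       SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
    SELECT date(time) AS daily,
           SUM(balance) AS total_balance
    FROM tbl
    WHERE is_spent_column IS NULL
    GROUP BY 1
) AS daily_agg
ORDER BY daily
"""
running = conn.execute(query).fetchall()
for day, total in running:
    print(day, total)
```

Each output row carries the sum of all unspent balances up to and including that day, which is the per-day running total the question asks for.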

SQL Finding maximum average time for distinct cell

I have a table with a large number of records, from which I am trying to find only the 10 numbers with the largest average time per number.
So the table may look like so:
number | time
012345 | 10s
012345 | 20s
055555 | 50s
055555 | 30s
068976 | 11s
etc...
and the output should look like so:
number | time
012345 | 15s
055555 | 40s
068976 | 11s
I tried this, but to no avail:
select distinct(destination), avg(totalqueuetime)
from call
group by destination, totalqueuetime
order by totalqueue time desc limit 10;
It does not seem to group the numbers.
Please try the following code, which has been tested and confirmed to work.
(If you wish to sort by average total queue time, as your code sample above suggests)
SELECT destination,
AVG( totalqueuetime ) AS avgTQT
FROM call
GROUP BY destination
ORDER BY avgTQT DESC LIMIT 10;
(If you wish to sort by destination, as your desired output sample above suggests)
SELECT destination,
AVG( totalqueuetime ) AS avgTQT
FROM call
GROUP BY destination
ORDER BY destination DESC LIMIT 10;
If you have any questions or comments, please feel free to post a comment.
Note: As for your supplied code, if you remove totalqueuetime from the GROUP BY clause you will not need DISTINCT. With totalqueuetime in the GROUP BY, every distinct pair of destination and totalqueuetime forms its own group, so the same destination can appear many times. Grouping by destination alone reduces the list to one row, with one average, per destination.
Your group by has two keys. It should only have one:
select destination, avg(totalqueuetime)
from call
group by destination
order by avg(totalqueuetime) desc
limit 10;
Notes on the use of distinct: select distinct is almost never needed with group by. In fact, in almost all cases you don't need select distinct at all, because you can use group by.
In addition, distinct is not a function. It applies to the entire row. So don't use parentheses around the first column, unless you want to confuse yourself.
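To see the single-key GROUP BY at work, here is a small sketch using the sample rows from the question. It assumes SQLite driven from Python, stores the queue times as plain seconds, and keeps destination as text so the leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call (destination TEXT, totalqueuetime INTEGER)")
conn.executemany(
    "INSERT INTO call VALUES (?, ?)",
    [
        ("012345", 10),
        ("012345", 20),
        ("055555", 50),
        ("055555", 30),
        ("068976", 11),
    ],
)

# One group per destination, sorted by the average, top 10 only.
query = """
SELECT destination,
       AVG(totalqueuetime) AS avgTQT
FROM call
GROUP BY destination
ORDER BY avgTQT DESC
LIMIT 10
"""
top = conn.execute(query).fetchall()
for destination, avg_time in top:
    print(destination, avg_time)
```

Adding totalqueuetime back into the GROUP BY would make every distinct (destination, totalqueuetime) pair its own group, which is why the original query appeared not to group at all.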

Group by two columns is possible?

I have this table:
ID Price Time
0 20,00 20/10/10
1 20,00 20/10/10
2 20,00 12/12/10
3 14,00 23/01/12
4 87,00 30/07/14
4 20,00 30/07/14
I use this SQL to get the list of all prices without repeated values:
SELECT * FROM myTable WHERE id in (select min(id) from %# group by Price)
This code returns the values (20,14,87,20).
But now I would like to add another check, so that rows are distinguished not only by price but also by date. The current syntax de-duplicates by price only; if I find a way to check the date as well, the code will return the values (20,20,14,87,20).
The value 20 is then repeated, but the table has three rows with price 20 (two with the date 20/10/10 and one with the date 12/12/10), and that is exactly what I want to get!
Could somebody help me?
To group by multiple columns, just put a comma in between the list.
SELECT price FROM myTable group by price, time order by time
GROUP BY looks at all distinct combinations of the listed columns' values and discards duplicates. You can also use aggregate functions like sum or max to pull additional columns into the results.
The following should work as long as all you need is the price/time combination. If you need to include the ID, things get more complicated:
SELECT `Price` FROM items
GROUP BY `Price`, `Time`
ORDER BY `Time`;
Here's a fiddle with the result in action: http://sqlfiddle.com/#!2/40821/1
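The effect of grouping on both columns can be checked with the rows from the question. This is a sketch with a few assumptions: it runs on SQLite from Python, the table is called items as in the answer above, prices use a decimal point, and the dates are rewritten in ISO form (YYYY-MM-DD) so that ORDER BY sorts them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, price REAL, time TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (0, 20.0, "2010-10-20"),
        (1, 20.0, "2010-10-20"),  # same price and date as id 0: collapses
        (2, 20.0, "2010-12-12"),
        (3, 14.0, "2012-01-23"),
        (4, 87.0, "2014-07-30"),
        (4, 20.0, "2014-07-30"),  # id 4 appears twice in the question's table
    ],
)

# One output row per distinct (price, time) combination.
query = """
SELECT price
FROM items
GROUP BY price, time
ORDER BY time
"""
prices = [row[0] for row in conn.execute(query)]
print(prices)
```

Six input rows become five output rows, because only the two rows sharing both price 20 and the date 2010-10-20 are merged. The last two rows share a date, so their relative order is not guaranteed without a second sort key.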

Understanding a Correlated Subquery

I want to create a query that returns the most recent date of a date field and the highest value of an integer field for each "assessment" record. What I think is required is a correlated subquery using the MAX function.
Example data would be as follows.
The date field can have duplicate dates for each assessment, but within each group of duplicate dates every row has a different integer in the integer field.
E.g.
1256 2/6/14 0
1256 2/6/14 1
1256 1/6/14 0
4534 3/6/14 0
4534 3/6/14 1
4534 3/6/14 2
select assessment, Max(correctnum) maxofcorrectnum, dateeffect
from lraassm outerassm
where dateeffect =
(select MAX(dateeffect) maxofdateeffect
from pthdbo.lraassm innerassm
where innerassm.assessment = outerassm.assessment
group by innerassm.assessment)
group by assessment, dateeffect
My theory is that the inner query executes and gives the outer query the criterion for its dateeffect field, and the outer query then returns the maximum of the correctnum field for this dateeffect, along with the corresponding assessment and dateeffect.
Could someone please confirm that this is correct? How does the subquery handle the rows? What other ways are there to solve this problem? Thanks.
Your query is doing the right thing, but granted, the correlated subquery is a little difficult to understand. What the subquery does is, it filters the records based on assessment from the outer query and then returns the maximum dateeffect for that assessment. In fact, you don't need the group by clause on the correlated query.
These types of queries are very common when working with data in ERP systems, where you're only interested in the "latest" records, etc. This is also known as a "top segment" type of query (which the query optimizer is sometimes able to figure out by itself). I've found that on SQL Server 2005 or newer it is a lot easier to use the ROW_NUMBER() function. The following query should return the same as yours, namely one record from lraassm for each assessment, the one with the highest value of dateeffect and correctnum.
select * from (
select
assessment, dateeffect, correctnum,
ROW_NUMBER() OVER (
PARTITION BY assessment
ORDER BY dateeffect DESC, correctnum DESC
) AS segment
from lraassm) AS innerQuery
where segment = 1
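A minimal sketch of the ROW_NUMBER() approach on the question's sample rows follows. It assumes SQLite 3.25 or newer driven from Python instead of SQL Server, and it rewrites the dates in ISO form (reading 2/6/14 as 2 June 2014, which is an assumption) so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lraassm (assessment INTEGER, dateeffect TEXT, correctnum INTEGER)"
)
conn.executemany(
    "INSERT INTO lraassm VALUES (?, ?, ?)",
    [
        (1256, "2014-06-02", 0),
        (1256, "2014-06-02", 1),
        (1256, "2014-06-01", 0),
        (4534, "2014-06-03", 0),
        (4534, "2014-06-03", 1),
        (4534, "2014-06-03", 2),
    ],
)

# Number the rows of each assessment, newest date first and, within a date,
# highest correctnum first. Row 1 of each partition is the one we want.
query = """
SELECT assessment, dateeffect, correctnum
FROM (
    SELECT assessment, dateeffect, correctnum,
           ROW_NUMBER() OVER (
               PARTITION BY assessment
               ORDER BY dateeffect DESC, correctnum DESC
           ) AS segment
    FROM lraassm
) AS innerQuery
WHERE segment = 1
ORDER BY assessment
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Unlike the correlated subquery, this form reads the table once and never compares dates across queries, so the ordering rules live in a single place.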
This is the query I worked out using my own tables. It should get you on the right track, and you should be able to substitute your fields and tables.
Select * from Decode
where updated_time = (Select MAX(updated_time) from DECODE)
That query gives you every record that has the most recent updated_time. The next query returns the greatest entry_id value as well as the most recent updated_time from those records:
Select MAX(entry_id), updated_time from Decode
where updated_time = (Select MAX(updated_time) from DECODE)
group by updated_time
The result is one record with two columns: the first column is the maximum value of entry_id, the second is the most recent updated_time. Is that what you wanted to return?

Group by in t-sql not displaying single result

See the image below. I have a table, tbl_AccountTransaction, with 10 rows. It is the lowermost table, with the columns AccountTransactionId, AgreementId and so on. What I want is a single row that is the sum of all amounts for the agreement id. Here I have agreement id = 23, but when I run my query it gives me two rows instead of a single row, since there is a difference of a few milliseconds between the times of insertion.
So I need a way to get the row 1550 | 23 | 2011-03-21
Update
I have updated my query to this:
SELECT Sum(Amount) as Amount,AgreementID, StatementDate
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
but still getting the same error
Msg 8120, Level 16, State 1, Line 1
Column 'tbl_AccountTranscation.StatementDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Your group by clause is in error
group by agreementid, convert(date,statementdate,101)
This makes it group by the date (without time) of the statementdate column, whereas the original groups by statementdate (including time) and only then, for each row of the output, strips the time information.
To be clear, you weren't supposed to change the SELECT clause:
SELECT Sum(Amount) as Amount,AgreementID, Convert(date,StatementDate,101)
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
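The difference between grouping on the full timestamp and grouping on the date alone can be reproduced in a small sketch. Several things here are assumptions for illustration: it runs on SQLite from Python, so date(StatementDate) stands in for Convert(date, StatementDate, 101), and the two amounts are made up so that they add up to the 1550 mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_AccountTranscation ("
    "AgreementID INTEGER, Amount INTEGER, StatementDate TEXT, TranscationDate TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_AccountTranscation VALUES (?, ?, ?, ?)",
    [
        # Timestamps from the question; amounts are hypothetical.
        (23, 1000, "2011-03-21 14:38:59.470", None),
        (23, 550, "2011-03-21 14:38:59.487", None),
    ],
)

# Grouping on the raw timestamp keeps the two rows apart.
by_timestamp = conn.execute(
    "SELECT SUM(Amount), AgreementID, StatementDate "
    "FROM tbl_AccountTranscation WHERE TranscationDate IS NULL "
    "GROUP BY AgreementID, StatementDate"
).fetchall()

# Grouping on the date part merges them; the SELECT list uses the
# same expression as the GROUP BY, which is what Msg 8120 asks for.
by_date = conn.execute(
    "SELECT SUM(Amount) AS Amount, AgreementID, date(StatementDate) "
    "FROM tbl_AccountTranscation WHERE TranscationDate IS NULL "
    "GROUP BY AgreementID, date(StatementDate)"
).fetchall()

print(len(by_timestamp), by_date)
```

The rule of thumb is that every non-aggregated expression in the SELECT list must appear, in the same form, in the GROUP BY clause.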
Because you have a Group By StatementDate.
In your example you have 2 StatementDates:
2011-03-21 14:38:59.470
2011-03-21 14:38:59.487
In the GROUP BY section of your query, replace StatementDate with:
Convert(Date, StatementDate, 101)
Have you tried to
Group by (Convert(date,...)
instead of the StatementDate
You are close. You need to combine your two approaches. This should do it:
SELECT Sum(Amount) as Amount,AgreementID, Convert(date,StatementDate,101)
FROM tbl_AccountTranscation
Where TranscationDate is null
GROUP BY AgreementID,Convert(date,StatementDate,101)
If you never need the time, then perhaps you should change the datatype, so you don't have to do a lot of unnecessary converting in most queries. SQL Server 2008 has a date datatype that doesn't include the time. In earlier versions you could add an additional date column that is automatically generated to strip out the time component, so that all the dates look like '2011-01-01 00:00:00.000'. You can then do date comparisons directly, having done the conversion only once. This would allow you to have both the actual datetime and just the date.
You should group by DATEPART(..., StatementDate)
Ref: http://msdn.microsoft.com/en-us/library/ms174420.aspx