Select first record with aggregate functions - sql

I have a thermometer that starts logging data every morning whenever my machine turns on.
I would like to select the min, max, and average temperatures, as well as the temperatures when the machine turns on and off for every day.
My table structure is as follows:
Time Logged, Date Logged, Temperature
I group by Date Logged to get the aggregates for the day, but I can't seem to find a good way to select the temperature at the first and last time stamps recorded.
Any help?

You want to use a window function like this:
select t.DateLogged, min(t.Temperature), max(t.Temperature), avg(t.Temperature),
max(case when t.seqnum_asc = 1 then t.Temperature end) as FirstTemperature,
max(case when t.seqnum_desc = 1 then t.Temperature end) as LastTemperature
from (select t.*,
row_number() over (partition by dateLogged order by timeLogged) as seqnum_asc,
row_number() over (partition by dateLogged order by timeLogged desc) as seqnum_desc
from t
) t
group by t.DateLogged
order by DateLogged
This adds two new columns. One enumerates the readings for each day, starting at 1 with the first reading (seqnum_asc). The other starts at 1 with the last reading (seqnum_desc).
To get the values, conditional aggregation is used: the case expression returns the temperature only for the row numbered 1, and max() picks up that single non-null value.
If you like, you can do pretty much the same thing using min() and max() as window functions, rather than row_number():
select t.DateLogged, min(t.Temperature), max(t.Temperature), avg(t.Temperature),
max(case when timeLogged = mintime then t.Temperature end) as FirstTemperature,
max(case when timeLogged = maxtime then t.Temperature end) as LastTemperature
from (select t.*,
 min(timeLogged) over (partition by dateLogged) as minTime,
max(timeLogged) over (partition by dateLogged) as maxTime
from t
) t
group by t.DateLogged
order by DateLogged
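For anyone who wants to try the row_number() version, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample readings are made up for illustration, and it assumes the SQLite bundled with your Python is 3.25 or newer (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (DateLogged text, TimeLogged text, Temperature real)")

# Hypothetical readings: three on the first day, two on the second.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2024-01-01", "08:00", 18.0),
        ("2024-01-01", "12:00", 24.0),
        ("2024-01-01", "17:00", 21.0),
        ("2024-01-02", "09:00", 15.0),
        ("2024-01-02", "16:00", 19.0),
    ],
)

query = """
select DateLogged,
       min(Temperature), max(Temperature), avg(Temperature),
       max(case when seqnum_asc = 1 then Temperature end) as FirstTemperature,
       max(case when seqnum_desc = 1 then Temperature end) as LastTemperature
from (select t.*,
             row_number() over (partition by DateLogged order by TimeLogged) as seqnum_asc,
             row_number() over (partition by DateLogged order by TimeLogged desc) as seqnum_desc
      from t
     ) t
group by DateLogged
order by DateLogged
"""

# One row per day: date, min, max, avg, first reading, last reading.
daily_rows = conn.execute(query).fetchall()
for row in daily_rows:
    print(row)
```

On the first sample day the maximum (24.0) is neither the first nor the last reading, which makes it easy to see that FirstTemperature and LastTemperature come from the time ordering and not from the values.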

Related

How can I reset the count to 0 in SQL when a condition is false?

I have a SQL table with the data shown in the picture.
I need to write a query that counts, for each ticker, the number of consecutive days per year in which close_value is greater than open_value.
If close_value is less than open_value, the counter must be reset to zero, and I need to save the counter's value at that point.
This is an example of a gaps-and-islands problem. You can use the difference of two row_number() sequences:
select ticker, min(date), max(date), min(open_value), max(close_value),
count(*) as num_rows
from (select t.*,
row_number() over (partition by ticker order by date) as seqnum,
row_number() over (partition by ticker, (case when close_value > open_value then 1 else 2 end) order by date) as seqnum_2
from t
) t
where close_value > open_value
group by ticker, (seqnum - seqnum_2);
This returns all such periods. You haven't specified what the result set should look like, but this should be pretty close.
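To see why the difference of the two row numbers identifies each run, here is a small sketch using SQLite from Python. The prices are invented, the column list is trimmed to the essentials, and window-function support (SQLite 3.25+) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ticker text, date text, open_value real, close_value real)")

# Hypothetical data: two up days, one down day, then three up days.
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("ABC", "2024-01-01", 10, 11),
        ("ABC", "2024-01-02", 11, 12),
        ("ABC", "2024-01-03", 12, 11),
        ("ABC", "2024-01-04", 11, 13),
        ("ABC", "2024-01-05", 13, 14),
        ("ABC", "2024-01-06", 14, 15),
    ],
)

query = """
select ticker, min(date), max(date), count(*) as num_rows
from (select t.*,
             row_number() over (partition by ticker order by date) as seqnum,
             row_number() over (partition by ticker,
                                (case when close_value > open_value then 1 else 2 end)
                                order by date) as seqnum_2
      from t
     ) t
where close_value > open_value
group by ticker, (seqnum - seqnum_2)
order by ticker, min(date)
"""

# Within a run of up days both counters advance together, so their
# difference stays constant; a down day bumps only seqnum.
streaks = conn.execute(query).fetchall()
for row in streaks:
    print(row)
```

Each output row is one streak of consecutive up days, with its start date, end date, and length.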

How to select specific rows in a "group by" groups using conditions on multiple columns?

I have the following table with many userId (in the example only one userId for demo purpose):
For every userId I want to extract two rows:
The first row should have isTransaction = 0 and the earliest date.
The second row should have isTransaction = 1, a device different from that of the first row, and the earliest date after that of the first row.
That is, the output should be:
Time userId device isTransaction
2021-01-27 10187675 mobile 0
2021-01-30 10187675 web 1
I tried to rank rows with partitioning and ordering but it didn't work:
Select * from
(SELECT *, rank() over(partition by userId, device, isTransaction order by isTransaction, Time) as rnk
FROM table 1)
where rnk=1
order by Time
Please help! It would also be good to check that the time difference between these two rows does not exceed 30 days. Otherwise, the userId should be dropped.
You can first identify the earliest time for 0. Then enumerate the rows and take only the first one:
select t.*
from (select t.*,
row_number() over (partition by userid, isTransaction order by time) as seqnum
from (select t.*,
min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
from t
) t
where time >= time_0
) t
where seqnum = 1;
This handles the isTransaction and date-ordering conditions you enumerated; note that it does not compare the device between the two rows.
Then, buried in the text, you want to eliminate rows where the difference is greater than 30 days. That is a little trickier . . . but not too hard:
select t.*
from (select t.*,
min(case when isTransaction = 1 then time end) over (partition by userid) as time_1,
row_number() over (partition by userid, isTransaction order by time) as seqnum
from (select t.*,
min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
from t
) t
where time >= time_0
) t
where seqnum = 1 and
time_1 < timestamp_add(time_0, interval 30 day);
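Here is a runnable sketch of the first query (without the 30-day check) on SQLite from Python. It rests on two assumptions that are mine: the rows are partitioned by isTransaction (the table has no status column), and the filter is time >= time_0 so that the earliest isTransaction = 0 row itself is kept. Apart from the two rows shown in the question, the sample rows are made up. As above, it does not check the device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (time text, userId integer, device text, isTransaction integer)")

conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("2021-01-25", 10187675, "web", 1),     # hypothetical: transaction before any 0 row
        ("2021-01-27", 10187675, "mobile", 0),  # from the question's expected output
        ("2021-01-28", 10187675, "mobile", 0),  # hypothetical
        ("2021-01-30", 10187675, "web", 1),     # from the question's expected output
        ("2021-02-02", 10187675, "web", 1),     # hypothetical
    ],
)

query = """
select time, userId, device, isTransaction
from (select t.*,
             row_number() over (partition by userId, isTransaction order by time) as seqnum
      from (select t.*,
                   min(case when isTransaction = 0 then time end)
                       over (partition by userId order by time) as time_0
            from t
           ) t
      where time >= time_0  -- assumption: keep the first isTransaction = 0 row
     ) t
where seqnum = 1
order by time
"""

# time_0 is null until the first isTransaction = 0 row, so earlier rows drop out.
first_pairs = conn.execute(query).fetchall()
for row in first_pairs:
    print(row)
```

The transaction on 2021-01-25 is discarded because its running time_0 is still null, so the comparison in the where clause is not true for it.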

MariaDB get first and last record of the month - nested query

I am using MariaDB and I have this kind of data:
I have also data for March and I am using this query to select distinct Months from the database:
select distinct(DATE_FORMAT(DT,'%m-%Y')) AS singleMonth FROM myTable
I want to select the FIRST and LAST record of the P2 column for every month. How can I do that, using the query above to get all distinct months and also the first and last record for each month?
Example of what the query should return:
You can use window functions and conditional aggregation:
select year(dt), month(dt),
min(case when seqnum_asc = 1 then p2 end) as first_p2,
min(case when seqnum_desc = 1 then p2 end) as last_p2
from (select t.*,
row_number() over (partition by year(dt), month(dt) order by dt asc) as seqnum_asc,
row_number() over (partition by year(dt), month(dt) order by dt desc) as seqnum_desc
from t
) t
group by year(dt), month(dt);
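A quick way to check the shape of the result is to run the same pattern on SQLite from Python. This is only a sketch: SQLite has no year() or month(), so strftime('%Y-%m', dt) stands in for them here, and the sample values are invented. In MariaDB you would keep year(dt) and month(dt) as written above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (dt text, p2 integer)")

# Hypothetical readings for February and March.
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2021-02-01 00:00:00", 10),
        ("2021-02-15 00:00:00", 12),
        ("2021-02-28 00:00:00", 14),
        ("2021-03-01 00:00:00", 20),
        ("2021-03-20 00:00:00", 18),
    ],
)

query = """
select strftime('%Y-%m', dt) as single_month,
       min(case when seqnum_asc = 1 then p2 end) as first_p2,
       min(case when seqnum_desc = 1 then p2 end) as last_p2
from (select t.*,
             row_number() over (partition by strftime('%Y-%m', dt) order by dt asc) as seqnum_asc,
             row_number() over (partition by strftime('%Y-%m', dt) order by dt desc) as seqnum_desc
      from t
     ) t
group by strftime('%Y-%m', dt)
order by 1
"""

# One row per month: month label, first p2, last p2.
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

In the March sample the last value (18) is lower than the first (20), which shows that the columns follow the dt ordering and not the size of p2.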

Determine MIN Date from Consecutive Occurrences

I have a table that contains the following columns: Date, Customer, Active Flag. I need to add a fourth column called Start. The Start column should return the first date the client was active, based on consecutive active flags.
shows the three columns I currently have and the results I wish to return for the Start column.
Your insight into what my SQL code should look like to achieve this would be appreciated. Thanks!!
You can do this without subqueries, if I assume one date per month per customer:
select t.*,
(case when activeflag = 1
then coalesce(max(case when activeflag = 0 then date end) over (partition by customer order by date) + interval '1 month',
min(case when activeflag = 1 then date end) over (partition by customer)
)
end) as start
from t;
Subqueries, though, might make this easier. You can treat this as a gaps-and-islands problem:
select t.*,
(case when activeflag = 1
then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
end) as start
from (select t.*,
row_number() over (partition by customer order by date) as seqnum,
row_number() over (partition by customer, activeflag order by date) as seqnum_a
from t
) t;
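The gaps-and-islands version can be tried out with SQLite from Python, as sketched below. The data is made up, the column is assumed to be called customer, and activeflag is included in the outer partition by so that an inactive row can never share a group with an active row that happens to have the same seqnum - seqnum_a difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, customer text, activeflag integer)")

# Hypothetical monthly rows: active, inactive, active, active, inactive, active.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2021-01-01", "A", 1),
        ("2021-02-01", "A", 0),
        ("2021-03-01", "A", 1),
        ("2021-04-01", "A", 1),
        ("2021-05-01", "A", 0),
        ("2021-06-01", "A", 1),
    ],
)

query = """
select date, activeflag,
       (case when activeflag = 1
             then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
        end) as start
from (select t.*,
             row_number() over (partition by customer order by date) as seqnum,
             row_number() over (partition by customer, activeflag order by date) as seqnum_a
      from t
     ) t
order by date
"""

# start is the first date of the current active streak, or null when inactive.
start_rows = conn.execute(query).fetchall()
for row in start_rows:
    print(row)
```

In this sample the inactive row for 2021-02-01 and the active rows for March and April all have a difference of 1, so without activeflag in the partition the March and April rows would report 2021-02-01 as their start.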

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
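Here is a small sketch of the row_number() query running on SQLite from Python, with invented addresses and an assumed table layout of (date, email). The sample deliberately repeats one email on the same date, which is the case mentioned above; row_number() still marks exactly one of those rows as the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, email text)")

# Hypothetical sign-ups; c@example.com appears twice on the same day.
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2021-01-01", "a@example.com"),
        ("2021-01-01", "b@example.com"),
        ("2021-01-02", "a@example.com"),
        ("2021-01-02", "c@example.com"),
        ("2021-01-02", "c@example.com"),
        ("2021-01-03", "b@example.com"),
    ],
)

query = """
select date, count(*),
       sum(count(*)) over (order by date),
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
             row_number() over (partition by email order by date) as seqnum
      from t
     ) t
group by date
order by date
"""

# Columns: date, rows that day, running row count, running distinct emails.
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Only the row with seqnum = 1 for each email contributes to the last column, so repeated emails are counted once, on the date of their first appearance.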
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag() to check whether the email had a row with a previous date, and if so set the flag to 0 so that it is ignored in the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
Hence it may be simpler to read and maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1