SQL statement: only the latest entry of the day - sql

It seems it has been too long since I last needed to create my own SQL statements. I have a table (GAS_COUNTER) with timestamps (TS) and values (VALUE).
There are hundreds of entries per day, but I only need the latest one of each day. I have tried different ways but never get what I need.
Edit
Thanks for the fast replies, but some do not meet my needs (I need the latest value of each day in the table) and some don't work. My best attempt so far was:
select distinct (COUNT)
from
    (select
        extract (DAY_OF_YEAR from TS) as COUNT,
        extract (YEAR from TS) as YEAR,
        extract (MONTH from TS) as MONTH,
        extract (DAY from TS) as DAY,
        VALUE as VALUE
    from GAS_COUNTER
    order by COUNT)
but the value is missing. If I put it in the outer select, all rows are returned (logically correct, as every line is then distinct).
Here is an example of the table content:
TS VALUE
2015-07-25 08:47:12.663 0.0
2015-07-25 22:50:52.155 2.269999999552965
2015-08-10 11:18:07.667 52.81999999284744
2015-08-10 20:29:20.875 53.27999997138977
2015-08-11 10:27:21.49 54.439999997615814
2nd Edit and solution
select TS, VALUE from GAS_COUNTER
where TS in (
    select max(TS) from GAS_COUNTER group by extract(DAY_OF_YEAR from TS)
)
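One caveat worth noting: grouping by DAY_OF_YEAR alone folds the same calendar day of different years together, so this only stays correct while the table spans a single year. A minimal sketch of a year-safe variant, assuming the database accepts a cast from timestamp to date:

select TS, VALUE from GAS_COUNTER
where TS in (
    -- group by the full date instead of the day-of-year number,
    -- so 2015-07-25 and 2016-07-25 stay separate
    select max(TS) from GAS_COUNTER group by cast(TS as date)
)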

This one would give you the very last record:
select top 1 * from GAS_COUNTER order by TS desc
Here is one that would give you the last record for every day:
select VALUE from GAS_COUNTER
where TS in (
    select max(TS) from GAS_COUNTER group by to_date(TS, 'yyyy-mm-dd')
)
Depending on the database you are using, you might need to replace or adjust the to_date(TS, 'yyyy-mm-dd') function. Basically, it should extract the date-only part from the timestamp.
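For reference, a hedged sketch of the usual date-only expressions per dialect (these are the common spellings, not tested against this table; adjust to your platform):

-- SQL Server:  group by cast(TS as date)
-- Oracle:      group by trunc(TS)
-- PostgreSQL:  group by TS::date        -- or date_trunc('day', TS)
-- MySQL:       group by date(TS)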

Select the max value for the timestamp.
select MAX(TS), VALUE -- note: this gives the latest TS per distinct VALUE, not per day
from GAS_COUNTER
group by VALUE

Something like this would window the data and give you the last value on the day - but what happens if you get two TS the same? Which one do you want?
select *
from ( select distinct cast( TS as date ) as dt
       from GAS_COUNTER ) as gc1 -- distinct days
cross apply (
       select top 1 VALUE -- last value on the date
       from GAS_COUNTER as gc2
       where gc2.TS < dateadd( day, 1, gc1.dt )
         and gc2.TS >= gc1.dt
       order by gc2.TS desc
     ) as x
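If ties on TS are a concern, a window-function sketch (assuming SQL Server-style syntax, matching the cross apply above) makes the tie-break explicit by adding a second ORDER BY key:

select TS, VALUE
from (
    select TS, VALUE,
           row_number() over (partition by cast(TS as date)
                              order by TS desc, VALUE desc) as rn -- VALUE desc is an arbitrary tie-breaker
    from GAS_COUNTER
) as ranked
where rn = 1;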

Related

How do I select data every second with PostgreSQL?

I've got a SQL query that selects all the data between two dates, and now I would like to add a time scale factor so that instead of returning all the data it returns one row every second, minute, or hour.
Do you know how I can achieve this?
My query :
"SELECT received_on, $1 FROM $2 WHERE $3 <= received_on AND received_on <= $4", [data_selected, table_name, date_1, date_2]
The table input:
As you can see, there are several rows within the same second; I would like to select only one per second.
If you want to select one row per second, you may use the ROW_NUMBER() function partitioned by received_on, as follows:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY received_on ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
If you want to select data every minute or hour, you may use the extract function to get received_on as a number of epoch seconds, and divide it by 60 to get minutes or by 3600 to get hours.
epoch: For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values, the total number of seconds in the interval
Group by minutes:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from received_on) / 60) ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
Group by hours:
WITH DateGroups AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from received_on) / (60*60)) ORDER BY adc_v) AS rn
    FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn = 1
ORDER BY received_on
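Since this is PostgreSQL, DISTINCT ON offers a more idiomatic way to express the rn = 1 filter; a minimal sketch, assuming the same table and columns as above:

SELECT DISTINCT ON (received_on)
       received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM table_name
ORDER BY received_on, adc_v; -- one row per second; the lowest adc_v wins each tie

For the minute or hour variants, put the same floor(extract(epoch from received_on) / 60) expression in both the DISTINCT ON list and the start of the ORDER BY.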
When there are several rows per second, and you only want one result row per second, you can decide to pick one of the rows for each second. This can be a randomly chosen row, or you can pick the row with the greatest or least value in a column, as shown in Ahmed's answer.
It would be more typical, though, to aggregate your data per second. The columns show figures, and you are interested in those figures. Your sample data shows the value 2509 twice and the value 2510 three times for the adc_v column at 2022-07-29 15:52. Consider what you would like to see. Maybe you don't want this value to go below some boundary, so you show the minimum value MIN(adc_v) to see how low it went within the second. Or you want to see the value that occurred most often in the second (in PostgreSQL: mode() WITHIN GROUP (ORDER BY adc_v)). Or you'd like to see the average value AVG(adc_v). Make this decision for every value, so as to get the information most vital to you.
select
    received_on,
    min(adc_v),
    avg(adc_i),
    ...
from mytable
group by received_on
order by received_on;
If you want this for another interval, say an hour instead of the second, truncate your received_on column accordingly. E.g.:
select
    date_trunc('hour', received_on) as received_hour,
    min(adc_v),
    avg(adc_i),
    ...
from mytable
group by date_trunc('hour', received_on)
order by date_trunc('hour', received_on);

Group by for each row in bigquery

I have a table that stores user comments for each month. Comments are stored using UTC timestamps. I want to get the users that post more than 20 comments per day. I am able to get the start and end timestamps for each day, but I can't group the comments table by number of comments.
This is the script that I have for getting dates, timestamps and distinct users.
SELECT
    DATE(TIMESTAMP_SECONDS(r.ts_start)) AS date,
    r.ts_start AS timestamp_start,
    r.ts_start + 86400 AS timestamp_end,
    COUNT(*) AS number_of_comments,
    COUNT(DISTINCT s.author) AS distinct_authors
FROM ((
    WITH shifts AS (
        SELECT [STRUCT(" 00:00:00 UTC" AS hrs,
                GENERATE_DATE_ARRAY('2018-07-01', '2018-07-31', INTERVAL 1 DAY) AS dt_range)] AS full_timestamps )
    SELECT
        UNIX_SECONDS(CAST(CONCAT(CAST(dt AS STRING), CAST(hrs AS STRING)) AS TIMESTAMP)) AS ts_start,
        UNIX_SECONDS(CAST(CONCAT(CAST(dt AS STRING), CAST(hrs AS STRING)) AS TIMESTAMP)) + 86400 AS ts_end
    FROM shifts,
         shifts.full_timestamps
    LEFT JOIN full_timestamps.dt_range AS dt)) r
INNER JOIN `user_comments.2018_07` s
    ON (s.created_utc BETWEEN r.ts_start AND r.ts_end)
GROUP BY r.ts_start
ORDER BY number_of_comments DESC
And this is the sample output:
The user_comments.2018_07 table looks like the following:
More concretely, I want the first output to have one more column showing the number of authors that have more than 20 comments on that date. How can I do that?
If the goal is only to get the number of users with more than twenty comments for each day from table user_comments.2018_07, and add it to the output you have so far, this should simplify the query you first used, so long as you're not attached to keeping the min/max timestamps for each day.
with nb_comms_per_day_per_user as (
    SELECT
        day,
        author,
        COUNT(*) as nb_comments
    FROM
        # unnest as we don't really want an array
        unnest(GENERATE_DATE_ARRAY('2018-07-01', '2018-07-31', INTERVAL 1 DAY)) AS day
    INNER JOIN `user_comments.2018_07` c
        # directly convert the timestamp to a date, without using min/max timestamps
        on date(timestamp_seconds(created_utc)) = day
    GROUP BY day, c.author
)
SELECT
    day,
    sum(nb_comments) as total_comments,
    count(*) as distinct_authors, # we have already grouped by author
    # sum + if makes it possible to count "very active" users
    sum(if(nb_comments > 20, 1, 0)) as very_active_users
FROM nb_comms_per_day_per_user
GROUP BY day
ORDER BY total_comments desc
Also, I assumed the comment column containing booleans is not used, since you do not reference it in your initial query?
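As a side note, BigQuery's COUNTIF can replace the sum + if pattern; a sketch of the outer query only, reusing the same CTE:

SELECT
    day,
    sum(nb_comments) as total_comments,
    count(*) as distinct_authors,
    countif(nb_comments > 20) as very_active_users  # counts rows where the condition holds
FROM nb_comms_per_day_per_user
GROUP BY day
ORDER BY total_comments desc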

Delete duplicated rows

I've got duplicated rows in a temp table, mainly because some date values are seconds/milliseconds apart from each other.
For example:
2018-08-30 12:30:19.000
2018-08-30 12:30:20.000
This is what causes the duplication.
How can I keep only one of those values? Let's say the higher one?
Thank you.
Well, one method is to use lead():
select t.*
from (select t.*, lead(ts) over (order by ts) as next_ts
      from t
     ) t
where next_ts is null or
      datediff(second, ts, next_ts) >= 60; -- keep a row only if the next one is at least the threshold away; adjust as needed
You could assign a Row_Number to each value, as follows:
Select *
     , Row_Number() over
         (partition by ObjectID, cast(date as date)... -- whichever criteria you want to consider duplicates
          order by date desc) -- assigns the latest date to row 1; you may want other order criteria if ties are possible on this field
       as RN
from MyTable
Then retain only the rows where RN = 1 to remove duplicates. You can round your dates to the nearest hour, minute, etc. as needed; I used truncating to the day above as an example.
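To actually delete the duplicates rather than just select the survivors, a common pattern (a sketch, assuming SQL Server, where a single-table CTE is a valid DELETE target, and the same hypothetical ObjectID/date columns as above) is:

with Ranked as (
    select *,
           Row_Number() over (partition by ObjectID, cast(date as date)
                              order by date desc) as RN
    from MyTable
)
delete from Ranked
where RN > 1; -- keeps row 1 (the latest date) in each group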

Time difference between two rows for specified ID

I'm trying to find the time difference in seconds between two rows that have the same ID.
Here's a simple table.
The table is ordered by myid and timestamp. I'm trying to get the total seconds between two rows that have the same myid.
Here's what I have come up with. The only problem with this query is that it calculates the time difference across all records rather than within the same ID.
SELECT DATEDIFF(second, pTimeStamp, TimeStamp), q.*
FROM (
    SELECT *,
           LAG(TimeStamp) OVER (ORDER BY TimeStamp) pTimeStamp
    FROM data
) q
WHERE pTimeStamp IS NOT NULL
This is the output.
I only want the output highlighted in yellow.
Any suggestions?
The fix is simply a matter of narrowing the window, with PARTITION BY, to rows with the same ID:
SELECT DATEDIFF(second, pTimeStamp, TimeStamp), q.*
FROM (
    SELECT *,
           LAG(TimeStamp) OVER (PARTITION BY ID ORDER BY TimeStamp) pTimeStamp
    FROM data
) q
WHERE pTimeStamp IS NOT NULL
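If what you ultimately need is the total seconds per ID rather than each individual gap, the same window query can feed an aggregate; a sketch under the same assumed schema:

SELECT ID, SUM(DATEDIFF(second, pTimeStamp, TimeStamp)) AS total_seconds
FROM (
    SELECT ID, TimeStamp,
           LAG(TimeStamp) OVER (PARTITION BY ID ORDER BY TimeStamp) pTimeStamp
    FROM data
) q
WHERE pTimeStamp IS NOT NULL
GROUP BY ID;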

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help, please.
In an SQL SELECT statement I am trying to get the last day with data per month for the last year.
For example, I am easily able to get the last day of each month and join that to my data table, but the problem is that if the last day of the month does not have data, then no data is returned. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain is starting to hurt.
I've attached the select below, which works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id, fd.server_name, fd.instance_name,
       TRUNC(fd.coll_date) AS coll_date, fd.column_name
FROM super_table fd,
     (SELECT TRUNC(daterange, 'MM') - 1 AS first_of_month
      FROM (
          SELECT TRUNC(sysdate - 365, 'MM') + level AS daterange
          FROM dual
          CONNECT BY level <= 365)
      GROUP BY TRUNC(daterange, 'MM')) fom
WHERE fd.cust_id = :CUST_ID
  AND fd.coll_date > SYSDATE - 400
  AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id, fd.server_name, fd.instance_name,
         TRUNC(fd.coll_date), fd.column_name
ORDER BY fd.server_name, fd.instance_name, TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in one group, and then within each group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have those functions in Oracle, then you need some equivalent manipulation to get the same result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
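In Oracle specifically, TRUNC with the 'MM' format does the same monthly bucketing without a string conversion; a small variant:

SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table
GROUP BY TRUNC(coll_date, 'MM'); -- one group per first-of-month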
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate, rn) as (
    select
        cast(coll_date as date),
        row_number() over (
            partition by datediff(month, coll_date, '2000-01-01') -- rewrite datediff as needed for your platform
            order by coll_date desc
        )
    from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be chosen nondeterministically from among the rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so that a value of 1 appears for every row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
    select
        cust_id, server_name, coll_date,
        rank() over (
            partition by datediff(month, coll_date, '2000-01-01')
            order by cast(coll_date as date) desc
        )
    from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
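Since the question mentions SQL*Plus, here's a hedged Oracle rendering of the same rank() idea, swapping the datediff month bucket for TRUNC(coll_date, 'MM'):

with RevDayRanked as (
    select cust_id, server_name, coll_date,
           rank() over (
               partition by trunc(coll_date, 'MM')  -- one bucket per calendar month
               order by trunc(coll_date) desc       -- latest active day ranks first
           ) as rk
    from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;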
If row_number() and rank() aren't supported, another approach (for the rank() query above) is to select all rows from your table for which there's no row in the table from a later day in the same month:
select
    cust_id, server_name, coll_date
from super_table as ST1
where not exists (
    select *
    from super_table as ST2
    where datediff(month, ST1.coll_date, ST2.coll_date) = 0
      and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month, '2001-01-01', coll_date). That'll make more of the predicates sargable.
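For instance, a sketch in SQL Server syntax, since that is where datediff as written is available (Oracle would use a function-based index instead):

alter table super_table
    add coll_day as cast(coll_date as date),  -- computed date-only column
        coll_month as datediff(month, convert(date, '20010101', 112), coll_date);
        -- convert with an explicit style keeps the expression deterministic, which indexed computed columns require

create index IX_super_table_month_day
    on super_table (coll_month, coll_day);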
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
       fd.server_name,
       fd.instance_name,
       TRUNC(fd.coll_date) AS coll_date,
       fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
  AND TRUNC(fd.coll_date) IN (
      SELECT MAX(TRUNC(coll_date))
      FROM super_table
      WHERE coll_date > SYSDATE - 400
        AND cust_id = :CUST_ID
      GROUP BY TO_CHAR(coll_date, 'YYYYMM')
  )
GROUP BY fd.cust_id, fd.server_name, fd.instance_name, TRUNC(fd.coll_date), fd.column_name
ORDER BY fd.server_name, fd.instance_name, TRUNC(fd.coll_date)