Grouping 6-hour Data for Individual Days with SQL Server

Using SQL Server 2000,
I have data that contains a datetime and a value, say DateData and OtherData. Data is collected in five-minute intervals. For example, DateData is an ordered datetime running from 2/1/2016 to 2/24/2016, with a 5-minute gap between consecutive data points.
I am trying to average the data so that I get the average OtherData value for every 6-hour block of each individual day. So far, I have come up with the following SQL. It groups the data into 6-hour intervals, but it averages across all days combined instead of per day.
SELECT
DATEPART(hour, DateData)/6,
AVG(OtherData) AS AvgData
FROM DATASITE
GROUP BY DATEPART(hour, DateData)/6
--Result:
|Column1 | AvgData|
|0 | 11|
|1 | 12|
|2 | 13|
|3 | 14|
How would I change the above query to give averages for individual days, rather than all days combined?

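Add the day to the GROUP BY so that rows are grouped per day. Also select it, so you can tell the groups apart. The date type only exists from SQL Server 2008 onward. On SQL Server 2000, strip the time portion with DATEADD/DATEDIFF instead of cast(DateData as date):
SELECT
DATEADD(day, DATEDIFF(day, 0, DateData), 0) AS DayStart,
DATEPART(hour, DateData)/6 AS SixHourBlock,
AVG(OtherData) AS AvgData
FROM DATASITE
GROUP BY DATEADD(day, DATEDIFF(day, 0, DateData), 0),
DATEPART(hour, DateData)/6
ORDER BY DayStart, SixHourBlock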
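To check the grouping logic without a SQL Server instance, here is a minimal sketch of the same day-plus-6-hour-block grouping in SQLite, run through Python's built-in sqlite3 module. This is a stand-in, not SQL Server syntax. SQLite's date() and strftime('%H', ...) replace DATEADD/DATEDIFF and DATEPART, and the four sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server: date() strips the time part and
# strftime('%H', ...) plays the role of DATEPART(hour, ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATASITE (DateData TEXT, OtherData REAL)")
conn.executemany("INSERT INTO DATASITE VALUES (?, ?)", [
    ("2016-02-01 01:00:00", 10.0),  # day 1, block 0 (00:00-05:59)
    ("2016-02-01 02:00:00", 20.0),  # day 1, block 0
    ("2016-02-01 13:00:00", 30.0),  # day 1, block 2 (12:00-17:59)
    ("2016-02-02 01:00:00", 50.0),  # day 2, block 0: kept separate from day 1
])

result = conn.execute("""
    SELECT date(DateData) AS DayStart,
           CAST(strftime('%H', DateData) AS INTEGER) / 6 AS SixHourBlock,
           AVG(OtherData) AS AvgData
    FROM DATASITE
    GROUP BY date(DateData), CAST(strftime('%H', DateData) AS INTEGER) / 6
    ORDER BY DayStart, SixHourBlock
""").fetchall()
for row in result:
    print(row)
```

Grouping on both the day and the block is what keeps 2016-02-01 block 0 and 2016-02-02 block 0 in separate rows.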

Related

SUM function and between two max dates

I am trying to find the total amount between two dates, using MAX on the dates and grouping by ID.
Column 365date is sessiondate minus 364 days.
With my current query I get the total amount, but I want only the amount between these two dates (i.e. the last 365 days).
This is my query:
SELECT
DATEADD(day, -364, (MAX(sessiondate))) AS 365date,
MAX(sessiondate)) AS lastdate,
SUM(Amount) AS amount,
ID
FROM
tablename
WHERE
date BETWEEN 365date AND lastdate
GROUP BY
MemberID
|date | LastDate | Amount | ID | output amount (last 365 days only) | Total amount (all years)|
|29/07/2020 | 28/07/2021 | 100 | 1 | 1500 | 63000|
|29/08/2020 | 28/07/2021 | 500 | 1 | | |
|02/05/2020 | 28/07/2021 | 600 | 1 | | |
|15/01/2020 | 28/07/2021 | 300 | 1 | | |
|10/10/2000 | 28/07/2021 | 50000 | 1 | | |
|10/10/1989 | 28/07/2021 | 10000 | 1 | | |
So I need to take MAX(lastdate) for this ID, which is 28/07/2021, and subtract 365 days from it. Then I need to take all the dates that fall within those 365 days (29/07/2020, 29/08/2020, 02/05/2020, 15/01/2020), sum their amounts, and show the result in the last-365-days column.
For the Total amount (all years) column, I need to add up all amounts regardless of the 365-day window.
Logic:
calculate date column: MAX(date) - 364
calculate lastdate column: MAX(lastdate)
calculate last365 amount column: SUM(amount) for dates between MAX(date) - 364 and MAX(lastdate)
calculate Total amount (all years): SUM(amount)
I need only one row per ID (the first row in the table above, which carries the output amounts). I'm not sure what's wrong with the query.
Can someone please help with this?
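You can use a CTE to compute each ID's last date first, then join back to the table and use conditional aggregation. This is needed because a column alias such as 365date cannot be referenced in the WHERE clause of the same SELECT. Assuming sessiondate is the date column and ID is the grouping key:
;WITH cte AS (
SELECT ID, MAX(sessiondate) AS lastdate
FROM tablename
GROUP BY ID
)
SELECT
c.ID,
DATEADD(day, -364, c.lastdate) AS [365date],
c.lastdate,
-- only amounts inside the 365-day window
SUM(CASE WHEN t.sessiondate BETWEEN DATEADD(day, -364, c.lastdate) AND c.lastdate
THEN t.Amount ELSE 0 END) AS last365amount,
-- all amounts, regardless of date
SUM(t.Amount) AS totalamount
FROM cte c
JOIN tablename t ON t.ID = c.ID
GROUP BY c.ID, c.lastdate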
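Here is a minimal sketch of the CTE-plus-conditional-aggregation idea, run in SQLite from Python so it can be tried without SQL Server. It makes three assumptions:
- The question table's first column is treated as sessiondate.
- Dates are rewritten from dd/mm/yyyy to ISO yyyy-mm-dd, so that string comparison orders them correctly.
- SQLite's date(x, '-364 days') stands in for DATEADD(day, -364, x).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER, sessiondate TEXT, Amount INTEGER)")
# The question's rows for ID 1, with dd/mm/yyyy rewritten as ISO yyyy-mm-dd.
conn.executemany("INSERT INTO tablename VALUES (1, ?, ?)", [
    ("2020-07-29", 100), ("2020-08-29", 500), ("2020-05-02", 600),
    ("2020-01-15", 300), ("2000-10-10", 50000), ("1989-10-10", 10000),
])

result = conn.execute("""
    WITH cte AS (
        SELECT ID, MAX(sessiondate) AS lastdate
        FROM tablename
        GROUP BY ID
    )
    SELECT c.ID,
           date(c.lastdate, '-364 days') AS start365,
           c.lastdate,
           -- conditional sum: only rows inside the window contribute
           SUM(CASE WHEN t.sessiondate BETWEEN date(c.lastdate, '-364 days')
                                           AND c.lastdate
                    THEN t.Amount ELSE 0 END) AS last365amount,
           SUM(t.Amount) AS totalamount
    FROM cte c
    JOIN tablename t ON t.ID = c.ID
    GROUP BY c.ID, c.lastdate
""").fetchall()
print(result)
```

Here MAX(sessiondate) over the six rows is 2020-08-29, because the LastDate column in the question is not one of the session rows. The window is therefore 2019-08-31 to 2020-08-29, and the single output row carries both the windowed sum and the all-time sum.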

Group By day for custom time interval

I'm very new to SQL and time series databases. I'm using CrateDB. I want to aggregate the data by day, but I want each day to start at 9 AM instead of 12 AM.
The time interval is 9 AM to 11:59 PM.
The data is stored with Unix timestamps. Here is my sample data:
|sensorid | timestamp | reading|
====================================
|1 | 1616457600 | 10 |
|1 | 1616461200 | 100 |
|2 | 1616493600 | 1 |
|2 | 1616493601 | 10 |
Currently I group with the following query, but each day starts at 12 AM:
select date_trunc('day', v.timestamp) as day,sum(reading)
from sensor1 v(timestamp)
group by (DAY)
From the table above, I want the sum of the readings at 1616493600 and 1616493601 (the result is 11), because 1616457600 and 1616461200 are before 9 AM.
You want to add nine hours to midnight:
date_trunc('day', v.timestamp) + interval '9' hour
Edit: If you want to exclude hours before 9:00 from the data you add up, you must add a WHERE clause:
where extract(hour from v.timestamp) >= 9
Here is a complete query with all relevant data:
select
date_trunc('day', v.timestamp) as day,
date_trunc('day', v.timestamp) + interval '9' hour as day_start,
min(v.timestamp) as first_data,
max(v.timestamp) as last_data,
sum(reading) as total_reading
from sensor1 v(timestamp)
where extract(hour from v.timestamp) >= 9
group by day
order by day;
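The same filter-then-group idea can be sketched in SQLite through Python's sqlite3, using the question's four rows (timestamp in Unix seconds, reading as the value). SQLite's date(x, 'unixepoch') and strftime('%H', x, 'unixepoch') stand in for the CrateDB functions. Hours are evaluated in UTC, which is an assumption about the intended time zone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sensor1 (sensorid INTEGER, "timestamp" INTEGER, reading INTEGER)')
conn.executemany("INSERT INTO sensor1 VALUES (?, ?, ?)", [
    (1, 1616457600, 10), (1, 1616461200, 100),
    (2, 1616493600, 1), (2, 1616493601, 10),
])

result = conn.execute("""
    SELECT date("timestamp", 'unixepoch') AS day,
           -- the reporting day starts at 09:00 (UTC in this sketch)
           datetime(date("timestamp", 'unixepoch'), '+9 hours') AS day_start,
           SUM(reading) AS total_reading
    FROM sensor1
    -- drop everything before 09:00 before aggregating
    WHERE CAST(strftime('%H', "timestamp", 'unixepoch') AS INTEGER) >= 9
    GROUP BY day
    ORDER BY day
""").fetchall()
print(result)
```

The two rows at 00:00 and 01:00 UTC are filtered out, so only the 10:00 readings (1 and 10) are summed.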

Aggregate data based on unix time stamp crate database

I'm very new to SQL and time series databases. I'm using CrateDB (I think it is based on PostgreSQL). I want to aggregate the data by hour, day, week, and month. The data is stored with Unix timestamps. Here is my sample data:
|sensorid | timestamp | reading|
====================================
|1 | 1604192522 | 10 |
|1 | 1604192702 | 9.65 |
|2 | 1605783723 | 8.1 |
|2 | 1601514122 | 9.6 |
|2 | 1602292210 | 10 |
|2 | 1602291611 | 12 |
|2 | 1602291615 | 10 |
I tried a SQL query using FROM_UNIXTIME, but it is not supported.
Please help.
For hourly data, I'm looking for the following result:
sensorid ,reading , timestamp
1 19.65(10+9.65) 1604192400(starting hour unixt time)
2 8.1 1605783600(starting hour unix time)
2 9.6 1601514000(starting hour unix time)
2 32 (10+12+10) 1602291600(starting hour unix time)
For monthly data, I'm looking for something like:
sensorid , reading , timestamp
1 24.61(10+9.65+8.1) 1604192400(starting month unix time)
2 41.6(9.6+10+12+10) 1601510400(starting month unix time)
A straightforward approach is:
SELECT
(date '1970-01-01' + unixtime * interval '1 second')::date as date,
extract(hour from date '1970-01-01' + unixtime * interval '1 second') AS hour,
count(c.user) AS count
FROM core c
GROUP BY 1,2
If you are content with having the date and time in the same column (which would seem more helpful to me), you can use date_trunc():
select
date_trunc('hour', date '1970-01-01' + unixtime * interval '1 second') as date_hour,
count(c.user) AS count
FROM core c
GROUP BY 1
You can convert a unix timestamp to a date/time value using to_timestamp(). You can aggregate along multiple dimensions at the same time using grouping sets. So, you might want:
select date_trunc('year', v.ts) as year,
date_trunc('month', v.ts) as month,
date_trunc('week', v.ts) as week,
date_trunc('day', v.ts) as day,
date_trunc('hour', v.ts) as hour,
count(*), avg(reading), sum(reading)
from t cross join lateral
(values (to_timestamp(timestamp))) v(ts)
group by grouping sets ( (year), (month), (week), (day), (hour) );
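The question asks for one row per sensor per hour or month, labeled with the starting Unix time. A minimal sketch that produces that shape is below. It runs in SQLite through Python's sqlite3 rather than CrateDB:
- Integer division by 3600 floors each timestamp to the start of its hour.
- SQLite's 'start of month' modifier gives the month start.
- Both are computed in UTC.

The rows are the question's sample data, with the timestamp and reading columns in their actual order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (sensorid INTEGER, "timestamp" INTEGER, reading REAL)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1604192522, 10), (1, 1604192702, 9.65), (2, 1605783723, 8.1),
    (2, 1601514122, 9.6), (2, 1602292210, 10), (2, 1602291611, 12),
    (2, 1602291615, 10),
])

# Hourly: integer division floors each Unix time to the start of its hour.
hourly = conn.execute("""
    SELECT sensorid, ("timestamp" / 3600) * 3600 AS hour_start, SUM(reading)
    FROM t
    GROUP BY sensorid, hour_start
    ORDER BY sensorid, hour_start
""").fetchall()

# Monthly: interpret as Unix time, move to the 1st of the month, back to seconds.
monthly = conn.execute("""
    SELECT sensorid,
           CAST(strftime('%s', "timestamp", 'unixepoch', 'start of month') AS INTEGER)
               AS month_start,
           SUM(reading)
    FROM t
    GROUP BY sensorid, month_start
    ORDER BY sensorid, month_start
""").fetchall()
print(hourly)
print(monthly)
```

In this sample data the 8.1 reading belongs to sensor 2, so it lands in sensor 2's November row rather than in sensor 1's total.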

group by value but only for consecutive values

OK, the title is far from obvious, I could not explain it better.
Consider a table with columns (date, xvalue, some other columns). I need to group rows by xvalue, but only while the run of equal values is not interrupted in time order (column date). For example, for:
Date |xvalue |yvalue|
1 Mar |10 |1 |
2 Mar |10 |2 |
3 Mar |20 |6 |
4 Mar |20 |1 |
5 Mar |10 |4 |
6 Mar |10 |2 |
From the above data, I would like to get three rows: one for the first run of xvalue==10, one for xvalue==20, and one for the second run of xvalue==10. Each row should hold an aggregate of the other columns, for example the sum:
1 Mar, 10, 3
3 Mar, 20, 7
5 Mar, 10, 6
It's like the query:
select min(date), xvalue, sum(yvalue) from t group by xvalue
except that the query above merges 1, 2, 5, and 6 March into one group, and I want the two runs kept separate.
This is an example of a gaps-and-islands problem. But you need an ordering column. With such a column, you can use the difference of row numbers:
select min(date), xvalue, sum(yvalue)
from (select t.*,
row_number() over (partition by xvalue order by date) as seqnum_d,
row_number() over (order by date) as seqnum
from t
) t
group by xvalue, (seqnum - seqnum_d)
order by min(date)
Data in a database is logically stored in mathematical sets, which have absolutely no inherent order and no default ordering. They are comparable to bags in which objects can move around while in use.
So there is no way to answer your query until you add a specific column that gives the sort order the user needs.
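Here is a minimal sketch of the difference-of-row-numbers technique on the question's rows, run in SQLite via Python's sqlite3. This needs SQLite 3.25 or newer for window functions. The year 2021 is a hypothetical placeholder, since the question only gives day and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, xvalue INTEGER, yvalue INTEGER)")
# The question's rows; the year 2021 is a hypothetical placeholder.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2021-03-01", 10, 1), ("2021-03-02", 10, 2), ("2021-03-03", 20, 6),
    ("2021-03-04", 20, 1), ("2021-03-05", 10, 4), ("2021-03-06", 10, 2),
])

result = conn.execute("""
    SELECT MIN(date) AS start_date, xvalue, SUM(yvalue) AS total
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY xvalue ORDER BY date) AS seqnum_d,
                 ROW_NUMBER() OVER (ORDER BY date) AS seqnum
          FROM t) t
    -- within one unbroken run, seqnum - seqnum_d stays constant
    GROUP BY xvalue, seqnum - seqnum_d
    ORDER BY MIN(date)
""").fetchall()
print(result)
```

For the first run of 10s the difference is 0. For the run of 20s and the second run of 10s it is 2. Because xvalue is also in the GROUP BY, the rows come out as three separate islands.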

Data Summarization in Apache Pig/Apache Hive For Given Date Range

I have a requirement to summarize data over a date range provided as input. More specifically, if my data looks like:
Input:
Id|amount|date
1 |10 |2016-01-01
2 |20 |2016-01-02
3 |20 |2016-01-03
4 |20 |2016-09-25
5 |20 |2016-09-26
6 |20 |2016-09-28
And if I want the summary for the month of September,
I need to count records over 4 ranges:
Current Date, which is each day in September.
Week Start Date (the Sunday of the week containing the current date) to Current Date. For example, if Current Date is 2016-09-28, the week start date is 2016-09-25,
and I need the record count from 2016-09-25 to 2016-09-28.
Month Start Date to Current Date, which is from 2016-09-01 to Current Date.
Year Start Date to Current Date, which is the record count from 2016-01-01 to Current Date.
So my output should have one record with 4 count columns for each day of the month (in this case, September), something like:
Output:
Current_Date|Current_date_count|Week_To_Date_Count|Month_to_date_Count|Year_to_date_count
2016-09-25 |1 |1 |1 |4
2016-09-26 |1 |2 |2 |5
2016-09-28 |1 |3 |3 |6
Important: I can pass only 2 variables, the range start date and the range end date. The rest of the calculation needs to be dynamic.
Thanks in advance
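You can join on year, then test each condition separately with sum(if(...)). A few details matter here:
- weekofyear() uses ISO weeks, which start on Monday. For Sunday-based weeks, compute the week start explicitly (1900-01-07 was a Sunday).
- Quote the hiveconf variables so they are compared as date strings.
- Take distinct dates on the left side so that days with several records are not counted more than once.

select a.`date`,
sum(if(a.`date` = b.`date`, 1, 0)) as current_date_count,
sum(if(b.`date` >= date_sub(a.`date`, pmod(datediff(a.`date`, '1900-01-07'), 7)), 1, 0)) as week_to_date_count,
sum(if(month(a.`date`) = month(b.`date`), 1, 0)) as month_to_date_count,
count(*) as year_to_date_count
from
(select distinct `date` from input_table where `date` >= '${hiveconf:start}' and `date` < '${hiveconf:end}') a,
(select * from input_table where `date` < '${hiveconf:end}') b
where year(a.`date`) = year(b.`date`) and b.`date` <= a.`date`
group by a.`date`;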
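Here is a minimal sketch of the same self-join logic, run in SQLite through Python's sqlite3 so it can be tried without a Hive cluster. This is not Hive syntax, and it makes two substitutions:
- SQLite's strftime('%w', ...) (0 = Sunday) and julianday() replace the pmod/datediff week-start calculation.
- The start/end strings play the role of the two hiveconf variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_table (id INTEGER, amount INTEGER, date TEXT)")
conn.executemany("INSERT INTO input_table VALUES (?, ?, ?)", [
    (1, 10, "2016-01-01"), (2, 20, "2016-01-02"), (3, 20, "2016-01-03"),
    (4, 20, "2016-09-25"), (5, 20, "2016-09-26"), (6, 20, "2016-09-28"),
])

start, end = "2016-09-01", "2016-10-01"  # the two input parameters
result = conn.execute("""
    SELECT a.date,
           SUM(CASE WHEN b.date = a.date THEN 1 ELSE 0 END) AS day_count,
           -- week start = a.date minus its weekday number (%w: 0 = Sunday)
           SUM(CASE WHEN b.date >= date(julianday(a.date)
                        - CAST(strftime('%w', a.date) AS INTEGER))
                    THEN 1 ELSE 0 END) AS week_to_date,
           SUM(CASE WHEN strftime('%m', b.date) = strftime('%m', a.date)
                    THEN 1 ELSE 0 END) AS month_to_date,
           COUNT(*) AS year_to_date
    FROM (SELECT DISTINCT date FROM input_table WHERE date >= ? AND date < ?) a
    JOIN input_table b
      ON strftime('%Y', b.date) = strftime('%Y', a.date) AND b.date <= a.date
    GROUP BY a.date
    ORDER BY a.date
""", (start, end)).fetchall()
for row in result:
    print(row)
```

As in the Hive version, the year equality plus b.date <= a.date restricts the right side to year-to-date rows. The other three counts are then conditional sums over that set.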