From hourly data to daily data using SQL GROUP BY

I have a table like this:
fld_id fld_DateTime fld_Index
2017-07-01 00:00:00.000 5
2017-07-01 01:00:00.000 10
2017-07-01 02:00:00.000 15
2017-07-01 03:00:00.000 40
...........
...........
2017-07-01 23:00:00.000 70
2017-07-02 00:00:00.000 110
2017-07-02 01:00:00.000 140
2017-07-02 02:00:00.000 190
...............
...............
2017-07-02 23:00:00.000 190
What I am trying to do is group the rows and compute the sum of fld_Index per day, like so:
fld_id fld_DateTime SUM
2017-07-01 190
2017-07-02 400
Here's what I've tried:
SELECT fld_dateTime, SUM(fld_Index) AS Sum
FROM tbl_data
WHERE
AND fld_ConsDateTime BETWEEN '2017-07-01' AND '2017-08-02'
GROUP BY fld_dateTime
It calculates the sum, but the result is still hourly. How can I get the daily format shown in the example above?
UPDATE Monthly Part Output
2017 8 30630800.0000
2017 7 589076201.1800

Simply cast to DATE:
SELECT CAST(fld_dateTime AS DATE) AS fld_Date, SUM(fld_Index) AS Sum
FROM tbl_data
WHERE
fld_ConsDateTime BETWEEN '2017-07-01' AND '2017-08-02'
GROUP BY CAST(fld_dateTime AS DATE);
EDIT:
What about month? Is it the same logic?
It depends on your RDBMS, but in SQL Server you could use:
SELECT YEAR(fld_dateTime), MONTH(fld_dateTime), SUM(fld_Index) AS Sum
FROM tbl_data
GROUP BY YEAR(fld_dateTime), MONTH(fld_dateTime);
It is important to include the year part; otherwise records from the same month of different years would be grouped together.
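To try the day and month grouping without a server, here is a minimal sketch using Python's built-in sqlite3 module. It is not the SQL Server syntax from the answer: SQLite's date() and strftime() stand in for CAST(... AS DATE), YEAR() and MONTH(), and the sample rows and the function name daily_and_monthly_totals are made up for illustration.

```python
import sqlite3

def daily_and_monthly_totals(rows):
    """Return (daily, monthly) sums of fld_Index for (datetime_text, value) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_data (fld_dateTime TEXT, fld_Index INTEGER)")
    con.executemany("INSERT INTO tbl_data VALUES (?, ?)", rows)
    # date() drops the time part, playing the role of CAST(... AS DATE).
    daily = con.execute(
        "SELECT date(fld_dateTime) AS fld_date, SUM(fld_Index) "
        "FROM tbl_data GROUP BY date(fld_dateTime) ORDER BY fld_date"
    ).fetchall()
    # Year and month are grouped together so July 2017 and July 2018 stay apart.
    monthly = con.execute(
        "SELECT strftime('%Y', fld_dateTime) AS y, "
        "       strftime('%m', fld_dateTime) AS m, SUM(fld_Index) "
        "FROM tbl_data GROUP BY y, m ORDER BY y, m"
    ).fetchall()
    con.close()
    return daily, monthly

# Hypothetical sample rows in the question's format.
sample = [
    ("2017-07-01 00:00:00.000", 5),
    ("2017-07-01 01:00:00.000", 10),
    ("2017-07-02 00:00:00.000", 110),
    ("2018-07-01 00:00:00.000", 1),
]
daily, monthly = daily_and_monthly_totals(sample)
print(daily)
print(monthly)
```

Grouping by month alone would merge the 2017 and 2018 rows into a single July total, which is the problem the year part avoids.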

You need to extract the date. In SQL Server, you would do:
SELECT CAST(fld_dateTime as DATE) as fld_date, SUM(fld_Index) AS Sum
FROM tbl_data
WHERE fld_ConsDateTime >= '2017-07-01' AND
fld_ConsDateTime < '2017-08-03'
GROUP BY CAST(fld_dateTime as DATE)
ORDER BY fld_date
In MySQL, the above would work, but I would do:
SELECT DATE(fld_dateTime) as fld_date, SUM(fld_Index) AS Sum
FROM tbl_data
WHERE fld_ConsDateTime >= '2017-07-01' AND
fld_ConsDateTime < '2017-08-03'
GROUP BY DATE(fld_dateTime)
ORDER BY fld_date;
In both cases, you should change the WHERE clause. Your version keeps only those rows from 2017-08-02 whose date/time is exactly midnight. Using >= and < is more accurate: it takes all date/times on one day but none on the next.
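The difference between the two range styles can be checked with a small sketch in Python's sqlite3 module. One assumption to note: SQLite compares these values as text, so the bounds are written with an explicit midnight time to mimic how SQL Server treats '2017-08-02' as a datetime. The rows and the helper name count_rows are hypothetical.

```python
import sqlite3

def count_rows(where_clause):
    """Count sample rows matching the given WHERE clause text."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_data (fld_ConsDateTime TEXT)")
    con.executemany("INSERT INTO tbl_data VALUES (?)", [
        ("2017-08-01 23:00:00",),
        ("2017-08-02 00:00:00",),
        ("2017-08-02 01:00:00",),
        ("2017-08-02 23:00:00",),
    ])
    (n,) = con.execute(
        "SELECT COUNT(*) FROM tbl_data WHERE " + where_clause
    ).fetchone()
    con.close()
    return n

# BETWEEN is inclusive, so of 2017-08-02 only the midnight row is kept.
closed = count_rows(
    "fld_ConsDateTime BETWEEN '2017-07-01 00:00:00' AND '2017-08-02 00:00:00'"
)
# The half-open range keeps every row of 2017-08-02 and nothing from 2017-08-03.
half_open = count_rows(
    "fld_ConsDateTime >= '2017-07-01 00:00:00' "
    "AND fld_ConsDateTime < '2017-08-03 00:00:00'"
)
print(closed, half_open)
```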

Related

how to aggregate one record multiple times based on condition

I have a bunch of records in the table below.
product_id produced_date expired_date
123 2010-02-01 2012-05-31
234 2013-03-01 2014-08-04
345 2012-05-01 2018-02-25
... ... ...
I want the output to show how many unexpired products we have in each month. (Say, if a product expires on August 04, we still count it in August stock.)
Month n_products
2010-02-01 10
2010-03-01 12
...
2022-07-01 25
2022-08-01 15
How should I do this in Presto or Hive? Thank you!
You can use the SQL below.
Here we use CASE WHEN to check whether a product is expired (produced_date >= expired_date). Each expired product contributes 1 to the sum, which gives the count of expired products, and the result is grouped by expiry month.
select
TRUNC(expired_date, 'MM') expired_month,
SUM( case when produced_date >= expired_date then 1 else 0 end) n_products
from mytable
group by 1
We can use the unnest and sequence functions to create a derived table of months; joining our table with this derived table should give the desired result.
select m.month, count(t.product_id) as n_products
from (
  -- one row per month, from the earliest produced_date to the latest expired_date
  select x as month
  from (
    select sequence(date_trunc('month', min(produced_date)),
                    date_trunc('month', max(expired_date)),
                    interval '1' month) as months
    from mytable
  ) s
  cross join unnest(s.months) as u(x)
) m
left join mytable t on m.month >= t.produced_date and m.month <= t.expired_date
group by 1
order by 1
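The month-expansion idea can be tried locally with Python's sqlite3 module. This is only a sketch under stated assumptions: SQLite has no sequence/unnest, so a recursive CTE generates the months instead, and the join truncates produced_date to the start of its month so a product counts from the month it was produced (my reading of the requirement, not something the answer states). The sample products are made up.

```python
import sqlite3

QUERY = """
WITH RECURSIVE bounds AS (
    SELECT date(MIN(produced_date), 'start of month') AS first_month,
           date(MAX(expired_date), 'start of month') AS last_month
    FROM mytable
),
months(month) AS (
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(month, '+1 month') FROM months, bounds WHERE month < last_month
)
SELECT m.month, COUNT(t.product_id) AS n_products
FROM months m
LEFT JOIN mytable t
       ON m.month >= date(t.produced_date, 'start of month')
      AND m.month <= t.expired_date
GROUP BY m.month
ORDER BY m.month
"""

def monthly_stock(products):
    """Return (month_start, n_products) for (id, produced, expired) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (product_id INTEGER, produced_date TEXT, expired_date TEXT)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", products)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

# Hypothetical products: the first one expires on March 4 and still counts in March.
stock = monthly_stock([
    (1, "2020-01-15", "2020-03-04"),
    (2, "2020-02-01", "2020-02-10"),
])
print(stock)
```

Because the join is a LEFT JOIN and COUNT ignores NULLs, a month with no live products would still appear with a count of 0.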

Cumulative sum() every 3 days SQL

I have a table like this
date amount
2020-02-01 5
2020-02-02 2
2020-02-03 10
2020-02-04 2
2020-02-06 3
2020-02-07 1
And I need sum() every 3 days as below:
date amount sum
2020-02-01 5 5
2020-02-02 2 7
2020-02-03 10 17
2020-02-04 2 2
2020-02-06 3 5
2020-02-07 1 1
...
So when a difference between days is 3, the summation should start over. Some days may not be in the table.
I tried to do this with window function like sum(amount) over (order by date) but I have no idea how to set a fixed number of days and get the date difference in cumulative sum like this. Is it possible in any SQL?
In MS SQL Server:
select t.[date], t.Amount, sum(t.Amount) over(partition by datediff(d, '2020-02-01', t.[date])/3 order by t.[date]) cum
from tbl t
Here '2020-02-01' is the starting date you want; integer division of the day offset by 3 puts each date into a 3-day bucket.
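The bucketing trick can be checked against the question's data with Python's sqlite3 module. This is a sketch, not the SQL Server query itself: julianday() differences replace DATEDIFF, and it assumes the SQLite library bundled with Python is 3.25 or newer, since window functions are needed. The function name bucketed_running_sum is hypothetical.

```python
import sqlite3

QUERY = """
SELECT date, amount,
       SUM(amount) OVER (
           -- whole days since the start date, integer-divided into 3-day buckets
           PARTITION BY CAST(julianday(date) - julianday(:start) AS INTEGER) / 3
           ORDER BY date
       ) AS cum
FROM tbl
ORDER BY date
"""

def bucketed_running_sum(rows, start):
    """Running sum of amount that restarts every 3 days, counted from start."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (date TEXT, amount INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = con.execute(QUERY, {"start": start}).fetchall()
    con.close()
    return result

# The rows from the question; 2020-02-05 is missing on purpose.
rows = [
    ("2020-02-01", 5), ("2020-02-02", 2), ("2020-02-03", 10),
    ("2020-02-04", 2), ("2020-02-06", 3), ("2020-02-07", 1),
]
sums = bucketed_running_sum(rows, "2020-02-01")
for row in sums:
    print(row)
```

Note that the buckets are fixed relative to the start date, so a missing day does not shift where the next bucket begins.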
Disclaimer
The following solution was written based on a Preview version of SQL Server 2022, and thus may not reflect the final release.
For a bit of fun, if you had access to SQL Server 2022 (which went into preview yesterday) you could use DATE_BUCKET to "round" the date in the PARTITION BY to 3 days, using the minimum date as the starting date.
DECLARE @StartDate date,
        @EndDate date;
SELECT @StartDate = MIN(date),
       @EndDate = MAX(date)
FROM dbo.YourTable;
SELECT date,
       SUM(amount) OVER (PARTITION BY DATE_BUCKET(DAY,3,date,@StartDate) ORDER BY date) AS Amount
FROM dbo.YourTable
WHERE date >= @StartDate
  AND date <= @EndDate; --In case this would be parametrised
[Image: query results matching the expected output, posted as an image because no fiddle was available for SQL Server 2022.]

SQL select dates from multi rows and datediff total hours

I have records entered into a table, and I want to get the hours worked between rows.
id memberid dayname datesigned orderinout
310 987654321 Friday 2021-08-13 09:22:42 1
311 987654321 Friday 2021-08-13 10:15:50 2
312 987654321 Friday 2021-08-13 10:20:00 3
313 987654321 Friday 2021-08-13 12:36:15 4
314 987654321 Friday 2021-08-13 13:01:55 5
315 987654321 Friday 2021-08-13 18:55:41 6
Ideally I would like to select a member and get the date signed, which is easy, and then do a datediff to work out the hh:mm:ss difference. That is fine with 2 dates, but with multiple rows on the same day I am a little stuck.
SELECT TIMEDIFF(MAX(datesigned),MIN(datesigned)) AS HoursIn
WHERE memberid = '987654321'
AND dayname = 'Friday'
When the date is saved, it is assigned a number: the first record is 1, and so on, per member and date.
So I need to get the results for 1+2, then 3+4, 5+6, and so on. There might even be an odd one.
Any suggestions? I'm totally lost.
Use the LAG function to access the previous record. Order the rows by orderinout, and LAG returns the row before the current one, so 2 is compared with 1, 4 with 3, and so on.
The TIMEDIFF function exists in MySQL, so assuming your database management system is MySQL, the following code works.
In MySQL:
SELECT
id,
memberid,
dayname,
datesigned,
orderinout,
TIMEDIFF(datesigned,lag(datesigned,1) over(partition by memberid order by orderinout)) as HoursIn
from t
WHERE memberid = '987654321'
AND dayname = 'Friday'
In SQL Server:
SELECT
id,
memberid,
dayname,
datesigned,
orderinout,
CONVERT (TIME, datesigned - lag(datesigned,1) over(partition by memberid order by orderinout)) as HoursIn
from t
WHERE memberid = '987654321'
AND dayname = 'Friday'
If you want to calculate this for all members and every day, partition the LAG window by both memberid and dayname:
lag(datesigned,1) over(partition by memberid,dayname order by orderinout)
Full query:
SELECT
id,
memberid,
dayname,
datesigned,
orderinout,
TIMEDIFF(datesigned,lag(datesigned,1) over(partition by memberid,dayname order by orderinout)) as HoursIn
from t
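The queries above return a difference for every row, including the gap between an "out" and the next "in" (2 to 3, 4 to 5). To get only the 1+2, 3+4, 5+6 pairs the question asks for, one option is to keep just the even orderinout rows. Here is a sketch of that idea using Python's sqlite3 module (SQLite 3.25+ assumed for LAG). Two things are my assumptions rather than the answer's: the window is partitioned by the calendar date instead of dayname so that different Fridays stay apart, and the result is in seconds because SQLite has no TIMEDIFF.

```python
import sqlite3

QUERY = """
SELECT memberid, orderinout, seconds_in
FROM (
    SELECT memberid, orderinout,
           CAST(strftime('%s', datesigned) AS INTEGER)
           - CAST(strftime('%s', LAG(datesigned) OVER (
                 PARTITION BY memberid, date(datesigned)
                 ORDER BY orderinout)) AS INTEGER) AS seconds_in
    FROM t
)
WHERE orderinout % 2 = 0   -- keep only the "out" rows: 2, 4, 6, ...
ORDER BY memberid, orderinout
"""

def paired_seconds(rows):
    """Seconds between each in/out pair for (memberid, datesigned, orderinout) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (memberid INTEGER, datesigned TEXT, orderinout INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

# The six rows from the question.
rows = [
    (987654321, "2021-08-13 09:22:42", 1),
    (987654321, "2021-08-13 10:15:50", 2),
    (987654321, "2021-08-13 10:20:00", 3),
    (987654321, "2021-08-13 12:36:15", 4),
    (987654321, "2021-08-13 13:01:55", 5),
    (987654321, "2021-08-13 18:55:41", 6),
]
pairs = paired_seconds(rows)
total_seconds = sum(seconds for _, _, seconds in pairs)
print(pairs, total_seconds)
```

An odd trailing sign-in with no matching sign-out is simply dropped by the even-row filter.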

Select Date if between any row's start and end date

I'm creating a list of months using the method shown here: Months between two dates
Once I have those dates I want to keep only those where they are between any row's start and end date for a particular client.
So the table is:
client start_date end_date
1 2014-06-01 2016-02-29
1 2016-03-01 2016-12-31
1 2017-04-01 NULL
Where NULL represents still active without a future end_date set.
So what I would like to get is (I'm using EOMONTH for each month):
2014-06-30
2014-07-31
... etc. ...
2016-11-30
2016-12-31
2017-04-30
2017-05-31
... etc. ...
So the months between December 2016 and April 2017 aren't there. There could be any number of rows for each client. They may be without gaps and they may be with gaps, as in the case above.
So I feel a bit silly now! It's quite simple:
WITH dates AS (
SELECT EOMONTH(DATEADD(MONTH, x.number, '2016-01-01')) calendar_month
FROM master.dbo.spt_values x
WHERE x.type = 'P'
AND x.number <= DATEDIFF(MONTH, '2016-01-01', '2018-12-31'))
SELECT
dates.calendar_month
FROM
clients
LEFT JOIN dates ON dates.calendar_month BETWEEN EOMONTH(clients.start_date) AND EOMONTH(clients.end_date) OR
(dates.calendar_month >= clients.start_date AND clients.end_date is NULL)
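For anyone without SQL Server at hand, the same filtering can be sketched with Python's sqlite3 module. This is an adaptation, not the query above: SQLite has no EOMONTH or spt_values, so a recursive CTE generates month starts and date(x, '+1 month', '-1 day') produces the month end. The calendar bounds and the function name active_month_ends are hypothetical; an open-ended row is handled with an explicit IS NULL test.

```python
import sqlite3

QUERY = """
WITH RECURSIVE months(month_start) AS (
    SELECT :first
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months WHERE month_start < :last
)
SELECT DISTINCT date(month_start, '+1 month', '-1 day') AS calendar_month
FROM months
JOIN clients c
  ON month_start >= date(c.start_date, 'start of month')
 AND (c.end_date IS NULL OR month_start <= c.end_date)   -- NULL means still active
WHERE c.client = :client
ORDER BY calendar_month
"""

def active_month_ends(intervals, client, first, last):
    """Month-end dates between first and last that fall inside any client interval."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clients (client INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO clients VALUES (?, ?, ?)", intervals)
    params = {"first": first, "last": last, "client": client}
    result = [row[0] for row in con.execute(QUERY, params)]
    con.close()
    return result

# The three rows from the question; None is stored as NULL.
intervals = [
    (1, "2014-06-01", "2016-02-29"),
    (1, "2016-03-01", "2016-12-31"),
    (1, "2017-04-01", None),
]
month_ends = active_month_ends(intervals, 1, "2016-10-01", "2017-06-01")
print(month_ends)
```

January to March 2017 are absent from the output because no interval covers them.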

Available date range by year

I have a table that holds date
ID Dates
1 2014-01-20
2 2014-01-21
...
100 2014-05-20
101 2014-06-01 --Missing a few dates
102 2014-06-02
...
201 2014-10-31
202 2014-12-05 --Missing a few dates
...
349 2015-04-29
350 2015-04-30
I want to find the available date range by year between a from and to date, for example
@StartDate: 2014/04/06
@EndDate: 2015/04/05
The expected result is
Year StartRange EndRange
2014 2014-04-06 2014-05-20
2014 2014-06-01 2014-10-31
2014 2014-12-05 2014-12-31
2015 2015-01-01 2015-04-05
I am trying to find the available date ranges from the Dates column. Let's take the first row in the expected result, 2014-04-06 to 2014-05-20: it says I have continuous dates from 6th April to 20th May, and then there is a break (I do not have dates from 2014-05-21 to 2014-05-31).
The dates 2014-04-06 (in the first row) and 2015-04-05 (in the last row) are included in the expected result because they are the start and end dates (parameters to the query) and I have those dates in the [Dates] column of my table.
Thanks
This is an "Islands and Gaps" type of problem. Here is one way to do it:
;
WITH
cteDays As (SELECT *, DATEDIFF(dd,0,Dates) As DayNo From YourTable)
, cteDifs As
(
SELECT *,
DayNo-(ROW_NUMBER() OVER(ORDER BY Dates, ID)) As Dif
FROM cteDays
)
SELECT
Year(Dates) As [Year],
MIN(Dates) As StartRange,
MAX(Dates) As EndRange
FROM cteDifs
GROUP BY Year(Dates), Dif
ORDER BY [Year], StartRange
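The day-number-minus-row-number trick can be tried on a small data set with Python's sqlite3 module (SQLite 3.25+ assumed for ROW_NUMBER). This is a sketch, not the SQL Server query above: julianday() replaces DATEDIFF(dd,0,Dates), and the WHERE filter on the start and end parameters is my addition, since the query above does not restrict the range. The sample dates are made up.

```python
import sqlite3

QUERY = """
WITH numbered AS (
    SELECT Dates,
           -- consecutive dates share the same (day number - row number) value
           CAST(julianday(Dates) AS INTEGER)
           - ROW_NUMBER() OVER (ORDER BY Dates) AS dif
    FROM YourTable
    WHERE Dates BETWEEN :start AND :end
)
SELECT strftime('%Y', Dates) AS year,
       MIN(Dates) AS StartRange,
       MAX(Dates) AS EndRange
FROM numbered
GROUP BY year, dif
ORDER BY year, StartRange
"""

def date_ranges_by_year(dates, start, end):
    """Return (year, StartRange, EndRange) islands of consecutive dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Dates TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?)", [(d,) for d in dates])
    result = con.execute(QUERY, {"start": start, "end": end}).fetchall()
    con.close()
    return result

# Hypothetical dates: a run crossing the year boundary, then a gap.
dates = [
    "2014-12-29", "2014-12-30", "2014-12-31",
    "2015-01-01", "2015-01-02",
    "2015-01-05", "2015-01-06",
]
ranges = date_ranges_by_year(dates, "2014-12-30", "2015-01-05")
for r in ranges:
    print(r)
```

Grouping by year as well as by the difference is what splits an island that crosses 31 December into one range per year.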