Group by hourly interval - sql

I'm new to SQL and I have problems when trying to make an hourly report on a database that supports HiveSQL.
Here's my dataset
|NAME| CHECKIN_HOUR |CHECKOUT_HOUR|
|----|--------------|-------------|
| A | 00 | 00 |
| B | 00 | 01 |
| C | 00 | 02 |
| D | 00 | null |
| E | 01 | 02 |
| F | 01 | null |
And I would like to get an hourly summary report that looks like this:
|TIME| CHECKIN_NUMBER |CHECKOUT_NUMBER|STAY_NUMBER|
|----|----------------|---------------|-----------|
| 00 | 4 | 1 | 3 |
| 01 | 2 | 1 | 4 |
| 02 | 0 | 2 | 2 |
stay_number is the number of people who haven't checked out by the end of that hour, e.g. the 2 in the last row means that by the end of hour 02 (2am), two people (D and F) still haven't checked out. So basically I'm trying to get a check-in, check-out and stay summary for each hour.
I have no idea how to build an hourly interval table, since simply grouping by the check-in or check-out hour doesn't give the expected result. All the date fields are originally Unix timestamps, so feel free to use date functions on them.
Any instructions and help would be greatly appreciated, thanks!

Here is one method that unpivots the data and uses cumulative sums:
select hh,
       sum(ins) as checkins,
       sum(outs) as checkouts,
       -- running check-ins minus running check-outs = people still checked in
       sum(sum(ins)) over (order by hh) - sum(sum(outs)) over (order by hh) as stays
from ((select checkin_hour as hh, count(*) as ins, 0 as outs
       from t
       group by checkin_hour
      ) union all
      (select checkout_hour, 0 as ins, count(*) as outs
       from t
       where checkout_hour is not null   -- null means not checked out yet
       group by checkout_hour
      )
     ) c
group by hh
order by hh;
The idea is to count the number of check-ins and check-outs in each hour and then accumulate the totals across hours. The difference between the two running totals is the number of stays.
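To check the unpivot-and-accumulate idea against the sample data in the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for Hive purely as an assumption for the demo (it needs SQLite 3.25 or later for window functions), and the per-hour totals are computed in a CTE first so the running sum is taken over plain columns; the names `events`, `per_hour` and `hourly_rows` are made up for this example.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text, checkin_hour text, checkout_hour text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("A", "00", "00"),
        ("B", "00", "01"),
        ("C", "00", "02"),
        ("D", "00", None),  # None = has not checked out
        ("E", "01", "02"),
        ("F", "01", None),
    ],
)

HOURLY_REPORT = """
with events as (
    -- one row per hour with check-ins, one row per hour with check-outs
    select checkin_hour as hh, count(*) as ins, 0 as outs
    from t
    group by checkin_hour
    union all
    select checkout_hour, 0, count(*)
    from t
    where checkout_hour is not null
    group by checkout_hour
),
per_hour as (
    select hh, sum(ins) as checkins, sum(outs) as checkouts
    from events
    group by hh
)
select hh, checkins, checkouts,
       -- running total of (check-ins - check-outs) up to and including this hour
       sum(checkins - checkouts) over (order by hh) as stays
from per_hour
order by hh
"""

hourly_rows = conn.execute(HOURLY_REPORT).fetchall()
for row in hourly_rows:
    print(row)
```

Note that hour 02 appears even though nobody checked in then, because the check-out branch of the union contributes a row for it. An hour with neither check-ins nor check-outs would still be missing from the report.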

Related

Filter on date relative to today but the dates are in separate fields

I have a table where the date parts are in separate fields and I am struggling to put a filter on it (pulling all the data is so much that it basically times out).
How can I write a sql query to pull the data for only the past 7 days?
| eventinfo | entity | year | month | day |
|------------|-------------------------|------|-------|-----|
| source=abc | abc=030,abd=203219,.... | 2022 | 08 | 07 |
| source=abc | abc=030,abd=203219,.... | 2022 | 08 | 05 |
| source=abc | abc=030,abd=203219,.... | 2022 | 07 | 33 |
Many thanks in advance.
You can use concatenation on your columns, convert them to date and then apply the filter.
-- Oracle database
select *
from event
where to_date( year||'-'||month||'-'||day,'YYYY-MM-DD') >= trunc(sysdate) - 7;
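The same concatenate-then-compare approach can be tried outside Oracle. The sketch below is an assumption-laden demo rather than the answer's exact query: it uses SQLite's `date()` function in place of `to_date`/`trunc(sysdate)`, passes "today" in as a parameter so the result is reproducible, and uses made-up sample rows with valid dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table event (eventinfo text, year text, month text, day text)")
conn.executemany(
    "insert into event values (?, ?, ?, ?)",
    [
        ("source=abc", "2022", "08", "07"),
        ("source=abc", "2022", "08", "05"),
        ("source=abc", "2022", "07", "20"),  # older than 7 days, should be dropped
    ],
)

LAST_7_DAYS = """
select year, month, day
from event
-- build an ISO date string from the parts and compare it with today - 7 days
where date(year || '-' || month || '-' || day) >= date(:today, '-7 days')
order by year, month, day
"""

today = "2022-08-08"  # fixed reference date for the demo
recent_rows = conn.execute(LAST_7_DAYS, {"today": today}).fetchall()
print(recent_rows)
```

This relies on the parts being zero-padded text, as they are in the question's table, so that the concatenated string is a valid `YYYY-MM-DD` date.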

SQL Query to select specific records from dates

I have a table with a list of codes and the start and end dates between which each code was active. I want to select the most recent active codes. That is simple enough, but the part I'm stuck on is that the same code can appear with overlapping dates, which means both rows are active and I need to select all of them. Or the same code can appear with dates that follow on from each other, which means the previous row is no longer active and I want to ignore it.
See example of table below:
In the table below, I'd essentially need to say: if two rows have the same code and the dates follow on, take the most recent; if the dates overlap, then select...
ID | Code | Start Date | End Date | I need to select
01 | A110 | 15/01/21 | NULL | select
02 | A110 | 14/05/19 | NULL | select
03 | A110 | 10/10/18 | 13/05/19 | Ignore
03 | B200 | 15/01/21 | NULL | select
04 | B200 | 10/12/20 | 14/01/21 | Ignore
05 | C600 | 15/01/21 | NULL | Select
To me it looks like
SELECT *
FROM TABLE
WHERE END_DATE IS NULL
but maybe I got the question wrong; it really isn't clear at the moment.
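As a quick sanity check of that suggestion, the sketch below loads the sample rows from the question into an in-memory SQLite table and applies the `END_DATE IS NULL` filter. The table name `codes` and the column names are assumptions made for the demo, and the dates are kept as plain text since the filter never parses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table codes (id text, code text, start_date text, end_date text)")
conn.executemany(
    "insert into codes values (?, ?, ?, ?)",
    [
        ("01", "A110", "15/01/21", None),
        ("02", "A110", "14/05/19", None),
        ("03", "A110", "10/10/18", "13/05/19"),
        ("03", "B200", "15/01/21", None),
        ("04", "B200", "10/12/20", "14/01/21"),
        ("05", "C600", "15/01/21", None),
    ],
)

# Keep only rows that have no end date, i.e. rows that are still active.
active_rows = conn.execute(
    "select id, code from codes where end_date is null order by id, code"
).fetchall()
print(active_rows)
```

For this sample the result matches the "I need to select" column exactly. That only holds as long as every active row has a NULL end date, so data with future end dates would need an extra condition.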

Work out variance of groups of rows in SQL

I'm looking to work out a variance value per month for a table of data, with each month containing three rows to be accounted for. I'm struggling to think of a way of doing this without 'looping' which, as far as I'm aware, isn't supported in SQL.
Here is an example table of what I mean:
+======================+=======+
| timestamp | value |
+======================+=======+
| 2020-01-04T10:58:24Z | 10 | # January (Sum of vals = 110)
+----------------------+-------+
| 2020-01-14T10:58:21Z | 68 |
+----------------------+-------+
| 2020-01-29T10:58:12Z | 32 |
+----------------------+-------+
| 2020-02-04T10:58:13Z | 19 | # February (Sum of vals = 112)
+----------------------+-------+
| 2020-02-14T10:58:19Z | 5 |
+----------------------+-------+
| 2020-02-24T10:58:11Z | 88 |
+----------------------+-------+
| 2020-03-04T10:58:11Z | 72 | # March (Sum of vals = 184)
+----------------------+-------+
| 2020-03-15T10:58:10Z | 90 |
+----------------------+-------+
| 2020-03-29T10:58:16Z | 22 |
+----------------------+-------+
| .... | .... |
+======================+=======+
I need to build a query that combines all 3 values in each month, then works out the variance of the combined values across months. Hopefully this makes sense? So in this case, I would need to work out the variance between January (110), February (112) and March (184).
Does anyone have any suggestions as to how I could accomplish this? I'm using PostgreSQL, but need a vanilla SQL solution :/
Thanks!
Are you looking for aggregation by month and then a variance calculation? If so:
select variance(sum_vals)
from (select date_trunc('month', timestamp) as mon, sum(value) as sum_vals
      from t
      group by mon
     ) t;
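Since the question asks for something close to vanilla SQL, here is a hedged sketch of the same aggregate-then-variance idea with the variance spelled out using only `sum` and `count`. It runs on SQLite from Python, which is an assumption for the demo: `substr(timestamp, 1, 7)` stands in for `date_trunc('month', ...)`, and the formula computes the sample variance, which is what PostgreSQL's `variance()` returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (timestamp text, value integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2020-01-04T10:58:24Z", 10), ("2020-01-14T10:58:21Z", 68), ("2020-01-29T10:58:12Z", 32),
        ("2020-02-04T10:58:13Z", 19), ("2020-02-14T10:58:19Z", 5), ("2020-02-24T10:58:11Z", 88),
        ("2020-03-04T10:58:11Z", 72), ("2020-03-15T10:58:10Z", 90), ("2020-03-29T10:58:16Z", 22),
    ],
)

MONTHLY_VARIANCE = """
select (sum(s * s) - sum(s) * sum(s) * 1.0 / count(*)) / (count(*) - 1) as var_samp
from (
    -- the first 7 characters of an ISO timestamp are 'YYYY-MM', i.e. the month
    select substr(timestamp, 1, 7) as mon, sum(value) as s
    from t
    group by mon
) monthly
"""

monthly_variance = conn.execute(MONTHLY_VARIANCE).fetchone()[0]
print(monthly_variance)
```

The monthly sums are 110, 112 and 184, as in the question. The sum-of-squares form is short but can lose precision on large values, so the built-in `variance()` is preferable where it is available.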

Oracle SQL - Pivoting multiple rows into single column

To all the Oracle SQL pros, please, I'm trying to use a pivot to convert multiple rows into a single column. I tried looking through past posting but could not find anything related to what I want to do, or maybe I just don't know how to search for it properly. I want to take the rows HAND, PECO, CHEP and make that into 1 column called EXTRA, while leaving STORAGE and RENEWAL as their own separate columns. I'm stumped, any help would be greatly appreciated. Thanks in advance!
My current sql is,
select * from (
select company, customer, rev_code, sum(rev_amt) amt
from revenue_anal
where company='01'
and rev_date between to_date('20\01\01','YY\MM\DD') and sysdate
group by company, customer, rev_code
order by 2,1 )
pivot (sum(amt) for rev_code in ('HAND', 'PECO', 'CHEP', 'STORAGE', 'RENEWAL'))
Query,
COMPANY | CUSTOMER | REV_CODE | REV_AMT
---------------------------------------------------
01 | 101962 | HAND | 253.377
01 | 101962 | PECO | 60
01 | 101962 | CHEP | 1632
01 | 101962 | STORAGE | 2700
01 | 101962 | RENEWAL | 60
---------------------------------------------------
Output with my current query,
COMPANY | CUSTOMER | HAND | PECO | CHEP | STORAGE | RENEWAL
--------------------------------------------------------------------------
01 | 101962 | 253.377 | 60 | 1632 | 2700 | 60
Trying to get the output to show as
COMPANY | CUSTOMER | EXTRA | STORAGE | RENEWAL
------------------------------------------------------------------
01 | 101962 | 1945.377 | 2700 | 60
Thank you for taking the time to assist.
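Instead of select * from ( at the very beginning of your query, write the following (this assumes the pivot values are given aliases, e.g. 'HAND' as hand, so that the resulting columns can be referenced by plain names):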
select company, customer, hand + peco + chep as extra, storage, renewal
from (
.........
If you expect that any of HAND, PECO or CHEP may be NULL, wrap each of them individually within NVL(...., 0) (if, in fact, NULL is to be interpreted as zero; otherwise, leave as is, and the result will be NULL if at least one of the terms is NULL).
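For comparison, the same EXTRA / STORAGE / RENEWAL layout can also be produced without `pivot`, using conditional aggregation. This is a different technique from the one in the answer, shown only as a sketch: it runs on SQLite from Python (an assumption for the demo), drops the `rev_date` filter because the sample data has no dates, and reuses the table and column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table revenue_anal (company text, customer text, rev_code text, rev_amt real)"
)
conn.executemany(
    "insert into revenue_anal values (?, ?, ?, ?)",
    [
        ("01", "101962", "HAND", 253.377),
        ("01", "101962", "PECO", 60),
        ("01", "101962", "CHEP", 1632),
        ("01", "101962", "STORAGE", 2700),
        ("01", "101962", "RENEWAL", 60),
    ],
)

EXTRA_REPORT = """
select company, customer,
       -- HAND, PECO and CHEP are folded into a single EXTRA column
       sum(case when rev_code in ('HAND', 'PECO', 'CHEP') then rev_amt end) as extra,
       sum(case when rev_code = 'STORAGE' then rev_amt end) as storage,
       sum(case when rev_code = 'RENEWAL' then rev_amt end) as renewal
from revenue_anal
where company = '01'
group by company, customer
"""

extra_rows = conn.execute(EXTRA_REPORT).fetchall()
print(extra_rows)
```

Because `sum` skips NULLs, a customer missing one of HAND, PECO or CHEP still gets the total of the others in `extra`, which is the behaviour the `NVL(..., 0)` wrapping gives in the pivot version.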

Finding the max value between the last 22 months or between any 10 hour window within the last 22 months in Microsoft SQL Server

I'd like to find the max value within the last 22 months OR the max value within any 10 hour window of those last 22 months.
I'm doing this in Microsoft SQL Server.
Essentially, I'm looking to retrieve a value that has sustained a high for at least 10 hours before I consider it my max and if it is larger than the max of the last 22 months, it would be the new max, otherwise I would use the max of the last 22 months.
Here's what I think it should look like pseudo code:
if (time > 10 hours) AND (value = max) OR (18 > time > 0) AND (value = max)
then output = value
The SQL code that I've tried:
SELECT TOP 90 PERCENT
DATEADD(s,time,'19700101') as time_22month
,GETDATE() as date_22month
,b.tagname as tag_22month
,value as value_22month
,maximum as max_22month
FROM
db..hour a
INNER JOIN
db..tag b
ON
a.tagid = b.tagid
WHERE
b.tagname like '%T500.1234%'
AND
(GETDATE() - DATEADD(s, time, '19700101') < 670)
ORDER BY
max_22month DESC
SELECT
DATEADD(s,time,'19700101') as time_10hour
,GETDATE() as date_10hour
,b.tagname as tag_10hour
,value as value_10hour
,maximum as max_10hour
FROM
db..hour a
INNER JOIN
db..tag b
ON
a.tagid = b.tagid
WHERE
b.tagname like '%T500.1234%'
AND
(GETDATE() - DATEADD(s, time, '19700101') < 0.42)
ORDER BY
max_10hour DESC
Output right now is the following:
+-------------------------+----------------------------+-------------+----------------+---------------+
| time_22month | date_22month | tag_22month | value_22month | max_22month |
+-------------------------+----------------------------+-------------+----------------+---------------+
| 2016-03-08 06:00:00.000 | 2017-04-10 10:07:57:32.783 | T500.1234 | 1567.88546416 | 2445.56419848 |
| 2016-03-08 07:00:00.000 | 2017-04-10 10:07:57:32.783 | T500.1234 | 1499.88546416 | 2434.47673719 |
+-------------------------+----------------------------+-------------+----------------+---------------+
+-------------------------+----------------------------+------------+---------------+---------------+
| time_10hour | date_10hour | tag_10hour | value_10hour | max_10hour |
+-------------------------+----------------------------+------------+---------------+---------------+
| 2017-04-10 00:00:00.000 | 2017-04-10 10:07:57:32.783 | T500.1234 | 8763.42572454 | 8759.64548912 |
| 2017-04-10 01:00:00.000 | 2017-04-10 10:07:57:32.783 | T500.1234 | 8001.64578943 | 8001.64578943 |
+-------------------------+----------------------------+------------+---------------+---------------+
So I'm a little confused on how I should be comparing these max values, especially when the 10 hour window needs to be rolling (incrementing every hour). Any help is appreciated.
The output should be the greater value of the two parameters, so perhaps a new table column would be the output along with two columns that precede it that show the highest 22month value and the highest 10 hour window value.
+--------+-------------+------------+------+
| Month | 22Month_Max | 10Hour_Max | Max |
+--------+-------------+------------+------+
| July | 5478 | 5999 | 5999 |
| August | 4991 | 3523 | 4991 |
+--------+-------------+------------+------+