Return day difference between current row timestamp and the 1st returned row - sql

For each row returned, I want to compare its timestamp column to the very first row in the SELECT and get the date difference in days. How can I do this?
SELECT date
FROM table
ORDER BY DATE ASC
Desired output
id | date | day difference
0 | 2015-05-02 00:00:00 | day 1
1 | 2015-05-05 00:00:00 | day 3
2 | 2015-05-22 00:00:00 | day 20

Courtesy of #postgresql on freenode:
% dropdb testdb; createdb testdb; psql -X testdb
psql (9.4.4)
Type "help" for help.
testdb=# create table t as select 0 id, '2015-05-02'::timestamp date;
SELECT 1
testdb=# insert into t select 1, '2015-05-05';
INSERT 0 1
testdb=# insert into t select 2, '2015-05-22';
INSERT 0 1
testdb=# select id, date, date - min(date) over () from t;
id | date | ?column?
----+---------------------+----------
0 | 2015-05-02 00:00:00 | 00:00:00
1 | 2015-05-05 00:00:00 | 3 days
2 | 2015-05-22 00:00:00 | 20 days
(3 rows)
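You can reproduce the psql session above without a PostgreSQL server. This minimal sketch uses Python's built-in sqlite3 module as a stand-in, which is an assumption and not the answer's environment. SQLite has no interval type, so julianday() converts the timestamps to day numbers, and the window function needs SQLite 3.25 or newer. Table and column names follow the session above.

```python
import sqlite3

def days_since_first(rows):
    """Return (id, date, day_diff) rows, where day_diff counts days from the earliest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # min(date) OVER () has an empty window spec, so every row sees the whole result set.
    # SQLite has no interval type: julianday() turns the ISO-8601 text into a day number instead.
    query = """
        SELECT id, date,
               CAST(julianday(date) - julianday(min(date) OVER ()) AS INTEGER) AS day_diff
        FROM t
        ORDER BY date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

rows = [(0, "2015-05-02 00:00:00"), (1, "2015-05-05 00:00:00"), (2, "2015-05-22 00:00:00")]
for row in days_since_first(rows):
    print(row)
```

The sample timestamps are all at midnight, so the differences are whole days. If a timestamp has a time-of-day part, the CAST truncates the partial day.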

This may be a bit late, but you could use the first_value window function. Give the window an ORDER BY so that "first" means the earliest date. See http://www.postgresql.org/docs/9.1/static/functions-window.html
select id, date, date - first_value(date) over (order by date)
from Table1;
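With an empty OVER (), the database decides which row counts as "first", so first_value() is only reliable when the window has an ORDER BY. The sketch below again uses Python's sqlite3 as a stand-in for PostgreSQL, and it assumes the intended "first" row is the one with the earliest date. It inserts the rows out of date order to show that the ordered window still picks the earliest date. It requires SQLite 3.25 or newer.

```python
import sqlite3

def days_from_first_value(rows):
    """Map each id to the number of days since the earliest date, using first_value()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    # ORDER BY inside OVER decides which row is "first". The default frame runs from the
    # start of the partition to the current row, so it always contains that earliest row.
    query = """
        SELECT id,
               CAST(julianday(date) - julianday(first_value(date) OVER (ORDER BY date)) AS INTEGER)
        FROM table1
        ORDER BY id
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result

# Rows are deliberately inserted out of date order.
rows = [(2, "2015-05-22"), (0, "2015-05-02"), (1, "2015-05-05")]
print(days_from_first_value(rows))  # {0: 0, 1: 3, 2: 20}
```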


Question: Joining two data sets with date conditions

I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the data, including the most recent data, but is less accurate overall
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say we are April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Potentially a join function along with a date one?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, there is one row per month. You can extract the quarter from a date, keep it as an additional column, and then filter by it further down the line.
WITH
-- some sample data: date and amount ...
indata(dt,amount) AS (
SELECT DATE '2020-01-15', 234.45
UNION ALL SELECT DATE '2020-02-15', 344.45
UNION ALL SELECT DATE '2020-03-15', 345.45
UNION ALL SELECT DATE '2020-04-15', 346.45
UNION ALL SELECT DATE '2020-05-15', 347.45
UNION ALL SELECT DATE '2020-06-15', 348.45
UNION ALL SELECT DATE '2020-07-15', 349.45
UNION ALL SELECT DATE '2020-08-15', 350.45
UNION ALL SELECT DATE '2020-09-15', 351.45
UNION ALL SELECT DATE '2020-10-15', 352.45
UNION ALL SELECT DATE '2020-11-15', 353.45
UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
TIMESTAMPADD(
QUARTER
, CAST(EXTRACT(QUARTER FROM dt) AS INTEGER)-1
, TRUNC(dt,'YEAR')
)
AS DATE
) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...
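The answer above stops at tagging each row with its quarter. One way to finish the original task is a UNION ALL with a quarter-based cutoff: Data Set A supplies everything up to the end of its latest quarter, and Data Set B supplies everything after that. The sketch below ports this to SQLite through Python's sqlite3 module. The table names `data_set_a`/`data_set_b`, the `source` column, and the sample rows are all hypothetical. SQLite's date() modifiers take the place of the answer's TIMESTAMPADD/TRUNC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_set_a (dt TEXT, amount REAL);  -- accurate, refreshed quarterly
    CREATE TABLE data_set_b (dt TEXT, amount REAL);  -- complete but less accurate
    INSERT INTO data_set_a VALUES
        ('2020-01-15', 234.45), ('2020-02-15', 344.45), ('2020-03-15', 345.45);
    INSERT INTO data_set_b VALUES
        ('2020-01-15', 230.00), ('2020-02-15', 340.00), ('2020-03-15', 340.00),
        ('2020-04-01', 100.00), ('2020-04-15', 346.45);
""")

QUERY = """
WITH cutoff AS (
    -- First day of the quarter after the latest quarter present in data_set_a:
    -- go to the start of the month, back up to the quarter's first month, then add 3 months.
    SELECT date(max(dt), 'start of month',
                printf('-%d months', (CAST(strftime('%m', max(dt)) AS INTEGER) - 1) % 3),
                '+3 months') AS next_qtr
    FROM data_set_a
)
SELECT dt, amount, 'A' AS source FROM data_set_a
UNION ALL
SELECT b.dt, b.amount, 'B' FROM data_set_b AS b, cutoff WHERE b.dt >= cutoff.next_qtr
ORDER BY dt
"""
for row in conn.execute(QUERY):
    print(row)
```

Deriving the cutoff from the quarter, and not from max(dt) directly, keeps late-March rows of Data Set B out even when Data Set A's last row falls before the quarter's final day.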

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data grouped by periods running from the 24th of one month to the 24th of the next, within a given time range such as 2017/7/20 to 2017/10/25 (as above).
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
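WITH cte AS (
    SELECT
        data,
        -- a date after the 24th belongs to the next period; DATEADD also rolls December into January
        CASE WHEN DAY(datetime) > 24
             THEN DATEADD(month, 1, datetime) ELSE datetime END AS period_dt
    FROM yourTable
),
periods AS (
    SELECT
        data,
        -- each period ends on the 24th of period_dt's month ('YYYYMMDD' converts unambiguously)
        CAST(CONVERT(varchar(4), YEAR(period_dt))
             + RIGHT('0' + CONVERT(varchar(2), MONTH(period_dt)), 2)
             + '24' AS date) AS period_end
    FROM cte
)
SELECT
    -- style 111 is yyyy/mm/dd; a period starts the day after the previous month's 24th
    CONVERT(varchar(10), DATEADD(day, 1, DATEADD(month, -1, period_end)), 111) + '~' +
    CONVERT(varchar(10), period_end, 111) AS datetime_range,
    SUM(data) AS data_sum
FROM periods
GROUP BY period_end
ORDER BY period_end;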
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
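You can check the "shifted month" idea against the question's sample rows with this SQLite port, run through Python's sqlite3 module. It is a sketch, not the SQL Server code: the table name and ISO date strings are assumptions, and the period label is simplified to the period's closing 24th.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (dt TEXT, data REAL);
    INSERT INTO yourTable VALUES
        ('2017-08-24', 6.0), ('2017-08-25', 5.0),
        ('2017-09-24', 6.0), ('2017-09-25', 6.2),
        ('2017-10-24', 8.1), ('2017-10-25', 8.2);
""")

QUERY = """
WITH cte AS (
    SELECT data,
           -- Days after the 24th belong to the next period. 'start of month' comes first
           -- because SQLite's '+1 month' on e.g. Jan 31 normalizes into March.
           CASE WHEN CAST(strftime('%d', dt) AS INTEGER) > 24
                THEN date(dt, 'start of month', '+1 month')
                ELSE dt END AS period_dt
    FROM yourTable
)
SELECT strftime('%Y-%m', period_dt) || '-24' AS period_end,
       round(SUM(data), 2) AS data_sum
FROM cte
GROUP BY period_end
ORDER BY period_end
"""
for row in conn.execute(QUERY):
    print(row)
```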
I would suggest dynamically building some date range rows so that you can then join your data to them for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
    period_start_dt, period_end_dt, sum(1) as your_data_here
from (
    select
        dateadd(month, m.n, start_dt) period_start_dt
      , dateadd(month, m.n + 1, start_dt) period_end_dt
    from (
        select @start_dt start_dt ) seed
    cross join (
        select 0 n union all
        select 1 union all
        select 2 union all
        select 3 union all
        select 4 union all
        select 5 union all
        select 6 union all
        select 7 union all
        select 8 union all
        select 9 union all
        select 10 union all
        select 11
    ) m
) r
-- LEFT JOIN yourdata
--   ON yourdata.date >= r.period_start_dt AND yourdata.date < r.period_end_dt
group by
    period_start_dt, period_end_dt
Please don't be tempted to use BETWEEN when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count rows, because BETWEEN includes both the lower and the upper boundary.
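The double-counting warning can be demonstrated with two adjacent periods and a row that falls exactly on their shared boundary. This is a small sketch using SQLite through Python's sqlite3 module. The tables and values are hypothetical, and the join conditions are the two forms the answer compares.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE periods (period_start_dt TEXT, period_end_dt TEXT);
    INSERT INTO periods VALUES ('2017-08-24', '2017-09-24'), ('2017-09-24', '2017-10-24');
    CREATE TABLE yourdata (date TEXT, data REAL);
    -- '2017-09-24' sits exactly on the shared boundary of the two periods
    INSERT INTO yourdata VALUES ('2017-09-10', 1.0), ('2017-09-24', 2.0), ('2017-10-01', 4.0);
""")

def totals(join_condition):
    """Sum yourdata per period using the given join condition (trusted SQL text)."""
    query = f"""
        SELECT r.period_start_dt, SUM(yourdata.data)
        FROM periods AS r
        LEFT JOIN yourdata ON {join_condition}
        GROUP BY r.period_start_dt
        ORDER BY r.period_start_dt
    """
    return [total for _, total in conn.execute(query)]

half_open = totals("yourdata.date >= r.period_start_dt AND yourdata.date < r.period_end_dt")
inclusive = totals("yourdata.date BETWEEN r.period_start_dt AND r.period_end_dt")
print(half_open)   # [1.0, 6.0]: every row counted once
print(inclusive)   # [3.0, 6.0]: the boundary row is counted in both periods
```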
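I think the simplest way is to subtract 24 days (which maps the 25th of any month to the 1st) and aggregate by the resulting year and month:
select year(dateadd(day, -24, datetime)) as yr,
       month(dateadd(day, -24, datetime)) as mon,
       sum(data) as data_sum
from t
group by year(dateadd(day, -24, datetime)),
         month(dateadd(day, -24, datetime));
You can format yr and mon to get the dates for the specific ranges (each yr/mon pair identifies the period that starts on the 25th of that month), but this does the aggregation, and the yr/mon columns might be sufficient.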
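A quick Python check of the date arithmetic shows why the shift is 24 days, assuming periods run from the 25th to the 24th as in the first answer. Subtracting N days sends day N+1 of a month to day 1 of the same month, and sends day N to the last day of the previous month, whatever that month's length. `period_key` is a hypothetical helper that mirrors the GROUP BY key.

```python
from datetime import date, timedelta

def period_key(d, shift_days=24):
    """Year and month of d after subtracting shift_days, i.e. the query's GROUP BY key."""
    shifted = d - timedelta(days=shift_days)
    return shifted.year, shifted.month

# The 24th lands in the previous month's group and the 25th starts a new one,
# including across a short February and a year boundary.
for d in [date(2017, 9, 24), date(2017, 9, 25), date(2017, 3, 24), date(2017, 3, 25), date(2018, 1, 10)]:
    print(d, period_key(d))
```

With a 25-day shift, the 25th would still fall in the earlier group, so periods would run from the 26th to the 25th.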
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years (200 years is only about 73,000 rows), they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
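The steps above can be sketched in SQLite through Python's sqlite3 module. The column names, the date range, and the "fiscal month ends on the 24th" rule are assumptions chosen to match the question. The fiscal month is stored as a real column, following the advice against computed columns.

```python
import sqlite3
from datetime import date, timedelta

def fiscal_month(d):
    """Hypothetical rule: a fiscal month ends on the 24th, so days 25-31 roll into the next month."""
    if d.day > 24:
        year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    else:
        year, month = d.year, d.month
    return f"{year:04d}-{month:02d}"

def build_calendar(conn, start, end):
    """Step 0: create and fill a calendar table with one row per day from start to end inclusive."""
    conn.execute("""
        CREATE TABLE calendar (
            cal_date     TEXT PRIMARY KEY,  -- ISO-8601 date
            year         INTEGER,
            month        INTEGER,
            quarter      INTEGER,
            day_of_week  INTEGER,           -- 0 = Monday, as in Python's date.weekday()
            is_weekend   INTEGER,           -- 1 for Saturday/Sunday
            fiscal_month TEXT               -- stored, not computed
        )""")
    rows = []
    d = start
    while d <= end:
        rows.append((d.isoformat(), d.year, d.month, (d.month - 1) // 3 + 1,
                     d.weekday(), int(d.weekday() >= 5), fiscal_month(d)))
        d += timedelta(days=1)
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
build_calendar(conn, date(2017, 7, 1), date(2017, 12, 31))
conn.executescript("""
    CREATE TABLE yourdata (dt TEXT, data REAL);
    INSERT INTO yourdata VALUES
        ('2017-08-24', 6.0), ('2017-08-25', 5.0), ('2017-09-24', 6.0),
        ('2017-09-25', 6.2), ('2017-10-24', 8.1), ('2017-10-25', 8.2);
""")

# Step 1: join on the calendar table. Step 2: group by a column of the calendar table.
QUERY = """
    SELECT c.fiscal_month, round(SUM(d.data), 2)
    FROM yourdata AS d
    JOIN calendar AS c ON c.cal_date = d.dt
    GROUP BY c.fiscal_month
    ORDER BY c.fiscal_month
"""
print(conn.execute(QUERY).fetchall())
```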

Filling Out & Filtering Irregular Time Series Data

Using PostgreSQL 9.4, I am trying to craft a query on time-series log data that records a new value whenever the value updates (not on a schedule). The log can update anywhere from several times a minute to once a day.
I need the query to accomplish the following:
Thin out excess data by selecting only the first entry in each timestamp range (e.g. each hour)
Fill in sparse data by carrying forward the last reading of the log value. For example, if I group the data by hour and there is an entry at 8am with a log value of 10, and the next entry isn't until 11am with a log value of 15, I would want the query to return something like this:
Timestamp | Value
2015-07-01 08:00 | 10
2015-07-01 09:00 | 10
2015-07-01 10:00 | 10
2015-07-01 11:00 | 15
I have got a query that accomplishes the first of these goals:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
)
select
time_range.hour,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;
But I can't figure out how to fill in the nulls where there is no value. I tried using PostgreSQL's lag() window function, but it didn't work when there were multiple nulls in a row.
Here's a SQLFiddle that demonstrates the issue:
http://sqlfiddle.com/#!15/f4d13/5/0
Your columns are log_hour and first_value:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
),
base as (
select
time_range.hour lh,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1)
SELECT
log_hour, log_val, value_partition, first_value(log_val) over (partition by value_partition order by log_hour)
FROM (
SELECT
date_trunc('hour', base.lh) as log_hour,
log_val,
sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
FROM base) as q
UPDATE
This is what your query returns:
Timestamp | Value
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null
2015-07-01 04:00 | 15
2015-07-01 05:00 | null
2015-07-01 06:00 | 19
2015-07-01 08:00 | 13
I want this result set to be split into groups like this:
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null

2015-07-01 04:00 | 15
2015-07-01 05:00 | null

2015-07-01 06:00 | 19

2015-07-01 08:00 | 13
and to assign every row in a group the value of the first row of that group (done by the last SELECT).
In this case, one way to obtain the grouping is to create a column that holds the number of
non-null values counted up to the current row, and to split by that value (the sum(case ...) column):
| value | sum(case) |
| 10    | 1 |
| null  | 1 |
| null  | 1 |
| 15    | 2 | <-- new not null, increment
| null  | 2 |
| 19    | 3 | <-- new not null, increment
| 13    | 4 | <-- new not null, increment
and now I can partition by sum(case).
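Here is the whole approach applied to the question's example (10 at 8am, 15 at 11am), ported to SQLite through Python's sqlite3 module as a sketch. A recursive CTE stands in for PostgreSQL's generate_series(), strftime() stands in for date_trunc('hour', ...), and the sample readings are hypothetical. It requires SQLite 3.25 or newer for the window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_series (time_stamp TEXT, log_val INTEGER);
    -- two readings in the 08:00 hour (only the first should survive) and one at 11:15
    INSERT INTO time_series VALUES
        ('2015-07-01 08:05:00', 10),
        ('2015-07-01 08:40:00', 12),
        ('2015-07-01 11:15:00', 15);
""")

QUERY = """
WITH RECURSIVE time_range(hour) AS (
    -- stand-in for generate_series(): one row per hour from 08:00 to 11:00
    SELECT '2015-07-01 08:00:00'
    UNION ALL
    SELECT datetime(hour, '+1 hour') FROM time_range WHERE hour < '2015-07-01 11:00:00'
),
ranked_logs AS (
    SELECT strftime('%Y-%m-%d %H:00:00', time_stamp) AS log_hour,
           log_val,
           rank() OVER (PARTITION BY strftime('%Y-%m-%d %H:00:00', time_stamp)
                        ORDER BY time_stamp) AS rnk
    FROM time_series
),
base AS (
    SELECT t.hour AS lh, r.log_val
    FROM time_range AS t
    LEFT JOIN ranked_logs AS r ON r.log_hour = t.hour AND r.rnk = 1
),
grouped AS (
    -- running count of non-null values: it increments exactly where a new reading appears
    SELECT lh, log_val,
           sum(CASE WHEN log_val IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY lh) AS value_partition
    FROM base
)
SELECT lh, first_value(log_val) OVER (PARTITION BY value_partition ORDER BY lh) AS filled
FROM grouped
ORDER BY lh
"""
for row in conn.execute(QUERY):
    print(row)
```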

SQL Server get number of days from date range excluding certain days from specific date range

I am using SQL Server 2008 R2.
I have a table in database with records as shown below :
Id | Status | UserId | StatusDate | ProgramStartDate
1 | Active |1 | 2014-04-02 00:00:00.000 | 2014-03-23
2 | Inactive |1 | 2014-04-05 00:00:00.000 | NULL
3 | Pause |1 | 2014-04-07 00:00:00.000 | NULL
4 | Inactive |1 | 2014-04-10 00:00:00.000 | NULL
5 | Active |1 | 2014-04-14 00:00:00.000 | NULL
ProgramStartDate is a date entered by the user, while StatusDate is the actual date-time at which the user inserted or updated their status.
Now, I want to count the number of days from ProgramStartDate (2014-03-23) to Today's date (GETDATE()) excluding the number of days in which user was in Inactive status.
Here, the user is active from ProgramStartDate 2014-03-23 to 2014-04-05 (13 days), from 2014-04-07 to 2014-04-10 (3 days), and from 2014-04-14 to GETDATE() (9 days).
So total number of active days = 13 + 3 + 9 = 25 days.
The calculation works like this:
'2014/03/23' '2014/04/05' 13
'2014/04/05' '2014/04/07' -2
'2014/04/07' '2014/04/10' 3
'2014/04/10' '2014/04/14' -4
'2014/04/14' GetDate() 9
and total = 25 days.
Is there any way to achieve this Total Number of Days by SQL query?
Here is a solution for your query:
Select SUM(TDays) SumDays
From (
    Select Id, Status, UserId,
        Case When (Status = 'Inactive') Then 0 Else
            DATEDIFF(DAY, StatusDate, (Case When (NextDate IS NULL) Then GETDATE() Else NextDate End))
        End TDays
    From (
        Select Id, Status, UserId,
            Case When (ProgramStartDate IS NOT NULL) Then ProgramStartDate Else StatusDate End StatusDate,
            -- the next status change of the same user ends the current span
            (Select Min(StatusDate) From StatusMast M
             Where M.UserId = S.UserId And M.StatusDate > S.StatusDate) NextDate
        From StatusMast S
    ) As Stat
) As TotDay
Output:
SumDays
25
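On SQL Server 2012 or later, the correlated subquery for NextDate could be replaced by LEAD(), which is not available in 2008 R2. This sketch shows that variant in SQLite through Python's sqlite3 module, using the question's rows. GETDATE() is replaced by a fixed, hypothetical as-of date of 2014-04-23, which is the date implied by the question's "9 days". It requires SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StatusMast (Id INTEGER, Status TEXT, UserId INTEGER,
                             StatusDate TEXT, ProgramStartDate TEXT);
    INSERT INTO StatusMast VALUES
        (1, 'Active',   1, '2014-04-02 00:00:00.000', '2014-03-23'),
        (2, 'Inactive', 1, '2014-04-05 00:00:00.000', NULL),
        (3, 'Pause',    1, '2014-04-07 00:00:00.000', NULL),
        (4, 'Inactive', 1, '2014-04-10 00:00:00.000', NULL),
        (5, 'Active',   1, '2014-04-14 00:00:00.000', NULL);
""")

QUERY = """
WITH spans AS (
    SELECT UserId, Status,
           -- the first status starts at ProgramStartDate when one is given
           COALESCE(ProgramStartDate, StatusDate) AS span_start,
           -- the next status change of the same user ends this span; the last span runs to :as_of
           COALESCE(LEAD(StatusDate) OVER (PARTITION BY UserId ORDER BY StatusDate), :as_of) AS span_end
    FROM StatusMast
)
SELECT UserId,
       SUM(CASE WHEN Status = 'Inactive' THEN 0
                ELSE CAST(julianday(date(span_end)) - julianday(date(span_start)) AS INTEGER)
           END) AS active_days
FROM spans
GROUP BY UserId
"""
print(conn.execute(QUERY, {"as_of": "2014-04-23"}).fetchall())  # [(1, 25)]
```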

SQL: earliest date from set of date fields

I have a series of dates associated with a unique identifier in a table. For example:
1 | 1999-04-01 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2008-12-01 |
2 | 1999-04-06 | 2000-04-01 | 0000-00-00 | 0000-00-00 | 2010-04-03 |
3 | 1999-01-09 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2007-09-03 |
4 | 1999-01-01 | 0000-00-00 | 1997-01-01 | 0000-00-00 | 2002-01-04 |
Is there a way to select the earliest date from a predefined list of DATE fields using a straightforward SQL command?
So the expected output would be:
1 | 1999-04-01
2 | 1999-04-06
3 | 1999-01-09
4 | 1997-01-01
I am guessing this is not possible but I wanted to ask and make sure. My current solution in mind involves putting all the dates in a temporary table and then using that to get the MIN()
thanks
Edit: The problem with using LEAST() as suggested is that its new behaviour is to return NULL if any of the columns is NULL. In a series of dates like the dataset in question, any date might be NULL. I would like to obtain the earliest actual date from the set of dates.
SOLUTION: Used a combination of LEAST() and IF() in order to filter out NULL dates.
SELECT LEAST( IF(date1=0,NOW(),date1), IF(date2=0,NOW(),date2), [...] );
Lessons learnt a) COALESCE does not treat '0000-00-00' as a NULL date, b) LEAST will return '0000-00-00' as the smallest value - I would guess this is due to internal integer comparison(?)
select id, least(date_col_a, date_col_b, date_col_c) from table
Update:
select id, least (
case when date_col_a = '0000-00-00' then now() + interval 100 year else date_col_a end,
case when date_col_b = '0000-00-00' then now() + interval 100 year else date_col_b end) from table
Actually, you can do it as shown below, with a large CASE structure, or with least(date1, date2, dateN); but with LEAST, a single NULL column makes the whole result NULL...
select rowid, min(dt)
from
( select rowid, date1 as dt from table
  union all
  select rowid, date2 from table
  union all
  select rowid, date3 from table
  /* and so on */
) as d
where dt <> '0000-00-00'   -- MIN() already ignores NULLs; zero dates must be excluded explicitly
group by rowid;
HTH
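You can check the unpivot-then-MIN() approach against the question's rows using SQLite through Python's sqlite3 module. This is a sketch: the table name `dates` and the column names are hypothetical. One '0000-00-00' in row 3 is replaced by a NULL to show that both kinds of "empty" date are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (id INTEGER, date1 TEXT, date2 TEXT, date3 TEXT, date4 TEXT, date5 TEXT);
    INSERT INTO dates VALUES
        (1, '1999-04-01', '0000-00-00', '0000-00-00', '0000-00-00', '2008-12-01'),
        (2, '1999-04-06', '2000-04-01', '0000-00-00', '0000-00-00', '2010-04-03'),
        (3, '1999-01-09', '0000-00-00', '0000-00-00', NULL,         '2007-09-03'),
        (4, '1999-01-01', '0000-00-00', '1997-01-01', '0000-00-00', '2002-01-04');
""")

QUERY = """
SELECT id, MIN(dt) AS earliest
FROM (
    -- one row per (id, date column) pair
    SELECT id, date1 AS dt FROM dates
    UNION ALL SELECT id, date2 FROM dates
    UNION ALL SELECT id, date3 FROM dates
    UNION ALL SELECT id, date4 FROM dates
    UNION ALL SELECT id, date5 FROM dates
) AS d
-- NULL <> '0000-00-00' is not true, so NULLs drop out here as well
WHERE dt <> '0000-00-00'
GROUP BY id
ORDER BY id
"""
print(conn.execute(QUERY).fetchall())
```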
select
id,
least(coalesce(date1, '9999-12-31'), ....)
from
table
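The sentinel idea (COALESCE to '9999-12-31', or now() plus 100 years in the earlier answer) only covers NULLs. As the question's "lessons learnt" point out, '0000-00-00' also has to be mapped away. This sketch combines both in SQLite through Python's sqlite3 module. SQLite's multi-argument min() plays the role of MySQL's LEAST(), and the table, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (id INTEGER, date1 TEXT, date2 TEXT, date3 TEXT);
    INSERT INTO dates VALUES
        (1, '1999-04-01', '0000-00-00', NULL),
        (4, '1999-01-01', '1997-01-01', '0000-00-00'),
        (5, '0000-00-00', NULL,         NULL);
""")

def clean(col):
    """Map both NULL and the zero date to a far-future sentinel so they never win."""
    return f"CASE WHEN {col} IS NULL OR {col} = '0000-00-00' THEN '9999-12-31' ELSE {col} END"

QUERY = f"""
SELECT id,
       -- NULLIF turns an all-sentinel result (no real date at all) back into NULL
       NULLIF(min({clean('date1')}, {clean('date2')}, {clean('date3')}), '9999-12-31') AS earliest
FROM dates
ORDER BY id
"""
print(conn.execute(QUERY).fetchall())  # [(1, '1999-04-01'), (4, '1997-01-01'), (5, None)]
```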