Postgres SQL Join on Nearest less than quarter end - sql

I have table 1
ID | public_date
1 | 1992-06-03
2 | 2000-12-15
Table 2 is a series of the quarter end dates in a range
Date
1995-12-31
1996-03-31
..
..
2000-12-31
I would like to have the result table as
ID | date | public_date
1 | 1995-12-31 | 1992-06-03
1 | 1996-03-31 | 1992-06-03
1 | 1996-06-30 | 1992-06-03
...
...
2 | 2000-12-31 | 2000-12-15
Basically, assign the public date to the nearest quarter end date. Currently, I have this query
SELECT DISTINCT ON (x."date")
x."date", r.public_date
FROM quarter_end_series as x
LEFT JOIN public_time r ON r.public_date <= x."date"
where x.date >= '1995-12-31 00:00:00'
ORDER BY x."date", r.public_date desc;
But this query took 4 hours. Is there any way to do it more efficiently?

Try a subquery:
select pt.*,
       (select qes.date
        from quarter_end_series qes
        where qes.date <= pt.public_date
        order by qes.date desc
        limit 1
       ) as quarter_end_date
from public_time pt;
Include an index on quarter_end_series(date).
This avoids sorting a large amount of data, which should make the query faster.
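The question asks for one row per quarter end, each carrying the latest public_date on or before it. That direction can also be written as a lateral join. This is a sketch assuming PostgreSQL 9.5 or later and the table and column names from the question; the index name is hypothetical.

```sql
-- Hypothetical index name; lets each lookup read the index backwards
-- and stop at the first match instead of sorting the whole join.
CREATE INDEX IF NOT EXISTS public_time_public_date_idx
    ON public_time (public_date);

SELECT x."date", r.id, r.public_date
FROM quarter_end_series AS x
LEFT JOIN LATERAL (
    -- Latest public_date that is not after this quarter end.
    SELECT pt.id, pt.public_date
    FROM public_time pt
    WHERE pt.public_date <= x."date"
    ORDER BY pt.public_date DESC
    LIMIT 1
) AS r ON true
WHERE x."date" >= DATE '1995-12-31'
ORDER BY x."date";
```

The LEFT JOIN ... ON true keeps quarter ends that have no earlier public_date; they come back with NULLs.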

I guess your quarters are fixed for each year. Like:
1995-12-31
1996-03-31
1996-06-30
1996-09-30
1996-12-31
.... and so on
If so, just find the closest date among the fixed quarter dates.
If quarter_end_series does not contain the same dates for each year, you can try a subquery instead of a join, like below:
SELECT DISTINCT ON (x."date")
x."date", (SELECT r.public_date FROM public_time r ORDER BY abs(x."date"::date - r.public_date::date) ASC LIMIT 1) as public_date
FROM quarter_end_series as x
where x.date >= '1995-12-31 00:00:00'
ORDER BY x."date";

Related

Add rows between two dates Presto

I have a table that has 3 columns: start, end and emp_num. I want to generate a new table which has all dates between these dates for every employee. I need to use Presto.
I referred to this link: inserting dates into a table between a start and end date in Presto
I tried using the unnest function with a sequence, but I don't know how to create the sequence by pulling the dates from two columns in another table.
select unnest(seq) as t(days)
from (select sequence(start, end, interval '1' day) as seq
from table1)
Here's table and expected format
Table 1:
start | end | emp_num
2018/01/01 | 2018/01/05 | 1
2019/02/01 | 2019/02/05 | 2
Expected:
start | emp_num
2018/01/01 | 1
2018/01/02 | 1
2018/01/03 | 1
2018/01/04 | 1
2018/01/05 | 1
2019/02/01 | 2
2019/02/02 | 2
2019/02/03 | 2
2019/02/04 | 2
2019/02/05 | 2
Here is a query that might get the job done for your use case.
The logic is to use the Presto sequence() function to generate a wide date range (here from 2000-01-01 to 2018-01-31; adapt the bounds so that they cover your data), which can then be joined with the table to generate the output.
select dt.x, emp_num
from
( select x from unnest(sequence(date '2000-01-01', date '2018-01-31')) t(x) ) dt
inner join table1 ta on dt.x >= ta.start and dt.x <= ta.end
However, as JNevill commented, it would be more efficient to create a calendar table rather than generating it on the fly every time the query runs.
It should be as simple as:
create table calendar as
select x from unnest(sequence(date '1970-01-01', date '2099-01-01')) t(x);
And then your query would become :
select dt.x, emp_num
from
calendar dt
inner join table1 ta on dt.x >= ta.start and dt.x <= ta.end
PS: due to the lack of DB Fiddles for Presto in the wild, I could not test the queries (@PiotrFindeisen - if you happen to read this - a Presto fiddle would be nice to have!).
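The asker's original idea of building the sequence from the two columns of each row can also be made to work by unnesting per row. This is an untested sketch assuming start and end are DATE columns in table1 (cast them first if they are stored as strings); "end" is quoted because END is a reserved word, and the alias dt is made up for illustration.

```sql
-- One output row per day between start and end, per employee.
SELECT t.dt AS start, ta.emp_num
FROM table1 ta
CROSS JOIN UNNEST(
    sequence(ta."start", ta."end", INTERVAL '1' DAY)
) AS t(dt)
ORDER BY ta.emp_num, t.dt;
```

This only generates the days each row actually needs, so no wide fixed date range or calendar table is required.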

Calculating cumulative sum with date filtering in PostgreSQL

I have table users with the following values:
id | created_at
-------+---------------------
20127 | 2015-01-31 04:23:46
21468 | 2015-02-04 07:50:34
21571 | 2015-02-04 08:23:50
20730 | 2015-03-12 10:20:16
19955 | 2015-03-30 07:44:35
20148 | 2015-04-17 13:03:26
21552 | 2015-05-07 19:00:00
20145 | 2015-06-02 03:12:46
21467 | 2015-06-03 13:21:51
21074 | 2015-07-03 19:00:00
I want to:
find the cumulative sum for number of users over time (return count of users for every day in the date range, not just for the days that exist in the database)
be able to filter that sum by date, so if I put the date that is after some row, that row should be included in the cumulative sum (everything before the range specified should be included in the first sum, it shouldn't start counting from 0 at the beginning of the range specified)
return results grouped by each day in epoch format
I'm trying to achieve this with the following SQL:
SELECT extract(epoch from created_at)::bigint,
sum(count(id)::integer) OVER (ORDER BY created_at)
FROM data_users
WHERE created_at IS NOT NULL
GROUP BY created_at
But it's not working as expected since I can't add filtering by date here, without excluding records from the cumulative sum. Also it doesn't take into account days that have been missed (those for which the users don't exist).
Any help greatly appreciated.
As far as I understand your question a simple query with GROUP BY should be enough. You can use a left outer join with GENERATE_SERIES() to get all dates in the range. If you have the start and end date of the range, you can use this:
SELECT EXTRACT(EPOCH FROM d)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM GENERATE_SERIES(start, end, '1 DAY'::INTERVAL) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d
GROUP BY 1 ORDER BY 1
You can determine start and end from your table, too:
SELECT EXTRACT(EPOCH FROM d.date)::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
(SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;
This returns:
date_part | coalesce
------------+----------
1422662400 | 1
1422748800 | 0
1422835200 | 0
1422921600 | 0
1423008000 | 2
1423094400 | 0
1423180800 | 0
...
1435536000 | 0
1435622400 | 0
1435708800 | 0
1435795200 | 0
1435881600 | 1
With this query you can get the cumulative sum for the rows before a start date:
SELECT EXTRACT(EPOCH FROM GREATEST(d.date, start))::BIGINT, COALESCE(COUNT(u.id), 0)
FROM
(SELECT GENERATE_SERIES(MIN(created_at)::DATE, MAX(created_at)::DATE, '1 DAY'::INTERVAL) AS date
FROM data_users) d
LEFT OUTER JOIN data_users u ON u.created_at::DATE = d.date::DATE
GROUP BY 1 ORDER BY 1;
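The queries above return per-day counts. To turn those into a running total that already includes users created before the requested range, one option is to compute the total over the full series first and filter afterwards. This is a minimal sketch assuming PostgreSQL and the data_users table from the question; the two date literals are hypothetical range bounds.

```sql
WITH daily AS (
    -- One row per calendar day, starting at the first user ever created
    -- so that earlier users are counted before the range filter applies.
    SELECT d::date AS day, count(u.id) AS new_users
    FROM generate_series(
             (SELECT min(created_at)::date FROM data_users),
             DATE '2015-07-03',              -- hypothetical range end
             INTERVAL '1 day') AS d
    LEFT JOIN data_users u ON u.created_at::date = d::date
    GROUP BY d
),
running AS (
    -- Running total over every day, including days with no new users.
    SELECT day, sum(new_users) OVER (ORDER BY day) AS total_users
    FROM daily
)
SELECT extract(epoch FROM day)::bigint AS day_epoch, total_users
FROM running
WHERE day >= DATE '2015-03-01'               -- hypothetical range start
ORDER BY day;
```

Because the WHERE clause is applied after the window function has run, the first row of the filtered result carries the count of all earlier users instead of starting from 0.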

How do I apply a function to each subgroup of a table in SQL

I want to find the minimum value of a column in a certain date range of a table.
So let's say I have a table like the following:
Date | Value
---------------
01-26 | 2
01-26 | 1
01-27 | 2
01-27 | 4
01-28 | 3
01-28 | 5
How can I apply the MIN() function to the subgroup of the Value column so that the result might be
Date | MIN(Value)
---------------
01-26 | 1
01-27 | 2
01-28 | 3
I thought about GROUP BY .. or such but couldn't figure out how to get the results into a table.
Using UNION and JOIN isn't quite scalable because the query could be using a date range of a month
Group by should work:
Select date, min( value )
From table1
Group by date
Maybe too simple, but seems like this would work
Select Min(col1), datecol from yourtable group by datecol;
HTH

How to get sum of one day and sum of last three days in single query?

Suppose I have a statistical table like this:
date | stats
-------------
10/1 | 2
10/1 | 3
10/1 | 2
10/2 | 1
10/3 | 3
10/3 | 2
10/4 | 1
10/4 | 1
What I want is three columns:
Date
sum(stats) of Date
sum(stats) of last three days before Date
I know I can use window function to handle the 2nd column, but I cannot handle 2nd and 3rd at the same time.
What should I do to achieve this?
Thanks!
You can use aggregation and window functions:
select date, sum(stats) as day_stats,
sum(sum(stats)) over (order by date rows between 3 preceding and 1 preceding) as day_stats_3
from t
group by date
order by date;
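One caveat with the ROWS frame above: it counts rows of the grouped result, not calendar days, so a gap in the dates shifts the window. A hedged variant, assuming PostgreSQL 11 or later (where RANGE frames accept an interval offset) and that date is a DATE column:

```sql
SELECT date,
       sum(stats) AS day_stats,
       -- Sum of the daily totals for the three calendar days before this date.
       sum(sum(stats)) OVER (
           ORDER BY date
           RANGE BETWEEN INTERVAL '3 days' PRECEDING
                     AND INTERVAL '1 day' PRECEDING
       ) AS day_stats_3
FROM t
GROUP BY date
ORDER BY date;
```

For the first date in the table the frame is empty, so day_stats_3 is NULL there.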
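You can also use a correlated subquery. Note that this version sums the current day plus the two days before it, and it assumes a database where subtracting an integer from a date yields a date (for example PostgreSQL or Oracle):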
SELECT s.date,sum(s.stats) as today_sum,
(SELECT sum(t.stats) FROM YourTable t
where t.date between s.date - 2 and s.date) as sum_3days
FROM YourTable s
GROUP BY s.date

SQL to find the date when the price last changed

Input:
Date Price
12/27 5
12/21 5
12/20 4
12/19 4
12/15 5
Required Output:
The earliest date when the price was set in comparison to the current price.
For example, the price has been 5 since 12/21.
The answer cannot be 12/15, because we want the earliest date from which the price has stayed equal to the current price without changing (on 12/20 the price was 4).
This should be about right. You didn't provide table structures or names, so...
DECLARE @CurrentPrice MONEY
SELECT TOP 1 @CurrentPrice=Price FROM Table ORDER BY Date DESC
SELECT MIN(Date) FROM Table WHERE Price=@CurrentPrice AND Date>(
SELECT MAX(Date) FROM Table WHERE Price<>@CurrentPrice
)
In one query:
SELECT MIN(Date)
FROM Table
WHERE Date >
( SELECT MAX(Date)
FROM Table
WHERE Price <>
( SELECT TOP 1 Price
FROM Table
ORDER BY Date DESC
)
)
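Both versions above return NULL when the price has never changed, because the inner MAX(Date) is then NULL. A window-function variant avoids that case. This is a sketch assuming a database with LAG() (for example SQL Server 2012+, PostgreSQL, Oracle or MySQL 8+); the table name prices and the columns price_date and price are hypothetical.

```sql
-- A row starts a new price run when it has no predecessor
-- or its price differs from the previous row's price.
-- The latest such row is where the current price began.
SELECT MAX(s.price_date) AS price_since
FROM (
    SELECT p.price_date,
           p.price,
           LAG(p.price) OVER (ORDER BY p.price_date) AS prev_price
    FROM prices p
) s
WHERE s.prev_price IS NULL
   OR s.prev_price <> s.price;
```

On the sample input the run starts are 12/15, 12/19 and 12/21, so the query returns 12/21.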
This question kind of makes no sense, so I'm not 100% sure what you are after.
Create four columns: old_price, new_price, old_date, new_date.
If old_price === new_price, simply print the old_date.
What database server are you using? If it was Oracle, I would use their windowing function. Anyway, here is a quick version that works in mysql:
Here is the sample data:
+------------+------------+---------------+
| date | product_id | price_on_date |
+------------+------------+---------------+
| 2011-01-01 | 1 | 5 |
| 2011-01-03 | 1 | 4 |
| 2011-01-05 | 1 | 6 |
+------------+------------+---------------+
Here is the query (it only works if you have 1 product - will have to add a "and product_id = ..." condition on the where clause if otherwise).
SELECT p.date as last_price_change_date
FROM test.prices p
left join test.prices p2 on p.product_id = p2.product_id and p.date < p2.date
where p.price_on_date - p2.price_on_date <> 0
order by p.date desc
limit 1
In this case, it will return "2011-01-03".
Not a perfect solution, but I believe it works. Have not tested on a larger dataset, though.
Make sure to create indexes on date and product_id, as it will otherwise bring your database server to its knees and beg for mercy.
Bernardo.