Getting values for a varying date range in Postgresql


I am trying to get a maximum value from one table using a date range from another table, in SQL on a PostgreSQL database. The goal is to add to Table1, which holds the start and end dates (by year and area), the maximum value from Table2 that falls within each row's date range. I see this as two steps, outlined below.
Step One: Use the two date columns in Table1 to define a range. The table has these columns:
ID (integer(11))
Date1 (varchar(10))
Date2 (varchar(10))
Year (integer)
Area (varchar)
Here is some sample data from Table1:
ID Date1 Date2 Year Area
101 8/21/2000 11/20/2000 2000 5
102 7/31/2000 10/30/2000 2000 5
103 7/10/2000 10/9/2000 2000 6
104 7/10/2000 10/9/2000 2000 6
105 7/4/2000 10/3/2000 2000 6
106 7/10/2000 10/9/2000 2000 6
107 7/31/2000 10/30/2000 2000 7
108 7/31/2000 10/30/2000 2000 7
Step Two: Pull the maximum value from Table2 for the date range from Date1 to Date2 in each row of Table1. Table2 has these columns:
Date (varchar(12))
Area (varchar(11))
Value (varchar(6))
Here is some (very limited) sample data from Table2:
Date Area Value
8/2/2000 5 72.1
8/25/2000 5 68.4
9/14/2000 5 53.3
7/5/2000 6 47.9
8/1/2000 6 10.2
9/30/2000 6 11.6
8/5/2000 7 35.2
9/1/2000 7 45.4
So in the end I would like a modified Table1 that adds the Max_Value for the date range (pulled from Table2) and looks like this:
ID Date1 Date2 Year Area Max_Value
101 8/21/2000 11/20/2000 2000 5 68.4
102 7/31/2000 10/30/2000 2000 5 72.1
103 7/10/2000 10/9/2000 2000 6 11.6
104 7/10/2000 10/9/2000 2000 6 11.6
105 7/4/2000 10/3/2000 2000 6 47.9
106 7/10/2000 10/9/2000 2000 6 11.6
107 7/31/2000 10/30/2000 2000 7 45.4
108 7/31/2000 10/30/2000 2000 7 45.4
Thanks for any help in advance.

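You can do this in several ways. One method uses a join with an explicit aggregation. However, because you only want one column, I think a correlated subquery is simpler to code. Since the dates and values are stored as varchar, convert them before comparing, and match on area as well:
select t1.*,
(select max(t2.value::numeric)
from table2 t2
where t2.area = t1.area -- restrict to the same area
and to_date(t2.date, 'MM/DD/YYYY')
between to_date(t1.date1, 'MM/DD/YYYY') and to_date(t1.date2, 'MM/DD/YYYY')
) as max_value
from table1 t1;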
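The correlated-subquery idea can be checked against the sample data with a small sketch. This is not PostgreSQL: it uses Python's built-in sqlite3 module, converts the M/D/YYYY strings to ISO dates up front (so that text comparison is chronological), and matches on area in addition to the date range. Those choices are assumptions made for illustration only.

```python
import sqlite3
from datetime import datetime

def iso(us_date):
    """Convert an M/D/YYYY string to ISO YYYY-MM-DD so it sorts chronologically."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()

# Sample rows copied from the question.
table1 = [
    (101, "8/21/2000", "11/20/2000", 2000, "5"),
    (102, "7/31/2000", "10/30/2000", 2000, "5"),
    (103, "7/10/2000", "10/9/2000", 2000, "6"),
    (104, "7/10/2000", "10/9/2000", 2000, "6"),
    (105, "7/4/2000", "10/3/2000", 2000, "6"),
    (106, "7/10/2000", "10/9/2000", 2000, "6"),
    (107, "7/31/2000", "10/30/2000", 2000, "7"),
    (108, "7/31/2000", "10/30/2000", 2000, "7"),
]
table2 = [
    ("8/2/2000", "5", "72.1"), ("8/25/2000", "5", "68.4"), ("9/14/2000", "5", "53.3"),
    ("7/5/2000", "6", "47.9"), ("8/1/2000", "6", "10.2"), ("9/30/2000", "6", "11.6"),
    ("8/5/2000", "7", "35.2"), ("9/1/2000", "7", "45.4"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date1 TEXT, date2 TEXT, year INTEGER, area TEXT)")
conn.execute("CREATE TABLE table2 (date TEXT, area TEXT, value REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [(i, iso(d1), iso(d2), y, a) for i, d1, d2, y, a in table1],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(iso(d), a, float(v)) for d, a, v in table2],
)

# One correlated subquery per table1 row: max value in the same area and date range.
max_values = dict(conn.execute("""
    SELECT t1.id,
           (SELECT MAX(t2.value)
              FROM table2 t2
             WHERE t2.area = t1.area
               AND t2.date BETWEEN t1.date1 AND t1.date2) AS max_value
      FROM table1 t1
"""))

for row_id, max_value in sorted(max_values.items()):
    print(row_id, max_value)
```

Without the area condition, every row would see values from all areas, which does not match the desired output in the question.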

Related

What is the best way to aggregate data for last 7,30,60.. days in SQL

Hi, I have a table with a date and the number of views our channel had on that day:
date views
03/06/2020 5
08/06/2020 49
09/06/2020 50
10/06/2020 1
13/06/2020 1
16/06/2020 1
17/06/2020 102
23/06/2020 97
29/06/2020 98
07/07/2020 2
08/07/2020 198
12/07/2020 1
14/07/2020 168
23/07/2020 292
Now we want to see, for each calendar date, the sum of the past 7 and 30 days,
so the result will be:
date sum_of_7d sum_of_30d
01/06/2020 0 0
02/06/2020 0 0
03/06/2020 5 5
04/06/2020 5 5
05/06/2020 5 5
06/06/2020 5 5
07/06/2020 5 5
08/06/2020 54 54
09/06/2020 104 104
10/06/2020 100 105
11/06/2020 100 105
12/06/2020 100 105
13/06/2020 101 106
14/06/2020 101 106
15/06/2020 52 106
16/06/2020 53 107
17/06/2020 105 209
18/06/2020 105 209
So I was wondering what the best SQL is to get this.
I'm working on Redshift, and the actual table (not this example) includes over 40B rows.
I used to do something like this:
select dates_helper.date
, tbl1.cnt
-- note: "N preceding and current row" spans N + 1 rows
, sum(tbl1.cnt) over (order by date rows between 7 preceding and current row ) as sum_7d
, sum(tbl1.cnt) over (order by date rows between 30 preceding and current row ) as sum_30d
from bi_db.dates_helper
left join tbl1
on tbl1.invite_date = dates_helper.date
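To make the intended result concrete, here is a minimal Python sketch of trailing sums over a dense calendar, using the sample data above. It assumes a window of N calendar days that includes the current day, which is one possible reading of "past 7 days"; the names `views`, `trailing_sum`, and `rolling` are hypothetical. It only illustrates the logic and says nothing about performance on a 40B-row Redshift table.

```python
from datetime import date, timedelta

# Sample data from the question (dates are DD/MM/YYYY in the original).
views = {
    date(2020, 6, 3): 5, date(2020, 6, 8): 49, date(2020, 6, 9): 50,
    date(2020, 6, 10): 1, date(2020, 6, 13): 1, date(2020, 6, 16): 1,
    date(2020, 6, 17): 102, date(2020, 6, 23): 97, date(2020, 6, 29): 98,
    date(2020, 7, 7): 2, date(2020, 7, 8): 198, date(2020, 7, 12): 1,
    date(2020, 7, 14): 168, date(2020, 7, 23): 292,
}

def trailing_sum(day, window_days):
    """Sum views over the `window_days` calendar days ending on `day` (inclusive)."""
    return sum(
        views.get(day - timedelta(days=offset), 0)  # days without rows count as 0
        for offset in range(window_days)
    )

# Dense calendar, playing the role of the dates_helper table.
calendar = [date(2020, 6, 1) + timedelta(days=i) for i in range(18)]
rolling = {d: (trailing_sum(d, 7), trailing_sum(d, 30)) for d in calendar}

for d, (sum_7d, sum_30d) in rolling.items():
    print(d.strftime("%d/%m/%Y"), sum_7d, sum_30d)
```

Under this reading, the equivalent window frame over a gap-free calendar would cover 7 rows (6 preceding plus the current row) rather than 8.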

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id  user_name  user_joining_date
1        Steve      2013-01-04
2        Adam       2012-11-01
3        John       2013-05-05
4        Tony       2012-01-01
5        Dan        2010-01-01
6        Alex       2019-01-01
7        Kim        2019-01-01
bundle_dim
bundle_id  bundle_name            bundle_type  bundle_cost_per_day
101        movies and TV          prime        5.5
102        TV and sports          prime        6.5
103        Cooking                prime        7
104        Sports and news        prime        5
105        kids movie             extra        2
106        kids educative         extra        3.5
107        spanish news           extra        2.5
108        Spanish TV and sports  extra        3.5
109        Travel                 extra        2
plans_fact
user_id  bundle_id  bundle_start_date  bundle_end_date
1        101        2019-10-10         2020-10-10
2        107        2020-01-15         (null)
2        106        2020-01-15         2020-12-31
2        101        2020-01-15         (null)
2        103        2020-01-15         2020-02-15
1        101        2020-10-11         (null)
1        107        2019-10-10         2020-10-10
1        105        2019-10-10         2020-10-10
4        101        2021-01-01         2021-02-01
3        104        2020-02-17         2020-03-17
2        108        2020-01-15         (null)
4        102        2021-01-01         (null)
4        103        2021-01-01         (null)
4        108        2021-01-01         (null)
5        103        2020-01-15         (null)
5        101        2020-01-15         2020-02-15
6        101        2021-01-01         2021-01-17
6        101        2021-01-20         (null)
6        108        2021-01-01         (null)
7        104        2020-02-17         (null)
7        103        2020-01-17         2020-01-18
1        102        2020-12-11         (null)
2        106        2021-01-01         (null)
7        107        2020-01-15         (null)
Note: a NULL bundle_end_date means the subscription is still active.
A user's active days for a given bundle can be calculated as: bundle_end_date - bundle_start_date
Total revenue per user can be calculated as: total number of active days * bundle rate per day
I am looking to write a query to find the revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;

You need a 'year' table to help split each row that spans multiple years into its separate years. For each year, you also need to recalculate the start and end dates. That's what I do in the yearParsed cte in the code below. I hard-code the years into the join that creates y. You will probably do it differently, but however you get those values will work.
After that, sum pretty much as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic into the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is common, you should probably create a base table for your 'year' table. If it's not common, but for this report you don't want to keep coming back to hard-code the year information into the y table, you can do this:
-- table variables are declared with @ (a #name would be a temp table)
declare @yearTable table (
year int,
startDt char(10),
endDt char(10)
);
-- recursive cte: start at the earliest start year and count up to the current year
with y as (
select year = year(min(pf.bundle_start_date))
from plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert @yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.
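The year-splitting step described above can be sketched outside SQL as follows. This is a minimal Python illustration, not the answer's code: the function name `revenue_by_year` and its `today` parameter (standing in for getdate()) are hypothetical. Like the query, it clips each segment to 1 January and 31 December of the year and multiplies the day difference by the daily cost.

```python
from datetime import date

def revenue_by_year(start, end, cost_per_day, today):
    """Split one subscription into per-year segments and price each segment.

    `end` may be None for an active subscription, in which case `today` is used.
    """
    end = end or today
    result = {}
    for year in range(start.year, end.year + 1):
        # Clip the subscription to the calendar year, as the yearParsed cte does.
        seg_start = max(start, date(year, 1, 1))
        seg_end = min(end, date(year, 12, 31))
        result[year] = (seg_end - seg_start).days * cost_per_day
    return result

# User 1, bundle 101 (5.5 per day), 2019-10-10 to 2020-10-10.
print(revenue_by_year(date(2019, 10, 10), date(2020, 10, 10), 5.5, today=date(2021, 6, 30)))
```

One consequence of clipping at 31 December is that the day between 31 December and 1 January is not counted in either segment, so the per-year totals can add up to slightly less than the unsplit total.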

result is wrong when retrieving the date

I'm working with PostgreSQL. I have two database tables, and I want to get the min and max date stored in table1's daterange column, which is of type character varying. table1 and table2 are related through sid. I want the min and max daterange from table1 for the rows whose sid also appears in table2. The result I get is wrong.
table1:
sid daterange
100 5/25/2017
101 1/24/2017
102 4/4/2014
103 11/12/2007
104 4/24/2012
105 01/15/2017
106 1/1/2017
107 3/11/2016
108 10/10/2001
109 1/10/2016
110 12/12/2016
111 4/24/2017
112 06/28/2015
113 5/24/2017
114 5/22/2017
table2:
sid description
100 success
101 pending
104 pending
105 success
106 success
107 success
110 success
111 pending
112 failed
113 failed
114 pending
Below is my query:
select min(daterange) as minDate,max(daterange) as maxDate from (SELECT to_date(table1.daterange, 'DD/MM/YYYY') as daterange FROM table1,table2 where
table1.sid = table2.sid) tt;
The result is below, and it is wrong (the mindate and maxdate displayed are the wrong dates).
mindate maxdate
2013-12-07 2019-01-07
Please advise. The daterange column in table1 is of type character varying. I cannot use ::date to convert it to a date type, because I need to use this query in my Java Hibernate code and the Java code does not recognize ::

You have the day and month mixed up in the date format string.
It should be:
to_date(table1.daterange, 'MM/DD/YYYY')
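As a quick cross-check of the month-first reading, here is a small Python sketch over the sample data. It is only an illustration: `strptime` with `%m/%d/%Y` plays the role of to_date with 'MM/DD/YYYY', and the names `table1`, `table2_sids`, and `parse_us` are hypothetical. Unlike the query in the question, Python raises a ValueError if a value such as "5/25/2017" is parsed day-first.

```python
from datetime import datetime

# sid -> daterange, copied from table1 in the question.
table1 = {
    100: "5/25/2017", 101: "1/24/2017", 102: "4/4/2014", 103: "11/12/2007",
    104: "4/24/2012", 105: "01/15/2017", 106: "1/1/2017", 107: "3/11/2016",
    108: "10/10/2001", 109: "1/10/2016", 110: "12/12/2016", 111: "4/24/2017",
    112: "06/28/2015", 113: "5/24/2017", 114: "5/22/2017",
}
# sids present in table2.
table2_sids = [100, 101, 104, 105, 106, 107, 110, 111, 112, 113, 114]

def parse_us(text):
    """Parse a month-first date string (month/day/year)."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# Equivalent of the join on sid followed by min/max over the parsed dates.
dates = [parse_us(table1[sid]) for sid in table2_sids]
min_date, max_date = min(dates), max(dates)
print(min_date, max_date)
```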

Need help on Query

Table_Name : Order_trans_detail
Order_id Order_date Order_qty Item_id order_amount
100 12-Jan-16 1 1001 20
101 13-Feb-15 4 1001 80
103 14-Mar-16 3 1001 60
104 16-Dec-15 9 1001 180
105 17-Jan-16 1 1001 20
106 18-Feb-16 4 1001 80
107 19-Feb-16 3 1001 60
108 20-Jan-15 9 1001 180
109 21-Mar-15 3 1001 60
110 21-Apr-15 3 1001 60
I need a query that counts how many orders were placed in February 2016 and displays the month name and the count.

You need to use the DATENAME and YEAR functions to extract the month name and year from the date, and use them in the GROUP BY to get the count:
select DATENAME(MONTH,Order_date ),YEAR(Order_date), Count(*)
From Order_trans_detail
Group by DATENAME(MONTH,Order_date ),YEAR(Order_date)
To filter the records, add a WHERE clause before the GROUP BY:
Where DATENAME(MONTH,Order_date ) = 'february' and YEAR(Order_date) = 2016
To get the result in Month-Year format, use this in the SELECT:
DATENAME(MONTH,Order_date )+'-'+cast(YEAR(Order_date) as char(4))
If you are using SQL Server 2012+, you can use the CONCAT function to concatenate the month and year:
CONCAT(DATENAME(MONTH,Order_date ),'-',YEAR(Order_date))
The advantage of using CONCAT is that you don't need to perform an explicit conversion when concatenating an Int with a Varchar.
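The same group-and-count logic can be sketched in Python against the sample data, which may help when checking the query's output. This is an illustration only: the list `order_dates` and the counter `counts` are hypothetical names, and parsing month abbreviations with `%b` assumes an English locale.

```python
from collections import Counter
from datetime import datetime

# Order_date values from Order_trans_detail, in DD-Mon-YY form.
order_dates = [
    "12-Jan-16", "13-Feb-15", "14-Mar-16", "16-Dec-15", "17-Jan-16",
    "18-Feb-16", "19-Feb-16", "20-Jan-15", "21-Mar-15", "21-Apr-15",
]

# Group by "MonthName-Year", the same key the DATENAME/YEAR expression builds.
counts = Counter(
    datetime.strptime(text, "%d-%b-%y").strftime("%B-%Y")
    for text in order_dates
)

print("February-2016", counts["February-2016"])
```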

update corresponding value from other table

I have two tables named sales and login. My table structure is given below. Sometimes my program writes the custid instead of the userid into the userid column of the sales table, although the logid is always written correctly. I also have another table, tbl_log, shown below. I want to update the userid in the sales table based on the logid, using tbl_log.
sales table
Fld_id Fld_cust_id Fld_log_id Fld_amount Fld_user_id
1 S1002 101 100 d2121
2 S1003 102 121 S1003
3 S1004 103 120 d2123
4 S1005 102 130 d2122
5 S1006 102 1234 S1006
6 S1007 102 111 d2122
7 S1008 103 21 d2123
8 S1009 103 234 S1009
9 S1010 104 31 d2124
10 S1011 104 60 S1011
Log Table
Fld_log_id Fld_user_id
101 d2121
102 d2122
103 d2123
104 d2124
Expected output
Fld_id Fld_cust_id Fld_log_id Fld_amount Fld_user_id
1 S1002 101 100 d2121
2 S1003 102 121 d2122
3 S1004 103 120 d2123
4 S1005 102 130 d2122
5 S1006 102 1234 d2122
6 S1007 102 111 d2122
7 S1008 103 21 d2123
8 S1009 103 234 d2123
9 S1010 104 31 d2124
10 S1011 104 60 d2124

To update the values in sales based on the values in the log table you do:
UPDATE sales S
SET S.Fld_user_id = (SELECT l.Fld_user_id
FROM logSales l
WHERE l.Fld_log_id = s.Fld_log_id);
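The update can be tried against the sample rows with a short sketch using Python's built-in sqlite3 module. Several things here are assumptions: the log table is named tbl_log as in the question (the answer's query calls it logSales), SQLite stands in for whatever database the question uses, and the WHERE EXISTS guard is an addition so that rows without a matching log entry would keep their current value instead of being set to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (Fld_id INTEGER, Fld_cust_id TEXT, Fld_log_id INTEGER,
                    Fld_amount INTEGER, Fld_user_id TEXT);
CREATE TABLE tbl_log (Fld_log_id INTEGER, Fld_user_id TEXT);
""")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, "S1002", 101, 100, "d2121"), (2, "S1003", 102, 121, "S1003"),
    (3, "S1004", 103, 120, "d2123"), (4, "S1005", 102, 130, "d2122"),
    (5, "S1006", 102, 1234, "S1006"), (6, "S1007", 102, 111, "d2122"),
    (7, "S1008", 103, 21, "d2123"), (8, "S1009", 103, 234, "S1009"),
    (9, "S1010", 104, 31, "d2124"), (10, "S1011", 104, 60, "S1011"),
])
conn.executemany("INSERT INTO tbl_log VALUES (?, ?)", [
    (101, "d2121"), (102, "d2122"), (103, "d2123"), (104, "d2124"),
])

# Correlated update: each sales row takes the user id recorded for its log id.
conn.execute("""
    UPDATE sales
       SET Fld_user_id = (SELECT l.Fld_user_id
                            FROM tbl_log l
                           WHERE l.Fld_log_id = sales.Fld_log_id)
     WHERE EXISTS (SELECT 1 FROM tbl_log l
                    WHERE l.Fld_log_id = sales.Fld_log_id)
""")

user_ids = [row[0] for row in conn.execute(
    "SELECT Fld_user_id FROM sales ORDER BY Fld_id")]
print(user_ids)
```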
