I have a table with start_date and end_date columns, and I want to remove records where both start_date and end_date fall inside the date range of another record.
source data:
start_date end_date
2019-03-18 00:00:00.000 2019-04-08 00:00:00.000
2019-04-01 00:00:00.000 2019-05-31 00:00:00.000
2019-04-03 00:00:00.000 2019-04-24 00:00:00.000
2019-04-24 00:00:00.000 2019-05-05 00:00:00.000
2019-05-06 00:00:00.000 2019-05-16 00:00:00.000
2019-05-06 00:00:00.000 2019-05-20 00:00:00.000
2019-05-06 00:00:00.000 2019-06-17 00:00:00.000
2019-05-10 00:00:00.000 2019-05-14 00:00:00.000
expected result:
start_date end_date
2019-03-18 00:00:00.000 2019-04-08 00:00:00.000
2019-04-01 00:00:00.000 2019-05-31 00:00:00.000
2019-05-06 00:00:00.000 2019-06-17 00:00:00.000
It's really not that hard: you check for literally the thing you want to check for. Verify that there is no other record whose start and end dates enclose both your start date and your end date.
Something like this will work:
select *
from so_58088216 wrapping
where not exists (select *
                  from so_58088216 wrapped
                  where wrapping.start_date between wrapped.start_date and wrapped.end_date
                    and wrapping.end_date between wrapped.start_date and wrapped.end_date
                    -- don't check against yourself; this would be easier with an ID column.
                    -- Excluding only exact matches still catches rows that share just a start or an end date.
                    and not (wrapping.start_date = wrapped.start_date
                             and wrapping.end_date = wrapped.end_date))
Here's a working example
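Separately from the linked example, the query can be checked against the sample rows with a small script. This is only a sketch: it assumes Python's built-in sqlite3 module and an in-memory table, and it stores the dates as ISO strings so that BETWEEN compares them correctly as text. The self-exclusion skips a candidate only when both of its dates equal the outer row's, which is an assumption about how exact duplicates should be treated (they would both be kept).

```python
import sqlite3

# Sample rows from the question, with the time part dropped for brevity.
rows = [
    ("2019-03-18", "2019-04-08"),
    ("2019-04-01", "2019-05-31"),
    ("2019-04-03", "2019-04-24"),
    ("2019-04-24", "2019-05-05"),
    ("2019-05-06", "2019-05-16"),
    ("2019-05-06", "2019-05-20"),
    ("2019-05-06", "2019-06-17"),
    ("2019-05-10", "2019-05-14"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table so_58088216 (start_date text, end_date text)")
conn.executemany("insert into so_58088216 values (?, ?)", rows)

query = """
select wrapping.start_date, wrapping.end_date
from so_58088216 wrapping
where not exists (
    select 1
    from so_58088216 wrapped
    where wrapping.start_date between wrapped.start_date and wrapped.end_date
      and wrapping.end_date between wrapped.start_date and wrapped.end_date
      -- skip the row itself: same start and same end
      and not (wrapping.start_date = wrapped.start_date
               and wrapping.end_date = wrapped.end_date)
)
order by wrapping.start_date, wrapping.end_date
"""

kept = conn.execute(query).fetchall()
for start, end in kept:
    print(start, end)
```

On the sample data this keeps the three rows listed under "expected result".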
I have this dataframe (below is a sample) with three columns: clientnummer (client number), begindatum (start date) and einddatum (end date)
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2014-03-16 00:00:00.000
3 2012-11-28 00:00:00.000 2015-04-04 00:00:00.000
4 2016-02-12 00:00:00.000 none
4 2015-09-10 00:00:00.000 2016-03-09 00:00:00.000
4 2015-12-01 00:00:00.000 2016-04-18 00:00:00.000
5 2016-09-01 00:00:00.000 2016-12-11 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2017-02-20 00:00:00.000 2018-02-19 00:00:00.000
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
5 2018-09-01 00:00:00.000 2020-08-31 00:00:00.000
5 2015-11-17 00:00:00.000 2017-05-31 00:00:00.000
I want to combine all rows of the same client whose date ranges overlap, so that the merged row has the lowest begindatum and the highest einddatum. This also has to account for None values and for clients with only one row.
so outcome would be something like:
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2015-04-04 00:00:00.000
4 2015-09-10 00:00:00.000 2016-04-18 00:00:00.000
4 2016-02-12 00:00:00.000 none
5 2015-11-17 00:00:00.000 2018-02-19 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2018-09-01 00:00:00.000 None
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
But I keep getting the error: ERROR:root:Error on transforming data: single positional indexer is out-of-bounds
This is my relevant code:
df_grouped = df_indicaties.groupby('clientnummer')
# create a result table
result = pd.DataFrame(columns=['clientnummer', 'begindatum', 'einddatum'])
for client, groep in df_grouped:
    # Check if the group is empty or has only one row
    if groep.empty or groep.shape[0] == 1:
        result = pd.concat([result, pd.DataFrame({'clientnummer': client, 'begindatum': groep['begindatum'].iloc[0], 'einddatum': groep['einddatum'].iloc[0]}, index=[0])])
    else:
        begindatum_start = groep.iloc[0]['begindatum']
        einddatum_start = groep.iloc[0]['einddatum']
        for i in groep.index:
            n_current = groep.iloc[i]['n_indicatie']
            current_begin = groep.iloc[i]['begindatum']
            current_eind = groep.iloc[i]['einddatum']
            if n_current > 1 and begindatum_start is not None and einddatum_start is not None and current_begin is not None and current_eind is not None:
                if current_begin >= begindatum_start and current_begin <= einddatum_start:
                    if einddatum_start <= current_eind:
                        einddatum_start = current_eind
                else:
                    result = result.append({'clientnummer': client, 'begindatum': current_begin, 'einddatum': current_eind}, ignore_index=True)
                    begindatum_start = groep.iloc[i]['begindatum']
                    einddatum_start = groep.iloc[i]['einddatum']
        result = result.append({'clientnummer': client, 'begindatum': begindatum_start, 'einddatum': einddatum_start}, ignore_index=True)
return result
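The error most likely comes from combining `for i in groep.index` with `groep.iloc[i]`: a group keeps the row labels of the original frame, while `.iloc` expects positions from 0 to len(groep) - 1, so any label at or beyond the group's length is out of bounds. Independent of pandas, the merging step itself can be sketched on plain (begin, end) tuples. The function name `merge_ranges` is hypothetical, and passing rows without an end date through unmerged is an assumption taken from the sample outcome for client 4; sorting by begin date first is also an assumption, since the sample rows are not ordered.

```python
from datetime import date

def merge_ranges(rows):
    """Merge overlapping (begin, end) ranges of one client.

    Assumption: rows whose end is None (open-ended) are passed through
    unchanged instead of being merged with other ranges.
    """
    closed = sorted((r for r in rows if r[1] is not None), key=lambda r: r[0])
    open_ended = [r for r in rows if r[1] is None]

    merged = []
    for begin, end in closed:
        if merged and begin <= merged[-1][1]:
            # Overlaps the previous range: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return sorted(merged + open_ended, key=lambda r: r[0])

# Rows of clients 4 and 5 from the sample above.
client_4 = merge_ranges([
    (date(2016, 2, 12), None),
    (date(2015, 9, 10), date(2016, 3, 9)),
    (date(2015, 12, 1), date(2016, 4, 18)),
])
client_5 = merge_ranges([
    (date(2016, 9, 1), date(2016, 12, 11)),
    (date(2018, 2, 20), date(2018, 8, 31)),
    (date(2017, 2, 20), date(2018, 2, 19)),
    (date(2021, 1, 1), date(2021, 7, 25)),
    (date(2022, 1, 1), date(2022, 6, 30)),
    (date(2018, 9, 1), date(2020, 8, 31)),
    (date(2015, 11, 17), date(2017, 5, 31)),
])
for begin, end in client_4 + client_5:
    print(begin, end)
```

With a DataFrame, the same function could be called once per group on the (begindatum, einddatum) pairs, which avoids positional indexing inside the loop altogether.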
How can I apply weights from one table to another table [Port] when the weight table has sparse dates?
[Port] table
utcDT UsdPnl
-----------------------------------------------
2012-03-09 00:00:00.000 -0.00581815226439161
2012-03-11 00:00:00.000 -0.000535272460588547
2012-03-12 00:00:00.000 -0.00353079778650661
2012-03-13 00:00:00.000 0.00232882689252497
2012-03-14 00:00:00.000 -0.0102592811199384
2012-03-15 00:00:00.000 0.00254451559598693
2012-03-16 00:00:00.000 0.0146718613139845
2012-03-18 00:00:00.000 0.000425144543842752
2012-03-19 00:00:00.000 -0.00388548271428044
2012-03-20 00:00:00.000 -0.00662423680184768
2012-03-21 00:00:00.000 0.00405506208635343
2012-03-22 00:00:00.000 -0.000814822806982203
2012-03-23 00:00:00.000 -0.00289523953346103
2012-03-25 00:00:00.000 0.00204150859774465
2012-03-26 00:00:00.000 -0.00641635182718787
2012-03-27 00:00:00.000 -0.00107168420738448
2012-03-28 00:00:00.000 0.00131000520696153
2012-03-29 00:00:00.000 0.0008223678402638
2012-03-30 00:00:00.000 -0.00255345945390133
2012-04-01 00:00:00.000 -0.00337792814650089
[Weights] table
utcDT Weight
--------------------------------
2012-03-09 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-29 00:00:00.000 7
So I want to use the weights as if I had a full table like the one below, i.e. switch to a new weight on the first day it appears in the [Weights] table:
utcDT UsedWeight
----------------------------------
2012-03-09 00:00:00.000 1
2012-03-11 00:00:00.000 1
2012-03-12 00:00:00.000 1
2012-03-13 00:00:00.000 1
2012-03-14 00:00:00.000 1
2012-03-15 00:00:00.000 1
2012-03-16 00:00:00.000 1
2012-03-18 00:00:00.000 1
2012-03-19 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-21 00:00:00.000 3
2012-03-22 00:00:00.000 3
2012-03-23 00:00:00.000 3
2012-03-25 00:00:00.000 3
2012-03-26 00:00:00.000 3
2012-03-27 00:00:00.000 3
2012-03-28 00:00:00.000 3
2012-03-29 00:00:00.000 7
2012-03-30 00:00:00.000 7
2012-04-01 00:00:00.000 7
You can use apply:
select p.*, w.*
from port p outer apply
(select top (1) w.*
from weights w
where w.utcDT <= p.utcDT
order by w.utcDT desc
) w;
outer apply is usually pretty efficient if you have the right indexes. In this case, the right index is on weights(utcDT desc).
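The lookup that the apply performs for each row (the latest weight dated on or before the port date) can be illustrated outside the database. The following is a rough Python sketch, not a translation of the query plan: the helper name `used_weight` is hypothetical, and only a handful of the [Port] dates are included.

```python
from bisect import bisect_right
from datetime import date

# The [Weights] table, sorted by date.
weights = [
    (date(2012, 3, 9), 1),
    (date(2012, 3, 20), 3),
    (date(2012, 3, 29), 7),
]
weight_dates = [d for d, _ in weights]

def used_weight(day):
    """Return the weight of the latest row with utcDT <= day, or None."""
    pos = bisect_right(weight_dates, day) - 1
    # pos == -1 means no weight exists yet, like the NULLs from outer apply.
    return weights[pos][1] if pos >= 0 else None

# A few dates taken from the [Port] table.
port_dates = [
    date(2012, 3, 9),
    date(2012, 3, 19),
    date(2012, 3, 20),
    date(2012, 3, 28),
    date(2012, 3, 29),
    date(2012, 4, 1),
]
used = [(day, used_weight(day)) for day in port_dates]
for day, weight in used:
    print(day, weight)
```

The binary search plays the role of the index on weights(utcDT): each lookup jumps straight to the most recent weight instead of scanning the table.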
You can use lead() in a subquery to associate the next date a weight changes to each weights record, and then join with port using an inequality condition on the dates:
select p.utcDt, w.weight
from port p
inner join (
select utcDt, weight, lead(utcDt) over(order by utcDt) lead_utcDt from weights
) w
on p.utcDt >= w.utcDt
and (w.lead_utcDt is null or p.utcDt < w.lead_utcDt)
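To see what the lead() subquery produces, the same pairing can be written out in Python as a sketch. Each weight row is paired with the date of the next change (None for the last row), and a port date joins to the interval that contains it. The date 2012-03-01 is a hypothetical extra row, not part of the source data, added to show that an inner join drops dates that precede the first weight.

```python
from datetime import date

weights = [
    (date(2012, 3, 9), 1),
    (date(2012, 3, 20), 3),
    (date(2012, 3, 29), 7),
]
weights.sort()

# Equivalent of lead(utcDt) over (order by utcDt): the next change date.
next_dates = [d for d, _ in weights[1:]] + [None]
intervals = [(start, nxt, w) for (start, w), nxt in zip(weights, next_dates)]

port_dates = [
    date(2012, 3, 1),   # hypothetical date before the first weight
    date(2012, 3, 9),
    date(2012, 3, 19),
    date(2012, 3, 20),
    date(2012, 3, 29),
    date(2012, 4, 1),
]

# Inner join on: day >= start and (next is null or day < next).
joined = [
    (day, w)
    for day in port_dates
    for start, nxt, w in intervals
    if day >= start and (nxt is None or day < nxt)
]
for day, w in joined:
    print(day, w)
```

Because the intervals are half-open, a port date that equals a change date picks up the new weight, matching the UsedWeight table in the question.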
I have the following SQL Server query problem.
If the Issue_DATE of one row equals the Maturity_Date of another row, and both rows have the same ID and Amount USD, then neither of these rows should be displayed.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
I tried a self join, but I don't get the right result.
Thanks in advance!
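Can you try something like this? 'not exists' is the way to do it. The check has to go in both directions so that both rows of a pair are hidden, and it must skip the row itself, otherwise a row whose issue and maturity dates are equal would match itself.
select *
from table t1
where not exists (select 'x'
                  from table t2
                  where t1.id = t2.id
                    and t1.amount_usd = t2.amount_usd
                    and (t1.issue_date = t2.maturity_date or t1.maturity_date = t2.issue_date)
                    -- skip the row itself
                    and not (t1.issue_date = t2.issue_date and t1.maturity_date = t2.maturity_date))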
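A two-way version of the 'not exists' check can be tried against the sample rows with a short script. This is a sketch under a few assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, the table name `bonds` is hypothetical, and a row is never compared with itself (decided here by matching both dates, since the table has no unique key).

```python
import sqlite3

# Sample rows from the question, time part dropped for brevity.
rows = [
    (1, "2010-01-01", "2015-12-01", 5000),
    (1, "2010-01-01", "2001-09-19", 700),
    (2, "2014-04-09", "2019-04-09", 400),
    (1, "2015-12-01", "2016-12-31", 5000),
    (5, "2015-02-24", "2015-02-24", 8000),
    (4, "2012-11-29", "2015-11-29", 10000),
    (3, "2015-01-21", "2018-01-21", 17500),
    (2, "2015-02-02", "2015-12-05", 12000),
    (1, "2015-01-12", "2018-01-12", 18000),
    (2, "2015-12-05", "2016-01-10", 12000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table bonds (id integer, issue_date text, "
    "maturity_date text, amount_usd integer)"
)
conn.executemany("insert into bonds values (?, ?, ?, ?)", rows)

query = """
select t1.id, t1.issue_date, t1.maturity_date, t1.amount_usd
from bonds t1
where not exists (
    select 1
    from bonds t2
    where t2.id = t1.id
      and t2.amount_usd = t1.amount_usd
      -- match in either direction so both rows of a pair are hidden
      and (t1.issue_date = t2.maturity_date
           or t1.maturity_date = t2.issue_date)
      -- never compare a row with itself
      and not (t2.issue_date = t1.issue_date
               and t2.maturity_date = t1.maturity_date)
)
"""
remaining = conn.execute(query).fetchall()
for row in remaining:
    print(row)
```

On the sample data this leaves the six rows shown under "Result should be", including the row for ID 5 whose issue and maturity dates are identical.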
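I'd build a subquery of all the dupes and then eliminate them from the first table with an anti-join, like so:
select t1.ID
, t1.ISSUE_DATE
, t1.MATURITY_DATE
, t1.AMOUNT_USD
FROM
t1
LEFT JOIN
(select a.ID
, a.ISSUE_DATE
, a.MATURITY_DATE
, a.AMOUNT_USD
FROM
t1 a
INNER JOIN
t1 b
-- a row is a dupe if another row with the same ID and amount starts where it ends, or ends where it starts
on a.ID = b.ID
and a.AMOUNT_USD = b.AMOUNT_USD
and (a.ISSUE_DATE = b.MATURITY_DATE or a.MATURITY_DATE = b.ISSUE_DATE)
and not (a.ISSUE_DATE = b.ISSUE_DATE and a.MATURITY_DATE = b.MATURITY_DATE)
) dupes
on
t1.ID = dupes.ID
and t1.ISSUE_DATE = dupes.ISSUE_DATE
and t1.MATURITY_DATE = dupes.MATURITY_DATE
and t1.AMOUNT_USD = dupes.AMOUNT_USD
WHERE dupes.ID IS NULL;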
I'm new to SQL and I need this for my uni project. I have a table with columns Rego, FirstRego, LastRego, RegoDue.
Suppose I have the following data:
Rego FirstRego LastRego RegoDue
YGF 615 2011-04-07 00:00:00.000 2011-04-07 00:00:00.000 2012-04-07 00:00:00.000
YGF 615 2011-04-07 00:00:00.000 2012-04-07 00:00:00.000 2013-04-07 00:00:00.000
ZIR 377 2012-10-05 00:00:00.000 2012-10-05 00:00:00.000 2013-10-05 00:00:00.000
ZIR 377 2012-10-05 00:00:00.000 2013-10-05 00:00:00.000 2014-10-05 00:00:00.000
ZJT 795 2012-10-31 00:00:00.000 2012-10-31 00:00:00.000 2013-10-31 00:00:00.000
ZSU 823 2012-04-30 00:00:00.000 2012-04-30 00:00:00.000 2013-04-30 00:00:00.000
In the query output I want each Rego to appear only once, with its highest RegoDue, i.e. the output should look like:
Rego FirstRego LastRego RegoDue
YGF 615 2011-04-07 00:00:00.000 2012-04-07 00:00:00.000 2013-04-07 00:00:00.000
ZIR 377 2012-10-05 00:00:00.000 2013-10-05 00:00:00.000 2014-10-05 00:00:00.000
ZJT 795 2012-10-31 00:00:00.000 2012-10-31 00:00:00.000 2013-10-31 00:00:00.000
ZSU 823 2012-04-30 00:00:00.000 2012-04-30 00:00:00.000 2013-04-30 00:00:00.000
How can I do this?
Select *
From Tbl T1
Where RegoDue =
(
Select Max(RegoDue)
From Tbl T2
Where T2.Rego = T1.Rego
)
Firstly you want to find the latest rego due date. Once you have this you can join on it to filter out the undesired results.
select
t.Rego, t.FirstRego, t.LastRego, lr.LatestRegoDue as RegoDue
from Table1 t
join
(
select
Rego, max(RegoDue) as LatestRegoDue
from Table1
group by Rego
) lr on t.Rego = lr.Rego and t.RegoDue = lr.LatestRegoDue
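The two steps described above (find the latest due date per Rego, then keep only the rows that match it) can be sketched in plain Python to make the logic concrete. The variable names `latest_due` and `latest_rows` are hypothetical, and the dates are kept as ISO strings, which compare in chronological order.

```python
# Sample rows: (Rego, FirstRego, LastRego, RegoDue), time part dropped.
rows = [
    ("YGF 615", "2011-04-07", "2011-04-07", "2012-04-07"),
    ("YGF 615", "2011-04-07", "2012-04-07", "2013-04-07"),
    ("ZIR 377", "2012-10-05", "2012-10-05", "2013-10-05"),
    ("ZIR 377", "2012-10-05", "2013-10-05", "2014-10-05"),
    ("ZJT 795", "2012-10-31", "2012-10-31", "2013-10-31"),
    ("ZSU 823", "2012-04-30", "2012-04-30", "2013-04-30"),
]

# Step 1: the "group by Rego / max(RegoDue)" subquery.
latest_due = {}
for rego, _first, _last, due in rows:
    if rego not in latest_due or due > latest_due[rego]:
        latest_due[rego] = due

# Step 2: the join back to the table on Rego and RegoDue.
latest_rows = [row for row in rows if row[3] == latest_due[row[0]]]
for row in latest_rows:
    print(row)
```

As with the SQL versions, two rows that tie on the highest RegoDue for the same Rego would both be returned.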
I have daily values in one table and monthly values in another table. I need to use the values of the monthly table and calculate them on a daily basis.
basically, monthly factor * daily factor -- for each day
thanks!
I have a table like this:
2010-12-31 00:00:00.000 28.3
2010-09-30 00:00:00.000 64.1
2010-06-30 00:00:00.000 66.15
2010-03-31 00:00:00.000 12.54
and a table like this :
2010-12-31 00:00:00.000 98.1
2010-12-30 00:00:00.000 97.61
2010-12-29 00:00:00.000 99.03
2010-12-28 00:00:00.000 97.7
2010-12-27 00:00:00.000 96.87
2010-12-23 00:00:00.000 97.44
2010-12-22 00:00:00.000 97.76
2010-12-21 00:00:00.000 96.63
2010-12-20 00:00:00.000 95.47
2010-12-17 00:00:00.000 95.2
2010-12-16 00:00:00.000 94.84
2010-12-15 00:00:00.000 94.8
2010-12-14 00:00:00.000 94.1
2010-12-13 00:00:00.000 93.88
2010-12-10 00:00:00.000 93.04
2010-12-09 00:00:00.000 91.07
2010-12-08 00:00:00.000 90.89
2010-12-07 00:00:00.000 92.72
2010-12-06 00:00:00.000 93.05
2010-12-03 00:00:00.000 91.74
2010-12-02 00:00:00.000 90.74
2010-12-01 00:00:00.000 90.25
I need to take the value for the quarter and multiply it by the daily value for every day in that quarter.
You could try:
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND FLOOR((MONTH(dt.day)-1)/3) = FLOOR((MONTH(mt.day)-1)/3)
ORDER BY dt.day
or (as suggested by @Andriy)
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND DATEPART(QUARTER, dt.day) = DATEPART(QUARTER, mt.day)
ORDER BY dt.day
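Both queries match rows on year and quarter, where the quarter is derived from the month as (month - 1) / 3 with integer division. A small Python sketch of the same matching follows; the `quarter` helper and the variable names are hypothetical, and only a few of the daily rows are included.

```python
from datetime import date

def quarter(d):
    """Quarter number 1-4, the same value DATEPART(QUARTER, ...) returns."""
    return (d.month - 1) // 3 + 1

# Quarterly values from the first table.
quarterly_rows = [
    (date(2010, 12, 31), 28.3),
    (date(2010, 9, 30), 64.1),
    (date(2010, 6, 30), 66.15),
    (date(2010, 3, 31), 12.54),
]
# A few daily values from the second table.
daily_rows = [
    (date(2010, 12, 31), 98.1),
    (date(2010, 12, 30), 97.61),
    (date(2010, 12, 1), 90.25),
]

# Index the quarterly factor by (year, quarter).
by_quarter = {(d.year, quarter(d)): f for d, f in quarterly_rows}

# Inner join: days without a matching quarter are skipped.
daily_factor = sorted(
    (d, f * by_quarter[(d.year, quarter(d))])
    for d, f in daily_rows
    if (d.year, quarter(d)) in by_quarter
)
for day, value in daily_factor:
    print(day, value)
```

Matching on the (year, quarter) pair rather than on the quarter alone keeps values from different years apart, which is what the YEAR(...) condition does in the queries.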