Can't find a fix for: "Error on transforming data: single positional indexer is out-of-bounds" in for loops - pandas

I have this dataframe (a sample is shown below) with three columns: clientnummer (client number), begindatum (start date) and einddatum (end date).
clientnummer  begindatum               einddatum
3             2013-09-17 00:00:00.000  2014-03-16 00:00:00.000
3             2012-11-28 00:00:00.000  2015-04-04 00:00:00.000
4             2016-02-12 00:00:00.000  none
4             2015-09-10 00:00:00.000  2016-03-09 00:00:00.000
4             2015-12-01 00:00:00.000  2016-04-18 00:00:00.000
5             2016-09-01 00:00:00.000  2016-12-11 00:00:00.000
5             2018-02-20 00:00:00.000  2018-08-31 00:00:00.000
5             2017-02-20 00:00:00.000  2018-02-19 00:00:00.000
5             2021-01-01 00:00:00.000  2021-07-25 00:00:00.000
5             2022-01-01 00:00:00.000  2022-06-30 00:00:00.000
5             2018-09-01 00:00:00.000  2020-08-31 00:00:00.000
5             2015-11-17 00:00:00.000  2017-05-31 00:00:00.000
I want to combine all rows of the same client whose date ranges overlap, so that each combined row has the lowest begindatum and the highest einddatum of that range. This also has to account for None values and for clients with only one row.
So the outcome would be something like:
clientnummer  begindatum               einddatum
3             2013-09-17 00:00:00.000  2015-04-04 00:00:00.000
4             2015-09-10 00:00:00.000  2016-04-18 00:00:00.000
4             2016-02-12 00:00:00.000  none
5             2015-11-17 00:00:00.000  2018-02-19 00:00:00.000
5             2018-02-20 00:00:00.000  2018-08-31 00:00:00.000
5             2018-09-01 00:00:00.000  None
5             2021-01-01 00:00:00.000  2021-07-25 00:00:00.000
5             2022-01-01 00:00:00.000  2022-06-30 00:00:00.000
But I keep getting the error: ERROR:root:Error on transforming data: single positional indexer is out-of-bounds
This is my relevant code:
df_grouped = df_indicaties.groupby('clientnummer')
# create a result table
result = pd.DataFrame(columns=['clientnummer', 'begindatum', 'einddatum'])
for client, groep in df_grouped:
    # Check if the group is empty or has only one row
    if groep.empty or groep.shape[0] == 1:
        result = pd.concat([result, pd.DataFrame({'clientnummer': client, 'begindatum': groep['begindatum'].iloc[0], 'einddatum': groep['einddatum'].iloc[0]}, index=[0])])
    else:
        begindatum_start = groep.iloc[0]['begindatum']
        einddatum_start = groep.iloc[0]['einddatum']
        for i in groep.index:
            n_current = groep.iloc[i]['n_indicatie']
            current_begin = groep.iloc[i]['begindatum']
            current_eind = groep.iloc[i]['einddatum']
            if n_current > 1 and begindatum_start is not None and einddatum_start is not None and current_begin is not None and current_eind is not None:
                if current_begin >= begindatum_start and current_begin <= einddatum_start:
                    if einddatum_start <= current_eind:
                        einddatum_start = current_eind
                else:
                    result = result.append({'clientnummer': client, 'begindatum': current_begin, 'einddatum': current_eind}, ignore_index=True)
                    begindatum_start = groep.iloc[i]['begindatum']
                    einddatum_start = groep.iloc[i]['einddatum']
        result = result.append({'clientnummer': client, 'begindatum': begindatum_start, 'einddatum': einddatum_start}, ignore_index=True)
return result
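A likely cause, judging from the code above: `for i in groep.index` yields the index labels of the original dataframe, while `groep.iloc[i]` expects positions from 0 to len(groep) - 1. Any label at or beyond the group's length then raises "single positional indexer is out-of-bounds". Iterating over `range(len(groep))`, or looking rows up with `groep.loc[i]`, avoids that. Separately, `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is needed there as well.

The merge itself can also be written without any positional indexing. The following is only a sketch on plain tuples of (clientnummer, begindatum, einddatum) rather than on the dataframe. It assumes that open-ended rows (einddatum is None) are kept as separate rows, as client 4 in the desired output suggests, and `merge_periods` is a made-up name.

```python
from datetime import date
from itertools import groupby


def merge_periods(rows):
    """Merge overlapping (client, begin, end) periods per client.

    Assumption: rows whose end is None are passed through unmerged.
    """
    closed = sorted((r for r in rows if r[2] is not None),
                    key=lambda r: (r[0], r[1]))
    open_ended = [r for r in rows if r[2] is None]
    merged = []
    for client, group in groupby(closed, key=lambda r: r[0]):
        cur_begin = cur_end = None
        for _, begin, end in group:
            if cur_begin is not None and begin <= cur_end:
                # Overlaps the running period: extend its end if needed.
                cur_end = max(cur_end, end)
            else:
                # No overlap: store the finished period and start a new one.
                if cur_begin is not None:
                    merged.append((client, cur_begin, cur_end))
                cur_begin, cur_end = begin, end
        merged.append((client, cur_begin, cur_end))
    return sorted(merged + open_ended, key=lambda r: (r[0], r[1]))


sample = [
    (3, date(2013, 9, 17), date(2014, 3, 16)),
    (3, date(2012, 11, 28), date(2015, 4, 4)),
    (4, date(2016, 2, 12), None),
    (4, date(2015, 9, 10), date(2016, 3, 9)),
    (4, date(2015, 12, 1), date(2016, 4, 18)),
]
merged_sample = merge_periods(sample)
```

Because the rows are sorted by begindatum first, each row only has to be compared with the running period, which also covers clients with a single row.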

Related

SQL Server query Group By Trimester

I'm looking for a way to group a SQL query by trimester (three-month periods). I have found a way to do it using MySQL on this link.
This is what I'm expecting:
Range Start Range End Count
----------- ---------- -----
2013-09-01 2013-11-26 87
2013-06-01 2013-08-31 92
2013-03-01 2013-05-31 92
2012-12-01 2013-02-28 90
2012-09-01 2012-11-30 91
This is what I have tried:
SELECT MIN(start_date) AS Range_Start, MAX(start_date) AS Range_End, COUNT(ID) AS Total
FROM [dbo].[table]
GROUP BY FLOOR(DATEDIFF(MONTH, DATEADD(DAY, -DAY(start_date)+1, start_date), DATEADD(DAY, -DAY(start_date)+1,getdate())) /3)
ORDER BY 1 ASC
This is what I get:
Range Start Range End Count
----------- ---------- -----
1900-01-01 00:00:00.000 1900-01-01 00:00:00.000 8
1952-01-01 00:00:00.000 1952-01-01 00:00:00.000 2
1954-01-01 00:00:00.000 1954-01-01 00:00:00.000 11
1955-01-01 00:00:00.000 1955-01-01 00:00:00.000 3
1956-01-01 00:00:00.000 1956-01-01 00:00:00.000 2
1957-01-01 00:00:00.000 1957-01-01 00:00:00.000 8
1958-01-01 00:00:00.000 1958-01-01 00:00:00.000 2
1959-01-01 00:00:00.000 1959-01-01 00:00:00.000 5
1960-01-01 00:00:00.000 1960-01-01 00:00:00.000 17
1960-03-17 00:00:00.000 1960-03-17 00:00:00.000 1
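For comparison, the bucketing that the GROUP BY expression aims for (whole months between a date and the current date, integer-divided by 3) can be sketched in Python. This is only an illustration of the intended grouping, not a fix for the query; `trimester_ranges` and the sample dates are made up.

```python
from collections import defaultdict
from datetime import date


def trimester_ranges(dates, today):
    """Group dates into three-month buckets counted back from today's month."""
    buckets = defaultdict(list)
    for d in dates:
        # Whole calendar months between d and today, like DATEDIFF(MONTH, ...).
        months_back = (today.year - d.year) * 12 + (today.month - d.month)
        buckets[months_back // 3].append(d)
    # Newest bucket first: (range start, range end, count).
    return [(min(v), max(v), len(v)) for _, v in sorted(buckets.items())]


ranges = trimester_ranges(
    [date(2013, 11, 26), date(2013, 9, 1), date(2013, 8, 31), date(2013, 6, 1)],
    today=date(2013, 11, 26),
)
```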

How do I join a sparse table and fill rows between in SQL Server

How can I apply weights from one table to another table, [Port], when the weights table has sparse dates?
[Port] table
utcDT UsdPnl
-----------------------------------------------
2012-03-09 00:00:00.000 -0.00581815226439161
2012-03-11 00:00:00.000 -0.000535272460588547
2012-03-12 00:00:00.000 -0.00353079778650661
2012-03-13 00:00:00.000 0.00232882689252497
2012-03-14 00:00:00.000 -0.0102592811199384
2012-03-15 00:00:00.000 0.00254451559598693
2012-03-16 00:00:00.000 0.0146718613139845
2012-03-18 00:00:00.000 0.000425144543842752
2012-03-19 00:00:00.000 -0.00388548271428044
2012-03-20 00:00:00.000 -0.00662423680184768
2012-03-21 00:00:00.000 0.00405506208635343
2012-03-22 00:00:00.000 -0.000814822806982203
2012-03-23 00:00:00.000 -0.00289523953346103
2012-03-25 00:00:00.000 0.00204150859774465
2012-03-26 00:00:00.000 -0.00641635182718787
2012-03-27 00:00:00.000 -0.00107168420738448
2012-03-28 00:00:00.000 0.00131000520696153
2012-03-29 00:00:00.000 0.0008223678402638
2012-03-30 00:00:00.000 -0.00255345945390133
2012-04-01 00:00:00.000 -0.00337792814650089
[Weights] table
utcDT Weight
--------------------------------
2012-03-09 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-29 00:00:00.000 7
So I want to use the weights as if I had a full table like the one below, i.e. switch to a new weight on the first day it appears in the [Weights] table:
utcDT UsedWeight
----------------------------------
2012-03-09 00:00:00.000 1
2012-03-11 00:00:00.000 1
2012-03-12 00:00:00.000 1
2012-03-13 00:00:00.000 1
2012-03-14 00:00:00.000 1
2012-03-15 00:00:00.000 1
2012-03-16 00:00:00.000 1
2012-03-18 00:00:00.000 1
2012-03-19 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-21 00:00:00.000 3
2012-03-22 00:00:00.000 3
2012-03-23 00:00:00.000 3
2012-03-25 00:00:00.000 3
2012-03-26 00:00:00.000 3
2012-03-27 00:00:00.000 3
2012-03-28 00:00:00.000 3
2012-03-29 00:00:00.000 7
2012-03-30 00:00:00.000 7
2012-04-01 00:00:00.000 7
You can use apply:
select p.*, w.*
from port p outer apply
(select top (1) w.*
from weights w
where w.utcDT <= p.utcDT
order by w.utcDT desc
) w;
outer apply is usually pretty efficient, if you have the right indexes. In this case, the right index is on weights(utcDT desc).
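The lookup that the apply subquery performs (the latest weight dated on or before each port date) can be sketched in Python with a binary search over the sorted weight dates. The data is taken from the [Weights] table above; `used_weight` is a made-up name.

```python
from bisect import bisect_right
from datetime import date

# Rows of the [Weights] table, sorted by date.
weights = [(date(2012, 3, 9), 1), (date(2012, 3, 20), 3), (date(2012, 3, 29), 7)]
weight_dates = [d for d, _ in weights]


def used_weight(day):
    """Return the weight with the latest date <= day, or None if there is none."""
    pos = bisect_right(weight_dates, day)
    return weights[pos - 1][1] if pos else None
```

Like outer apply, this returns no weight (None) for a date earlier than the first weight instead of dropping that date.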
You can use lead() in a subquery to associate each weights record with the next date on which the weight changes, and then join with port using an inequality condition on the dates:
select p.utcDt, w.weight
from port p
inner join (
select utcDt, weight, lead(utcDt) over(order by utcDt) lead_utcDt from weights
) w
on p.utcDt >= w.utcDt
and (w.lead_utcDt is null or p.utcDt < w.lead_utcDt)
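The same idea in a Python sketch: pair each weight with the date of the next change, which is what lead() produces, and then test each port date against the half-open interval. `intervals` and `weight_for` are made-up names.

```python
from datetime import date

# Rows of the [Weights] table, sorted by date.
weights = [(date(2012, 3, 9), 1), (date(2012, 3, 20), 3), (date(2012, 3, 29), 7)]

# Date of the next weight change for each row; None for the last row.
next_dates = [d for d, _ in weights[1:]] + [None]
intervals = [(start, nxt, w) for (start, w), nxt in zip(weights, next_dates)]


def weight_for(day):
    """Return the weight whose interval [start, next start) contains day."""
    for start, nxt, w in intervals:
        if day >= start and (nxt is None or day < nxt):
            return w
    return None
```

One difference from the apply version: with an inner join, port rows dated before the first weight are dropped from the result rather than returned with a null weight.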

Combining temp tables in series

Say I have 6 temp tables stored as follows (these 3 are samples) and I would like to combine them into 1 single table, in series (headers Date, Com, Price).
Com A
Date Price
2015-05-01 00:00:00.000 34.25
2015-05-02 00:00:00.000 35.20
2015-05-03 00:00:00.000 36.70
2015-05-04 00:00:00.000 32.37
2015-05-05 00:00:00.000 32.40
2015-05-06 00:00:00.000 32.20
Com B
Date Price
2015-05-07 00:00:00.000 54.29
2015-05-08 00:00:00.000 54.50
2015-05-09 00:00:00.000 56.21
2015-05-10 00:00:00.000 56.70
2015-05-11 00:00:00.000 58.20
Com C
Date Price
2015-05-12 00:00:00.000 34.29
2015-05-13 00:00:00.000 24.50
2015-05-14 00:00:00.000 76.21
2015-05-15 00:00:00.000 36.70
2015-05-16 00:00:00.000 48.20
The output should look like this, and I would like to store it as another temp table for merging later:
Date Com Price
2015-05-01 00:00:00.000 A 34.25
2015-05-02 00:00:00.000 A 35.20
2015-05-03 00:00:00.000 A 36.70
2015-05-04 00:00:00.000 A 32.37
2015-05-05 00:00:00.000 A 32.40
2015-05-06 00:00:00.000 A 32.20
2015-05-07 00:00:00.000 B 54.29
2015-05-08 00:00:00.000 B 54.50
2015-05-09 00:00:00.000 B 56.21
2015-05-10 00:00:00.000 B 56.70
2015-05-11 00:00:00.000 B 58.20
2015-05-12 00:00:00.000 C 34.29
2015-05-13 00:00:00.000 C 24.50
2015-05-14 00:00:00.000 C 76.21
2015-05-15 00:00:00.000 C 36.70
2015-05-16 00:00:00.000 C 48.20
Seems like a simple union all to me:
SELECT [Date], 'A' as Com, Price
FROM [Com A]
UNION ALL
SELECT [Date], 'B' as Com, Price
FROM [Com B]
UNION ALL
SELECT [Date], 'C' as Com, Price
FROM [Com C]
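What the UNION ALL does, sketched in Python on the first rows of the sample tables: each row is tagged with a constant label for its source table and the rows are appended in order. The variable names are made up.

```python
# First rows of the three sample tables as (Date, Price) pairs.
com_a = [("2015-05-01", 34.25), ("2015-05-02", 35.20)]
com_b = [("2015-05-07", 54.29)]
com_c = [("2015-05-12", 34.29)]

# Tag every row with its source label and concatenate, like UNION ALL.
combined = [
    (day, com, price)
    for com, table in (("A", com_a), ("B", com_b), ("C", com_c))
    for day, price in table
]
```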
Based on your sample data
Select Date,'A' AS Com,Price from [COM A]
UNION ALL
Select Date,'B' AS Com,Price from [COM B]
UNION ALL
Select Date,'C' AS Com,Price from [COM C]

SQL Server : compare rows, exclude from results when some values are the same

I have the following SQL Server query problem.
If a row's ISSUE_DATE equals the MATURITY_DATE of another row, and both rows have the same ID and AMOUNT_USD, then neither of these rows should be displayed.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
I tried a self join, but I did not get the right result.
Thanks in advance!
Can you try something like this? 'not exists' is the way of doing it.
select *
from table t1
where not exists (
    select 'x'
    from table t2
    where t1.issue_date = t2.maturity_date
      and t1.amount_usd = t2.amount_usd
      and t1.id = t2.id
)
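Note that the desired result drops both rows of a matching pair, and keeps the ID 5 row, whose issue and maturity dates are equal to each other. A Python sketch of that rule on the sample data, assuming a row is never matched against itself; `has_partner` is a made-up name.

```python
# Sample rows: (ID, ISSUE_DATE, MATURITY_DATE, AMOUNT_USD).
rows = [
    (1, "2010-01-01", "2015-12-01", 5000),
    (1, "2010-01-01", "2001-09-19", 700),
    (2, "2014-04-09", "2019-04-09", 400),
    (1, "2015-12-01", "2016-12-31", 5000),
    (5, "2015-02-24", "2015-02-24", 8000),
    (4, "2012-11-29", "2015-11-29", 10000),
    (3, "2015-01-21", "2018-01-21", 17500),
    (2, "2015-02-02", "2015-12-05", 12000),
    (1, "2015-01-12", "2018-01-12", 18000),
    (2, "2015-12-05", "2016-01-10", 12000),
]


def has_partner(i, row):
    """True if another row with the same ID and amount chains onto this row."""
    rid, issue, maturity, amount = row
    return any(
        j != i and oid == rid and oamount == amount
        and (omaturity == issue or oissue == maturity)
        for j, (oid, oissue, omaturity, oamount) in enumerate(rows)
    )


kept = [row for i, row in enumerate(rows) if not has_partner(i, row)]
```

The check runs in both directions (another row's maturity date equals this issue date, or another row's issue date equals this maturity date), so both halves of a pair are removed.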
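I'd think about making a subquery of all the dupes and then eliminating them from the first table like so:
select t1.ID
     , t1.ISSUE_DATE
     , t1.MATURITY_DATE
     , t1.AMOUNT_USD
FROM
t1
LEFT JOIN
(select a.ID
      , a.ISSUE_DATE
      , a.MATURITY_DATE
      , a.AMOUNT_USD
 FROM
 t1 a
 INNER JOIN
 t1 b
 -- join condition assumed from the question's description
 on a.ID = b.ID
 and a.AMOUNT_USD = b.AMOUNT_USD
 and (a.ISSUE_DATE = b.MATURITY_DATE or a.MATURITY_DATE = b.ISSUE_DATE)
 -- do not pair a row with itself
 and not (a.ISSUE_DATE = b.ISSUE_DATE and a.MATURITY_DATE = b.MATURITY_DATE)
) dupes
on
t1.ID = dupes.ID
and t1.ISSUE_DATE = dupes.ISSUE_DATE
and t1.MATURITY_DATE = dupes.MATURITY_DATE
and t1.AMOUNT_USD = dupes.AMOUNT_USD
WHERE dupes.ID IS NULL;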

joining monthly values with daily values in sql

I have daily values in one table and monthly values in another table. I need to use the values of the monthly table and calculate them on a daily basis.
Basically: monthly factor * daily factor, for each day.
Thanks!
I have a table like this:
2010-12-31 00:00:00.000 28.3
2010-09-30 00:00:00.000 64.1
2010-06-30 00:00:00.000 66.15
2010-03-31 00:00:00.000 12.54
and a table like this :
2010-12-31 00:00:00.000 98.1
2010-12-30 00:00:00.000 97.61
2010-12-29 00:00:00.000 99.03
2010-12-28 00:00:00.000 97.7
2010-12-27 00:00:00.000 96.87
2010-12-23 00:00:00.000 97.44
2010-12-22 00:00:00.000 97.76
2010-12-21 00:00:00.000 96.63
2010-12-20 00:00:00.000 95.47
2010-12-17 00:00:00.000 95.2
2010-12-16 00:00:00.000 94.84
2010-12-15 00:00:00.000 94.8
2010-12-14 00:00:00.000 94.1
2010-12-13 00:00:00.000 93.88
2010-12-10 00:00:00.000 93.04
2010-12-09 00:00:00.000 91.07
2010-12-08 00:00:00.000 90.89
2010-12-07 00:00:00.000 92.72
2010-12-06 00:00:00.000 93.05
2010-12-03 00:00:00.000 91.74
2010-12-02 00:00:00.000 90.74
2010-12-01 00:00:00.000 90.25
I need to take the value for the quarter and multiply it by the daily value for every day in that quarter.
You could try:
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND FLOOR((MONTH(dt.day)-1)/3) = FLOOR((MONTH(mt.day)-1)/3)
ORDER BY dt.day
or (as suggested by @Andriy)
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND DATEPART(QUARTER, dt.day) = DATEPART(QUARTER, mt.day)
ORDER BY dt.day
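The join condition in both queries boils down to matching on (year, quarter). A Python sketch of the same matching on a few of the sample values; `quarter_key` and the variable names are made up.

```python
from datetime import date

# A few sample values: quarter-end factors and daily values.
quarterly = [(date(2010, 12, 31), 28.3), (date(2010, 9, 30), 64.1)]
daily = [(date(2010, 12, 31), 98.1), (date(2010, 12, 30), 97.61)]


def quarter_key(d):
    """(year, zero-based quarter), i.e. YEAR(d) and FLOOR((MONTH(d)-1)/3)."""
    return (d.year, (d.month - 1) // 3)


factor_by_quarter = {quarter_key(d): factor for d, factor in quarterly}

# Inner-join behaviour: days without a matching quarter are skipped.
daily_factor = [
    (d, value * factor_by_quarter[quarter_key(d)])
    for d, value in daily
    if quarter_key(d) in factor_by_quarter
]
```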