I have a table that contains Id, Date and a float value as below:
ID startDt Days
1328 2015-04-01 00:00:00.000 15
2444 2015-04-03 00:00:00.000 5.7
1658 2015-05-08 00:00:00.000 6
1329 2015-05-12 00:00:00.000 28.5
1849 2015-06-23 00:00:00.000 28.5
1581 2015-06-30 00:00:00.000 25.5
3535 2015-07-03 00:00:00.000 3
3536 2015-08-13 00:00:00.000 13.5
2166 2015-09-22 00:00:00.000 28.5
3542 2015-11-05 00:00:00.000 13.5
3543 2015-12-18 00:00:00.000 6
2445 2015-12-25 00:00:00.000 5.7
4096 2015-12-31 00:00:00.000 7.5
2446 2016-01-01 00:00:00.000 5.7
4287 2016-02-11 00:00:00.000 13.5
4288 2016-02-18 00:00:00.000 13.5
4492 2016-03-02 00:00:00.000 19.7
2447 2016-03-25 00:00:00.000 5.7
I am using a stored procedure which adds up the Days column and then subtracts the total from a fixed value stored in a variable.
The total in the table is 245 and the variable is set to 245, so I should get a value of 0 when subtracting the two. However, I am getting a value of 5.6843418860808E-14 instead. I can't figure out why this is the case, and I have even gone and re-entered each number in the table, but I still get the same result.
This is my sql statement that I am using to calculate the result:
Declare @AL_Taken as float
Declare @AL_Remaining as float
Declare @EntitledLeave as float
Set @EntitledLeave = 245
Set @AL_Taken = (select sum(Days) from tblALMain)
Set @AL_Remaining = @EntitledLeave - @AL_Taken
Select @EntitledLeave, @AL_Taken, @AL_Remaining
The select returns the following:
245, 245, 5.6843418860808E-14
Can anyone suggest why I am getting this number when I should be getting 0?
Thanks for the help
Rob
I changed the data type to Decimal as Tab Allenman suggested and this resolved my issue. I still don't understand why I didn't get zero when using float, as all the values added up to exactly 245 (I even re-entered the values manually) and 245 - 245 should have given me 0.
Thanks again for all the comments and explanations.
Rob
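For anyone wondering why float behaves this way: SQL Server's float is an IEEE 754 binary double, and values like 5.7 have no exact binary representation, so a long chain of additions accumulates a tiny error (the exact residue depends on the order of the additions). A decimal type stores base-10 digits exactly, which is why switching the column fixed it. A quick Python sketch of the same effect, since Python floats are the same IEEE 754 doubles:

```python
from decimal import Decimal

# 0.1 has no exact binary representation, so repeated float
# addition drifts away from the mathematically exact result.
float_total = 0.1 + 0.1 + 0.1
print(float_total == 0.3)           # False
print(float_total - 0.3)            # a tiny nonzero residue

# Decimal stores base-10 digits exactly, so the same sum is exact.
dec_total = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
print(dec_total == Decimal('0.3'))  # True
```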
I have this dataframe (below is a sample) with three columns: clientnummer, begindatum (start date) and einddatum (end date)
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2014-03-16 00:00:00.000
3 2012-11-28 00:00:00.000 2015-04-04 00:00:00.000
4 2016-02-12 00:00:00.000 none
4 2015-09-10 00:00:00.000 2016-03-09 00:00:00.000
4 2015-12-01 00:00:00.000 2016-04-18 00:00:00.000
5 2016-09-01 00:00:00.000 2016-12-11 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2017-02-20 00:00:00.000 2018-02-19 00:00:00.000
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
5 2018-09-01 00:00:00.000 2020-08-31 00:00:00.000
5 2015-11-17 00:00:00.000 2017-05-31 00:00:00.000
I want to combine all rows for the same clientnummer whose date ranges overlap, taking the lowest begindatum and the highest einddatum. This also has to account for None values and for clients with only one row.
So the outcome would be something like:
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2015-04-04 00:00:00.000
4 2015-09-10 00:00:00.000 2016-04-18 00:00:00.000
4 2016-02-12 00:00:00.000 none
5 2015-11-17 00:00:00.000 2018-02-19 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2018-09-01 00:00:00.000 None
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
But I keep getting the error: ERROR:root:Error on transforming data: single positional indexer is out-of-bounds
This is my relevant code:
df_grouped = df_indicaties.groupby('clientnummer')
# build a result table
result = pd.DataFrame(columns=['clientnummer', 'begindatum', 'einddatum'])
for client, groep in df_grouped:
    # Check if the group is empty or has only one row
    if groep.empty or groep.shape[0] == 1:
        result = pd.concat([result, pd.DataFrame({'clientnummer': client, 'begindatum': groep['begindatum'].iloc[0], 'einddatum': groep['einddatum'].iloc[0]}, index=[0])])
    else:
        begindatum_start = groep.iloc[0]['begindatum']
        einddatum_start = groep.iloc[0]['einddatum']
        for i in groep.index:
            n_current = groep.iloc[i]['n_indicatie']
            current_begin = groep.iloc[i]['begindatum']
            current_eind = groep.iloc[i]['einddatum']
            if n_current > 1 and begindatum_start is not None and einddatum_start is not None and current_begin is not None and current_eind is not None:
                if current_begin >= begindatum_start and current_begin <= einddatum_start:
                    if einddatum_start <= current_eind:
                        einddatum_start = current_eind
                else:
                    result = result.append({'clientnummer': client, 'begindatum': current_begin, 'einddatum': current_eind}, ignore_index=True)
                    begindatum_start = groep.iloc[i]['begindatum']
                    einddatum_start = groep.iloc[i]['einddatum']
        result = result.append({'clientnummer': client, 'begindatum': begindatum_start, 'einddatum': einddatum_start}, ignore_index=True)
return result
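The "single positional indexer is out-of-bounds" error comes from `groep.iloc[i]` inside `for i in groep.index:` — `.iloc` expects positions 0..len-1, but `groep.index` yields the original DataFrame's labels, which can be larger than the group's length (use `.loc[i]`, or iterate positions instead). Separately, `DataFrame.append` was removed in recent pandas. A minimal sketch of the merge itself, assuming begindatum/einddatum are datetimes and NaT marks an open-ended interval (the function name is illustrative, and note this merges an open-ended row into any interval it overlaps, which differs slightly from the sample output above, so adjust the overlap test if such rows should stay separate):

```python
import pandas as pd

def merge_client_intervals(df):
    """Merge overlapping [begindatum, einddatum] ranges per clientnummer.

    NaT in einddatum is treated as open-ended, so it absorbs every
    later overlapping interval for that client.
    """
    out = []
    ordered = df.sort_values(['clientnummer', 'begindatum'])
    for client, g in ordered.groupby('clientnummer'):
        cur_start = cur_end = None
        for _, row in g.iterrows():
            b, e = row['begindatum'], row['einddatum']
            if cur_start is None:
                # First interval for this client starts the running range.
                cur_start, cur_end = b, e
            elif pd.isna(cur_end) or b <= cur_end:
                # Overlaps the running interval: extend the end if needed.
                if not pd.isna(cur_end) and (pd.isna(e) or e > cur_end):
                    cur_end = e
            else:
                # Gap found: emit the running interval and start a new one.
                out.append((client, cur_start, cur_end))
                cur_start, cur_end = b, e
        out.append((client, cur_start, cur_end))
    return pd.DataFrame(out, columns=['clientnummer', 'begindatum', 'einddatum'])
```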
How can I apply weights from one table to another [Port], where the weights table has sparse dates?
[Port] table
utcDT UsdPnl
-----------------------------------------------
2012-03-09 00:00:00.000 -0.00581815226439161
2012-03-11 00:00:00.000 -0.000535272460588547
2012-03-12 00:00:00.000 -0.00353079778650661
2012-03-13 00:00:00.000 0.00232882689252497
2012-03-14 00:00:00.000 -0.0102592811199384
2012-03-15 00:00:00.000 0.00254451559598693
2012-03-16 00:00:00.000 0.0146718613139845
2012-03-18 00:00:00.000 0.000425144543842752
2012-03-19 00:00:00.000 -0.00388548271428044
2012-03-20 00:00:00.000 -0.00662423680184768
2012-03-21 00:00:00.000 0.00405506208635343
2012-03-22 00:00:00.000 -0.000814822806982203
2012-03-23 00:00:00.000 -0.00289523953346103
2012-03-25 00:00:00.000 0.00204150859774465
2012-03-26 00:00:00.000 -0.00641635182718787
2012-03-27 00:00:00.000 -0.00107168420738448
2012-03-28 00:00:00.000 0.00131000520696153
2012-03-29 00:00:00.000 0.0008223678402638
2012-03-30 00:00:00.000 -0.00255345945390133
2012-04-01 00:00:00.000 -0.00337792814650089
[Weights] table
utcDT Weight
--------------------------------
2012-03-09 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-29 00:00:00.000 7
So, I want to use the weights as if I had the full table below, i.e. switch to the new weight on the first day it appears in the [Weights] table:
utcDT UsedWeight
----------------------------------
2012-03-09 00:00:00.000 1
2012-03-11 00:00:00.000 1
2012-03-12 00:00:00.000 1
2012-03-13 00:00:00.000 1
2012-03-14 00:00:00.000 1
2012-03-15 00:00:00.000 1
2012-03-16 00:00:00.000 1
2012-03-18 00:00:00.000 1
2012-03-19 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-21 00:00:00.000 3
2012-03-22 00:00:00.000 3
2012-03-23 00:00:00.000 3
2012-03-25 00:00:00.000 3
2012-03-26 00:00:00.000 3
2012-03-27 00:00:00.000 3
2012-03-28 00:00:00.000 3
2012-03-29 00:00:00.000 7
2012-03-30 00:00:00.000 7
2012-04-01 00:00:00.000 7
You can use apply:
select p.*, w.*
from port p outer apply
(select top (1) w.*
from weights w
where w.utcDT <= p.utcDT
order by w.utcDT desc
) w;
outer apply is usually pretty efficient, if you have the right indexes. In this case, the right index is on weights(utcDT desc).
You can use lead() in a subquery to associate the next date a weight changes to each weights record, and then join with port using an inequality condition on the dates:
select p.utcDt, w.weight
from port p
inner join (
select utcDt, weight, lead(utcDt) over(order by utcDt) lead_utcDt from weights
) w
on p.utcDt >= w.utcDt
and (w.lead_utcDt is null or p.utcDt < w.lead_utcDt)
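For reference, the same "carry the last weight forward" lookup can be done in pandas with merge_asof, which matches each port date to the most recent weight date at or before it. A small sketch with the question's data (not part of either SQL answer above):

```python
import pandas as pd

port = pd.DataFrame({
    'utcDT': pd.to_datetime(['2012-03-09', '2012-03-19', '2012-03-20', '2012-03-30']),
    'UsdPnl': [-0.0058, -0.0039, -0.0066, -0.0026],
})
weights = pd.DataFrame({
    'utcDT': pd.to_datetime(['2012-03-09', '2012-03-20', '2012-03-29']),
    'Weight': [1, 3, 7],
})

# Both frames must be sorted on the key; direction='backward' (the
# default) picks the last weight whose date is <= the port date,
# mirroring the w.utcDT <= p.utcDT / ORDER BY ... DESC / TOP (1) logic.
merged = pd.merge_asof(port.sort_values('utcDT'),
                       weights.sort_values('utcDT'),
                       on='utcDT')
print(merged[['utcDT', 'Weight']])
```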
How can I convert, in pandas, a date column that looks something like this:
2018-08-27 00:00:00.000
2018-08-26 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-23 00:00:00.000
2018-08-23 00:00:00.000
2018-08-20 00:00:00.000
2018-08-20 00:00:00.000
to an integer counting the days since the first of January 2010?
Subtract the base date from the column with Series.sub and convert the resulting timedeltas to days with Series.dt.days:
df['days'] = pd.to_datetime(df['date']).sub(pd.Timestamp('2010-01-01')).dt.days
print (df)
date days
0 2018-08-27 00:00:00.000 3160
1 2018-08-26 00:00:00.000 3159
2 2018-08-24 00:00:00.000 3157
3 2018-08-24 00:00:00.000 3157
4 2018-08-24 00:00:00.000 3157
5 2018-08-24 00:00:00.000 3157
6 2018-08-23 00:00:00.000 3156
7 2018-08-23 00:00:00.000 3156
8 2018-08-20 00:00:00.000 3153
9 2018-08-20 00:00:00.000 3153
You can simply apply sub on the pandas Timestamp column, as mentioned in jezrael's answer, which is the most direct approach.
If you want to do the same one element at a time instead, you can do it with the help of map:
base_date = pd.Timestamp('2010-01-01 00:00:00')
df['days'] = df['date'].map(lambda date: (pd.Timestamp(date) - base_date).days)
I have daily values in one table and monthly values in another table. I need to use the values of the monthly table and calculate them on a daily basis.
basically, monthly factor * daily factor -- for each day
thanks!
I have a table like this:
2010-12-31 00:00:00.000 28.3
2010-09-30 00:00:00.000 64.1
2010-06-30 00:00:00.000 66.15
2010-03-31 00:00:00.000 12.54
and a table like this :
2010-12-31 00:00:00.000 98.1
2010-12-30 00:00:00.000 97.61
2010-12-29 00:00:00.000 99.03
2010-12-28 00:00:00.000 97.7
2010-12-27 00:00:00.000 96.87
2010-12-23 00:00:00.000 97.44
2010-12-22 00:00:00.000 97.76
2010-12-21 00:00:00.000 96.63
2010-12-20 00:00:00.000 95.47
2010-12-17 00:00:00.000 95.2
2010-12-16 00:00:00.000 94.84
2010-12-15 00:00:00.000 94.8
2010-12-14 00:00:00.000 94.1
2010-12-13 00:00:00.000 93.88
2010-12-10 00:00:00.000 93.04
2010-12-09 00:00:00.000 91.07
2010-12-08 00:00:00.000 90.89
2010-12-07 00:00:00.000 92.72
2010-12-06 00:00:00.000 93.05
2010-12-03 00:00:00.000 91.74
2010-12-02 00:00:00.000 90.74
2010-12-01 00:00:00.000 90.25
I need to take the value for the quarter and multiply the daily value by it for every day in that quarter.
You could try:
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND FLOOR((MONTH(dt.day)-1)/3) = FLOOR((MONTH(mt.day)-1)/3)
ORDER BY dt.day
or (as suggested by #Andriy)
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND DATEPART(QUARTER, dt.day) = DATEPART(QUARTER, mt.day)
ORDER BY dt.day
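If the same calculation is ever needed in pandas rather than SQL, a quarter key built with dt.to_period('Q') gives an equivalent join to the DATEPART(QUARTER, ...) version (the frame and column names below are made up for the sketch):

```python
import pandas as pd

daily = pd.DataFrame({
    'day': pd.to_datetime(['2010-12-01', '2010-12-02', '2010-08-15']),
    'factor': [90.25, 90.74, 88.0],
})
quarterly = pd.DataFrame({
    'day': pd.to_datetime(['2010-12-31', '2010-09-30']),
    'factor': [28.3, 64.1],
})

# Reduce both dates to their calendar quarter and join on it, mirroring
# the YEAR(...) + DATEPART(QUARTER, ...) join conditions in the SQL.
daily['q'] = daily['day'].dt.to_period('Q')
quarterly['q'] = quarterly['day'].dt.to_period('Q')
merged = daily.merge(quarterly, on='q', suffixes=('_daily', '_quarterly'))
merged['daily_factor'] = merged['factor_daily'] * merged['factor_quarterly']
```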
I am faced with getting the latest (newest) record for a part, and my only way is to do it using a "max date" approach.
Here is a basic schema (and sample data):
ldDate ldTime ldPart id
2010-10-26 00:00:00.000 52867 90-R6600-4100 186
2010-11-01 00:00:00.000 24634 90-R6600-4100 187
2010-11-24 00:00:00.000 58785 90-R6600-4100 194
2010-11-24 00:00:00.000 58771 90-R6600-4100 195
2010-11-17 00:00:00.000 29588 90-R6600-4100 201
2010-11-08 00:00:00.000 29196 90-R6600-4100 282
2010-11-08 00:00:00.000 29640 90-R6600-4100 290
2010-10-19 00:00:00.000 58695 90-R6600-4100 350
2010-09-22 00:00:00.000 32742 BH4354-F0 338
2010-09-22 00:00:00.000 32504 BH4354-F0 340
2010-11-17 00:00:00.000 31157 BH4354-F0 206
2010-11-08 00:00:00.000 27601 BH4354-F0 218
2010-11-08 00:00:00.000 27865 BH4354-F0 21
2010-09-22 00:00:00.000 23264 BR6950-F0 70
2010-09-22 00:00:00.000 23270 BR6950-F0 77
2010-09-24 00:00:00.000 27781 BR6950-F0 97
2010-11-24 00:00:00.000 57735 BR6950-F0 196
What I have above is an ldDate with no time, and an ldTime that is not a proper HH:MM:SS value: it is seconds since midnight, so the first line would be:
2010-10-26 00:00:00.000 52867
or, converting the time:
52867 seconds = 14 hours, 41 minutes, 7 seconds
~2010-10-26 14:41:07
Is there a clean way I can combine both columns in my SQL query? I want to do a simple max(ldCombinedDate) and group by ldPart (part number).
The id is almost useless, so ignore it; it is there right now to record the order records were inserted, nothing else. It could maybe be used for a nested query. NOTE: Order of entry does not mean latest date...
Thanks...
The expression
DATEADD(second, ldTime, ldDate)
will return the combined date. So, I guess you want something like this:
SELECT ldPart, MAX(DATEADD(second, ldTime, ldDate)) AS MaxCombinedDate
FROM yourtable
GROUP BY ldPart
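The arithmetic behind DATEADD(second, ldTime, ldDate) is easy to sanity-check outside SQL: it is just the midnight date plus that many seconds. A quick Python check using the first sample row:

```python
from datetime import datetime, timedelta

# DATEADD(second, ldTime, ldDate) == ldDate + ldTime seconds
combined = datetime(2010, 10, 26) + timedelta(seconds=52867)
print(combined)  # 2010-10-26 14:41:07
```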