Why do my arrays display missing values when identifying a bandwidth? (geopandas) - pandas

I'm trying to identify a suitable bandwidth to use for a geographically weighted regression but every time I search for the bandwidth it displays that there are missing (NaN) values within the arrays of the dataset. Although, each row features all values.
g_y = df_ct2008xy['2008 HP'].values.reshape((-1,1))
g_X = df_ct2008xy[['2008 AF', '2008 MI', '2008 MP', '2008 EB']].values
u = df_ct2008xy['X']
v = df_ct2008xy['Y']
g_coords = list(zip(u,v))
g_X = (g_X - g_X.mean(axis=0)) / g_X.std(axis=0)
g_y = g_y.reshape((-1,1))
g_y = (g_y - g_y.mean(axis=0)) / g_y.std(axis=0)
bw = mgwr.sel_bw.Sel_BW(g_coords,
g_y, # Independent variable
g_X, # Dependent variable
fixed=True, # True for fixed bandwidth and false for adaptive bandwidth
spherical=True) # Spherical coordinates (long-lat) or projected coordinates
I searched using numpy to identify if these were individual values using
np.isnan(g_y).any()
and
np.isnan(g_X)
but apparently every value is 'missing' and returning 'True'

Related

How to fix "Solution Not Found" Error in Gekko Optimization with rolling principle

My program is optimizing the charging and decharging of a home battery to minimize the cost of electricity at the end of the year. In this case there also is a PV, which means that sometimes you're injecting electricity into the grid and receive money. The net offtake is the result of the usage of the home and the PV installation. So these are the possible situations:
Net offtake > 0 => usage home > PV => discharge from battery or take from grid
Net offtake < 0 => usage home < PV => charge battery or injection into grid
The electricity usage of homes is measured each 15 minutes, so I have 96 measurement point in 1 day. I want to optimilize the charging and decharging of the battery for 2 days, so that day 1 takes the usage of day 2 into account.
I wrote a controller that reads the data and gives each time the input values for 2 days to the optimization. With a rolling principle, it goes to the next 2 days and so on. Below you can see the code from my controller.
from gekko import GEKKO
from simulationModel2_2d_1 import getSimulation2
from exportModel2 import exportToExcelModel2
import numpy as np
#import matplotlib.pyplot as plt
import pandas as pd
import time
import math
# ------------------------ Import and read input data ------------------------
file = r'Data Sim 2.xlsx'
data = pd.read_excel(file, sheet_name='Input', na_values='NaN')
dataRead = pd.DataFrame(data, columns= ['Timestep','Verbruik woning (kWh)','Netto afname (kWh)','Prijs afname (€/kWh)',
'Prijs injectie (€/kWh)','Capaciteit batterij (kW)',
'Capaciteit batterij (kWh)','Rendement (%)',
'Verbruikersprofiel','Capaciteit PV (kWp)','Aantal dagen'])
timestep = dataRead['Timestep'].to_numpy()
usage_home = dataRead['Verbruik woning (kWh)'].to_numpy()
net_offtake = dataRead['Netto afname (kWh)'].to_numpy()
price_offtake = dataRead['Prijs afname (€/kWh)'].to_numpy()
price_injection = dataRead['Prijs injectie (€/kWh)'].to_numpy()
cap_batt_kW = dataRead['Capaciteit batterij (kW)'].iloc[0]
cap_batt_kWh = dataRead['Capaciteit batterij (kWh)'].iloc[0]
efficiency = dataRead['Rendement (%)'].iloc[0]
usersprofile = dataRead['Verbruikersprofiel'].iloc[0]
days = dataRead['Aantal dagen'].iloc[0]
pv = dataRead['Capaciteit PV (kWp)'].iloc[0]
# ------------- Optimization model & Rolling principle (2 days) --------------
# Initialise model
m = GEKKO()
# Output data
ts = []
charging = [] # Amount to charge/decharge batterij
e_batt = [] # Amount of energy in the battery
usage_net = [] # Usage after home, battery and pv
p_paid = [] # Price paid for energy of 15min
# Energy in battery to pass
energy = 0
# Iterate each day for one year
for d in range(int(days)-1):
d1_timestep = []
d1_net_offtake = []
d1_price_offtake = []
d1_price_injection = []
d2_timestep = []
d2_net_offtake = []
d2_price_offtake = []
d2_price_injection = []
# Iterate timesteps
for i in range(96):
d1_timestep.append(timestep[d*96+i])
d2_timestep.append(timestep[d*96+i+96])
d1_net_offtake.append(net_offtake[d*96+i])
d2_net_offtake.append(net_offtake[d*96+i+96])
d1_price_offtake.append(price_offtake[d*96+i])
d2_price_offtake.append(price_offtake[d*96+i+96])
d1_price_injection.append(price_injection[d*96+i])
d2_price_injection.append(price_injection[d*96+i+96])
# Input data simulation of 2 days
ts_temp = np.concatenate((d1_timestep, d2_timestep))
net_offtake_temp = np.concatenate((d1_net_offtake, d2_net_offtake))
price_offtake_temp = np.concatenate((d1_price_offtake, d2_price_offtake))
price_injection_temp = np.concatenate((d1_price_injection, d2_price_injection))
if(d == 7):
print(ts_temp)
print(energy)
# Simulatie uitvoeren
charging_temp, e_batt_temp, usage_net_temp, p_paid_temp, energy_temp = getSimulation2(ts_temp, net_offtake_temp, price_offtake_temp, price_injection_temp, cap_batt_kW, cap_batt_kWh, efficiency, energy, pv)
# Take over output first day, unless last 2 days
energy = energy_temp
if(d == (days-2)):
for t in range(1,len(ts_temp)):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
elif(d == 0):
for t in range(int(len(ts_temp)/2)+1):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
else:
for t in range(1,int(len(ts_temp)/2)+1):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
print('Simulation day '+str(d+1)+' complete.')
# ------------------------ Export output data to Excel -----------------------
a = exportToExcelModel2(ts, usage_home, net_offtake, price_offtake, price_injection, charging, e_batt, usage_net, p_paid, cap_batt_kW, cap_batt_kWh, efficiency, usersprofile, pv)
print(a)
The optimization with Gekko happens in the following code:
from gekko import GEKKO
def getSimulation2(timestep, net_offtake, price_offtake, price_injection,
cap_batt_kW, cap_batt_kWh, efficiency, start_energy, pv):
# ---------------------------- Optimization model ----------------------------
# Initialise model
m = GEKKO(remote = False)
# Global options
m.options.SOLVER = 1
m.options.IMODE = 6
# Constants
speed_charging = cap_batt_kW/4
m.time = timestep
max_cap_batt = m.Const(value = cap_batt_kWh)
min_cap_batt = m.Const(value = 0)
max_charge = m.Const(value = speed_charging) # max battery can charge in 15min
max_decharge = m.Const(value = -speed_charging) # max battery can decharge in 15min
# Parameters
usage_home = m.Param(net_offtake)
price_offtake = m.Param(price_offtake)
price_injection = m.Param(price_injection)
# Variables
e_batt = m.Var(value=start_energy, lb = min_cap_batt, ub = max_cap_batt) # energy in battery
price_paid = m.Var() # price paid each 15min
charging = m.Var(lb = max_decharge, ub = max_charge) # amount charge/decharge each 15min
usage_net = m.Var(lb=min_cap_batt)
# Equations
m.Equation(e_batt==(m.integral(charging)+start_energy)*efficiency)
m.Equation(-charging <= e_batt)
m.Equation(usage_net==usage_home + charging)
price = m.Intermediate(m.if2(usage_net*1e6, price_injection, price_offtake))
price_paid = m.Intermediate(usage_net * price / 100)
# Objective
m.Minimize(price_paid)
# Solve problem
m.options.COLDSTART=2
m.solve()
m.options.TIME_SHIFT=0
m.options.COLDSTART=0
m.solve()
# Energy to pass
energy_left = e_batt[95]
#m.cleanup()
return charging, e_batt, usage_net, price_paid, energy_left
The data you need for input can be found in this Excel document:
https://docs.google.com/spreadsheets/d/1S40Ut9-eN_PrftPCNPoWl8WDDQtu54f0/edit?usp=sharing&ouid=104786612700360067470&rtpof=true&sd=true
With this code, it always ends at day 17 with the "Solution Not Found" Error.
I already tried extending the default iteration limit to 500 but it didn't work.
I also tried with other solvers but also no improvement.
By presolving with COLDSTART it already reached day 17, without this it ends at day 8.
I solved the days were my optimization ends individually and then the solution was always found immediately with the same code.
Someone who can explain this to me and maybe find a solution? Thanks in advance!
This is kind of big to troubleshoot, but here are some general ideas that might help. This assumes, as you said, that the model solves fine for day 1-2, and day 3-4, and day 5-6, etc. And that those results pass inspection (aka the basic model is working as you say).
Then something is (obviously) amiss around day 17. Some things to look at and try:
Start the model at day 16-17, see if it works in isolation
gather your results as you go and do a time series plot of the key variables, maybe one of them is on an obvious bad trend towards a limit causing an infeasibility... Perhaps the e_batt variable is slowly declining because not enough PV energy is available and hits minimum on Day 17
Radically change the upper/lower bounds on your variables to test them to see if they might be involved in the infeasibility (assuming there is one)
Make a different (fake) dataset where the solution is assured and kind of obvious... all constants or a pattern that you know will produce some known result and test the model outputs against it.
It might also be useful to pursue the tips in this excellent post on troubleshooting gekko, and edit your question with the results of that effort.
edit: couple ideas from your comment...
You didn't say what the result was from troubleshooting and whether the error is infeasible or max iterations, or ???. But...
If the model seems to crunch after about 15 days, I'm betting that it is becoming infeasible. Did you plot the battery level over course of days?
Also, I'm suspicious of your equation for the e_batt level... Why are you multiplying the prior battery state by the efficiency factor? That seems incorrect. That is charge that is already in the battery. Odds are you are (incorrectly) hitting the battery charge level every day with the efficiency tax and that the max charge level isn't sufficient to keep up with demand.
In addition to tips above try:
fix your efficiency formula to not multiply the efficiency times the previous state
change the efficiency to 100%
make the upper limit on charge huge
As an aside: I don't really see the connection to PV energy available here. What you are basically modeling is some "mystery battery" that you can charge and discharge anytime you want. I would get this debugged first and then look at the energy available by time of day...you aren't going to be generating charge at midnight. :).

Moving Average of time series using a sliding window over an array

I need to write a function below that can compute the moving average of time series using a sliding window over an array. This function should take an array of date strings (say arr_date), an array of numbers (say arr_record), and a sliding window (default value 50). It should:
Return a list of dictionaries for all windows.
Each dictionary should include the date, average value, min, max, standard deviation at each window.
Able to handle missing data in time series by replacing missing data with the most recent available data.
(b) Download SPY daily data (Dec. 31, 2017 to Dec. 31, 2018) from Yahoo! as your test data in a .csv file. Read reading .csv file example and write a test programming for calling your function.
Does anyone have any thoughts? Extremely new to python and struggling.
So something following this logic should probably be a good starting point. Hope this is a helpful start, and welcome to the cs community.
def sliding_window( dates, numbers, sliding_window_value):
# list of dictionaries
return_dicts =[{}]
# if window size is greater than length of dates, there's only one window
if sliding_window_value >= len(dates):
return_dicts += [create_window(dates, numbers)]
return return_dicts
# gather all our windows into one list
for i in range (0, len(dates) - sliding_window_value ):
# get our window subsets
dates_subset = dates[i:(sliding_window_value+1)]
numbers_subset = numbers[i:(sliding_window_value+1)]
# get our window stats dictionary
window_stats = create_window(dates_subset,numbers_subset)
# add these stats to our return list
return_dicts += [window_stats]
return return_dicts
def create_window(dates_subset, numbers_subset):
window_min = 1000000 # some high minimum to start
window_max = -1000000 # some low maximuim to start
window_total = 0
for i in range ( 0, len(dates_subset)):
# calculate total
window_total += numbers_subset[i]
# calculate max
if numbers_subset[i] > window_max:
window_max = numbers_subset[i]
# calculate min
if numbers_subset[i] < window_min:
window_min = numbers_subset[i]
# other calculations....
return_dict = {
"min" : window_min,
"max" : window_max,
"average" : window_total / len(dates_subset),
# other calculations....
}
return return_dict
Good luck bud, the work is worth it.

Data Selection - Finding relations between dataframe attributes

let's say i have a dataframe of 80 columns and 1 target column,
for example a bank account table with 80 attributes for each record (account) and 1 target column which decides if the client stays or leaves.
what steps and algorithms should i follow to select the most effective columns with the higher impact on the target column ?
There are a number of steps you can take, I'll give some examples to get you started:
A correlation coefficient, such as Pearson's Rho (for parametric data) or Spearman's R (for ordinate data).
Feature importances. I like XGBoost for this, as it includes the handy xgb.ggplot.importance / xgb.plot_importance methods.
One of the many feature selection options, such as python's sklearn.feature_selection methods.
This one way to do it using the Pearson correlation coefficient in Rstudio, I used it once when exploring the red_wine dataset my targeted variable or column was the quality and I wanted to know the effect of the rest of the columns on it.
see below figure shows the output of the code as you can see the blue color represents positive relation and red represents negative relations and the closer the value to 1 or -1 the darker the color
c <- cor(
red_wine %>%
# first we remove unwanted columns
dplyr::select(-X) %>%
dplyr::select(-rating) %>%
mutate(
# now we translate quality to a number
quality = as.numeric(quality)
)
)
corrplot(c, method = "color", type = "lower", addCoef.col = "gray", title = "Red Wine Variables Correlations", mar=c(0,0,1,0), tl.cex = 0.7, tl.col = "black", number.cex = 0.9)

torch7: Unexpected 'counts' in k-Means Clustering

I am trying to apply k-means clustering on a set of images (images are loaded as float torch.Tensors) using the following segment of code:
print('[Clustering all samples...]')
local points = torch.Tensor(trsize, 3, 221, 221)
for i = 1,trsize do
points[i] = trainData.data[i]:clone() -- dont want to modify the original tensors
end
points:resize(trsize, 3*221*221) -- to convert it to a 2-D tensor
local centroids, counts = unsup.kmeans(points, total_classes, 40, total_classes, nil, true)
print(counts)
When I observe the values in the counts tensor, I observe that it contains unexpected values, in the form of some entries being more than trsize, whereas the documentation says that counts stores the counts per centroid. I expected that it means counts[i] equals the number of samples out of trsize belonging to cluster with centroid centroids[i]. Am I wrong in assuming so?
If that indeed is the case, shouldn't sample-to-centroid be a hard-assignment (i.e. shouldn't counts[i] sum to trsize, which clearly is not the case with my clustering)? Am I missing something here?
Thanks in advance.
In the current version of the code, counts are accumulated after each iteration
for i = 1,niter do
-- k-means computations...
-- total counts
totalcounts:add(counts)
end
So in the end counts:sum() is a multiple of niter.
As a workaround you can use the callback to obtain the final counts (non-accumulated):
local maxiter = 40
local centroids, counts = unsup.kmeans(
points,
total_classes,
maxiter,
total_classes,
function(i, _, totalcounts) if i < maxiter then totalcounts:zero() end end,
true
)
As an alternative you can use vlfeat.torch and explicitly quantize your input points after kmeans to obtain these counts:
local assignments = kmeans:quantize(points)
local counts = torch.zeros(total_classes):int()
for i=1,total_classes do
counts[i] = assignments:eq(i):sum()
end

Error Handling with Live Data Matlab

I am using a live data API that is returning the next arriving trains. I plan on giving the user the next 5 trains arriving. If there are less than 5 trains arriving, how you handle that? I am having trouble thinking of a way, I was thinking a way with if statements but don't know how I would set them up.
time1Depart = dataReturnedFromLiveAPI{1,1}.orig_departure_time;
time2Depart = dataReturnedFromLiveAPI{1,2}.orig_departure_time;
time3Depart = dataReturnedFromLiveAPI{1,3}.orig_departure_time;
time4Depart = dataReturnedFromLiveAPI{1,4}.orig_departure_time;
time5Depart = dataReturnedFromLiveAPI{1,5}.orig_departure_time;
time1Arrival = dataReturnedFromLiveAPI{1,1}.orig_arrival_time;
time2Arrival = dataReturnedFromLiveAPI{1,2}.orig_arrival_time;
time3Arrival = dataReturnedFromLiveAPI{1,3}.orig_arrival_time;
time4Arrival = dataReturnedFromLiveAPI{1,4}.orig_arrival_time;
time5Arrival = dataReturnedFromLiveAPI{1,5}.orig_arrival_time;
The code right now uses a matrix that goes from 1:numoftrains but I am using just the first five.
It's a bad practice to assign individual value to a separate variable. Better if you pass all related values to a vector or cell array depending on class of orig_departure_time and orig_arrival_time.
It looks like dataReturnedFromLiveAPI is a cell array of structures. Then you can do:
timeDepart = cellfun(#(x), x.orig_departure_time, ...
dataReturnedFromLiveAPI(1,1:min(5,size(dataReturnedFromLiveAPI,2))), ...
'UniformOutput',0 );
timeArrival = cellfun(#(x), x.orig_arrival_time, ...
dataReturnedFromLiveAPI(1,1:min(5,size(dataReturnedFromLiveAPI,2))), ...
'UniformOutput',0 );
Then you how to access the values one by one as
time1Depart = timeDepart{1};
If orig_departure_time and orig_arrival_time are numeric scalars, you can use ...'UniformOutput',1.... You will get output as a vector and can get single values with timeDepart(1).