Appending tables generated from a loop - pandas

I am a new Python user and am trying to append together tables that I have pulled from a PDF using Camelot, but I am having trouble getting them to join into one DataFrame.
Here is my code:
import camelot
import pandas as pd

url = 'https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_Tables.pdf'
tables = camelot.read_pdf(url, flavor = 'stream', edge_tol = 500, pages = '1-end')
for i in range(tables.n):
    # find the row holding the real column headers and promote it
    header = tables[i].df.index[tables[i].df.iloc[:, 0] == 'Metropolitan Statistical Area'].to_list()[0]
    tables[i].df = tables[i].df.rename(columns = tables[i].df.iloc[header])
    tables[i].df = tables[i].df.drop(columns = [''])  # drop the blank column
    print(tables[i].df)
    # appended_data.append(tables[i].df)
    # if i > 0:
    #     dfs = tables[i-1].append(tables[i], ignore_index = True)
Any help would be much appreciated.

You can use pandas.concat() to concatenate a list of DataFrames:
for i in range(tables.n):
    # promote the 'Metropolitan Statistical Area' row to the header
    header = tables[i].df.index[tables[i].df.iloc[:, 0] == 'Metropolitan Statistical Area'].to_list()[0]
    tables[i].df = tables[i].df.rename(columns = tables[i].df.iloc[header])
    tables[i].df = tables[i].df.drop(columns = [''])

# once every table is cleaned, concatenate them into a single DataFrame
df_ = pd.concat([table.df for table in tables], ignore_index = True)

Related

How to optimize for a variable that goes into the argument of a function in pyomo?

I am trying to code a first-order plus dead time (FOPDT) model and use it
for PID tuning. The inspiration for the work is the scipy code from: https://apmonitor.com/pdc/index.php/Main/FirstOrderOptimization
When I use model.Thetam() in the ODE constraint, it does not optimize Thetam; it keeps it at the initial value. When I use the bare model.Thetam instead, the code throws an error either way:
If I remove the call from the uf argument, i.e. model.Km * (uf(tt - model.Thetam) - model.U0), I get: ValueError: object arrays are not supported
If I also remove it from the if statement (if tt > model.Thetam), the error is: ERROR:pyomo.core:Rule failed when generating expression for Constraint ode with index 0.0: PyomoException: Cannot convert non-constant Pyomo expression (Thetam < 0.0) to bool. This error is usually caused by using a Var, unit, or mutable Param in a Boolean context such as an "if" statement, or when checking container membership or equality.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from pyomo.environ import (ConcreteModel, Var, Param, Constraint, Objective,
                           TransformationFactory, SolverFactory)
from pyomo.dae import ContinuousSet, DerivativeVar

url = 'http://apmonitor.com/pdc/uploads/Main/data_fopdt.txt'
data = pd.read_csv(url)
data = data.iloc[1:]
t = data['time'].values - data['time'].values[0]
u = data['u'].values
yp = data['y'].values
u0 = u[0]
yp0 = yp[0]
yf = interp1d(t, yp)
# specify number of steps
ns = len(t)
delta_t = t[1] - t[0]
# create linear interpolation of the u data versus time
uf = interp1d(t, u, fill_value = "extrapolate")

model = ConcreteModel()
model.T = ContinuousSet(initialize = t)
model.Y = Var(model.T)
model.dYdT = DerivativeVar(model.Y, wrt = model.T)
model.Y[0].fix(yp0)
model.Yp0 = Param(initialize = yp0)
model.U0 = Param(initialize = u0)
model.Km = Var(initialize = 2, bounds = (0.1, 10))
model.Taum = Var(initialize = 3, bounds = (0.1, 10))
model.Thetam = Var(initialize = 0, bounds = (0, 10))
model.ode = Constraint(
    model.T,
    rule = lambda model, tt: model.dYdT[tt] == (-(model.Y[tt] - model.Yp0) + model.Km * (uf(tt - model.Thetam()) - model.U0)) / model.Taum
    if tt > model.Thetam()
    else model.dYdT[tt] == -(model.Y[tt] - model.Yp0) / model.Taum)

def obj_rule(m):
    return sum((m.Y[i] - yf(i))**2 for i in m.T)

model.obj = Objective(rule = obj_rule)
discretizer = TransformationFactory('dae.finite_difference')
discretizer.apply_to(model, nfe = 500, wrt = model.T, scheme = 'BACKWARD')
opt = SolverFactory('ipopt', executable = '/content/ipopt')
opt.solve(model)  # tee = True
model.pprint()

model2 = ConcreteModel()
model2.T = ContinuousSet(initialize = t)
model2.Y = Var(model2.T)
model2.dYdT = DerivativeVar(model2.Y, wrt = model2.T)
model2.Y[0].fix(yp0)
model2.Yp0 = Param(initialize = yp0)
model2.U0 = Param(initialize = u0)
model2.Km = Param(initialize = 3.0145871)  # 3.2648
model2.Taum = Param(initialize = 1.85862177)  # 5.2328
model2.Thetam = Param(initialize = 0)  # 2.936839032
model2.ode = Constraint(
    model2.T,
    rule = lambda model, tt: model.dYdT[tt] == (-(model.Y[tt] - model.Yp0) + model.Km * (uf(tt - model.Thetam()) - model.U0)) / model.Taum)
discretizer2 = TransformationFactory('dae.finite_difference')
discretizer2.apply_to(model2, nfe = 500, wrt = model2.T, scheme = 'BACKWARD')
opt2 = SolverFactory('ipopt', executable = '/content/ipopt')
opt2.solve(model2)  # tee = True
# model.pprint()

t = [i for i in model.T]
ypred = [model.Y[i]() for i in model.T]
ytrue = [yf(i) for i in model.T]
yoptim = [model2.Y[i]() for i in model2.T]
plt.plot(t, ypred, 'r-')
plt.plot(t, ytrue)
plt.plot(t, yoptim)
plt.legend(['pred', 'true', 'optim'])
plt.show()
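For what it's worth, both symptoms fit how Pyomo builds constraint expressions: calling model.Thetam() evaluates the Var to a plain float when the constraint rule runs, so the solver never sees Thetam inside the ODE and leaves it at its initial value, while the bare model.Thetam is a symbolic object that neither scipy's interp1d nor a Python if statement can handle. A minimal throwaway sketch of the distinction (not part of the question's code):
# Sketch: a Pyomo Var is symbolic; calling it freezes its current value.
from pyomo.environ import ConcreteModel, Var

m = ConcreteModel()
m.Thetam = Var(initialize = 0)

print(m.Thetam())     # 0.0, a plain float taken at construction time
print(m.Thetam + 1)   # "Thetam + 1", a symbolic expression the solver can see
# bool(m.Thetam > 0)  # would raise: Cannot convert non-constant Pyomo
#                     # expression to bool (the error from the question)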

How to add entity_effects that correspond to a variable?

I wanted to add entity effects that correspond to my host_id column, and pass drop_absorbed = True when defining the model.
This is what I tried:
panel.reset_index(inplace = True)
modFE = PanelOLS(panel.price_USD2,
                 panel[['bedrooms', 'beds', 'number_of_reviews', 'review_scores_rating',
                        'host_is_superhost2_x', 'n_listings', 'centrococo_d', 'basilica_d']],
                 entity_effects = panel.host_id, time_effects = False, drop_absorbed = True)
resFE = modFE.fit(cov_type = 'unadjusted')
print(resFE)
However, it returned "ValueError: series can only be used with a 2-level MultiIndex" when constructing modFE.
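A hedged sketch of the usual fix: in linearmodels, PanelOLS reads the entity and time dimensions from a 2-level MultiIndex on the data, and entity_effects is a boolean flag rather than a column, so resetting the index removes exactly what it needs. Assuming the panel has some time column (called date here purely for illustration):
from linearmodels.panel import PanelOLS

# keep or rebuild the (entity, time) MultiIndex instead of resetting it;
# 'date' is a hypothetical time column: substitute your own
panel = panel.set_index(['host_id', 'date'])

exog = panel[['bedrooms', 'beds', 'number_of_reviews', 'review_scores_rating',
              'host_is_superhost2_x', 'n_listings', 'centrococo_d', 'basilica_d']]
modFE = PanelOLS(panel.price_USD2, exog,
                 entity_effects = True,   # boolean; the entity comes from the index
                 drop_absorbed = True)
resFE = modFE.fit(cov_type = 'unadjusted')
print(resFE)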

Spacy v3 - ValueError: [E030] Sentence boundaries unset

I'm training an entity linker model with spacy 3, and am getting the following error when running spacy train:
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe('sentencizer'). Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting doc[i].is_sent_start.
I've tried with both transformer and tok2vec pipelines; it seems to be failing on this line:
File "/usr/local/lib/python3.7/dist-packages/spacy/pipeline/entity_linker.py", line 252, in update sentences = [s for s in eg.reference.sents]
Running spacy debug data shows no errors.
I'm using the following config, before filling it in with spacy init fill-config:
[paths]
train = null
dev = null
kb = "./kb"
[system]
gpu_allocator = "pytorch"
[nlp]
lang = "en"
pipeline = ["transformer","parser","sentencizer","ner", "entity_linker"]
batch_size = 128
[components]
[components.transformer]
factory = "transformer"
[components.transformer.model]
#architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
[components.sentencizer]
factory = "sentencizer"
punct_chars = null
[components.entity_linker]
factory = "entity_linker"
entity_vector_length = 64
get_candidates = {"#misc":"spacy.CandidateGenerator.v1"}
incl_context = true
incl_prior = true
labels_discard = []
[components.entity_linker.model]
#architectures = "spacy.EntityLinker.v1"
nO = null
[components.entity_linker.model.tok2vec]
#architectures = "spacy.HashEmbedCNN.v1"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
[components.parser]
factory = "parser"
[components.parser.model]
#architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
use_upper = false
nO = null
[components.parser.model.tok2vec]
#architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
[components.parser.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[components.ner]
factory = "ner"
[components.ner.model]
#architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
#architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[corpora]
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
[training.optimizer]
@optimizers = "Adam.v1"
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 5e-5
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
[initialize]
vectors = ${paths.vectors}
[initialize.components]
[initialize.components.sentencizer]
[initialize.components.entity_linker]
[initialize.components.entity_linker.kb_loader]
@misc = "spacy.KBFromFile.v1"
kb_path = ${paths.kb}
I can write a script to add the sentence boundaries to the docs manually, but I am wondering why the sentencizer component is not doing this for me. Is there something missing in the config?
You haven't put the sentencizer in annotating_components, so the updates it makes aren't visible to other components during training. Take a look at the relevant section in the docs.
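Concretely, that would mean adding the component to the [training] block (a minimal sketch based on the spaCy v3 training docs):
[training]
annotating_components = ["sentencizer"]
With that in place, the entity linker sees the sentence boundaries the sentencizer sets during each update instead of raising E030.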

Oanda API - Issue Price - Instruments

I'm using the Oanda API to automate trading strategies. I have a 'price' error that only occurs when selecting some instruments, such as XAG (silver); my guess is that there is a classification difference, but Oanda has yet to answer on the matter.
The error does not occur when selecting Forex pairs.
If anyone has had such issues in the past and managed to solve them, I'll be happy to hear from them.
PS: I'm UK based and have access to most products including CFDs
import time
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import tpqoa

class SMABollTrader(tpqoa.tpqoa):
    def __init__(self, conf_file, instrument, bar_length, SMA, dev, SMA_S, SMA_L, units):
        super().__init__(conf_file)
        self.instrument = instrument
        self.bar_length = pd.to_timedelta(bar_length)
        self.tick_data = pd.DataFrame()
        self.raw_data = None
        self.data = None
        self.last_bar = None
        self.units = units
        self.position = 0
        self.profits = []
        self.price = []
        #*****************add strategy-specific attributes here******************
        self.SMA = SMA
        self.dev = dev
        self.SMA_S = SMA_S
        self.SMA_L = SMA_L
        #************************************************************************

    def get_most_recent(self, days = 5):
        while True:
            time.sleep(2)
            now = datetime.utcnow()
            now = now - timedelta(microseconds = now.microsecond)
            past = now - timedelta(days = days)
            df = self.get_history(instrument = self.instrument, start = past, end = now,
                                  granularity = "S5", price = "M", localize = False).c.dropna().to_frame()
            df.rename(columns = {"c": self.instrument}, inplace = True)
            df = df.resample(self.bar_length, label = "right").last().dropna().iloc[:-1]
            self.raw_data = df.copy()
            self.last_bar = self.raw_data.index[-1]
            if pd.to_datetime(datetime.utcnow()).tz_localize("UTC") - self.last_bar < self.bar_length:
                break

    def on_success(self, time, bid, ask):
        print(self.ticks, end = " ")
        recent_tick = pd.to_datetime(time)
        df = pd.DataFrame({self.instrument: (ask + bid) / 2},
                          index = [recent_tick])
        self.tick_data = self.tick_data.append(df)
        if recent_tick - self.last_bar > self.bar_length:
            self.resample_and_join()
            self.define_strategy()
            self.execute_trades()

    def resample_and_join(self):
        self.raw_data = self.raw_data.append(self.tick_data.resample(self.bar_length,
                                             label = "right").last().ffill().iloc[:-1])
        self.tick_data = self.tick_data.iloc[-1:]
        self.last_bar = self.raw_data.index[-1]

    def define_strategy(self):  # "strategy-specific"
        df = self.raw_data.copy()
        #******************** define your strategy here ************************
        df["SMA"] = df[self.instrument].rolling(self.SMA).mean()
        df["Lower"] = df["SMA"] - df[self.instrument].rolling(self.SMA).std() * self.dev
        df["Upper"] = df["SMA"] + df[self.instrument].rolling(self.SMA).std() * self.dev
        df["distance"] = df[self.instrument] - df.SMA
        df["SMA_S"] = df[self.instrument].rolling(self.SMA_S).mean()
        df["SMA_L"] = df[self.instrument].rolling(self.SMA_L).mean()
        # combine the two conditions with &; chaining np.where(...) with `and`
        # silently ignores the first condition
        df["position"] = np.where((df[self.instrument] < df.Lower) & (df["SMA_S"] > df["SMA_L"]), 1, np.nan)
        df["position"] = np.where((df[self.instrument] > df.Upper) & (df["SMA_S"] < df["SMA_L"]), -1, df["position"])
        df["position"] = np.where(df.distance * df.distance.shift(1) < 0, 0, df["position"])
        df["position"] = df.position.ffill().fillna(0)
        self.data = df.copy()
        #***********************************************************************

    def execute_trades(self):
        if self.data["position"].iloc[-1] == 1:
            if self.position == 0:
                order = self.create_order(self.instrument, self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING LONG")
            elif self.position == -1:
                order = self.create_order(self.instrument, self.units * 2, suppress = True, ret = True)
                self.report_trade(order, "GOING LONG")
            self.position = 1
        elif self.data["position"].iloc[-1] == -1:
            if self.position == 0:
                order = self.create_order(self.instrument, -self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING SHORT")
            elif self.position == 1:
                order = self.create_order(self.instrument, -self.units * 2, suppress = True, ret = True)
                self.report_trade(order, "GOING SHORT")
            self.position = -1
        elif self.data["position"].iloc[-1] == 0:
            if self.position == -1:
                order = self.create_order(self.instrument, self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING NEUTRAL")
            elif self.position == 1:
                order = self.create_order(self.instrument, -self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING NEUTRAL")
            self.position = 0

    def report_trade(self, order, going):
        time = order["time"]
        units = order["units"]
        price = order["price"]
        pl = float(order["pl"])
        self.profits.append(pl)
        cumpl = sum(self.profits)
        print("\n" + 100 * "-")
        print("{} | {}".format(time, going))
        print("{} | units = {} | price = {} | P&L = {} | Cum P&L = {}".format(time, units, price, pl, cumpl))
        print(100 * "-" + "\n")

trader = SMABollTrader("oanda.cfg", "EUR_GBP", "15m", SMA = 82, dev = 4, SMA_S = 38, SMA_L = 135, units = 100000)
trader.get_most_recent()
trader.stream_data(trader.instrument, stop = None)
if trader.position != 0:  # if we have a final open position
    close_order = trader.create_order(trader.instrument, units = -trader.position * trader.units,
                                      suppress = True, ret = True)
    trader.report_trade(close_order, "GOING NEUTRAL")
    trader.signal = 0
I have done the Hagmann course as well and I recognised your code immediately.
Firstly, the way you define your positions is not the best. Look at the section on combining two strategies; there are two ways shown there.
Now, regarding your price problem: I had a similar situation with BTC. You can download its historical data, but when I plugged it into the strategy code and started to stream, I got exactly the same error, indicating that tick data was never streamed.
I am guessing that simply not all instruments are tradeable via the API, or in your case maybe you tried to stream outside trading hours?
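One way to check the tradeability guess (a small sketch using tpqoa's get_instruments(), reusing the oanda.cfg from the question) is to list what the account can actually stream. Note that metals on Oanda are usually quoted against a currency, e.g. XAG_USD rather than bare XAG:
# Sketch: list the instruments this Oanda account exposes, to rule out
# an unsupported or misspelled symbol before blaming the stream.
import tpqoa

api = tpqoa.tpqoa("oanda.cfg")
for display_name, name in api.get_instruments():   # (display_name, name) tuples
    if "XAG" in name:
        print(display_name, name)   # e.g. Silver XAG_USD, if available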

Syntax for wildcard SQL Server Merge statement

This SQL Server MERGE statement works, but it is clumsy. Is there any wildcard-style syntax for merging two tables that share the exact same structure? I am trying to update the Score_Data table from the Score_Import table. I have many tables to do and do not want to type them all out. Thanks.
MERGE INTO [Score_Data].[dbo].Product as Dp
USING [Score_import].[dbo].Product as Ip
ON Dp.part_no = Ip.part_no
WHEN MATCHED THEN
UPDATE
SET Dp.total = Ip.total
,Dp.description = Ip.description
,Dp.family = Ip.family
,DP.um = IP.um
,DP.new_part_no = IP.new_part_no
,DP.prod_code = IP.prod_code
,DP.sub1 = IP.sub1
,DP.sub2 = IP.sub2
,DP.ven_no = IP.ven_no
,DP.no_sell = IP.no_sell
,DP.rp_dns = IP.rp_dns
,DP.nfa = IP.nfa
,DP.loc = IP.loc
,DP.cat_desc = IP.cat_desc
,DP.cat_color = IP.cat_color
,DP.cat_size = IP.cat_size
,DP.cat_fits = IP.cat_fits
,DP.cat_brand = IP.cat_brand
,DP.cat_usd1 = IP.cat_usd1
,DP.cat_usd2 = IP.cat_usd2
,DP.cat_usd3 = IP.cat_usd3
,DP.cat_usd4 = IP.cat_usd4
,DP.cat_usd5 = IP.cat_usd5
,DP.cat_usd6 = IP.cat_usd6
,DP.cat_usd7 = IP.cat_usd7
,DP.cat_usd8 = IP.cat_usd8
,DP.cat_usd9 = IP.cat_usd9
,DP.cat_usd10 = IP.cat_usd10
,DP.cat_usd11 = IP.cat_usd11
,DP.cat_usd12 = IP.cat_usd12
,DP.cat_usd13 = IP.cat_usd13
,DP.cat_usd14 = IP.cat_usd14
,DP.cat_usd15 = IP.cat_usd15
,DP.buy = IP.buy
,DP.price_1 = IP.price_1
,DP.price_2 = IP.price_2
,DP.price_3 = IP.price_3
,DP.price_4 = IP.price_4
,DP.price_5 = IP.price_5
,DP.price_6 = IP.price_6
,DP.price_7 = IP.price_7
,DP.price_8 = IP.price_8
,DP.price_9 = IP.price_9
,DP.create_date = IP.create_date
,DP.barcode = IP.barcode
,DP.check_digit = IP.check_digit
,DP.supplier = IP.supplier
,DP.prc_fam_code = IP.prc_fam_code
,DP.note = IP.note
,DP.mfg_part_no = IP.mfg_part_no
,DP.special = IP.special
,DP.spc_price = IP.spc_price
,DP.firm = IP.firm
,DP.box = IP.box
,DP.no_split = IP.no_split
,DP.drop_ship = IP.drop_ship
,DP.case_pack = IP.case_pack
,DP.inner_pack = IP.inner_pack
WHEN NOT MATCHED BY TARGET THEN
INSERT (part_no
,description
,family
,Total
,um
,new_part_no
,prod_code
,sub1
,sub2
,ven_no
,no_sell
,rp_dns
,nfa
,loc
,cat_desc
,cat_color
,cat_size
,cat_fits
,cat_brand
,cat_usd1
,cat_usd2
,cat_usd3
,cat_usd4
,cat_usd5
,cat_usd6
,cat_usd7
,cat_usd8
,cat_usd9
,cat_usd10
,cat_usd11
,cat_usd12
,cat_usd13
,cat_usd14
,cat_usd15
,buy
,price_1
,price_2
,price_3
,price_4
,price_5
,price_6
,price_7
,price_8
,price_9
,create_date
,barcode
,check_digit
,supplier
,prc_fam_code
,note
,mfg_part_no
,special
,spc_price
,firm
,box
,no_split
,drop_ship
,case_pack
,inner_pack)
VALUES
(Ip.Part_no
,Ip.description
,Ip.family
,Ip.Total
,Ip.um
,Ip.new_part_no
,Ip.prod_code
,Ip.sub1
,Ip.sub2
,Ip.ven_no
,Ip.no_sell
,Ip.rp_dns
,Ip.nfa
,Ip.loc
,Ip.cat_desc
,Ip.cat_color
,Ip.cat_size
,Ip.cat_fits
,Ip.cat_brand
,Ip.cat_usd1
,Ip.cat_usd2
,Ip.cat_usd3
,Ip.cat_usd4
,Ip.cat_usd5
,Ip.cat_usd6
,Ip.cat_usd7
,Ip.cat_usd8
,Ip.cat_usd9
,Ip.cat_usd10
,Ip.cat_usd11
,Ip.cat_usd12
,Ip.cat_usd13
,Ip.cat_usd14
,Ip.cat_usd15
,Ip.buy
,Ip.price_1
,Ip.price_2
,Ip.price_3
,Ip.price_4
,Ip.price_5
,Ip.price_6
,Ip.price_7
,Ip.price_8
,Ip.price_9
,Ip.create_date
,Ip.barcode
,Ip.check_digit
,Ip.supplier
,Ip.prc_fam_code
,Ip.note
,Ip.mfg_part_no
,Ip.special
,Ip.spc_price
,Ip.firm
,Ip.box
,Ip.no_split
,Ip.drop_ship
,Ip.case_pack
,Ip.inner_pack)
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT $action, Inserted.*, Deleted.*;
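There is no wildcard syntax inside MERGE itself, but because source and target share a structure you can generate the repetitive column lists instead of typing them per table. A minimal sketch in Python (pyodbc, the DSN string, and the assumption that part_no is the join key for every table are placeholders; dynamic SQL in a stored procedure would work the same way):
# Sketch: build the SET / INSERT lists of the MERGE from the catalog
# instead of typing every column by hand.
import pyodbc

conn = pyodbc.connect("DSN=score;Trusted_Connection=yes")  # placeholder DSN
rows = conn.cursor().execute(
    "SELECT COLUMN_NAME FROM Score_Data.INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_NAME = 'Product' ORDER BY ORDINAL_POSITION").fetchall()
cols = [r.COLUMN_NAME for r in rows]

key = 'part_no'  # assumed join key
set_list = ",\n        ".join(f"Dp.{c} = Ip.{c}" for c in cols if c != key)
insert_cols = ", ".join(cols)
insert_vals = ", ".join(f"Ip.{c}" for c in cols)

merge_sql = f"""
MERGE INTO [Score_Data].[dbo].Product AS Dp
USING [Score_import].[dbo].Product AS Ip
    ON Dp.{key} = Ip.{key}
WHEN MATCHED THEN
    UPDATE SET {set_list}
WHEN NOT MATCHED BY TARGET THEN
    INSERT ({insert_cols}) VALUES ({insert_vals})
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, Inserted.*, Deleted.*;
"""
print(merge_sql)  # review the generated statement, then execute it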