Doxygen generation stops at 'Searching for documented variables...'
I want to generate Doxygen documentation for a large C++ source code project (2400+ files).
The process stops at the following step:
Searching for enumerations...
Searching for documented typedefs...
Searching for members imported via using declarations...
Searching for included using directives...
Searching for documented variables...
I've waited more than 12 hours at this step and there was no visible progress (Windows Task Manager shows one core working at 100 % (13 % overall) with about 95,600 KB of memory usage).
My machine is an Intel Xeon E5-1620, 8 cores @ 3.5 GHz, with 8 GB of RAM.
This is the Doxygen config file I'm using:
# Doxyfile 1.8.10
#---------------------------------------------------------------------------
# Project related configuration options
#---------------------------------------------------------------------------
DOXYFILE_ENCODING = UTF-8
PROJECT_NAME = "Phil's FSD Doc"
PROJECT_NUMBER =
PROJECT_BRIEF =
PROJECT_LOGO =
OUTPUT_DIRECTORY = D:\kbPhil\doxygen\generatedDoc
CREATE_SUBDIRS = NO
ALLOW_UNICODE_NAMES = NO
OUTPUT_LANGUAGE = English
BRIEF_MEMBER_DESC = YES
REPEAT_BRIEF = YES
ABBREVIATE_BRIEF = "The $name class" \
"The $name widget" \
"The $name file" \
is \
provides \
specifies \
contains \
represents \
a \
an \
the
ALWAYS_DETAILED_SEC = NO
INLINE_INHERITED_MEMB = NO
FULL_PATH_NAMES = YES
STRIP_FROM_PATH =
STRIP_FROM_INC_PATH =
SHORT_NAMES = NO
JAVADOC_AUTOBRIEF = NO
QT_AUTOBRIEF = NO
MULTILINE_CPP_IS_BRIEF = NO
INHERIT_DOCS = YES
SEPARATE_MEMBER_PAGES = NO
TAB_SIZE = 3
ALIASES =
TCL_SUBST =
OPTIMIZE_OUTPUT_FOR_C = YES
OPTIMIZE_OUTPUT_JAVA = NO
OPTIMIZE_FOR_FORTRAN = NO
OPTIMIZE_OUTPUT_VHDL = NO
EXTENSION_MAPPING =
MARKDOWN_SUPPORT = YES
AUTOLINK_SUPPORT = YES
BUILTIN_STL_SUPPORT = NO
CPP_CLI_SUPPORT = NO
SIP_SUPPORT = NO
IDL_PROPERTY_SUPPORT = YES
DISTRIBUTE_GROUP_DOC = NO
GROUP_NESTED_COMPOUNDS = NO
SUBGROUPING = YES
INLINE_GROUPED_CLASSES = NO
INLINE_SIMPLE_STRUCTS = NO
TYPEDEF_HIDES_STRUCT = NO
LOOKUP_CACHE_SIZE = 0
#---------------------------------------------------------------------------
# Build related configuration options
#---------------------------------------------------------------------------
EXTRACT_ALL = YES
EXTRACT_PRIVATE = NO
EXTRACT_PACKAGE = NO
EXTRACT_STATIC = YES
EXTRACT_LOCAL_CLASSES = YES
EXTRACT_LOCAL_METHODS = YES
EXTRACT_ANON_NSPACES = NO
HIDE_UNDOC_MEMBERS = NO
HIDE_UNDOC_CLASSES = NO
HIDE_FRIEND_COMPOUNDS = NO
HIDE_IN_BODY_DOCS = NO
INTERNAL_DOCS = NO
CASE_SENSE_NAMES = NO
HIDE_SCOPE_NAMES = YES
HIDE_COMPOUND_REFERENCE= NO
SHOW_INCLUDE_FILES = YES
SHOW_GROUPED_MEMB_INC = NO
FORCE_LOCAL_INCLUDES = NO
INLINE_INFO = YES
SORT_MEMBER_DOCS = YES
SORT_BRIEF_DOCS = NO
SORT_MEMBERS_CTORS_1ST = NO
SORT_GROUP_NAMES = NO
SORT_BY_SCOPE_NAME = NO
STRICT_PROTO_MATCHING = NO
GENERATE_TODOLIST = YES
GENERATE_TESTLIST = YES
GENERATE_BUGLIST = YES
GENERATE_DEPRECATEDLIST= YES
ENABLED_SECTIONS =
MAX_INITIALIZER_LINES = 30
SHOW_USED_FILES = YES
SHOW_FILES = YES
SHOW_NAMESPACES = YES
FILE_VERSION_FILTER =
LAYOUT_FILE =
CITE_BIB_FILES =
#---------------------------------------------------------------------------
# Configuration options related to warning and progress messages
#---------------------------------------------------------------------------
QUIET = NO
WARNINGS = YES
WARN_IF_UNDOCUMENTED = YES
WARN_IF_DOC_ERROR = YES
WARN_NO_PARAMDOC = NO
WARN_FORMAT = "$file:$line: $text"
WARN_LOGFILE =
#---------------------------------------------------------------------------
# Configuration options related to the input files
#---------------------------------------------------------------------------
INPUT = ..
INPUT_ENCODING = UTF-8
FILE_PATTERNS = *.c \
*.cc \
*.cxx \
*.cpp \
*.c++ \
*.java \
*.ii \
*.ixx \
*.ipp \
*.i++ \
*.inl \
*.idl \
*.ddl \
*.odl \
*.h \
*.hh \
*.hxx \
*.hpp \
*.h++ \
*.cs \
*.d \
*.php \
*.php4 \
*.php5 \
*.phtml \
*.inc \
*.m \
*.markdown \
*.md \
*.mm \
*.dox \
*.py \
*.f90 \
*.f \
*.for \
*.tcl \
*.vhd \
*.vhdl \
*.ucf \
*.qsf \
*.as \
*.js
RECURSIVE = YES
EXCLUDE =
EXCLUDE_SYMLINKS = NO
EXCLUDE_PATTERNS =
EXCLUDE_SYMBOLS =
EXAMPLE_PATH =
EXAMPLE_PATTERNS = *
EXAMPLE_RECURSIVE = NO
IMAGE_PATH =
INPUT_FILTER =
FILTER_PATTERNS =
FILTER_SOURCE_FILES = NO
FILTER_SOURCE_PATTERNS =
USE_MDFILE_AS_MAINPAGE =
#---------------------------------------------------------------------------
# Configuration options related to source browsing
#---------------------------------------------------------------------------
SOURCE_BROWSER = YES
INLINE_SOURCES = NO
STRIP_CODE_COMMENTS = YES
REFERENCED_BY_RELATION = NO
REFERENCES_RELATION = NO
REFERENCES_LINK_SOURCE = YES
SOURCE_TOOLTIPS = YES
USE_HTAGS = NO
VERBATIM_HEADERS = YES
CLANG_ASSISTED_PARSING = NO
CLANG_OPTIONS =
#---------------------------------------------------------------------------
# Configuration options related to the alphabetical class index
#---------------------------------------------------------------------------
ALPHABETICAL_INDEX = YES
COLS_IN_ALPHA_INDEX = 5
IGNORE_PREFIX =
#---------------------------------------------------------------------------
# Configuration options related to the HTML output
#---------------------------------------------------------------------------
GENERATE_HTML = YES
HTML_OUTPUT = html
HTML_FILE_EXTENSION = .html
HTML_HEADER =
HTML_FOOTER =
HTML_STYLESHEET =
HTML_EXTRA_STYLESHEET =
HTML_EXTRA_FILES =
HTML_COLORSTYLE_HUE = 220
HTML_COLORSTYLE_SAT = 100
HTML_COLORSTYLE_GAMMA = 80
HTML_TIMESTAMP = NO
HTML_DYNAMIC_SECTIONS = NO
HTML_INDEX_NUM_ENTRIES = 100
GENERATE_DOCSET = NO
DOCSET_FEEDNAME = "Doxygen generated docs"
DOCSET_BUNDLE_ID = org.doxygen.Project
DOCSET_PUBLISHER_ID = org.doxygen.Publisher
DOCSET_PUBLISHER_NAME = Publisher
GENERATE_HTMLHELP = NO
CHM_FILE =
HHC_LOCATION =
GENERATE_CHI = NO
CHM_INDEX_ENCODING =
BINARY_TOC = NO
TOC_EXPAND = NO
GENERATE_QHP = NO
QCH_FILE =
QHP_NAMESPACE = org.doxygen.Project
QHP_VIRTUAL_FOLDER = doc
QHP_CUST_FILTER_NAME =
QHP_CUST_FILTER_ATTRS =
QHP_SECT_FILTER_ATTRS =
QHG_LOCATION =
GENERATE_ECLIPSEHELP = NO
ECLIPSE_DOC_ID = org.doxygen.Project
DISABLE_INDEX = NO
GENERATE_TREEVIEW = YES
ENUM_VALUES_PER_LINE = 4
TREEVIEW_WIDTH = 250
EXT_LINKS_IN_WINDOW = NO
FORMULA_FONTSIZE = 10
FORMULA_TRANSPARENT = YES
USE_MATHJAX = NO
MATHJAX_FORMAT = HTML-CSS
MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
MATHJAX_EXTENSIONS =
MATHJAX_CODEFILE =
SEARCHENGINE = YES
SERVER_BASED_SEARCH = NO
EXTERNAL_SEARCH = NO
SEARCHENGINE_URL =
SEARCHDATA_FILE = searchdata.xml
EXTERNAL_SEARCH_ID =
EXTRA_SEARCH_MAPPINGS =
#---------------------------------------------------------------------------
# Configuration options related to the LaTeX output
#---------------------------------------------------------------------------
GENERATE_LATEX = NO
LATEX_OUTPUT = latex
LATEX_CMD_NAME = latex
MAKEINDEX_CMD_NAME = makeindex
COMPACT_LATEX = NO
PAPER_TYPE = a4
EXTRA_PACKAGES =
LATEX_HEADER =
LATEX_FOOTER =
LATEX_EXTRA_STYLESHEET =
LATEX_EXTRA_FILES =
PDF_HYPERLINKS = NO
USE_PDFLATEX = YES
LATEX_BATCHMODE = NO
LATEX_HIDE_INDICES = NO
LATEX_SOURCE_CODE = NO
LATEX_BIB_STYLE = plain
#---------------------------------------------------------------------------
# Configuration options related to the RTF output
#---------------------------------------------------------------------------
GENERATE_RTF = NO
RTF_OUTPUT = rtf
COMPACT_RTF = NO
RTF_HYPERLINKS = NO
RTF_STYLESHEET_FILE =
RTF_EXTENSIONS_FILE =
RTF_SOURCE_CODE = NO
#---------------------------------------------------------------------------
# Configuration options related to the man page output
#---------------------------------------------------------------------------
GENERATE_MAN = NO
MAN_OUTPUT = man
MAN_EXTENSION = .3
MAN_SUBDIR =
MAN_LINKS = NO
#---------------------------------------------------------------------------
# Configuration options related to the XML output
#---------------------------------------------------------------------------
GENERATE_XML = NO
XML_OUTPUT = xml
XML_PROGRAMLISTING = YES
#---------------------------------------------------------------------------
# Configuration options related to the DOCBOOK output
#---------------------------------------------------------------------------
GENERATE_DOCBOOK = NO
DOCBOOK_OUTPUT = docbook
DOCBOOK_PROGRAMLISTING = NO
#---------------------------------------------------------------------------
# Configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------
GENERATE_AUTOGEN_DEF = NO
#---------------------------------------------------------------------------
# Configuration options related to the Perl module output
#---------------------------------------------------------------------------
GENERATE_PERLMOD = NO
PERLMOD_LATEX = NO
PERLMOD_PRETTY = YES
PERLMOD_MAKEVAR_PREFIX =
#---------------------------------------------------------------------------
# Configuration options related to the preprocessor
#---------------------------------------------------------------------------
ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
SEARCH_INCLUDES = YES
INCLUDE_PATH =
INCLUDE_FILE_PATTERNS =
PREDEFINED = __attribute__(x)=
EXPAND_AS_DEFINED =
SKIP_FUNCTION_MACROS = YES
#---------------------------------------------------------------------------
# Configuration options related to external references
#---------------------------------------------------------------------------
TAGFILES =
GENERATE_TAGFILE =
ALLEXTERNALS = NO
EXTERNAL_GROUPS = YES
EXTERNAL_PAGES = YES
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
CLASS_DIAGRAMS = YES
MSCGEN_PATH =
DIA_PATH =
HIDE_UNDOC_RELATIONS = YES
HAVE_DOT = YES
DOT_NUM_THREADS = 8
DOT_FONTNAME = Helvetica
DOT_FONTSIZE = 10
DOT_FONTPATH =
CLASS_GRAPH = NO
COLLABORATION_GRAPH = NO
GROUP_GRAPHS = YES
UML_LOOK = NO
UML_LIMIT_NUM_FIELDS = 10
TEMPLATE_RELATIONS = NO
INCLUDE_GRAPH = YES
INCLUDED_BY_GRAPH = YES
CALL_GRAPH = YES
CALLER_GRAPH = YES
GRAPHICAL_HIERARCHY = NO
DIRECTORY_GRAPH = YES
DOT_IMAGE_FORMAT = png
INTERACTIVE_SVG = NO
DOT_PATH =
DOTFILE_DIRS =
MSCFILE_DIRS =
DIAFILE_DIRS =
PLANTUML_JAR_PATH =
PLANTUML_INCLUDE_PATH =
DOT_GRAPH_MAX_NODES = 50
MAX_DOT_GRAPH_DEPTH = 0
DOT_TRANSPARENT = NO
DOT_MULTI_TARGETS = NO
GENERATE_LEGEND = YES
DOT_CLEANUP = YES
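In case it helps to narrow things down, this is a stripped-down variant of the options I could test first. This is only a guess on my side (a sketch, assuming the slowdown comes from source browsing, the call/caller graphs and the symbol lookup cache rather than from the parser itself); everything else would stay as above:

# Options temporarily changed for a test run
SOURCE_BROWSER         = NO
VERBATIM_HEADERS       = NO
HAVE_DOT               = NO
CALL_GRAPH             = NO
CALLER_GRAPH           = NO
# 0 means a cache for 2^16 symbols; doxygen prints a suggested value at the end of a run
LOOKUP_CACHE_SIZE      = 4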
Many thanks in advance for your help!
Related
Spacy v3 - ValueError: [E030] Sentence boundaries unset
I'm training an entity linker model with spaCy 3, and am getting the following error when running spacy train:

ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe('sentencizer'). Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting doc[i].is_sent_start.

I've tried with both transformer and tok2vec pipelines; it seems to be failing on this line:

File "/usr/local/lib/python3.7/dist-packages/spacy/pipeline/entity_linker.py", line 252, in update
    sentences = [s for s in eg.reference.sents]

Running spacy debug data shows no errors. I'm using the following config, before filling it in with spacy init fill-config:

[paths]
train = null
dev = null
kb = "./kb"

[system]
gpu_allocator = "pytorch"

[nlp]
lang = "en"
pipeline = ["transformer","parser","sentencizer","ner","entity_linker"]
batch_size = 128

[components]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.sentencizer]
factory = "sentencizer"
punct_chars = null

[components.entity_linker]
factory = "entity_linker"
entity_vector_length = 64
get_candidates = {"@misc":"spacy.CandidateGenerator.v1"}
incl_context = true
incl_prior = true
labels_discard = []

[components.entity_linker.model]
@architectures = "spacy.EntityLinker.v1"
nO = null

[components.entity_linker.model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v1"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true

[components.parser]
factory = "parser"

[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
use_upper = false
nO = null

[components.parser.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.parser.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[corpora]

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"

[training.optimizer]
@optimizers = "Adam.v1"

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 5e-5

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256

[initialize]
vectors = ${paths.vectors}

[initialize.components]

[initialize.components.sentencizer]

[initialize.components.entity_linker]

[initialize.components.entity_linker.kb_loader]
@misc = "spacy.KBFromFile.v1"
kb_path = ${paths.kb}

I can write a script to add the sentence boundaries to the docs manually, but I'm wondering why the sentencizer component is not doing this for me. Is there something missing in the config?
You haven't put the sentencizer in annotating_components, so the updates it makes aren't visible to other components during training. Take a look at the relevant section in the docs.
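As a minimal sketch (the rest of the [training] block stays as it is in the question's config), adding the sentencizer to annotating_components makes the sentence boundaries it sets visible to the entity linker while training:

[training]
annotating_components = ["sentencizer"]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"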
Oanda API - Issue Price - Instruments
I'm using the Oanda API to automate trading strategies. I have a 'price' error that only occurs when selecting some instruments such as XAG (silver); my guess is that there is a classification difference, but Oanda is yet to answer on the matter. The error does not occur when selecting Forex pairs. If anyone has had such issues in the past and managed to solve them, I'll be happy to hear from them. PS: I'm UK based and have access to most products, including CFDs.

class SMABollTrader(tpqoa.tpqoa):
    def __init__(self, conf_file, instrument, bar_length, SMA, dev, SMA_S, SMA_L, units):
        super().__init__(conf_file)
        self.instrument = instrument
        self.bar_length = pd.to_timedelta(bar_length)
        self.tick_data = pd.DataFrame()
        self.raw_data = None
        self.data = None
        self.last_bar = None
        self.units = units
        self.position = 0
        self.profits = []
        self.price = []

        #*****************add strategy-specific attributes here******************
        self.SMA = SMA
        self.dev = dev
        self.SMA_S = SMA_S
        self.SMA_L = SMA_L
        #************************************************************************

    def get_most_recent(self, days = 5):
        while True:
            time.sleep(2)
            now = datetime.utcnow()
            now = now - timedelta(microseconds = now.microsecond)
            past = now - timedelta(days = days)
            df = self.get_history(instrument = self.instrument, start = past, end = now,
                                  granularity = "S5", price = "M", localize = False).c.dropna().to_frame()
            df.rename(columns = {"c": self.instrument}, inplace = True)
            df = df.resample(self.bar_length, label = "right").last().dropna().iloc[:-1]
            self.raw_data = df.copy()
            self.last_bar = self.raw_data.index[-1]
            if pd.to_datetime(datetime.utcnow()).tz_localize("UTC") - self.last_bar < self.bar_length:
                break

    def on_success(self, time, bid, ask):
        print(self.ticks, end = " ")
        recent_tick = pd.to_datetime(time)
        df = pd.DataFrame({self.instrument: (ask + bid) / 2}, index = [recent_tick])
        self.tick_data = self.tick_data.append(df)
        if recent_tick - self.last_bar > self.bar_length:
            self.resample_and_join()
            self.define_strategy()
            self.execute_trades()

    def resample_and_join(self):
        self.raw_data = self.raw_data.append(self.tick_data.resample(self.bar_length,
                                             label = "right").last().ffill().iloc[:-1])
        self.tick_data = self.tick_data.iloc[-1:]
        self.last_bar = self.raw_data.index[-1]

    def define_strategy(self):  # "strategy-specific"
        df = self.raw_data.copy()

        #******************** define your strategy here ************************
        df["SMA"] = df[self.instrument].rolling(self.SMA).mean()
        df["Lower"] = df["SMA"] - df[self.instrument].rolling(self.SMA).std() * self.dev
        df["Upper"] = df["SMA"] + df[self.instrument].rolling(self.SMA).std() * self.dev
        df["distance"] = df[self.instrument] - df.SMA
        df["SMA_S"] = df[self.instrument].rolling(self.SMA_S).mean()
        df["SMA_L"] = df[self.instrument].rolling(self.SMA_L).mean()
        df["position"] = np.where(df[self.instrument] < df.Lower) and np.where(df["SMA_S"] > df["SMA_L"], 1, np.nan)
        df["position"] = np.where(df[self.instrument] > df.Upper) and np.where(df["SMA_S"] < df["SMA_L"], -1, df["position"])
        df["position"] = np.where(df.distance * df.distance.shift(1) < 0, 0, df["position"])
        df["position"] = df.position.ffill().fillna(0)
        self.data = df.copy()
        #***********************************************************************

    def execute_trades(self):
        if self.data["position"].iloc[-1] == 1:
            if self.position == 0 or None:
                order = self.create_order(self.instrument, self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING LONG")
            elif self.position == -1:
                order = self.create_order(self.instrument, self.units * 2, suppress = True, ret = True)
                self.report_trade(order, "GOING LONG")
            self.position = 1
        elif self.data["position"].iloc[-1] == -1:
            if self.position == 0:
                order = self.create_order(self.instrument, -self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING SHORT")
            elif self.position == 1:
                order = self.create_order(self.instrument, -self.units * 2, suppress = True, ret = True)
                self.report_trade(order, "GOING SHORT")
            self.position = -1
        elif self.data["position"].iloc[-1] == 0:
            if self.position == -1:
                order = self.create_order(self.instrument, self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING NEUTRAL")
            elif self.position == 1:
                order = self.create_order(self.instrument, -self.units, suppress = True, ret = True)
                self.report_trade(order, "GOING NEUTRAL")
            self.position = 0

    def report_trade(self, order, going):
        time = order["time"]
        units = order["units"]
        price = order["price"]
        pl = float(order["pl"])
        self.profits.append(pl)
        cumpl = sum(self.profits)
        print("\n" + 100 * "-")
        print("{} | {}".format(time, going))
        print("{} | units = {} | price = {} | P&L = {} | Cum P&L = {}".format(time, units, price, pl, cumpl))
        print(100 * "-" + "\n")


trader = SMABollTrader("oanda.cfg", "EUR_GBP", "15m", SMA = 82, dev = 4, SMA_S = 38, SMA_L = 135, units = 100000)
trader.get_most_recent()
trader.stream_data(trader.instrument, stop = None)
if trader.position != 0:  # if we have a final open position
    close_order = trader.create_order(trader.instrument, units = -trader.position * trader.units,
                                      suppress = True, ret = True)
    trader.report_trade(close_order, "GOING NEUTRAL")
    trader.signal = 0
I have done the Hagmann course as well and I recognised your code immediately. Firstly, the way you define your positions is not the best; look at the section on combining two strategies, where two approaches are shown. Regarding your price problem, I had a similar situation with BTC: I could download its historical data, but when I plugged it into the strategy code and started to stream, I got exactly the same error, indicating that tick data was never streamed. I am guessing that simply not all instruments are tradeable via the API, or in your case maybe you tried to stream outside trading hours?
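As a rough sketch of what I mean by defining the positions differently (using the question's column names inside define_strategy; not a tested Oanda strategy), the two entry conditions per side can be combined with a boolean AND inside a single np.where call instead of chaining two np.where calls with `and`:

import numpy as np

# combine both long conditions (and both short conditions) before calling np.where
cond_long = (df[self.instrument] < df["Lower"]) & (df["SMA_S"] > df["SMA_L"])
cond_short = (df[self.instrument] > df["Upper"]) & (df["SMA_S"] < df["SMA_L"])
df["position"] = np.where(cond_long, 1, np.nan)
df["position"] = np.where(cond_short, -1, df["position"])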
Appending tables generated from a loop
I am a new Python user and am trying to append together data that I have pulled from a PDF using Camelot, but I am having trouble getting the tables to join. Here is my code:

url = 'https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_Tables.pdf'
tables = camelot.read_pdf(url, flavor='stream', edge_tol = 500, pages = '1-end')

i = 0
while i in range(0, tables.n):
    header = tables[i].df.index[tables[i].df.iloc[:,0]=='Metropolitan Statistical Area'].to_list()
    header = str(header)[1:-1]
    header = (int(header))
    tables[i].df = tables[i].df.rename(columns = tables[i].df.iloc[header])
    tables[i].df = tables[i].df.drop(columns = {'': 'Blank'})
    print(tables[i].df)
    #appended_data.append(tables[i].df)
    #if i > 0:
    #    dfs = tables[i-1].append(tables[i], ignore_index = True)
    #pass
    i = i + 1

Any help would be much appreciated.
You can use pandas.concat() to concatenate a list of DataFrames:

while i in range(0, tables.n):
    header = tables[i].df.index[tables[i].df.iloc[:,0]=='Metropolitan Statistical Area'].to_list()
    header = str(header)[1:-1]
    header = (int(header))
    tables[i].df = tables[i].df.rename(columns = tables[i].df.iloc[header])
    tables[i].df = tables[i].df.drop(columns = {'': 'Blank'})
    i = i + 1

df_ = pd.concat([table.df for table in tables])
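A slightly fuller sketch of the same idea, keeping the question's header handling and stacking everything at the end (hypothetical; assumes the header row is found in every table):

import camelot
import pandas as pd

url = 'https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_Tables.pdf'
tables = camelot.read_pdf(url, flavor='stream', edge_tol=500, pages='1-end')

cleaned = []
for table in tables:
    df = table.df
    # locate the row that holds the column headers and promote it
    header_row = df.index[df.iloc[:, 0] == 'Metropolitan Statistical Area'].to_list()[0]
    df = df.rename(columns=df.iloc[header_row])
    cleaned.append(df)

combined = pd.concat(cleaned, ignore_index=True)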
Train custom NER component with a base model in spaCy v3
I'm having problems training a custom NER component within a base model in spaCy's new version. So far, I've been training my NER model at the CLI with the following command:

python -m spacy train en model training validation --base-model en_core_web_sm --pipeline "ner" -R -n 10

Depending on the use case, I took en_core_web_sm or en_core_web_lg as the base model to make use of the other components like the tagger and POS. In spaCy version 3 a config file is required to handle the command at the CLI. I'm using the following configuration for training:

[paths]
train = "training/"
dev = "validation/"
vectors = null
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
ents_per_type = null
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0

[pretraining]

[initialize]
vectors = null
init_tok2vec = null
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

Since I'm not familiar with spaCy's new version, these are pretty much the default settings. Unfortunately, I can only train the model from scratch, and I can't find an option anymore to train only the NER component within an existing language model. I have also tried to add the parser component in the configuration file with

[components.parser]
source = "en_core_web_sm"
...

but then the model is not even loadable, raising the following error:

nn_parser.pyx in spacy.syntax.nn_parser.Parser.from_disk()
nn_parser.pyx in spacy.syntax.nn_parser.Parser.Model()
TypeError: Model() takes exactly 1 positional argument (0 given)
In spaCy 3.0, what you want to do first is initialize your config file with the components that you need:

python -m spacy init config config.cfg --lang en --pipeline tagger,parser,ner,attribute_ruler,senter,lemmatizer,tok2vec

Then you want to go to config.cfg and override settings; for example, you can use vectors from an existing model:

[initialize]
vectors = "en_core_web_lg"
init_tok2vec = null
vocab_data = null
lookups = null
before_init = null
after_init = null

Then you can run the train command:

python -m spacy train config.cfg --paths.train ./path_to_your_train_data.spacy --paths.dev ./path_to_your_validation_data.spacy --output ./your_model_name

I also found that it's possible to just go to the model folder and swap out components manually, as well as load different components from different models in code into a single pipeline. If you need to use a component from an existing model, you can use the following setting in your config.cfg:

[components.tagger]
source = "en_core_web_lg"

For more info on using existing models and components, see the spaCy documentation.
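As an illustration of combining components from different pipelines in code (a sketch; "your_custom_ner_model" is a placeholder for whatever trained pipeline you have on disk):

import spacy

# Base pipeline without its stock NER, plus the pipeline holding the component to reuse.
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
source_nlp = spacy.load("your_custom_ner_model")  # placeholder name

# Copy the trained NER component from the source pipeline into the base pipeline.
nlp.add_pipe("ner", source=source_nlp)

print(nlp.pipe_names)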
In Middleman how do I generate day, month, and year blog archive links?
I'm trying to generate year and month archive links. Right now I can set one of the three. When I set two or more and look at the sitemap I get the error:

NoMethodError at /__middleman/sitemap/
undefined method `add_path' for #<Middleman::MetaPages::SitemapResource:0x007fedd61597a0>

Here is my blog configuration from config.rb:

activate :blog do |blog|
  blog.name = "blog"
  blog.permalink = "/{title}/"
  blog.taglink = "blog/:tag.html"
  blog.year_link = "blog/{year}/"
  blog.month_link = "blog/{year}/{month}/"
  blog.day_link = "blog/{year}/{month}/{day}/"
  blog.sources = "blog/{year}-{month}-{day}-{title}.html"
  #blog.year_template = "blog/calendar.html"
  blog.month_template = "blog/calendar.html"
  #blog.day_template = "blog/calendar.html"
  blog.layout = "blog-layout"
  blog.tag_template = "blog/tag.html"
  #blog.calendar_template = "blog/calendar.html"

  # This will add a prefix to all links, template references and source paths
  # blog.prefix = "blog"

  # Matcher for blog source files
  # blog.sources = "{year}-{month}-{day}-{title}.html"
  # blog.summary_separator = /(READMORE)/
  # blog.summary_length = 250
  # blog.default_extension = ".markdown"

  # Enable pagination
  # blog.paginate = true
  # blog.per_page = 10
  # blog.page_link = "page/{num}"
end