Is there a way to view a list of PyTable file marks? - pytables

In the Pytable file class you can get_current_mark(), goto(mark), undo(mark), mark(), etc. but is there a way to see a list of all available marks? Reference: http://pytables.github.io/usersguide/libref/file_class.html

While not part of the public API - which does not provide a way of doing what you are asking for - there are two private members that you can use. The first is File._markers which is a dictionary of mark names to the their row number in the File._actionlog table, which is a table in the file of all of the actions taken thus far. I think that sorted(File._markers.items(), key=lambda x: x[1]) is probably what you want.

Related

What is the difference between `matplotlib.rc` and `matplotlib.rcParams`? And which one to use?

I have been using matplotlib.rc in my scripts to preprocess my plots. But recently I have realized that using matplotlib.rcParams is much easier before doing a quick plot interactively (e.g. via IPython). This got me into thinking what difference between the two is.
I searched the matplotlib documentation wherein no clear answer was provided in this regard. Moreover, when I issue type(matplotlib.rc), the interpreter says that it is a function. On the other hand, when I issue type(matplotlib.rcParams), I am told that it is a class object. These two answers are not at all helpful and hence I would appreciate some help differentiating the two.
Additionally, I would like to know which one to prefer over the other.
Thanks in advance.
P.S. I went through this question: What's the difference between matplotlib.rc and matplotlib.pyplot.rc? but the answers are specific to the difference between the matplotlib instance and the pyplot instance of the two types I am enquiring about and, hence, is also not that helpful.
matplotlib.rc is a function that updates matplotlib.rcParams.
matplotlib.rcParams is a dict-subclass that provides a validate key-value map for Matplotlib configuration.
The docs for mpl.rc are at https://matplotlib.org/stable/api/matplotlib_configuration_api.html?highlight=rc#matplotlib.rc and the code is here.
The class definition of RcParams is here and it the instance is created here.
If we look at the guts of matplotlib.rc we see:
for g in group:
for k, v in kwargs.items():
name = aliases.get(k) or k
key = '%s.%s' % (g, name)
try:
rcParams[key] = v
except KeyError as err:
raise KeyError(('Unrecognized key "%s" for group "%s" and '
'name "%s"') % (key, g, name)) from err
where we see that matplotlib.rc does indeed update matplotlib.rcParams (after doing some string formatting).
You should use which ever one is more convenient for you. If you know exactly what key you want to update, then interacting with the dict-like is better, if you want to set a whole bunch of values in a group then mpl.rc is likely better!

Automatically detect security identifier columns using Visions

I'm interested in using the Visions library to automate the process of identifying certain types of security (stock) identifiers. The documentation mentions that it could be used in such a way for ISBN codes but I'm looking for a more concrete example of how to do it. I think the process would be pretty much identical for the fields I'm thinking of as they all have check digits (ISIN, SEDOL, CUSIP).
My general idea is that I would create custom types for the different identifier types and could use those types to
Take a dataframe where the types are unknown and identify columns matching the types (even if it's not a 100% match)
Validate the types on a dataframe where the intended type is known
Great question and use-case! Unfortunately, the documentation on making new types probably needs a little love right now as there were API breaking changes with the 0.7.0 release. Both the previous link and this post from August, 2020 should cover the conceptual idea of type creation in greater detail. If any of those examples break then mea culpa and our apologies, we switched to a dispatch based implementation to support different backends (pandas, numpy, dask, spark, etc...) for each type. You shouldn't have to worry about that for now but if you're interested you can find the default type definitions here with their backends here.
Building an ISBN Type
We need to make two basic decisions when defining a type:
What defines the type
What other types are our new type related to?
For the ISBN use-case O'Reilly provides a validation regex to match ISBN-10 and ISBN-13 codes. So,
What defines a type?
We want every element in the sequence to be a string which matches a corresponding ISBN-10 or ISBN-13 regex
What other types are our new type related to?
Since ISBN's are themselves strings we can use the default String type provided by visions.
Type Definition
from typing import Sequence
import pandas as pd
from visions.relations import IdentityRelation, TypeRelation
from visions.types.string import String
from visions.types.type import VisionsBaseType
isbn_regex = "^(?:ISBN(?:-1[03])?:?●)?(?=[0-9X]{10}$|(?=(?:[0-9]+[-●]){3})[-●0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[-●]){4})[-●0-9]{17}$)(?:97[89][-●]?)?[0-9]{1,5}[-●]?[0-9]+[-●]?[0-9]+[-●]?[0-9X]$"
class ISBN(VisionsBaseType):
#staticmethod
def get_relations() -> Sequence[TypeRelation]:
relations = [
IdentityRelation(String),
]
return relations
#staticmethod
def contains_op(series: pd.Series, state: dict) -> bool:
return series.str.contains(isbn_regex).all()
Looking at this closely there are three things to take note of.
The new type inherits from VisionsBaseType
We had to define a get_relations method which is how we relate a new type to others we might want to use in a typeset. In this case, I've used an IdentityRelation to String which means ISBNs are subsets of String. We can also use InferenceRelation's when we want to support relations which change the underlying data (say converting the string '4.2' to the float 4.2).
A contains_op this is our definition of the type. In this case, we are applying a regex string to every element in the input and verifying it matched the regex provided by O'Reilly.
Extensions
In theory ISBNs can be encoded in what looks like a 10 or 13 digit integer as well - to work with those you might want to create an InferenceRelation between Integer and ISBN. A simple implementation would involve coercing Integers to string and applying the above regex.

Is there a way to assign an internal string or identifier or tag to a matplotlib artist?

Sometimes it is useful to assign a 'tag', which can be a simple string, to a matplotlib artist in order to later find it easily.
If we imagine a scenario where say plt.Line2D had a property called tag which can be retrieved using plt.Line2D.get_tag() it would be very easy to find it later in a complicated plot.
The only thing I can find that looks remotely similar is the group ID: for example line.set_gid() and line.get_gid(). I haven't found any good documentation on this. The only reference is this. Is this meant for such use as described above? Is it reserved for other operations in matplotlib?
This would be very useful for grouping different artists and then performing operations on them later, for example:
for line in ax.get_lines():
if line.get_tag() == 'group A'
line.set_color('red')
# or whatever other operation
Does such a thing exist?
You can use the gid for such purposes. The only side-effect is that those names will appear in a saved svg file as the gid tag.
Alternatively you can assign any attribute to a python object.
line, = plt.plot(...)
line.myid = "group A"
just make sure not to use any existing attribute in such case.

SHOW KEYS in Aerospike?

I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting (deleting) into (from) a meta-record.
Thank you #pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve original keys from the server, one must -- during put() calls -- set policy to save the key value server-side (otherwise, it seems only a digest/hash is stored?).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (modified Aerospike's own documentation), you perform a scan and set the policy to not return the bin data. From this, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')
scan_opts = { 'concurrent': True, 'nobins': True, 'priority': aerospike.SCAN_PRIORITY_MEDIUM }
for x in (scan.results(policy=scan_opts)): keys.append(x[0][2])
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' Record to store a list of all the other keys will be more performant, in my case -- in this way, I can simply make one get() call to the Aerospike server to retrieve the list.
You can choose not bring the data back by setting includeBinData in ScanPolicy to false.

Is there any way I can define a variable in LaTeX?

In LaTeX, how can I define a string variable whose content is used instead of the variable in the compiled PDF?
Let's say I'm writing a tech doc on a software and I want to define the package name in the preamble or somewhere so that if its name changes, I don't have to replace it in a lot of places but only in one place.
add the following to you preamble:
\newcommand{\newCommandName}{text to insert}
Then you can just use \newCommandName{} in the text
For more info on \newcommand, see e.g. wikibooks
Example:
\documentclass{article}
\newcommand\x{30}
\begin{document}
\x
\end{document}
Output:
30
Use \def command:
\def \variable {Something that's better to use as a variable}
Be aware that \def overrides preexisting macros without any warnings and therefore can cause various subtle errors. To overcome this either use namespaced variables like my_var or fall back to \newcommand, \renewcommand commands instead.
For variables describing distances, you would use \newlength (and manipulate the values with \setlength, \addlength, \settoheight, \settolength and \settodepth).
Similarly you have access to \newcounter for things like section and figure numbers which should increment throughout the document. I've used this one in the past to provide code samples that were numbered separatly of other figures...
Also of note is \makebox which allows you to store a bit of laid-out document for later re-use (and for use with \settolength...).
If you want to use \newcommand, you can also include \usepackage{xspace} and define command by \newcommand{\newCommandName}{text to insert\xspace}.
This can allow you to just use \newCommandName rather than \newCommandName{}.
For more detail, http://www.math.tamu.edu/~harold.boas/courses/math696/why-macros.html
I think you probably want to use a token list for this purpose:
to set up the token list
\newtoks\packagename
to assign the name:
\packagename={New Name for the package}
to put the name into your output:
\the\packagename.