Dynamically Generating Pydantic Model from a Schema JSON File

I want to dynamically generate a Pydantic model at runtime. I can do this by calling create_model. For example,
from pydantic import create_model
create_model("MyModel", i=(int, ...), s=(str, ...))
does the same thing as
from pydantic import BaseModel
class MyModel(BaseModel):
    i: int
    s: str
I want to serialize these Pydantic schemas as JSON. It's easy to write code to parse JSON into create_model arguments, and it would make sense to use the output of BaseModel.schema_json() since that already defines a serialization format. That makes me think that there should already be some sort of BaseModel.from_json_schema classmethod that could dynamically create a model like so
from pydantic import BaseModel
class MyModel(BaseModel):
    i: int
    s: str
my_model = BaseModel.from_json_schema(MyModel.schema_json())
my_model(i=5, s="s") # returns MyModel(i=5, s="s")
I can't find any such function in the documentation. Am I overlooking something, or do I have to write my own JSON schema deserialization code?

This has been discussed some time ago and Samuel Colvin said he didn't want to pursue this as a feature for Pydantic.
If you are fine with code generation instead of actual runtime creation of models, you can use the datamodel-code-generator.
To be honest, I struggle to see the use case for generating complex models at runtime, seeing as their main purpose is validation, which implies that you think about the correct schema before running your program. But that is just my view.
For simple models I guess you can throw together your own logic for this fairly quickly.
If you do need something more sophisticated, the aforementioned library does offer some extensibility. You should be able to import and inherit from some of their classes like the JsonSchemaParser. Maybe that will get you somewhere.
Ultimately I think this becomes non-trivial very quickly, which is why Pydantic's maintainer didn't want to deal with it and why there is a whole separate project for this.
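For the simple-model case mentioned above, a rough sketch of the "throw together your own logic" route might look like this. It only handles a flat schema with primitive types; the type mapping, the function name schema_to_field_definitions and the fallback title are assumptions made for illustration, not part of Pydantic. The result is shaped like the keyword arguments that create_model takes in the question.

```python
import json

# Hypothetical mapping from JSON Schema primitive type names to Python types.
_JSON_TO_PY = {
    "integer": int,
    "number": float,
    "string": str,
    "boolean": bool,
}


def schema_to_field_definitions(schema_json):
    """Turn a flat JSON Schema string into (title, create_model-style fields)."""
    schema = json.loads(schema_json)
    required = set(schema.get("required", []))
    fields = {}
    for name, prop in schema.get("properties", {}).items():
        py_type = _JSON_TO_PY[prop["type"]]  # KeyError for unsupported types
        # Ellipsis marks a required field; optional fields default to None here.
        default = ... if name in required else None
        fields[name] = (py_type, default)
    return schema.get("title", "DynamicModel"), fields


example = json.dumps({
    "title": "MyModel",
    "type": "object",
    "properties": {"i": {"type": "integer"}, "s": {"type": "string"}},
    "required": ["i", "s"],
})
title, fields = schema_to_field_definitions(example)
# With Pydantic installed, the result could then be passed on as:
#   from pydantic import create_model
#   MyModel = create_model(title, **fields)
```

Nested objects, arrays, references and constraints are exactly where this stops being trivial, which is the point made above.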

Related

Is protobuf-net suited for serializing arbitrary object/domain models?

I have been exploring the CQRS/DDD-principles and patterns for a while now and have started implementing a sample project where I have split my storage-model into a WriteModel and a ReadModel. The WriteModel will use a simple NoSQL-like database where aggregates are stored in a key-value style, with value being just a serialized version of the aggregate.
I am now looking at ProtoBuf-Net for serializing and deserializing my domain model aggregates in and out of storage. Other than this post I haven't found any guidance or tips for using ProtoBuf-Net in this area. The point is that the (ideal) requirement for serialization and deserialization of aggregates is that the domain model should have as little knowledge as possible about this infrastructural concern, which implies the following:
No attributes on the classes
No constructors, getters, setters or any other piece of code just for the sake of serialization.
Ability to use any (custom) type possible and have it serialized/deserialized.
Thus far I have implemented just the serialization of the first versions of my aggregates which works perfectly fine. I use the RuntimeTypeModel.Default-instance to configure the MetaModel at runtime and have UseConstructor = false everywhere, which enables me to completely separate the serialization mechanics from my domain-assembly. I have even implemented a custom post-deserialization mechanism that enables me to just-in-time initialize fields after ProtoBuf-Net has deserialized it into a valid instance. So suppose I have class AggregateA like so:
[Version(1)]
public sealed class AggregateA
{
    private readonly int _x;
    private readonly string _y;
    ...
}
Then in my serialization-library I have code something along the following lines:
var metaType = RuntimeTypeModel.Default.Add(typeof(AggregateA), false);
metaType.UseConstructor = false;
metaType.AddField(1, "_x");
metaType.AddField(2, "_y");
...
However, I realize that up to this point I have only implemented the basic scenario, and I am now starting to think about how to approach versioning of my model. I am particularly interested in larger refactoring scenarios, where type A has been split into types A1 and A2, for example:
[Version(2)]
public sealed class AggregateA1
{
    private readonly int _x;
    ...
}
[Version(2)]
public sealed class AggregateA2
{
    private readonly string _y;
    ...
}
Suppose I have a bunch of serialized instances of AggregateA, but my domain model now knows only AggregateA1 and AggregateA2. How would you handle this scenario with ProtoBuf-Net?
A second question deals with point 3: is ProtoBuf-Net capable of handling arbitrary types if you're willing to put in some extra configuration-effort? I've read about exceptions raised when using the DateTimeOffset-type, which makes me think not all types can be serialized by the framework out-of-the-box, but can I serialize these types by registering them in the RuntimeTypeModel? Should I even want to go there? Or better to forget about serializing common .NET types other than the simple ones?
protobuf-net is intended to work with predictable known models. It is true that everything can be configured at runtime, but I have not put any thought as to how to handle your A1/A2 scenario, precisely because that is not a supported scenario (in my defense, I can't see that working nicely with most serializers). Thinking off the top of my head, if you have the configuration/mapping data somewhere, then you could simply deserialize twice; i.e. as long as we still tell it that AggregateA1._x maps to 1 and AggregateA2._y maps to 2, you can do:
object a1 = model.Deserialize(source, null, typeof(AggregateA1));
source.Position = 0; // rewind
object a2 = model.Deserialize(source, null, typeof(AggregateA2));
However, more complex tweaks would require additional thought.
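To make the "deserialize twice" idea concrete, here is a language-neutral sketch in Python. It is not the protobuf-net API: the payload is a plain dictionary of field number to value, and FIELD_MAP and deserialize are hypothetical stand-ins for the runtime configuration. It assumes, as the answer above implies, that field numbers a type does not map are simply skipped.

```python
from dataclasses import dataclass

# Hypothetical stand-in for one serialized AggregateA: field number -> value.
payload = {1: 42, 2: "hello"}


@dataclass
class AggregateA1:
    _x: int = 0


@dataclass
class AggregateA2:
    _y: str = ""


# Hypothetical mapping data: which field number feeds which member of each type.
FIELD_MAP = {
    AggregateA1: {1: "_x"},
    AggregateA2: {2: "_y"},
}


def deserialize(data, target_type):
    """Build target_type from data, skipping field numbers it does not map."""
    mapping = FIELD_MAP[target_type]
    instance = target_type()
    for number, value in data.items():
        if number in mapping:
            setattr(instance, mapping[number], value)
    return instance


# The same payload is read twice, once per target type.
a1 = deserialize(payload, AggregateA1)
a2 = deserialize(payload, AggregateA2)
```

The old field numbers have to stay stable for this to work, which is why the mapping keeps _x at 1 and _y at 2.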
Re "arbitrary types"... define "arbitrary" ;p In particular, there is support for "surrogate" types which can be useful for some transformations - but without a very specific "problem statement" it is hard to answer completely.
Summary:
protobuf-net has an intended usage, which includes both serialization-aware (attributed, etc) and non-aware scenarios (runtime configuration, etc) - but it also works for a range of more bespoke scenarios (letting you drop to the raw reader/writer API if you want to). It does not and cannot guarantee to be a direct fit for every serialization scenario imaginable, and how well it behaves will depend on how far from that scenario you are.

OOP: class inheritance to add just one property vs constructor argument

I'm new to OOP and I'm in the following situation: I have something like a report "Engine" that is used for several reports, the only thing needed is the path of a config file.
I'll code in Python, but this is an agnostic question. So, I have the following two approaches:
A) class ReportEngine is an abstract class that has everything needed BUT the path for the config file. This way you just have to instantiate the ReportX class
class ReportEngine(object):
    ...
class Report1(ReportEngine):
    _config_path = '...'
class Report2(ReportEngine):
    _config_path = '...'
report_1 = Report1()
B) class ReportEngine can be instantiated passing the config file path
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    ...
report_1 = ReportEngine(config_path="/files/...")
Which approach is the right one? In case it matters, the report object would be inserted in another class, using composition.
IMHO the A) approach is better if you need to implement report engines that are different from each other. If your reports are populated using different logic, follow this approach.
But if the only difference among your report engines is the _config_path, I think approach B) is the right one for you. Obviously, this way you'll have shared logic to build every report, regardless of the report type.
Generally speaking, put everything that every Report has in the superclass. Put specific things in the subclasses.
So in your case, put the _config_path in the superclass ReportEngine like in B) (since every Report has a _config_path), but instantiate specific Reports like in A), where every Report can set its own path.
I don't know Python, but did a quick search for the proper syntax for Python 3.0+, I hope it makes sense:
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    def printPath(self):
        # print is a function in Python 3
        print(self._config_path)
    ...
class Report1(ReportEngine):
    def __init__(self):
        super().__init__('/files/report1/...')
Then
reportObj = Report1()
reportObj.printPath()
should print
/files/report1/...
Basically, the main difference is that approach A is more flexible than B (a change in one report does not influence the other reports), while B is simpler and clearer (it shows exactly where the difference is), but a change affecting only one report type would require more work. If you are fairly sure the reports won't change over time, go with B; if you feel the reports will have less in common in the future, go with A.
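A small sketch of how the two approaches can coexist, since the question mentions the report being used through composition. The describe method, the SummaryReport subclass and the Dashboard owner class are all hypothetical names invented for illustration; the idea is only that the path is passed in (approach B) and a subclass appears where behaviour actually differs (approach A).

```python
class ReportEngine:
    """Shared report logic; the config path is supplied per instance (approach B)."""

    def __init__(self, config_path):
        self._config_path = config_path

    def describe(self):
        # Hypothetical method standing in for the shared report-building logic.
        return f"report built from {self._config_path}"


class SummaryReport(ReportEngine):
    """Subclass only where behaviour differs, not just the path (approach A)."""

    def describe(self):
        return "summary: " + super().describe()


class Dashboard:
    """Hypothetical owner class that holds a report by composition."""

    def __init__(self, report):
        self._report = report

    def render(self):
        return self._report.describe()


plain = Dashboard(ReportEngine("a.cfg"))
special = Dashboard(SummaryReport("b.cfg"))
```

The owner class does not care which variant it receives, so switching from B to A later only touches the place where the report is constructed.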

What should I name a class whose sole purpose is procedural?

I have a lot to learn in the way of OO patterns and this is a problem I've come across over the years. I end up in situations where my classes' sole purpose is procedural, just basically wrapping a procedure up in a class. It doesn't seem like the right OO way to do things, and I wonder if someone is experienced with this problem enough to help me consider it in a different way. My specific example in the current application follows.
In my application I'm taking a set of points from engineering survey equipment and normalizing them to be used elsewhere in the program. By "normalize" I mean a set of transformations of the full data set until a destination orientation is reached.
Each transformation procedure will take the input of an array of points (i.e. of the form class point { float x; float y; float z; }) and return an array of the same length but with different values. For example, a transformation like point[] RotateXY(point[] inList, float angle). The other kind of procedure would be of the analysis type, used to supplement the normalization process and decide what transformation to do next. This type of procedure takes in the same points as a parameter but returns a different kind of dataset.
My question is, what is a good pattern to use in this situation? The one I was about to code in was a Normalization class which inherits class types of RotationXY for instance. But RotationXY's sole purpose is to rotate the points, so it would basically be implementing a single function. This doesn't seem very nice, though, for the reasons I mentioned in the first paragraph.
Thanks in advance!
The most common/natural approach for finding candidate classes in your problem domain is to look for nouns and then scan for the verbs/actions associated with those nouns to find the behavior that each class should implement. While this is generally good advice, it doesn't mean that your objects must only represent concrete elements. When processes (which are generally modeled as methods) start to grow and become complex, it is good practice to model them as objects. So, if your transformation carries weight of its own, it is OK to model it as an object and do something like:
class RotateXY
{
    public function apply(point p)
    {
        //Apply the transformation
    }
}
t = new RotateXY();
newPoint = t->apply(oldPoint);
In case you have many transformations, you can create a polymorphic hierarchy and even chain one transformation after another. If you want to dig a bit deeper, you can also take a look at the Command design pattern, which closely relates to this.
Some final comments:
If it fits your case, it is a good idea to model the transformation at the point level and then apply it to a collection of points. That way you can properly isolate the transformation concept, and it is also easier to write test cases. You can later even create a Composite of transformations if you need.
I generally don't like the Utils (or similar) classes with a bunch of static methods, since in most of the cases it means that your model is missing the abstraction that should carry that behavior.
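As a rough illustration of the point-level transformation and the Composite mentioned in these comments, here is a minimal Python sketch. It assumes RotateXY means a rotation in the XY plane (about the Z axis) by an angle in radians; Translate, Composite and apply_to_all are hypothetical names added for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    z: float


class RotateXY:
    """Rotate a single point in the XY plane by a fixed angle in radians."""

    def __init__(self, angle):
        self._cos = math.cos(angle)
        self._sin = math.sin(angle)

    def apply(self, p):
        return Point(p.x * self._cos - p.y * self._sin,
                     p.x * self._sin + p.y * self._cos,
                     p.z)


class Translate:
    """Shift a single point by a fixed offset."""

    def __init__(self, dx, dy, dz):
        self._offset = (dx, dy, dz)

    def apply(self, p):
        dx, dy, dz = self._offset
        return Point(p.x + dx, p.y + dy, p.z + dz)


class Composite:
    """Apply several transformations in order, exposed as one transformation."""

    def __init__(self, *steps):
        self._steps = steps

    def apply(self, p):
        for step in self._steps:
            p = step.apply(p)
        return p


def apply_to_all(transformation, points):
    """Lift a point-level transformation to a whole collection."""
    return [transformation.apply(p) for p in points]


normalize = Composite(RotateXY(math.pi / 2), Translate(1.0, 0.0, 0.0))
result = apply_to_all(normalize, [Point(1.0, 0.0, 0.0)])
```

Because each class works on one point, a test case needs only a single input and a single expected output.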
HTH
Typically, when it comes to classes that contain only static methods, I name them Util, e.g. DbUtil for facading DB access, FileUtil for file I/O etc. So find some term that all your methods have in common and name it that Util. Maybe in your case GeometryUtil or something along those lines.
Since the particulars of the transformations you apply seem ad-hoc for the problem and possibly prone to change in the future you could code them in a configuration file.
The point's client would read from the file and know what to do. As for the rotation or any other transformation method, they could go well as part of the Point class.
I see nothing particularly wrong with classes/interfaces having just essentially one member.
In your case the member is an "operation with some arguments of one type that returns the same type", which is common for some math/functional problems. You may find it convenient to have an interface/base class and helper methods that combine multiple transformation classes into a more complex transformation.
Alternative approach: if your language supports it, just go functional style altogether (similar to LINQ in C#).
On the functional style suggestion: I'd start with the following basic functions (you can probably just find them in the standard libraries for the language):
collection = map(collection, perItemFunction) to transform all items in a collection (Select in C#)
item = reduce(collection, aggregateFunction) to reduce all items into a single entity (Aggregate in C#)
combine two functions on an item: funcOnItem = combine(funcFirst, funcSecond). Can be expressed as a lambda in C#: Func<T,T> combined = x => second(first(x)).
"bind"/curry, i.e. fix one of the arguments of a function: functionOfOneArg = curry(funcOfArgs, fixedFirstArg). Can be expressed in C# as a lambda: Func<T,T> curried = x => funcOfTwoArg(fixedFirstArg, x).
This list will let you do something like "rotate all points in a collection around the X axis by 10 and shift Y by 15": map(points, combine(curry(rotateX, 10), curry(shiftY, 15))).
The syntax will depend on the language. E.g. in JavaScript you just pass functions (and map/reduce are part of the language already); in C#, lambdas and the Func classes (such as the one-argument function Func<T,R>) are an option. In some languages you have to explicitly use a class/interface to represent a "function" object.
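For comparison, the same building blocks can be sketched in Python, where map, functools.reduce and functools.partial already exist. The combine helper, the rotate_x and shift_y functions and the choice of degrees as the angle unit are assumptions made for this example.

```python
import math
from functools import partial, reduce


def combine(first, second):
    """Return a function that applies first, then second."""
    return lambda item: second(first(item))


def rotate_x(angle_degrees, point):
    """Rotate an (x, y, z) tuple about the X axis."""
    x, y, z = point
    a = math.radians(angle_degrees)
    return (x,
            y * math.cos(a) - z * math.sin(a),
            y * math.sin(a) + z * math.cos(a))


def shift_y(offset, point):
    """Move an (x, y, z) tuple along the Y axis."""
    x, y, z = point
    return (x, y + offset, z)


points = [(0.0, 1.0, 0.0), (2.0, 0.0, 1.0)]

# partial fixes the first argument, playing the role of "curry" above.
transform = combine(partial(rotate_x, 10), partial(shift_y, 15))
moved = list(map(transform, points))

# reduce folds the collection into one value, here the largest Y coordinate.
highest_y = reduce(lambda best, p: max(best, p[1]), moved, float("-inf"))
```

An analysis-type procedure, which returns a different kind of dataset, fits the reduce shape, while the transformations fit map.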
Alternative approach: if you are actually dealing with points and transformations, another traditional approach is to use a matrix to represent all linear operations (if your language supports custom operators you get very natural-looking code).
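A minimal sketch of the matrix idea, written in plain Python so it needs no library. Matrix3, rotation_z and scaling are hypothetical names; the "@" operator is overloaded to show what custom operators buy you. Only linear operations are covered, so a translation would need homogeneous coordinates, which this sketch leaves out.

```python
import math


class Matrix3:
    """Minimal 3x3 matrix; '@' composes matrices or transforms an (x, y, z) tuple."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        if isinstance(other, Matrix3):
            # Matrix times matrix: compose two linear operations into one.
            return Matrix3([[sum(self.rows[i][k] * other.rows[k][j] for k in range(3))
                             for j in range(3)] for i in range(3)])
        # Matrix times point: apply the operation to an (x, y, z) tuple.
        return tuple(sum(self.rows[i][k] * other[k] for k in range(3))
                     for i in range(3))


def rotation_z(angle):
    """Rotation about the Z axis by an angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return Matrix3([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def scaling(factor):
    """Uniform scaling by the given factor."""
    return Matrix3([[factor, 0.0, 0.0], [0.0, factor, 0.0], [0.0, 0.0, factor]])


# The rightmost matrix is applied first: rotate, then scale.
normalize = scaling(2.0) @ rotation_z(math.pi / 2)
points = [(1.0, 0.0, 0.0)]
result = [normalize @ p for p in points]
```

However many steps the normalization has, they collapse into a single matrix that is applied once per point.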

Dependency Injection/config object

I have an object that is responsible for exporting a file to csv.
It works well but I am looking at ways to refactor it.
This question pertains to the constructor which takes a number of arguments, relating to how the csv is to be exported:
For example, file name, delimiter, etc.
Also, lately I have been reading about dependency injection but can't decide if this is a case where I should:
A. Leave the constructor as is.
B. Create a new class that gets passed to the constructor that simply holds config values for the file name etc.
C. Something else altogether?
Here is the existing constructor (in PHP)
public function __construct($file, $overwriteExistingFile, $enclosure, $delim, $headerRow)
{
    //set all properties here
}
Each of those values represents data that is an input to some process. $enclosure, $delimiter, and $headerRow pertain to generating the CSV content, while $file and $overwriteExistingFile pertain to persisting the content to disk.
A hallmark of a DI-style refactoring is to identify the various responsibilities (generate, persist) and encapsulate each of them in its own type. This shifts the refactoring from "how do I best get the values to this class?" to "how do I remove knowledge of these values from this class?"
To answer that, we would define two new concepts, each of which takes one of the responsibilities, and pass those into the existing constructor:
public function __construct($csvGenerator, $csvFileWriter)
{
    ...save dependencies...
}
...at some point, generate the CSV content and pass it to the file writer...
In this way, the original class becomes the orchestrator of the interaction between the generation and file writing, without having intimate knowledge of either activity. We have elevated the class to a higher level of abstraction, simplifying it as well as isolating its responsibilities into its collaborators.
Now, you would define two new classes, constructing them with the relevant parameters:
Generator
public function __construct($enclosure, $delimiter, $headerRow)
File Writer
public function __construct($file, $overwriteExistingFile)
With these elements in place, you can compose them together by creating the generator, then the file writer, then passing both to the orchestrator.
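The composition described here could be sketched as follows. This is in Python rather than PHP, and the class names CsvGenerator, CsvFileWriter and CsvExporter, along with their method names and default values, are assumptions chosen for illustration; only the split of constructor parameters follows the answer.

```python
import csv
import io
import os


class CsvGenerator:
    """Knows how to turn rows into CSV text; knows nothing about files."""

    def __init__(self, enclosure='"', delimiter=",", header_row=None):
        self._enclosure = enclosure
        self._delimiter = delimiter
        self._header_row = header_row

    def generate(self, rows):
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=self._delimiter,
                            quotechar=self._enclosure, lineterminator="\n")
        if self._header_row:
            writer.writerow(self._header_row)
        writer.writerows(rows)
        return buffer.getvalue()


class CsvFileWriter:
    """Knows how to persist text to disk; knows nothing about CSV."""

    def __init__(self, file, overwrite_existing_file=False):
        self._file = file
        self._overwrite = overwrite_existing_file

    def write(self, content):
        if os.path.exists(self._file) and not self._overwrite:
            raise FileExistsError(self._file)
        with open(self._file, "w", newline="") as handle:
            handle.write(content)


class CsvExporter:
    """Orchestrator: receives both collaborators and wires them together."""

    def __init__(self, csv_generator, csv_file_writer):
        self._generator = csv_generator
        self._writer = csv_file_writer

    def export(self, rows):
        self._writer.write(self._generator.generate(rows))
```

A caller would build the generator, then the file writer, and pass both to CsvExporter. The generator can then be unit tested without touching the disk.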
I would create a CSVFormatter on which you can set the delimiter, so that you can unit test the formatting independently.
Inject the formatter into a CSVWriter which writes the formatted output to a file.
The reason you would do this is to unit test the formatting logic or if you need to do multiple kinds of formatting or write to different kinds of output streams. If the code is very small and simple then you don't need to break it into multiple classes.

How do I set up Protobuf for my VB.net application?

So, this may seem very elementary for you guys but I am officially stumped. I am trying to save some data in my application to a file using protobuf (suggested to me by some peers) but I can't seem to find any documentation for it and what I can find always gives me some weird error. I have an array declared as follows:
Private Terrain(,,) As TiledTerrain
The TiledTerrain class looks like this:
Public Class TiledTerrain
    Public X As Integer
    Public Y As Integer
    Public Texture_X As Integer
    Public Texture_Y As Integer
End Class
Pretty doggone simple, right? Well, I can't seem to figure out how to save my Terrain array to a file using Protobuf.
The Terrain array is just a simple 3 dimensional array (about 100x100x2). Each cell of the array may or may not actually contain a value (TiledTerrain) and if it doesn't it will contain "Nothing".
Can anybody explain to me in full how I should go about doing this? I've currently referenced protobuf-net.dll and protobuf-net.Extensions.dll because I don't really know which to use...
Thanks for any help!
-A Moron Among Geniuses :)
First read Getting Started, which describes the simplest scenario, using attributes. VB has slightly different syntax for attributes, which you are probably more familiar with than me - but the concept is the same.
There are alternatives, note:
in v2 the model can be configured entirely at runtime if you want, without the need for any attributes
if the type looks like an obvious "tuple" (including, importantly, a constructor that takes a parameter that matches every public member), it will use the constructor order to infer a contract
There is a problem though; protobuf-net does not currently support multi-dimensional arrays. It can of course be added, but as with all features: it doesn't exist until it gets written. The reason this isn't supported directly is that the underlying protobuf specification (by Google) does not support this. It would work if flattened into a vector (1-dimensional zero-based array). If you want help with an example, let me know.
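The flattening step itself is independent of the serializer. As a rough sketch (in Python rather than VB, with hypothetical helper names flatten and unflatten), a 3-dimensional grid can be stored as its dimensions plus a 1-dimensional zero-based list, and rebuilt with index arithmetic. How empty cells are represented in the serialized vector is left open here.

```python
def flatten(grid):
    """Flatten a 3-D nested list into its dimensions plus a 1-D list."""
    size_x = len(grid)
    size_y = len(grid[0])
    size_z = len(grid[0][0])
    flat = [grid[x][y][z]
            for x in range(size_x)
            for y in range(size_y)
            for z in range(size_z)]
    return (size_x, size_y, size_z), flat


def unflatten(dims, flat):
    """Rebuild the 3-D nested list from the dimensions and the 1-D list."""
    size_x, size_y, size_z = dims
    return [[[flat[(x * size_y + y) * size_z + z]  # same ordering as flatten
              for z in range(size_z)]
             for y in range(size_y)]
            for x in range(size_x)]


# A small 2x3x2 grid; None stands in for an empty cell.
terrain = [[[(x, y, z) if (x + y + z) % 2 else None for z in range(2)]
            for y in range(3)]
           for x in range(2)]
dims, vector = flatten(terrain)
restored = unflatten(dims, vector)
```

The dimensions have to be saved alongside the vector, since the flat list alone does not say where one row ends and the next begins.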