Pydantic: how to pass custom object to validators - pydantic

I wanted to use pydantic to validate messages sent by players in a turn based game.
For example, I want players to select a card that they want to play, and the validation is first, whether the card ID is correct at all, and then I want to validate if player has this card on hand.
Here's the code:
class SelectCardActionParams(BaseModel):
selected_card: CardIdentifier # just my enum
#validator('selected_card')
def player_has_card_on_hand(cls, v, values, config, field):
# To tell whether the player has card on hand, I need access to my <GameInstance> object which tracks entire
# state of the game, has info on which player has which cards on hand. How do I pass this instance here?
pass
It seems like the "custom validation" feature lets me write code that can access only fields directly on the model instance, and nothing else, i.e. no access to any external state. It seems like a serious limitation of how validation can be used by programmers.
Am I missing something? I know the doc says that pydantic is mainly a parsing lib not a validation lib but it does have the "custom validation", and I thought there should be a way to pass custom arguments to the validator methods (I could not find any example though).

Your comment did clarify your intention a bit. I think I understand what you are after now.
A Pydantic models defines a schema for some data. A validator enforces constraints on that data. As I said in my comment, a validator is a class method. It has no access to any instance of the model it is defined on, unless that instance is somehow present in the class' (or even the global) namespace. Aside from that, a validator (depending on the configuration) may have access to already validated values meant to be assigned to other fields of the model.
If you need some instance of Game, you can thus either assign one as a class variable to your SelectCardActionParams model or just make it globally available.
Then, since you need both values to be present and probably don't want to rely on which order they are defined it, it is probably easiest to just use a #root_validator.
It could look like this (simplified):
from typing import Any, ClassVar
from pydantic import BaseModel, ValidationError, root_validator
class Game:
def player_has_card(self, player_id: int, card_id: int) -> bool:
return player_id == card_id # just for demo purposes
game_singleton = Game()
class SelectCardActionParams(BaseModel):
game: ClassVar[Game] = game_singleton
acting_player_id: int
selected_card_id: int
#root_validator
def player_has_card(cls, values: dict[str, Any]) -> dict[str, Any]:
player_id = values["acting_player_id"]
card_id = values["selected_card_id"]
if not cls.game.player_has_card(player_id, card_id):
raise ValueError(f"Player {player_id} has no card {card_id}")
return values
if __name__ == '__main__':
print(SelectCardActionParams(acting_player_id=1, selected_card_id=1))
try:
SelectCardActionParams(acting_player_id=1, selected_card_id=2)
except ValidationError as e:
print(e)
The output:
acting_player_id=1 selected_card_id=1
1 validation error for SelectCardActionParams
__root__
Player 1 has no card 2 (type=value_error)

Related

When is a class a data class?

I know what classes are about, but for better understanding I need a use case. Recently I discovered the construct of data classes. I get the idea behind normal classes, but I cannot imagine a real use case for data classes.
When should I use a data class and when I use a "normal" class? For all I know, all classes keep data.
Can you provide a good example that distinguishes data classes from non-data classes?
A data class is used to store data. It's lighter than a normal class, and can be compared to an array with key/value (dictionary, hash, etc.), but represented as an object with fixed attributes. In kotlin, according to the documentation, that adds those attributes to the class:
equals()/hashCode() pair
toString() of the form "User(name=John, age=42)"
componentN() functions corresponding to the properties in their order of declaration.
copy() function
Also it has a different behavior during class inheritence :
If there are explicit implementations of equals(), hashCode(), or toString() in the data class body or final implementations in a
superclass, then these functions are not generated, and the existing
implementations are used.
If a supertype has componentN() functions that are open and return compatible types, the corresponding functions are generated for the
data class and override those of the supertype. If the functions of
the supertype cannot be overridden due to incompatible signatures or
due to their being final, an error is reported.
Providing explicit implementations for the componentN() and copy() functions is not allowed.
So in kotlin, if you want to describe an object (a data) then you may use a dataclass, but if you're creating a complex application and your class needs to have special behavior in the constructor, with inheritence or abstraction, then you should use a normal class.
I do not know Kotlin, but in Python, a dataclass can be seen as a structured dict. When you want to use a dict to store an object which has always the same attributes, then you should not put it in a dict but use a Dataclass.
The advantage with a normal class is that you don't need to declare the __init__ method, as it is "automatic" (inherited).
Example :
This is a normal class
class Apple:
def __init__(size:int, color:str, sweet:bool=True):
self.size = size
self.color = color
self.sweet = sweet
Same class as a dataclass
from dataclasses import dataclass
#dataclass
class Apple:
size: int
color: str
sweet: bool = True
Then the advantage compared to a dict is that you are sure of what attribute it has. Also it can contains methods.
The advantage over to a normal class is that it is simpler to declare and make the code lighter. We can see that the attributes keywords (e.g size) are repeated 3 times in a normal class, but appear only once in a dataclass.
The advantage of normal class also is that you can personalize the __init__ method, (in a dataclass also, but then you lose it's main advantage I think) example:
# You need only 2 variable to initialize your class
class Apple:
def __init__(size:int, color:str):
self.size = size
self.color = color
# But you get much more info from those two.
self.sweet = True if color == 'red' else False
self.weight = self.__compute_weight()
self.price = self.weight * PRICE_PER_GRAM
def __compute_weight(self):
# ...
return (self.size**2)*10 # That's a random example
Abstractly, a data class is a pure, inert information record that doesn’t require any special handling when copied or passed around, and it represents nothing more than what is contained in its fields; it has no identity of its own. A typical example is a point in 3D space:
data class Point3D(
val x: Double,
val y: Double,
val z: Double
)
As long as the values are valid, an instance of a data class is entirely interchangeable with its fields, and it can be put apart or rematerialized at will. Often there is even little use for encapsulation: users of the data class can just access the instance’s fields directly. The Kotlin language provides a number of convenience features when data classes are declared as such in your code, which are described in the documentation. Those are useful when for example building more complex data structures employing data classes: you can for example have a hashmap assign values to particular points in space, and then be able to look up the value using a newly-constructed Point3D.
val map = HashMap<Point3D, String>()
map.set(Point3D(3, 4, 5), "point of interest")
println(map.get(Point3D(3, 4, 5))) // prints "point of interest"
For an example of a class that is not a data class, take FileReader. Underneath, this class probably keeps some kind of file handle in a private field, which you can assume to be an integer (as it actually is on at least some platforms). But you cannot expect to store this integer in a database, have another process read that same integer from the database, reconstruct a FileReader from it and expect it to work. Passing file handles between processes requires more ceremony than that, if it is even possible on a given platform. That property makes FileReader not a data class. Many examples of non-data classes will be of this kind: any class whose instances represent transient, local resources like a network connection, a position within a file or a running process, cannot be a data class. Likewise, any class where different instances should not be considered equal even if they contain the same information is not a data class either.
From the comments, it sounds like your question is really about why non-data classes exist in Kotlin and why you would ever choose not to make a data class. Here are some reasons.
Data classes are a lot more restrictive than a regular class:
They have to have a primary constructor, and every parameter of the primary constructor has to be a property.
They cannot have an empty primary constructor.
They cannot be open so they cannot be subclassed.
Here are other reasons:
Sometimes you don't want a class to have a copy function. If a class holds onto some heavy state that is expensive to copy, maybe it shouldn't advertise that it should be copied by presenting a copy function.
Sometimes you want to use an instance of a class in a Set or as Map keys without two different instances being considered as equivalent just because their properties have the same values.
The features of data classes are useful specifically for simple data holders, so the drawbacks are often something you want to avoid.

Spock Extension - Extracting variable names from Data Tables

In order to extract data table values to use in a reporting extension for Spock, I am using the
following code:
#Override
public void beforeIteration(IterationInfo iteration) {
Object[] values = iteration.getDataValues();
}
This returns to me the reference to the objects in the data table. However, I would like to get
the name of the variable that references the value.
For example, in the following test:
private static User userAge15 = instantiateUserByAge(15);
private static User userAge18 = instantiateUserByAge(18);
private static User userAge19 = instantiateUserByAge(19);
private static User userAge40 = instantiateUserByAge(40);
def "Should show popup if user is 18 or under"(User user, Boolean shouldShowPopup) {
given: "the user <user>"
when: "the user do whatever"
...something here...
then: "the popup is shown is <showPopup>"
showPopup == shouldShowPopup
where:
user | shouldShowPopup
userAge15 | true
userAge18 | true
userAge19 | false
userAge40 | false
}
Is there a way to receive the string “userAge15”, “userAge18”, “userAge19”, “userAge40” instead of their values?
The motivation for this is that the object User is complex with lots of information as name, surname, etc, and its toString() method would make the where clause unreadable in the report I generate.
You can use specificationContext.currentFeature.dataVariables. It returns a list of strings containing the data variable names. This should work both in Spock 1.3 and 2.0.
Edit: Oh sorry, you do not want the data variable names ["a", "b", "expected"] but ["test1", "test1", "test2"]. Sorry, I cannot help you with that and would not if I could because that is just a horrible way to program IMO. I would rather make sure the toString() output gets shortened or trimmed in an appropriate manner, if necessary, or to (additionally or instead) print the class name and/or object ID.
Last but not least, writing tests is a design tool uncovering potential problems in your application. You might want to ask yourself why toString() results are not suited to print in a report and refactor those methods. Maybe your toString() methods use line breaks and should be simplified to print a one-line representation of the object. Maybe you want to factor out the multi-line representation into other methods and/or have a set of related methods like toString(), toShortString(), toLongString() (all seen in APIs before) or maybe something specific like toMultiLineString().
Update after OP significantly changed the question:
If the user of your extension feels that the report is not clear enough, she could add a column userType to the data table, containing values like "15 years old".
Or maybe simpler, just add an age column with values like 15, 18, 19, 40 and instantiate users directly via instantiateUserByAge(age) in the user column or in the test's given section instead of creating lots of static variables. The age value would be reported by your extension. In combination with an unrolled feature method name using #age this should be clear enough.
Is creating users so super expensive you have to put them into static variables? You want to avoid statics if not really necessary because they tend to bleed over side effects to other tests if those objects are mutable and their internal state changes in one test, e.g. because someone conveniently uses userAge15 in order to test setAge(int). Try to avoid premature optimisations via static variables which often just save microseconds. Even if you do decide to pre-create a set of users and re-use them in all tests, you could put them into a map with the age being the key and conveniently retrieve them from your feature methods, again just using the age in the data table as an input value for querying the map, either directly or via a helper method.
Bottom line: I think you do not have to change your extension in order to cater to users writing bad tests. Those users ought to learn how to write better tests. As a side effect, the reports will also look more comprehensive. 😀

$model->validate() returns true without any attribute assignments?

I have a model with many attributes, I want to validate before saving it. So at some I would have only model initialization but no attribute assignments
$model=new patient(); $model->validate();
yet it returns true, I have many attributes set to required. I don't understand how does this happen?
Is there a way to validate a model before saving it even though no attributes are assigned to the model?
Your have validation rules but You may Miss the scenario's on validate .
Check the validators on your current object model by this
print_r($model->validatorList);
refer this link
You can validate a model at any time once it is created.
Let's look at the sequence of events :-
You have created a model class :
class User extends CActiveRecord
{
...
}
This creates a template for the patient class, which inherits from CActiveRecord, which means it has inherited functions available to it. So, while your class files does not the activate() function, it inherits it from the CActiveRecord class.
Now you create an instance of the class
$patient1 = new patient;
This means there is now some space in memory that exists for patient1. It has values. Some may be empty and some not (for example, they may have a default value).
So, at any time, you could
$model->validate();
This would change the model to include the error messages. To get the errors that have been raise, use
$allErrors = $model->getErrors()
print_r($all_errors); // This will show you the structure.
To print the errors in a neat way, use
echo CHtml::errorSummary($model);

Unit testing value objects in isolation from its dependencies

TL;DR
How do you test a value object in isolation from its dependencies without stubbing or injecting them?
In Misko Hevery's blog post To “new” or not to “new”… he advocates the following (quoted from the blog post):
An Injectable class can ask for other Injectables in its constructor.(Sometimes I refer to Injectables as Service Objects, but
that term is overloaded.). Injectable can never ask for a non-Injectable (Newable) in its constructor.
Newables can ask for other Newables in their constructor, but not for Injectables (Sometimes I refer to Newables as Value Object, but
again, the term is overloaded)
Now if I have a Quantity value object like this:
class Quantity{
$quantity=0;
public function __construct($quantity){
$intValidator = new Zend_Validate_Int();
if(!$intValidator->isValid($quantity)){
throw new Exception("Quantity must be an integer.");
}
$gtValidator = new Zend_Validate_GreaterThan(0);
if(!$gtvalidator->isValid($quantity)){
throw new Exception("Quantity must be greater than zero.");
}
$this->quantity=$quantity;
}
}
My Quantity value object depends on at least 2 validators for its proper construction. Normally I would have injected those validators through the constructor, so that I can stub them during testing.
However, according to Misko a newable shouldn't ask for injectables in its constructor. Frankly a Quantity object that looks like this
$quantity=new Quantity(1,$intValidator,$gtValidator); looks really awkward.
Using a dependency injection framework to create a value object is even more awkward. However now my dependencies are hard coded in the Quantity constructor and I have no way to alter them if the business logic changes.
How do you design the value object properly for testing and adherence to the separation between injectables and newables?
Notes:
This is just a very very simplified example. My real object my have serious logic in it that may use other dependencies as well.
I used a PHP example just for illustration. Answers in other languages are appreciated.
A Value Object should only contain primitive values (integers, strings, boolean flags, other Value Objects, etc.).
Often, it would be best to let the Value Object itself protect its invariants. In the Quantity example you supply, it could easily do that by checking the incoming value without relying on external dependencies. However, I realize that you write
This is just a very very simplified example. My real object my have serious logic in it that may use other dependencies as well.
So, while I'm going to outline a solution based on the Quantity example, keep in mind that it looks overly complex because the validation logic is so simple here.
Since you also write
I used a PHP example just for illustration. Answers in other languages are appreciated.
I'm going to answer in F#.
If you have external validation dependencies, but still want to retain Quantity as a Value Object, you'll need to decouple the validation logic from the Value Object.
One way to do that is to define an interface for validation:
type IQuantityValidator =
abstract Validate : decimal -> unit
In this case, I patterned the Validate method on the OP example, which throws exceptions upon validation failures. This means that if the Validate method doesn't throw an exception, all is good. This is the reason the method returns unit.
(If I hadn't decided to pattern this interface on the OP, I'd have preferred using the Specification pattern instead; if so, I'd instead have declared the Validate method as decimal -> bool.)
The IQuantityValidator interface enables you to introduce a Composite:
type CompositeQuantityValidator(validators : IQuantityValidator list) =
interface IQuantityValidator with
member this.Validate value =
validators
|> List.iter (fun validator -> validator.Validate value)
This Composite simply iterates through other IQuantityValidator instances and invokes their Validate method. This enables you to compose arbitrarily complex validator graphs.
One leaf validator could be:
type IntegerValidator() =
interface IQuantityValidator with
member this.Validate value =
if value % 1m <> 0m
then
raise(
ArgumentOutOfRangeException(
"value",
"Quantity must be an integer."))
Another one could be:
type GreaterThanValidator(boundary) =
interface IQuantityValidator with
member this.Validate value =
if value <= boundary
then
raise(
ArgumentOutOfRangeException(
"value",
"Quantity must be greater than zero."))
Notice that the GreaterThanValidator class takes a dependency via its constructor. In this case, boundary is just a decimal, so it's a Primitive Dependency, but it could just as well have been a polymorphic dependency (A.K.A a Service).
You can now compose your own validator from these building blocks:
let myValidator =
CompositeQuantityValidator([IntegerValidator(); GreaterThanValidator(0m)])
When you invoke myValidator with e.g. 9m or 42m, it returns without errors, but if you invoke it with e.g. 9.8m, 0m or -1m it throws the appropriate exception.
If you want to build something a bit more complicated than a decimal, you can introduce a Factory, and compose the Factory with the appropriate validator.
Since Quantity is very simple here, we can just define it as a type alias on decimal:
type Quantity = decimal
A Factory might look like this:
type QuantityFactory(validator : IQuantityValidator) =
member this.Create value : Quantity =
validator.Validate value
value
You can now compose a QuantityFactory instance with your validator of choice:
let factory = QuantityFactory(myValidator)
which will let you supply decimal values as input, and get (validated) Quantity values as output.
These calls succeed:
let x = factory.Create 9m
let y = factory.Create 42m
while these throw appropriate exceptions:
let a = factory.Create 9.8m
let b = factory.Create 0m
let c = factory.Create -1m
Now, all of this is very complex given the simple nature of the example domain, but as the problem domain grows more complex, complex is better than complicated.
Avoid value types with dependencies on non-value types. Also avoid constructors that perform validations and throw exceptions. In your example I'd have a factory type that validates and creates quantities.
Your scenario can also be applied to entities. There are cases where an entity requires some dependency in order to perform some behaviour. As far as I can tell the most popular mechanism to use is double-dispatch.
I'll use C# for my examples.
In your case you could have something like this:
public void Validate(IQuantityValidator validator)
As other answers have noted a value object is typically simple enough to perform its invariant checking in the constructor. An e-mail value object would be a good example as an e-mail has a very specific structure.
Something a bit more complex could be an OrderLine where we need to determine, totally hypothetical, whether it is, say, taxable:
public bool IsTaxable(ITaxableService service)
In the article you reference I would assert that the 'newable' relates quite a bit to the 'transient' type of life cycle that we find in DI containers as we are interested in specific instances. However, when we need to inject specific values the transient business does not really help. This is the case for entities where each is a new instance but has very different state. A repository would hydrate the object but it could just as well use a factory.
The 'true' dependencies typically have a 'singleton' life-cycle.
So for the 'newable' instances a factory could be used if you would like to perform validation upon construction by having the factory call the relevant validation method on your value object using the injected validator dependency as Mark Seemann has mentioned.
This gives you the freedom to still test in isolation without coupling to a specific implementation in your constructor.
Just a slightly different angle on what has already been answered. Hope it helps :)

Efficient way to define a class with multiple, optionally-empty slots in S4 of R?

I am building a package to handle data that arrives with up to 4 different types. Each of these types is a legitimate class in the form of a matrix, data.frame or tree. Depending on the way the data is processed and other experimental factors, some of these data components may be missing, but it is still extremely useful to be able to store this information as an instance of a special class and have methods that recognize the different component data.
Approach 1:
I have experimented with an incremental inheritance structure that looks like a nested tree, where each combination of data types has its own class explicitly defined. This seems difficult to extend for additional data types in the future, and is also challenging for new developers to learn all the class names, however well-organized those names might be.
Approach 2:
A second approach is to create a single "master-class" that includes a slot for all 4 data types. In order to allow the slots to be NULL for the instances of missing data, it appears necessary to first define a virtual class union between the NULL class and the new data type class, and then use the virtual class union as the expected class for the relevant slot in the master-class. Here is an example (assuming each data type class is already defined):
################################################################################
# Use setClassUnion to define the unholy NULL-data union as a virtual class.
################################################################################
setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))
setClassUnion("dataClass3OrNULL", c("dataClass3", "NULL"))
setClassUnion("dataClass4OrNULL", c("dataClass4", "NULL"))
################################################################################
# Now define the master class with all 4 slots, and
# also the possibility of empty (NULL) slots and an explicity prototype for
# slots to be set to NULL if they are not provided at instantiation.
################################################################################
setClass(Class="theMasterClass",
representation=representation(
slot1="dataClass1OrNULL",
slot2="dataClass2OrNULL",
slot3="dataClass3OrNULL",
slot4="dataClass4OrNULL"),
prototype=prototype(slot1=NULL, slot2=NULL, slot3=NULL, slot4=NULL)
)
################################################################################
So the question might be rephrased as:
Are there more efficient and/or flexible alternatives to either of these approaches?
This example is modified from an answer to a SO question about setting the default value of slot to NULL. This question differs in that I am interested in knowing the best options in R for creating classes with slots that can be empty if needed, despite requiring a specific complex class in all other non-empty cases.
In my opinion...
Approach 2
It sort of defeats the purpose to adopt a formal class system, and then to create a class that contains ill-defined slots ('A' or NULL). At a minimum I would try to make DataClass1 have a 'NULL'-like default. As a simple example, the default here is a zero-length numeric vector.
setClass("DataClass1", representation=representation(x="numeric"))
DataClass1 <- function(x=numeric(), ...) {
new("DataClass1", x=x, ...)
}
Then
setClass("MasterClass1", representation=representation(dataClass1="DataClass1"))
MasterClass1 <- function(dataClass1=DataClass1(), ...) {
new("MasterClass1", dataClass1=dataClass1, ...)
}
One benefit of this is that methods don't have to test whether the instance in the slot is NULL or 'DataClass1'
setMethod(length, "DataClass1", function(x) length(x#x))
setMethod(length, "MasterClass1", function(x) length(x#dataClass1))
> length(MasterClass1())
[1] 0
> length(MasterClass1(DataClass1(1:5)))
[1] 5
In response to your comment about warning users when they access 'empty' slots, and remembering that users usually want functions to do something rather than tell them they're doing something wrong, I'd probably return the empty object DataClass1() which accurately reflects the state of the object. Maybe a show method would provide an overview that reinforced the status of the slot -- DataClass1: none. This seems particularly appropriate if MasterClass1 represents a way of coordinating several different analyses, of which the user may do only some.
A limitation of this approach (or your Approach 2) is that you don't get method dispatch -- you can't write methods that are appropriate only for an instance with DataClass1 instances that have non-zero length, and are forced to do some sort of manual dispatch (e.g., with if or switch). This might seem like a limitation for the developer, but it also applies to the user -- the user doesn't get a sense of which operations are uniquely appropriate to instances of MasterClass1 that have non-zero length DataClass1 instances.
Approach 1
When you say that the names of the classes in the hierarchy are going to be confusing to your user, it seems like this is maybe pointing to a more fundamental issue -- you're trying too hard to make a comprehensive representation of data types; a user will never be able to keep track of ClassWithMatrixDataFrameAndTree because it doesn't represent the way they view the data. This is maybe an opportunity to scale back your ambitions to really tackle only the most prominent parts of the area you're investigating. Or perhaps an opportunity to re-think how the user might think of and interact with the data they've collected, and to use the separation of interface (what the user sees) from implementation (how you've chosen to represent the data in classes) provided by class systems to more effectively encapsulate what the user is likely to do.
Putting the naming and number of classes aside, when you say "difficult to extend for additional data types in the future" it makes me wonder if perhaps some of the nuances of S4 classes are tripping you up? The short solution is to avoid writing your own initialize methods, and rely on the constructors to do the tricky work, along the lines of
setClass("A", representation(x="numeric"))
setClass("B", representation(y="numeric"), contains="A")
A <- function(x = numeric(), ...) new("A", x=x, ...)
B <- function(a = A(), y = numeric(), ...) new("B", a, y=y, ...)
and then
> B(A(1:5), 10)
An object of class "B"
Slot "y":
[1] 10
Slot "x":
[1] 1 2 3 4 5