Django REST framework: Is there a way to clean data before validating it with a serializer?

I've got an API endpoint POST /data.
The received data is formatted in a certain way which is different from the way I store it in the db.
I'll use geometry type from postgis as an example.
class MyPostgisModel(models.Model):
    ...
    position = models.PointField(null=True)
    my_charfield = models.CharField(max_length=10)
    ...
    errors = JSONField()  # Used to save the cleaning and validation errors
class MyPostgisSerializer(serializers.ModelSerializer):
    class Meta:
        model = MyPostgisModel
        fields = [
            ...
            "position",
            ...
            "my_charfield",
            "errors",
        ]

    def to_internal_value(self, data):
        ...
        # Here the data comes in under the "geometry" key, but in the db the field
        # is called "position". Moreover, I need to apply the
        # `GEOSGeometry(json.dumps(...))` method as well.
        data["position"] = GEOSGeometry(json.dumps(data["geometry"]))
        return data
The problem is that there isn't just one field like position; there are many. And I would like (maybe wrongly) to follow the validate_*field_name* scheme, but for cleaning (clean_*field_name*).
There is another problem. In this scheme, I would still like to save the rest of the data in the database even if some fields raised a ValidationError (e.g. a CharField that is too long), as long as they are not part of the primary key or a unique_together constraint, and to save the related errors into a JSONField like this:
{
    "cleaning_errors": {
        ...
        "position": 'Invalid format: {
            "type": "NotAValidType",  # Should be "Point"
            "coordinates": [
                4.22,
                50.67
            ]
        }'
        ...
    },
    "validating_errors": {
        ...
        "my_charfield": "data was too long: 'this data is way too long for 10 characters'",
        ...
    }
}
For the first problem, I thought of doing something like this:
class BaseSerializerCleanerMixin:
    """Abstract mixin that cleans fields."""

    def __init__(self, *args, **kwargs):
        """Initialize the cleaner strategy."""
        # This is the error dict to be filled by the `clean_*field_name*` methods
        self.cleaning_error_dict = {}
        super().__init__(*args, **kwargs)

    def clean_fields(self, data):
        """Clean the fields listed in self.Meta.fields before validating them."""
        cleaned_data = {}
        for field_name in getattr(self.Meta, "fields", []):
            cleaned_field = (
                getattr(self, "clean_" + field_name)(data)
                if hasattr(self, "clean_" + field_name)
                else data.get(field_name)
            )
            if cleaned_field is not None:
                cleaned_data[field_name] = cleaned_field
        return cleaned_data

    def to_internal_value(self, data):
        """Reformat data to put it in the database."""
        cleaned_data = self.clean_fields(data)
        return super().to_internal_value(cleaned_data)
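To make the intended convention concrete, a per-field cleaner on the serializer could then look something like this (a hypothetical sketch, not part of the code above; the exact exceptions to catch depend on your GEOS/GDAL setup):

import json

from django.contrib.gis.geos import GEOSGeometry, GEOSException

class MyPostgisSerializer(BaseSerializerCleanerMixin, serializers.ModelSerializer):
    ...

    def clean_position(self, data):
        """Hypothetical clean_*field_name* hook: map the incoming "geometry" key to "position"."""
        try:
            return GEOSGeometry(json.dumps(data["geometry"]))
        except (KeyError, TypeError, ValueError, GEOSException) as exc:
            # Record the problem so it can later be stored in the errors JSONField
            self.cleaning_error_dict["position"] = str(exc)
            return None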
I'm not sure that's a good idea, and maybe there is an easier way to deal with such things.
For the second problem (catching the validation errors while still having is_valid() return True as long as no primary key or unique field is badly formatted), I'm not sure how to proceed.

Related

Pydantic: how to make a model with some mandatory fields and an arbitrary number of other optional fields, whose names are unknown and can be anything?

I'd like to represent the following JSON with a Pydantic model:
{
    "sip": {
        "param1": 1
    },
    "param2": 2
    ...
}
This means the JSON may contain a sip field and some other fields, any number with any names, so I'd like to have a model with a sip: Optional[dict] field and some kind of "rest", which will be correctly parsed from/serialized to JSON. Is it possible?
Maybe you are looking for the extra model config:
extra
whether to ignore, allow, or forbid extra attributes during model initialization. Accepts the string values of 'ignore', 'allow', or 'forbid', or values of the Extra enum (default: Extra.ignore). 'forbid' will cause validation to fail if extra attributes are included, 'ignore' will silently ignore any extra attributes, and 'allow' will assign the attributes to the model.
Example:
from typing import Any, Dict, Optional

import pydantic


class Foo(pydantic.BaseModel):
    sip: Optional[Dict[Any, Any]]

    class Config:
        extra = pydantic.Extra.allow


foo = Foo.parse_raw(
    """
    {
        "sip": {
            "param1": 1
        },
        "param2": 2
    }
    """
)
print(repr(foo))
print(foo.json())
Output:
Foo(sip={'param1': 1}, param2=2)
{"sip": {"param1": 1}, "param2": 2}

Pymongo: Best way to remove $oid in Response

I have started using Pymongo recently and now I want to find the best way to remove $oid in the response.
When I use find:
result = db.nodes.find_one({ "name": "Archer" })
and get the response with:
json.loads(dumps(result))
the result would be:
{
    "_id": {
        "$oid": "5e7511c45cb29ef48b8cfcff"
    },
    "about": "A jazz pianist falls for an aspiring actress in Los Angeles."
}
My expected result:
{
    "_id": "5e7511c45cb29ef48b8cfcff",
    "about": "A jazz pianist falls for an aspiring actress in Los Angeles."
}
As you can see, we can use:
resp = json.loads(dumps(result))
resp['_id'] = resp['_id']['$oid']
But I think this is not the best way. Hope you guys have a better solution.
You can take advantage of aggregation:
result = db.nodes.aggregate([{'$match': {"name": "Archer"}},
                             {'$addFields': {"Id": '$_id.oid'}},
                             {'$project': {'_id': 0}}])
data = json.dumps(list(result))
Here, with $addFields I add a new field Id containing the value of the oid. Then I make a projection that eliminates the _id field from the result. Finally, since I get a cursor back, I turn it into a list.
It may not work exactly as you hope, but the general idea is there.
First of all, there's no $oid in the response. What you are seeing is the Python driver representing the _id field as an ObjectId instance, and then the dumps() method representing that ObjectId in a string format. The $oid bit is just to let you know the field is an ObjectId, should you need to use it for some purpose later.
The next part of the answer depends on what exactly you are trying to achieve. Almost certainly you can achieve it using the result object without converting it to JSON.
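If you do end up needing a JSON-friendly dict, one simple option (a sketch on my part, not from the answer above) is to stringify the ObjectId in place before dumping, instead of round-tripping through bson's dumps():

import json

result = db.nodes.find_one({"name": "Archer"})
if result is not None:
    result["_id"] = str(result["_id"])  # ObjectId -> "5e7511c45cb29ef48b8cfcff"
    print(json.dumps(result))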
If you just want to get rid of it altogether, you can do:
result = db.nodes.find_one({ "name": "Archer" }, {'_id': 0})
print(result)
which gives:
{"name": "Archer"}
import re

def remove_oid(string):
    while True:
        pattern = re.compile(r'{\s*"\$oid":\s*(\"[a-z0-9]{1,}\")\s*}')
        match = re.search(pattern, string)
        if match:
            string = string.replace(match.group(0), match.group(1))
        else:
            return string
string = json_dumps(mongo_query_result)
string = remove_oid(string)
I am using some form of custom handler. I managed to remove $oid and replace it with just the id string:
import datetime
import json

import bson

# Custom handler
def my_handler(x):
    if isinstance(x, datetime.datetime):
        return x.isoformat()
    elif isinstance(x, bson.objectid.ObjectId):
        return str(x)
    else:
        raise TypeError(x)

# parsing
def parse_json(data):
    return json.loads(json.dumps(data, default=my_handler))

result = db.nodes.aggregate([{'$match': {"name": "Archer"}},
                             {'$addFields': {"_id": '$_id'}},
                             {'$project': {'_id': 0}}])
data = parse_json(list(result))  # materialize the cursor before serializing
In the second argument of find_one, you can define which fields to exclude, in the following way:
site_information = mongo.db.sites.find_one({'username': username}, {'_id': False})
This statement will exclude the '_id' field from being selected from the returned documents.

Issue with Dynamically loaded limit string using Flask-Limiter

I'm following the docs for Dynamically loaded limit string(s). Basically, I'm trying to implement a company-specific rate limit.
Following is the company model:
# company model
class Company(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100), unique=True)
    limit = db.Column(db.String(50), default=DEFAULT_LIMIT)

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
This is my limiter.py:
DEFAULT_LIMIT = "100/day, 10/minute"

app = Flask(__name__)

# defining limiter
limiter = Limiter(
    app,
    key_func=get_remote_address,
    default_limits=[DEFAULT_LIMIT]  # this is the default limit set for the app
)

def get_company_limit():
    try:
        company = Company.query.get(request.view_args['id'])
        return app.config.get("CUSTOM_LIMIT", company.limit)
    except:
        abort(403)  # if the company is not found then raise Forbidden

# an endpoint
@app.route("/company/<id>", methods=["GET"])
@limiter.limit(get_company_limit)
def get_company(id):
    return "success"

if __name__ == '__main__':
    app.run(debug=True)
Let's suppose these are the companies:
[
    {
        "name": "company1",
        "limit": "100/day, 5/minute"
    },
    {
        "name": "company2",
        "limit": "100/day, 10/minute"
    },
    {
        "name": "company3",
        "limit": "50/day, 5/minute"
    },
    {
        "name": "company4",
        "limit": "100/day, 2/minute"
    }
]
PROBLEM:
The dynamically fetched limit is working fine for companies 1, 2 and 4. But as you can see there is a matching limit (5/minute) for company1 and company3, and this limit is shared between those two companies regardless of the order in which the API endpoint is called. On hitting the endpoint, company1 and company3 share the same counter of 5.
For example, if the endpoint is called 3 times with company1's id and then called with company3's id, the limiter will raise 429 after 2 successful responses, and it will also raise 429 for company1.
I'm unable to understand this behaviour; what have I missed?
The problem is with:
# defining limiter
limiter = Limiter(
    app,
    key_func=get_remote_address,
    default_limits=[DEFAULT_LIMIT]  # this is the default limit set for the app
)
Basically, get_remote_address is used as the key_func here, which clashes between requests: it may return the same remote address for every request coming from the client's interface, so companies with identical limit strings end up sharing a counter. You need a user-defined key_func to key the limits properly.
def get_company_id():
    return (Company.query.get(request.view_args['id']), 0)

# defining limiter
limiter = Limiter(
    app,
    key_func=get_company_id,
    default_limits=[DEFAULT_LIMIT]  # this is the default limit set for the app
)
And it worked.
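A variation on the same idea (an assumption on my part, not part of the answer above): since key_func only has to return a distinct string per company, you can also key directly on the id taken from the URL.

def get_company_key():
    # request.view_args['id'] is the <id> part of /company/<id>
    return str(request.view_args["id"])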

Django Rest Framework Displaying Serialized data through Views.py

class International(object):
    """ International Class that stores versions and lists
    countries
    """
    def __init__(self, version, countrylist):
        self.version = version
        self.country_list = countrylist


class InternationalSerializer(serializers.Serializer):
    """ Serializer for International page
    Lists International countries and current version
    """
    version = serializers.IntegerField(read_only=True)
    country_list = CountrySerializer(many=True, read_only=True)
I have a serializer set up this way, and I wish to display serializer.data (which will be a dictionary like this: { "version": xx, "country_list": [ ] }) using views.py.
I have my views.py set up this way:
class CountryListView(generics.ListAPIView):
    """ Endpoint : somedomain/international/
    """
    ## want to display a dictionary like the one below:
    # {
    #     "version": 5,
    #     "country_list": [ { xxx }, { xxx }, { xxx } ]
    # }
What do I code in this CountryListView to render a dictionary like the one above? I'm really unsure.
Try this
from rest_framework.response import Response

class CountryListView(generics.ListAPIView):
    """ Endpoint : somedomain/international/
    """
    def get(self, request):
        # get your version and country_list data and
        # init your object
        international_object = International(version, country_list)
        serializer = InternationalSerializer(instance=international_object)
        your_data = serializer.data
        return Response(your_data)  # wrap the serialized data in a DRF Response
You can build on the idea from here:
http://www.django-rest-framework.org/api-guide/pagination/#example
Suppose we want to replace the default pagination output style with a modified format that includes the next and previous links in a nested 'links' key. We could specify a custom pagination class like so:
class CustomPagination(pagination.PageNumberPagination):
    def get_paginated_response(self, data):
        return Response({
            'links': {
                'next': self.get_next_link(),
                'previous': self.get_previous_link()
            },
            'count': self.page.paginator.count,
            'results': data
        })
As long as you don't need the pagination itself, you can set up a custom pagination class that packs your response in whichever layout you need:
class CountryListPagination(BasePagination):
    def paginate_queryset(self, queryset, request, view=None):
        # BasePagination requires this hook; here we simply pass everything through
        return list(queryset)

    def get_paginated_response(self, data):
        return Response({
            'version': 5,
            'country_list': data
        })
Then all you need to do is specify this pagination class on your class-based view:
class CountryListView(generics.ListAPIView):
    # Endpoint : somedomain/international/
    pagination_class = CountryListPagination
Let me know how this works for you.

Elasticsearch bulk/batch indexing with python requests module

I have a smallish (~50,00) array of json dictionaries that I want to store/index in ES. My preference is to use python, since the data I want to index is coming from a csv file, loaded and converted to json via python. Alternatively, I would like to skip the step of converting to json, and simply use the array of python dictionaries I have. Anyway, a quick search revealed the bulk indexing functionality of ES. I want to do something like this:
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, data=acc)  # acc is a python array of dictionaries
or
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, params=acc)  # acc is a python array of dictionaries
Both requests give an HTTP 500 error.
My understanding is that you have to have one "command" per line (index, create, delete...) and then some of them (like index) take a row of data on the next line, like so:
{'index': ''}\n
{'your': 'data'}\n
{'index': ''}\n
{'other': 'data'}\n
NB the new-lines, even on the last row.
Empty index objects like the above work if you POST to .../index/type/_bulk; otherwise you need to specify the index and type in each action line, I think (I have not tried that).
The following function will do it:
import requests

def post_request(self, endpoint, data):
    endpoint = 'http://localhost:9200/_bulk'  # requests needs the scheme in the URL
    response = requests.post(endpoint, data=data, headers={'content-type': 'application/json', 'charset': 'UTF-8'})
    return response
As data you need to pass a string such as:
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1681", "routing" : 0 }}
{ "field1" : ... , ..., "fieldN" : ... }
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1684", "routing" : 1 }}
{ "field1" : ... , ..., "fieldN" : ... }
Make sure you add a "\n" at the end of each line.
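If you start from the list of Python dictionaries you already have, a minimal sketch of building that newline-delimited body with the requests module could look like this (the index name my-index and the helper name are assumptions; the port matches the question's localhost:9202):

import json
import requests

def bulk_index(docs, index="my-index", url="http://localhost:9202/_bulk"):
    """Build an NDJSON bulk body from a list of dicts and POST it."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    body = "\n".join(lines) + "\n"  # the trailing newline is required by _bulk
    return requests.post(url, data=body,
                         headers={"Content-Type": "application/x-ndjson"})

# acc is the array of dictionaries from the question
# response = bulk_index(acc)

Depending on your Elasticsearch version you may also need a _type in the action line, as in the example above.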
I don't know much about Python, but did you look at Pyes?
Bulk is supported in Pyes.