Scrapy: Default values for items & fields. What is the best implementation?

As far as I could find out from the documentation and various discussions on the net, the ability to add default values to fields in a scrapy item has been removed.
This doesn't work:
category = Field(default='null')
So my question is: what is a good way to initialize fields with a default value?
I already tried to implement it as an item pipeline as suggested here, without any success.
https://groups.google.com/forum/?fromgroups=#!topic/scrapy-users/-v1p5W41VDQ

Figured out what the problem was: the pipeline is working (code follows for other people's reference). My problem was that I am appending values to a field, and I wanted the default to apply to one of these list values. I chose a different way and it works: I am now implementing it with a custom setDefault processor method.
class DefaultItemPipeline(object):
    def process_item(self, item, spider):
        item.setdefault('amz_VendorsShippingDurationFrom', 'default')
        item.setdefault('amz_VendorsShippingDurationTo', 'default')
        # ...
        return item
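For reference, the setDefault idea can also live in an item loader. Below is a minimal sketch, assuming Scrapy's ItemLoader and its per-field output processors (the <field>_out naming convention); the default_output helper is hypothetical:

from scrapy.loader import ItemLoader

def default_output(default):
    # Hypothetical helper: an output processor that falls back to a
    # default value when nothing was collected for the field.
    def processor(values):
        return values[0] if values else default
    return processor

class AmzItemLoader(ItemLoader):
    amz_VendorsShippingDurationFrom_out = default_output('default')
    amz_VendorsShippingDurationTo_out = default_output('default')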

Typically, a constructor is used to initialize fields.
class SomeItem(scrapy.Item):
    id = scrapy.Field()
    category = scrapy.Field()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self['category'] = 'null'  # set default value
This may not be a clean solution, but it avoids unnecessary pipelines.
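A quick usage sketch of the constructor default (assuming the class above):

item = SomeItem(id=1)
print(item['category'])  # -> 'null'

Note that as written the constructor overwrites any category passed in by the caller, so the assignment may be better guarded with self.setdefault('category', 'null').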

Related

How can I create an alias name of a relational field in Odoo?

I need to create a model to have backward compatibility with older field names. This way I can develop modules that read the "new" fields, without having to migrate the old ones first.
This only has to work for reading or presenting the fields, not for writing them.
So I thought it would be good to create an alias for each field, and made this:
from openerp import models, fields, api

class backward_compatibility(models.Model):
    _description = 'Backward compatibility'
    _inherit = 'account.invoice'

    new_document_class_id = fields.Integer(
        compute='_comp_new_doc_class', string='Tipo')
    new_document_number = fields.Char(
        compute='_comp_new_doc_number', string='Folio')

    @api.multi
    def _comp_new_doc_class(self):
        for record in self:
            try:
                record.new_document_class_id = record.old_document_class_id
            except:
                pass

    @api.multi
    def _comp_new_doc_number(self):
        for record in self:
            try:
                record.new_document_number = record.old_document_number
            except:
                pass
This approach works for the Char field, but it doesn't for the Integer (Many2one).
What ideas do you have to make this work? Should I replicate the relationship in the new field?
oldname: the previous name of this field, so that the ORM can rename it automatically at migration.
Try to use oldname. I saw this in the core modules, though I have never used it personally:
_inherit = 'res.partner'

_columns = {
    'barcode': fields.char('Barcode', help="BarCode", oldname='ean13'),
}
Dummy fields are also used to help with backward compatibility:
'pricelist_id': fields.dummy(string='Pricelist', relation='product.pricelist', type='many2one'),
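For the Many2one case specifically, one option is to declare the alias as a related field, so the relationship itself is preserved instead of being flattened into an Integer. This is only a sketch, assuming old_document_class_id is a Many2one; 'some.document.class' stands in for its real comodel:

from openerp import models, fields

class backward_compatibility(models.Model):
    _inherit = 'account.invoice'

    # Alias that keeps the Many2one relationship intact.
    new_document_class_id = fields.Many2one(
        'some.document.class', related='old_document_class_id',
        string='Tipo', readonly=True)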

When to use api.one and api.multi in odoo | openerp?

Recently Odoo (formerly OpenERP) V8 was released. The new API introduces method decorators: in models.py, methods need to be decorated with @api.one or @api.multi.
Referring to the Odoo documentation, I cannot determine their exact use. Can anybody explain in detail?
Thanks.
Generally, both decorators are used to decorate a record-style method where 'self' contains recordset(s). Let me explain briefly when to use @api.one and @api.multi:
1. @api.one:
Decorate a record-style method where 'self' is expected to be a singleton instance.
The decorated method automatically loops on records (i.e. for each record in the recordset, it calls the method) and makes a list with the results.
In case the method is decorated with @returns, it concatenates the resulting instances. Such a method:
@api.one
def method(self, args):
    return self.name

may be called in both record and traditional styles, like:

# recs = model.browse(cr, uid, ids, context)
names = recs.method(args)

names = model.method(cr, uid, ids, args, context=context)

Each time, 'self' is redefined as the current record.
2. @api.multi:
Decorate a record-style method where 'self' is a recordset. The method typically defines an operation on records. Such a method:

@api.multi
def method(self, args):
    ...

may be called in both record and traditional styles, like:

# recs = model.browse(cr, uid, ids, context)
recs.method(args)

model.method(cr, uid, ids, args, context=context)
When to use:
If you are using @api.one, the returned value is in a list. This is not always supported by the web client, e.g. on button action methods. In that case, you should use @api.multi to decorate your method, and probably call self.ensure_one() in the method definition.
It is always better to use @api.multi with self.ensure_one() instead of @api.one, to avoid the side effect in return values.
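A minimal sketch of that recommendation (the model and field names are made up):

from openerp import models, fields, api

class my_model(models.Model):
    _name = 'my.model'

    state = fields.Char()

    @api.multi
    def action_confirm(self):
        # Guard: this button action expects exactly one record, while
        # @api.multi keeps the return value web-client friendly.
        self.ensure_one()
        self.state = 'confirmed'
        return True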
@api.one:
This decorator automatically loops over the records of the recordset for you, and 'self' is redefined as the current record:

@api.one
def func(self):
    self.name = 'xyz'

@api.multi:
'self' is the current recordset, without iteration. This is the default behavior:

@api.multi
def func(self):
    len(self)
For a detailed description of the whole API you can refer to this link.
@api.model  # When the record data / self is not relevant. Sometimes also used with old-API calls.
def model_text(self):
    return "this text does not rely on self"

@api.multi  # Normally followed by a loop on self, because self may contain multiple records.
def set_field(self):
    for r in self:
        r.abc = r.a + r.b

@api.one  # The API will loop and call the method for each record. Not preferred, because of potential problems with returns to web clients.
def set_field(self):
    self.abc = self.a + self.b

Scrapy pipeline architecture - need to return variables

I need some advice on how to proceed with my item pipeline. I need to POST an item to an API (working well) and, with the response object, get the ID of the entity created (have this working too) and then use it to populate another entity. Ideally, the item pipeline could return the entity ID. Basically, I am in a situation where I have a one-to-many relationship that I need to encode in a NoSQL database. What would be the best way to proceed?
The best way for you to proceed is to use MongoDB, a NoSQL database that works well with Scrapy. The pipeline for MongoDB can be found here, and the process is explained in this tutorial.
Now, as explained in the solution from Pablo Hoffman, updating an item from different pipelines can be achieved with the following decorator on the process_item method of a pipeline object, so that it checks the pipeline attribute of your spider to decide whether or not it should be executed. (I have not tested the code, but I hope it helps.)
import functools

from scrapy import log

def check_spider_pipeline(process_item_method):
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if this class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=log.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=log.DEBUG)
            return item

    return wrapper
And it is used like this:
class MySpider(BaseSpider):
    pipeline = set([
        pipelines.Save,
        pipelines.Validate,
    ])

    def parse(self, response):
        # insert scrapy goodness here
        return item

class Save(BasePipeline):
    @check_spider_pipeline
    def process_item(self, item, spider):
        # more scrapy goodness here
        return item
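Note that the pipelines still have to be enabled globally for the decorator to see them; a sketch of the settings (the module path is made up, and older Scrapy versions take a list instead of a dict):

# settings.py
ITEM_PIPELINES = {
    'myproject.pipelines.Save': 100,
    'myproject.pipelines.Validate': 200,
}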
Finally, you can take further help from this question.
Perhaps I don't understand your question, but it sounds like you just need to call your submission code in the def close_spider(self, spider): method. Have you tried that?
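To make the "pipeline returns the entity ID" idea concrete, here is a rough sketch (untested; the endpoint URL and the response shape are assumptions, and the item is assumed to declare an entity_id field) of a pipeline that POSTs the item and stashes the created ID on the item for a later pipeline to use:

import requests

class ApiPostPipeline(object):
    # Hypothetical endpoint; replace with the real API URL.
    API_URL = 'https://example.com/api/entities'

    def process_item(self, item, spider):
        response = requests.post(self.API_URL, json=dict(item))
        response.raise_for_status()
        # Assumes the API answers with JSON like {"id": ...}; the id is
        # stored on the item so a later pipeline can create the child
        # entities of the one-to-many relationship.
        item['entity_id'] = response.json()['id']
        return item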

Django REST framework flat, read-write serializer

In Django REST framework, what is involved in creating a flat, read-write serializer representation? The docs refer to a 'flat representation' (end of the section http://django-rest-framework.org/api-guide/serializers.html#dealing-with-nested-objects) but don't offer examples or anything beyond a suggestion to use a RelatedField subclass.
For instance, how to provide a flat representation of the User and UserProfile relationship, below?
# Model
class UserProfile(models.Model):
    user = models.OneToOneField(User)
    favourite_number = models.IntegerField()

# Serializer
class UserProfileSerializer(serializers.ModelSerializer):
    email = serializers.EmailField(source='user.email')

    class Meta:
        model = UserProfile
        fields = ['id', 'favourite_number', 'email',]
The above UserProfileSerializer doesn't allow writing to the email field, but I hope it expresses the intention sufficiently well. So, how should a 'flat' read-write serializer be constructed to allow a writable email attribute on the UserProfileSerializer? Is it at all possible to do this when subclassing ModelSerializer?
Thanks.
Looking at the Django REST framework (DRF) source, I settled on the view that a DRF serializer is strongly tied to an accompanying Model for unserializing purposes. A Field's source param makes this less so for serializing purposes.
With that in mind, and viewing serializers as encapsulating validation and save behaviour (in addition to their (un)serializing behaviour), I used two serializers: one for each of the User and UserProfile models:
class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ['email',]

class UserProfileSerializer(serializers.ModelSerializer):
    email = serializers.EmailField(source='user.email')

    class Meta:
        model = UserProfile
        fields = ['id', 'favourite_number', 'email',]
The source param on the EmailField handles the serialization case adequately (e.g. when servicing GET requests). For unserializing (e.g. when servicing PUT requests) it is necessary to do a little work in the view, combining the validation and save behaviour of the two serializers:
class UserProfileRetrieveUpdate(generics.GenericAPIView):
    def get(self, request, *args, **kwargs):
        # Only UserProfileSerializer is required to serialize data since
        # email is populated by the 'source' param on EmailField.
        serializer = UserProfileSerializer(
            instance=request.user.get_profile())
        return Response(serializer.data)

    def put(self, request, *args, **kwargs):
        # Both UserSerializer and UserProfileSerializer are required
        # in order to validate and save data on their associated models.
        user_profile_serializer = UserProfileSerializer(
            instance=request.user.get_profile(),
            data=request.DATA)
        user_serializer = UserSerializer(
            instance=request.user,
            data=request.DATA)
        if user_profile_serializer.is_valid() and user_serializer.is_valid():
            user_profile_serializer.save()
            user_serializer.save()
            return Response(
                user_profile_serializer.data, status=status.HTTP_200_OK)

        # Combine errors from both serializers.
        errors = dict()
        errors.update(user_profile_serializer.errors)
        errors.update(user_serializer.errors)
        return Response(errors, status=status.HTTP_400_BAD_REQUEST)
First: better handling of nested writes is on its way.
Second: the Serializer Relations docs say of both PrimaryKeyRelatedField and SlugRelatedField that "By default this field is read-write...", so if your email field is unique (is it?) you might be able to use SlugRelatedField and it would just work; I've not tried this yet, however.
Third: instead, I've used a plain Field subclass that uses the source="*" technique to accept the whole object. From there I manually pull the related field in to_native and return it; this is read-only. In order to write, I check request.DATA in post_save and update the related object there. This isn't automatic, but it works.
So, fourth: looking at what you've already got, my approach (above) amounts to marking your email field as read-only and then implementing post_save to check for an email value and perform the update accordingly.
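A sketch of that read-only-plus-post_save approach, assuming the DRF 2.x hooks already used in this answer (request.DATA, and the post_save hook on generic views):

class UserProfileRetrieveUpdate(generics.RetrieveUpdateAPIView):
    serializer_class = UserProfileSerializer  # with email declared read-only

    def post_save(self, obj, created=False):
        # Manually push the (read-only) email from the raw payload
        # onto the related User after the profile itself is saved.
        email = self.request.DATA.get('email')
        if email:
            obj.user.email = email
            obj.user.save()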
Although this does not strictly answer the question, I think it will solve your need. The issue may lie more in splitting one entity across two models than in DRF itself.
Since Django 1.5 you can make a custom user model. If all you want is some methods and extra fields, and apart from that you are happy with the Django user, then all you need to do is:
class MyUser(AbstractBaseUser):
    favourite_number = models.IntegerField()
and in settings: AUTH_USER_MODEL = 'myapp.MyUser'
(And of course a DB migration, which could be made quite simple by using the db_table option to point to your existing user table and just adding the new columns there.)
After that, you have the common case which DRF excels at.
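With the single model in place, the flat serializer becomes trivial; a sketch (field names follow the example above, and 'email' assumes your custom user defines it, e.g. by extending AbstractUser instead):

class MyUserSerializer(serializers.ModelSerializer):
    class Meta:
        model = MyUser
        fields = ['id', 'email', 'favourite_number']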

How to get in openerp all objects from a class?

I need to get all objects from a class and iterate through them.
I tried this, but without any results:
def my_method(self, cr, uid, ids, context=None):
    pool_obj = pooler.get_pool(cr.dbname)
    my_objects = pool_obj.get('project.myobject')
    # here I'll iterate through them...
How can I get in 'my_objects' variable all objects of class 'project.myobject'?
You have to search with empty parameters to get all the ids of existing objects, like:
myobj = pool.get('project.myobject')
ids = myobj.search(cr, uid, [])
Then you can browse or read them, passing an id or the list of ids.
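Put together, a sketch of the whole method in the old 7.0-style API used in the question ('name' is just an example field):

def my_method(self, cr, uid, ids, context=None):
    pool_obj = pooler.get_pool(cr.dbname)
    myobj = pool_obj.get('project.myobject')
    all_ids = myobj.search(cr, uid, [])  # empty domain -> every record id
    for record in myobj.browse(cr, uid, all_ids, context=context):
        # iterate through all project.myobject records
        print(record.name)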
It seems you forgot to import pooler:
from openerp import pooler
Maybe it will help you.