How do Scrapy from_settings and from_crawler class methods work?

I need to add the following class method to my existing pipeline, as described in the Scrapy FAQ:
http://doc.scrapy.org/en/latest/faq.html#i-m-getting-an-error-cannot-import-name-crawler
I am not sure how to have two of these class methods in my class.
from twisted.enterprise import adbapi
import MySQLdb.cursors

class MySQLStorePipeline(object):
    """A pipeline to store the item in a MySQL database.

    This implementation uses Twisted's asynchronous database API.
    """

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbargs = dict(
            host=settings['DB_HOST'],
            db=settings['DB_NAME'],
            user=settings['DB_USER'],
            passwd=settings['DB_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
        return cls(dbpool)

    def process_item(self, item, spider):
        pass

From my understanding of class methods, having several class methods in a Python class is perfectly fine; it just depends on which one the caller requires. However, I have only seen from_crawler in Scrapy pipelines until now. From there you can access the settings via crawler.settings.
Are you sure that from_settings is required? I did not check all occurrences, but in middleware.py a priority seems to apply: if a crawler object is available and a from_crawler method exists, that is used; otherwise, if there is a from_settings method, that is used; otherwise, the bare constructor is used.
if crawler and hasattr(mwcls, 'from_crawler'):
    mw = mwcls.from_crawler(crawler)
elif hasattr(mwcls, 'from_settings'):
    mw = mwcls.from_settings(settings)
else:
    mw = mwcls()
I admit I do not know whether this is also the place where pipelines get created (I guess not, since there is no pipelines.py), but the implementation seems very reasonable.
So, I would either:
reimplement the whole method as from_crawler and only use that one, or
add a from_crawler method and use both.
The new method could look as follows (to duplicate as little code as possible):
@classmethod
def from_crawler(cls, crawler):
    obj = cls.from_settings(crawler.settings)
    obj.do_something_on_me_with_crawler(crawler)
    return obj
Of course this depends a bit on what you need.
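Putting that together, a minimal sketch of the pipeline with both hooks could look like this (the crawler attribute stored in from_crawler is only an illustration of the "do_something_on_me_with_crawler" step, not part of the original code):

from twisted.enterprise import adbapi

class MySQLStorePipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        # Same body as in the question: build the adbapi pool from settings.
        dbargs = dict(
            host=settings['DB_HOST'],
            db=settings['DB_NAME'],
            user=settings['DB_USER'],
            passwd=settings['DB_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        return cls(adbapi.ConnectionPool('MySQLdb', **dbargs))

    @classmethod
    def from_crawler(cls, crawler):
        # Delegate to from_settings, then do any crawler-specific setup here
        # (keeping a reference to the crawler is just an example).
        obj = cls.from_settings(crawler.settings)
        obj.crawler = crawler
        return obj

    def process_item(self, item, spider):
        return item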

Related

How to create a callable mock from a non-callable spec?

I have two classes: class One that does some stuff and class Wrapper that translates the API of One. To test the Wrapper I want to mock the class One. My problem is that One sets a number of attributes in its __init__ method, and I'd like my test to throw an error when these attributes are changed (e.g. renamed or deleted) in One but not in Wrapper.
I have tried to initialize an instance of One and to use it as a spec, but the resulting mock is non-callable:
from unittest import mock

import pytest

class One:
    def __init__(self, a):
        self.a = a

spec_obj = One(a='foo')

# working code, the test should pass
class Wrapper:
    def __init__(self, wa):
        self._one = One(a=wa)
        self.wa = self._one.a

@mock.patch(__name__ + '.One', autospec=spec_obj)
def test_wrapper(MockOne):
    MockOne.return_value.a = test_a = 'bar'
    wrapper_obj = Wrapper(wa=test_a)
    MockOne.assert_called_once_with(a=test_a)
    assert wrapper_obj.wa == test_a
which throws the error:
TypeError: 'NonCallableMagicMock' object is not callable
since the spec spec_obj is non-callable.
If I set autospec=True, everything works but the test passes even when the One's attribute is renamed.
MockOne.return_value produces a Mock that you configure, but when the mock is actually called, you get a different Mock that hasn't been properly configured. You need to configure MockOne.return_value directly.
@mock.patch(__name__ + '.One', autospec=True)
def test_wrapper(MockOne):
    MockOne.return_value.a = test_a = 'bar'
    wrapper_obj = Wrapper(wa=test_a)
    MockOne.assert_called_once_with(a=test_a)
    assert wrapper_obj.wa == test_a
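If you also want the test to break when One's instance attribute is renamed, one possible variation (a sketch building on the definitions above, not from the original answer) is to keep autospec=True for the call-signature check and to spec the returned instance against a real One object with spec_set, so assigning a removed attribute raises:

@mock.patch(__name__ + '.One', autospec=True)
def test_wrapper_strict(MockOne):
    # Sketch: spec the instance mock against a real One() so that setting an
    # attribute One no longer defines raises AttributeError (spec_set=True).
    MockOne.return_value = mock.create_autospec(spec_obj, spec_set=True)
    MockOne.return_value.a = test_a = 'bar'
    wrapper_obj = Wrapper(wa=test_a)
    MockOne.assert_called_once_with(a=test_a)
    assert wrapper_obj.wa == test_a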

How to send private data in spider to pipeline?

Say, each time I run scrapy like below
scrapy crawl [spidername] -a file='filename'
I want to send the filename to the pipeline to specify the item storage location. The location may be different each time, so it can't be defined in settings.py.
The filename is saved in the spider as a private variable:

def __init__(self, file):
    self.filename = file

How can I send the parameter to the pipeline?
There are a few ways to do this. One way is to accept the filename as a parameter in your spider's __init__() method (that is where -a arguments end up); since Scrapy passes the spider to your pipeline's process_item() calls, the pipeline can read the attribute from it.
Another way is to define the filename as a class variable in your spider and then access it from your pipeline as an attribute of the spider.
Here is an example of how you could do it using a class variable on the spider, read by the pipeline (the pipeline class name here is illustrative):

class MySpider(Spider):
    filename = 'filename'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        # do stuff
        pass

class MyPipeline(object):
    def process_item(self, item, spider):
        item['filename'] = spider.filename
        return item
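As a sketch of the first approach (the class and attribute names here are illustrative, not from the question): the -a file=... argument arrives as a keyword argument to the spider's __init__, and the pipeline can read it from the spider it receives, for example in open_spider:

from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'

    def __init__(self, file=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.filename = file  # set from "scrapy crawl myspider -a file=..."

class MyStoragePipeline(object):
    def open_spider(self, spider):
        # Pick the storage location from the spider argument.
        self.output_path = spider.filename

    def process_item(self, item, spider):
        # ... write the item to self.output_path ...
        return item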

How to do filtering according to the logged-in user?

I want to get hobbies for the logged-in user, but I always get:
TypeError at /backend/api/hobbys/
__init__() takes 1 positional argument but 2 were given
This is my views.py:
class ListCreateHobbyView(GenericAPIView):
    queryset = Hobby.objects.all()
    serializer_class = HobbySerializer

    # Filtering by logged in user
    def get(self, request, *args, **kwargs):
        queryset = Hobby.objects.filter(user=request.user)
        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)
What can be wrong?
Please do not override the entire .get(…) method: that means (nearly) all the boilerplate code that Django has written is no longer applied. You filter in the .get_queryset(…) method [drf-doc]:
from rest_framework.generics import ListAPIView

class ListCreateHobbyView(ListAPIView):
    queryset = Hobby.objects.all()
    serializer_class = HobbySerializer

    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(
            user=self.request.user
        )
You can also make a custom filter backend: this is a reusable component that you then can use in other views. This thus looks like:
from rest_framework import filters

class UserFilterBackend(filters.BaseFilterBackend):
    def filter_queryset(self, request, queryset, view):
        return queryset.filter(user=request.user)
then you use this as filter backend:
from rest_framework.generics import ListAPIView

class ListCreateHobbyView(ListAPIView):
    queryset = Hobby.objects.all()
    serializer_class = HobbySerializer
    filter_backends = [UserFilterBackend]
The advantage of this approach is that if you later construct extra views that need the same filtering, you can simply "plug" these in.
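For example, a hypothetical second view could reuse the backend unchanged (the Game model and serializer below are placeholders, not from the question):

from rest_framework.generics import ListAPIView

class ListGameView(ListAPIView):
    queryset = Game.objects.all()          # placeholder model
    serializer_class = GameSerializer      # placeholder serializer
    filter_backends = [UserFilterBackend]  # same per-user filtering, reused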

Subclass `pathlib.Path` fails

I would like to enhance the class pathlib.Path, but the simple example below does not work.
from pathlib import Path

class PPath(Path):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

test = PPath("dir", "test.txt")
Here is the error message I have.
Traceback (most recent call last):
  File "/Users/projetmbc/test.py", line 14, in <module>
    test = PPath("dir", "test.txt")
  File "/anaconda/lib/python3.4/pathlib.py", line 907, in __new__
    self = cls._from_parts(args, init=False)
  File "/anaconda/lib/python3.4/pathlib.py", line 589, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/anaconda/lib/python3.4/pathlib.py", line 582, in _parse_args
    return cls._flavour.parse_parts(parts)
AttributeError: type object 'PPath' has no attribute '_flavour'
What am I doing wrong?
You can subclass the concrete implementation, so this works:
class Path(type(pathlib.Path())):
Here's what I did with this:
import pathlib

class Path(type(pathlib.Path())):
    def open(self, mode='r', buffering=-1, encoding=None, errors=None, newline=None):
        if encoding is None and 'b' not in mode:
            encoding = 'utf-8'
        return super().open(mode, buffering, encoding, errors, newline)

Path('/tmp/a.txt').write_text("я")
Here is the definition of the Path class. It does something rather clever. Rather than directly returning an instance of Path from its __new__(), it returns an instance of a subclass, but only if it's been invoked directly as Path() (and not as a subclass).
Otherwise, it expects to have been invoked via either WindowsPath() or PosixPath(), which both provide a _flavour class attribute via multiple inheritance. You must also provide this attribute when subclassing. You'll probably need to instantiate and/or subclass the _Flavour class to do this. This is not a supported part of the API, so your code might break in a future version of Python.
TL;DR: This idea is fraught with peril, and I fear that my answers to your questions will be interpreted as approval rather than reluctant assistance.
You may be able to simplify your life depending on why you want to extend Path (or PosixPath, or WindowsPath). In my case, I wanted to implement a File class that had all the methods of Path, and a few others. However, I didn't actually care if isinstance(File(), Path).
Delegation works beautifully:
class File:
    def __init__(self, path):
        self.path = pathlib.Path(path)
        ...

    def __getattr__(self, attr):
        return getattr(self.path, attr)

    def foobar(self):
        ...
Now, if file = File('/a/b/c'), I can use the entire Path interface on file, and also do file.foobar().
Combining some of the previous answers you could also just write:
class MyPath(pathlib.Path):
    _flavour = type(pathlib.Path())._flavour
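For example, a quick check of that two-line subclass (illustrative; the printed path assumes a POSIX system, and the trick still relies on the private _flavour attribute):

p = MyPath("dir", "test.txt")
print(p)                             # dir/test.txt on POSIX
print(p.suffix)                      # .txt
print(isinstance(p, pathlib.Path))   # True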
I have been struggling with this too.
Here is what I did, studying the pathlib module.
It seems to me that this is the cleanest way to do it, but if the pathlib module changes its implementation, it probably won't hold.
from pathlib import Path
import os
import pathlib

class PPath(Path):
    _flavour = pathlib._windows_flavour if os.name == 'nt' else pathlib._posix_flavour

    def __new__(cls, *args):
        return super(PPath, cls).__new__(cls, *args)

    def __init__(self, *args):
        super().__init__()  # Path.__init__ does not take any args (everything is done in __new__)
        self._some_instance_ppath_value = self.exists()  # Path method

    def some_ppath_method(self, *args):
        pass

test = PPath("dir", "test.txt")
Note
I have opened a bug tracker issue here after a little discussion on the Python dev list.
A temporary solution
Sorry for this double answer, but here is a way to achieve what I want. Thanks to Kevin, who pointed me to the source of pathlib and the fact that we are dealing with constructors here.
import pathlib
import os

def _extramethod(cls, n):
    print("=== " * n)

class PathPlus(pathlib.Path):
    def __new__(cls, *args):
        if cls is PathPlus:
            cls = pathlib.WindowsPath if os.name == 'nt' else pathlib.PosixPath
        setattr(cls, "extramethod", _extramethod)
        return cls._from_parts(args)

test = PathPlus("C:", "Users", "projetmbc", "onefile.ext")
print("File ?", test.is_file())
print("Dir ?", test.is_dir())
print("New name:", test.with_name("new.name"))
print("Drive ?", test.drive)
test.extramethod(4)
This prints the following lines.
File ? False
Dir ? False
New name: C:/Users/projetmbc/new.name
Drive ?
=== === === ===
In order to inherit from pathlib.Path, you need to specify which OS "flavour" you are representing. All you need to do is specify that you are using either Windows or Unix (it seems to be Unix, based on your traceback) by inheriting from pathlib.PosixPath or pathlib.WindowsPath.
import pathlib

class PPath(pathlib.PosixPath):
    pass

test = PPath("dir", "test.txt")
print(test)
Which outputs:
dir/test.txt
Using type(pathlib.Path()) as proposed in this answer does the exact same thing as directly inheriting from pathlib.PosixPath or pathlib.WindowsPath since instantiating pathlib.Path "creates either a PosixPath or a WindowsPath" (pathlib documentation).
If you know your application will not be cross-platform, it is simpler to directly inherit from the flavor Path that represents your OS.
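As a quick illustration of that equivalence (output shown for a POSIX system):

import pathlib

# Path() dispatches to the OS-specific subclass, so on Linux/macOS this
# prints <class 'pathlib.PosixPath'>.
print(type(pathlib.Path()))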
Here is a simple way to do things, following the observation made by Kevin.

class PPath():
    def __init__(self, *args, **kwargs):
        self.path = Path(*args, **kwargs)

Then I will need to use a trick to automatically bind all of Path's methods to my PPath class, for example by delegating attribute lookups as sketched below. I think that will be fun to do.
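One such trick (a sketch along the lines of the delegation answer above, not a complete solution) is to forward unknown attribute lookups to the wrapped Path:

from pathlib import Path

class PPath():
    def __init__(self, *args, **kwargs):
        self.path = Path(*args, **kwargs)

    def __getattr__(self, attr):
        # Anything PPath does not define itself is looked up on the wrapped Path.
        return getattr(self.path, attr)

    def __str__(self):
        return str(self.path)

test = PPath("dir", "test.txt")
print(test.name)    # 'test.txt', delegated to Path
print(test.suffix)  # '.txt', delegated to Path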
The following works too:
from pathlib import Path

class SystemConfigPath(type(Path())):
    def __new__(cls, **kwargs):
        path = cls._std_etc()
        return super().__new__(cls, path, **kwargs)

    @staticmethod
    def _std_etc():
        return '/etc'

name = SystemConfigPath()
name = name / 'apt'
print(name)
Printed:
/etc/apt
@staticmethod can be replaced by @classmethod

Django Rest Framework custom permissions per view

I want to create permissions in Django Rest Framework, based on view + method + user permissions.
Is there a way to achieve this without manually writing each permission, and checking the permissions of the group that the user is in?
Also, another problem I am facing is that permission objects are tied to a certain model. Since I have views that affect different models, or I want to grant different permissions on the PUT method depending on which view was accessed (because it affects different fields), I want my permissions to be tied to a certain view rather than to a certain model.
Does anyone know how this can be done?
I am looking for a solution along the lines of:
Create a Permissions object with the following parameters: View_affected, list_of_allowed_methods(GET,POST,etc.)
Create a Group object that has that permission associated
Add a user to the group
Have my default permission class take care of everything.
From where I stand now, the step giving me problems is step 1, because I see no way of tying a Permission to a View, and because Permissions ask for a model, which I do not want.
Well, the first step can be done easily with DRF. See http://www.django-rest-framework.org/api-guide/permissions#custom-permissions.
You need to do something like this:
from functools import partial

from rest_framework import permissions
from rest_framework.views import APIView

class MyPermission(permissions.BasePermission):
    def __init__(self, allowed_methods):
        super().__init__()
        self.allowed_methods = allowed_methods

    def has_permission(self, request, view):
        return request.method in self.allowed_methods

class ExampleView(APIView):
    permission_classes = (partial(MyPermission, ['GET', 'HEAD']),)
A custom permission can be created this way; more info in the official documentation (https://www.django-rest-framework.org/api-guide/permissions/):
from rest_framework.permissions import BasePermission

# Custom permission for users with "is_active" = True.
class IsActive(BasePermission):
    """
    Allows access only to "is_active" users.
    """
    def has_permission(self, request, view):
        return request.user and request.user.is_active

# Usage
from rest_framework.views import APIView
from rest_framework.response import Response

from .permissions import IsActive  # Path to our custom permission

class ExampleView(APIView):
    permission_classes = (IsActive,)

    def get(self, request, format=None):
        content = {
            'status': 'request was permitted'
        }
        return Response(content)
I took this idea and got it to work like so:
class genericPermissionCheck(permissions.BasePermission):
    def __init__(self, action, entity):
        self.action = action
        self.entity = entity

    def has_permission(self, request, view):
        print(self.action)
        print(self.entity)
        if request.user and request.user.role.access_rights.filter(action=self.action, entity=self.entity):
            print('permission granted')
            return True
        else:
            return False
I used partial in the decorator for the categories action in my viewset class, like so:

@list_route(methods=['get'], permission_classes=[partial(genericPermissionCheck, 'Fetch', 'Categories')])
def Categories(self, request):
"access_rights" maps to an array of objects with a pair of actions and object e.g. 'Edit' and 'Blog'