Handling traversal in a Zope2 product

I want to create a simple Zope2 product that implements a "virtual" folder where a part of the path is processed by my code. A URI of the form
/members/$id/view
e.g.
/members/isaacnewton/view
should be handled by code in the /members object, i.e. a method like members.view(id='isaacnewton').
The Zope TTW (through-the-web) Python scripts have traverse_subpath, but I have no idea how to do the equivalent in my product code.
I have looked at the IPublishTraverse interface and its publishTraverse() method, but it seems very generic.
Is there an easier way?

Still, the easiest way is to use the __before_publishing_traverse__ hook on the members object:
from zExceptions import Redirect

def __before_publishing_traverse__(self, object, request):
    stack = request.TraversalRequestNameStack
    if len(stack) > 1 and stack[-2] == 'view':
        try:
            request.form['member_id'] = stack.pop(-1)
            if not validate(request.form['member_id']):
                raise ValueError
        except (IndexError, ValueError):
            # missing or invalid member id; perhaps some URL hacking going on?
            raise Redirect(self.absolute_url())  # redirects to `/members`, adjust as needed
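Here validate stands for whatever id check suits your site; the answer leaves it undefined, so the following is only a placeholder sketch:

import re

def validate(member_id):
    # accept simple lowercase member ids such as 'isaacnewton'
    return bool(re.match(r'^[a-z][a-z0-9_-]{1,30}$', member_id))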
This method is called by the publisher before it traverses any further, so the publisher has already located the members object; the method is passed that object and the request. On the request you'll find the traversal stack; in your example it holds ['view', 'isaacnewton'], and the method moves 'isaacnewton' into the request under the key 'member_id' (after an optional validation).
When this method returns, the publisher uses the remaining stack to continue traversing, so it now traverses to view, which should be a browser view that expects a member_id key in the request. The view can then do its work:
from Products.Five.browser import BrowserView

class MemberView(BrowserView):
    def __call__(self):
        if 'member_id' in self.request.form:  # Huzzah, the traversal worked!
            ...

Scrapy concurrent spiders instance variables

I have a number of Scrapy spiders running and recently hit a strange bug. I have a base class and a number of subclasses:
import scrapy

class MyBaseSpider(scrapy.Spider):
    new_items = []

    def spider_closed(self):
        # Email any new items that weren't in the last run
        ...

class MySpiderImpl1(MyBaseSpider):
    def parse(self):
        # Implement site specific checks
        self.new_items.append(new_found_item)

class MySpiderImpl2(MyBaseSpider):
    def parse(self):
        # Implement site specific checks
        self.new_items.append(new_found_item)
This seems to have been running well: new items get emailed to me on a per-site basis. However, I've recently had some emails from MySpiderImpl1 that contain items from site 2.
I'm following the documentation to run from a script:
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

scraper_settings = get_project_settings()
runner = CrawlerRunner(scraper_settings)
configure_logging()
sites = get_spider_names()
for site in sites:
    runner.crawl(site.spider_name)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()
I suspect the solution here is to switch to a pipeline that collates the items for a site and emails them out when pipeline.close_spider is called, but I was surprised to see the new_items variable leaking between spiders.
Is there any documentation on concurrent runs? Is it bad practice to keep variables on a base class? I do also track other pieces of information on the spiders in variables such as the run number - should this be tracked elsewhere?
In Python, class attributes are shared by all instances and all subclasses. So MyBaseSpider.new_items is the exact same list object that MySpiderImpl1.new_items and MySpiderImpl2.new_items refer to.
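A quick standalone demonstration of that sharing (not from the question's code, just an illustration):

class Base:
    items = []                 # defined once, on the class object

class A(Base):
    pass

class B(Base):
    pass

A().items.append('from A')     # no instance attribute exists, so this mutates Base.items
print(B.items)                 # ['from A'] -- the very same list
print(A.items is B.items is Base.items)  # True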
As you suggested, you could implement a pipeline, although this might require significantly refactoring your current code. It could look something like this:
pipelines.py
class MyPipeline:
    def process_item(self, item, spider):
        if spider.name == 'site1':
            ...  # email the item
        elif spider.name == 'site2':
            ...  # do something different
I am assuming all of your spiders have a name attribute; it is in fact required by Scrapy.
Another option that probably requires less effort is to override the start_requests method in your base class and give each spider instance its own fresh list when crawling starts.
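A minimal sketch of that idea, assuming the rest of your spiders stays unchanged:

import scrapy

class MyBaseSpider(scrapy.Spider):
    def start_requests(self):
        # a per-instance list shadows the shared class attribute
        self.new_items = []
        return super().start_requests()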

Custom strategy for warden not getting called

I am trying to use a different Warden strategy to authenticate my Action Cable endpoints.
But the strategy is not getting called. I tried placing warden.authenticate!(:action_cable_auth) in a controller to test, but none of the debug statements are printed to the console.
Below are the relevant parts of the code.
config/initializers/warden.rb
Warden::Strategies.add(:action_cable_auth) do
  def valid?
    # check if it is a websocket request & for action cable?
    # Rails.logger.error request.inspect
    p 'checking if strategy is valid?'
    true
  end

  def authenticate!
    p 'unauthenticate the user'
    fail!('user not active')
  end
end
In my controller:
warden.authenticate!(:action_cable_auth)
Assuming that you are setting up your initializer in the proper place, recall that if your session is already authenticated somewhere else (for example, if the user is already authenticated by the time your action is called), then your strategy will never be invoked.
This is basically how Warden works: it runs the strategies whose valid? returns true, in order, and as soon as one of them succeeds (its authenticate! calls success!), no further strategies are called.
Also note that if you want your strategy to be checked before the others, you may need to shift it to the front of the strategy list, for example:
manager.default_strategies(scope: :user).unshift(:action_cable_auth)
Here manager is your Warden::Manager instance. The scope may be optional (this example uses the :user scope, as is common alongside Devise); inspect default_strategies on your instance to see where your strategy currently sits and where you want it.

Multiple mongoDB related to same django rest framework project

We have one Django REST framework (DRF) project which should use multiple databases (MongoDB). Each database should be independent. We are able to connect to one database, but when we write to another DB the connection is made, yet the data is stored in the DB that was connected first.
We changed the default DB and everything else we could think of, but nothing changed.
(Note: the solution should work with serializers, because we need to use DynamicDocumentSerializer from DRF-mongoengine.)
Thanks in advance.
When calling connect(), assign an alias to each of your databases, and then for each Document specify a db_alias entry in its meta that points to the desired database alias:
settings.py:
from mongoengine import connect

connect(
    alias='user-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
connect(
    alias='book-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
models.py:
from mongoengine import Document, StringField

class User(Document):
    name = StringField()
    meta = {'db_alias': 'user-db'}

class Book(Document):
    name = StringField()
    meta = {'db_alias': 'book-db'}
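With that in place each model writes to its own connection; a quick usage sketch, assuming the aliases registered above:

User(name='alice').save()  # stored via the 'user-db' connection
Book(name='dune').save()   # stored via the 'book-db' connection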
I guess I finally get what you need.
What you could do is write a really simple middleware that maps your URL schema to the database:
from mongoengine import Document
import models  # the module that defines all the mongoengine Documents in your project

class DBSwitchMiddleware:
    """
    This middleware is supposed to switch the database depending on the request URL.
    """
    def __init__(self, get_response):
        self.get_response = get_response
        # collect every Document subclass defined in models.py
        self.documents = [item for item in vars(models).values()
                          if isinstance(item, type) and issubclass(item, Document)
                          and item is not Document]

    def __call__(self, request):
        # depending on the URL, switch the documents to the appropriate database
        if request.path.startswith('/main/project1'):
            for document in self.documents:
                document._meta['db_alias'] = 'db1'
        elif request.path.startswith('/main/project2'):
            for document in self.documents:
                document._meta['db_alias'] = 'db2'
        # delegate handling the rest of the response to your views
        response = self.get_response(request)
        return response
Note that this solution might be prone to race conditions. We're modifying the Document classes globally here, so if one request has started and then, in the middle of its execution, a second request is handled by the same Python interpreter, it will overwrite the document._meta['db_alias'] setting and the first request will start writing to whichever database the second request selected, which will break your data horribly.
The same Python interpreter is used by two request handlers if you're using multithreading, so with this solution you can't start your server with multiple threads, only with multiple processes.
To address the threading issue you can use threading.local(); if you prefer a context-manager approach, there's also the contextvars module.
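For example, the per-request alias could be kept in a ContextVar and applied at query time with mongoengine's switch_db context manager instead of mutating _meta globally. This is only a rough sketch: it assumes 'db1' and 'db2' were registered with connect(alias=...), and the URL prefixes and the User document are placeholders.

import contextvars
from mongoengine.context_managers import switch_db

# holds the alias chosen for the current request only
current_db_alias = contextvars.ContextVar('current_db_alias', default='db1')

class DBAliasMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        alias = 'db2' if request.path.startswith('/main/project2') else 'db1'
        token = current_db_alias.set(alias)  # scoped to this request's context
        try:
            return self.get_response(request)
        finally:
            current_db_alias.reset(token)

# inside a view (or serializer), query against the selected database
def list_users(request):
    from models import User  # placeholder Document
    with switch_db(User, current_db_alias.get()) as UserInDB:
        names = [u.name for u in UserInDB.objects]
    ...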

How to access `request_seen()` inside Spider?

I have a Spider, and there is a situation where I want to check whether a request I am about to schedule has already been recorded by request_seen() or not.
I don't want to do the check inside a downloader/spider middleware; I just want to check inside my Spider.
Is there any way to call that method?
You should be able to access the dupe filter itself from the spider like this:
self.dupefilter = self.crawler.engine.slot.scheduler.df
Then you could use it in other places to check:
req = scrapy.Request('whatever')
if self.dupefilter.request_seen(req):
    # it's already been seen
    pass
else:
    # never saw this one coming
    pass
I did something similar with a pipeline. The following is the code that I use.
You specify an identifier on the item and then check it to decide whether the item has been seen before.
from scrapy.exceptions import DropItem

class SeenPipeline(object):
    def __init__(self):
        self.isbns_seen = set()

    def process_item(self, item, spider):
        if item['isbn'] in self.isbns_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.isbns_seen.add(item['isbn'])
            return item
Note: you can use this code within your spider, too.
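For example, the same set-based check could live directly in the spider; this is just a sketch, and the selector, field name and start URL are placeholders:

import scrapy

class BookSpider(scrapy.Spider):
    name = 'books'
    start_urls = ['https://example.com/books']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.isbns_seen = set()  # per-instance, so nothing leaks between spiders

    def parse(self, response):
        for row in response.css('.book'):
            isbn = row.css('::attr(data-isbn)').get()
            if isbn in self.isbns_seen:
                continue  # skip items already yielded in this run
            self.isbns_seen.add(isbn)
            yield {'isbn': isbn}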

Testing Dexterity content creation in isolation

For a project, I have a complex master object that contains a number of subcomponents. Setup of these objects is controlled by a Constructor interface, which I bind to various lifecycle and workflow events, like so:
@grok.subscribe(schema.ICustomFolder, lifecycleevent.IObjectAddedEvent)
def setup_custom_folder(folder, event):
    interfaces.IConstructor(folder).setup()

@grok.subscribe(schema.ICustomFolder, lifecycleevent.IObjectModifiedEvent)
def update_custom_folder(folder, event):
    interfaces.IConstructor(folder).update()
What I'd like to be able to do is test the Constructor methods without relying on the event handlers. I've tried doing this by creating objects directly to avoid the lifecycle events:
def test_custom_item_constructor(self):
    master = createContent('model.master_object',
        needed_attribute=2
    )
    folder = createContent('model.custom_folder',
        __parent__=master
    )
    self.assertEqual(0, len(folder))
    constructor = interfaces.IConstructor(folder)
    constructor.setup()
    self.assertEqual(2, len(folder))
The setup method creates a number of items inside the custom_folder instance, depending on the given attribute of the master object. However, this hangs, which I think is because neither object actually belongs to the site, so there is no acquisition of permissions. I can get around that by changing createContent on the master object to createContentInContainer and adding it to the appropriate part of the test site, but that triggers all of the lifecycle events, which end up making the Constructor calls, so I can't test them in isolation.
I've tried using mock objects for this, but that got messy dealing with the content creation that is meant to occur during the Constructor's setup.
What's the best way to approach this?
I'm not sure if this is the best way, but I managed to get the result I wanted by disabling the relevant event handlers first, and then creating the content properly within the site:
def test_custom_item_constructor(self):
    zope.component.getGlobalSiteManager().unregisterHandler(
        adapters.master.constructor.setup_masterobject,
        required=[schema.IMasterObject, lifecycleevent.IObjectAddedEvent]
    )
    zope.component.getGlobalSiteManager().unregisterHandler(
        adapters.custom.constructor.setup_customfolder,
        required=[schema.ICustomFolder, lifecycleevent.IObjectAddedEvent]
    )
    master = createContentInContainer(self.portal, 'model.master_object',
        needed_attribute=2
    )
    folder = createContentInContainer(master, 'model.custom_folder',
        __parent__=master
    )
    self.assertEqual(0, len(folder))
    constructor = interfaces.IConstructor(folder)
    constructor.setup()
    self.assertEqual(2, len(folder))
This was enough to disengage the chain of events triggered by the addition of a new master object.
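If other tests in the same layer still rely on those subscribers, one possible refinement (my addition, not part of the original answer) is to re-register each handler once the test finishes, e.g. with addCleanup:

gsm = zope.component.getGlobalSiteManager()
gsm.unregisterHandler(
    adapters.custom.constructor.setup_customfolder,
    required=[schema.ICustomFolder, lifecycleevent.IObjectAddedEvent]
)
# restore the handler after this test so later tests see the normal behaviour
self.addCleanup(
    gsm.registerHandler,
    adapters.custom.constructor.setup_customfolder,
    required=[schema.ICustomFolder, lifecycleevent.IObjectAddedEvent]
)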