Django - URL template tag breaks with subdomains

I use a custom middleware for mapping subdomains to applications' urls.py by assigning the relevant urls.py to the request.urlconf variable.
This works fine, with the exception of the {% url %} template tag.
I'm getting a NoReverseMatch and can't figure out why.
The debug page shows that the reverse function receives a value in the view_name parameter, so it should work.
This happens to every {% url %} tag in the template.
If I switch to folder-like URLs (e.g. myproject.com/sub instead of sub.myproject.com), the tags work fine.
Any ideas on why this happens and how it can be fixed are much appreciated.
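The urlconf-mapping middleware is essentially the following sketch (app names and module paths are simplified for illustration, not my exact code):
class SubdomainURLConfMiddleware:
    def process_request(self, request):
        subdomain = request.get_host().lower().split('.')[0]
        # Illustrative mapping of subdomains to each application's urls.py
        urlconf_map = {
            'blog': 'blog.urls',
            'shop': 'shop.urls',
        }
        if subdomain in urlconf_map:
            request.urlconf = urlconf_map[subdomain]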

Well, after not finding what I needed, I came to the following solution.
I discarded the request.urlconf mapping and went for rewriting request.path_info.
I'm posting my solution here, in case someone runs into this problem.
First step, add the following middleware to your project:
class SubdomainMiddleware:
    """Subdomain middleware for Django."""
    def process_request(self, request):
        # Anything left of '.example.com' (ignoring a leading 'www.') is the subdomain.
        domain_parts = request.get_host().lower().replace('www.', '').split('.example.com', 1)
        if len(domain_parts) > 1 and domain_parts[0]:
            subdomain = domain_parts[0]
        else:
            subdomain = None
        if subdomain:
            # Rewrite sub.example.com/some/path/ as /sub/some/path/ internally.
            if request.path_info[-1] != '/':
                request.path_info += '/'
            request.path_info = '/%s%s' % (subdomain, request.path_info)
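To enable it, register the class in settings. With the Django versions that still ship django.core.urlresolvers (used in the next snippet), that means MIDDLEWARE_CLASSES; the dotted path below is illustrative, point it at wherever you placed the class:
# settings.py (dotted path is illustrative)
MIDDLEWARE_CLASSES = [
    # ... Django's default middleware ...
    'myproject.middleware.SubdomainMiddleware',
]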
Next, add the following code that overrides the reverse function:
from django.conf import settings
from django.core import urlresolvers

_reverse = urlresolvers.reverse

def reverse(*args, **kwargs):
    # In case reversing a full url
    if args[0].startswith('http'):
        return args[0]
    # In case reversing a url name
    if '/' not in args[0]:
        url = _reverse(*args, **kwargs)
    else:
        # In case reversing a url path
        url = args[0]
    parts = url.strip('/').split('/', 1)
    subdomain = parts[0]
    path = parts[1] if len(parts) > 1 else ''
    protocol = 'http://' if settings.DEBUG else 'https://'
    return '%s%s%s/%s' % (protocol, subdomain, '.example.com', path)

urlresolvers.reverse = reverse
I placed it in the same file where the custom middleware lives.
That's it!
As far as I know and tested, everything works: reversing, redirecting, template {% url %} tags etc.
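For example (the view name and path are hypothetical), reversing a name that resolves under the sub prefix now returns an absolute subdomain URL:
from django.core import urlresolvers

# Hypothetical URL name defined in the urls that end up under /sub/.
url = urlresolvers.reverse('article-detail', args=[42])
# With the override above, instead of '/sub/articles/42/' this returns
# something like 'https://sub.example.com/articles/42' (http when DEBUG is True).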
Note:
I made three assumptions in this code:
The domain name is example.com. Change it as you need, place it in settings.py, whatever suits you.
The subdomain maps to a subfolder. E.g. sub.example.com will be converted to example.com/sub.
My site uses SSL everywhere, so I simply check the settings.DEBUG value in the custom reverse function to decide whether it should use http or https.
Hope it helps.

Related

How to make scrapy wait for request result before continuing to next line

In my spider, I have some code like this:
next_page_url = response.follow(
    url=self.start_urls[0][:-1] + str(page_number + 1),
    callback=self.next_page
)
if next_page_url:
next_page looks like this:
def next_page(self, response):
    next_page_count = len(<xpath I use>)
    if next_page_count > 0:
        return True
    else:
        return False
I need next_page_url to be set before I can continue the next segment of code.
This code essentially checks whether the current page is the last page, for some file-writing purposes.
The answer I ended up going with:
Instead of checking if the next page exists and then continuing with the current request, I made the request to the next page, checked whether I got a response with content, and if I didn't, I treated the previous page as the final page.
I did this by using the meta keyword argument of Scrapy's Request (via response.follow()) to pass the current page's tracking data into the new request.
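A rough sketch of that approach (the XPath, logging, and meta key here are placeholders, not the original spider's code):
def parse(self, response):
    page_number = response.meta.get('page_number', 1)
    rows = response.xpath('//table//tr')  # placeholder XPath for the page's content
    if not rows:
        # Nothing came back for this page, so the previous page was the final
        # one; do the file-writing/bookkeeping for the last page here.
        self.logger.info('Page %s was empty; page %s was the last page',
                         page_number, page_number - 1)
        return
    # ... process rows / yield items for the current page ...
    # Request the next page and carry the tracking data along via meta.
    yield response.follow(
        self.start_urls[0][:-1] + str(page_number + 1),
        callback=self.parse,
        meta={'page_number': page_number + 1},
    )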

Spider not recursively calling itself after setting callback

The goal of my project is to search a website for a company phone number.
I'm attempting to parse a webpage and regex for a phone number (I have that part working), and then look for links on the page. I want to call the same function recursively on those links and repeat. However, it only runs the function once. See the code below:
def parse(self, response):
    # The main method of the spider. It scrapes the URL(s) specified in the
    # 'start_url' argument above. The content of the scraped URL is passed on
    # as the 'response' object.
    hxs = HtmlXPathSelector(response)
    #print(phone_detail)
    print('here')
    for phone_num in response.xpath('//body').re(r'\d{3}.\d{3}.\d{4}'):
        item = PhoneNumItem()
        item['label'] = "a"
        item['phone_num'] = phone_num
        yield item
    for url in hxs.xpath('//a/@href').extract():
        # This loops through all the URLs found.
        # Constructs an absolute URL by combining the response's URL with a possible relative URL:
        next_page = response.urljoin(url)
        print("Found URL: " + next_page)
        #yield response.follow(next_page, self.parse_page)
        yield scrapy.Request(next_page, callback=self.parse)
Please let me know what you think; to me it seems like this code should work, but it does not.

Django 2.0 - Rendering wrong template (with no error)

I'm trying to load a template, visit_form.html, which is a DetailView with a form inside it. Each time I click a link from main.html, the wrong template gets loaded: main_detail.html. I have cleared the browser cache and invalidated caches.
The goal is to have MainVisitDisplay render visit_form.html, but all I get is main_detail.html. If I change the location of the main_detail.html template, it throws a "TemplateDoesNotExist" error looking for main_detail.html, so that is clearly the template being rendered.
My MWE is:
urls.py
from django.conf.urls import url
from . import views
from django.urls import path

urlpatterns = [
    path('', views.index, name='index'),
    path('main/', views.MainListView.as_view(), name='main'),
    path('main/<int:pk>/', views.MainDetailView.as_view(), name='main_detail'),
    path('visit/add/<int:pk>/', views.MainVisitDisplay.as_view(), name='visit_form'),
]
views.py
class MainVisitDisplay(DetailView):
    model = Main
    template = "visit_form.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['form'] = VisitForm()
        return context

class MainDetailView(generic.DetailView):
    template_name = "clincher/main_detail.html"
    model = Main
main.html (template) URL tag:
{% url 'clincher:visit_form' main.id %}
This was really simple: use template_name = "template_name.html", NOT template = "template_name.html". Not sure why it kept rendering the other template (presumably DetailView fell back to its default <app>/<model>_detail.html name, i.e. clincher/main_detail.html). Also, apparently, Django 2.0 does not cache templates, but feel free to confirm or deny this.
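In other words, the fixed view only changes the attribute name:
class MainVisitDisplay(DetailView):
    model = Main
    template_name = "visit_form.html"  # template_name, not template

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['form'] = VisitForm()
        return context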

Get url in body_html odoo 9

Is it possible to get the web page URL in an XML template?
<field name="body_html">
    <![CDATA[
        <p>Get url here</p>
    ]]>
</field>
Note: ${object.id} returns the id, e.g. 10.
In my opinion, QWeb falls short when it comes to doing things that would otherwise be simple in Python, such as accessing env to browse, search, or fetch other data like the dbname or configuration parameters like base_url.
What I've done in the past is just create a helper to have Python do the dirty work for you so that you can keep QWeb simple.
your_module/helpers/mixins.py
class CanGenerateUrl:
    def generate_url(self):
        """
        Build the URL to the record's form view.
        - Base URL + Database Name + Record ID + Model Name
        :param self: any Odoo record browse object (with access to env, _cr, and _model)
        :return: string with url
        """
        self.ensure_one()
        base_url = self.env['ir.config_parameter'].get_param('web.base.url')
        if base_url and base_url[-1:] != '/':
            base_url += '/'
        db = self._cr.dbname
        return "{}web?db={}#id={}&view_type=form&model={}".format(base_url, db, self.id, self._model)
your_module/models/model.py
from openerp import models
from openerp.addons.your_module.helpers.mixins import CanGenerateUrl

class YourModel(models.Model, CanGenerateUrl):
your_module/views/report.xml
<p><a href="${object.generate_url()}">${object.name or 'None'}</a></p>
See the Reports and QWeb documentation for more details.

Scrapy Running Results

Just getting started with Scrapy, I'm hoping for a nudge in the right direction.
I want to scrape data from here:
https://www.sportstats.ca/display-results.xhtml?raceid=29360
This is what I have so far:
import scrapy
import re

class BlogSpider(scrapy.Spider):
    name = 'sportstats'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def parse(self, response):
        headings = []
        results = []
        tables = response.xpath('//table')
        headings = list(tables[0].xpath('thead/tr/th/span/span/text()').extract())
        rows = tables[0].xpath('tbody/tr[contains(@class, "ui-widget-content ui-datatable")]')
        for row in rows:
            result = []
            tds = row.xpath('td')
            for td in enumerate(tds):
                if headings[td[0]].lower() == 'comp.':
                    content = None
                elif headings[td[0]].lower() == 'view':
                    content = None
                elif headings[td[0]].lower() == 'name':
                    content = td[1].xpath('span/a/text()').extract()[0]
                else:
                    try:
                        content = td[1].xpath('span/text()').extract()[0]
                    except:
                        content = None
                result.append(content)
            results.append(result)
        for result in results:
            print(result)
Now I need to move on to the next page, which I can do in a browser by clicking the "right arrow" at the bottom, which I believe is the following li:
<li><a id="mainForm:j_idt369" href="#" class="ui-commandlink ui-widget fa fa-angle-right" onclick="PrimeFaces.ab({s:"mainForm:j_idt369",p:"mainForm",u:"mainForm:result_table mainForm:pageNav mainForm:eventAthleteDetailsDialog",onco:function(xhr,status,args){hideDetails('athlete-popup');showDetails('event-popup');scrollToTopOfElement('mainForm\\:result_table');;}});return false;"></a>
How can I get scrapy to follow that?
If you open the URL in a browser with JavaScript disabled, you won't be able to move to the next page. As you can see inside the li tag, some JavaScript has to be executed in order to get the next page.
To get around this, the first option is usually to try to identify the request generated by the JavaScript. In your case it should be easy: just analyze the JavaScript code and replicate it with Python in your spider. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually to use a package with JavaScript/browser emulation, something like ScrapyJS or Scrapy + Selenium.
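For the browser-emulation route, a minimal sketch with scrapy-splash (assuming a Splash instance is running and the scrapy-splash settings from its README, such as SPLASH_URL and the downloader middlewares, are configured) might look like this:
import scrapy
from scrapy_splash import SplashRequest

class SportstatsSplashSpider(scrapy.Spider):
    name = 'sportstats_splash'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def start_requests(self):
        for url in self.start_urls:
            # Let Splash render the page's JavaScript before parse() sees the response.
            yield SplashRequest(url, callback=self.parse, args={'wait': 2})

    def parse(self, response):
        pass  # same table-scraping logic as in the question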
You're going to need to perform a callback. Generate the URL from the XPath of the 'next page' button, so url = response.xpath(<xpath to next_page_button>), and then, when you're finished scraping the current page, do yield scrapy.Request(url, callback=self.parse_next_page). Finally, create a new method, def parse_next_page(self, response):, to handle it (see the sketch at the end of this answer).
A final note: if the content happens to be rendered by JavaScript (and you can't scrape it even though you're sure you're using the correct XPath), check out my repo on using Splash with Scrapy: https://github.com/Liamhanninen/Scrape
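Put together, the callback pattern described above looks roughly like this (the next-page XPath is a placeholder; as noted, the real link on this site is driven by JavaScript, so the Splash/Selenium route may be needed instead):
def parse(self, response):
    # ... scrape the current results table here ...

    # Placeholder XPath for the "next page" link; adjust it for the real markup.
    next_href = response.xpath('//a[contains(@class, "fa-angle-right")]/@href').extract_first()
    if next_href:
        url = response.urljoin(next_href)
        yield scrapy.Request(url, callback=self.parse_next_page)

def parse_next_page(self, response):
    # Handle the following page of results here.
    pass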