I'm trying to load a template visit_form.html which is a DetailView with a form within it. Each time I click on a link from main.html the wrong template gets loaded -> main_detail.html. I have cleared browser cache, invalidated caches.
The goal is to have the MainVisitDisplay render the visit_form.html, but all I get is the main_detail.html. It throws an error for main_detail.html when I change the location of the main_detail.html template, and throws a "TemplateDoesNotExist" error, looking for the main_detail.html template.
My MWE is:
urls.py
from django.conf.urls import url
from . import views
from django.urls import path
urlpatterns = [
path('', views.index, name='index'),
path('main/', views.MainListView.as_view(), name='main'),
path('main/<int:pk>/', views.MainDetailView.as_view(), name='main_detail'),
path('visit/add/<int:pk>/', views.MainVisitDisplay.as_view(), name='visit_form'),
]
views.py
class MainVisitDisplay(DetailView):
model = Main
template = "visit_form.html"
def get_context_data(self, **kwargs):
context = super().get_context_data(**kwargs)
context['form'] = VisitForm()
return context
class MainDetailView(generic.DetailView):
template_name = "clincher/main_detail.html"
model = Main
main.html (template) url
{% url 'clincher:visit_form' main.id %}
This was a really simple. use template_name = "template_name.html" NOT template = "template_name.html. Not sure why it kept rendering the other templates. Also, apparently, Django 2.0 does not cache templates, but feel free to confirm or deny this.
Related
I've created a new XBLock and trying to pass data from python file to HTML file, but the standard template syntax is throwing error, HTML file code below
<p>EdlyCarouselXBlock: count is now
<span class='count'>{self.count}</span> (click me to increment).
{{ self.count }}
{% if self.count%}
<p>Simple If</p>
{%endif%}
</p>
<p id="decrement">(click me to Decrement).</p>
</div>
It is supporting a single bracket to access values {self.name}, for loops & if-else it throws an error. Tried using python 3.6.5 & 3.8.0, the same issue on both versions, any help would be appreciated, same issue in the lms/cms as well.
Solve the issue, posting the solution for future references.
Default xblock is created without using Django templating so we need to configure it manually.
Following default student_view function is added
def student_view(self, context=None):
"""
The primary view of the SampleXBlock, shown to students
when viewing courses.
"""
html = self.resource_string("static/html/sample.html")
frag = Fragment(html.format(self=self))
frag.add_css(self.resource_string("static/css/sample.css"))
frag.add_javascript(self.resource_string("static/js/src/sample.js"))
frag.initialize_js('SampleXBlock')
return frag
so in order to involve Django template rendering make the following changes.
include template and context
from django.template import Context, Template
# Add this method in your xblock class
def render_template(self, template_path, context={}):
template_str = self.resource_string(template_path)
template = Template(template_str)
return template.render(Context(context))
def student_view(self, context=None):
"""
The primary view of the SampleXBlock, shown to students
when viewing courses.
"""
frag = Fragment()
html = self.render_template("static/html/sample.html", {'count': self.count})
frag.add_content(html)
frag.add_css(self.resource_string("static/css/sample.css"))
frag.add_javascript(self.resource_string("static/js/src/sample.js"))
frag.initialize_js('SampleXBlock')
return frag
I have a template in my django application and I need to get it rendered in a variable or save it in an html file.
My goal is to convert the html rendering of the template to pdf, I am using pdfkit since it is the best html to pdf converter I have seen, reportlab does not do what I want.
When I try to do something like this:
pdf = pdfkit.from_file ('app / templates / app / table.html', 'table.pdf')
I get the pdf but print something like this:
enter image description here
I appreciate any help!
This is the solution to my case that I use django 2.0.1 and pdfkit 0.6.1:
To obtain the template:
template = get_template ('plapp / person_list.html')
To render it with the data:
html = template.render ({'persons': persons})
To continuation the definition of the method in views.py, the one that downloads the pdf directly in the browser:
def pdf(request):
persons = Person.objects.all()
template = get_template('plapp/person_list.html')
html = template.render({'persons': persons})
options = {
'page-size': 'Letter',
'encoding': "UTF-8",
}
pdf = pdfkit.from_string(html, False, options)
response = HttpResponse(pdf, content_type='application/pdf')
response['Content-Disposition'] = 'attachment;
filename="pperson_list_pdf.pdf"'
return response
from django.template.loader import get_template, render_to_string
Use the above to import functions that return the template. get_template returns the template object while render_to_string returns the string of a rendered template. Here's how I do it using weasyprint not pdfkit though.
def weasy_pdf_generation(request, id):
# my data
_, _, draft_details = get_draft_details('setup', request, id)
radios_dict = {k:v[1] for k,v in draft_details.items()}
# rendering to string
html_template = render_to_string('tax/setupreview report.html', radios_dict)
styles = CSS(url="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css")
pdf_file = HTML(string=html_template).write_pdf(stylesheets=[styles])
#response details
response = HttpResponse(pdf_file, content_type='application/pdf')
response['Content-Disposition'] = 'filename="home_page.pdf"'
return response
my question is similar to this post:
How to use scrapy for Amazon.com links after "Next" Button?
I want my crawler to traverse through all "Next" links. I've searched a lot, but most people ether focus on how to parse the ULR or simply put all URL's in the initial URL list.
So far, I am able to visit the first page and parse the next page's link. But I don't know how to visit that page using the same crawler(spider). I tried to append the new URL into my URL list, it does appended (I checked the length), but later it doesn't visit the link. I have no idea why...
Note that in my case, I only know the first page's URL. Second page's URL can only be obtained after visiting the first page. The same, (i+1)'th page's URL is hidden in the i'th page.
In the parse function, I can parse and print the correct next page link URL. I just don't know how to visit it.
Please help me. Thank you!
import scrapy
from bs4 import BeautifulSoup
class RedditSpider(scrapy.Spider):
name = "test2"
allowed_domains = ["http://www.reddit.com"]
urls = ["https://www.reddit.com/r/LifeProTips/search?q=timestamp%3A1427232122..1437773560&sort=new&restrict_sr=on&syntax=cloudsearch"]
def start_requests(self):
for url in self.urls:
yield scrapy.Request(url, self.parse, meta={
'splash': {
'endpoint': 'render.html',
'args': { 'wait': 0.5 }
}
})
`
def parse(self, response):
page = response.url[-10:]
print(page)
filename = 'reddit-%s.html' % page
#parse html for next link
soup = BeautifulSoup(response.body, 'html.parser')
mydivs = soup.findAll("a", { "rel" : "nofollow next" })
link = mydivs[0]['href']
print(link)
self.urls.append(link)
with open(filename, 'wb') as f:
f.write(response.body)
Update
Thanks to Kaushik's answer, I figured out how to make it work. Though I still don't know why my original idea of appending new URL's doesn't work...
The updated code is as follow:
import scrapy
from bs4 import BeautifulSoup
class RedditSpider(scrapy.Spider):
name = "test2"
urls = ["https://www.reddit.com/r/LifeProTips/search?q=timestamp%3A1427232122..1437773560&sort=new&restrict_sr=on&syntax=cloudsearch"]
def start_requests(self):
for url in self.urls:
yield scrapy.Request(url, self.parse, meta={
'splash': {
'endpoint': 'render.html',
'args': { 'wait': 0.5 }
}
})
def parse(self, response):
page = response.url[-10:]
print(page)
filename = 'reddit-%s.html' % page
with open(filename, 'wb') as f:
f.write(response.body)
#parse html for next link
soup = BeautifulSoup(response.body, 'html.parser')
mydivs = soup.findAll("a", { "rel" : "nofollow next" })
if len(mydivs) != 0:
link = mydivs[0]['href']
print(link)
#yield response.follow(link, callback=self.parse)
yield scrapy.Request(link, callback=self.parse)
What you require is explained very well in the Scrapy docs . I don't think you would need any other explanation other than that. Suggest going through it once for better understanding.
A brief explanation first though:
To follow a link to the next page, Scrapy provides many methods. The most basic methods is using the http.Request method
Request object :
class scrapy.http.Request(url[, callback,
method='GET', headers, body, cookies, meta, encoding='utf-8',
priority=0, dont_filter=False, errback, flags])
>>> yield scrapy.Request(url, callback=self.next_parse)
url (string) – the URL of this request
callback (callable) – the function that will be called with the response of this request (once its downloaded) as its first parameter.
For convenience though, Scrapy has inbuilt shortcut for creating Request objects by using response.follow where the url can be an absolute path or a relative path.
follow(url, callback=None, method='GET', headers=None, body=None,
cookies=None, meta=None, encoding=None, priority=0, dont_filter=False,
errback=None)
>>> yield response.follow(url, callback=self.next_parse)
In case if you have to move through to the next link by passing values to a form or any other type of input field, you can use the Form Request objects. The FormRequest class extends the base Request with functionality
for dealing with HTML forms. It uses lxml.html forms to pre-populate
form fields with form data from Response objects.
Form Request object
from_response(response[, formname=None,
formid=None, formnumber=0, formdata=None, formxpath=None,
formcss=None, clickdata=None, dont_click=False, ...])
If you want to simulate a HTML Form POST in your spider and send a couple of key-value fields, you can return a FormRequest object (from your spider) like this:
return [FormRequest(url="http://www.example.com/post/action",
formdata={'name': 'John Doe', 'age': '27'},
callback=self.after_post)]
Note : If a Request doesn’t specify a callback, the spider’s parse() method will be used. If exceptions are raised during processing, errback is called instead.
I am newbie to scrapy. I am trying to download an image from here. I was following Official-Doc and this article.
My settings.py looks like:
BOT_NAME = 'shopclues'
SPIDER_MODULES = ['shopclues.spiders']
NEWSPIDER_MODULE = 'shopclues.spiders'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
'scrapy.contrib.pipeline.images.ImagesPipeline':1
}
IMAGES_STORE="home/pr.singh/Projects"
and items.py looks like:
import scrapy
from scrapy.item import Item
class ShopcluesItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
pass
class ImgData(Item):
image_urls=scrapy.Field()
images=scrapy.Field()
I think both these files are good. But I am unable to write correct spider for getting the image. I am able to grab the image URL but don't know how to store image using imagePipeline.
My spider looks like:
from shopclues.items import ImgData
import scrapy
import datetime
class DownloadFirstImg(scrapy.Spider):
name="DownloadfirstImg"
start_urls=[
'http://www.shopclues.com/canon-powershot-sx410-is-2.html',
]
def parse (self, response):
url= response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870")
yield scrapy.Request(url.xpath('#href').extract(),self.parse_page)
def parse_page(self,response):
imgURl=response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870::attr(href)").extract()
yield {
ImgData(image_urls=[imgURl])
}
I have written the spider following this-article. But I am not getting anything. I run my spider as scrapy crawl DownloadfirstImg -o img5.json
but I am not getting any json nor any image? Any help on How to grab image if it's url is known. I have never worked with python also so things seem much complicated to me. Links to any good tutorial may help. TIA
I don't understand why you yield a request for an image you just need to save it on the item and the images pipeline will do the rest, this is all you need.
def parse (self, response):
url= response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870")
yield ImgData(image_urls=[url.xpath('#href').extract_first()])
I'm trying to scrape content from listing detail page that can only be viewed by clicking the 'view' button which triggers a form submit . I am new to both Python and Scrapy
Example markup
<li><h3>Abc Widgets</h3>
<form action="/viewlisting?id=123" method="post">
<input type="image" src="/images/view.png" value="submit" >
</form>
</li>
My solution in Scrapy is to extract form actions then use Request to return the page with a callback to parse it for for the desired content. However I have hit a few issues
I'm getting the following error "request url must be str or unicode"
secondly when I hardcode a URL to overcome the above issue it seems my parsing function is returning what looks like a list
Here is my code - with reactions of the real URLs
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from wfi2.items import Wfi2Item
class ProfileSpider(Spider):
name = "profiles"
allowed_domains = ["wfi.com.au"]
start_urls = ["http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=WA",
"http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=VIC",
"http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=QLD",
"http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NSW",
"http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=TAS"
"http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NT"
]
def parse(self, response):
hxs = Selector(response)
forms = hxs.xpath('//*[#id="area-managers"]//*/form')
for form in forms:
action = form.xpath('#action').extract()
print "ACTION: ", action
#request = Request(url=action,callback=self.parse_profile)
request = Request(url=action,callback=self.parse_profile)
yield request
def parse_profile(self, response):
hxs = Selector(response)
profile = hxs.xpath('//*[#class="contentContainer"]/*/text()')
print "PROFILE", profile
I'm getting the following error "request url must be str or unicode"
Please have a look at the scrapy documentation for extract(). It says : "Serialize and return the matched nodes as a list of unicode strings" (bold added by me).
The first element of the list is probably what you want. So you could do something like:
request = Request(url=response.urljoin(action[0]), callback=self.parse_profile)
secondly when I hardcode a URL to overcome the above issue it seems my
parsing function is returning what looks like a list
According to the documentation of xpath it's a SelectorList. Add extract() to the xpath and you'll get a list with the text tokens. Eventually you want to clean up and join the elements that list before further processing.