How to get the rendered template from Django? - pdfkit - django-templates

I have a template in my Django application and I need to get it rendered into a variable or saved in an HTML file.
My goal is to convert the rendered HTML of the template to PDF. I am using pdfkit since it is the best HTML-to-PDF converter I have seen; reportlab does not do what I want.
When I try to do something like this:
pdf = pdfkit.from_file('app/templates/app/table.html', 'table.pdf')
I get the PDF, but it is not rendered the way I expect.
I appreciate any help!

This is the solution for my case, using Django 2.0.1 and pdfkit 0.6.1:
To obtain the template:
template = get_template('plapp/person_list.html')
To render it with the data:
html = template.render({'persons': persons})
Putting it together, this is the view in views.py that sends the PDF directly to the browser as a download:
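import pdfkit
from django.http import HttpResponse
from django.template.loader import get_template

from .models import Person

def pdf(request):
    persons = Person.objects.all()
    template = get_template('plapp/person_list.html')
    # render() returns the final HTML as a string
    html = template.render({'persons': persons})
    options = {
        'page-size': 'Letter',
        'encoding': "UTF-8",
    }
    # Passing False as the output path makes pdfkit return the PDF bytes
    pdf = pdfkit.from_string(html, False, options)
    response = HttpResponse(pdf, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="person_list_pdf.pdf"'
    return response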

from django.template.loader import get_template, render_to_string
Use the imports above to load templates: get_template returns a template object (call its render() method to get the HTML), while render_to_string loads and renders a template in one step and returns the resulting string. Here's how I do it, though using WeasyPrint rather than pdfkit:
from django.http import HttpResponse
from django.template.loader import render_to_string
from weasyprint import CSS, HTML

def weasy_pdf_generation(request, id):
    # my data (get_draft_details is a helper from my own project)
    _, _, draft_details = get_draft_details('setup', request, id)
    radios_dict = {k: v[1] for k, v in draft_details.items()}
    # render the template to a string
    html_template = render_to_string('tax/setupreview report.html', radios_dict)
    styles = CSS(url="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css")
    pdf_file = HTML(string=html_template).write_pdf(stylesheets=[styles])
    # response details
    response = HttpResponse(pdf_file, content_type='application/pdf')
    response['Content-Disposition'] = 'filename="home_page.pdf"'
    return response

Related

Splash return embedded response

I am looking to return an embedded response from a website. This website makes it very difficult to reach this embedded response without JavaScript, so I am hoping to use Splash. I am not interested in returning the rendered HTML, but rather one specific embedded response.
This response returns a JSON object that the site then renders. I would like the raw JSON returned by this response. How do I do this in Lua?
It turns out this is a bit tricky. The following is the kludge I have found to do this:
Splash call with a Lua script, made from Scrapy:
from scrapy_splash import SplashRequest

scriptBusinessUnits = """
function main(splash, args)
    splash.request_body_enabled = true
    splash.response_body_enabled = true
    assert(splash:go(args.url))
    assert(splash:wait(18))
    splash:runjs('document.getElementById("RESP_INQA_WK_BUSINESS_UNIT$prompt").click();')
    assert(splash:wait(20))
    return {
        har = splash:har(),
    }
end
"""

# Spider method (start_requests() shown as an example)
def start_requests(self):
    yield SplashRequest(
        url=self.start_urls[0],
        callback=self.parse,
        endpoint='execute',
        magic_response=True,
        meta={'handle_httpstatus_all': True},
        args={'lua_source': scriptBusinessUnits, 'timeout': 90, 'images': 0},
    )
This script works by returning the HAR of the whole page load. It is essential to set splash.request_body_enabled = true and splash.response_body_enabled = true so that the actual response content is included in the HAR.
The HAR file is just a glorified JSON object with a different name... so:
import base64
import json

def parse(self, response):
    harData = json.loads(response.text)
    responseData = harData['har']['log']['entries']
    ...
    # Splash appears to base64 encode large content fields,
    # you may have to decode the field to load it properly
    bisData = base64.b64decode(bisData['content']['text'])
From there you can search the JSON object for the exact embedded response.
I really don't think this is a very efficient method, but it works.
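The "search the JSON object" step can be sketched as a small helper. This is a hypothetical function, not part of scrapy-splash: it walks the HAR entries, picks the first one whose request URL contains a fragment you choose, and decodes the body only when the entry carries the HAR 1.2 "encoding": "base64" flag. If your Splash output lacks that flag, the unconditional b64decode from the answer may still be needed. The sample HAR and URLs below are made up for illustration.

```python
import base64
import json


def find_har_body(har, url_fragment):
    """Return the body of the first HAR entry whose request URL contains
    url_fragment, decoding base64 when the entry says so; None if no match."""
    for entry in har["log"]["entries"]:
        if url_fragment not in entry["request"]["url"]:
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            return None
        # HAR 1.2 flags base64-encoded bodies with "encoding": "base64"
        if content.get("encoding") == "base64":
            return base64.b64decode(text).decode("utf-8")
        return text
    return None


# Hypothetical, hand-made HAR shaped like the "har" value returned by splash:har()
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/page"},
             "response": {"content": {"text": "<html></html>"}}},
            {"request": {"url": "https://example.com/api/units?page=1"},
             "response": {"content": {
                 "text": base64.b64encode(b'{"units": ["A", "B"]}').decode("ascii"),
                 "encoding": "base64"}}},
        ]
    }
}

data = json.loads(find_har_body(sample_har, "/api/units"))
print(data["units"])
```

In the spider this would be called as find_har_body(json.loads(response.text)["har"], "<part of the embedded request URL>").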

Access data image URL when the data URL is only obtained upon rendering

I would like to automatically save the images the browser holds after the page renders, using their corresponding data URLs.
For example:
1. Go to the webpage https://en.wikipedia.org/wiki/Truck.
2. Using the Firefox Web Inspector, pick the first thumbnail image on the right.
3. On the Inspector tab, right-click the img tag, go to Copy and choose "Image Data-URL".
4. Open a new tab, paste it and press Enter to see the image from the data URL.
Notice that the data URL is not available in the page source. On the website I want to scrape, the images are rendered after passing through a PHP script, and the server returns a 404 response if I try to access the images directly through the URL in the src attribute.
I believe it should be possible to list the data URLs of the images rendered by the website and download them, but I was unable to find a way to do it.
I normally scrape using Selenium WebDriver with Firefox, coded in Python, but any solution would be welcome.
I managed to work out a solution using the Chrome webdriver with web security (and therefore CORS checks) disabled; with Firefox I could not find a CLI argument to disable it. Without this, the browser treats a canvas that contains a cross-origin image as tainted and refuses to export it.
The solution executes some JavaScript that redraws the image on a new canvas element and then uses the toDataURL method to get the data URL. To save the image, I convert the base64 data to binary data and save it as a PNG.
This appears to solve the issue in my use case.
Code to get the first truck image:
from binascii import a2b_base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
# Disabling web security keeps the canvas from being "tainted" by the
# cross-origin image, so toDataURL() is allowed
chrome_options.add_argument("--disable-web-security")
chrome_options.add_argument("--disable-site-isolation-trials")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://en.wikipedia.org/wiki/Truck")
# find_element_by_xpath() is gone in newer Selenium 4 releases;
# find_element(By.XPATH, ...) works in both Selenium 3 and 4
img = driver.find_element(By.XPATH, "/html/body/div[3]/div[3]"
                                    "/div[5]/div[1]/div[4]/div"
                                    "/a/img")
img_base64 = driver.execute_script(
    """
    const img = arguments[0];
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    canvas.width = img.width;
    canvas.height = img.height;
    ctx.drawImage(img, 0, 0);
    return canvas.toDataURL('image/png');
    """,
    img)
# Drop the "data:image/png;base64," prefix and decode the rest
binary_data = a2b_base64(img_base64.split(',')[1])
with open('image.png', 'wb') as save_img:
    save_img.write(binary_data)
driver.quit()
Also, I found that the data URL you get with the procedure described in my question is generated by the Firefox Web Inspector on request, so, contrary to what I first thought, it is not possible to get a list of data URLs that are not in the page source.
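The answer's img_base64.split(',')[1] followed by a2b_base64 assumes the data URL is always base64-encoded. A slightly more defensive sketch, written here as a hypothetical helper rather than anything from Selenium, parses the header per RFC 2397 and also handles percent-encoded (non-base64) data URLs, returning the MIME type alongside the bytes:

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(data_url):
    """Split a data URL into (mime_type, payload_bytes)."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is the header, e.g. "image/png;base64"
    header, _, payload = data_url[len("data:"):].partition(",")
    params = header.split(";")
    mime_type = params[0] or "text/plain"  # RFC 2397 default media type
    if "base64" in params[1:]:
        return mime_type, base64.b64decode(payload)
    # Without ";base64" the payload is percent-encoded
    return mime_type, unquote_to_bytes(payload)


mime, data = decode_data_url("data:image/png;base64,iVBORw0KGgo=")
print(mime, data)
```

With the Selenium code, mime, binary_data = decode_data_url(img_base64) would replace the split-and-decode line, and mime could be used to pick the file extension.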
When the image URLs are present in the page source, as they are on this Wikipedia page, BeautifulSoup is a better fit than Selenium because it does not have to launch a browser, so it is faster. BeautifulSoup took around 10 seconds to complete this task, whereas Selenium would take approximately 15-20 seconds for the same task. Here is how you do it using BeautifulSoup:
from bs4 import BeautifulSoup
import requests
import time

st = time.time()
src = requests.get('https://en.wikipedia.org/wiki/Truck').text
soup = BeautifulSoup(src, 'html.parser')
divs = soup.find_all('div', class_="thumbinner")
count = 1
for x in divs:
    # keep the last (2x) candidate from the srcset attribute
    url = x.a.img['srcset']
    url = url.split('1.5x,')[-1]
    url = url.split('2x')[0]
    url = "https:" + url
    url = url.replace(" ", "")
    path = f"D:\\Truck_Img_{count}.png"
    response = requests.get(url)
    with open(path, "wb") as file:
        file.write(response.content)
    count += 1
print(f"Execution Time = {time.time()-st} seconds")
Output:
Execution Time = 9.65831208229065 seconds
29 images were downloaded.
Hope that this helps!
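The srcset handling above relies on the exact substrings "1.5x," and "2x" appearing in the attribute. A sketch of a more general approach, as a hypothetical helper: parse each comma-separated candidate, read its density ("2x") or width ("440w") descriptor, and keep the largest. It assumes URLs contain no commas and that a srcset does not mix x and w descriptors; the sample URLs are made up.

```python
def best_srcset_url(srcset, scheme="https:"):
    """Return the srcset candidate with the largest descriptor value
    ("2x" beats "1.5x", "1080w" beats "480w")."""
    best_url, best_value = None, -1.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else "1x"  # no descriptor means 1x
        value = float(descriptor[:-1])  # drop the trailing "x" or "w"
        if value > best_value:
            best_url, best_value = url, value
    # Wikipedia serves protocol-relative URLs such as //upload.wikimedia.org/...
    if best_url and best_url.startswith("//"):
        best_url = scheme + best_url
    return best_url


srcset = ("//upload.wikimedia.org/a/330px-Truck.jpg 1.5x, "
          "//upload.wikimedia.org/a/440px-Truck.jpg 2x")
print(best_srcset_url(srcset))
```

In the loop above, url = best_srcset_url(x.a.img['srcset']) would replace the five string-slicing lines.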

Django 2.0 - Rendering wrong template (with no error)

I'm trying to load a template, visit_form.html, which is rendered by a DetailView with a form within it. Each time I click on a link from main.html, the wrong template, main_detail.html, gets loaded. I have cleared the browser cache and invalidated caches.
The goal is to have MainVisitDisplay render visit_form.html, but all I get is main_detail.html. When I move the main_detail.html template elsewhere, Django raises a "TemplateDoesNotExist" error, still looking for main_detail.html.
My MWE is:
urls.py
from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('main/', views.MainListView.as_view(), name='main'),
    path('main/<int:pk>/', views.MainDetailView.as_view(), name='main_detail'),
    path('visit/add/<int:pk>/', views.MainVisitDisplay.as_view(), name='visit_form'),
]
views.py
from django.views import generic
from django.views.generic import DetailView
# (imports of the Main model and VisitForm omitted)

class MainVisitDisplay(DetailView):
    model = Main
    template = "visit_form.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['form'] = VisitForm()
        return context

class MainDetailView(generic.DetailView):
    template_name = "clincher/main_detail.html"
    model = Main
The link in the main.html template:
{% url 'clincher:visit_form' main.id %}
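This was really simple: use template_name = "template_name.html", NOT template = "template_name.html". DetailView never looks at an attribute called template, so with no template_name set it falls back to its default name, <app_label>/<model_name>_detail.html, which for the Main model is clincher/main_detail.html. As for caching: in Django 2.0 the cached template loader is enabled automatically only when DEBUG is False and no loaders are configured explicitly, so templates are not cached during development with DEBUG = True.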
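To make the fallback concrete, here is a simplified imitation of the template lookup a DetailView performs. It is a hypothetical sketch, not Django's source, and it ignores template_name_field; the app label "clincher" is an assumption taken from the clincher/main_detail.html path in the question.

```python
def detail_view_template_names(view_attrs, app_label, model_name,
                               suffix="_detail"):
    """Simplified imitation (not Django's source) of how a DetailView
    chooses its template when no template_name_field is involved."""
    if view_attrs.get("template_name"):
        return [view_attrs["template_name"]]
    # Any other attribute, such as "template", is never consulted, so the
    # default "<app_label>/<model_name><suffix>.html" is used instead
    return [f"{app_label}/{model_name}{suffix}.html"]


buggy = {"model": "Main", "template": "visit_form.html"}
fixed = {"model": "Main", "template_name": "visit_form.html"}
print(detail_view_template_names(buggy, "clincher", "main"))
print(detail_view_template_names(fixed, "clincher", "main"))
```

The first call yields clincher/main_detail.html, which is exactly the template the question kept seeing.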

Scrapy Running Results

I'm just getting started with Scrapy and am hoping for a nudge in the right direction.
I want to scrape data from here:
https://www.sportstats.ca/display-results.xhtml?raceid=29360
This is what I have so far:
import scrapy


class BlogSpider(scrapy.Spider):
    name = 'sportstats'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def parse(self, response):
        results = []
        tables = response.xpath('//table')
        headings = tables[0].xpath('thead/tr/th/span/span/text()').extract()
        # XPath attribute tests use @, not #
        rows = tables[0].xpath('tbody/tr[contains(@class, "ui-widget-content ui-datatable")]')
        for row in rows:
            result = []
            for i, td in enumerate(row.xpath('td')):
                heading = headings[i].lower()
                if heading in ('comp.', 'view'):
                    content = None
                elif heading == 'name':
                    content = td.xpath('span/a/text()').extract_first()
                else:
                    # extract_first() returns None instead of raising IndexError
                    content = td.xpath('span/text()').extract_first()
                result.append(content)
            results.append(result)
        for result in results:
            print(result)
Now I need to move on to the next page, which I can do in a browser by clicking the "right arrow" at the bottom, which I believe is the following li:
<li><a id="mainForm:j_idt369" href="#" class="ui-commandlink ui-widget fa fa-angle-right" onclick="PrimeFaces.ab({s:"mainForm:j_idt369",p:"mainForm",u:"mainForm:result_table mainForm:pageNav mainForm:eventAthleteDetailsDialog",onco:function(xhr,status,args){hideDetails('athlete-popup');showDetails('event-popup');scrollToTopOfElement('mainForm\\:result_table');;}});return false;"></a>
How can I get scrapy to follow that?
If you open the URL in a browser with JavaScript disabled, you won't be able to move to the next page. As you can see inside the li tag, some JavaScript has to be executed in order to get the next page.
To get around this, the first option is usually to identify the request generated by the JavaScript. In your case it should be easy: just analyze the JavaScript code and replicate it in Python in your spider. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually to use a package with JavaScript/browser emulation, such as ScrapyJS or Scrapy + Selenium.
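Replicating the PrimeFaces.ab({...}) call usually means sending a JSF partial (AJAX) POST. A minimal sketch, assuming the server expects the standard JSF partial-request fields (verify them against the real request in the browser's network tab; newer Jakarta Faces servers use a jakarta.faces.* prefix instead of javax.faces.*). The helper is hypothetical and only builds the form fields from the onclick attribute; I have not tested it against sportstats.ca.

```python
import re


def primefaces_ajax_formdata(onclick):
    """Build form fields for replaying a PrimeFaces.ab({...}) click.

    Assumption: the server expects the usual JSF partial-request fields.
    """
    def arg(key):
        match = re.search(r'[{,]\s*' + re.escape(key) + r':"([^"]*)"', onclick)
        return match.group(1) if match else None

    source = arg("s")   # id of the clicked component
    process = arg("p")  # components to execute on the server
    update = arg("u")   # components to re-render
    return {
        "javax.faces.partial.ajax": "true",
        "javax.faces.source": source,
        "javax.faces.partial.execute": process,
        "javax.faces.partial.render": update,
        source: source,
    }


onclick = ('PrimeFaces.ab({s:"mainForm:j_idt369",p:"mainForm",'
           'u:"mainForm:result_table mainForm:pageNav mainForm:eventAthleteDetailsDialog",'
           'onco:function(xhr,status,args){}});return false;')
formdata = primefaces_ajax_formdata(onclick)
print(formdata["javax.faces.partial.render"])
```

In the spider, something like scrapy.FormRequest.from_response(response, formid="mainForm", formdata=formdata, dont_click=True, callback=...) would copy the form's hidden inputs, including the view state JSF requires. The reply is a JSF partial-response XML document whose update elements hold the new table HTML, so the callback has to parse that rather than a full page. Ids such as j_idt369 are auto-generated, so read the onclick attribute from each page instead of hard-coding it.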
You're going to need to perform a callback. Get the URL from the 'next page' button's href with XPath, e.g. url = response.urljoin(response.xpath('<xpath to next_page_button>/@href').get()), and when you're finished scraping that page, yield scrapy.Request(url, callback=self.parse_next_page). Finally, create a new method, def parse_next_page(self, response):, to parse the next page.
One final note: if the content is generated by JavaScript (and you can't scrape it even though you're sure you're using the correct XPath), check out my repo on using Splash with Scrapy: https://github.com/Liamhanninen/Scrape

How to display a variable that contains an HTML template in Smarty?

$smarty = new my_smarty();
// single quotes, so PHP does not interpolate {$my_content} itself
$page_content = '<p>{$my_content}</p>';
$smarty->assign("my_content", "whatever...");
$smarty->display($page_content); // how to render $page_content ???
From the documentation:
Smarty can render templates from a string by using the string: or eval: resource.
The string: resource behaves much the same as a template file. The template source is compiled from a string and stores the compiled template code for later reuse. [...]
The eval: resource evaluates the template source every time a page is rendered. [...]
For your case:
$smarty = new my_smarty();
// Single quotes keep PHP from interpolating {$my_content} before Smarty sees it
$page_content = '<p>{$my_content}</p>';
$smarty->assign("my_content", "whatever...");
$smarty->display("string:" . $page_content);