Is there a way to save a screenshot of the page from a Scrapy response? For example:
scrapy shell "https://google.com"
view(response)
I know I can save the output as HTML and view it later, but is there a way to save it as an image?
I checked the question "Scrapy Splash Screenshots?" (the most relevant one), but when I try its approach I get:
png_bytes = base64.b64decode(response.data['png'])
Traceback (most recent call last):
  File "/usr/lib/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
AttributeError: 'HtmlResponse' object has no attribute 'data'
I assume this error occurs because that question uses a Splash request, whereas I am using a normal Request.
Do you mean a screenshot of the webpage or the output in your command line?
The other question you provided seems to be talking about screenshots on the pages being scraped.
The Scrapy documentation has some info on that: https://docs.scrapy.org/en/latest/topics/item-pipeline.html?highlight=screenshot#take-screenshot-of-item. I'm not sure whether Splash is required.
Splash is the most common approach for this.
See https://splash.readthedocs.io/en/stable/api.html#render-png after you’ve read a bit about Splash in general.
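As a rough illustration of the render.png endpoint linked above, here is a minimal sketch that asks Splash for a PNG of a page and writes it to disk. It assumes a Splash instance is already running at http://localhost:8050 (adjust to your setup); the helper names `build_render_png_url` and `save_screenshot` are made up for this example. A plain Scrapy `HtmlResponse` has no `data` attribute, which fits the error in the question.

```python
import urllib.parse
import urllib.request

# Assumption: a Splash instance is listening at this address.
SPLASH_URL = "http://localhost:8050"


def build_render_png_url(page_url, splash_url=SPLASH_URL, **options):
    """Return the render.png URL that asks Splash to screenshot page_url."""
    params = {"url": page_url}
    params.update(options)  # extra render.png arguments, e.g. wait=0.5
    return splash_url.rstrip("/") + "/render.png?" + urllib.parse.urlencode(params)


def save_screenshot(page_url, out_path, **options):
    """Fetch the PNG from Splash and write it to out_path (needs a running Splash)."""
    with urllib.request.urlopen(build_render_png_url(page_url, **options)) as resp:
        png_bytes = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(png_bytes)
    return out_path
```

With Splash running, `save_screenshot("https://google.com", "google.png", wait=0.5)` should leave the image in google.png.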
Related
I have no coding experience, but I want to get into coding, so I watched a video and tried to follow it to make a Discord bot.
This is my code. I think I'm trying to turn it into a CSV file (image below).
When I ran the !ls command at the beginning, the file appeared as kassie_bot.txt.gdoc (in the video I was watching there was no .gdoc at the end). This is how it appears (image).
How can I fix this so that I don't receive an error? Sorry if it's a weird question; I'm just a teenager hoping for the best.
This is the video, if it helps. The code I'm trying to write starts at this timestamp: https://youtu.be/Rk8eM1p_xgM?t=621
I tried removing the .txt from the end of the file name, I tried including the .gdoc at the end of it, and I tried uppercase and lowercase, but to no avail. This is the error (image below).
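One way to narrow this down is to list the exact names that exist in the folder and check whether the file is a Google Docs entry. The sketch below assumes the code runs in a notebook with the folder accessible as a normal directory; `find_candidates` and `is_google_doc_pointer` are hypothetical helper names. A name ending in .gdoc usually means the item is a Google Docs document rather than a plain-text file, so reading it as text or CSV is likely to fail whatever name is typed; exporting the document as a plain .txt or .csv file first is probably the safer route.

```python
from pathlib import Path


def find_candidates(directory, stem):
    """Return the sorted names of files in directory whose name starts with stem."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(stem)
    )


def is_google_doc_pointer(filename):
    """True if the name ends in .gdoc (a Google Docs entry, not a plain-text file)."""
    return filename.lower().endswith(".gdoc")
```

For example, `find_candidates(".", "kassie_bot")` shows the exact spelling that the code has to use.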
I've encountered a weird issue with PhantomJS when converting an HTML file to PDF. My HTML, resulting PDF, and rasterize.js files are below:
http://401web.com/_pub/2TRTI8E.html
http://401web.com/_pub/2TRTI8E.pdf
http://401web.com/_pub/rasterize.js
You will notice that in the PDF file, at page 6, the content gets cut off and then on page 7, the content is repeated and is then correct all the way to the end of the document.
The HTML file contains a series of img tags with their src attributes set as data:image/png;base64...
The application call to the phantom library is as follows:
phantomJS.Run(@"C:\path\to\directory\rasterize.js",
    new[] { webpath, outFilePdf, "A4", "1", "portrait" }, null, null);
Note that the rendered PDF sometimes exhibits the break/repeat behavior at a different location in the document (e.g. page 7 instead of 6), but the same issue always occurs.
Also, I am using phantomjs throughout my application (with the same rasterize.js script) with no other issues. This only happens on this export and only if there are a number of images.
My theory is that there is something going on with the image.onload event, specifically with base64 data but I have no idea how to troubleshoot this.
This is all within a .NET MVC application. I am using the PhantomJS NuGet package found here: https://www.nuget.org/packages/PhantomJS/
Help is greatly appreciated.
Update: when running phantomjs locally via command line I was receiving the error below:
[CRITICAL] QNetworkReplyImpl: backend error: caching was enabled after some bytes had been written
libpng error: Read Error
I solved this (though I have no idea how or why) by replacing the CDN references to font-awesome.css and weather-icons.css in the HTML file with locally hosted versions. After that, there was no more error and no more duplicate content.
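For anyone who wants to apply the same workaround before handing the page to PhantomJS, the substitution can be sketched roughly as follows. The CDN URLs and local paths here are placeholders, not the ones from the original page, and `use_local_stylesheets` is a hypothetical helper.

```python
# Hypothetical mapping: replace these with the real CDN URLs and local paths.
LOCAL_COPIES = {
    "https://cdn.example.com/font-awesome.css": "/css/font-awesome.css",
    "https://cdn.example.com/weather-icons.css": "/css/weather-icons.css",
}


def use_local_stylesheets(html, mapping=LOCAL_COPIES):
    """Replace each CDN stylesheet URL found in html with its locally hosted path."""
    for cdn_url, local_path in mapping.items():
        html = html.replace(cdn_url, local_path)
    return html
```

This only swaps the URLs; the CSS files themselves (and any fonts they reference) still have to be copied to the local paths.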
I have built the demo successfully and it runs, but when I try to load an .off file, nothing appears. The console window shows the error message "QWindowsNativeFileDialogBase::shellitem:SHCreateItemFromParsingName(file:debug)failed(no such file or directory)".
Could somebody give me some guidance?
It seems the problem is similar to this one: http://cgal-discuss.949826.n4.nabble.com/Problem-with-loading-off-files-in-Polyhedron-demo-td4661212.html
Did you build any plugins? You need at least off_plugin, in addition to Polyhedron_3, to load an .off file.
I have developed an app using the GEPlugin located at https://code.google.com/p/winforms-geplugin-control-library/.
I use the GEWebBrowser and the GETreeView, and both work nicely.
I only load local KML files into the controls. To do this, I copy the file into the webroot directory as "KML_Samples.kml" and call the function as follows:
GeWebBrowser.FetchKml("http://localhost:8080/KML_Samples.kml")
Each time I call this method, the GeWebBrowser_KmlLoaded event fires correctly.
However, I have recently noticed that this only works for the first two or three KML files loaded. After that, when I try to load a new KML file, I can see that the KML_Samples.kml file has been updated, but the GeWebBrowser_KmlLoaded event is not fired.
I have tried stepping through the app with a breakpoint set on the line
GeWebBrowser.FetchKml("http://localhost:8080/KML_Samples.kml")
and in that case the KML files do load.
I have also tried running the following line after it, in order to process all pending events:
Application.DoEvents()
However, this does not have the expected result, and the problem remains: I am only able to load the first two or three KML files.
I wonder whether I am missing something in how I use this control, but I have not found anything in the documentation that could help me.
If anyone could help me with this issue, I would be very thankful.
I am answering my own question.
I have found that the GEControl does not work well with the built-in server. I can load local KML files by copying them to webroot\KML_Samples.kml, but this only works for the first two or three files loaded.
For subsequent KML files it doesn't work. I suppose there is a bug inside the control, so I'm going to write the code that loads the KML file into a TreeView myself (I'll try to use the KmlTreeView) and load the individual points of each KML file into the plugin.
I am trying to use the pageres module (https://github.com/sindresorhus/pageres) to take screenshots of my website at different resolutions.
It works fine when I provide the URL and the size on the command line, but not when my URLs are in a text file: it takes a screenshot of only the last URL in the file. I run pageres with the following command:
pageres 640x768 < urls.txt
URLs in the text file are newline separated so they look like this:
http://www.yahoo.com
http://www.msn.com
http://www.apple.com
So it takes a screenshot of apple.com only and throws the error below for each of the URLs above it.
The error I get is:
TypeError: 'undefined' is not an object <evaluating 'options.windowSize.width'>
The error points to line 13 of webshot.phantom.js.
Am I running the command incorrectly? I am using it the way it is described on their site.
Thank you for your help.
It might have been a bug at some point, but it works fine in the latest version. Just tested.
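If upgrading is not an option, one possible workaround is to call pageres once per URL, since passing the URL and size on the command line is reported to work. The sketch below is only an illustration: it assumes pageres is on the PATH and that the file is named urls.txt, and the helper names are made up. It also strips stray whitespace and carriage returns from each line, which may or may not be related to the original error.

```python
import subprocess


def read_urls(text):
    """Split newline-separated text into URLs, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_commands(urls, size="640x768"):
    """Return one pageres argument list per URL."""
    return [["pageres", url, size] for url in urls]


def run_all(path="urls.txt", size="640x768"):
    """Run pageres once per URL listed in the file (requires pageres on PATH)."""
    with open(path) as fh:
        commands = build_commands(read_urls(fh.read()), size)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all()` then takes one screenshot per line of urls.txt and stops at the first URL for which pageres exits with an error.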