I'm using Scrapy for a project.
I have the image name and the image URL from the HTML. How can I save the image under that name instead of the hash name?
The URL is: http://a3.mzstatic.com/us/r1000/104/Purple/v4/55/35/20/55352022-0aba-260b-76ed-314eacd8c1fc/mzm.zqqzrdix.175x175-75.jpg
and its name is: iBook
I want Scrapy to download this picture and save it as iBook.
You have to use the Images pipeline with something like this:
class MyImagesPipeline(ImagesPipeline):
    def image_key(self, url):
        image_guid = url.split('/')[-1]
        return 'full/%s' % (image_guid)
This will use the original image name. For a custom filename, use something like
return 'ibook.jpg'
Be careful, though: a fixed name like this saves every image to the same file, so each download overwrites the previous one.
For more ideas, see this question: Scrapy image download how to use custom filename
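To avoid the overwrite problem, the file name can be derived from each item's own name. The following is a minimal sketch, not part of the original answer: `named_image_path` is a hypothetical helper, and the sanitising rule and the `.jpg` fallback are assumptions.

```python
import posixpath
import re
from urllib.parse import urlparse


def named_image_path(name, url):
    """Build 'full/<name><ext>' from a display name and the image URL.

    Hypothetical helper: the extension is taken from the URL path
    (falling back to '.jpg'), and characters that are awkward in file
    names are replaced with underscores.
    """
    ext = posixpath.splitext(urlparse(url).path)[1] or ".jpg"
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_") or "image"
    return "full/%s%s" % (safe, ext)


url = ("http://a3.mzstatic.com/us/r1000/104/Purple/v4/55/35/20/"
       "55352022-0aba-260b-76ed-314eacd8c1fc/mzm.zqqzrdix.175x175-75.jpg")
print(named_image_path("iBook", url))
```

In a pipeline subclass, the returned string would take the place of the fixed 'ibook.jpg'. How the name reaches the pipeline (for example via the request's meta) depends on your Scrapy version and spider, so treat that part as something to check against the docs.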
I want to create a dynamic URL using the Karate framework. Let's assume the URL I want to create is:
https://www.mars.com/mars/profile/{profileID}/line
In the URL above, {profileID} is a path parameter.
Currently I have written the feature file below, which is able to create the URL. However, because it uses the path keyword, the URL gets encoded and %0A is added after the profile ID:
https://www.mars.com/mars/profile/264%0A/line
Feature File:
#smoke
Scenario: Create a line score in existing profile
And def urlname = marsuri+ '/mars/profile/'
Given url urlname
Given path id + '/line'
Please let me know how I can create a URL with a path segment in the middle without it being encoded.
You are not using the path syntax correctly. Please read the documentation: https://github.com/intuit/karate#path
Make this change:
Given path id, 'line'
EDIT: please also see this answer: https://stackoverflow.com/a/54477346/143475
Actually, that id variable, wherever you are getting it from, has a newline at the end of the string, something like "264\n". That is why it is encoded as 264%0A.
If all you want to pass is "264", you have to remove the unwanted characters before adding it to the path:
Background:
  * def removeNewLine = function(x){return x.replace("\n","")}
Scenario: Create a line score in existing profile
  And def urlname = marsuri+ '/mars/profile/'
  Given url urlname
  * def id = removeNewLine(id)
  Given path id + '/line'
If you can fix the data at the source where you are getting the id, that would be even better.
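The %0A is simply the percent-encoded form of a newline character. As a quick illustration outside Karate (a Python sketch, with the sample value "264\n" assumed from the answer above), encoding the raw id reproduces the symptom, and stripping it first gives the clean segment:

```python
from urllib.parse import quote

raw_id = "264\n"  # assumed sample value: an id with a trailing newline

# Percent-encode the value as a single path segment.
encoded = quote(raw_id, safe="")

# Strip surrounding whitespace first, then encode.
cleaned = quote(raw_id.strip(), safe="")

print(encoded)
print(cleaned)
```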
I am using a custom field to upload images to S3 with Django 1.9. I want to delete the image from S3 whenever the model instance is deleted. I have tried the post_delete signal with ImageField's delete() method, but since I'm using a custom field I cannot get it to work. Any suggestions on how to achieve this?
from django.db.models.signals import pre_delete
.....
pre_delete.connect(delete_image, dispatch_uid="delete_image")
.....
def delete_image(sender, instance, **kwargs):
    # get_fields() returns field objects, so look each value up by field.name
    for model_field in instance._meta.get_fields():
        try:
            field = getattr(instance, model_field.name)
        except Exception:
            # e.g. reverse relations that have no value on this instance
            field = None
        if isinstance(field, your_custom_field):
            your_app_utils.clean_images(field.get_images())
I am trying to get the image URL inside Edit Theme Files -> product.html
I already have some values like:
{{product.title}}, that gives me "Product name"
{{product.url}}, that gives me the URL
etc...
but no matter what I try, I can't get the image URL.
The closest I got was using this: {{product.main_image.data}}
which gives me something like this:
https://cdn3.bigcommerce.com/s-8lxh1/images/stencil/%7B:size%7D/products/882/1700/RAM_VB_193_SW1__09586.1463668242.jpg?c=2
But that does not produce a working image link.
Would appreciate any help or insights.
Thanks in advance!
The string that you're returning doesn't work because there's a "size" placeholder embedded in the URL. You can use Stencil's getImage Handlebars helper to specify the image dimensions and return a working URL:
{{getImage product.main_image.data "thumbnail"}}
The "thumbnail" string specifies an image size that exists in your theme settings. You can also specify a numeric size, like "200x200". Check out the docs here
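To see why the raw string fails, note that %7B:size%7D is the URL-encoded form of {:size}. The Python sketch below only illustrates the idea of filling in that placeholder. It is an assumption about what happens conceptually, not a description of how getImage is implemented, and `fill_size` is a hypothetical helper.

```python
def fill_size(url, size):
    """Replace the size placeholder (encoded or plain) with a concrete size.

    Hypothetical illustration only; in a Stencil theme the getImage
    helper does this for you.
    """
    return url.replace("%7B:size%7D", size).replace("{:size}", size)


template = ("https://cdn3.bigcommerce.com/s-8lxh1/images/stencil/%7B:size%7D/"
            "products/882/1700/RAM_VB_193_SW1__09586.1463668242.jpg?c=2")
print(fill_size(template, "200x200"))
```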
Apologies if this is a scrapy noob question but I have spent ages looking for the answer to this:
I want to store the raw data from each and every URL I crawl in my local filesystem as a separate file (i.e. response.body -> /files/page123.html), ideally with the filename being a hash of the URL. This is so I can do further processing of the HTML (i.e. further parsing, indexing in Solr/ElasticSearch etc).
I've read the docs and I'm not sure whether there's a built-in way of doing this. Since the pages are already downloaded by the system by default, it doesn't seem to make sense to write custom pipelines etc.
As paul t said, the HttpCache middleware might work for you, but I'd advise writing your own custom pipeline.
Scrapy has built-in ways of exporting data to files, but they're for JSON, XML and CSV, not raw HTML. Don't worry though, it's not too hard!
Provided your items.py looks something like:
from scrapy.item import Item, Field
class Listing(Item):
    url = Field()
    html = Field()
and you've been saving your scraped data to those items in your spider like so:
item['url'] = response.url
item['html'] = response.body
your pipelines.py would just be:
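import hashlib
class HtmlFilePipeline(object):
    def process_item(self, item, spider):
        # choose whatever hashing function works for you;
        # hashlib needs bytes, so encode the URL first
        file_name = hashlib.sha224(item['url'].encode('utf-8')).hexdigest()
        with open('files/%s.html' % file_name, 'w+b') as f:
            f.write(item['html'])
        # return the item so that any later pipelines still receive it
        return item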
Hope that helps. Oh, and don't forget to create a files/ directory in your project root and add this to your settings.py:
ITEM_PIPELINES = {
    'myproject.pipelines.HtmlFilePipeline': 300,
}
source: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
We can get some information as JSON.
But how can I get images from the information in the JSON,
e.g. "thumbnail": "medivh/1/1-avatar.jpg"?
It does not work when I append that value to the host + request URL.
So, is there some other way to get the images?
The url for the static images has the following format:
http://REGION.battle.net/static-render/REGION/THUMBNAIL
Example:
For the avatar image
http://eu.battle.net/static-render/eu/alexstrasza/57/51685945-avatar.jpg
The picture you see when you visit your armory profile
http://eu.battle.net/static-render/eu/alexstrasza/57/51685945-profilemain.jpg
Another angle, I don't know where this is used
http://eu.battle.net/static-render/eu/alexstrasza/57/51685945-inset.jpg
As of the latest changes to the static renderer, the following is correct:
http://render-<Region>.worldofwarcraft.com/character/<Thumbnail>
For example:
http://render-us.worldofwarcraft.com/character/kul-tiras/148/130814612-profilemain.jpg
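Both formats can be put together with simple string formatting. The sketch below is only an illustration built from the URL patterns quoted in these answers. The function names are hypothetical, and whether either host still serves images is not something the sketch can confirm.

```python
def legacy_render_url(region, thumbnail):
    """Older pattern: http://REGION.battle.net/static-render/REGION/THUMBNAIL"""
    return "http://%s.battle.net/static-render/%s/%s" % (region, region, thumbnail)


def render_url(region, thumbnail):
    """Newer pattern: http://render-<Region>.worldofwarcraft.com/character/<Thumbnail>"""
    return "http://render-%s.worldofwarcraft.com/character/%s" % (region, thumbnail)


def variant(thumbnail, kind):
    """Swap the '-avatar.jpg' suffix for another variant such as 'profilemain' or 'inset'."""
    return thumbnail.replace("-avatar.jpg", "-%s.jpg" % kind)


thumb = "alexstrasza/57/51685945-avatar.jpg"
print(legacy_render_url("eu", thumb))
print(legacy_render_url("eu", variant(thumb, "profilemain")))
print(render_url("us", "kul-tiras/148/130814612-profilemain.jpg"))
```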