Using Scrapy 2.7: ModuleNotFoundError: No module named 'scrapy.squeue'

I run my Scrapy spider as a standalone script, like this:
if __name__ == "__main__":
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    s = get_project_settings()
    process = CrawlerProcess(s)
    process.crawl(MySpider)
    process.start()
My scraper was consuming a huge amount of memory, so I thought of using these two custom settings:
SCHEDULER_DISK_QUEUE = "scrapy.squeue.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeue.FifoMemoryQueue"
But after adding these two custom settings, when I run my standalone spider, I get an error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/twisted/internet/defer.py", line 1696, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/usr/local/lib/python3.9/dist-packages/scrapy/crawler.py", line 118, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
ModuleNotFoundError: No module named 'scrapy.squeue'
Any ideas what the issue is?

ModuleNotFoundError: No module named 'scrapy.squeue'
You have a typo: the module is scrapy.squeues (with a trailing "s"), not scrapy.squeue:
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
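With the spelling fixed, a minimal sketch of wiring these settings into the standalone script from the question could look like this. The JOBDIR line is an assumption on my part: the scheduler only uses the disk queue when a job directory is configured, so without one, pending requests stay in memory anyway:
if __name__ == "__main__":
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    s = get_project_settings()
    # Note the plural module name: scrapy.squeues
    s.set("SCHEDULER_DISK_QUEUE", "scrapy.squeues.PickleFifoDiskQueue")
    s.set("SCHEDULER_MEMORY_QUEUE", "scrapy.squeues.FifoMemoryQueue")
    # Assumption: a JOBDIR is needed so the disk queue has somewhere to spill
    s.set("JOBDIR", "crawl_state")

    process = CrawlerProcess(s)
    process.crawl(MySpider)
    process.start()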

Related

AWS Notebook Instance is working but Lambda is not accepting the input

I developed an ANN tool using PyCharm/TensorFlow on my own computer. I uploaded the h5 and json files to Amazon SageMaker by creating a Notebook Instance, and I was finally able to successfully create an endpoint and make it work. The following code works in the Notebook Instance (Jupyter):
import json
import boto3
import numpy as np
import io
import sagemaker
from sagemaker.tensorflow.model import TensorFlowModel
client = boto3.client('runtime.sagemaker')
data = np.random.randn(1,6).tolist()
endpoint_name = 'sagemaker-tensorflow-**********'
response = client.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(data))
response_body = response['Body']
print(response_body.read())
However, the problem occurs when I create a Lambda function and call the endpoint from there. The input should be a row of 6 features, that is, a 1-by-6 vector. I enter the following input into Lambda, {"data":"1,1,1,1,1,1"}, and it gives me the following error:
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 20, in lambda_handler
    Body=payload)
  File "/var/runtime/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 635, in _make_api_call
    raise error_class(parsed_response, operation_name)
I think the problem is that the input needs to be 1-by-6 instead of 6-by-1, and I don't know how to do that.
I assume the content type you specified is text/csv, so try out:
{"data": ["1,1,1,1,1,1"]}

scrapy extracting urls from a simple site

I'm trying to extract basic data from a simple site: vapedonia.com. It's a simple e-commerce site, and I can scrape it pretty easily by "reinventing the wheel" (mainly working on one big HTML string), but when I have to work within the Scrapy framework, it just does not work.
I first analyze the HTML code and create my XPath expressions using a plugin. In that plugin everything goes fine, but when I put them in my code (or even use the Scrapy shell), it doesn't work.
Here's the code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = "vapedonia"
    allowed_domains = ["vapedonia.com"]
    start_urls = ["https://www.vapedonia.com/23-e-liquidos"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        products = hxs.select("//div[#class='product-container clearfix']")
        for products in products:
            image = products.select("div[#class='center_block']/a/img/#src").extract()
            name = products.select("div[#class='center_block']/a/#title").extract()
            link = products.select("div[#class='right_block']/p[#class='s_title_block']/a/#href").extract()
            price = products.select("div[#class='right_block']/div[#class='content_price']/span[#class='price']").extract()
            print image, name, link, price
Here are the errors:
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample>scrapy crawl vapedonia
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test.py:1: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
from scrapy.spider import BaseSpider
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test.py:6: ScrapyDeprecationWarning: craigslist_sample.spiders.test.MySpider inherits from deprecated class scrapy.spiders.BaseSpider, please inherit from scrapy.spiders.Spider. (warning only on first subclass, there may be others)
class MySpider(BaseSpider):
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test2.py:1: ScrapyDeprecationWarning: Module `scrapy.contrib.spiders` is deprecated, use `scrapy.spiders` instead
from scrapy.contrib.spiders import CrawlSpider, Rule
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test2.py:2: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test2.py:2: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors.sgml` is deprecated, use `scrapy.linkextractors.sgml` instead
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test2.py:13: ScrapyDeprecationWarning: SgmlLinkExtractor is deprecated and will be removed in future releases. Please use scrapy.linkextractors.LinkExtractor
Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[#class="button next"]',)), callback="parse_items", follow= True),
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test4.py:15: ScrapyDeprecationWarning: SgmlLinkExtractor is deprecated and will be removed in future releases. Please use scrapy.linkextractors.LinkExtractor
Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[#class="button next"]',)), callback="parse_items", follow= True),
Traceback (most recent call last):
  File "C:\Users\eric\Miniconda2\Scripts\scrapy-script.py", line 5, in <module>
    sys.exit(scrapy.cmdline.execute())
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\cmdline.py", line 148, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\crawler.py", line 243, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\crawler.py", line 134, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\crawler.py", line 330, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "C:\Users\eric\Miniconda2\lib\site-packages\scrapy\utils\misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "C:\Users\eric\Miniconda2\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test5.py", line 17
    link = products.select("div[#class='right_block']/p[#class='s_title_block']/a/#href").extract()
    ^
IndentationError: unexpected indent
C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample>
I don't know what the problem is, but I have several spiders coded in the spiders directory/folder. Maybe it's some kind of mix-up between the spiders' code.
Thanks.
When Scrapy runs, it scans for all the spiders present in the project to find their names, then runs the one you specified. So if any spider has a syntax error, it won't work:
File "C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist_sample\craigslist_sample\spiders\test5.py", line 17
link = products.select("div[#class='right_block']/p[#class='s_title_block']/a/#href").extract()
As you can see in the exception, the error is in your test5.py. Fix the indentation in that file, or comment it out if you don't need it. That should allow you to run the spider.
Edit 1: Mixing of tabs and spaces
Python depends on indentation, and two indentations that look identical on screen may differ in the actual characters used: one line may indent with tabs and another with spaces, which causes errors like this. Make sure your editor is configured to show tab and space characters, and convert all tabs to spaces.
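As a quick way to spot the problem, a small helper along these lines can flag suspect lines; the file path is just an example taken from the traceback, not a known project layout:
# Hypothetical helper: flag lines whose leading whitespace mixes
# tabs and spaces (the path is an example, not the asker's actual layout)
with open("craigslist_sample/spiders/test5.py") as f:
    for lineno, line in enumerate(f, 1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent and " " in indent:
            print("line %d mixes tabs and spaces" % lineno)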

ImportError: No module named Image when importing ironpython dll

I have a Python package called CoreCode which I compiled using clr.CompileModules() in IronPython 2.7.5. This generated a file called CoreCode.dll. I then import this DLL into my IronPython module using clr.AddReference(). I know the DLL works because I have successfully tested some of the classes, as shown below. However, my problem lies with the Base_Slice_Previewer class, which uses Image and ImageDraw from PIL to generate and save a bitmap file.
I know the problem doesn't lie with PIL itself, because the package works perfectly well when run in Python 2.7. I'm assuming this error comes up because IronPython can't find PIL, but I'm not sure how to work around the problem. Any help will be much appreciated.
Code to create the dll
import clr
clr.CompileModules("CoreCode.dll", "CoreCode\AdvancedFileHandlers\ScannerSliceWriter.py", "CoreCode\AdvancedFileHandlers\__init__.py", "CoreCode\MarcamFileHandlers\MTTExport.py", "CoreCode\MarcamFileHandlers\MTTImporter.py", "CoreCode\MarcamFileHandlers\__init__.py", "CoreCode\Visualizer\SlicePreviewMaker.py", "CoreCode\Visualizer\__init__.py", "CoreCode\Timer.py", "CoreCode\__init__.py")
Test for Timer.py
>>> import clr
>>> clr.AddReference('CoreCode.dll')
>>> from CoreCode.Timer import StopWatch
>>> stop_watch = StopWatch()
>>> print stop_watch.__str__()
0:00:00:00 0:00:00:00
>>>
Test for MTTExport.py
>>> from CoreCode.MarcamFileHandlers.MTTExport import MTT_Layer_Exporter
>>> mttlayer = MTT_Layer_Exporter()
>>> in_val = (2**20)+ (2**16) + 2
>>> bytes = mttlayer.write_lf_int(in_val, force_full_size=True)
>>> print "%s = %s" %(bytes, [hex(ord(x)) for x in bytes])
à ◄ ☻ = ['0xe0', '0x0', '0x0', '0x0', '0x0', '0x11', '0x0', '0x2']
>>>
Test for SlicePreviewMaker.py
>>> from CoreCode.Visualizer.SlicePreviewMaker import Base_Slice_Previewer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "CoreCode\Visualizer\SlicePreviewMaker", line 1, in <module>
ImportError: No module named Image
>>>

py2exe setup.py not working

I have a small program that uses pandas and SQLAlchemy, declared in my main.py as:
import pandas as pd
from sqlalchemy import create_engine
This is my complete setup.py:
from distutils.core import setup
import py2exe
from glob import glob

data_files = [("Microsoft.VC90.CRT", glob(r'C:\Users\Flavio\Documents\Python_dll\*.*'))]

opts = {
    "py2exe": {
        "packages": ["pandas", "sqlalchemy"]
    }
}

setup(
    data_files=data_files,
    options=opts,
    console=['main.py']
)
And I'm using this command in terminal:
python setup.py py2exe
But when I run main.exe, it opens the terminal, starts executing the code, and then suddenly closes the window.
When I run it from the terminal, this is the error:
C:\Users\Flavio\Documents\python\python\untitled\dist>main.exe
Please add a valid tradefile date as yyyymmdd: 20150914
Traceback (most recent call last):
  File "main.py", line 11, in <module>
  File "C:\Users\Flavio\Anaconda3\lib\site-packages\sqlalchemy\engine\__init__.py", line 386, in create_engine
    return strategy.create(*args, **kwargs)
  File "C:\Users\Flavio\Anaconda3\lib\site-packages\sqlalchemy\engine\strategies.py", line 75, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "C:\Users\Flavio\Anaconda3\lib\site-packages\sqlalchemy\connectors\pyodbc.py", line 51, in dbapi
    return __import__('pyodbc')
ImportError: No module named 'pyodbc'
Without knowing what your program does, I would try the following first: open a command window and run your .exe from there. The window will not close, and any error messages (if any) will be displayed.
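Separately, your pasted output already shows the failure: ImportError: No module named 'pyodbc'. SQLAlchemy's pyodbc connector loads the driver dynamically (the __import__('pyodbc') call visible in the traceback), so py2exe's static dependency analysis can miss it. A hedged guess, then, is to bundle it explicitly:
opts = {
    "py2exe": {
        # pyodbc is imported at runtime by SQLAlchemy's pyodbc connector,
        # so list it explicitly rather than relying on dependency detection
        "packages": ["pandas", "sqlalchemy", "pyodbc"]
    }
}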

win32com.client error

When using win32com, something puzzled me:
>>> import win32com
>>> w=win32com.client.Dispatch('Word.Application')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    w=win32com.client.Dispatch('Word.Application')
AttributeError: 'module' object has no attribute 'client'
What's wrong?
win32com.client is a module in the win32com package; you need to import the actual module, because importing a package does not automatically import its submodules:
import win32com.client
w = win32com.client.Dispatch('Word.Application')
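An equivalent form, if you prefer to bind the name directly:
# Same effect: import the Dispatch function from the submodule
from win32com.client import Dispatch
w = Dispatch('Word.Application')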