scrapy.Request can't download the page - scrapy

Description
My query is:
Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning.
The get_url function returns this request URL: http://api.scraperapi.com/?api_key=apikey&url=https%3A%2F%2Fscholar.google.com%2Fscholar%3Fhl%3Den%26q%3DWider%2Band%2BDeeper%252C%2BCheaper%2Band%2BFaster%253A%2BTensorized%2BLSTMs%2Bfor%2BSequence%2BLearning.&country_code=us
But Scrapy never executes my callback function self.parse for this query; for other queries, it does execute the callback.
My Code
from urllib.parse import urlencode

import scrapy

API_KEY = 'apikey'

def get_url(url):
    # Wrap the target URL in a ScraperAPI proxy request.
    payload = {'api_key': API_KEY, 'url': url, 'country_code': 'us'}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    print("request url:", proxy_url)
    return proxy_url

class ExampleSpider(scrapy.Spider):
    name = 'scholar'
    allowed_domains = ['api.scraperapi.com']

    def start_requests(self):
        queries = ["Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning.",
                   "Decoding with Value Networks for Neural Machine Translation."]
        for query in queries:
            print("current query is:")
            print(query)
            self.query = query
            url = 'https://scholar.google.com/scholar?' \
                  + urlencode({'hl': 'en', 'q': self.query})
            yield scrapy.Request(get_url(url), callback=self.parse,
                                 meta={'position': 0})
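For debugging, here is a minimal sketch (my own addition, not from the original post) that attaches an errback and disables the duplicate filter. Note that Scrapy's HttpError middleware silently drops non-2xx responses, which would look exactly like a callback that never fires; the handle_httpstatus_all meta key surfaces them:

import scrapy

class DebugSpider(scrapy.Spider):
    name = 'scholar_debug'

    def start_requests(self):
        url = get_url('https://scholar.google.com/scholar?hl=en&q=test')  # reuses get_url above
        # dont_filter=True rules out the dupefilter silently dropping the request;
        # handle_httpstatus_all lets non-2xx responses reach parse();
        # errback surfaces download failures (timeouts, DNS, connection errors).
        yield scrapy.Request(url, callback=self.parse, errback=self.on_error,
                             dont_filter=True,
                             meta={'handle_httpstatus_all': True})

    def parse(self, response):
        self.logger.info("got %s with status %s", response.url, response.status)

    def on_error(self, failure):
        self.logger.error(repr(failure))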
Versions
Scrapy : 2.4.1
lxml : 4.6.1.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 20.3.0
Python : 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) - [GCC 7.3.0]
pyOpenSSL : 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020)
cryptography : 2.9.2
Platform : Linux-5.4.0-53-generic-x86_64-with-debian-bullseye-sid

Related

GPU is not working when using gluoncv and mxnet

Versions I used:
python 3.6.5
mxnet 1.5.0
cuda 9.2 (I also installed CUDA 11.4 and cuDNN 8.2.4 because the command line showed my NVIDIA driver uses it)
cudnn 7.6.5
Windows 10 64-bit
Question:
I used mxnet and gluoncv for image segmentation, and a GPU problem occurred consistently.
I installed and uninstalled almost every CUDA version (and the matching cuDNNs), but it didn't help.
Also, I'm a little confused: should I use mxnet-cu92 or something else?
When I first installed CUDA 11.4, I installed mxnet-cu101 (mxnet-cu112 didn't work for me),
but then I found that mxnet-cu92 is the wheel for GPU use with CUDA 9.2, so I installed it again together with CUDA 9.2,
and it is still not working.
Here is my code:
import time

import mxnet as mx
import numpy as np
import pandas as pd
import gluoncv
from gluoncv.data import ADE20KSegmentation
from gluoncv.data.transforms.presets.segmentation import test_transform

ctx = mx.gpu(0)
model = gluoncv.model_zoo.get_model('fcn_resnet50_ade', pretrained=True, ctx=ctx)  # deeplab_resnet101_ade / fcn_resnet50_ade

total_df = pd.DataFrame(columns=ADE20KSegmentation.CLASSES)
start = time.time()
Moly = []
Fences = {}
for i in range(len(image_file)):
    if i % 100 == 0:
        print(i)
        print(time.time() - start)
        start = time.time()
    img = mx.image.imread(image_file[i])
    image = test_transform(mx.img.imresize(img, 1200, 1200), ctx)
    output_array = model.predict(image)
    predict_index = mx.nd.argmax(output_array, 1).asnumpy()
    holy = find_fence(predict_index)
    Moly.append(holy)
    flat = predict_index.flatten()
    output_dict = {}
    for index, cls in enumerate(ADE20KSegmentation.CLASSES):
        num_pixel = len(np.where(flat == index)[0])
        output_dict[cls] = round(num_pixel / 1440000, 4)  # 1200 * 1200 pixels
    total_df = total_df.append(output_dict, ignore_index=True)

for names, holy in zip(image_names, Moly):
    Fences[names] = holy
and I got "MXNetError: C:\Jenkins\workspace\mxnet-tag\mxnet\src\ndarray\ndarray.cc:1285: GPU is not enabled" this error on
model = gluoncv.model_zoo.get_model('fcn_resnet50_ade', pretrained=True, ctx=ctx)
this code.
what should I do now...?
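A quick diagnostic sketch (my own suggestion, not from the original post): the plain pip package mxnet is a CPU-only build, and a "GPU is not enabled" MXNetError usually means the installed wheel was compiled without CUDA support, or that the mxnet-cuXY wheel does not match the installed CUDA toolkit. The following checks which build is active:

import mxnet as mx
from mxnet.runtime import Features

print(mx.__version__)
# False here means the installed wheel is a CPU-only build
# (plain 'mxnet' instead of a matching 'mxnet-cuXY' wheel).
print('CUDA build:', Features().is_enabled('CUDA'))
# 0 here means no GPU is visible to MXNet (driver or toolkit mismatch).
print('GPUs visible:', mx.context.num_gpus())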

How to import cplex in Google Colab?

!apt install cplex-utils
!pip install cplex

from pyomo.environ import SolverFactory

solver = SolverFactory('cplex')
res_NLP = solver.solve(HN_model)
The error is:
WARNING: Could not locate the 'cplex' executable, which is required for solver cplex
---------------------------------------------------------------------------
ApplicationError                          Traceback (most recent call last)
<ipython-input> in <module>()
      1 solver = SolverFactory('cplex')
----> 2 res_NLP = solver.solve(HN_model)

2 frames
/usr/local/lib/python3.7/dist-packages/pyomo/opt/solver/shellcmd.py in available(self, exception_flag)
    123             if exception_flag:
    124                 msg = "No executable found for solver '%s'"
--> 125                 raise ApplicationError(msg % self.name)
    126             return False
    127         return True

ApplicationError: No executable found for solver 'cplex'
Within IBM Watson Studio, CPLEX comes pre-installed in the Notebooks. But with other Notebook cloud providers, you need to find a way to install it, or else call CPLEX as a service in the IBM Cloud.
You could try to use dowml: https://xavier-nodet.medium.com/submit-decision-optimization-jobs-to-wml-using-dowml-be26e0de6b7f
Or directly wml: https://pypi.org/project/ibm-watson-machine-learning/
With Google Colab:
!pip install cplex
!pip install docplex
from docplex.mp.model import Model
mdl = Model(name='buses')
nbbus40 = mdl.integer_var(name='nbBus40')
nbbus30 = mdl.integer_var(name='nbBus30')
mdl.add_constraint(nbbus40*40 + nbbus30*30 >= 300, 'kids')
mdl.minimize(nbbus40*500 + nbbus30*400)
mdl.export("buses.lp")
!cat buses.lp
works fine and gives
Requirement already satisfied: cplex in /usr/local/lib/python3.7/dist-packages (20.1.0.1)
Requirement already satisfied: docplex in /usr/local/lib/python3.7/dist-packages (2.22.213)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from docplex) (1.15.0)
\ This file has been generated by DOcplex
\ ENCODING=ISO-8859-1
\Problem name: buses
Minimize
obj: 500 nbBus40 + 400 nbBus30
Subject To
kids: 40 nbBus40 + 30 nbBus30 >= 300
Bounds
Generals
nbBus40 nbBus30
End
From the error message, SolverFactory seems to be a Pyomo class, and it requires the CPLEX interactive executable to be available locally on the machine where the Pyomo code is executed.
Unless you have a way to install arbitrary executable files on the platform you use, which I very highly doubt if you're not using your own computer, you will have to find another way. Alex's answer proposes two...
When you're working in Colab you need to install CPLEX from pip, and in that case you need to use the cplex_direct interface in Pyomo in order to avoid such errors, since the plain cplex interface uses the shell approach to invoke the executable and solve the problem.
Using Google Colab, this should work
!pip install pyomo -q
!pip install cplex -q
import pyomo.environ as pyo
model = pyo.ConcreteModel()
model.s = pyo.Set(initialize=[1,2,3,4,5])
model.x = pyo.Var(model.s, domain=pyo.NonNegativeReals)
model.c = pyo.Constraint(expr=model.x[model.s.last()]>=5)
model.obj = pyo.Objective(expr=sum(model.x[s] for s in model.s), sense=pyo.minimize)
solver = pyo.SolverFactory('cplex_direct')
solver.solve(model)
model.x.display()
x : Size=5, Index=s
Key : Lower : Value : Upper : Fixed : Stale : Domain
1 : 0 : 0.0 : None : False : False : NonNegativeReals
2 : 0 : 0.0 : None : False : False : NonNegativeReals
3 : 0 : 0.0 : None : False : False : NonNegativeReals
4 : 0 : 0.0 : None : False : False : NonNegativeReals
5 : 0 : 5.0 : None : False : False : NonNegativeReals
I don't use CPLEX a lot, so I'm not fully sure, but I think this free approach has a limit on the number of variables, constraints, or other model elements.

TensorFlow Serving export signature without arguments

I would like to add an extra signature to a SavedModel that returns a business description, and serve it with TensorFlow Serving.
@tf.function
def info():
    return json.dumps({
        'name': 'My model',
        'description': 'This is model description.',
        'project': 'Product ABCD',
        'type': 'some_type',
        ...
    })
As described in the TensorFlow Core manual https://www.tensorflow.org/guide/saved_model#identifying_a_signature_to_export, I can easily export a signature which accepts arguments by providing a tf.TensorSpec.
Is it possible to export a signature without arguments and call it on the server?
Added after @EricMcLachlan's comments:
When I try to call a function without a defined input signature (input_signature=[]) with code like this:
data = json.dumps({"signature_name": "info", "inputs": None})
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict',
                              data=data, headers=headers)
I get the following error in the response:
'_content': b'{ "error": "Failed to get input map for signature: info" }'
Defining the Signature:
I was going to write my own example, but here's a great example provided by @AntPhitlok in another StackOverflow post:
class MyModule(tf.Module):
    def __init__(self, model, other_variable):
        self.model = model
        self._other_variable = other_variable

    @tf.function(input_signature=[tf.TensorSpec(shape=(None, None, 1), dtype=tf.float32)])
    def score(self, waveform):
        result = self.model(waveform)
        return {"scores": result}

    @tf.function(input_signature=[])
    def metadata(self):
        return {"other_variable": self._other_variable}
In this case, what is being served is a tf.Module, but it could have been a Keras model as well.
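For completeness, here is a minimal export sketch (my own addition; it assumes model is the callable being wrapped, and the export path and metadata value are hypothetical) showing how I believe both functions would be attached as named signatures:

import tensorflow as tf

module = MyModule(model, other_variable=tf.constant('some metadata'))

# Each tf.function becomes a named signature; input_signature=[] on
# metadata() is what makes it a zero-argument signature.
tf.saved_model.save(
    module,
    '/tmp/my_model/1',  # hypothetical export directory
    signatures={
        'score': module.score,
        'metadata': module.metadata,
    },
)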
Using the Serving:
I am not 100% sure how to access the serving (I haven't done it myself yet), but I think you'll be able to access it similarly to this:
import logging

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# 8500 is TensorFlow Serving's default gRPC port.
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = 'serving_default'
request.model_spec.version_label = self.version

tensor_proto = tf.make_tensor_proto(my_input_data, dtype=tf.float32)
request.inputs['my_signatures_input'].CopyFrom(tensor_proto)

try:
    response = stub.Predict(request, MAX_TIMEOUT)
except Exception as ex:
    logging.error(str(ex))
    return [None] * len(batch_of_texts)
Here I'm using gRPC to access the TensorFlow Serving server.
You'd probably need to substitute 'serving_default' with your signature name. Similarly, 'my_signatures_input' should match the input to your tf.function (in your case, I think it's empty).
This is a normal, standard Keras-type prediction, piggybacking on predict_pb2.PredictRequest. It might be necessary to create a custom protobuf, but that's a bit beyond my abilities at this point.
I hope it's enough to get you going.

Is ES native SQL parsed as ES DSL (JSON query)? How to get the parsing time?

I want to know whether Elasticsearch native SQL is parsed into Elasticsearch DSL (a JSON query), and how to get the parsing time.
I found a link about that:
An Introduction to Elasticsearch SQL with Practical Examples - Part 1
Under "Implementation Internals" there is a picture showing that the Elasticsearch SQL implementation consists of 4 execution phases.
I suppose the answer is "YES", but I cannot find a way to get the "parsing time", i.e. the time in which ES native SQL is parsed into ES DSL (a JSON query). I thought about it for a long time, and then wrote the following code.
My local environment:
- macOS Mojave Version 10.14.2
- MacBook Pro(Retina, 13-inch, Early 2015)
- Processor 2.7 GHz Intel Core i5
- Memory 8 GB 1867 MHz DDR3
- Elasticsearch 6.5.4
- python 2
Requirements:
- pip install elasticsearch
- pip install requests
"""
SQL Access (X-Pack)
https://www.elastic.co/guide/en/elasticsearch/reference/6.5/xpack-sql.html
"""
import random
import requests
import datetime
import elasticsearch
URL_PREFIX = 'http://localhost:9200'
def get_url(url):
return URL_PREFIX + url
def translate_sql(sql):
return requests.post(url=get_url('/_xpack/sql/translate/?pretty'),
json={'query': sql}).content
def search_with_query(query):
headers = {'Content-type': 'application/json'}
start_time = datetime.datetime.now()
result = requests.post(url=get_url('/_search?pretty'),
headers=headers,
data=query).content
return datetime.datetime.now() - start_time
def search_with_sql(sql):
# https://www.elastic.co/guide/en/elasticsearch/reference/6.6/sql-rest.html
query = '{"query":"' + sql.replace('\n', '') + '"}'
headers = {'Content-type': 'application/json'}
start_time = datetime.datetime.now()
result = requests.post(url=get_url('/_xpack/sql?format=txt&pretty'),
headers=headers,
data=query).content
return datetime.datetime.now() - start_time
def gen_data(number):
es = elasticsearch.Elasticsearch()
for i in range(number):
es.index(index='question_index', doc_type='question_type', body={
'a': random.randint(0, 10),
'b': random.randint(0, 10),
'c': random.randint(0, 10),
'd': random.randint(0, 10),
'e': random.randint(0, 10),
})
if i % 10000 == 0:
print i
if __name__ == '__main__':
# gen_data(100000)
sql = '''
select max(a) as max_a
from question_index
group by a
having max_a > 1
order by a limit 1
'''
json_query = translate_sql(sql)
sql_cost_time = search_with_sql(sql)
query_cost_time = search_with_query(json_query)
print 'sql :', sql_cost_time
print 'query:', query_cost_time
if query_cost_time < sql_cost_time:
print 'parsing time is:', sql_cost_time - query_cost_time, ' ???'
else:
print 'parsing time is :', query_cost_time - sql_cost_time, ' ???'
Actual results:
I don't know whether the code is right.
Expected:
I expect the code to be right.
Your answer may save my hair!
Thanks
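As a side note, here is a simpler sketch (my own addition, not from the original question) that times only the translate endpoint: since _xpack/sql/translate performs just the SQL-to-DSL translation without executing a search, its latency approximates the parsing time (it still includes HTTP round-trip overhead, so averaging over many calls helps):

import datetime
import requests

def time_translate(sql, n=100):
    # Time only the SQL -> DSL translation step (no search is executed),
    # averaged over n calls to smooth out HTTP overhead.
    start = datetime.datetime.now()
    for _ in range(n):
        requests.post('http://localhost:9200/_xpack/sql/translate',
                      json={'query': sql})
    elapsed = datetime.datetime.now() - start
    return elapsed.total_seconds() / n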

Is it possible to have SCIP and python-zibopt work under Windows?

Recently I wanted to try some open-source solvers instead of CPLEX. I found that PICOS + zibopt may be a good choice. However, I can hardly find any instructions on how to make zibopt work with Python under Windows. I downloaded the Windows libraries (.dll files) of SCIP, and I tried to install python-zibopt with the command "python setup.py install". The error "blockmemshell/memory.h: no such file" always popped up. I suspect this is because my compiler (set up via VS120COMNTOOLS) doesn't find the SCIP headers. Is there any chance I can make SCIP work under Windows now?
Did you have a look at the current Python interface of SCIP 3.1.0? It uses the library from the SCIP Optimization Suite, so you don't have to link another LP solver to SCIP.
On Windows, please try this modified setup.py file:
import sys, os, readline, glob, platform
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize

BASEDIR = os.path.dirname(os.path.abspath(__file__))
BASEDIR = os.path.dirname(BASEDIR)
BASEDIR = os.path.dirname(BASEDIR)
INCLUDEDIR = os.path.join(BASEDIR, 'src')
BASEDIR = os.path.dirname(BASEDIR)

# identify compiler version
prefix = "MSC v."
i = sys.version.find(prefix)
if i == -1:
    raise Exception('cannot determine compiler version')
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0

if platform.architecture()[0].find('64') >= 0:
    LIBDIR = os.path.join(BASEDIR, 'vc' + str(majorVersion), 'scip_spx', 'x64', 'Release')
else:
    LIBDIR = os.path.join(BASEDIR, 'vc' + str(majorVersion), 'scip_spx', 'Release')

print('BASEDIR=' + BASEDIR)
print('INCLUDEDIR=' + INCLUDEDIR)
print('LIBDIR=' + LIBDIR)


def complete(text, state):
    return (glob.glob(text + '*') + [None])[state]

readline.set_completer_delims(' \t\n;')
readline.parse_and_bind("tab: complete")
readline.set_completer(complete)

libscipopt = 'lib/libscipopt.so'
includescip = 'include/scip'

ext_modules = []
ext_modules += [Extension('pyscipopt.scip', [os.path.join('pyscipopt', 'scip.pyx')],
                          # extra_compile_args=['-g', '-O0', '-UNDEBUG'],
                          include_dirs=[INCLUDEDIR],
                          library_dirs=[LIBDIR],
                          # runtime_library_dirs=[os.path.abspath('lib')],
                          libraries=['spx', 'scip_spx'])]
                          # libraries=['scipopt', 'readline', 'z', 'gmp', 'ncurses', 'm'])]

setup(
    name='pyscipopt',
    version='0.1',
    description='wrapper for SCIP in Python',
    author='Zuse Institute Berlin',
    author_email='scip@zib.de',
    license='MIT',
    cmdclass={'build_ext': build_ext},
    ext_modules=ext_modules,
    packages=['pyscipopt'],
)
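Once the extension builds, a quick smoke test (my own sketch; it assumes a recent pyscipopt API, which may differ from the 0.1 wrapper above):

from pyscipopt import Model

# Minimal model to verify that the SCIP wrapper loads and solves on Windows.
model = Model("smoke_test")
x = model.addVar("x", vtype="INTEGER")
y = model.addVar("y", vtype="INTEGER")
model.addCons(40 * x + 30 * y >= 300)
model.setObjective(500 * x + 400 * y, "minimize")
model.optimize()
print("status:", model.getStatus())
print("x =", model.getVal(x), "y =", model.getVal(y))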