I'm using this code to scrape external HTML pages:
link = URI.parse(url)
request = Net::HTTP::Get.new(link.path)
response = Net::HTTP.start(link.host, link.port) { |http|
  http.request(request)
}
It works great, but with slow web pages it sometimes times out, so I need to set a timeout limit per connection. Any ideas?
You need to set the read_timeout attribute.
link = URI.parse(url)
request = Net::HTTP::Get.new(link.path)
begin
  response = Net::HTTP.start(link.host, link.port) { |http|
    http.read_timeout = 100 # default is 60 seconds
    http.request(request)
  }
rescue Net::ReadTimeout => e
  puts e.message
end
I have a Flask app that is functioning as expected, and I am now trying to add a message-notification section to my page. The difficulty I am having is that the database changes I am relying on do not seem to be updating in a timely fashion.
The HTML code is elementary:
<ul id="out" cols="85" rows="14">
</ul><br><br>
<script type="text/javascript">
var ul = document.getElementById("out");
var eventSource = new EventSource("/stream_game_channel");
eventSource.onmessage = function(e) {
ul.innerHTML += e.data + '<br>';
}
</script>
Here is the message-write code that the second user executes. I know this block runs, because the Redis trigger is properly invoked:
msg_join = Messages(game_id=game_id[0],
                    type="gameStart",
                    msg_from=current_user.username,
                    msg_to="Everyone",
                    message=f'{current_user.username} has requested to join.')
db.session.add(msg_join)
db.session.commit()
channel = str(game_id[0]).zfill(5) + 'startGame'
session['channel'] = channel
date_time = datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S")
redisChannel.set(channel, date_time)
Here is the Flask stream code, which is correctly triggered by the new Redis timestamp. But when I pull the list of messages, the new message that the second user added is not yet accessible:
@games.route('/stream_game_channel')
def stream_game_channel():
    @stream_with_context
    def eventStream():
        channel = session.get('channel')
        game_id = int(channel[:5])
        cnt = 0
        while cnt < 1000:
            print(f'cnt = 0 process running from: {current_user.username}')
            time.sleep(1)
            ntime = redisChannel.get(channel)
            if cnt == 0:
                msgs = db.session.query(Messages).filter(Messages.game_id == game_id)
                msg_list = [i.message for i in msgs]
                cnt += 1
                ltime = ntime
                lmsg_list = msg_list
                for i in msg_list:
                    yield "data: {}\n\n".format(i)
            elif ntime != ltime:
                print(f'cnt > 0 process running from: {current_user.username}')
                time.sleep(3)
                msgs = db.session.query(Messages).filter(Messages.game_id == game_id)
                msg_list = [i.message for i in msgs]
                new_messages = # need to write this code still
                ltime = ntime
                cnt += 1
                yield "data: {}\n\n".format(msg_list[len(msg_list) - len(lmsg_list)])
    return Response(eventStream(), mimetype="text/event-stream")
The problem I am running into is that msg_list stays exactly the same length (i.e., the newly pushed message is not returned when I expect it to be). Strangely, the second user's session does appear to see this information, because its stream correctly reflects the addition.
I am using an Amazon RDS MySQL database.
The solution was to call db.session.commit() before my db.session.query(Messages).filter(...), even when no writes were pending. This forced an immediate read of data committed from a different user session, and my code then reacted properly to the change in message-list length.
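A minimal sketch of that fix inside the polling loop (the names redisChannel, Messages, game_id, channel, ltime, and lmsg_list are taken from the snippets above; this is not the full stream code):

while True:
    time.sleep(1)
    ntime = redisChannel.get(channel)
    if ntime != ltime:
        # End this session's current transaction so the next query sees rows
        # committed by other sessions; otherwise MySQL's default REPEATABLE READ
        # isolation keeps serving the old snapshot to this long-lived session.
        db.session.commit()
        msgs = db.session.query(Messages).filter(Messages.game_id == game_id)
        msg_list = [m.message for m in msgs]
        for message in msg_list[len(lmsg_list):]:
            yield "data: {}\n\n".format(message)
        lmsg_list = msg_list
        ltime = ntime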
I am really new to the Airtable API, and for some reason connecting to the API this way did not work:
at = airtable.Airtable('Base_Key', 'Airtable_Key')
But I got it working this way -
get_url = 'https://api.airtable.com/v0/BASE_ID/TABLE_NAME'
get_headers = {
    'Authorization': 'Bearer API_KEY'
}
Response = requests.get(get_url, headers=get_headers)
Response_Table = Response.json()
However, this fetches only the first 100 records. I am reading about offset and pagination, but I am unable to figure out how to incorporate it into this code.
Thank you for your time!
After a lot of issues, I found this solution. Posting it for anyone else facing the same problem.
import requests

offset = '0'
result = []
url = "https://api.airtable.com/v0/BASE_ID/TABLE_NAME"

while True:
    querystring = {
        "view": "Published View",
        "api_key": "YOUR_KEY",
        "offset": offset
    }
    try:
        response = requests.get(url, params=querystring)
        response_table = response.json()
        records = list(response_table['records'])
        result.append(records)  # collect this page (up to 100 records)
        # print(records[0]['id'], len(records))
        try:
            # Airtable only includes an 'offset' key while more pages remain
            offset = response_table['offset']
            # print(offset)
        except KeyError:
            break  # no offset in the response, so the last page has been fetched
    except requests.exceptions.RequestException as e:
        print(e)
        break
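Since each iteration appends one page of records, the pages can be flattened once the loop finishes (a small usage sketch based on the result list built above):

# Flatten the per-page lists collected in result into a single list of record dicts
all_records = [record for page in result for record in page]
print(len(all_records))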
I want to reverse engineer the content that gets generated when scrolling down the webpage. The problem is that in the URL https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933, screwrand doesn't seem to follow any pattern, so reversing the URLs doesn't work. I'm considering automatic rendering with Splash. How do I make Splash scroll the page the way a browser does? Thanks a lot!
Here is the code for the two requests:
request1 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following/{}'.format(user_id),
    self.parse_follow_relationship,
    args={'wait': 2},
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')
yield request1

request2 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following_user/80159?user_id=80159&limit=0&per_page=20&screwrand=76',
    self.parse_tmp,
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')
yield request2
(AJAX request shown in the browser console.)
To scroll a page you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), something like this:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )

    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
To render this script, use the 'execute' endpoint instead of the render.html endpoint:
script = """<Lua script> """
scrapy_splash.SplashRequest(url, self.parse,
endpoint='execute',
args={'wait':2, 'lua_source': script}, ...)
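For completeness, here is a minimal spider sketch of how the script and the 'execute' endpoint fit together (the spider name, start URL, and parse callback are illustrative placeholders, not from the original post):

import scrapy
from scrapy_splash import SplashRequest

SCROLL_SCRIPT = """
-- paste the Lua scrolling script from above here
"""

class ScrollSpider(scrapy.Spider):
    name = 'scroll_demo'

    def start_requests(self):
        url = 'https://www.crowdfunder.com/user/following/80159'
        yield SplashRequest(url, self.parse,
                            endpoint='execute',
                            args={'wait': 2, 'lua_source': SCROLL_SCRIPT})

    def parse(self, response):
        # response.text is the HTML returned by splash:html() after scrolling finished
        self.logger.info('rendered page length: %d', len(response.text))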
Thanks Mikhail, I tried your scroll script and it worked. But I also noticed that it scrolls too far in a single step, so some JS has no time to render and gets skipped. So I made a small change, as follows:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )

    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        local height = get_body_height()
        for i = 1, 10 do
            scroll_to(0, height * i / 10)
            splash:wait(scroll_delay / 10)
        end
    end
    return splash:html()
end
I do not think hard-coding the number of scrolls is a good idea for infinite-scroll pages, so I modified the above code like this:
function main(splash, args)
    current_scroll = 0

    scroll_to = splash:jsfunc("window.scrollTo")
    get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )

    assert(splash:go(splash.args.url))
    splash:wait(3)

    height = get_body_height()
    while current_scroll < height do
        scroll_to(0, get_body_height())
        splash:wait(5)
        current_scroll = height
        height = get_body_height()
    end
    splash:set_viewport_full()
    return splash:html()
end
I am trying to make HTTP POST requests from within Illustrator ExtendScript (via BridgeTalk), and for the most part it is working. However, the documentation on using HttpConnection is non-existent, and I am trying to figure out how to set HTTP headers. The HttpConnection object has both a requestheaders and a responseheaders property, so I suspect it is possible.
By default, the POST requests are sent with the Content-Type header "text/html", and I would like to override it so that I can use either "application/x-www-form-urlencoded" or "multipart/form-data".
Here is what I have so far:
var http = function (callback) {
    var bt = new BridgeTalk();
    bt.target = 'bridge';
    var s = '';
    s += "if ( !ExternalObject.webaccesslib ) {\n";
    s += "    ExternalObject.webaccesslib = new ExternalObject('lib:webaccesslib');\n";
    s += "}\n";
    s += "var html = '';\n";
    s += "var http = new HttpConnection('http://requestb.in/1mo0r1z1');\n";
    s += "http.method = 'POST';\n";
    s += "http.requestheaders = 'Content-Type, application/x-www-form-urlencoded'\n";
    s += "http.request = 'abc=123&def=456';\n";
    s += "var c=0,t='';for(var i in http){t+=(i+':'+http[i]+'***');c++;}t='BEFORE('+c+'):'+t;alert(t);\n"; // Debug: list the properties and values on the http object before executing
    s += "http.response = html;\n";
    s += "http.execute();\n";
    s += "http.response;\n";
    s += "var t='AFTER:';for(var i in http){t+=(i+':'+http[i]+'***');}alert(t);\n"; // Debug: list the properties and values set after executing
    bt.body = s;
    bt.onResult = function (evt) {
        callback(evt);
    };
    bt.onError = function (evt) {
        callback(evt);
    };
    bt.send();
};
Things to note:
If I try setting the requestheaders property as in my code above, the request fails. If I comment that line out, the request succeeds. The default value of requestheaders is undefined.
Examining the http object after a successful request shows the responseheaders property set to: "Connection, keep-alive,Content-Length, 2,Content-Type, text/html; charset=utf-8,Date, Wed, 24 Jun 2015 09:45:40 GMT,Server, gunicorn/18.0,Sponsored-By, https://www.runscope.com,Via, 1.1 vegur". Before the request executes, responseheaders is undefined.
If anyone could help me set the request headers (in particular the Content-Type header), I would be eternally grateful!
Solved it!
The key for setting the content-type header is to set the http.mime property as follows:
s += "http.mime = 'application/x-www-form-urlencoded';\n";
Also for completeness, you can add your own custom headers as follows:
s += "http.requestheaders = ['My-Sample-Header', 'some-value'];\n";
(It turns out requestheaders is an array that takes the format [key1, value1, key2, value2, ...].)
I am having a problem authenticating a user for Google Tasks.
At first it authenticates the user and everything works perfectly, but on the second trip it throws an error:
Signet::AuthorizationError - Authorization failed. Server message:
{
"error" : "invalid_grant"
}:
following is the code:
def api_client code=""
  @client ||= (begin
    client = Google::APIClient.new
    client.authorization.client_id = settings.credentials["client_id"]
    client.authorization.client_secret = settings.credentials["client_secret"]
    client.authorization.scope = settings.credentials["scope"]
    client.authorization.access_token = "" # settings.credentials["access_token"]
    client.authorization.redirect_uri = to('/callbackfunction')
    client.authorization.code = code
    client
  end)
end
get '/callbackfunction' do
  code = params[:code]
  c = api_client code
  c.authorization.fetch_access_token!
  result = c.execute("tasks.tasklists.list", {"UserId" => "me"})
  unless result.response.status == 401
    p "#{JSON.parse(result.body)}"
  else
    redirect "/oauth2authorize"
  end
end

get '/oauth2authorize' do
  redirect api_client.authorization.authorization_uri.to_s, 303
end
What is the problem with performing the second request?
UPDATE:
This is the link and the parameters for user consent:
https://accounts.google.com/o/oauth2/auth?
access_type=offline&
approval_prompt=force&
client_id=somevalue&
redirect_uri=http://localhost:4567/oauth2callback&
response_type=code&
scope=https://www.googleapis.com/auth/tasks
The problem is fixed.
Solution:
In the callback function, the tokens received through the code provided by user consent are stored in the database.
Then, in other routes, just retrieve those tokens from the database and use them for whatever you want to do against the Google Tasks API.
get '/callbackfunction' do
  code = params[:code]
  c = api_client code
  c.authorization.fetch_access_token!
  # store the tokens in the database
end

get '/tasklists' do
  # retrieve the tokens from the database and create a client
  result = client.execute("tasks.tasklists.list", {"UserId" => "me"})
  unless result.response.status == 401
    p "#{JSON.parse(result.body)}"
  else
    redirect "/oauth2authorize"
  end
end
I am using Rails, and I store the token only in the DB.
Then, using a script, I set up a new client before calling execute; following is the code:
client = Google::APIClient.new(:application_name => 'my-app', :application_version => '1.0')
client.authorization.scope = 'https://www.googleapis.com/auth/analytics.readonly'
client.authorization.client_id = Settings.ga.app_key
client.authorization.client_secret = Settings.ga.app_secret
client.authorization.access_token = auth.token
client.authorization.refresh_token = true
client.authorization.update_token!({access_token: auth.token})
client.authorization.fetch_access_token!

if client.authorization.refresh_token && client.authorization.expired?
  client.authorization.fetch_access_token!
end

puts "Getting accounts list..."
result = client.execute(:api_method => analytics.management.accounts.list)
puts " ===========> #{result.inspect}"
items = JSON.parse(result.response.body)['items']
But it gives the same error you are facing:
/signet-0.4.5/lib/signet/oauth_2/client.rb:875:in `fetch_access_token': Authorization failed. Server message: (Signet::AuthorizationError)
{
"error" : "invalid_grant"
}
from /signet-0.4.5/lib/signet/oauth_2/client.rb:888:in `fetch_access_token!'
Please suggest why it is not able to use the given token. I have used OAuth2, so the user is already authorized; now I want to access the API and fetch the data...
===================UPDATE ===================
OK, there were two issues:
The permissions have to be added in devise.rb:
config.omniauth :google_oauth2, Settings.ga.app_key, Settings.ga.app_secret, {
  access_type: "offline",
  approval_prompt: "",
  :scope => "userinfo.email, userinfo.profile, plus.me, analytics.readonly"
}
The refresh_token must be passed to the client as well (not just set to true as in the code above); otherwise it is not able to authorize.
I hope this helps somebody facing a similar issue.