I don't want to use the Crawlera proxy service for pages that are already cached by the httpcache middleware (since I have a monthly limit on the number of calls).
I'm using the crawlera middleware, and enabling it using:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
as recommended in the documentation (https://scrapy-crawlera.readthedocs.io/en/latest/).
However, after a crawl ends, I get:
2017-04-23 00:14:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'crawlera/request': 11,
'crawlera/request/method/GET': 11,
'crawlera/response': 11,
'crawlera/response/status/200': 10,
'crawlera/response/status/301': 1,
'downloader/request_bytes': 3324,
'downloader/request_count': 11,
'downloader/request_method_count/GET': 11,
'downloader/response_bytes': 1352925,
'downloader/response_count': 11,
'downloader/response_status_count/200': 10,
'downloader/response_status_count/301': 1,
'dupefilter/filtered': 6,
'finish_reason': 'closespider_pagecount',
'finish_time': datetime.datetime(2017, 4, 22, 22, 14, 24, 839013),
'httpcache/hit': 11,
'log_count/DEBUG': 12,
'log_count/INFO': 9,
'request_depth_max': 1,
'response_received_count': 10,
'scheduler/dequeued': 10,
'scheduler/dequeued/memory': 10,
'scheduler/enqueued': 23,
'scheduler/enqueued/memory': 23,
'start_time': datetime.datetime(2017, 4, 22, 22, 14, 24, 317893)}
2017-04-23 00:14:24 [scrapy.core.engine] INFO: Spider closed (closespider_pagecount)
with
'downloader/request_count': 11
'crawlera/request/method/GET': 11
'httpcache/hit': 11
So I'm not sure whether these calls went through the Crawlera proxy service or not. I get the same results when I change the Crawlera middleware order to 901, 749, or 751.
Does anyone know what's going on under the hood? Are the pages returned directly from the HTTP cache without calling the Crawlera server, or not?
Thanks!
Consider the number as just an ordering relative to the other middlewares.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 600,
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
Just make sure httpcache.HttpCacheMiddleware has a lower number than the proxy middlewares.
This works fine for me.
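Applied to the Crawlera setup from the question, that ordering would be a sketch like the following (600/610 are just example priorities; only the relative order matters — Scrapy calls process_request in ascending priority order, so the cache middleware gets a chance to answer from cache before the Crawlera middleware runs):

```python
# settings.py -- the cache middleware must get the lower number (run first),
# so cached pages are served before the Crawlera middleware is reached.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 600,
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
```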
Related
New to Azure and testing Azure Queues. I attempted to send a message to the queue with the Python SDK. Here is the method I'm calling:
import os
from azure.storage.queue import QueueClient

connection_string = os.environ.get("connection_string")
queue_name = "test-queue"  # placeholder queue name
queue_client = QueueClient.from_connection_string(connection_string, queue_name)
msg_content = {"MessageID": "AQ2", "MessageContext": "This is a test Message"}
# set the visibility timeout to 10 seconds and time-to-live to 1 day (3600 minutes)
# The documentation seems to say it's an integer. Is it days, minutes, hours, seconds?
queue_client.send_message(msg_content, visibility_timeout=10, time_to_live=3600)
and the output I get as a response from the queue is
{'id': '90208a43-15d9-461e-a0ba-b12e02624d34',
'inserted_on': datetime.datetime(2020, 6, 9, 12, 17, 57, tzinfo=<FixedOffset 0.0>),
'expires_on': datetime.datetime(2020, 6, 9, 13, 17, 57, tzinfo=<FixedOffset 0.0>),
'dequeue_count': None,
'content': {'MessageID': 'AQ2',
'MessageContext': 'This is a test Message'},
'pop_receipt': '<hidingthistoavoidanydisclosures>',
'next_visible_on': datetime.datetime(2020, 6, 9, 12, 18, 7, tzinfo=<FixedOffset 0.0>)}
Now if you observe expires_on, it's clearly an hour from the insert date, which is fine. But for some reason the message instantly moved to the poison queue (which should normally happen after an hour, if the message is untouched). I don't see where I'm going wrong. How do I set the expiry time right, and why is the message instantly moving to the poison queue?
The time_to_live parameter is in seconds, which is why 3600 gave you an expiry exactly one hour after insertion.
Here's the documentation for queue send_message.
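So to keep the message for a full day, pass the TTL in seconds. A minimal sketch, assuming the same environment-variable connection string as in the question; the queue name and helper function are placeholders:

```python
import os

ONE_DAY_SECONDS = 24 * 60 * 60  # 86400 -- time_to_live is expressed in seconds

def send_with_one_day_ttl(message):
    # Hypothetical helper; needs a real connection string and an existing queue.
    from azure.storage.queue import QueueClient

    queue_client = QueueClient.from_connection_string(
        os.environ["connection_string"], "test-queue"  # placeholder queue name
    )
    return queue_client.send_message(
        message,
        visibility_timeout=10,        # hidden for 10 seconds after enqueue
        time_to_live=ONE_DAY_SECONDS, # expires one day after insertion
    )
```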
The SNMP agent implementation examples provided in pysnmp don't really leverage the mib.py file generated by compiling a MIB. Is it possible to use this file to simplify agent implementation? Is such an example available, for a table? Thanks!
You are right, the existing mibdump.py tool is primarily designed for manager-side MIB compilation. However, a compiled MIB is still useful, or sometimes even crucial, for agent implementation.
For simple scalars you can mass-replace MibScalar classes with MibScalarInstance ones. And add an extra trailing .0 to their OID. For example this line:
sysDescr = MibScalar((1, 3, 6, 1, 2, 1, 1, 1), DisplayString().subtype(subtypeSpec=ValueSizeConstraint(0, 255))).setMaxAccess("readonly")
would change like this:
sysDescr = MibScalarInstance((1, 3, 6, 1, 2, 1, 1, 1, 0), DisplayString().subtype(subtypeSpec=ValueSizeConstraint(0, 255))).setMaxAccess("readonly")
For SNMP tables it's trickier, because there can be several cases. If it's a static table that never changes its size, you can basically replace MibTableColumn with MibScalarInstance and append the index part of the OID. For example, this line:
sysORID = MibTableColumn((1, 3, 6, 1, 2, 1, 1, 9, 1, 2), ObjectIdentifier()).setMaxAccess("readonly")
would look like this (note index 12345):
sysORID = MibScalarInstance((1, 3, 6, 1, 2, 1, 1, 9, 1, 2, 12345), ObjectIdentifier()).setMaxAccess("readonly")
The rest of the MibTable* classes can be removed from the mib.py.
For dynamic tables that change their shape, either because the SNMP agent or the SNMP manager modifies them, you might need to preserve all the MibTable* classes and extend/customize the MibTableColumn class so that it actually manages your backend resources in response to SNMP calls.
A hopefully relevant example.
I want to get a message's view count, but I don't know which method I should use.
Here is the Telegram API. I have the channel ID and the message_id (I got them from my Telegram bot). I know the Telegram bot API doesn't have access to views, so I want to use the main Telegram API, but I don't know which method to use.
You can follow these steps:
Build the post link (https://t.me/channel_username/post_id)
example: https://t.me/tehrandb/93
Fetch the link with PHP, Python, or another language
Extract the value of the element with the tgme_widget_message_views class
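A minimal sketch of that extraction step in Python. The class name comes from the answer above; the exact surrounding markup (tag name, attribute order) is an assumption, so the regex is kept loose:

```python
import re

def extract_views(html):
    """Pull the view counter out of a t.me post page.

    Assumes the counter text sits inside an element whose class list
    contains 'tgme_widget_message_views'; the exact tag is an assumption.
    """
    match = re.search(
        r'class="[^"]*tgme_widget_message_views[^"]*"[^>]*>([^<]+)<', html
    )
    return match.group(1).strip() if match else None

# Example with a fabricated snippet of the embed page:
sample = '<span class="tgme_widget_message_views">6.2K</span>'
print(extract_views(sample))  # -> 6.2K
```

In practice you would fetch the page first (e.g. with urllib or requests) and pass the response body to this function.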
With Python and Telethon you can access a certain message,
and that message object has a views attribute:
m = Message(id=4864, to_id=PeerUser(user_id=818906659), date=datetime.datetime(2019, 6, 25, 4, 47, 57, tzinfo=datetime.timezone.utc), message=':reminder_ribbon:تک فیلم آموزش پروژه محور \n:man:\u200d:computer:پیاده سازی Responsive Menu \n:point_left: 0 تا 100\n:round_pushpin:با css & html\n#web\n#stepbysteplearn', out=False, mentioned=False, media_unread=False, silent=False, post=False, from_scheduled=False, legacy=False, from_id=818906659, fwd_from=MessageFwdHeader(date=datetime.datetime(2019, 6, 24, 20, 29, 53, tzinfo=datetime.timezone.utc), from_id=None, from_name=None, channel_id=1023032463, channel_post=11711, post_author=None, saved_from_peer=PeerChannel(channel_id=1023032463), saved_from_msg_id=11711), via_bot_id=None, reply_to_msg_id=None, media=MessageMediaDocument(document=Document(id=5803386688260540004, access_hash=5193338638774407914, file_reference=b'\x01\x00\x00\x13\x00]\x18l:\xb7\xd5\r&\xe8\xb5j\xa65*\xea\x01\xdc\xe2Py', date=datetime.datetime(2019, 6, 24, 20, 29, 52, tzinfo=datetime.timezone.utc), mime_type='video/mp4', size=16955767, dc_id=4, attributes=[DocumentAttributeVideo(duration=668, w=1280, h=720, round_message=False, supports_streaming=True), DocumentAttributeFilename(file_name='Responsive_Menu_With_Media_Queries.mp4')], thumbs=[PhotoStrippedSize(type='i', bytes=b'\x01\x16(\xc5\xa2\x8a(\x00\xa2\x8a(\x00\xa2\x8a(\x00\xa2\x8a(\x00\xa2\x8a(\x00\xa2\x8a(\x03'), PhotoSize(type='m', location=FileLocationToBeDeprecated(volume_id=455132553, local_id=24511), w=320, h=180, size=644)]), ttl_seconds=None), reply_markup=None, entities=[MessageEntityHashtag(offset=90, length=4), MessageEntityMention(offset=95, length=16)], views=6276, edit_date=None, post_author=None, grouped_id=None)
m.views will return that message's view count.
Full information about the Message object is in the Telegram documentation.
We noticed an error from the SoftLayer API when trying to get categories from product package 200 (hourly bare metal server), preset id=64, starting 10/18.
The following API query
https://<apiuser>:<apikey>@api.softlayer.com/rest/v3/SoftLayer_Product_Package/200/getActivePresets.json?objectMask=mask[id,packageId,description,name,keyName,isActive,categories.id,categories.name,categories.categoryCode]
now returns preset ids 103, 97, 93, 95, 99, 101, 105, 151, 147, 149, 143, 157.
It used to return the following additional active preset ids before 10/17/2016:
64, 66, 68, 70, 74, 76, 78.
I can't find these changes in the SoftLayer release notes:
https://softlayer.github.io/release_notes/
Why are the previously active preset ids 64, 66, 68, 70, 74, 76, 78 no longer available? Will they be added back?
Thanks.
You are right, these presets are no longer available since 10/17/2016, because the data centers are no longer building those configurations and have moved to the Haswell and Broadwell configurations.
For Haswell:
Presets: 93, 95, 97, 99, 101, 103, 105
For Broadwell:
Presets: 147, 149, 151, 153, 157.
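To check the currently active presets yourself, you can rebuild the REST URL from the question programmatically. A sketch; the helper function is hypothetical, and authentication (apiuser/apikey) would go in the HTTP request rather than the URL:

```python
import urllib.parse

BASE = "https://api.softlayer.com/rest/v3"

def active_presets_url(package_id, mask_properties):
    # Builds the getActivePresets endpoint used in the question,
    # e.g. for package 200 (hourly bare metal server).
    mask = "mask[" + ",".join(mask_properties) + "]"
    return "{}/SoftLayer_Product_Package/{}/getActivePresets.json?{}".format(
        BASE, package_id, urllib.parse.urlencode({"objectMask": mask})
    )

url = active_presets_url(200, ["id", "keyName", "isActive"])
```

Fetching that URL with basic auth (e.g. via requests) returns the current preset list as JSON.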
I am curious whether I could get a report of messages sent and received that includes timestamps and email addresses.
I looked at the Gmail API documentation and I did not see anything that directly mentioned anything like that.
Thank you.
Here's the relevant function, which may help: http://imapclient.readthedocs.org/en/latest/index.html#imapclient.IMAPClient.fetch
>>> c.fetch([3293, 3230], ['INTERNALDATE', 'FLAGS'])
{3230: {b'FLAGS': (b'\\Seen',),
        b'INTERNALDATE': datetime.datetime(2011, 1, 30, 13, 32, 9),
        b'SEQ': 84},
 3293: {b'FLAGS': (),
        b'INTERNALDATE': datetime.datetime(2011, 2, 24, 19, 30, 36),
        b'SEQ': 110}}
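Building on that response shape, here is a sketch of turning a fetch result into a simple report. The dict below just mimics the structure shown above; in practice it would come from IMAPClient.fetch, and requesting ENVELOPE as well would give you sender/recipient addresses:

```python
import datetime

# A fetch result shaped like the one above; normally returned by
# IMAPClient.fetch(uids, ['INTERNALDATE', 'FLAGS']).
fetched = {
    3230: {b'FLAGS': (b'\\Seen',),
           b'INTERNALDATE': datetime.datetime(2011, 1, 30, 13, 32, 9)},
    3293: {b'FLAGS': (),
           b'INTERNALDATE': datetime.datetime(2011, 2, 24, 19, 30, 36)},
}

def report(rows):
    # One line per message: uid, timestamp, read/unread status.
    lines = []
    for uid in sorted(rows):
        data = rows[uid]
        status = "read" if b'\\Seen' in data[b'FLAGS'] else "unread"
        lines.append("%d  %s  %s" % (uid, data[b'INTERNALDATE'].isoformat(), status))
    return lines

for line in report(fetched):
    print(line)
```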