Where is a TelnetConsole object constructed in Scrapy? - scrapy

I'm running Scrapy from scripts and found that logging stops working as expected from the point where Scrapy constructs a scrapy.extensions.telnet.TelnetConsole object. I searched the source files for the place where this object is constructed, but I couldn't find it.
Where does Scrapy construct a scrapy.extensions.telnet.TelnetConsole object when it is run from scripts?

TelnetConsole is a Scrapy extension that allows you to connect to Scrapy processes via telnet:
Telnet is an application protocol used on the Internet or local area network to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over the Transmission Control Protocol (TCP).
It lets you do many things, such as inspecting Python objects and even pausing/resuming the crawl.
See the extensive official documentation of the TelnetConsole extension for more.
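It is constructed in the extension-initialization step: when a Crawler is set up, Scrapy's ExtensionManager (scrapy.extension) instantiates every class listed in the EXTENSIONS_BASE and EXTENSIONS settings, and scrapy.extensions.telnet.TelnetConsole is enabled by default in EXTENSIONS_BASE.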
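To make that initialization step concrete, here is a simplified model of what an extension manager does. It is a sketch, not Scrapy's actual code: NotConfigured is a local stand-in for scrapy.exceptions.NotConfigured, and build_extensions and load_object are hypothetical helper names. What it models is that every enabled extension class is imported and built through its from_crawler classmethod, and an extension that raises NotConfigured is skipped.

```python
import importlib
from typing import Any, Dict, List, Optional


class NotConfigured(Exception):
    """Local stand-in for scrapy.exceptions.NotConfigured."""


def load_object(path: str) -> Any:
    """Import 'package.module.Name' and return the named object."""
    module_path, _, name = path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)


def build_extensions(base: Dict[str, Optional[int]],
                     user: Dict[str, Optional[int]],
                     crawler: Any) -> List[Any]:
    """Instantiate enabled extensions in ascending order value (simplified model)."""
    # User entries override the base dict; an order of None disables a component.
    merged = {**base, **user}
    enabled = sorted((order, path) for path, order in merged.items()
                     if order is not None)
    instances = []
    for _, path in enabled:
        cls = load_object(path)
        try:
            # Each extension builds itself from the crawler and its settings.
            instances.append(cls.from_crawler(crawler))
        except NotConfigured:
            # TelnetConsole raises NotConfigured when TELNETCONSOLE_ENABLED is False.
            continue
    return instances
```

In Scrapy itself the base dict corresponds to EXTENSIONS_BASE and the user dict to your EXTENSIONS setting. To see exactly where TelnetConsole is created in your Scrapy version, a breakpoint (or a traceback.print_stack() call) in TelnetConsole.__init__ will show the full call stack.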
To disable it, set the TELNETCONSOLE_ENABLED setting to False in your settings.py, or pass it when running your crawler:
scrapy crawl myspider -s TELNETCONSOLE_ENABLED=False
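Since the question is about running Scrapy from a script, here is a minimal sketch of passing the same setting to CrawlerProcess instead of using the command line. telnet_free_settings and run_without_telnet are hypothetical helper names, and spider_cls is assumed to be your own spider class.

```python
from typing import Any, Dict, Optional


def telnet_free_settings(extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a settings dict with the telnet console switched off."""
    settings = dict(extra or {})
    # Set last so a caller cannot accidentally re-enable it via `extra`.
    settings["TELNETCONSOLE_ENABLED"] = False
    return settings


def run_without_telnet(spider_cls, extra: Optional[Dict[str, Any]] = None) -> None:
    """Run one spider from a script with TelnetConsole disabled."""
    # Imported here so the helper above can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=telnet_free_settings(extra))
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes
```

Alternatively, mapping "scrapy.extensions.telnet.TelnetConsole" to None in the EXTENSIONS setting removes the extension from the list entirely, so it is never instantiated.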

Related

What is the best way to know which protocols are supported by curl in python?

I am building a python application that is used to download remote files. In most cases, we use pycurl to do the actual download but we need to define a class that wraps the pycurl object. The class can handle several protocols (HTTP(S), FTP(S) and SFTP).
We have noticed that on some distributions (for example Ubuntu 18.04), cURL doesn't support the SFTP protocol. So using some SFTP-related options (SSH_KNOWNHOSTS for instance) leads to crashes (the crash occurs when setting the option before the download even if the URL uses another protocol). Therefore we need to know which protocols are available when the class is defined (i.e. when importing the module).
What is the best way to know, in Python, which protocols are supported by cURL? I know that the output of pycurl.version_info() contains the supported protocols (item 8), but is there a better way?
pycurl does not track or check which protocols are supported by libcurl, so inspecting the protocol list returned by pycurl.version_info() is the way to find out.
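A minimal sketch of that check, assuming (as the question states) that item 8 of the pycurl.version_info() tuple is the protocol list. supported_protocols is a hypothetical helper name; it accepts an optional tuple so it can be exercised without pycurl installed, and it normalizes entries to lowercase str in case they come back as bytes.

```python
from typing import Optional, Sequence


def supported_protocols(version_info: Optional[Sequence] = None) -> frozenset:
    """Return the lowercase protocol names supported by the linked libcurl."""
    if version_info is None:
        import pycurl  # only needed when inspecting the real library
        version_info = pycurl.version_info()
    # Item 8 of the version_info() tuple is the protocol list.
    names = version_info[8]
    return frozenset(
        (p.decode() if isinstance(p, bytes) else p).lower() for p in names
    )
```

Because the wrapper class needs the answer at import time, it could compute something like SFTP_SUPPORTED = "sftp" in supported_protocols() once as a class attribute, and only set SSH-specific options such as pycurl.SSH_KNOWNHOSTS when that flag is True.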

How to figure out port information in mininet

I use Python to create a custom Mininet topology. The details of the topology are not important for this question.
I use Ryu as the controller, specifically the app "ofctl_rest.py". This controller does not install rules in the switch on its own; you have to issue REST commands to establish rules. In every REST request (rule) you have to specify an outgoing port, and to specify this port I need information about the topology of the network.
I need to know which link is connected to a port and which interface the port runs on. It would also help to know the foreign interface, foreign switch/host, and foreign port at the other end of the link. How can I retrieve this information?
Please help me. I am really frustrated right now, because I do not know how to figure it out.
Inside the mininet CLI you can use the net command to find out about the topology. The nodes command will show you a list of nodes.
You can also use the dump command to display the interface details.
For information on the 'hosts', such as they are, you can run normal Linux commands on each host, e.g.
mininet> h1 ifconfig
will run ifconfig on host h1, showing you some of the network configuration for that host.
Given that you seem to be running mininet from a custom script, you could start the CLI at the end of your script (if that's possible) e.g.
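from mininet.net import Mininet
from mininet.cli import CLI
net = Mininet(topo=your_topo)  # your_topo: an instance of your custom Topo class
net.start()
CLI(net)  # interactive prompt; returns when you type exit
net.stop()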
Otherwise, you can use the mininet python APIs to find much of the information.
The dump* functions in mininet.util (for example dumpNodeConnections and dumpNetConnections) will print out lots of information.
topo.links() will give you a list of the links in the topology.
topo.linkInfo(src, dst) might give you some extra information about a given link.
For flow information you can either run ovs-dpctl, ovs-ofctl etc. outside of mininet (in a normal shell), or run the equivalents without the ovs- prefix inside the mininet CLI.
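To answer the original question (which port carries which link, and what sits at the far end) from a started network, a sketch along these lines may help. It assumes Mininet's Link objects expose intf1 and intf2, that each interface has .name and .node, and that each node keeps a ports dict mapping interface objects to port numbers; check these attributes against your Mininet version. link_table is a hypothetical helper name.

```python
from typing import Any, List, Tuple

End = Tuple[str, str, int]  # (node name, interface name, port number)


def link_table(net: Any) -> List[Tuple[End, End]]:
    """List every link of a started Mininet network as (one end, other end)."""
    rows = []
    for link in net.links:
        ends = []
        for intf in (link.intf1, link.intf2):
            node = intf.node
            # node.ports maps an interface object to its port number on that node.
            ends.append((node.name, intf.name, node.ports[intf]))
        rows.append((ends[0], ends[1]))
    return rows
```

After net.start(), looping with for local, remote in link_table(net): print(local, "<->", remote) prints each port together with the foreign node, interface, and port it connects to.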

Launch webserver with no configuration file

I really like the idea of firing up an HTTP daemon to serve something like the current working directory in the file system without the distraction of configuring modules, directories, permissions etc. (instant gratification for programmers). My example use cases are:
prototyping some RESTful web services with a new UI framework;
providing a tutorial for users of some UI framework, with realistic but minimal end-to-end sample code;
experimenting with making an SVN or Git repository available over HTTP (no lectures about security or alternative protocols please);
making my personal files (photos, documents, ...) temporarily available over HTTP while I am out of town (particularly abroad, where all I would have is a plugin-less browser at an internet cafe).
Here's one I found from another thread:
python -m SimpleHTTPServer 8000
(That is Python 2; on Python 3 the equivalent is python3 -m http.server 8000.)
Is there an equivalent, ideally with Apache httpd? Lighttpd is a candidate too, but once you add prerequisites you lose adopters of the technology you are trying to teach (or learn yourself). The best tutorials are one-liners you can copy and paste to execute, then figure out how they work after seeing them in action.
I'm guessing the answer is "not directly", but could you use a heredoc in place of your httpd.conf file? It would be nicer if the popular binaries had direct command-line arguments.
This runs lighttpd in the foreground on port 8080, serving files from /www. Press Ctrl-C to make lighttpd exit:
printf 'server.document-root="/www" \n server.port=8080' | lighttpd -D -f -
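For comparison, Python 3's http.server module can do the same job from a small script, which is handy when a tutorial wants to pick the directory explicitly or embed the server. This sketch assumes Python 3.7+ (for ThreadingHTTPServer and the directory argument of SimpleHTTPRequestHandler); make_server and serve are hypothetical helper names.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000,
                host: str = "") -> http.server.ThreadingHTTPServer:
    """Build a threaded static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler accepts a directory keyword since Python 3.7.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    # host "" binds all interfaces, so other machines can reach the server.
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` until interrupted with Ctrl-C."""
    server = make_server(directory, port)
    print("Serving %s on port %d (Ctrl-C to stop)" % (directory, server.server_address[1]))
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve("/www", 8080) behaves much like the lighttpd one-liner above: it serves /www on port 8080 in the foreground until Ctrl-C.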

tcl shell through apache

I have a tool that supports interactive queries through a Tcl shell. I want to create a web application through which users can send different queries to the tool. I have done some basic programming using the Apache web server and CGI scripts, but I am unable to think of a way to keep the shell alive and send queries to it.
Some more information:
The tool builds a graph data structure; once it is built, users can query it for information using the Tcl shell, e.g. get all child nodes of a particular node. I cannot rebuild the data structure for every query because building it takes a lot of time. I want to build the data structure once and somehow keep the shell alive. The Apache server should send all queries to that shell and return the responses to the user.
You might want to create a daemon process, perhaps using expect, that spawns your interactive program. The daemon could listen for queries over TCP using Tcl's socket command, and your CGI program would create a client socket to talk to the daemon.
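A minimal sketch of such a daemon, written in Python rather than expect/Tcl. It assumes the tool reads one command per line from stdin, writes its answers to stdout, and accepts an extra command that prints a unique end marker, so the daemon knows where each answer stops (for a Tcl shell something like puts __END__; flush stdout might serve). ShellBridge, make_daemon, the marker, and port 9999 are all hypothetical.

```python
import socketserver
import subprocess
import threading


class ShellBridge:
    """Keeps one interactive tool process alive and feeds it one query at a time."""

    def __init__(self, argv, end_command, end_marker):
        # Text-mode pipes; bufsize=1 makes our side of stdin line-buffered.
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True, bufsize=1)
        self.end_command = end_command  # a command that makes the tool print end_marker
        self.end_marker = end_marker
        self.lock = threading.Lock()    # the tool's stdin/stdout are shared state

    def query(self, text):
        """Send `text`, then collect output lines until the end marker appears."""
        with self.lock:
            self.proc.stdin.write(text.rstrip("\n") + "\n" + self.end_command + "\n")
            self.proc.stdin.flush()
            lines = []
            while True:
                line = self.proc.stdout.readline()
                if not line:  # EOF: the tool exited
                    raise RuntimeError("tool process exited unexpectedly")
                line = line.rstrip("\n")
                if line == self.end_marker:
                    return "\n".join(lines)
                lines.append(line)

    def close(self):
        self.proc.stdin.close()  # EOF tells the tool to exit
        self.proc.wait()
        self.proc.stdout.close()


def make_daemon(bridge, host="127.0.0.1", port=9999):
    """TCP front end: a client sends one query line and receives the answer."""

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            query = self.rfile.readline().decode().rstrip("\r\n")
            answer = bridge.query(query)
            self.wfile.write((answer + "\n").encode())

    return socketserver.ThreadingTCPServer((host, port), Handler)
```

The tool (and its expensive graph) is started once; each CGI request then only opens a socket, sends a line, and reads until the connection closes. The lock serializes queries because a single shell can only answer one at a time.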
I'd embed the graph-managing program into an interpreter that also runs a small web server (e.g., tclhttpd, though that's not the only option) and have the rest of the world interact with the graph through RESTful web requests. This could then be integrated behind Apache in any way you like: a CGI thunk would work, or you could do request forwarding, or you could write some server-side code to do it (there are many options there), or you could even let clients connect directly.
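The same architecture, sketched in Python instead of Tcl: one long-lived process owns the already-built graph and answers HTTP requests, so the expensive build happens only once. The /children/<node> route, the adjacency-dict representation, and make_graph_server are hypothetical; in practice the handler would call into your tool's own query functions.

```python
import http.server
import json
from urllib.parse import unquote


def make_graph_server(graph, host="127.0.0.1", port=8080):
    """Serve GET /children/<node> from an already-built adjacency dict."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            prefix = "/children/"
            if not self.path.startswith(prefix):
                self.send_error(404, "unknown route")
                return
            node = unquote(self.path[len(prefix):])
            if node not in graph:
                self.send_error(404, "unknown node")
                return
            # Answer from the in-memory graph; nothing is rebuilt per request.
            body = json.dumps(sorted(graph[node])).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return http.server.ThreadingHTTPServer((host, port), Handler)
```

Apache can then proxy a URL prefix to this process, or a CGI thunk can forward requests to it.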
The question appears to be incomplete, as you did not specify what exactly "interactive" means with regard to your tool.
How does it support interactive queries? Does it call gets in a kind of endless loop and process each line as it's read? If so, the solution to your problem is simple: the Tcl shell does not really care whether its standard input is connected to an interactive terminal or not. So just spawn your tool in your CGI request-handling code, write the user's query to that process's stdin stream, flush and close it (so the tool sees end-of-file and exits), and then read all the text written by that process to its stdout and stderr streams. Then send them back to the browser. How exactly to spawn the process and communicate with it via its standard streams depends heavily on your CGI code.
If you don't get the idea, try writing your query to a file and then doing something like
$ tclsh /path/to/your/tool/script.tcl </path/to/the/query.file
and the tool should respond as usual.
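In a Python CGI handler, that spawn-write-read cycle might look like the following sketch. run_query is a hypothetical name, and the tclsh command line in the comment reuses the placeholder path from the example above.

```python
import subprocess
from typing import Sequence, Tuple


def run_query(argv: Sequence[str], query: str, timeout: float = 30.0) -> Tuple[str, str]:
    """Start the tool, feed `query` on stdin, return (stdout, stderr) once it exits."""
    # subprocess.run writes `query` and then closes stdin, so the tool sees
    # end-of-file, finishes its read loop, and exits; both streams are captured.
    result = subprocess.run(list(argv), input=query, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout, result.stderr


# Example (the path is a placeholder):
# out, err = run_query(["tclsh", "/path/to/your/tool/script.tcl"], "some query\n")
```

Note that this starts a fresh tool process per request, so it pays the full build cost every time; for a tool that takes long to build its data structure, a long-lived process is the better fit.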
If your tool carries out the interaction in some other way, you probably have to split it into "core" and "front-end" parts, so that the core just reads queries and outputs results while the front end handles the interaction. Then hook the core up to your CGI processing code as outlined above.

Using a CGI binary in an application

How would an application interface with a CGI binary? For example, say my app is running on Windows and wants to invoke the PHP.exe binary in the context of a web server. How would I do this? By passing some sort of command line arguments? I am using C++, if that helps.
NOTE - I know how to invoke processes and such. I just want to know how CGI fits into this.
The CGI interface works as follows: when a request comes in, the web server starts a process. The HTTP request body (e.g. POST data) is supplied to the process on stdin, and the process writes the response headers, a blank line, and then the generated content to stdout. Server variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, etc.) are passed via environment variables.
Now, your question is not clear enough. Do you have a CGI-compliant local binary that you want to invoke from a program? Or do you want to invoke a CGI-compliant binary somewhere on a Web server?
In the former case, use the regular means of process creation (CreateProcess, fork/exec) with the I/O pipes redirected. In the latter case, use an HTTP client library (curl, WinInet) to issue an HTTP request.
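Here is a sketch of the local-binary case in Python rather than C++ (the mechanics carry over to CreateProcess or fork/exec with redirected pipes): request metadata goes into environment variables, the body goes to stdin, and the program's stdout is split into headers and content at the first blank line. call_cgi is a hypothetical helper, and it sets only a few of the CGI/1.1 variables; real CGI programs may expect more.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence, Tuple


def call_cgi(argv: Sequence[str], method: str = "GET", query_string: str = "",
             body: bytes = b"", content_type: str = "",
             extra_env: Optional[Mapping[str, str]] = None) -> Tuple[Dict[str, str], bytes]:
    """Run a CGI program once, as a web server would; return (headers, content)."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
    })
    if body:  # only describe a body when there is one
        env["CONTENT_LENGTH"] = str(len(body))
        if content_type:
            env["CONTENT_TYPE"] = content_type
    env.update(extra_env or {})
    # Request body on stdin; headers, a blank line, and content come back on stdout.
    out = subprocess.run(list(argv), input=body, stdout=subprocess.PIPE,
                         env=env, check=True).stdout
    # The header block ends at the first empty line (CRLF or bare LF style).
    found = [(out.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n") if sep in out]
    if not found:
        raise ValueError("CGI output has no header/body separator")
    idx, sep = min(found)
    headers = {}
    for line in out[:idx].decode("latin-1").splitlines():
        if line:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
    return headers, out[idx + len(sep):]
```

The parsed headers (for example Content-Type or Status) can then be turned into a real HTTP response by the calling application.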