Install Scrapy on PythonAnywhere? (or Cloud9) - scrapy

Can I run Scrapy on the free level of PythonAnywhere? I've looked, but haven't found instructions for installing it there.
If it can't be run on the free level of PythonAnywhere, is there another online environment where I can run Scrapy without needing to install Python and Scrapy on my computer?
EDIT: My question was just about PythonAnywhere, but in finding the answer to the question, I came across Cloud9 and found it to be a preferable alternative, which is explained in the answer.

Short summary:
Scrapy comes preinstalled on PythonAnywhere. No installation required.
I found an alternative that I like better: Cloud9. I was able to install Scrapy on it, but with a security issue that probably won't be a problem for me.
====================================
There were three parts to my question:
Can I run Scrapy in the free level of PythonAnywhere? This part has been answered: Yes, but with debilitating restrictions.
The other two parts have not been answered, but I've found some answers and will share them here.
What other online environments allow me to run Scrapy without needing to install Python and Scrapy on my computer? I haven't found a direct answer to this, but the free tutorial website, Python for Everybody ("Py4E"), has a page, Setting up your Python Development Environment, which lists four online Python environments. It provides a brief tutorial on PythonAnywhere and then just provides links to the other three: Trinket, Cloud9, and CodeAnywhere.
None of those four environments say anything about running Scrapy on them. With some more research, I did find out how to use Scrapy in PythonAnywhere, which I explain next below. Of the other three, Cloud9 is part of Amazon's AWS suite, which is a sophisticated set of software tools that I've used other parts of before. Based on that, I assumed it also accommodates Scrapy, and I checked it out as well. I've added the results of that below as a new part 4 to my question.
Now, the main part of my question: How to install Scrapy on PythonAnywhere? The answer is:
You don't. It's already installed!
It's amazing that PythonAnywhere's otherwise excellent documentation doesn't say anything about this. I found it out by following instructions that I hoped would lead me to installing Scrapy:
First, since I'm new to Python (but not to programming), I ran through Py4E's tutorial on PythonAnywhere, which is really a quick introduction to Python. It got me to write a simple program, told me to use the Bash Unix shell instead of the Python interpreter ("$" instead of ">>>"), and had me save a simple program to a file.
Next, I went to Scrapy's installation instructions. It has this wonderful line: "... if you’re already familiar with installation of Python packages, you can install Scrapy and its dependencies from PyPI with: pip install Scrapy". Of course, it doesn't follow that by saying what to do if I'm not familiar with that. Sigh!
After that, I somehow found my way to Python's official instructions on Installing Packages, which starts by explaining that "package" means "a bundle of software to be installed", so I thought that might include Scrapy. So I ran through the instructions there, and about half-way through, it told me to run:
python3 -m pip install "SomeProject"
(* Footnote below on syntax of that command)
The instructions said that "SomeProject" is supposed to be a project that's included in the Python Package Index, so I went there and searched for Scrapy. It gave me a list of 681 projects with "scrapy" in the name, and some of them looked like they might be various versions of Scrapy itself. None of them were called just "Scrapy", but the Scrapy instruction quoted above said to use just that name. So I held my breath and entered:
python3 -m pip install Scrapy
And guess what I got? PythonAnywhere told me:
Requirement already satisfied: Scrapy in /usr/local/lib/python3.9/site-packages (2.5.0)
That was followed by a couple of dozen more lines that all started with "Requirement already satisfied", which I took to be the dependencies required by Scrapy, all of them already present and ready to roll.
So, hmmm, Scrapy is already there? To find out if that's really true, I went to the tutorial on Scrapy's website. The first thing it said was to create a project by using the command:
scrapy startproject tutorial
I entered that, and PythonAnywhere told me that it had successfully created a new project. Since this was a Scrapy command, I conclude that, yes, indeed, I already have Scrapy installed and running on PythonAnywhere. No installation necessary!
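The same "is it already installed?" check can also be made from inside Python rather than through pip. The following is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library); the helper name installed_version is made up for illustration and is not part of Scrapy or PythonAnywhere.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Raised when no distribution with this name is visible to the interpreter.
        return None


if __name__ == "__main__":
    version = installed_version("Scrapy")
    if version is None:
        print("Scrapy is not installed for this interpreter")
    else:
        print("Scrapy", version, "is installed")
```

Because the lookup is done by the interpreter that runs the script, it reports on the same Python that python3 -m pip would install into.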
What about Cloud9? As I said above in my answer to part 2, when I found out about Cloud9, I was interested because it's part of Amazon Web Services ("AWS"). I've used other parts of AWS before and found them to be sophisticated, complicated, powerful, and well-documented. They are also very economical.
AWS is a commercial system run by Amazon. It charges fees based on usage, with no minimums, and with low-volume usage being free. The pricing page for Cloud9 shows it to be no exception. Cloud9 itself is free to use, but using it calls on other AWS resources that have charges.
The pricing page gives the following example: "If you use the default settings running an IDE for 4 hours per day for 20 days in a month with a 30-minute auto-hibernation setting your monthly charges for 90 hours of usage would be ... $2.05". That's less than half the lowest monthly cost of PythonAnywhere. (As stated in the answer by Giles Thomas, the free level of PythonAnywhere is not very useful for Scrapy.) I'm not sure how the amount of usage in the Cloud9 example compares with the amount of usage allowed by PythonAnywhere's $5/mo service, but my usage is going to be a lot less than either one, so I expect my cost of using Cloud9 to be very low, and possibly nothing. Furthermore, if I only use Scrapy for a project a couple of times a year, with PythonAnywhere, I'd have to close my account in between projects to stop being charged, but AWS doesn't charge me when I'm not using it, so I can keep the account with no cost between projects.
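The arithmetic behind the quoted example can be sketched as follows. This assumes that the 30-minute auto-hibernation window is billed once per day of use, which is one reading of the quoted example rather than something the quote spells out; the function name billed_hours is hypothetical.

```python
def billed_hours(hours_per_day, days_per_month, hibernate_minutes):
    """Hours billed per month, counting one idle period before hibernation per day."""
    active = hours_per_day * days_per_month
    idle = days_per_month * hibernate_minutes / 60
    return active + idle


# Figures taken from the quoted pricing example.
print(billed_hours(4, 20, 30))  # 90.0

# Ratio of the quoted Cloud9 example cost to PythonAnywhere's $5/mo service.
cloud9_example = 2.05
pythonanywhere_paid = 5.00
print(round(cloud9_example / pythonanywhere_paid, 2))  # 0.41
```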
So based on both the quality of the AWS modules I've used and the low usage cost, I was very interested in Cloud9 as an alternative.
And I was not surprised to find that I could use Scrapy in it.
To figure that out, I quickly abandoned the webpage instructions in favor of downloading a pdf of the comprehensive User Guide from the documentation page. Comprehensive = 595 pages! But it's very well organized and cross-referenced, so I was able to learn what I needed by reading about 20 pages, which included a tutorial on using the GUI environment (pg 29..38) and another on using Python in Cloud9 (pg 423..7).
In that second tutorial, I ran:
python3 --version to find out that Python was already installed, version 3.7.10.
python -m pip --version to find out that pip version 20.2.2 is running.
After that tutorial, I was ready to find out if Scrapy is there. I had learned by then about pip show, so I ran:
python -m pip show Scrapy
The answer was no:
WARNING: Package(s) not found: Scrapy
So I repeated the command that I'd done earlier in PythonAnywhere:
python3 -m pip install Scrapy
This time, there were very few "Requirement already satisfied"s and instead there were a lot of "Collecting ... Downloading"s, followed by "Installing collected packages" and then "Successfully installed" with a long list that included Scrapy-2.6.1.
I repeated python -m pip show Scrapy and got several lines of output that told me Scrapy 2.6.1 is installed. Finally, I ran the same test I'd run before in PythonAnywhere, the first instruction in the official Scrapy tutorial:
scrapy startproject tutorial
and got the same output as before, telling me that the project had been created.
Bingo! I have Scrapy running in Cloud9.
On the negative side, there was a problem here. AWS has two levels of sign-in authority, called root users and IAM. For proper security, I should be running Cloud9 as an IAM user, but there was a problem being able to sign in that way. I posted a question on SO about that, but while waiting for an answer, went ahead and started using Cloud9 as the root user. In the course of that, I got the message:
WARNING: Running pip install with root privileges is generally not a good idea.
That warning came with a suggestion of an alternative command that didn't make sense and didn't work when I tried it. So I'm not sure how much I've messed up the security of my AWS account by what I've been doing here. My work is not secretive, so the security may be a non-issue, but I'd still like to figure out how to proceed as an IAM user and clean up any damage I might have caused by what I've been doing as the root user. If anyone knows about that, please respond to the SO question about it linked in the previous paragraph.
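One common way to avoid a system-wide install is pip's --user option, which installs into the current user's own site-packages directory. Note that the pip warning concerns operating-system privileges on the machine running the command, which is a different notion from the AWS account root user. The sketch below only builds and prints such a command; it is not claimed to be the alternative that the warning suggested, and the helper name user_install_command is hypothetical.

```python
import sys


def user_install_command(package_name):
    """Build a pip command that installs into the current user's site-packages."""
    # sys.executable makes pip run under the same interpreter as this script.
    return [sys.executable, "-m", "pip", "install", "--user", package_name]


if __name__ == "__main__":
    # Printing instead of running keeps the sketch free of side effects;
    # pass the list to subprocess.run(...) to actually perform the install.
    print(" ".join(user_install_command("Scrapy")))
```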
So now I've got Scrapy running in Cloud9, and I'm going to go find out if it can get the data I need. I'll make another edit here if there are any surprises in terms of Cloud9 either (a) not being able to do something or (b) resulting in unexpected charges.
====================================
(*) Footnote on syntax of python3 -m pip install "SomeProject":
Since I was working in something called PythonAnywhere, I was tempted to think that this was a Python command. But then I had to remember that, within PythonAnywhere, I was working in Bash, a Unix shell. So python3 is a command run from the Unix shell. I haven't found documentation of that exact command, but I did find documentation of the command it's presumably based on, python. That documentation says, "-m module-name Searches ... for the named module and runs the corresponding .py file as a script." So this means that pip is a module written in Python for installing Python packages, which the python3 command runs as a script. Then install <project name> is passed as arguments to the pip module. (Somebody please correct me if I've said any of that wrong.)
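The -m behaviour quoted in the footnote can be inspected from Python itself. This is a rough sketch, not the interpreter's actual implementation: it uses importlib.util.find_spec to locate the file that python3 -m <name> would run, and the helper name script_for_dash_m is invented here. For a package such as pip, -m runs the package's __main__ submodule.

```python
import importlib.util


def script_for_dash_m(module_name):
    """Return the file that `python3 -m module_name` would execute, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    if spec.submodule_search_locations is not None:
        # A package: -m runs its __main__ submodule, if it has one.
        main_spec = importlib.util.find_spec(module_name + ".__main__")
        return main_spec.origin if main_spec is not None else None
    # A plain module: -m runs the module's own file.
    return spec.origin


if __name__ == "__main__":
    print(script_for_dash_m("timeit"))  # a standard-library module
    print(script_for_dash_m("pip"))     # None if pip is not installed
```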

You can, but free accounts on PythonAnywhere are limited to accessing sites on a whitelist of official public APIs, so you will probably not be able to access non-API sites.

Related

Getting Started with Xero-Python SDK…nightmare

I’m a newbie to using SDKs, using Jupyter Notebooks to play around.
I have pip installed xero-python per Xero’s github page below:
https://github.com/XeroAPI/xero-python
I saved the repository to my hard drive, opened a Jupyter Notebook within the repository's master folder, and copied the code from the configuration section, but I get the error “no module called logging.settings found” (referring to the parameter passed into dictConfig).
Could someone help me get this up and running? I’m sure there will be a cascade of other errors after sorting this one.
Also, I’d be really grateful if someone could point me to some resources on how to work with SDKs, I’m used to just pip installing a library and just getting familiar with the objects/methods in that library.
Thank you!!!

Cannot import/use packages in Python

I'm aware that this question has been asked before, but after doing quite a bit of research, I find myself stuck with my problem.
I'm currently comfortable in R, and would like to gradually learn Python basics. I've installed Python from the official website, and opted for the Spyder IDE. I ran a few basic Python commands such as printing hello. So far so good. Now I'd like to learn by working with datasets.
My understanding is that you need the Pandas package for that (or it is at least recommended). And here start the many, many issues. I read that you need pip to install packages. To my surprise, I had to install that through the Windows command prompt. After many failed attempts, I managed, and later even managed to install the packages. However, when I go back to the Spyder IDE and try to run the generic commands, it always returns invalid syntax to me.
I'm happy to share screenshots, or more details if the problem isn't clear enough. I expected to find a steep learning curve at the very beginning, but not to struggle with packages.
Any help greatly appreciated

Bugzilla install on Windows 7 with Apache & Mysql

Environment:
Windows 7 Pro 64 bit.
Apache 2.2.25
ActivePerl 5.18.2 Build 1802 (64 bit)
Bugzilla 4.4.5 (extracted)
Okay, I have followed step-by-step instructions as indicated here: https://wiki.mozilla.org/Bugzilla:Win32Install
There's 1 major issue that I've researched all day and cannot find a solution. It's the "ppm install DBD-mysql" issue. There is no "DBD-mysql". When I use the ppm gui, there's a DBD-mysqlpp , but that's not what's needed.
When I run in a command window: C:\bugzilla\checksetup.pl , I get the only error:
Checking for DBD-mysql (v4.001) not found
For MySQL, Bugzilla requires that perl's DBD::mysql 4.001 or later be
installed. To install this module, run the following command (as Administrator):
ppm install DBD-mysql
Well, I found DBD-mysql 4.028 here: http://search.cpan.org/~capttofu/DBD-mysql-4.028
However, I tried to get the Perl ppm (Perl Package Manager) GUI to see it, but even after pointing to the link in the GUI preferences, it still wouldn't see it to download and install it. This sucks.
So, I manually downloaded it and saw some crap "makefile.pl" file. Look, I use WINDOWS not LINUX, so naturally I rolled my eyes but it quickly became clear that I HAD to do this. Well, after about an hour of screwing around I was finally able to make a mysql_config.pl file. Oh wow, great.... now what!? What do I do next with this??
Not only is this part aggravating, but now I need mod_perl in my Apache. Okay, so I look in my .conf file where I can uncomment:
#LoadModule perl_module modules/mod_perl.so
Guess what... it's not there. (facepalm)
So I researched and downloaded mod_perl 2.0.8 from: http://perl.apache.org/download/index.html
Heh, ... guess what... another damn "makefile" file!! Let me tell you, this is as bad as being a MAC user in the late 90's and trying to find a program to be compatible.
So, my 2 main questions are these:
How do I get DBD-mysql properly installed
How do I get the mod_perl properly installed
I have been working 12 hours non-stop trying to get Bugzilla installed. At this point, I'm so frustrated beyond belief I'm ready to scrap this and install flyspray or some other php/mysql bug tracking system.
For future reference in case anyone stumbles upon this item:
Most Perl modules have installation instructions, either in their .pod documentation or in a README or INSTALL file.
Most Perl modules follow a similar installation mechanism: create a makefile; run make; make test (which runs a set of tests, unsurprisingly), and then install the module. From the commandline, this looks like the following:
$ perl Makefile.PL
$ make
$ make test
$ make install
Both mod_perl and DBD::mysql have platform-specific installation information in the release notes.
DBD-mysql is a package that you can find in ActiveState Package Repository. That means that it should be installed at the time you installed ActivePerl, making your question really strange. I know it because I just did it yesterday, and everything worked just fine.
A possible solution for this issue is:
Open the command line in Windows;
Type ppm, press enter and wait. A GUI for the Perl Package Manager will be displayed;
Search for DBD-mysql. If you find it, schedule it for installation by right-clicking it and clicking on Install DBD-mysql 4.022;
Although, you probably won't find it. If you don't, press Ctrl+P and, in the dialog that appears, select the Repositories tab. Add a new repository:
Name: ActiveState-Package-Repository
Location: http://ppm4.activestate.com/MSWin32-x64/5.16/1604/package.xml
It will update the list of packages. Repeat steps 1 to 3.
If you prefer doing it by the command line, these are the instructions:
ppm repo add ActiveState-Package-Repository http://ppm4.activestate.com/MSWin32-x64/5.16/1604/package.xml
ppm install DBD-mysql

Cygwin & OCaml: OPAM + Batteries

I extensively use Cygwin on a Windows 8 environment (I do not want to go ahead and boot/load Linux directly on the machine). I use the OCamlIDE plug-in for Eclipse and have experienced relatively no problems using this workflow setup.
However, I would like to use Batteries so that I may make use of its dynamic arrays among a few other interesting features that will speed up my development process.
I have tried this method: http://ocaml.org/install.html, but I get the following error:
$ sh ./opam_installer.sh /usr/local/bin
No file yet for i686:CYGWIN_NT-6.2-WOW64
What am I missing and how would I configure Cygwin so that it can accept the Opam installer? When I tried yet a different way of building Opam, I got:
'i686-w64-mingw32-gcc' is not recognized as an internal or external command,
as a Makefile error and reason for building failure. It seems something is wrong related to mingw32-gcc. What do I need to install and/or configure for my Cygwin to get it to compile/build things properly? I have wget and curl installed as well.
My overall question: What is the best way to get Batteries installed on my system with the minimum of time spent tracing all of its dependencies by hand? Is there a way I can just build the library module, such as BatDynArray and the includes:
include BatEnum.Enumerable
include BatInterfaces.Mappable
That way I can just call them directly in my code with open...;; and/or include...;;;
OCaml works beautifully on Windows with WODI, which is a Cygwin-based distribution that includes Batteries and tons of other useful packages (which are a pain to install manually on Windows).
I urge you to take a shot at WODI, which I believe to be an indispensable tool for the rest of us, the forgotten souls, who have to deal with Windows.
First of all, include does not do what you think it does. open Batteries should be exactly what you're looking for. OPAM is not yet solid on windows (maybe Thomas could give an update on where things stand).
Frankly, I would recommend installing Linux on a VM; you should be able to get started with OPAM instantly then. Otherwise, take a look at this package manager for OCaml which focuses on cross platform support: http://yypkg.forge.ocamlcore.org/. I've never tried it myself however. The last package manager you could try is GODI; I'm not sure about its Windows support though.
Finally, if none of these options work then it should be possible to install batteries from the source. All you need is OCaml and make. And if there are problems with this approach then you should definitely follow up on them either here or on the bug tracker because batteries does intend to support windows AFAIK.

AIX apache rpm dependencies

I am evaluating the Crowd SSO by Atlassian. Now to get apache to use Crowd for authentication, there is a connector available by the vendor.
Problem
Unfortunately they do not provide anything for my OS (AIX). Instead they provide source code with instructions. Now the example here uses yum -y install autoconf automake gcc httpd-devel libcurl-devel libtool libxml2-devel mod_dav_svn subversion-devel to download the required packages for which there is no alternate in AIX (AFAIK).
So I went to the AIX toolbox and got some packages. For the rest, I took Mr Perzl's help. And while installing the rpms, I ended up getting dependency errors.
Question
Do I go with
The solution given here dependency hell.
IBM way
Something else which Google and my limited exposure to AIX are not telling me.
I am not a *nix expert, rather at basic user level. And any installations are actually done by the admins. I need expert advice so as to get it right and efficiently if possible.
Appreciate if someone would like to retag this question for getting attention from the right people.
It has been a while since I struggled with AIX and Linux, and I have had success with the Crowd Connector on Linux. So, having taken a look at both links, I would say:
The IBM documentation is only for the packages supplied with their Toolbox and there is a risk that if you use it for other things, you may end up with a dead-end as the utilities may refuse to play ball.
With Mr. Perzl's way, you are building it brick by brick, with known certainty. The main risk is that the right versions may not be available and/or one of the build tools may not work. In that case, you may still have to tweak the source and/or the build/make files to compile properly, but it will eventually work.
Once you have a compiled plugin and it works with a certain version of Apache, you will not need many of the dependencies, so the instructions you give to the admins to deploy will be minimal. Most likely, the runtime dependencies will be mod_dav_svn, curl and libxml.
Please post an update when you get it working.