SSH timeout when running importDump.php on a Bitnami MediaWiki instance on a Google Cloud server

The import seems to start out fine, echoing the wiki's contents in the terminal window. At some point (often around the same place in the content), the SSH terminal freezes up and the Opera browser returns an 'out of memory' message.
Two questions:
1. Can I start the import and have the server keep running it regardless of the state of the terminal window on my machine (or of the internet connection)?
2. If not, what can I modify to prevent the terminal from timing out?

The cause of the problem may not be the Google Cloud server or the network, but the client browser being used.
A good test would be to do the same operation in another browser, if possible, and see how it goes. If the operation succeeds, the problem lies with the Opera browser itself.
Also check how much memory is available on the client machine and whether it can handle the operation.
There have been reports of out-of-memory errors in Opera:
https://forums.opera.com/topic/17877/new-version-out-of-memory-issue

If you have tried other browsers and the issue is the same, then it is not being caused by Opera's 'out of memory' error.
Have you given your Bitnami MediaWiki deployment enough CPU and memory to handle the import?
In the Google Cloud Platform console, click Products & Services (the stacked-bars menu icon at the top left-hand corner).
In the menu, go to the Compute section, hover over 'Compute Engine', and click 'VM instances' to view all your instances.
Click on your Bitnami instance to see more details.
Under 'Machine type' you can see the CPU and memory allocated to the instance.
Ensure the Bitnami MediaWiki instance has a machine type large enough to handle the import.
You can also watch the instance's performance while the import is running and see how it behaves.
As per the documentation, running importDump.php can take quite a long time. For a large Wikipedia dump with millions of pages, it may take days, even on a fast server.


Scrapy on Ubuntu web server getting 417 error

I have been developing a crawling script for a number of news websites and using Scrapy to handle the logic.
When I run my script on an Ubuntu web server (Digital Ocean, if that helps), a lot of the websites that return 200 on my local machine return 417 instead.
I was wondering how I should fix this, if it is a problem at all. I'm actually not quite sure whether it is affecting the final output, but it seems like it has been.
Some of my own research has turned up:
http://www.checkupdown.com/status/E417.html. I've tried adding an Expect header to my requests, which hasn't worked.
I've heard that it might be a problem with HTTP 1.1 vs 1.0? EDIT: Nope, Scrapy's HTTPDownloaderHandler automatically chooses 1.1 if it is available.
417 (Expectation Failed) is the error a web server returns when it cannot meet the expectation carried in the request's Expect header (most commonly Expect: 100-continue).
This looks like a Scrapy bug or, more likely, a misconfiguration.
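If you want to rule the request headers in or out, one rough way (a hypothetical sketch, not taken from either answer; the spider name and URL are placeholders) is to set the headers explicitly on each request and log the status code that comes back:

    import scrapy

    class HeaderDebugSpider(scrapy.Spider):
        # Hypothetical spider for comparing server vs. local behaviour.
        name = "header_debug"
        start_urls = ["https://example.com/"]   # placeholder URL
        handle_httpstatus_list = [417]          # let parse() see 417 responses too

        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(
                    url,
                    headers={
                        # Send an explicit, browser-like User-Agent instead of
                        # the default "Scrapy/x.y" string.
                        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
                    },
                    callback=self.parse,
                )

        def parse(self, response):
            # Log the status so you can diff the Ubuntu box against your laptop.
            self.logger.info("%s -> %s", response.status, response.url)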
It seems that your public IP address was either already banned, or was banned while you were scraping, by the web server of the site you want to scrape. For the first situation you can reboot your instance to get a new public IP (at least this works on Amazon). For the second scenario, here are some tips from the official documentation to avoid it:
rotate your user agent from a pool of well-known ones from browsers (google around to get a list of them)
disable cookies (see COOKIES_ENABLED), as some sites may use cookies to spot bot behaviour
use download delays (2 or higher); see the DOWNLOAD_DELAY setting
if possible, use Google cache to fetch pages instead of hitting the sites directly
use a pool of rotating IPs, for example the free Tor project or paid services like ProxyMesh
use a highly distributed downloader that circumvents bans internally, so you can just focus on parsing clean pages; one example of such a downloader is Crawlera
Additionally, you can reduce the concurrent requests setting in your spider; that worked once for me. A sketch of where these settings go follows below.
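As a rough illustration (the values below are arbitrary, not from the answer above), the settings mentioned in these tips all live in the project's settings.py:

    # settings.py -- illustrative values only
    COOKIES_ENABLED = False             # some sites use cookies to spot bot behaviour
    DOWNLOAD_DELAY = 2                  # seconds of delay between requests to the same site
    CONCURRENT_REQUESTS = 8             # lower than Scrapy's default of 16
    CONCURRENT_REQUESTS_PER_DOMAIN = 4

    # A browser-like User-Agent instead of the default "Scrapy/x.y (+https://scrapy.org)"
    USER_AGENT = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )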

Background js doesn't work on error pages

Is there any way to load background JS even on error pages (such as net::ERR_NETWORK_CHANGED)? I need to keep a persistent connection to a WS server from the extension, but error pages don't load background JS, so I lose the connection and the ability to restart it (this is an automated tool without access to the browser UI).
The only solution I have found is to use a proxy server to customize the error pages and load the background JS inside them.
The assertion "Background js doesn't work on error pages" does not make sense, because there is only one background page per extension (two if you use split incognito mode).
So I assume that you want to detect a network connectivity loss in order to restore the web socket. Chrome offers two reliable global events for this: online and offline.
I have published the source code of Real-time desktop notifications for Stack Exchange inbox, which also accounts for network connectivity loss/regain. The relevant Web Socket part of the Chrome extension is found in stackexchange-notifications/Chrome/using-websocket.js on Github.

IE10 in Win RT cannot connect to a server on the local network

When I browse the web with IE10 in Windows 8's Metro environment there is no problem, but when I try to view a page that is located on a server on my local network (the same subnet), it displays this message:
This page can't be displayed
•Make sure the web address http://192.168.1.100 is correct.
•Look for the page with your search engine.
•Refresh the page in a few minutes.
If following these suggestions didn't work, resetting your connection might help.
Reset connection [<-a button here]
Get more help with connection problems
Now the funny part is that there is an option in the Metro version of IE10 to open the page on the desktop (in regular IE10), and then it works with no problem.
I can't find or think of any security setting that would restrict browsing websites inside your own local network.
(This is Windows 8 32-bit Release Preview, build 8400.)
Any ideas?
This is related to EPM (Enhanced Protected Mode) in IE10. It's hard to summarize in an answer here, but Eric Lawrence (a PM on the IE team) has an excellent post detailing everything about EPM:
http://blogs.msdn.com/b/ieinternals/archive/2012/03/23/understanding-ie10-enhanced-protected-mode-network-security-addons-cookies-metro-desktop.aspx
In particular, read the "Loopback-blocked" and "Private Network resources" sections.
In your case, you might try one of these approaches:
Try aliasing the dotted IP address (http://192.168.1.100) via a custom DNS entry (e.g. http://myservice)
Change the Trusted Zones settings
See if your network connection was established as sharing or non-sharing, which would trigger private vs. public mode.
Again, see Eric's post for the details of each of these.

Tools for finding non-SSL resources in a web page (Firebug-like tool)

I'm trying to find a non-SSL resource that is being loaded on my site.
This happens occasionally when one of us forgets to use the https version of a resource (like some JS on a CDN).
My question: are there any Firebug-like tools to find these "turds in the punch bowl"? I want my green padlock back :)
Besides Firebug, which you've mentioned, you can use the developer tools in Chrome:
Tools menu -> Developer Tools
Go through the list of loaded resources in the Network tab
Alternatively, the HttpFox extension for Firefox can also be useful. It will keep logging the traffic even when you change pages, which may be useful in some cases.
(This is very similar to Firebug.)
mitm-proxy is great for stuff like this - http://crypto.stanford.edu/ssl-mitm/
You run it on your local machine in a console window, set your browser to use it as a proxy, and you can watch/log everything that your browser requests. It's a little noisy, since it shows SSL handshaking and file contents, but you can filter that down. When you need to debug SSL communications, though, it's invaluable to see those details.
mitm-proxy is based on http://grinder.sourceforge.net/g3/tcpproxy.html which has more in the way of scripting capabilities.
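If you would rather script the check than click through a UI, here is a very rough Python sketch (purely illustrative and standard-library only; it only sees URLs present in the initial HTML, not resources injected later by JavaScript, so the browser tools above remain the authoritative check):

    # mixed_content_check.py -- list plain-http URLs referenced in a page's HTML
    import re
    import sys
    import urllib.request

    page = sys.argv[1]  # e.g. https://www.example.com/
    html = urllib.request.urlopen(page).read().decode("utf-8", errors="replace")

    # src= and href= attributes that point at http:// rather than https://
    for match in re.finditer(r'(?:src|href)\s*=\s*["\'](http://[^"\']+)', html):
        print(match.group(1))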

How to test a Cocoa Touch app for the case when the network fails while downloading a file?

My iOS application, among its features, downloads files from a specific server. This downloading occurs entirely in the background while the user is working in the app. When a download is complete, the resource associated with the file appears on the app screen.
My users report some misbehavior about missing resources that I could not reproduce. Some side information leads me to suspect that the problem is caused by the download of the resource's file being aborted mid-way, leaving the app with a partially downloaded file that never gets completed.
To confirm the hypothesis, to make sure any fix works, and to test for the network randomly vanishing under my feet, I would like to simulate the loss of the network in my test environment: the test server is web sharing on my development Mac, and the test device is the iOS Simulator running on the same Mac.
Is there a more convenient way to do that than manually turning web sharing off at a breakpoint?
Depending on how you're downloading your file, one possible option would be to set the callback delegate to nil halfway through the download. The data would still download, but your application would simply stop receiving callbacks. I don't know, though, whether that is how the application would behave if it truly dropped the connection.
Another option would be to temporarily point the download request at some random file on an external web server, then halfway through just disconnect your computer from the internet. I've done that to test network connectivity issues and it usually works. The interesting problem in your case is that you're downloading from your own computer, so disconnecting won't help. This would just be so you can determine the order of callbacks within the application when this happens (does it make any callbacks at all? In what order?), so that you can simulate that behavior when actually pointed at your test server.
Combining both options, I guess, would give you the best solution.
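Since the test server is web sharing on the same Mac, another way to reproduce an aborted download deterministically (a hypothetical sketch, not one of the options above; the port and payload are placeholders) is a tiny local server that advertises a full Content-Length but only sends half of it before the connection closes. Point the simulator at it instead of web sharing:

    # flaky_server.py -- serves half of the promised bytes, then the connection
    # closes, so the client sees a download aborted mid-way.
    import http.server

    PAYLOAD = b"x" * 1_000_000  # stand-in for the real resource file

    class TruncatingHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Length", str(len(PAYLOAD)))
            self.end_headers()
            # Send only half of the promised bytes; with the default HTTP/1.0
            # behaviour the connection is then closed, cutting the download short.
            self.wfile.write(PAYLOAD[: len(PAYLOAD) // 2])

    # e.g. have the app download http://localhost:8000/anything
    http.server.HTTPServer(("localhost", 8000), TruncatingHandler).serve_forever()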