Splash memory limit (scrapy)

I have Splash started from Docker.
I created a big Lua script for Splash and Scrapy, and when it runs I see this problem:
Lua error: error in __gc metamethod (/app/splash/lua_modules/sandbox.lua:189: script uses too much memory
How can I increase the memory limit for Splash?

Unfortunately, as of Splash 2.3.2, there is no built-in way to raise this limit. The limit is hardcoded here: https://github.com/scrapinghub/splash/blob/7b6612847984fc574ebbedf9c3c750180cd93813/splash/lua_modules/sandbox.lua#L176 - you can change the value, rebuild the Docker image by running docker build -t splash . from a Splash source checkout, and then use this image instead of the one from Docker Hub.
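Roughly, the rebuild looks like this; the exact value to edit is at the sandbox.lua line linked above, and the run command just assumes Splash's default port 8050:
git clone https://github.com/scrapinghub/splash.git
cd splash
# edit the hardcoded limit in splash/lua_modules/sandbox.lua (around the linked line)
docker build -t splash .
docker run -it -p 8050:8050 splash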

I solved my problem by optimizing the Lua script. It turns out splash:select("a#story-title").node.innerHTML is much heavier than splash:evaljs('document.getElementById("story-title").innerHTML;').
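In other words, inside the Splash script, something like the second form below is much lighter on the sandbox memory limit than the first (the selector and element id are just placeholders):
-- heavier: goes through splash:select and the DOM element wrapper
local title = splash:select("a#story-title").node.innerHTML
-- lighter: evaluates a small JS snippet directly in the page
local title = splash:evaljs('document.getElementById("story-title").innerHTML;')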

Related

Building Superset locally encounters missing static assets

I'm trying to build Superset locally using docker-compose.
After cloning the repository, I modify docker-compose.yml so that it builds images from local source code instead of pulling from Docker Hub. My modifications include:
In the db service, change the Postgres image version from image: postgres:14 to image: postgres:10, since the service cannot be built properly with Postgres 14.
In the superset, superset-init, superset-worker, superset-worker-beat and superset-tests-worker services, change image: *superset-image to build: . so that Docker builds the application from local source code (see the sketch after this list).
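Roughly, the relevant parts of my modified docker-compose.yml look like this (only the changed fields are shown, not the full file):
services:
  db:
    image: postgres:10      # was postgres:14
  superset:
    build: .                # was image: *superset-image
  superset-init:
    build: .
  superset-worker:
    build: .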
However, after running docker-compose build and then docker-compose up, I get a blank screen. I checked the logs and found that a lot of asset files are missing; for example, /static/assets/images/loading.gif is missing, which results in the blank screen.
What is wrong or missing in my configuration steps? Please help me.
I finally figured it out: it's because the Superset frontend's npm packages are installed inside the superset_node container at runtime instead of while the image is built. That's why, although superset_node is built, we have to wait (in my case about 15-20 more minutes) for the frontend install to complete. Another point to note is that this installation takes up a lot of memory, so make sure you allocate enough RAM to Docker (in my case I allocate 16GB).
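If you want to check whether that install is still progressing rather than stuck, one way is to follow the node container's logs (this assumes the service is named superset_node, as above):
docker-compose logs -f superset_node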

Creating network files in SUMO using NETCONVERT

Problem when calling netconvert in sumo:
I am trying to create my own scenario for simulation purposes.
I am using OpenStreetMap for this.
python osmWebWizard.py
opens the browser, where I select the area and download it. Then I run:
netconvert --osm-files osm_bbox.osm.xml -o osm.net.xml
The error message I get is
Error: Cannot import network data without PROJ-Library. Please install packages proj before building sumo
Warning: Environment variable SUMO_HOME is not set, using built in type maps.
Quitting (on error).
My attempt to fix the problem is:
sudo apt-get install libproj*
But it seems like a dead end there and I am out of options.
Thank you.
EDIT
I have a gut feeling it has to do with libproj0 not being available anymore.
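One more thing I may try, though I have not verified the package name or the typical SUMO_HOME path:
# install the PROJ development package before rebuilding SUMO
sudo apt-get install libproj-dev
# point SUMO_HOME at the SUMO installation to silence the type-map warning
export SUMO_HOME=/usr/share/sumo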

Is it possible to do offscreen rendering without Surface in Vulkan?

Similar to a surfaceless context in OpenGL, can we do offscreen rendering without a surface in Vulkan?
Sure, it's even designed to allow that from the start.
Instead of getting your images from a swapchain, create them yourself and allocate and bind memory for them.
Getting the result back will then require a copy into a host-visible readback buffer after the render.
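A minimal sketch of that first step, creating and backing an offscreen color image with plain Vulkan calls; device, width and height are assumed to exist already, findMemoryType is a hypothetical helper, and error handling is omitted:
#include <vulkan/vulkan.h>

// Assumed helper: picks a memory type index matching the given properties.
uint32_t findMemoryType(uint32_t typeBits, VkMemoryPropertyFlags properties);

VkImage createOffscreenImage(VkDevice device, uint32_t width, uint32_t height,
                             VkDeviceMemory* outMemory) {
    // Create the image ourselves instead of acquiring one from a swapchain.
    VkImageCreateInfo imageInfo{};
    imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
    imageInfo.imageType = VK_IMAGE_TYPE_2D;
    imageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
    imageInfo.extent = {width, height, 1};
    imageInfo.mipLevels = 1;
    imageInfo.arrayLayers = 1;
    imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
    imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
    // Render target that we can later copy out of for readback.
    imageInfo.usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT;

    VkImage image;
    vkCreateImage(device, &imageInfo, nullptr, &image);

    // Allocate and bind device memory for the image.
    VkMemoryRequirements memReqs;
    vkGetImageMemoryRequirements(device, image, &memReqs);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memReqs.size;
    allocInfo.memoryTypeIndex = findMemoryType(memReqs.memoryTypeBits,
                                               VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);

    vkAllocateMemory(device, &allocInfo, nullptr, outMemory);
    vkBindImageMemory(device, image, *outMemory, 0);

    // Render into this image via a render pass/framebuffer as usual, then
    // vkCmdCopyImageToBuffer it into a host-visible buffer to read pixels back.
    return image;
}
The renderheadless.cpp example linked below does essentially this, plus the render pass and the readback copy.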
SaschaWillems/Vulkan renderheadless.cpp example
It seems that Vulkan is designed to support offscreen rendering better than OpenGL.
This is mentioned on this NVIDIA overview: https://developer.nvidia.com/transitioning-opengl-vulkan
This is a runnable example that I just managed to run locally: https://github.com/SaschaWillems/Vulkan/tree/b9f0ac91d2adccc3055a904d3a8f6553b10ff6cd/examples/renderheadless/renderheadless.cpp
After installing the drivers and ensuring that the GPU is working I can do:
git clone https://github.com/SaschaWillems/Vulkan
cd Vulkan
git checkout b9f0ac91d2adccc3055a904d3a8f6553b10ff6cd
python download_assets.py
mkdir build
cd build
cmake ..
make -j`nproc`
cd bin
./renderheadless
and this immediately generates an image headless.ppm without opening any windows.
I have also managed to run this program on an Ubuntu Ctrl + Alt + F3 non-graphical TTY, which further indicates it really does not need a screen.
Other examples that might be of interest:
https://github.com/SaschaWillems/Vulkan/blob/b9f0ac91d2adccc3055a904d3a8f6553b10ff6cd/examples/screenshot/screenshot.cpp starts a GUI, and you can click a button to take a screenshot, which gets saved to screenshot
https://github.com/SaschaWillems/Vulkan/blob/b9f0ac91d2adccc3055a904d3a8f6553b10ff6cd/examples/offscreen/offscreen.cpp renders an image twice to create a reflection effect
Related: How to use GLUT/OpenGL to render to a file?
Tested on Ubuntu 20.04, NVIDIA driver 435.21, NVIDIA Quadro M1200 GPU.

Running a JSON graph locally

I've been using Flowhub.io to do my development on the nodejs device. Now that the GUI-based design is done, I'm ready to take it offline and run the code via the command line. How do I do this? I have the JSON file corresponding to the graph I created online, but I'm not sure how to use the noflo nodejs module.
Could someone help me by showing me an example of how to load a graph using the noflo module, please? Thanks!
If you want to run an existing graph, you can use the --graph option.
noflo-nodejs --graph graphs/MyMainGraph.json
If you also want the process to exit when the network stops, you can pass --batch.
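For example, combining both options:
noflo-nodejs --graph graphs/MyMainGraph.json --batch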
PS: I added this to the noflo-nodejs README.

How to get around memory error with karma & phantomjs

We're running tests using Karma and PhantomJS. Last week, our tests mysteriously started crashing PhantomJS with an error of -1073741819.
Based on this thread for Chutzpah, it appears that code indicates a native memory failure in PhantomJS.
Upon further investigation, we are consistently seeing phantom crash around 750MB of memory.
Is there a way to configure Karma so that it does not run up against this limit? Or a way to tell it to flush phantom?
We only have around 1200 tests so far. We're about 1/4 of the way through our project, so 5000 UI tests doesn't seem out of the question.
Thanks to the StackOverflow phenomenon of posting a question and quickly discovering an answer, we solved this by adding gulp tasks. Before, we were just running karma start at the command line. This spun up a single instance of PhantomJS that crashed when 750MB was reached.
Now we have a gulp command for each one of our sections of tests, e.g. gulp common-tests, gulp admin-tests and gulp customer-tests,
and then a single gulp karma task that runs each of those groupings. This allows each gulp command to have its own instance of PhantomJS and therefore stay under that threshold.
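A minimal sketch of that setup, assuming gulp 4 and Karma's programmatic Server API; the config file names and task names are made up to match the groupings above:
var gulp = require('gulp');
var Server = require('karma').Server;

// Run one Karma config to completion, giving it its own PhantomJS instance.
function runKarma(configFile, done) {
  new Server({ configFile: __dirname + '/' + configFile, singleRun: true }, done).start();
}

gulp.task('common-tests', function (done) { runKarma('karma.common.conf.js', done); });
gulp.task('admin-tests', function (done) { runKarma('karma.admin.conf.js', done); });
gulp.task('customer-tests', function (done) { runKarma('karma.customer.conf.js', done); });

// Run the groups one after another so each stays under the memory threshold.
gulp.task('karma', gulp.series('common-tests', 'admin-tests', 'customer-tests'));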
We ran into a similar issue. Your approach is interesting and certainly sidesteps the issue. However, be prepared to face it again later.
I've done some investigation and found the cause of memory growth (at least in our case). Turns out when you use:
beforeEach(inject(function (SomeActualService) { .... }));
the memory taken up by SomeActualService does not get released at the end of the describe block, and if you have multiple test files where you inject the same service (or other injectable objects), more memory will be allocated for it again.
I have a couple of ideas on how to avoid this:
1. Create mock objects and never use inject to get real objects unless you are in the test that tests that module. This will require writing tons of extra code (see the sketch after this list).
2. Create your own tracker (for tests only) for injectable objects. That way they can be loaded only once and reused between test files.
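A minimal sketch of idea 1, assuming Jasmine 2: the unit under test (makeGreeter here) is hypothetical, and the point is that the spec consumes a plain hand-made mock rather than calling inject(SomeActualService):
// Hypothetical unit under test that depends on a service-like object.
function makeGreeter(someService) {
  return { greet: function (id) { return 'Hello ' + someService.getUserName(id); } };
}

describe('greeter', function () {
  var mockService;

  beforeEach(function () {
    // Plain mock object; nothing is registered with Angular's injector,
    // so nothing accumulates across spec files.
    mockService = { getUserName: jasmine.createSpy('getUserName').and.returnValue('Ann') };
  });

  it('greets by user name', function () {
    var greeter = makeGreeter(mockService);
    expect(greeter.greet(42)).toBe('Hello Ann');
    expect(mockService.getUserName).toHaveBeenCalledWith(42);
  });
});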
Forgot to mention: We are using angular 1.3.2, Jasmine 2.0 and hit this problem around 1000 tests.
I was also running into this issue after about 1037 tests on Windows 10 with PhantomJS 1.9.18.
It would appear as ERROR [launcher]: PhantomJS crashed. once the RAM for the process exceeded about 800-850 MB.
There appears to be a temporary fix here:
https://github.com/gskachkov/karma-phantomjs2-launcher
https://www.npmjs.com/package/karma-phantomjs2-launcher
You install it via npm install karma-phantomjs2-launcher --save-dev
Then you need to use it in karma.conf.js via
config.set({
browsers: ['PhantomJS2'],
...
});
This seems to run the same set of tests while only using between 250-550 MB RAM and without crashing.
Note that this fix works out of the box on Windows and OS X, but not Linux (PhantomJS2 binaries won't start). This affects pushes to Travis CI.
To work around this issue on Debian/Ubuntu:
sudo apt-get install libicu52 libjpeg8 libfontconfig libwebp5
This is a problem with PhantomJS. According to another source, PhantomJS only runs the garbage collector when the page is closed, and this only happens after your tests run. Other browsers work fine because their garbage collectors work as expected.
After spending a few days on the issue, we concluded that the best solution was to split tests into groups. We had grunt create a profile for each directory dynamically and created a command that runs all those profiles. For all intents and purposes, it works just the same.
We had a similar issue on Linux (Ubuntu) that turned out to be the maximum number of memory map areas a process may have:
$ cat /proc/sys/vm/max_map_count
65530
Then run this:
$ sudo bash -c 'echo 6553000 > /proc/sys/vm/max_map_count'
Note the number was multiplied by 100.
This changes the setting only for the running system; it is lost on reboot. If it solves the problem, you can make it persistent for all future boots:
$ sudo bash -c 'echo vm.max_map_count = 6553000 > /etc/sysctl.d/60-max_map_count.conf'
Responding to an old question, but hopefully this helps ...
I have a build process which a CI job runs on a command-line-only Linux box, so it seems that PhantomJS is my only option there. I have experienced this memory issue locally on my Mac, but somehow it doesn't happen on the Linux box. My solution was to add another test command to my package.json that runs Karma using Chrome, and to run that locally. When the code is pushed up, Jenkins kicks off the regular test command, running PhantomJS.
Install this plugin: https://github.com/karma-runner/karma-chrome-launcher
Add this to package.json
"test": "karma start",
"test:chrome": "karma start --browsers Chrome"