Can you view a Lua script in Redis? - scripting

Is there a way to get the content of a Lua script in Redis after you upload the script?

There's no way to see the content of a Lua script after it's loaded.
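For illustration, here is a minimal sketch using redis-py (assuming a Redis server on localhost): SCRIPT LOAD only hands back the script's SHA1 digest, and there is no command to read the source back out of the script cache, so keep your own copy of the script, for example keyed by that SHA.
import redis

r = redis.StrictRedis(host='localhost', port=6379)

# Loading a script only returns its SHA1 digest; Redis has no command
# to read the source back out of the script cache.
script = "return redis.call('GET', KEYS[1])"
sha = r.script_load(script)

# Keep your own copy of the source, keyed by the SHA, if you need it later.
my_scripts = {sha: script}

# All you can do afterwards is check that it is cached and run it by hash.
print(r.script_exists(sha))       # [True]
print(r.evalsha(sha, 1, 'some-key'))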

Related

How can I use Scrapy middleware in the Scrapy shell?

In a Scrapy project one uses middleware quite often. Is there a generic way of enabling usage of middleware in the Scrapy shell during interactive sessions as well?
Middlewares set up in settings.py are enabled by default in the Scrapy shell; you can see this in the logs when running scrapy shell.
So to answer your question: yes, you can do so using this command.
scrapy shell -s DOWNLOADER_MIDDLEWARES='<<your custom middleware>>'
You can override settings using the -s parameter.
Remember, just run scrapy shell inside a folder that contains a Scrapy project.
It will load the default settings from settings.py.
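For reference, this is roughly what such a setting looks like in settings.py (the module path and priority below are made-up placeholders); the shell picks it up automatically when started inside the project:
# settings.py of your Scrapy project -- picked up automatically by scrapy shell
DOWNLOADER_MIDDLEWARES = {
    # hypothetical custom middleware: dotted path to the class and its priority
    'myproject.middlewares.MyCustomDownloaderMiddleware': 543,
}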
Happy Scraping :)

Any way to validate a Pig script before running it in an HDFS cluster?

I'm new to Pig scripting. The problem I'm facing is the inability to check whether my script is syntactically correct. I have to upload it to the HDFS cluster and run it there just to realize I missed a ';' at the end of a line. Big waste of time. I use IntelliJ IDEA with the Pig script plugin, but while it helps to highlight Pig statements it does not validate them. Apache Pig does not seem to have any compiler; you can only run it, but I can't run it locally because the data is not available from my laptop. So I wonder if there is any sophisticated Pig script syntax validator, so I can check whether my script is syntactically correct before uploading it to the server.

Splash memory limit (scrapy)

I have Splash started from Docker.
I created a big Lua script for Splash and Scrapy, and when it runs I see this problem:
Lua error: error in __gc metamethod (/app/splash/lua_modules/sandbox.lua:189: script uses too much memory
How can I increase the memory for Splash?
Unfortunately, as of Splash 2.3.2, there is no built-in way to raise this limit. The limit is hardcoded here: https://github.com/scrapinghub/splash/blob/7b6612847984fc574ebbedf9c3c750180cd93813/splash/lua_modules/sandbox.lua#L176 - you can change the value, rebuild the Docker image by running docker build -t splash . from a Splash source checkout, and then use this image instead of the one from DockerHub.
I solved my problem by optimizing the Lua script. It turns out splash:select("a#story-title").node.innerHTML is much heavier than splash:evaljs('document.getElementById("story-title").innerHTML;').
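For context, a rough sketch of how the lighter evaljs variant might be wired into a spider via scrapy-splash's execute endpoint (the spider name and URL below are illustrative placeholders):
import scrapy
from scrapy_splash import SplashRequest

# evaljs runs plain JavaScript in the page and returns a string, which keeps
# memory use inside the Lua sandbox lower than building a full element
# wrapper with splash:select().
LUA_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))
    return {title = splash:evaljs('document.getElementById("story-title").innerHTML;')}
end
"""

class StorySpider(scrapy.Spider):
    name = 'story'  # hypothetical spider
    start_urls = ['http://example.com']  # placeholder URL

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,
                                endpoint='execute', args={'lua_source': LUA_SCRIPT})

    def parse(self, response):
        # response.data holds the table returned from the Lua main() function
        self.logger.info(response.data['title'])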

How can I use boto or boto-rsync for a full backup of 1000+ files to an S3-compatible cloud?

I'm trying to back up my entire collection of over 1000 work files, mainly text but also pictures and a few large (0.5-1 GB) audio recordings, to an S3-compatible cloud (DreamHost DreamObjects). I have tried to use boto-rsync to perform the first full 'put' with this:
$ boto-rsync --endpoint objects.dreamhost.com /media/Storage/Work/ \
> s3:/work.personalsite.net/ > output.txt
where '/media/Storage/Work/' is on a local hard disk, 's3:/work.personalsite.net/' is a bucket named after my personal web site for uniqueness, and output.txt is where I wanted a list of the files uploaded and error messages to go.
Boto-rsync grinds its way through the whole directory tree, but the constantly refreshing progress output for each file doesn't look so good when it's printed to a file. Still, as the upload is going I 'tail output.txt' and see that most files are uploaded, but some reach less than 100% and some are skipped altogether. My questions are:
Is there any way to confirm that a transfer is 100% complete and correct?
Is there a good way to log the results and errors of a transfer?
Is there a good way to transfer a large number of files in a big directory hierarchy to one or more buckets for the first time, as opposed to an incremental backup?
I am on Ubuntu 12.04 running Python 2.7.3. Thank you for your help.
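Not a full answer, but as a rough sketch, assuming boto 2.x and the endpoint and bucket from the question (the access keys are placeholders): you could do the bulk upload directly with boto, log each file, and compare local and remote sizes as a basic completeness check.
import os
import logging
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

logging.basicConfig(filename='upload.log', level=logging.INFO)

# OrdinaryCallingFormat avoids SSL hostname trouble with dots in the bucket name.
conn = S3Connection(aws_access_key_id='YOUR_KEY',
                    aws_secret_access_key='YOUR_SECRET',
                    host='objects.dreamhost.com',
                    calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('work.personalsite.net')

root = '/media/Storage/Work/'
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        local_path = os.path.join(dirpath, name)
        key_name = os.path.relpath(local_path, root)
        key = bucket.new_key(key_name)
        key.set_contents_from_filename(local_path)

        # Basic completeness check: compare local and remote sizes.
        remote = bucket.get_key(key_name)
        if remote is not None and remote.size == os.path.getsize(local_path):
            logging.info('OK %s (%d bytes)', key_name, remote.size)
        else:
            logging.error('Mismatch or missing: %s', key_name)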
You can encapsulate the command in a script and start it over nohup:
nohup script.sh
nohup automatically generates a nohup.out file where all the output of the script/command is captured.
To point the log somewhere else you can do:
nohup script.sh > /path/to/log
br
Eddi

Hadoop put command doing nothing!

I am running Cloudera's distribution of Hadoop and everything is working perfectly. The HDFS contains a large number of .seq files. I need to merge the contents of all the .seq files into one large .seq file. However, the getmerge command did nothing for me. I then used cat and piped the data of some .seq files into a local file. When I want to "put" this file into HDFS it does nothing. No error message shows up, and no file is created.
I am able to "touchz" files in HDFS and user permissions are not a problem here. The put command simply does not work. What am I doing wrong?
Write a job that merges all the sequence files into a single one. It's just the standard mapper and reducer with only one reduce task.
if the "hadoop" commands fails silently you should have a look at it.
Just type 'which hadoop'; this will give you the location of the "hadoop" executable. It is a shell script - just edit it and add logging to see what's going on.
If the hadoop bash script fails at the beginning it is no surprise that the hadoop dfs -put command does not work.