Rebol request-download doesn't support big files: how to fix it?

download-dir: request-dir
Print ["downloading " "VStudio2008Express.iso" "..." ]
url: http://go.microsoft.com/fwlink/?LinkId=104679
file-save: to-rebol-file rejoin [download-dir "VStudio2008Express.iso"]
request-download/to url file-save
At the end, although the progress bar shows that the download has finished, I get:
** Script Error: Not enough memory
** Where: append
** Near: insert tail series :value
>>
So how can I fix request-download, given that it is a mezzanine function? Its source is:
func [
    {Request a file download from the net. Show progress. Return none on error.}
    url [url!]
    /to "Specify local file target." local-file [file! none!]
    /local prog lo stop data stat event-port event
][
    view/new center-face lo: layout [
        backeffect [gradient 1x1 water gray]
        space 10x8
        vh2 300 gold "Downloading File:"
        vtext bold center 300 to-string url
        prog: progress 300
        across
        btn 90 "Cancel" [stop: true]
        stat: text 160x24 middle
    ]
    stop: false
    data: read-thru/to/progress/update url local-file func [total bytes] [
        prog/data: bytes / (max 1 total)
        stat/text: reform [bytes "bytes"]
        show [prog stat]
        not stop
    ]
    unview/only lo
    if not stop [data]
]

Rebol's read functions read all the input data into memory at once and cannot be used on large datasets. You have to open a port and copy the data from it in chunks to handle large datasets.
I would think that the request-download function could be modified to use ports for both the input and output data. This thread from the Rebol Mailing List may help you:
http://www.rebol.org/ml-display-thread.r?m=rmlFQXC
You can find a fuller example on Carl's blog at http://www.rebol.com/cgi-bin/blog.r?view=0281#comments
Even with this technique, Rebol 2 can only handle files up to approximately 2 GB.
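To show the shape of the chunked-copy loop described above, here is a minimal sketch written in Python rather than Rebol, purely as a language-neutral illustration. The function name copy_in_chunks, the chunk size, and the on_progress callback are all hypothetical; the callback mimics the way the progress function in request-download returns a value to signal whether to continue.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024, on_progress=None):
    """Copy from src to dst one chunk at a time; return the bytes copied.

    Only one chunk is held in memory at any moment, so the size of the
    whole download never has to fit in RAM.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of input
        dst.write(chunk)
        total += len(chunk)
        # A callback that returns False asks us to stop (like "Cancel").
        if on_progress is not None and on_progress(total) is False:
            break
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 200_000)  # stands in for a network port
    target = io.BytesIO()                # stands in for the output file
    print(copy_in_chunks(source, target))
```

Any object with a read(n) method can serve as src, so the same loop would work with an HTTP response object and a file opened for binary writing.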

Try this
http://anton.wildit.net.au/rebol/util/batch-download.r

Related

channel checks as empty even if it has content

I am trying to have a process that is launched only if a combination of conditions is met, but when I check whether a channel contains a path to a file, it always reports the channel as empty. I am probably doing something wrong; if so, please correct my code. I tried to follow some of the suggestions in this issue, but without success.
Consider the following minimal example:
process one {
    output:
    file("test.txt") into _chProcessTwo
    script:
    """
    echo "Hello world" > "test.txt"
    """
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.toList().view()
process two {
    input:
    file(myInput) from _chProcessTwoUse
    when:
    (!_chProcessTwoCheck.toList().isEmpty())
    script:
    def test = _chProcessTwoUse.toList().isEmpty() ? "I'm empty" : "I'm NOT empty"
    println "The outcome is: " + test
}
I want to have process two run if and only if there is a file in the _chProcessTwo channel.
If I run the above code I obtain:
marius@dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [infallible_gutenberg] - revision: 9f57464dc1
[c8/bf38f5] process > one [100%] 1 of 1 ✔
[- ] process > two -
[/home/marius/pipeline/work/c8/bf38f595d759686a497bb4a49e9778/test.txt]
where the last line is actually the contents of _chProcessTwoView.
If I remove the when directive from the second process I get:
marius@mg-dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [modest_descartes] - revision: 5b2bbfea6a
[57/1b7b97] process > one [100%] 1 of 1 ✔
[a9/e4b82d] process > two [100%] 1 of 1 ✔
[/home/marius/pipeline/work/57/1b7b979933ca9e936a3c0bb640c37e/test.txt]
with the contents of the second worker .command.log file being: The outcome is: I'm empty
I also tried without toList().
What am I doing wrong? Thank you in advance
Update: a workaround would be to check _chProcessTwoUse.view() != "" but that is pretty dirty
Update 2: as requested by @Steve, I've updated the code to better reflect the actual conditions I have in my own pipeline:
def runProcessOne = true
process one {
    when:
    runProcessOne
    output:
    file("inputProcessTwo.txt") into _chProcessTwo optional true
    file("inputProcessThree.txt") into _chProcessThree optional true
    script:
    // this would replace the probability that output is not created
    def outputSomething = false
    """
    if ${outputSomething}; then
        echo "Hello world" > "inputProcessTwo.txt"
        echo "Goodbye world" > "inputProcessThree.txt"
    else
        echo "Sorry. Process one did not write to file."
    fi
    """
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
process two {
    input:
    file(myInput) from _chProcessTwoUse
    when:
    (runProcessOne)
    script:
    """
    echo "The outcome is: ${myInput}"
    """
}
process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(inputFromProcessTwo) from _chProcessThree
    script:
    def extra_parameters = _chProcessThree.isEmpty() ? "" : "--extra-input " + inputFromProcessTwo
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
As @Steve mentioned, I should not even have to check whether a channel is empty; Nextflow should know not to start the process. But I think in this construct I will have to.
Marius
I think part of the problem here is that process 'one' creates only optional outputs. This makes dealing with the optional inputs in process 'three' a bit tricky. I would try to reconcile this if possible. If this can't be reconciled, then you'll need to deal with the optional inputs in process 'three'. To do this, you'll basically need to create a dummy file, pass it into the channel using the ifEmpty operator, then use the name of the dummy file to check whether or not to prepend the argument's prefix. It's a bit of a hack, but it works pretty well.
The first step is to actually create the dummy file. I like shareable pipelines, so I would just create this in your baseDir, perhaps under a folder called 'assets':
mkdir assets
touch assets/NO_FILE
Then pass in your dummy file if your '_chProcessThree' channel is empty:
params.dummy_file = "${baseDir}/assets/NO_FILE"
dummy_file = file(params.dummy_file)
process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(optfile) from _chProcessThree.ifEmpty(dummy_file)
    script:
    def extra_parameters = optfile.name != 'NO_FILE' ? "--extra-input ${optfile}" : ''
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
Also, these lines are problematic:
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
Calling view() will emit all values from the channel to stdout. You can ignore whatever value it returns. Unless you enable DSL2, the channel will then be empty. I think what you're looking for here is a closure:
_chProcessTwoView.view { "Found: $it" }
Be sure to append -ansi-log false to your nextflow run command so the output doesn't get clobbered. HTH.

Piping data to a pager

I have a code sample in Ruby that pipes data to a pager in order to print it in portions to STDOUT:
input = File.read "some_long_file"
pager = "less"
IO.popen(pager, mode="w") do |io|
  io.write input
  io.close
end
I have no problem adapting this to Crystal like this:
input = File.read "some_long_file"
pager = "less"
Process.run(pager, output: STDOUT) do |process|
  process.input.puts input
  process.input.close
end
But if I change to pager = "more", then the Ruby example still works fine, while the Crystal snippet dumps all the data at once instead of serving it in portions. How can I fix it?
Crystal 0.25.0 [7fb783f7a] (2018-06-11)
LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
The more command tries to write its user interface to stderr, so you need to forward that as well:
Process.run(pager, output: STDOUT, error: STDERR) do |process|
  process.input.puts input
  process.input.close
end
Since you are reading a long file, you might consider not reading it into memory and instead passing the open file directly as the pager's input:
input = File.open("log/development.log")
pager = "more"
Process.run(pager, input: input, output: STDOUT, error: STDERR)

How to get around SlowDown Errors with AWS S3 put object

I am trying to access a file and update it in S3 with boto, but I keep getting SlowDown errors even after pausing between requests, as in the code below. How do I get around this?
body = b'Here we have some more data'
s3.put_object(Body=body,Bucket=bucket, Key=key)
time.sleep(10)
response = s3.get_object(Bucket=bucket, Key=key)
time.sleep(10)
print(response["Body"].read().decode('utf-8'))
currFile = response["Body"].read().decode('utf-8')
newFile = currFile + "\n" + "New Stuff!!!"
newFileB = newFile.encode('utf-8')
time.sleep(60)
s3.put_object(Body=newFileB,Bucket=bucket, Key=key)
time.sleep(10)
response = s3.get_object(Bucket=bucket, Key=key)
print(response["Body"].read().decode('utf-8'))
Here is the error:
Details
The area below shows the result returned by your function execution.
{
"errorMessage": "An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate.",
"errorType": "ClientError",
"stackTrace": [
[
"/var/task/lambda_function.py",
43,
"lambda_handler",
"raise e"
],
[
"/var/task/lambda_function.py",
20,
"lambda_handler",
"s3.put_object(Body=body,Bucket=bucket, Key=key)"
],
[
"/var/runtime/botocore/client.py",
314,
"_api_call",
"return self._make_api_call(operation_name, kwargs)"
],
[
"/var/runtime/botocore/client.py",
612,
"_make_api_call",
"raise error_class(parsed_response, operation_name)"
]
]
}
I had this exact problem, and I am not sure why it happens, but it is the sort of thing that production code has to deal with all the time. The solution is to keep retrying for quite a while before giving up. On a failure, the loop waits 1 second, then increases the wait by one second (delay_incr) on each subsequent failure, and gives up once the delay reaches max_delay (30 seconds). That adds up to a total wait of a little over 7 minutes before it finally gives up. Of course you can tweak the timings. This has been successful for me so far.
I have to do a similar thing even for a NAS file server where I have to wait for a while to read files at times.
import time

import boto3
from botocore.exceptions import ClientError


def put_s3_core(bucket, key, strobj, content_type=None):
    """ write strobj to s3 bucket, key
        content_type can be:
            binary/octet-stream (default)
            text/plain
            text/html
            text/csv
            image/png
            image/tiff
            application/pdf
            application/zip
    """
    delay = 1        # initial delay in seconds
    delay_incr = 1   # additional delay added after each failed attempt
    max_delay = 30   # give up once the delay reaches this; total wait is about (max_delay**2)/2
    s3 = boto3.resource('s3')
    request_obj = s3.Object(bucket, key)
    while True:
        try:
            if content_type:
                request_obj.put(Body=strobj, ContentType=content_type)
            else:
                request_obj.put(Body=strobj)
            return
        except ClientError:
            if delay >= max_delay:
                # out of retries: re-raise the last error to the caller
                raise
            time.sleep(delay)
            delay += delay_incr
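The retry idea above can also be factored into a small generic helper. This is only a sketch under some assumptions: retry_with_backoff and is_slowdown are hypothetical names, not part of boto3, and I have swapped the linear delay for exponential backoff with jitter, which is a different schedule from the one in the answer. The only thing it relies on from botocore is that ClientError exposes the error code under response["Error"]["Code"].

```python
import random
import time


def retry_with_backoff(func, is_retryable, max_attempts=8, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call func() until it succeeds, sleeping between retryable failures.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on non-retryable errors or when attempts run out.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


def is_slowdown(exc):
    """True for botocore-style errors whose error code is 'SlowDown'."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "SlowDown"
```

With the client from the question, a call might look like retry_with_backoff(lambda: s3.put_object(Body=body, Bucket=bucket, Key=key), is_slowdown). Retrying only on SlowDown means that errors such as a missing bucket or denied access fail immediately instead of after several minutes.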

REBOL layout: How to create layout words automatically - word has no context?

Using the REBOL/View 2.7.8 Core, I would like to prepare a view layout beforehand by automatically assigning words to various layout items, as in the following example.
Instead of
prepared-view: [across
    cb1: check
    label "Checkbox 1"
    cb2: check
    label "Checkbox 2"
    cb3: check
    label "Checkbox 3"
    cb4: check
    label "Checkbox 4"
]
view layout prepared-view
I would thus like the words cb1 through cb4 to be created automatically, e.g.:
prepared-view2: [ across ]
for i 1 4 1 [
    cbi: join "cb" i
    cbi: join cbi ":"
    cbi: join cbi " check"
    append prepared-view2 to-block cbi
    append prepared-view2 [label]
    append prepared-view2 to-string join "Checkbox " i
]
view layout prepared-view2
However, while difference prepared-view prepared-view2 shows no differences in the block being parsed (== []), the second script leads to an error:
** Script Error: cb1 word has no context
** Where: forever
** Near: new/var: bind to-word :var :var
I've spent hours trying to understand why, and I think somehow the new words need to be bound to the specific context, but I have not yet found any solution to the problem.
What do I need to do?
bind prepared-view2 'view
view layout prepared-view2
creates the correct bindings.
And here's another way to dynamically create layouts:
>> l: [ across ]
== [across]
>> append l to-set-word 'check
== [across check:]
>> append l 'check
== [across check: check]
>> append l "test"
== [across check: check "test"]
>> view layout l
And then you can use loops to create different variables to add to your layout.
When you use TO-BLOCK to convert a string to a block, that's a low-level operation that doesn't go through the "ordinary" binding to "default" contexts. All words will be unbound:
>> x: 10
== 10
>> code: to-block "print [x]"
== [print [x]]
>> do code
** Script Error: print word has no context
** Where: halt-view
** Near: print [x]
So when you want to build code from raw strings at runtime and have its word lookups work, one option is to use LOAD, which applies the default binding. That might work for some code (the loader is how the bindings were made for the source code you're running):
>> x: 10
== 10
>> code: load "print [x]"
== [print [x]]
>> do code
10
Or you can name the contexts/objects explicitly (or by way of an exemplar word bound into that context) and use BIND.

How do I pick elements from a block using a string in Rebol?

Given this block
fs: [
    usr [
        local [
            bin []
        ]
        share []
    ]
    bin []
]
I could retrieve an item using a path notation like so:
fs/usr/local
How do I do the same when the path is a string?
path: "/usr/local"
find fs path ;does not work!
find fs to-path path ;does not work!
You need to complete the input string path with the right root, then LOAD it and evaluate it.
>> path: "/usr/local"
>> insert path "fs"
>> do load path
== [
bin []
]
Did you know Rebol has a native path type?
Although this doesn't exactly answer your question, I thought I'd add a reference on how to use paths directly in Rebol. Rebol has a lot of datatypes, and when you can, you should leverage that rich language feature. Especially when you start to use and build dialects, knowing what types exist and how to use them becomes even more potent.
Here is an example of how you can build and run a path directly, without using strings. To represent a path within source code, you use the lit-path! datatype.
Example:
>> p: 'fs/usr/local
== fs/usr/local
>> do p
== [
bin []
]
You can even append to a path to manipulate it:
>> append p 'bin
== fs/usr/local/bin
>> do p
== []
If it were stored within a block, you would use a path! type directly (not a lit-path!):
>> p: [fs/usr/local]
== [fs/usr/local]
>> do first p
== [
bin []
]
Also note that using paths directly has advantages over using strings: because a path is a series of words, it is easier to manipulate than a string. For example:
>> head change next p 'bin
== fs/bin/local
>> p: 'fs/path/issue/is
== fs/path/issue/is
>> head replace p 'is 'was
== fs/path/issue/was
As opposed to using a string:
>> p: "fs/path/issue/is"
== "fs/path/issue/is"
>> head replace p "is" "was"
== "fs/path/wassue/is"
If you want to browse the disk instead of Rebol datasets, simply give FS a file! value and the rest of the path will browse from there (this is how paths work on file! types):
fs: %/c/
read dirize fs/windows