How to flush current output in RApache? - rapache

I'm testing RApache as a back end for SSE (Server-Sent Events) and similar techniques (long poll, comet, etc.). I seem to be stuck on how to flush my output. Is it possible?
Here is my test R script:
setContentType("text/plain")
repeat {
  cat(format(Sys.time()), "\n")
  # sendBin(paste(format(Sys.time()), "\n"))
  flush(stdout())
  Sys.sleep(1)
}
My Rapache.conf entry is:
<Location /rtest/sse>
Options -MultiViews
SetHandler r-handler
RFileHandler /var/www/local/rtest/sse.r
</Location>
And I test it using either wget or curl:
wget -O - http://localhost/rtest/sse
curl http://localhost/rtest/sse
Both just sit there, meaning nothing is being sent.
Using sendBin() made no change, and neither did using flush().
If I change repeat to for(i in 1:5) then it sits there for 5 seconds and then shows 5 timestamps (spaced one second apart). So, I believe everything else is working fine and this is purely a buffering issue.
UPDATE: Looking at this with fresh eyes after 5 months, I think I could have described the problem more clearly: the problem is that RApache appears to be buffering all the output, and not sending anything until the R script exits. To be useful for streaming it has to send data out of Apache and on to the client each time flush() is called, i.e. while the R script is still running.
So, my question is: is there a way to get RApache to behave like that?
UPDATE 2 I tried adding flush.console() before or after the flush(stdout()) but no difference. I also tried setStatus(status=200L) at the top. And I tried SERVER$no_cache=T;SERVER$no_local_copy=T; at the top of the script. Again it made no difference. (Yes, none of those should have helped, but it never hurts to try!)
Here is a link to how PHP implements flush when it is running as an Apache module:
http://git.php.net/?p=php-src.git;a=blob;f=sapi/apache2handler/sapi_apache2.c#l290
I think the key point is that there is a call to ap_rflush(r). I'm guessing that RApache is not making the ap_rflush() call.

You are passing the wrong MIME type. Try changing it with:
setContentType("text/event-stream")
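For reference, here is a minimal sketch of the whole handler with that content type and the standard SSE data: framing (setHeader is assumed to be available in the rapache handler environment); it will still only stream if RApache actually flushes between iterations:
setContentType("text/event-stream")
setHeader("Cache-Control", "no-cache")  # assumption: rapache's setHeader() is available
repeat {
  # an SSE message is one or more "data: ..." lines followed by a blank line
  cat("data: ", format(Sys.time()), "\n\n", sep = "")
  flush(stdout())
  Sys.sleep(1)
}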
EDIT 1:
This is the (still unsuccessful) attempt, mentioned in the comment below, to implement SSE in Rook.
<%
res$header('Content-Type', 'text/event-stream')
res$header('Cache-Control', 'no-cache')
res$header('Connection', 'keep-alive')
A <- 1
sendMessage <- function(){
  while(A <= 4){
    cat("id: ", Sys.time(), "\n", "data: hello\n\n", sep="")
    A <- A + 1
    flush(stdout())
    Sys.sleep(1)
  }
}
-%>
<% sendMessage() %>
The while loop condition was supposed to be always TRUE, but since I'm having the same problem as you, I had to use a finite loop...
The good news is I DO have data reaching the browser. I can tell by looking, in developer tools, at the Content-Length in the Response Headers section: it says 114 for the above code, and if you change, say, "hello" to "hello!", it says 118.
The JS code is (you'll need jQuery as well):
$(document).ready(function(){
  $("button").click(function(){
    var source = new EventSource("../R/sse.Rhtml");
    source.onopen = function(event){
      console.log("readyState: " + source.readyState);
    };
    source.onmessage = function(event){
      $("#div").append(event.data);
    };
    source.onerror = function(event){
      console.log(event);
    };
  });
});
So, in essence
1) The connection is open (readyState 1)
2) Buffering is still there
3) Data (after buffering) reaches the browser, but an error occurs when receiving it properly.
EDIT 2:
It's interesting to note that when brew()ing the above .Rhtml file, the output is not buffered. There must be a configuration in the web server (both the internal R one and Apache) that buffers the data flow.
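To make that concrete, here is a rough check run from a plain R session, outside the web server (assuming the .Rhtml above is saved as sse.Rhtml in the working directory):
library(brew)
# rendering directly: each "id:/data:" line appears on stdout as soon as it
# is produced, i.e. without buffering
brew("sse.Rhtml")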
As a side note, flush is not even needed; cat's output defaults to stdout(). So the options are:
Web server configuration
An R equivalent of the PHP ob_flush(), which is always used in any PHP implementation I've seen (this is an example)


Fetching big SQL table in the web app session

I'm quite new to web apps, so apologies if my question is a bit basic. I'm developing a web app with R shiny where the inputs are very large tables from Azure SQL Server. There are 20 tables, each on the order of hundreds of thousands of rows and hundreds of columns, containing numbers, characters, etc. I have no problem calling them; my main issue is that it takes so much time to fetch everything from Azure SQL Server, approximately 20 minutes, so the user of the web app has to wait quite a long time.
I'm using DBI package as follows:
db_connect <- function(database_config_name){
  dbConfig <- config::get(database_config_name)
  connection <- DBI::dbConnect(odbc::odbc(),
                               Driver   = dbConfig$driver,
                               Server   = dbConfig$server,
                               UID      = dbConfig$uid,
                               PWD      = dbConfig$pwd,
                               Database = dbConfig$database,
                               encoding = "latin1")
  return(connection)
}
and then fetching tables with:
connection <- db_connect(db_config_name)
table <- dplyr::tbl(connection,
                    dbplyr::in_schema(fetch_schema_name(db_config_name, table_name, data_source_type),
                                      fetch_table_name(db_config_name, table_name, data_source_type)))
I searched a lot but didn't come across a good solution; I'd appreciate any solution that can tackle this problem.
I work with R accessing SQL Server (not Azure) daily. For larger data (as in your example), I always revert to using the command-line tool sqlcmd; it is significantly faster. The only pain point for me was learning the arguments and working around the fact that it does not return proper CSV; there is post-query munging required. You may have an additional pain point of having to adjust my example to connect to your Azure instance (I do not have an account).
In order to use this in a shiny environment and preserve its interactivity, I use the processx package to start the process in the background and then poll its exit status periodically to determine when it has completed.
Up front: this is mostly a "loose guide"; I do not pretend that this is a fully-functional solution for you. There might be some rough edges that you need to work through yourself. For instance, while I say you can do it asynchronously, it is up to you to work the polling process and delayed data availability into your shiny application. My answer here covers starting the process and reading the file once it is complete. And finally, if encoding= is an issue for you, I don't know whether sqlcmd handles non-latin encodings correctly, and I don't know if or how to fix this with its very limited, even antiquated, arguments.
Steps:
Save the query into a text file. Short queries can be provided on the command-line, but past some point (128 chars? I don't know that it's clearly defined, and have not looked enough recently) it just fails. Using a query-file is simple enough and always works, so I always use it.
I always use temporary files for each query instead of hard-coding the filename; this just makes sense. For convenience (for me), I use the same tempfile base name and append .sql for the query and .csv for the returned data, that way it's much easier to match query-to-data in the temp files. It's a convention I use, nothing more.
tf <- tempfile()
# using the same tempfile base name for both the query and csv-output temp files
querytf <- paste0(tf, ".sql")
writeLines(query, querytf)
csvtf <- paste0(tf, ".csv")
# these may be useful in troubleshooting, but not always [^2]
stdouttf <- paste0(tf, ".stdout")
stderrtf <- paste0(tf, ".stderr")
Make the call. I suggest you see how fast this is in a synchronous way first to see if you need to add an async query and polling in your shiny interface.
exe <- "/path/to/sqlcmd" # or "sqlcmd.exe"
args <- c("-W", "-b", "-s", "\037", "-i", querytf, "-o", csvtf,
          "-S", dbConfig$server, "-d", dbConfig$database,
          "-U", dbConfig$uid, "-P", dbConfig$pwd)
## as to why I use "\037", see [^1]
## note that the user id and password will be visible on the shiny server
## via a `ps -fax` command-line call
proc <- processx::process$new(command = exe, args = args,
                              stdout = stdouttf, stderr = stderrtf) # other args exist
# this should return immediately, and should be TRUE until
# data retrieval is done (or error)
proc$is_alive()
# this will hang (pause R) until retrieval is complete; if/when you
# shift to asynchronous queries, do not do this
proc$wait()
One can use processx::run instead of process$new and proc$wait(), but I thought I'd start you down this path in case you want/need to go asynchronous.
If you go with an asynchronous operation, then periodically check (perhaps every 3 or 10 seconds) proc$is_alive(). Once that returns FALSE, you can start processing the file. During this time, shiny will continue to operate normally. (If you do not go async and therefore choose to proc$wait(), then shiny will hang until the query is complete.)
If you do not proc$wait() but try to continue with reading the file right away, that's a mistake. The file may not exist, in which case it will err with No such file or directory. The file may exist but be empty. It may exist and have incomplete data. So really, make a firm decision: stay synchronous and therefore call proc$wait(), or go asynchronous and poll periodically until proc$is_alive() returns FALSE.
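If you do go asynchronous, a rough sketch of the polling inside a shiny server function might look like the following; read_sqlcmd_csv() is a hypothetical helper wrapping the two-step read.csv() shown in the next step, and proc and csvtf come from above:
# sketch only; assumes library(shiny) and that this runs inside server()
sql_data <- reactiveVal(NULL)
observe({
  req(is.null(sql_data()))   # stop polling once the data has been loaded
  invalidateLater(5000)      # re-check roughly every 5 seconds
  if (!proc$is_alive()) {
    if (proc$get_exit_status() == 0) {
      sql_data(read_sqlcmd_csv(csvtf))  # hypothetical wrapper around the two-step read
    } else {
      showNotification("sqlcmd failed; check its stdout/stderr files")
    }
  }
})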
Reading in the file. There are three "joys" of using sqlcmd that require special handling of the file.
It does not do embedded quotes consistently, which is why I chose to use "\037" as a separator. (See [^1].)
It adds a line of dashes under the column names, which will corrupt the auto-classing of data when R reads in the data. For this, we do a two-step read of the file.
Nulls in the database are the literal NULL string in the data. For this, we update the na.strings= argument when reading the file.
exitstat <- proc$get_exit_status()
if (exitstat == 0) {
  ## read #1: get the column headers
  tmp1 <- read.csv(csvtf, nrows = 2, sep = "\037", header = FALSE)
  colnms <- unlist(tmp1[1,], use.names = FALSE)
  ## read #2: read the rest of the data
  out <- read.csv(csvtf, skip = 2, header = FALSE, sep = "\037",
                  na.strings = c("NA", "NULL"), quote = "")
  colnames(out) <- colnms
} else {
  # you should check both stdout and stderr files, see [^2]
  stop("'sqlcmd' exit status: ", exitstat)
}
Note:
After a lot of pain with several issues (some in sqlcmd.exe, some in data.table::fread and other readers, all dealing with CSV-format non-compliance), at one point I chose to stop working with comma-delimited returns, instead opting for the "\037" field delimiter. It works fine with all CSV-reading tools and has fixed so many problems (some not mentioned here). If you're not concerned, feel free to change the args to "-s", "," (adjusting the read as well).
sqlcmd seems to use stdout or stderr in different ways when there are problems. I'm sure there's rationale somewhere, but the point is that if there is a problem, check both files.
I added the use of both stdout= and stderr= because of a lot of troubleshooting I did, and continue to do if I munge a query. Using them is not strictly required, but you might be throwing caution to the wind if you omit those options.
By the way, if you choose to only use sqlcmd for all of your queries, there is no need to create a connection object in R. That is, db_connect may not be necessary. In my use, I tend to use "real" R DBI connections for known-small queries and the bulk sqlcmd for anything above around 10K rows. There is a tradeoff; I have not measured it sufficiently in my environment to know where the tipping point is, and it is likely different in your case.
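As a loose illustration of that split, one could wrap the choice in a helper along these lines; get_table and fetch_via_sqlcmd are hypothetical names, and the 10K-row cutoff is just the ballpark mentioned above:
# hypothetical dispatcher: DBI for known-small queries, sqlcmd for bulk pulls
get_table <- function(query, expected_rows, db_config_name) {
  if (expected_rows < 10000) {
    con <- db_connect(db_config_name)        # the DBI connection from the question
    on.exit(DBI::dbDisconnect(con), add = TRUE)
    DBI::dbGetQuery(con, query)
  } else {
    fetch_via_sqlcmd(query, db_config_name)  # hypothetical wrapper around the sqlcmd steps above
  }
}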

"Could not get Timeline data" when using Timeline Visualization with Comma IDE

After implementing the answer to this question on how to set up a script for time visualization in this project (which uses a small extension to the published Log::Timeline that allows me to set the logging file from the program itself), I still get the same error
12:18 Timeline connection error: Could not get timeline data: java.net.ConnectException: Conexión rehusada
(which means connection refused). I've also checked the created files, and they are empty; they don't receive anything. I'm using this to log:
class Events does Log::Timeline::Event['ConcurrentEA', 'App', 'Log'] { }
(as per the README.md file). It's probably the case that there's no such thing as a default implementation, as shown in the tests, but, in that case, what would be the correct way of making it print to the file and also connect to the timeline visualizer?
If you want to use the timeline visualization, leave the defaults for logging, commenting out any modification of the standard logging output. In my case:
#BEGIN {
# PROCESS::<$LOG-TIMELINE-OUTPUT>
# = Log::Timeline::Output::JSONLines.new(
# path => log-file
# )
#}
I'm not really sure whether this would have happened if an output file had been defined using an environment variable, but in any case, it's better to be on the safe side. You can use the output file when you eventually drop the script into production.

Cro::WebSocket::Client doesn't work

Created a websocket server with "cro stub".
Wrote this client:
use v6;
use Cro::WebSocket::Client;
constant WS-PORT = '20000';
constant WS-ADDRESS = 'localhost';
constant WS-PATH = 'chat';
constant WS-URL = 'ws://' ~ WS-ADDRESS ~ ':' ~ WS-PORT ~ '/' ~ WS-PATH;
constant TIMEOUT-TO-CONNECT = 5; # seconds
my $timeout;
my $connection-attempt;
await Promise.anyof(
$connection-attempt = Cro::WebSocket::Client.connect(WS-URL),
$timeout = Promise.in(TIMEOUT-TO-CONNECT));
if $timeout.status == Kept
{
say "* could not connect to server in ', TIMEOUT-TO-CONNECT, ' seconds";
exit 1;
}
if $connection-attempt.status != Kept
{
say "* error ", $connection-attempt.cause,
" when trying to connect to server";
exit 1;
}
my $connection = $connection-attempt.result;
my $peer = WS-ADDRESS ~ ':' ~ WS-PORT;
say '* connected with ', $peer;
my $counter = 0;
my $message-supplier = Supplier::Preserving.new;
my $has-message-to-send = $message-supplier.Supply;
$message-supplier.emit(1);
react
{
whenever $has-message-to-send
{
$counter++;
$connection.send($counter);
say "* ok, sent message ", $counter, " to server";
}
whenever $connection.messages -> $reply
{
say '* received reply=[' ~ $reply ~ '] from server';
$message-supplier.emit(1);
}
} # react
I see with tcpdump the response code 101 (switching protocols) from the server, but I don't see the message sent from the client to the server.
So, what am I doing wrong?
Another question: shouldn't "$connection.send" return a Promise or something? What if there's an error when sending?
And another question: it seems the server only understands IPv6 addresses... how do I make it understand IPv4 addresses?
That's it, for now.
UPDATE
As per Takao's advice, changing
$connection.send($counter)
to
$connection.send($counter.Str)
solves the problem (though I tried it on another program, not this one).
Let's resolve this piece by piece.
Firstly, your code looks correct to me, except for a couple of tiny bits.
When I reproduced your code, it indeed did not work, so I tried it with cro trace . instead of cro run .. You can find info about that mode in official docs.
The alternative way is to just set CRO_TRACE=1 environment variable.
So during debug I saw this error:
[TRACE(anon 1)] Cro::HTTP::ResponseParser QUIT No applicable body serializer could be found for this message
As it says, the body you sent could not be serialized. So I looked into what you are sending: $counter. $counter in your code is an Int, so we need to make it a Str first; a simple $counter.Str makes your example work.
Also, note that you are sending a message on every reply, and the echo server (the default one you created using cro stub) also sends a reply for every incoming message, so your example sends messages endlessly. To prevent that, you may consider adding a condition under which you will no longer send things, but it is a test example anyway, so that's up to you.
As for your other questions:
Another question, shoudn't "$connection.send" return a Promise or something?
It should not; I'll write out some of cro's architecture details to explain why. As you may know from reading the docs, a cro pipeline is basically just a bunch of Cro::Transform-wrapped supplies. Inside of Cro::WebSocket::Client::Connection, the send method just sends a thing directly into the Cro::Source of the whole pipeline; you cannot go wrong with a simple $supplier.emit($message) (the real implementation of this method looks very close to this line). The thing you bumped into occurred further along in the pipeline. I am sure it is not a nice user experience to hide exceptions in such cases, so I'll consider making a patch to propagate the exception, so it'd be easier to catch (although you can always use debug mode).
it seems the server only understands IPV6 addresses...how to make it understand IPV4 addresses ?
I am not sure about that, please open a new question.

Trying to get node-webkit console output formatted on terminal

Fairly new to node-webkit, so I'm still figuring out how everything works...
I have some logging in my app:
console.log("Name: %s", this.name);
It outputs to the browser console as expected:
Name: Foo
But in the invoking terminal, instead I get some fairly ugly output:
[7781:1115/085317:INFO:CONSOLE(43)] ""Name: %s" "Foo"", source: /file/path/to/test.js (43)
The numerical output within the brackets might be useful, but I don't know how to interpret it. The source info is fine. But I'd really like the printed string to be printf-style formatted, rather than shown as individual arguments.
So, is there a way to get stdout to be formatted either differently, or to call a custom output function of some sort so I can output the information I want?
I eventually gave up, and wrapped console.log() with:
var util = require('util'); // node's util module, needed for printf-style formatting

log = function() {
  console.log(util.format.apply(this, arguments));
};
The actual terminal console output is done via RenderFrameHostImpl::OnAddMessageToConsole in chromium, with the prefix info being generated via LogMessage::Init() in the format:
[pid:MMDD/HHMMSS:severity:filename(line)]
The javascript console.log is implemented in console.cc, via the Log() function. The printf-style formatting is done at a higher level, so that by the time the Log() function (or similar) is called, it is only passed a single string.
It's not a satisfying answer, but a tolerable workaround.
I was looking to create a command-line interface to go alongside my UI and had a similar problem. Along with not logging the values as I wanted, I also wanted to get rid of the [pid:MMDD/HHMMSS:severity:filename(line)] output prefix, so I added the following:
console.log = function (d) {
  process.stdout.write(d + '\n');
};
so that console logging went straight to stdout without the extra details. Unfortunately, this is also a workaround.

Balanced Payments doesn't seem to work with Phonegap

We are not able to call balanced.card.create from a Phonegap application. This is reproduced in a stock Phonegap application here: https://github.com/kevg/phonegap-balanced. Full details are in the README.md on github, but the basic summary is:
For those not familiar with phonegap, the main page that loads is index.html. This initializes phonegap in index.js. When the device is ready, we will show a hidden DIV with a button named "Execute Balanced." When you click this button, app.executeBalanced in index.js will be called, which prompts for the balanced marketplace URI, loads balanced.js with $.getScript, and then calls balanced.card.create with a test credit card.
The expected result is that callbackHandler is called or an exception is caught. Instead, it seems the execution of the Javascript thread disappears into balanced.card.create, never to return and without any error.
Alrighty, I found the bug in balanced.js. So, in Phonegap, window.location.href returns something like file:///.../index.html. Balanced.js creates an iframe to something like https://js.balancedpayments.com/proxy#file
var src = proxy + "#" + encodeURIComponent(window.location.href);
https://github.com/balanced/balanced-js/blob/master/src/utils.js#L48
In the script returned in proxy.html (which I can't find on github), it does:
c.parentURL=decodeURIComponent(
window.location.hash.replace(/^#/,"")
).replace(/#.*$/,"")
c.parentDomain=c.parentURL.replace(/([^:]+:\/\/[^\/]+).*/,"$1")
The regex doesn't match because file: has three slashes. Now, at first, I thought I could just convert the regex to:
/([^:]+:\/+[^\/]+).*/
However, then there's another problem, because balanced does a security origin check on the match:
if (d.origin.toLowerCase() !== c.toLowerCase()) return !1;
However, the regex returns file:///firstcomponent, whereas event.origin does not include a host name for the file scheme, so these won't match even with a fixed regex.
I can't change anything in the script returned in the proxy response because if I load that from a domain other than balancedpayments.com, then the AJAX POST fails (return code 0 with a blank body). Therefore, the only thing I can control is the hash passed to the iframe.
However, since this regex is a replace, we can simply pass exactly what we know we need (we don't care that the regex is a no-op).
Therefore, the solution is to change L48 above to:
var src = proxy + "#" + encodeURIComponent("file://");
This works.