sqlQuery in R fails when called via source() [duplicate]

The following, when copied and pasted directly into R works fine:
> character_test <- function() print("R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示...")
> character_test()
[1] "R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示..."
However, if I make a file called character_test.R containing the EXACT SAME code, save it in UTF-8 encoding (so as to retain the special Chinese characters), then when I source() it in R, I get the following error:
> source(file="C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8")
Error in source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "utf-8") :
C:\Users\Tony\Desktop\character_test.R:3:0: unexpected end of input
1: character.test <- function() print("R
2:
^
In addition: Warning message:
In source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8") :
invalid input found on input connection 'C:\Users\Tony\Desktop\character_test.R'
Any help you can offer in solving and helping me to understand what is going on here would be much appreciated.
> sessionInfo() # Windows 7 Pro x64
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_2.12.1
and
> l10n_info()
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] TRUE
$codepage
[1] 1252

On R/Windows, source runs into problems with any UTF-8 characters that can't be represented in the current locale (or ANSI Code Page in Windows-speak). And unfortunately Windows doesn't have UTF-8 available as an ANSI code page--Windows has a technical limitation that ANSI code pages can only be one- or two-byte-per-character encodings, not variable-byte encodings like UTF-8.
This doesn't seem to be a fundamental, unsolvable problem--there's just something wrong with the source function. You can get 90% of the way there by doing this instead:
eval(parse(filename, encoding="UTF-8"))
This'll work almost exactly like source() with default arguments, but won't let you do echo=T, eval.print=T, etc.
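Wrapped up as a helper, that might look like this (a sketch; sourceUTF8 is just an illustrative name, and envir = globalenv() mimics source()'s default of evaluating in the global environment):
sourceUTF8 <- function(filename) {
  # parse() decodes the file as UTF-8; eval() then runs the parsed expressions
  eval(parse(file = filename, encoding = "UTF-8"), envir = globalenv())
}
sourceUTF8("C:\\Users\\Tony\\Desktop\\character_test.R")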

We talked about this a lot in the comments to my previous post, but I don't want this to get lost on page 3 of the comments: you have to set the locale. It works both with input from the R console (see the screenshot in the comments) and with input from a file, as this screenshot shows:
The file "myfile.r" contains:
russian <- function() print ("Американские с...");
The console contains:
> source("myfile.r", encoding="utf-8")
Error in source(.....
> Sys.setlocale("LC_CTYPE","ru")
[1] "Russian_Russia.1251"
> russian()
[1] "Американские с..."
Note that the file input fails, and it points to the same character as the original poster's error (the one right after the opening "R). I cannot test this with Chinese because I would have to install "Microsoft Pinyin IME 3.0", but the process is the same; you just replace the locale with "chinese" (the naming is a bit inconsistent, consult the documentation).
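For the Chinese text in the question, the equivalent steps would presumably be (untested here, for the reason just given):
Sys.setlocale("LC_CTYPE", "chinese")  # Windows locale name; see ?Sys.setlocale
source("C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8")
character_test()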

I think the problem lies with R. I can happily source UTF-8 files, or UCS-2LE files, with many non-ASCII characters in them, but some characters cause it to fail. For example, the following
danish <- function() print("Skønt H. C. Andersens barndomsomgivelser var meget fattige, blev de i hans rige fantasi solbeskinnede.")
croatian <- function() print("Dodigović. Kako se Vi zovete?")
new_testament <- function() print("Ne provizu al vi trezorojn sur la tero, kie tineo kaj rusto konsumas, kaj jie ŝtelistoj trafosas kaj ŝtelas; sed provizu al vi trezoron en la ĉielo")
russian <- function() print ("Американские суда находятся в международных водах. Япония выразила серьезное беспокойство советскими действиями.")
is fine in both UTF-8 and UCS-2LE without the Russian line, but if that line is included it fails. I'm pointing the finger at R. Your Chinese text also appears to be too hard for R on Windows.
Locale seems irrelevant here. It's just a file; you tell R what encoding the file is in, so why should your locale matter?

For me (on Windows) I do:
source.utf8 <- function(f) {
  # Read the file as UTF-8 text, then parse and evaluate it in the global environment
  l <- readLines(f, encoding = "UTF-8")
  eval(parse(text = l), envir = .GlobalEnv)
}
It works fine.
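Usage is then the same as a plain source() call, e.g. for the file from the question:
source.utf8("C:\\Users\\Tony\\Desktop\\character_test.R")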

Building on crow's answer, this solution makes RStudio's Source button work.
When hitting that Source button, RStudio executes source('myfile.r', encoding = 'UTF-8'), so overriding source makes the errors disappear and runs the code as expected:
source <- function(f, encoding = 'UTF-8') {
  # Same approach as above, but masks base::source so RStudio's call picks it up
  l <- readLines(f, encoding = encoding)
  eval(parse(text = l), envir = .GlobalEnv)
}
You can then add that function to your .Rprofile file, so it will be defined on startup.

I encountered this problem when I tried to source a .R file containing some Chinese characters. In my case, I found that merely setting "LC_CTYPE" to "chinese" is not enough, but setting "LC_ALL" to "chinese" works well.
Note that getting the encoding right when you read or write a plain-text file with non-ASCII characters in RStudio (or R?) is not enough; the locale setting counts too.
PS: the command is Sys.setlocale(category = "LC_CTYPE", locale = "chinese"); replace the category and locale values accordingly.
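As a minimal sketch of that workaround (assuming a Chinese locale is installed; the file name is hypothetical):
Sys.setlocale(category = "LC_ALL", locale = "chinese")  # LC_CTYPE alone was not enough here
source("my_chinese_script.R", encoding = "UTF-8")       # hypothetical file name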

On Windows, when you copy-paste a Unicode or UTF-8 encoded string into a text control that is set to single-byte input (ASCII... depending on locale), the unknown bytes are replaced by question marks. If I take the first 4 characters of your string and copy-paste them into e.g. Notepad and then save it, the file becomes, in hex:
52 3F 3F 3F 3F
What you have to do is find an editor which you can set to UTF-8 before copy-pasting the text into it; then the saved file (of your first 4 characters) becomes:
52 E5 90 8C E6 97 B6 E4 B9 9F E8 A2 AB
This will then be recognized as valid UTF-8 by R.
I used "Notepad2" to try this, but I am sure there are many more editors that can do it.

Related

Writing lines to a binary file

I'm further playing with Raku's CommaIDE and I wanna print a binary file line by line.
I've tried this, but it doesn't work:
for "G.txt".IO.lines -> $line {
say $_;
}
How shall I fix it? It's obviously incorrect.
EDIT
this doesn't work either; see the snippet below
for "G.txt".IO.lines -> $line {
say $line;
}
You're showing us h.raku but Comma is giving you an error regarding c.raku, which is some other file in your Comma project.
It looks like you're working with a text file, not binary. Raku makes a clear distinction here: a text file is treated as text, regardless of encoding. If it's UTF-8, using .lines as you are now should work just fine because that's the default. If it's some other encoding, you can call .lines(:enc<some-other-encoding>). If it's truly binary, then the concept of "lines" really has no meaning, and you want something more like .slurp(:bin), which will give you a Buf[uint8] for working on the byte level.
The question specifically refers to reading a binary file, for which reading line-wise may (or may not) make sense--depending on the file.
Here's code to read a binary file straight from the docs (using class IO::CatHandle):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.encoding: Nil; .slurp.say;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
Compare to reading the file with default encoding (utf8):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.slurp.say;};'
A
B
C
See:
https://docs.raku.org/routine/encoding
Note: the read method of class IO::Handle reads binary by default. So the code is simply:
~$ raku -e '(my $file1 = "foo".IO).spurt: "A\nB\nC\n"; my $file2 = "foo".IO; given $file2.open { .read.say; .close;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
See:
https://docs.raku.org/type/IO::Handle#method_read
For further reading, see discussion of Perl5's <> diamond-operator-equivalent in Raku:
https://docs.raku.org/language/5to6-nutshell#while_until
...and some (older) mailing-list discussion of the same:
https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
Finally, the docs refer to writing a mixed utf8/binary file here (useful for further testing):
https://docs.raku.org/routine/encoding#Examples

STM32 Atollic TrueSTUDIO - Graphical view of the Memory

I'm using Atollic TrueSTUDIO for STM32 as an Eclipse-based IDE to perform digital signal processing on an audio signal. I'm looking for a way to plot an array (16-bit audio samples) from RAM. For the moment I'm using:
The memory View
The SWV real time data time line
Neither of these tools is powerful enough to analyse a signal stored in an array, and it doesn't have to be in real time: just plotting an array after reaching a breakpoint would do.
Is there an Eclipse plugin or some other way to do this?
I'm considering exporting the RAM to a file and plotting it in Matlab, but that seems really inappropriate for such a simple task.
Thanks for any advice.
In Atollic, you can easily attach gdb commands to breakpoints. Doing this allows you to automatically dump any variables. Additionally, you can execute an external program afterwards to plot the content of the dumped variable.
To do so, go to the breakpoint properties and create a new action. Select "Debugger Command Action" and use dump binary value x.bin x to dump variable x to the file x.bin.
You can also start a Python script from the breakpoint to plot the data.
Use an additional "External Tool Action" and select your Python location. Make sure to select your current working directory, and use the arguments to pass the full path of the Python file. The following file imports a float array and plots it.
import struct
import numpy as np
import matplotlib.pyplot as plt

def readBinaryDump(filename):
    result = []
    N = 8  # unpack 8 single-precision floats (32 bytes) per read
    with open(filename, 'rb') as f:
        while True:
            b = f.read(4 * N)  # assumes the dump length is a multiple of 4*N bytes
            if len(b) == 0:
                break
            result.append(struct.unpack("f" * N, b))
    return np.array(result).ravel()

plt.plot(readBinaryDump("x.bin"))
plt.show()
Don't forget to add the actions to the current breakpoint. Now, once the breakpoint is reached, the variable should be dumped and plotted automatically.
While it's surprising that nothing can be embedded in Atollic/Eclipse, I followed the idea of writing a specific application. Here are the steps I used:
Dump Memory :
Debug your software
Stop on a BreakPoint
View > Memory > Export Button > Format : "Plain Text"
The file representing a sine wave looks like this:
00 00 3E 00 7D 00 BC 00 FB 00 39 01 78 01
B7 01 F6 01 34 02 73 02 B2 02 F0 02 2F 03
You should read these int16 samples like this:
1. 0x0000
2. 0x003E
3. 0x007D
4. etc...
Write this Matlab script:
fileID = fopen('your_file','r');
samples = textscan(fileID,'%s');   % read whitespace-separated hex byte strings
fclose(fileID);
samples = samples{1};
% bytes are little-endian: swap each byte pair to form a 16-bit word
words = strcat(samples(2:2:end,1), samples(1:2:end,1));
values = typecast(uint16(hex2dec(words)),'int16');
plot(values);
The sine wave plotted in Matlab:
While there aren't any Eclipse plugins that would do what you're asking for that I'm personally aware of, there's STM Studio whose main purpose is displaying variables in real-time. It parses your ELF file to get the available variables, so the effort to at least give it a try should be minimal.
It's available here: https://www.st.com/en/development-tools/stm-studio-stm32.html
ST-Link is required to run it.
Write a simple app in C#. Use semihosting to dump the memory to a text file, then open it and display it.
Recently I had a problem with MEMS sensors, and this was written in less than an hour. IMO it is easier to write a program that visualizes the data than to waste hours or days searching for a ready-made one:

knitr 1.5 / patchDVI 1.9 doesn't seem to generate a concordance acceptable to evince + emacs

Setup : here is sessionInfo() :
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchDVI_1.9 knitr_1.5
loaded via a namespace (and not attached):
[1] compiler_3.0.2 evaluate_0.5.1 formatR_0.9 highr_0.2.1 stringr_0.6.2
[6] tcltk_3.0.2 tools_3.0.2
I am trying to get emacs and AucTeX to synchronize my .Rnw source file with evince, so I can jump from source to compiled text and back.
I have already checked that the synchronization works fine between a .tex source and a PDF.
My .Rnw file starts with :
\documentclass[a4paper,twoside,12pt]{article}
\synctex=1 %% Should force concordance generation
\pdfcompresslevel=0 %% Should force avoidance of PDF compression, which patchDVI does
\pdfobjcompresslevel=0 %% not handle
<<include=FALSE>>= %% Modification of what Sweave2knitr does
## opts_chunk$set(concordance=TRUE, self.contained=TRUE) ## No possible effect
opts_knit$set(concordance=TRUE, self.contained=TRUE) ## Seems reasonable
@
%% \SweaveOpts{concordance=TRUE} %% That's where inspiration came from
Consider the following log (irrelevant parts edited):
> options("knitr.concordance")
$knitr.concordance
[1] TRUE
> opts_knit$get("concordance")
[1] TRUE
> knit("IntroStat.Rnw")
processing file: IntroStat.Rnw
|...................... | 33%
ordinary text without R code
|........................................... | 67%
label: unnamed-chunk-1 (with options)
List of 1
$ include: logi FALSE
|.................................................................| 100%
ordinary text without R code
output file: IntroStat.tex
[1] "IntroStat.tex"
> system("pdflatex -synctex=1 IntroStat.tex")
[ Edited irrelevancies ]
SyncTeX written on IntroStat.synctex.gz.
Note: a concordance has *been* generated!!!
Transcript written on IntroStat.log.
Let's do that again to fix references:
> system("pdflatex -synctex=1 IntroStat.tex")
[ Edited irrelevancies ]
Output written on IntroStat.pdf (1 page, 136907 bytes).
SyncTeX written on IntroStat.synctex.gz.
Note: a concordance has *been* generated *again*!!!
Transcript written on IntroStat.log.
> patchDVI("IntroStat.pdf")
[1] "0 patches made. Did you set \\SweaveOpts{concordance=TRUE}?"
* This I do not understand *
> patchSynctex("IntroStat.synctex.gz")
[1] "0 patches made. Did you set \\SweaveOpts{concordance=TRUE}?"
* Ditto *
It appears that something in this set of tools does not work as advertised: either patchDVI does not recognize legal concordance \specials, or pdflatex does not generate them. It does generate something, however...
I checked that the resulting PDF lets evince synchronize with the .tex file, but not with the .Rnw file. Furthermore, when the .Rnw file is open in emacs, starting the viewer with "C-c C-v View" in AucTeX does start the viewer (after requesting to open a server, which I authorize), but the viewer is empty, and I get this:
"TeX-evince-sync-view: Couldn't find the Evince instance for file:///home/charpent/Boulot/Cours/ODF/Chapitres/Ch3-StatMath/IntroStat.Rnw.pdf"
in the "Messages" buffer.
So we have a second problem here.
A third one would be to integrate all of this transparently in the AucTeX production chain, but this is another story...
I'd really like to keep emacs as my main tool for R/\LaTeX/Sage work, rather than switch to RStudio, which probably won't much like SageTeX and the other various tools I need on a daily/weekly basis...
Any thoughts?
Maybe this https://github.com/jan-glx/patchKnitrSynctex will help. I tried it on a simple file, and it does work.
As for the second and third problems, I have this script (note that I source the above code from jan-glx; modify path accordingly):
#!/bin/bash
FILE=$1
BASENAME=$(basename "$FILE" .Rnw)
Rscript -e 'library(knitr); opts_knit$set("concordance" = TRUE); knit("'"$FILE"'")'
pdflatex --synctex=1 --file-line-error --shell-escape "${FILE%.*}"
Rscript -e "source('~/Sources/patchKnitrSynctex.R'); patchKnitrSynctex('${FILE%.*}.tex')"
ln -s "$BASENAME.synctex.gz" "$BASENAME.Rnw.synctex.gz"
ln -s "$BASENAME.pdf" "$BASENAME.Rnw.pdf"
The links are my kludgy way of getting around the "Couldn't find the Evince instance (...)" error.
If you have your .Rnw in an Emacs buffer, go to a shell buffer, and call that script. When finished, C-c C-v from Emacs will open your configured PDF viewer (okular in my case). In the PDF, shift + left mouse click (okular at least) will bring you to the right place in the Emacs .Rnw buffer.
This is not ideal: if you jump to an error, it goes to the .tex, not the .Rnw. And I'd like to be able to invoke it via C-c C-c or similar (but I don't know how -- elisp ignorance).

Moose throwing an error when trying to build a distribution with Dist::Zilla

I'm having a bit of trouble building a Dist::Zilla distribution. Every time I try to build it, or really do anything else (test, smoke, listdeps, whatever it is), I get this error message:
Attribute name must be provided before calling reader at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/MooseX/LazyRequire/Meta/Attribute/Trait/LazyRequire.pm line 40
MooseX::LazyRequire::Meta::Attribute::Trait::LazyRequire::__ANON__('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/x86_64-linux/Class/MOP/Mixin/AttributeCore.pm line 45
Class::MOP::Mixin::AttributeCore::default('Moose::Meta::Class::__ANON__::SERIAL::5=HASH(0x50c9c30)', 'Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at reader Dist::Zilla::name (defined at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla.pm line 41) line 6
Dist::Zilla::name('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla/Dist/Builder.pm line 264
Dist::Zilla::Dist::Builder::build_in('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)', undef) called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla/Dist/Builder.pm line 315
Dist::Zilla::Dist::Builder::ensure_built_in('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla/Dist/Builder.pm line 304
Dist::Zilla::Dist::Builder::ensure_built('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla/Dist/Builder.pm line 322
Dist::Zilla::Dist::Builder::build_archive('Dist::Zilla::Dist::Builder=HASH(0x53d06e0)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/Dist/Zilla/App/Command/build.pm line 30
Dist::Zilla::App::Command::build::execute('Dist::Zilla::App::Command::build=HASH(0x4aeb7b0)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x4bdb150)', 'ARRAY(0x3873fc8)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/App/Cmd.pm line 231
App::Cmd::execute_command('Dist::Zilla::App=HASH(0x3c6f418)', 'Dist::Zilla::App::Command::build=HASH(0x4aeb7b0)', 'Getopt::Long::Descriptive::Opts::__OPT__::2=HASH(0x4bdb150)') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/lib/site_perl/5.14.0/App/Cmd.pm line 170
App::Cmd::run('Dist::Zilla::App') called at /home/mxe/perl5/perlbrew/perls/perl-5.14.0/bin/dzil line 15
Or something along those lines. The contents of my dist.ini are as follows:
author = Gary Warman <email@host.com>
license = None ; this is an all rights reserved license
copyright_holder = Gary Warman
copyright_year = 2011
[ReadmeFromPod]
[@Filter]
-bundle = @Basic
-remove = Readme
[AutoPrereqs]
[OurPkgVersion] ; use this instead of [PkgVersion]
[PodWeaver]
[MetaNoIndex]
file = perlcritic.rc
[MetaJSON]
[NextRelease]
format = %-9v %{yyyy-MM-dd}d ; make Changes Spec happy
[@TestingMania]
disable = NoTabsTests ; TestSynopsis optional if synopsis is not perl or if it's a largely generated codebase
critic_config = perlcritic.rc
[ExtraTests]
[PodSpellingTests]
wordlist = Pod::Wordlist::hanekomu ;optional
spell_cmd = aspell list
[PruneFiles]
filenames = dist.ini
filenames = weaver.ini
[@Git]
[Git::NextVersion]
first_version = 0.1.0 ; use semantic versioning -- if you don't know what this means, read http://semver.org/ ; may switch to the semantic versioning plugin at some point
[CheckChangesHasContent]
[Clean] ; optional, this cleans up directories upon running dzil release.
So, anyone here have any idea what's going on so I can resolve this?
Your dist.ini file must contain an assignment for name; that is what this horrid error is reporting. Insert, at the very top of your file:
name = My-Awesome-Dist
...and it will work.

How to read text files transfered as binary

My code copies files from FTP (using text transfer mode) to a local disk and then tries to process them.
All files contain only text, and values are separated by newlines. Sometimes files are moved to this FTP server using binary transfer mode, and it looks like this messes up the line endings.
Using a hex editor, I compared the line endings depending on the transfer mode used to send the files to the FTP server:
using text mode: line endings are 0D 0A
using binary mode: line endings are 0D 0D 0A
Is it possible to modify my code so it could read files in both cases?
Code from the job that illustrates my problem and shows how I'm reading the file
(here I use the same file, which contains 14 rows of data):
int i;
container con;
container files = ["c:\\temp\\axa_keio\\ascii.txt", "c:\\temp\\axa_keio\\binary.txt"];
boolean purchLineFirstRow;
IO inFile;
;
for (i = 1; i <= conlen(files); i++)
{
    inFile = new AsciiIO(conpeek(files, i), "R");
    inFile.inFieldDelimiter('\n');
    con = inFile.read();
    info(int2str(conlen(con)));
}
The files come from a Unix system to a Windows system.
Not sure, but maybe the question should be: "Which inFieldDelimiter value should I use to read both Unix and Windows line endings?"
Use inRecordDelimiter:
inFile.inRecordDelimiter('\n');
instead of:
inFile.inFieldDelimiter('\n');
There may still be a dangling CR on the last field, which you may wish to remove:
strRem(conpeek(con, conlen(con)), '\r')
See also: http://en.wikipedia.org/wiki/Line_endings