Code folding in RStudio: Creating hierarchy in the code

I'm writing R scripts in RStudio and I use the code folding a lot. I found that you can see the hierarchy of the folding by pressing cmd + shift + O. This is super helpful.
# to my dear love ---------------------------------------------------------
2+2
# yo man ====
x.2 = function (x) {x+2}
### I do love potatoes ####
See the result by pressing cmd + shift + O.
I don't understand how this works: in the code below, I can create a subsection when the header contains no text (# ====), but not when it contains text (# yo man ====).
# to my dear love ---------------------------------------------------------
2+2
# ====
# yo man ====
### I do love potatoes ####
x.2 = function (x) {x+2}
data = "here is some data"
See the result by pressing cmd + shift + O.
You can see that everything under # to my dear love --------------------------------------------------------- is shifted to the right. This is cool!
So the question is: how can I create a hierarchy of sections whose headers contain text?
Is a particular package or Emacs doing this? How can I create subsections with text and see the hierarchy in the cmd + shift + O box?
How can I shift a section down a level (going from a higher section, say section 2, to a lower one, section 1), changing the visual hierarchy in the box on the right accordingly?

As per Chris's answer, subheaders can be nested within functions.
The RStudio code-folding hierarchy only works inside function definitions and if-else structures. For example:
# Section 1 ----
a <- 1
testfunct1 <- function () {
  # sect in function=====
  b <- 2
  c <- 3
}
# Section 2 #####
d <- 4
# Section 3 =======
e <- 5
testfunct2 <- function () {
  # sect in function 2 =====
  f <- 6
  testsubfunct2_1 <- function () {
    # sect in subfunction 2_1 -----
    if (a == 1) {
      # section in if ----
      g <- 7
    } else {
      # section in else ----
      h = 8
    }
  }
}
# Section 4 ####
j <- 9
Produces this outline:
I don't know why the if-else section labels do not line up.

Just discovered I could use various special characters across the top of my keyboard in combination with hyphens to produce a hierarchical look to the code section ToC. I chose the asterisk for this example, but you can use anything from the special character keys across the top to produce this look.
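As a rough sketch of the approach described above: the section labels below are made up for illustration, and the way the outline renders them is assumed from the description rather than verified across RStudio versions. Leading asterisks in front of the section text give the table of contents an indented look even though every section sits at the top level.

```r
# Data preparation ----
x <- 1:10

# * Cleaning ----
x <- x[x > 2]

# ** Outlier handling ----
x <- x[x < 9]

# Modelling ----
m <- mean(x)
```

Each header is still an ordinary comment ending in four or more dashes, so the asterisks are purely cosmetic.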

Related

Clean up code and keep null values from crashing read.csv.sql

I am using read.csv.sql to conditionally read in data. My data set is extremely large, and reading in the full data and then filtering it caused memory issues, so it is important that the filter is applied during the read and only the subset is loaded.
Here is a small data set so my problem can be reproduced:
write.csv(iris, "iris.csv", row.names = F)
library(sqldf)
csvFile <- "iris.csv"
I find the notation required by read.csv.sql extremely awkward. This is how I am reading in the file:
# Step 1 (Assume these values are coming from UI)
spec <- 'setosa'
petwd <- 0.2
# Add quotes and make comma-separated:
spec <- toString(sprintf("'%s'", spec))
petwd <- toString(sprintf("'%s'", petwd))
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql='select * from file where
                     "Species" in ($spec)
                     and "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
My main problem is that if any of the values above (from UI) are null then it won't read in the data properly, because this chunk of code is all hard coded.
I would like to change this into: Step 1 - check which values are null and do not filter off of them, then filter using read.csv.sql for all non-null values on corresponding columns.
Note: I am reusing the code from this similar question within this question.
UPDATE
I want to clear up what I am asking. This is what I am trying to do:
If a field, say spec comes through as NA (meaning the user did not pick input) then I want it to filter as such (default to spec == EVERY SPEC):
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql='select * from file where
"Petal.Width" in ($petwd)',
filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
Since spec is NA, filtering on spec == NA would read in an empty data set, because there are no NA values in my data, and that breaks the code and the program. I hope this clears it up.
There are several problems:
- Some of the simplifications provided in the link in the question were not followed.
- spec is a scalar, so one can just use '$spec'.
- petwd is a numeric scalar, and SQL does not require quotes around numbers, so just use $petwd.
- The question states that you want to handle empty fields but not how, so we have used csvfix to map them to -1 and also strip off quotes. (Alternatively, let them through and handle them in R. Empty numerics will come through as 0 and empty character fields will come through as zero-length character fields.)
- You can use [...] in place of "..." in SQL.
The code below worked for me in both Windows and Ubuntu Linux with the bash shell.
library(sqldf)
spec <- 'setosa'
petwd <- 0.2
d <- fn$read.csv.sql(
  "iris.csv",
  sql = "select * from file where [Species] = '$spec' and [Petal.Width] = $petwd",
  verbose = TRUE,
  filter = 'csvfix map -smq -fv "" -tv -1'
)
Update
Regarding the update at the end of the question it was clarified that the NA could be in spec as opposed to being in the data being read in and that if spec is NA then the condition involving spec should be regarded as TRUE. In that case just expand the SQL where condition to handle that as follows.
spec <- NA
petwd <- 0.2
d <- fn$read.csv.sql(
  "iris.csv",
  sql = "select * from file
where ('$spec' == 'NA' or [Species] = '$spec') and [Petal.Width] = $petwd",
  verbose = TRUE,
  filter = 'csvfix echo -smq'
)
The above will return all rows for which Petal.Width is 0.2.
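If several inputs can be NA, another option is to build the where clause in R and leave out the conditions for missing inputs. The following is only a sketch in base R: build_where is a hypothetical helper, not part of sqldf, and it assumes the [column] quoting used in the query above.

```r
# Hypothetical helper (not part of sqldf): turn a named list of filter values
# into a SQL where clause, skipping any filter that is NULL or entirely NA.
build_where <- function(filters) {
  keep <- !vapply(filters, function(v) is.null(v) || all(is.na(v)), logical(1))
  filters <- filters[keep]
  if (length(filters) == 0) return("")
  conds <- mapply(function(col, v) {
    # numbers go in bare; strings are single-quoted with embedded quotes doubled
    vals <- if (is.numeric(v)) as.character(v) else sprintf("'%s'", gsub("'", "''", v))
    sprintf("[%s] in (%s)", col, paste(vals, collapse = ", "))
  }, names(filters), filters)
  paste("where", paste(conds, collapse = " and "))
}

# spec is NA here, so only the Petal.Width condition survives
sql <- trimws(paste("select * from file",
                    build_where(list(Species = NA, Petal.Width = 0.2))))
sql

# The finished string could then be passed on without fn$, for example:
# d <- read.csv.sql("iris.csv", sql = sql, filter = 'csvfix echo -smq')
```

Because the SQL string is complete before the call, no $ substitution is needed, and adding a filter is a matter of adding one element to the list.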

Create 20 unique bingo cards

I'm trying to create 20 unique cards with numbers, but I'm struggling a bit. Basically, I need to create 20 unique 3x3 matrices with numbers from 1-10 in the first column, 11-20 in the second column and 21-30 in the third column. Any ideas? I'd prefer to do it in R, especially as I don't know Visual Basic. In Excel I know how to generate the cards, but I'm not sure how to ensure they are unique.
The requirement seems quite precise and straightforward to me. Anyway, I need to create 20 matrices that look like this:
     [,1] [,2] [,3]
[1,]    5   17   23
[2,]    8   18   22
[3,]    3   16   24
Each of the matrices should be unique, and each column should consist of three unique numbers (the 1st column from 1-10, the 2nd column from 11-20, the 3rd column from 21-30).
Generating random numbers is easy, but how do I make sure that the generated cards are unique? Please have a look at the post that I voted for as an answer, as it gives a thorough explanation of how to achieve this.
(N.B. : I misread "rows" instead of "columns", so the following code and explanation will deal with matrices with random numbers 1-10 on 1st row, 11-20 on 2nd row etc., instead of columns, but it's exactly the same just transposed)
This code should guarantee uniqueness and good randomness :
library(gtools)

# helper function
getKthPermWithRep <- function(k, n, r) {
  k <- k - 1
  # valid zero-based indexes are 0 .. n^r - 1
  if (k >= n^r) {
    stop('k is greater than possible permutations')
  }
  v <- rep.int(0, r)
  index <- length(v)
  while (k != 0) {
    remainder <- k %% n
    k <- k %/% n
    v[index] <- remainder
    index <- index - 1
  }
  return(v + 1)
}

# get all possible permutations of 10 elements taken 3 at a time
# (singlerowperms = 720)
allperms <- permutations(10, 3)
singlerowperms <- nrow(allperms)

# get 20 random and unique bingo cards
cards <- lapply(sample.int(singlerowperms^3, 20), FUN = function(k) {
  perm2use <- getKthPermWithRep(k, singlerowperms, 3)
  m <- allperms[perm2use, ]
  m[2, ] <- m[2, ] + 10
  m[3, ] <- m[3, ] + 20
  return(m)
  # if you want to transpose the result just do:
  # return(t(m))
})
Explanation
(disclaimer tl;dr)
To guarantee both randomness and uniqueness, one safe approach is to generate all the possible bingo cards and then choose randomly among them without replacement.
To generate all the possible cards, we should:
1. generate all the possibilities for each row of 3 elements
2. get the cartesian product of them
Step (1) can be easily obtained using the function permutations of package gtools (see the object allperms in the code). Note that we just need the permutations for the first row (i.e. 3 elements taken from 1-10), since the permutations of the other rows can be easily obtained from the first by adding 10 and 20 respectively.
Step (2) is also easy to get in R, but let's first consider how many possibilities will be generated. Step (1) returned 720 cases for each row, so in the end we will have 720*720*720 = 720^3 = 373248000 possible bingo cards!
Generating all of them is not practical since the occupied memory would be huge, so we need a way to get 20 random elements from this big range of possibilities without actually keeping them in memory.
The solution comes from the function getKthPermWithRep, which, given an index k, returns the k-th permutation with repetition of r elements taken from 1:n (note that in this case permutation with repetition corresponds to the cartesian product).
For example:
# all permutations with repetition of 2 elements in 1:3 are
permutations(n = 3, r = 2, repeats.allowed = TRUE)
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    2
# [3,]    1    3
# [4,]    2    1
# [5,]    2    2
# [6,]    2    3
# [7,]    3    1
# [8,]    3    2
# [9,]    3    3
# using getKthPermWithRep you can directly get the k-th permutation you want:
getKthPermWithRep(k = 4, n = 3, r = 2)
# [1] 2 1
getKthPermWithRep(k = 8, n = 3, r = 2)
# [1] 3 2
Hence, we now just choose 20 random indexes in the range 1:720^3 (using the sample.int function); then, for each of them, we get the corresponding permutation of 3 numbers taken from 1:720 using the function getKthPermWithRep.
Finally, these triplets of numbers can be converted to actual card rows by using them as indexes to subset allperms, which gives our final matrix (after, of course, adding 10 and 20 to the 2nd and 3rd rows).
Bonus
Explanation of getKthPermWithRep
If you look at the example above (permutations with repetition of 2 elements in 1:3) and subtract 1 from every number in the result, you get this:
> permutations(n = 3, r = 2, repeats.allowed = T) - 1
      [,1] [,2]
 [1,]    0    0
 [2,]    0    1
 [3,]    0    2
 [4,]    1    0
 [5,]    1    1
 [6,]    1    2
 [7,]    2    0
 [8,]    2    1
 [9,]    2    2
If you consider each number of each row as a number digit, you can notice that those rows (00, 01, 02...) are all the numbers from 0 to 8, represented in base 3 (yes, 3 as n). So, when you ask the k-th permutation with repetition of r elements in 1:n, you are also asking to translate k-1 into base n and return the digits increased by 1.
Therefore, given the algorithm to change any number from base 10 to base n:
changeBase <- function(num, base) {
  v <- NULL
  while (num != 0) {
    remainder <- num %% base  # least significant remaining digit
    num <- num %/% base       # integer division
    v <- c(remainder, v)
  }
  if (is.null(v)) {
    return(0)
  }
  return(v)
}
you can easily obtain the getKthPermWithRep function.
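The base-conversion argument can be checked with a small base-R sketch. The names to_digits and kth_perm_with_rep are hypothetical stand-ins written for this illustration, and expand.grid is used in place of gtools so that the snippet has no package dependency.

```r
# Digits of `num` in the given base, left-padded with zeros to `width` digits
to_digits <- function(num, base, width) {
  v <- numeric(width)
  i <- width
  while (num > 0 && i >= 1) {
    v[i] <- num %% base
    num <- num %/% base
    i <- i - 1
  }
  v
}

# k-th permutation with repetition = digits of (k - 1) in base n, plus 1
kth_perm_with_rep <- function(k, n, r) {
  stopifnot(k >= 1, k <= n^r)
  to_digits(k - 1, n, r) + 1
}

n <- 3
r <- 2
# Enumerate every permutation with repetition in lexicographic order.
# expand.grid varies its first column fastest, so the columns are reversed.
grid <- as.matrix(expand.grid(rep(list(seq_len(n)), r)))[, r:1, drop = FALSE]

# Compare each enumerated row with the directly computed k-th permutation
all_match <- all(vapply(
  seq_len(n^r),
  function(k) all(grid[k, ] == kth_perm_with_rep(k, n, r)),
  logical(1)
))
all_match
```

If the reasoning above holds, all_match is TRUE, meaning the direct computation agrees with the full enumeration for every index.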
One 3x3 matrix with the desired value range can be generated with the following code:
mat <- matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30, 3)), nrow=3)
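Furthermore, you can use a loop to build a list of 20 matrices, redrawing whenever a card duplicates an earlier one so that all 20 are unique:
cards <- list()
while (length(cards) < 20) {
  mat <- matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30,3)), nrow=3)
  # keep the card only if it is not identical to one already drawn
  if (!any(vapply(cards, identical, logical(1), mat))) {
    cards[[length(cards) + 1]] <- mat
    print(mat)
  }
}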
Well OK I may fall on my face here but I propose a checksum (using Excel).
This is a unique signature for each bingo card which remains invariant if the order of numbers within any column is changed without changing the actual numbers. The formula is
=SUM(10^MOD(A2:A4,10)+2*10^MOD(B2:B4,10)+4*10^MOD(C2:C4,10))
where the bingo numbers for the first card are in A2:C4.
The idea is to generate a 10-digit number for each column, then multiply each by a constant and add them to get the signature.
So here I have generated two random bingo cards using a standard formula from here plus two which are deliberately made to be just permutations of each other.
Then I check if any of the signatures are duplicates using the formula
=MAX(COUNTIF(D5:D20,D5:D20))
which should not give an answer greater than 1.
In the unlikely event that there are duplicates, you would just press F9 and generate some new cards.
All formulae are array formulae and must be entered with Ctrl+Shift+Enter.
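For readers who prefer R, the same signature idea can be sketched in base R. This is a translation of the Excel formula above under the assumption that a card is stored as a 3x3 numeric matrix; card_signature is a hypothetical name.

```r
# Signature of a 3x3 card: each column contributes sum(10^(x %% 10)),
# weighted by 1, 2 and 4 so that the three columns cannot be confused.
card_signature <- function(card) {
  sum(10^(card[, 1] %% 10) + 2 * 10^(card[, 2] %% 10) + 4 * 10^(card[, 3] %% 10))
}

card <- matrix(c(5, 8, 3, 17, 18, 16, 23, 22, 24), nrow = 3)

# Same numbers, reordered within each column
reordered <- cbind(rev(card[, 1]), card[c(2, 3, 1), 2], card[c(3, 1, 2), 3])

# A different card: one number changed in the first column
different <- card
different[1, 1] <- 4

same_sig <- card_signature(card) == card_signature(reordered)
diff_sig <- card_signature(card) != card_signature(different)

# Duplicate check over a list of cards, analogous to the COUNTIF formula
sigs <- vapply(list(card, reordered, different), card_signature, numeric(1))
max_count <- max(table(sigs))
```

Here max_count plays the role of the MAX(COUNTIF(...)) check: a value above 1 means at least two cards contain the same numbers in each column.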
Here is an inelegant way to do this. Generate all possible permutations and then sample without replacement. These are permutations, not combinations: order does matter in bingo.
library(dplyr)
library(tidyr)
library(magrittr)
generate_samples = function(n) {
  first = data_frame(first = (n-9):n)
  first %>%
    merge(first %>% rename(second = first)) %>%
    merge(first %>% rename(third = first)) %>%
    sample_n(20)
}
suffix = function(df, suffix)
  df %>%
    setNames(names(.) %>%
               paste0(suffix))
generate_samples(10) %>% suffix(10) %>%
  bind_cols(generate_samples(20) %>% suffix(20)) %>%
  bind_cols(generate_samples(30) %>% suffix(30)) %>%
  rowwise %>%
  do(matrix = t(.) %>% matrix(3)) %>%
  use_series(matrix)

Customizing new trading strategy in R using quantmod

I want to add a new custom TA indicator to a stock chart in R, but I have no idea how to convert my SQL-style conditional strategy into a user-defined R function and add it to chartSeries.
The questions are listed as comments in the following code.
library("quantmod")
library("FinancialInstrument")
library("PerformanceAnalytics")
library("TTR")
stock <- getSymbols("002457.SZ",auto.assign=FALSE,from="2012-11-26",to="2014-01-30")
head(stock)
chartSeries(stock, theme = "white", subset = "2013-07-01/2014-01-30",TA = "addSMA(n=5,col=\"gray\");addSMA(n=10,col=\"yellow\");
addSMA(n=20,col=\"pink\");addSMA(n=30,col=\"green\");addSMA(n=60,col=\"blue\");addVo()")
Question: How can I rewrite the code below to make it available as a function in R?
#Signal Design
#Today's volume is the lowest during the last 20 trading days
lowvolume <- VOL<=LLV(VOL,20);
#several moving average lines stick together
X1:=ABS(MA(C,10)/MA(C,20)-1)<0.01;
X2:=ABS(MA(C,5)/MA(C,10)-1)<0.01;
X3:=ABS(MA(C,5)/MA(C,20)-1)<0.01;
#If the following condition is satisfied, then the signal appears
MA(C,5)>REF(MA(C,5),1) AND X1 AND X2 AND X3 AND lowvolume;
#Convert the above SQL code into the following R custom function
VOLINE <- function(x) {
}
#Create a new TA function for the chartseries and then add it up.
addVoline <- newTA(FUN=VOLINE,
                   preFUN=Cl,
                   col=c(rep(3,6),
                         rep("#333333",6)),
                   legend="VOLINE")
I don't think you need SQL in this case.
Try this:
require(quantmod)
# fetch the data
s <- get(getSymbols('yhoo'))
# add the indicators
s$ma5 <- SMA(Cl(s) ,5)
s$ma10 <- SMA(Cl(s) ,10)
s$ma20 <- SMA(Cl(s) ,20)
s$llv <- rollapply(Vo(s), 20, min)
# generate the signal
s$signal <- (abs(s$ma10 / s$ma20 - 1) < 0.01 & abs(s$ma5 / s$ma10 - 1) < 0.01 & abs(s$ma5 / s$ma20 - 1) < 0.01 & Vo(s) == s$llv)
# draw
chart_Series(s)
add_TA(s$signal == 1, on = 1, col='red')
I'm not sure what REF means, but I'm sure you can add that part yourself.
This is the output (I can't seem to upload the image, but you will see a chart with horizontal lines where the signal equals 1).
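For reference, the full rule from the question can be sketched in base R on plain numeric vectors. This assumes that REF(X, 1) means the value of X on the previous bar, which the question does not define; sma, roll_min, prev and voline_signal are hypothetical helper names, not quantmod or TTR functions.

```r
# Simple moving average over the last n values (NA until n values exist)
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# Rolling minimum over the last n values, including the current one
roll_min <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) if (i >= n) out[i] <- min(x[(i - n + 1):i])
  out
}

# Value k bars earlier (assumed meaning of REF(X, k))
prev <- function(x, k = 1) c(rep(NA, k), head(x, -k))

voline_signal <- function(close, volume) {
  ma5 <- sma(close, 5)
  ma10 <- sma(close, 10)
  ma20 <- sma(close, 20)
  low_volume <- volume <= roll_min(volume, 20)
  x1 <- abs(ma10 / ma20 - 1) < 0.01
  x2 <- abs(ma5 / ma10 - 1) < 0.01
  x3 <- abs(ma5 / ma20 - 1) < 0.01
  ma5 > prev(ma5) & x1 & x2 & x3 & low_volume
}

# Synthetic data: slowly rising prices with steadily falling volume
close <- seq(100, by = 0.01, length.out = 30)
volume <- 30:1
sig <- voline_signal(close, volume)
```

With an xts object s as in the answer above, the same function could be fed as.numeric(Cl(s)) and as.numeric(Vo(s)); the first 19 values are NA because the 20-day windows are not yet full.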
Use the function as a wrapper for sqldf() in the sqldf package. The argument to sqldf() will be a select statement on the data frame that has the data.
A good tutorial for this can be found at Burns Statistics.

Read only n-th column of a text file which has no header with R and sqldf

I have a problem similar to this question:
selecting every Nth column in using SQLDF or read.csv.sql
I want to read some columns of large files (a table of 150 rows and more than 500,000 columns, space-separated, filled with numeric data, with only a 32-bit system available). The file has no header, so the code in the thread above didn't work, and I decided to write a new post.
Do you have an idea how to solve this problem?
I thought about something like the following, but any solution using fread or read.table is also fine:
MyConnection <- file("path/file.txt")
df<-sqldf("select column 1 100 1000 235612 from MyConnection",file.format = list(header=F,sep=" "))
You can use substr to pick out the columns you want to read in if they are fixed width. In SQLite, the arguments after the string are the start position and the length of the substring:
x <- tempfile()
cat("12345", "67890", "09876", "54321", sep = "\n", file = x)
myfile <- file(x)
sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")
#   var1 var2
# 1    1  345
# 2    6  890
# 3    9   76
# 4    5  321
See this blog post for some more examples. The "select" statement can easily be constructed with paste if you know the details about the column starting positions and widths.
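As a minimal sketch of building the statement with base R string functions, assuming the column start positions and widths are known: the starts and widths vectors are made-up values for illustration, and myfile refers to the connection from the example above.

```r
# Hypothetical layout: start position and width of each wanted field
starts <- c(1, 3)
widths <- c(1, 3)

# One substr() call per field, aliased var1, var2, ...
cols <- sprintf("substr(V1, %d, %d) var%d", starts, widths, seq_along(starts))
sql <- paste("select", paste(cols, collapse = ", "), "from myfile")
sql

# The string could then be passed on as: sqldf(sql)
```

For the two fields above this yields the same kind of query as the hand-written one, and longer lists of fields only require longer starts and widths vectors.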

Printing out a binary search tree with slashes

http://pastebin.com/dN9a9xfs
That's my code to print out the elements of a binary search tree. The goal is to display it in level order, with slashes connecting the parent to each child. So for instance, the sequence 15 3 16 2 1 4 19 17 28 31 12 14 11 0 would display after execution as:
        15
       /  \
      3    16
     / \     \
    2   4     19
   /     \   /  \
  1      12 17   28
 /      /  \       \
0      11  14      31
I've been working on it for a long time now, but I just can't seem to get the spacing/indentation right. I know I wrote the proper algorithm for displaying the nodes in the proper order, but the slashes are just off. This is the result of my code as is: http://imgur.com/sz8l1
I know I'm close to the answer, since my display is not that far off from what I need, and I have a feeling it's a really simple solution, but for some reason I just can't seem to get it right.
I'm out of time for now, but here's a quick version. I did not read your code (don't know C++), so I don't know how close our solutions are.
I changed the output format slightly. Instead of / for the left node, I used | so I didn't have to worry about left spacing at all.
15
| \
3  16
|\   \
2 4   19
|  \  | \
1   | 17 28
|   |      \
0   12      31
    | \
    11 14
Here's the code. I hope you're able to take what you need from it. There are definitely some Pythonisms which I hope map to what you're using. The main idea is to treat each row of numbers as a map of position to node object, and at each level, sort the map by key and print them to the console iteratively based on their assigned position. Then generate a new map with positions relative to their parents in the previous level. If there's a collision, generate a fake node to bump the real node down a line.
from collections import namedtuple

# simple node representation. sorry for the mess, but it does represent the
# tree example you gave.
Node = namedtuple('Node', ('label', 'left', 'right'))


def makenode(n, left=None, right=None):
    return Node(str(n), left, right)


root = makenode(
    15,
    makenode(
        3,
        makenode(2, makenode(1, makenode(0))),
        makenode(4, None, makenode(12, makenode(11), makenode(14)))),
    makenode(16, None, makenode(19, makenode(17),
                                makenode(28, None, makenode(31)))))


# takes a dict of {line position: node} and returns a list of lines to print
def print_levels(print_items, lines=None):
    if lines is None:
        lines = []
    if not print_items:
        return lines

    # working position - where we are in the line
    pos = 0
    # line of text containing node labels
    new_nodes_line = []
    # line of text containing slashes
    new_slashes_line = []
    # args for recursive call
    next_items = {}

    # sort dictionary by key and put them in a list of pairs of (position,
    # node)
    sorted_pos_and_node = [
        (k, print_items[k]) for k in sorted(print_items.keys())]

    for position, node in sorted_pos_and_node:
        # add leading whitespace
        while len(new_nodes_line) < position:
            new_nodes_line.append(' ')
        while len(new_slashes_line) < position:
            new_slashes_line.append(' ')

        # update working position
        pos = position
        # add node label to string, as separate characters so list length
        # matches string length
        new_nodes_line.extend(list(node.label))

        # add left child if any
        if node.left is not None:
            # if we're close to overlapping another node, push that node down
            # by adding a parent with label '|' which will make it look like a
            # line dropping down
            for collision in [pos - i for i in range(3)]:
                if collision in next_items:
                    next_items[collision] = makenode(
                        '|', next_items[collision])
            # add the slash and the node to the appropriate places
            new_slashes_line.append('|')
            next_items[position] = node.left
        else:
            new_slashes_line.append(' ')

        # update working position
        len_num = len(node.label)
        pos += len_num

        # add some more whitespace
        while len(new_slashes_line) < position + len_num:
            new_slashes_line.append(' ')

        # and take care of the right child
        if node.right is not None:
            new_slashes_line.append('\\')
            next_items[position + len_num + 1] = node.right
        else:
            new_slashes_line.append(' ')

    # concatenate each line's components and append them to the list
    lines.append(''.join(new_nodes_line))
    lines.append(''.join(new_slashes_line))

    # do it again!
    return print_levels(next_items, lines)


lines = print_levels({0: root})
# parenthesised so it runs under both Python 2 and Python 3
print('\n'.join(lines))