How to connect points with different indices (one data file) in gnuplot - indexing

I have a file "a_test.dat" with two data blocks that I can select via the corresponding index.
# first
x1 y1
3 1
6 2
9 8
# second
x2 y2
4 5
8 2
2 7
Now I want to connect the data points of both indices with an arrow.
set arrow from (x1,y1) to (x2,y2).
I can plot both blocks with one plot statement. But I cannot get the points to set the arrows.
plot "a_test.dat" index "first" u 1:2, "" index "second" u 1:2

From version 5.2 you can use gnuplot arrays:
stats "a_test.dat" nooutput
array xx[STATS_records]
array yy[STATS_records]
# save all data into two arrays
i = 1
fnset(x,y) = (xx[i]=x, yy[i]=y, i=i+1)
# parse data ignoring output
set table $dummy
plot "" using (fnset($1,$2)) with table
unset table
# x2,y2 data starts at midpoint in array
numi = int((i-1)/2)
plot for [i=1:numi] $dummy using (xx[i]):(yy[i]):(xx[numi+i]-xx[i]):(yy[numi+i]-yy[i]) with vectors
Use stats to count the number of lines in the file, so that the array can
be large enough. Create an array xx and another yy to hold the data.
Use plot ... with table to read the file again, calling your function
fnset() for each data line with the x and y column values. The function
saves them at the current index i, which it increments. It was
initialised to 1.
For 3+3 data lines, i ends up at 7, so we set numi to (i-1)/2 i.e. 3.
Use plot for ... vectors to draw the arrows. Each arrow needs 4 data
items from the array. Note that the second x,y must be a relative delta,
not an absolute position.

Related

Index of data file in title

I have a Gnuplot data file broken into sections (2 blank lines). I wish to capture the indices I've selected to plot into the key of the plot, "plot 'datafile' index 5:10:1 us 7:8 ti XXX", where XXX is the index. I understand the the pseudocolumn, column(-2), contains the index value.
How do I capture the index and 'sprint' it into the title? The index is, of course, and integer.
What do you want to see as XXX? The index? Do you actually mean the indices from 5 to 10 as several key entries? As far as I know the title is only evaluated once per plot command. Hence, you have to put the plot command into a for loop.
Script:
### plotting several blocks with index as title
reset session
# create some random test data
set table $Data
set samples 10
do for [i=1:12] {
plot '+' u 1:(i+rand(0)) w table, \
'+' u ("") every ::::1 w table # two empty lines
}
unset table
set key out
plot for [i=5:10] $Data u 1:2 index i w l ti sprintf("index %d",i)
### end of script
Result:
Here is a relevant plot from the on-line demo collection:
plot #2 of 'iterate.dem'
The plot command that generated it is
splot for [i=1:*] "whale.dat" index i title sprintf("scan %d",i) with lines
If you do not want to extract all the available data segments you can adjust the iteration limits to match which set of index values you want to select.

One ggplot from two data frames (1 bar each)

I was looking for an answer everywhere, but I just couldn't find one to this problem (maybe I was just too stupid to use other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both should have the same length and the count of different variables should be stacked over each other. For example: I want to compare the proportions of gender in those to data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two barplots next to each other that represent, that the proportions of gender in the first data frame is 2:4 and in the second one 2:2.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."
First I should merge the two dataframes. You need to add a variable that will identify the origin of the data, add in both dataframes a column with an ID (like t1 and t2). Keep in mind that your columnames are the same in both frames so you will be able to use the function rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t[order(t$X2),], aes(data, X2, fill=factor(X2))) +
geom_bar(stat="identity", position="stack")

Create 20 unique bingo cards

I'm trying to create 20 unique cards with numbers, but I struggle a bit.. So basically I need to create 20 unique matrices 3x3 having numbers 1-10 in first column, numbers 11-20 in the second column and 21-30 in the third column.. Any ideas? I'd prefer to have it done in r, especially as I don't know Visual Basic. In excel I know how to generate the cards, but not sure how to ensure they are unique..
It seems to be quite precise and straightforward to me. Anyway, i needed to create 20 matrices that would look like :
[,1] [,2] [,3]
[1,] 5 17 23
[2,] 8 18 22
[3,] 3 16 24
Each of the matrices should be unique and each of the columns should consist of three unique numbers ( the 1st column - numbers 1-10, the 2nd column 11-20, the 3rd column - 21-30).
Generating random numbers is easy, though how to make sure that generated cards are unique?Please have a look at the post that i voted for as an answer - as it gives you thorough explanation how to achieve it.
(N.B. : I misread "rows" instead of "columns", so the following code and explanation will deal with matrices with random numbers 1-10 on 1st row, 11-20 on 2nd row etc., instead of columns, but it's exactly the same just transposed)
This code should guarantee uniqueness and good randomness :
library(gtools)
# helper function
getKthPermWithRep <- function(k,n,r){
k <- k - 1
if(n^r< k){
stop('k is greater than possibile permutations')
}
v <- rep.int(0,r)
index <- length(v)
while ( k != 0 )
{
remainder<- k %% n
k <- k %/% n
v[index] <- remainder
index <- index - 1
}
return(v+1)
}
# get all possible permutations of 10 elements taken 3 at a time
# (singlerowperms = 720)
allperms <- permutations(10,3)
singlerowperms <- nrow(allperms)
# get 20 random and unique bingo cards
cards <- lapply(sample.int(singlerowperms^3,20),FUN=function(k){
perm2use <- getKthPermWithRep(k,singlerowperms,3)
m <- allperms[perm2use,]
m[2,] <- m[2,] + 10
m[3,] <- m[3,] + 20
return(m)
# if you want transpose the result just do:
# return(t(m))
})
Explanation
(disclaimer tl;dr)
To guarantee both randomness and uniqueness, one safe approach is generating all the possibile bingo cards and then choose randomly among them without replacements.
To generate all the possible cards, we should :
generate all the possibilities for each row of 3 elements
get the cartesian product of them
Step (1) can be easily obtained using function permutations of package gtools (see the object allPerms in the code). Note that we just need the permutations for the first row (i.e. 3 elements taken from 1-10) since the permutations of the other rows can be easily obtained from the first by adding 10 and 20 respectively.
Step (2) is also easy to get in R, but let's first consider how many possibilities will be generated. Step (1) returned 720 cases for each row, so, in the end we will have 720*720*720 = 720^3 = 373248000 possible bingo cards!
Generate all of them is not practical since the occupied memory would be huge, thus we need to find a way to get 20 random elements in this big range of possibilities without actually keeping them in memory.
The solution comes from the function getKthPermWithRep, which, given an index k, it returns the k-th permutation with repetition of r elements taken from 1:n (note that in this case permutation with repetition corresponds to the cartesian product).
e.g.
# all permutations with repetition of 2 elements in 1:3 are
permutations(n = 3, r = 2,repeats.allowed = TRUE)
# [,1] [,2]
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 2 1
# [5,] 2 2
# [6,] 2 3
# [7,] 3 1
# [8,] 3 2
# [9,] 3 3
# using the getKthPermWithRep you can get directly the k-th permutation you want :
getKthPermWithRep(k=4,n=3,r=2)
# [1] 2 1
getKthPermWithRep(k=8,n=3,r=2)
# [1] 3 2
Hence now we just choose 20 random indexes in the range 1:720^3 (using sample.int function), then for each of them we get the corresponding permutation of 3 numbers taken from 1:720 using function getKthPermWithRep.
Finally these triplets of numbers, can be converted to actual card rows by using them as indexes to subset allPerms and get our final matrix (after, of course, adding +10 and +20 to the 2nd and 3rd row).
Bonus
Explanation of getKthPermWithRep
If you look at the example above (permutations with repetition of 2 elements in 1:3), and subtract 1 to all number of the results you get this :
> permutations(n = 3, r = 2,repeats.allowed = T) - 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
[3,] 0 2
[4,] 1 0
[5,] 1 1
[6,] 1 2
[7,] 2 0
[8,] 2 1
[9,] 2 2
If you consider each number of each row as a number digit, you can notice that those rows (00, 01, 02...) are all the numbers from 0 to 8, represented in base 3 (yes, 3 as n). So, when you ask the k-th permutation with repetition of r elements in 1:n, you are also asking to translate k-1 into base n and return the digits increased by 1.
Therefore, given the algorithm to change any number from base 10 to base n :
changeBase <- function(num,base){
v <- NULL
while ( num != 0 )
{
remainder = num %% base # assume K > 1
num = num %/% base # integer division
v <- c(remainder,v)
}
if(is.null(v)){
return(0)
}
return(v)
}
you can easily obtain getKthPermWithRep function.
One 3x3 matrix with the desired value range can be generated with the following code:
mat <- matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30, 3)), nrow=3)
Furthermore, you can use a for loop to generate a list of 20 unique matrices as follows:
for (i in 1:20) {
mat[[i]] <- list(matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30,3)), nrow=3))
print(mat[[i]])
}
Well OK I may fall on my face here but I propose a checksum (using Excel).
This is a unique signature for each bingo card which will remain invariate if the order of numbers within any column is changed without changing the actual numbers. The formula is
=SUM(10^MOD(A2:A4,10)+2*10^MOD(B2:B4,10)+4*10^MOD(C2:C4,10))
where the bingo numbers for the first card are in A2:C4.
The idea is to generate a 10-digit number for each column, then multiply each by a constant and add them to get the signature.
So here I have generated two random bingo cards using a standard formula from here plus two which are deliberately made to be just permutations of each other.
Then I check if any of the signatures are duplicates using the formula
=MAX(COUNTIF(D5:D20,D5:D20))
which shouldn't given an answer more than 1.
In the unlikely event that there were duplicates, then you would just press F9 and generate some new cards.
All formulae are array formulae and must be entered with CtrlShiftEnter
Here is an inelegant way to do this. Generate all possible combinations and then sample without replacement. These are permutations, combinations: order does matter in bingo
library(dplyr)
library(tidyr)
library(magrittr)
generate_samples = function(n) {
first = data_frame(first = (n-9):n)
first %>%
merge(first %>% rename(second = first)) %>%
merge(first %>% rename(third = first)) %>%
sample_n(20)
}
suffix = function(df, suffix)
df %>%
setNames(names(.) %>%
paste0(suffix))
generate_samples(10) %>% suffix(10) %>%
bind_cols(generate_samples(20) %>% suffix(20)) %>%
bind_cols(generate_samples(30) %>% suffix(30)) %>%
rowwise %>%
do(matrix = t(.) %>% matrix(3)) %>%
use_series(matrix)

Gnuplot: How to load and display single numeric value from data file

My data file has this content
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
total 1976
case1 522 278 146 65 26 7
case2 120 105 15 0 0 0
case3 660 288 202 106 63 1
I am making a histogram from the case... lines using the script below - and that works. My question is: how can I load the grand total value 1976 (next to the word 'total') from the data file and either (a) store it into a variable or (b) use it directly in the title of the plot?
This is my gnuplot script:
reset
set term png truecolor
set terminal pngcairo size 1024,768 enhanced font 'Segoe UI,10'
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
plot for [i=3:7] 'mydata.dat' every ::1 using i:xticlabels(1) with histogram \
notitle, '' every ::1 using 0:2:2 \
with labels \
title "My Title"
For the benefit of others trying to label histograms, in my data file, the column after the case label represents the total of the rest of the values on that row. Those total numbers are displayed at the top of each histogram bar. For example for case1, 522 is the total of (278 + 146 + 65 + 26 + 7).
I want to display the grand total somewhere on my chart, say as the second line of the title or in a label. I can get a variable into sprintf into the title, but I have not figured out syntax to load a "cell" value ("cell" meaning row column intersection) into a variable.
Alternatively, if someone can tell me how to use the sum function to total up 522+120+660 (read from the data file, not as constants!) and store that total in a variable, that would obviate the need to have the grand total in the data file, and that would also make me very happy.
Many thanks.
Lets start with extracting a single cell at (row,col). If it is a single values, you can use the stats command to extract the values. The row and col are specified with every and using, like in a plot command. In your case, to extract the total value, use:
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
To sum up all values in the second column, use:
stats 'mydata.dat' every ::1 using 2 nooutput
total2 = int(STATS_sum)
And finally, to sum up all values in columns 3:7 in all rows (i.e. the same like the previous command, but without using the saved totals) use:
# sum all values from columns 3:7 from all rows
stats 'mydata.dat' every ::1 using (sum[i=3:7] column(i)) nooutput
total3 = int(STATS_sum)
These commands require gnuplot 4.6 to work.
So, your plotting script could look like the following:
reset
set terminal pngcairo size 1024,768 enhanced
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
plot for [i=3:7] 'mydata.dat' every ::1 using i:xtic(1) notitle, \
'' every ::1 using 0:(s = sum [i=3:7] column(i), s):(sprintf('%d', s)) \
with labels offset 0,1 title sprintf('total %d', total)
which gives the following output:
For linux and similar.
If you don't know the row number where your data is located, but you know it is in the n-th column of a row where the value of the m-th column is x, you can define a function
get_data(m,x,n,filename)=system('awk "\$'.m.'==\"'.x.'\"{print \$'.n.'}" '.filename)
and then use it, for example, as
y = get_data(1,"case2",4,"datafile.txt")
using data provided by user424855
print y
should return 15
It's not clear to me where your "grand total" of 1976 comes from. If I calculate 522+120+660 I get 1302 not 1976.
Anyway, here is a solution which works even without stats and sum which were not available in gnuplot 4.4.0.
In the data you don't necessarily need the "grand total" or the sum of each row, because gnuplot can calculate this for you. This is done by (not) plotting the file as a matrix, and at the same time summing up the rows in the string variable S0 and the total sum in variable Total. There will be a warning warning: matrix contains missing or undefined values which you can ignore. The labels are added by plotting '+' ... with labels extracting the desired values from the S0 string.
Data: SO18583180.dat
So, the reduced input data looks like this:
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
case1 278 146 65 26 7
case2 105 15 0 0 0
case3 288 202 106 63 1
Script: (works for gnuplot>=4.4.0, March 2010 and gnuplot 5.x)
### histogram with sums and total sum
reset
FILE = "SO18583180.dat"
set style histogram rowstacked
set style data histograms
set style fill solid 0.8
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
set key top left noautotitle
set grid y
set xrange [0:2]
set offsets 0.5,0.5,0,0
Total = 0
S0 = ''
addSums(v) = S0.sprintf(" %g",(M=$2,(N=$1+1)==1?S1=0:0,S1=S1+v))
plot for [i=2:6] FILE u i:xtic(1) notitle, \
'' matrix u (S0=addSums($3),Total=Total+$3,NaN) w p, \
'+' u 0:(real(S2=word(S0,int($0*N+N)))):(S2) every ::::M w labels offset 0,0.7 title sprintf("Total: %g",Total)
### end of script
Result: (created with gnuplot 4.4.0, Windows terminal)

GNUPLOT: dot plot with data depending dot size

I am trying plot data sets consisting of 3 coordinates:
X-coordinate, x-coordinate and the number of occurrences.
example:
1 2 10
3 1 2
3 2 1
I would like to draw for every line a dot at x,y with a diameter which is depending on the third value.
Is that possible with Gnuplot?
Create a 2D plot with variable point size. See the demo.
Example:
plot 'dataFile.dat' u 1:2:3 w points lt 1 pt 10 ps variable
This is basically equivalent to the existing answer, just shorter:
plot 'dataFile.dat' with circles
Credit: Gnuplot: plot with circles of a defined radius