How do I extract the coredata of a zoo object into a data frame?

I have several zoo objects (ordered observations) that contain data indexed by date. For example:
A B
2014-10-5 60.272 84.019
2014-10-6 61.183 84.024
2014-10-7 61.611 84.010
A B
2014-10-5 61.376 84.028
2014-10-6 61.761 84.032
2014-10-7 62.210 84.025
A B
2014-10-5 61.159 84.006
2014-10-6 61.550 84.029
2014-10-7 61.996 84.024
I have 3 of these objects, say, x, y, and z, with different data in each.
I want to extract the last row from each of these and put the rows into a non-zoo data frame, because I no longer want the date indexing. Here is what I tried:
main.result <- data.frame(x[nrow(x),])
main.result <- data.frame(rbind(main.result, y[nrow(y),]))
main.result <- data.frame(rbind(main.result, z[nrow(z),]))
What I get in main.result is not what I expected:
A B
main.result 61.61163, 62.21080 84.02096, 84.02505
61.99627 84.02423
I should have gotten:
A B
61.611 84.010
62.210 84.025
61.996 84.024
I want to continue to add rows to the main.result matrix. Where am I jumping the track here?

SOLVED: Sorry, the formatting of the output for ‘main.result’ threw me off. The answer I am getting is correct. Please disregard this post!!

Related

How to apply a value from one row to all rows with the same ID in R

I currently have a large data set that is in long format. I am hoping to apply a value only given at baseline (repeat_instance = 0) to all follow up instances (repeat_instance = 1, 2, 3+) based on the record_id.
While I cannot share the actual data, I have created a simplified example below to illustrate the question.
record_id <- c(1,1,1,2,3,4,4,5,6,7,8,8,9,10,10,10)
repeat_instance <- c(0,1,2,0,0,0,1,0,0,0,0,1,0,0,1,2)
reason_for_visit <- c(1,NA,NA,1,2,1,NA,1,2,3,1,NA,1,1,NA,NA)
Current Format:
Desired Outcome:
I have seen solutions in Excel, but I am not sure which function would be useful in R.
We can use fill from tidyr
library(tidyr)
fill(df1, reason_for_visit)
data
df1 <- data.frame(record_id, repeat_instance, reason_for_visit)
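Note that fill() without grouping simply carries the last non-missing value downward. That works here because every record_id starts with its own baseline row. As a minimal sketch of the same idea keyed explicitly on record_id, here is a plain-Python version (the function name fill_from_baseline is hypothetical, and the vectors are the ones from the question with NA written as None):

```python
# Example vectors from the question; NA is represented as None.
record_id = [1, 1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10, 10, 10]
repeat_instance = [0, 1, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 2]
reason_for_visit = [1, None, None, 1, 2, 1, None, 1, 2, 3, 1, None, 1, 1, None, None]


def fill_from_baseline(ids, instances, values):
    """Copy each record's baseline value (instance 0) to all of its rows."""
    # Map every record_id to the value recorded on its baseline row.
    baseline = {
        rid: val
        for rid, inst, val in zip(ids, instances, values)
        if inst == 0
    }
    # Rows whose record has no baseline row keep their original value.
    return [baseline.get(rid, val) for rid, val in zip(ids, values)]


filled = fill_from_baseline(record_id, repeat_instance, reason_for_visit)
print(filled)
```

Because the lookup is by record_id, this version does not depend on the rows being sorted with the baseline first.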

Dendrograms with SciPy

I have a dataset that I shaped according to my needs, the dataframe is as follows:
Index A B C D ..... Z
Date/Time 1 0 0 0,35 ... 1
Date/Time 0,75 1 1 1 1
The total number of rows is 8878
What I am trying to do is create a time-series dendrogram (for example, the whole A column compared to the whole B column over the entire time range).
I am expecting an output like this:
(source: rsc.org)
I tried to construct the linkage matrix with Z = hierarchy.linkage(X, 'ward')
However, when I print the dendrogram, it just shows an empty picture.
There is no problem if I compare every time point with each other and plot, but that way the dendrogram becomes far too complicated to read, even in truncated form.
Is there a way to handle the data as a whole time series and compare within columns in SciPy?

One ggplot from two data frames (1 bar each)

I was looking for an answer everywhere, but I just couldn't find one to this problem (maybe I was just too stupid to use other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both bars should have the same length, and the counts of the different values should be stacked on top of each other. For example: I want to compare the proportions of gender in those two data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two bars next to each other showing that the proportion of gender is 2:4 in the first data frame and 2:2 in the second.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."
First you should merge the two data frames. To do that, add a column to both data frames that identifies the origin of the data (an ID such as t1 and t2). Your column names are the same in both frames, so you can then combine them with rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t[order(t$X2),], aes(data, X2, fill=factor(X2))) +
geom_bar(stat="identity", position="stack")
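To check the numbers behind such a plot, the label-then-combine step can be sketched in plain Python. This is only an illustration of the data preparation, not of ggplot itself; the helper name proportions and the dictionary layout are assumptions made for this sketch.

```python
from collections import Counter

# Gender codes from the question: 1 = male, 2 = female.
t1 = [1, 2, 2, 1, 2, 2]
t2 = [1, 2, 2, 1]


def proportions(values):
    """Return the share of each distinct value, keyed by value."""
    counts = Counter(values)
    total = len(values)
    return {key: counts[key] / total for key in sorted(counts)}


# Tag every observation with its origin, as the rbind step does.
combined = [("t1", v) for v in t1] + [("t2", v) for v in t2]

# Group the combined rows by origin again and compute the shares per bar.
by_source = {}
for source, value in combined:
    by_source.setdefault(source, []).append(value)

shares = {source: proportions(vals) for source, vals in by_source.items()}
print(shares)
```

Each entry of shares corresponds to one bar; the values within an entry sum to 1, which is what makes the two bars the same height.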

VBA to find maximum value in a chart

I have a range of data columns A, B, and C. I have displayed as a line graph with B as the primary axis and C as the secondary axis. Column A is the category axis. I want to find the maximum value of column C and put a data callout on the point that is the maximum of column C and where column B occurs.
I know this sounds confusing. In this example, the maximum of Column C occurs at Point 27 (or 1.50% on the category axis). I would like a dot at point 27 for both Column B and C.
Column A is percentage from -5.00 to 10.00 incremented at .25%. Columns B and C are plotted against the change.
In the past I have done something similar. Use a formula in column D to test whether each row holds the largest number in column C (and likewise for B), and return a value that sits high on your chart when the test is true.
Add Column D as a series to the chart.
Change the chart type on that series only to a scatter chart or something that puts points up there.
You can put a label on or simply put the amount showing above the plotted point.
You don't need VBA for this.
You might be interested to know I found a solution that works for me. First, I added columns D and E using the formulas =IF(C2=MAX(C$2:C$62),C2,NA()) and =IF(C2=MAX(C$2:C$62),B2,NA()). This gave me the point on the graph for both lines B and C where C was at its maximum. I then formatted the graph so that these points had data callouts (a request from the client). Finally, I set columns D and E to a white font to match the background so they appear invisible. I don't love this step, but I don't want the client to see the extra rows of #N/A, etc.
The basic VBA for a data callout is:
ActiveChart.FullSeriesCollection(5).Select
ActiveChart.SetElement (msoElementDataLabelCallout)
Here the series is 5 (column E), and I'm putting a data callout on the graphed point, which happens to be at the maximum of column C.
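The logic of the two helper columns can be sketched outside Excel as well. The following plain-Python version is only an illustration: the function name max_marker_columns and the sample numbers are made up, and None stands in for the #N/A cells that the chart skips.

```python
def max_marker_columns(b_values, c_values):
    """Build helper columns D and E that are empty except at the maximum of C."""
    c_max = max(c_values)
    # Column D: the C value where C is at its maximum, otherwise empty.
    col_d = [c if c == c_max else None for c in c_values]
    # Column E: the B value on the same row(s), otherwise empty.
    col_e = [b if c == c_max else None for b, c in zip(b_values, c_values)]
    return col_d, col_e


# Hypothetical sample data, not taken from the original workbook.
b = [10.0, 12.0, 11.0, 9.0]
c = [0.2, 0.5, 0.9, 0.4]
col_d, col_e = max_marker_columns(b, c)
print(col_d)
print(col_e)
```

If the maximum of C occurs on more than one row, every such row is marked, which matches what the IF formulas do.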

Identifying graphs in heap of connected nodes -- how is this called?

I have a SQL table with three columns X, Y, Z. I need to split it in groups in such a way that all records with same value of X or Y or Z are assigned to the same group. I need to make sure that the records with same value X or Y or Z are never split across multiple groups.
If you think of records as nodes and values of X, Y, Z as edges, this problem is the same as finding all graphs where the nodes in each graph will be connected directly or indirectly via X, Y, or Z-edge, but each graph will have no edges in common with other graphs (otherwise it would be part of the same graph).
A few years ago I knew what this was called and even remembered the algorithm, but now it escapes me. Please tell me what this problem is called so I can Google for a solution. If you know a good algorithm -- please point me to it. If you have a SQL implementation -- I will marry you :)
Example:
X Y Z BUCKET
--------- ---------------- --------- -----------
1 34 56 1
54 43 45 2
1 12 22 1
2 34 11 1
The last row is in bucket 1 because of the value of Y=34 which is the same as of the first row, which is in bucket 1.
It looks not like a graph, more like a simplicial complex.
But if we treat this complex as its skeletal graph (the numbers are treated as vertices, and a row in the table means that all three of its vertices are connected to each other by edges), then we may just use any algorithm that finds the connected components of this graph. I'm not sure whether there is a feasible way to do this in SQL, though; perhaps it would be more prudent to use a graph database.
However, for this specific problem there may be some easy solution attainable by means of SQL which I didn't look for.
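Outside SQL, connected components can be found with a union-find (disjoint-set) structure. The following is a minimal Python sketch, not a SQL solution. It assumes that a value only links rows when it appears in the same column (so X=34 and Y=34 are different keys), which is how the question's example reads; the function name assign_buckets is hypothetical.

```python
def assign_buckets(rows):
    """Give rows that share a value in the same column the same bucket number."""
    parent = list(range(len(rows)))  # each row starts as its own component

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Link every row to the first row seen with the same (column, value) pair.
    first_seen = {}
    for idx, row in enumerate(rows):
        for col, value in enumerate(row):
            key = (col, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    # Number the components 1, 2, 3, ... in order of first appearance.
    bucket_ids = {}
    buckets = []
    for idx in range(len(rows)):
        root = find(idx)
        if root not in bucket_ids:
            bucket_ids[root] = len(bucket_ids) + 1
        buckets.append(bucket_ids[root])
    return buckets


# The example table from the question, as (X, Y, Z) tuples.
rows = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(assign_buckets(rows))
```

On the example table this reproduces the BUCKET column from the question: the third row joins the first through X=1 and the fourth through Y=34.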
to find how many nodes in each group x:
select x, count(x)
from mytable
group by x
or to find the list of sets x:
select distinct x from mytable;
Why don't you initially GROUP BY one of the columns (say X) and make buckets, then do the same for Y and Z, each time merging the buckets from the previous step whenever a new group connects them?
Repeat the process for X, Y, and Z until the buckets stop changing.
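This repeat-until-stable idea can be sketched in Python as label propagation: every row starts in its own bucket, and each pass over a column lowers the label of every row to the smallest label in its group. The function name merge_until_stable is hypothetical, and the labels are zero-based row indexes rather than the 1-based bucket numbers in the question.

```python
def merge_until_stable(rows):
    """Relabel rows column by column until no bucket label changes."""
    if not rows:
        return []
    labels = list(range(len(rows)))  # every row starts in its own bucket
    changed = True
    while changed:
        changed = False
        for col in range(len(rows[0])):
            # "GROUP BY" this column: smallest current label per value.
            smallest = {}
            for label, row in zip(labels, rows):
                value = row[col]
                smallest[value] = min(label, smallest.get(value, label))
            # Move each row into the smallest bucket of its group.
            for i, row in enumerate(rows):
                new_label = smallest[row[col]]
                if new_label < labels[i]:
                    labels[i] = new_label
                    changed = True
    return labels


table = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(merge_until_stable(table))
```

Labels only ever decrease, so the loop terminates. Each pass corresponds to one GROUP BY plus an update, which is why the same scheme can in principle be run as repeated UPDATE statements in SQL.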
Are you working for linked-in or facebook? :)