Get summary data per tf.name_scope from TensorBoard - tensorflow

I've used with tf.name_scope(...) in my program to separate out a few different parts of the graph.
This works nicely in the TensorBoard graph view; I can view these functions as subgraphs. However, what I'd like to do is profile how much time (in total) was spent in each of these functions.
I've looked at the Trace Viewer tool (https://www.tensorflow.org/guide/profiler#trace_viewer). I can zoom right in and see individual operations; each one has a long name which includes the name_scopes I have defined, and I can see the "Wall duration" for each in the details pane.
But I can't see a way to get a summary of the total CPU time spent per name_scope, or any kind of aggregate timing summary (the sort of thing I'd expect from a more typical Python profiling tool: time spent per function). Is there any way to get such a summary?
(I should say that my program is a TF-compiled state space model, not a Keras model. I've exported my trace "manually" with tf.summary.trace_export. So I can't see some of the profiling outputs that are shown on the TensorBoard docs pages. I can only see the Profile (Trace Viewer) and Graph pages in TensorBoard.)
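For reference, the trace is exported roughly like this (a minimal sketch; run_model here is just a toy stand-in for my actual compiled model, and the logdir path is arbitrary):

import tensorflow as tf

# Toy stand-in for the real state space model step; any tf.function will do.
@tf.function
def run_model(x):
    with tf.name_scope("transition"):
        y = tf.linalg.matvec(tf.eye(3), x)
    with tf.name_scope("observation"):
        z = tf.reduce_sum(y)
    return z

logdir = "logs/trace"
writer = tf.summary.create_file_writer(logdir)

# Enable graph and profiler tracing, run the compiled function once,
# then export the trace for TensorBoard's Graph and Profile pages.
tf.summary.trace_on(graph=True, profiler=True)
run_model(tf.ones([3]))
with writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=logdir)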
Many thanks in advance for any advice.

Related

How to disable summary for Tensorflow Estimator?

I'm using the Tensorflow-GPU 1.8 API on Windows 10. For many projects I use tf.Estimator, which really works great. It takes care of a bunch of steps, including writing summaries for TensorBoard. But right now the 'events.out.tfevents' file is getting way too big and I am running into "out of space" errors. For that reason I want to disable the summary writing, or at least reduce the number of summaries written.
Going along with that mission I found out about the RunConfig you can pass in when constructing a tf.Estimator. Apparently the parameter 'save_summary_steps' (which by default is 200) controls how often summaries are written out. Unfortunately, changing this parameter seems to have no effect at all: it neither disables the summaries (using a None value) nor reduces the size of 'events.out.tfevents' (when choosing higher values, e.g. 3000).
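To illustrate, a minimal version of my setup looks roughly like this (the model_fn and input data here are just toy stand-ins for my actual model):

import numpy as np
import tensorflow as tf

def my_model_fn(features, labels, mode):
    # Toy stand-in for the real model: one dense layer and a squared-error loss.
    predictions = tf.layers.dense(features["x"], 1)
    loss = tf.losses.mean_squared_error(labels, predictions)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# The part in question: RunConfig with save_summary_steps raised (or set to None).
run_config = tf.estimator.RunConfig(model_dir="model_dir", save_summary_steps=3000)
estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=run_config)

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": np.random.rand(100, 4).astype(np.float32)},
    np.random.rand(100, 1).astype(np.float32),
    batch_size=8, num_epochs=None, shuffle=True)
estimator.train(input_fn, steps=500)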
I hope you guys can help me out here. Any help is appreciated.
Cheers,
Tobs.
I've observed the following behavior. It doesn't make sense to me so I hope we get a better answer:
When the input_fn gets its data from a tf.data.TFRecordDataset, the number of steps between saved events is the minimum of save_summary_steps and (number of training examples divided by batch size). That means it saves at least once per epoch.
When the input_fn gets its data from tf.TextLineReader, it follows save_summary_steps as you'd expect, and I can give it a large value for infrequent updates.

Remove image outputs from TensorBoard

My TensorBoard log files grow huge because – it seems – every image summary ever generated is stored. This even though in TensorBoard it seems like I can only look at the most recent image. And I only need the most recent image anyway.
Is there a way to let TensorBoard know that I only need the latest image? I looked at the SummaryWriter API docs but there is no obvious flag.
Hi, I work on TensorBoard. To the best of my knowledge, the logs are append-only. However, when TensorBoard loads them into memory, it uses reservoir sampling so that they don't consume all your memory. In the future, we may implement a system that does reservoir sampling during the writing phase, or possibly a tool for compressing logs so they only contain what TensorBoard needs.
While the TensorBoard image dashboard only shows the most recent image at the moment, we'd be hesitant to write tools to remove the previous ones, since we may extend the dashboard to show more than the most recent sample.
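(For context, reservoir sampling itself is simple; the following is just an illustration of the general technique, not TensorBoard's actual implementation.)

import random

def reservoir_sample(stream, k):
    # Keep a fixed-size uniform random sample of a stream without
    # storing the whole stream: classic "Algorithm R" reservoir sampling.
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # uniform over positions 0..i
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10000), 5))   # e.g. keep 5 of 10,000 events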

Analysing Fitbit walking and sleeping data

I'm participating in a small data analysis competition at our school. We use Fitbit wearable devices, which are loaned to each participant by the host of the contest. For the 2 months of the contest, participants walk and sleep with this small device 24/7, allowing it to gather data such as step counts and heart rate (bpm), and we need to solve some problems based on the participants' data.
For example: show the relationship between rainy days and participants' workout rate using a chart. I think the point of the problem is that, because of rain, a lot of participants are expected to stay at home; can you show some cause and effect numerically?
I'm now studying the Python libraries NumPy and pandas with IPython Notebook, but I still have no idea how to solve these problems. Could you recommend some projects or sites to use as references? I'm really eager to win this competition. :(
And lastly, sorry for my poor English. Thank you.
That's a fun project. I'm working on something kind of similar.
Here's what you need to do:
Learn the Fitbit API and stream the data from the Fitbit accelerometer and gyroscope. If you can combine this with heart rate data, great. The more types of data you have, the more effective your algorithm will be. You can store this data in a simple CSV file (streaming the accel/gyro data at 50 Hz is recommended), or set up a web server and store it in a database for easy access.
Learn how to use pandas and scikit-learn.
[optional but recommended]: Learn matplotlib so you can graph your data and get a feel for how it looks.
Load the data into pandas and create features on the data - notably using 1-2 second sliding-window analysis with 50% overlap. Good features include (for all three accelerometer axes X, Y, Z): max, min, standard deviation, root mean square, root sum square and tilt. Polynomials will help. (See the rough sketch after this list.)
Since this is a supervised classification problem, you will need to create some labelled data - so do this manually (state 1 = rainy day, state 2 = non-rainy day) and then train a classification algorithm. I would recommend a random forest.
Test using unlabeled data - don't forget to use cross-validation.
Voila, you now have a highly accurate model and will win the competition. Plus you've learned about a bunch of really cool Python and machine learning stuff.
For more tutorials on how all this stuff works, I'd highly recommend the Kaggle tutorial projects
BONUS: If you want to take it to a new level, you can start adding smoothers on top of your classifier, for example by using a Hidden Markov Model as explained in this talk
BONUS 2: Go get a PhD in Human Activity Recognition.
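To make the feature-extraction and classification steps concrete, here is a rough sketch (the CSV layout, column names and window parameters are assumptions, not what your Fitbit export will actually look like):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical CSV of 50 Hz accelerometer samples plus a label column
# (1 = rainy day, 2 = non-rainy day); the column names are placeholders.
df = pd.read_csv("fitbit_accel.csv")           # columns: ax, ay, az, label

fs = 50                                        # sampling rate (Hz)
win = 2 * fs                                   # 2-second window
step = win // 2                                # 50% overlap

rows = []
for start in range(0, len(df) - win, step):
    w = df.iloc[start:start + win]
    feats = {}
    for axis in ("ax", "ay", "az"):
        x = w[axis].to_numpy()
        feats[axis + "_max"] = x.max()
        feats[axis + "_min"] = x.min()
        feats[axis + "_std"] = x.std()
        feats[axis + "_rms"] = np.sqrt(np.mean(x ** 2))
    feats["label"] = w["label"].mode()[0]      # majority label for the window
    rows.append(feats)

features = pd.DataFrame(rows)
X = features.drop(columns="label")
y = features["label"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # 5-fold cross-validation accuracy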

RRDtool: what use are multiple RRAs?

I'm trying to implement rrdtool. I've read the various tutorials and got my first database up and running. However, there is something that I don't understand.
What eludes me is why so many of the examples I come across instruct me to create multiple RRAs?
Allow me to explain: Let's say I have a sensor that I wish to monitor. I will want to ultimately see graphs of the sensor data on an hourly, daily, weekly and monthly basis and one that spans (I'm still on the fence on this one) about 1.5 yrs (for visualising seasonal influences).
Now, why would I want to create an RRA for each of these views? Why not just create a database like this (stepsize=300 seconds):
DS:sensor:GAUGE:600:U:U \
RRA:AVERAGE:0.5:1:160000
If I understand correctly, I can then create any graph I desire, for any given period with whatever resolution I need.
What would be the use of all the other RRAs people tell me I need to define?
BTW: I can imagine that in the past this would have been helpful when computing power was scarcer. Nowadays, with fast disks, high-speed interfaces and powerful CPUs, I guess you don't need the kind of pre-processing that RRAs seem to be designed for.
EDIT:
I'm aware of this page. Although it explains consolidation very clearly, it is my understanding that rrdtool graph can do this consolidation as well at the moment the data is graphed. There still appears (to me) to be no added value in "harvest-time consolidation".
Each RRA is a pre-consolidated set of data points at a specific resolution. This performs two important functions.
Firstly, it saves on disk space. If you are interested in high-detail graphs for the last 24h, but only low-detail graphs for the last year, then you do not need to keep the high-detail data for a whole year -- consolidated data will be sufficient. In this way, you can minimise the amount of storage required to hold the data for graph generation (although of course you lose the detail, so you can't access it if you should want to). Yes, disk is cheap, but if you have a lot of metrics and are keeping high-resolution data for a long time, this can be a surprisingly large amount of space (in our case, it would be in the hundreds of GB).
Secondly, it means that the consolidation work is moved from graphing time to update time. RRDTool generates graphs very quickly, because most of the calculation work is already done in the RRAs at update time, provided there is an RRA of the required configuration. If there is no RRA available at the correct resolution, then RRDtool will perform the consolidation on the fly from a higher-granularity RRA, but this takes time and CPU. RRDTool graphs are usually generated on the fly by CGI scripts, so this matters, particularly if you expect a large number of queries coming in. In your example, using a single 5-min RRA to make a 1.5-yr graph (where 1 pixel would be about 1 day), you would need to read and process 288 times more data to generate the graph than if you had a 1-day-granularity RRA available!
In short, yes, you could have a single RRA and let the graphing work harder. If your particular implementation needs faster updates and doesn't care about slower graph generation, and you need to keep the detailed data for the entire time, then maybe this is a solution for you, and RRDTool can be used in this way. However, usually, people will optimise for graph generation and disk space, which means using tiered sets of RRAs with decreasing granularity.
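For illustration, a tiered set of RRAs for your sensor might look something like the sketch below (the row counts are assumptions chosen to keep roughly a week of 5-minute samples, a month of half-hour averages, and about 1.5 years of daily averages; it is shown here invoked from Python via subprocess, but the arguments are just ordinary rrdtool create ones):

import subprocess

subprocess.run([
    "rrdtool", "create", "sensor.rrd",
    "--step", "300",
    "DS:sensor:GAUGE:600:U:U",
    "RRA:AVERAGE:0.5:1:2016",     # 5-min samples, kept for one week
    "RRA:AVERAGE:0.5:6:1488",     # 30-min averages, kept for about a month
    "RRA:AVERAGE:0.5:288:550",    # 1-day averages, kept for about 1.5 years
], check=True)

A 1.5-year graph can then be drawn from the 1-day RRA directly, instead of consolidating some 160,000 five-minute points on the fly.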

How should I objectively test my program results?

I have developed two differing methods in MATLAB which aim to analyse a pop song and then automatically create a 30 second audio thumbnail (a preview clip) containing part of the chorus section.
Both methods have varying results:
The first method can create a thumbnail for each track, managing to find a chorus section in 40 out of 50 tested songs
The second method only managed to work on 30 out of the 50 songs, and it found the chorus section 21 times out of the 30.
Obviously I know which method is superior, but I need to describe and explain the results in a report which requires the demonstration of proper statistical testing.
Other academic papers have previously used an f-test to do this, but because their methods are vastly superior, their aims usually involve detecting chorus onset times with 100% accuracy.
My aim is more relaxed as I am just looking for the generated thumbnails to contain any part of the chorus, regardless of onset.
Can anyone suggest some objective tests that I could possibly explore with regards to my project? This is my first time conducting an investigation like this so my experience/knowledge is incredibly low.
Thank you!
One possibility is to annotate your song tracks with time cuts marking the relevant types of sound (chorus, etc.). In a sound editor like CoolEdit, you can set time cuts and assign them names such as 'chorus', 'pause', 'music'... Then you need to extract the cut information and import it into MATLAB. On 32-bit Windows you can use the Wav2labs utility from http://www.pallier.org/ressources/wspot/sig2wav/toolswav.html; http://www.pallier.org/ressources/wspot/sig2wav/Wav2labs.exe This program extracts the cuts to a text file, which you can read with MATLAB's textscan function.
After that, all that remains is to measure segmentation accuracy, e.g. the percentage of time for which the signal type (chorus / not chorus) was recognised correctly. (A minimal sketch of that measure follows.)
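(Python here rather than MATLAB, purely for brevity; the interval values are made-up examples.)

def overlap(a, b):
    # Length of overlap, in seconds, between two (start, end) intervals.
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

chorus_intervals = [(45.0, 75.0), (120.0, 150.0)]   # from the label/cut file
thumbnail = (60.0, 90.0)                            # generated 30 s preview clip

chorus_seconds = sum(overlap(thumbnail, c) for c in chorus_intervals)
accuracy = chorus_seconds / (thumbnail[1] - thumbnail[0])
print("%.0f%% of the thumbnail overlaps a labelled chorus" % (100 * accuracy))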
Or specify your question more precisely.