Where is the output of the regStats() function in gem5 stored? - gem5

Does the regStats() function record every tick? If so, where is the recorded data output? I only saw the configuration information of all aspects of gem5 in stats.txt.


Complete Cryptocurrencies Data

I am working with cryptocurrency blockchain data and I want data from the very beginning of each particular cryptocurrency. Is there any way to download the complete block data as a PostgreSQL file?
https://blockchair.com/dumps does offer this, but they limit the download speed and the number of files you can download, and I am still waiting for their reply. Meanwhile, I am looking for other ways or websites to download the complete data of multiple cryptocurrencies in SQL format. I cannot download .csv or .tsv files because they take up a lot of space on my laptop, so I want to use some other format (preferably .sql).
This depends on which cryptocurrency you have. I can suggest how to fetch data from a Bitcoin-compatible cryptocurrency, e.g. Bitcoin, Emercoin, etc.
The cryptocurrency node (wallet) has a JSON-RPC API interface. Using this API, you can retrieve all your data with the following commands from its command list:
Get the total block count with the command getblockcount.
Iterate the block number from 0 to the result of getblockcount. For each number, call getblockhash.
For the result of getblockhash, call getblock. This call provides the list of transactions enclosed in that block.
For each transaction (nested loop), call getrawtransaction. Hint: if you call getrawtransaction with the verbose argument set to "1", the node automatically decodes the transaction and returns it to you as JSON.
You can extract the vin and vout vectors from each transaction and upload all of the data into your SQL database.
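A minimal Python sketch of that loop, assuming a local Bitcoin-compatible node with JSON-RPC enabled; the URL, credentials, and database step are placeholders, not part of the original answer:

```python
import json
import requests

# Placeholder node endpoint and credentials -- adjust to your own bitcoind/emercoind setup.
RPC_URL = "http://127.0.0.1:8332"
RPC_AUTH = ("rpcuser", "rpcpassword")

def rpc(method, *params):
    """Call the node's JSON-RPC interface and return the 'result' field."""
    payload = {"jsonrpc": "1.0", "id": "dump", "method": method, "params": list(params)}
    resp = requests.post(RPC_URL, auth=RPC_AUTH, data=json.dumps(payload))
    resp.raise_for_status()
    return resp.json()["result"]

# Note: looking up arbitrary transactions usually requires the node to run with txindex=1.
height = rpc("getblockcount")
for n in range(height + 1):
    block_hash = rpc("getblockhash", n)
    block = rpc("getblock", block_hash)          # contains the list of txids in "tx"
    for txid in block["tx"]:
        # Verbose mode asks the node to decode the transaction into JSON for us.
        tx = rpc("getrawtransaction", txid, 1)
        for vin in tx.get("vin", []):
            pass   # e.g. INSERT INTO inputs ... in your SQL database
        for vout in tx.get("vout", []):
            pass   # e.g. INSERT INTO outputs ...
```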

How can I monitor multiple Statistics from different classes in Gem5 at the same time dynamically?

Which class in Gem5 has access to all the Stats from different objects?
Are the statistics of each object returned to a specific class continuously, or are these stats collected only at the end of the simulation?
For example, servicedByWrQ is a Scalar stat defined in dram_ctrl.hh. On the other hand, condPredicted is another Scalar stat which is defined in bpred_unit.hh. How can I monitor these two statistics at the same time during the simulation, rather than through the output file, in Gem5?
My ultimate goal is to change the behavior of other hardware units during the simulation, such as branch prediction or the cache replacement policy, based on those statistics.

Can I estimate the time taken by BigQuery to run an export job?

I am creating a service that allows users to apply filters on BigQuery data and export it as CSV or JSON. Is there a way I can estimate the time BigQuery will take to export a set of rows?
Currently, I am recording the number of rows and the time it took to finish the export job. Then I take the average time to export a single row and use it to estimate the total time, but it is certainly not a linear problem.
Any suggestion on the prediction algorithm would be great too.
Unfortunately, there isn't a great way to predict how long the export would take.
There are a number of factors:
How many "shards" of data your table is broken up into. This is related to how compressible your data is and to some extent, how you've loaded your tables into bigquery. BigQuery will attempt to do extracts in parallel as long as you pass a 'glob' path as your extract destination (e.g. gs://foo/bar/baz*.csv).
The size of the table.
The number of concurrently running extract jobs. The higher the overall system load, the fewer resources will be available to your extract job.
Since most of these factors aren't really under your control, the best practices are:
Always pass a glob path as your destination path so that bigquery can extract in parallel.
If your table is small, you can use tabledata.list to extract the data instead of export.
Note that there are a couple of open bugs with respect to extract performance that we're working on addressing.
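For illustration, here is a hedged Python sketch of starting an extract with a wildcard ("glob") destination using the google-cloud-bigquery client; the project, dataset, table, and bucket names are placeholders:

```python
from google.cloud import bigquery

# Placeholder project name -- substitute your own.
client = bigquery.Client(project="my-project")

# A wildcard destination lets BigQuery write many shards in parallel,
# which is usually the biggest lever on export time.
destination_uri = "gs://my-bucket/exports/my_table-*.csv"

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.CSV

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",  # source table (placeholder)
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # blocks until the export job finishes
```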

Classify data using mahout

I'm new to Apache Mahout and working on a classification problem.
The Problem states:
There exists a set of data in a text file and I need to fetch some or all of the data from the file depending upon the given span of time.
Span of time : Each record would have a Date of transaction.
So, time span would be calculated using the logic (Sys_Date - Transaction_Date).
Thus, output would vary depending upon whether data is required for last month / week / specific number of days.
How can this filtering be achieved using Apache Mahout?
This by itself does not sound like a machine learning problem at all. You want to put your data in a database of some kind and query for records in a date range. Then, you want to do something with that data. This is not something ML tools do.
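As a sketch of that pre-filtering step (outside Mahout), assuming one comma-separated transaction per line with the date in the first field; the file name and record layout are assumptions:

```python
from datetime import datetime, timedelta

def records_in_span(path, days):
    """Yield records whose transaction date falls within the last `days` days,
    i.e. (Sys_Date - Transaction_Date) <= days."""
    cutoff = datetime.now() - timedelta(days=days)
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split(",")
            tx_date = datetime.strptime(fields[0], "%Y-%m-%d")
            if tx_date >= cutoff:
                yield fields

# e.g. everything from roughly the last month
last_month = list(records_in_span("transactions.txt", 30))
```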
I haven't worked much with Hadoop yet, but it seems to me that this video should help:
http://www.youtube.com/watch?v=KwW7bQRykHI&feature=player_embedded
After the filtering, you can use the result in Mahout (to solve the classification problem).

Caching of Map applications in Hadoop MapReduce?

Looking at the combination of MapReduce and HBase from a data-flow perspective, my problem seems to fit. I have a large set of documents which I want to Map, Combine and Reduce. My previous SQL implementation was to split the task into batch operations, cumulatively storing what would be the result of the Map into table and then performing the equivalent of a reduce. This had the benefit that at any point during execution (or between executions), I had the results of the Map at that point in time.
As I understand it, running this job as a MapReduce would require all of the Map functions to run each time.
My Map functions (and indeed any function) always gives the same output for a given input. There is simply no point in re-calculating output if I don't have to. My input (a set of documents) will be continually growing and I will run my MapReduce operation periodically over the data. Between executions I should only really have to calculate the Map functions for newly added documents.
My data will probably be HBase -> MapReduce -> HBase. Given that Hadoop is a whole ecosystem, it may be able to know that a given function has been applied to a row with a given identity. I'm assuming immutable entries in the HBase table. Does / can Hadoop take account of this?
I'm aware from the documentation (especially the Cloudera videos) that re-calculation (of potentially redundant data) can be quicker than persisting and retrieving, for the class of problems Hadoop is being used for.
Any comments / answers?
If you're looking to avoid running the Map step each time, break it out as its own step (either by using the IdentityReducer or setting the number of reducers for the job to 0) and run later steps using the output of your map step.
Whether this is actually faster than recomputing from the raw data each time depends on the volume and shape of the input data vs. the output data, how complicated your map step is, etc.
Note that running your mapper on new data sets won't append to previous runs - but you can get around this by using a dated output folder. This is to say that you could store the output of mapping your first batch of files in my_mapper_output/20091101, and the next week's batch in my_mapper_output/20091108, etc. If you want to reduce over the whole set, you should be able to pass in my_mapper_output as the input folder, and catch all of the output sets.
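As a rough sketch of such a map-only step, here is a Hadoop Streaming style mapper in Python; the jar path, input/output locations, record format, and the zero-reducers option shown in the comment are assumptions you would adapt to your cluster and Hadoop version:

```python
#!/usr/bin/env python
# Sketch of a map-only Hadoop Streaming step. Run it with zero reducers so only
# the map output is written, e.g. (paths and option names vary by Hadoop version):
#   hadoop jar hadoop-streaming.jar \
#     -D mapreduce.job.reduces=0 \
#     -mapper mapper.py \
#     -input docs/ -output my_mapper_output/20091101
import sys

for line in sys.stdin:
    # Assume one document per line as "doc_id<TAB>text"; adapt to your real format.
    doc_id, text = line.rstrip("\n").split("\t", 1)
    for token in text.split():
        # Emit key/value pairs; replace this with your real map function.
        print("%s\t%s" % (doc_id, token))
```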
Why not apply your SQL workflow in a different environment? Meaning, add a "processed" column to your input table. When time comes to run a summary, run a pipeline that goes something like:
map (map_function) on (input table filtered by !processed); store into map_outputs either in hbase or simply hdfs.
map (reduce function) on (map_outputs); store into hbase.
You can make life a little easier, assuming you are storing your data in HBase sorted by insertion date, if you record the timestamps of successful summary runs somewhere and only open the filter on inputs that are dated later than the last successful summary -- you'll save some significant scanning time.
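A minimal sketch of that incremental scan, assuming the happybase client over HBase's Thrift gateway and row keys that start with the insertion date; the host, table, and key layout are made up for illustration:

```python
import happybase  # assumption: HBase is reachable via its Thrift gateway

# Placeholder host/table names; rows are assumed to be keyed by insertion date,
# e.g. "20091101-<doc id>", so a row_start bound skips already-summarised data.
connection = happybase.Connection("hbase-thrift-host")
inputs = connection.table("input_table")

last_successful_summary = "20091101"  # recorded after the previous summary run

for row_key, columns in inputs.scan(row_start=last_successful_summary):
    # Apply your map function here and write results to map_outputs
    # (another HBase table or a plain HDFS directory).
    pass
```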
Here's an interesting presentation that shows how one company architected their workflow (although they do not use Hbase):
http://www.scribd.com/doc/20971412/Hadoop-World-Production-Deep-Dive-with-High-Availability