OpenACC: how to profile on the GPU

Hi, I am using the CAPS OpenACC compilers, but something strange happens when I try to get some preliminary profile results.
At first, I ran the code with HMPPRT_LOG_LEVEL="info" set, which generates some profile output with time stamps.
[ 2.612337] ( 0) INFO : Upload edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
[ 2.613485] ( 0) INFO : Call __hmpp_acc_region__2ha750yb (queue=none, location=gravity_openacc.c:50)
[ 2.614367] ( 0) INFO : Free edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
So I guess the kernel execution time is calculated as 2.614367 - 2.613485 = 0.000882 s.
But when I set CUDA_PROFILE=1, the profile below is shown:
method=[ __hmpp_acc_region__2ha750yb_parallel_region_1 ] gputime=[ 492.480 ] cputime=[ 13.000 ] occupancy=[ 0.250 ]
So I'm quite confused about these two results. Which one is correct?
Does anyone have an explanation?
Thanks!

The CUDA profiler shows you only the time it takes to execute the CUDA kernel, while the log you obtain with HMPPRT_LOG_LEVEL="info" gives you the overall time it takes to execute the region. These are not exactly the same thing, because the region may include some code that is executed on the host, for example.
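
To put the two numbers on the same scale: the log timestamps are in seconds, while the gputime field of the CUDA command-line profiler is in microseconds. A small sketch of the arithmetic in Python, using only the figures quoted in the question (the variable names are mine):

```python
# Timestamps (seconds) copied from the HMPPRT log lines in the question.
call_ts = 2.613485   # "Call __hmpp_acc_region__..."
free_ts = 2.614367   # "Free edgelengths..." (the next logged event)

# gputime from the CUDA_PROFILE output; the profiler reports microseconds.
gpu_time_us = 492.480

region_time_us = (free_ts - call_ts) * 1e6   # wall time between the two log events
overhead_us = region_time_us - gpu_time_us   # time not spent inside the kernel

print(f"region: {region_time_us:.0f} us, kernel: {gpu_time_us:.0f} us, "
      f"other: {overhead_us:.0f} us")
```

So the log interval is about 882 us and the kernel itself about 492 us. The remainder is whatever else happens between the two log events, which would fit the explanation above.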


Do we have an API to get Test Cycle Summary in Qtest?

Do we have an API in Qtest that can provide a summary of test cycle execution?
E.g. Passed: 23 Failed: 7 Unexecuted: 10 Running: 2
I need this data for generating a report in our consolidated reporting tool, along with data from some other sources.
Nothing that gives exactly what you ask for, but you could use the API calls below to create it yourself.
You can get the status of all test runs in a project using
GET /api/v3/projects/{projectId}/test-runs/execution-statuses
Or, to get results from a specific test cycle, first find all the test runs in that cycle using
/api/v3/projects/{projectId}/test-runs?parentId={testCycleID}&parentType=test-cycle
(append &expand=descendants to find test runs in containers under the test cycle)
and then get the results of each run individually using
/api/v3/projects/{projectId}/test-runs/{testRunId}/test-logs/last-run
See https://qtest.dev.tricentis.com/
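
A minimal sketch of how the pieces could be combined, in Python. It only builds the URL from the endpoint quoted above and tallies status names. The helper names are hypothetical, and how you fetch the responses and pull the status name out of each test log depends on the response format, which is not shown here:

```python
from collections import Counter
from urllib.parse import urlencode


def build_test_runs_url(base_url, project_id, test_cycle_id, include_descendants=False):
    """Build the 'test runs in a cycle' URL from the endpoint quoted above."""
    params = {"parentId": test_cycle_id, "parentType": "test-cycle"}
    if include_descendants:
        params["expand"] = "descendants"   # also look in containers under the cycle
    return f"{base_url}/api/v3/projects/{project_id}/test-runs?{urlencode(params)}"


def summarize(status_names):
    """Count status names and format them like 'Passed: 23 Failed: 7'."""
    counts = Counter(status_names)
    return " ".join(f"{name}: {n}" for name, n in counts.most_common())


# The status names would come from the last-run log of each test run.
print(summarize(["Passed", "Failed", "Passed"]))
```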

Spark structured streaming groupBy not working in append mode (works in update)

I'm trying to get a streaming aggregation/groupBy working in append output mode, to be able to use the resulting stream in a stream-to-stream join. I'm working on (Py)Spark 2.3.2, and I'm consuming from Kafka topics.
My pseudo-code is something like below, running in a Zeppelin notebook
orderStream = spark.readStream.format("kafka").option("startingOffsets", "earliest").....
orderGroupDF = (orderStream
    .withWatermark("LAST_MOD", "20 seconds")
    .groupBy("ID", window("LAST_MOD", "10 seconds", "5 seconds"))
    .agg(
        collect_list(struct("attra", "attrb2",...)).alias("orders"),
        count("ID").alias("number_of_orders"),
        sum("PLACED").alias("number_of_placed_orders"),
        min("LAST_MOD").alias("first_order_tsd")
    )
)
debug = (orderGroupDF.writeStream
    .outputMode("append")
    .format("memory").queryName("debug").start()
)
After that, I would expect data to appear in the debug query so that I can select from it (after the late-arrival window of 20 seconds has expired). But no data ever appears in the debug query (I waited several minutes).
When I change the output mode to update, the query works immediately.
Any hint as to what I'm doing wrong?
EDIT: after some more experimentation, I can add the following (but I still don't understand it).
When starting the Spark application, there is quite a lot of old data (with event timestamps << current time) on the topic from which I consume. After starting, it seems to read all these messages (MicroBatchExecution in the log reports "numRowsTotal = 6224" for example), but nothing is produced on the output, and the eventTime watermark in the log from MicroBatchExecution stays at epoch (1970-01-01).
After producing a fresh message onto the input topic with eventTimestamp very close to current time, the query immediately outputs all the "queued" records at once, and bumps the eventTime watermark in the query.
What I can also see is that there seems to be an issue with the timezone. My Spark program runs in CET (UTC+2 currently). The timestamps in the incoming Kafka messages are in UTC, e.g. "LAST__MOD": "2019-05-14 12:39:39.955595000". I have set spark_sess.conf.set("spark.sql.session.timeZone", "UTC"). Still, the microbatch report after that "new" message has been produced onto the input topic says
"eventTime" : {
"avg" : "2019-05-14T10:39:39.955Z",
"max" : "2019-05-14T10:39:39.955Z",
"min" : "2019-05-14T10:39:39.955Z",
"watermark" : "2019-05-14T10:35:25.255Z"
},
So the eventTime somehow lines up with the time in the input message, but it is 2 hours off. The UTC difference has been subtracted twice. Additionally, I fail to see how the watermark calculation works. Given that I set it to 20 seconds, I would have expected it to be 20 seconds older than the max event time. But apparently it is 4 min 14 s older. I fail to see the logic behind this.
I'm very confused...
It seems that this was related to the Spark version 2.3.2 that I used, and maybe more concretely to SPARK-24156. I have upgraded to Spark 2.4.3 and here I get the results of the groupBy immediately (well, of course after the watermark lateThreshold has expired, but "in the expected timeframe").
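
For intuition about why append mode can stay silent, here is a toy model in plain Python. It is not Spark's actual code, and it makes simplifying assumptions: tumbling windows instead of the sliding windows used above, integer event times in seconds, and a watermark that is recomputed only at the end of each micro-batch. The function name and the sample data are made up for illustration.

```python
def append_mode_output(batches, window=10, delay=20):
    """Toy model of append-mode windowed counts (not Spark's actual code).

    batches: list of lists of event times in seconds.
    Returns, per batch, the (window_start, count) pairs that get emitted.
    """
    watermark = 0      # watermark in effect for the current batch
    max_seen = 0       # largest event time seen so far
    open_windows = {}  # window_start -> count
    emitted = []
    for events in batches:
        for t in events:
            start = (t // window) * window
            if start + window > watermark:   # rows for already-closed windows are dropped
                open_windows[start] = open_windows.get(start, 0) + 1
            max_seen = max(max_seen, t)
        # Append mode: only windows that ended at or before the watermark are final.
        done = sorted(s for s in open_windows if s + window <= watermark)
        emitted.append([(s, open_windows.pop(s)) for s in done])
        # The watermark advances only after the batch, so it applies to the next one.
        watermark = max(watermark, max_seen - delay)
    return emitted


print(append_mode_output([[100, 101, 105], [200], [201]]))
```

In this model the window holding the first three events is emitted only in the third batch: the second batch moves the watermark past the end of that window, but a further batch has to run before the result is released. If batches run only when new input arrives, old data stays queued until fresh messages show up, which is at least consistent with the behaviour described in the edit above.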

cudaError_t 1 : "__global__ function call is not configured" returned from 'cublasCreate(&handle_)'

I run ASR experiments using Kaldi on an SGE cluster consisting of two workstations with TITAN XP GPUs.
At random times I hit the following problem:
ERROR (nnet3-train[5.2.62~4-a2342]:FinalizeActiveGpu():cu-device.cc:217) cudaError_t 1 : "__global__ function call is not configured" returned from 'cublasCreate(&handle_)'
I guess something is wrong with GPU driver or hardware.
Could you please offer some help?
And here is the complete log
I had a similar issue when running darknet on a TX2.
Following
https://blog.csdn.net/JIEJINQUANIL/article/details/103091537
I switched to root with
sudo su
then sourced the catkin_ws,
then launched darknet.
After that it ran.
Here is my result
I hope you can solve it with a similar method.

How do you use the benchmark flags for the go (golang) gocheck testing framework?

How does one use the flag options for benchmarks with the gocheck testing framework? In the link that I provided, the only example they give is running go test -check.b; however, they do not provide additional comments on how it works, so it's hard to use. I could not even find -check in the Go documentation when I ran go help test or go help testflag. In particular, I want to know how to use the benchmark testing framework better and control how long it runs or how many iterations it runs for. For example, in the example they provide:
func (s *MySuite) BenchmarkLogic(c *C) {
    for i := 0; i < c.N; i++ {
        // Logic to benchmark
    }
}
There is the variable c.N. How does one specify that variable? Is it through the actual program itself or is it through go test and its flags or the command line?
On a side note, the documentation from go help testflag did talk about the -bench regex, -benchmem and -benchtime t options; however, it does not talk about the -check.b option. I did try to run these options as described there, but they didn't really do anything I could notice. Does gocheck work with the original options for go test?
The main problem I see is that there is no clear documentation on how to use the gocheck tool or its commands. I accidentally gave it a wrong flag and it threw me an error message suggesting useful flags that I need (with limited descriptions):
-check.b=false: Run benchmarks
-check.btime=1s: approximate run time for each benchmark
-check.f="": Regular expression selecting which tests and/or suites to run
-check.list=false: List the names of all tests that will be run
-check.v=false: Verbose mode
-check.vv=false: Super verbose mode (disables output caching)
-check.work=false: Display and do not remove the test working directory
-gocheck.b=false: Run benchmarks
-gocheck.btime=1s: approximate run time for each benchmark
-gocheck.f="": Regular expression selecting which tests and/or suites to run
-gocheck.list=false: List the names of all tests that will be run
-gocheck.v=false: Verbose mode
-gocheck.vv=false: Super verbose mode (disables output caching)
-gocheck.work=false: Display and do not remove the test working directory
-test.bench="": regular expression to select benchmarks to run
-test.benchmem=false: print memory allocations for benchmarks
-test.benchtime=1s: approximate run time for each benchmark
-test.blockprofile="": write a goroutine blocking profile to the named file after execution
-test.blockprofilerate=1: if >= 0, calls runtime.SetBlockProfileRate()
-test.coverprofile="": write a coverage profile to the named file after execution
-test.cpu="": comma-separated list of number of CPUs to use for each test
-test.cpuprofile="": write a cpu profile to the named file during execution
-test.memprofile="": write a memory profile to the named file after execution
-test.memprofilerate=0: if >=0, sets runtime.MemProfileRate
-test.outputdir="": directory in which to write profiles
-test.parallel=1: maximum test parallelism
-test.run="": regular expression to select tests and examples to run
-test.short=false: run smaller test suite to save time
-test.timeout=0: if positive, sets an aggregate time limit for all tests
-test.v=false: verbose: print additional output
Is writing wrong commands the only way to get some help with this tool? Doesn't it have a help flag or something?
I'm 5 years late, but to specify how many times (N) to run, use the option -benchtime Nx.
Example:
go test -bench=. -benchtime 100x
BenchmarkTest 100 ... ns/op
Please read more about all go testing flags here.
see the Description_of_testing_flags:
-bench regexp
Run benchmarks matching the regular expression.
By default, no benchmarks run. To run all benchmarks,
use '-bench .' or '-bench=.'.
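-check.b plays the role of -test.bench for gocheck suites, but as the flag listing in the question shows, it is a boolean switch (-check.b=false: Run benchmarks), and the regular expression goes into -check.f instead.
E.g. to run all benchmarks:
go test -check.b
to run a specific benchmark:
go test -check.b -check.f=BenchmarkLogic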
more information about testing in Go can be found here
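
On the question of where c.N comes from: as far as I understand it, N is not set by the test author. The runner picks it, growing N until one run of the loop lasts roughly the target time (-check.btime in the gocheck flag list above, -test.benchtime for plain go test). A rough model of that ramp-up, written in Python for brevity. This is not the actual Go or gocheck source; the growth constants and the simulated cost per operation are assumptions for illustration:

```python
def choose_n(ns_per_op, target_ns):
    """Return the iteration count at which a simulated run reaches target_ns."""
    n = 1
    while True:
        elapsed = n * ns_per_op              # pretend we ran the loop n times
        if elapsed >= target_ns:
            return n
        predicted = target_ns // ns_per_op   # iterations needed at the observed speed
        # Overshoot a little, but grow at most 100x and at least by one per round.
        n = max(min(predicted + predicted // 2, 100 * n), n + 1)


# A loop body costing 1000 ns with a 1 s target.
print(choose_n(1000, 10**9))
```

The body of the benchmark therefore runs several times with increasing N, and only the last run is the one that gets reported.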

Determine actual errors from a load job

Using the Java SDK I am creating a load job for just a single record with a fairly complicated schema. When monitoring the status of the load job, it takes a surprisingly long time (but perhaps this is due to working out the schema), but then says:
11:21:06.975 [main] INFO xxx.GoogleBigQuery - Job status (21694ms) create_scans_1384744805079_172221126: DONE
11:24:50.618 [main] ERROR xxx.GoogleBigQuery - Job create_scans_1384744805079_172221126 caused error (invalid) with message
Too many errors encountered. Limit is: 0.
11:24:50.810 [main] ERROR xxx.GoogleBigQuery - {
"message" : "Too many errors encountered. Limit is: 0.",
"reason" : "invalid"
}
BTW - how do I tell the job that it can have more than zero errors using Java?
This load job does not appear in the list of recent jobs in the console, and as far as I can see, none of the Java objects contains any more details about the actual errors encountered. So how can I programmatically find out what is going wrong? All I can find is:
if (err != null) {
    log.error("Job {} caused error ({}) with message\n{}", jobID, err.getReason(), err.getMessage());
    try {
        log.error(err.toPrettyString());
    }
    ...
In general I am having a difficult time finding good documentation for some of these things and am working it out by trial and error and short snippets of code found on here and older groups. If there is a better source of information than the getting started guides, then I would appreciate any pointers to that information. The Javadoc does not really help and I cannot find any complete examples of loading, querying, testing for errors, cataloging errors and so on.
This job is submitted as a NEWLINE_DELIMITED_JSON record, supplied to the job via:
InputStream dummy = getClass().getResourceAsStream("/googlebigquery/xxx.record");
final InputStreamContent jsonIn = new InputStreamContent("application/octet-stream", dummy);
createTableJob = bigQuery.jobs().insert(projectId, loadJob, jsonIn).execute();
My authentication and so on seems to work correctly, as separate Java code to list the projects and the datasets in the project all works correctly. So I just need help in working out what the actual error is: does it not like the schema (I have records nested within records, for instance), or does it think that there is an error in the data I am submitting?
Thanks in advance for any help. The job number cited above is an actual failed load job if that helps any Google staffers who might read this.
It sounds like you have a couple of questions, so I'll try to address them all.
First, the way to get the status of the job that failed is to call jobs().get(jobId), which returns a job object that has an errorResult object containing the error that caused the job to fail (e.g. "too many errors"). The errorStream list is a list of all of the errors on the job, which should tell you which lines hit errors.
Note that if you have the job id, it may be easier to use bq to look up the job -- you can run bq show <job_id> to get the job error information. If you add --format=prettyjson, it will print out all of the information in the job.
A hint you also might want to consider is to supply your own job id when you create the job -- then even if there is an error starting the job (i.e. the insert() call fails, perhaps due to a network error) you can look up the job to see what actually happened.
To tell BigQuery that some errors are allowed during import, you can use the maxBadRecords setting in the load job. See https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#getMaxBadRecords().
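
A small sketch of where these fields sit, written in Python against the JSON form of a job resource rather than the Java objects. As I understand the REST v2 job resource, the fatal error is under status.errorResult, the full error list is under status.errors (which I take to be the list referred to above), and the error limit is under configuration.load.maxBadRecords. The helper name and the sample job are hypothetical; only the "Too many errors" message is taken from the question.

```python
def job_errors(job):
    """Return (fatal_error, all_errors) from a BigQuery job resource dict."""
    status = job.get("status", {})
    return status.get("errorResult"), status.get("errors", [])


# Made-up example resource; the per-record error below is purely illustrative.
job = {
    "configuration": {"load": {"maxBadRecords": 0}},
    "status": {
        "state": "DONE",
        "errorResult": {"reason": "invalid",
                        "message": "Too many errors encountered. Limit is: 0."},
        "errors": [
            {"reason": "invalid", "location": "hypothetical location",
             "message": "hypothetical per-record parse error"},
        ],
    },
}

fatal, details = job_errors(job)
print(fatal["message"])
for err in details:
    print(err.get("location"), err["message"])
```

A job that succeeded would simply have no errorResult, so the first element of the returned pair is None in that case.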