Time-Based Profiling with perf record - performance counters

Can anyone tell me how to get the value of the event counters at each sample when using perf record?
For example:
perf record -F 100 -e instructions ./program
With perf report, I only see the overhead percentages, but I want to know if there is a way to view the value of the counter at each sample.

I will answer my own question in case someone else is facing the same issue.
To view the time at which each sample was taken and the value of the counter at that time, run perf script after perf record finishes.
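For example (period and time are standard perf script fields, but check perf script --help on your version, since the available fields vary by event type; note that -F means --fields for perf script, unlike the sampling frequency it sets for perf record):

perf record -F 100 -e instructions ./program
perf script -F comm,pid,time,event,period

Each line of perf script output then corresponds to one sample: time is the timestamp at which the sample was taken, and period is the number of instructions counted since the previous sample.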

Related

Crux dataset Bigquery - Query for Min/Avg/Max LCP, FID and CLS

I have been exploring the CrUX dataset in BigQuery for the last 10 days to extract data for a Data Studio report. Though I consider myself good at SQL, as I have mostly worked with Oracle and SQL Server, I am finding it very hard to write queries against this dataset. I started from this article by Rick Viscomi and explored the queries in his GitHub repo, but I am still unable to figure it out.
I am trying to use the materialized table chrome-ux-report.materialized.metrics_summary to get some of the metrics, but I am not sure whether the min/avg/max LCP (in milliseconds) for a time period (a month, for example) can be extracted from this table. What other queries could I try that require less data processing? (Some of the queries I tried used up my free TB of data processing on BigQuery.)
Any suggestions, advice, solutions, or queries are more than welcome, since the documentation about the structure of the dataset and how to query it is not very clear.
For details about the fields used in the report, you can check the main documentation for the Chrome UX Report, especially the last part on the data format, which shows the dimensions and how they are interpreted, as shown below:
Dimension                        Value
origin                           "https://example.com"
effective_connection_type.name   "4G"
form_factor.name                 "phone"
first_paint.histogram.start      1000
first_paint.histogram.end        1200
first_paint.histogram.density    0.123
For example, the above shows a sample record from the Chrome User Experience Report, which indicates that 12.3% of page loads had a “first paint time” measurement in the range of 1000-1200 milliseconds when loading “http://example.com” on a “phone” device over a ”4G”-like connection. To obtain a cumulative value of users experiencing a first paint time below 1200 milliseconds, you can add up all records whose histogram’s “end” value is less than or equal to 1200.
For the metrics, the initial link has a section called Methodology where you can get information about the metrics and dimensions of the report. I recommend going to the actual origin source tables, per country and per site, rather than the summary, as the data you are looking for can be obtained there. In the BigQuery part of the documentation you will find samples of how to query those tables. I find this one relevant:
SELECT
  SUM(bin.density) AS density
FROM
  `chrome-ux-report.chrome_ux_report.201710`,
  UNNEST(first_contentful_paint.histogram.bin) AS bin
WHERE
  bin.start < 1000 AND
  origin = 'http://example.com'
In the example above we’re adding all of the density values in the FCP histogram for “http://example.com” where the FCP bin’s start value is less than 1000 ms. The result is 0.7537, which indicates that ~75.4% of page loads experience the FCP in under a second.
Regarding query cost estimation, see the estimating query costs guide in the official BigQuery documentation. Due to their nature, these tables consume a lot of processing, so filter them as much as possible.
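Following the same pattern, here is a rough sketch of how the LCP part of the question might be approached on the per-month tables (the chrome-ux-report.all dataset and the largest_contentful_paint field come from the published CrUX schema, so verify them against the current tables before trusting the numbers). Keep in mind that the histograms only give you densities per bin, so you can derive the share of loads below a threshold or an approximate percentile, but not an exact min or max:

SELECT
  SUM(bin.density) AS fast_lcp_density
FROM
  `chrome-ux-report.all.202106`,
  UNNEST(largest_contentful_paint.histogram.bin) AS bin
WHERE
  bin.start < 2500 AND
  origin = 'https://example.com'

This adds up the density of all LCP bins that start below 2500 ms for the origin in June 2021, i.e. roughly the fraction of page loads with a "good" LCP.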

BigQuery Count Appears to be Processing Data

I noticed that running SELECT COUNT(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30-40 seconds, even though the validator claims the query processes 0 bytes. This doesn't seem right when 500 GB queries run faster. Additionally, total row counts are listed under Details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?
When you run a count, BigQuery still needs to allocate resources (such as slot units, shards, etc.). You might be hitting some limit that causes a delay; for example, the default number of slots per project is 2,000 units.
The BigQuery execution plan provides very detailed information about the process, which can help you better understand the source of the delay.
One way to overcome this is to use an approximate method, described in this link.
This slide deck by Google might also help you.
For more details, see this video about how to understand the execution plan.
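As a side note (this is not one of the linked resources): if all you need is the stored total row count rather than a scan, the dataset's __TABLES__ metadata view exposes it. A minimal sketch, assuming the table lives in a dataset called mydataset:

SELECT
  row_count
FROM
  `mydataset.__TABLES__`
WHERE
  table_id = 'myTable'

Because this reads table metadata rather than the table itself, it normally returns almost immediately regardless of table size.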

Confirmation of how to calculate bigquery query costs

I want to double-check what I need to look at when assessing query costs for BigQuery. I've found the quoted price per TB here, which says $5 per TB, but 1 TB of precisely what? I have been assuming up until now (before it seemed to matter) that the relevant number is the one the BigQuery UI outputs above the results, so for this example query:
...in this case 2.34 GB. So, as a fraction of a terabyte multiplied by $5, this would cost around 1.2 cents, assuming I'd used up my free allowance for the month.
Can anyone confirm that I'm correct? I'm checking this before I process something that I think could rack up some non-negligible costs for once. I should say I've never been stung with a sizeable BigQuery bill before; it seems difficult to do.
Can anyone confirm that I'm correct?
Confirmed
Please note that the BigQuery UI in fact uses a dry run, which only estimates Total Bytes Processed. The final cost is based on Bytes Billed, which reflects some nuances, such as a minimum of 10 MB per table involved in the query. You can see more details here - https://cloud.google.com/bigquery/pricing#on_demand_pricing
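As a quick sanity check of the arithmetic in the question (this sketch assumes billing against a binary TB of 2^40 bytes, which is also what the audit-log query further down divides by):

SELECT 2.34 / 1024 * 5 AS estimated_usd  -- 2.34 GB at $5 per TB is about $0.011, in the same ballpark as the 1.2 cents estimated in the question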
I know I am late, but this might help you.
If you are pushing your audit logs to another dataset, you can run the query below against that dataset.
WITH data AS (
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
  FROM
    `administrative-audit-trail.gcp_audit_logs.cloudaudit_googleapis_com_data_access_20190227`
)
SELECT
  principalEmail,
  FORMAT('%9.2f', 5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes) / POWER(2, 40))) AS Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY principalEmail
ORDER BY Estimated_USD_Cost DESC
Reference: https://cloud.google.com/bigquery/docs/reference/auditlogs/

Time taken to display output in postgres 9.1

Does Postgres include the time taken for rendering the output on screen within \timing or EXPLAIN ANALYZE? From what I understand, it does not. Am I correct?
Actually, I am outputting a lot of rows on screen, and I find that Postgres does not take much time to display them: a simple C program I wrote to output the same results takes about 3000 ms, whereas Postgres takes about 500 ms to display the same data on screen.
"postgres" doesn't display anything at all. I think you mean the psql client.
If so: \timing displays time including the time to receive the data from the server. EXPLAIN ANALYZE doesn't, but adds the overhead of doing the detailed server-side timing. log_min_duration_statement just records statement timings server-side.
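For reference, a minimal way to see all three side by side in psql (big_table is a placeholder, and changing log_min_duration_statement in a session requires sufficient privileges):

\timing on
SELECT * FROM big_table;                  -- time reported by psql includes transferring the rows to the client
EXPLAIN ANALYZE SELECT * FROM big_table;  -- server-side execution time only, plus instrumentation overhead
SET log_min_duration_statement = 500;     -- log any statement slower than 500 ms, measured on the server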

Is my execution plan trying to trick me?

I am trying to speed up a long-running query (it takes about 10 minutes to run). In order to track down which part of the query is costing me the most time, I included the Actual Execution Plan when I ran it and found a particular section that was taking up 55% of the cost (screenshot below):
alt text http://img109.imageshack.us/img109/9571/53218794.png
This didn't quite seem right to me, so I added PRINT '1' and PRINT '2' before and after this trouble section. When I run the query for a mere 17 seconds and then cancel it, both the 1 and the 2 print out, which I'm assuming means it's getting through that section in the first 17 seconds.
alt text http://img297.imageshack.us/img297/4739/66797633.png
Am I doing something wrong here or is my Execution plan misleading me?
Metrics from perfmon will also help figure out what's going wrong... you could be running into some serious IO issues with the drive your tempDB is residing on. Additionally, run a trace and look at the CPU & IO of the actual run.
Good perfmon metrics to look at are disk queue length (avg & writes).
If you don't have access to perfmon or don't want to trace things, use SET STATISTICS IO ON at the beginning of your query and allow it to complete... don't stop it. Just because an execution plan says a statement accounts for over half of the cost doesn't mean it will run for half of the query time... it could be much more (or less).
It says Query 10: Query cost (relative to the batch): 55%. Are you 100% positive that it is the 10th statement in the batch that you surrounded with the Print statements? Could the INSERT ... INTO #mpProgramSet2 execute multiple times, sometimes in under 17 seconds and other times for 5 minutes, depending on how much data was selected/inserted?
As a side note, you should run with SET STATISTICS TIME ON rather than prints; this will give you the exact compile time and execution time of each statement in the batch.
I wouldn't trust that printing the '1' and '2' proves anything about what has executed and what has not. I do the same thing, but I just wouldn't rely on it as proof. You could print the @@ROWCOUNT from that first insert query - that would indicate for sure that the insert has occurred.
Although the plan says that query may take 55% of the cost, it may not be 55% of the execution time, especially if the query results are cached.
Another advantage of printing @@ROWCOUNT is to compare the actual number of rows to the estimated rows (51K). If they differ by a lot, then you might investigate the statistics for your indexes.
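Putting those suggestions together, a minimal sketch of how the batch could be instrumented (the SELECT ... INTO below is only a stand-in for the original INSERT ... INTO #mpProgramSet2, whose text isn't shown in the question):

SET STATISTICS TIME ON;  -- exact parse/compile and execution time per statement
SET STATISTICS IO ON;    -- logical and physical reads per table

-- hypothetical stand-in for the statement the plan flags at 55%
SELECT o.object_id, o.name
INTO #mpProgramSet2
FROM sys.objects AS o;

PRINT @@ROWCOUNT;        -- actual rows written; compare against the plan's estimated 51K rows

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;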
We would need the full query to understand what's going on; but I would probably start with setting MAXDOP to 1 in order to limit the number of processors it's running on.
Note that sometimes queries need to be limited to only 1 processor due to locks etc.
Further, you might try adding NOLOCK hints to any of your SELECTs that can get away with dirty reads.
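For illustration, this is roughly what those two hints look like (the table names are placeholders, not from the original query):

SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c WITH (NOLOCK)  -- dirty reads: no shared locks taken, uncommitted rows may be returned
JOIN dbo.Orders AS o WITH (NOLOCK)
  ON o.CustomerID = c.CustomerID
OPTION (MAXDOP 1);                     -- restrict this query to a single processor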