SQL: Get average job runtime from a log table

I have a table LOGS with these attributes:
ID(int)
Date(datetime)
Core(int)
Source(string)
Message(string)
That table contains log entries from multiple jobs.
A job has multiple log entries, but the start and end entries of a job always use the same fixed messages.
Example:
->1, 07.12.2016 10:49:00, 2, Some DLL, Calling Execute on CleanUp // Start
2, 07.12.2016 10:49:01, 3, Other DLL, BLABLABLA
3, 07.12.2016 10:49:10, 1, Other DLL, BLABLABLA
->4, 07.12.2016 10:50:15, 2, Other DLL, BLABLABLA // Job does sth.
->5, 07.12.2016 10:50:50, 2, Other DLL, Execution completed // End
The rows marked with an arrow belong to the same job.
As you can see, a job starts with 'Calling Execute...' and ends with 'Execution completed'.
What I want to achieve:
My task is to get the average job running times. The initial approach was to filter with
WHERE Message LIKE '%JOBNAME%' OR Message LIKE 'Execution completed'
and compare the datetimes. This worked for some jobs, but some jobs run rarely, so I only get "Execution completed" entries, and the precision is not great when doing this manually.
At the end I want a list with following attributes:
ID(start log),
Start-Date,
End-Date,
Core,
Source-Start,
Source-End,
Message-Start,
Message-End
So later it's easy to calculate the difference and do the avg on it.
My idea
-> Get jobs by searching for a message.
-> Get a list with the message "Execution completed" having:
a higher ID (end log is always after start log)
a later datetime
the same core
For example:
Having a job with the attributes
1, 07.12.2016 11:33:00, 2, Source 1, Calling Execute on job Cleanup
Then searching for all logs with
ID>1,
dateTime>07.12.2016 11:33:00,
Core=2,
Message="Execution completed"
Picking the first item of that list should give the end log of the job.
How can I do this with an SQL query?
PS: I cannot change anything in the database, I can only read data.

You can identify the jobs using a correlated subquery to get the next end record. The following shows how to get these fields:
select l.*, lend.*
from (select l.*,
             (select min(l2.date)
              from logs l2
              where l2.core = l.core and
                    l2.message like '% End' and   -- substitute your actual end message, e.g. 'Execution completed'
                    l2.date > l.date
             ) as end_date
      from logs l
      where l.message like '% Start'              -- substitute your actual start message, e.g. 'Calling Execute%'
     ) l join
     logs lend
     on lend.core = l.core and lend.date = l.end_date;
This assumes that the date/time values are unique for a given "core".
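From there, the average is one aggregation away. A minimal sketch of the follow-up step, assuming SQL Server's DATEDIFF (substitute your RDBMS's date arithmetic) and the literal start/end messages from the example:

SELECT AVG(DATEDIFF(second, j.start_date, j.end_date)) AS avg_runtime_seconds
FROM (SELECT l.date AS start_date,
             (SELECT MIN(l2.date)
              FROM logs l2
              WHERE l2.core = l.core
                AND l2.message = 'Execution completed'
                AND l2.date > l.date
             ) AS end_date
      FROM logs l
      WHERE l.message LIKE 'Calling Execute%'
     ) j
WHERE j.end_date IS NOT NULL;  -- skip jobs that never logged an end entry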

Related

Build a query that returns only sessions that have only errors?

I have a table with session event names. Each session can have 3 different types of events.
Some sessions have only error-type events, and I need to identify them by getting a list of those sessions.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions whose only type is 'Errors'. I don't want to include sessions that have additional types of events in the count.
While running the first query I get a count of 3 error events per session, but when running the whole procedure the number is multiplied to 90?
Sample table:

sessionID                               type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
In this case I should get:
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advise - hope this is clearer now, thanks!
It's been a long time but I think something like this should get you the desired results:
SELECT SessionId
FROM <TableName> -- replace with actual table name
GROUP BY SessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'Errors' THEN 1 END)
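The CASE expression yields NULL for non-error rows, and COUNT ignores NULLs, so the HAVING condition only holds when every row of a session is an error event. An equivalent sketch against the table from the question, phrased as "no non-error events exist":

SELECT SessionId
FROM [CMstg].SessionEvents
GROUP BY SessionId
HAVING MAX(CASE WHEN type <> 'Errors' THEN 1 ELSE 0 END) = 0;  -- 0 means no non-error rows

For the sample data above, only fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb satisfies either condition, since the other session has a NonError row.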
And a pro tip: When asking sql-server questions, it's best to follow these guidelines
SELECT *
FROM NameOfDataBase
WHERE type != 'errors'
Is that what you wanted to do?

Insert query failed in Vertica with ERROR code 4534 when triggered from RStudio

I am executing an insert query on a Vertica DB, and it works fine when triggered from a SQL client (SQuirreL). But when I trigger the same query from RStudio, it returns the following error:
Error in .local(conn, statement, ...) : execute JDBC update query
failed in dbSendUpdate ([Vertica]VJDBC ERROR: Receive on
v_default_node0005: Message receipt from v_default_node0008 failed [])
The SQL query looks somewhat like this:
insert into SCHEMA1.TEMP_NEW(
SELECT C.PROGRAM_GROUP_ID,
C.POPULATION_ID,
C.PROGRAM_ID,
C.FULLY_QUALIFIED_NAME,
C.STATE,
C.DATA_POINT_TYPE,
C.SOURCE_TYPE,
B.SOURCE_DATA_PARTITION_ID AS DATA_PARTITION_ID,
C.PRIMARY_CODE_PRIMARY_DISPLAY,
C.PRIMARY_CODE_ID,
C.PRIMARY_CODING_SYSTEM_ID,
C.PRIMARY_CODE_RAW_CODE_DISPLAY,
C.PRIMARY_CODE_RAW_CODE_ID,
C.PRIMARY_CODE_RAW_CODING_SYSTEM_ID,
(C.COMPONENT_QUALIFIED_NAME)||('/2') AS SPLIT_PART,
Count(*) AS RECORD_COUNT
from (SELECT DPL.PROGRAM_GROUP_ID,
DPL.POPULATION_ID,
DPL.PROGRAM_ID,
DPL.FULLY_QUALIFIED_NAME,
'MET' AS STATE,
DPL.DATA_POINT_TYPE,
DPL.IDENTIFIER_SOURCE_TYPE AS SOURCE_TYPE,
DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID AS DATA_PARTITION_ID,
DPL.PRIMARY_CODE_PRIMARY_DISPLAY,
DPL.PRIMARY_CODE_ID,
DPL.PRIMARY_CODING_SYSTEM_ID,
DPL.PRIMARY_CODE_RAW_CODE_DISPLAY,
DPL.PRIMARY_CODE_RAW_CODE_ID,
DPL.PRIMARY_CODE_RAW_CODING_SYSTEM_ID,
DPL.supporting_data_point_lite_id,
DPL.COMPONENT_QUALIFIED_NAME,
COUNT(*) AS RECORD_COUNT
FROM SCHEMA2.TABLE1 DPL
WHERE DPL.DATA_POINT_TYPE <> 'PREFERRED_DEMOGRAPHICS'
AND DPL.DATA_POINT_TYPE <> 'PERSON_DEMOGRAPHICS'
AND DPL.DATA_POINT_TYPE <> 'CALCULATED_RISK_SCORE'
AND DPL.DATA_POINT_TYPE <> '_NOT_RECOGNIZED'
AND DPL.POPULATION_ID NOT ILIKE '%ARCHIVE%'
AND DPL.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
AND DPL.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>'
AND PROGRAM_GROUP_ID IS NOT NULL
AND DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID IS NULL
AND DPL.PRIMARY_CODE_RAW_CODE_ID IS NOT NULL
AND DPL.PRIMARY_CODE_ID IS NOT NULL
AND EXISTS (SELECT 1
FROM SCHEMA2.TABLE2 MO
WHERE MO.STATE = 'MET'
AND MO.POPULATION_ID NOT ILIKE '%ARCHIVE%'
AND MO.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
AND DPL.PROGRAM_GROUP_ID = MO.PROGRAM_GROUP_ID
AND DPL.PROGRAM_ID = MO.PROGRAM_ID
AND DPL.FULLY_QUALIFIED_NAME = MO.FULLY_QUALIFIED_NAME
AND DPL.OUTCOME_SEQUENCE = MO.MEASURE_OUTCOME_SEQ
AND MO.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>')
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) AS C
Left Join
(SELECT DISTINCT SOURCE_DATA_PARTITION_ID,
supporting_data_point_lite_id
FROM SCHEMA2.TABLE3 DPI
where DPI.SOURCE_DATA_PARTITION_ID is not null
AND EXISTS (SELECT 1
FROM (SELECT DPL.supporting_data_point_lite_id
FROM SCHEMA2.TABLE1 DPL
WHERE DPL.DATA_POINT_TYPE <> 'PREFERRED_DEMOGRAPHICS'
AND DPL.DATA_POINT_TYPE <> 'PERSON_DEMOGRAPHICS'
AND DPL.DATA_POINT_TYPE <> 'CALCULATED_RISK_SCORE'
AND DPL.DATA_POINT_TYPE <> '_NOT_RECOGNIZED'
AND DPL.POPULATION_ID NOT ILIKE '%ARCHIVE%'
AND DPL.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
AND DPL.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>'
AND PROGRAM_GROUP_ID IS NOT NULL
AND DPL.IDENTIFIER_SOURCE_DATA_PARTITION_ID IS NULL
AND DPL.PRIMARY_CODE_RAW_CODE_ID IS NOT NULL
AND DPL.PRIMARY_CODE_ID IS NOT NULL
AND EXISTS (SELECT 1
FROM SCHEMA2.TABLE2 MO
WHERE MO.STATE = 'MET'
AND MO.POPULATION_ID NOT ILIKE '%ARCHIVE%'
AND MO.POPULATION_ID NOT ILIKE '%SNAPSHOT%'
AND DPL.PROGRAM_GROUP_ID = MO.PROGRAM_GROUP_ID
AND DPL.PROGRAM_ID = MO.PROGRAM_ID
AND DPL.FULLY_QUALIFIED_NAME = MO.FULLY_QUALIFIED_NAME
AND DPL.OUTCOME_SEQUENCE = MO.MEASURE_OUTCOME_SEQ
AND MO.PROGRAM_GROUP_ID = '<PROGRAM_GROUP_ID>')) SDP
WHERE DPI.supporting_data_point_lite_id = SDP.supporting_data_point_lite_id)) AS B
on C.supporting_data_point_lite_id = B.supporting_data_point_lite_id
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
Only the schema name and the table names have been replaced; all other details are the same.
Can someone please help me fix this error?
This error means some node-to-node communication that happened during the processing of your query failed for some reason.
There are many possible reasons this could happen. Sometimes a poor network or other environmental issues can cause it; if v_default_node0008 was taken down while this query was running, for example, you may see this message. Other times it can be the sign of a Vertica bug, in which case you'd have to take it up with support and/or your administrator.
Normally when a query plan is executing, the control flow happens from the bottom up. At the lowest levels of the plan, various scan(s) read from projections, and when there's no data left to feed to operators above the scan(s), they stop, which causes their neighboring operators to stop, until ultimately the root operator stops and the query finishes.
Occasionally, there is a need to end the query in a top-down fashion. When you have many nodes, each passing data between multiple threads in service of your query, it can be tricky for Vertica to tear everything down atomically in a deterministic fashion. If a thread sending data stops before the thread receiving data expected it to (because the receiver hasn't yet realized the plan is being stopped), it may log this error message. Usually when that happens it is innocuous; you'll see it in vertica.log, but it doesn't bubble all the way up to the application. If one of these errors is making its way to the application, then it is probably a Vertica bug.
So when can this happen?
One common scenario is when you have a LIMIT clause. The different scans producing rows on different nodes can't coordinate directly, so they have to be told by operators higher up in the plan when the limit has been reached.
It also happens when a query gets canceled. Cancellation can happen for many reasons: at the request of the application, from the DBA running interrupt_statement on your query, or via resource pool policy, for example when the query exceeds the RUNTIMECAP configured for your resource pool.
There may be others too, but those are the most common cases. It won't always be obvious that either limits or cancels are happening to you. The query may be rewritten to include a limit at various stages, and the application and/or the DBA's policy may be affecting things under the covers.
While this doesn't directly solve your problem, it hopefully gives you some additional context and ideas for further troubleshooting. The problem is likely going to be very specific to your use case, environment and data, and could be a bug. If you can't make progress I'd suggest taking it to Vertica support, since they will be more equipped to help you make sense of this further.
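As a concrete first check, you can see whether a resource pool RUNTIMECAP might be cancelling the query. A sketch against Vertica's system catalog, assuming you have read access to it:

-- List resource pools and their runtime caps; NULL means no cap is set
SELECT name, runtimecap
FROM v_catalog.resource_pools;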

SQLite query WHERE with OUTER JOIN

I am a bit rusty with my SQL and am running into a little issue with a query. In our application we have two relative tables to this problem. There are entries, and for each entry there are N steps.
We are trying to optimize our querying, so instead of asking for all entries all the time, we just ask for entries that were updated after we last checked. There can be a lot of steps, so this query is just supposed to return the entries and some step summary data, and we can separately query for steps if needed.
The entry start time and updated time are calculated from the first and most recent process step time respectively. We also have to group together entry statuses.
Here's the query as we build it in python, since it seems easier to read:
statement = 'SELECT e.serial_number, ' + \
            'e.description, ' + \
            'min(p.start_time) begin_time, ' + \
            'group_concat(p.status) status, ' + \
            'max(p.last_updated) last_updated ' + \
            'FROM entries e ' + \
            'LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number'
# if the user provides a "since" date, only return entries updated after
# that date
if since is not None:
    statement += ' WHERE last_updated > "{0}"'.format(since)
statement += ' GROUP BY e.serial_number'
The issue we are having is that if we apply that WHERE clause, it filters the process steps too. So for example if we have this situation with two entries:
Entry: 123 foo
Steps:
1. start time 10:00, updated 10:30, status completed
2. start time 11:00, updated 11:30, status completed
3. start time 12:00, updated 12:30, status failed
4. start time 13:00, updated 13:30, status in_progress
Entry: 321 bar
Steps:
1. start time 01:00, updated 01:30, status completed
2. start time 02:00, updated 02:30, status completed
If we query without the where, we would get all entries. So for this case it would return:
321, bar, 01:00, "completed,completed", 02:30
123, foo, 10:00, "completed,completed,failed,in_progress", 13:30
If I passed a since time of 12:15, then it would only return this:
123, foo, 12:00, "failed,in_progress", 13:30
In that result, the start time comes from step 3, and the statuses are only from steps 3 and 4. What I'm looking for is the whole entry:
123, foo, 10:00, "completed,completed,failed,in_progress", 13:30
So basically, I want to filter the final results based on that last_updated value, but it is currently filtering the join results as well, which throws off the begin_time, last_updated and status values since they are calculated with a partial set of steps. Any ideas how to modify the query to get what I want here?
Edit:
It seems like there might be some naming issues here too. The names I used in the example code are equal or similar to what we actually have in our code. If we change max(p.last_updated) last_updated to max(p.last_updated) max_last_updated, and change the WHERE clause to use max_last_updated as well, we get OperationalError: misuse of aggregate: max(). We have also tried adding AS in there, with no difference.
Create a subquery that selects updated processes first:
SELECT whatever you need FROM entries e
LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number
WHERE e.serial_number IN (SELECT DISTINCT serial_number FROM process_steps
                          WHERE last_updated > "date here")
GROUP BY e.serial_number
You can do this with a having clause:
SELECT . . .
FROM entries e LEFT JOIN
process_steps ps
ON e.serial_number = ps.serial_number
GROUP BY e.serial_number
HAVING MAX(ps.last_updated) > <your value here>;
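Moving the filter into HAVING also sidesteps the misuse of aggregate error from the edit: an aggregate like max() cannot be referenced in WHERE, only in HAVING. A sketch of the assembled query using the question's own columns, with the since value hard-coded for illustration (in the Python builder, prefer a bound parameter over string formatting):

SELECT e.serial_number,
       e.description,
       MIN(p.start_time) AS begin_time,
       GROUP_CONCAT(p.status) AS status,
       MAX(p.last_updated) AS last_updated
FROM entries e
LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number
GROUP BY e.serial_number
HAVING MAX(p.last_updated) > '12:15';  -- only entries updated after the since time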

Google BigQuery Internal Error

Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and Bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
  CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
  COUNT(UNIQUE(msisdn_token)) AS users,
  (SUM(up_link_data_bytes) + SUM(down_link_data_bytes)) / 1000000 AS tot_data_mb
FROM (
  SELECT
    request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
  FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_],
                         TIMESTAMP('20160101'), TIMESTAMP('20160127')))
  WHERE SUBSTR(http_status_code, 1, 1) IN ('1', '2', '3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON a.request_domain = d.request_domain
WHERE DATE(timestamp) >= '2016-01-01'
  AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY 1
Is there something I'm doing wrong?
The problem seems to be coming from UNIQUE() - it returns a repeated field with too many elements in it. The error message could be improved, but as a workaround you can use an explicit GROUP BY and then run COUNT on top of it.
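A sketch of that workaround in the same legacy SQL dialect: reduce each (category, user) pair to a single row first, then count rows per category. The restructuring below is an assumption based on the query in the question and may need adjusting:

SELECT category_grouped_cleaned,
       COUNT(*) AS users,
       SUM(user_mb) AS tot_data_mb
FROM (
  SELECT
    CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
    a.msisdn_token,
    (SUM(a.up_link_data_bytes) + SUM(a.down_link_data_bytes)) / 1000000 AS user_mb
  FROM (
    SELECT request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token
    FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_],
                           TIMESTAMP('20160101'), TIMESTAMP('20160127')))
    WHERE SUBSTR(http_status_code, 1, 1) IN ('1', '2', '3')) a
  LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
  ON a.request_domain = d.request_domain
  GROUP EACH BY 1, 2)
GROUP EACH BY 1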
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.

Unable to get correct EndTime for the next Stage

I am trying to work out how long an activity has been "InProgress" based on the history data I have. Each history record contains the StartTime and the "Stage" of an activity.
Stages flow like this:
Ready
InProgress
Completed
Also there is a stage named "OnHold" which puts an activity on hold. While calculating how long an activity has been "InProgress", I need to subtract the amount of time it was "OnHold".
In the given example you will see that the activity named "MA50665" went "InProgress" at "2014-07-17 13:08:04.013" and was then put on hold at "2014-07-17 13:12:14.473", which is roughly 4 minutes later. It then went "InProgress" again at "2014-07-17 13:22:45.503" and was completed at around "2014-07-17 13:33:38.513", roughly 11 minutes later. That means MA50665 was InProgress for about 11+4=15 minutes.
I have a query which gets me close to what I am looking for. It gives me the two records for "MA50665" that I am expecting, but the EndTime for both records comes out as "2014-07-17 13:33:38.513", which is incorrect.
For start time "2014-07-17 13:08:04.013", the EndTime should have been "2014-07-17 13:12:14.473", because that is when the "InProgress" stage ends. For the second row, StartTime and EndTime are correct.
How do I tell the query to get the end time for the stage from the next history row of that activity? I cannot hard-code "+1" into the join.
Here is the SQLFiddle with the table schema and query: http://sqlfiddle.com/#!3/37ef3/4
I think I'm seeing a duplicate row in your example that you say works but has the "+1" in it: records 5 & 6 seem to be the same but have different end times. Assuming that you are correct, here is a fix for the query:
SELECT ROW_NUMBER() OVER (ORDER BY T1.SeqId, T1.HistoryId) Rnumber,
       T1.SRId,
       T1.ActivityId,
       T1.SeqId,
       T1.CurrentActStatus,
       T1.PreviousActStatus,
       T1.TimeChanged StatusStartTime,
       EndTimeHist.TimeChanged StatusEndTime
FROM T1
LEFT JOIN T1 EndTimeHist ON T1.SRId = EndTimeHist.SRId
                        AND T1.ActivityId = EndTimeHist.ActivityId
                        AND T1.CurrentActStatus = EndTimeHist.PreviousActStatus
WHERE T1.SRId = 'SR50660'
  AND T1.CurrentActStatus = 'InProgress'
  AND T1.PreviousActStatus = 'Ready'
  AND T1.HistoryId < EndTimeHist.HistoryId -- replaces the hard-coded +1; only requires that later rows get higher history ids, not consecutive ones
ORDER BY T1.SRId DESC, T1.SeqId, T1.HistoryId
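If the server supports window functions (SQL Server 2012+; the fiddle targets SQL Server), a LEAD()-based sketch avoids the self-join and takes the end time directly from the next history row of the same activity. Column names follow the fiddle and may need adjusting:

WITH Ordered AS (
    SELECT SRId,
           ActivityId,
           SeqId,
           CurrentActStatus,
           PreviousActStatus,
           TimeChanged AS StatusStartTime,
           -- end time = TimeChanged of the next history row for this activity
           LEAD(TimeChanged) OVER (PARTITION BY SRId, ActivityId
                                   ORDER BY TimeChanged) AS StatusEndTime
    FROM T1
)
SELECT *
FROM Ordered
WHERE SRId = 'SR50660'
  AND CurrentActStatus = 'InProgress';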