I have an operation that, if it fails, retries 5 times and then gives up, resulting in the following log table:
LogId  OpId  Message
1      4     Retry 1...Failed
2      4     Retry 2...Failed
3      4     Retry 3...Failed
4      4     Retry 4...Failed
5      4     Retry 5...Failed
6      4     Max Retries exceeded - giving up
Sometimes it will succeed after a retry, which means I'll never see the "Max Retries exceeded - giving up" entry for that OpId.
And that is what I am trying to identify: operations that were forced into retries (e.g. there is a "Retry X..." entry), but that have no "Max Retries exceeded - giving up" entry because a retry eventually succeeded.
I tried using window functions, and I think that might be the way to go, but I am not sure how to actually identify what I want.
P.S. Added auto-incrementing field per #GMB
For this dataset, you might be able to just use aggregation:
select opId
from mytable
group by opId
having
    max(case when message like 'Retry%' then 1 end) = 1
    and max(case when message = 'Max Retries exceeded - giving up' then 1 end) is null
This gives you the list of opId for which at least one message starts with 'Retry' and that have no message equal to 'Max Retries exceeded - giving up'.
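Since the question mentions window functions: the same test can also be written with windowed aggregates. This is just a sketch against the same hypothetical mytable, flagging every row with its opId's retry/give-up status and then filtering:

select distinct opId
from (
    select opId,
           max(case when message like 'Retry%' then 1 else 0 end)
               over (partition by opId) as has_retry,
           max(case when message = 'Max Retries exceeded - giving up' then 1 else 0 end)
               over (partition by opId) as gave_up
    from mytable
) t
where has_retry = 1 and gave_up = 0;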
Anything that goes into retries will have a "Retry 1...Failed" entry, so (assuming opid is different for each set) a self join would probably work.
SELECT t1.opId
     , CASE WHEN tGU.opId IS NULL THEN 'Eventually Succeeded' ELSE 'Gave Up' END AS final
FROM theTable AS t1
LEFT JOIN theTable AS tGU
       ON t1.opId = tGU.opId
      AND tGU.Message = 'Max Retries exceeded - giving up'
WHERE t1.Message = 'Retry 1...Failed'
If you just want ops that eventually succeeded, you can omit the CASE WHEN stuff (I really just meant it as an example) and just add AND tGU.opId IS NULL to the WHERE clause, as shown below.
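A sketch of that simplified form (same hypothetical theTable as above):

SELECT t1.opId
FROM theTable AS t1
LEFT JOIN theTable AS tGU
       ON t1.opId = tGU.opId
      AND tGU.Message = 'Max Retries exceeded - giving up'
WHERE t1.Message = 'Retry 1...Failed'
  AND tGU.opId IS NULL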
However, and I don't think there is really a way around this, ops currently retrying will be considered "eventually successful". (Due to the nature of the data, you cannot really know "eventually succeeded"; only "didn't or hasn't yet given up".)
Also, perhaps it is a wording thing, but what if "Retry 1" succeeds? (Or does "Retry 1...Failed" really intend to mean something like "Attempt 1 failed, retrying"?)
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=307881806d4da72e5f37c080e419e00b
Given a table that looks something like
CREATE TABLE dbo.so_60047052(OpId int, Message varchar(50));

insert into dbo.so_60047052
SELECT *
FROM
(
    VALUES
        (4, 'Retry 1...Failed')
      , (4, 'Retry 2...Failed')
      , (4, 'Retry 3...Failed')
      , (4, 'Retry 4...Failed')
      , (4, 'Retry 5...Failed')
      , (4, 'Max Retries exceeded - giving up')
      -- Some failure but not all
      , (5, 'Retry 1...Failed')
      , (6, 'Retry 1...Failed')
      , (6, 'Retry 2...Failed')
      , (8, 'Retry 1...Failed')
      , (8, 'Retry 2...Failed')
      , (8, 'Retry 3...Failed')
      , (8, 'Retry 4...Failed')
) D (OpId, Message);
You can attack it a few different ways:
-- Show me anything that got into a terminal status
SELECT
    D.OpId
  , D.Message
FROM
    dbo.so_60047052 AS D
WHERE
    D.Message = 'Max Retries exceeded - giving up';
-- Show me the "last" failing message where it didn't hit max retries
-- Last is in quotes as it's only last because the text sorts that way
SELECT
    D.OpId
  , D.Message
FROM
    dbo.so_60047052 AS D
WHERE
    NOT EXISTS
    (
        SELECT *
        FROM dbo.so_60047052 AS DI
        WHERE DI.Message = 'Max Retries exceeded - giving up'
          AND DI.OpId = D.OpId
    )
    AND D.Message =
    (
        SELECT MAX(DI.Message)
        FROM dbo.so_60047052 AS DI
        WHERE DI.OpId = D.OpId
    );
If you have a table that records all the OpIds, beyond just the ones that have trouble, you can then build out a set that "had no issues", "had transient issues", or "failed" based on which of those patterns each OpId matches.
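For example, a rough sketch of that classification, assuming a hypothetical dbo.AllOps table (not part of the original post) that lists every OpId:

SELECT
    A.OpId
  , CASE
        WHEN MAX(CASE WHEN D.Message = 'Max Retries exceeded - giving up' THEN 1 ELSE 0 END) = 1
            THEN 'failed'
        WHEN COUNT(D.OpId) > 0
            THEN 'had transient issues'
        ELSE 'had no issues'
    END AS bucket
FROM dbo.AllOps AS A
LEFT JOIN dbo.so_60047052 AS D
    ON D.OpId = A.OpId
GROUP BY A.OpId;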
I have a table of session event names. Each session can have 3 different types of events.
There are sessions that have only error-type events, and I need to identify them by getting a list of those sessions.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions that have only type 'Errors'. I don't want to include sessions that have additional types of events in the count.
While running the first query I get a count of 3 error events per session, but when running the whole procedure the number is multiplied to 90?
Sample table:

sessionID                             type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb  Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb  Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb  Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76  NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76  Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76  Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76  Errors
In this case I should get
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advise - hope this is clearer now, thanks!
It's been a long time but I think something like this should get you the desired results:
SELECT SessionId
FROM <TableName> -- replace with actual table name
GROUP BY SessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'Errors' THEN 1 END)
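The trick here is that COUNT(*) counts every event in the session, while the conditional COUNT counts only the 'Errors' rows, so the two are equal exactly when the session contains nothing but errors.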
And a pro tip: When asking sql-server questions, it's best to follow these guidelines
SELECT *
FROM NameOfDataBase
WHERE type != 'errors'
Is this what you wanted to do?
I have many Hive scripts (some 20-25 scripts), each containing multiple queries. I want to run these scripts using Spark so that the process can run faster: MapReduce jobs launched by Hive take a long time to execute, and running the queries through Spark should be much faster. Below is the code I have written; it works for 3-4 files, but when given many files with multiple queries it fails.
Please help me optimize it if possible.
val spark = SparkSession.builder.master("yarn").appName("my app").enableHiveSupport().getOrCreate()

// collect every .hql file in the validation-script directory
val hqlFiles = new java.io.File("/mapr/tmp/validation_script/").listFiles
  .filter(_.getName.endsWith(".hql"))
  .toList

// run each non-blank line of each file as a separate query
// (assumes one statement per line, as in the file shown below)
hqlFiles.foreach { file =>
  scala.io.Source.fromFile(file).getLines()
    .filterNot(_.trim.isEmpty)
    .foreach(query => spark.sql(query))
}
Some of the errors I am getting are like:
ERROR SparkSubmit: Job aborted.
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 12 (sql at validationtest.scala:67) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344) at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:528)
I get many different types of errors when running the same code multiple times.
Below is how one of the HQL files looks. Its name is xyz.hql and it contains:
drop table pontis_analyst.daydiff_log_sms_distribution
create table pontis_analyst.daydiff_log_sms_distribution as select round(datediff(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),cast(subscriberActivationDate as date))/7,4) as daydiff,subscriberkey as key from pontis_analytics.prepaidsubscriptionauditlog
drop table pontis_analyst.weekly_sms_usage_distribution
create table pontis_analyst.weekly_sms_usage_distribution as select sum(event_count_ge) as eventsum,subscriber_key from pontis_analytics.factadhprepaidsubscriptionsmsevent where effective_date_ge_prt < date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) - 1 ) and effective_date_ge_prt >= date_sub(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),84) group by subscriber_key;
drop table pontis_analyst.daydiff_sms_distribution
create table pontis_analyst.daydiff_sms_distribution as select day.daydiff,sms.subscriber_key,sms.eventsum from pontis_analyst.daydiff_log_sms_distribution day inner join pontis_analyst.weekly_sms_usage_distribution sms on day.key=sms.subscriber_key
drop table pontis_analyst.weekly_sms_usage_final_distribution
create table pontis_analyst.weekly_sms_usage_final_distribution as select spp.subscriberkey as key, case when spp.tenure < 3 then round((lb.eventsum )/dayDiff,4) when spp.tenure >= 3 then round(lb.eventsum/12,4)end as result from pontis_analyst.daydiff_sms_distribution lb inner join pontis_analytics.prepaidsubscriptionsubscriberprofilepanel spp on spp.subscriberkey = lb.subscriber_key
INSERT INTO TABLE pontis_analyst.validatedfinalResult select 'prepaidsubscriptionsubscriberprofilepanel' as fileName, 'average_weekly_sms_last_12_weeks' as attributeName, tbl1_1.isEqual as isEqual, tbl1_1.isEqualCount as isEqualCount, tbl1_2.countAll as countAll, (tbl1_1.isEqualCount/tbl1_2.countAll)* 100 as percentage from (select tbl1_0.isEqual as isEqual, count(isEqual) as isEqualCount from (select case when round(aal.result) = round(srctbl.average_weekly_sms_last_12_weeks) then 1 when aal.result is null then 1 when aal.result = 'NULL' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result = '' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks is null then 1 else 0 end as isEqual from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel srctbl left join pontis_analyst.weekly_sms_usage_final_distribution aal on srctbl.subscriberkey = aal.key) tbl1_0 group by tbl1_0.isEqual) tbl1_1 inner join (select count(*) as countAll from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel) tbl1_2 on 1=1
Your issue is that your code is running out of memory, as shown below:
failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344)
What you are trying to do is not the optimal way of doing things in Spark, but I would recommend that you remove the memory serialization, as it will not help in any way. You should cache data only if it is going to be used in multiple transformations; if it is used once, there is no reason to put the data in cache.
I'm trying to port Delayed Job to Haskell and am unable to understand the WHERE clause in the query that DJ fires to poll the next job:
UPDATE "delayed_jobs"
SET locked_at = '2017-07-18 03:33:51.729884',
locked_by = 'delayed_job.0 host:myhostname pid:21995'
WHERE id IN (
SELECT id FROM "delayed_jobs"
WHERE
(
(
run_at <= '2017-07-18 03:33:51.729457'
AND (locked_at IS NULL OR locked_at < '2017-07-17 23:33:51.729488')
OR locked_by = 'delayed_job.0 host:myhostname pid:21995'
)
AND failed_at IS NULL
) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
The structure of the WHERE clause is the following:
(run_at_condition AND locked_at_condition OR locked_by_condition)
AND failed_at_condition
Is there a set of inner parentheses missing in run_at_condition AND locked_at_condition OR locked_by_condition? In what precedence are the AND/OR clauses evaluated?
What is the purpose of the locked_by_condition where it seems to be picking up jobs that have already been locked by the current DJ process?!
The statement is probably fine. The context of the whole statement is to take the lock on the highest-priority job by setting its locked_at/locked_by fields.
The where condition is saying something like: "if run_at is sooner than now (it's due), AND it's either not locked or it was locked more than four hours ago... alternatively, that's all overridden if it was me that locked it, and of course, if it hasn't failed, THEN lock it." So if I'm reading it correctly, it looks like it's running things that are ready to run, but with a timeout so that jobs can't be locked out forever.
To your second question, AND has a higher precedence than OR:
SELECT 'yes' WHERE false AND false OR true; -- 'yes', 1 row
SELECT 'yes' WHERE (false AND false) OR true; -- 'yes', 1 row
SELECT 'yes' WHERE false AND (false OR true); -- 0 rows
The first two statements mean the same thing; the third one is different.
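Applied to the query in the question, the inner WHERE is therefore evaluated as if it were parenthesized like this (no names or values changed, only parentheses added):

WHERE (
        (
            run_at <= '2017-07-18 03:33:51.729457'
            AND (locked_at IS NULL OR locked_at < '2017-07-17 23:33:51.729488')
        )
        OR locked_by = 'delayed_job.0 host:myhostname pid:21995'
      )
  AND failed_at IS NULL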
The second point may just be a rough sort of ownership system? If the current process is the one that locked something, it should be able to override that lock.
Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and Bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
COUNT(UNIQUE(msisdn_token)) AS users,
(SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
SELECT
request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
WHERE SUBSTR(http_status_code,1,1) IN ('1', '2', '3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
a.request_domain = d.request_domain
WHERE
DATE(timestamp) >= '2016-01-01'
AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
1
Is there something I'm doing wrong?
The problem seems to be coming from UNIQUE() - it returns a repeated field with too many elements in it. The error message could be improved, but a workaround for you would be to use an explicit GROUP BY and then run COUNT on top of it.
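A sketch of that workaround in the same legacy BigQuery SQL dialect, showing just the distinct-user count (the inner SELECT stands in for the joined and filtered set from the original query):

SELECT category_grouped_cleaned, COUNT(msisdn_token) AS users
FROM (
  SELECT category_grouped_cleaned, msisdn_token
  FROM ... -- the same joined/filtered tables as in the original query
  GROUP EACH BY 1, 2
)
GROUP BY category_grouped_cleaned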
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.
CREATE TABLE IntegrationLog (
IntegrationLogID INT IDENTITY(1,1) NOT NULL,
RecordID INT NOT NULL,
SyncDate DATETIME NOT NULL,
Success BIT NOT NULL,
ErrorMessage VARCHAR(MAX) NULL,
PreviousError BIT NOT NULL --last sync attempt for record failed for syncdate
)
My goal here is to return every recordid and errormessage that has not been followed by a complete success; that is, exclude a row when, for that recordid, there was a (Success = 1 AND PreviousError = 0) entry that occurred after the last time this error happened. For each such recordid, I also want to know whether there has ever been a success (partial or otherwise).
Or in other words, I want to see errors and the record they occurred on that haven't been fixed since the error occurred. I also want to know whether I have ever had a success for the particular recordid.
This works, but I am curious if there is a better way to do this?
SELECT Errors.RecordID ,
       Errors.ErrorMessage ,
       CASE WHEN PartialSuccess.RecordID IS NOT NULL THEN 1
            ELSE NULL
       END AS Resolved
FROM ( SELECT errors.RecordID ,
              errors.ErrorMessage ,
              MAX(errors.SyncDate) AS SyncDate
       FROM dbo.IntegrationLog AS errors
       WHERE errors.Success = 0
       GROUP BY errors.RecordID ,
                errors.ErrorMessage
     ) AS Errors
LEFT JOIN dbo.IntegrationLog AS FullSuccess
       ON FullSuccess.RecordID = Errors.RecordID
      AND FullSuccess.Success = 1
      AND FullSuccess.PreviousError = 0
      AND FullSuccess.SyncDate > Errors.SyncDate
LEFT JOIN ( SELECT partialSuccess.RecordID
            FROM dbo.IntegrationLog AS partialSuccess
            WHERE partialSuccess.Success = 1
            GROUP BY partialSuccess.RecordID
          ) AS PartialSuccess
       ON Errors.RecordID = PartialSuccess.RecordID
WHERE FullSuccess.RecordID IS NULL
I also created a pastebin with a few different ways I saw of structuring the query. http://pastebin.com/FtNv8Tqw
Is there another option as well?
If it helps, background for the project is that I am trying to sync records that have been updated since their last successful sync ( Partial or Full ) and log the attempts. A batch of records is identified to be synced. Each record attempt is logged. If it failed, depending on the error it might be possible try to massage the data and attempt again. For this 'job', the time we collected the records is used as the SyncDate. So for a given SyncDate, we might have records that successfully synced on the first try, records we gave up on the first attempt, records we massaged and were able to sync, etc. Each attempt is logged.
Does it change anything if, instead of wanting to know whether any success has occurred for that recordid, I wish to identify whether a partial success has occurred since the last error occurrence?
Thank You! Suggestions on my framing of the question are welcome as well.
You should probably look at the query plan to see where most of the time is being spent, and index appropriately.
That said, one thing you can try is to use the window function ROW_NUMBER instead of MAX.
WITH cte
AS (SELECT errors.recordid,
           errors.errormessage,
           CASE
               WHEN partialsuccess.recordid IS NOT NULL THEN 1
               ELSE NULL
           END AS resolved,
           ROW_NUMBER() OVER (PARTITION BY errors.recordid
                              ORDER BY errors.syncdate DESC) AS rn
    FROM dbo.integrationlog AS errors
    LEFT JOIN dbo.integrationlog AS fullsuccess
           ON fullsuccess.recordid = errors.recordid
          AND fullsuccess.success = 1
          AND fullsuccess.previouserror = 0
          AND fullsuccess.syncdate > errors.syncdate
    LEFT JOIN (SELECT partialsuccess.recordid
               FROM dbo.integrationlog AS partialsuccess
               WHERE partialsuccess.success = 1
               GROUP BY partialsuccess.recordid) AS partialsuccess
           ON errors.recordid = partialsuccess.recordid
    WHERE errors.success = 0
      AND fullsuccess.recordid IS NULL -- mirrors WHERE FullSuccess.RecordID IS NULL in the original query
   )
SELECT recordid,
       errormessage,
       resolved
FROM cte
WHERE rn = 1