Oracle SQL: Wrong error count per error code

I am trying to run a SQL query to get the count per error code from the database. I have two tables:
sw_sms_events, which stores the transaction id and the SMS that was sent.
sw_events, which stores the transaction id and, if the transaction failed, the error reason; otherwise the reason is always "Successfully Sent TariffText".
Total error count :- select count(*) from sw_sms_events where sms_text like '%Welkom in het buitenland%'
Total error count per error reason :-
select distinct count(*) over (partition by b.reason) , b.reason
from sw_sms_events a, sw_events b
where a.transaction_id= b.transaction_id
and a.sms_text like '%Welkom in het buitenland%'
and b.reason !='Successfully Sent TariffText'
order by (count(*) over (partition by b.reason)) desc
Normally these queries give the same result, i.e. the sum of the individual error counts equals the total number of errors. But in the worst case, where the same transaction is retried multiple times, the results differ, i.e. we have multiple rows in the table with the same transaction id.
Below is one result from such a worst case:
Name | 24-07-2015
Total Number of SMSWelcome Sent | 156788
Total Number of Error SMSWelcome | 1738
Total Number of SMSWelcome Sent with null Tariffs | 286

Error Reason | Error Count
Unknown error received :BEA-380000 , ErrorMessage : BSL-99999 | 1829
Backend system not available , ErrorMessage : BSL-50002 | 641
Remote Error | 527
NativeQuery.executeQuery failed , ErrorMessage : BSL-11009 | 41
This service is available only for active products , ErrorMessage : BSL-15024 | 30
Unknown error received :BEA-382556 , ErrorMessage : BSL-99999 | 18
Customer information: Not retrieved. This action cannot continue without customer information. Please try later or contact your system administrator. , ErrorMessage : BSL-10004 | 13
OMS login failure: Problem in OMS UAMS login - Nested Exception/Error: java.net.ConnectException: Tried all: '1' addresses, but could not connect over HTTP to server: '195.233.102.177', port: '40123' , | 12
t3://195.233.102.171:30101: Bootstrap to: 195.233.102.171/195.233.102.171:30101' over: 't3' got an error or timed out , ErrorMessage : BSL-11000 | 5
getTariffsAndAddOns, status: Failure , ErrorCode : An internal error occured , ErrorMessage : BSL-14005 | 3
Authorization failed of dealer market restrictions , ErrorMessage : BSL-50005 | 2
com.amdocs.cih.exception.InvalidUsageException: The input parameter AssignedProductRef is invalid. , ErrorMessage : BSL-10004 | 1
My question is: how can I modify the current SQL so that the total error count always equals the sum of the individual error counts, even in the worst case where the same transaction appears multiple times in a table?

I don't really understand why you are using an analytical query. Isn't a simpler group by sufficient ?
select count(*), b.reason
from sw_sms_events a, sw_events b
where a.transaction_id= b.transaction_id
and a.sms_text like '%Welkom in het buitenland%'
and b.reason !='Successfully Sent TariffText'
group by b.reason
order by count(*) desc
When you say you have multiple rows in the table with the same transaction id, do you mean in the sw_events table only, or in both the sw_sms_events and sw_events tables?
If the latter, events are counted multiple times because the join produces a Cartesian product of all rows with the same transaction_id. You should use a stricter join clause.
You could also do something (quite ugly) like:
select count(distinct b.ROWID), b.reason
from sw_sms_events a, sw_events b
where a.transaction_id= b.transaction_id
and a.sms_text like '%Welkom in het buitenland%'
and b.reason !='Successfully Sent TariffText'
group by b.reason
order by count(distinct b.ROWID) desc
to ensure that each event is only counted once.
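The fan-out described above can be reproduced on a small scale. The following is a minimal sketch that uses Python's sqlite3 module as a stand-in for Oracle (SQLite also exposes a rowid); the three sample rows per table are invented for illustration, and transaction T1 is assumed to have been retried once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sw_sms_events (transaction_id TEXT, sms_text TEXT);
    CREATE TABLE sw_events (transaction_id TEXT, reason TEXT);
    -- Hypothetical data: T1 was retried, so it has two rows in each table.
    INSERT INTO sw_sms_events VALUES
        ('T1', 'Welkom in het buitenland'),
        ('T1', 'Welkom in het buitenland'),
        ('T2', 'Welkom in het buitenland');
    INSERT INTO sw_events VALUES
        ('T1', 'Remote Error'),
        ('T1', 'Remote Error'),
        ('T2', 'Remote Error');
""")

# Shared join and filter; only the aggregate in the SELECT list differs.
JOIN = """
    FROM sw_sms_events a
    JOIN sw_events b ON a.transaction_id = b.transaction_id
    WHERE a.sms_text LIKE '%Welkom in het buitenland%'
      AND b.reason != 'Successfully Sent TariffText'
    GROUP BY b.reason
"""

# COUNT(*) counts every joined pair: 2 x 2 rows for T1 plus 1 row for T2.
naive = conn.execute("SELECT b.reason, COUNT(*)" + JOIN).fetchall()

# COUNT(DISTINCT b.rowid) counts each sw_events row once.
deduped = conn.execute("SELECT b.reason, COUNT(DISTINCT b.rowid)" + JOIN).fetchall()

print(naive)
print(deduped)
```

With this data the plain count reports 5 errors while the distinct count reports 3, which is the number of rows actually present in sw_events.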

select distinct count(distinct b.ROWID) over (partition by b.reason) , b.reason
from sw_sms_events a, sw_events b
where a.transaction_id= b.transaction_id
and a.sms_text like '%Welkom in het buitenland%'
and b.reason !='Successfully Sent TariffText'
order by (count(distinct b.ROWID) over (partition by b.reason)) desc

Related

Clickhouse Cross join workaround?

I am trying to calculate the percentage of faulty transaction statuses per IP address in Clickhouse.
SELECT
c.source_ip,
COUNT(c.source_ip) AS total,
(COUNT(c.source_ip) / t.total_calls) * 100 AS percent_faulty
FROM sip_transaction_call AS c
CROSS JOIN
(
SELECT count(*) AS total_calls
FROM sip_transaction_call
) AS t
WHERE (status = 8 OR status = 9 or status = 13)
GROUP BY c.source_ip
Unfortunately Clickhouse rejects this with:
"Received exception from server (version 20.8.3):
Code: 47. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Unknown identifier: total_calls there are columns: source_ip, COUNT(source_ip)."
I tried various workarounds for the "invisible" alias, but failed. Any help would be greatly appreciated.
SELECT
source_ip,
countIf(status = 8 OR status = 9 or status = 13) AS failed,
failed / count() * 100 AS percent_faulty
FROM sip_transaction_call
GROUP BY source_ip
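The conditional-aggregation idea can be tried without a ClickHouse server. This is a rough sketch in Python's sqlite3, where SUM(CASE ...) stands in for ClickHouse's countIf (SQLite has no countIf); the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sip_transaction_call (source_ip TEXT, status INTEGER)")
# Hypothetical calls: statuses 8, 9 and 13 are treated as faulty, as in the question.
conn.executemany(
    "INSERT INTO sip_transaction_call VALUES (?, ?)",
    [
        ("10.0.0.1", 8), ("10.0.0.1", 1), ("10.0.0.1", 13), ("10.0.0.1", 1),
        ("10.0.0.2", 1), ("10.0.0.2", 1), ("10.0.0.2", 1), ("10.0.0.2", 9),
    ],
)

# One pass over the table: count faulty rows and all rows per IP in the same GROUP BY.
rows = conn.execute("""
    SELECT source_ip,
           SUM(CASE WHEN status IN (8, 9, 13) THEN 1 ELSE 0 END) AS failed,
           100.0 * SUM(CASE WHEN status IN (8, 9, 13) THEN 1 ELSE 0 END) / COUNT(*)
               AS percent_faulty
    FROM sip_transaction_call
    GROUP BY source_ip
    ORDER BY source_ip
""").fetchall()

print(rows)
```

Note that this computes the faulty share of each IP's own calls, as the answer's query does, rather than the share of all calls in the table that the CROSS JOIN version was aiming for.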
If you have a GROUP BY clause, you can only select columns you are grouping by (i.e. c.source_ip); for any other column you need an aggregate function.
Clickhouse is not too helpful here - for almost any other engine you would get a more meaningful error. See https://learnsql.com/blog/not-a-group-by-expression-error/.
Anyway, change grouping to GROUP BY c.source_ip, t.total_calls to fix it.

How to check rows for continuity?

I have an operation that, if it fails, retries 5 times and then gives up, resulting in the following log table:
LogId OpId Message
1 4 Retry 1...Failed
2 4 Retry 2...Failed
3 4 Retry 3...Failed
4 4 Retry 4...Failed
5 4 Retry 5...Failed
6 4 Max Retries exceeded - giving up
Sometimes, it will succeed after retry, which means that I'll never see the Max Retries exceeded - giving up entry within that OpId.
And that is what I am trying to identify: operations that were forced to go into retries (e.g. there is a Retry X... entry), but that have no Max Retries exceeded - giving up entry because the retry succeeded at some point.
I tried using window functions, and I think that might be the way to go, but I am not sure how to actually identify what I want.
P.S. Added auto-incrementing field per @GMB
For this dataset, you might be able to just use aggregation:
select opId
from mytable
group by opId
having
max(case when message like 'Retry%' then 1 end) = 1
and max(case when message = 'Max Retries exceeded - giving up' then 1 end) is null
This gives you the list of opId for which at least one message starts with 'Retry' and that have no message equal to 'Max Retries exceeded - giving up'.
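As a quick check of the aggregation approach, here is a small sketch that runs the same HAVING logic through Python's sqlite3 module. The rows for OpIds 4, 5, 6 and 8 are the sample rows used elsewhere in this thread; OpId 7 and its 'Completed' message are hypothetical, added only to show that an operation with no retries is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (opId INTEGER, message TEXT)")

rows = [(4, f"Retry {n}...Failed") for n in range(1, 6)]
rows.append((4, "Max Retries exceeded - giving up"))
rows += [(5, "Retry 1...Failed")]
rows += [(6, f"Retry {n}...Failed") for n in range(1, 3)]
rows += [(7, "Completed")]  # hypothetical: an operation that never retried
rows += [(8, f"Retry {n}...Failed") for n in range(1, 5)]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# Keep groups with at least one retry message and no give-up message.
recovered = [
    op_id
    for (op_id,) in conn.execute("""
        SELECT opId
        FROM mytable
        GROUP BY opId
        HAVING MAX(CASE WHEN message LIKE 'Retry%' THEN 1 END) = 1
           AND MAX(CASE WHEN message = 'Max Retries exceeded - giving up' THEN 1 END) IS NULL
        ORDER BY opId
    """)
]

print(recovered)
```

OpId 4 is dropped because it has the give-up row, and OpId 7 is dropped because it has no retry row, leaving 5, 6 and 8.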
Anything that goes into retries will have a "Retry 1...Failed" entry, so (assuming opid is different for each set) a self join would probably work.
SELECT t1.opId
, CASE WHEN tGU.opId IS NULL THEN 'Eventually Succeeded' ELSE 'Gave Up' END AS final
FROM theTable AS t1
LEFT JOIN theTable AS tGU
ON t1.opId = tGU.opId
AND tGU.Message = 'Max Retries exceeded - giving up'
WHERE t1.Message = 'Retry 1...Failed'
If you just want ops that eventually succeeded, you can omit the CASE WHEN expression (I really just meant it as an example), and just add AND tGU.opId IS NULL to the WHERE clause.
However, and I don't think there is really a way around this, ops currently retrying will be considered "eventually successful". (Due to the nature of the data, you cannot really know "eventually succeeded"; only "didn't or hasn't yet given up".)
Also, perhaps it is a wording thing, but what if "Retry 1" succeeds? (Or does "Retry 1...Failed" really intend to mean something like "Attempt 1 failed, retrying"?)
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=307881806d4da72e5f37c080e419e00b
Given a table that looks something like
CREATE TABLE dbo.so_60047052(OpId int, Message varchar(50));
insert into dbo.so_60047052
SELECT *
FROM
(
VALUES
(4,'Retry 1...Failed')
, (4,'Retry 2...Failed')
, (4,'Retry 3...Failed')
, (4,'Retry 4...Failed')
, (4,'Retry 5...Failed')
, (4,'Max Retries exceeded - giving up')
-- Some failure but not all
, (5,'Retry 1...Failed')
, (6,'Retry 1...Failed')
, (6,'Retry 2...Failed')
, (8,'Retry 1...Failed')
, (8,'Retry 2...Failed')
, (8,'Retry 3...Failed')
, (8,'Retry 4...Failed')
)D(OpId, Message);
You can attack it a few different ways
-- Show me anything that got into a terminal status
SELECT
D.OpId
, D.Message
FROM
dbo.so_60047052 AS D
WHERE
D.Message = 'Max Retries exceeded - giving up';
-- Show me the "last" failing message where it didn't hit max retries
-- Last is in quotes as it's only last because the text sorts that way
SELECT
D.OpId
, D.Message
FROM
dbo.so_60047052 AS D
WHERE
NOT EXISTS
(
SELECT *
FROM dbo.so_60047052 AS DI
WHERE DI.Message = 'Max Retries exceeded - giving up'
AND DI.OpId = D.OpId
)
AND D.Message =
(
SELECT MAX(DI.Message)
FROM dbo.so_60047052 AS DI
WHERE
DI.OpId = D.OpId
);
If you have a table that records all the OpId, beyond the ones that have trouble, you can then build out a set that "had no issues", "had transient issues", "failed" based on

How to get "session duration" group by "operating system" in Firebase Bigquery SQL?

I am trying to get the "average session duration" grouped by "operating system" (device.operating_system) and "date" (event_date).
In the Firebase blog, they give this query to get the average session duration:
SELECT SUM(engagement_time) AS total_user_engagement
FROM (
SELECT user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key =
"engagement_time_msec") AS engagement_time
FROM `FIREBASE_PROJECT`
)
WHERE engagement_time > 0
GROUP BY user_pseudo_id
This query gives me the total user engagement by user ID (each row is a different user):
row|total_user_engagement
---|------------------
1 |989646
2 |225655
3 |125489
4 | 58496
...|......
But I have no idea where to add the "operating system" and "event_date" variables to get this information by OS and date. I tried different queries with no result. For example, to get this result by operating system I tried the following:
SELECT SUM(engagement_time) AS total_user_engagement
FROM (
SELECT device.operating_system,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key =
"engagement_time_msec") AS engagement_time
FROM `FIREBASE_PROJECT`
)
WHERE engagement_time > 0
GROUP BY device.operating_system
But it gives me an error message (Error: Unrecognized name: device at [9:10] ). In other queries device.operating_system is recognized.
For example, in this one:
SELECT
event_date,
device.operating_system as os_type,
device.operating_system_version as os_version,
device.mobile_brand_name as device_brand,
device.mobile_model_name as device_model,
count(distinct user_pseudo_id) as all_users
FROM `FIREBASE Project`
GROUP BY 1,2,3,4,5
What I would like to have as a result is something like this :
row|event_date|OS |total_user_engagement
---|----------------------------------------
1 |20191212 |ios |989646
2 |20191212 |android|225655
3 |20191212 |ios |125489
4 |20191212 |android| 58496
...
Thank you
The error is probably because you are referencing the variable device in the outer query, while this variable is only visible from the inner query (subquery). I believe the issue will be fixed by changing the last row of the query from GROUP BY device.operating_system
to
GROUP BY operating_system.
Hopefully this will make clearer what is happening here: the inner query is accessing the table FIREBASE_PROJECT and returning the field operating_system from the nested column device. The outer query accesses the results of the inner query, so it only sees the returned field operating_system, without information about its original context within the nested variable device. That is why trying to reference device at this level will fail.
In the other example you posted this issue does not appear, since there is only a simple query.
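The scoping rule can be illustrated outside BigQuery. The sketch below is only an analogy: SQLite has no nested device record, so a table alias e (hypothetical, like the events table and its rows) plays the role of the name that exists inside the subquery but not outside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (os TEXT, engagement_ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("ios", 100), ("android", 50), ("ios", 25)],
)

# The inner query exposes only two output names: operating_system and engagement_time.
inner = """(
    SELECT e.os AS operating_system, e.engagement_ms AS engagement_time
    FROM events AS e
)"""

# Referring to the inner alias from the outer query fails: e is not in scope there.
try:
    conn.execute(f"SELECT SUM(engagement_time) FROM {inner} GROUP BY e.operating_system")
    scope_error = None
except sqlite3.OperationalError as exc:
    scope_error = str(exc)

# Grouping by the exposed output name works, and selecting it shows the label per row.
totals = conn.execute(f"""
    SELECT operating_system, SUM(engagement_time) AS total_user_engagement
    FROM {inner}
    WHERE engagement_time > 0
    GROUP BY operating_system
    ORDER BY operating_system
""").fetchall()

print(scope_error)
print(totals)
```

The same pattern should carry over to the original query: select event_date in the inner query as well, then list both event_date and operating_system in the outer SELECT and GROUP BY.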

Output "none" - psycopg2 database query

I'm new to SQL and am currently trying to solve a problem with a data table.
I have a log table and first need to find the dates on which requests led to an error. The dates are pulled as timestamps from the log database. The status is then checked with not status = '200 OK', and the days on which more than 1% of requests led to an error should be shown, ordered by percentage descending.
My problem is that I don't get any output:
OUTPUT IN THE TERMINAL:
--
Following the dates for >1 percent requests, leading to an error:
None
None
CODE:
def number_one_error():
    """
    Percentage of errors from requests
    Counting errors and timestamps
    Output:
    number one errors
    """
    db = psycopg2.connect(database=dbname)
    c = db.cursor()
    c.execute('''
        select oneerror.date_column, round(((cast(oneerror.request_error as decimal))/requests*1.0),2) as percent
        from (select date(log.time) AS date_column,
                     count (*) as request_error
              from log where not status = '200 OK'
              group by date_column) as oneerror
        join (select date(log.time) AS date_column,
                     count(*) as requests
              from log
              group by date_column) as total
        on oneerror.date_column = total.date_column
        where round((cast(oneerror.request_error as decimal)/requests*1.0),3)> 0.01
        order by percent desc
    ''')
    number_one_error = c.fetchall()
    db.close()
THANK YOU SO MUCH!
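One possible cause, assuming the calling code (not shown in the post) prints the result of number_one_error(): the function stores the rows in a local variable but never returns them, and a Python function without a return statement evaluates to None. The sketch below isolates that behaviour; the row values are hypothetical stand-ins for what c.fetchall() would return.

```python
def without_return():
    # Hypothetical stand-in for c.fetchall(); the rows stay local to the function.
    rows = [("2016-07-17", 0.02)]


def with_return():
    # Same rows, but handed back to the caller.
    rows = [("2016-07-17", 0.02)]
    return rows


print(without_return())  # a function that falls off the end yields None
for day, percent in with_return():
    print(day, percent)
```

If that is what is happening, adding return number_one_error after db.close() and iterating over the returned rows in the caller should make the dates appear.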

Select one record with a given status from a set of duplicate records with at least this one status

I have a system that requests information by sending 3 parameters to an external system: user, start_date and end_date.
I have a table
request (
id,
user,
start_date,
end_date,
status
)
that logs these requests and their status (Done for the requests that have returned, Waiting for the requests that haven't yet returned).
Every few hours I resubmit the requests that haven't yet returned, even though the initial request could still return some time in the future.
After some time, my table will have multiple requests for the same user/start_date/end_date, some of them Waiting, some Done.
What I need is a query that returns a list of ids of all duplicate requests with the exception of 1 Done, where at least one request has status=Done.
In summary, I need a way to clear the excess requests for a given user/start_date/end_date if at least one of them has status=Done (it doesn't matter which one, I just need to keep 1 status = Done for a given user/start_date/end_date).
So far I've been able to pinpoint the duplicate requests that have at least 1 Done. To select all but one Done request from this query, I would most likely wrap the entire query in 2 more selects and do the magic, but the query as it stands is already really slow. Can someone help me refactor it and select the end result I need?
http://sqlfiddle.com/#!5/10c25a/1
I'm using SQLite
The expected result from the dataset provided in the sqlfiddle is this:
454, 457, 603, (604 or 605 not both), 607, 608
select r.id from request r inner join (
select user, start_date, end_date,
min(case when status = 'Done' then id end) as keep_id
from request
group by user, start_date, end_date
having count(case when status = 'Done' then 1 end) > 0 and count(*) > 1
) s on s.user = r.user and s.start_date = r.start_date and s.end_date = r.end_date
and s.keep_id <> r.id
What you're after are records that match these criteria:
There exists another record with Status "Done"
That other "Done" record matches user, start_date and end_date
That other record has a lower id value (because you need something to identify the record to keep) or the other record has a higher id but the record you're looking at has Status "Waiting"
With all that in mind, here's your query
SELECT id FROM request r1
WHERE EXISTS (
SELECT 1 FROM request r2
WHERE r2.Status = 'Done'
AND r1.user = r2.user
AND r1.start_date = r2.start_date
AND r1.end_date = r2.end_date
AND (r1.id > r2.id OR r1.Status = 'Waiting')
)
ORDER BY id
http://sqlfiddle.com/#!5/10c25a/26 ~ produces IDs 454, 457, 603, 605, 607 and 608
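To compare the two queries in this thread side by side, here is a small sketch using Python's sqlite3 module. The seven rows are invented for illustration (they are not the sqlfiddle dataset): user u1 has duplicates with two Done rows, u2 has only Waiting rows, and u3 has a single Done row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request (id INTEGER, user TEXT, start_date TEXT, end_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO request VALUES (?, ?, ?, ?, ?)",
    [
        (1, "u1", "2020-01-01", "2020-01-31", "Waiting"),
        (2, "u1", "2020-01-01", "2020-01-31", "Done"),
        (3, "u1", "2020-01-01", "2020-01-31", "Done"),
        (4, "u1", "2020-01-01", "2020-01-31", "Waiting"),
        (5, "u2", "2020-01-01", "2020-01-31", "Waiting"),
        (6, "u2", "2020-01-01", "2020-01-31", "Waiting"),
        (7, "u3", "2020-01-01", "2020-01-31", "Done"),
    ],
)

# Join-based query: keep the lowest-id Done row per group, return the rest.
via_join = [r[0] for r in conn.execute("""
    SELECT r.id FROM request r INNER JOIN (
        SELECT user, start_date, end_date,
               MIN(CASE WHEN status = 'Done' THEN id END) AS keep_id
        FROM request
        GROUP BY user, start_date, end_date
        HAVING COUNT(CASE WHEN status = 'Done' THEN 1 END) > 0 AND COUNT(*) > 1
    ) s ON s.user = r.user AND s.start_date = r.start_date AND s.end_date = r.end_date
       AND s.keep_id <> r.id
    ORDER BY r.id
""")]

# EXISTS-based query: a row is redundant if a matching Done row exists that
# either has a lower id or makes this Waiting row unnecessary.
via_exists = [r[0] for r in conn.execute("""
    SELECT id FROM request r1
    WHERE EXISTS (
        SELECT 1 FROM request r2
        WHERE r2.status = 'Done'
          AND r1.user = r2.user
          AND r1.start_date = r2.start_date
          AND r1.end_date = r2.end_date
          AND (r1.id > r2.id OR r1.status = 'Waiting')
    )
    ORDER BY id
""")]

print(via_join, via_exists)
```

On this data both queries flag ids 1, 3 and 4 and keep id 2; the all-Waiting group and the single Done row are left alone.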