I'm new to SQL and am currently trying to solve a data table problem.
I have a log table and first need to find the dates on which a request led to an error; the dates are pulled as timestamps from the log database. The status is then checked with where not status = '200 OK', and only the days on which more than 1% of requests led to an error should be shown (the ratio must be > 0.01), ordered by that percentage in descending order.
My problem now is that I don't get any results, only None:
OUTPUT IN THE TERMINAL:
--
Following the dates for >1 percent requests, leading to an error:
None
None
CODE:
import psycopg2

# dbname is assumed to be defined elsewhere in the script

def number_one_error():
    """
    Percentage of errors from requests:
    return the days on which more than 1% of requests led to an error,
    together with the error ratio for each day.
    """
    db = psycopg2.connect(database=dbname)
    c = db.cursor()
    c.execute('''
        select oneerror.date_column,
               -- percent is a ratio here (0.02 = 2 %), as in the original query
               round(cast(oneerror.request_error as decimal) / total.requests, 2) as percent
        from (select date(log.time) as date_column,
                     count(*) as request_error
              from log
              where not status = '200 OK'
              group by date_column) as oneerror
        join (select date(log.time) as date_column,
                     count(*) as requests
              from log
              group by date_column) as total
          on oneerror.date_column = total.date_column
        where cast(oneerror.request_error as decimal) / total.requests > 0.01
        order by percent desc
        ''')
    rows = c.fetchall()
    db.close()
    # without this return the function yields None, which is exactly
    # what the terminal output above shows
    return rows
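The None lines are what Python prints when you print the return value of a function that has no return statement. With the return added above, a minimal calling sketch could look like this (the exact print formatting is an assumption, not taken from the original script):

if __name__ == '__main__':
    print("Following the dates for >1 percent requests, leading to an error:")
    for date_column, percent in number_one_error():
        # percent is a ratio, so 0.02 corresponds to 2 %
        print("{} -- {}".format(date_column, percent))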
THANK YOU SO MUCH!
I am trying to calculate the percentage of faulty transaction statuses per IP address in Clickhouse.
SELECT
c.source_ip,
COUNT(c.source_ip) AS total,
(COUNT(c.source_ip) / t.total_calls) * 100 AS percent_faulty
FROM sip_transaction_call AS c
CROSS JOIN
(
SELECT count(*) AS total_calls
FROM sip_transaction_call
) AS t
WHERE (status = 8 OR status = 9 or status = 13)
GROUP BY c.source_ip
Unfortunately Clickhouse rejects this with:
"Received exception from server (version 20.8.3):
Code: 47. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Unknown identifier: total_calls there are columns: source_ip, COUNT(source_ip)."
I tried various workarounds for the "invisible" alias, but failed. Any help would be greatly appreciated.
SELECT
source_ip,
countIf(status = 8 OR status = 9 or status = 13) AS failed,
failed / count() * 100 AS percent_faulty
FROM sip_transaction_call
GROUP BY source_ip
If you have a GROUP BY clause, you can only select the columns you are grouping by (i.e. c.source_ip); for any other column you need an aggregate function.
Clickhouse is not too helpful here - for almost any other engine you would get a more meaningful error. See https://learnsql.com/blog/not-a-group-by-expression-error/.
Anyway, change grouping to GROUP BY c.source_ip, t.total_calls to fix it.
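For reference, a sketch of the original CROSS JOIN query with that grouping fix applied (table and column names exactly as in the question; the IN list is just shorthand for the OR chain):

SELECT
    c.source_ip,
    COUNT(c.source_ip) AS total,
    (COUNT(c.source_ip) / t.total_calls) * 100 AS percent_faulty
FROM sip_transaction_call AS c
CROSS JOIN
(
    SELECT count(*) AS total_calls
    FROM sip_transaction_call
) AS t
WHERE status IN (8, 9, 13)
GROUP BY c.source_ip, t.total_calls

Note that this version computes the percentage of faulty calls relative to all calls in the table, while the countIf variant above computes it relative to the calls of each source_ip; pick whichever denominator you actually need.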
I have a table of session event names. Each session can have 3 different types of events.
There are sessions that have only error-type events, and I need to identify them by getting a list of those sessions.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions that have only the type 'Errors'. I don't want to include sessions that have additional types of events in the count.
When I run the inner query on its own I get a count of 3 error events per session, but when I run the whole procedure the number is multiplied up to 90?
Sample table:

sessionID                              type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb   Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb   Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb   Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76   NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76   Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76   Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76   Errors
In this case I should get:
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advise - hope this is clearer now, thanks!
It's been a long time but I think something like this should get you the desired results:
SELECT SessionId
FROM <TableName> -- replace with the actual table name, e.g. [CMstg].SessionEvents
GROUP BY SessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'Errors' THEN 1 END)
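If what you need in the end is the number of such sessions rather than the list, the same idea can be wrapped in an outer count (a sketch, using the SessionId and type columns and the [CMstg].SessionEvents table from the question):

SELECT COUNT(*) AS error_only_sessions
FROM (
    SELECT SessionId
    FROM [CMstg].SessionEvents
    GROUP BY SessionId
    HAVING COUNT(*) = COUNT(CASE WHEN type = 'Errors' THEN 1 END)
) AS error_only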
And a pro tip: When asking sql-server questions, it's best to follow these guidelines
SELECT *
FROM NameOfDataBase
WHERE type != 'errors'
Is this what you wanted to do?
I am trying to make a query that will check whether a user is logged in or not.
The data is stored as two separate rows: one is written when a user logs in (CHECKEDIN = 'IND') and the other when they log out (CHECKEDIN = 'UD'). I need to find all people currently logged in but not logged out, so what I've tried is comparing two SELECT statements. This gives me the names (UNILOGIN) of all the people currently logged in and not out:
select UNILOGIN from timereg where date = CONVERT(DATE,GETDATE(),110) and CHECKEDIN = 'IND'
except
select UNILOGIN from timereg where date = CONVERT(DATE,GETDATE(),110) and CHECKEDIN = 'UD'
I then need to find the time at which each of them checked in. How would one write a statement that gets this result in one query, if that is possible at all? Something like:
SELECT TOP 1 UNILOGIN, TIME from TIMEREG where UNILOGIN = "result of query"
Tell me if I need to elaborate.
Aggregate. Get one result row per unilogin, make sure it has an 'IND' record and no 'UD' record and select the maximum login time for the date in question.
select unilogin, max(time)
from timereg
where date = convert(date, getdate(), 110)
and checkedin in ('IND', 'UD')
group by unilogin
having count(case when checkedin = 'IND' then 1 end) > 0
and count(case when checkedin = 'UD' then 1 end) = 0;
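Alternatively, the EXCEPT idea from the question can be folded into a single statement by using it as a subquery (a sketch that keeps the question's table and column names; min(TIME) returns the earliest check-in, use max(TIME) if you want the latest):

select UNILOGIN, min(TIME) as checkin_time
from timereg
where date = convert(date, getdate(), 110)
  and CHECKEDIN = 'IND'
  and UNILOGIN in (
      select UNILOGIN from timereg
      where date = convert(date, getdate(), 110) and CHECKEDIN = 'IND'
      except
      select UNILOGIN from timereg
      where date = convert(date, getdate(), 110) and CHECKEDIN = 'UD'
  )
group by UNILOGIN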
Edit: I have tidied up the query a bit. When I run it on a single day (versus the 27 I need) it completes, but with 27 days of data it is trying to process 5.67 TB. Could this be the issue?
Latest job ID of a failed run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and Bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
COUNT(UNIQUE(msisdn_token)) AS users,
(SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
SELECT
request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
WHERE SUBSTR(http_status_code,1,1) IN ('1', '2', '3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
a.request_domain = d.request_domain
WHERE
DATE(timestamp) >= '2016-01-01'
AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
1
Is there something I'm doing wrong?
The problem seems to be coming from UNIQUE() - it returns a repeated field with too many elements in it. The error message could be improved, but the workaround for you would be to use an explicit GROUP BY and then run COUNT on top of it.
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.
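A sketch of the explicit GROUP BY workaround applied to the query above (legacy SQL, with the table and column names from the question): the inner query produces one row per category and token, and the outer query then counts the tokens per category.

SELECT
  category_grouped_cleaned,
  COUNT(msisdn_token) AS users
FROM (
  SELECT
    CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
    msisdn_token
  FROM (
    SELECT request_domain, msisdn_token
    FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
    WHERE SUBSTR(http_status_code,1,1) IN ('1', '2', '3')
  ) a
  LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
  ON a.request_domain = d.request_domain
  GROUP EACH BY 1, 2
)
GROUP EACH BY 1

The byte totals (tot_data_mb) would have to be summed in the inner query and summed again in the outer one, or computed in a separate query and joined back in.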
I'm executing the following query.
SELECT properties.os, boundary, user, td,
SUM(boundary) OVER(ORDER BY rows) AS session
FROM
(
SELECT properties.os, ROW_NUMBER() OVER() AS rows, user, td,
CASE WHEN td > 1800 THEN 1 ELSE 0 END AS boundary
FROM (
SELECT properties.os, t1.properties.distinct_id AS user,
(t2.properties.time - t1.properties.time) AS td
FROM (
SELECT properties.os, properties.distinct_id, properties.time, srlno,
srlno-1 AS prev_srlno
FROM (
SELECT properties.os, properties.distinct_id, properties.time,
ROW_NUMBER()
OVER (PARTITION BY properties.distinct_id
ORDER BY properties.time) AS srlno
FROM [ziptrips.ziptrips_events]
WHERE properties.time > 1367916800
AND properties.time < 1380003200)) AS t1
JOIN (
SELECT properties.distinct_id, properties.time, srlno,
srlno-1 AS prev_srlno
FROM (
SELECT properties.distinct_id, properties.time,
ROW_NUMBER() OVER
(PARTITION BY properties.distinct_id ORDER BY properties.time) AS srlno
FROM [ziptrips.ziptrips_events]
WHERE
properties.time > 1367916800
AND properties.time < 1380003200 )) AS t2
ON t1.srlno = t2.prev_srlno
AND t1.properties.distinct_id = t2.properties.distinct_id
WHERE (t2.properties.time - t1.properties.time) > 0))
It fails the first time with the following error; however, on the 2nd run it completes without any issue. I'd appreciate any pointers on what might be causing this.
The error message is:
Query Failed
Error: Field 'properties.os' not found in table '__R2'.
Job ID: job_VWunPesUJVLxWGZsMgpoti14BM4
Thanks,
Navneet
We (the BigQuery team) are in the process of rolling out a new version of the query engine that fixes a number of issues like this one. You likely hit an old version of the query engine and then when you retried, hit the new one. It may take us a day or so with a portion of traffic pointing at the updated version in order to verify there aren't any regressions. Please let us know if you hit this again after 24 hours or so.