I'm trying to count the number of times each distinct log message appears per client.
My table has this structure:
EventTime - LogMessage - HostName - Client
This query gives me the number of logs for each client:
SELECT COUNT(LogMessage) AS Count,
       Client
FROM [test1].[dbo].[logs_test]
GROUP BY Client
How would I drill down a level and get the number of times each log message appears per client? The output I'm looking to achieve is something like the below:
LogMessage     Count   Client
NON ATTEMPT    12      TestClient
Appreciate any help
You will need to change what you are counting and add another level to your grouping...
SELECT LogMessage,
       COUNT(EventTime) AS Count,
       Client
FROM [test1].[dbo].[logs_test]
GROUP BY Client, LogMessage
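A side note: COUNT(EventTime) counts only rows where EventTime is not NULL. If every row is guaranteed a timestamp, COUNT(*) gives the same result and reads a little more directly:

SELECT LogMessage,
       COUNT(*) AS Count,
       Client
FROM [test1].[dbo].[logs_test]
GROUP BY Client, LogMessage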
For AWS ALB access logs (https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html), I would like an example Athena SQL query that sorts, ascending or descending, by the count of the client:port field for a given elb_status_code/target_status_code between a start and end date (DD-MM-YYYY HH-MM).
The result of the query for target_status_code=500 should look like:
client:port         count of target_status_code=500
70.132.2.XX:port    2570
70.132.2.XX:port    2315
80.122.1.XX:port    1750
...
The point is to find the top client:port values (the IP address and port of the requesting client) with elb_status_code/target_status_code 4xx or 5xx (https://en.wikipedia.org/wiki/List_of_HTTP_status_codes).
Using the table described in Querying Classic Load Balancer Logs, and assuming you partition it by date (the partition key is called date_partition_key below), you could do something like this:
SELECT
CONCAT(request_ip, ':', CAST(request_port AS VARCHAR)) AS client_port,
COUNT(*) AS count_of_status_500
FROM elb_logs
WHERE elb_response_code = '500'
AND date_partition_key BETWEEN '2022-01-01' AND '2022-01-03'
GROUP BY 1
ORDER BY 2 DESC
The 1 and 2 in the group and order by clauses refer back to the first and second items in the select list, i.e. the client port and the count, respectively. It's just a convenient way of not having to repeat the function calls etc.
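For comparison, here is a sketch of the same query in explicit form, repeating the expression in the GROUP BY and referring to the count by its alias in the ORDER BY:

SELECT
    CONCAT(request_ip, ':', CAST(request_port AS VARCHAR)) AS client_port,
    COUNT(*) AS count_of_status_500
FROM elb_logs
WHERE elb_response_code = '500'
    AND date_partition_key BETWEEN '2022-01-01' AND '2022-01-03'
GROUP BY CONCAT(request_ip, ':', CAST(request_port AS VARCHAR))
ORDER BY count_of_status_500 DESC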
Meanwhile I found this link:
https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/
with some example ALB access log queries. It may be useful for users who are not very familiar with SQL (like me).
Imagine you have these two tables.
a) streamers: contains time-series data, at 1-minute granularity, for all the channels that broadcast on Twitch. The columns of the table are:
username: Channel username
timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
game: Name of the game that the user was playing at that time
viewers: Number of concurrent viewers that the user had at that time
followers: Number of total followers that the channel had at that time
b) games_metadata: contains information about all the games that have ever been broadcast on Twitch. The columns of the table are:
game: Name of the game
release_date: Timestamp, in seconds, corresponding to the date when the game was released
publisher: Publisher of the game
genre: Genre of the game
Now I want the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.
The problem is I didn't have any database, so I created one and entered some values by hand.
I thought of this query, but I'm not sure if it is what I want. It may be right (I don't feel like it is), but I'd like a second opinion:
SELECT publisher,
(cast(strftime('%m', "timestamp") as integer) + 2) / 3 as quarter,
COUNT((strftime('%M',`timestamp`)/(60*1.0)) * viewers) as total_hours_watch
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE quarter = 3
GROUP BY publisher,quarter
ORDER BY total_hours_watch DESC
Looks about right to me. You don't need to include quarter in the GROUP BY since the WHERE clause limits you to only one quarter. You can modify the query to get only the top 10 publishers in a couple of ways, depending on the SQL engine you're using:
For SQL Server / MS Access modify your select statement: SELECT TOP 10 publisher, ...
For MySQL add a limit clause at the end of your query: ... LIMIT 10;
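For example, in SQLite (which the strftime calls in your query suggest) or MySQL, the tail of the query would simply become:

ORDER BY total_hours_watch DESC
LIMIT 10;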
I am teaching myself SQL and came across this service-level question and was stumped. I was hoping for some assistance with it.
I am given a table with three fields:
TicketNumber (number), e.g. 53055
CreatedAt (timestamp), e.g. 2015-09-16 12:47
Sender (User or Agent), e.g. User
The goal is to calculate service level completion. A ticket is created by a user and given a number, and a response by an agent must be given within 6 hours.
Using the formula:
n_agents_reply_back_to_user_within_6hrs / n_contacts_from_user
Now I understand that the denominator in this formula is simply
SELECT COUNT(Sender)
FROM Service_Table
WHERE Sender LIKE 'User'
The numerator is giving me a lot of issues, and I was hoping someone could walk me through it. I understand I need to identify rows with the same ticket number, find what time the user sent the ticket and what time the agent responded, take the difference, and count the ticket if the difference is <= 6 hours, otherwise not.
As a beginner I am having quite a bit of trouble grasping how to write such a query. Any help is appreciated. Thank you!
I am not sure exactly what you are trying to achieve, but you can start with something like this, which pairs each user message with the agent reply on the same ticket (the aliases are u and a to avoid the reserved word USER, and the subtraction is agent time minus user time, i.e. the response time):
SELECT u.TicketNumber, a.CreatedAt - u.CreatedAt AS ResponseTime
FROM (SELECT TicketNumber, CreatedAt, Sender
      FROM Service_Table
      WHERE Sender LIKE 'User') u
LEFT OUTER JOIN
     (SELECT TicketNumber, CreatedAt, Sender
      FROM Service_Table
      WHERE Sender LIKE 'Agent') a
  ON u.TicketNumber = a.TicketNumber
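From there, a hedged sketch of the full service-level ratio. It assumes Oracle-style date arithmetic (subtracting two timestamps yields days, so 6 hours is 6/24) and takes the earliest message per side per ticket; both the MIN(CreatedAt) choice and the date math are assumptions, not givens from the question:

SELECT SUM(CASE WHEN a.CreatedAt - u.CreatedAt <= 6/24 THEN 1 ELSE 0 END)
       / COUNT(*) AS service_level  -- replies within 6 hrs / tickets opened by users
FROM (SELECT TicketNumber, MIN(CreatedAt) AS CreatedAt
      FROM Service_Table
      WHERE Sender LIKE 'User'
      GROUP BY TicketNumber) u
LEFT OUTER JOIN
     (SELECT TicketNumber, MIN(CreatedAt) AS CreatedAt
      FROM Service_Table
      WHERE Sender LIKE 'Agent'
      GROUP BY TicketNumber) a
  ON u.TicketNumber = a.TicketNumber

Note the denominator here counts tickets rather than individual user messages; if a user can write several times on one ticket, it will differ from the COUNT(Sender) denominator above.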
I've been trying to build a query over a custom log of mine that sorts users by certain criteria to get an overview of them.
My log contains an entry for each time a user tries to download a file; each entry contains the date, IP, a custom-generated token, and how many times that user has tried.
The token is stored per session and is only valid for 5 download attempts, so one IP can have multiple users (with different tokens) that each have a different number of attempts.
What I want to achieve is rather simple: group the users by IP, count their attempts, and find out how many users there are.
The attempt count is kept per token rather than per IP, so log entries may look like this:
IP               TOKEN    ATTEMPT
111.111.111.111  DK1234   a1
111.111.111.111  DK9876   a1
111.111.111.111  DK9876   a2
222.222.222.222  DK5432   a1
Below is my latest attempt at achieving this, but the logic just isn't what I want.
(The fields involved are Ip, Token, and Attempt, where Attempt looks like a1, a2, a3 and so on for each attempt the user makes.)
SELECT
    Ip,
    CASE TO_INT(replace_chr(Attempt, 'a', ''))
        WHEN 1 THEN 'MUL'
        ELSE 'ONE'
    END AS Users,
    SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
    --LOG PATH
WHERE
    Status = 'SUCCESS'
    AND TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
    Ip,
    Users
If only I could keep a counter that increases for each unique Token per IP and store it with the results, but I do not know a way to achieve that either.
Using DISTINCT won't work either: when I try it, I get an error saying that DISTINCT cannot work with GROUP BY, and my SUM() / COUNT() won't work when Ip isn't in a GROUP BY.
(The snippet below is what I have tried with DISTINCT / COUNT.)
SELECT
Ip,
COUNT(DISTINCT Token),
SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
--Log Path
WHERE
Status = 'SUCCESS'
and
TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
Ip
How I'd like my result grid to end up (without the explanation text, of course):
IP               Users  Attempts
123.456.789.012  4      4         (4 users, each trying one time)
120.987.654.321  2      5         (2 users; one tried once, the other tried 4 times)
444.444.444.444  1      1         (one user, one attempt)
I hope I'm making sense, otherwise I'll be happy to elaborate / explain anything needed :)
I believe you need two stages. The first stage collapses the entries per user, taking the highest attempt number seen for each token:
SELECT
Ip,
Token,
MAX(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
...
GROUP BY
Ip,
Token
The second stage then rolls up by Ip:
SELECT
Ip,
COUNT(*) AS Users,
SUM(Attempts) AS TotalAttempts
FROM
...
GROUP BY
Ip
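If your query engine supports subqueries, the two stages can also be combined into one statement. This looks like Microsoft Log Parser syntax, which runs queries one at a time, so treat the following as a sketch for an ordinary SQL engine:

SELECT Ip,
       COUNT(*) AS Users,
       SUM(Attempts) AS TotalAttempts
FROM (SELECT Ip,
             Token,
             MAX(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
      FROM ...          -- log path, as in the question
      GROUP BY Ip, Token) PerToken
GROUP BY Ip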
I have a system that logs errors. Here is how I select from my error table:
SELECT message, personid, count(*)
FROM errorlog
WHERE time BETWEEN TO_DATE(foo) AND TO_DATE(foo) AND substr(message,0,3) = 'ERR'
GROUP BY personid, message
ORDER BY 3
What I want is to see whether any user is "producing" more errors than others. For instance, for ERROR FOO: if user A has 4 errors and user B has 4000, logic suggests user B is doing something wrong.
But can I group the way I do? This is a modified version of a query that only grouped by message and counted it, so that ERROR FOO came out as 4004 in my example above.
With your query, if the messages are different, then you will get multiple records per person.
If you only want one record per person, you would need to put an aggregate function around message and group by personid alone.
For example you could do:
SELECT MIN(message), personid, count(*)
FROM errorlog
WHERE time BETWEEN TO_DATE(foo) AND TO_DATE(foo) AND substr(message,0,3) = 'ERR'
GROUP BY personid
ORDER BY 3
Here I've changed message to MIN(message), which returns each person's alphabetically first message, and dropped message from the GROUP BY so there is a single row per person.
However, if you are happy to return multiple records per person, then I see no problem with your script. It will list personid and message combinations ordered by how often each appears in the table, showing only records whose message starts with ERR.
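One small tweak worth considering, since the goal is to spot the heaviest error producers: ORDER BY 3 sorts ascending, so the biggest counts land at the bottom. Sorting descending puts them first:

SELECT message, personid, count(*)
FROM errorlog
WHERE time BETWEEN TO_DATE(foo) AND TO_DATE(foo) AND substr(message,0,3) = 'ERR'
GROUP BY personid, message
ORDER BY 3 DESC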