I'm pretty new to KQL, and running into a problem trying to format my data in Azure Sentinel.
I have a query with these columns I'm interested in: Email and IP.
If I run something like summarize count() by Email, IP I get almost what I want; however, in some cases the email value is the same but comes from different IPs.
Is there a way to have the output formatted so it shows the email value and then, under it, lists all the IP count values associated with that email?
You can easily create a set (unique values) of IPs per Email:
// Data sample generation. Not part of the solution.
let t = range i from 1 to 30 step 1 | extend Email = strcat("email_", tostring(toint(rand(3))), "@", dynamic(["gmail", "outlook", "hotmail"])[toint(rand(3))], ".com"), IP = strcat_delim(".", tostring(toint(rand(256))), tostring(toint(rand(256))), tostring(toint(rand(256))), tostring(toint(rand(256))));
// Solution starts here
t
| summarize make_set(IP) by Email
Email                set_IP
email_0@outlook.com  ["22.0.72.237","32.17.234.224","84.232.201.220","181.161.231.252","121.190.204.101"]
email_1@gmail.com    ["187.58.44.239","95.117.156.141","16.245.100.138"]
email_2@outlook.com  ["154.46.54.212","178.139.208.204","204.197.11.160","160.96.246.141","173.141.14.145","100.35.29.216"]
email_0@gmail.com    ["230.16.241.147","173.164.214.236","95.194.124.236","186.101.39.234"]
email_1@hotmail.com  ["19.214.101.122","168.72.148.236"]
email_2@hotmail.com  ["136.190.117.24","113.147.42.218","224.220.103.201"]
email_0@hotmail.com  ["126.176.108.237","201.222.155.151"]
email_2@gmail.com    ["132.67.147.234","2.101.57.210"]
email_1@outlook.com  ["6.173.214.26","18.169.68.195","87.141.157.8"]
I am trying to use the Bitcoin dataset in BigQuery to extract bitcoin transactions related to some addresses.
I tried the query below to retrieve this information, but I always get empty results.
SELECT
timestamp,
inputs.input_pubkey_base58 AS input_key,
outputs.output_pubkey_base58 AS output_key,
outputs.output_satoshis as satoshis
FROM `bigquery-public-data.bitcoin_blockchain.transactions`
JOIN UNNEST (inputs) AS inputs
JOIN UNNEST (outputs) AS outputs
WHERE outputs.output_pubkey_base58 = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
AND outputs.output_satoshis >= 0
AND inputs.input_pubkey_base58 IS NOT NULL
AND outputs.output_pubkey_base58 IS NOT NULL
GROUP BY timestamp, input_key, output_key, satoshis
Also, when I change the address to one with more transactions, I get results but with some transactions omitted.
I do not know if I am writing something wrong or what. Can anyone help, please?
Thanks
I saw a similar question in a previous post and tried what was suggested but it did not work:
BigQuery Blockchain Dataset is Missing Data?
I am expecting to get 3 addresses when trying
WHERE outputs.output_pubkey_base58 = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
and 2 addresses when I change the condition to consider the address on the input side.
Your expectations are correct, but the issue here is that the dataset you are using is outdated and has been migrated to bigquery-public-data.crypto_bitcoin. Updates to the data are being sent to the new version of this dataset, whose schema is better aligned with our other cryptocurrency offerings.
To get started, run the query below to check that the expected data is there:
#standardSQL
SELECT COUNT(*)
FROM `bigquery-public-data.crypto_bitcoin.transactions`,
UNNEST(outputs) AS output,
UNNEST(output.addresses) AS address
WHERE address = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'
with the output:
Row  f0_
1    3
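From there, a sketch of the original query rewritten against the new dataset (column names such as block_timestamp and the addresses/value fields inside inputs and outputs follow the crypto_bitcoin schema as I understand it; adjust if your copy differs):
#standardSQL
SELECT
  block_timestamp,
  input_address,
  output_address,
  output.value AS satoshis
FROM `bigquery-public-data.crypto_bitcoin.transactions`,
  UNNEST(inputs) AS input,
  UNNEST(input.addresses) AS input_address,
  UNNEST(outputs) AS output,
  UNNEST(output.addresses) AS output_address
WHERE output_address = '16XMrZ2GNsrUBv3qNZtvvPKna2PKFuq8gQ'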
I’m currently looking for a way to use a SQL query to find all email addresses in our DB that consist of only 6 random digits followed by '@gmail.com'.
Example:
email
----------
123456@gmail.com
324522@gmail.com
Here is what I tried:
select email
from customers
where email Not like '%^[0-9]%'
When I run this, all emails appear, even the ones without any numbers in them.
select email,
SPLIT_PART(email, '#',1) as username,
SPLIT_PART(email, '#',2) as domain,
(case when username not like '%^[0-9]%' then 'Incorrect' else 'Correct' End) as format
from customers
where domain = 'gmail.com'
and format = 'Correct'
I tried this as well; all emails, even the ones with numbers in them, appeared as Incorrect.
It seems like the numbers in the column are not being recognized and I'm not sure how to fix that. The column type is varchar.
I spoke with Mode Analytics and it turns out my DB is Redshift. This is how I was able to get this to work:
select email
from customers
where email similar to '[0-9]{6}@gmail.com'
Thanks everyone for your help!
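For reference, the same check should also be possible with Redshift's POSIX regex operator; a sketch (the anchors force a full match and the backslash escapes the dot):
select email
from customers
where email ~ '^[0-9]{6}@gmail\\.com$'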
I have a number of individual records in Splunk, all with a common field (User-name), which I'm trying to combine on.
E.g.
User-name=JG, srcIP=10.0.0.1
User-name=JG, file=jg.docx
User-name=JG, dstIP=10.1.1.0
User-name=JG, Email=jg@jg.com
User-name=AB, srcIP=10.0.0.2
User-name=AB, file=AB.docx
User-name=AB, dstIP=10.2.2.0
User-name=AB, Email=AB@AB.com
I want to do the following search: group all the records which match on the User-name field, and allow me to manipulate the fields.
E.g.
User-name, srcIP, file, dstIP, Email
JG, 10.0.0.1, jg.docx, 10.1.1.0, jg@jg.com
AB, 10.0.0.2, AB.docx, 10.2.2.0, AB@AB.com
Thank you!
You can check out the stats command to do this:
your search
| stats latest(srcIP) as srcIP, latest(file) as file, latest(dstIP) as dstIP, latest(Email) as Email by User-name
You can then perform any operations you want to on these fields. The latest function will give you the latest value seen for srcIP/file etc. for that user name.
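If a user may have several different values for a field (for example multiple srcIPs) and you want to keep them all rather than just the most recent one, the values function is a reasonable alternative; a sketch, keeping the field names from the sample events:
your search
| stats values(srcIP) as srcIP, values(file) as file, values(dstIP) as dstIP, values(Email) as Email by User-name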
I have to find the first occurrence of a particular event for a list of users in Splunk.
E.g. I have a list of, say, 10 users from another query.
I am using the query below to find the date of the first mail sent by customer 12345. How do I find the same for a list of customers that I get from another query?
index=abc appname=xyz "12345" "*\"SENT\"}}"|reverse|table _time|head 1
Try using stats.
index=abc appname=xyz "12345" "*\"SENT\"}}" | stats earliest(_time)
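earliest(_time) returns the oldest timestamp regardless of event order, so the reverse/head steps are no longer needed. To cover a whole list of customers coming from another query, one option is a subsearch that expands into a set of filters plus a by clause; a sketch, where customer_id is a hypothetical field name for the customer number in your events and the bracketed search stands in for whatever your other query is:
index=abc appname=xyz "*\"SENT\"}}"
    [ search <your other query> | fields customer_id ]
| stats earliest(_time) as first_sent by customer_id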
I've been trying to build a query over a custom log of mine where I sort the users based on certain criteria to have some overview of them.
My log contains an entry for each time a user tries to download a file; each entry contains the date, IP, a custom generated token and how many times that user has tried.
The token is stored per SESSION and a token is only valid for 5 download attempts, so one IP can have multiple users (with different tokens) that each have a different number of attempts.
What I want to achieve is rather simple: I want to group the users by IP, count their attempts, and find out how many users there are.
The attempts are not counted per IP but rather per token, meaning the log entries may look like this:
IP TOKEN ATTEMPT
111.111.111.111 DK1234 a1
111.111.111.111 DK9876 a1
111.111.111.111 DK9876 a2
222.222.222.222 DK5432 a1
Below is my latest attempt at achieving this, but while I try to make the logic behind it work, it just isn't what I want.
(The fields involved are: Ip, Token and Attempt, with the Attempt value looking like a1, a2, a3 and so on for each attempt the user makes.)
SELECT
Ip,
CASE TO_INT(replace_chr(Attempt, 'a', ''))
WHEN 1
THEN
'MUL'
ELSE
'ONE'
END
AS Users,
SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
--LOG PATH
WHERE
Status = 'SUCCESS'
and
TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
Ip,
Users
If I could somehow store a value that increases for each unique Token per IP and keep it with the results, that would work, but I do not know a way to achieve this either.
Using DISTINCT won't work either: when I do, I get an error saying that DISTINCT cannot be used with GROUP BY, and my SUM() / possible COUNT() won't work when Ip isn't in a GROUP BY.
(The snippet below is what I have tried with DISTINCT / count)
SELECT
Ip,
COUNT(DISTINCT Token),
SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
--Log Path
WHERE
Status = 'SUCCESS'
and
TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
Ip
How I'd like my result grid to end up: (Without the explanation text of course)
IP Users Attempts
123.456.789.012 4 4 (4 users each trying one time)
120.987.654.321 2 5 (2 users, One user tried once and the other user tried 4 times)
444.444.444.444 1 1 (One user, one attempt)
I hope I'm making sense, otherwise I'll be happy to elaborate / explain anything needed :)
I believe you need two stages. The first stage collapses the entries per user (one row per Ip and Token):
SELECT
Ip,
Token,
MAX(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM
...
GROUP BY
Ip,
Token
The second stage then rolls up by Ip:
SELECT
Ip,
COUNT(*) AS Users,
SUM(Attempts) As TotalAttempts
FROM
...
GROUP BY
Ip
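Assuming this is Microsoft Log Parser 2.2 (which the TO_INT/REPLACE_CHR style suggests), it has no subqueries, so one way to chain the two stages is to write the first result set to a CSV with INTO and feed that file to the second query. A sketch with placeholder file names; -i:CSV on the first command stands in for whatever your actual input format is:
LogParser -i:CSV -o:CSV "SELECT Ip, Token, MAX(TO_INT(REPLACE_CHR(Attempt, 'a', ''))) AS Attempts INTO stage1.csv FROM yourlog.csv WHERE Status = 'SUCCESS' GROUP BY Ip, Token"
LogParser -i:CSV "SELECT Ip, COUNT(*) AS Users, SUM(Attempts) AS TotalAttempts FROM stage1.csv GROUP BY Ip"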