I'm using AWS Athena to parse my Application Load Balancer logs.
I'm trying to get a list of browsers and, for each browser, the number of unique users.
I've managed to get this list but the user count is not correct. I don't know how to group users by their IP.
1 Google Chrome 9000000
2 Apple Safari 8000000
3 Unknown 5000000
4 Mozilla Firefox 2000000
5 Internet Explorer 10000
6 Outlook 10000
7 Opera 88
8 Edge 7
Here is the query
SELECT DISTINCT
CASE
WHEN user_agent LIKE '%edge%'THEN 'Edge'
WHEN user_agent LIKE '%MSIE%' THEN
'Internet Explorer'
WHEN user_agent LIKE '%Firefox%' THEN
'Mozilla Firefox'
WHEN user_agent LIKE '%Chrome%' THEN
'Google Chrome'
WHEN user_agent LIKE '%Safari%' THEN
'Apple Safari'
WHEN user_agent LIKE '%Opera%' THEN
'Opera'
WHEN user_agent LIKE '%Outlook%' THEN
'Outlook'
ELSE 'Unknown'
END AS browser , COUNT(client_ip) AS Number
FROM alb_logs
WHERE parse_datetime(time,'yyyy-MM-DD''T''HH:mm:ss.SSSSSS''Z')
BETWEEN parse_datetime('2018-01-01-00:00:00','yyyy-MM-DD-HH:mm:ss')
AND parse_datetime('2018-07-18-00:00:00','yyyy-MM-DD-HH:mm:ss')
GROUP BY CASE
WHEN user_agent LIKE '%edge%'THEN 'Edge'
WHEN user_agent LIKE '%MSIE%' THEN
'Internet Explorer'
WHEN user_agent LIKE '%Firefox%' THEN
'Mozilla Firefox'
WHEN user_agent LIKE '%Chrome%' THEN
'Google Chrome'
WHEN user_agent LIKE '%Safari%' THEN
'Apple Safari'
WHEN user_agent LIKE '%Opera%' THEN
'Opera'
WHEN user_agent LIKE '%Outlook%' THEN
'Outlook'
ELSE 'Unknown'
END
ORDER BY Number DESC
I'm missing some kind of group by client_ip, but the result would be wrong...
You need a COUNT(DISTINCT client_ip) aggregation, and you don't need SELECT DISTINCT, because GROUP BY already returns one row per browser. Like this:
SELECT CASE WHEN user_agent ... END AS browser, COUNT(DISTINCT client_ip) AS Number
FROM alb_logs
WHERE ...
GROUP BY 1
ORDER BY 2 DESC
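To see the difference between the two counts, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows and the shortened CASE list are invented for illustration, and SQLite stands in for Athena, so treat it as a demonstration of COUNT versus COUNT(DISTINCT), not of Athena itself.

```python
import sqlite3

# Hypothetical sample data: SQLite stands in for Athena here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alb_logs (client_ip TEXT, user_agent TEXT)")
conn.executemany(
    "INSERT INTO alb_logs VALUES (?, ?)",
    [
        ("10.0.0.1", "Mozilla/5.0 Chrome/67.0 Safari/537.36"),
        ("10.0.0.1", "Mozilla/5.0 Chrome/67.0 Safari/537.36"),
        ("10.0.0.2", "Mozilla/5.0 Chrome/67.0 Safari/537.36"),
        ("10.0.0.3", "Mozilla/5.0 Gecko/20100101 Firefox/61.0"),
        ("10.0.0.3", "Mozilla/5.0 Gecko/20100101 Firefox/61.0"),
    ],
)

query = """
SELECT CASE
         WHEN user_agent LIKE '%Firefox%' THEN 'Mozilla Firefox'
         WHEN user_agent LIKE '%Chrome%'  THEN 'Google Chrome'
         ELSE 'Unknown'
       END AS browser,
       COUNT(client_ip)          AS requests,    -- one per log line
       COUNT(DISTINCT client_ip) AS unique_ips   -- one per distinct IP
FROM alb_logs
GROUP BY 1
ORDER BY 3 DESC, 1
"""
rows = conn.execute(query).fetchall()
for browser, requests, unique_ips in rows:
    print(browser, requests, unique_ips)
```

COUNT(client_ip) counts every request line, while COUNT(DISTINCT client_ip) counts each IP once per browser, which is the "unique user" figure the question is after.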
We noticed an issue across all our projects starting Feb 10th where the first_open event contains an empty campaign name (traffic_source.name column). The traffic_source.source and traffic_source.medium columns contain the right values indicating the users are from Google Ads campaigns (google and cpc respectively).
The following BigQuery query shows the affected rows (just replace {PROJECT} and {ID}):
SELECT event_date, traffic_source.source, traffic_source.medium, traffic_source.name, count(*)
FROM `{PROJECT}.analytics_{ID}.events_202202*`
WHERE event_name = 'first_open' AND traffic_source.medium = 'cpc' and traffic_source.name IS NULL
group by 1,2,3,4
order by 1,2,3,4
I've checked the first_open attribution on Firebase console and it seems to be fine. We suspect the issue is related to the export to BigQuery only.
Has anyone else noticed this?
I'm trying to retrieve only the users that don't have a disabled campaign (a campaign is disabled when disabled = 1).
A user can have a disabled campaign and a non-disabled campaign, but if they have any disabled campaigns I want to exclude them from my final result.
Thinking I need something like
SELECT DISTINCT
user_id,
CASE
WHEN
disabled = 1
THEN
'Disabled'
ELSE
'Good'
END
AS campaign_disabled
But this just returns two rows for each user_id, one being 'Good' and the other 'Disabled'.
You want the users that have disabled = 0 for all their campaigns, so the max value of the column disabled must be 0:
SELECT user_id
FROM tablename
GROUP BY user_id
HAVING MAX(disabled) = 0
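A small sketch of the HAVING MAX(disabled) = 0 filter, run against an in-memory SQLite table from Python. The rows are made up for illustration: user 1 has one disabled and one active campaign, user 2 has only active campaigns, and user 3 has only a disabled one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (user_id INTEGER, disabled INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, 0), (1, 1), (2, 0), (2, 0), (3, 1)],  # hypothetical sample rows
)

# A user is kept only if none of their rows has disabled = 1,
# i.e. the largest disabled value in their group is 0.
rows = conn.execute(
    """
    SELECT user_id
    FROM tablename
    GROUP BY user_id
    HAVING MAX(disabled) = 0
    """
).fetchall()
kept = [user_id for (user_id,) in rows]
print(kept)
```

Only user 2 survives, because users 1 and 3 each have at least one row with disabled = 1.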
I am parsing a log file to find an application ID from a specific page. I can grab that and the applicant's IP fine, but I would like to take that IP, find the earliest point at which it connected to the site, and look at the referrer there. I'm having trouble working out how to pull that information. The query below gets me the application ID, whether they're on mobile, whether they're a new applicant, whether they're using the Spanish version of our site, and their IP. Is there any way to tag that IP, look back at the earliest time (the log is all from the same day, so I just need normal hh:mm:ss) and then grab the referrer from that line, all in one statement? Or will I have to run this query and then run another query for each IP?
SELECT
COALESCE(
TO_INT(REPLACE_CHR(EXTRACT_TOKEN(TO_LOWERCASE(cs-uri-query),3,'+'),';','')),
TO_INT(REPLACE_CHR(EXTRACT_TOKEN(TO_LOWERCASE(cs-uri-query),2,'+'),';',''))
) AS ApplicationID,
CASE
COALESCE(
INDEX_OF(cs(User-Agent), 'iPhone'),
INDEX_OF(cs(User-Agent), 'iPad'),
INDEX_OF(cs(User-Agent), 'Android'),
INDEX_OF(cs(User-Agent), 'BlackBerry'),
INDEX_OF(cs(User-Agent), 'Windows Phone')
) WHEN NULL THEN 0
ELSE 1
END AS Mobile,
CASE TO_LOWERCASE(SUBSTR(EXTRACT_FILENAME(cs(Referer)),0,INDEX_OF(EXTRACT_FILENAME(cs(Referer)),'?')))
WHEN 'application.aspx' THEN 1
ELSE 0
END AS NewApplication,
CASE TO_LOWERCASE(EXTRACT_TOKEN(cs(Referer),5,'/')) WHEN 'es' THEN 1 ELSE 0
END AS Spanish,
EXTRACT_FILENAME(c-ip)
AS IP
INTO %outPutFile%
FROM %logFile%
WHERE %whereClause%
AND TO_LOWERCASE(EXTRACT_FILENAME(cs-uri-stem))
IN ('logclientconfirmation')
AND TO_UPPERCASE(cs-method) = 'POST'
AND TO_LOWERCASE(cs-uri-query) LIKE '{application}%'
ORDER BY ApplicationID
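As a generic illustration of the "earliest row per IP" lookup, here is a sketch in ordinary SQL run through SQLite from Python. It is not Log Parser syntax, and whether Log Parser can express this in a single statement is not established here. The table `hits` and its columns `ip`, `time` and `referrer` are hypothetical, and it assumes the log lines have already been loaded into a SQL table with zero-padded hh:mm:ss times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding one row per log line.
conn.execute("CREATE TABLE hits (ip TEXT, time TEXT, referrer TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("1.1.1.1", "09:00:05", "http://google.com/"),
        ("1.1.1.1", "09:03:10", "/application.aspx"),
        ("2.2.2.2", "10:15:00", "http://example.org/"),
        ("2.2.2.2", "10:01:30", "http://bing.com/"),
    ],
)

# The subquery finds each IP's earliest time; joining back to hits
# recovers the referrer recorded on that earliest line.
query = """
SELECT h.ip, h.time, h.referrer
FROM hits h
JOIN (SELECT ip, MIN(time) AS first_time
      FROM hits
      GROUP BY ip) f
  ON f.ip = h.ip AND f.first_time = h.time
ORDER BY h.ip
"""
first_hits = conn.execute(query).fetchall()
for row in first_hits:
    print(row)
```

Zero-padded hh:mm:ss strings sort correctly as text, which is why MIN(time) works here without a date conversion. If two lines for the same IP share the earliest timestamp, this returns both.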
I'm currently trying to teach myself SQL in order to write better reports with our Orion system and I'm running into a small issue. I want to generate a report with a count of the number of Windows machines and Linux machines. This is my current code.
SELECT OperatingSystem, Count(OperatingSystem) AS TotalMachines
FROM Machines
Where
(
(OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
)
GROUP BY OperatingSystem
And the return I get is this
Red Hat Enterprise Linux 20
Novell SUSE Linux Enterprise 17
Debian Linux 5
Windows Server 2008 (32-bit) 11
Windows Server 2008 R2 (32-bit) 49
Windows Server 2008 (64-bit) 33
Windows Server 2008 R2 (64-bits) 16
Windows Server 2003 (32-bit) 35
Is it possible to combine all of the different Linux Operating Systems into a single row called Linux and combine all of the Windows Operating Systems into a single row called Windows in an SQL Query?
Yes. You want to use case in the group by clause itself:
SELECT (case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end) as WhichOs, Count(*) AS TotalMachines
FROM Machines
Where (OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
GROUP BY (case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end);
EDIT:
The above should work (note that the same expression appears in both the select and the group by). If it doesn't, perhaps this will work:
SELECT WhichOs, Count(*) AS TotalMachines
FROM (SELECT m.*,
(case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end) as WhichOs
FROM Machines m
) m
Where (OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
GROUP BY WhichOs;
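To check the CASE-in-GROUP-BY approach against the numbers in the question, here is a sketch that loads the per-OS counts from the question's output into an in-memory SQLite table and runs the first form of the query from Python. SQLite stands in for the Orion database, so this only checks the query logic, not any Orion-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Machines (OperatingSystem TEXT)")

# Per-OS machine counts taken from the output shown in the question.
counts = {
    "Red Hat Enterprise Linux": 20,
    "Novell SUSE Linux Enterprise": 17,
    "Debian Linux": 5,
    "Windows Server 2008 (32-bit)": 11,
    "Windows Server 2008 R2 (32-bit)": 49,
    "Windows Server 2008 (64-bit)": 33,
    "Windows Server 2008 R2 (64-bits)": 16,
    "Windows Server 2003 (32-bit)": 35,
}
for name, n in counts.items():
    conn.executemany("INSERT INTO Machines VALUES (?)", [(name,)] * n)

query = """
SELECT (CASE WHEN OperatingSystem LIKE '%Windows%' THEN 'Windows'
             WHEN OperatingSystem LIKE '%Linux%'   THEN 'Linux'
        END) AS WhichOs,
       COUNT(*) AS TotalMachines
FROM Machines
WHERE OperatingSystem LIKE '%Windows%' OR OperatingSystem LIKE '%Linux%'
GROUP BY (CASE WHEN OperatingSystem LIKE '%Windows%' THEN 'Windows'
               WHEN OperatingSystem LIKE '%Linux%'   THEN 'Linux'
          END)
ORDER BY WhichOs
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The eight rows from the question collapse into two: 42 Linux machines (20 + 17 + 5) and 144 Windows machines (11 + 49 + 33 + 16 + 35).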
To solve such problems I prefer to use UNION ALL rather than CASE, as it makes the query easier to extend in the future:
select OSType, count(*) as TotalMachines
from (
SELECT 'Linux' as OSType FROM Machines WHERE OperatingSystem LIKE '%Linux%'
UNION ALL
SELECT 'Windows' as OSType FROM Machines WHERE OperatingSystem LIKE '%Windows%'
) as subquery
GROUP BY OSType
In any case, check both variants and pick the faster one.
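One behavioral difference between the two variants is worth checking alongside speed: a row whose name matches both patterns is counted once by CASE (first matching branch wins) but twice by UNION ALL. The sketch below shows this in SQLite from Python. The "Windows Subsystem for Linux" row is an invented example of such an overlap and does not come from the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Machines (OperatingSystem TEXT)")
conn.executemany(
    "INSERT INTO Machines VALUES (?)",
    [
        ("Debian Linux",),
        ("Windows Server 2008 (64-bit)",),
        ("Windows Subsystem for Linux",),  # hypothetical row matching both patterns
    ],
)

case_query = """
SELECT CASE WHEN OperatingSystem LIKE '%Windows%' THEN 'Windows'
            WHEN OperatingSystem LIKE '%Linux%'   THEN 'Linux'
       END AS OSType,
       COUNT(*) AS TotalMachines
FROM Machines
WHERE OperatingSystem LIKE '%Windows%' OR OperatingSystem LIKE '%Linux%'
GROUP BY OSType
ORDER BY OSType
"""

union_query = """
SELECT OSType, COUNT(*) AS TotalMachines
FROM (
    SELECT 'Linux' AS OSType FROM Machines WHERE OperatingSystem LIKE '%Linux%'
    UNION ALL
    SELECT 'Windows' AS OSType FROM Machines WHERE OperatingSystem LIKE '%Windows%'
) AS subquery
GROUP BY OSType
ORDER BY OSType
"""

case_rows = conn.execute(case_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
print("CASE:     ", case_rows)
print("UNION ALL:", union_rows)
```

With three machines in the table, the CASE totals add up to 3 while the UNION ALL totals add up to 4. If the patterns can never overlap in your data, the two queries agree.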
Try something like this:
SELECT
case
when OperatingSystem like '%Windows%' then 'Windows'
when OperatingSystem like '%Linux%' then 'Linux'
else 'Other'
end as Operating_System
, Count(OperatingSystem) AS TotalMachines
FROM Machines
Where
(
(OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
)
GROUP BY
case
when OperatingSystem like '%Windows%' then 'Windows'
when OperatingSystem like '%Linux%' then 'Linux'
else 'Other'
end
I have a query that counts records based on various criteria, but I would like to split the results into two grouped fields based on said criteria. Is there a way to do this inside the query - or is it necessary to have two queries?
select count(PT_TASK_ASSAY_DISP) as 'Count' from vbasedata v
where
v.PT_TASK_ASSAY_DISP like 'SS%' or
(v.PT_TASK_ASSAY_DISP like 'IP%' and
v.PT_TASK_ASSAY_DISP not like 'IP CUT' and
v.PT_TASK_ASSAY_DISP not like 'IP NEG');
I'm trying to group by IP and SS to get a total count.
I am not really sure what you want, but if you want one count for 'SS%' and one for 'IP%', then maybe something like this:
select
SUM(CASE WHEN PT_TASK_ASSAY_DISP LIKE 'SS%' THEN 1 ELSE 0 END) as SSCount,
SUM(CASE WHEN PT_TASK_ASSAY_DISP LIKE 'IP%' THEN 1 ELSE 0 END) as IPCount
from vbasedata v
where
v.PT_TASK_ASSAY_DISP like 'SS%' or
(v.PT_TASK_ASSAY_DISP like 'IP%' and
v.PT_TASK_ASSAY_DISP not like 'IP CUT' and
v.PT_TASK_ASSAY_DISP not like 'IP NEG');
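Here is a minimal sketch of this conditional aggregation, run against an in-memory SQLite table from Python. The sample values are invented for illustration. They include the two excluded IP values and one row that matches neither prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vbasedata (PT_TASK_ASSAY_DISP TEXT)")
conn.executemany(
    "INSERT INTO vbasedata VALUES (?)",
    [("SS1",), ("SS2",), ("IP1",), ("IP CUT",), ("IP NEG",), ("XX",)],  # hypothetical rows
)

# The WHERE clause filters rows first; each SUM(CASE ...) then counts
# only the surviving rows that match its own prefix.
query = """
SELECT
    SUM(CASE WHEN PT_TASK_ASSAY_DISP LIKE 'SS%' THEN 1 ELSE 0 END) AS SSCount,
    SUM(CASE WHEN PT_TASK_ASSAY_DISP LIKE 'IP%' THEN 1 ELSE 0 END) AS IPCount
FROM vbasedata v
WHERE v.PT_TASK_ASSAY_DISP LIKE 'SS%'
   OR (v.PT_TASK_ASSAY_DISP LIKE 'IP%'
       AND v.PT_TASK_ASSAY_DISP NOT LIKE 'IP CUT'
       AND v.PT_TASK_ASSAY_DISP NOT LIKE 'IP NEG')
"""
ss_count, ip_count = conn.execute(query).fetchone()
print(ss_count, ip_count)
```

Both counts come back in a single row from a single pass over the table, so no second query is needed.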