How to optimise this query?
df = hive.sql("""
WITH cte AS (
SELECT
MAX(timestamp) as event_time,
name, domain, type, event_id
FROM table1
WHERE
device = '{device}'
AND name IS NOT NULL
AND (timestamp >= (CAST(NOW() as INT) - {delta}*24*60*60))
AND name NOT IN (
SELECT
DISTINCT(name)
FROM table1
WHERE (
device = '{device}' AND
name IS NOT NULL AND org ='{org}' AND
((timestamp < CAST(NOW() as INT) - {delta}*24*60*60) AND
(timestamp >= CAST(NOW() as INT) - (120+{delta})*24*60*60)))
)
AND event_id NOT IN ('4624', '4656')
AND LOWER(status) NOT IN ('failure')
GROUP BY
name, domain, type, event_id
)
SELECT
from_unixtime(event_time, 'dd-MMM-yy HH:mm:ss') as timestamp,
name, domain, type, event_id
FROM cte
WHERE
event_time IN (
SELECT
MAX(event_time)
FROM cte
GROUP BY
name
)
""".format(device=device, delta=delta, org=org))
result = df.toPandas()
...
I'm trying to get the latest events (matching some parameters) from the table for names that did not appear in the 120 days before the last {delta} days.
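The two time windows the query compares can be made explicit with a small Python sketch. It only mirrors the arithmetic of the WHERE clauses and does not touch Spark. `query_bounds` and `is_new_name` are hypothetical helper names, the `org` filter of the subquery is deliberately left out, and the 7-day delta in the checks is an assumed value.

```python
import time
from typing import Iterable, Optional, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def query_bounds(delta: int, lookback_days: int = 120,
                 now: Optional[int] = None) -> Tuple[int, int]:
    """Return (recent_start, lookback_start) in epoch seconds.

    Mirrors CAST(NOW() AS INT) - {delta}*24*60*60 and
    CAST(NOW() AS INT) - (120+{delta})*24*60*60 from the query.
    """
    if now is None:
        now = int(time.time())
    recent_start = now - delta * SECONDS_PER_DAY
    lookback_start = now - (lookback_days + delta) * SECONDS_PER_DAY
    return recent_start, lookback_start


def is_new_name(event_times: Iterable[int], recent_start: int,
                lookback_start: int) -> bool:
    """True if a name has a recent event but none in the lookback window."""
    times = list(event_times)
    # Outer WHERE: timestamp >= now - delta days
    seen_recently = any(t >= recent_start for t in times)
    # NOT IN subquery: lookback_start <= timestamp < recent_start
    seen_before = any(lookback_start <= t < recent_start for t in times)
    return seen_recently and not seen_before
```

A name only survives the NOT IN filter if none of its events fall in the 120-day band that ends where the recent window begins.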
My Spark config:
SparkConf()).config("spark.sql.execution.arrow.pyspark.enabled", "true").config("spark.driver.memory", "16G").config("spark.driver.maxResultSize", "8G").config("spark.sql.broadcastTimeout", 36000)
Here is an example of my table.
timestamp | type | aux1 | aux2 | event_id | dst.fqdn | dst.host | src.cat | src.host | src.hostname | src.ip | subsystem | title | vendor | action | severity | status | name | type | domain | device | org
1676619776 | norm | 0x80 | %%4423 | 4776 | do.uwc.com | uwc-nb-121 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | access | LOW | success | ifernin | account | uwc | mswin | uwc
1676637549 | norm | 0x100081 | %%1541%%4423 | 5145 | do.uwc.com | uwc-pc-022 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | access | LOW | success | kiersk | account | uwc | mswin | uwc
1676632328 | norm | 0x3e7 | 0xa18 | 4658 | do.uwc.com | uwc-nb-144 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | close | LOW | success | aseen | account | uwc | mswin | uwc
1676620697 | norm | 0x100081 | %%1541%%4416 | 5145 | do.uwc.com | uwc-nb-124 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | access | LOW | success | aseen | account | uwc | mswin | uwc
1676641029 | norm | 0x100081 | %%1541%%4416 | 5145 | do.uwc.com | 10.77.11.30 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | access | LOW | failure | aseen | account | uwc | mswin | uwc
1676540338 | norm | 0x80 | %%4423 | 5145 | eff2.uniwc.com | uwc-infra-fs1 | OperatingSystem | 192.168.40.100 | uwc-infra-fs1 | 192.168.40.100 | Security | Windows | Microsoft | access | LOW | success | pgolubeva | account | uwc | mswin | uwc
1676632808 | norm | | | 4776 | eff2.uniwc.com | uwc-infra-dc2 | OperatingSystem | 192.168.40.10 | uwc-infra-dc2 | 192.168.40.10 | Security | Windows | Microsoft | login | HIGH | ongoing | vdolgov | account | | mswin | uwc
1676540338 | norm | S-1-0-0 | | 4624 | eff2.uniwc.com | 192.168.40.12 | OperatingSystem | 192.168.40.10 | uwc-infra-dc2 | 192.168.40.10 | Security | Windows | Microsoft | login | LOW | failure | popovm | account | uwc.com | mswin | uwc
1676673260 | norm | 0x100081 | %%1541%%4416%%4423 | 5145 | eff2.uniwc.com | 10.77.11.51 | OperatingSystem | 192.168.40.10 | uwc-infra-fs1 | 192.168.40.10 | Security | Windows | Microsoft | access | LOW | ongoing | evteeva | account | twqp.uwc.com | mswin | uwc
1676540338 | norm | | | 4776 | eff2.uniwc.com | uwc-nb-101 | OperatingSystem | 192.168.40.10 | uwc-infra-dc2 | 192.168.40.10 | Security | Windows | Microsoft | login | HIGH | success | monitor | account | | mswin | uwc
The last SELECT statement can be rewritten with a window function to avoid the possibly costly IN subquery over the CTE; see the example query below:
WITH cte AS (
SELECT
MAX(timestamp) as event_time,
name, domain, type, event_id
FROM table1
WHERE
device = '{device}'
AND name IS NOT NULL
AND (timestamp >= (CAST(NOW() as INT) - {delta}*24*60*60))
AND name NOT IN (
SELECT
DISTINCT(name)
FROM table1
WHERE (
device = '{device}' AND
name IS NOT NULL AND org ='{org}' AND
((timestamp < CAST(NOW() as INT) - {delta}*24*60*60) AND
(timestamp >= CAST(NOW() as INT) - (120+{delta})*24*60*60)))
)
AND event_id NOT IN ('4624', '4656')
AND LOWER(status) NOT IN ('failure')
GROUP BY
name, domain, type, event_id
)
SELECT
from_unixtime(event_time, 'dd-MMM-yy HH:mm:ss') as timestamp,
name,
domain,
type,
event_id
FROM
(SELECT
event_time,
name, domain, type, event_id,
rank() over (partition by name order by event_time desc) as event_time_rank
FROM cte
) t
WHERE
event_time_rank = 1
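The rewrite also changes the result, not just the cost: the original `event_time IN (SELECT MAX(event_time) FROM cte GROUP BY name)` is not correlated by name, so a row can match another name's maximum. Below is a small self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for Spark SQL (SQLite 3.25+ supports `RANK() OVER`); the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows shaped like the CTE output
conn.execute("CREATE TABLE cte (event_time INTEGER, name TEXT, event_id TEXT)")
conn.executemany("INSERT INTO cte VALUES (?, ?, ?)", [
    (100, "aseen", "5145"),
    (200, "aseen", "4658"),   # aseen's latest event
    (100, "kiersk", "5145"),  # kiersk's only event, same time as aseen's older one
])

# Original pattern: compares against every name's maximum
in_subquery = conn.execute("""
    SELECT event_time, name FROM cte
    WHERE event_time IN (SELECT MAX(event_time) FROM cte GROUP BY name)
    ORDER BY name, event_time
""").fetchall()

# Window-function pattern: ranks events within each name
windowed = conn.execute("""
    SELECT event_time, name FROM (
        SELECT event_time, name,
               RANK() OVER (PARTITION BY name ORDER BY event_time DESC)
                   AS event_time_rank
        FROM cte
    ) t
    WHERE event_time_rank = 1
    ORDER BY name
""").fetchall()
conn.close()
```

Here the IN version also returns aseen's older event at 100, while the window version returns exactly one latest row per name. `RANK()` keeps ties; `ROW_NUMBER()` would return a single row per name even when timestamps tie.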
We noticed an issue across all our projects starting Feb 10th where the first_open event contains an empty campaign name (traffic_source.name column). The traffic_source.source and traffic_source.medium columns contain the right values indicating the users are from Google Ads campaigns (google and cpc respectively).
Here is the BigQuery query (just replace {PROJECT} and {ID}):
SELECT event_date, traffic_source.source, traffic_source.medium, traffic_source.name, count(*)
FROM `{PROJECT}.analytics_{ID}.events_202202*`
WHERE event_name = 'first_open' AND traffic_source.medium = 'cpc' and traffic_source.name IS NULL
group by 1,2,3,4
order by 1,2,3,4
I've checked the first_open attribution on Firebase console and it seems to be fine. We suspect the issue is related to the export to BigQuery only.
Has anyone else noticed this?
I am using Entity Framework to connect to SQL Azure, with data pushed from Azure Functions.
Today, during one particular 10-minute interval, the function threw errors like the following:
An exception has been raised that is likely due to a transient failure. If you are connecting to a SQL Azure database consider using SqlAzureExecutionStrategy
When I looked at the SQL database statistics, usage had reached 99% during that time and was back to normal afterwards.
How can I find out, using the Azure portal, how many transactions were executed during that timeframe?
That would probably give me an idea of what caused this load on the server.
What you are seeing may be throttling. When throttling occurs, connections are affected; poorly written code can also open an excessive number of connections, and every service tier limits the number of connections. The queries below will help you monitor successful, terminated, and throttled connections.
select *
from sys.database_connection_stats_ex
where start_time >= CAST(FLOOR(CAST(getdate() AS float)) AS DATETIME)
order by start_time desc
select *
from sys.event_log
where event_type <> 'connection_successful' and
start_time >= CAST(FLOOR(CAST(getdate() AS float)) AS DATETIME)
order by start_time desc
To monitor when your databases reach their DTU limit, you can use the query below:
SELECT
(COUNT(end_time) - SUM(CASE WHEN avg_cpu_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'CPU Fit Percent',
(COUNT(end_time) - SUM(CASE WHEN avg_log_write_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Log Write Fit Percent',
(COUNT(end_time) - SUM(CASE WHEN avg_data_io_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Physical Data Read Fit Percent'
FROM sys.dm_db_resource_stats
If the query above returns a value below 99.9% (the service level objective, SLO) for any of the three columns, consider moving to the next service tier.
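The "fit percent" arithmetic can be checked offline with a short Python sketch. It reproduces `(COUNT - SUM(CASE WHEN x > 80 THEN 1 ELSE 0 END)) / COUNT` on a list of utilisation samples and applies the 99.9% rule from the sentence above. The sample values and the helper names `fit_percent` and `should_scale_up` are hypothetical.

```python
SLO = 0.999  # 99.9% target from the rule above


def fit_percent(samples, threshold=80.0):
    """Share of samples at or below `threshold` (a sample > threshold counts as a miss)."""
    if not samples:
        raise ValueError("need at least one sample")
    over = sum(1 for s in samples if s > threshold)
    return (len(samples) - over) / len(samples)


def should_scale_up(cpu, log_write, data_io, threshold=80.0, slo=SLO):
    """True if any of the three resource dimensions falls below the SLO."""
    return any(fit_percent(s, threshold) < slo for s in (cpu, log_write, data_io))


quiet = [50.0] * 1000               # hypothetical avg_*_percent readings
busy = [50.0] * 998 + [95.0] * 2    # 2 of 1000 samples above 80%
```

With these made-up numbers, two hot samples out of 1000 already give a fit of 0.998, which is below 0.999.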
To identify the queries causing high I/O, high CPU, or other high resource usage, you can use Query Store; it shows which queries are behind the high DTU usage.
-- Top 10 long running queries
SELECT TOP 10 q.query_id, p.plan_id,
rs.count_executions,
qsqt.query_sql_text,
CONVERT(NUMERIC(10,2), (rs.avg_cpu_time/1000)) as 'avg_cpu_time_ms', -- Query Store reports times in microseconds
CONVERT(NUMERIC(10,2),(rs.avg_duration/1000)) as 'avg_duration_ms',
CONVERT(NUMERIC(10,2),rs.avg_logical_io_reads ) as 'avg_logical_io_reads',
CONVERT(NUMERIC(10,2),rs.avg_logical_io_writes ) as 'avg_logical_io_writes',
CONVERT(NUMERIC(10,2),rs.avg_physical_io_reads ) as 'avg_physical_io_reads',
CONVERT(NUMERIC(10,0),rs.avg_rowcount ) as 'avg_rowcount'
from sys.query_store_query q
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_query_text qsqt
ON q.query_text_id = qsqt.query_text_id
WHERE rs.last_execution_time > dateadd(hour, -1, getutcdate())
ORDER BY rs.avg_duration DESC
I want to filter NHT out of my BigQuery results using some criteria I found in my Google Analytics dataset. For example, I want to filter out these two sets of criteria:
networkLocation REGEXP_Contains (r"^(ovh \(nwk\)|hostwinds llc.|bhost inc|prisma networks llc|psychz networks|buyvm services|private customer|secure dragon llc.|vmpanel|netaction telecom srl-d|hostigation|frontlayer technologies inc.|digital energy technologies limited|owned-networks|rica web services|netaction telecom srl-d|hurricane electric inc.|private customer - host.howpick.com|ssdvirt|sway broadband|detect network|gorillaservers inc.|micfo llc.| netaction telecom srl|egihosting|zenlayer inc|intercom online inc.|gs1 argentine|ovh hosting inc.|vps cheap inc.|limeip networks|blackhost ltd.|amazon.com inc.)$")
AND
device.browserVersion REGEXP_Contains(r"^(41.0|55.0)$")
OR
networkLocation REGEXP_Contains ("^(hpro group ltd)$")
AND
device.browserVersion REGEXP_Contains("45.0")
My SQL:
SELECT
channelGrouping,
date,
h.page.pagePath AS Page,
SUM(totals.timeOnSite) AS Session_Duration,
SUM(totals.visits) AS Visits,
AVG(totals.timeonSite/totals.visits) AS Avg_Time_per_Session,
SUM(totals.bounces) AS Bounce,
(SUM(totals.bounces)/SUM(totals.visits)) AS Bounce_rate,
geoNetwork.networkLocation,
device.browserVersion,
device.browser
FROM
`93868086.ga_sessions_*`,
UNNEST(hits) as h
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
date,
channelGrouping,
geoNetwork.networkLocation,
device.browserVersion,
device.browser,
h.page.pagePath
I need a HAVING NOT clause, but I am not sure how to group the conditions so that both sets of criteria are filtered out. Any help would be great!
Assuming your expressions for the criteria are correct, the following should work:
HAVING NOT (
(
REGEXP_CONTAINS (networkLocation, r"^(ovh \(nwk\)|hostwinds llc.|bhost inc|prisma networks llc|psychz networks|buyvm services|private customer|secure dragon llc.|vmpanel|netaction telecom srl-d|hostigation|frontlayer technologies inc.|digital energy technologies limited|owned-networks|rica web services|netaction telecom srl-d|hurricane electric inc.|private customer - host.howpick.com|ssdvirt|sway broadband|detect network|gorillaservers inc.|micfo llc.| netaction telecom srl|egihosting|zenlayer inc|intercom online inc.|gs1 argentine|ovh hosting inc.|vps cheap inc.|limeip networks|blackhost ltd.|amazon.com inc.)$")
AND REGEXP_CONTAINS(device.browserVersion, r"^(41.0|55.0)$")
) OR (
REGEXP_CONTAINS (networkLocation, r"^(hpro group ltd)$")
AND REGEXP_CONTAINS(device.browserVersion, r"45.0")
)
)
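Why the outer parentheses matter can be seen with a Python sketch that evaluates the four conditions (A, B, C, D) and combines them both ways. Python's `re.search` stands in for BigQuery's `REGEXP_CONTAINS` (both match anywhere unless the pattern is anchored; BigQuery uses RE2, but these simple patterns behave the same). The hosting-provider pattern is a shortened, hypothetical subset of the one in the question.

```python
import re

# Shortened, hypothetical stand-ins for the patterns in the question
HOSTING = re.compile(r"^(ovh \(nwk\)|hostwinds llc.|amazon.com inc.)$")
VERSIONS_1 = re.compile(r"^(41.0|55.0)$")
HPRO = re.compile(r"^(hpro group ltd)$")
VERSION_2 = re.compile(r"45.0")


def criteria(network_location, browser_version):
    """Evaluate A, B, C, D as booleans."""
    return (
        HOSTING.search(network_location) is not None,
        VERSIONS_1.search(browser_version) is not None,
        HPRO.search(network_location) is not None,
        VERSION_2.search(browser_version) is not None,
    )


def keep_row(network_location, browser_version):
    # HAVING NOT ((A AND B) OR (C AND D))
    a, b, c, d = criteria(network_location, browser_version)
    return not ((a and b) or (c and d))


def keep_row_without_parens(network_location, browser_version):
    # HAVING NOT A AND B OR C AND D: NOT binds tightest, then AND, then OR,
    # so this means ((NOT A) AND B) OR (C AND D)
    a, b, c, d = criteria(network_location, browser_version)
    return (not a and b) or (c and d)
```

Without the outer grouping, an ordinary row such as ("comcast cable", "60.0") is dropped, while an hpro row with version 45.0 is kept. Note also that the unescaped `.` in patterns like `41.0` matches any character, so `4100` would match too.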
I have a table with speaker, session, conference and email columns. My goal is a query that combines the conference and session into one field, so that we can apply some HTML to it and format it when we preview it elsewhere.
The issue arises when a speaker is attending two different conferences and presenting different sessions at each. This query somehow lists the sessions from both conferences under each conference:
SELECT speaker AS 'speakername', email AS 'email',
CAST(
(SELECT conference AS 'strong',
(SELECT session AS 'session' from speakersessions AS ds
WHERE ds.speaker = dd.speaker
GROUP BY session
for xml path(''), type) AS 'sessions'
FROM speakersessions AS ds
WHERE ds.speaker = dd.speaker
GROUP BY conference
for xml path(''), type)
AS NVARCHAR(MAX))
AS 'conferences'
FROM speakersessions AS dd
GROUP BY speaker, email
The results shown for speaker 'greg' are:
<strong>Business Planning </strong>
<sessions><session>
10 tips to fast-track
</session><session>
Hybrid planning
</session><session>
Planning on the cloud
</session><session>
The Boardroom
</session></sessions>
<strong>Reporting Analytics</strong>
<sessions><session>
10 tips to fast-track
</session><session>
Hybrid planning
</session><session>
Planning on the cloud
</session><session>
The Boardroom
</session></sessions> <br/>
(I added line breaks)
But, as you can see, this is not what the speakersessions table contains:
Conference | Session
Business Planning | 10 tips to fast-track
Business Planning | Hybrid planning
Reporting Analytics | Planning on the cloud
Reporting Analytics | The Boardroom
So each conference lists all of greg's sessions instead of only its own. What's going on here?
SELECT speaker AS 'speakername', email,
CAST(
(SELECT conference AS 'strong',
(SELECT session AS 'session'
from speakersessions AS part3
WHERE part3.conference = part2.conference and part3.email = part1.email
GROUP BY session,conference
for xml path(''), type) AS 'sessions'
FROM speakersessions AS part2
WHERE part1.email = part2.email
GROUP BY part2.conference
for xml path(''), type)
AS NVARCHAR(MAX)) AS 'conferences'
FROM speakersessions AS part1
GROUP BY email, speaker
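The fix was to correlate the innermost subquery on the conference as well (part3.conference = part2.conference), so that each conference only pulls in its own sessions instead of every session for that speaker.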
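The only difference between the two queries is which columns the innermost subquery is correlated on. A plain-Python sketch using the rows from the example makes the effect visible. It is an analogy, not SQL Server: it skips FOR XML escaping, sorts sessions for a deterministic order, and `render` and its `correlate_on_conference` flag are hypothetical names.

```python
# Rows from the speakersessions example: (speaker, conference, session)
ROWS = [
    ("greg", "Business Planning", "10 tips to fast-track"),
    ("greg", "Business Planning", "Hybrid planning"),
    ("greg", "Reporting Analytics", "Planning on the cloud"),
    ("greg", "Reporting Analytics", "The Boardroom"),
]


def render(rows, speaker, correlate_on_conference):
    """Build the <strong>/<sessions> markup for one speaker."""
    mine = [(conf, sess) for sp, conf, sess in rows if sp == speaker]
    parts = []
    for conference in sorted({conf for conf, _ in mine}):
        if correlate_on_conference:
            # Fixed query: part3.conference = part2.conference
            sessions = sorted({s for c, s in mine if c == conference})
        else:
            # Original query: inner subquery filtered on speaker only
            sessions = sorted({s for _, s in mine})
        inner = "".join(f"<session>{s}</session>" for s in sessions)
        parts.append(f"<strong>{conference}</strong><sessions>{inner}</sessions>")
    return "".join(parts)


buggy = render(ROWS, "greg", correlate_on_conference=False)
fixed = render(ROWS, "greg", correlate_on_conference=True)
```

`buggy` reproduces the question's output (all four sessions under both conferences), while `fixed` puts each session under its own conference.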
I'm currently trying to teach myself SQL in order to write better reports with our Orion system and I'm running into a small issue. I want to generate a report with a count of the number of Windows machines and Linux machines. This is my current code.
SELECT OperatingSystem, Count(OperatingSystem) AS TotalMachines
FROM Machines
Where
(
(OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
)
GROUP BY OperatingSystem
And this is the result I get:
OperatingSystem | TotalMachines
Red Hat Enterprise Linux | 20
Novell SUSE Linux Enterprise | 17
Debian Linux | 5
Windows Server 2008 (32-bit) | 11
Windows Server 2008 R2 (32-bit) | 49
Windows Server 2008 (64-bit) | 33
Windows Server 2008 R2 (64-bits) | 16
Windows Server 2003 (32-bit) | 35
Is it possible to combine all of the different Linux Operating Systems into a single row called Linux and combine all of the Windows Operating Systems into a single row called Windows in an SQL Query?
Yes. Use a CASE expression in the GROUP BY clause itself:
SELECT (case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end) as WhichOs, Count(*) AS TotalMachines
FROM Machines
Where (OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
GROUP BY (case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end);
EDIT:
The above should work (note that the same expression is in both the SELECT and the GROUP BY). If it doesn't, perhaps this will:
SELECT WhichOs, Count(*) AS TotalMachines
FROM (SELECT m.*,
(case when OperatingSystem LIKE '%Windows%' then 'Windows'
when OperatingSystem LIKE '%Linux%' then 'Linux'
end) as WhichOs
FROM Machines m
) m
Where (OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
GROUP BY WhichOs;
To solve such problems I prefer UNION ALL over CASE, as it makes the query easy to extend in the future:
select OSType, count(*) as TotalMachines
from (
SELECT 'Linux' as OSType FROM Machines WHERE OperatingSystem LIKE '%Linux%'
UNION ALL
SELECT 'Windows' as OSType FROM Machines WHERE OperatingSystem LIKE '%Windows%'
) as subquery
GROUP BY OSType
In any case, test both variants and pick the faster one.
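Before comparing speed, it can help to confirm that the CASE and UNION ALL variants return the same counts. Below is a sketch that uses Python's built-in sqlite3 as a stand-in for the Orion database, loaded with the per-OS counts from the question. The second dataset (`MIXED`) is a hypothetical OS string that contains both words, which shows where the two approaches differ: CASE assigns each machine to the first matching branch, while UNION ALL counts it once per matching branch.

```python
import sqlite3

CASE_SQL = """
    SELECT CASE WHEN OperatingSystem LIKE '%Windows%' THEN 'Windows'
                WHEN OperatingSystem LIKE '%Linux%' THEN 'Linux'
           END AS WhichOs,
           COUNT(*) AS TotalMachines
    FROM Machines
    WHERE OperatingSystem LIKE '%Windows%' OR OperatingSystem LIKE '%Linux%'
    GROUP BY CASE WHEN OperatingSystem LIKE '%Windows%' THEN 'Windows'
                  WHEN OperatingSystem LIKE '%Linux%' THEN 'Linux'
             END
    ORDER BY WhichOs
"""

UNION_SQL = """
    SELECT OSType, COUNT(*) AS TotalMachines
    FROM (
        SELECT 'Linux' AS OSType FROM Machines WHERE OperatingSystem LIKE '%Linux%'
        UNION ALL
        SELECT 'Windows' AS OSType FROM Machines WHERE OperatingSystem LIKE '%Windows%'
    ) AS subquery
    GROUP BY OSType
    ORDER BY OSType
"""

# Counts per OperatingSystem, taken from the question's output
QUESTION_COUNTS = {
    "Red Hat Enterprise Linux": 20,
    "Novell SUSE Linux Enterprise": 17,
    "Debian Linux": 5,
    "Windows Server 2008 (32-bit)": 11,
    "Windows Server 2008 R2 (32-bit)": 49,
    "Windows Server 2008 (64-bit)": 33,
    "Windows Server 2008 R2 (64-bits)": 16,
    "Windows Server 2003 (32-bit)": 35,
}

# Hypothetical OS string matching both patterns
MIXED = {"Windows Subsystem for Linux": 1}


def run(sql, counts):
    """Load one row per machine into an in-memory Machines table and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Machines (OperatingSystem TEXT)")
    conn.executemany(
        "INSERT INTO Machines VALUES (?)",
        [(name,) for name, n in counts.items() for _ in range(n)],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

On the question's data, both queries return Linux 42 and Windows 144. SQLite timings say nothing about the real server, so the speed comparison still has to be done there.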
Try something like this:
SELECT
case
when OperatingSystem like '%Windows%' then 'Windows'
when OperatingSystem like '%Linux%' then 'Linux'
else 'Other'
end as Operating_System
, Count(OperatingSystem) AS TotalMachines
FROM Machines
Where
(
(OperatingSystem LIKE '%Windows%') OR
(OperatingSystem LIKE '%Linux%')
)
GROUP BY
case
when OperatingSystem like '%Windows%' then 'Windows'
when OperatingSystem like '%Linux%' then 'Linux'
else 'Other'
end