Azure Stream Analytics: combine data with the same time - sql

I am working with the Azure Stream Analytics query language and I have some problems with data preparation. This is my current output,
but I want to combine the data that has the same time.
For example, the first line should be
{"46027020", "#A83","2017-05-18T08:47:26.5620000Z"}
with the headers "IGEF_NR", "Decklack" and "time", and the
second line: {"46027070", "#475","2017-05-18T08:49:20.1750000Z"}

You will need to split the filter sub-query into separate "IGEF_NR" and "DECKLACK" sub-queries, then join the two on time. Something like below:
filter_IGEF AS
(
SELECT sub_id, value,time
FROM stringfilter
WHERE sub_id = 'IGEF_Nr'
),
filter_Decklack AS
(
SELECT sub_id, value,time
FROM stringfilter
WHERE sub_id = 'Decklack'
),
joined_value AS
(
SELECT
i.value AS IGEF_Nr,
d.value AS Decklack,
i.time as time
FROM filter_IGEF i
JOIN filter_Decklack d
ON i.time = d.time
AND DATEDIFF(second, i, d) BETWEEN 0 AND 5
)
SELECT IGEF_Nr, Decklack, time
INTO csv
FROM joined_value
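To sanity-check the join logic outside Azure, here is a minimal sketch that replays it with Python's built-in sqlite3 module. It assumes the rows already have the (sub_id, value, time) shape of the answer's stringfilter step; the sample values come from the question. SQLite has no Stream Analytics temporal semantics, so only the exact i.time = d.time match is reproduced, not the DATEDIFF window.

```python
import sqlite3

# Hypothetical rows in the (sub_id, value, time) shape that the answer's
# stringfilter step is assumed to produce; values are from the question.
ROWS = [
    ("IGEF_Nr", "46027020", "2017-05-18T08:47:26.5620000Z"),
    ("Decklack", "#A83", "2017-05-18T08:47:26.5620000Z"),
    ("IGEF_Nr", "46027070", "2017-05-18T08:49:20.1750000Z"),
    ("Decklack", "#475", "2017-05-18T08:49:20.1750000Z"),
]


def combine_by_time(rows):
    """Pivot IGEF_Nr/Decklack rows into one row per timestamp via a join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stringfilter (sub_id TEXT, value TEXT, time TEXT)")
    con.executemany("INSERT INTO stringfilter VALUES (?, ?, ?)", rows)
    query = """
        WITH filter_IGEF AS (
            SELECT value, time FROM stringfilter WHERE sub_id = 'IGEF_Nr'
        ), filter_Decklack AS (
            SELECT value, time FROM stringfilter WHERE sub_id = 'Decklack'
        )
        SELECT i.value AS IGEF_Nr, d.value AS Decklack, i.time AS time
        FROM filter_IGEF i
        JOIN filter_Decklack d ON i.time = d.time  -- exact match only, no DATEDIFF window
        ORDER BY i.time
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()


for row in combine_by_time(ROWS):
    print(row)
```

Each timestamp ends up as a single row holding both the IGEF_Nr and the Decklack value.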

Related

Oracle SQL - Timestamp splits query result into 2 rows, Need all in one with

I need a time-based query (Random or Current) with all results in one row. My current query is as follows:
WITH started AS
(
SELECT f.*, CURRENT_DATE + ROWNUM / 24
FROM
(
SELECT
d.route_name,
d.op_name,
d.route_step_name,
nvl(MAX(DECODE(d.complete_reason, NULL, d.op_STARTS)), 0) started_units,
round(nvl(MAX(DECODE(d.complete_reason, 'PASS', d.op_complete)), 0) / d.op_starts * 100, 2) yield
FROM
(
SELECT route_name,
op_name,
route_step_name,
complete_reason,
complete_quantity,
sum(start_quantity) OVER(PARTITION BY route_name, op_name, COMPLETE_REASON) op_starts,
sum(complete_quantity) OVER(PARTITION BY route_name, op_name, COMPLETE_REASON ) op_complete
FROM FTPC_LT_PRDACT.tracked_object_history
WHERE route_name = 'HEADER FINAL ASSEMBLY'
AND OP_NAME NOT LIKE '%DISPOSITION%'
and (tobj_type = 'Lot')
AND xfr_insert_pid IN
(
SELECT xfr_start_id
FROM FTPC_LT_PRDACT.xfr_interval_id
WHERE last_modified_time <= SYSDATE
AND OP_NAME NOT LIKE '%DISPOSITION%'
and (complete_reason = 'PASS' OR complete_reason IS NULL)
)
) d
GROUP BY d.route_name, d.op_name, d.route_step_name, complete_reason, d.op_starts
ORDER BY d.route_step_name
) f
),
queued AS
(
SELECT
ts.route_name,
ts.queue_name,
o.op_name,
sum (th.complete_quantity) queued_units
FROM
FTPC_LT_PRDACT.tracked_object_HISTORY th,
FTPC_LT_PRDACT.tracked_object_status ts,
FTPC_LT_PRDACT.route_arc a,
FTPC_LT_PRDACT.route_step r,
FTPC_LT_PRDACT.operation o,
FTPC_LT_PRDACT.lot l
WHERE r.op_key = o.op_key
and l.lot_key = th.tobj_key
AND a.to_node_key = r.route_step_key
AND a.from_node_key = ts.queue_key
and th.tobj_history_key = ts.tobj_history_key
AND a.main_path = 1
AND (ts.tobj_type = 'Lot')
AND O.OP_NAME NOT LIKE '%DISPOSITION%'
and th.route_name = 'HEADER FINAL ASSEMBLY'
GROUP BY ts.route_name, ts.queue_name, o.op_name
)
SELECT
started.route_name,
started.op_name,
started.route_step_name,
max(started.yield) started_yield,
max(started.started_units) started_units,
case when queued.queue_name is NULL then 'N/A' else queued.queue_name end QUEUE_NAME,
case when queued.queued_units is NULL then 0 else queued.queued_units end QUEUED_UNITS
FROM started
left JOIN queued ON started.op_name = queued.op_name
group by started.route_name, started.op_name, started.route_step_name, queued.queue_name, QUEUED_UNITS
order by started.route_step_name asc
;
Current Query (as expected) but missing timestamp:
I need a timestamp on each individual row so that a different application can display the results. Any help would be greatly appreciated! When I try to add a timestamp, my query's results change:
Query once timestamp is added:
Edit: I need to display the query results in a visualization tool. That tool is time-based and will skew the table results unless there is a datetime associated with each row. The datetime value can be random, but it cannot be the same for every result.
The query is displayed on a live dashboard; every time the application is refreshed, the query results are expected to update.
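No answer is included here, but one common pattern (an assumption, not something from the thread) is to attach the timestamp only at the outermost level, after all grouping. In the query above, each row of started likely gets its own CURRENT_DATE + ROWNUM / 24 value, so selecting or grouping by it keeps apart the rows that max() would otherwise collapse. The sketch below shows the idea with Python's sqlite3 module, using a hypothetical history table, a fixed base time instead of the current date so the output is repeatable, and ROW_NUMBER() in place of Oracle's ROWNUM.

```python
import sqlite3


def aggregate_with_row_times(rows, base="2017-01-01 00:00:00"):
    """rows: (op_name, units) tuples standing in for the real history tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE history (op_name TEXT, units INTEGER)")
    con.executemany("INSERT INTO history VALUES (?, ?)", rows)
    query = """
        WITH aggregated AS (
            SELECT op_name, SUM(units) AS started_units
            FROM history
            GROUP BY op_name
        ), numbered AS (
            -- numbering happens after GROUP BY, so it cannot split a group
            SELECT op_name, started_units,
                   ROW_NUMBER() OVER (ORDER BY op_name) AS rn
            FROM aggregated
        )
        SELECT op_name, started_units,
               datetime(?, '+' || rn || ' hours') AS row_time  -- distinct per row
        FROM numbered
        ORDER BY op_name
    """
    try:
        return con.execute(query, (base,)).fetchall()
    finally:
        con.close()


print(aggregate_with_row_times([("A", 1), ("A", 2), ("B", 5)]))
```

In Oracle the equivalent would be wrapping the final SELECT and adding the CURRENT_DATE + ROWNUM / 24 expression in the wrapper rather than in the started CTE.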

SQL joining a separate query as a column in original query

I am struggling with joining the below two queries.
My main query is the first one below, and what I am trying to achieve is to add the opt-in rate column output by query 2 as a new column in my original query.
SELECT CAST("public"."event_event"."date_created" AS date) AS "date_created", "marketing_message__via__messag"."name" AS "name", count(*) AS "count"
FROM "public"."event_event"
LEFT JOIN "public"."marketing_message" "marketing_message__via__messag" ON "public"."event_event"."message_id" = "marketing_message__via__messag"."id" LEFT JOIN "public"."marketing_campaign" "marketing_campaign__via__campa" ON "public"."event_event"."campaign_id" = "marketing_campaign__via__campa"."id"
WHERE (date_trunc('month', CAST("public"."event_event"."date_created" AS timestamp)) = date_trunc('month', CAST(now() AS timestamp))
AND "marketing_message__via__messag"."name" IS NOT NULL AND ("marketing_message__via__messag"."name" <> ''
OR "marketing_message__via__messag"."name" IS NULL) AND "public"."event_event"."stage" = 'Lead' AND ("marketing_campaign__via__campa"."name" = 'a'
OR "marketing_campaign__via__campa"."name" = 'b'
OR "marketing_campaign__via__campa"."name" = 'c'
OR "marketing_campaign__via__campa"."name" = 'c1' OR "marketing_campaign__via__campa"."name" = 'd'
OR "marketing_campaign__via__campa"."name" = 'e'))
GROUP BY CAST("public"."event_event"."date_created" AS date), "marketing_message__via__messag"."name"
ORDER BY CAST("public"."event_event"."date_created" AS date) ASC, "marketing_message__via__messag"."name" ASC
I would like to add the following query output for "Opt-In rate" to my query above as a new column.
Query 2
SELECT marketing_message.message_text,
cast(sum((event_event.status='Opt-in')::int) as decimal) / nullif(sum((event_event.status='Sent')::int), 0)* 100 as "Opt-in Rate (Sent)"
FROM event_event
JOIN marketing_campaign ON event_event.campaign_id = marketing_campaign.id
JOIN marketing_message ON event_event.message_id = marketing_message.id
WHERE True [[AND {{campaign_name}}]] [[AND {{date_created}}]]
GROUP BY marketing_message.message_text
The short answer is that you can't.
Your first query is keyed on date and name (the GROUP BY clause) and your second query is keyed on message_text. As there is no relationship between the two datasets there is no way of joining/combining them in a single query.
You would need to find a common field (or fields) between the two datasets and join on them - but this won't give the same results that you have at the moment as your queries would need to be completely restructured.
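As a rough illustration of joining on a common field, the sketch below assumes the message name is that common key (the real schema may differ) and uses Python's sqlite3 with a simplified, hypothetical event_event table. Each query is aggregated in its own CTE and the two are then joined. Note that the opt-in rate is computed per message across all dates, so it repeats on every daily row, which is the kind of restructured result described above. SUM(CASE ...) stands in for PostgreSQL's (...)::int cast.

```python
import sqlite3


def counts_with_opt_in_rate(events):
    """events: (date_created, message_name, status) tuples (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE event_event (date_created TEXT, message_name TEXT, status TEXT)"
    )
    con.executemany("INSERT INTO event_event VALUES (?, ?, ?)", events)
    query = """
        WITH daily AS (          -- query 1: keyed on (date, name)
            SELECT date_created, message_name, COUNT(*) AS cnt
            FROM event_event
            GROUP BY date_created, message_name
        ), rates AS (            -- query 2: keyed on name only
            SELECT message_name,
                   100.0 * SUM(CASE WHEN status = 'Opt-in' THEN 1 ELSE 0 END)
                   / NULLIF(SUM(CASE WHEN status = 'Sent' THEN 1 ELSE 0 END), 0)
                   AS opt_in_rate
            FROM event_event
            GROUP BY message_name
        )
        SELECT d.date_created, d.message_name, d.cnt, r.opt_in_rate
        FROM daily d
        LEFT JOIN rates r ON r.message_name = d.message_name  -- the common field
        ORDER BY d.date_created, d.message_name
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()
```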

How to get yes/no statistics from SQL of how often strings occur each

Is there a way to query a table from the BigQuery project HTTPArchive to check how often certain strings occur per file type?
I was able to write a query for a single check, but how can I run it for multiple strings at once, without sending the same query every time with just a different string check and processing the ~800GB of table data every time?
Could getting the results as an array work somehow? I want to publish in-depth monthly statistics to the public for free, so sending those queries separately and being billed roughly $2000/month for querying is not an option for me as a student.
SELECT matched, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM (
SELECT url, (LOWER(body) CONTAINS 'document.write') AS matched
FROM httparchive.har.2017_09_01_chrome_requests_bodies
WHERE url LIKE "%.js"
)
GROUP BY matched
Please note that this is just one example of many (~50), and the pre-generated stats are not what I am looking for, as they don't contain the needed information.
Below is for BigQuery Standard SQL
#standardSQL
WITH strings AS (
SELECT LOWER(str) str FROM UNNEST(['abc', 'XYZ']) AS str
), files AS (
SELECT LOWER(ext) ext FROM UNNEST(['JS', 'go', 'php'])AS ext
)
SELECT
ext, str, COUNT(1) total,
COUNTIF(REGEXP_CONTAINS(LOWER(body), str)) matches,
ROUND(COUNTIF(REGEXP_CONTAINS(LOWER(body), str)) / COUNT(1), 3) ratio
FROM `httparchive.har.2017_09_01_chrome_requests_bodies` b
JOIN files f ON LOWER(url) LIKE CONCAT('%.', ext)
CROSS JOIN strings s
GROUP BY ext, str
-- ORDER BY ext, str
You can test/play with the above using [totally] dummy data as below:
#standardSQL
WITH `httparchive.har.2017_09_01_chrome_requests_bodies` AS (
SELECT '1234.js' AS url, 'abc=1;x=2' AS body UNION ALL
SELECT 'qaz.js', 'y=1;xyz=0' UNION ALL
SELECT 'edc.go', 's=1;xyz=2;abc=3' UNION ALL
SELECT 'edc.go', 's=1;xyz=4;abc=5' UNION ALL
SELECT 'rfv.php', 'd=1' UNION ALL
SELECT 'tgb.txt', '?abc=xyz' UNION ALL
SELECT 'yhn.php', 'like v' UNION ALL
SELECT 'ujm.go', 'lkjsad' UNION ALL
SELECT 'ujm.go', 'yhj' UNION ALL
SELECT 'ujm.go', 'dfgh' UNION ALL
SELECT 'ikl.js', 'werwer'
), strings AS (
SELECT LOWER(str) str FROM UNNEST(['abc', 'XYZ']) AS str
), files AS (
SELECT LOWER(ext) ext FROM UNNEST(['JS', 'go', 'php'])AS ext
)
SELECT
ext, str, COUNT(1) total,
COUNTIF(REGEXP_CONTAINS(LOWER(body), str)) matches,
ROUND(COUNTIF(REGEXP_CONTAINS(LOWER(body), str)) / COUNT(1), 3) ratio
FROM `httparchive.har.2017_09_01_chrome_requests_bodies` b
JOIN files f ON LOWER(url) LIKE CONCAT('%.', ext)
CROSS JOIN strings s
GROUP BY ext, str
ORDER BY ext, str
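Below is a local sketch of the same cross-join idea in Python's built-in sqlite3 module, using the dummy data from the answer. Because all strings are checked in one query, the body and url columns are read once rather than once per string. SQLite has no REGEXP_CONTAINS, so INSTR does a plain substring test instead. That is arguably closer to the question's CONTAINS: REGEXP_CONTAINS treats each string as a regular expression, so the "." in 'document.write' would match any character. In BigQuery, STRPOS(LOWER(body), str) > 0 is a literal alternative.

```python
import sqlite3

# The answer's dummy data, copied as-is.
BODIES = [
    ("1234.js", "abc=1;x=2"), ("qaz.js", "y=1;xyz=0"),
    ("edc.go", "s=1;xyz=2;abc=3"), ("edc.go", "s=1;xyz=4;abc=5"),
    ("rfv.php", "d=1"), ("tgb.txt", "?abc=xyz"), ("yhn.php", "like v"),
    ("ujm.go", "lkjsad"), ("ujm.go", "yhj"), ("ujm.go", "dfgh"),
    ("ikl.js", "werwer"),
]


def string_stats(bodies, strings, exts):
    """Count literal-substring matches for every (extension, string) pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bodies (url TEXT, body TEXT)")
    con.execute("CREATE TABLE strings (str TEXT)")
    con.execute("CREATE TABLE files (ext TEXT)")
    con.executemany("INSERT INTO bodies VALUES (?, ?)", bodies)
    con.executemany("INSERT INTO strings VALUES (?)", [(s.lower(),) for s in strings])
    con.executemany("INSERT INTO files VALUES (?)", [(e.lower(),) for e in exts])
    query = """
        SELECT f.ext, s.str, COUNT(*) AS total,
               SUM(INSTR(LOWER(b.body), s.str) > 0) AS matches,
               ROUND(1.0 * SUM(INSTR(LOWER(b.body), s.str) > 0) / COUNT(*), 3) AS ratio
        FROM bodies b
        JOIN files f ON LOWER(b.url) LIKE '%.' || f.ext
        CROSS JOIN strings s   -- one pass over bodies for every string
        GROUP BY f.ext, s.str
        ORDER BY f.ext, s.str
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()


for row in string_stats(BODIES, ["abc", "XYZ"], ["JS", "go", "php"]):
    print(row)
```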
One method is to bring in a table with the different strings. This is the idea:
SELECT str, matched, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM (SELECT crb.url, s.str, (LOWER(crb.body) CONTAINS s.str) AS matched
FROM httparchive.har.2017_09_01_chrome_requests_bodies crb CROSS JOIN
(SELECT 'document.write' as str UNION ALL
SELECT 'xxx' as str
) s
WHERE url LIKE "%.js"
)
GROUP BY str, matched;
You would just add more strings to s.

Use of MAX function in SQL query to filter data

The code below joins two tables, and I need to extract only the latest date per account, since the tables hold multiple accounts and history records. I wanted to use the MAX function, but I'm not sure how to incorporate it in this case. I am using SQL Server.
Any help is appreciated!
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Property.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records on the same EffDate. If an account can have two such records, and there is no other objective means to determine the latest record, you could also use row_number to pick a somewhat arbitrary record in the case of a tie.
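The row_number variant can be tried out locally. This is a minimal sketch using Python's sqlite3 module (window functions need SQLite 3.25+) with a hypothetical, cut-down prop table. PolNo is assumed to identify the account, as in the answer, and FileName is an arbitrary tie-breaker chosen only to make the pick deterministic. The same ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) expression also works in SQL Server.

```python
import sqlite3


def latest_per_account(rows):
    """rows: (pol_no, eff_date, file_name); ISO dates so text order == date order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prop (pol_no TEXT, eff_date TEXT, file_name TEXT)")
    con.executemany("INSERT INTO prop VALUES (?, ?, ?)", rows)
    query = """
        SELECT pol_no, eff_date, file_name
        FROM (
            SELECT pol_no, eff_date, file_name,
                   ROW_NUMBER() OVER (
                       PARTITION BY pol_no
                       ORDER BY eff_date DESC, file_name  -- file_name breaks ties
                   ) AS rn
            FROM prop
        ) AS ranked
        WHERE rn = 1   -- exactly one row per account, even with ties
        ORDER BY pol_no
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()
```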
Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.PolNo = p.PolNo
-- count this account's records that are at least as recent as p
and p2.EffDate >= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
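The correlated-count idea can be checked with a small sketch in Python's sqlite3 module. The prop table and data are hypothetical, PolNo is assumed to be the account key, and a row is kept when at most n records of the same account are at least as recent as it. One property worth knowing: if two records tie on the latest date, each counts the other, so with n = 1 neither survives.

```python
import sqlite3


def top_n_per_account(rows, n):
    """Keep the n most recent records per account using a correlated COUNT."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prop (pol_no TEXT, eff_date TEXT, file_name TEXT)")
    con.executemany("INSERT INTO prop VALUES (?, ?, ?)", rows)
    query = """
        SELECT p.pol_no, p.eff_date, p.file_name
        FROM prop p
        WHERE (
            SELECT COUNT(*)
            FROM prop p2
            WHERE p2.pol_no = p.pol_no
              AND p2.eff_date >= p.eff_date   -- records at least as recent as p
        ) <= ?
        ORDER BY p.pol_no, p.eff_date DESC
    """
    try:
        return con.execute(query, (n,)).fetchall()
    finally:
        con.close()
```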
Not tested
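with temp1 as
(
-- latest FileTime per account (assumes PolNo identifies the account)
select PolNo, max(FileTime) as MaxFileTime
from Actuarial.dbo.PROP
group by PolNo
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on PROP.PolNo = t.PolNo
and PROP.FileTime = t.MaxFileTime
inner join Actuarial.dbo.PROP_DATA
on PROP.FileID = PROP_DATA.FileID
ORDER BY PROP.EffDate DESC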
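The idea of computing MAX in a CTE and joining back on it can be checked with a small sketch in Python's sqlite3 module. The prop table, its columns, and the data are hypothetical, and PolNo is again assumed to be the account key. Unlike ROW_NUMBER(), this pattern returns every row that ties on the maximum FileTime.

```python
import sqlite3


def latest_by_group_max(rows):
    """rows: (pol_no, file_time, file_name); keeps rows matching each account's MAX."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prop (pol_no TEXT, file_time TEXT, file_name TEXT)")
    con.executemany("INSERT INTO prop VALUES (?, ?, ?)", rows)
    query = """
        WITH temp1 AS (
            SELECT pol_no, MAX(file_time) AS max_file_time
            FROM prop
            GROUP BY pol_no
        )
        SELECT p.pol_no, p.file_time, p.file_name
        FROM prop p
        JOIN temp1 t
          ON p.pol_no = t.pol_no
         AND p.file_time = t.max_file_time   -- ties on the maximum all survive
        ORDER BY p.pol_no, p.file_name
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()
```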

speed up SQL Query

I have a query that takes a serious amount of time to execute on anything older than, say, the past hour's worth of data. It is going to become a view used for data mining, so the expectation is that it can search back weeks or months of data and return in a reasonable amount of time (even a couple of minutes is fine... I ran it for a date range of 10/3/2011 12:00pm to 10/3/2011 1:00pm and it took 44 minutes!)
The problem is with the two LEFT OUTER JOINs in the bottom. When I take those out, it can run in about 10 seconds. However, those are the bread and butter of this query.
This all comes from one table. The ONLY thing this query returns that differs from the original table is the column xweb_range. xweb_range is a calculated column (a range) that only uses the values from [LO,LC,RO,RC]_Avg where the corresponding [LO,LC,RO,RC]_Sensor_Alarm = 0 (a value is not included in the range calculation if its sensor alarm = 1).
WITH Alarm (sub_id,
LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm) AS (
SELECT sub_id, LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm
FROM dbo.some_table
where sub_id <> '0'
)
, AddRowNumbers AS (
SELECT rowNumber = ROW_NUMBER() OVER (ORDER BY LO_Avg)
, sub_id
, LO_Avg, LO_Sensor_Alarm
, LC_Avg, LC_Sensor_Alarm
, RO_Avg, RO_Sensor_Alarm
, RC_Avg, RC_Sensor_Alarm
FROM Alarm
)
, UnPivotColumns AS (
SELECT rowNumber, value = LO_Avg FROM AddRowNumbers WHERE LO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, LC_Avg FROM AddRowNumbers WHERE LC_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RO_Avg FROM AddRowNumbers WHERE RO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RC_Avg FROM AddRowNumbers WHERE RC_Sensor_Alarm = 0
)
SELECT rowNumber.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
, COALESCE(range1.range, range2.range) AS xweb_range
FROM AddRowNumbers rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = MAX(value) - MIN(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) > 1) range1 ON range1.rowNumber = rowNumber.rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = AVG(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) = 1) range2 ON range2.rowNumber = rowNumber.rowNumber
INNER JOIN dbo.some_table cds
ON rowNumber.sub_id = cds.sub_id
It's difficult to understand exactly what your query is trying to do without knowing the domain. However, it seems to me like your query is simply trying to find, for each row in dbo.some_table where sub_id is not 0, the range of the following columns in the record (or, if only one matches, that single value):
LO_AVG when LO_SENSOR_ALARM=0
LC_AVG when LC_SENSOR_ALARM=0
RO_AVG when RO_SENSOR_ALARM=0
RC_AVG when RC_SENSOR_ALARM=0
You constructed this query by assigning each row a sequential row number, unpivoting the _AVG columns along with their row number, computing the range aggregate grouped by row number, and then joining back to the original records by row number. CTEs don't materialize results (nor are they indexed, as discussed in the comments), so each reference to AddRowNumbers is re-evaluated, and each evaluation is expensive because ROW_NUMBER() OVER (ORDER BY LO_Avg) requires a sort.
Instead of cutting this table up just to join it back together by row number, why not do something like:
SELECT cds.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
--if the COUNT is 0, xweb_range will be null (since MAX will be null), if it's 1, then use MAX, else use MAX - MIN (as per your example)
, (CASE WHEN stats.[Count] < 2 THEN stats.[MAX] ELSE stats.[MAX] - stats.[MIN] END) xweb_range
FROM dbo.some_table cds
--cross join on the following table derived from values in cds - it will always contain 1 record per row of cds
CROSS APPLY
(
SELECT COUNT(*), MIN(Value), MAX(Value)
FROM
(
--construct a table using the column values from cds we wish to aggregate
VALUES (LO_AVG, LO_SENSOR_ALARM),
(LC_AVG, LC_SENSOR_ALARM),
(RO_AVG, RO_SENSOR_ALARM),
(RC_AVG, RC_SENSOR_ALARM)
) x (Value, Sensor_Alarm) --give a name to the columns for _AVG and _ALARM
WHERE Sensor_Alarm = 0 --filter our constructed table where _ALARM=0
) stats([Count], [Min], [Max]) --give our derived table and its columns some names
WHERE cds.sub_id <> '0' --this is a filter carried over from the first CTE in your example
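SQLite has no CROSS APPLY, so a local sketch of the per-row range has to swap in a different technique: each row is cross-joined to a hypothetical four-row sensor list, a CASE expression yields the matching _Avg value only when its alarm is 0, and the result is grouped back by the row's key. Like the answer, it is a single pass over the table with no ROW_NUMBER() sort. The table and column names are shortened, hypothetical stand-ins for dbo.some_table. COUNT(v) ignores NULLs, so the final CASE mirrors the answer's 0/1/many handling.

```python
import sqlite3


def xweb_ranges(rows):
    """rows: (id, lo_avg, lo_alarm, lc_avg, lc_alarm, ro_avg, ro_alarm, rc_avg, rc_alarm)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE some_table (id INTEGER, lo_avg REAL, lo_alarm INTEGER,"
        " lc_avg REAL, lc_alarm INTEGER, ro_avg REAL, ro_alarm INTEGER,"
        " rc_avg REAL, rc_alarm INTEGER)"
    )
    con.executemany("INSERT INTO some_table VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    query = """
        WITH sensor(k) AS (VALUES (1), (2), (3), (4)),
        readings AS (
            SELECT t.id,
                   CASE s.k   -- value only when its alarm is 0, else NULL
                       WHEN 1 THEN CASE WHEN t.lo_alarm = 0 THEN t.lo_avg END
                       WHEN 2 THEN CASE WHEN t.lc_alarm = 0 THEN t.lc_avg END
                       WHEN 3 THEN CASE WHEN t.ro_alarm = 0 THEN t.ro_avg END
                       WHEN 4 THEN CASE WHEN t.rc_alarm = 0 THEN t.rc_avg END
                   END AS v
            FROM some_table t CROSS JOIN sensor s
        )
        SELECT id,
               CASE WHEN COUNT(v) < 2 THEN MAX(v) ELSE MAX(v) - MIN(v) END AS xweb_range
        FROM readings
        GROUP BY id
        ORDER BY id
    """
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()
```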