Azure Stream Analytics - No output

I have a Stream Analytics job that uses reference data and device data retrieved from an IoT Hub. The query is below:
WITH AggregatedValues AS
(
    SELECT
        iot1.DeviceId,
        iot1.SensorId,
        MAX(CAST(iot1.Timestamp AS DateTime)) AS [DateTime],
        CASE WHEN ch1.IsActive = 1 AND ch1.AggregateType = 1
                 THEN SUM(iot1.SensorValue)
             WHEN ch1.IsActive = 1 AND ch1.AggregateType = 3
                 THEN MAX(iot1.SensorValue)
             WHEN ch1.IsActive = 1 AND ch1.AggregateType = 4
                 THEN MIN(iot1.SensorValue)
             ELSE AVG(iot1.SensorValue)
        END AS [AggValue]
    FROM
        MecfabIoTHub iot1
        JOIN DeviceRef1 ch1
            ON iot1.DeviceId = ch1.DeviceId AND iot1.SensorId = ch1.SensorId
    GROUP BY iot1.DeviceId, iot1.SensorId, ch1.IsActive, ch1.AggregateType, TumblingWindow(minute, 5)
)
SELECT
    ch2.DeviceName,
    ch2.SensorType,
    ch2.SensorName,
    ch2.TriggerVal,
    ch2.TriggerState,
    ch2.AggregateType,
    ch2.IsActive,
    AggregatedValues.[AggValue],
    CASE WHEN ch2.IsActive = 1 AND ch2.TriggerState = 1 AND AggregatedValues.AggValue >= ch2.TriggerVal
             THEN ch2.AlertDesc
         WHEN ch2.IsActive = 1 AND ch2.TriggerState = 2 AND AggregatedValues.AggValue <= ch2.TriggerVal
             THEN ch2.AlertDesc
         ELSE NULL
    END AS Alert
INTO
    BLOBSensorData
FROM
    AggregatedValues
    JOIN DeviceRef1 ch2
        ON ch2.DeviceId = AggregatedValues.DeviceId AND ch2.SensorId = AggregatedValues.SensorId
I have no issues with the reference data, and all fields have been checked to be as expected (the DeviceId in the reference data matches the DeviceId in the device data, etc.). For testing, I am just writing the output to a blob. I am not sure what is going on, but no output is being generated. All inputs and outputs have been tested.
I have also checked the data coming from the IoT Hub using Device Explorer, and events are definitely being received by the IoT Hub.
Any ideas?

Have you checked that the output name in your query is correct, e.g. INTO BLOBSensorData?
I had this problem before because the output name in my query was incorrect.

Have you tested your query as described under 'Diagnose and solve problems' in your Stream Analytics job? Maybe the join does not produce any matches.
Regards,
Filip
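One more thing worth ruling out: an inner join against reference data silently drops every event whose key does not match exactly, including mismatches of type or casing (e.g. DeviceId arriving as a number while the reference CSV stores it as a string). Here is a minimal sketch of the effect, using SQLite as a stand-in for the Stream Analytics engine (all table and column names below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE telemetry (DeviceId, SensorValue)")
cur.execute("CREATE TABLE device_ref (DeviceId, IsActive)")
# The streamed DeviceId arrives as a number; the reference data holds it as text.
cur.execute("INSERT INTO telemetry VALUES (1, 20.5)")
cur.execute("INSERT INTO device_ref VALUES ('1', 1)")

rows_mismatch = cur.execute(
    "SELECT t.DeviceId, t.SensorValue FROM telemetry t "
    "JOIN device_ref r ON t.DeviceId = r.DeviceId").fetchall()
print(rows_mismatch)  # [] -- integer 1 never equals text '1', so the join emits nothing

# Casting both sides to one type restores the match.
rows_cast = cur.execute(
    "SELECT t.DeviceId, t.SensorValue FROM telemetry t "
    "JOIN device_ref r ON CAST(t.DeviceId AS TEXT) = r.DeviceId").fetchall()
print(rows_cast)  # [(1, 20.5)]
```

If a pass-through query without the JOIN produces output but the joined query does not, a key mismatch like this is a likely culprit.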

Related

How do I merge these 2 datasets on Jupyter using SQLAlchemy?

I'm currently using the WRDS cloud to connect to their servers so that I can manipulate my data before downloading it and running some ML algorithms on it. One way of extracting the data is the raw_sql() function they provide, presumably built on SQLAlchemy. I'm trying to merge these two datasets because I need 2 columns that aren't available in the first one. I'm merging on the security ID and the date, but I keep getting an empty table.
Here is what I'm trying to do:
company = conn.raw_sql("""
    select a.symbol, a.strike_price, a.optionid, a.impl_volatility, a.open_interest, a.volume,
           a.vega, a.best_bid, a.best_offer, a.date, a.exdate, b.underlyingbid, b.underlyingask
    from optionm.opprcd2020 a
    inner join optionm.option_price_2020 b
        on a.optionid = b.optionid
        and a.date = b.date
    where a.volume > 500
      and a.volume > 0.5 * a.open_interest
      and a.cp_flag = 'C'
      and (a.exdate - a.date) <= 30 and (a.exdate - a.date) > 0
      and a.best_bid > 0.3
    LIMIT 50""")  # use raw_sql() to pull 2020 options data with the relevant columns
The relevant documentation is here: https://www.fredasongdrechsler.com/intro-to-python-for-fnce/maneuvering-wrds-data
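An empty result from an inner join usually means the key columns never match exactly, and with dates the usual culprit is a type or format mismatch between the two tables. A quick check is to pull a few raw key values from each side before joining. A sketch of the failure mode, with SQLite standing in for the WRDS backend (table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (optionid INTEGER, date TEXT)")
cur.execute("CREATE TABLE b (optionid INTEGER, date TEXT)")
cur.execute("INSERT INTO a VALUES (7, '2020-01-02')")
cur.execute("INSERT INTO b VALUES (7, '02JAN2020')")  # same day, different format

joined = cur.execute(
    "SELECT a.optionid FROM a JOIN b "
    "ON a.optionid = b.optionid AND a.date = b.date").fetchall()
print(joined)  # [] -- the date keys never match, so the merge looks empty

# Inspect a handful of key values from each side before joining:
print(cur.execute("SELECT date FROM a LIMIT 3").fetchall())
print(cur.execute("SELECT date FROM b LIMIT 3").fetchall())
```

If the two tables store dates differently, cast or reformat one side in the join condition before merging.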

Trying to grab data from two columns and format them properly

So I have a database with a table that lists whether or not certain processes have failed. There are two bit columns, IsProcessed and IsFailed. A failed process can still be considered processed if the error was handled, but I still need to recognize that it failed, so the two flags have to be counted separately even though they can depend on one another. After separating them, I need to count the relative successes and relative failures.
I use an AND in my WHERE clause to separate the successes from the failures, but I honestly have no idea where to go from here.
SELECT CAST(PQ.ProcessDate AS Date) AS Date, COUNT(PQ.IsProcessed) AS Successes
FROM PQueue PQ
WHERE PQ.ProcessDate BETWEEN '2019-10-01' AND '2019-10-31' AND PQ.IsFailed = 0 AND PQ.IsProcessed = 1
GROUP BY CAST(PQ.ProcessDate AS Date)
ORDER BY CAST(PQ.ProcessDate AS Date) ASC
Because a failed process can still be marked processed in the system, we first have to grab the data that was processed without flagging a failure. Now I need a way to include the failures as well, grouped separately. I can do the grouping part, but I'm relatively new to SQL, so I don't know whether I can use an IF statement somewhere or use variables to get this done. Thank you in advance.
You seem to want conditional aggregation:
SELECT CAST(PQ.ProcessDate AS Date) AS Date,
       SUM(CASE WHEN PQ.IsFailed = 0 AND PQ.IsProcessed = 1 THEN 1 ELSE 0 END) AS Successes,
       SUM(CASE WHEN PQ.IsFailed = 1 AND PQ.IsProcessed = 1 THEN 1 ELSE 0 END) AS Fails
FROM PQueue PQ
WHERE PQ.ProcessDate BETWEEN '2019-10-01' AND '2019-10-31'
GROUP BY CAST(PQ.ProcessDate AS Date)
ORDER BY CAST(PQ.ProcessDate AS Date) ASC
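The conditional-aggregation pattern can be exercised end to end with a throwaway SQLite database and a few made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PQueue (ProcessDate TEXT, IsProcessed INTEGER, IsFailed INTEGER)")
cur.executemany("INSERT INTO PQueue VALUES (?, ?, ?)", [
    ("2019-10-01", 1, 0),
    ("2019-10-01", 1, 0),
    ("2019-10-01", 1, 1),  # processed but failed
    ("2019-10-02", 1, 1),
])

rows = cur.execute("""
    SELECT ProcessDate,
           SUM(CASE WHEN IsFailed = 0 AND IsProcessed = 1 THEN 1 ELSE 0 END) AS Successes,
           SUM(CASE WHEN IsFailed = 1 AND IsProcessed = 1 THEN 1 ELSE 0 END) AS Fails
    FROM PQueue
    GROUP BY ProcessDate
    ORDER BY ProcessDate
""").fetchall()
print(rows)  # [('2019-10-01', 2, 1), ('2019-10-02', 0, 1)]
```

Moving the flag tests out of WHERE and into CASE expressions is what lets a single pass over the table produce both counts.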
If this is SQL Server, a CASE expression would help you out,
eg
SELECT ...,
    CASE
        WHEN IsFailed = 1 AND IsProcessed = 1 THEN 'Processed But Failed'
        WHEN IsFailed = 0 AND IsProcessed = 0 THEN 'Not Processed'
        WHEN IsFailed = 0 AND IsProcessed = 1 THEN 'Processed Successfully'
        WHEN IsFailed = 1 AND IsProcessed = 0 THEN 'Failed'
    END AS Result
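The same CASE labeling can be checked quickly in SQLite (the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PQueue (IsProcessed INTEGER, IsFailed INTEGER)")
cur.executemany("INSERT INTO PQueue VALUES (?, ?)",
                [(1, 1), (0, 0), (1, 0), (0, 1)])

rows = cur.execute("""
    SELECT CASE
               WHEN IsFailed = 1 AND IsProcessed = 1 THEN 'Processed But Failed'
               WHEN IsFailed = 0 AND IsProcessed = 0 THEN 'Not Processed'
               WHEN IsFailed = 0 AND IsProcessed = 1 THEN 'Processed Successfully'
               WHEN IsFailed = 1 AND IsProcessed = 0 THEN 'Failed'
           END AS Result
    FROM PQueue
""").fetchall()
print([r[0] for r in rows])
```

Note that string literals in SQL take single quotes; double quotes would be treated as identifiers by SQL Server.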

SQL GROUP BY function returning incorrect SUM amount

I've been working on this problem, researching what I could be doing wrong, but I can't seem to find an answer or a fault in the code I've written. I'm extracting data from an MS SQL Server database, with a WHERE clause successfully filtering the results to what I want. I get roughly 4 rows per employee and want to add together a value column. The moment I add a GROUP BY clause on the employee ID and put a SUM on the value, I get a number that is completely wrong. I suspect the SQL code is ignoring my WHERE clause.
Below is a small selection of data:
hr_empl_code    hr_doll_paid
1               20.5
1               51.25
1               102.49
1               560
I expect that GROUP BY and SUM would give me the value 734.24. The value I'm given is 211461.12. While troubleshooting, I added a COUNT(*) column to the query to work out how many rows it runs against; it returned 1152, which further reinforces my belief that the WHERE clause is being ignored.
My SQL code is as below. Most of it has been generated by the front-end application that I'm running it from, so there is some additional code in there that I believe does assist the query.
SELECT DISTINCT
    T000.hr_empl_code,
    SUM(T175.hr_doll_paid)
FROM
    hrtempnm T000,
    qmvempms T001,
    hrtmspay T166,
    hrtpaytp T175,
    hrtptype T177
WHERE 1 = 1
  AND T000.hr_empl_code = T001.hr_empl_code
  AND T001.hr_empl_code = T166.hr_empl_code
  AND T001.hr_empl_code = T175.hr_empl_code
  AND T001.hr_ploy_ment = T166.hr_ploy_ment
  AND T001.hr_ploy_ment = T175.hr_ploy_ment
  AND T175.hr_paym_code = T177.hr_paym_code
  AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
  AND (T175.hr_paym_type = 'd' OR T175.hr_paym_type = 't')
GROUP BY T000.hr_empl_code
ORDER BY hr_empl_code
I'm really lost as to where this is going wrong. I have stripped the extra WHERE conditions down to just T166.hr_empl_code = T175.hr_empl_code, but it doesn't make a difference.
By no means am I an expert in SQL Server and queries, but I have a decent grasp of the technology. Any help would be very appreciated!
GROUP BY is not wrong; how you are using it is.
SELECT
    T000.hr_empl_code,
    T.totPaid
FROM
    hrtempnm T000
    INNER JOIN (SELECT hr_empl_code,
                       SUM(hr_doll_paid) AS totPaid
                FROM hrtpaytp
                WHERE hr_paym_type = 'd' OR hr_paym_type = 't'
                GROUP BY hr_empl_code) T
        ON T.hr_empl_code = T000.hr_empl_code
WHERE EXISTS
    (SELECT *
     FROM qmvempms T001,
          hrtmspay T166,
          hrtpaytp T175,
          hrtptype T177
     WHERE T000.hr_empl_code = T001.hr_empl_code
       AND T001.hr_empl_code = T166.hr_empl_code
       AND T001.hr_empl_code = T175.hr_empl_code
       AND T001.hr_ploy_ment = T166.hr_ploy_ment
       AND T001.hr_ploy_ment = T175.hr_ploy_ment
       AND T175.hr_paym_code = T177.hr_paym_code
       AND T166.hr_pyrl_code = 'f'
       AND T166.hr_paid_dati = 20180404)
ORDER BY hr_empl_code
Note: it would be clearer if you used explicit JOIN syntax instead of old-style comma joins in the WHERE clause.
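The inflated SUM in the question is the classic join fan-out: each hrtpaytp row is repeated once per matching row in the other joined tables, and SUM then counts every copy. A small SQLite sketch of the effect, and of the fix of pre-aggregating before joining (table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pay (emp INTEGER, paid REAL)")
cur.execute("CREATE TABLE other (emp INTEGER)")
cur.executemany("INSERT INTO pay VALUES (?, ?)",
                [(1, 20.5), (1, 51.25), (1, 102.49), (1, 560)])
# Three matching rows in another joined table repeat every pay row three times.
cur.executemany("INSERT INTO other VALUES (?)", [(1,), (1,), (1,)])

inflated = cur.execute(
    "SELECT SUM(p.paid) FROM pay p JOIN other o ON p.emp = o.emp "
    "GROUP BY p.emp").fetchone()[0]
print(inflated)  # ~2202.72: every pay row was summed three times

# Pre-aggregating in a subquery before joining avoids the fan-out:
correct = cur.execute(
    "SELECT t.tot FROM (SELECT emp, SUM(paid) AS tot FROM pay GROUP BY emp) t "
    "JOIN other o ON t.emp = o.emp GROUP BY t.emp").fetchone()[0]
print(correct)  # ~734.24, the expected total
```

This is why the answer above sums hr_doll_paid in a derived table and only then joins it back to the employee table.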

How to get last synchronization date and server

I have a central subscriber with 5 publishers. After replication, I want to get the last synchronization date and the server that made that synchronization. Is it possible to find this information in the tables SQL Server uses for replication?
I use something similar to this to check last replication times. However, this works at the database level, not the table level. It is run from the distribution database of the publisher.
SELECT MAX(h.[time]) AS RunTime
FROM MSmerge_history h
INNER JOIN MSmerge_agents a
    ON a.id = h.agent_id
WHERE a.publisher_db = 'PublishedDbName'
  AND h.runstatus <> 1
  AND LEFT(h.comments, 2) IN ('Up', 'No', 'Me')
  AND a.publication LIKE 'PublicationName%'
GO

Data Modeling of Entity with Attributes

I'm storing some very basic information about "data sources" coming into my application. These data sources can be in the form of a document (e.g. PDF), audio (e.g. MP3) or video (e.g. AVI). Say, for example, I am only interested in the filename of the data source. Thus, I have the following table:
DataSource
Id (PK)
Filename
For each data source, I also need to store some of its attributes. An example for a PDF would be "number of pages"; for audio, "bit rate"; for video, "duration". Each DataSource has different requirements for which attributes need to be stored. So I have modeled "data source attribute" this way:
DataSourceAttribute
Id (PK)
DataSourceId (FK)
Name
Value
Thus, I would have records like these:
DataSource->Id = 1
DataSource->Filename = 'mydoc.pdf'

DataSource->Id = 2
DataSource->Filename = 'mysong.mp3'

DataSource->Id = 3
DataSource->Filename = 'myvideo.avi'

DataSourceAttribute->Id = 1
DataSourceAttribute->DataSourceId = 1
DataSourceAttribute->Name = 'TotalPages'
DataSourceAttribute->Value = '10'

DataSourceAttribute->Id = 2
DataSourceAttribute->DataSourceId = 2
DataSourceAttribute->Name = 'BitRate'
DataSourceAttribute->Value = '16'

DataSourceAttribute->Id = 3
DataSourceAttribute->DataSourceId = 3
DataSourceAttribute->Name = 'Duration'
DataSourceAttribute->Value = '1:32'
My problem is that this doesn't seem to scale. For example, say I need to query for all the PDF documents along with their total number of pages:
Filename, TotalPages
'mydoc.pdf', '10'
'myotherdoc.pdf', '23'
...
The JOINs needed to produce the above result are just too costly. How should I address this problem?
Scaling is one of the most common problems with EAV (Entity-Attribute-Value) data structures. In short, you have to ask for the meta data (i.e. locate the attributes) to get to the data. However, here is a query that you can use to get the data you want:
Select DataSourceId
, Min( Case When Name = 'TotalPages' Then Value End ) As TotalPages
, Min( Case When Name = 'BitRate' Then Value End ) As BitRate
, Min( Case When Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
Group By DataSourceId
In order to improve performance, you'll want an index on DataSourceId and perhaps Name as well. To get to the results you posted, you would do:
Select DataSource.FileName
, Min( Case When DataSourceAttribute.Name = 'TotalPages' Then Value End ) As TotalPages
, Min( Case When DataSourceAttribute.Name = 'BitRate' Then Value End ) As BitRate
, Min( Case When DataSourceAttribute.Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
Join DataSource
On DataSource.Id = DataSourceAttribute.DataSourceId
Group By DataSource.FileName
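The MIN(CASE ...) pivot can be verified with a throwaway SQLite database; the rows below mirror the examples from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DataSource (Id INTEGER, Filename TEXT)")
cur.execute("CREATE TABLE DataSourceAttribute "
            "(Id INTEGER, DataSourceId INTEGER, Name TEXT, Value TEXT)")
cur.executemany("INSERT INTO DataSource VALUES (?, ?)",
                [(1, "mydoc.pdf"), (2, "mysong.mp3")])
cur.executemany("INSERT INTO DataSourceAttribute VALUES (?, ?, ?, ?)",
                [(1, 1, "TotalPages", "10"), (2, 2, "BitRate", "16")])

rows = cur.execute("""
    SELECT ds.Filename,
           MIN(CASE WHEN a.Name = 'TotalPages' THEN a.Value END) AS TotalPages,
           MIN(CASE WHEN a.Name = 'BitRate' THEN a.Value END) AS BitRate
    FROM DataSourceAttribute a
    JOIN DataSource ds ON ds.Id = a.DataSourceId
    GROUP BY ds.Filename
    ORDER BY ds.Filename
""").fetchall()
print(rows)  # [('mydoc.pdf', '10', None), ('mysong.mp3', None, '16')]
```

One scan of the attribute table produces one column per attribute name; attributes a row does not have simply come back NULL.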
It seems like you want something a bit looser than a typical relational DB. This sounds like a good candidate for Lucene or MongoDB. Lucene is an index engine that allows any type of document to be stored and indexed. MongoDB sits between an RDBMS and free-form document storage; JSON in some form (MongoDB is a good example) should fit nicely.
This might work, but define "too costly"...
select
    datasource.id,
    d1.id as d1id,
    d1.value as d1filename,
    d2.id as d2id,
    d2.value as d2totalpages
from datasource
inner join datasourceattribute d1
    on datasource.id = d1.datasourceid and d1.name = 'filename'
inner join datasourceattribute d2
    on datasource.id = d2.datasourceid and d2.name = 'totalpages'
where d1.value like '%pdf'