Add blob storage as input to stream analytics - azure-stream-analytics

I have an IoT Hub-triggered Azure Function app that writes its data to blob storage, the blob storage is connected to a Stream Analytics job, and the output of this job goes to Power BI for visualization. I checked my blob storage ("butterflycontainer/nodecurrentstatus") and can see my data written successfully. The problem is that I can't see any input or output events on the Stream Analytics job: nothing is triggered and no datasets are created in Power BI. Why? When I press "Test" for the Stream Analytics input I get:
Successful connection test
Connection to input 'nodecurrentstatusin' succeeded.
but when I press "Sample data" for the input, I get the error below:
No events found for 'nodecurrentstatusin'
No events found for 'nodecurrentstatusin'. Start time: Monday, November 6, 2017, 6:10:05 PM End time: Monday, November 6, 2017, 6:15:05 PM Last time arrival: Thursday, January 1, 1970, 2:00:00 AM Diagnostics: While sampling data, no data was received from '1' partitions.
Also find below the paths for my setup:
IotHub
ButteryFly.azure-devices.net
Function App
https://butteryflyfnapp.azurewebsites.net
function name "Nodes_Current_State_Report_Fn_App"
In this function I write to my blob storage using the code below:
// azure-storage blob client; assumes the storage connection string is configured
// in the Function App settings (e.g. AZURE_STORAGE_CONNECTION_STRING).
var azure = require('azure-storage');
var blobSvc = azure.createBlobService();

// Serialize the table entry and write it to a single, fixed blob name.
var strings = JSON.stringify(tableentr);
blobSvc.createBlockBlobFromText(
    containername,         // 'butterflycontainer'
    'nodecurrentstatus',   // the same blob name on every invocation
    strings,
    function (error, result, response) {
        if (error) {
            console.log("Couldn't upload string");
            console.error(error);
        } else {
            console.log('String uploaded successfully');
        }
    });
Blob path
https://butterflystorageaccount.blob.core.windows.net/butterflycontainer/nodecurrentstatus
Stream Analytics job - RESOURCE ID
/subscriptions/f80e5865-d8c1-4ac7-b102-ff72bdbe1188/resourceGroups/IotHub/providers/Microsoft.StreamAnalytics/streamingjobs/butterfly_streamanalytics
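For comparison, here is a minimal sketch (not the original code, and the blob/path names are illustrative assumptions) that writes each payload to a new, uniquely named blob instead of overwriting 'nodecurrentstatus'; a blob-storage input to Stream Analytics picks up blobs as they appear, so new blobs per message are generally easier for it to sample than a single blob that keeps being replaced.

// Illustrative sketch only: write each message to a new, uniquely named blob.
var azure = require('azure-storage');
var blobSvc = azure.createBlobService();

function writeStatusBlob(containername, tableentr, callback) {
    var now = new Date();
    // e.g. nodecurrentstatus/2017-11-06/1509984605000.json (name pattern is an assumption)
    var blobName = 'nodecurrentstatus/' +
        now.toISOString().slice(0, 10) + '/' +
        now.getTime() + '.json';

    blobSvc.createBlockBlobFromText(
        containername,
        blobName,
        JSON.stringify(tableentr),
        callback);
}

With a layout like this, the Stream Analytics blob input can also be given a path pattern (for example nodecurrentstatus/{date}) so it knows where to look for newly arriving blobs.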

Related

Stream analytics time agnostic processing

My Stream Analytics job is not generating any output and the watermark delay metric keeps increasing. For all other metrics, such as input events, output events, CPU % (which increases to 10%), runtime errors, and data conversion errors, there is no change (the value is zero).
Input stream: a 2 GB file in ADLS Gen2, with the file containing multiple JSON objects (not a JSON array).
Sample data:
{"a":"1"} //row 1
{"a":"2"} // row 2
{"a":"1"} // row 3
The data doesn't have any time field to indicate event time.
Output stream: Cosmos DB
Late event tolerance set to 5 sec
Out-of-order event tolerance set to 25 sec
Expected output: I want the job to write to the output every X minutes (see the query sketch after this question).
Current output: the job is not writing any data to the output and all metric values are zero
Thanks
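For reference, a minimal Stream Analytics query sketch for this kind of time-agnostic, periodic output, assuming arrival time is used as the timestamp (no TIMESTAMP BY clause, since the payload has no time field); the input/output aliases and the 5-minute window size are illustrative assumptions:

-- Illustrative sketch: emit one aggregated row per 5-minute tumbling window,
-- grouped on the sample field "a", using arrival time as the event time.
SELECT
    a,
    COUNT(*) AS event_count,
    System.Timestamp() AS window_end
INTO
    [cosmos-output]   -- assumed output alias
FROM
    [adls-input]      -- assumed input alias
GROUP BY
    a,
    TumblingWindow(minute, 5)

Whether the job actually emits anything still depends on the watermark advancing, which is what the increasing watermark delay metric described above suggests is not happening.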

Azure Stream analytics 'InputDeserializerError.InvalidData' error

I want to store the telemetry data received at the Azure IoT Hub in a SQL database through Stream Analytics, but I am getting an input error when I start my streaming job. I have attached a screenshot of the code, the error, and the message received at the IoT Hub.
How to rectify this error?
Thanks
I rectified the error. The mistake was in the defined JSON string. Corrected string:
MSG_TXT = '{ {"temperature": "{temperature}","time": "{time}","date":"{date}"}}'
Thanks.
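For context, a Stream Analytics input configured for JSON serialization expects each message to deserialize as valid JSON; with the fields used above, a single telemetry message would look something like the following (the values are purely illustrative):

{"temperature": "25.6", "time": "18:10:05", "date": "2017-11-06"}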

Bigquery Data Transfer from S3 intermittent success

When using BigQuery Data Transfer to move data into BigQuery from S3, I get intermittent success (I've actually only seen it work correctly one time).
The success:
6:00:48 PM Summary: succeeded 1 jobs, failed 0 jobs.
6:00:14 PM Job bqts_5f*** (table test_json_data) completed successfully. Number of records: 516356, with errors: 0.
5:59:13 PM Job bqts_5f*** (table test_json_data) started.
5:59:12 PM Processing files from Amazon S3 matching: "s3://bucket-name/*.json"
5:59:12 PM Moving data from Amazon S3 to Google Cloud complete: Moved 2661 object(s).
5:58:50 PM Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/"
5:58:49 PM Starting transfer from Amazon S3 for files modified before 2020-07-27T16:48:49-07:00 (exclusive).
5:58:49 PM Transfer load date: 20200727
5:58:48 PM Dispatched run to data source with id 138***3616
The usual instance, though, is just 0 successes, 0 failures, like the following:
8:33:13 PM Summary: succeeded 0 jobs, failed 0 jobs.
8:32:38 PM Processing files from Amazon S3 matching: "s3://bucket-name/*.json"
8:32:38 PM Moving data from Amazon S3 to Google Cloud complete: Moved 3468 object(s).
8:32:14 PM Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/"
8:32:14 PM Starting transfer from Amazon S3 for files modified between 2020-07-27T16:48:49-07:00 and 2020-07-27T19:22:14-07:00 (exclusive).
8:32:13 PM Transfer load date: 20200728
8:32:13 PM Dispatched run to data source with id 13***0415
What might be going on such that the second log above doesn't have the Job bqts... run? Is there somewhere I can get more details about these data transfer jobs? I had a different job that ran into a JSON error, so I don't believe it was that.
Thanks!
I was a bit confused by the logging, since it finds and moves the objects (the "Moved ... object(s)" lines above).
I believe I misread the docs: I had previously thought that an Amazon URI of s3://bucket-name/*.json would crawl the directory tree for the JSON files, but even though the message above seems to indicate that, it only loads files into BigQuery that are at the top level (for the s3://bucket-name/*.json URI).
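To illustrate that conclusion (the object names here are made up):

s3://bucket-name/events-001.json          -> loaded (top-level object)
s3://bucket-name/2020/07/events-001.json  -> moved by the transfer, but not loaded into the table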

GoogleApiException: Google.Apis.Requests.RequestError Backend Error [500] when streaming to BigQuery

I've been streaming data to BigQuery for the past year or so from a service in Azure written in C#, and recently started to get an increasing number of the following errors (most of the requests succeed):
Message: [GoogleApiException: Google.Apis.Requests.RequestError An
internal error occurred and the request could not be completed. [500]
Errors [
Message[An internal error occurred and the request could not be completed.] Location[ - ] Reason[internalError] Domain[global] ] ]
This is the code I'm using in my service:
public async Task<TableDataInsertAllResponse> Update(List<TableDataInsertAllRequest.RowsData> rows, string tableSuffix)
{
    var request = new TableDataInsertAllRequest { Rows = rows, TemplateSuffix = tableSuffix };
    var insertRequest = mBigqueryService.Tabledata.InsertAll(request, ProjectId, mDatasetId, mTableId);
    return await insertRequest.ExecuteAsync();
}
Just like any other cloud service, BigQuery doesn't offer a 100% uptime SLA (it's actually 99.9%), so it's not uncommon to encounter transient errors like these. We also receive them frequently in our applications.
You need to build exponential backoff-and-retry logic into your application(s) to handle such errors. A good way of doing this is to use a queue to stream your data to BigQuery. This is what we do and it works very well for us.
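As a rough illustration, a retry wrapper around the Update method above might look like the following; this is only a sketch, and the exception type, status-code check, attempt count, and delays are assumptions to adapt to your client library version:

// Sketch only: retry the streaming insert on transient 5xx errors with exponential backoff.
// Assumes the same usings/fields as the Update method above, plus System and System.Threading.Tasks.
public async Task<TableDataInsertAllResponse> UpdateWithRetry(
    List<TableDataInsertAllRequest.RowsData> rows, string tableSuffix, int maxAttempts = 5)
{
    var delay = TimeSpan.FromSeconds(1);
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await Update(rows, tableSuffix);
        }
        catch (Google.GoogleApiException ex) when ((int)ex.HttpStatusCode >= 500 && attempt < maxAttempts)
        {
            // Transient server-side error: wait, then double the delay and try again.
            await Task.Delay(delay);
            delay = TimeSpan.FromSeconds(delay.TotalSeconds * 2);
        }
    }
}

Combining this with the queue suggested above keeps the retries off the hot path of your service.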
Some more info:
https://cloud.google.com/bigquery/troubleshooting-errors
https://cloud.google.com/bigquery/loading-data-post-request#exp-backoff
https://cloud.google.com/bigquery/streaming-data-into-bigquery
https://cloud.google.com/bigquery/sla

BigQuery: Not able to export to CSV

I am able to import data from my Google Storage. However, I'm having trouble exporting data to Google Cloud Storage CSV files through the web console. The data set is small, and I am not getting any specific reason for what causes the issue.
Extract 9:30am
gl-analytics:glcqa.Device to gs://glccsv/device.csv
Errors:
Unexpected. Please try again.
Job ID: job_f8b50cc4b4144e14a22f3526a2b76b75
Start Time: 9:30am, 24 Jan 2013
End Time: 9:30am, 24 Jan 2013
Source Table: gl-analytics:glcqa.Device
Destination URI: gs://glccsv/device.csv
It looks like you have a nested schema, which cannot be output to csv. Try setting the output format to JSON.
Note this bug has now been fixed internally, so after our next release you'll get a better error when this happens.
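If it helps, one way to retry the export with JSON output from the command line might look like this (a sketch only; the destination file name is illustrative):

bq extract --destination_format=NEWLINE_DELIMITED_JSON 'gl-analytics:glcqa.Device' gs://glccsv/device.json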