Azure Stream Analytics 'InputDeserializerError.InvalidData' error - azure-sql-database

I want to store the telemetry data received at Azure IoT Hub in a SQL database through Stream Analytics, but I get an input error when I start my streaming job. I have attached screenshots of the code, the error, and the message received at the IoT hub.
How can I rectify this error?
Thanks

Rectified the error.
The mistake was in the JSON string I had defined.
Corrected string:
MSG_TXT = '{{"temperature": "{temperature}","time": "{time}","date":"{date}"}}'
Thanks.
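For context, here is a minimal sketch of why the braces matter. It assumes the template is filled in with Python's str.format, as in the Azure IoT device samples (the screenshot from the question is not available, so this is an assumption). In a format string, literal braces must be doubled, and json.dumps avoids hand-written JSON altogether. The sample values are made up.

```python
import json

# Literal braces are doubled ({{ and }}) so str.format leaves them in place;
# single-brace names such as {temperature} are the placeholders.
MSG_TXT = '{{"temperature": "{temperature}","time": "{time}","date":"{date}"}}'


def build_message(temperature, time, date):
    """Fill the template and return the JSON text to send."""
    return MSG_TXT.format(temperature=temperature, time=time, date=date)


def build_message_safe(temperature, time, date):
    """Alternative: let json.dumps handle the quoting and escaping."""
    return json.dumps({"temperature": str(temperature), "time": time, "date": date})


# Hypothetical sample values, for illustration only.
payload = build_message(22.5, "09:12:54", "2021-10-21")

# json.loads raises ValueError on malformed JSON, so it is a cheap local
# check before the message ever reaches the streaming job.
parsed = json.loads(payload)
print(parsed)
```

Validating the payload with json.loads on the device side catches a malformed template before Stream Analytics has to reject it.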

Related

Datastream Troubleshoot: "An unknown error occurred. Please try again. If the error persists, contact Google support"

We are trying to replicate data from AlloyDB to BigQuery using Datastream.
We get "An unknown error occurred. Please try again. If the error persists, contact Google support."
In the Datastream console --> objects list, we see all source tables with Object Status "Failed" and Backfill status "Completed".
In BigQuery we see only a subset of the tables (not all the "Completed" objects were synced).
In the Logs Explorer I can see this error on BQ:
I also see this error:
error: {
  code: 11
  message: "Unsupported primary key column either does not exist or is a pseudocolumn at [1:401]"
}
The column referred to in the error is of type enum.
The desired situation is having all the AlloyDB tables replicated into Bigquery.
The error message is not very informative...
What does it mean?
What would be the best way to go about troubleshooting this?
We're actively working on making these error messages more informative, and improvements are continuously being rolled out as we identify more edge cases. Assuming you followed all the steps in the documentation, you may need to open a ticket with support for further investigation. If a support ticket isn't an option, you can still report the issue using the public issue tracker.
I just had this same issue but connecting to a PostgreSQL in AWS RDS:
PostgreSQL 10 introduced SCRAM-SHA-256 password encryption, and it has been the default password_encryption setting since PostgreSQL 14. Google Datastream still expects MD5 password encryption; otherwise it generates an "unknown error" in the logs and fails the backfills.
You'll need to update your postgresql.conf (or RDS Cluster Parameter Group if you're using AWS like me):
password_encryption = 'MD5'
Restart the database and make sure the parameter has changed with:
SHOW password_encryption;
Reset the password of your users:
ALTER USER "{username}" with password '{password}';
More info from the PostgreSQL docs: https://www.postgresql.org/docs/current/auth-password.html
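To confirm that a password was actually re-hashed after the reset, it helps to know what the two stored formats look like. A minimal sketch, based on the format described in the PostgreSQL docs linked above: an MD5 entry is the string "md5" followed by the MD5 hex digest of the password concatenated with the user name, while a SCRAM entry starts with "SCRAM-SHA-256$". The user name and password below are placeholders, and reading the stored hashes (for example from pg_authid) needs privileges that a managed service may not grant.

```python
import hashlib


def pg_md5_hash(username, password):
    """Return the value PostgreSQL stores for an MD5-encrypted password."""
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


def hash_scheme(stored):
    """Classify a stored password hash by its prefix."""
    if stored.startswith("SCRAM-SHA-256$"):
        return "scram-sha-256"
    # "md5" plus 32 hex characters gives 35 characters in total.
    if stored.startswith("md5") and len(stored) == 35:
        return "md5"
    return "unknown"


# Placeholder credentials, for illustration only.
example = pg_md5_hash("datastream_user", "change-me")
print(hash_scheme(example))
```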

Pub/Sub message ingestion into BigQuery using Data Fusion

I've built a simple real-time pipeline that receives messages and attributes from a Pub/Sub subscription, wrangles them to keep only a few fields, and loads them into a BigQuery table. When deployed and run, the pipeline log says: Importing into table '<tablename>' from 0 paths; path[0] is '(empty)'; awaitCompletion: true
I don't understand why there are 0 paths, or why all the records go to errors even though an error collector was set up. Is there a way to debug the Wrangler stage better?
Sample Wrangler directives are below:
keep message,attributes
set-charset :message 'utf-8'
set-type :attributes string
parse-as-json :attributes 1
parse-as-json :message 5
keep attributes_page_url,attributes_cart_remove,attributes_page_title,attributes_transaction_complete,message_event_id,message_data_dom_domain,message_data_dom_title,message_data_dom_pathname,message_data_udo_ut_visitor_id
columns-replace s/^attributes_//g
columns-replace s/^message_//g
Any help is appreciated. Thanks
The reason you see that there are 0 paths to load is that all records are causing errors during wrangling.
There are 2 ways in which you can capture these errors:
1. Configure the Wrangler stage to Fail Pipeline on error. This will show the exception/error in the logs.
2. Attach the Error output from the Wrangler stage to an Error Collector, and store the output in a File or GCS Sink. This allows you to capture the error message for each row. Configure the Error Collector as follows:
   Error Message Column Name = errorMsg
   Error Code Column Name = errorCode
   Error Emitter Node Name = invalidRecord
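Once the error records land in the sink, a quick tally shows which failure dominates. A minimal sketch, assuming the sink wrote newline-delimited JSON and that each record carries the errorMsg and errorCode columns configured above. The sample records and their messages are invented for illustration and are not real Wrangler output.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count error records by (errorCode, errorMsg)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[(record.get("errorCode"), record.get("errorMsg"))] += 1
    return counts


# Invented sample records; the real layout depends on the sink format.
sample = [
    '{"errorCode": 1, "errorMsg": "hypothetical parse failure", "message": "..."}',
    '{"errorCode": 1, "errorMsg": "hypothetical parse failure", "message": "..."}',
    "",
    '{"errorCode": 2, "errorMsg": "hypothetical missing column"}',
]

summary = summarize_errors(sample)
for (code, msg), n in summary.most_common():
    print(f"{n:>4}  code={code}  {msg}")
```

If every record fails with the same message, the first directive that touches the raw message is the likeliest place to look.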

Am I able to view a list of devices by partition in iothub?

I have 2 nodes of a cluster receiving messages from IoT Hub. I split their responsibility by partition: node 1 reads from partitions 1, 3, 5, 7, and 9, and the other from 2, 4, 6, 8, and 0. Recently, my partition 8 stops responding until I stop my code and restart it. It seems like a device is sending a message that locks up the partition. What I want to do is list all devices in my partition 8. Is that possible? Is there a cloud shell command to get those devices in a list?
Not sure this will help you, but you can see the partition on the incoming messages. For example you could use Azure Stream Analytics to see the partitions using this query:
Select GetMetadataPropertyValue(IoTHub, '[IoTHub].[ConnectionDeviceId]') as DeviceId, partitionId
from IoTHub
Also, if you run the job locally in Visual Studio, it will tell you which device is sending malformed JSON, e.g.:
[Warning] 10/21/2021 9:12:54 AM : User Warning Source 'IoTHub' had 1 occurrences of kind 'InputDeserializerError.InvalidData' between processing times '2021-10-21T15:12:50.5076449Z' and '2021-10-21T15:12:50.5712076Z'. Could not deserialize the input event(s) from resource 'Partition: [1], Offset: [455266583232], SequenceNumber: [634800], DeviceId: [DeviceName]' as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format
Also check your "Activity Log" blade in the ASA job. It may have more details for you.
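If there are many such warnings, the partition and device can be pulled out of them mechanically. A minimal sketch, assuming the warnings keep the exact 'Partition: [..], Offset: [..], SequenceNumber: [..], DeviceId: [..]' layout shown in the example above; the helper names are made up for this illustration.

```python
import re
from collections import Counter

# Matches the resource description inside the warning text.
RESOURCE_RE = re.compile(
    r"Partition: \[(?P<partition>\d+)\], "
    r"Offset: \[(?P<offset>\d+)\], "
    r"SequenceNumber: \[(?P<sequence>\d+)\], "
    r"DeviceId: \[(?P<device_id>[^\]]+)\]"
)


def parse_warning(line):
    """Return partition, offset, sequence number and device ID, or None."""
    match = RESOURCE_RE.search(line)
    if match is None:
        return None
    return {
        "partition": int(match["partition"]),
        "offset": int(match["offset"]),
        "sequence": int(match["sequence"]),
        "device_id": match["device_id"],
    }


def failures_by_partition_and_device(lines):
    """Count deserialization warnings per (partition, device ID)."""
    counts = Counter()
    for line in lines:
        info = parse_warning(line)
        if info is not None:
            counts[(info["partition"], info["device_id"])] += 1
    return counts


warning = (
    "Could not deserialize the input event(s) from resource "
    "'Partition: [1], Offset: [455266583232], SequenceNumber: [634800], "
    "DeviceId: [DeviceName]' as Json."
)
info = parse_warning(warning)
print(info)
```

This only lists devices that have produced a bad message; it does not enumerate every device assigned to a partition.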

LeaseAlreadyPresent Error in Azure Data Factory V2

I am getting the following error in a pipeline that has a Copy activity with a REST API as source and Azure Data Lake Storage Gen2 as sink.
"message": "Failure happened on 'Sink' side. ErrorCode=AdlsGen2OperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'Conflict'. Account: '{Storage Account Name}'. FileSystem: '{Container Name}'. Path: 'foodics_v2/Burgerizzr/transactional/_567a2g7a/2018-02-09/raw/inventory-transactions.json'. ErrorCode: 'LeaseAlreadyPresent'. Message: 'There is already a lease present.'. RequestId: 'd27f1a3d-d01f-0003-28fb-400303000000'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.Azure.Storage.Data.Models.ErrorSchemaException,Message=Operation returned an invalid status code 'Conflict',Source=Microsoft.DataTransfer.ClientLibrary,'",
The pipeline runs in a for loop with Batch size = 5. When I make it sequential, the error goes away, but I need to run it in parallel.
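This is a known ADF limitation: pipeline variables are scoped to the whole pipeline, so they are not safe to set inside a ForEach that runs in parallel.
You are probably building the file name with a variable, in which case several iterations can end up writing to the same path.
One option is to move the work that uses the variable into a child pipeline and call it from the loop,
i.e. variable -> Execute Pipeline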
or
remove the variable and hard-code its expression directly in the activity that uses it.
Hope this helps
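The effect can be illustrated outside Data Factory. The sketch below is a Python analogy, not ADF code: five parallel "iterations" either share one variable or use their own item directly. The barrier is there only to make the interleaving deterministic, and the file names are invented.

```python
import threading

ITEMS = ["a.json", "b.json", "c.json", "d.json", "e.json"]


def run_parallel(items, use_shared_variable):
    """Run one thread per item and return the path each one would write to."""
    shared = {"file_name": None}  # stands in for a pipeline-level variable
    barrier = threading.Barrier(len(items))
    results = [None] * len(items)

    def iteration(index, item):
        shared["file_name"] = item  # the "Set Variable" step
        barrier.wait()  # every iteration has written before anyone reads
        if use_shared_variable:
            results[index] = shared["file_name"]  # the copy reads the variable
        else:
            results[index] = item  # the copy uses its own item directly

    threads = [
        threading.Thread(target=iteration, args=(i, item))
        for i, item in enumerate(items)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


shared_paths = run_parallel(ITEMS, use_shared_variable=True)
local_paths = run_parallel(ITEMS, use_shared_variable=False)
print(len(set(shared_paths)), "distinct path(s) with a shared variable")
print(len(set(local_paths)), "distinct path(s) with per-iteration values")
```

With the shared variable, every iteration ends up with the same path, which is the kind of collision that would surface as a lease conflict on the sink.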

Getting exception while reading data from blob in azure

While I am trying to read the list of blob data on azure, I am getting the following error:
Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation.
How to resolve this?
Please see the following link. Your code likely has an endless loop. https://msdn.microsoft.com/en-us/library/ms234762.aspx