SCOM 2019 not generating and not sending any e-mail notifications - notifications

New SCOM 2019 Installation. Notifications (channel, subscription and subscribers) has been configured to send all alerts via email. Alerts are visible in SCOM "Active Alerts" window every day, but no notification has been sent via email.
Port TCP 25 to SMTP mail server is open. Server has been configured to accept anonymous traffic from SCOM Server. Connecting and sending a test message by using Telnet from SCOM 2019 Management Server works well - email has been delivered. Management Server is a part of Notification Resource Pool (dynamic). Wireshark monitoring on TCP 25 shows that SCOM 2019 generates no notification at all. Looks like a serious bug in SCOM 2019 on Win Srv 2019 Datacenter. Maybe similar to SCOM 2012 R2 (when enabling SNMP service on OS level disabled email notifications). I tried to disable SNMP Trap service or enable it but it hasn't change anything. No email notifications has been sent. Wireshark shows that SCOM 2019 has not generated any email. Maybe something is broken in SCOM 2019, email notifications needs to be enabled somewhere (in SCOM Console or from PowerShell) or some OS service needs to be enabled or disabled as email notifications depends on it ?
No errors in Event Logs. Wireshark monitoring shows no activity on TCP 25 on SCOM 2019 Management Server.
Any ideas are more than welcome.

A solution has been found:
Looks like there may be some bug with “Windows Cluster Management Library” Management Pack (Target: Windows Cluster Service, Cluster Service Discovery)
It was not able to properly discover our cluster based on Windows Server 2012R2 and due to this fact SCOM 2019 entered a rows containing NULLs into OperationsManager.dbo.MT_HealthService table. In the Health Explorer for Management Configuration Service Group we’ve found out that “Delta Syncronization Proces” failed with XML form notification -> Specified Cast is not valid.
To solve the issue we’ve deleted such rows manually and disable/override Object Discovery for a specific object of class: Windows CLuster Service (selecting both cluster nodes).
In a few seconds email notifications started instantly 😊
Next step is to create a ticket to Microsoft to troubleshoot that “Windows Cluster Management Library” Management Pack.

Related

How to check logs between OPC Publisher and IoT Hub to confirm the data transfer

I have setup IoT Edge up in one of our machines and installed OPC Publisher and connected it to one of our opc-ua servers which then sends data to OPC Publisher and then to IoT Hub. We have not received any data to our IoT hub for the last 10 days and suddenly today we have received the data. How can we troubleshoot why the data is missing for the last 10 days?
You can generate a support bundle on your edge device that will collect the logs of all deployed modules as well as the edge runtime logs.
sudo iotedge support-bundle --since 11d
More details on troubleshooting IoT Edge here
You can first look into the logs of the publisher and validate if the connection to the OPC UA Server was/is active. If this is fine then have a look into the edgeHub and validate if the upstream connectivity to IoT Hub was affected.
One of the most powerful tools to monitor your edge deployments is the integration with Azure Monitor. It will collect metrics from the edgeHub and edgeAgent, which combined will give you an overview of where your messages are going. It can show you how many messages are sent to your upstream endpoint and when.
Source of image
For a full overview of the capabilities, you can check out this blog post. Installation steps are here
Edit:
OPC Publisher aLso supports diagnostic logging, which will give you more information about the connections to OPC servers. To do this, you need to set the diagnostic interval. You can do this by specifying the --di command argument in your createOptions:
"OPCPublisher":{
"settings":{
"image":"<image>",
"createOptions":{
"Cmd":["di=60"]
}
},
"type":"docker",
"version":"1.0",
"status":"running",
"restartPolicy":"always"
}
The example above will log diagnostic metrics every 60 seconds. You can then upload the logs using the support bundle command from Cristian's answer, or use the UploadSupportBundle direct method to do the same without needing access to the device.

Free device logs for analysis

I am setting up HOME SIEM lab using SPLUNK. I am looking for sources which can provide different logs for various devices but not limited for below ones.
Windows Logs
IIS Logs
IDS/IPS Logs
Based on the logs i am planning to build search queries for various events and further using the same to build the rules.
It is not clear why you need logs when you can generate these? For example you can set up a VM with Windows Server and install an agent like NXLog (or any log collection agent that can send logs forwarded via TCP, UDP, TLS, or HTTP) for log collection to Splunk.
Checkout the Montgomery County Data Portal. It's free
https://data.montgomerycountymd.gov/
You could also connect to a crypto exchange API and have lots of data flow in real-time

MobileFirst 7.0 Operational Analytics - No Data loaded

I configured the Operational analytics for the MobileFirst 7.0
Configured the JDNI as per the IBM document and created client side log profile in Admin Operation Console. But it always shows 0 data. Not load any client logs / server logs.
Log receiver adapter has been built and deployed in the operation console. Client has the method to push the logs to server via WL.Logger.send(). I see the client log console and logcat, the logs has been pushed to server. In server log also, i see the invoke success log for logReceiverAdapter call.
In Operation Analytics console JNDI, the Queue and Size has been set to 1.
This was identified to be a defect in the product and will be resolved as part of APAR #PI42509 WHEN USING SSL ON WEBSPHERE THE ANALYTICS DATA IS NOT RECEIVED BY THE ANALYTICS SERVER BECAUSE OF THE SSL CONFIGURATION USED
There is no local workaround.
Continue to follow up on the issue in the PMR (support ticket) you have opened.

.net application send and receive directly to Remote Queue Manager

i'm trying to connect my VB.net application to Remote Queue Manager.
but keep getting the followig Erorr: MQRC_Q_MGR_NAME_ERROR. can anyone tell me what i'm doing wrong?
here is the code i'm using:
Dim mqQMgr As MQQueueManager = Nothing
Dim props As New Hashtable()
props.Add(MQC.HOST_NAME_PROPERTY, "192.168.28.191")
props.Add(MQC.CHANNEL_PROPERTY, "SYSTEM.ADMIN.SVRCONN")
props.Add(MQC.USER_ID_PROPERTY, "AQ")
props.Add(MQC.PORT_PROPERTY, 1313)
props.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_BINDINGS)
mqQMgr = New MQQueueManager("QMSEPAM", props)
i also tried differentconnection method (MQC.TRANSPORT_MQSERIES_CLIENT) and i get the following error:
"MQRC_NOT_AUTHORIZED"
Thanks in advance.
You need to set MQC.TRANSPORT_PROPERTY property to MQC.TRANSPORT_MQSERIES_MANAGED or MQC.TRANSPORT_MQSERIES_CLIENT for connecting through TCP/IP socket to a queue manager that is running on a remote machine or on the same machine as your application . The MQC.TRANSPORT_MQSERIES_BINDINGS should be used only when the queue manager is running on the same machine as your application in which case your application would communicate with queue manager using shared memory.
The MQRC_NOT_AUTHORIZED (2035) is thrown if the user with which your application is attempting to connect to a remote queue manager does not have authority. There are multiple ways of providing access to queue manager, talk to your IBM MQ Administrator to provide you the required authority. Take a look at this link. BTW what version of MQ are you using?
Update: Two points
1) Is user AQ same as logged in user of the machine where the .NET application is running? MQ .NET v7.5 sends the logged in user id to queue manager for authorization unless you are using a security exit. If you are not using any security exit then the logged in user id must exist on remote machine and has authorizations to connect.
2) Do not add user to mqm as users in that group are blocked from connecting to queue manager by the Channel Authentication feature
There are couple of posts in SO that you can read:
WebSphere MQ v7.1 Security User Credentials
MQRC_NOT_AUTHORIZED error while connecting to Websphere MQ 7.1
Finally if you are just testing out your application and do not really need user id authorization, then you can disable channel authentication by running the below runmqsc command.
ALTER QMGR CHLAUTH (DISABLED)

BizTalk connectivity issue to SQL during VM snapshot

We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the Snapshot backup on the Application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. On SQL Server 2008 and 2012, it by default will send out keep-alive packets to the clients every 30 seconds (30,000 ms). If the SQL server fails to receive a response back from the App server, it will send out 5 retries (default setting) of the keep-alive request 1 second (1,000 ms) apart. If SQL still does not receive the response back, it will terminate the connection, which will cause the BizTalk hosts on the App server to reset, and in our case, when our German-made ERP system sends its EDI documents over to BizTalk during that reset period, the transmission will fail.
We trapped the issue by running NetMon on the DB and App servers, waiting for the next error message. Upon inspection, we see the five SQL keep-alive packets being sent to the App servers 1 second apart, and at the same time there were NO packets at all received on the Application server. At first guess, one might think they were "just dropped network packets", which is rarely the case. We then made the correlation to the timing of the VM Snapshots, and now confirm each time the snapshot finishes each day, the App servers freeze.
As a Short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead, (5 by default), by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for now for us, and use at your own discretion.
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb and its other databases are involved if you are running tracking, BAM etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', hence why your Host Instance (BizTalkServerApplication) is stopping - it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to back-ups probably isn't supported by Microsoft and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable?
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.