Is it possible to view RabbitMQ message contents directly from the command line?

Is it possible to view RabbitMQ message contents directly from the command line?
sudo rabbitmqctl list_queues lists the queues.
Is there any command like sudo rabbitmqctl list_queue_messages <queue_name>?

You should enable the management plugin.
rabbitmq-plugins enable rabbitmq_management
See here:
http://www.rabbitmq.com/plugins.html
And here for the specifics of management.
http://www.rabbitmq.com/management.html
Finally, once it is set up, follow the instructions below to install and use the rabbitmqadmin tool, which can be used to fully interact with the system.
http://www.rabbitmq.com/management-cli.html
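For reference, once the management plugin is enabled, the broker itself serves a copy of rabbitmqadmin. A minimal sketch, assuming the default management port 15672 on localhost:
# download the tool from the management plugin and make it executable
wget http://localhost:15672/cli/rabbitmqadmin
chmod +x rabbitmqadmin
./rabbitmqadmin --help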
For example:
rabbitmqadmin get queue=<QueueName> requeue=false
will give you the first message off the queue; note that with requeue=false the message is removed from the queue.

Here are the commands I use to get the contents of the queue:
RabbitMQ version 3.1.5 on Fedora Linux, using https://www.rabbitmq.com/management-cli.html
Here are my exchanges:
eric@dev ~ $ sudo python rabbitmqadmin list exchanges
+-------+--------------------+---------+-------------+---------+----------+
| vhost | name | type | auto_delete | durable | internal |
+-------+--------------------+---------+-------------+---------+----------+
| / | | direct | False | True | False |
| / | kowalski | topic | False | True | False |
+-------+--------------------+---------+-------------+---------+----------+
Here is my queue:
eric@dev ~ $ sudo python rabbitmqadmin list queues
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
| vhost | name | auto_delete | consumers | durable | exclusive_consumer_tag | idle_since | memory | messages | messages_ready | messages_unacknowledged | node | policy | status |
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
| / | myqueue | False | 0 | True | | 2014-09-10 13:32:18 | 13760 | 0 | 0 | 0 |rabbit@ip-11-1-52-125| | running |
+-------+----------+-------------+-----------+---------+------------------------+---------------------+--------+----------+----------------+-------------------------+---------------------+--------+---------+
Cram some items into myqueue:
curl -i -u guest:guest http://localhost:15672/api/exchanges/%2f/kowalski/publish -d '{"properties":{},"routing_key":"abcxyz","payload":"foobar","payload_encoding":"string"}'
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Wed, 10 Sep 2014 17:46:59 GMT
content-type: application/json
Content-Length: 15
Cache-Control: no-cache
{"routed":true}
See the messages in the queue:
eric@dev ~ $ sudo python rabbitmqadmin get queue=myqueue requeue=true count=10
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+
| routing_key | exchange | message_count | payload | payload_bytes | payload_encoding | properties | redelivered |
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+
| abcxyz | kowalski | 10 | foobar | 6 | string | | True |
| abcxyz | kowalski | 9 | {'testdata':'test'} | 19 | string | | True |
| abcxyz | kowalski | 8 | {'mykey':'myvalue'} | 19 | string | | True |
| abcxyz | kowalski | 7 | {'mykey':'myvalue'} | 19 | string | | True |
+-------------+----------+---------------+---------------------------------------+---------------+------------------+------------+-------------+

I wrote rabbitmq-dump-queue which allows dumping messages from a RabbitMQ queue to local files and requeuing the messages in their original order.
Example usage (to dump the first 50 messages of queue incoming_1):
rabbitmq-dump-queue -url="amqp://user:password@rabbitmq.example.com:5672/" -queue=incoming_1 -max-messages=50 -output-dir=/tmp

If you want multiple messages from a queue, say 10 messages, the command to use is:
rabbitmqadmin get queue=<QueueName> ackmode=ack_requeue_true count=10
You can see the same thing in the management web interface, available at http://localhost:15672.
If you don't want the messages requeued, just change ackmode to ack_requeue_false.
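For example, to fetch 10 messages and have them removed from the queue:
rabbitmqadmin get queue=<QueueName> ackmode=ack_requeue_false count=10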

You can use the RabbitMQ HTTP API to get the count, or the messages themselves:
/api/queues/vhost/name/get
Get messages from a queue. (This is not an HTTP GET as it will alter the state of the queue.) You should post a body looking like:
{"count":5,"requeue":true,"encoding":"auto","truncate":50000}
count controls the maximum number of messages to get. You may get fewer messages than this if the queue cannot immediately provide them.
requeue determines whether the messages will be removed from the queue. If requeue is true they will be requeued - but their redelivered flag will be set.
encoding must be either "auto" (in which case the payload will be returned as a string if it is valid UTF-8, and base64 encoded otherwise), or "base64" (in which case the payload will always be base64 encoded).
If truncate is present it will truncate the message payload if it is larger than the size given (in bytes).
truncate is optional; all other keys are mandatory.
Please note that the publish / get paths in the HTTP API are intended for injecting test messages, diagnostics etc - they do not implement reliable delivery and so should be treated as a sysadmin's tool rather than a general API for messaging.
http://hg.rabbitmq.com/rabbitmq-management/raw-file/rabbitmq_v3_1_3/priv/www/api/index.html
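As a concrete illustration, the same call can be made with curl. A minimal sketch, assuming the default guest:guest credentials, the default vhost (%2f), and a queue named myqueue:
# POST, not GET, because fetching messages alters the state of the queue
curl -s -u guest:guest -H "content-type:application/json" -X POST \
  http://localhost:15672/api/queues/%2f/myqueue/get \
  -d '{"count":5,"requeue":true,"encoding":"auto","truncate":50000}'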

A bit late to this, but yes, RabbitMQ has a built-in tracer that allows you to see the incoming messages in a log. When enabled, you can just tail -f /var/tmp/rabbitmq-tracing/*.log (on Mac) to watch the messages.
The detailed description is here: http://www.mikeobrien.net/blog/tracing-rabbitmq-messages
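In outline: enable the tracing plugin, create a trace in the management UI (Admin -> Tracing), then follow the log file it writes. A minimal sketch, assuming the plugin ships with your broker:
rabbitmq-plugins enable rabbitmq_tracing
tail -f /var/tmp/rabbitmq-tracing/*.log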

Related

Apache Ignite: SQL query returns empty result on non-baseline node

I have set up a 3 node Apache Ignite cluster and noticed the following unexpected behavior:
(Tested with Ignite 2.10 and 2.13, Azul Java 11.0.13 on RHEL 8)
We have a relational table "RELATIONAL_META". It's created by our software vendor's product, which uses Ignite to exchange configuration data. This table is backed by this cache, which gets replicated to all nodes:
[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
Seen behavior:
I did a failure test, simulating a disk failure of one of the Ignite nodes. The "failed" node restarts with an empty disk and joins the topology as expected. While the node is not yet part of the baseline nodes, either because auto-adjust is disabled, or auto-adjust did not yet complete, the restarted node returns empty results via the JDBC connection:
0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.018 seconds)
It's interesting that it knows the structure of the table, but not the contained data.
The table actually contains data, as I can see when I query against one of the other cluster nodes:
0: jdbc:ignite:thin://b2bivmign1/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer | change index | 1653 | 2023-01-24 10:25:27 |
| cluster_configuration_1 | writer | last run changes | 0 | Updated at 2023-01-29 11:08:48. |
| cluster_configuration_1 | writer | require full sync | false | Flag set to false on 2022-06-11 09:46:45 |
| cluster_configuration_1 | writer | schema version | 1.4 | Updated at 2022-06-11 09:46:25. Previous version was 1.3 |
| cluster_processing_1 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1 | reader | change index | 1653 | 2023-01-29 10:20:39 |
| cluster_processing_1 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:50:12 |
| cluster_processing_1 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12 |
| cluster_processing_2 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2 | reader | change index | 1653 | 2023-01-29 10:24:06 |
| cluster_processing_2 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:52:19 |
| cluster_processing_2 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19 |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.043 seconds)
Expected behavior:
While a node is not part of the baseline, it is by definition not persisting data. So when I run a query against it, I would expect it to fetch the partitions that it does not hold itself from the other nodes of the cluster. Instead it just shows an empty result, even showing the correct structure of the table, just without any rows. This has caused inconsistent behavior in the product we're actually running, which uses Ignite as a configuration store, because suddenly the nodes see different results depending on which node they have opened their JDBC connection to. We are using a JDBC connection string that contains all the Ignite server nodes, so it fails over when one goes down, but of course that does not prevent the issue I have described here.
Is this "works a designed"? Is there any way to prevent such issues? It seems to be problematic to use Apache Ignite as a configuration store for an application with many nodes, when it behaves like this.
Regards,
Sven
Update:
After restarting one of the nodes with an empty disk, it joins as a node with a new ID. I think that is expected behavior. We have enabled baseline auto-adjust, so the new node ID should join the baseline and the old one should leave it. This works, but before it completes, the node returns empty results to SQL queries.
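(For reference, the baseline listing below is the kind of output printed by Ignite's control script; the installation path here is an assumption:)
# show cluster state, topology version and baseline/other nodes
/usr/share/apache-ignite/bin/control.sh --baseline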
Cluster state: active
Current topology version: 95
Baseline auto adjustment enabled: softTimeout=60000
Baseline auto-adjust is in progress
Current topology version: 95 (Coordinator: ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, Order=11)
Baseline nodes:
ConsistentId=3ffe3798-9a63-4dc7-b7df-502ad9efc76c, Address=b2bivmign1.fritz.box/192.168.0.149, State=ONLINE, Order=64
ConsistentId=40a8ae8c-5f21-4f47-8f67-2b68f396dbb9, State=OFFLINE
ConsistentId=cdf43fef-deb8-4732-907f-6264bd55de6f, Address=b2bivmign3.fritz.box/192.168.0.151, State=ONLINE, Order=11
--------------------------------------------------------------------------------
Number of baseline nodes: 3
Other nodes:
ConsistentId=080fc170-1f74-44e5-8ac2-62b94e3258d9, Order=95
Number of other nodes: 1
Update 2:
This is the JDBC URL the application uses:
#distributed.jdbc.url - run configure to modify this property
distributed.jdbc.url=jdbc:ignite:thin://b2bivmign1.fritz.box:10800..10820,b2bivmign2.fritz.box:10800..10820,b2bivmign3.fritz.box:10800..10820
#distributed.jdbc.driver - run configure to modify this property
distributed.jdbc.driver=org.apache.ignite.IgniteJdbcThinDriver
We have seen it connecting via JDBC to a node that was not part of the baseline and therefore receiving empty results. I wonder why a node that is not part of the baseline returns any results without fetching the data from the baseline nodes?
Update 3:
Whether this happens seems to depend on the table's/cache's attributes; I cannot yet reproduce it with a table I create on my own, only with the table that is created by the product we use.
This is the cache of the table that I can reproduce the issue with:
[cacheName=SQL_PUBLIC_RELATIONAL_META, cacheId=-252123144, grpName=null, grpId=-252123144, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
I have created 2 tables of my own for testing:
CREATE TABLE Test (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2";
CREATE TABLE Test2 (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2,atomicity=ATOMIC";
I then shut down one of the Ignite nodes, in this case b2bivmign3, remove the Ignite data folders, and start it again. It starts as a new node that is not part of the baseline, and I disabled auto-adjust to keep it that way. I then connect to b2bivmign3 with the SQL CLI and query the tables:
0: jdbc:ignite:thin://b2bivmign3/> select * from Test;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.202 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from Test2;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.029 seconds)
0: jdbc:ignite:thin://b2bivmign3/> select * from RELATIONAL_META;
+------------+--------------+------+-------+---------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+------------+--------------+------+-------+---------+
+------------+--------------+------+-------+---------+
No rows selected (0.043 seconds)
The same when I connect to one of the other Ignite nodes:
0: jdbc:ignite:thin://b2bivmign2/> select * from Test;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.074 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from Test2;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.023 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from RELATIONAL_META;
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| CLUSTER_ID | CLUSTER_TYPE | NAME | VALUE | DETAILS |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
| cluster_configuration_1 | writer | change index | 1653 | 2023-01-24 10:25:27 |
| cluster_configuration_1 | writer | last run changes | 0 | Updated at 2023-01-29 11:08:48. |
| cluster_configuration_1 | writer | require full sync | false | Flag set to false on 2022-06-11 09:46:45 |
| cluster_configuration_1 | writer | schema version | 1.4 | Updated at 2022-06-11 09:46:25. Previous version was 1.3 |
| cluster_processing_1 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:50] |
| cluster_processing_1 | reader | change index | 1653 | 2023-01-29 10:20:39 |
| cluster_processing_1 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:50:12 |
| cluster_processing_1 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:50:12 |
| cluster_processing_2 | reader | STOP synchronization | false | Resume synchronization - the processing has the same version as the config - 2.6-UP2022-05 [2023-01-29 11:00:43] |
| cluster_processing_2 | reader | change index | 1653 | 2023-01-29 10:24:06 |
| cluster_processing_2 | reader | conflicts | 0 | Reset due to full sync at 2022-06-11 09:52:19 |
| cluster_processing_2 | reader | require full sync | false | Cleared the flag after full reader sync at 2022-06-11 09:52:19 |
+-------------------------+--------------+----------------------+-------+------------------------------------------------------------------------------------------------------------------+
12 rows selected (0.032 seconds)
I will test more tomorrow to find out which attribute of the table/cache triggers this issue.
Update 4:
I can reproduce this with a table that is set to mode=REPLICATED instead of PARTITIONED.
CREATE TABLE Test (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2";
[cacheName=SQL_PUBLIC_TEST, cacheId=-2066189417, grpName=null, grpId=-2066189417, prim=1024, mapped=1024, mode=PARTITIONED, atomicity=ATOMIC, backups=2, affCls=RendezvousAffinityFunction]
CREATE TABLE Test2 (
Key CHAR(10),
Value CHAR(10),
PRIMARY KEY (Key)
) WITH "BACKUPS=2,TEMPLATE=REPLICATED";
[cacheName=SQL_PUBLIC_TEST2, cacheId=372637563, grpName=null, grpId=372637563, prim=512, mapped=512, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, affCls=RendezvousAffinityFunction]
0: jdbc:ignite:thin://b2bivmign2/> select * from TEST;
+------+-------+
| KEY | VALUE |
+------+-------+
| Sven | Demo |
+------+-------+
1 row selected (0.06 seconds)
0: jdbc:ignite:thin://b2bivmign2/> select * from TEST2;
+-----+-------+
| KEY | VALUE |
+-----+-------+
+-----+-------+
No rows selected (0.014 seconds)
Testing with Visor:
It makes no difference where I run Visor, same results.
We see both caches for the tables have 1 entry:
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST(#c9) | PARTITIONED | 3 | 1 (0 / 1) | min: 0 (0 / 0) | min: 0 | min: 0 | min: 0 | min: 0 |
| | | | | avg: 0.33 (0.00 / 0.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
| | | | | max: 1 (0 / 1) | max: 0 | max: 0 | max: 0 | max: 0 |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
| SQL_PUBLIC_TEST2(#c10) | REPLICATED | 3 | 1 (0 / 1) | min: 0 (0 / 0) | min: 0 | min: 0 | min: 0 | min: 0 |
| | | | | avg: 0.33 (0.00 / 0.33) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
| | | | | max: 1 (0 / 1) | max: 0 | max: 0 | max: 0 | max: 0 |
+-----------------------------------------+-------------+-------+---------------------------------+-----------------------------------+-----------+-----------+-----------+-----------+
One is empty when I scan it, the other has one row as expected:
visor> cache -scan -c=#c9
Entries in cache: SQL_PUBLIC_TEST
+================================================================================================================================================+
| Key Class | Key | Value Class | Value |
+================================================================================================================================================+
| java.lang.String | Sven | o.a.i.i.binary.BinaryObjectImpl | SQL_PUBLIC_TEST_466f2363_47ed_4fba_be80_e33740804b97 [hash=-900301401, VALUE=Demo] |
+------------------------------------------------------------------------------------------------------------------------------------------------+
visor> cache -scan -c=#c10
Cache: SQL_PUBLIC_TEST2 is empty
visor>
Update 5:
I have reduced the configuration file to this:
https://pastebin.com/dL9Jja8Z
I did not manage to reproduce this with persistence turned off, as I could not keep a node out of the baseline then; it always joins immediately. So maybe this problem is only reproducible with persistence enabled.
I go to each of the 3 nodes, remove the Ignite data to start from scratch, and start the service:
[root@b2bivmign1,2,3 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
[root@b2bivmign1,2,3 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service
I open visor, check the topology that all nodes have joined, then activate the cluster.
https://pastebin.com/v0ghckBZ
visor> top -activate
visor> quit
I connect with sqlline and create my tables:
https://pastebin.com/Q7KbjN2a
I go to one of the servers, stop the service and delete the data, then start the service again:
[root@b2bivmign2 apache-ignite]# systemctl stop apache-ignite@b2bi-config.xml.service
[root@b2bivmign2 apache-ignite]# rm -rf db/ diagnostic/ snapshots/
[root@b2bivmign2 apache-ignite]# systemctl start apache-ignite@b2bi-config.xml.service
Baseline looks like this:
https://pastebin.com/CeUGYLE7
Connect with sqlline to that node, issue reproduces:
https://pastebin.com/z4TMKYQq
This was reproduced on:
openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)
RPM: apache-ignite-2.14.0-1.noarch
Rocky Linux release 8.7 (Green Obsidian)

How do I find out which exchange routed a message last?

We are intending to build up a one-way network of exchange-to-exchange bindings. Our requirement is to attach the route a message took to its header, but there seems to be no way to find out which exchange handled a message last.
I already tried looking up the information using the tracing functionality, and there also exists a plugin that subscribes to the internal basic.publish event. Yet all of these approaches only give me the exchange the message first entered.
I even took a look at the rabbitmq-server source code, and it seems there is no possible extension point inside the routing function (see headers exchange routing for example). I am not an Erlang dev, so maybe there is an Erlang way of intercepting/extending the functions being called?
Example
+---------+
| |
| POE |
| |
+--+--+---+
| |
+-------+ | | +-------+
| | | | | |
| EX1 +---+ +----+ EX2 |
| | | |
+--+----+ +---+---+
| |
| |
| |
+--+----+ +---+---+
| | | |
| QU1 | | QU2 |
| | | |
+-------+ +-------+
For a message that ends up in QU2 we would like to have a header field like this:
{...
  "x-route": ["POE", "EX2"]
}
This could probably be accomplished with a RabbitMQ plugin but it would be difficult to do: a channel interceptor would have to effectively do a part of the routing to determine the "last" exchange. There could be more than one "last" exchange, as well.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Can we go back in the .NET Core middleware pipeline?

My question is: can we go back in the middleware pipeline? Let's look at an example:
We have middleware1, middleware2, and middleware3, and middleware3 is executing. I want middleware1 to execute again and then pass control back to middleware3. Can we do that?
Thanks in advance
No, you cannot do that. The middlewares are called in a pipeline. That means, there is one middleware which will start and which will pass on to the next middleware in the pipeline, which will then pass on to the next, and so on. Eventually, each middleware has a way to do something afterwards as the pipeline completes.
This generally looks like this:
Request
| → Middleware1
| | run
| | next() → Middleware2
| | | run
| | | next() → Middleware3
| | | | run
| | | | next() → {}
| | | | run after
| | | | return ↵
| | | ←
| | | run after
| | | return ↵
| | ←
| | run after
| | return ↵
| ←
| ⇒ Send response
Since this is a strict pipeline that only goes in a single direction, you cannot randomly jump around. You only get the chance to call the next middleware in the pipeline or return.
What you can do however is invoke the following pipeline multiple times. For example, the StatusCodePages middleware does this to re-execute the pipeline for the status code page when an error occurs:
Request
| → StatusCodePagesMiddleware
| | run
| | next() → Pipeline
| | | … throw an error
| | ← catch exception
| | run after
| | adjust parameters
| | next() → Run pipeline again with modified parameters to display error page
| | return ↵
| ←
| ⇒ Send response
Note that this is a very special thing which only works because the StatusCodePages middleware is usually registered very early and because it wants to rerun the full pipeline.
If you want finer control over this, chances are you shouldn’t even split your logic up into multiple middlewares. It might be a better idea to have a single middleware that just has a very controlled logic inside. That logic could for example be another pipeline, or just a straightforward control flow.

compute engine load balancer UDP/DNS responses dropped

I have been testing out GCE and its load-balancing capabilities, but have been seeing some unexpected results.
The trial configuration involves 2 instances acting as DNS resolvers in a target pool, plus a 3rd test instance. There is also an HTTP server running on the hosts. No health check scripts have been added.
DNS request to individual instance public IP (from ANY) - OK
HTTP request to individual instance public IP (from ANY) - OK
HTTP request to load balance IP (from ANY) - OK
DNS request to load balance IP (from an instance in the target pool) - OK
DNS request to load balance IP (from an instance in the same network - but not in the target pool) - NOK
DNS request to load balance IP (other) - NOK
I can see in the instance logs that the DNS requests arrive in all cases and are distributed evenly, though the replies don't seem to get back to the originator.
The behavior seems unexpected. I've played with the session affinity with similar results, though the default behavior is the most desired option.
I've hit a wall. Are there any ideas to try?
Information on the setup:
$ gcutil listhttphealthchecks
+------+------+------+
| name | host | port |
+------+------+------+
$ gcutil listtargetpools
+----------+-------------+
| name | region |
+----------+-------------+
| dns-pool | us-central1 |
+----------+-------------+
$ gcutil listforwardingrules
+---------+-------------+-------------+
| name | region | ip |
+---------+-------------+-------------+
| dns-tcp | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
| dns-udp | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
| http | us-central1 | 8.34.215.45 |
+---------+-------------+-------------+
$ gcutil getforwardingrule dns-udp
+---------------+----------------------------------+
| name | dns-udp |
| description | |
| creation-time | 2013-12-28T12:28:05.816-08:00 |
| region | us-central1 |
| ip | 8.34.215.45 |
| protocol | UDP |
| port-range | 53-53 |
| target | us-central1/targetPools/dns-pool |
+---------------+----------------------------------+
$ gcutil gettargetpool dns-pool
+------------------+-------------------------------+
| name | dns-pool |
| description | |
| creation-time | 2013-12-28T11:48:08.896-08:00 |
| health-checks | |
| session-affinity | NONE |
| failover-ratio | |
| backup-pool | |
| instances | us-central1-a/instances/dns-1 |
| | us-central1-b/instances/dns-2 |
+------------------+-------------------------------+
[dns-1 ~]$ curl "http://metadata/computeMetadata/v1/instance/network-interfaces/?recursive=true" -H "X-Google-Metadata-Request: True"
[{"accessConfigs":[{"externalIp":"162.222.178.116","type":"ONE_TO_ONE_NAT"}],"forwardedIps":["8.34.215.45"],"ip":"10.240.157.97","network":"projects/763472520840/networks/default"}]
[dns-2 ~]$ curl "http://metadata/computeMetadata/v1/instance/network-interfaces/?recursive=true" -H "X-Google-Metadata-Request: True"
[{"accessConfigs":[{"externalIp":"8.34.215.162","type":"ONE_TO_ONE_NAT"}],"forwardedIps":["8.34.215.45"],"ip":"10.240.200.109","network":"projects/763472520840/networks/default"}]
$ gcutil getfirewall dns2
+---------------+------------------------------------+
| name | dns2 |
| description | Allow the incoming service traffic |
| creation-time | 2013-12-28T10:35:18.185-08:00 |
| network | default |
| source-ips | 0.0.0.0/0 |
| source-tags | |
| target-tags | |
| allowed | tcp: 53 |
| allowed | udp: 53 |
| allowed | tcp: 80 |
| allowed | tcp: 443 |
+---------------+------------------------------------+
The instances are CentOS and have their iptables firewalls disabled.
Reply from instance in target pool
[dns-1 ~]$ nslookup test 8.34.215.45 | grep answer
Non-authoritative answer:
[dns-1 ~]$
Reply from other instance in target pool
[dns-2 ~]$ nslookup test 8.34.215.45 | grep answer
Non-authoritative answer:
[dns-2 ~]$
No reply from an instance not in the target pool on the load-balanced IP. However, it gets a reply from all other interfaces:
[dns-3 ~]$ nslookup test 8.34.215.45 | grep answer
[dns-3 ~]$
[dns-3 ~]$ nslookup test 8.34.215.162 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 10.240.200.109 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 10.240.157.97 | grep answer
Non-authoritative answer:
[dns-3 ~]$ nslookup test 162.222.178.116 | grep answer
Non-authoritative answer:
-- Update --
Added a health check so that the instances wouldn't be marked as UNHEALTHY. However, I got the same result.
$ gcutil gettargetpoolhealth dns-pool
+-------------------------------+-------------+--------------+
| instance | ip | health-state |
+-------------------------------+-------------+--------------+
| us-central1-a/instances/dns-1 | 8.34.215.45 | HEALTHY |
+-------------------------------+-------------+--------------+
| us-central1-b/instances/dns-2 | 8.34.215.45 | HEALTHY |
+-------------------------------+-------------+--------------+
-- Update --
Looks like the DNS service is not responding with the same IP that the request came in on. This is surely the reason it doesn't appear to be responding.
0.000000 162.222.178.130 -> 8.34.215.45 DNS 82 Standard query 0x5323 A test.internal
2.081868 10.240.157.97 -> 162.222.178.130 DNS 98 Standard query response 0x5323 A 54.122.122.227
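A possible workaround follows from that observation: the replies are sourced from the instance address instead of the load-balanced address. One way to fix that would be to assign the forwarding-rule IP to the instance and make the DNS daemon listen on it explicitly, so replies go out with that source. A hedged sketch (addresses taken from the output above; the exact resolver configuration is an assumption):
# make the load-balanced IP a local address the resolver can bind to
sudo ip addr add 8.34.215.45/32 dev eth0
# then configure the DNS server to listen on that address (e.g. BIND's listen-on)
# and verify the reply's source address with tcpdump
sudo tcpdump -n -i eth0 udp port 53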

iOS APNS Messages not arriving until app reinstall

I have an app that is using push notifications with apples APNS.
Most of the time it works fine; however, occasionally (at random it seems, I haven't been able to find any verifiable pattern) the messages just don't seem to be getting to the phone.
The messages are being received by APNS but just never delivered. However, when I reinstall the app or restart the iPhone they seem to arrive.
I'm not sure if this is a problem within my app or not, as even when the app is closed (and handling of the notification should rest completely with the operating system) no notification is received until a restart/reinstall is done.
The feedback service yields nothing, and NSLogging the received notification within the app also yields nothing (like the notification never makes it to the app).
EDIT:
Some additional information, as nobody seems to know what's going on.
I am using the sandbox server, with the app signed with the developer provisioning profile, so there are no problems there. And the app receives the notifications initially.
The problem seems to be that when the app doesn't receive anything while it's in the background for about 90s-120s, it just stops receiving anything until it is reinstalled.
Even double tapping home and stopping the app that way doesn't allow it to receive notifications in the app-closed state. I would have thought that would have eliminated problems with the app's coding entirely, since at that point it's not even running.
I timed it to see after how long it stops receiving notifications. There are 3 trials here.
==================================Trial 1=====================================
| Notification Number | Time since Last | Total Time | Pass/fail |
| 1 | 6s | 6s | Pass |
| 2 | 30s | 36s | Pass |
| 3 | 60s | 96s | Pass |
| 4 | 120s | 216s | Fail |
==============================================================================
==================================Trial 2=====================================
| Notification Number | Time since Last | Total Time | Pass/fail |
| 1 | 3s | 3s | Pass |
| 2 | 29s | 32s | Pass |
| 3 | 60s | 92s | Pass |
| 4 | 91s | 183s | Fail |
==============================================================================
==================================Trial 3=====================================
| Notification Number | Time since Last | Total Time | Pass/fail |
| 1 | 1s | 1s | Pass |
| 2 | 30s | 61s | Pass |
| 3 | 30s | 91s | Pass |
| 4 | 30s | 121s | Pass |
| 5 | 30s | 151s | Pass |
| 6 | 30s | 181s | Pass |
| 7 | 30s | 211s | Pass |
| 8 | 30s | 241s | Pass |
| 9 | 60s | 301s | Pass |
| 10 | 120s | 421s | Fail |
==============================================================================
Does anyone have any idea what could be going on here?
Another Edit:
Just tested the problem across multiple devices, and it's happening on all of them, so it's definitely not a device issue. The notifications stop coming through even when the app has never been opened. Could the programming within the app affect how the push notifications are received even when it's never been open?
It appears this may have been an issue outside of my control, as everything is now working fine, with zero changes.
Going to blame Apple or some sort of networking problem somewhere in between.