I'd like a BigQuery table to send a Pub/Sub message anytime the table changes.
Is this possible? I'm not seeing it anywhere in the documentation.
BigQuery does not currently send any notifications directly to Pub/Sub. You can, however, use the events it sends to Cloud Logging (e.g. on a table update) and create a Cloud Logging sink that routes them to a Pub/Sub topic to be notified of these. Please see this article, which outlines the approach.
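As a rough sketch, creating such a sink with the Python Cloud Logging client could look like the following; the project, topic, and sink names are placeholders, and the exact audit-log filter depends on which BigQuery events you care about (the one below is an assumption):

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # hypothetical project

# Assumed filter: match BigQuery audit-log entries for table updates.
log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="tableservice.update"'
)

sink = client.sink(
    "bigquery-table-changes",  # hypothetical sink name
    filter_=log_filter,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-changes",
)
sink.create()
# The sink's writer identity still needs publish permission on the topic.
```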
I would like to check my understanding of Google Pub/Sub Lite vs RabbitMQ (using MQTT over WSS).
My use case is that I need something like a topic exchange, i.e. the ability to send messages to one consumer, to some, or to all.
With RabbitMQ I understand that I can create a topic exchange with multiple queues bound via routing keys, e.g. amqTopic.routingKey1 through amqTopic.routingKey10.
I can then push a message to a specific queue, e.g. like this: amqTopic.routingKey8
or push to the entire topic (all routed queues) like this: amqTopic.*
Is it possible to create a topic exchange structure with Google Pub/Sub, and if so, how? I am not sure if I am missing something, but from what I have read I am inclined to say no, because Google Pub/Sub works like a direct exchange.
Thank you for helping.
This kind of topic exchange structure is possible to re-create using Cloud Pub/Sub filters. You can attach attributes to your messages when they are published (e.g. "routingKey": "8" or "routingKey": "all") and configure filters on your subscriptions to receive only messages meant for a particular routing key (attributes.routingKey="8" OR attributes.routingKey="all" in this scenario).
It's not currently possible to create this kind of topic exchange structure in Pub/Sub Lite.
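As a rough sketch of that setup with the Python Pub/Sub client (the project, topic, and subscription names here are made up), you would publish with a routingKey attribute and create the subscription with a filter:

```python
from google.cloud import pubsub_v1

project = "my-project"            # hypothetical names
topic_id = "amq-topic"
subscription_id = "routing-key-8"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path(project, topic_id)
subscription_path = subscriber.subscription_path(project, subscription_id)

# Subscription that only receives messages meant for routing key 8 (or "all").
# Note: a filter can only be set when the subscription is created.
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "filter": 'attributes.routingKey="8" OR attributes.routingKey="all"',
    }
)

# Publish a message addressed to routing key 8; the attribute drives the filtering.
publisher.publish(topic_path, b"payload", routingKey="8").result()
```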
Here is my use case:
We use a Redis appender to write our log messages to Redis. These messages carry MDC data (a trace ID) to track individual requests. We want other applications to be able to subscribe to a trace ID and get all the messages logged against it, as they are inserted. Can we have some sort of trigger that publishes each message as it is being added?
The appender does not give us the ability to publish to a channel, and we don't want to create a custom publisher for this use case. I am sure this use case is not unique, so I am hoping for a recommendation. Basically, I am looking for something like the on-insert triggers that an RDBMS provides.
Redis Keyspace Notifications sound like they might fit your use case: https://redis.io/topics/notifications
You can subscribe to a variety of notification types and I would guess that one of those would fit your need.
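For example, a minimal sketch with redis-py, assuming the appender writes under keys matching logs:* (the key pattern and database number are assumptions):

```python
import redis

r = redis.Redis()

# Keyspace notifications are disabled by default; "KEA" turns on keyspace,
# keyevent and all command classes (this can also be set in redis.conf).
r.config_set("notify-keyspace-events", "KEA")

p = r.pubsub()
# Listen for every event touching keys that match the pattern, in db 0.
p.psubscribe("__keyspace@0__:logs:*")

for message in p.listen():
    # e.g. an "lpush"/"rpush" event fires when the appender adds a log entry;
    # the notification carries the key name, not the payload, so the consumer
    # still has to read the new entry itself.
    print(message)
```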
Consider using the Stream data type (introduced in Redis 5) to store your log, and have consumers read that stream for incoming updates.
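A minimal sketch of that approach with redis-py follows; the stream key and field names are invented for illustration:

```python
import redis

r = redis.Redis()

# Producer side: the appender (or a small shim) appends each log message
# to a stream together with its trace ID.
r.xadd("app:logs", {"traceId": "abc123", "message": "request received"})

# Consumer side: block for new entries and filter on the trace ID of interest.
last_id = "$"  # start with entries added after we connect
while True:
    entries = r.xread({"app:logs": last_id}, block=5000, count=100)
    for _stream, messages in entries:
        for entry_id, fields in messages:
            last_id = entry_id
            if fields.get(b"traceId") == b"abc123":
                print(entry_id, fields[b"message"])
```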
In the docs about Cloud Storage and Pub/Sub notifications I found this sentence, which is not really clear to me:
Cloud Pub/Sub also offers at-least-once delivery to the recipient [that's pretty clear], which means that you could receive multiple messages, with multiple IDs, that represent the same Cloud Storage event [why?]
Can anyone give a better explanation of this behavior?
Thanks!
Google Cloud Storage uses at-least-once delivery to deliver your notifications to Cloud Pub/Sub. In other words, GCS will publish at least one message into Cloud Pub/Sub for each event that occurs.
Next, a Cloud Pub/Sub subscription will deliver the message to you, the end user, at least once.
So, say that in some rare case, GCS publishes two messages about the same event to Cloud Pub/Sub. Now that one GCS event has two Pub/Sub message IDs. Next, to make it even more unlikely, Pub/Sub delivers each of those messages twice. Now you have received 4 messages, with 2 message IDs, about the same single GCS event.
The important takeaway of the warning is that you should not attempt to dedupe GCS events by Pub/Sub message ID.
At-least-once delivery means that the sender must receive confirmation from the recipient to be sure the message was received, so there has to be some timeout period after which the message is re-sent. Due to network latency, packet loss, etc., it is possible for the recipient to send a confirmation that the sender does not receive before the timeout expires, in which case the sender will send the message again.
This is a common problem in network communications and distributed systems, and there are different messaging semantics to address it.
To answer the question of 'why'
'At least once' delivery just means messages will be retried via some retry mechanism until successfully delivered (i.e. acknowledged). So if there's a failure or timeout then there's a retry.
By its very nature (a retry mechanism), this means you might occasionally get duplicates, i.e. more-than-once delivery. It's the same whether it's Pub/Sub or GCS notifications delivering the message.
In the scenario you quote, you have:
The publisher (the GCS notification) -- may send duplicates of GCS events to the Pub/Sub topic
The Pub/Sub topic -- may contain duplicates from the publisher:
no deduplication is done as messages come in
every message is assigned a unique Pub/Sub message_id, even if it is a duplicate of the same GCS event notification
The Pub/Sub subscription(s) -- may also deliver duplicates of messages to subscribers
With Pub/Sub
Once a message is sent to a subscriber, the subscriber must either acknowledge or drop the message. A message is considered outstanding once it has been sent out for delivery and before a subscriber acknowledges it.
A subscriber has a configurable, limited amount of time, or ackDeadline, to acknowledge the message. Once the deadline has passed, an outstanding message becomes unacknowledged.
Cloud Pub/Sub will repeatedly attempt to deliver any message that has not been acknowledged or that is not outstanding.
Source: https://cloud.google.com/pubsub/docs/subscriber#at-least-once-delivery
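For illustration, here is a minimal streaming-pull subscriber using the Python client (project and subscription names are placeholders); any message whose callback does not call ack() before the ackDeadline will be redelivered:

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    print(message.message_id, message.data)
    message.ack()  # failing to ack before the deadline triggers redelivery

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    streaming_pull_future.result()  # blocks; call cancel() to stop
```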
With Google Cloud Storage
GCS has to do something similar internally to 'publish' the notification event to Pub/Sub, so the reason is essentially the same.
Why this matters
You need to expect occasional duplicates originating from the GCS notifications as well as from the Pub/Sub subscriptions
The Pub/Sub message ID can be used to detect duplicates on the Pub/Sub topic -> subscriber leg
You have to come up with your own idempotent ID/token to handle duplicates from the 'publisher' (the GCS notification event)
the generation, metageneration, etc. fields from the resource representation might help
If you need to de-duplicate or achieve exactly-once processing, you can build your own solution utilising those idempotent IDs/tokens (a rough sketch follows below) or see if Cloud Dataflow can accommodate your needs.
You can achieve exactly once processing of Cloud Pub/Sub message streams using Cloud Dataflow PubsubIO. PubsubIO de-duplicates messages on custom message identifiers or those assigned by Cloud Pub/Sub.
Source: https://cloud.google.com/pubsub/docs/faq#duplicates
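As an illustrative, non-production sketch of the do-it-yourself option: the eventType, objectId and objectGeneration attributes that GCS notifications attach to each Pub/Sub message can serve as the idempotency key, whereas the message_id cannot (the in-memory set below stands in for whatever durable store you would really use):

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gcs-events-sub")

seen = set()  # placeholder; a real system needs a durable, shared store

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    attrs = message.attributes
    # These attributes identify the underlying GCS event, unlike message_id,
    # which differs between duplicate notifications of the same event.
    event_key = (attrs["eventType"], attrs["objectId"], attrs["objectGeneration"])
    if event_key in seen:
        message.ack()  # duplicate of an event we already processed
        return
    seen.add(event_key)
    print("processing", event_key)
    message.ack()

subscriber.subscribe(subscription_path, callback=callback).result()
```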
If you're interested in a more fundamental exploration of the 'why', see:
There is No Now - Problems with simultaneity in distributed systems
I need to design a system that allows
Users to subscribe to any topic
No defined topic limit
Control over sending to one device, or all
Recovery when an offline client (or APNS) drops a notification, providing a way to catch up via REST
Discard all updates older than age T.
I have studied many different solutions, such as Notification Hubs, Service Bus, and Event Hubs, and have now discovered Kafka, but I'm not sure whether that's a good fit.
Draft architecture
Use an Event Hub to listen for mobile deviceID registrations and for userIDs requesting topic subscriptions, and pass that on to Redis, below.
When registering a phone or subscribing to a topic, save the deviceID/userID under the topic key.
When sending a message to a topic, query Redis for the topic key and send the result to a FIFO queue for processing.
Pipe the output of the previous query into the built-in Redis Pub/Sub features to alert worker roles that there is work pending.
While the workers send notices to Apple and Firebase, archive the sent notices to an in-memory store, below.
The archive server maintains a history of sent events, so that out-of-sync devices can catch up on the most up-to-date information, LIFO-queue style.
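For what it's worth, a rough sketch of the Redis portion of that draft with redis-py (all key and channel names are invented):

```python
import redis

r = redis.Redis()

def register(topic: str, device_id: str, user_id: str) -> None:
    # Subscribing a device to a topic: store it in a set keyed by the topic.
    r.sadd(f"topic:{topic}", f"{device_id}:{user_id}")

def send_to_topic(topic: str, payload: str) -> None:
    # Look up all devices subscribed to the topic, enqueue the work FIFO-style,
    # and use Redis Pub/Sub only to wake the worker roles.
    for member in r.smembers(f"topic:{topic}"):
        r.rpush("notify:queue", f"{member.decode()}|{payload}")
    r.publish("notify:wake", topic)

# A worker role would BLPOP from "notify:queue", push the notice to APNS/Firebase,
# and append it to the archive so out-of-sync devices can catch up over REST.
```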
Question
What are your thoughts on using this approach to solve the above needs?
What other things should I learn, research, or experiment with (and measure)?
I am publishing serial data to Google Pub/Sub, and I need to send that data to a local RabbitMQ (AMQP) server. Does anyone have any thoughts or know of a method?
Thanks
I am not sure why you would even want to do that, but I think the only way to do it is to create a separate subscription on the topic from which you want to send the messages, then write a script that pulls the messages from it, sends them to RabbitMQ, and acknowledges them.
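For example, here is a bare-bones sketch of such a script with the Python Pub/Sub client and pika (the project, subscription, and queue names are placeholders); synchronous pull keeps everything on one thread, which matters because pika channels are not thread-safe:

```python
import pika
from google.cloud import pubsub_v1

# Connect to the local RabbitMQ server and declare a target queue.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="from-pubsub")

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "serial-data-sub")

while True:
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": 100}
    )
    if not response.received_messages:
        continue
    for received in response.received_messages:
        # Forward the Pub/Sub payload to the RabbitMQ queue.
        channel.basic_publish(
            exchange="", routing_key="from-pubsub", body=received.message.data
        )
    # Acknowledge only after the messages have been handed to RabbitMQ.
    subscriber.acknowledge(
        request={
            "subscription": subscription_path,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )
```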