Google Cloud Pubsub Data lost - google-cloud-messaging

I'm experiencing a problem with GCP pubsub where a small percentage of data is lost when publishing thousands of messages in a couple of seconds.
I'm logging both the message_id from pubsub and a session_id unique to each message, on the publishing end as well as the receiving end. The result I'm seeing is that some messages on the receiving end have the same session_id but different message_ids. Also, some messages are missing.
For example, in one test I sent 5,000 messages to pubsub, and exactly 5,000 messages were received, with 8 messages lost. The log for the lost messages looks like this:
MISSING sessionId:sessionId: 731 (missing in log from pull request, but present in log from Flask API)
messageId FOUND: messageId:108562396466545
API: 200 **** sessionId: 731, messageId:108562396466545 ******(Log from Flask API)
Pubsub: sessionId: 730, messageId:108562396466545(Log from pull request)
And the duplicates look like this:
======= Duplicates FOUND on sessionId: 730=======
sessionId: 730, messageId:108562396466545
sessionId: 730, messageId:108561339282318
(both are logs from pull request)
All missing data and duplicates look like this.
From the above example, it is clear that some messages have taken the message_id of another message, and that some messages have been sent twice with two different message_ids.
I wonder if anyone would help me figure out what is going on? Thanks in advance.
Code
I have an API sending message to pubsub, which looks like this:
from flask import Flask, request, jsonify, render_template
from flask_cors import CORS, cross_origin
import simplejson as json
from google.cloud import pubsub
from functools import wraps
import re
import json
app = Flask(__name__)
# One client created at import time and shared by every request.
ps = pubsub.Client()
...
@app.route('/publish', methods=['POST'])
@cross_origin()
@json_validator
def publish_test_topic():
    pubsub_topic = 'test_topic'
    data = request.data
    topic = ps.topic(pubsub_topic)
    event = json.loads(data)
    messageId = topic.publish(data)
    return '200 **** sessionId: ' + str(event["sessionId"]) + ", messageId:" + messageId + " ******"
And this is the code I used to read from pubsub:
from google.cloud import pubsub
import re
import json
ps = pubsub.Client()
topic = ps.topic('test-xiu')
sub = topic.subscription('TEST-xiu')
max_messages = 1
stop = False
messages = []
class Message(object):
    """A (sessionId, messageId) pair recorded for each pulled message."""
    def __init__(self, sessionId, messageId):
        super(Message, self).__init__()
        self.sessionId = sessionId
        self.messageId = messageId
def pull_all():
    while not stop:
        m = sub.pull(max_messages=max_messages, return_immediately=False)
        for data in m:
            # Each pulled item is an (ack_id, message) pair.
            ack_id = data[0]
            message = data[1]
            messageId = message.message_id
            data = message.data
            event = json.loads(data)
            sessionId = str(event["sessionId"])
            messages.append(Message(sessionId=sessionId, messageId=messageId))
            print('200 **** sessionId: ' + sessionId + ", messageId:" + messageId + " ******")
            sub.acknowledge(ack_ids=[ack_id])
pull_all()
For generating session_id, sending request & logging response from API:
// generate trackable sessionId
var sessionId = 0
var increment_session_id = function () {
    sessionId++;
    return sessionId;
}
var generate_data = function () {
    var data = {};
    // data.sessionId = faker.random.uuid();
    data.sessionId = increment_session_id();
    data.user = get_rand(userList);
    data.device = get_rand(deviceList);
    data.visitTime = new Date;
    data.location = get_rand(locationList);
    data.content = get_rand(contentList);
    return data;
}
var sendData = function (url, payload) {
    var request = $.ajax({
        url: url,
        contentType: 'application/json',
        method: 'POST',
        data: JSON.stringify(payload),
        error: function (xhr, status, errorThrown) {
            console.log(xhr, status, errorThrown);
            $('.result').prepend("<pre id='json'>" + JSON.stringify(xhr, null, 2) + "</pre>")
            $('.result').prepend("<div>errorThrown: " + errorThrown + "</div>")
            $('.result').prepend("<div>======FAIL=======</div><div>status: " + status + "</div>")
        }
    }).done(function (xhr) {
        console.log(xhr);
        $('.result').prepend("<div>======SUCCESS=======</div><pre id='json'>" + JSON.stringify(payload, null, 2) + "</pre>")
    })
}
$(submit_button).click(function () {
    var request_num = get_request_num();
    var request_url = get_url();
    for (var i = 0; i < request_num; i++) {
        var data = generate_data();
        var loadData = changeVerb(data, 'load');
        sendData(request_url, loadData);
    }
})
UPDATE
I made a change to the API, and the issue seems to have gone away. Instead of using one pubsub.Client() for all requests, I now initialize a client for every incoming request. The new API looks like this:
from flask import Flask, request, jsonify, render_template
from flask_cors import CORS, cross_origin
import simplejson as json
from google.cloud import pubsub
from functools import wraps
import re
import json
app = Flask(__name__)
...
@app.route('/publish', methods=['POST'])
@cross_origin()
@json_validator
def publish_test_topic():
    # A fresh client per request, so no client is shared between requests.
    ps = pubsub.Client()
    pubsub_topic = 'test_topic'
    data = request.data
    topic = ps.topic(pubsub_topic)
    event = json.loads(data)
    messageId = topic.publish(data)
    return '200 **** sessionId: ' + str(event["sessionId"]) + ", messageId:" + messageId + " ******"

I talked with someone from Google, and it seems to be an issue with the Python client:
The consensus on our side is that there is a thread-safety problem in the current Python client. The client library is being rewritten almost from scratch as we speak, so I don't want to pursue any fixes in the current version. We expect the new version to become available by end of June.
Running the current code with thread_safe: false in app.yaml or, better yet, just instantiating the client in every call should be the workaround -- the solution you found.
For the detailed solution, please see the Update in the question.
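A possible middle ground between one shared client and one client per call is one client per thread. The sketch below is only an illustration of that idea, not something the quoted reply confirms: it assumes the problem comes purely from sharing a client instance across threads. `get_client` and `factory` are hypothetical names; in the question's code, `factory` would be `pubsub.Client`.

```python
import threading

# Thread-local storage: each thread sees its own "client" attribute.
_local = threading.local()


def get_client(factory):
    """Return the calling thread's client, creating it on first use.

    `factory` is any zero-argument callable that builds a client
    (hypothetical here; e.g. pubsub.Client in the question's code).
    """
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()
        _local.client = client
    return client
```

Inside the request handler, `ps = get_client(pubsub.Client)` would then replace the per-request `pubsub.Client()` call, so repeated requests on the same thread reuse one client.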

Google Cloud Pub/Sub message IDs are unique. It should not be possible for "some messages [to have] taken the message_id of another message." The fact that message ID 108562396466545 was seemingly received means that Pub/Sub did deliver the message to the subscriber and that it was not lost.
I recommend you check how your session_ids are generated to ensure that they are indeed unique and that there is exactly one per message. Searching for the sessionId in your JSON via a regular expression search seems a little strange. You would be better off parsing this JSON into an actual object and accessing fields that way.
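One way to run that check is to compare the two logs as parsed (session ID, message ID) pairs. This is a rough sketch, not the asker's tooling: `reconcile` is a hypothetical helper, and the sample data reuses the IDs from the question, with the assumption (not shown in the question's logs) that session 730 was originally published as message 108561339282318.

```python
from collections import defaultdict


def reconcile(published, received):
    """Compare publisher and subscriber logs.

    Both arguments are iterables of (session_id, message_id) pairs.
    Returns (missing, duplicated): session IDs that were published but
    never received, and session IDs received under several message IDs.
    """
    published_sessions = {session_id for session_id, _ in published}
    ids_by_session = defaultdict(set)
    for session_id, message_id in received:
        ids_by_session[session_id].add(message_id)
    missing = sorted(published_sessions - set(ids_by_session))
    duplicated = {
        session_id: sorted(ids)
        for session_id, ids in ids_by_session.items()
        if len(ids) > 1
    }
    return missing, duplicated


# Sample data: IDs from the question; the 730 publisher entry is assumed.
published = [(730, "108561339282318"), (731, "108562396466545")]
received = [(730, "108562396466545"), (730, "108561339282318")]
missing, duplicated = reconcile(published, received)
print(missing)      # session IDs never seen by the subscriber
print(duplicated)   # session IDs seen with more than one message ID
```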
In general, duplicate messages in Cloud Pub/Sub are always possible; the system guarantees at-least-once delivery. Those messages can be delivered with the same message ID if the duplication happens on the subscribe side (e.g., the ack is not processed in time) or with a different message ID (e.g., if the publish of the message is retried after an error like a deadline exceeded).
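Because of that, a subscriber usually has to tolerate duplicates itself. A minimal sketch, assuming all messages fit in memory and using made-up sample data: deduplicating on the message ID only catches subscribe-side redelivery, while a publish retry needs an application-level key such as the sessionId. `deduplicate` is a hypothetical helper, not part of any Pub/Sub client.

```python
def deduplicate(messages, key):
    """Keep the first message seen for each key, preserving order."""
    seen = set()
    unique = []
    for message in messages:
        k = key(message)
        if k not in seen:
            seen.add(k)
            unique.append(message)
    return unique


# Hypothetical pulled messages, for illustration only.
pulled = [
    {"sessionId": 1, "messageId": "a"},
    {"sessionId": 1, "messageId": "a"},  # redelivery: same message ID
    {"sessionId": 2, "messageId": "b"},
    {"sessionId": 2, "messageId": "c"},  # publish retry: new message ID
]
by_message_id = deduplicate(pulled, key=lambda m: m["messageId"])
by_session_id = deduplicate(pulled, key=lambda m: m["sessionId"])
print(len(by_message_id), len(by_session_id))
```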

You shouldn't need to create a new client for every publish operation. I'm betting that the reason that that "fixed the problem" is because it mitigated a race that exists in the publisher client side. I'm also not convinced that the log line you've shown on the publisher side:
API: 200 **** sessionId: 731, messageId:108562396466545 ******
corresponds to a successful publish of sessionId 731 by publish_test_topic(). Under what conditions is that log line printed? The code that has been presented so far does not show this.

Related

Encrypted access token request to google api failed with 400 code

Recently I came across a scenario where I need to encrypt a web API request and response using PyCryptodome inside a Synapse notebook activity. I am trying to make a call to a Google API, but the request should be encrypted, and similarly the response should be encrypted. After making the call with encrypted data, I get the error below.
Error:
error code: 400, message: Invalid JSON Payload received. Unexpected Token, Status: Invalid argument.
I have written the code below:
import os
import requests
import json
import base64
from Crypto import Random
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.padding import pad,unpad
import secrets
key= os.urandom(16)
iv = Random.new().read(AES.block_size)
def encrypt_data(key, data):
    BS = AES.block_size
    pad = lambda s: s + ((BS - len(s) % BS) * chr(BS - len(s) % BS)).encode()
    cipher = AES.new(key, AES.MODE_CBC, iv)
    encrypted_data = base64.b64encode(cipher.encrypt(pad(data)))
    return encrypted_data
url = "https://accounts.google.com/o/oauth2/token"
client_Id = "XXXXX"
client_secret = "YYYYY"
grant_type = "refresh_token"
refresh_token = "ZZZZZZ"
access_type="offline"
data = {"grant_type":grant_type,
        "client_id":client_Id,
        "client_secret":client_secret,
        "refresh_token":refresh_token,
        "access_type":access_type
       }
encode_data = json.dumps(data).encode("utf-8")
encrypt_data = encrypt_data(key,encode_data)
response = requests.post(url, data = encrypt_data)
print(response.content)
It would be really helpful if someone can give me idea or guide me on how I can achieve this.
Thank You!

Having trouble running multiple functions in Asyncio

I'm a novice programmer looking to build a script that reads a list of leads from Google Sheets and then messages them on Telegram. I want to separate the first and second messages by three days, which is why I'm splitting them into separate methods.
import asyncio
from telethon import TelegramClient
from telethon.errors.rpcerrorlist import SessionPasswordNeededError
import logging
from async_class import AsyncClass, AsyncObject, task, link
from sheetdata import *
logging.basicConfig(format='[%(levelname) 5s/%(asctime)s] %(name)s: %(message)s',
                    level=logging.WARNING)
api_id = id
api_hash = 'hash'
phone='phone'
username='user'
client = TelegramClient(username, api_id, api_hash)
#already been touched once
second_touch_array=[]
#touched twice
third_touch_array=[]
async def messageCount(userid):
    count = 0
    async for message in client.iter_messages(userid):
        count+=1
    yield count
async def firstMessage():
    #clear prospects from array and readData from google sheet
    clearProspects()
    readData(sheet)
    #loop through prospects and send first message
    for user in prospect_array:
        #check if we already messaged the prospect. If we haven't, execute function
        if(messageCount(user.id) == 0):
            await client.send_message(user.id, 'Hi')
            second_touch_array.append(prospect(user.name, user.company, user.id))
            print("First Message Sent!")
        else:
            print("Already messaged!")
async def secondMessage():
    for user in second_touch_array:
        if(messageCount(user.id) == 1):
            await client.send_message(user.id, 'Hello')
            third_touch_array.append(prospect(user.name, user.company, user.id))
            print("Second Message Sent!")
        else:
            print("Prospect has already replied!")
async def main():
    # Getting information about yourself
    me = await client.get_me()
    await firstMessage()
    await secondMessage()
    for user in second_touch_array:
        print(user.name, user.company, user.id)
with client:
    client.loop.run_until_complete(main())
When I run my code, I successfully get the "Already messaged!" print statement in my terminal from the firstMessage function.
This is good: it's detecting that I've already messaged the one user on my Google Sheets list. However, my second function doesn't seem to be called at all. I'm not getting any print statement, and every time I try to print the contents of the second array, nothing happens.
If you have any advice it would be greatly appreciated :)

Akka http - SSE - Not receiving streaming Json response

I am playing with Server-Sent Events to get updates from an akka-http v2.4.11 based microservice. I am using akka-sse. For some reason, I am not receiving any updates on my JavaScript front end. However, as soon as I terminate or kill the server process, I get some of the messages in the front end. My code looks like this:
val start = ByteString.empty
val sep = ByteString("\n")
val end = ByteString.empty
import Fill._
implicit val jsonStreamingSupport: JsonEntityStreamingSupport =
  EntityStreamingSupport.json()
    .withFramingRenderer(Flow[ByteString].intersperse(start,
                                                      sep,
                                                      end))
import de.heikoseeberger.akkasse.EventStreamMarshalling._
def routes: Route = pathPrefix("subscribe") {
  path("fills") {
    get {
      complete {
        Source.actorPublisher[Fill](FillProvider())
          .map(fill ⇒ sse(fill))
          .keepAlive(1.second, () ⇒ ServerSentEvent.heartbeat)
      }
    }
  }
}
def sse[T: ClassTag](obj: T)(implicit w: JsonWriter[T]): ServerSentEvent = {
  ServerSentEvent(data = w.write(obj).compactPrint,
                  eventType = classTag[T].runtimeClass.getSimpleName)
}
Any pointers as to what I might be doing wrong? As far as I can tell, I am following every instruction mentioned here.

pika dropping basic_publish messages sporadically

I'm trying to publish messages with pika, using Celery tasks.
from celery import shared_task
from django.conf import settings
import json
import pika
@shared_task
def publish_message():
    params = pika.URLParameters(settings.BROKER_URL + '?' + 'socket_timeout=10&' + 'connection_attempts=2')
    conn = pika.BlockingConnection(parameters=params)
    channel = conn.channel()
    channel.exchange_declare(
        exchange = 'foo',
        type='topic'
    )
    channel.tx_select()
    channel.basic_publish(
        exchange = 'foo',
        routing_key = 'bar',
        body = json.dumps({'foo':'bar'}),
        properties = pika.BasicProperties(content_type='application/json')
    )
    channel.tx_commit()
    conn.close()
This task is called from the views.
For some reason, the messages sporadically fail to get queued. In my case, every second message is dropped. What am I missing here?

I would recommend that you enable confirm_delivery in pika. This ensures that messages are delivered properly; if for some reason a message cannot be delivered, pika will either raise an exception or return False.
channel.confirm_delivery()
successful = channel.basic_publish(...)
If the process fails you can try to send the message again, or log the error message from the exception so that you can act accordingly.
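The retry-or-log idea can be sketched as a small wrapper. This is an illustration under assumptions, not pika's API: `publish_with_retry` is a hypothetical helper, and `publish` is any zero-argument callable that returns a truthy value on success and returns False or raises on failure, as the answer describes. Some pika releases may report failures only through exceptions, in which case the callable should return True explicitly after publishing.

```python
import logging
import time


def publish_with_retry(publish, attempts=3, delay=0.5):
    """Call `publish` until it succeeds or `attempts` runs out.

    Returns True on the first successful call, False if every attempt
    either returned a falsy value or raised an exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            if publish():
                return True
            logging.warning("publish attempt %d was not confirmed", attempt)
        except Exception:
            # Log the full traceback so the failure can be acted on later.
            logging.exception("publish attempt %d raised", attempt)
        if attempt < attempts:
            time.sleep(delay)
    return False
```

With the code from the question, the callable could be something like `lambda: channel.basic_publish(...)` after `channel.confirm_delivery()` has been called.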

Try this:
channel = conn.channel()
try:
    channel.queue_declare(queue='foo')
except:
    pass
channel.basic_publish(
    exchange='',
    routing_key='foo',
    body=json.dumps({'foo':'bar'})
)

Sending form data with an HTTP PUT request using Grinder API

I'm trying to replicate the following successful cURL operation with Grinder.
curl -X PUT -d "title=Here%27s+the+title&content=Here%27s+the+content&signature=myusername%3A3ad1117dab0ade17bdbd47cc8efd5b08" http://www.mysite.com/api
Here's my script:
from net.grinder.script import Test
from net.grinder.script.Grinder import grinder
from net.grinder.plugin.http import HTTPRequest
from HTTPClient import NVPair
import hashlib
test1 = Test(1, "Request resource")
request1 = HTTPRequest(url="http://www.mysite.com/api")
test1.record(request1)
log = grinder.logger.info
test1.record(log)
m = hashlib.md5()
class TestRunner:
    def __call__(self):
        params = [NVPair("title","Here's the title"),NVPair("content", "Here's the content")]
        params.sort(key=lambda param: param.getName())
        ps = ""
        for param in params:
            ps = ps + param.getValue() + ":"
        ps = ps + "myapikey"
        m.update(ps)
        params.append(NVPair("signature", ("myusername:" + m.hexdigest())))
        request1.setFormData(tuple(params))
        result = request1.PUT()
The test runs okay, but it seems that my script doesn't actually send any of the params data to the API, and I can't work out why. There are no errors generated, but I get a 401 Unauthorized response from the API, indicating that a successful PUT request reached it, but obviously without a signature the request was rejected.
This isn't exactly an answer, more of a workaround that I came up with, that I've decided to post since this question hasn't yet received any responses, and it may help anyone else trying to achieve the same thing.
The workaround is basically to use the httplib and urllib modules to build and make the PUT request instead of the HTTPClient module.
import hashlib
import httplib, urllib
....
params = [("title", "Here's the title"),("content", "Here's the content")]
params.sort(key=lambda param: param[0])
ps = ""
for param in params:
    ps = ps + param[1] + ":"
ps = ps + "myapikey"
m = hashlib.md5()
m.update(ps)
params.append(("signature", "myusername:" + m.hexdigest()))
params = urllib.urlencode(params)
print params
headers = {"Content-type": "application/x-www-form-urlencoded"}
conn = httplib.HTTPConnection("www.mysite.com:80")
conn.request("PUT", "/api", params, headers)
response = conn.getresponse()
print response.status, response.reason
print response.read()
conn.close()
(Based on the example at the bottom of this documentation page.)
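For reference, the signing scheme used in both scripts (sort the parameters by name, join the values with ":", append the API key, take the MD5 hex digest) can be isolated in one function. This is a Python 3 sketch rather than Jython code that runs inside Grinder, and `sign_params` is a hypothetical name. It creates a fresh hash object on every call; a module-level `hashlib.md5()` object, as in the original Grinder script, keeps accumulating data across `update()` calls, so only its first digest would match.

```python
import hashlib
from urllib.parse import urlencode


def sign_params(params, username, api_key):
    """Return the params sorted by name with a signature pair appended.

    `params` is a list of (name, value) string pairs.
    """
    ordered = sorted(params, key=lambda pair: pair[0])
    # Values joined with ":" in name order, followed by the API key.
    payload = "".join(value + ":" for _, value in ordered) + api_key
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return ordered + [("signature", username + ":" + digest)]


signed = sign_params(
    [("title", "Here's the title"), ("content", "Here's the content")],
    username="myusername",
    api_key="myapikey",
)
body = urlencode(signed)  # form-encoded body for the PUT request
print(body)
```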
You have to refer to the multi-form posting example in Grinder script gallery, but changing the Post to Put. It works for me.
files = ( NVPair("self", "form.py"), )
parameters = ( NVPair("run number", str(grinder.runNumber)), )
# This is the Jython way of creating an NVPair[] Java array
# with one element.
headers = zeros(1, NVPair)
# Create a multi-part form encoded byte array.
data = Codecs.mpFormDataEncode(parameters, files, headers)
grinder.logger.output("Content type set to %s" % headers[0].value)
# Call the version of POST that takes a byte array.
result = request1.PUT("/upload", data, headers)