Decode Snappy messages read with Bigtable cbt

I have some data in Bigtable, but the messages are all Snappy-encoded. Does anyone know a way to decode them?
cbt -project project-1 -instance test read events count=10 prefix=test123
This gives me output, but it's Snappy-encoded.
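cbt just prints the raw cell bytes, so the decoding has to happen client-side. Below is a minimal sketch in Python, assuming the values were compressed with raw (block-format) Snappy and reusing the project/instance/table names from the cbt command above; it reads the same rows with the google-cloud-bigtable client and decompresses each cell with python-snappy.

    # pip install google-cloud-bigtable python-snappy
    from google.cloud import bigtable
    import snappy  # python-snappy

    # Names copied from the cbt command above.
    client = bigtable.Client(project="project-1")
    table = client.instance("test").table("events")

    # Rough equivalent of `prefix=test123 count=10`: scan the key range
    # starting at the prefix and stop after 10 rows.
    rows = table.read_rows(start_key=b"test123", end_key=b"test124", limit=10)

    for row in rows:
        for family, columns in row.cells.items():
            for qualifier, cells in columns.items():
                raw = cells[0].value                 # most recent cell version
                decoded = snappy.uncompress(raw)     # fails if the data is not block-format Snappy
                print(row.row_key, family, qualifier, decoded)

If the payloads turn out to use the framed/streaming Snappy format rather than the block format, python-snappy's snappy.StreamDecompressor is the thing to try instead of snappy.uncompress.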

Related

Proper recovery of KSQL target stream to preserve offset for Connector to avoid duplicate records

We recently adopted Kafka Streams via KSQL to convert our source JSON topics into AVRO so that our S3 Sink Connector can store the data into Parquet format in their respective buckets.
Our Kafka cluster was taken down over the weekend and we've noticed that some of our target streams (avro) have no data, yet all of our source streams do (checked via print 'topic_name'; with ksql).
I know that I can drop the target stream and recreate it, but will that lose the offsets and cause duplicate records in our Sink?
Also, I know that if I recreate the target stream with the same topic name, I may run into the "topic already exists, with different partition/offset.." error, so I am hesitant to try this.
So what is the best way to recreate/recover our target streams such that we preserve the topic name and offset for our Sink Connector?
Thanks.

How to implement Kafka Connect with my own message format

I have a Kafka topic that contains binary messages (byte arrays).
I would like to write the messages to S3 in Parquet format.
I tried to use Kafka Connect and struggled with the configuration.
My messages also carry some Kafka headers that need to be written to Parquet as well.
What is the right configuration in this case?
It's not Avro and not JSON.
Can I write the byte array as-is to the Parquet file without serializing it?
Thanks

Convert Avro in Kafka to Parquet directly into S3

I have topics in Kafka that are stored in Avro format. I would like to consume an entire topic (which will not be receiving any new messages at the time I consume it) and convert it into Parquet, saving it directly to S3.
I currently do this, but it requires consuming the messages from Kafka one at a time, processing them on a local machine and converting them to a Parquet file, and, once the entire topic is consumed and the Parquet file is completely written, closing the writer and then initiating an S3 multipart upload. Or | Avro in Kafka -> convert to parquet on local -> copy file to S3 | for short.
What I'd like to do instead is | Avro in Kafka -> parquet in S3 |
One of the caveats is that the Kafka topic name isn't static; it needs to be fed in as an argument, used once, and then never used again.
I've looked into Alpakka and it seems like it might be possible, but it's unclear and I haven't seen any examples. Any suggestions?
You just described Kafka Connect :)
Kafka Connect is part of Apache Kafka, and the S3 connector is available as a plugin. At the moment, though, Parquet support for it is still in development.
For a primer in Kafka Connect see http://rmoff.dev/ksldn19-kafka-connect
Try adding "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat" to your PUT request when you set up your connector.
You can find more details here.
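To make that PUT request concrete, here is a minimal sketch (in Python, using requests) of submitting an S3 sink configuration with ParquetFormat to the Kafka Connect REST API. The Connect URL, connector name, topic, bucket, and Schema Registry address are all placeholders, and ParquetFormat needs schema-aware data such as Avro plus a converter that can supply the schema.

    # pip install requests
    import requests

    connect_url = "http://localhost:8083"      # assumed Kafka Connect REST endpoint
    connector_name = "s3-parquet-sink"         # hypothetical connector name

    config = {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "my_avro_topic",                                        # placeholder topic
        "s3.bucket.name": "my-bucket",                                    # placeholder bucket
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
        "flush.size": "1000",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",   # placeholder
    }

    # PUT /connectors/<name>/config creates the connector or updates its configuration.
    resp = requests.put(f"{connect_url}/connectors/{connector_name}/config", json=config)
    resp.raise_for_status()
    print(resp.json())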

Stream Analytics and compressed Avro

Hi, currently we are sending Avro messages to an Event Hub, and with Stream Analytics we read from this Event Hub. We saw that it is possible to compress our Avro with deflate compression. Can Stream Analytics read deflate-compressed Avro?
ASA natively supports Avro, CSV, or raw JSON for input or output sinks.
I personally use this ability to get Avro -> JSON deserialization for "free" without any custom code. You just have to tell ASA the serialization format when configuring the inputs and outputs.
As of now, ASA does not support compressed messages for any serialization format except Avro; deflate should work.
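As a concrete illustration of the deflate option, here is a minimal sketch (in Python, using fastavro and a made-up schema) of writing a deflate-compressed Avro container file; the resulting bytes are what you would send as the Event Hub event body for ASA to read.

    # pip install fastavro
    import io
    import fastavro

    # Hypothetical schema and records, just for illustration.
    schema = {
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "value", "type": "double"},
        ],
    }
    records = [{"id": "a1", "value": 1.5}, {"id": "a2", "value": 2.5}]

    buf = io.BytesIO()
    # codec="deflate" compresses each block of the Avro container file.
    fastavro.writer(buf, fastavro.parse_schema(schema), records, codec="deflate")

    avro_bytes = buf.getvalue()   # send these bytes as the Event Hub event body
    print(len(avro_bytes), "bytes")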

Is REDIS Pub/Sub apt for moderate size binary data?

I've got jobs that I'm planning to send to workers via REDIS Pub/Sub. Each job involves processing an image (JPEG, 20KB-800KB, typically around 150KB).
Is it a good idea to send the image directly as the message's payload?
I don't see this as a problem at all. If you are confident your subscriber(s)/worker(s) will be able to keep up and you won't risk running out of RAM, then I think this is a valid approach. I don't know if it's better than nginx streaming as suggested, but being an in-memory data store, Redis should scale pretty close to the hardware and network limits.
Keep in mind that Redis pub/sub is not "durable", so if an image is published to a channel no one is currently subscribed to, it won't get picked up. The image would just go nowhere.
You could build a durable queue pretty easily using a Redis list if you need durability.
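A minimal sketch of that list-based approach with redis-py (connection settings and the queue name are placeholders): the producer LPUSHes the JPEG bytes and workers block on BRPOP, so a job published while no worker is listening simply waits in the list instead of being lost.

    # pip install redis
    import redis

    r = redis.Redis(host="localhost", port=6379)   # assumed connection settings
    QUEUE = "image-jobs"                           # hypothetical queue name

    def submit_job(jpeg_bytes: bytes) -> None:
        # Unlike pub/sub, the job stays in Redis until a worker pops it.
        r.lpush(QUEUE, jpeg_bytes)

    def process_image(jpeg_bytes: bytes) -> None:
        print(f"processing a {len(jpeg_bytes)}-byte image")   # placeholder for real work

    def worker_loop() -> None:
        while True:
            # BRPOP blocks until a job is available and returns (queue_name, payload).
            _queue, payload = r.brpop(QUEUE)
            process_image(payload)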
You can base64-encode the JPEG file into a string and publish that string to the channel.
Base64 increases the size of the sent payload (the JPEG data) by roughly a third, to about 1.33x the original.
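For completeness, a tiny sketch of that encode-and-publish approach with redis-py (file and channel names are placeholders). Redis payloads are binary-safe, so the base64 step is only needed if your subscribers expect text.

    # pip install redis
    import base64
    import redis

    r = redis.Redis(host="localhost", port=6379)   # assumed connection settings

    with open("photo.jpg", "rb") as f:             # hypothetical input file
        encoded = base64.b64encode(f.read())       # roughly 1.33x the original size

    # Only subscribers currently listening on the channel will receive this message.
    r.publish("image-channel", encoded)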