Integrate BigQuery, Pub/Sub and Cloud Functions - google-bigquery

I'm working on a project where we need to use BigQuery, Pub/Sub, Logs Explorer and Cloud Functions.
The project:
Every time a certain event occurs (like a user accepting cookies), a system runs an INSERT query against BigQuery with a lot of columns (params) like utm_source, utm_medium, consent_cookies, etc.
Once this new row is in my table, I need to read the columns and get their values to use in a Cloud Function.
In the Cloud Function I want to use those values to make API calls.
What I managed to do so far:
I created a log routing sink that filters the new entries and sends the log to my Pub/Sub topic.
Where I'm stuck:
I want to create a Cloud Function that triggers every time a new log comes in. In that function I want to access the information contained in the log, such as utm_source, utm_medium, consent_cookies, etc., and use those values to make API calls.
Can anyone help me? Many, MANY thanks in advance!
I made a project to illustrate the flow:
1. Insert into the table.
2. From this insertion, create a sink in Logging (filtering).
Now every time I run a new query it goes to Pub/Sub and I get the log of the query.
What I want to do is trigger a function on this topic and use the values I have in the query to do operations like calling an API, etc.
So far I was able to write this code:
"use strict";
function main() {
// Import the Google Cloud client library
const { BigQuery } = require("#google-cloud/bigquery");
async function queryDb() {
const bigqueryClient = new BigQuery();
const sqlQuery = `SELECT * FROM \`mydatatable\``;
const options = {
query: sqlQuery,
location: "europe-west3",
};
// Run the query
const [rows] = await bigqueryClient.query(options);
rows.forEach((row) => {
const username = row.user_name;
});
}
queryDb();
}
main();
Now I'm stuck again: I don't know how to get the correct query from the sink I created and use the info to make my calls...

You have 2 options for calling your Cloud Function from a Pub/Sub message:
HTTP functions: You can set up an HTTP call. Create your Cloud Function with an HTTP trigger, and create a push subscription on your Pub/Sub topic that calls the Cloud Function. Don't forget to add security (make your function private and enable authentication on the Pub/Sub push subscription), because your function is otherwise publicly accessible.
Background functions: You can bind your Cloud Function directly to the Pub/Sub topic. A subscription is automatically created and linked to the Cloud Function. The security is built in.
And because there are 2 types of functions, there are 2 different function signatures. I provide both below; the processing is (almost) the same.
function extractQuery(pubSubMessage) {
  // Decode the base64-encoded Pub/Sub message
  let logData = Buffer.from(pubSubMessage, 'base64').toString();
  // Convert it to JSON
  let logMessage = JSON.parse(logData);
  // Extract the query from the log entry
  let query = logMessage.protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.query.query;
  console.log(query);
  return query;
}

// For HTTP functions
exports.bigqueryQueryInLog = (req, res) => {
  console.log(req.body);
  const query = extractQuery(req.body.message.data);
  res.status(200).send(query);
};

// For Background functions
exports.bigqueryQueryInLogTopic = (message, context) => {
  extractQuery(message.data);
};
The query logged is the INSERT INTO ... statement that you have in your log entry. You then have to parse this SQL request to extract the parts that you want.
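For example, here is a minimal, naive sketch of that parsing step. It assumes the logged statement has the shape INSERT INTO `mydatatable` (utm_source, utm_medium, consent_cookies) VALUES ('google', 'cpc', 'true'); the column names and the regular expression are illustrative assumptions, not a general SQL parser:
function parseInsert(query) {
  // Naive match of "(col1, col2, ...) VALUES ('v1', 'v2', ...)"; breaks on quoted commas or multi-row inserts
  const match = query.match(/\(([^)]+)\)\s*VALUES\s*\(([^)]+)\)/i);
  if (!match) return null;
  const columns = match[1].split(',').map((s) => s.trim());
  const values = match[2].split(',').map((s) => s.trim().replace(/^'|'$/g, ''));
  const row = {};
  columns.forEach((col, i) => { row[col] = values[i]; });
  return row; // e.g. { utm_source: 'google', utm_medium: 'cpc', consent_cookies: 'true' }
}
From there you can pass row.utm_source, row.utm_medium, etc. to your API calls.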

Related

Why some addresses not returning eth balance from BigQuery

I've been playing around with BigQuery's Ethereum ETL. There's a table specifically for mapping wallet addresses to ETH balances in gwei. Lots of addresses work just fine, but in the example below you'll find one famous wallet (Justin Bieber) that definitely has ETH, yet doesn't appear in the table.
Does anyone know if there are reasons for this, like wallet age, or if it's just gaps in the BigQuery data? I'm happy to use other services to pull ETH info for the missing addresses, but ideally I can get 100% from this source.
function main() {
  // Import the Google Cloud client library
  const { BigQuery } = require('@google-cloud/bigquery');

  async function getWalletBalances() {
    // Create a client
    const bigqueryClient = new BigQuery();

    // The SQL query to run
    const sqlQuery = `SELECT eth_balance
      FROM \`bigquery-public-data.crypto_ethereum.balances\`
      WHERE address = '0xE21DC18513e3e68a52F9fcDaCfD56948d43a11c6'`;
    console.log(sqlQuery);

    const options = {
      query: sqlQuery,
      // Location must match that of the dataset(s) referenced in the query.
      location: 'US',
      params: {},
    };

    // Run the query
    const [rows] = await bigqueryClient.query(options);

    console.log('Rows:');
    rows.forEach((row) => console.log(row));
  }

  getWalletBalances();
}

main();
If you want 100% up-to-date data from the Ethereum blockchain, I recommend you connect to a JSON-RPC node. Infura provides free RPCs, and I personally recommend using ethers.js. The function you're looking for is https://docs.ethers.io/v5/api/providers/provider/#Provider-getBalance.
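For example, a minimal sketch with ethers.js v5 (the Infura project ID and endpoint URL are placeholders you'd replace with your own):
const { ethers } = require('ethers');

async function main() {
  // Any mainnet JSON-RPC endpoint works; the Infura project ID below is a placeholder
  const provider = new ethers.providers.JsonRpcProvider('https://mainnet.infura.io/v3/YOUR_PROJECT_ID');
  const balanceWei = await provider.getBalance('0xE21DC18513e3e68a52F9fcDaCfD56948d43a11c6');
  console.log(ethers.utils.formatEther(balanceWei), 'ETH');
}

main();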
It appears some addresses do not appear in the balances table correctly.
It may be due to an issue with addresses that have both Polygon and ETH transactions; I have seen other examples of this phenomenon.
However, wrapping the address condition in contains_substr does get the correct balance. It's unclear exactly why; if anyone has the answer it would be useful.
SELECT * FROM `bigquery-public-data.crypto_ethereum.balances` WHERE contains_substr(`address`, '0xE21DC18513e3e68a52F9fcDaCfD56948d43a11c6')

How to push Salesforce Order to an external REST API?

I have experience in Salesforce administration, but not in Salesforce development.
My task is to push an Order in Salesforce to an external REST API if the order is in the custom status "Processing" and the Order Start Date (EffectiveDate) is in 10 days.
The order will then be processed in the downstream system.
If the order was successfully pushed to the REST API the status should be changed to "Activated".
Can anybody give me some example code to get started?
There's a very cool guide for picking the right mechanism; I've been studying from this PDF for one of the SF certifications: https://developer.salesforce.com/docs/atlas.en-us.integration_patterns_and_practices.meta/integration_patterns_and_practices/integ_pat_intro_overview.htm
A lot depends on whether the endpoint is accessible from Salesforce (if it isn't, you might have to pull data instead of pushing) and what authentication it needs.
For pushing out of Salesforce you could use:
Outbound Message - it'd be an XML document sent when a (time-based, in your case?) workflow fires; not REST, but it's just clicks, no code. The downside is that it's just 1 object per message, so you can send the Order header but no line items.
External Service would be code-free and you could build a flow with it.
You could always push data with Apex code (something like this). We'd split the solution into 2 bits.
The part that gets the actual work done: at a high level you'd write a function that takes a list of Order Ids as a parameter, queries them, and calls req.setBody(JSON.serialize([SELECT Id, OrderNumber FROM Order WHERE Id IN :ids]));... If the API needs some special authentication, you'd look into "Named Credentials". Hard to say what you'll need without knowing more about your target.
And the part that would call this Apex when the time comes. Could be more code (a nightly scheduled job that makes these callouts 1 minute after midnight?) https://salesforce.stackexchange.com/questions/226403/how-to-schedule-an-apex-batch-with-callout
Could be a flow / process builder (again, you probably want time-based flows) that calls this piece of Apex. The "worker" code would have to "implement interface" (a fancy way of saying that the code promises there will be function "suchAndSuchName" that takes "suchAndSuch" parameters). Check Process.Plugin out.
For pulling data... well, the target application could log in to SF (SOAP, REST) and query the table of orders once a day. Lots of integration tools have Salesforce plugins - do you already use Azure Data Factory? Informatica? BizTalk? Mulesoft?
There's also something called "long polling" where client app subscribes to notifications and SF pushes info to them. You might have heard about CometD? In SF-speak read up about Platform Events, Streaming API, Change Data Capture (although that last one fires on change and sends only the changed fields, not great for pushing a complete order + line items). You can send platform events from flows too.
So... don't dive straight into coding the solution. Plan a bit; the maintenance will be easier. This is untested, written in Notepad, I don't have an org with orders handy... But in theory you should be able to schedule it to run at 1 AM, for example. Or from the dev console you can trigger it with Database.executeBatch(new OrderSyncBatch(), 1);
public class OrderSyncBatch implements Database.Batchable<sObject>, Database.AllowsCallouts, Schedulable {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        Date cutoff = System.today().addDays(10);
        return Database.getQueryLocator([SELECT Id, Name, Account.Name, GrandTotalAmount, OrderNumber, OrderReferenceNumber,
                (SELECT Id, UnitPrice, Quantity, OrderId FROM OrderItems)
            FROM Order
            WHERE Status = 'Processing' AND EffectiveDate = :cutoff]);
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        Http h = new Http();
        List<Order> toUpdate = new List<Order>();
        // Assuming you want 1 order at a time, not a list of orders?
        for (Order o : (List<Order>) scope) {
            HttpRequest req = new HttpRequest();
            HttpResponse res;
            req.setEndpoint('https://example.com'); // your API endpoint here, or maybe something that starts with "callout:" if you'd be using Named Credentials
            req.setMethod('POST');
            req.setHeader('Content-Type', 'application/json');
            req.setBody(JSON.serializePretty(o));
            res = h.send(req);
            if (res.getStatusCode() == 200) {
                o.Status = 'Activated';
                toUpdate.add(o);
            } else {
                // Error handling? Maybe just debug it, maybe make a Task for the user or look into
                // Database.RaisesPlatformEvents
                System.debug(res);
            }
        }
        update toUpdate;
    }

    public void finish(Database.BatchableContext bc) {}

    public void execute(SchedulableContext sc) {
        Database.executeBatch(new OrderSyncBatch(), Limits.getLimitCallouts()); // there's a limit of 10 callouts per single transaction
        // and by default batches process 200 records at a time so we want smaller chunks
        // https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_methods_system_limits.htm
        // You might want to tweak the parameter even down to 1 order at a time if processing takes a while at the other end.
    }
}

Parse server useMasterKey syntax

I am using a simple query to increment the stock of a product. The query works when the class level permissions are set to public read and write; however, I cannot work out how to get the query to use the master key so that the class can be restricted from client-side changes. How should this be done?
itemQuery.equalTo('productName', items[count]);
itemQuery.first({
  success: function(object) {
    // Successfully retrieved the object.
    object.increment('stock', 1);
    object.save();
  },
});
Set the class level permissions to restrict access as you see fit, then in Cloud Code you have two options: (1) use the master key for the whole cloud function:
Parse.Cloud.useMasterKey();
itemQuery.equalTo('productName', items[count]);
// and so on...
Or (2) better, apply the master key as an option for only the action that might be restricted:
// etc
object.save(null, { useMasterKey: true });
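Put together, here is a minimal sketch of option (2) applied to the code from the question, assuming a Parse JS SDK version whose queries return promises (the class and field names are the ones from the question):
itemQuery.equalTo('productName', items[count]);
itemQuery.first({ useMasterKey: true })
  .then(function(object) {
    // Successfully retrieved the object; the master key bypasses the restricted class level permissions
    object.increment('stock', 1);
    return object.save(null, { useMasterKey: true });
  });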

SQL Azure rest api BeginExport... how to check if export completed

I need to programmatically export an SQL Azure database to a BACPAC file and once the export has completed I need to delete the database.
The SQL Azure REST API allows me to submit an export request which will run and export the database to a blob storage container.
But... I can't see how to check on the status of the export request.
Here's the export api description: https://learn.microsoft.com/en-us/rest/api/sql/Databases%20-%20Import%20Export/Export
And the overall SQL api description: https://learn.microsoft.com/en-us/rest/api/sql/
The sys.dm_operation_status DMV should help you know the status of the operation.
SELECT * FROM sys.dm_operation_status
WHERE major_resource_id = 'myddb'
ORDER BY start_time DESC;
For more information about this DMV, please visit this documentation.
If you use the PowerShell New-AzureRmSqlDatabaseExport cmdlet, you can use the Get-AzureRmSqlDatabaseImportExportStatus cmdlet to track the progress of an export operation, and of an import operation too.
For any API such as BeginX(), there is a corresponding API X() which waits for completion. In this case, instead of BeginExport(), use Export().
If you wish to have more direct control over the polling, you can look inside the definition of Export and directly use the lower layer:
public async Task<AzureOperationResponse<ImportExportResponse>> ExportWithHttpMessagesAsync(string resourceGroupName, string serverName, string databaseName, ExportRequest parameters, Dictionary<string, List<string>> customHeaders = null, CancellationToken cancellationToken = default(CancellationToken))
{
    // Send request
    AzureOperationResponse<ImportExportResponse> _response = await BeginExportWithHttpMessagesAsync(resourceGroupName, serverName, databaseName, parameters, customHeaders, cancellationToken).ConfigureAwait(false);
    // Poll for completion
    return await Client.GetPostOrDeleteOperationResultAsync(_response, customHeaders, cancellationToken).ConfigureAwait(false);
}
This answer is specifically for .NET, but the same principle applies in other languages.

Servicestack.Redis Pub/Sub limitations with other nested Redis commands

I am having a great experience with ServiceStack & Redis, but I'm confused by ThreadPool and Pub/Sub within a thread, and an apparent limitation for accessing Redis within a message callback. The actual error I get states that I can only call "Subscribe" or "Publish" within the "current context". This happens when I try to do another Redis action from the message callback.
I have a process that must run continuously. In my case I can't just service a request one time, but must keep a thread alive all the time doing calculations (and controlling these threads from a REST API route is ideal). Data must come in to the process on a regular basis, and data must be published. The process must also store and retrieve data from Redis. I am using routes and services to take data in and store it in Redis, so this must take place async from the "calculation" process. I thought pub/sub would be the answer to glue the pieces together, but so far that does not seem possible.
Here is how my code is currently structured (the code with the above error). This is the callback for the route that starts the long term "calculation" thread:
public object Get(SystemCmd request)
{
    object ctx = new object();
    TradingSystemCmd SystemCmd = new TradingSystemCmd(request, ctx);

    ThreadPool.QueueUserWorkItem(x =>
    {
        SystemCmd.signalEngine();
    });

    return (retVal); // retVal defined elsewhere
}
Here is the SystemCmd.signalEngine():
public void signalEngine() {
    using (var subscription = Redis.CreateSubscription())
    {
        subscription.OnSubscribe = channel =>
        {
        };

        subscription.OnUnSubscribe = channel =>
        {
        };

        subscription.OnMessage = (channel, msg) =>
        {
            TC_CalcBar(channel, redisTrade);
        };

        subscription.SubscribeToChannels(dmx_key); // blocking
    }
}
The "TC_CalcBar" call does processing on data as it becomes available. Within this call is a call to Redis for a regular database accesses (and the error). What I could do would be to remove the Subscription and use another method to block on data being available in Redis. But the current approach seemed quite nice until it failed to work. :-)
I also don't know if the ThreadPool has anything to do with the error, or not.
As per the Redis documentation:
Once the client enters the subscribed state it is not supposed to issue any other commands, except for additional SUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE and PUNSUBSCRIBE commands.
Source: http://redis.io/commands/subscribe