Auto-refresh Derived Tables in BigQuery web UI - google-bigquery

I'm using the BigQuery Web UI to derive several custom tables from our firebase app event data (one big messy table). This allows other display services (in this case Google Data Studio) to display dynamic dashboards and reports.
The problem is that I can't get my derived tables to auto-update each morning. Instead I have to manually re-run the queries in order to refresh the data. Is there a way to configure these jobs to run in the interface? Or do I have to configure jobs somewhere else? Thanks.

While not as simple as scheduled table materialization, you could set up a cron in Google App Engine to kick off your daily query job. I believe this should easily remain within the free tier.
There are docs for both Python and Java.
It looks like you can schedule daily tasks with Apps Script as well.
Here is a quick example I tried.
Go to: script.google.com
Enable advanced services (Menu > Resources > Advanced Services ..., then turn on BigQuery).
Name the default function something more descriptive; I chose "rebuildTables".
Click the "triggers" button (looks like a clock with a tack sticking out of it).
Give your project a name. I chose "BigQuery -- Build Daily Tables".
Now you can add a trigger. For example: time-driven, day timer, 5am to 6am
Edit the script. Borrowed from here: https://developers.google.com/apps-script/advanced/bigquery
function rebuildTables() {
  // Replace this value with the project ID listed in the Google
  // Developers Console project.
  var projectId = 'EXAMPLE_PROJECT';

  var request = {
    configuration: {
      query: {
        query: 'SELECT 17;',
        destinationTable: {
          projectId: 'EXAMPLE_PROJECT',
          datasetId: 'EXAMPLE_DATASET',
          tableId: 'EXAMPLE_TABLE'
        },
        // Overwrite the destination table on every run.
        writeDisposition: 'WRITE_TRUNCATE'
      }
    }
  };

  // Fire-and-forget.
  BigQuery.Jobs.insert(request, projectId);
}
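If you'd rather create the trigger from code instead of through the clock icon, Apps Script can do that too. A minimal sketch (the 5am hour mirrors the trigger example above; adjust to taste):

function createDailyTrigger() {
  // Run rebuildTables() once a day, in the hour starting at 5am.
  ScriptApp.newTrigger('rebuildTables')
    .timeBased()
    .everyDays(1)
    .atHour(5)
    .create();
}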

Related

How to list windows per KDE/Plasma5 Activity

I am trying to write a script that launches an app if not running or activates the window if already visible in the current activity.
Using xdotool or wmctrl I am able to get the list of windows and activate them. If they are not open, then I can launch them. But the problem comes with KDE Activities. These tools list windows from all the activities even if they are not visible in current activity.
I am going through various qdbus methods but not finding anything close.
Has anyone created such scripts? How could one get a window's visibility with respect to the activities?
Edit:
As shown in the picture below, I was able to see the activity IDs that a window is attached to, but I am not able to find any way to get them programmatically.
An alternative approach was given in the KDE forum, but it is not completely clear whether it can solve your issue.
The recommendation is as follows:
On the activity level you can make use of URI > Activity relations and query dbus for further scripting. For example:
Link a directory to an activity in Dolphin.
Add an application "dolphin-directive" to the application launcher and make it run a custom script that conditionally starts Dolphin instances.
Set "dolphin-directive" as the default file manager.
A similar workflow is possible for each file type via the File Association settings.
As far as I could figure out through experiments, it is not possible to link windows to activities and query the relations via ActivityManager. I guess the multiple-screen-workspace-uri-activity-window-rule architecture is supposed to set up workflows that solve the problem in a more holistic manner. But hopefully someone can give a better answer here.
I wrote a script to regex-inspect the whole session bus tree for related and useful methods. You can use it like this: ./query-dbus.py --pattern "^.*activit.*$". So this answer is a work in progress.
EDIT: some services do have methods such as isMonitorActivity and isOnActivity:
"org.kde.konsole": {
"/Sessions/1": {
"org.kde.konsole.Session": {
"method": [
"setMonitorActivity",
"isMonitorActivity"
]
}
}
}
"org.kde.kate": {
"/MainApplication": {
"org.kde.Kate.Application": {
"method": [
"isOnActivity"
]
}
}
}
}
Did you already file a feature request?

Performance issues with large datasets

Is there any way of filtering the events in a projection associated with a read model by the aggregateId?
In the tests we carried out, we always receive all registered events. Is it possible to apply filters at an earlier stage?
We have 100,000 aggregateIds and each id has 15,000 associated events. Since we are unable to filter by aggregateId, our projections have to iterate over all events.
So you have 100,000 aggregates with 15,000 events each.
You can use ReadModel or ViewModel:
Read Model:
A read model can be seen as a read database for your app. So if you want to store some data about each aggregate, you should insert/update a row or entry in some table for each aggregate; see the Hacker News example read model code.
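For illustration, a read model projection in that style might look roughly like the following. This is only a sketch: the table name, event type and fields are invented, and the store API (defineTable/insert) follows the pattern used in the reSolve examples, so check it against your reSolve version's docs:

// read-models/aggregates.projection.js (sketch; names are invented)
export default {
  Init: async store => {
    await store.defineTable('AggregateSummaries', {
      indexes: { id: 'string' },
      fields: ['eventCount']
    })
  },

  // Example handler: one small row per aggregate, created when the
  // aggregate's first event arrives.
  SOMETHING_HAPPENED: async (store, event) => {
    await store.insert('AggregateSummaries', {
      id: event.aggregateId,
      eventCount: 1
    })
  }
}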
It is important to understand that reSolve read models are built on demand, on the first query. If you have a lot of events, this may take some time.
Another thing to consider: a newly created reSolve app is configured to use an in-memory database for read models, so they are rebuilt on each app start.
If you have a lot of events and don't want to wait for read models to rebuild each time you start the app, you have to configure real database storage for your read models.
Configuring adapters is not well documented; we'll fix this. Here is what you need to write in the relevant config file for MongoDB:
readModelAdapters: [
  {
    name: 'default',
    module: 'resolve-readmodel-mongo',
    options: {
      url: 'mongodb://127.0.0.1:27017/MyDatabaseName'
    }
  }
]
Since you have a database engine, you can use it for an event store too:
storageAdapter: {
  module: 'resolve-storage-mongo',
  options: {
    url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
    collectionName: 'Events'
  }
}
View Model:
A view model is built on the fly during the query. It does not require storage, but it reads all the events for the given aggregateId.
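A view model projection is just a set of pure reducer-like functions over one aggregate's events. A rough sketch, assuming the projection shape used in the reSolve docs (the event name and payload fields are invented):

// view-models/aggregate.projection.js (sketch; event name is invented)
export default {
  Init: () => ({ items: [] }),

  SOMETHING_HAPPENED: (state, { payload }) => ({
    ...state,
    items: state.items.concat(payload.item)
  })
}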
reSolve view models use snapshots. So if you have 15,000 events for a given aggregate, then on the first request all those events will be applied to calculate the view state for the first time. After this, that state is saved, and all subsequent requests will read the snapshot plus any later events. By default a snapshot is taken every 100 events, so on the second query reSolve would read a snapshot for this view model and apply no more than 100 events on top of it.
Again, keep in mind that if you want the snapshot storage to be persistent, you should configure a snapshot adapter:
snapshotAdapter: {
  module: 'resolve-snapshot-lite',
  options: {
    pathToFile: 'path/to/file',
    bucketSize: 100
  }
}
A view model has one more benefit: if you use the resolve-redux middleware on the client, it will be kept up to date there, reactively applying the events that the app receives via WebSockets.

Camunda BPMN/CMMN: Access to historical User Tasks and Form Data

Trying to wrap my head around how a BPMN/CMMN model can be used in my application.
There are several CMMN User Tasks with Forms as part of my application BPMN process.
I use Embedded User Task Forms
The data submitted by my forms gets stored in the task variables and passed out to the parent process using all-to-all variable mapping.
To progress with the process, the user needs to [claim task], fill out the form and then complete it (via a REST call).
After the User Task with the form is completed, it disappears from the list of available tasks in the /task REST endpoint (as well as in the Admin UI).
But what if I'd like to show users, after they have completed a task, the variables that they submitted to it before completion?
First, I thought of using Get Tasks (Historic) (POST).
And that works, in the sense that I can see the metadata about the tasks that users have completed before.
But how can I see the variables and actually the HTML form that had been used at the point of task completion? That is, the data available via
/task/{id}/variables
/task/{id}/form
before the task is completed?
The response from /history/task contains neither variables nor the form key.
Trying to access the completed task by its id, like {{camunda}}/task/46386/form or {{camunda}}/task/46386/variables
results in
{
  "type": "RestException",
  "message": "Cannot get form for task 46386"
}
or
{
  "type": "NullValueException",
  "message": "task 46386 doesn't exist: task is null"
}
respectively.
I think that I am missing something basic here.
Or is that simply the principle of the BPMN engine: once tasks are completed, they are considered gone forever, with no option to access their data later (except for basic audit log details)?
Another side question: do the task access permissions that were set up in Authorizations apply to the results returned by the /history/task endpoint?
Update:
Found a way to access the historical variables (Get Variable Instances), but not the historical task form keys.
Found a similar question.
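For reference, a quick sketch of pulling those historic variables over REST. The base URL and the taskIdIn parameter name are assumptions here and should be double-checked against the Get Variable Instances docs for your Camunda version:

// Node 18+ / browser sketch; engine URL and parameter name are assumptions.
const base = 'http://localhost:8080/engine-rest';

async function getHistoricTaskVariables(taskId) {
  const res = await fetch(base + '/history/variable-instance?taskIdIn=' + taskId);
  if (!res.ok) {
    throw new Error('Camunda returned ' + res.status);
  }
  return res.json(); // array of { name, value, type, ... }
}

getHistoricTaskVariables('46386').then(vars => console.log(vars));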

Porting PHP API over to Parse

I am a PHP dev looking to port my API over to the Parse platform.
Am I right in thinking that you only need cloud code for complex operations? For example, consider the following methods:
// Simple function to fetch a user by id
function getUser($userid) {
    return (SELECT * FROM users WHERE userid=$userid LIMIT 1)
}

// Another simple function, fetches all of a user's allergies (by their user id)
function getAllergies($userid) {
    return (SELECT * FROM allergies WHERE userid=$userid)
}

// Creates a script (story?) about the user using their user id
// Uses their name and allergies to create the story
function getScript($userid) {
    $user = getUser($userid)
    $allergies = getAllergies($userid)
    return "My name is {$user->getName()}. I am allergic to {$allergies}"
}
Would I need to implement getUser()/getAllergies() endpoints in Cloud Code? Or can I simply use Parse.Query("User")... thus leaving me with only the getScript() endpoint to implement in cloud code?
Cloud Code is for computation-heavy operations that should not be performed on the client, e.g. handling a large dataset.
It is also for performing beforeSave/afterSave and similar hooks.
In your example, providing you have set up a reasonable data model, none of the operations require cloud code.
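To make that concrete, here is a rough sketch of what the getScript() piece could look like in Cloud Code, while getUser()/getAllergies() stay as plain Parse.Query calls on the client. The "Allergies" class and its "user" and "name" columns are assumptions about your data model:

// Sketch only: adjust class/column names to your actual Parse schema.
Parse.Cloud.define("getScript", function(request, response) {
  var userQuery = new Parse.Query(Parse.User);
  userQuery.get(request.params.userId).then(function(user) {
    var allergyQuery = new Parse.Query("Allergies");
    allergyQuery.equalTo("user", user); // assuming a pointer column named "user"
    return allergyQuery.find().then(function(allergies) {
      var names = allergies.map(function(a) { return a.get("name"); }).join(", ");
      response.success("My name is " + user.get("name") +
        ". I am allergic to " + names);
    });
  }).then(null, function(error) {
    response.error(error);
  });
});

A client would then call it with Parse.Cloud.run("getScript", { userId: someId }).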
Your approach sounds reasonable. I tend to put simple queries that will most likely not change on the client side, but it all depends on your scenario. When developing mobile apps I tend to put a lot of code in Cloud Code; I've found that it speeds up my development cycle. For example, if someone finds a bug and it's in Cloud Code: make the fix, run parse deploy, done! The change is available to all mobile environments instantly. If that same code is in my mobile app, it really sucks, because now I have to fix the bug, rebuild, push it to the App Store/Google Play, wait x number of days for it to be approved, and have the users download it... you see where I'm going here.
Take, for example, your
SELECT * FROM allergies WHERE userid=$userid query.
Even though this is a simple query, what if you want to sort it? Maybe add some additional filtering?
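For instance, that same lookup as a client-side Parse query with sorting and extra filtering bolted on (again, "Allergies", "user" and "severity" are assumed column names):

var query = new Parse.Query("Allergies");
query.equalTo("user", user);      // user is the Parse.User object
query.greaterThan("severity", 3); // extra filtering
query.ascending("name");          // sorting
query.find().then(function(results) {
  // use results
});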
These are the kinds of things I think of when deciding where to put the code. Hope this helps!
As a side note, I have also found cloud code very handy when needing to add extra security to my apps.

30 sec periodic task to poll external web service and cache data

I'm after some advice on polling an external web service every 30 secs from a Domino server side action.
A quick bit of background...
We track the location of cars through the TomTom API. We now have a requirement to show this in our web app, overlaid onto a map (Google, Bing, etc.) and mashed up with other lat/long data from our application. Think of it as dispatching calls to taxis, where we want to assign those calls to the taxis (it's not taxis/calls, but it is a similar process). We refresh the dispatch controllers' screens quite aggressively, so they can see the status of all the objects and assign to the nearest car. If we trigger the pull of data from the refresh of the users' screens, we get into some tricky control server side, or else we will hit the maximum allowable requests per minute to the TomTom API.
Originally I was going to schedule an agent to poll the web service, write to a cached object in our app, and have the refreshing dispatch controllers' screens pull the data from our cache... great, except the user requirement is that our cache must be updated every 30 seconds. I can create a program document that runs every 1 minute, but that is still not aggressive enough.
So we are currently left with two options: our .NET guy creates a service that polls TomTom every 30 seconds and we retrieve the data from his service, or I figure out a way to do it in Domino. It would be nice to do it in the Domino database, and not in some stand-alone Java or .NET app, to keep as much of the logic as possible in one system (Domino).
We use backing beans heavily in our system. I hope to test this later today, but does this seem like a sensible route to go down?
Spawning threads in a JSF managed bean for scheduled tasks using a timer
...or are there limitations I am not aware of? Has anyone tackled this before in Domino, or does anyone have any comments?
Thanks in advance,
Nick
Check out DOTS (Domino OSGi Tasklet Service): http://www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=OSGI%20Tasklet%20Service%20for%20IBM%20Lotus%20Domino
It allows you to define background Java tasks on a Domino server that have all the advantages of agents (can be scheduled or triggered) with none of the performance or maintenance issues.
If you cache the data in a bean (application or session scoped), keep a date object that holds the last refreshed time. When the data is requested, check the last cached date against the current time; if 30 seconds or more have passed, refresh the data.
A way of doing it would be to write a managed bean that is created in the application scope (i.e. there can only be one). In this managed bean you take care of the 30-second polling of the web service with a plain Java web service client and a Java thread that you start when the managed bean is created, something like:
import java.util.HashMap;
import java.util.Map;

public class ServicePoller {
    private static ServicePollThread myThread = null;

    public ServicePoller() {
        if (myThread == null) {
            myThread = new ServicePollThread();
            new Thread(myThread).start();
        }
    }
}

class ServicePollThread implements Runnable {
    private final Map<String, Object> cache = new HashMap<String, Object>();
    private volatile boolean running = true;

    public ServicePollThread() {
    }

    public void run() {
        while (running) {
            doPoll(); // call the web service and update the cache
            try {
                Thread.sleep(30 * 1000); // wait 30 seconds between polls
            } catch (InterruptedException e) {
                running = false;
            }
        }
    }

    // doPoll(), getters for the cached data, etc.
    ....
}
This managed bean will then poll the web service every 30 seconds and save its findings in a HashMap or some other managed bean class. This way you don't need to run an agent or anything like that, and the dispatch screen can simply retrieve the data from the cache.
Another option would be to write a servlet (that should be possible with the extlib, but I can't find the information right now) which does the threading and reads the service for you. Then, in your database, you should be able to read the servlet's cache and use it wherever you need.
As Tim said, DOTS; or, as jjtbsomhorst said, a thread or an Eclipse job.
I've created a video describing DOTS: http://www.youtube.com/watch?v=CRuGeKkddVI&list=UUtMIOCuOQtR4w5xoTT4-uDw&index=4&feature=plcp
Next Monday I'll publish a sample how to do threads and Eclipse jobs. Here is a preview video: http://www.youtube.com/watch?v=uYgCfp1Bw8Q&list=UUtMIOCuOQtR4w5xoTT4-uDw&index=1&feature=plcp