Making webapp using Google BigQuery - google-bigquery

I have a big table containing domain data in Google Big query and I would like to create a web app similar to http://whois.domaintools.com/browse/a/
Page with list of sorted results I can dig into.
Is it possible without making query every time page is opened or refreshed which is most ovious way.
Thanks in advance!

Querying directly BigQuery introduces lags, which will affect frontend performance, and for some users and queries it will be several seconds, hence it's not recommended to be used on a live website, as the most suitable way is to run asynchronously in the background.
You need to build your website, so the website itself reads the data from a cache or from a local database.
You then need to build a background process (Message Queue or Cron) in which you will periodically run the BigQuery job, and process the results and write to your local database. You can then choose to run only every 1 hour or so.
See what you can do with BigQuery
https://github.com/everythingme/redash frontend available at http://demo.redash.io/
http://bigqueri.es/

Related

Dashboard that updates real time, how to structure

I am trying to create a dashboard app for my company that displays data from a few different sources that they use. I am starting with an in house system that stores data in MSSQL. I'm struggling to decide how I can display real time (or at least updated regularly) data based on this database.
I was thinking of writing a node server to poll the company database and check for updates and then store a copy of the relevant tables in my own database. Then creating another node server that computes metrics (average delivery time, Turnover, etc.) from my database and then a frontend (probably react) to display these metrics nicely and trigger the logic in the backend whenever the page is loaded by a user.
This is my first project so just need some guidance on whether this is the right way to go about this or if I'm over complicating it.
Thanks
One of solutions is to implement a CRON job in nodejs or in you frontEnd side, then you can retrieve new Data inserted to your Database.
You can reffer to this link for more information about the CROB job :
https://www.npmjs.com/package/cron
if you are using MySQL, you can use the mysql-events listner, it is a MySQL database and runs callbacks on matched events.
https://www.npmjs.com/package/mysql-events

Service that does advanced queries on a data set, and automatically returns relevant updated results every time new data is added to the set?

I'm looking for a cloud service that can do advanced statistics calculations on a large amount of votes submitted by users, in "real time".
In our app, users can submit different kind of votes like picking a favorite, rating 1-5, say yes/no etc. on various topics.
We also want to show "live" statistics to the user, showing the popularity of a person etc. This will be generated by a rather complex SQL where we are calculating the average number of times a person was picked as favorite, divided by total number of votes and the number of games in which the person has been participating etc. And the score for the latest X games should count higher than the overall score for all games. This is just an example, there are several other SQL queries with similar complexity.
All our presentable data (including calculated statistics) is served from Firestore documents, and the votes will be saved as Firestore documents.
Ideally, the Firebase-backend (functions, firestore etc) should not need to know about the query logic.
What I wish for is a pay as you go cloud service that does the following:
I define some schemas and set up the queries we need for the statistics we have (15-20 different SQLs). Like setting up views in MySQL
On every vote, we push the vote data to this service, which will store it in a row.
The service should then, based on its knowledge about the defined queries, and the content of the pushed vote data, determine which statistics that are affected by the newly added row, and recalculate these. A specific vote type can affect one or more statistics.
Every time a statistic is recalculated, the result should be automatically pushed back to our Firebase backend (for instance by calling an HTTPS endpoint that hits a cloud function) - so we can update the relevant Firestore documents.
The service should be able to throttle the calculations, like only regenerating new statistics every 1 minute despite having several votes per second on the same topic.
Is there any product like this in the market? Or can it be built by combining available cloud services? And what is the official term for such a product, if I should search for it myself?
I know that I can probably build a solution like this myself, and run it on a cloud hosted database server, which can scale as our need grows - but I believe that I'm not the first developer with a need of this, so I hope that someone has solved it before me :)
You can leverage the existing cloud services available on the Google Cloud Platform.
Google BigQuery, Google Cloud Firestore, Google App Engine (CRON Jobs), Google Cloud Tasks
The services can be used to solve the problems mentioned above:
1) Google BigQuery : Here you can define schema for the data on which you're going to run the SQL queries. BigQuery supports Standard and legacy SQL queries.
2) Every vote can be pushed to the defined BigQuery tables using its streaming insert service.
3) Every vote pushed can trigger the recalculation service which calculates the statistics by executing the defined SQL queries and the query results can be stored as documents in collections in Google Cloud Firestore.
4) Google Cloud Firestore: Here you can store the live statistics of the user. This is a real time database, so you'll be able to configure listeners for the modifications to the statistics and show the modifications as soon as the statistics are recalculated.
5) In the same service which inserts every vote, create a new record with a "syncId" in an another table. The idea is to group a number of votes cast in a particular interval to a its corresponding syncId. The syncId can be suffixed with a timestamp. According to your requirement a particular time interval can be set so that the recalculation can be triggered using CRON jobs service which invokes the recalculation service within the interval. Once the recalculation related to a particular syncId is completed the record corresponding to the syncId should be marked as completed.
We are leveraging the above technologies to build a web application on Google Cloud Platform, where the inputs are recorded on Google Firestore and then stream-inserted to Google BigQuery. The data stored in BigQuery is queried after 30 sec of each update using SQL queries and the query results are stored in Google Cloud Firestore to serve dashboards which are automatically updated using listeners configured for the collection in which the dashboard information is stored.

Database for live mobile tracking

I'm developing an app that allows to track a mobile device instantly (live) ... I need an of advice. The application must send the location to a webservice that in it's turn records the received data in a database.
What would be, in your opinion, the best way to store the location values?
I'm new in using bigdata and I'm afraid that simple sql requests wont be able to do the work properly ... I imagine if there is lot of users and each user send a request each 1sec I'll have issue with the database ...
An advice ? Thank you very much
i think you could have a look into the geospatial queries in mongo, if you chose to go ahead with mongodb.
Refer here
And here
for the design of the database would depend on the nature of the query (essentially the read and write).
Worth having a look into
Working at Cintric we landed on using elasticsearch. We process billions of location points in real time and provide advanced analytics to our users.
We started with mongoDB and ran into a lot of troubles, eventually leading to a painful migration.
Our stack currently has mobile devices dump location updates into AWS Kinesis, which are then processed by AWS Lambda handlers, and then dumped into elasticsearch. We're able to serve, process and store 300 million requests/month for only a few hundred dollars/month. Analytics for our dashboard add additional cost but for your needs I would highly recommend checking out your options on AWS.

How to proceed with query automation using Import.io

I've successfully created a query with the Extractor tool found in Import.io. It does exactly what I want it to do, however I need to now run this once or twice a day. Is the purpose of Import.io as an API to allow me to build logic such as data storage and schedules tasks (running queries multiple times a day) with my own application or are there ways to scheduled queries and make use of long-term storage of my results completely within the Import.io service?
I'm happy to create a Laravel or Rails app to make requests to the API and store the information elsewhere but if I'm reinventing the wheel by doing so and they provides the means to address this then that is a true time saver.
Thanks for using the new forum! Yes, we have moved this over to Stack Overflow to maximise the community atmosphere.
At the moment, Import does not have the ability to schedule crawls. However, this is something we are going to roll out in the near future.
For the moment, there is the ability to set a Cron job to run when you specify.
Another solution if you are using the free version is to use a CI tool like travis or jenkins to schedule your API scripts.
You can query live the extractors so you don't need to make them run manually every time. This will consume one of your requests from your limit.
The endpoint you can use is:
https://extraction.import.io/query/extractor/extractor_id?_apikey=apikey&url=url
Unfortunately the script will not be a very simple one since most websites have very different respond structures towards import.io and as you may already know, the premium version of the tool provides now with scheduling capabilities.

Append Data in a Big Query Table Automatically from Cloud Storage

I have some excel files that i want to append to a Big Query table. After doing that i want to append data to the same Big Query table using cloud storage automatically thru a scheduler which runs 4 times a day. How to do this. Please keep in mind i am not a developer. I just know SQL and Big Query.
I suggest write a simple script and host it Google App Engine, and register a bucket notification and hook it with that GAE endpoint.
Whenever a file is upload to the bucket, GAE will be notificated, and invoke your script, which then gets the file, extract the info from it, and append to the table you specify in Big Query.
https://cloud.google.com/storage/docs/object-change-notification#_Watching