GraphQL database design patterns - SQL

I am researching the technologies available to build a social network website. The front-end portion has been decided, but I am still unsure about the back-end. The back-end will also be handled by a server running NodeJS. I was planning on using something like MongoDB, or even a combination of MongoDB and a relational database, but I do not know what the best practices are.
I stumbled upon an article that talked about using GraphQL with different types of database structures. The writer mentioned SQL, Redis, and MongoDB for their user data, but I did not quite understand their structure, or whether it would work for mine.
Basically, I would like users to sign up to my page and add descriptions about themselves, have friends, join groups, upload media (pictures and videos), etc. A pretty basic social network setup, but ultimately the question narrows down to:
"What database structures are best for what?"
Should the SQL handle: userID, email, password
MongoDB: about, description, posts
Redis: cache for something like posts feed
And how would you handle media uploads? SQL, MongoDB? I have not found any useful articles that talk about using different database structures together, and most of the other StackOverflow questions feel outdated. They either favor one or the other, but not a combination.

TL;DR: There aren't really any particular databases you "should use for GraphQL." Just use whatever is best suited for your data and requirements.
I'll give you a rough idea how we use it at work. We have an app that lets users upload songs and videos for us to distribute to digital streaming platforms, e.g. Spotify and such.
We use Postgres to handle all of the app's "transactional data." This is user account info, their song/video metadata, where to send the money, other stuff for the app, etc. All of this data amounts to a mere few hundred MB.
Multimedia uploads go to AWS S3, and the location of each file is stored in Postgres.
The "analytical data," regarding how well each song did on each platform, e.g. number of streams and money made in each region, is stored in Google BigQuery. There is much, much more of these data and Google BigQuery is more equipped for this size.
Of course, the question of which data store one should use is highly case-dependent. GraphQL didn't really influence any of these decisions. The nice thing about GraphQL is that the resolvers let you pull data from anywhere and conglomerate it all into a single API response. By using GraphQL you should feel liberated to use a variety of databases, each well tailored to the type of data it handles.
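To make that concrete, here's a rough sketch of what such resolvers can look like. This is not our actual code - it assumes an Apollo-style resolver map, the pg and @google-cloud/bigquery clients, and made-up table, column, and bucket names:

```typescript
import { Pool } from "pg";
import { BigQuery } from "@google-cloud/bigquery";

const pg = new Pool();            // transactional data: users, song/video metadata
const bigquery = new BigQuery();  // analytical data: streams, revenue per region

export const resolvers = {
  Query: {
    song: async (_parent: unknown, args: { id: string }) => {
      const { rows } = await pg.query("SELECT * FROM songs WHERE id = $1", [args.id]);
      return rows[0]; // includes an s3_key column recording where the upload lives
    },
  },
  Song: {
    // The media file itself is in S3; Postgres only stores its location.
    mediaUrl: (song: { s3_key: string }) =>
      `https://my-upload-bucket.s3.amazonaws.com/${song.s3_key}`,
    // Per-platform performance lives in BigQuery, not Postgres.
    streams: async (song: { id: string }) => {
      const [rows] = await bigquery.query({
        query: `SELECT platform, SUM(streams) AS total
                FROM analytics.daily_streams
                WHERE song_id = @id
                GROUP BY platform`,
        params: { id: song.id },
      });
      return rows;
    },
  },
};
```

The GraphQL type is the unit of composition here: each field can be resolved from whichever store suits it, and the client still gets one response.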

Well, GraphQL as a methodology has nothing to do with databases - it's about how you organize the API layer of an application. It means you should configure a GraphQL schema (types or entities, their fields, etc.) and set up resolvers to fill out the response model.
If you don't want to do that yourself, you can use an out-of-the-box solution like NReco.GraphQL.
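For illustration, a minimal schema for the social-network case in the question might look like the sketch below (using apollo-server's gql tag; the type and field names are assumptions, and each field would be backed by a resolver that reads from whichever store actually holds that data):

```typescript
import { gql } from "apollo-server";

export const typeDefs = gql`
  type User {
    id: ID!            # relational store: id, email, password hash
    email: String!
    about: String      # document store: profile text, posts
    posts: [Post!]!
    friends: [User!]!
  }

  type Post {
    id: ID!
    body: String!
    mediaUrl: String   # object storage (e.g. S3); only the location is kept in a DB
  }

  type Query {
    user(id: ID!): User
    feed(userId: ID!): [Post!]!   # a good candidate for a Redis-backed cache
  }
`;
```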

Related

How can you document a REST API usage/consumption?

We have a medium-size app (100+ SQL tables), and we often need to integrate it with partner APIs (with our system as the client/consumer). The process of designing such an integration is non-trivial:
We often need to map columns in our database to fields in requests to the partner API.
Some fields in requests to the partner API must be constant, or conditional.
On rare occasions the output from one API response becomes an input to another API request.
There are many resources on the web for documenting REST APIs - there are specific formats for that (Swagger, RAML, etc.). These formats allow efficient generation of client code and human-readable documentation. However, these formats are not very helpful for describing how your app integrates with an API. We create lengthy Microsoft Word documents which contain more or less a copy of the partner API methods with comments on how every individual field should be used. Such a solution seems sub-optimal.
Googling for better options did not yield many results; Swaggerhub seems to have a "comments" feature which targets the problem above, and that's pretty much all.
Question: are there some tools, formats, workflows, ideas, etc. which facilitate designing and documenting API integrations described above?
I don't know which language you use, but I work with ApiDoc:
https://apidocjs.com/
It is great for generating REST API documentation from comments in NodeJS, and it can be used with many languages.
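For reference, ApiDoc builds the documentation from structured comments placed next to your route handlers. A minimal, made-up example for an Express route might look roughly like this:

```typescript
import express from "express";

const app = express();

/**
 * @api {get} /users/:id Request user information
 * @apiName GetUser
 * @apiGroup User
 *
 * @apiParam {Number} id The user's unique ID.
 *
 * @apiSuccess {String} email The user's email address.
 * @apiSuccess {String} about The user's profile description.
 */
app.get("/users/:id", (req, res) => {
  res.json({ email: "user@example.com", about: "..." });
});
```

The apidoc CLI (e.g. apidoc -i src/ -o doc/) then generates browsable HTML from those comments.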

What do the Google Analytics related APIs buy me that the Google Analytics UI cannot achieve?

A long time ago, I took and passed the Google Analytics IQ certification test. At the time, I don't believe there were such things as the Core Reporting API, Management API, and Metadata API (and probably some other Google Analytics related APIs that I don't know about). Now that I am going through the Google Analytics IQ certification training course again (provided by Google, presented by Justin Curtoni, I believe that's his name), I found that they now have the Core Reporting API, Management API, and Metadata API.
I am a computer programmer by trade, so I have no problem programming with these APIs. However, what I don't understand is: what do these APIs buy me that the Google Analytics UI cannot offer? There is no reason to write a program that utilizes these APIs simply because I can. To me, the existing Google Analytics UI has a lot of tools, reports, and other features that are quite extensive. I am hoping that some of you can help me see something that I am probably missing.
The APIs are primarily for programmatic access. For example, if you need to create 1000 accounts all with the same property/view structure and then maybe add a few view filters to each of those accounts, you'll probably want to use the Management API. Doing that by hand would be a nightmare.
The same thing is true for the reporting API. Maybe you want to set up a task that runs every Monday morning and reports on the previous week's data. And maybe you want to display that data on an internal dashboard for your company using some fancy charting library. You'd have to use the API to get the data.
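For example, a Monday-morning job like that could be a small script against the Reporting API. Here's a rough sketch using the googleapis Node client; the view ID is a placeholder and the auth setup is simplified to whatever credentials your environment already provides:

```typescript
import { google } from "googleapis";

// Pull last week's sessions by country, e.g. from a scheduled cron job.
async function weeklyReport() {
  const auth = new google.auth.GoogleAuth({
    scopes: ["https://www.googleapis.com/auth/analytics.readonly"],
  });
  const reporting = google.analyticsreporting({ version: "v4", auth });

  const res = await reporting.reports.batchGet({
    requestBody: {
      reportRequests: [
        {
          viewId: "123456789", // placeholder view ID
          dateRanges: [{ startDate: "7daysAgo", endDate: "yesterday" }],
          metrics: [{ expression: "ga:sessions" }],
          dimensions: [{ name: "ga:country" }],
        },
      ],
    },
  });

  // Feed these rows into a dashboard, spreadsheet, or charting library.
  return res.data.reports?.[0].data?.rows ?? [];
}
```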
Dashboards (executive summaries; managers often want nice visualizations instead of boring drill-downs)
Custom reports for user groups that do not have a Google Account or are not supposed to have access to full reports (e.g. Affiliates)
Advanced filtering and aggregation (GA reports cannot do everything)
You can combine analytics data with external data (e.g. you are not allowed to store personally identifiable information within GA, but you might store a custom key that allows you to link analytics data to customer data from your CRM or fulfillment system)
Machine-to-machine communication; I once did tracking for an airline that needed trend data on what people were searching for and what they were actually booking; that data was used to allocate resources to busy flights and withdraw them from underperforming ones, and part of this was done by hooking up GA to their backend system
Take a look at the GA Partner Page. I would say the primary reason is to "liberate" GA Data from outside of GA itself. As Eike mentions, you can create dashboards and combine this data with other sources for a complete "View" of your online presence.
Hi, I guess there is no definite answer. Here are some things you can do with the APIs:
Automating AdWords CRO based on keyword, ad, and campaign performance.
Scoring leads based on Analytics data (engagement with different items) and external data from a CRM.
Collecting unsampled data using multiple daily queries.
Filtering using several dimensions.
Tracking conversions for periods longer than supported by AdWords.
Looking at a funnel via segments.
Analyzing funnels with non-linear structures.
Creating more robust alerts.
Exporting data to BigQuery and analysing it together with data from other systems.
Creating machine learning apps for behavioural customization of your site.
Creating a dashboard with data from multiple views.
Using product recommendations to implement "better together" in an online store.
Automating creation of accounts and properties and their integration in a hosting provider's console.
Cheers!!

Can BigQuery's browser interface be white-labeled?

Like most people, we're pretty impressed with BigQuery. We're willing to put up with it being based on proprietary "Dremel" in exchange for not having to configure a ton of servers in our LAN, on EC2, or anywhere else.
The REST API is excellent, and we're incorporating that into our apps, but we still find ourselves using the BQ Browser interface as well. We'd like to incorporate something like a 'generic SQL window' into our app, without divulging that the backend is BQ or that data is stored in Google at all, for that matter. Does Google provide a way to use their BQ browser tool in a white-label manner?
Note also, that even extending access to the existing browser tool is problematic. It relies on user-accounts existing in one's own domain - something that can't be done, in our case, with a customer's email address. The REST interface solves this with service-level accounts, but that doesn't get you to the SQL window/browser tool.
If the folks at Google are listening (and I know that you are), consider the benefits of white-labeling the browser tool: I think you'd find a lot of software companies integrating it into their suites of products and, then, running circles around any Hadoop/CDH/EMR/Impala/Hive combination.
So, to summarize: how does a software developer import or emulate the BQ browser tool (with all its autocompletes, query histories, etc.) in their own web-based app?
The initial version of the BigQuery web interface was considered just an 'example' UI that anyone could create themselves. It uses only the public BigQuery API to talk to BigQuery.
There are a couple of Google-internal things we've added since then, such as the current design of 'saved queries', and an auth shortcut so that users don't have to explicitly grant permission to the UI to access BigQuery data. But it is still mostly plain-ol-javascript talking to BigQuery via the REST API the same way anybody else does.
The javascript is obfuscated, however; my understanding is that this is just for compression purposes so that it downloads more quickly.
The SQL highlighting is done by CodeMirror with special configuration for the BigQuery SQL variant.
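For anyone trying to emulate that part of the tool: the stock CodeMirror SQL mode gets you most of the way. Google's exact BigQuery-specific configuration isn't published, but a plain setup looks roughly like this (the element ID and options are just an illustration):

```typescript
import CodeMirror from "codemirror";
import "codemirror/mode/sql/sql";

const editor = CodeMirror.fromTextArea(
  document.getElementById("query") as HTMLTextAreaElement,
  {
    mode: "text/x-sql",  // generic SQL highlighting; keyword lists can be customized
    lineNumbers: true,
    indentWithTabs: false,
  }
);
```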
I'll talk to the other members of the BigQuery team about open-sourcing the javascript code in the Web UI. It may be difficult to do at this point, but it doesn't hurt to have a conversation about it. I'll bring this up with the team and update this thread. The most likely answer will be "We'll think about it", but hopefully we can also think about it and start working on it too :-)
Let me know if that sounds like it would meet your needs. It might not solve the auth problems you mention, since your users likely won't have BigQuery accounts, but you may be able to solve that by proxying oauth2 access tokens.

How to proliferate access permission to Javascript MVC apps

I recently finished one of my first AgilityJS projects, which is a web-based file browser that lets you create and manage folders and files, and navigate around the folder tree. I followed the various AgilityJS recommendations regarding the design and ended up with all my HTML and Javascript in a single Javascript file.
Now, I would like to provide a "read-only" version of this app which does not have the ability to add/edit/remove files and folders. I'd like to have two user types on the website: one type which can only read the files and folders, and another which can administer them.
My question is, how do I proliferate these permission differences to my AgilityJS app? I know how to secure my endpoints and operations on the server side, but I'm wondering about the best way to do this on the client side. Should I create a separate version of the app with a limited set of functionality? Should I simply hide certain buttons/features? Are there theories, frameworks, etc. which deal with this issue? Any pointer in the right direction would be helpful.
LOL - probably one could write books about that topic. Some very basic ideas:
I would start with the philosophical debate around MVC. Some people argue, with the help of MVC, that no piece of code and no piece of the data model should ever be implemented twice: business logic and the model belong on the server. The opposite view focuses on serving users at any cost - even if that means maintaining code or the model twice for the sake of avoiding extra round trips. The middle way defines a master source for business code and model and makes sure that every other place follows that leading master (the master is changed first). Take your pick. Your answer to that question sets the boundaries for how the user interface can, or has to, look for the user.
You need to think hard about a permissions concept. Looking at Microsoft, I would assume they invested a couple of dozen man-years across all their applications to work out their permission concepts. The ideal permission concept very much depends on your application, so it is close to impossible to work this out without knowing at least a little about yours. However, the permission concept has to come up with policies covering roles, groups, access rights, access levels, context-driven permissions (e.g. based on IP address), and permission black- or white-listing (the permissions each user has at creation). An example from Microsoft: http://office.microsoft.com/en-us/windows-sharepoint-services-help/permission-levels-and-permissions-HA010100149.aspx
Data on the client is not secure!!! Whatever you do on the client - data hiding, encryption, compression... - if it is done on the client, there are ways to read the data or to revert those manipulations (even by disabling them). Somebody can also send data to your server that the client never even offered an update form for; hackers can craft such requests themselves. So as soon as you start to implement permissions, make sure that users are permitted to read all data you send to clients, and that you include permission checks every time you add or update data in the database.
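To make that last point concrete, here is a minimal sketch of enforcing the split on the server, assuming an Express backend and an authentication step that has already attached the user's role to the request (all names are illustrative, not AgilityJS-specific):

```typescript
import express from "express";

interface AuthedRequest extends express.Request {
  user?: { id: string; role: "reader" | "admin" };
}

const app = express();
app.use(express.json());

// Reject requests whose user lacks the required role, regardless of
// what the client-side UI showed or hid.
function requireRole(role: "reader" | "admin"): express.RequestHandler {
  return (req, res, next) => {
    const user = (req as AuthedRequest).user;
    if (!user || (role === "admin" && user.role !== "admin")) {
      return res.status(403).json({ error: "forbidden" });
    }
    next();
  };
}

// Readers and admins may list files; only admins may change anything.
app.get("/folders/:id/files", requireRole("reader"), (_req, res) => res.json([]));
app.post("/folders/:id/files", requireRole("admin"), (_req, res) => res.status(201).end());
app.delete("/files/:id", requireRole("admin"), (_req, res) => res.status(204).end());
```

The client-side app can then hide the admin buttons purely as a UX nicety; the actual enforcement never leaves the server.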

Design an API for a web service without "selling the farm"?

I'm going to try to phrase this as a generic question.
A company runs a website that has a lot of valuable information on it. This information is queried from an internal private database. So technically, the information in the database is the valuable part.
If this company wished to develop an API that developers could use to access their database of valuable & useful information, what approach should the company take?
It's important to give developers what they need. But it is also important to keep competing websites from essentially using the API to scrape everything and steal all traffic from the company's website.
Is there some way the API could be used that drives traffic back to the original company's website? Something that gives users a reason to keep going there.
This is a design consideration that my company is struggling with that I can imagine other web-based services have come across before.
Institute API keys - don't make the API public. Maybe make the signup process more involved than "anyone with an e-mail address".
Rate limit the API based on keys. If you're running more than X requests a minute, you're likely mining the database (a rough sketch of this appears after this list).
Don't provide a "fetch everything" API. Make the users know something to get information on it. Don't reveal what you know.
I've seen a lot of companies giving out API keys and stating a TOS that all developers must adhere to. For example, any page that uses data from the API must include your logo and a link back to your website. If any developer is found breaking the rules, the API key can be cancelled and your data is safe again.
Who is meant to use the API?
A good general method of solving this problem is to limit access to the data to end users (rather than allowing applications or developers at it). Provide applications and users each with their own identification, and make sure that accessing a subset of the data requires a combination of both the user and the application key.
Following this pattern, each user will have access to a very limited subset of the data (presumably, the data that they require for their own specific use), and you can put measures in place to enforce this. Any attempts at data-mining will become obvious.
This type of approach meshes well with capability-type security models on the server side.
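A minimal sketch of that combined check, with in-memory stand-ins for the real application, user, and data stores (all names are illustrative):

```typescript
import express from "express";

const app = express();

// Stand-ins for real application, user, and data stores.
const applicationKeys = new Set(["partner-app-key"]);
const usersByToken = new Map([["user-token-123", { id: "u1" }]]);
const reports = [{ id: "r1", ownerId: "u1", body: "..." }];

app.get("/reports/:id", (req, res) => {
  const appKey = req.header("x-app-key") ?? "";
  const user = usersByToken.get(req.header("x-user-token") ?? "");
  if (!applicationKeys.has(appKey) || !user) {
    return res.status(401).end(); // both application key and user identity required
  }
  // Each user only ever sees their own subset, so bulk mining by a single
  // application/user pair stands out immediately.
  const report = reports.find(r => r.id === req.params.id && r.ownerId === user.id);
  return report ? res.json(report) : res.status(404).end();
});
```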