How to design APIs to store client files on an external storage system - amazon-s3

Suppose we have a client application (mobile app, web app, etc.) working with our back-end services. The client wants to store some object (a profile photo, a PDF, etc.). Now suppose that we want to store this data with a third-party storage provider (such as AWS S3). What is the best practice for doing so?
To achieve this goal I have two ideas:
1. The client sends the object directly to the back-end (perhaps base64-encoded). The back-end stores the data in the cloud, saves the object's path in its own database for later use, and then informs the client whether the operation succeeded.
2. The client informs the back-end that it wants to store a file. The back-end generates a unique signed URL with a limited expiration time and sends it to the client. The client uploads that specific file directly to the cloud and then notifies the back-end, which verifies that the file was saved and updates its own database for later use.
Recently I encountered such a situation and don't know which method to choose. Personally I think the second method is more elegant and idiomatic, but it comes at the cost of more complexity and more network calls. Which method should I choose, and if there are other patterns for implementing this kind of task, what are they?
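For what it's worth, the second approach is exactly what S3 pre-signed URLs are designed for. Below is a minimal sketch of the URL-generation step using the AWS SDK for JavaScript v3; the bucket name, key scheme, and function name are illustrative assumptions, not a definitive implementation:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// The client asks the back-end for an upload URL; the back-end returns a
// short-lived pre-signed PUT URL and records the chosen key in its database.
export async function createUploadUrl(userId: string, fileName: string): Promise<string> {
  const key = `uploads/${userId}/${Date.now()}-${fileName}`; // illustrative key scheme
  const command = new PutObjectCommand({ Bucket: "my-app-bucket", Key: key });
  // The URL expires after 5 minutes; the client PUTs the file bytes straight to S3.
  return getSignedUrl(s3, command, { expiresIn: 300 });
}
```

The client then PUTs the file bytes to the returned URL and calls the API back to confirm, at which point the back-end can verify the object exists (e.g. with a HeadObjectCommand) before recording it as saved.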


OAuth flow for third party API access

There is a lot of information on the web about OAuth 2, its different types of flows and where/how to use them. I find that most of these resources discuss authenticating a user for an application, but I am struggling to understand what the best/correct approach would be when consuming third party APIs (i.e. when our own API is the "middleman" between a user and their data in a third party API).
With the help of an example scenario and some diagrams, I would be really grateful for advice/opinions on how I should properly implement integration with third party APIs and what the pros & cons of each approach are.
Starting point
As a starting point, suppose we have a web app, structured as follows:
Frontend - SPA (Angular), hosted with AWS S3 + Cloudfront
Backend - Node, running as stateless AWS lambda functions with AWS API Gateway
Auth0 for handling auth/signin etc. The frontend uses an Implicit OAuth2 flow to obtain access_tokens, which are stored in local storage and included as a header in all requests to the backend.
Potentially also native mobile app(s), consuming the same backend API.
Goal
Now suppose that we wish to add integration with Google Sheets. The new feature would allow users to use their own Google Sheets (i.e. stored in their own Google account) as a data source, with the app having read/write access to the sheet. There may be other integrations in the future, so I am assuming that other APIs would require a similar process.
Problem statement
In addition to the existing OAuth process (which allows users to sign in to the "MyApp" frontend and communicate with the "MyApp API"), there needs to be an additional OAuth process for users to connect MyApp to the third party Google Sheets API.
The documentation has two Quickstart examples, but neither seems to quite fit my needs:
Browser - https://developers.google.com/sheets/api/quickstart/js
Node.js (console app) - https://developers.google.com/sheets/api/quickstart/nodejs
The data from the third-party (Google) API is one of potentially several integration points, so intuitively it seems more logical (and more secure) that all communication with the Google Sheets API should happen in the MyApp API, and not on the frontend/client side. The MyApp API would fetch data, process/manipulate/format it in some way and then present it for display in the frontend or mobile apps.
We require access to each user's own data, so the Client Credentials flow is not suitable. I am focusing on the Implicit and Authorization Code grant workflows.
Important note: The trickiness seems to come from the fact that the MyApp API is stateless, so there is no long-lived session in which to store tokens. On that basis, it seems like tokens need to be stored either in the frontend (e.g. local storage/cookies etc) or in a backend database.
Below is my interpretation of two possible approaches. I'd appreciate thoughts/corrections.
Option 1: Implicit flow - tokens stored on the FE, passed along to the BE, which then makes requests to Google
Pros:
Allows access to user's own data
Simpler flow, access_token retrieved immediately without needing the code step
Fewer steps to implement between the initial sign-in process and actually obtaining data
No need for a backend database, can resend the token with each request
Cons:
Frontend (browser) has access to Google access_token which seems unnecessary and is a potential security concern
It seems like a strange process to pass the access_token from FE to BE, purely to allow BE to then use that token to make another request
I'm not sure how we would refresh/renew tokens since I understand that storing refresh_tokens on the client is bad practice. It would not be a good user experience if the user had to frequently sign in to reconnect their account
Option 2: Authorization Code Flow - all communication with Google via BE, tokens stored in BE database
Pros:
Allows access to user's own data
Other than the code-request / consent page, all communication with Google happens on the backend, so the tokens are not accessible on the client
Client secret can be used from the BE
Cons:
More complex flow, requires extra steps
Given that the BE is stateless, it's not clear how best to store the tokens. It seems to require storing them in a database, which is an extra complication and has security implications: how would you properly secure/encrypt the access_tokens/refresh_tokens in that database? (See the sketches below.)
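For concreteness, here is a minimal sketch of Option 2's code-exchange step using the google-auth-library Node client; the redirect URI, environment-variable names, and the persistEncrypted helper are illustrative assumptions:

```typescript
import { OAuth2Client } from "google-auth-library";

// Hypothetical configuration; the client ID/secret come from your secret store.
const oauth2 = new OAuth2Client(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET,
  "https://api.myapp.example/oauth/google/callback" // illustrative redirect URI
);

// Step 1: send the user to Google's consent page.
// access_type=offline asks Google for a refresh_token on first consent.
export function consentUrl(): string {
  return oauth2.generateAuthUrl({
    access_type: "offline",
    scope: ["https://www.googleapis.com/auth/spreadsheets"],
  });
}

// Step 2: the BE callback exchanges the one-time code for tokens, which are
// then encrypted and persisted against the user's record.
export async function handleCallback(code: string): Promise<void> {
  const { tokens } = await oauth2.getToken(code); // access_token, refresh_token, expiry_date...
  await persistEncrypted(tokens); // hypothetical storage helper
}

// Stand-in for the real persistence layer.
async function persistEncrypted(tokens: object): Promise<void> { /* ... */ }
```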
Conclusion
Given that the data processing is to happen on the backend, option 2 seems slightly more suitable because the sensitive tokens can be hidden from the frontend app, and the various clients (web frontend, mobile apps) need not be involved in the process beyond the initial sign-in / user consent. However, I'm not sure whether having a database full of user auth tokens is a good idea, or how I could properly protect that database.
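On the question of protecting the token database, one common pattern is to encrypt tokens at rest with a key held outside the database. A minimal sketch using Node's built-in crypto module follows; the environment-variable name is an assumption, and in practice the key would come from a KMS or secret manager:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Hypothetical 32-byte AES-256 key, hex-encoded; never kept in the code base.
const KEY = Buffer.from(process.env.TOKEN_ENC_KEY!, "hex");

// Encrypt a token before writing it to the database.
export function encryptToken(plain: string): string {
  const iv = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", KEY, iv);
  const ct = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  return Buffer.concat([iv, tag, ct]).toString("base64");
}

// Decrypt a token read back from the database.
export function decryptToken(blob: string): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ct = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", KEY, iv);
  decipher.setAuthTag(tag); // verifies the ciphertext hasn't been tampered with
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```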
The good news is that both options are perfectly valid and equally secure. The concern about a short-lived Access Token being in the browser isn't an issue. Equally, if you only held the tokens on the BE, then you would need to implement your own client authentication/session/JWT machinery, which presents the same attack surface.
I've done both, and am currently migrating from BE to FE. In my case the reason is that everything I need to do, I can do on the FE, so I end up with no BE at all. This isn't strictly true since I do some onboarding/payment with the BE, but that's about all.
So the best approach depends on factors beyond those in your question, such as the nature of the app, what the BE cost is and how important that is, what your devops skill sets look like for maintaining two environments, and to what extent a BE is required anyway versus being completely optional.

User management across multiple stateless API applications

We want to make our API stateless.
Right now, the tokens for users are provided by a 3rd party upon login and stored in the application memory.
As long as a token is in use it remains valid; it expires once it has been idle for a configurable amount of time.
On the 3rd party's side (the token provider) the token is valid for much longer (for example: a month on their side regardless of usage vs. 20 minutes of idle time on ours).
Meaning, each use of a token updates its last-used timestamp in the application memory.
As part of making our API stateless I've encountered a problem:
Assuming we will have more than one application and a load balancer,
how do I maintain user management across two applications?
I know how to restore a user's profile/details if the token isn't in the application memory (but is still valid on the 3rd party's side), but I can't know the timestamp of its last usage.
I think I either have to sync the cache between my applications or manage the users in a separate service.
I'm hoping that my explanation is clear enough.
My questions are:
What is the best practice for this issue?
Where can I find useful information regarding user management across multiple applications? I think I'm struggling to find the right keywords in this case.
Thanks in advance
From the architectural point of view, a separate user-management service is preferable. In that case you never call your 3rd-party token provider directly, but go through your own service, which stores the tokens and their timestamps. This, however, will probably require serious refactoring.
So, the other solution I can offer is to use a tool that shares memory among processes and machines. For example, you can use Hazelcast. It is a very easy tool to get started with, and it has a very user-friendly API. If, for example, you store the token-to-timestamp mapping in a map, the only thing you have to change is the place where you create the map: use the Hazelcast map factory instead of new HashMap<>() and your tokens will be magically distributed among your applications.
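For illustration, here is a minimal sketch with the Hazelcast Node.js client (hazelcast-client on npm); the answer's new HashMap<>() example is Java, but the idea is identical, and the map name and token value are placeholders:

```typescript
import { Client } from "hazelcast-client";

async function main(): Promise<void> {
  // Connects to a locally running Hazelcast cluster by default.
  const client = await Client.newHazelcastClient();
  // One distributed map, visible to every application instance behind the LB.
  const lastUsed = await client.getMap<string, number>("token-last-used");
  await lastUsed.put("some-opaque-token", Date.now()); // refresh on each request
  console.log(await lastUsed.get("some-opaque-token"));
  await client.shutdown();
}

main().catch(console.error);
```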

How would you build a p2p twitter clone with GunDB?

GunDB is supposed to support peer-to-peer data access, so I'm trying to better understand how this would work. If I were to build a twitter clone, what would the high-level architecture look like if I wanted each user to store their own tweets on their own server?
Answer by the author of GunDB:
Every tweet is cryptographically signed by the user (in fact, with SEA, all data by that user is signed automatically, so you don't have to worry/do anything!) which means that no matter where it is stored (by the user, by a server, by another user), it cannot be tampered with.
By default with GUN (and you could modify this if you wanted, but you'd be adding a lot of unnecessary complexity, which I don't recommend), the tweets are stored by whoever subscribes to that data. That would mean: (A) the user who made the tweet would store it (B) a server peer, which is subscribed to everything, stores it, (C) and friends/viewers/followers/audience who reads the tweet also stores it.
Realistically, most users probably are using a browser to access their app, so you wouldn't want to assume this is reliable - but you can Electron-ify (or something similar) your app so that users can install it on their desktop, in which case, they would be their own server. Then you (or other users) can also deploy the app to AWS/Heroku/DigitalOcean/whatever and ALSO store data as a backup there (like in the case of B, if you add your S3 credentials, it backs data up to S3 - ideally this would be IPFS instead, or similar), etc.
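To make (A) and (B) concrete, here is a minimal sketch of posting a signed tweet with gun + SEA; the relay URL, alias, passphrase, and data shape are illustrative assumptions:

```typescript
import Gun from "gun";
import "gun/sea"; // enables the user/crypto API

const gun = Gun(["https://my-relay.example/gun"]); // hypothetical "server peer" (case B)
const user = gun.user();

user.auth("alice", "correct horse battery staple", (ack: any) => {
  if (ack.err) return console.error(ack.err);
  // Everything written under the user graph is signed with Alice's keypair by
  // SEA, so any peer that stores or relays the tweet cannot tamper with it.
  user.get("tweets").set({ text: "hello, decentralized world", when: Date.now() });
});
```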

GunDB user authentication and data storage among users

I have been following your project for quite some time now and am intrigued by the functionality of gunDB where it doesn't require a database in between and keeps security in check.
However, I've got some questions about GunDB which I've been thinking about for quite some time before I can give Gun a go with a project I'm currently working on. In this project it is necessary that data is safe, but it should also be shareable once a group has been set up. The project is a mobile app, and data is mostly stored on the device in a SQLite database.
I have been looking into Gun as it allows for better usage of the app in sense of collaboration. The questions I have, however, are:
User authentication
How is user authentication handled through private keys? That is, how can a user "register" with, for example, a username and password to log in to the service?
For authentication I am currently using Firebase where it is possible to use username/password authentication and I would like to know how Gun approaches this case and how it's implemented.
Data storage
In the documentation and on the website it's stated that data is stored locally with every client and can be stored on a "node" or server using either a local hard drive or the Amazon S3 storage option.
What I am curious about is what data is actually stored at the client. Is it only the data he/she has access to, or is it a copy of the whole dataset where the client can only access whatever he/she is granted access to?
Maintaining your data
When I've got a production system running with a lot of data, how will I be able to manage my data flows and/or help out my clients with issues they have in the system?
In other words, how can I make sure I can keep up with the system if I want to throw in an update and/or service my clients with data issues.
My main concern is the ability to synchronize their local storage correctly.
Those are all my questions for now.
Thank you very much in advance for providing some clarity on these subjects.
Best regards,
(Answered by Mark Nadal on Github: https://github.com/amark/gun/issues/398#issuecomment-320418285)
#sleever great to hear from you! Thanks for finally jumping into the discussion! :D
User Authentication: this is currently in alpha. If you haven't already seen these links, check them out:
https://github.com/amark/gun/wiki/auth
http://gun.js.org/explainers/data/security.html
https://github.com/amark/gun/blob/master/sea.js#L23-L43
https://github.com/BrockAtkinson/login-riot-gun
If you have already, I would love to either (A) get you to alpha test and help push things forward or (B) hear any specific questions you have about it. This thread is also a more in-depth discussion about alternative security API ideas: #321 .
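For reference, a small sketch of what the register/login flow in those links boils down to, assuming it maps onto user.create / user.auth (the alias and passphrase are placeholders):

```typescript
import Gun from "gun";
import "gun/sea";

const gun = Gun();
const user = gun.user();

// "Register": SEA derives a keypair from alias + password and stores the
// encrypted keypair in the graph -- there is no central password database.
user.create("bob", "a long passphrase", (ack: any) => {
  if (ack.err) return console.error(ack.err);
  // "Login": the same credentials re-derive/decrypt the keys.
  user.auth("bob", "a long passphrase", () => {
    console.log("authenticated as", user.is && user.is.pub);
  });
});
```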
Data storage.
Browser peers by default store the data that they subscribe to, not the full data set. You could ask it to store everything, but the browser wouldn't like that. Meanwhile NodeJS peers, especially if hooked up to S3 or others, would store all data and act as a backup.
Does this make the data insecure? No: even if anybody/everybody stores it, the encryption keeps it safe. (See [insert link to (1)] for more information.)
Maintenance.
You would service your customers by deploying an update to your app code. It would not be ideal for your customers if you could meddle with their data directly. If they wanted you to do that, my recommendation would be that they change their password, give the new password to you, and you login and make any necessary changes. Why? Because if you have admin access to their data, their privacy is fundamentally violated.

Website and Native app user authorization

I wish to create functionality very similar to Facebook or PokerStars, if you have used them before. Basically, the apps require the user to log in, and their information can be accessed from browsers as well as native and web apps.
How can I go about achieving this? Please advise on what services to research to accomplish this. To my current understanding, I would be creating the website in HTML and PHP, creating a web service using RESTful protocols, and hosting them on Amazon AWS servers. I could then connect to these servers from the native apps? I am not very clear on how the native apps will interact with the servers.
If you know of any particular protocol or a better server hosting service, please let me know.
If I'm interpreting your question correctly, you are looking for something like this:
The user starts either your browser app or your native app (perhaps a mobile app)
Since the user does not have an account yet, you present them with the appropriate dialog to create said account.
You then ask the "Identity Service" to create a profile for that user
The identity service returns a token for access
This is something we do in the mobile network industry all the time. Technically, we have TAC/ACS or HSS profile services, but in either case, it's the same thing -- a dedicated service and network process that:
Accepts connections from various clients (web, mobile, desktop...)
Has various primitives along the database CRUD (Create, Read, Update, Delete) model
Answers requests against the database
If you want a pre-configured solution, you could just use any networked database with a REST-style connector (MongoDB, maybe?). But you could also just throw this into a process that talks to a NoSQL or SQLite database. The end result is the same.
For commercial solutions, I might look at OpenStack, as you can run your code on it and they have identity brokers you might be able to co-opt.
Personally, I'd just have a datastore running on a cloud somewhere like Amazon's EC2 which answers RESTful requests such as:
Create a user with a given profile set, return a unique token
Delete a user given a token
Update elements of the profile for a given token
I'm leaving out the necessary things like security here, but you get the idea.
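As a concrete sketch of those three primitives, here is a toy Express service; Express and the in-memory map are stand-ins (real storage, auth, and validation are omitted, as noted above):

```typescript
import express from "express";
import { randomUUID } from "crypto";

const app = express();
app.use(express.json());
const profiles = new Map<string, object>(); // stand-in for the real datastore

// Create a user with a given profile set, return a unique token.
app.post("/users", (req, res) => {
  const token = randomUUID();
  profiles.set(token, req.body);
  res.status(201).json({ token });
});

// Update elements of the profile for a given token.
app.patch("/users/:token", (req, res) => {
  const existing = profiles.get(req.params.token);
  if (!existing) return res.sendStatus(404);
  profiles.set(req.params.token, { ...existing, ...req.body });
  res.sendStatus(204);
});

// Delete a user given a token.
app.delete("/users/:token", (req, res) => {
  res.sendStatus(profiles.delete(req.params.token) ? 204 : 404);
});

app.listen(3000);
```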
This also has the advantage that you can have a single identity service for all of your applications/application services. The specifics for a given application element are just sub-fields in the profile. This gives you, not only a common identity broker for web, desktop and mobile, but a single-sign-on for all your applications. The user signs in once and is authenticated for everything you have. Moving from site to site, now just became seamless.
Lastly, you place your identity management, backup, security token management, etc OUTSIDE of your application. If you later want to add Google Authenticator for second-factor authentication, you don't have to add it to every application you have.
I should also add that you don't want to keep the identity database on the direct internet connection point. Someone could make your life difficult and get ahold of it later on. Rather, you want your identity server to have a private link to it. Then do something like this:
When the account is created, don't store passwords, store hashes -- much safer
Have your application (web or otherwise) compute a key as the login
In this case, the user might enter a username and password, but the application or website would convert it into a token. THAT is what you send across (see the sketch after these steps).
Next, using that token (and suitable security magic), use THAT as the owner key
Send that key to the datastore and retrieve any needed values
Encrypt them back into a blob with the token
Send the blob
The application decrypts the blob to get at the values
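A minimal sketch of the convert-credentials-into-a-token step, assuming scrypt as the key-derivation function; the salt scheme and function name are illustrative, and the "suitable security magic" for the owner key and blob encryption is omitted:

```typescript
import { scryptSync, createHash } from "crypto";

// Runs on the application/website side: the raw password never leaves the
// device; only this derived, opaque token is sent across.
export function loginToken(username: string, password: string): string {
  // Salt the derivation with the username so two users with the same
  // password still get different tokens.
  const key = scryptSync(password, `myapp:${username}`, 32);
  // The identity database stores and compares only this opaque value.
  return createHash("sha256").update(key).digest("hex");
}
```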
Why do we do this?
First, if someone were to try to get at your identity database, there's nothing useful. It contains only opaque tokens for logins, and blobs of encrypted data. Go ahead -- take the database. We don't care.
Second, sniffing the transport gets an attacker nothing -- again, it's all encrypted blobs.
This means later on, when you have five applications using the broker, and someone hacks the network and steals the database, you don't care, because your users never gave out logins and passwords in the first place, and even if they did, the data itself is garbage to anyone without the user key.
Does this help?