I've had a WCF Data service published for about 2 months. It's 100% been hacked already. I even noticed the service published on twitter!
Luckily my site was under development and the user entity was only about 80 beta testers.
Still this is a pretty big problem. With the power of E.F. Navigation properties anyone can easily write a script to download all my user data and my valuable domain data that no-one else has. I want to provide non-authenticated access and do things like:
Limit what columns get exposed (e.g. a users emails)
Limit number of requests possible per day (e.g. 10 per request host address)
Be notified when someone is misusing the service
Limit the results set and expand options on different entity sets
Stuff I haven't yet thought about
Does this make sense or should I drop WCF Data Services - which in theory sounded great, but now that I've got experience with them I'm wondering if they are just good for development and not production (they're kind of fatter than I was expecting).
Thoughts that go beyond my knowledge and suggestions here will be greatly appreciated.
Also posting any links to thorough blog post examples or video presentation that cover ground would be excellent!
I think you need to implement some authentication. There is no other way I can think of to "lock down" a web service. This is one of the advantages of WCF -- it makes implementing complex authentication easy.
On my WCF service, I require a UserContext object, simply comprised of two strings, username and password.
Every method on the service requires that context, and if I haven't added the username/password to the database, it denies the request.
This also makes it simple to track who is abusing the service, as you will have their username/password tied to every request.
You should also run it over SSL so other users' credentials will not be easily compromised.
1 - WCF Data Services currently doesn't allow you to easily filter columns on per request basis. You could have two EF models (one "public", and one "private") and expose them as two services. The public one accessible to anybody, the private one behind full auth.
2 - This you will have to implement yourself. But for this to work you need some way to identify the user. So it's pretty close to authentication (Even if it doesn't require password or something like that). There's a series of posts about auth over WCF Data Services here: http://blogs.msdn.com/b/astoriateam/archive/tags/authentication/
3 - If you can identify the user as per #2, you can for example count the number or frequence or requests he/she makes and setup a notification based on that. Again the techniques used for auth should provide you the right hooks.
4 - This is reasonably simple. WCF Data Service allows you to set hard limit on the size of the response (DataServiceConfiguration.MaxResultsPerCollection) or a soft limit, which means paging. Paging is usually better, since it limits the size of a single response but still allows clients to get all the data with multiple requests. This can be done through DataServiceConfiguration.SetEntitySetPageSize. The exand behavior can be limited by usage of DataServiceConfiguration.MaxExpandCount and MaxExpandDepth properties.
Some other possible techniques to use
Query interceptors (http://msdn.microsoft.com/en-us/library/dd744842.aspx) - this allows you to filter rows on per request bases. Typically used to limit rows based on the user making the request (note that this only allows you to filter rows, not columns).
Service operations - if you define a service operation which returns IQueryable the client can still compose queries on top of it, but it gives you the ability to filter the data before the query is applied. Or you can make certain pieces of information accessible only through service operations (not as easy to use and not queryable, but it gives you full control). (http://msdn.microsoft.com/en-us/library/cc668788.aspx)
Related
At my work, I have a task to search and find solutions to implement the ABAC authorization in our microservices organized in a monorepo. We have some products and we use the concept of realms to organize the different client's data in the same database. Here our requirements are likely:
An user, which is a manager of his company, can only see data from your company and from your employees.
The same company can have N places, where each can have a manager. The manager of each place can only see the data from there.
First I thought to build some code to be used in every router of every API to verify the authorization and allow or deny the request. Something like this:
The other thing I thought was to create an API instead of a lib.
So, based on this question, I discovered that ABAC can be externalized from the apps (APIs) and make a lot of sense to me, see the image below.
But then I have some questions.
Is bad to do what I thought in the first image or in the second?
How the PDP will know what the user wants to do? Based on the route he is calling? But with this approach, the single responsibility will be hurt as the PDP needs to internalize (step 2) what other apps do, right?
The PIP needs to call the database for the PDP validates the authorization. So this can be slow as the same query will be done 2x, one for checking the policy and the other inside the service with business logic.
The modern way of doing this is by decoupling your policy and code - i.e. having a seperate microservice for Authorization - here's a part in a talk I gave at OWASP DevSlop about it.
You'd want you code in the middleware to be as simple as possible - basically just querying the Authorization microservice. That service basically becomes your PDP (in XACML terms). This is true for both monolith and microservices (the assumption is you'll end up having more microservices next to your monolith anyhow).
To implement the Authorization microservice / PDP you can use something like OPA (OpenPolicyAgent.org) and then use OPAL as a PAP and manager for PIPs. (Full disclosure I'm a contributor to both OPA and OPAL)
The query to the PDP should include what the user is doing (but not what the rules are). You can do this based on the Route (common when doing service-mesh), but often it's better to define a resource/action layout which becomes part of the query and is independent directly of the application route. Take a look at the Permit.io Policy-Editor which does exactly that kind of mapping. (Permit also uses both OPA and OPAL internally; Full disclosure I'm one of the founders of Permit.io )
Very good point. You don't want your PDP to constantly be querying other sources per incoming query to it (though its okay if you do it for a few edge cases) - What you want is to load data gradually in the background in an asynchronous fashion. Ideally in an event-driven fashion (i.e. have events propagate in realtime from the data sources into the PDP). This is exactly what you can do with OPAL.
We have a back end API, running ASP.Net Core, with two front ends: A SPA web site (Vuejs) and a progressive web page (for mobile users). The front ends are basically only client code and all services are on different domains. We don't use cookies as authentication uses bearer tokens.
We've been playing with Application Insights for monitoring, but as the documentation is not very descriptive for our situations, I would like to get some more inputs for what is the best strategy and possibilities for:
Tracking users and metrics without cookies from e.g. the button click in the applications to the server call, Entity Framework/SQL query (I see that this is currently not supported, How to enable dependency tracking with Application Insights in an Asp.Net Core project), processing data and presentation of the result on the client.
Separating calls from mobile and standard web in an easy manner in Application Insights queries. Any way to show this in the standard charts that show up initially would be beneficial.
Making sure that our strategy will also fit in situations where other external clients will access the API, and we should be able to identify these easily, and see how much load they are creating for the system.
Doing all of the above with the least amount of code.
this might be worthy of several independent questions if you want specifics on any of them. (and generally your last bullet is always implied, isn't it? :))
What have you tried so far? most of the "best way for you" kinds of things are going to be opinions though.
For general answers:
re: tracking users...
If you're already doing user info/auth for other purposes, you'd just set the various context.user.* fields with the info you have on the incoming request's telemetry context. all other telemetry that occurs using that same telemetry context would then inerit whatever user info you already have.
re: separating calls from mobile and standard...
if you're already doing this as different services/domains, and you are already using the same instrumentation key for both places, then the domain/host info of pageviews or requests is already there, you can filter/group on this in the portal or make custom queries in the analytics portal to analyze that way. if you know which site it is regardless of the host, you could add that as custom properties in the telemetry context, you could also do that to avoid dealing with host info.
re: external callers via an api
similarly, if you're already exposing an api and using auth, you should (ideally) already know who the inbound callers are, and you can set that info in custom properties as well.
In general, custom properties (string:string key value pairs) and custom metrics (string:double key value pairs) are your friends. you can set them on contexts so all the events generated in that context inherit the same properties, you can explicitly set them on individual TrackEvent (or any of the other Track* calls) to send specific properties/metrics with any single event.
You can also use telemetry initializers to augment or filter any telemetry that's being generated automatically (like requests or dependencies on the server side, or page views and ajax dependencies client side)
I have been looking into various different APIs which can provide my the weather data I need in JSON format. A lot of these API's have certain limits such as: in order to get more requests per minute, you need to pay more money per month so that your app can make more API requests.
However, a lot of these API's also have free account which five you limited access to them.
So what I was thinking is, wouldn't it be possible for a developer to just make lots of different developer accounts with an API provider and then just make lots of different API keys?
That way, they wouldn't have to pay anything as they could stick with the free accounts. Whenever one of the API keys has reached the maximum daily request calls, the developer could just put a switch statement in their code which gets their software to use a different API key.
I see no reason why this wouldn't work from a technical point of view... but, is such a thing allowed?
Thanks, Dan.
This would technically be possible, and it happens.
It is also probably against the service's terms, a good reason for the service to ban all your sock puppet accounts, and perhaps even illegal.
If the service that offers the API has spent time and money implementing a per-developer limit for their API, they have almost certainly enforced that in their terms of service, and you would be wise to respect those.
(relevant xkcd)
I have a website that revolves around transactions between two users. Each user needs to agree to the same terms. If I want an API so other websites can implement this into their own website, then I want to make sure that the other websites cannot mess with the process by including more fields in between or things that are irrelevant to my application. Is this possible?
If I was to implement such a thing, I would allow other websites to use tokens/URLs/widgets that would link them to my website. So, for example, website X wants to use my service to agree user A and B on the same terms. Their page will have an embedded form/frame which would be generated from my website and user B will also receive an email with link to my website's page (or a page of website X with a form/frame generated from my server).
Consider how different sites use eBay to enable users to pay. You buy everything on the site but when you are paying, either you are taken to ebay page and come back after payment, or the website has a small form/frame that is directly linked to ebay.
But this is my solution, one way of doing it. Hope this helps.
It depends on how your API is implemented. It takes considerably more work, thought, and engineering to build an API that can literally take any kind of data or to build an API that can take additional, named, key/value pairs as fields.
If you have implemented your API in this manner, then it's quite possible that users of this API could use it to extend functionality or build something slightly different by passing in additional data.
However, if your API is built to where specific values must be passed and these fields are required, then it becomes much more difficult for your API to be used in a manner that differs from what you originally intended.
For example, Google has many different API's for different purposes, and each API has a very specific number of required parameters that a developer must use in order to make a successful HTTP request. While the goal of these API's are to allow developers to extend functionality, they do allow access to only very specific pieces of data.
Lastly, you can use authentication to prevent unauthorized access to your API. The specific implementation details depend largely on the platform you're working with as well as how the API will be used. For instance, if users must login to use services provided by your API, then a form of OAuth may suffice. However, if other servers will consume your API, then the authorization will have to take place in the HTTP headers.
For more information on API best practices, see 7 Rules of Thumb When You Build an API, and a slideshow from a Google Engineer titled How to Design a Good API and Why That Matters.
At first brush, OData seems like it will only appeal to "open" databases, and would never be used in envrionments where security is needed, especially with financial or government clients.
Is this the correct perspective to have with the current version of OData/WCF? If not, can you share whatever I would need to change that perspective?
Update
Examples of current concerns include:
Increased possibility of SQL Injection
Additional Data Validation (complicating business logic)
Unauthorized Access to data
Increased ability to do a "raw dump" of data
by this I mean it is easier to use OData to get to HR data, then it is to screen-scrape a traditional ASP.net page
Update 2
Is it also possible for me to enforce business rules? For example a properly formatted SSN, Phone, or Zip. How about ensuring all fields are filled in?
oData is just a way to expose structured data through an open API. It does not requre any particular form of security; it's possible to have fully open datasets (like a wiki database) or world-readable-but-private-writeable (such as a database of votes by members of Congress, so anyone can read it but only you can update it). It also supports more complex security structures (such as a video rental store allowing customers to query only their own history).
Regarding your specific concerns:
SQL injection is simply not possible if you're using the ADO.NET Data Services as your oData server. The incoming oData request is parsed and then passed to an IQueryable, which properly escapes all values.
The business layer / data layer validation remains the same. oData just provides an API for the data layer (or business layer, if it looks databaseish).
Unauthorized access to data isn't possible unless you allow it. The default for ADO.NET Data Services is to not allow any access (even read-only access), so that forces you to explicitly authorize all access.
The "raw dump" scenario is exactly why oData is so useful! It's a protocol that allows efficient querying of data sources over the web, instead of depending on fragile screen scraping "solutions". If you don't want someone to get the information, don't publish it.
Right now (to my knowledge), ADO.NET Data Services is the only oData provider available, and it's secure by default. I suppose that someone else could write an oData provider that wasn't secure by default or allowed SQL injection, but that would be foolish.
Also, remember that oData is completely divorced from the concept of authentication. It's up to you to use whatever authentication makes sense for your API. There's a great recent series of blog posts from the WCF team that address how oData works with various forms of authentication.
What's your business case for using OData? OData primarily exists to expose your data in a platform agnostic manner... so that .NET, Java, Php, Python, REST, etc clients can all access your data. Is that your use-case?
Or are you trying to expose your data via a service layer (kind of an SOA approach) so that your clients (which you control) are better decoupled from your data sources. In that case, OData may not be the right solution. I looked at OData as part of a data service layer and decided it is too slow. I'm now looking at Devforce which implements service-based access for Entity Framework models (via their BOS service)... full CRUD operations including LINQ to service-hosted model.
Security is to you desired level is possible either view OData or via DevForce. Pick the correct data-remoting solution, then research the correct security implementation.
Sure you can use it in government solution. OData is just a way of accessing data, it has nothing to do with making the information secure. You have to implement the security at the transport level (SSL) and not at the application level (provide login and password to application).
There are many ways to go about this. One example is if you are using SSL, you can force your client to provide a client certificate and have that do the authentication. Once the person has authentication, you can use your application to limit what they can see (maybe they can only see their client information, so all queries automatically limit the person to seeing that.)