How to upload a file in WCF along with identifying credentials? - wcf

I've got an issue with WCF, streaming, and security that isn't the biggest deal but I wanted to get people's thoughts on how I could get around it.
I need to allow clients to upload files to a server, and I'm allowing this by using the transferMode="StreamedRequest" feature of the BasicHttpBinding. When they upload a file, I'd like to transactionally place this file in the file system and update the database with the metadata for the file (I'm actually using Sql Server 2008's FILESTREAM data type, that natively supports this). I'm using WCF Windows Authentication and delegating the Kerberos credentials to SQL Server for all my database authentication.
The problem is that, as the exception I get helpfully notes, "HTTP request streaming cannot be used in conjunction with HTTP authentication." So, for my upload file service, I can't pass the Windows authentication token along with my message call. Even if I weren't using SQL Server logins, I wouldn't even be able to identify my calling client by their Windows credentials.
I've worked around this temporarily by leaving the upload method unsecured, and having it dump the file to a temporary store and return a locator GUID. The client then makes a second call to a secure, non-streaming service, passing the GUID, which uploads the file from the temporary store to the database using Windows authentication.
Obviously, this isn't ideal. From a performance point of view, I'm doing an extra read/write to the disk. From a scalability point of view, there's (in principle, with a load balancer) no guarantee that I hit the same server with the two subsequent calls, meaning that the temporary file store needs to be on a shared location, meaning not a scalable design.
Can anybody think of a better way to deal with this situation? Like I said, it's not the biggest deal, since a) I really don't need to scale this thing out much, there aren't too many users, and b) it's not like these uploads/downloads are getting called a lot. But still, I'd like to know if I'm missing an obvious solution here.
Thanks,
Daniel

Related

How to handle temporary unreachable online api

This is a more general question, so bear with my abstraction of the following problem.
I'm currently developing an application, that is interfacing with a remote server over a public api. The api in question does provide mechanisms for fetching data based on a timestamp (e.g. "get me everything that changed since xxx"). Since the amount of data is quite high, I keep a local copy in a database and check for changes on the remote side every hour.
While this makes the application robust against network problems (remote server in maintenance, network outage, etc.) and enables employees to continue working with the application, there is one big gaping problem:
The api in question also offers write access. E.g. my application can instruct the remote server to create a new object. Currently I'm sending the request via api, and upon success create the object in my local database, too. It will eventually propagate via the hourly data fetching, where my application (ideally) sees that no changes need to be made to the local database.
Now when the api is unreachable, i create the object in my database, and cache the request until the api is reachable again. This has multiple problems:
If the request fails (due to not beforehand validateble errors), I end up with an object in the database which shouldn't even exist. I could delete it, but it seems hard to explain to the user(s) ("something went wrong with the api, we deleted that object again").
The problem especially cascades when depended actions que up. E.g. creating the object, and two more requests for modifying it. When the initial create fails, so will the modifying requests (since the object does not exist on the remote side)
Worst case is deletion - when an object is deleted locally, but will not be deleted on the remote site, I have no way of restoring it (easily).
One might suggest to never create objects locally, and let them propagate only through the hourly data sync. This unfortunately is not an option. If the api is not accessible, it can be for hours. And it is mandatory that employees can continue working with the application (which they cannot when said objects don't exist locally).
So bottom line:
How to handle such a scenario, where the api might not be reachable, but certain requests must be cached locally and repeated when the api is reachable again. Especially how to handle cases where those requests unpredictable fail.

CMIS: cache data on server side

I'm writing a CMIS interface(server) for my application. The server needs to load data from a database to process the request. At the moment I'm loading the same data for every request.
Is there a common way to cache this data. Are cookies supported for each cmis client? Is there an other chance to cache this data?
Thank you
You should not rely on cookies. Several clients and client libraries support them but not all. Cookies can help and you should make use of them, but be prepared for simple clients without cookie support.
Since your data is usually bound to a user, you can build a cache based on the username. But it depends on your repository and your use case what you can and should cache. Repository infos and type definitions are good candidates. But you should be careful with everything else.

Reuploading to EdgeCast or Amazon S3?

I am working on a project in which we are planning to use EdgeCast to store our data. I am concerned about it, because the client wants to upload the image to our server first, and then use curl to upload it to EdgeCast. In this case our servers will be "proxying" the request, doubling the time needed for uploads.
What would you suggest? And is direct uploading risky?
PS the reason I mentioned S3 is because of its similarity to EdgeCast. Hence I assume the same principle will apply.
Yep - Martin's right - usually a good idea when letting users have direct access to storage to have a proxy. EdgeCast supports rsync which will automatically sync content from your server and the EdgeCast storage account. Or you can use our "customer origin" reverse proxy feature for our network to pull content automatically from your servers as its requested by the public. Feel free to contact us at sales#edgecast.com with questions.
Having your server in between the end user and the storage, is probably a good idea. Whenever I let users direct access to storage places, with FTP or SSH, it tends to get really messy. A place where you can upload files, that get accessible from the web, is used for all sorts of things.
Having your server in between you can organise the files uploaded into some rational structure. A folder per date for instance, and perhaps also enforce some strict naming of the files themselves, avoiding URL encoding problems etc.
There is no reason to be concerned about Edgecast. On my opinion, it always makes its best to serve its customers the best way so that the customers have their websites as fast as it's possible and also secured and well-optimized. The whole comparison of Edgecasr vs Amazon look at http://jodihost.com/2014_edgecast_vs_amazon.php

Html5 local datastore, and sync across devices

I am building a full featured web application. Naturally, you can save when you are in 'offline' mode to the local datastore. I want to be able to sync across devices, so people can work on one machine, save, then get on another machine and load their stuff.
The questions are:
1) Is it a bad idea to store json on the server? Why parse the json on the server into model objects when it is just going to be passed back to the (other) client(s) as json?
2) Im not sure if I would want to try a NoSql technology for this. I am not breaking the json down, for now the only relationships in the db would be from a user account to their entries. Other than the user data, the domain model would be a String, which is the json. Advice welcome.
In theory, in the future I might want to do some processing on the server or set up more complicated relationships. In other words, right now I would just be saving the json, but in the future I might want a more traditional relational system. Would NoSQL approach get in the way of this?
3) Are there any security concerns with this? JS injection for example? In theory, for this use case, the user doesn't get to enter anything, at least right now.
Thank you in advance.
EDIT - Thanx for the answers. I chose the answer I did because it went into the most detail on the advantages and disadvantages of NoSql.
JSON on the SERVER
It's not a bad idea at all to store JSON on the server, especially if you go with a noSQL solution like MongoDB or CouchDB. Both use JSON as their native format(MongoDB actually uses BSON but it's quite similar).
noSQL Approach: Assuming CouchDB as the storage engine
Baked in replication and concurrency handling
Very simple Rest API, talk to the data base with HTTP.
Store data as JSON natively and not in blobs or text fields
Powerful View/Query engine that will allow you to continue to grow the complexity of your documents
Offline Mode. You can talk to CouchDb directly using javascript and have the entire app continue to run on the client if the internet isn't available.
Security
Make sure you're parsing the JSON documents with the browers JSON.parse or a Javascript library that is safe(json2.js).
Conclusion
I think the reason I'd suggest going with noSQL here, CouchDB in particular, is that it's going to handle all of the hard stuff for you. Replication is going to be a snap to setup. You won't have to worry about concurrency, etc.
That said, I don't know what kind of App you're building. I don't know what your relationship is going to be to the clients and how easy it'll be to get them to put CouchDB on their machines.
Links
CouchDB # Apache
CouchOne
CouchDB the definitive guide
MongoDB
Update:
After looking at the app I don't think CouchDB will be a good client side option as you're not going to require folks to install a database engine to play soduku. That said, I still think it'd be a great server side option. If you wanted to sync the server CouchDb instance with the client you could use something like BrowserCouch which is a JavaScript implementation of CouchDB for local-storage.
If most of your processing is going to be done on the client side using JavaScript, I don't see any problem in storing JSON directly on the server.
If you just want to play around with new technologies, you're most welcome to try something different, but for most applications, there isn't a real reason to depart from traditional databases, and SQL makes life simple.
You're safe as long as you use the standard JSON.parse function to parse JSON strings - some browsers (Firefox 3.5 and above, for example) already have a native version, while Crockford's json2.js can replicate this functionality in others.
Just read your post and I have to say I quite like your approach, it heralds the way many web applications will probably work in the future, with both an element of local storage (for disconnected state) and online storage (the master database - to save all customers records in one place and synch to other client devices).
Here are my answers:
1) Storing JSON on server: I'm not sure I would store the objects as JSON, its possible to do so if your application is quite simple, however this will hamper efforts to use the data (running reports and emailing them on a batch job for example). I would prefer to use JSON for TRANSFERRING the information myself and a SQL database for storing it.
2) NoSQL Approach: I think you've answered your own question there. My preferred approach would be to setup a SQL database now (if the extra resource needed is not a problem), that way you'll save yourself a bit of work setting up the data access layer for NoSQL since you will probably have to remove it in the future. SQLite is a good choice if you dont want a fully-featured RDBMS.
If writing a schema is too much hassle and you still want to save JSON on the server, then you can hash up a JSON object management system with a single table and some parsing on the server side to return relevant records. Doing this will be easier and require less permissioning than saving/deleting files.
3) Security: You mentioned there is no user input at the moment:
"for this use case, the user doesn't
get to enter anything"
However at the begining of the question you also mentioned that the user can
"work on one machine, save, then get
on another machine and load their
stuff"
If this is the case then your application will be storing user data, it doesn't matter that you havent provided a nice GUI for them to do so, you will have to worry about security from more than one standpoint and JSON.parse or similar tools only solve half the the problem (client-side).
Basically, you will also have to check the contents of your POST request on the server to determine if the data being sent is valid and realistic. The integrity of the JSON object (or any data you are tying to save) will need to be validated on the server (using php or another similar language) BEFORE saving to your data store, this is because someone can easily bypass your javascript-layer "security" and tamper with the POST request even if you didnt intend them to do so and then your application will be sending the evil input out the client anyway.
If you have the server side of things tidied up then JSON.parse becomes a bit obsolete in terms of preventing JS injection. Still its not bad to have the extra layer, specially if you are relying on remote website APIs to get some of your data.
Hope this is useful to you.

Write-though caching of large data sets in WCF?

We've got a smart client that talks to a SQL Server database via WCF, displaying the entities in the database, and allowing the user to edit those entities.
Some of the WCF calls return a large data set. Since this data set doesn't change very often, I'm considering some sort of write-through cache on the client, and only getting the deltas from the WCF service.
That is: the client both reads from the service and writes to the service.
I'm not looking for disconnected/offline operation, but since the majority of the data doesn't change very often, I'd probably implement this with a local data store.
I don't want the local store to get too stale, and I don't think I'm too concerned about conflict resolution, because updates will always go straight to the WCF service -- think of it as a write-through cache.
Would Microsoft's Sync Framework be good for this? Could I use a local SQL-CE cache and perform the updates over WCF? The service end has a SQL Server 2005/2008 backend, but I don't want to talk to it directly. Does Sync Framework integrate well with WCF?
Are there other solutions out there? Should I roll something myself?
I don't think you have to couple it to WCF at all. FeedSync allows you to publish directly to an RSS feed.
The only that I'm not too sure about is if it would be suitable for a "large dataset" though. Since you don't need two way replication, if your dataset is extremely large, you might want to write your own WCF implementation to optimize it; especially for the initial population.