Transitioning from SQL-based data retrieval to Web Services

My customer is moving from providing data through direct SQL access to exposing the same database through a web service; for political reasons they're cutting the direct DB access. They're using SOAP for the web service, but that's irrelevant to the issue. Their requests are vague in the sense that they don't know where the answer to their question lives, so I'm left with no option but to poke around their data to find where the right data for their needs might be.
With SQL it's been more or less painless: write a simple query with a join or three and Bob's your uncle. In the worst case I've needed to select one row from each table to see what it contains, but that's not part of the actual data retrieval.
Now, with web services, I'm struggling to achieve the same. I feel like I have no tools to explore the data efficiently and flexibly, and every join I'd write in SQL becomes a separate query whose results I have to merge manually with the previous ones. A good example is:
Show me all the users that are part of a service where the name begins with "FOO"
With SQL I would do a simple
SELECT
    users.first_name,
    users.last_name,
    services.name
FROM users
LEFT JOIN services ON (users.service_id = services.id)
WHERE services.name LIKE 'FOO%'
With the SOAP API I'm forced to search for the services, write down the IDs and then search for the users. The keyword here is efficiency: I can get the same results, but it takes so much longer that for anything more complex we might be talking about hours instead of minutes.
The question is two-fold:
Is there a more efficient way of achieving the same?
Are there tools that would ease the pain, even if the join equivalents need manual configuration up front, as long as they work transparently once configured?
So far I've been using SoapUI (and Postman when applicable) to explore the data. With Postman the scripting helps a bit, since I can save intermediate results into variables and use them in subsequent calls. I feel like I'm left with no option but to find a simple(ish) way of accessing the API programmatically and scripting the same searches. Java is the weapon of choice at the customer for multiple reasons, but it isn't the simplest route, so I'm also looking for recommendations regardless of language.
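One way the scripting idea could look, as a rough sketch only: pull each search result into an in-memory SQLite database and run the original SQL against it. The functions fetch_services and fetch_users are hypothetical stand-ins for the two SOAP searches (here they return canned sample data), and the column names are simply taken from the query in the question.

```python
import sqlite3

# Hypothetical stand-ins for the two SOAP searches. In practice each would
# call the web service and return one dict per record.
def fetch_services():
    return [
        {"id": 1, "name": "FOOBAR"},
        {"id": 2, "name": "BAZ"},
        {"id": 3, "name": "FOO2"},
    ]

def fetch_users():
    return [
        {"first_name": "Ada", "last_name": "Lovelace", "service_id": 1},
        {"first_name": "Alan", "last_name": "Turing", "service_id": 2},
        {"first_name": "Grace", "last_name": "Hopper", "service_id": 3},
    ]

def load(conn, table, rows):
    """Create a table from a list of dicts and bulk-insert the rows."""
    columns = list(rows[0])
    # Table and column names come from our own code, not from user input.
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )

conn = sqlite3.connect(":memory:")
load(conn, "services", fetch_services())
load(conn, "users", fetch_users())

# The join is now plain SQL again, run locally against the fetched data.
result = conn.execute(
    """
    SELECT users.first_name, users.last_name, services.name
    FROM users
    JOIN services ON users.service_id = services.id
    WHERE services.name LIKE 'FOO%'
    ORDER BY users.last_name
    """
).fetchall()
print(result)
```

The trade-off is that every table involved has to be fetched (or at least the relevant slice of it) before the join can run, so this suits exploration on modest data volumes rather than bulk retrieval.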

Related

Database optimisation

I'm starting a web application that will be used by a lot of companies (over 20K) and, most importantly, will record a lot of information daily. I would like your advice on the following idea: create a database for each company and run SQL queries like this:
select * from enterprisedb1.tablename;
select * from enterprisedb2.tablename2 where enterprisedb2.tablename2.col='foo'
Please, I need your advice; I couldn't find anything on Google.
If you are selling this to multiple clients then it might come down to separation of their data.
On the one hand everything for the app is in the one database for each client, and provided you get the connection string right you probably don't need to ever specify the company name again for the rest of the app. No more "where customer=123" on every single query.
It also means a client could be deleted, backed up, moved, audited, whatever, completely independently.
It also means there is no risk of a developer or a query accidentally doing cross-client things, so you can even open up generic query access that still can't accidentally cross a client-to-client border. Security set-up will be simpler, too.
But if you have a million clients you do end up with a lot of databases. How well this works will depend on all sorts of things, including your database of choice.
You also end up having multiple copies of reference data unless you create an additional database "common" or something like that.
It's going to be very much a "depends" answer, but those are a few things to consider.
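To illustrate the point about getting the connection right once and never mentioning the company again, here is a minimal sketch. It uses SQLite in-memory databases purely as a stand-in for separate per-company databases; the helper connection_for is hypothetical, and the database and table names are borrowed from the question.

```python
import sqlite3

_connections = {}

def connection_for(company):
    """Return (creating on first use) the connection to one company's database."""
    if company not in _connections:
        # Assumption: a real system would look up a per-company connection
        # string here instead of opening a fresh in-memory database.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE tablename (col TEXT)")
        _connections[company] = conn
    return _connections[company]

connection_for("enterprisedb1").execute("INSERT INTO tablename VALUES ('foo')")
connection_for("enterprisedb2").execute("INSERT INTO tablename VALUES ('bar')")

# The queries themselves carry no company filter at all.
rows_1 = connection_for("enterprisedb1").execute("SELECT col FROM tablename").fetchall()
rows_2 = connection_for("enterprisedb2").execute("SELECT col FROM tablename").fetchall()
print(rows_1, rows_2)
```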
I suggest using common tables shared by all companies. That will be easier to manage and to understand.
Create one table for company data and reference its integer key from the other metadata tables. For good performance, the indexes and queries must be well designed.
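A small sketch of what the shared-table layout might look like, using SQLite only so that it runs anywhere. The table and column names (company, record, company_id) and the helper records_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE record (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES company(id),
        col TEXT
    );
    -- Composite index so per-company lookups do not scan other companies' rows.
    CREATE INDEX idx_record_company ON record (company_id, col);
    """
)
conn.executemany(
    "INSERT INTO company (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO record (company_id, col) VALUES (?, ?)",
    [(1, "foo"), (1, "bar"), (2, "foo")],
)

def records_for(company_id, col):
    """Every query filters on company_id, passed as a bound parameter."""
    return conn.execute(
        "SELECT id, col FROM record WHERE company_id = ? AND col = ?",
        (company_id, col),
    ).fetchall()

print(records_for(1, "foo"))
print(records_for(2, "foo"))
```

Routing all access through a helper like this is one way to reduce the chance of a query forgetting the company filter, which is the main risk of the shared layout.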

How to isolate SQL Data from different customers?

I'm currently developing a service for an app with WCF. I want to host the data on Windows Azure, and it should hold data from different users. I'm searching for the right design for my database. In my opinion there are only two possibilities:
Create a new database for every customer
Store a customer-id to every table (or the main table when every table is connected via entities)
The first approach offers very good speed and isolation, but it's very expensive on Windows Azure (or am I misunderstanding the Azure pricing?). I also don't know how to configure a WCF service so that it uses a different database for each customer.
The second approach is slower and its isolation is poor, but it's easy to implement and cheaper.
Now to my question:
Is there any other way to get high isolation of data and also easy integration into a WCF service on Azure?
What design should I use and why?
You have two additional options: build multiple schema containers within a database (see my blog post about this technique), or even better use SQL Database Federations (you can use my open-source project called Enzo SQL Shard to access federations). The links I am providing give you access to other options as well.
In the end it's a rather complex decision that involves a tradeoff between performance, security and manageability. I usually recommend Federations, even though it has its own set of limitations, because it is a flexible multitenant option for the cloud with the option to filter data automatically. Check out the open source project; you will see how to implement good separation of customer data independently of the physical storage.

Split Database Security

I'm working on a .NET MVC SQL application that will contain sensitive data, for example HIV test results or income. I want to error-proof this privacy as much as possible so that no one except the user can access it (think Joe the Plumber having his information hacked by a state employee).
I read here that splitting the database in two doesn't seem reasonable:
Is splitting databases a legitimate security measure?
although I've heard of this being done. If we could just use two tables... better.
But when I say error-proofing, I mean impossible for ANYONE in our company to access both databases/tables. I'm thinking about putting access to the application code (which would access both databases) and to both databases in the hands of a deep-pockets third party (like PWC or EY) for when the government came calling or some other real need to see both data sources came along.
Anyone have any thoughts on the cleanest way to do this? We'd want to design the tables such that most queries would not require access to both data sources so the relative cost in throughput wouldn't be that much.
You can encrypt a column of data in SQL. For the columns that hold the sensitive data, e.g. HIV test results or income, you can encrypt the data when storing it in the DB.
Check the details here:
http://msdn.microsoft.com/en-us/library/ms179331.aspx
http://msdn.microsoft.com/en-us/library/bb964742.aspx
Let me know if it helps.

What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?

I often find myself writing one-off queries, either to answer someone's question or to troubleshoot something. I would like to quickly expose the results of such a query graphically, refreshable on demand, so that I can share them with others without having to create an SSRS report and publish it to a Reporting Services server.
I have thought about using Excel for this, or maybe running a local SSRS server, but both options are still labor-intensive, and I cannot justify the time they would take since no one has officially requested that I turn this data into a report.
The way I see it, the business I work for has invested money in my creating these queries, which often return potentially useful data that other people in the organization might want. But since the data isn't exposed in any way, I don't know whether it's something they want, and they may not even realize they want it, so the potential value of the query is not realized. I want to increase the company's return on investment in all the one-off queries that I and other developers write by exposing their results graphically, so that others can browse them and the ones that provide enough value can be turned into more formal SSRS reports.
What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?
Why don't you simply use what you may already have: Excel. You can import data via an ODBC / Oracle / SQL connection. Choose Get Data and, bam, you can run the query, format the results right in the spreadsheet, add sorting, etc. All you need to supply is the database name, user name and password to connect to the DB.
JonH is right regarding Excel's built-in ODBC support, but I have had tons of trouble with this. In my case, the ODBC connection required the client software to be installed so that it could use the encryption methods, etc. Also, even if that were not the case, the user (I believe) would still have to manually install and set up an ODBC connection.
Now if you just want something on your machine to do the queries and refresh them, JonH's solution is great and my caveats are probably irrelevant. But if you want other users to have access, you should consider a middle-man app (basically a PHP script, assuming a web server is an option for you) that runs a query, transforms the results into XML, and outputs them as "report-xyz.xml". You can then point anybody running a newer version of Excel to that address and they can very easily import the data into Excel with no overhead (basically a kind of web service).
Keep in mind, I don't think you should have a web script that allows users to make queries to your database server! You would have some admin page where you pass the query in, and a new XML file with the results gets made. So my idea also assumes that you want to run the same queries over and over without any specifics passed in. (If you did need to pass specifics in, I'd look into finding a pre-built web services bridge for your database that already has security features built in. Then you could let users make the limited changes allowed.)
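The query-to-XML step of such a middle-man script could be sketched roughly as follows. This uses Python instead of the PHP mentioned above, and SQLite with a made-up sales table as a stand-in for the real database; the function query_to_xml and the tag names are illustrative. It assumes the column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, root_tag="report", row_tag="row"):
    """Run a fixed query and return the result set as an XML string."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            # One child element per column; NULLs become empty elements.
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

# Stand-in data; a real script would connect to the reporting database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 10), ("south", 7)])

xml_text = query_to_xml(conn, "SELECT region, total FROM sales ORDER BY region")
print(xml_text)
```

Writing xml_text to a file such as report-xyz.xml on a schedule (or on each request) would give Excel a stable address to refresh from, without exposing the database itself.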

How to divide responsibility between LDAP and RDBMS

I'm a lead developer on a project which is building web applications for my company's SaaS offering. We are currently using LDAP to store user data such as IDs, passwords, contact details, preferences and other user-specific data.
One of the applications we are building is a reporting service that will both collect and present management information to our end users. Obviously this service will require an RDBMS, but it will also need to access user data stored in LDAP.
As I see it we have two basic implementation options:
Duplicate user data in both LDAP and the RDBMS.
Have the reporting service access LDAP whenever it needs user data.
Although duplicating data (and implementing the mechanisms to make this happen) as suggested in option 1 seems the wrong way to go, my gut feeling is that option 2 would not perform well enough (how do you 'join' LDAP data to RDBMS data as efficiently as a pure RDBMS implementation?).
I did find a related question but I'm still unsure which approach to take. I'd be interested in seeing what people thought of either option or perhaps other options.
Why would you feel that duplicating data would be the wrong way to go? Reporting tools (web based and otherwise) are mostly built around RDBMS's, so any mix'n'match will introduce unnecessary complexities. Reports are likely to need to be changed fairly frequently (from experience), so you want them to be as simple as possible. The data you store about users is unlikely to change its format very often, so once you have your import function working, you won't need to touch it again.
The only obstacle I can see is latency: how do you ensure that your RDBMS copy is up to date? You might need to ensure that your updating code writes to both destinations. Personally, also, I wouldn't necessarily use LDAP for application specific personal preferences: LDAP can't handle transactions, so what happens when data is updated from several directions? (Transactionality is of course also a problem with letting updaters write to both stores...) I'd rather let the RDBMS be the master for most data, and let LDAP worry only about identity, credentials and entitlements, which are rarely changed and only for one set of purposes. For myself, LDAP's ability to deal with hierarchical data isn't all that great a selling point.
Data duplication is not always a bad thing, especially when the usage scenarios are different enough.
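For what it's worth, the import function mentioned above can be quite small. The following is only a sketch: fetch_ldap_users is a hypothetical stand-in for a real LDAP search (it returns canned entries), the attribute names uid, cn and mail are assumed, and SQLite stands in for the reporting RDBMS.

```python
import sqlite3

def fetch_ldap_users():
    """Hypothetical stand-in for an LDAP search; returns one dict per entry."""
    return [
        {"uid": "jdoe", "cn": "John Doe", "mail": "jdoe@example.com"},
        {"uid": "asmith", "cn": "Anna Smith", "mail": "asmith@example.com"},
    ]

def sync_users(conn, entries):
    """Upsert LDAP entries into a local users table keyed by uid."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (uid TEXT PRIMARY KEY, cn TEXT, mail TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO users (uid, cn, mail) VALUES (:uid, :cn, :mail)",
        entries,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_events (uid TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO report_events VALUES (?, ?)",
    [("jdoe", "login"), ("jdoe", "export"), ("asmith", "login")],
)

sync_users(conn, fetch_ldap_users())
sync_users(conn, fetch_ldap_users())  # re-running the import does not duplicate rows

# With the copy in place, reports can join user data with ordinary SQL.
activity = conn.execute(
    """
    SELECT users.cn, COUNT(*) AS events
    FROM report_events
    JOIN users ON users.uid = report_events.uid
    GROUP BY users.uid, users.cn
    ORDER BY users.cn
    """
).fetchall()
print(activity)
```

How often the sync runs determines how stale the copy can get, which is the latency concern raised in the answer above.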