Is there a way to have ordered output without an ORDER BY clause in MS SQL?

We have a large application. Recently, we upgraded the database to MS SQL 2016. As a result, we are having issues with output order in some reports. Unfortunately, we cannot alter the source code of the application and will not be able to do so for a while. The cause is that some queries are of the form "SELECT * FROM table" with no "ORDER BY" clause on them.
Is there any creative way to order the output? Like adding a trigger on SELECT (I know there is no such thing...)? Any other way? Any type of index that would act as a default order?
UPDATE: I cannot alter the source code. This is a compiled application and the upgrade cycle is a few months away. If I could, I would just go through the source code and change those queries.

You asked:
Is there a way to have ordered output without an ORDER BY clause in MS SQL
No.
Unfortunately, there is not.
SQL (in all its flavors) is a declarative scheme for manipulating sets of rows of tables. Sets of things like rows have no inherent order.
If SQL, in any flavor, without ORDER BY statements, happens to deliver rows in an order you expect, it's a coincidence. It's pure luck. Formally speaking, without ORDER BY, SQL delivers rows in an unpredictable order. "Unpredictable" is like "random," but worse. Random implies you might get a different order every time you run the query: you can catch that kind of thing in system test. Unpredictable means you get the same order each time until you don't. Now you don't.
This is in the basic nature of SQL. Why? The optimizers in modern database servers are really smart about delivering exactly what you asked for as fast as possible, and they get smarter with every release. SQL Server 2016's optimizer probably noticed some opportunities to use concurrency for your queries, or something like that, and took them.
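To make the point concrete (the table and column names here are hypothetical), the only construct that guarantees an order is the ORDER BY itself, once you are able to add it back:

-- Hypothetical report table: without ORDER BY, the row order is whatever the plan happens to produce.
SELECT * FROM dbo.ProjectReport;

-- Only an explicit ORDER BY guarantees the output order.
SELECT * FROM dbo.ProjectReport ORDER BY ReportDate, ProjectId;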
Your choices here are all unpleasant:
Roll back the database upgrade. That may or may not fix the problem.
Make emergency fixes to your software load and roll them out ahead of schedule.
Live with the wrongly-ordered reports until you can fix your software.
I wrote a long answer here so you can have useful arguments to present to your corporate overlords about your situation. I don't envy you when you explain this.

Related

Is it better to sort data at the application layer, or with an order by clause?

A long time ago I was advised to sort data at the application layer and not use the ORDER BY clause in SQL. The reasoning was that .NET would sort more efficiently than the SQL engine.
Conflicting with this advice is the SSIS best-practice guidance I've encountered, which recommends sorting data in SQL where one can and avoiding the Sort transformation.
The SSIS advice makes sense to me. So now I am wondering if the initial advice of avoiding the ORDER BY is bogus.
Given a not-too-complex query, does the ORDER BY necessarily mean a performance hit?
Thanks.
Brent Ozar's argument for avoiding ORDER BY boils down to SQL Server licenses being expensive and application server licenses being cheap. The "application server" in the case of SSIS is, in fact, SQL Server, so the "cheaper server" argument doesn't apply.
I've never seen the argument that .NET sorting was inherently faster than SQL Server sorting, but I'd be extremely surprised if that was generally true (especially given the amount of meta-information about the underlying data that is available to the SQL Server query optimizer but is not available to a generic .NET Sort() method). I do know that the SSIS Sort transformation can put a big performance hit on the data flow, since all the data has to be cached by SSIS before the sorting can begin.
So, in the specific case of choosing between using a T-SQL ORDER BY clause to sort data or an SSIS Sort transformation, I'd always choose the ORDER BY clause to start with.
First, if you really want to know how it performs on a given set of data, you should test it.
That said, there are several reasons why I think you should sort on the server side.
First, the server can take advantage of more hardware -- multiple threads, multiple disks, multiple processors -- for sorting. This can make a big difference in performance.
Second, the sorting may not be necessary. There may be cases where the query does not actually have to sort the results, because they are already sorted. For instance, the results may be returned based on an index that is already in the right order (see the sketch after this answer).
Third, memory usage issues and memory leaks tend to be more prevalent on the client side. (Okay, you don't say you are using Java, so you are a bit safe from this.) The database server knows how to manage memory.
Fourth, I think it is a good idea to do the data manipulation on the server side. Coding gets quite complicated if you try to micro-optimize each operation, with some being on the server and some being on the client side. Unless something is related specifically to the presentation of the data, do it on the server.
All that said, if you are just sorting 20 items for presentation purposes on, say, a single page, then it doesn't make a difference. Do it on the client side if you are comfortable with that.
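A minimal sketch of that second point (table and column names are hypothetical): with a suitable index in place, the query below can read its rows already in the requested order, so the execution plan needs no Sort operator at all.

CREATE INDEX IX_Orders_Customer_Date ON dbo.Orders (CustomerId, OrderDate);

SELECT OrderId, OrderDate, Amount
FROM dbo.Orders
WHERE CustomerId = 42
ORDER BY OrderDate;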

When a query is executed, what happens first in the back-end?

I have a query in Cognos that fetches a huge volume of data. Since the execution time would be high, I'd like to fine-tune my query. Everyone says that the WHERE clause in the query gets executed first.
My question is: what happens first when a query is executed?
Is the JOIN in the query established first, or is the WHERE clause executed first?
If the JOIN is established first, I should specify the filters on the DIMENSION first; otherwise, I should specify the filters on the FACT first.
Please explain.
Thanks in advance.
The idea of SQL is that it is a high-level declarative language, meaning you tell it what results you want rather than how to get them. There are exceptions to this in various SQL implementations, such as hints in Oracle to use a specific index, but as a general rule this holds true.
Behind the scenes, the optimiser for your RDBMS uses relational algebra to make a cost-based estimate of the different potential execution plans and selects the one it predicts will be the most efficient. The great thing about this is that you do not need to worry about what order you write your WHERE clauses in; so long as all of the information is there, the optimiser should pick the most efficient plan.
That being said, there are often things that you can do on the database to improve query performance, such as building indexes on columns in large tables that are often used in filtering criteria or joins.
Another consideration is whether you can use parallel hints to speed up your run time but this will depend on your query, the execution plan that is being used, the RDBMS you are using and the hardware it is running on.
If you post the query syntax and what RDBMS you are using we can check if there is anything obvious that could be amended in this case.
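As a sketch of that indexing point (all table and column names here are hypothetical, and the date literal syntax varies by RDBMS), an index covering the fact table's join and filter columns is the usual first step:

CREATE INDEX ix_fact_sales_product_date ON fact_sales (product_id, sale_date);

SELECT d.product_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product d ON d.product_id = f.product_id
WHERE f.sale_date >= '2016-01-01'
GROUP BY d.product_name;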
The order of filters definitely does not matter. The optimizer will take care of that.
As for filtering on the fact or dimension table - do you mean you are exposing the same field in your Cognos model for each (e.g. ProductID from both the fact and the Product dimension)? If so, that is not recommended. Generally speaking, you should expose the dimension field only.
This is more of a question about your SQL environment, though. I would export the SQL generated by Cognos from within Report Studio (Tools -> Show Generated SQL). From there, hopefully you are able to work with a good DBA to see if there are any obvious missing indexes, etc in your tables.
There are not a whole lot of options within Cognos to change the SQL generation. The prior poster mentions hints, which could work if you were writing native SQL, but that is a concept not known to Cognos. Really, all you can do is change the implicit/explicit join syntax, which just controls whether the join happens in an ON clause or in the WHERE clause. Although the WHERE side is pretty ugly, it generally compiles the same as ON.
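For reference, the two join styles being described look like this (hypothetical tables); on most engines both compile to the same plan:

-- Implicit (comma) join: the join condition lives in the WHERE clause.
SELECT f.amount, d.product_name
FROM fact_sales f, dim_product d
WHERE f.product_id = d.product_id;

-- Explicit join: the condition lives in the ON clause.
SELECT f.amount, d.product_name
FROM fact_sales f
JOIN dim_product d ON d.product_id = f.product_id;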

Overuse of Oracle WITH clause?

I'm writing many reporting queries for my current employer utilizing Oracle's WITH clause to allow myself to create simple steps, each of which is a data-oriented transformation, that build upon each other to perform a complex task.
It was brought to my attention today that overuse of the WITH clause could have negative side effects on the Oracle server's resources.
Can anyone explain why overuse of the Oracle WITH clause may cause a server to crash? Or point me to some articles where I can research appropriate use cases? I started using the WITH clause heavily to add structure to my code and make it easier to understand. I hope with some informative responses here I can continue to use it efficiently.
If an example query would be helpful I'll try to post one later today.
Thanks!
Based on this: http://www.dba-oracle.com/t_with_clause.htm it looks like this is a way to avoid using temporary tables. However, as others will note, this may actually mean heavier, more expensive queries that will put an additional drain on the database server.
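For readers who haven't seen the construct, here is a minimal sketch in Oracle syntax (the tables and columns are hypothetical): WITH names intermediate result sets that later steps can build on, which is exactly the temp-table-avoiding style the question describes.

WITH monthly_sales AS (
    SELECT product_id, TRUNC(sale_date, 'MM') AS sale_month, SUM(amount) AS total
    FROM sales
    GROUP BY product_id, TRUNC(sale_date, 'MM')
),
ranked AS (
    SELECT m.*, RANK() OVER (PARTITION BY sale_month ORDER BY total DESC) AS rnk
    FROM monthly_sales m
)
SELECT * FROM ranked WHERE rnk <= 3;   -- top three products per month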
It may not 'crash'; that's a bit dramatic. More likely it will just be slower, use more memory, and so on. How that affects your company will depend on the amount of data, the number of processors, and the amount of processing involved (whether you use WITH or not).

Refactoring "extreme" SQL queries

I have a business user who tried his hand at writing his own SQL query for a report of project statistics (e.g. number of tasks, milestones, etc.). The query starts off declaring a temp table of 80+ columns. There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table.
Due to time constraints and 'other factors', this was rushed into production and now my team is stuck with supporting it. Performance is appalling, although thanks to some tidy up it's fairly easy to read and understand (although the code smell is nasty).
What are some key areas we should be looking at to make this faster and follow good practice?
First off, if this is not causing a business problem, then leave it alone. Wait until it becomes a problem, then fix everything.
When you do decide to fix it, check whether there is one statement causing most of your speed issues; isolate and fix it.
If the speed issue is spread across all the statements, and you can combine it all into a single SELECT, this will probably save you time. I once converted a proc like this (not as many updates) to a SELECT, and the time to run it went from over 3 minutes to under 3 seconds (no shit ... I couldn't believe it). By the way, don't attempt this if some of the data is coming from a linked server.
If you don't want to or can't do that for whatever reason, then you might want to adjust the existing proc. Here are some of the things I would look at:
If you are creating indexes on the temp table, wait until after your initial INSERT to populate it.
Adjust your initial INSERT to insert as many of the columns as possible. There are probably some updates you can eliminate by doing this (see the sketch after this list).
Index the temp table before running your updates, but do not create indexes on any of the columns targeted by the update statements until after they're updated.
Group your updates if your table(s) and groupings allow for it. 70 updates is quite a few for only 80 columns, and it sounds like there may be an opportunity to do this.
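A compressed sketch of the first two points above (the temp table and column names are made up; the real one has 80+ columns):

-- Populate as many columns as possible in the one initial INSERT ...
INSERT INTO #ProjectStats (ProjectId, TaskCount, MilestoneCount)
SELECT p.ProjectId,
       COUNT(DISTINCT t.TaskId),
       COUNT(DISTINCT m.MilestoneId)
FROM Projects p
LEFT JOIN Tasks t      ON t.ProjectId = p.ProjectId
LEFT JOIN Milestones m ON m.ProjectId = p.ProjectId
GROUP BY p.ProjectId;

-- ... then index the temp table before the remaining UPDATEs run,
-- avoiding the columns those UPDATEs are going to write to.
CREATE INDEX IX_ProjectStats_ProjectId ON #ProjectStats (ProjectId);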
Good luck
First thing I would do is check to make sure there is an active index maintenance job being run periodically. If not, get all existing indexes rebuilt or if not possible at least get statistics updated.
Second thing I would do is set up a trace (as described here) and find out which statements are causing the highest number of reads.
Then I would run in SSMS with 'show actual execution plan' and tally the results with the trace. From this you should be able to work out whether there are missing indexes that could improve performance.
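If setting up a full trace feels heavy, a quicker (if cruder) way to see per-statement reads and timings while testing in SSMS, alongside the actual plan, is the STATISTICS session options; the procedure name below is a placeholder.

SET STATISTICS IO ON;    -- per-statement logical/physical read counts
SET STATISTICS TIME ON;  -- per-statement CPU and elapsed time

EXEC dbo.usp_ProjectStatsReport;   -- hypothetical name for the proc being tuned

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;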
Just like any refactoring, make sure you have an automated way to verify your work after each change (you can write this yourself using queries that check the development output against a known good baseline). That way, you are always matching the known good data. This gives you a high degree of confidence in the correctness of your approach when you reach the phase of deciding whether to switch over to the new version of the process, and want to run the two side by side for a few iterations to ensure correctness.
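One way to write such a check in T-SQL (the table names are placeholders): compare the refactored output against a saved baseline in both directions, so any differing row shows up.

-- Rows the refactored version produces that the baseline does not:
SELECT * FROM dbo.ProjectStats_New
EXCEPT
SELECT * FROM dbo.ProjectStats_Baseline;

-- Rows in the baseline that the refactored version no longer produces:
SELECT * FROM dbo.ProjectStats_Baseline
EXCEPT
SELECT * FROM dbo.ProjectStats_New;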
I also like to log all the test batches and the run times of the processes within the batch, so I can tell if some particular process within the batch was adversely affected at some point in time. I can get average times for processes and see trends of improvement or spot potential problems. This also lets me identify the low-hanging fruit within the batch where I can make the most improvement.
"There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table."
Actually, this sounds like it can be followed and understood quite well: each update statement does one thing to the table, with a specific purpose and set of business rules. I think that maintaining a 500-line procedure built around one or a couple of SELECT statements that do "everything", with 15 or so joins and CASE statements scattered all over the place, is a lot harder, although it would make for better performance.
It's a bit of a dilemma with SQL that writing clear and concise code (using multiple updates, creating functions, etc.) always seems to have a big negative impact on performance. Trying to do everything at once, which is considered bad practice in other programming languages, seems to be the very core of set-oriented languages.
If this is a report generating stored procedure, how often is it being run? If it's only necessary to run it once a day and is run during the night how much of an issue is the performance?
If it's not, I'd recommend being careful in your choice to rewrite it, because there is a chance that you could muck up your figures.
Also it sounds like the sort of thing that should be pulled out into an SSIS package building up a new permanent table with the results so it only has to be run once.
Hope this makes sense
One thing you could try is to replace the temp table with a table variable. There are times when this is faster and times when it is not; you will just have to try it and see.
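For reference, the swap is mechanical (the columns below are made up); the differences are in indexing options and how the optimizer estimates row counts, which is why you have to measure both.

-- Temp table version (as in the original proc):
CREATE TABLE #ProjectStats (ProjectId int, TaskCount int, MilestoneCount int);

-- Table variable version: same shape, different runtime behaviour.
DECLARE @ProjectStats TABLE (ProjectId int, TaskCount int, MilestoneCount int);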
Look at the 70 update statements. Is it possible to combine any of them? If the person writing it did not use CASE statements, it might be possible to do fewer statements.
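A sketch of what combining with CASE looks like (the columns are hypothetical): two single-purpose UPDATEs become one pass over the temp table.

-- Before: two passes over the temp table.
UPDATE #ProjectStats SET Status = 'Done' WHERE CompletedDate IS NOT NULL;
UPDATE #ProjectStats SET Status = 'Late' WHERE CompletedDate IS NULL AND DueDate < GETDATE();

-- After: one pass with CASE, touching only the rows the originals touched.
UPDATE #ProjectStats
SET Status = CASE
                 WHEN CompletedDate IS NOT NULL THEN 'Done'
                 ELSE 'Late'
             END
WHERE CompletedDate IS NOT NULL OR DueDate < GETDATE();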
Other obvious things to look at - eliminate any cursors, change any subqueries to joins to tables or derived tables.
Rewrite perhaps. One hardware solution would be to make sure your database temp table goes on a 'fast' drive, perhaps a solid state disk (SSD), or can be managed all in memory.
My guess is this 'solution' was developed by someone with a grasp of, and a dependency upon, spreadsheets; someone who may not be very savvy about 'normalized' databases, i.e. how to construct and populate tables to retain data for reporting purposes, something that BI (Business Intelligence) software can handle with sophistication while remaining adaptable.
You didn't say 'where' the update process is being run. Is the update process being run as a SQL script from a separate computer (desktop) against the server where the data is? There can be significant bottlenecks and overhead created by that approach. If so, consider running the entire update process directly on the server as a local job, as a compiled stored procedure, bypassing the network and (multiple) cursor management overhead. It could have a scheduled time to run and a controlled priority, completing in off peak business data usage hours.
Evaluate how often 'commit' statements are really needed for the sequence of update statements...saving on a bunch of commit lines could notably improve the overall update time. There may be a couple of settings in the database client driver software which can make a notable difference.
Can the queries used for update conditions be factored out as static 'views' which in turn can be shared across multiple update statements? Views can keep in memory data/query rows frequently accessed. There may be performance tuning in determining how much update data can be pended before a commit is optimal.
It might be worth evaluating whether triggers could be used to replace the batch update sequence. You don't say how many tables the data comes from, which would help with the decision, and I don't know whether you have the option of adding triggers to the tables from which the data is gathered. If you do, adding a few triggers to a number of tables wouldn't really degrade overall system performance much, but might save a big wad of time on that update process: create a similar temp table based on the same update process, then carefully test whether triggers feeding updates into it can replace the individual update statements, swapping them one at a time and checking that the results are the same as before. Perhaps you have a sort of 'data warehouse' application; if so, it might be useful to review how to set up a 'star' schema of tables to retain summarized business data for reporting.
Creating a comprehensive, cached 'view' that is refreshed via the queries once per day, reflecting the updates, might be another approach to explore.
Well, since the only thing you've told us about this stored procedure is that it has an 80+ column temp table, the only thing I can recommend is to remove that table and rewrite the rest to remove the need for it.
You should get a tool that lets you see the explain plan for every query your app will run. It is the best bang for the buck for performance increases on a SQL-heavy app, provided you read and react to what the explain plan is telling you. If you are on Oracle, what we used to use was TOAD, from Quest Software. It was a great tool.
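If you don't have such a tool to hand, Oracle can produce the same information from any SQL client (the query below is just a placeholder):

EXPLAIN PLAN FOR
SELECT * FROM project_stats WHERE project_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);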
I would recommend looking at the tables involved, the end result, and starting from scratch to see if the query can be done in a more efficient manner. Keep the query to verify that the new one is working exactly the same as the old one, but try to forget all methods used to obtain the end result.
I would rewrite it from scratch.
You say that you understand what it is supposed to do, so it should not be that difficult. And I bet that the requirements for that piece of code will keep changing, so if you do not rewrite it now you may end up maintaining some ugly monster.

SQL With A Safety Net

My firm have a talented and smart operations staff who are working very hard. I'd like to give them a SQL-execution tool that helps them avoid common, easily-detected SQL mistakes that are easy to make when they are in a hurry. Can anyone suggest such a tool? Details follow.
Part of the operations team remit is writing very complex ad-hoc SQL queries. Not surprisingly, operators sometimes make mistakes in the queries they write because they are so busy.
Luckily, their queries are all SELECTs not data-changing SQL, and they are running on a copy of the database anyway. Still, we'd like to prevent errors in the SQL they run. For instance, sometimes the mistakes lead to long-running queries that slow down the duplicate system they're using and inconvenience others until we find the culprit query and kill it. Worse, occasionally the mistakes lead to apparently-correct answers that we don't catch until much later, with consequent embarrassment.
Our developers also make mistakes in complex code that they write, but they have Eclipse and various plugins (such as FindBugs) that catch errors as they type. I'd like to give operators something similar - ideally it would see
SELECT U.NAME, C.NAME FROM USER U, COMPANY C WHERE U.NAME = 'ibell';
and before you executed it, it would say "Hey, did you realise that's a Cartesian product? Are you sure you want to do that?" It doesn't have to be very smart - finding obviously missing join conditions and similar evident errors would be fine.
It looks like TOAD should do this but I can't seem to find anything about such a feature. Are there other tools like TOAD that can provide this kind of semi-intelligent error correction?
Update: I forgot to mention that we're using MySQL.
If your people are using the mysql(1) program to run queries, you can use the safe-updates option (aka i-am-a-dummy) to get you part of what you need. Its name is somewhat misleading; it not only prevents UPDATE and DELETE without a WHERE (which you're not worried about), but also adds an implicit LIMIT 1000 to SELECT statements, and aborts SELECTs that have joins and are estimated to consider over 1,000,000 tuples --- perfect for discouraging Cartesian joins.
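Concretely, the option just sets three session variables, so the same protections can be turned on per session as well (the limits shown are the option's defaults, as described above):

SET SESSION sql_safe_updates = 1;        -- refuse UPDATE/DELETE without a key-based WHERE or LIMIT
SET SESSION sql_select_limit = 1000;     -- implicit LIMIT added to SELECT statements
SET SESSION max_join_size = 1000000;     -- abort joins estimated to examine more rows than this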
..."writing very complex ad-hoc SQL queries.... they are so busy"
Danger Will Robinson!
Automate Automate Automate.
Ideally, the ops team should not be put into a position where they have to write queries on the fly in a high-stress situation; it's a recipe for disaster! Better for them to build up a library of pre-written scripts that have undergone the appropriate testing to make sure they a) do what you want, b) provide an audit trail, and c) have a possible 'undo' type function.
Failing that, giving them a user ID that only has SELECT permissions might help :-)
You might find SQL Prompt from Redgate useful. I'm not sure what database engine you're using, though, as it's only for MS SQL Server.
I'm not expecting anything like this to exist. The tool would have to first implement everything that the SQL parser in your database implements, and then it would have to do a data model analysis to predict "bad" queries.
Your best bet might be to write a plugin for a text editor that did some basic checking for suspicious patterns and highlighted them differently than the standard .sql mode. But even that would be quite difficult.
I would be happy with a tool that set off alarm bells whenever I typed in an update statement without a where clause. And perhaps administered a mild electric shock, since it's usually about 1 in the morning after a long day when mistakes like that happen.
It would be pretty easy to build this by setting up a sample database with an extremely small amount of dummy data, which would receive the query first. A couple of things will happen:
You might get a SQL syntax error, which would not load the database much since it's a small database.
You might get back a response which could clearly be shown to contain every row in one or more tables, which is probably not what they want.
Things which pass the above conditions are likely to be okay, so you can run them against the copy of the production database.
Assuming your schema doesn't change much and is not particularly weird, writing the above is likely the quickest solution to your problem.
I'd start with some coding standards. For instance, never use the type of join in your example; it often leads to bad results (especially in SQL Server, where trying to do an outer join that way will give you wrong results). Require them to use explicit joins.
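For example, the query from the question rewritten with an explicit join (the join column is a guess; use whatever actually relates USER to COMPANY):

SELECT U.NAME, C.NAME
FROM USER U
INNER JOIN COMPANY C ON C.ID = U.COMPANY_ID   -- hypothetical foreign key
WHERE U.NAME = 'ibell';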
If you have complex relationships, you might consider putting them in views and then writing the ad-hoc queries against the views. Then at least they will never make the mistake of getting the joins wrong.
Can't you just limit the amount of time a query can run for? I'm not sure about MySQL, but for SQL Server, even just the default query analyzer can restrict how long queries will run before they time out. Couple that with limited rights so they can only run SELECT queries, and you should be pretty much covered.
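In MySQL terms (the account and schema names below are placeholders), the rights restriction is a one-line GRANT; recent MySQL versions also offer a session-level statement timeout, if yours supports it.

-- Read-only access to the reporting copy:
GRANT SELECT ON reporting_copy.* TO 'ops_user'@'%';

-- MySQL 5.7+ only: cap SELECT statements at 60 seconds (value is in milliseconds) for this session.
SET SESSION max_execution_time = 60000;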