I'm creating an application to calculate Login-Logout intervals for a call center; basically, what I do is get the interval between two times.
Which would be best:
calculating the interval on the DB server (SQL Server 2000),
or in the code itself (Perl)?
I'm running on Windows Server 2003.
Basically the operation is:
Logout - Login + 1
But each query returns about 1,000,000 rows.
P.S. I do know how to do it; what I'm wondering is what the best practice would be.
This is my current query:
SELECT S.Ident, S.Dateissued,
       S.LoginMin, S.LogoutMin,
       E.Exc_Name,
       CAST(CAST((LoginMin / 60 + (LoginMin % 60) / 100.0) AS int) AS varchar) + ':'
           + CASE WHEN LoginMin % 60 < 10 THEN '0' + CAST(LoginMin % 60 AS varchar)
                  ELSE CAST(LoginMin % 60 AS varchar) END,
       CAST(CAST((LogoutMin / 60 + (LogoutMin % 60) / 100.0) AS int) AS varchar) + ':'
           + CASE WHEN LogoutMin % 60 < 10 THEN '0' + CAST(LogoutMin % 60 AS varchar)
                  ELSE CAST(LogoutMin % 60 AS varchar) END,
       (LogoutMin - LoginMin) + 1 AS Mins,
       E.Exc_ID, action
FROM igp_ScheduleLoginLogout S
INNER JOIN igp_ExemptionsCatalog E ON S.Exc_ID = E.Exc_ID
WHERE ident = $ident
  AND dateissued BETWEEN '$dateissued' AND '$dateissued2'
Short answer:
If you are doing math on a set of data (like your 1 million row example), SQL is optimized for set-based operations.
If you are doing math on an iterative, row-by-row basis, your calling application or script is probably best.
Generally aggregating on the server and returning the final answer is faster than pulling all of the rows to an application and chugging through them there.
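For example, if the final figure you need is just the total minutes per agent for the period, a server-side aggregate along these lines (a sketch that reuses the table and column names from the query above; whether a per-agent total is what you actually need is an assumption) returns one summary row instead of a million detail rows:

SELECT S.Ident, SUM(S.LogoutMin - S.LoginMin + 1) AS TotalMins
FROM igp_ScheduleLoginLogout S
WHERE S.ident = $ident
  AND S.dateissued BETWEEN '$dateissued' AND '$dateissued2'
GROUP BY S.Ident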
Generally the answer is that if you can do the calculation as part of the SQL query without having to change the form of the query, and if your application-layer code supports it (e.g. you aren't using an ORM that makes it difficult), then you may as well do the calculation as part of the SQL. With such a simple calculation it's not likely to make much difference, so you should write whatever leads to the most maintainable code.
As with any performance question, the real answer is to benchmark it yourself. Answers on StackOverflow can only get you so far, since so many factors can affect performance in the real world.
It partially depends on how scalable this has to be.
With 1 client and 1 server, as others have noted, doing it in SQL may be faster (but benchmark yourself!).
With several clients and 1 server (either now or projected), you scale the calculations per client and offload ALL of them from the one server, so the server load is dramatically lower. In this case, do the calculation in the client (or app server).
Related
I'm trying to average a field. It's very simple to do, but there are problems with some values: some I know are way too big, and I was hoping to exclude them by number of characters (I would probably allow 4 characters max).
I'm not aware of a SQL clause that could do this; if there is one, that would be great.
select avg(convert(float,duration)) as averageduration
from AsteriskCalls where ISNUMERIC(duration) = 1
I expect the output to be around 500-1000, but it comes up as an 8-digit number.
That is easy enough:
select avg(convert(float,duration)) as averageduration
from AsteriskCalls
where ISNUMERIC(duration) = 1 and len(duration) <= 4;
This will not necessarily work, of course, because you could have '1E30', which would be a pretty big number. And it would miss '0.001', which is a pretty small number.
The more accurate method uses try_convert() (available in SQL Server 2012 and later), which returns NULL instead of raising an error when the value can't be converted:
select avg(try_convert(float, duration)) as averageduration
from AsteriskCalls
where try_convert(float, duration) <= 1000.0
And that should probably really be:
where abs(try_convert(float, duration)) <= 1000.0
Consider these values, which are of type MONEY (sample values; they can change):
select 4796.529 + 1585.0414 + 350.9863 + 223.3549 + 127.6314+479.6529 + 158.5041
For some reason I need to round each value to a scale of 3, like this:
select round(4796.529,3)+ round(1585.0414,3)+ round(350.9863,3)+ round(223.3549,3)+ round(127.6314,3)+ round(479.6529,3)+ round(158.5041,3)
But when I take the sum, they show a very minor variation: the first line of code returns 7721.7000 and the second one 7721.6990. This variation is not acceptable. What is the best way to solve this?
As Whencesoever said, your problem is a mathematical one, not a programming error.
12.5 + 11.6 = 24.1
ROUND(12.5) + ROUND(11.6) = 25
ROUND(12.5 + 11.6) = 24
I'd talk with the business and figure out where they want the rounding applied.
Also, as a side note, MONEY is a terrible datatype. If you can, you may want to consider switching to a DECIMAL. See Should you choose the MONEY or DECIMAL(x,y) datatypes in SQL Server?
When you round numbers before you sum them you will get a different result than if you round numbers after you have summed them. Simple as that. There is no way to solve this.
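To see both orderings side by side in one query (a sketch using the sample values from the question; the VALUES row constructor needs SQL Server 2008 or later):

SELECT ROUND(SUM(v), 3) AS round_after_sum,   -- 7721.7000
       SUM(ROUND(v, 3)) AS round_before_sum   -- 7721.6990
FROM (VALUES (CAST(4796.529 AS decimal(19,4))), (1585.0414), (350.9863),
             (223.3549), (127.6314), (479.6529), (158.5041)) t(v)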
I am working with a legal software called Case Aware. You can do limited sql searches and I have had luck in getting Case Aware to pull a specific value from the database. My problem is that I need to create a sql search that returns multiple values but the Case Aware software will only accept one result as an answer. If my query produces a list, it will only recognize the top value. This is a limitation of the software I cannot get around.
My very basic search is:
select rate
From case_fin_info
where fin_info_id = 7 and rate!=0
This should produce a list of 3-15 rates, which it does when the search is run straight against the database. However, when run through Case Aware, only the first rate in the table is pulled. I need to pull the values through Case Aware because with Case Aware I can automatically insert the results into a template. (Where I work generates hundreds if not thousands of these a day, so doing it manually is a B$##%!)
I need to find a way to pull all the values from the search into one value. I cannot use XML (Case Aware will give an error) and I cannot create a temporary table. (Again, a Case Aware limitation) If possible, I also need to insert a manual return between each value so they are separated in the document I am pulling this information into.
Case Aware does not have any user manual and you pay for support (We do have it) but I have my doubts on their abilities. I have been able to easily create queries that they have told me in the past are impossible. I am hoping this is one of those times.
IntegrationGirly
Addtl FYI:
I currently have this kludge: pulling each value individually from the database, even if it is null, and putting each value into a table in the document (30 separate searches). It "works", but it takes much longer for the document to generate and it also leaves a great deal of empty space. Some cases have 3 values, most have 5-10, but we have up to 30 areas for rate because once in a blue moon we need them. This makes the template look horribly junky. That doesn't affect the lawyers who generate the docs, since they don't see it, but every time they generate the table they have to take out all the empty columns. With the number of docs we do each day, 1) this becomes time consuming, and 2) it assumes attorneys and paralegals know how to take rows out of tables in Word.
First, my condolences for having to work with such terrible software.
Second, here's a possible solution (this is assuming SQL Server):
1) Execute a SELECT COUNT(*) FROM case_fin_info WHERE fin_info_id = 7 AND rate <> 0. Store the result (number of rows) in your client application.
2) In your client app, do a for (i = 0; i < count; i++) loop. During each iteration, perform the query
WITH OrderedRates AS
(
SELECT Rate, ROW_NUMBER() OVER (ORDER BY <table primary key> ASC) AS 'RowNum'
FROM case_fin_info WHERE fin_info_id = 7 AND rate <> 0
)
SELECT Rate FROM OrderedRates WHERE RowNum = <current loop index + 1>
Replacing the stuff in <> as appropriate. Essentially you get the row count in your client app, then get one row at a time. It's inefficient as hell, but if you only have 15 rows it shouldn't be too bad.
I had a similar query to implement in my application. This should work.
DECLARE @Rate VARCHAR(8000)
-- rate appears to be numeric, so cast it before concatenating
SELECT @Rate = COALESCE(@Rate + ', ', '') + CAST(rate AS varchar(20))
From case_fin_info where fin_info_id = 7 and rate!=0;
SELECT @Rate  -- returns the single concatenated value
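Since the document needs a manual return between the values, the same pattern works with CHAR(13) + CHAR(10) as the separator instead of the comma (a sketch; it assumes rate casts cleanly to varchar):

DECLARE @Rate VARCHAR(8000)
SELECT @Rate = COALESCE(@Rate + CHAR(13) + CHAR(10), '') + CAST(rate AS varchar(20))
From case_fin_info where fin_info_id = 7 and rate!=0;
SELECT @Rate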
Here's a single query that will return the one result in a single column. It assumes your manual return is CR + LF. And, you would need to expand it to handle all 15 rates.
SELECT max(Rate1) + CHAR(13) + CHAR(10)
+ max(Rate2) + CHAR(13) + CHAR(10)
+ max(Rate3) + CHAR(13) + CHAR(10)
+ max(Rate4) + CHAR(13) + CHAR(10)
+ max(Rate5) + CHAR(13) + CHAR(10)
FROM (
SELECT CASE RateID WHEN 1 THEN CAST(rate as varchar) END AS Rate1,
CASE RateID WHEN 2 THEN CAST(rate as varchar) END AS Rate2,
CASE RateID WHEN 3 THEN CAST(rate as varchar) END AS Rate3,
CASE RateID WHEN 4 THEN CAST(rate as varchar) END AS Rate4,
CASE RateID WHEN 5 THEN CAST(rate as varchar) END AS Rate5
FROM
(
select RateID, rate From case_fin_info where fin_info_id = 7 and rate!=0
) as r
) as Rates
Say I have a query like this:
select ((amount1 - amount2)/ amount1) as chg from t1
where
((amount1 - amount2)/ amount1) > 1 OR
((amount1 - amount2)/ amount1) < 0.3
Is there a way I can perform the arithmetic calculation only once, instead of doing it three times as in the above query?
EDIT: I am not sure whether the database automatically optimizes queries to do such calcs only once. I am using T-SQL on Sybase.
Conceptually you could select from your calculation, i.e:
select
calc.amt
from
(select (amt1 - amt2) / amt1 as amt from #tmp) calc
where
calc.amt > 1 OR calc.amt < 0.3
I'm not sure off the top of my head whether SQL would optimise your code to something similar anyway; running your query against my query off a basic temp table seems to indicate they execute in the same way.
It's better to have a repeated arithmetic operation than more IO.
-- ADDED LATER
I've checked all 3 solutions in SQL Server 2000, 2005 and 2008 (over 100,000 random rows), and in all cases they have exactly the same execution plans as well as CPU and IO usage.
The optimizer does a good job.
select calc.amtpercent
from ( select (amount1-amount2)/cast(amount1 as float) as amtpercent from z8Test) calc
where calc.amtpercent > 1 OR calc.amtpercent < 0.3
select (amount1-amount2)/cast(amount1 as float) as amtpercent
from z8Test
where (amount1-amount2)/cast(amount1 as float) > 1
or (amount1-amount2)/cast(amount1 as float) < 0.3
select (amount1-amount2)/cast(amount1 as float) as amtpercent
from z8Test
where (amount1-amount2)/cast(amount1 as float) not between 0.3 and 1
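For reference, one way to run the same CPU and IO comparison yourself is with the standard session statistics options (not specific to any of these queries):

SET STATISTICS IO ON    -- reports logical and physical reads per table
SET STATISTICS TIME ON  -- reports CPU and elapsed time
-- run each candidate query here and compare the output in the Messages tab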
In answer to Alex's comment: Have you ever worked with databases? I've seen bottlenecks in IO so many times, and never in the CPU of a database server. To be more precise, on the few occasions when I had high CPU usage it was caused by IO bottlenecks, and only in two cases by badly written queries: in the first, a query over a huge table wrongly used encryption (instead of encrypting the parameter value and comparing it against the column, it decrypted the column and compared it against the parameter); in the second, a very frequently triggered query performed an awful number of unnecessary conversions (datetime to string and back to datetime).
Of course you have to avoid unnecessary arithmetic, but be careful if you have to increase IO as a trade-off.
SQL is not an imperative language, it's a declarative language. You should not approach SQL with a C-style optimization mindset. For instance, in your query the repeated computation, if it occurs at all, is irrelevant compared to the fact that your query is not SARGable. If this kind of search matters for your application's performance, it would be much more appropriate to create a computed column with the expression ((amount1 - amount2) / amount1) at the table level and index that column, then rewrite your query to use the computed column:
ALTER TABLE t1 ADD chg AS ((amount1 - amount2) / amount1);
CREATE INDEX idx_t1_chg ON t1(chg);
GO
SELECT chg FROM t1 WHERE chg > 1 OR chg < 0.3;
GO
This should work I think:
select ((amount1 - amount2)/ amount1) as chg from t1
where
chg > 1 OR
chg < 0.3
EDIT: After a small test I found out this does not work: Invalid column name 'chg'. Chris' answer is a better solution.
I have a table where I'm storing Lat/Long coordinates, and I want to make a query where I want to get all the records that are within a distance of a certain point.
This table has about 10 million records, and there's an index over the Lat/Long fields.
This does not need to be precise. Among other things, I'm considering that 1 degree Long == 1 degree Lat, which I know is not true, but the ellipse I'm getting is good enough for this purpose.
For my examples below, let's say the point in question is [40, 140], and my radius, in degrees, is 2 degrees.
I've tried this 2 ways:
1) I created a UDF to calculate the Square of the Distance between 2 points, and I'm running that UDF in a query.
SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND dbo.SquareDistance(Lat, Long, 40, 140) < 4
I'm filtering by a square first, to speed up the query and let SQL use the index, and then refining that to match only the records that fall within the circle with my UDF.
2) Run the query to get the square (same as before, but without the last line), feed ALL those records to my ASP.Net code, and calculate the circle in the ASP.Net side (same idea, calculate the square of the distance to save the Sqrt call, and compare to the square of my radius).
To my surprise, calculating the circle on the .Net side is about 10 times faster than using the UDF, which leads me to believe that I'm doing something horribly wrong with that UDF...
This is the code I'm using:
CREATE FUNCTION [dbo].[SquareDistance]
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
-- Declare the return variable here
DECLARE @Result float
DECLARE @LatDiff float, @LongDiff float
SELECT @LatDiff = @Lat1 - @Lat2
SELECT @LongDiff = @Long1 - @Long2
SELECT @Result = (@LatDiff * @LatDiff) + (@LongDiff * @LongDiff)
-- Return the result of the function
RETURN @Result
END
Am I missing something here?
Shouldn't using a UDF within SQL Server be much faster than feeding about 25% more records than necessary to .Net, with the overhead of the DataReader, the communication between processes and whatnot?
Is there something I'm doing horribly wrong in that UDF that makes it run slow?
Is there any way to improve it?
Thank you very much!
You can improve the performance of this UDF by NOT declaring variables and doing your calculations more in-line. This will likely improve performance a little (but probably not much).
CREATE FUNCTION [dbo].[SquareDistance]
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
Return ( SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2)) + ((@Long1 - @Long2) * (@Long1 - @Long2)))
END
Even better would be to remove the function and put the calculations in the original query.
SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
There is a little bit of overhead with calling a user defined function. By removing the function, you are likely to gain a little in performance.
Also, I encourage you to check your execution plan just to make sure you are getting index seeks like you expect.
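One quick way to do that from a query window is with the standard SHOWPLAN session option (the query below is the inline version from above); look for an Index Seek on the Lat/Long index in the output:

SET SHOWPLAN_TEXT ON   -- the next batch returns its estimated plan instead of running
GO
SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
GO
SET SHOWPLAN_TEXT OFF
GO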
There is a lot of overhead in using a UDF.
Even coding it in-line may not be good because an index cannot be used, although here the BETWEEN clauses should reduce the amount of data that needs crunching.
To extend G Mastros' idea, separate the select bit from the square bit. It may help the optimiser.
SELECT
Lat, Long
FROM
(
SELECT
Lat, Long
FROM
Table
WHERE
(Lat BETWEEN 38 AND 42)
AND
(Long BETWEEN 138 AND 142)
) foo
WHERE
((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
Edit: You may be able to reduce the actual calculations involved.
This next idea may reduce the number of calcs from 7 to 5
...
SELECT
Lat, Long,
Lat - 40 AS LatDiff, Long - 140 AS LongDiff
FROM
...
(LatDiff * LatDiff) + (LongDiff * LongDiff) < 4
...
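Filled out, that sketch could look something like this (one reading of the ellipses above, reusing the derived table from the previous query):

SELECT
    Lat, Long
FROM
(
    SELECT
        Lat, Long,
        Lat - 40 AS LatDiff, Long - 140 AS LongDiff
    FROM
        Table
    WHERE
        (Lat BETWEEN 38 AND 42)
        AND
        (Long BETWEEN 138 AND 142)
) foo
WHERE
    (LatDiff * LatDiff) + (LongDiff * LongDiff) < 4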
Basically, try the 3 solutions offered and see what works.
The optimiser may ignore the derived table, it may use it, or it may generate an even worse plan.
Check this article, which describes why UDFs in SQL Server are, generally speaking, a bad idea. Unless you're pretty sure the table you're invoking the UDF against will not grow much, beware that UDFs are called on ALL the rows the query touches and not (as one might wrongly guess) only on the result set. This can give you a big performance hit as the database grows.
The linked article also details some ways to work around the problem, but the real issue is that SQL Server's T-SQL dialect lacks a way to declare a scalar function as deterministic (the way Oracle does).
Updates:
GMastros: You were absolutely right. Doing the math in the query itself is infinitely faster than the UDF. I'm using the SQUARE() function to do the multiplication, which makes it a bit more concise, but performance is the same.
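For reference, the SQUARE() version of the filter described above would look something like this (a sketch; SQUARE() is the built-in T-SQL function):

SELECT Lat, Long FROM Table
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND SQUARE(Lat - 40) + SQUARE(Long - 140) < 4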
However, doing it this way is still twice as slow as doing the math in .Net.
I can't really understand that, but I've come to a compromise that is useful for my particular situation (which sucks, because I need to duplicate code, but it's the best scenario unless we can find a way to make the circle calculation in SQL faster).
Thanks!