SQL rounding and truncation, needs a thorough explanation

I'm a novice when it comes to SQL and PHP, and I'm trying to round an average price result and take off the extra zeroes that currently appear.
Currently my result turns up as: $3.005000
My code currently reads as follows:
$result = mysql_query("SELECT AVG(price) FROM milk WHERE price > 0 ");
$row = mysql_fetch_row ($result);
I have found several examples of SQL rounding and truncation but unfortunately the tutorials I've seen provide me with no useful information on where or how I am supposed to implement these changes.
This leaves me making guesses on where to make changes -- none of which have worked out so far (obviously).
If someone could provide me with an example of how to round and truncate my results, which includes where exactly I need to make these changes in my current configuration, that would be most helpful and I would be very thankful! I'm really sorry if my n00bishness makes it more difficult to explain the solution.
Thanks!

Formatting of the data should be done in the script making the query, not in the query itself. For example, in PHP you can write the following using sprintf:
$formatted_price = sprintf("%01.2f", $unformatted_price);
(Example courtesy of the PHP manual.)
Also, generally, price values are stored as decimal types or scaled integers, not floating-point types, since floating-point values are not exact.
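For instance, here is a minimal schema sketch (the milk table comes from your query, but the id column here is an assumption):
CREATE TABLE milk (
    id    INT PRIMARY KEY,
    price DECIMAL(10, 2) NOT NULL  -- exact decimal; avoids FLOAT rounding artifacts
);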

MySQL has a ROUND() function.
So just round your average in your SQL query:
$result = mysql_query("SELECT ROUND(AVG(price),2) FROM milk WHERE price > 0 ");
If you end up with formatting issues, you can use PHP's number_format() function during output.

Related

Large scale decimal computation in SQL

I'm stuck on a mathematical calculation that I perform in SQL Server 2016 Enterprise.
I need to calculate this expression:
(4.384 / 4.2989 * 100) * 98.8251017928029 / 100
In SQL I get the result 100.78141988869389772850690000
But when I calculate this expression in MS Excel, I get: 100.7814199585120000
Since these results are a Consumer Price Index, the digits after the decimal point do matter.
So my question is: which result is correct, SQL Server's or Excel's?
PS. I have updated my question.
Here is dbfiddle
Thank you.
I found the same problem described here: Wrong calculation in SQL-Server
I've also tried the calculation in Java, and it shows that the Excel result is more accurate.
You should use decimal types in your SQL calculations.
Actually, while building the dbfiddle I found errors in the casts. Once those were fixed, the results are almost the same as in Excel. There is a small remaining difference, but it is beyond the scale I need.
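A minimal sketch of the decimal-cast approach, assuming SQL Server: cast the operands to high-precision DECIMAL before multiplying and dividing, so the precision/scale inference rules don't truncate the intermediate quotient (the exact precision and scale to use depend on your data):
SELECT CAST(4.384 AS DECIMAL(20, 10))
       / CAST(4.2989 AS DECIMAL(20, 10))  -- keep plenty of scale in the quotient
       * 100
       * CAST(98.8251017928029 AS DECIMAL(20, 13))
       / 100 AS cpi;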

Using % with numbers in SQL Server Mgmt Studio

I've come across a bit of code that is used to validate an inputted number.
It uses a percentage sign but has nothing to do with any LIKE or varchar functions; it is doing some sort of calculation that I cannot figure out.
Essentially it looks like this: 1 % 11
If the second number is bigger than the first it will always bring back the first, but if the second is less than the first it brings back strange results.
Does anyone know what this function is doing?
It is the modulo operator (division remainder). See MSDN for details.
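A few quick examples of how it behaves:
SELECT 1 % 11;   -- 1  (11 goes into 1 zero times, remainder 1)
SELECT 25 % 11;  -- 3  (25 = 2 * 11 + 3)
SELECT 22 % 11;  -- 0  (divides evenly)
That is why a second operand larger than the first always returns the first: the quotient is zero and the whole value is the remainder.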

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right; I was trying to see if there was any overhead in the mere act of calling the function. It still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
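For example, using the table from the question (the literal order number is made up):
-- Can seek an index on ord_no: the bare column is compared to a constant.
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456'
-- Usually cannot seek an index: wrapping the column in a function
-- forces LTRIM() to be evaluated against every row.
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456'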
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason is that you can build some pretty hairy functions that cause massive overhead, since a scalar UDF is typically evaluated once per row.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but for string trimming at least, I'd definitely let the database do that (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause uses just the column itself (as opposed to a function of the column). Use of indexes is not affected at all by applying functions to the columns you're retrieving, only by how you select the rows.
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said:
our order numbers are six-digits long
so each has two blank characters
leading
Makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type that will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number over char(8). Across the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do, serve data.
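A hypothetical migration sketch, assuming SQL Server and that every ord_no really is numeric (any dependent indexes, constraints, and application code would need handling too):
-- CHAR(8) values such as '  123456' convert cleanly to INT;
-- leading blanks are ignored by the conversion.
ALTER TABLE oelinhst_sql ALTER COLUMN ord_no INT;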
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster:
SELECT SomeFields
FROM oelinhst_sql
WHERE datetimeField > '1/1/2011'
  AND datetimeField < '2/1/2011'
than this:
SELECT SomeFields
FROM oelinhst_sql
WHERE Month(datetimeField) = 1
  AND Year(datetimeField) = 2011
even though the rows that are returned are the same. The range version can seek on an index over datetimeField, while the Month()/Year() version has to evaluate those functions against every row.

Finding strings that differ with at most one letter from a given string in SAS with PROC SQL

First some context. I am using proc sql in SAS, and need to fetch all the entries in a data set (with a couple of million entries) that have variable "Name" equal to (let's say) "Massachusetts". Of course, since the data was once manually entered by humans, close to all conceivable spelling errors occur ("Amssachusetts", "Kassachusetts" etc.).
I have found that few entries get more than two characters wrong, so the code
Name like "__ssachusetts" OR Name like "_a_sachusetts" OR ... OR Name like "Massachuset__"
would select the entries I am looking for. However, I am hoping that there must be a more convenient way to write
Name that differs by at most 2 characters from "Massachusetts";
Is there? Or is there some other strategy for fetching these entries? I tried searching both Stack Overflow and the web but was unsuccessful. I am also a relative beginner with both SQL and SAS.
Some additional information: The database is not in English (and the actual string is not "Massachusetts") so using SOUNDEX is not really feasible (if it ever were).
Thanks in advance.
(Edit: Improved the title)
SAS has built-in functions COMPGED and COMPLEV to compute distances between strings. Here is an example that shows how to select just those with a Levenshtein edit distance of less than or equal to 2.
data typo;
input name $20.;
datalines;
massachusetts
masachusets
mssachusetts
nassachusets
nassachussets
massachusett
;
proc sql;
select name from typo
where complev(name, "massachusetts") <= 2;
quit;
There are other string distance metrics, like Hamming distance, that might work better for your data (note that Hamming distance is an edit distance, not a phonetic algorithm, so it does not depend on the language).
You can search Google for an implementation for your specific DB engine.
What you are looking for is "approximate string matching". For that, one can use a Levenshtein distance algorithm. I am not sure, but I hope this answer helps.
You could implement a stored function of this type (Oracle syntax; adapt to your RDBMS):
CREATE FUNCTION distance(one VARCHAR2, two VARCHAR2)
RETURN NUMBER
DETERMINISTIC
IS
BEGIN
  -- compare the two strings here and return their edit distance
  RETURN 0;  -- placeholder
END distance;
And then use it in SQL:
SELECT * FROM table WHERE distance(name, 'Massachusetts') <= 2
Of course, these things tend to be quite slow...
I know this is four years too late, but since it might give ideas to others who find this thread:
What you're considering is a semantic layered design: you would need to implement some conditional logic for these different text comparisons, using Levenshtein-family distances such as Jaro-Winkler for comparing text of differing lengths, and Hamming distance for strings of the same length where you suspect simple transposition. This is nothing new these days, with all of the various text-mining programs out there.
Here is a post which is very good, in my view:
Jaro-Winkler string comparison function in SAS

T-SQL Old style joins *= and =*

We have about 150 old-style queries and views that use the *= and =* (pre-ANSI-92) outer join syntax.
Does anybody know of a tool, method, or script that could help with the conversion, or do we have to just slog through all 150 of them?
Thanks
Select PapersSent,
DateSent,
Code,
ActionDate,
ClientAction,
ClientContactRef,
PublishAppraisal,
PublishCV,
SponsorContactREF,
MeetingNotes,
InternalNotes,
Contact_AdminAction,
MeetingLocation
from tblMeetingNotes a,
tblPapersOptions b,
tblContactLog c
where a.CREF=#CREF and
a.CLID=#CLID AND
Isnull(PapersSent,0)*=Value AND
a.CREF*=c.CREF AND
a.CLID*=c.Contact_ID
This probably isn't what you were hoping to hear, but this type of tool doesn't exist. There are situations where an old-style JOIN won't cleanly convert to the SQL-92 style, causing the query to give different results, or even requiring the query to be re-written.
Even if there were a tool to automatically convert the joins, you would still need to test every query to make sure that it converted how you wanted it to, creating probably just as much work as it would have been to do it by hand.
Erland Sommarskog has a good step-by-step process on how you would quickly convert the old style joins to SQL-92: http://www.sommarskog.se/Become-an-ANSI-star.doc
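As a rough illustration of what the conversion involves, here is a hedged sketch of the posted query rewritten with explicit LEFT JOINs (which table each unqualified column belongs to is an assumption, and old-style outer joins can be ambiguous, so the results must be verified against the original):
SELECT PapersSent, DateSent, Code, ActionDate, ClientAction,
       ClientContactRef, PublishAppraisal, PublishCV, SponsorContactREF,
       MeetingNotes, InternalNotes, Contact_AdminAction, MeetingLocation
FROM tblMeetingNotes a
LEFT JOIN tblContactLog c
       ON a.CREF = c.CREF
      AND a.CLID = c.Contact_ID
LEFT JOIN tblPapersOptions b
       ON ISNULL(a.PapersSent, 0) = b.Value
WHERE a.CREF = #CREF
  AND a.CLID = #CLID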
Before you convert, I would definitely see about setting up some kind of testing framework so you can compare the results.
This will be easiest if all these are views, or if you can get the output into tables.
At that point you can use things like EXCEPT to ensure that all rows match.
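For example, assuming both outputs have been captured into tables (old_results and new_results are hypothetical names), a symmetric-difference check with EXCEPT should return no rows in either direction:
-- rows in the old output that are missing from the new output
SELECT * FROM old_results
EXCEPT
SELECT * FROM new_results
-- rows in the new output that are missing from the old output
SELECT * FROM new_results
EXCEPT
SELECT * FROM old_results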
In the past, I've code-generated table comparisons using stored procs that take the tables/views and generate the comparisons, even including numeric/percentage thresholds for amount differences where one set has had awkward rounding problems, like banker's rounding (Teradata) or IEEE floating-point-based rounding (WebFOCUS).
You could script the database and use search and replace to change the bulk of them and manually inspect the more difficult cases. Be sure to test all the queries thoroughly in case the output has changed, as mfredrickson pointed out.
To help with the search, although not strictly necessary if you script the database, download Redgate's SQL Search (it's free) to help you find all the instances. Even if you don't use it for this task it's handy to have.