How to capture data values to the power of ^ (...) in SQL Server? - sql

Perhaps someone with more experience in SQL Server can be of assistance. I am in the middle of putting together the LookUp tables for a new project. For 2 different tests that a user can perform (Bacteria/Fungi) the results are currently recorded on paper as the following:
BACTERIA:
Bacteria cfu / ml
<100
10^2
10^3
10^4
10^5
10^6
10^7
FUNGI:
Fungi (yeast & mold) cfu /ml
<100
10^2
10^3
10^4
10^5
What would be the best way to capture these values in SQL Server 2008 R2? In particular, Data Type and Size?

Something like this would probably be good enough:
CREATE TABLE AmountLookup (
UnitsLimitExp int NULL,
Name nvarchar(10) NULL
)
INSERT INTO AmountLookup
SELECT 2, '<100'
UNION ALL SELECT 3, '10^3'
UNION ALL SELECT 4, '10^4'
UNION ALL SELECT 5, '10^5'
UNION ALL SELECT 6, '10^6'
UNION ALL SELECT 7, '10^7'
This way you store the exponent, not the amount. Real value is just a GUI representation. Another thing is your lookup name, which is ugly here (10^3). However, you can store HTML code and treat it as raw HTML on your user interface, e.g. 104 is 10<sup>4</sup>.

If these are value and not ranges, INT will work. If you start to deal with values greater than 2 billion, BIGINT should be used. If you need decimal digits, you can use the DECIMAL (NUMERIC) type.
Your other option, since these are discrete values, is to use a lookup table for the values, whose surrogate key id you can use in the tables that hold references to the data. That way you can represent a concept such as "<100" .

I'd propose a varchar(50):
<100 "Clean enough"
10^2 "Food over due date"
10^3 "Mr Bean's armpits"
10^4 "Three day old carcass"
10^5 "Biochemical experiment"
10^6 "Maggot invasion"
10^7 "Bacteria overflow"

Related

How to query column with letters on SQL?

I'm new to this.
I have a column: (chocolate_weight) On the table : (Chocolate) which has g at the end of every number, so 30x , 2x5g,10g etc.
I want to remove the letter at the end and then query it to show any that weigh greater than 35.
So far I have done
Select *
From Chocolate
Where chocolate_weight IN
(SELECT
REPLACE(chocolote_weight,'x','') From Chocolate) > 35
It is coming back with 0 , even though there are many that weigh more than 35.
Any help is appreciated
Thanks
If 'g' is always the suffix then your current query is along the right lines, but you don't need the IN you can do the replace in the where clause:
SELECT *
FROM Chocolate
WHERE CAST(REPLACE(chocolate_weight,'g','') AS DECIMAL(10, 2)) > 35;
N.B. This works in both the tagged DBMS SQL-Server and MySQL
This will fail (although only silently in MySQL) if you have anything that contains units other than grams though, so what I would strongly suggest is that you fix your design if it is not too late, store the weight as an numeric type and lose the 'g' completely if you only ever store in grams. If you use multiple different units then you may wish to standardise this so all are as grams, or alternatively store the two things in separate columns, one as a decimal/int for the numeric value and a separate column for the weight, e.g.
Weight
Unit
10
g
150
g
1000
lb
The issue you will have here though is that you will have start doing conversions in your queries to ensure you get all results. It is easier to do the conversion once when the data is saved and use a standard measure for all records.

SQL Server floating point numbers

I was wondering if there is a way to show the values of columns of type floating point numbers in two decimal places in SQL Server 2008 via settings? For instance, let say I have a table called orders with several columns. I want to be able to do the following:
SELECT * FROM orders
I expect to see any values in columns of type float to display with decimal notation; for instance, a value of 4 should display as 4.0 or 4.00.
Thanks
You may use CONVERT function with NUMERIC( x , 2) for numeric values
( where x is at least 3, better more, upto 38 )
SELECT CONVERT(NUMERIC(10, 2), 4 ) as "Dcm Nr(2)";
Dcm Nr(2)
---------
4,00
SELECT CONVERT(NUMERIC(10, 1), 4 ) as "Dcm Nr(1)";
Dcm Nr(1)
---------
4,0
The simplest form of what happens to me is making a "cast", for example:
SELECT CAST(orders AS DECIMAL(10,2)) FROM [your table];
The short answer to your question is "No".
SQL Server isn't really in the business of data presentation. We all do a lot of backbends from time to time to force things into a presentable state, and the other answers provided so far can help you on a column by column basis.
But the sort of "set it and forget it" thing you're looking for is better handled in a front end application.

SQL Query: select only numbers from a list of string values

I need to create a query in an sqlite database using DB Browser. The columns are: state, measure_id, measure_name, score. All of the data has been input as strings. I can cast the score strings as decimal, however the problem is that some of the values for the score column are numeric and some are actual string values (such as "high" etc). I need to ignore the REAL string values in my output. Also, I need to calculate the standard deviation (as well as min,max,avg) for each measure_id.
How can I ignore the real string values and calculate the standard deviation?
Here is some sample data:
sample 1: AL, ID1, Ident, 52
sample 2: TX, ID2, Foo, High
sample 3: MI, ID3, Bar, 21
(I want to select only sample 1 and 3, and then cast the strings as int and calculate stdev)
If the values are never 0, you can do:
select avg(cast(value as decimal))
from t
where cast(value as decimal) > 0;
The standard deviation is a bit trickier to calculate. You can use the defining formula, but SQLite doesn't even have a square root function.
You might want to move to another database, such as Postgres or MySQL, if you want to support these types of operations.

SQL Server Floating Point in WHERE clause

I'm trying to query a database, I need to get a list of customers where their weight is equal to 60.5. The problem is that 60.5 is a real I've never query a database with a real in a where clause before.
I've tried this:
SELECT Name FROM Customers WHERE Weight=60.5
SELECT Name FROM Customers WHERE Weight=cast(60.5 as real)
SELECT Name FROM Customers WHERE Weight=cast(60.5 as decimal)
SELECT Name FROM Customers WHERE Weight=convert(real,'60.5')
SELECT Name FROM Customers WHERE Weight=convert(decimal,'60.5')
These queries return 0 values but in the Customers table their are 10 rows with Weight=60.5
Your problem is that floating point numbers are inaccurate by definition. Comparing what seems to be 60.5 to a literal 60.5 might not work as you've noticed.
A typical solution is to measure the difference between 2 values, and if it's smaller then some predefined epsilon, consider them equal:
SELECT Name FROM Customers WHERE ABS(Weight-60.5) < 0.001
For better performance, you should actually use:
SELECT Name FROM Customers WHERE Weight BETWEEN 64.999 AND 65.001
If you need equality comparison, you should change the type of the column to DECIMAL. Decimal numbers are stored and compared exactly, while real and float numbers are approximations.
#Amit's answer will work, but it will perform quite poorly in comparison to my approach. ABS(Weight-60.5) < 0.001 is unable to use index seeks. But if you convert the column to DECIMAL, then Weight=60.5 will perform well and use index seeks.

Biased random in SQL?

I have some entries in my database, in my case Videos with a rating and popularity and other factors. Of all these factors I calculate a likelihood factor or more to say a boost factor.
So I essentially have the fields ID and BOOST.The boost is calculated in a way that it turns out as an integer that represents the percentage of how often this entry should be hit in in comparison.
ID Boost
1 1
2 2
3 7
So if I run my random function indefinitely I should end up with X hits on ID 1, twice as much on ID 2 and 7 times as much on ID 3.
So every hit should be random but with a probability of (boost / sum of boosts). So the probability for ID 3 in this example should be 0.7 (because the sum is 10. I choose those values for simplicity).
I thought about something like the following query:
SELECT id FROM table WHERE CEIL(RAND() * MAX(boost)) >= boost ORDER BY rand();
Unfortunately that doesn't work, after considering the following entries in the table:
ID Boost
1 1
2 2
It will, with a 50/50 chance, have only the 2nd or both elements to choose from randomly.
So 0.5 hit goes to the second element
And 0.5 hit goes to the (second and first) element which is chosen from randomly so so 0.25 each.
So we end up with a 0.25/0.75 ratio, but it should be 0.33/0.66
I need some modification or new a method to do this with good performance.
I also thought about storing the boost field cumulatively so I just do a range query from (0-sum()), but then I would have to re-index everything coming after one item if I change it or develop some swapping algorithm or something... but that's really not elegant and stuff.
Both inserting/updating and selecting should be fast!
Do you have any solutions to this problem?
The best use case to think of is probably advertisement delivery. "Please choose a random ad with given probability"... however i need it for another purpose but just to give you a last picture what it should do.
edit:
Thanks to kens answer i thought about the following approach:
calculate a random value from 0-sum(distinct boost)
SET #randval = (select ceil(rand() * sum(DISTINCT boost)) from test);
select the boost factor from all distinct boost factors which added up surpasses the random value
then we have in our 1st example 1 with a 0.1, 2 with a 0.2 and 7 with a 0.7 probability.
now select one random entry from all entries having this boost factor
PROBLEM: because the count of entries having one boost is always different. For example if there is only 1-boosted entry i get it in 1 of 10 calls, but if there are 1 million with 7, each of them is hardly ever returned...
so this doesnt work out :( trying to refine it.
I have to somehow include the count of entries with this boost factor ... but i am somehow stuck on that...
You need to generate a random number per row and weight it.
In this case, RAND(CHECKSUM(NEWID())) gets around the "per query" evaluation of RAND. Then simply multiply it by boost and ORDER BY the result DESC. The SUM..OVER gives you the total boost
DECLARE #sample TABLE (id int, boost int)
INSERT #sample VALUES (1, 1), (2, 2), (3, 7)
SELECT
RAND(CHECKSUM(NEWID())) * boost AS weighted,
SUM(boost) OVER () AS boostcount,
id
FROM
#sample
GROUP BY
id, boost
ORDER BY
weighted DESC
If you have wildly different boost values (which I think you mentioned), I'd also consider using LOG (which is base e) to smooth the distribution.
Finally, ORDER BY NEWID() is a randomness that would take no account of boost. It's useful to seed RAND but not by itself.
This sample was put together on SQL Server 2008, BTW
I dare to suggest straightforward solution with two queries, using cumulative boost calculation.
First, select sum of boosts, and generate some number between 0 and boost sum:
select ceil(rand() * sum(boost)) from table;
This value should be stored as a variable, let's call it {random_number}
Then, select table rows, calculating cumulative sum of boosts, and find the first row, which has cumulative boost greater than {random number}:
SET #cumulative_boost=0;
SELECT
id,
#cumulative_boost:=(#cumulative_boost + boost) AS cumulative_boost,
FROM
table
WHERE
cumulative_boost >= {random_number}
ORDER BY id
LIMIT 1;
My problem was similar: Every person had a calculated number of tickets in the final draw. If you had more tickets then you would have an higher chance to win "the lottery".
Since I didn't trust any of the found results rand() * multiplier or the one with -log(rand()) on the web I wanted to implement my own straightforward solution.
What I did and in your case would look a little bit like this:
(SELECT id, boost FROM foo) AS values
INNER JOIN (
SELECT id % 100 + 1 AS counter
FROM user
GROUP BY counter) AS numbers ON numbers.counter <= values.boost
ORDER BY RAND()
Since I don't have to run it often I don't really care about future performance and at the moment it was fast for me.
Before I used this query I checked two things:
The maximum number of boost is less than the maximum returned in the number query
That the inner query returns ALL numbers between 1..100. It might not depending on your table!
Since I have all distinct numbers between 1..100 then joining on numbers.counter <= values.boost would mean that if a row has a boost of 2 it would end up duplicated in the final result. If a row has a boost of 100 it would end up in the final set 100 times. Or in another words. If sum of boosts is 4212 which it was in my case you would have 4212 rows in the final set.
Finally I let MySql sort it randomly.
Edit: For the inner query to work properly make sure to use a large table, or make sure that the id's don't skip any numbers. Better yet and probably a bit faster you might even create a temporary table which would simply have all numbers between 1..n. Then you could simply use INNER JOIN numbers ON numbers.id <= values.boost