How to limit the character amount you are selecting from - sql

I'm trying to average a field and it's very simple to do but there are some problems with some values. There are values I know are way too big and I was hoping to exclude them by the number of characters (I would probably put 4 characters max).
I'm unfamiliar with a sql clause that could execute this. If there is one that would be great.
select avg(convert(float,duration)) as averageduration
from AsteriskCalls where ISNUMERIC(duration) = 1
I expect the output to be around 500-1000 but it comes up as an 8 digit number.

That is easy enough:
select avg(convert(float,duration)) as averageduration
from AsteriskCalls
where ISNUMERIC(duration) = 1 and len(duration) <= 4;
This will not necessarily work, of course, because you could have '1E30', which would be a pretty big number. And it would miss '0.001', which is a pretty small number.
The more accurate method uses try_convert():
select avg(try_convert(float, duration)) as averageduration
from AsteriskCalls
where try_convert(float, duration) <= 1000.0
And that should probably really be:
where abs(try_convert(float, duration)) <= 1000.0
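To see why filtering on the value is more reliable than filtering on the character count, here is a minimal Python sketch. The sample strings and the try_float helper are made up for illustration; the helper only mimics the idea behind try_convert(), returning nothing instead of raising an error.

```python
def try_float(text):
    """Hypothetical stand-in for try_convert(float, ...): None on failure."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

# Made-up sample values for the duration column.
durations = ["480", "950", "1E30", "0.001", "abc", "12345678"]

# Filter by character count: keeps '1E30' (huge) and drops '0.001' (tiny).
by_length = [float(d) for d in durations
             if try_float(d) is not None and len(d) <= 4]

# Filter by the converted value: keeps exactly the values in range.
by_value = [v for v in map(try_float, durations)
            if v is not None and abs(v) <= 1000.0]

print(by_length)
print(by_value)
```

The length filter lets 1E30 through, which would wreck the average, while the value filter keeps only the rows that are actually small enough.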


int64 overflow in sampling n number of rows (not %)

The below script is to randomly sample an approximate number of rows (50k).
SELECT *
FROM table
qualify rand() <= 50000 / count(*) over()
This has worked a handful of times before, hence, I was shocked to find this error this morning:
int64 overflow: 8475548256593033885 + 6301395400903259047
I have read this post. But as I am not summing, I don't think it is applicable.
The table in question has 267,606,559 rows.
Looking forward to any ideas. Thank you.
I believe counting is actually a sum the way BQ (and other databases) compute counts. You can see this by viewing the Execution Details/Graph (in the BQ UI). This is true even on a simple select count(*) from table query.
For your problem, consider something simpler like:
select *, rand() as my_rand
from table
order by my_rand
limit 50000
Also, if you know the rough size of your data or don't need exactly 50K, consider using the tablesample method:
select * from table
tablesample system (10 percent)
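The ORDER BY ... LIMIT idea can be tried out locally. This is only a sketch using Python's built-in sqlite3 module as a stand-in, so the table name calls, the row count and SQLite's random() (rather than BigQuery's rand()) are all assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO calls (id) VALUES (?)",
                 [(i,) for i in range(1000)])

# Sort by a random key and keep the first N rows: always exactly N rows,
# and no count(*) over() is needed.
sample_ids = [row[0] for row in conn.execute(
    "SELECT id FROM calls ORDER BY random() LIMIT 50")]

print(len(sample_ids))
```

Unlike the rand() <= n / count(*) filter, this returns exactly the requested number of rows, at the cost of sorting the whole table.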

How to query column with letters on SQL?

I'm new to this.
I have a column: (chocolate_weight) On the table : (Chocolate) which has g at the end of every number, so 30x , 2x5g,10g etc.
I want to remove the letter at the end and then query it to show any that weigh greater than 35.
So far I have done
Select *
From Chocolate
Where chocolate_weight IN
(SELECT
REPLACE(chocolote_weight,'x','') From Chocolate) > 35
It is coming back with 0 , even though there are many that weigh more than 35.
Any help is appreciated
Thanks
If 'g' is always the suffix then your current query is along the right lines, but you don't need the IN; you can do the replace directly in the where clause:
SELECT *
FROM Chocolate
WHERE CAST(REPLACE(chocolate_weight,'g','') AS DECIMAL(10, 2)) > 35;
N.B. This works in both the tagged DBMS SQL-Server and MySQL
This will fail (although only silently in MySQL) if any value contains a unit other than grams, so I would strongly suggest fixing your design if it is not too late: store the weight as a numeric type and drop the 'g' completely if you only ever store grams. If you use multiple units, you may wish to standardise so that everything is in grams, or alternatively store the two things in separate columns, one decimal/int column for the numeric value and a separate column for the unit, e.g.
Weight  Unit
10      g
150     g
1000    lb
The issue you will have here though is that you will have to start doing conversions in your queries to ensure you get all results. It is easier to do the conversion once when the data is saved and use a standard measure for all records.
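As a rough illustration of converting once at save time, here is a small Python sketch using the built-in sqlite3 module. The weight_g column, the sample rows and the GRAMS_PER_UNIT lookup are hypothetical, not part of your schema:

```python
import sqlite3

# Hypothetical conversion factors to the standard unit (grams).
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "lb": 453.592}

def to_grams(value, unit):
    """Convert a (value, unit) pair to grams before it is stored."""
    return value * GRAMS_PER_UNIT[unit]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Chocolate (name TEXT, weight_g REAL)")

# Made-up incoming data in mixed units.
incoming = [("bar", 30, "g"), ("block", 150, "g"), ("slab", 1, "lb")]
conn.executemany("INSERT INTO Chocolate VALUES (?, ?)",
                 [(name, to_grams(v, u)) for name, v, u in incoming])

# The query is now a plain numeric comparison, with no REPLACE or CAST.
heavy = [row[0] for row in conn.execute(
    "SELECT name FROM Chocolate WHERE weight_g > 35 ORDER BY name")]
print(heavy)
```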

Converting pounds to kilos in SQL

I am trying to pull data from a table with the filter weight < 25 kgs , but my table has weight in pounds, I tried using below sql can some one please tell me is this the right way to do it or is there any other way .
select * from dbo.abc
where (round((WEIGHT * 0.453592 ),0) < 25)
Your solution would work, but it's not sargable. A better solution would be to convert your 25 kg to lbs. That way, if you have an index on your WEIGHT column, the query analyzer could make use of it.
One additional note: Why round to 0 decimal places? You'll lose accuracy that way. Unless you have some requirement to do so, I'd drop the rounding. It's unnecessary overhead.
As other people mentioned, you don't want to convert weight as it will cause SQL Server not to use your index. So try this instead:
SELECT *
FROM dbo.abc
WHERE WEIGHT < ROUND(25/.453592,4)
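If you want to see the difference for yourself, the following sketch compares the two forms. It uses Python's built-in sqlite3 module rather than SQL Server, so the plan text is SQLite's, and the index name ix_weight and the sample rows are assumptions:

```python
import sqlite3

KG_PER_LB = 0.453592
limit_lb = 25 / KG_PER_LB  # convert the constant once: about 55.1156 lb

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER PRIMARY KEY, name TEXT, WEIGHT REAL)")
conn.execute("CREATE INDEX ix_weight ON abc (WEIGHT)")
conn.executemany("INSERT INTO abc (name, WEIGHT) VALUES (?, ?)",
                 [("a", 10), ("b", 50), ("c", 55), ("d", 56), ("e", 100)])

def plan(sql, params):
    """Return SQLite's query plan description as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)

computed_sql = "SELECT id FROM abc WHERE WEIGHT * ? < 25"  # math on the column
sargable_sql = "SELECT id FROM abc WHERE WEIGHT < ?"       # math on the constant

computed_plan = plan(computed_sql, (KG_PER_LB,))
sargable_plan = plan(sargable_sql, (limit_lb,))

computed_ids = sorted(r[0] for r in conn.execute(computed_sql, (KG_PER_LB,)))
sargable_ids = sorted(r[0] for r in conn.execute(sargable_sql, (limit_lb,)))

print(computed_plan)
print(sargable_plan)
```

Both queries return the same rows here, but only the second one can seek into the index, because the column is left untouched on its side of the comparison.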

SQL math syntax difficulty with division

Can't understand why I can't get the correct answer. I'm trying to calculate a net margin percentage but the divide portion is being ignored. Hopefully really simple one?
SUM(
(dbo.K3_TradeTeam_Sales2.TotalSales - dbo.K3_TradeTeam_SalesReturn3.TotalCredits) -
ISNULL(dbo.K3_TradeTeam_Purch1.TotalPurchases, 0) /
dbo.K3_TradeTeam_Sales2.TotalSales
) AS NetMargin
To get a net margin you probably want an expression like this,
(
SUM(
dbo.K3_TradeTeam_Sales2.TotalSales
-
COALESCE(dbo.K3_TradeTeam_SalesReturn3.TotalCredits, 0.0)
-
COALESCE(dbo.K3_TradeTeam_Purch1.TotalPurchases, 0.0)
)
/
SUM(
dbo.K3_TradeTeam_Sales2.TotalSales
)
) AS NetMargin
However, without knowing your schema I couldn't say for sure. This would also fail miserably when you have 0.0 total sales.
I'm assuming that you want the sum of net profits for each relation divided by the sum of the total revenue for each relation. This would give you the cumulative profit margin for all relationships.
The sql logic should work.
Are you sure that TotalPurchases is not null? If it is, it looks as if the divide is ignored, but the divide actually just returns 0, so you are subtracting zero.
examples:
select SUM((200 - 100) - ISNULL(100, 0) / 10) AS NetMargin --returns 90
select SUM((200 - 100) - ISNULL(null, 0) / 10) AS NetMargin --returns 100
When I am having trouble getting a math formula to work, I temporarily change the select to show the value of each field as well as the calculation. Then I can do the calculation manually to see how I am getting the results I am getting, which almost always points me to the problem.
select
dbo.K3_TradeTeam_Sales2.TotalSales,
dbo.K3_TradeTeam_SalesReturn3.TotalCredits,
ISNULL(dbo.K3_TradeTeam_Purch1.TotalPurchases, 0),
dbo.K3_TradeTeam_Sales2.TotalSales
From...
I also never do a division without a case statement to handle the divisor being 0 problem. Even if I think it can never happen, I have seen it burn people too often not to consider that something (including bad data entry) might make it happen in the future.
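To make the precedence and zero-divisor points concrete, here is a small Python sketch. The net_margin function and the sample rows are made up for illustration, and None plays the role of NULL:

```python
def net_margin(rows):
    """Sum of (sales - credits - purchases) divided by sum of sales.

    Returns None instead of dividing when total sales are zero.
    """
    total_sales = sum(r["sales"] for r in rows)
    if total_sales == 0:
        return None  # guard the divisor, like a CASE around the division
    profit = sum(r["sales"] - (r["credits"] or 0) - (r["purchases"] or 0)
                 for r in rows)
    return profit / total_sales

# Made-up rows; a missing value is treated as zero.
rows = [
    {"sales": 200, "credits": 100, "purchases": None},
    {"sales": 300, "credits": None, "purchases": 150},
]

# Division binds tighter than subtraction, so only the last term is divided.
unparenthesised = (200 - 100) - 100 / 10

print(net_margin(rows))
print(unparenthesised)
```

Division happens before subtraction, so without parentheses around the whole numerator only the purchases term gets divided, which is the same trap as in the original expression.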

Why use the BETWEEN operator when we can do without it?

Both of the queries below work well, so I am confused about why we should ever use BETWEEN, especially since I have found that BETWEEN behaves differently in different databases, according to w3schools.
SELECT *
FROM employees
WHERE salary BETWEEN 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000
AND salary <= 15000;
BETWEEN can help to avoid unnecessary reevaluation of the expression:
SELECT AVG(RAND(20091225) BETWEEN 0.2 AND 0.4)
FROM t_source;
---
0.1998
SELECT AVG(RAND(20091225) >= 0.2 AND RAND(20091225) <= 0.4)
FROM t_source;
---
0.3199
t_source is just a dummy table with 1,000,000 records.
Of course this can be worked around using a subquery, but in MySQL it's less efficient.
And of course, BETWEEN is more readable. It takes 3 times to use it in a query to remember the syntax forever.
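The same effect can be reproduced outside the database. The sketch below is Python rather than MySQL, so the generator and the sample size are assumptions, but the logic is the same: the single-evaluation form tests one random draw against both bounds, while the expanded form tests two independent draws.

```python
import random

def between(value, low, high):
    """The value is computed once by the caller, then compared to both bounds."""
    return low <= value <= high

n = 100_000

# One draw per row, like RAND() BETWEEN 0.2 AND 0.4.
rng = random.Random(20091225)
once = sum(between(rng.random(), 0.2, 0.4) for _ in range(n)) / n

# Two separate draws per row, like RAND() >= 0.2 AND RAND() <= 0.4.
rng = random.Random(20091225)
twice = sum(rng.random() >= 0.2 and rng.random() <= 0.4
            for _ in range(n)) / n

print(once)
print(twice)
```

The first ratio comes out near 0.2 (the width of the range) and the second near 0.8 * 0.4 = 0.32, which lines up with the two averages above.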
In SQL Server and MySQL, LIKE against a constant with non-leading '%' is also a shorthand for a pair of >= and <:
SET SHOWPLAN_TEXT ON
GO
SELECT *
FROM master
WHERE name LIKE 'string%'
GO
SET SHOWPLAN_TEXT OFF
GO
|--Index Seek(OBJECT:([test].[dbo].[master].[ix_name_desc]), SEEK:([test].[dbo].[master].[name] < 'strinH' AND [test].[dbo].[master].[name] >= 'string'), WHERE:([test].[dbo].[master].[name] like 'string%') ORDERED FORWARD)
However, LIKE syntax is more legible.
Using BETWEEN has extra merits when the expression that is compared is a complex calculation rather than just a simple column; it saves writing out that complex expression twice.
BETWEEN in T-SQL supports NOT operator, so you can use constructions like
WHERE salary not between 5000 AND 15000;
In my opinion it's clearer for a human than
WHERE salary < 5000 OR salary > 15000;
And finally, if you type the column name just once, there are fewer chances to make a mistake.
The version with "between" is easier to read. If I were to use the second version I'd probably write it as
5000 <= salary and salary <= 15000
for the same reason.
I vote @Quassnoi - correctness is a big win.
I usually find literals more useful than syntax symbols like <, <=, >, >=, != etc. Yes, we need accurate results, and with literals I at least avoid the risk of visually misreading or reversing the meaning of the symbols. If you use <= and get logically incorrect output from your select query, you may wander for some time before concluding that you wrote <= in place of >=. Hope I am clear.
And aren't we shortening the code (along with making it more higher-level-looking), which means more concise and easy to maintain?
SELECT *
FROM employees
WHERE salary between 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000 AND salary <= 15000;
First query uses only 10 words and second uses 12!
Personally, I wouldn't use BETWEEN, simply because there seems no clear definition of whether it should include, or exclude, the values which serve to bound the condition, in your given example:
SELECT *
FROM employees
WHERE salary between 5000 AND 15000;
The range could include the 5000 and 15000, or it could exclude them.
Syntactically I think it should exclude them, since the values themselves are not between the given numbers. But my opinion is precisely that, whereas using operators such as >= is very specific, and less likely to change between databases, or between increments/versions of the same.
Edited in response to Pavel and Jonathan's comments.
As noted by Pavel, ANSI SQL (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), as far back as 1992, mandates that the end-points be considered within the returned data, equivalent to X >= lower_bound AND X <= upper_bound:
8.3
Function
Specify a range comparison.
Format
<between predicate> ::=
<row value constructor> [ NOT ] BETWEEN
<row value constructor> AND <row value constructor>
Syntax Rules
1) The three <row value constructor>s shall be of the same degree.
2) Let respective values be values with the same ordinal position
in the two <row value constructor>s.
3) The data types of the respective values of the three <row value
constructor>s shall be comparable.
4) Let X, Y, and Z be the first, second, and third <row value con-
structor>s, respectively.
5) "X NOT BETWEEN Y AND Z" is equivalent to "NOT ( X BETWEEN Y AND
Z )".
6) "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z".
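Rule 6 is easy to check on a real engine. Here is a quick sketch with Python's built-in sqlite3 module; SQLite is just the engine I picked for the test and the salary values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(s,) for s in (4999, 5000, 10000, 15000, 15001)])

def salaries(where):
    """Run the salary query with the given WHERE clause."""
    sql = "SELECT salary FROM employees WHERE " + where + " ORDER BY salary"
    return [row[0] for row in conn.execute(sql)]

inside = salaries("salary BETWEEN 5000 AND 15000")
explicit = salaries("salary >= 5000 AND salary <= 15000")
outside = salaries("salary NOT BETWEEN 5000 AND 15000")

print(inside, explicit, outside)
```

Both end-points come back with BETWEEN, and the result matches the explicit >= / <= version.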
If the endpoints are inclusive, then BETWEEN is the preferred syntax.
Fewer references to a column mean fewer spots to update when things change. It's the engineering principle that fewer parts means less that can break.
It also means less possibility of someone putting a bracket in the wrong place when including an OR, for example:
WHERE salary BETWEEN 5000 AND (15000
OR ...)
...you'll get an error if you put the bracket around the AND part of a BETWEEN statement. Versus:
WHERE salary >= 5000
AND (salary <= 15000
OR ...)
...you'd only know there's a problem when someone reviews the data returned from the query.
Semantically, the two expressions have the same result.
However, BETWEEN is a single predicate, instead of two comparison predicates combined with AND. Depending on the optimizer provided by your RDBMS, a single predicate may be easier to optimize than two predicates.
Although I expect most modern RDBMS implementations should optimize the two expressions identically.
worse if it's
SELECT id FROM entries
WHERE
(SELECT COUNT(id) FROM anothertable WHERE something LEFT JOIN something ON...)
BETWEEN entries.min AND entries.max;
Rewrite this one with your syntax without using temporary storage.
I'd better use the 2nd one, as you always know if it's <= or <
In SQL, I agree that BETWEEN is mostly unnecessary, and can be emulated syntactically with 5000 <= salary AND salary <= 15000. It is also limited; I often want to apply an inclusive lower bound and an exclusive upper bound: @start <= when AND when < @end, which you can't do with BETWEEN.
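A half-open range like that is the usual choice for dates, since consecutive ranges then never overlap. A minimal Python sketch, with a hypothetical in_half_open helper and made-up dates:

```python
from datetime import datetime

def in_half_open(when, start, end):
    """Inclusive lower bound, exclusive upper bound: start <= when < end."""
    return start <= when < end

# Made-up month boundaries.
start = datetime(2024, 1, 1)
end = datetime(2024, 2, 1)

first_instant = in_half_open(start, start, end)
last_second = in_half_open(datetime(2024, 1, 31, 23, 59, 59), start, end)
next_month = in_half_open(end, start, end)

print(first_instant, last_second, next_month)
```

The first instant of February is excluded from January's range, so it belongs to exactly one month. An inclusive BETWEEN would count it in both.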
OTOH, BETWEEN is convenient if the value being tested is the result of a complex expression.
It would be nice if SQL and other languages would follow Python's lead in using proper mathematical notation: 5000 <= salary <= 15000.
One small tip that will make your code more readable: use < and <= in preference to > and >=.