SQL Server LIKE for Integer

I have an integer column and I want to find all numbers starting with 1500.
I know I could use something like LEFT(accountid, 4) = 1500.
But is that an optimal solution, or is there a better approach?
I am using SQL Server 2005.

LEFT(CONVERT(varchar(20), accountid), 4) = '1500'

An INT is an INT is an INT - it's just a numeric value, and it doesn't "look" like any string value. If you want to compare with LIKE, you need a string, so you have to convert your INT to a string first.
If you need to search and compare against that string representation of your INT a lot, I would recommend making it a computed, persisted column on your table:
ALTER TABLE dbo.YourTable
ADD IntString AS LEFT(CAST(YourInt AS VARCHAR(20)), 4) PERSISTED
That way, you get a new column whose value is always up to date; because it is persisted, you can index it if needed, and you get all the benefits of comparing your "int" with the LIKE operator :-)
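For completeness, a sketch of how that column might then be indexed and queried - names follow the example above, and whether the index pays off depends on your workload:
CREATE INDEX IX_YourTable_IntString ON dbo.YourTable (IntString);
SELECT *
FROM dbo.YourTable
WHERE IntString = '1500';   -- or IntString LIKE '15%' for shorter prefixes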

If you're willing to trade storage space for speed, I would add a persisted, computed column to the table containing the string version of the integer, index it appropriately, and use that column for searching.

Assuming you know the minimum and maximum values for your column, you can do this:
select
stuff
from
table
where
(number = 1500)
or (number >= 15000 and number <= 15009)
or (number >= 150000 and number <= 150099)
or (number >= 1500000 and number <= 1500999)
or (number >= 15000000 and number <= 15009999)
-- add however many statements you need.
Or if you don't know the minimum and maximum, you could do this:
...
where
    number >= 1500 * POWER(10, FLOOR(LOG10(number)) - 3)
    and number < 1501 * POWER(10, FLOOR(LOG10(number)) - 3)
-- i.e. the first four digits of number are exactly 1500

If you want to go even further, why not create a super category for accounts? That would eliminate the need to perform LEFT() on an integer converted to varchar. To clarify: if you know all accounts whose id begins with 1500 are, say, accounts related to sales, you could give them all a super category of 1500. This only applies if such a hierarchical relationship exists, of course =).
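A rough sketch of that idea - the table name dbo.Accounts and the column AccountCategory are hypothetical, and the range assumes 8-digit account ids:
ALTER TABLE dbo.Accounts ADD AccountCategory int NULL;
UPDATE dbo.Accounts
SET AccountCategory = 1500
WHERE accountid BETWEEN 15000000 AND 15009999;  -- all 8-digit ids starting with 1500
SELECT * FROM dbo.Accounts WHERE AccountCategory = 1500;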

It all depends on the volume of data. If you are talking about millions of records, then using LEFT with CONVERT will not be quick. The best option would be to have a computed column that stores the first four digits; searching that column directly would be the fastest. But every insert or update would then take a little more time, so it all depends on how many rows you are dealing with.
HTH

select columnname from dbo.TB
where CONVERT(varchar(20), columnname) like '15' + '%'

Related

How to do some aggregate calculation through several columns one by one?

I am new to SQL and need some help. I have a table with some numeric values, and I need to populate a column (currently all NULL) with calculations from other columns in the table.
For example, I have some values and a total value, and I need another column to hold the percentage between the two.
I have many columns that need populating from calculations based on different columns. For example, a column named "Risk1" is used to calculate and populate another column called "1Per". My code looks something like this:
UPDATE DPRA2_Export
SET "1Per" = ((CAST(Risk1 AS DECIMAL (38,2))/CAST(GrandTotal AS DECIMAL(38,2))) * 100);
UPDATE DPRA2_Export
SET "2Per" = ((CAST(Risk2 AS DECIMAL (38,2))/CAST(GrandTotal AS DECIMAL(38,2))) * 100);
UPDATE DPRA2_Export
SET "3Per" = ((CAST(Risk3 AS DECIMAL (38,2))/CAST(GrandTotal AS DECIMAL(38,2))) * 100);
.................
It goes on like this.
Is there a way I can loop over this instead of writing it out over and over again? The only things that change in the code are the source column name "Risk%" and the SET column name "%Per".
Any ideas?
First, having columns with sequential names -- Risk1, Risk2, Risk3, and so on -- is suspicious. In general, this format is not a good fit for relational databases. Instead, the data should be stored with one row per risk and an identifier for the risk.
Such tables are often useful for output purposes, but not for storing data.
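What "one row per risk" might look like, as a hypothetical sketch (the names are made up; ExportId would reference the original row):
CREATE TABLE DPRA2_Risks (
    ExportId  int            NOT NULL,
    RiskId    int            NOT NULL,  -- 1, 2, 3, ...
    RiskValue decimal(38, 2) NOT NULL,
    PRIMARY KEY (ExportId, RiskId)
);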
Second, the explicit casts for the division are probably unnecessary. If you are using a database that does integer division and your values are integers, you can avoid it with something simpler, such as multiplying by 1.0.
Finally, you can issue a single update:
UPDATE DPRA2_Export
SET "1Per" = Risk1*1.0 / GrandTotal,
"2Per" = Risk2*1.0 / GrandTotal,
"3Per" = Risk3*1.0 / GrandTotal ;
You can construct the logic by querying for the columns, using the information_schema tables.
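One way to generate that single UPDATE, sketched here under the assumption that the source columns really are named Risk1, Risk2, ... and the targets 1Per, 2Per, ...: build the SET list from INFORMATION_SCHEMA.COLUMNS and run it as dynamic SQL.
DECLARE @sql nvarchar(max);
SELECT @sql = N'UPDATE DPRA2_Export SET '
    + STUFF((
        SELECT ', ' + QUOTENAME(REPLACE(c.COLUMN_NAME, 'Risk', '') + 'Per')
             + ' = ' + QUOTENAME(c.COLUMN_NAME) + ' * 1.0 / GrandTotal'
        FROM INFORMATION_SCHEMA.COLUMNS AS c
        WHERE c.TABLE_NAME = 'DPRA2_Export'
          AND c.COLUMN_NAME LIKE 'Risk[0-9]%'
        FOR XML PATH('')
      ), 1, 2, '');
-- Print @sql first if you want to inspect the generated statement before running it.
EXEC sp_executesql @sql;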

String ending in range of numbers

I have a column with data of the following structure:
aaa5644988
aaa4898494
aaa5642185
aaa5482312
aaa4648848
I have a range that can be anything, like 100-30000 for example. I want all values that end in a number within that range.
I tried
like '%[100-30000]'
but this doesn't work apparently.
I have seen a lot of similar questions but none of them solved my problem.
Edit: I'm using SQL Server 2008.
Example:
Value
aaa45645695
aaa28568720
aaa65818450
8789212
6566700
For the range 600-1200, I want to retrieve rows 1, 2 and 5, because they end with a number in that range.
In SQL, LIKE normally supports only the two wildcards % and _ (SQL Server's [...] form matches a single character, not a numeric range). That's why like '%[100-30000]' doesn't work.
Depending on your use case, there are two possible solutions to this problem:
If you only need to run this query two or three times (and don't care how long it takes), or the dataset is not very big, you can select all the data from this column and then do the filtering in another programming language.
Taking Ruby as an example, you can do:
column_data = connection.execute("select your_column from your_table")  # assuming `connection` is your DB connection
result = column_data.map { |x| x.gsub(/^.*[^\d]/, '').to_i }.select { |x| x >= 100 && x <= 30000 }
If you need to run this query regularly, I'd suggest adding a new column to the table that holds only the numeric part of the current column, which will give much better query performance:
SELECT *
FROM your_table
WHERE number_column BETWEEN 100 AND 30000
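A sketch of that second suggestion in SQL Server 2008 syntax - assuming the table and column are named your_table and value, and that every row ends in a run of digits, as in the samples:
ALTER TABLE dbo.your_table
ADD number_column AS
    CAST(SUBSTRING(value, PATINDEX('%[0-9]%', value), 20) AS bigint) PERSISTED;
CREATE INDEX IX_your_table_number ON dbo.your_table (number_column);
-- The BETWEEN query above can then use this index directly.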

Order by a field containing Numbers and Letters

I need to extract data from an existing Paradox database under Delphi XE2 (yes, more than 10 years divide them...).
I need to order the result on a field (id in the example) containing values such as '1', '2 a', '100', '1 b', '50 bis'... and get this:
- 1
- 1 b
- 2 a
- 50 bis
- 100
Maybe something like this could do it, but those keywords don't exist:
SELECT id, TRIM(TRIM(ALPHA FROM id)) as generated, TRIM(TRIM(NUMBER FROM id)) as generatedbis, etc
FROM "my.db"
WHERE ...
ORDER BY generated, generatedbis
How could I achieve such an ordering with Paradox?
Try this:
SELECT id, CAST('0' + id AS INTEGER) A
FROM "my.db"
ORDER BY A, id
These ideas spring to mind:
create a sort function in Delphi that does the sort client-side, using a comparison/mapping function that rearranges the string into something that is comparable, maybe lexicographically.
add a column to the table whose data you wish to sort, that contains a modification of the values that can be compared with a standard string comparison and thus will work with ORDER BY
add a stored function to paradox that does the modification of the values, and use this function in the ORDER BY clause.
By modification, I mean something like: separate the string into its components and re-join them with each component right-padded with enough spaces that all of the components line up at the same positions in the string (see the sketch after this list). This will only work reliably if you can say with confidence that no component value will exceed a certain length in the database.
I am making these suggestions with little to no knowledge of Paradox or Delphi, so take them with a grain of salt.
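To make the "modification" idea concrete, here is what the split looks like in SQL Server's T-SQL - purely an illustration, since Paradox local SQL has no PATINDEX; the same transformation would have to be done client-side in Delphi or stored in an extra column:
SELECT id,
       CAST(LEFT(id, PATINDEX('%[^0-9]%', id + ' ') - 1) AS int)  AS num_part,
       LTRIM(SUBSTRING(id, PATINDEX('%[^0-9]%', id + ' '), 50))   AS alpha_part
FROM my_table          -- hypothetical name; the original reads FROM "my.db"
ORDER BY num_part, alpha_part;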

INT vs VARCHAR in search

Which one of the following queries will be faster and more optimal (and why):
SELECT * FROM items WHERE w = 320 AND h = 200 (w and h are INT)
SELECT * FROM items WHERE dimensions = '320x200'(dimensions is VARCHAR)
Here are some actual measurements. (Using SQLite; may try it with MySQL later.)
Data = All 1,000,000 combinations of w, h ∈ {1...1000}, in randomized order.
CREATE TABLE items (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)
Average time (of 20 runs) to execute SELECT * FROM items WHERE w = 320 and h = 200 was 5.39±0.29 µs.
CREATE TABLE items (id INTEGER PRIMARY KEY, dimensions TEXT)
Average time to execute SELECT * FROM items WHERE dimensions = '320x200' was 5.69±0.23 µs.
There is no significant difference, efficiency-wise.
But
There is a huge difference in terms of usability. For example, if you want to calculate the area and perimeter of the rectangles, the two-column approach is easy:
SELECT w * h, 2 * (w + h) FROM items
Try to write the corresponding query for the other way.
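For contrast, here is roughly what just the area calculation looks like against the single dimensions column (T-SQL-style string functions for illustration; the SQLite setup above would need instr/substr instead):
SELECT CAST(LEFT(dimensions, CHARINDEX('x', dimensions) - 1) AS int)
     * CAST(SUBSTRING(dimensions, CHARINDEX('x', dimensions) + 1, 10) AS int) AS area
FROM items;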
Intuitively, if you do not create indexes on those columns, integer comparison seems faster.
In an integer comparison, you directly compare two 32-bit values for equality using logical operators.
Strings, on the other hand, are character arrays and have to be compared character by character, which is more work.
However, another point is that in the 2nd query you have one field to compare, while in the 1st query you have two. If you have 1,000,000 records and no indexes on the columns, that can mean 1,000,000 string comparisons in the worst case (either the last row is the one you're looking for, or nothing matches at all).
On the other hand, if you have 1,000,000 records and all of them have w = 320, then you'll also be comparing h, which means 2,000,000 comparisons. However, if you create indexes on those fields, IMHO the two will be almost identical, since the VARCHAR will be hashed (taking O(1) constant time) and then compared like an INT, taking O(log n) time.
Conclusion: it depends. Prefer indexes on searchable columns, and use ints.
Probably the only way to know that is to run it. I would suspect that if all columns used are indexed, there would be basically no difference. If INT is 4 bytes, it will be almost the same size as the string.
The one wrinkle is in how VARCHAR is stored. A fixed-size string (CHAR) might be faster than VARCHAR, but mostly because your SELECT * needs to go and fetch it.
The huge advantage of using INT is that you can do much more sophisticated filtering. That alone should be a reason to prefer it. What if you need a range, or just width, or you want to do math on width in the filtering? What about constraints based on the columns, or aggregates?
Also, when you get the values into your programming language, you won't need to parse them before using them (which takes time).
EDIT: Some other answers mention string compares. If the column is indexed, there won't be many string compares done, and it's possible to implement very fast comparison algorithms that don't loop byte-by-byte. You'd have to know the details of what MySQL does to know for sure.
The second query, as the chance of matching the exact string is smaller (which means a smaller set of matching records, but with greater cardinality).
The first query: the chance of matching the first column is higher, so more rows are potentially matched (lesser cardinality).
This assumes, of course, that indexes are defined for both scenarios.
The first one, because it is faster to compare numeric data.
It depends on the data and the available indexes. But it is quite possible for the VARCHAR version to be faster, because searching a single index can be faster than searching two. If the combination of values provides a unique (or "mostly" unique) result while each individual H/W value has multiple entries, then it could narrow things down to a much smaller set using the single index.
On the other hand, if you have a multi-column index on the two integer columns, that would likely be the most efficient.
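That multi-column index would be something like this (T-SQL syntax, names from the question):
CREATE INDEX IX_items_w_h ON items (w, h);
-- With equality predicates on both w and h, a single seek on this composite index
-- can satisfy the first query.
SELECT * FROM items WHERE w = 320 AND h = 200;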

Is it faster to check that a Date is (not) NULL or compare a bit to 1/0?

I'm just wondering what is faster in SQL (specifically SQL Server).
I could have a nullable column of type Date and compare that to NULL, or I could have a non-nullable Date column and a separate bit column, and compare the bit column to 1/0.
Is the comparison to the bit column going to be faster?
In order to check that a column IS NULL, SQL Server would actually just check a bit anyway: a NULL bitmap is stored for each row, indicating whether each column contains a NULL or not.
I just did a simple test for this:
DECLARE @d DATETIME,
        @b BIT = 0;
SELECT 1
WHERE @d IS NULL;
SELECT 2
WHERE @b = 0;
The actual execution plan results show the computation as exactly the same cost relative to the batch.
Maybe someone can tear this apart, but to me it seems there's no difference.
MORE TESTS
SET DATEFORMAT ymd;
CREATE TABLE #datenulltest
(
dteDate datetime NULL
)
CREATE TABLE #datebittest
(
dteDate datetime NOT NULL,
bitNull bit DEFAULT (1)
)
INSERT INTO #datenulltest ( dteDate )
SELECT CASE WHEN CONVERT(bit, number % 2) = 1 THEN '2010-08-18' ELSE NULL END
FROM master..spt_values
INSERT INTO #datebittest ( dteDate, bitNull )
SELECT '2010-08-18', CASE WHEN CONVERT(bit, number % 2) = 1 THEN 0 ELSE 1 END
FROM master..spt_values
SELECT 1
FROM #datenulltest
WHERE dteDate IS NULL
SELECT 2
FROM #datebittest
WHERE bitNull = CONVERT(bit, 1)
DROP TABLE #datenulltest
DROP TABLE #datebittest
[Execution plan screenshots omitted: the dteDate IS NULL result and the bitNull = 1 result showed the same cost.]
OK, so this extended test comes up with the same responses again.
We could do this all day - it would take some very complex query to find out which is faster on average.
All other things being equal, I would say the bit would be faster because it is a "smaller" data type. However, if performance is very important here (and I assume it is, given the question), then you should always test, as there may be other factors, such as indexes and caching, that affect this.
It sounds like you are trying to decide on a datatype for a field that will record whether an event X has happened or not: either a timestamp (when X happened) or just a bit (1 if X happened, otherwise 0). In this case I would be tempted to go for the Date, as it gives you more information (not only whether X happened, but also exactly when), which will most likely be useful in the future for reporting purposes. Only go against this if the minor performance gain really is more important.
Short answer: if you have only 1s and 0s, something like a bitmap index on 1/0 is extremely fast. NULLs are not indexed on certain SQL engines, so IS NULL and IS NOT NULL can be slow. However, do think about the entity semantics before dishing this out; it is always better to have a semantic table definition, if you know what I mean.
The speed comes from the ability to use indexes, not from data size in this case.
Edit
Please refer to Martin Smith's answer. That makes more sense for SQL Server; I got carried away by Oracle, my mistake.
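For what it's worth, on SQL Server specifically a filtered index (2008 and later) is one way to make the IS NULL check index-friendly if that path matters - a sketch with hypothetical table and column names:
CREATE NONCLUSTERED INDEX IX_Events_DateIsNull
ON dbo.Events (EventDate)
WHERE EventDate IS NULL;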
The bit will be faster, as loading the bit into memory will load only 1 byte while loading the date will take 8 bytes. The comparison itself will take the same time, but the loading from disk will take longer. Unless you use a very old server or need to load more than 10^8 rows, you won't notice anything.