SQL Substring Subquery Taking Too Long

My current subquery takes 26 seconds to run. I'm using it as a subquery inside a larger query that altogether takes 2 minutes and 25 seconds to return one month of data.
Is there a faster query for this? ProcedureID contains both alpha and numeric characters, and I only want to pull the ProcedureIDs that begin with a numeric character.
SELECT DISTINCT
ProcedureID
FROM Transactions
WHERE Substring(ProcedureID,1,1) NOT LIKE '[A-z]%'

Run the query with the execution plan turned on; that should identify any indexes that may help. In addition, if you add a new one-character column populated with the first character of ProcedureID and put an index on that column, you should get better performance querying against it than against the SUBSTRING() expression you are using now.
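A minimal sketch of that idea in SQL Server syntax (the column and index names below are illustrative, not from the original post):
-- Computed column holding the first character of ProcedureID
ALTER TABLE Transactions
    ADD ProcedureIDFirstChar AS SUBSTRING(ProcedureID, 1, 1) PERSISTED;

-- Index it, including ProcedureID so the subquery is covered
CREATE INDEX IX_Transactions_FirstChar
    ON Transactions (ProcedureIDFirstChar) INCLUDE (ProcedureID);

-- The subquery can then filter on the indexed column
SELECT DISTINCT ProcedureID
FROM Transactions
WHERE ProcedureIDFirstChar LIKE '[0-9]';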

First, the issue is unlikely to be the substring(). The performance hog is the select distinct.
You can simplify the logic. Something like:
SELECT DISTINCT ProcedureID
FROM Transactions
WHERE ProcedureID < 'A' or ProcedureID >= '{' -- 'z' + 1
or:
WHERE ProcedureId >= '0' AND ProcedureId < ':' -- '9' + 1
The magic characters '{' and ':' are simply the characters that follow 'z' and '9' in ASCII order. They could be replaced by an expression such as CHAR(ASCII('9') + 1) if you prefer.
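A quick way to check those boundary characters (SQL Server syntax):
-- Returns ':' and '{', the characters immediately after '9' and 'z' in ASCII order
SELECT CHAR(ASCII('9') + 1) AS after_nine,
       CHAR(ASCII('z') + 1) AS after_z;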
However, this will probably have minimal effect on performance. An index on Transactions(ProcedureID) would help, because it covers the query/subquery.
If you really want help on the larger query, you should ask another question and provide the query that you really want optimized (or perhaps a representative simpler version).
EDIT:
You might actually find that a version like this is much faster with the right indexes:
SELECT p.ProcedureId
FROM Procedures p
WHERE p.ProcedureId >= '0' AND p.ProcedureId < ':' AND -- '9' + 1
EXISTS (SELECT 1 FROM Transactions t WHERE t.ProcedureId = p.ProcedureId);
This assumes that you have a table of which ProcedureId is the primary key.
Then, for performance, you want an index on Transactions(ProcedureId).
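Something like this (the index name is illustrative):
CREATE INDEX IX_Transactions_ProcedureId
    ON Transactions (ProcedureId);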

Try this:
WHERE IsNumeric(Substring(ProcedureID, 1, 1)) = 1
If you do this often enough, it might be worth creating a computed column that contains just the first character of ProcedureID.

It seems you want a regular expression rather than a LIKE. In MySQL that would be NOT REGEXP:
SELECT DISTINCT
ProcedureID
FROM Transactions
WHERE Substring(ProcedureID,1,1) NOT REGEXP '^[A-z]'

Related

Count all elements in an array

I have a table in which I save some data, including a list of numbers, like this:
numbers
(null)
،42593
،42593،42594،36725،42592،36725،42592
،42593،42594،36725،42592
،31046،36725،42592
I would like to count the number of elements in every row in SQL Server. The expected result is:
count
0
1
6
4
3
You could use a replacement trick here:
SELECT numbers,
COALESCE(LEN(numbers) - LEN(REPLACE(numbers, ',', '')), 0) AS num_elements
FROM yourTable;
The above trick works by counting the number of separators (assuming your data really has commas as separators; in your sample the separator looks like the Arabic comma '،', in which case use that character in the REPLACE). For example, your last sample value is:
,31046,36725,42592 => length is 18
310463672542592 => length is 15
Hence the difference in lengths yields the correct number of elements, 3.
Another idea is to use STRING_SPLIT:
SELECT y.numbers,
(SELECT COUNT(Value) - 1
FROM string_split(COALESCE(y.numbers,''),',')) AS num_elements
FROM yourtable AS y;
I know this looks a bit clumsy at first glance because of the -1 in the second line and the COALESCE in the third line. So why mention this option at all?
Well, the thing in your case that causes these complications is that your rows always start with a separator. That is quite unusual, and everything would be much easier without this leading comma in every row.
Let's assume you remove this comma in the future. Then the query becomes really simple and readable:
SELECT y.numbers,
(SELECT COUNT(Value)
FROM string_split(y.numbers,',')) AS num_elements
FROM yourtable AS y;
Try it out: db<>fiddle
Your data:
CREATE TABLE yourtable(
numbers VARCHAR(max)
);
INSERT INTO yourtable
(numbers) VALUES
(null),
('،42593'),
('،42593،42594،36725،42592،36725،42592'),
('،42593،42594،36725،42592'),
('،31046،36725،42592');
You need ISNULL and LEN:
select
ISNULL(len(numbers) - len(replace(numbers,'،','')) ,0) count
from yourtable
The other way is by using IIF and STRING_SPLIT, as follows:
SELECT IIF(count < 0, 0, count) count
FROM   (SELECT (SELECT Count(*) - 1
                FROM   STRING_SPLIT(Replace(Replace(numbers, 'R', ''), '،', 'R'), 'R')
               ) AS 'count'
        FROM   yourtable) A
dbfiddle

Hive casting function

In a Hive table, how can I add the '-' sign to a field, but only for random records? If I use the syntax below it changes all the records in the field to negative, but I want to change only random records to negative.
This is the syntax I used, which changed all the records to negative:
CAST(CAST(-1 AS DECIMAL(1,0)) AS DECIMAL(19,2))
*CAST(regexp_replace(regexp_replace(TRIM(column name),'\\-',''),'-','') as decimal(19,2)),
If you want to change random values to negative, why not use a case expression?
select (case when rand() < 0.5 then - column_name else column_name end)
Unlike your query, this assumes that the column is a number of some sort, because negating strings doesn't make much sense.
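Putting the two together, a rough sketch of what that could look like in Hive (the table and column names are placeholders, not from the original post):
-- amount_str is a hypothetical string column; cast it to a number, then flip the
-- sign for roughly half of the rows at random
SELECT CASE
         WHEN rand() < 0.5
           THEN -CAST(regexp_replace(TRIM(amount_str), '-', '') AS DECIMAL(19,2))
         ELSE CAST(regexp_replace(TRIM(amount_str), '-', '') AS DECIMAL(19,2))
       END AS signed_amount
FROM my_table;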

Search Through All Between Values SQL

I have data following this structure:
_ID _BEGIN _END
7003 99210 99217
7003 10225 10324
7003 111111
I want to look through every _BEGIN and _END and return all rows where the input value falls within the range, including the boundary values themselves (i.e. if 10324 is the input, row 2 would be returned).
I have tried this filter, but it does not work:
where #theInput between a._BEGIN and a._END
--THIS WORKS
where convert(char(7),'10400') >= convert(char(7),a._BEGIN)
--BUT ADDING THIS BREAKS AND RETURNS NOTHING
AND convert(char(7),'10400') < convert(char(7),a._END)
The less-than < and greater-than > operators work on CHAR-type columns (CHAR, VARCHAR, etc.) without any syntax error, but the comparison can go semantically wrong. Look at these examples:
1 - SELECT 'ab' BETWEEN 'aa' AND 'ac' # returns TRUE
2 - SELECT '2' BETWEEN '1' AND '10' # returns FALSE
The character '2' stored in a CHAR-type column compares greater than '1xxxxx', because strings are compared character by character from the left.
So you should CAST the types here. [Example is in MySQL - for standard compatibility change UNSIGNED to INTEGER]
WHERE CAST(#theInput as UNSIGNED)
BETWEEN CAST(a._BEGIN as UNSIGNED) AND CAST(a._END as UNSIGNED)
Better yet, change the column types to avoid this ambiguity in later use.
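Since the question uses CONVERT and a #theInput parameter, the same idea in SQL Server syntax would look roughly like this (a sketch; it keeps the question's #theInput placeholder and assumes the values fit in INT):
WHERE CAST(#theInput AS INT)
      BETWEEN CAST(a._BEGIN AS INT) AND CAST(a._END AS INT)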
This would be the obvious answer...
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #theInput between a._BEGIN and a._END
If the data is a string (assuming so here, as we don't know which database you are using), you could add this:
Declare #searchArg VARCHAR(30) = CAST(#theInput as VARCHAR(30));
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #searchArg between a._BEGIN and a._END
If you care about performance and you have a lot of data and indexes, you won't want to wrap the column values in function calls. You could in-line this conversion, but declaring the variable first ensures that your predicates stay sargable.
SELECT * FROM myTable a
WHERE
(CAST(#theInput AS char(7)) >= a._BEGIN AND #theInput < a._END);
I also saw several questions of the same type:
SQL "between" not inclusive
MySQL "between" clause not inclusive?
When I do queries like this, I usually try one side with the greater/less than on either side and work from there. Maybe that can help. I'm very slow, but I do lots of trial and error.
Or, use Tony's convert.
I suppose you can convert them to anything appropriate for your program, numeric or text.
Also, see here, http://technet.microsoft.com/en-us/library/aa226054%28v=sql.80%29.aspx.
I am not convinced you cannot do your CAST in the SELECT.
Nick, here is a MySQL version from SO, MySQL "between" clause not inclusive?

Absolute maxvalue comparison of columns in Firebird SQL

I want to perform a comparison on the specified columns in the database; the comparison logic should compare the numbers regardless of their signs, but return the original value with its sign.
For example, the code below works well, but as can be seen in the SELECT block it returns the absolute value of the columns. Is there any trick in Firebird 2.1 to overcome that?
SELECT a.ELM_NUM,a.COMBO, maxvalue(abs(a.N_1),abs(a.N_2)) as maxN from ntm a order by a.ELM_NUM
You can use a CASE condition:
SELECT a.ELM_NUM,a.COMBO,
CASE WHEN abs(a.N_1) > abs(a.N_2) THEN a.N_1 ELSE a.N_2 END as maxN
from ntm a
order by a.ELM_NUM

Oracle DBMS_LOB.INSTR and CONTAINS performance

Is there any performance difference between DBMS_LOB.INSTR and CONTAINS, or am I doing something wrong?
Here is my code:
SELECT DISTINCT ha.HRE_A_ID, ha.HRE_A_FIRSTNAME, ha.HRE_A_SURNAME, ha.HRE_A_CITY,
ha.HRE_A_EMAIL, ha.HRE_A_PHONE_MOBIL
FROM HRE_APPLICANT ha WHERE ha.HRE_A_STATUS_ID=1 AND ha.HRE_A_CURRENT_STATUS_ID <= '7'
AND ((DBMS_LOB.INSTR(hre_a_for_search,'java') > 0)
OR EXISTS
(SELECT 1 FROM gob_attachment, gob_table WHERE hre_a_id=gob_a_record_id
AND gob_a_table_id = gob_t_id AND gob_t_code = 'HRE_APPLICANT'
AND CONTAINS (gob_a_document, 'java') > 0))
ORDER BY HRE_A_SURNAME
and here are the last two lines changed to use INSTR:
AND dbms_lob.instr(gob_a_document,utl_raw.cast_to_raw('java')) <> 0))
ORDER BY HRE_A_SURNAME
My problem is that I would like to use INSTR instead of CONTAINS, but INSTR seems to be a lot slower than CONTAINS.
CONTAINS will use an Oracle Text index so you'd expect it to be much more efficient than something like INSTR that has to read the entire CLOB at runtime. If you generate the query plans for the two statements, I expect that you'll see that the difference is related to the Oracle Text index.
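For reference, CONTAINS only works against a column that has an Oracle Text index, created roughly like this (a sketch; the index name is illustrative):
-- Oracle Text (CONTEXT) index on the attachment document column
CREATE INDEX gob_a_document_ctx
  ON gob_attachment (gob_a_document)
  INDEXTYPE IS CTXSYS.CONTEXT;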
Why do you want to use INSTR rather than CONTAINS?