Specific string matching

I am working in SQL Server 2012. In my table, there is a column called St_Num and its data is like this:
St_Num                status
--------------------  ------
128 TIMBER RUN DR     EXP
128 TIMBER RUN DRIVE  EXP
As you can see, the data contains spelling variations. What I would like is this: if the number (128 in this case) and the first 3 letters of the street name in the St_Num column are the same, the two rows should be treated as the same address. The output should then be:
St_Num                status
--------------------  ------
128 TIMBER RUN DR     EXP
I searched around and found that the LEFT or SUBSTRING functions might be handy here, but I have no idea how to use them to get what I need, or even whether they can solve my issue. Any help getting the desired output would be great.

This will output only the first of the matching rows:
with cte as (
select *,
row_number() over (order by (select null)) rn
from tablename
)
select St_Num, status from cte t
where not exists (
select 1 from cte
where
left(St_Num, 7) = left(t.St_Num, 7)
and
rn < t.rn
)
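To try the "first row per prefix" idea without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. It is an adaptation, not the answer's exact T-SQL: SQLite's substr stands in for LEFT, and the implicit rowid stands in for row_number() over (order by (select null)), so "first" means first inserted. The table name addresses and the helper dedupe are hypothetical.

```python
import sqlite3


def dedupe(rows):
    """Keep only the first row (in insertion order) for each 7-character St_Num prefix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (St_Num TEXT, status TEXT)")
    conn.executemany("INSERT INTO addresses VALUES (?, ?)", rows)
    # rowid plays the role of rn: a row survives only if no earlier row
    # has the same 3-digit number + space + first 3 letters.
    query = """
        SELECT St_Num, status
        FROM addresses t
        WHERE NOT EXISTS (
            SELECT 1 FROM addresses e
            WHERE substr(e.St_Num, 1, 7) = substr(t.St_Num, 1, 7)
              AND e.rowid < t.rowid
        )
        ORDER BY t.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


data = [
    ("128 TIMBER RUN DR", "EXP"),
    ("128 TIMBER RUN DRIVE", "EXP"),
    ("130 OAK ST", "EXP"),
]
print(dedupe(data))
```

Like the T-SQL version, the fixed 7-character prefix assumes a 3-digit number.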

This could possibly be done with a NOT EXISTS subquery, in the same way that you would eliminate duplicates in a table:
SELECT St_Num, status
FROM <your_table> a
WHERE NOT EXISTS (SELECT 1
FROM <your_table> b
WHERE SUBSTRING(b.St_Num, 1, 7) = SUBSTRING(a.St_Num, 1, 7)
AND b.St_Num < a.St_Num);
The b.St_Num < a.St_Num condition stops a row from matching itself, so only the alphabetically first variant of each prefix survives. However, this only works if the number is guaranteed to be 3 characters long, or if you don't mind it comparing more than 3 letters when the number is shorter.

You can group by status and SUBSTRING(St_Num, 1, 7), i.e. the 3-digit number, the space and the first 3 letters:
with t(St_Num, status) as
(
select '128 TIMBER RUN DR' ,'EXP' union all
select '128 TIMBER RUN DRIVE','EXP'
)
select min(St_Num) as St_Num, status
from t
group by status, substring(St_Num,1,7);
St_Num            status
----------------- ------
128 TIMBER RUN DR EXP

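I don't really approve of your matching logic . . . but that is not your question. The big issue is how long the number before the street name is. So, you can get the shortest of the addresses using:
select distinct t.*
from t
where not exists (select 1
from t t2
where left(t2.st_num, patindex('%[a-zA-Z]%', t2.st_num) + 2) = left(t.st_num, patindex('%[a-zA-Z]%', t.st_num) + 2) and
len(t2.st_num) < len(t.st_num)
);
PATINDEX locates the first letter, so each prefix is the number plus the first 3 letters of the street, whatever the length of the number; a row is kept only if no row with the same prefix is shorter.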

I still have a feeling that your criteria are not enough to match the same address, but this might help, since it also takes the length of the number into account:
WITH ParsedAddresses(st_num, status, number)
AS
(
SELECT st_num,
status,
number = ROW_NUMBER() OVER(PARTITION BY LEFT(st_num, CHARINDEX(' ', st_num) + 3) ORDER BY LEN(st_num))
FROM <table_name>
)
SELECT st_num, status FROM ParsedAddresses
WHERE number = 1
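The length-aware prefix can be sketched the same way in SQLite, assuming the bundled SQLite is 3.25 or later (needed for ROW_NUMBER). instr plays the role of CHARINDEX, substr of LEFT and length of LEN; the table name addresses, the helper shortest_per_prefix and the sample rows are hypothetical.

```python
import sqlite3


def shortest_per_prefix(rows):
    """Return the shortest st_num for each 'number + space + 3 letters' prefix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (st_num TEXT, status TEXT)")
    conn.executemany("INSERT INTO addresses VALUES (?, ?)", rows)
    # instr() finds the first space, so the prefix adapts to numbers of any
    # length, like CHARINDEX(' ', st_num) + 3 in the T-SQL answer.
    query = """
        WITH parsed AS (
            SELECT st_num, status,
                   ROW_NUMBER() OVER (
                       PARTITION BY substr(st_num, 1, instr(st_num, ' ') + 3)
                       ORDER BY length(st_num)
                   ) AS rn
            FROM addresses
        )
        SELECT st_num, status FROM parsed WHERE rn = 1 ORDER BY st_num
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_rows = [
    ("128 TIMBER RUN DRIVE", "EXP"),
    ("128 TIMBER RUN DR", "EXP"),
    ("7 TIMBER RUN DRIVE", "EXP"),
    ("7 TIMBER RUN DR", "EXP"),
]
print(shortest_per_prefix(sample_rows))
```

With a fixed 7-character prefix, "7 TIMBER RUN DR" would be compared on "7 TIMBE"; locating the first space keeps the comparison at the number plus 3 letters.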

Related

SQL Query Help - Negative reporting

Perhaps somebody can help with ideas or a solution. A user asked me for a negative report. We have a table of tickets, and each ticket has a ticket number. Selecting the existing numbers would be easy, but the user wants a list of the missing tickets between the first and the last ticket in the system.
E.g. Select TicketNr from Ticket order by TicketNr
Result: 1, 2, 4, 7, 11
But the result we actually want is 3, 5, 6, 8, 9, 10.
CREATE TABLE [dbo].[Ticket](
[pknTicketId] [int] IDENTITY(1,1) NOT NULL,
[TicketNr] [int] NULL
) ON [PRIMARY]
GO
SQL Server 2016 - TSQL
Any ideas?
A bit more information: all the solutions so far work on a small table, but our production database has over 4 million tickets, and that is where we need to find the missing ones.

First get the minimum and maximum, then generate all possible ticket numbers, and finally select the ones that are missing:
;WITH FirstAndLast AS
(
SELECT
MinTicketNr = MIN(T.TicketNr),
MaxTicketNr = MAX(T.TicketNr)
FROM
Ticket AS T
),
AllTickets AS
(
SELECT
TicketNr = MinTicketNr,
MaxTicketNr = T.MaxTicketNr
FROM
FirstAndLast AS T
UNION ALL
SELECT
TicketNr = A.TicketNr + 1,
MaxTicketNr = A.MaxTicketNr
FROM
AllTickets AS A
WHERE
A.TicketNr + 1 <= A.MaxTicketNr
)
SELECT
A.TicketNr
FROM
AllTickets AS A
WHERE
NOT EXISTS (
SELECT
'missing ticket'
FROM
Ticket AS T
WHERE
A.TicketNr = T.TicketNr)
ORDER BY
A.TicketNr
OPTION
(MAXRECURSION 32000) -- the hint accepts 0 to 32767; 0 removes the limit, which a range wider than 32767 tickets needs
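The same min/max range idea can be checked with Python's sqlite3 module. This is a sketch assuming SQLite syntax, which spells the CTE WITH RECURSIVE and has no MAXRECURSION hint; the helper missing_tickets is hypothetical, and the sample data is the one from the question.

```python
import sqlite3


def missing_tickets(ticket_numbers):
    """Generate every number between MIN and MAX, then keep those absent from Ticket."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Ticket (TicketNr INTEGER)")
    conn.executemany("INSERT INTO Ticket VALUES (?)", [(n,) for n in ticket_numbers])
    query = """
        WITH RECURSIVE all_tickets(TicketNr, MaxTicketNr) AS (
            -- anchor: the first ticket number, carrying the last one along
            SELECT MIN(TicketNr), MAX(TicketNr) FROM Ticket
            UNION ALL
            -- recursive step: next number until the last ticket is reached
            SELECT TicketNr + 1, MaxTicketNr
            FROM all_tickets
            WHERE TicketNr + 1 <= MaxTicketNr
        )
        SELECT a.TicketNr
        FROM all_tickets a
        WHERE NOT EXISTS (SELECT 1 FROM Ticket t WHERE t.TicketNr = a.TicketNr)
        ORDER BY a.TicketNr
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(missing_tickets([1, 2, 4, 7, 11]))
```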

If you can accept the results in a different format, the following will do what you want:
select TicketNr + 1 as first_missing,
next_TicketNr - 1 as last_missing,
(next_TicketNr - TicketNr - 1) as num_missing
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_TicketNr
from Ticket t
) t
where next_TicketNr <> TicketNr + 1;
This shows each sequence of missing ticket numbers on a single row, rather than a separate row for each of them.
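A sketch of the LEAD-based gap report in SQLite (window functions need SQLite 3.25 or later). One deliberate change, noted in the code: it tests next_TicketNr > TicketNr + 1 rather than <>, so duplicate ticket numbers are not reported as a gap of -1. The helper missing_ranges is hypothetical.

```python
import sqlite3


def missing_ranges(ticket_numbers):
    """Return (first_missing, last_missing, num_missing) for each gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Ticket (TicketNr INTEGER)")
    conn.executemany("INSERT INTO Ticket VALUES (?)", [(n,) for n in ticket_numbers])
    # Changed from <> to > so that duplicate numbers do not count as a gap.
    query = """
        SELECT TicketNr + 1 AS first_missing,
               next_TicketNr - 1 AS last_missing,
               next_TicketNr - TicketNr - 1 AS num_missing
        FROM (SELECT TicketNr,
                     LEAD(TicketNr) OVER (ORDER BY TicketNr) AS next_TicketNr
              FROM Ticket) t
        WHERE next_TicketNr > TicketNr + 1
        ORDER BY first_missing
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(missing_ranges([1, 2, 4, 7, 11]))
```

The last ticket has no successor, so LEAD returns NULL and the comparison filters that row out.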
If you do use a recursive CTE, I would recommend doing it only for the missing tickets:
with cte as (
select (TicketNr + 1) as missing_TicketNr
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_ticketNr
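from Ticket t
) t
where next_TicketNr <> TicketNr + 1
union all
select missing_TicketNr + 1
from cte
where not exists (select 1 from Ticket t2 where t2.TicketNr = cte.missing_TicketNr + 1)
)
select *
from cte
option (maxrecursion 0);
This version starts with the first missing number of each gap, then keeps adding the next number for as long as that number is not found in the table. The recursion depth equals the longest run of missing numbers, so option (maxrecursion 0) lifts SQL Server's default limit of 100.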

One method is to use a recursive CTE to find the missing ticket numbers:
with missing as (
select min(TicketNr) as mnt, max(TicketNr) as mxt
from ticket t
union all
select mnt+1, mxt
from missing m
where mnt < mxt
)
select m.*
from missing m
where not exists (select 1 from Ticket t where t.TicketNr = m.mnt)
option (maxrecursion 0);

This should do the trick:
declare @ticketsTable table (ticketNo int not null)
insert @ticketsTable (ticketNo) values (1),(2),(4),(7),(11)
;with cte1(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo
, 0
, row_number() over (order by ticketNo)
from @ticketsTable
)
, cte2(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo, isMissing, sequenceNo
from cte1
union all
select a.ticketNo + 1
, 1
, a.sequenceNo
from cte2 a
inner join cte1 b
on b.sequenceNo = a.sequenceNo + 1
and b.ticketNo != a.ticketNo + 1
)
select *
from cte2
where isMissing = 1
order by ticketNo
It works by collecting all of the existing tickets, marking them as existing, and assigning each a consecutive number giving their order in the original list.
We can then find the gaps by looking for any spot where the sequence number points to the next record but the ticket numbers are not consecutive.
Finally, we recursively fill in the gaps: starting at the beginning of each gap, we add new records until the ticket numbers on either side of that gap are consecutive.

I think this gives you the easiest solution:
with cte as(
select max(TicketNr) maxnum,min(TicketNr) minnum from Ticket )
select a.number FROM master..spt_values a,cte
WHERE Type = 'P' and number < cte.maxnum and number > cte.minnum
except
select TicketNr FROM Ticket
Note that master..spt_values with Type = 'P' only holds the numbers 0 to 2047, so this only covers small ticket ranges.

After looking at all the solutions,
I went with creating a temp table holding the full range of numbers from the starting to the ending ticket, then selecting the numbers from the temp table that are not in the Ticket table.
The reason is that I kept running into MAXRECURSION problems.
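The temp-table approach can be sketched as follows, with SQLite standing in for SQL Server and Python filling the numbers table instead of a recursive CTE. It uses NOT EXISTS rather than NOT IN on purpose: TicketNr is declared NULL-able in the question, and a single NULL in the subquery makes NOT IN return no rows at all. The helper missing_via_numbers_table is hypothetical.

```python
import sqlite3


def missing_via_numbers_table(ticket_numbers):
    """Fill a temp numbers table from the first to the last ticket, then keep the unused numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Ticket (TicketNr INTEGER)")  # nullable, as in the question
    conn.executemany("INSERT INTO Ticket VALUES (?)", [(n,) for n in ticket_numbers])
    lo, hi = conn.execute("SELECT MIN(TicketNr), MAX(TicketNr) FROM Ticket").fetchone()
    if lo is None:  # empty table (or only NULLs): nothing can be missing
        conn.close()
        return []
    # Every number in the range, inserted without recursion, so no MAXRECURSION limit.
    conn.execute("CREATE TEMP TABLE numbers (n INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO numbers VALUES (?)", ((n,) for n in range(lo, hi + 1)))
    # NOT EXISTS is NULL-safe; NOT IN (SELECT TicketNr ...) would return nothing
    # as soon as one TicketNr is NULL.
    query = """
        SELECT n FROM numbers
        WHERE NOT EXISTS (SELECT 1 FROM Ticket t WHERE t.TicketNr = numbers.n)
        ORDER BY n
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(missing_via_numbers_table([1, 2, 4, None, 7, 11]))
```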

The right way to use CTE

I'm new to Common Table Expressions and I think I need to use one in order to achieve what I require.
If I run the following script -
select MainRentAccountReference,EffectiveFromDate,CollectionDay,NumberOfCollections,DirectDebitTotalOverrideAmount
from DirectDebitApportionment
where id = 1
It would give me the below results -
So for each row that my CTE returns (that is, for each unique MainRentAccountReference), I would want to create rows based on the following criteria:
3 rows, as NumberOfCollections is set to 3.
The following dates on the rows: 01/05/18, 01/06/18, 01/07/18, so basically plus one month each time.
However, if CollectionDay were set to, say, 10, then I would want the 3 dates to be 10/05/18, 10/06/18, 10/07/18.
Finally, each row should have a value of DirectDebitTotalOverrideAmount divided by NumberOfCollections.
I've been playing about with this and can get nowhere near the results I'm trying to achieve. Any help would be greatly appreciated. Thanks

You can do this with a recursive CTE:
with t as (
select *
from DirectDebitApportionment
where id = 1
),
cte as (
select . . ., , 1 as collection, DirectDebitTotalOverrideAmount / NumberOfCollections as collection_amount
from t
union all
select . . ., , collection + 1, DirectDebitTotalOverrideAmount / NumberOfCollections as collection_amount
from cte
where collection < NumberOfCollections
)
select . . .
from cte;
In some dialects of SQL, you need the RECURSIVE keyword.
This can also be accomplished using a numbers table, and that can be more efficient than the recursive CTE (although recursive CTEs often perform surprisingly well).
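A self-contained version of the recursive CTE, with no columns elided, can be sketched in SQLite. Assumptions: EffectiveFromDate is stored as 'YYYY-MM-DD' text, printf builds the first collection date from its year and month plus CollectionDay (standing in for DATEFROMPARTS), and date(..., '+1 month') replaces DATEADD. The helper expand_collections and the sample accounts are hypothetical.

```python
import sqlite3


def expand_collections(rows):
    """One output row per collection, one month apart, with the amount split evenly."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE DirectDebitApportionment (
        MainRentAccountReference TEXT, EffectiveFromDate TEXT,
        CollectionDay INTEGER, NumberOfCollections INTEGER,
        DirectDebitTotalOverrideAmount REAL)""")
    conn.executemany("INSERT INTO DirectDebitApportionment VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH RECURSIVE cte(ref, collection, collection_date, collection_amount, n) AS (
            SELECT MainRentAccountReference,
                   1,
                   -- first collection: year-month of EffectiveFromDate, day = CollectionDay
                   printf('%s-%02d', substr(EffectiveFromDate, 1, 7), CollectionDay),
                   round(DirectDebitTotalOverrideAmount / NumberOfCollections, 2),
                   NumberOfCollections
            FROM DirectDebitApportionment
            UNION ALL
            -- each further collection is one month after the previous one
            SELECT ref, collection + 1, date(collection_date, '+1 month'),
                   collection_amount, n
            FROM cte
            WHERE collection < n
        )
        SELECT ref, collection, collection_date, collection_amount
        FROM cte
        ORDER BY ref, collection
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


accounts = [("ACC1", "2018-05-01", 10, 3, 300.0), ("ACC2", "2018-05-15", 1, 2, 50.0)]
for row in expand_collections(accounts):
    print(row)
```

Unlike DATEADD, SQLite's '+1 month' normalizes rather than clamps, so a CollectionDay above 28 can roll over into the following month.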

This seems to do the trick based on the pointers that Gordon gave me -
with t as (
select MainRentAccountReference,EffectiveFromDate,CollectionDay,NumberOfCollections,DirectDebitTotalOverrideAmount
from DirectDebitApportionment
where id = 1
),
cte as (
select 1 as collection
,t.MainRentAccountReference
,convert(decimal(18,2),DirectDebitTotalOverrideAmount / NumberOfCollections) as collection_amount
,NumberOfCollections
,convert(datetime,DATEFROMPARTS ( DATEPART(YEAR,EffectiveFromDate), DATEPART(MONTH,EffectiveFromDate), CollectionDay )) AS EffectiveFromDate
,CollectionDay
from t
union all
select collection + 1,MainRentAccountReference,collection_amount,NumberOfCollections,DATEADD(M,1,EffectiveFromDate),CollectionDay
from cte
where collection < cte.NumberOfCollections
)
select *
from cte
Order by MainRentAccountReference,collection
;
Gives me the following results -

Substring in a column

I have a column containing several items, and I need to count how many times each item appears. My table looks something like this:
Table Example
Id_TR     Triggered
-------   --------------------
A1_6547   R1:23;R2:0;R4:9000
A2_1235   R2:0;R2:100;R3:-100
A3_5436   R1:23;R2:100;R4:9000
A4_1245   R2:0;R5:150
And I would like the result to be like this:
Expected Results
Triggered  Count(1)
---------  --------
R1:23      2
R2:0       3
R2:100     2
R3:-100    1
R4:9000    2
R5:150     1
I've tried using substrings, but can't figure out how to solve this problem. Can anyone help?

This solution is 3 times faster than the CONNECT BY solution.
performance: 15K records per second
with cte (token,suffix)
as
(
select substr(triggered||';',1,instr(triggered,';')-1) as token
,substr(triggered||';',instr(triggered,';')+1) as suffix
from t
union all
select substr(suffix,1,instr(suffix,';')-1) as token
,substr(suffix,instr(suffix,';')+1) as suffix
from cte
where suffix is not null
)
select token,count(*)
from cte
group by token
;
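The recursive token-splitting approach ports to SQLite with one adjustment: Oracle treats '' as NULL, which is why the original stops on suffix is not null, while SQLite keeps '' as a real empty string, so this sketch compares against '' instead. It seeds each row with an empty token and the list plus a trailing ';'. Table t and its columns follow the question; the helper count_tokens is hypothetical.

```python
import sqlite3


def count_tokens(rows):
    """Split each ';'-separated Triggered value and count every item across all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Id_TR TEXT, Triggered TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE cte(token, suffix) AS (
            -- seed: no token yet, the whole list with a trailing ';'
            SELECT '', Triggered || ';' FROM t
            UNION ALL
            -- peel off everything up to the next ';'
            SELECT substr(suffix, 1, instr(suffix, ';') - 1),
                   substr(suffix, instr(suffix, ';') + 1)
            FROM cte
            WHERE suffix <> ''
        )
        SELECT token, COUNT(*)
        FROM cte
        WHERE token <> ''
        GROUP BY token
        ORDER BY token
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


table_rows = [
    ("A1_6547", "R1:23;R2:0;R4:9000"),
    ("A2_1235", "R2:0;R2:100;R3:-100"),
    ("A3_5436", "R1:23;R2:100;R4:9000"),
    ("A4_1245", "R2:0;R5:150"),
]
for token, cnt in count_tokens(table_rows):
    print(token, cnt)
```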

with x as (
select listagg(Triggered, ';') within group (order by Id_TR) str from t
)
select regexp_substr(str,'[^;]+',1,level) element, count(*)
from x
connect by level <= length(regexp_replace(str,'[^;]+')) + 1
group by regexp_substr(str,'[^;]+',1,level);
First concatenate all values of Triggered into one list using LISTAGG, then parse it and GROUP BY.

This is a fair solution.
performance: 5K records per second
select triggered
,count(*) as cnt
from (select id_tr
,regexp_substr(triggered,'[^;]+',1,level) as triggered
from t
connect by id_tr = prior id_tr
and level <= regexp_count(triggered,';')+1
and prior sys_guid() is not null
) t
group by triggered
;

This is just for learning purposes.
Check my other solutions.
performance: 1K records per second
select x.triggered
,count(*)
from t
,xmltable
(
'/r/x'
passing xmltype('<r><x>' || replace(triggered,';', '</x><x>') || '</x></r>')
columns triggered varchar(100) path '.'
) x
group by x.triggered
;

Returning the lowest integer not in a list in SQL

Suppose you have a table T(A) with only positive integers allowed, like:
1,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18
In the above example, the result is 10. We always can use ORDER BY and DISTINCT to sort and remove duplicates. However, to find the lowest integer not in the list, I came up with the following SQL query:
select list.x + 1
from (select x from (select distinct a as x from T order by a)) as list, T
where list.x + 1 not in T limit 1;
My idea is to start a counter at 1 and check whether that counter is in the list: if it is not, return it; otherwise increment it and look again. However, I have to start that counter at 1 and then increment it. That query works in most cases, but there are some corner cases, such as when 1 itself is missing. How can I accomplish that in SQL, or should I go in a completely different direction to solve this problem?

Because SQL works on sets, the intermediate SELECT DISTINCT a AS x FROM t ORDER BY a is redundant.
The basic technique of looking for a gap in a column of integers is to find where the current entry plus 1 does not exist. This requires a self-join of some sort.
Your query is not far off, but I think it can be simplified to:
SELECT MIN(a) + 1
FROM t
WHERE a + 1 NOT IN (SELECT a FROM t)
The NOT IN acts as a sort of self-join. This won't produce anything from an empty table, but should be OK otherwise.

select min(y.a) as a
from
t x
right join
(
select a + 1 as a from t
union
select 1
) y on y.a = x.a
where x.a is null
It works even on an empty table.
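The candidate-set idea (every a + 1, plus 1 itself) can be tested quickly in SQLite. Because RIGHT JOIN only arrived in SQLite 3.39, this sketch swaps the two sides and uses the equivalent LEFT JOIN; the helper lowest_missing is hypothetical.

```python
import sqlite3


def lowest_missing(values):
    """Smallest positive integer not in t, via candidates {a + 1} plus {1}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    query = """
        SELECT MIN(y.a)
        FROM (SELECT a + 1 AS a FROM t UNION SELECT 1) AS y
        LEFT JOIN t AS x ON x.a = y.a
        WHERE x.a IS NULL
    """
    (result,) = conn.execute(query).fetchone()
    conn.close()
    return result


print(lowest_missing([1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13]))
```

Including 1 among the candidates is what makes the empty-table and missing-1 cases come out right.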

SELECT min(t.a) - 1
FROM t
LEFT JOIN t t1 ON t1.a = t.a - 1
WHERE t1.a IS NULL
AND t.a > 1; -- exclude 0
This finds the smallest number greater than 1, where the next-smaller number is not in the same table. That missing number is returned.
This works even for a missing 1. There are multiple answers checking in the opposite direction. All of them would fail with a missing 1.

You can do the following, although you may also want to define a range, in which case you might need a couple of UNIONs:
SELECT x.id+1
FROM my_table x
LEFT
JOIN my_table y
ON x.id+1 = y.id
WHERE y.id IS NULL
ORDER
BY x.id LIMIT 1;

You can always create a table with all of the numbers from 1 to X and then LEFT JOIN it to the table you are comparing. Then just select the TOP (lowest) number that isn't present in the table you are comparing:
SELECT TOP 1 table_with_all_numbers.number, table_with_missing_numbers.number
FROM table_with_all_numbers
LEFT JOIN table_with_missing_numbers
ON table_with_missing_numbers.number = table_with_all_numbers.number
WHERE table_with_missing_numbers.number IS NULL
ORDER BY table_with_all_numbers.number ASC;

In SQLite 3.8.3 or later, you can use a recursive common table expression to create a counter.
Here, we stop counting when we find a value not in the table:
WITH RECURSIVE counter(c) AS (
SELECT 1
UNION ALL
SELECT c + 1 FROM counter WHERE c IN t)
SELECT max(c) FROM counter;
(This works for an empty table or a missing 1.)
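The counter query can be run unchanged from Python's sqlite3 module; c IN t works because SQLite accepts a table name as the right-hand side of IN. The wrapper lowest_missing_counter is hypothetical.

```python
import sqlite3


def lowest_missing_counter(values):
    """Count up from 1 while the counter is in t; the last counter is the answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    query = """
        WITH RECURSIVE counter(c) AS (
            SELECT 1
            UNION ALL
            SELECT c + 1 FROM counter WHERE c IN t)
        SELECT max(c) FROM counter
    """
    (result,) = conn.execute(query).fetchone()
    conn.close()
    return result


print(lowest_missing_counter([1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]))
```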

This query ranks (starting from rank 1) each distinct number in ascending order and selects the lowest rank that's less than its number. If no rank is lower than its number (i.e. there are no gaps in the table) the query returns the max number + 1.
select coalesce(min(number),1) from (
select min(cnt) number
from (
select
number,
(select count(*) from (select distinct number from numbers) b where b.number <= a.number) as cnt
from (select distinct number from numbers) a
) t1 where number > cnt
union
select max(number) + 1 number from numbers
) t1
http://sqlfiddle.com/#!7/720cc/3

Just another method, using EXCEPT this time:
SELECT a + 1 AS missing FROM T
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;

MySQL get row position in ORDER BY

With the following MySQL table:
+-----------------------------+
+ id INT UNSIGNED +
+ name VARCHAR(100) +
+-----------------------------+
How can I select a single row AND its position amongst the other rows in the table, when sorted by name ASC. So if the table data looks like this, when sorted by name:
+-----------------------------+
+ id | name +
+-----------------------------+
+ 5 | Alpha +
+ 7 | Beta +
+ 3 | Delta +
+ ..... +
+ 1 | Zed +
+-----------------------------+
How could I select the Beta row getting the current position of that row? The result set I'm looking for would be something like this:
+-----------------------------+
+ id | position | name +
+-----------------------------+
+ 7 | 2 | Beta +
+-----------------------------+
I can do a simple SELECT * FROM tbl ORDER BY name ASC then enumerate the rows in PHP, but it seems wasteful to load a potentially large resultset just for a single row.

Use this:
SELECT x.id,
x.position,
x.name
FROM (SELECT t.id,
t.name,
@rownum := @rownum + 1 AS position
FROM TABLE t
JOIN (SELECT @rownum := 0) r
ORDER BY t.name) x
WHERE x.name = 'Beta'
...to get a unique position value. This:
SELECT t.id,
(SELECT COUNT(*)
FROM TABLE x
WHERE x.name <= t.name) AS position,
t.name
FROM TABLE t
WHERE t.name = 'Beta'
...will give ties the same value. I.e., if two rows tie for second place, they'll both get the same position (3, because <= counts both tied rows), whereas the first query will give a position of 2 to one of them and 3 to the other.

This is the only way that I can think of:
SELECT `id`,
(SELECT COUNT(*) FROM `table` WHERE `name` <= 'Beta') AS `position`,
`name`
FROM `table`
WHERE `name` = 'Beta'

If the query is simple and the returned result set is potentially large, you may try to split it into two queries.
The first query uses a narrow filtering criterion just to retrieve that row's data, and the second query uses COUNT with a WHERE clause to calculate the position.
For example, in your case:
Query 1:
SELECT * FROM tbl WHERE name = 'Beta'
Query 2:
SELECT COUNT(1) FROM tbl WHERE name <= 'Beta'
We use this approach on a table with 2M records, and it is far more scalable than OMG Ponies's approach.

The other answers seem too complicated to me.
Here is an easy example. Let's say you have a table with these columns:
userid | points
and you want to sort the userids by points and get each row's position (the "ranking" of the user). Then you use:
SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS num, userid, points
FROM
ourtable
ORDER BY points DESC
num gives you the row position (ranking).
If you have MySQL 8.0+, you might want to use ROW_NUMBER() instead.

The position of a row in the table represents how many rows are "better" than the targeted row.
So, you must count those rows.
SELECT COUNT(*)+1 FROM table WHERE name<'Beta'
In case of a tie, the best (lowest-numbered) position is returned.
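A quick check of the COUNT(*) + 1 idea, sketched in SQLite rather than MySQL. The table tbl follows the question, while the helper position_of is hypothetical. One caveat: SQLite compares text with the case-sensitive BINARY collation by default, while MySQL's default collations are usually case-insensitive, so mixed-case names may rank differently.

```python
import sqlite3


def position_of(names, target):
    """Position of target when sorted by name ASC: rows sorting before it, plus one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO tbl (name) VALUES (?)", [(n,) for n in names])
    (pos,) = conn.execute(
        "SELECT COUNT(*) + 1 FROM tbl WHERE name < ?", (target,)
    ).fetchone()
    conn.close()
    return pos


print(position_of(["Zed", "Delta", "Alpha", "Beta"], "Beta"))
```

Because only rows strictly before the target are counted, tied rows share the same (best) position.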
If you add another row with same name of "Beta" after the existing "Beta" row, then the position returned would be still 2, as they would share same place in the classification.
Hope this helps people that will search for something similar in the future, as I believe that the question owner already solved his issue.

I've got a very similar issue, so I won't ask the same question; instead I'll share here what I did. I also had to use a GROUP BY and order by AVG.
There are students with subjects ("signatures") and scores, and I had to rank them: first calculate the AVG, then order by it DESC, and finally add the position (the rank, in my case). I did something very similar to the best answer here, with a few changes to fit my problem.
Finally, I put the position (rank) column in the outer SELECT:
SET @rank=0;
SELECT @rank := @rank + 1 AS ranking, t.avg, t.name
FROM(SELECT avg(students_signatures.score) as avg, students.name as name
FROM alumnos_materia students_signatures
JOIN (SELECT @rownum := 0) r
left JOIN students ON students.id=students_signatures.id_student
GROUP BY students.name order by avg DESC) t

I went through the accepted answer and it seemed a bit complicated, so here is a simplified version of it:
SELECT COUNT(*) AS position FROM t
WHERE name <= 'search string'

I had a similar problem where I needed the rank (index) of rows ordered by votes DESC. The following works fine for me:
Select *, ROW_NUMBER() OVER(ORDER BY votes DESC) as "rank"
From "category_model"
where ("model_type" = ? and "category_id" = ?)

Maybe what you need is to add the LIMIT syntax:
SELECT * FROM tbl ORDER BY name ASC LIMIT 1
if you just need one row.

We Keep Coding
