Substring in a column

Substring in a column - sql

I have a column that has several items in which I need to count the times it is called, my column table looks something like this:
Table Example
Id_TR Triggered
-------------- ------------------
A1_6547 R1:23;R2:0;R4:9000
A2_1235 R2:0;R2:100;R3:-100
A3_5436 R1:23;R2:100;R4:9000
A4_1245 R2:0;R5:150
And I would like the result to be like this:
Expected Results
Triggered Count(1)
--------------- --------
R1:23 2
R2:0 3
R2:100 2
R3:-100 1
R4:9000 2
R5:150 1
I've tried to do some substring, but cant seem to find how to solve this problem. Can anyone help?

This solution is X3 times faster than the CONNECT BY solution
performance: 15K records per second
with cte (token,suffix)
as
(
select substr(triggered||';',1,instr(triggered,';')-1) as token
,substr(triggered||';',instr(triggered,';')+1) as suffix
from t
union all
select substr(suffix,1,instr(suffix,';')-1) as token
,substr(suffix,instr(suffix,';')+1) as suffix
from cte
where suffix is not null
)
select token,count(*)
from cte
group by token
;

with x as (
select listagg(Triggered, ';') within group (order by Id_TR) str from table
)
select regexp_substr(str,'[^;]+',1,level) element, count(*)
from x
connect by level <= length(regexp_replace(str,'[^;]+')) + 1
group by regexp_substr(str,'[^;]+',1,level);
First concatenate all values of triggered into one list using listagg then parse it and do group by.
Another methods of parsing list you can find here or here

This is a fair solution.
performance: 5K records per second
select triggered
,count(*) as cnt
from (select id_tr
,regexp_substr(triggered,'[^;]+',1,level) as triggered
from t
connect by id_tr = prior id_tr
and level <= regexp_count(triggered,';')+1
and prior sys_guid() is not null
) t
group by triggered
;

This is just for learning purposes.
Check my other solutions.
performance: 1K records per second
select x.triggered
,count(*)
from t
,xmltable
(
'/r/x'
passing xmltype('<r><x>' || replace(triggered,';', '</x><x>') || '</x></r>')
columns triggered varchar(100) path '.'
) x
group by x.triggered
;

Related

Specific string matching

I am working in SQL Server 2012. In my table, there is a column called St_Num and its data is like this:
St_Num status
------------------------------
128 TIMBER RUN DR EXP
128 TIMBER RUN DRIVE EXP
Now we can notice that there are spelling variations in the data above. What I would like to do is that if the number in this case 128 and first 3 letters in St_Num column are same then these both rows should be considered the same like this the output should be:
St_Num status
-----------------------------
128 TIMBER RUN DR EXP
I did some search regarding this and found that left or substring function can be handy here but I have no idea how they will be used here to get what I need and don't know even if they can solve my issue. Any help regarding how to get the desired output would be great.

This will output only the first of the matching rows:
with cte as (
select *,
row_number() over (order by (select null)) rn
from tablename
)
select St_Num, status from cte t
where not exists (
select 1 from cte
where
left(St_Num, 7) = left(t.St_Num, 7)
and
rn < t.rn
)
See the demo

This could possibly be done by using a subquery in the same way that you would eliminate duplicates in a table so:
SELECT Str_Num, status
FROM <your_table> a
WHERE NOT EXISTS (SELECT 1
FROM <your_table> b
WHERE SUBSTRING(b.Str_Num, 1, 7) = SUBSTRING(a.Str_Num, 1, 7));
This would only work however if the number is guaranteed to be 3 characters long, or if you don't mind it taking more characters in the case that the number is fewer characters.

You can use grouping by status and substring(St_Num,1,3)
with t(St_Num, status) as
(
select '128 TIMBER RUN DR' ,'EXP' union all
select '128 TIMBER RUN DRIVE','EXP'
)
select min(St_Num) as St_Num, status
from t
group by status, substring(St_Num,1,3);
St_Num status
----------------- ------
128 TIMBER RUN DR EXP

I don't really approve of your matching logic . . . but that is not your question. The big issue is how long is the number before the string. So, you can get the shortest of the addresses using:
select distinct t.*
from t
where not exists (select 1
from t t2
where left(t2.st_num, patindex('%[a-zA-Z]%') + 2, t.st_num) = left(t.st_num, patindex('%[a-zA-Z]%', t.st_num) + 2) and
len(t.St_Num) < len(t2.St_Num)
);

I still have odd feeling that your criteria is not enough to match same addresses but this might help, since it considers also length of the number:
WITH ParsedAddresses(st_num, exp, number)
AS
(
SELECT st_num,
exp,
number = ROW_NUMBER() OVER(PARTITION BY LEFT(st_num, CHARINDEX(' ', st_num) + 3) ORDER BY LEN(st_num))
FROM <table_name>
)
SELECT st_num, exp FROM ParsedAddresses
WHERE number = 1

Teradata SQL Reverse Parent Child Hierarchy

I know how to build a hierarchy starting with the root node (i.e. where parent_id is null or something like that), but I can't find anything on how to build a hierarchy upward from the final child/edge node. I'd like to start with a child and build all the way back up to the top. Assume I don't know how many levels, or who the parent is, and we'll have to use SQL to figure it out.
Here is my base table:
old_entity_key,new_entity_key
1,2
2,3
3,4
4,5
5,6
Desired output:
new_entity_key,path
2,1/2
3,1/2/3
4,1/2/3/4
5,1/2/3/4/5
6,1/2/3/4/5/6
This is also acceptable:
new_entity_key,path
2,2/1
3,3/2/1
4,4/3/2/1
5,5/4/3/2/1
6,6/5/4/3/2/1
Here is the CTE I've started with:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path
from table
where new_entity_key not in (select old_entity_key from table)
and cast(start_time as date) between current_date - interval '3' day and current_date
union all
select
c.old_entity_key,
c.new_entity_key,
p.new_entity_key||'/'||c.path
from history c
join table p on p.new_entity_key = c.old_entity_key
)
select new_entity_key, old_entity_key, substr(path, 1, instr(path, '/') - 1) as original_entity_key, path
from history s;
The problem with the above query is that it runs forever. I think I've created an infinite loop. I've also tried using the below where filter in the bottom query of the union to try to find the root node, but Teradata gives me an error:
where p.new_entity_key in (select old_entity_key from table)
Any help would be greatly appreciated.

You'll need some sort of counter, and I think your join logic in your CTE doesn't make sense. I threw together a very simple volatile table example:
create volatile table tb
(old_entity_key char(1),
new_entity_key char(1),
rn integer)
on commit preserve rows;
insert into tb values ('1','2',1);
insert into tb values ('2','3',2);
insert into tb values ('3','4',3);
Now we can put together a recursive CTE:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path,
rn
from tb t
where
rn = 1
union all
select
t.old_entity_key,
t.new_entity_key,
h.path || '/' || t.new_entity_key,
t.rn
from
tb t
join history h
on t.rn = h.rn + 1
)
select * from history order by rn
The important things here are:
Limit your first pass (accomplished here by rn=1).
The second pass needs to pick up the "next" row, based on the previous row (t.rn = h.rn + 1)

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except, that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to make a query that would order the output, basically saying "Next row 'value' = this row 'next'" as it's shown above ?
I have no control over the database and how this data is stored. I am just trying to retrieve it and organize it. SQL Server I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.

This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).

Use a recursive query, with the one i list here you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
fiddle here

If you are using Oracle, try Starts with- connect by
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;

One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.val
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next

Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent
Example

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less succeptable to mathematical locating (since some results would be filtered, out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
order by primary_key where active=1 limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how due to the gaps in the primary keys caused by the example where condition (for example becaseu there are many inactive items), I'm no longer getting the closest 5 above and 5 below, instead I'm getting the closest 1 below and the closest 9 above, instead.

There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.

Here's another way to do it with analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause. So instead you need to use subqueries or CTE's. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.

For similar query I use analytic functions without CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
such query works more faster for me than CTE with join (answer from Scott Bailey). But of course less elegant

You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (not familiar with postgresql), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.

If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.

MySQL get row position in ORDER BY

With the following MySQL table:
+-----------------------------+
+ id INT UNSIGNED +
+ name VARCHAR(100) +
+-----------------------------+
How can I select a single row AND its position amongst the other rows in the table, when sorted by name ASC. So if the table data looks like this, when sorted by name:
+-----------------------------+
+ id | name +
+-----------------------------+
+ 5 | Alpha +
+ 7 | Beta +
+ 3 | Delta +
+ ..... +
+ 1 | Zed +
+-----------------------------+
How could I select the Beta row getting the current position of that row? The result set I'm looking for would be something like this:
+-----------------------------+
+ id | position | name +
+-----------------------------+
+ 7 | 2 | Beta +
+-----------------------------+
I can do a simple SELECT * FROM tbl ORDER BY name ASC then enumerate the rows in PHP, but it seems wasteful to load a potentially large resultset just for a single row.

Use this:
SELECT x.id,
x.position,
x.name
FROM (SELECT t.id,
t.name,
#rownum := #rownum + 1 AS position
FROM TABLE t
JOIN (SELECT #rownum := 0) r
ORDER BY t.name) x
WHERE x.name = 'Beta'
...to get a unique position value. This:
SELECT t.id,
(SELECT COUNT(*)
FROM TABLE x
WHERE x.name <= t.name) AS position,
t.name
FROM TABLE t
WHERE t.name = 'Beta'
...will give ties the same value. IE: If there are two values at second place, they'll both have a position of 2 when the first query will give a position of 2 to one of them, and 3 to the other...

This is the only way that I can think of:
SELECT `id`,
(SELECT COUNT(*) FROM `table` WHERE `name` <= 'Beta') AS `position`,
`name`
FROM `table`
WHERE `name` = 'Beta'

If the query is simple and the size of returned result set is potentially large, then you may try to split it into two queries.
The first query with a narrow-down filtering criteria just to retrieve data of that row, and the second query uses COUNT with WHERE clause to calculate the position.
For example in your case
Query 1:
SELECT * FROM tbl WHERE name = 'Beta'
Query 2:
SELECT COUNT(1) FROM tbl WHERE name >= 'Beta'
We use this approach in a table with 2M record and this is way more scalable than OMG Ponies's approach.

The other answers seem too complicated for me.
Here comes an easy example, let's say you have a table with columns:
userid | points
and you want to sort the userids by points and get the row position (the "ranking" of the user), then you use:
SET #row_number = 0;
SELECT
(#row_number:=#row_number + 1) AS num, userid, points
FROM
ourtable
ORDER BY points DESC
num gives you the row postion (ranking).
If you have MySQL 8.0+ then you might want to use ROW_NUMBER()

The position of a row in the table represents how many rows are "better" than the targeted row.
So, you must count those rows.
SELECT COUNT(*)+1 FROM table WHERE name<'Beta'
In case of a tie, the highest position is returned.
If you add another row with same name of "Beta" after the existing "Beta" row, then the position returned would be still 2, as they would share same place in the classification.
Hope this helps people that will search for something similar in the future, as I believe that the question owner already solved his issue.

I've got a very very similar issue, that's why I won't ask the same question, but I will share here what did I do, I had to use also a group by, and order by AVG.
There are students, with signatures and socore, and I had to rank them (in other words, I first calc the AVG, then order them in DESC, and then finally I needed to add the position (rank for me), So I did something Very similar as the best answer here, with a little changes that adjust to my problem):
I put finally the position (rank for me) column in the external SELECT
SET #rank=0;
SELECT #rank := #rank + 1 AS ranking, t.avg, t.name
FROM(SELECT avg(students_signatures.score) as avg, students.name as name
FROM alumnos_materia
JOIN (SELECT #rownum := 0) r
left JOIN students ON students.id=students_signatures.id_student
GROUP BY students.name order by avg DESC) t

I was going through the accepted answer and it seemed bit complicated so here is the simplified version of it.
SELECT t,COUNT(*) AS position FROM t
WHERE name <= 'search string' ORDER BY name

I have similar types of problem where I require rank(Index) of table order by votes desc. The following works fine with for me.
Select *, ROW_NUMBER() OVER(ORDER BY votes DESC) as "rank"
From "category_model"
where ("model_type" = ? and "category_id" = ?)

may be what you need is with add syntax
LIMIT
so use
SELECT * FROM tbl ORDER BY name ASC LIMIT 1
if you just need one row..

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Substring in a column - sql

This is just for learning purposes. Check my other solutions. performance: 1K records per second select x.triggered ,count(*) from t ,xmltable ( '/r/x' passing xmltype('<r><x>' || replace(triggered,';', '</x><x>') || '</x></r>') columns triggered varchar(100) path '.' ) x group by x.triggered ;

Related

Specific string matching

Teradata SQL Reverse Parent Child Hierarchy

Ordering a SQL query based on the value in a column determining the value of another column in the next row

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

MySQL get row position in ORDER BY

Categories

Resources