SQL Query to concatenate column values from multiple rows in Oracle - sql

Would it be possible to construct SQL to concatenate column values from
multiple rows?
The following is an example:
Table A
PID
A
B
C
Table B
PID SEQ Desc
A 1 Have
A 2 a nice
A 3 day.
B 1 Nice Work.
C 1 Yes
C 2 we can
C 3 do
C 4 this work!
Output of the SQL should be -
PID Desc
A Have a nice day.
B Nice Work.
C Yes we can do this work!
So basically the Desc column for out put table is a concatenation of the SEQ values from Table B?
Any help with the SQL?

There are a few ways depending on what version you have - see the oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A to pick out the pids you want.
Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.

There's also an XMLAGG function, which works on versions prior to 11.2. Because WM_CONCAT is undocumented and unsupported by Oracle, it's recommended not to use it in production system.
With XMLAGG you can do the following:
SELECT XMLAGG(XMLELEMENT(E,ename||',')).EXTRACT('//text()') "Result"
FROM employee_names
What this does is
put the values of the ename column (concatenated with a comma) from the employee_names table in an xml element (with tag E)
extract the text of this
aggregate the xml (concatenate it)
call the resulting column "Result"

With SQL model clause:
SQL> select pid
2 , ltrim(sentence) sentence
3 from ( select pid
4 , seq
5 , sentence
6 from b
7 model
8 partition by (pid)
9 dimension by (seq)
10 measures (descr,cast(null as varchar2(100)) as sentence)
11 ( sentence[any] order by seq desc
12 = descr[cv()] || ' ' || sentence[cv()+1]
13 )
14 )
15 where seq = 1
16 /
P SENTENCE
- ---------------------------------------------------------------------------
A Have a nice day
B Nice Work.
C Yes we can do this work!
3 rows selected.
I wrote about this here. And if you follow the link to the OTN-thread you will find some more, including a performance comparison.

The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings.
If you are using 11g Release 2 you should use this function for string aggregation.
Please refer below url for more information about string concatenation.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
String Concatenation

As most of the answers suggest, LISTAGG is the obvious option. However, one annoying aspect with LISTAGG is that if the total length of concatenated string exceeds 4000 characters( limit for VARCHAR2 in SQL ), the below error is thrown, which is difficult to manage in Oracle versions upto 12.1
ORA-01489: result of string concatenation is too long
A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG.
The query including this clause would look like:
SELECT pid, LISTAGG(Desc, ' ' on overflow truncate) WITHIN GROUP (ORDER BY seq) AS desc
FROM B GROUP BY pid;
The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.
These are some of the additional options of ON OVERFLOW clause:
ON OVERFLOW TRUNCATE 'Contd..' : This will display 'Contd..' at
the end of string (Default is ... )
ON OVERFLOW TRUNCATE '' : This will display the 4000 characters
without any terminating string.
ON OVERFLOW TRUNCATE WITH COUNT : This will display the total
number of characters at the end after the terminating characters.
Eg:- '...(5512)'
ON OVERFLOW ERROR : If you expect the LISTAGG to fail with the
ORA-01489 error ( Which is default anyway ).

For those who must solve this problem using Oracle 9i (or earlier), you will probably need to use SYS_CONNECT_BY_PATH, since LISTAGG is not available.
To answer the OP, the following query will display the PID from Table A and concatenate all the DESC columns from Table B:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT a.pid, seq, description
FROM table_a a, table_b b
WHERE a.pid = b.pid(+)
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
There may also be instances where keys and values are all contained in one table. The following query can be used where there is no Table A, and only Table B exists:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT pid, seq, description
FROM table_b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
All values can be reordered as desired. Individual concatenated descriptions can be reordered in the PARTITION BY clause, and the list of PIDs can be reordered in the final ORDER BY clause.
Alternately: there may be times when you want to concatenate all the values from an entire table into one row.
The key idea here is using an artificial value for the group of descriptions to be concatenated.
In the following query, the constant string '1' is used, but any value will work:
SELECT SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY unique_id ORDER BY pid, seq) rnum, description
FROM (
SELECT '1' unique_id, b.pid, b.seq, b.description
FROM table_b b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1;
Individual concatenated descriptions can be reordered in the PARTITION BY clause.
Several other answers on this page have also mentioned this extremely helpful reference:
https://oracle-base.com/articles/misc/string-aggregation-techniques

LISTAGG delivers the best performance if sorting is a must(00:00:05.85)
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
COLLECT delivers the best performance if sorting is not needed(00:00:02.90):
SELECT pid, TO_STRING(CAST(COLLECT(Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
COLLECT with ordering is bit slower(00:00:07.08):
SELECT pid, TO_STRING(CAST(COLLECT(Desc ORDER BY Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
All other techniques were slower.

Before you run a select query, run this:
SET SERVEROUT ON SIZE 6000
SELECT XMLAGG(XMLELEMENT(E,SUPLR_SUPLR_ID||',')).EXTRACT('//text()') "SUPPLIER"
FROM SUPPLIERS;

Try this code:
SELECT XMLAGG(XMLELEMENT(E,fieldname||',')).EXTRACT('//text()') "FieldNames"
FROM FIELD_MASTER
WHERE FIELD_ID > 10 AND FIELD_AREA != 'NEBRASKA';

In the select where you want your concatenation, call a SQL function.
For example:
select PID, dbo.MyConcat(PID)
from TableA;
Then for the SQL function:
Function MyConcat(#PID varchar(10))
returns varchar(1000)
as
begin
declare #x varchar(1000);
select #x = isnull(#x +',', #x, #x +',') + Desc
from TableB
where PID = #PID;
return #x;
end
The Function Header syntax might be wrong, but the principle does work.

Related

SQL query to get the columns into one and comma delimited [duplicate]

Would it be possible to construct SQL to concatenate column values from
multiple rows?
The following is an example:
Table A
PID
A
B
C
Table B
PID SEQ Desc
A 1 Have
A 2 a nice
A 3 day.
B 1 Nice Work.
C 1 Yes
C 2 we can
C 3 do
C 4 this work!
Output of the SQL should be -
PID Desc
A Have a nice day.
B Nice Work.
C Yes we can do this work!
So basically the Desc column for out put table is a concatenation of the SEQ values from Table B?
Any help with the SQL?
There are a few ways depending on what version you have - see the oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A to pick out the pids you want.
Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.
There's also an XMLAGG function, which works on versions prior to 11.2. Because WM_CONCAT is undocumented and unsupported by Oracle, it's recommended not to use it in production system.
With XMLAGG you can do the following:
SELECT XMLAGG(XMLELEMENT(E,ename||',')).EXTRACT('//text()') "Result"
FROM employee_names
What this does is
put the values of the ename column (concatenated with a comma) from the employee_names table in an xml element (with tag E)
extract the text of this
aggregate the xml (concatenate it)
call the resulting column "Result"
With SQL model clause:
SQL> select pid
2 , ltrim(sentence) sentence
3 from ( select pid
4 , seq
5 , sentence
6 from b
7 model
8 partition by (pid)
9 dimension by (seq)
10 measures (descr,cast(null as varchar2(100)) as sentence)
11 ( sentence[any] order by seq desc
12 = descr[cv()] || ' ' || sentence[cv()+1]
13 )
14 )
15 where seq = 1
16 /
P SENTENCE
- ---------------------------------------------------------------------------
A Have a nice day
B Nice Work.
C Yes we can do this work!
3 rows selected.
I wrote about this here. And if you follow the link to the OTN-thread you will find some more, including a performance comparison.
The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings.
If you are using 11g Release 2 you should use this function for string aggregation.
Please refer below url for more information about string concatenation.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
String Concatenation
As most of the answers suggest, LISTAGG is the obvious option. However, one annoying aspect with LISTAGG is that if the total length of concatenated string exceeds 4000 characters( limit for VARCHAR2 in SQL ), the below error is thrown, which is difficult to manage in Oracle versions upto 12.1
ORA-01489: result of string concatenation is too long
A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG.
The query including this clause would look like:
SELECT pid, LISTAGG(Desc, ' ' on overflow truncate) WITHIN GROUP (ORDER BY seq) AS desc
FROM B GROUP BY pid;
The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.
These are some of the additional options of ON OVERFLOW clause:
ON OVERFLOW TRUNCATE 'Contd..' : This will display 'Contd..' at
the end of string (Default is ... )
ON OVERFLOW TRUNCATE '' : This will display the 4000 characters
without any terminating string.
ON OVERFLOW TRUNCATE WITH COUNT : This will display the total
number of characters at the end after the terminating characters.
Eg:- '...(5512)'
ON OVERFLOW ERROR : If you expect the LISTAGG to fail with the
ORA-01489 error ( Which is default anyway ).
For those who must solve this problem using Oracle 9i (or earlier), you will probably need to use SYS_CONNECT_BY_PATH, since LISTAGG is not available.
To answer the OP, the following query will display the PID from Table A and concatenate all the DESC columns from Table B:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT a.pid, seq, description
FROM table_a a, table_b b
WHERE a.pid = b.pid(+)
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
There may also be instances where keys and values are all contained in one table. The following query can be used where there is no Table A, and only Table B exists:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT pid, seq, description
FROM table_b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
All values can be reordered as desired. Individual concatenated descriptions can be reordered in the PARTITION BY clause, and the list of PIDs can be reordered in the final ORDER BY clause.
Alternately: there may be times when you want to concatenate all the values from an entire table into one row.
The key idea here is using an artificial value for the group of descriptions to be concatenated.
In the following query, the constant string '1' is used, but any value will work:
SELECT SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY unique_id ORDER BY pid, seq) rnum, description
FROM (
SELECT '1' unique_id, b.pid, b.seq, b.description
FROM table_b b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1;
Individual concatenated descriptions can be reordered in the PARTITION BY clause.
Several other answers on this page have also mentioned this extremely helpful reference:
https://oracle-base.com/articles/misc/string-aggregation-techniques
LISTAGG delivers the best performance if sorting is a must(00:00:05.85)
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
COLLECT delivers the best performance if sorting is not needed(00:00:02.90):
SELECT pid, TO_STRING(CAST(COLLECT(Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
COLLECT with ordering is bit slower(00:00:07.08):
SELECT pid, TO_STRING(CAST(COLLECT(Desc ORDER BY Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
All other techniques were slower.
Before you run a select query, run this:
SET SERVEROUT ON SIZE 6000
SELECT XMLAGG(XMLELEMENT(E,SUPLR_SUPLR_ID||',')).EXTRACT('//text()') "SUPPLIER"
FROM SUPPLIERS;
Try this code:
SELECT XMLAGG(XMLELEMENT(E,fieldname||',')).EXTRACT('//text()') "FieldNames"
FROM FIELD_MASTER
WHERE FIELD_ID > 10 AND FIELD_AREA != 'NEBRASKA';
In the select where you want your concatenation, call a SQL function.
For example:
select PID, dbo.MyConcat(PID)
from TableA;
Then for the SQL function:
Function MyConcat(#PID varchar(10))
returns varchar(1000)
as
begin
declare #x varchar(1000);
select #x = isnull(#x +',', #x, #x +',') + Desc
from TableB
where PID = #PID;
return #x;
end
The Function Header syntax might be wrong, but the principle does work.

PostgreSQL problem with the "Group By" Clause

I have a table that looks something like this when ORDERED by R_Value Desc
Note that it is important to understand that the below table is SQL SELECT ordered by R_Value
Id Letter R_Value
1 A 1500
2 A 1400
4 B 800
9 B 700
10 B 600
11 A 400
12 A 200
I want my result set to look like this. The result set needs to be grouped 3
times. Each time, the letter changes from A to B or vice versa, a new group should
be formed.
Letter Max_RValue
A 1500
B 800
A 400
I have already experimented with various things- lead/lag functions, partitioning, etc. and it seems impossible. A simple Group by(Letter) obviously won't work because
all 'A' will be put in one group which is not what I want.
My next step would be to try this in a procedural language, dump it in an
intermediate table and then read the results. Before doing that, is this even
possible in Ordinary SQL?
You can use LAG() window function to check the previous value of Letter for each row and SUM() window function to create the groups of rows of consecutive occurrences.
Then aggregate in each group:
SELECT Letter, MAX(R_Value) Max_RValue
FROM (
SELECT *, SUM(flag::int) OVER (ORDER BY R_Value DESC) grp
FROM (
SELECT *, Letter <> LAG(Letter, 1, '') OVER (ORDER BY R_Value DESC) flag
FROM tablename
) t
) t
GROUP BY grp, Letter;
Or, simpler just select the rows where the Letter changes:
SELECT Letter, R_Value
FROM (
SELECT *, LAG(Letter, 1, '') OVER (ORDER BY R_Value DESC) prev_Letter
FROM tablename
) t
WHERE Letter <> prev_Letter;
See the demo.
Forpas's second solution is okay but it can be written more simply in Postgres as:
select letter, r_value
from (select t.*, lag(letter) over (order by r_value) as prev_letter
from t
) t
where prev_letter is distinct from letter;

How to group by one column and display a list in another column- How do you execute in SparkSQL [duplicate]

Would it be possible to construct SQL to concatenate column values from
multiple rows?
The following is an example:
Table A
PID
A
B
C
Table B
PID SEQ Desc
A 1 Have
A 2 a nice
A 3 day.
B 1 Nice Work.
C 1 Yes
C 2 we can
C 3 do
C 4 this work!
Output of the SQL should be -
PID Desc
A Have a nice day.
B Nice Work.
C Yes we can do this work!
So basically the Desc column for out put table is a concatenation of the SEQ values from Table B?
Any help with the SQL?
There are a few ways depending on what version you have - see the oracle documentation on string aggregation techniques. A very common one is to use LISTAGG:
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
Then join to A to pick out the pids you want.
Note: Out of the box, LISTAGG only works correctly with VARCHAR2 columns.
There's also an XMLAGG function, which works on versions prior to 11.2. Because WM_CONCAT is undocumented and unsupported by Oracle, it's recommended not to use it in production system.
With XMLAGG you can do the following:
SELECT XMLAGG(XMLELEMENT(E,ename||',')).EXTRACT('//text()') "Result"
FROM employee_names
What this does is
put the values of the ename column (concatenated with a comma) from the employee_names table in an xml element (with tag E)
extract the text of this
aggregate the xml (concatenate it)
call the resulting column "Result"
With SQL model clause:
SQL> select pid
2 , ltrim(sentence) sentence
3 from ( select pid
4 , seq
5 , sentence
6 from b
7 model
8 partition by (pid)
9 dimension by (seq)
10 measures (descr,cast(null as varchar2(100)) as sentence)
11 ( sentence[any] order by seq desc
12 = descr[cv()] || ' ' || sentence[cv()+1]
13 )
14 )
15 where seq = 1
16 /
P SENTENCE
- ---------------------------------------------------------------------------
A Have a nice day
B Nice Work.
C Yes we can do this work!
3 rows selected.
I wrote about this here. And if you follow the link to the OTN-thread you will find some more, including a performance comparison.
The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings.
If you are using 11g Release 2 you should use this function for string aggregation.
Please refer below url for more information about string concatenation.
http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
String Concatenation
As most of the answers suggest, LISTAGG is the obvious option. However, one annoying aspect with LISTAGG is that if the total length of concatenated string exceeds 4000 characters( limit for VARCHAR2 in SQL ), the below error is thrown, which is difficult to manage in Oracle versions upto 12.1
ORA-01489: result of string concatenation is too long
A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG.
The query including this clause would look like:
SELECT pid, LISTAGG(Desc, ' ' on overflow truncate) WITHIN GROUP (ORDER BY seq) AS desc
FROM B GROUP BY pid;
The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.
These are some of the additional options of ON OVERFLOW clause:
ON OVERFLOW TRUNCATE 'Contd..' : This will display 'Contd..' at
the end of string (Default is ... )
ON OVERFLOW TRUNCATE '' : This will display the 4000 characters
without any terminating string.
ON OVERFLOW TRUNCATE WITH COUNT : This will display the total
number of characters at the end after the terminating characters.
Eg:- '...(5512)'
ON OVERFLOW ERROR : If you expect the LISTAGG to fail with the
ORA-01489 error ( Which is default anyway ).
For those who must solve this problem using Oracle 9i (or earlier), you will probably need to use SYS_CONNECT_BY_PATH, since LISTAGG is not available.
To answer the OP, the following query will display the PID from Table A and concatenate all the DESC columns from Table B:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT a.pid, seq, description
FROM table_a a, table_b b
WHERE a.pid = b.pid(+)
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
There may also be instances where keys and values are all contained in one table. The following query can be used where there is no Table A, and only Table B exists:
SELECT pid, SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY pid ORDER BY pid, seq) rnum, pid, description
FROM (
SELECT pid, seq, description
FROM table_b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1 AND PRIOR pid = pid
GROUP BY pid
ORDER BY pid;
All values can be reordered as desired. Individual concatenated descriptions can be reordered in the PARTITION BY clause, and the list of PIDs can be reordered in the final ORDER BY clause.
Alternately: there may be times when you want to concatenate all the values from an entire table into one row.
The key idea here is using an artificial value for the group of descriptions to be concatenated.
In the following query, the constant string '1' is used, but any value will work:
SELECT SUBSTR (MAX (SYS_CONNECT_BY_PATH (description, ', ')), 3) all_descriptions
FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY unique_id ORDER BY pid, seq) rnum, description
FROM (
SELECT '1' unique_id, b.pid, b.seq, b.description
FROM table_b b
)
)
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum - 1;
Individual concatenated descriptions can be reordered in the PARTITION BY clause.
Several other answers on this page have also mentioned this extremely helpful reference:
https://oracle-base.com/articles/misc/string-aggregation-techniques
LISTAGG delivers the best performance if sorting is a must(00:00:05.85)
SELECT pid, LISTAGG(Desc, ' ') WITHIN GROUP (ORDER BY seq) AS description
FROM B GROUP BY pid;
COLLECT delivers the best performance if sorting is not needed(00:00:02.90):
SELECT pid, TO_STRING(CAST(COLLECT(Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
COLLECT with ordering is bit slower(00:00:07.08):
SELECT pid, TO_STRING(CAST(COLLECT(Desc ORDER BY Desc) AS varchar2_ntt)) AS Vals FROM B GROUP BY pid;
All other techniques were slower.
Before you run a select query, run this:
SET SERVEROUT ON SIZE 6000
SELECT XMLAGG(XMLELEMENT(E,SUPLR_SUPLR_ID||',')).EXTRACT('//text()') "SUPPLIER"
FROM SUPPLIERS;
Try this code:
SELECT XMLAGG(XMLELEMENT(E,fieldname||',')).EXTRACT('//text()') "FieldNames"
FROM FIELD_MASTER
WHERE FIELD_ID > 10 AND FIELD_AREA != 'NEBRASKA';
In the select where you want your concatenation, call a SQL function.
For example:
select PID, dbo.MyConcat(PID)
from TableA;
Then for the SQL function:
Function MyConcat(#PID varchar(10))
returns varchar(1000)
as
begin
declare #x varchar(1000);
select #x = isnull(#x +',', #x, #x +',') + Desc
from TableB
where PID = #PID;
return #x;
end
The Function Header syntax might be wrong, but the principle does work.

The number of differences in a column

I would like to retrieve a column of how many differences in letters in each row. For instance
If you have a a value "test" and another row has a value "testing ", then the differences is 4 letter between "test" and "testing ". The data of the column would be value 4
I have reflected about it and I don't know where to begin
id || value || category || differences
--------------------------------------------------
1 || test || 1 || 4
2 || testing || 1 || null
11 || candy || 2 || -3
12 || ca || 2 || null
In this scenario and context it is no difference between "Test" and "rest".
I think what you are looking for is a measure of edit difference, rather than just counting prefix similarity, for which there are a few common algorithms. Levenshtein's method is one that I've used before and I've seen it implemented as TSQL functions. The answers to this SO question suggest a couple of implementations in TSQL that you might just be able to take and use as-is.
(though take time to test the code and understand the method rather than just copying the code and using it, so that you can understand the output if something seems to go wrong - otherwise you could be creating some technical debt you'll have to pay back later)
Exactly which distance calculation method you want will depend on how you want to count certain things, for instance do you count a substitution as one change or a delete and an insert, and if your strings are long enough for it to matter do you want to consider substring moves, and so forth.
I think you just want len() and lead():
select t.id, t.value, t.category,
(len(lead(value) over (partition by t.category order by t.id) -
len(value)
) as difference
from t;
You read a next record with LEAD. Then compare the strings with LIKE or other string functions:
select
id, value, category,
case when value like next_value + '%' or next_value like value + '%'
then len(next_value) - len(value)
end as differences
from
(
select id, value, category, lead(value) over (order by id) as next_value
from mytable
) this_and_next;
If you only want to compare values within the same category use a partition clause:
lead(value) over (partition by category order by id)
UPDATE: Please see DhruvJoshi's answer on SQL Server's LEN. This function doesn't count trailing blanks, as I assumed, so you need his trick in case you want to have them counted. Here is the doc on LEN confirming this behaviour: https://technet.microsoft.com/en-us/library/ms190329(v=sql.105).aspx
create table #temp
(
id int,
value varchar(30),
category int
)
insert into #temp
select 1,'test',1
union all
select 2,'testing',1
union all
select 1,'Candy',2
union all
select 2,'Ca',2
;with cte
as
(
select id,value,category,lead(value) over (partition by category order by id) as nxtvalue
from #temp
)
select id,value,category,len(replace(nxtvalue,value,'')) as differences
from cte
you can also use self joining query like below:
--create table tbl (id int, value nvarchar(100), category int);
--insert into tbl values
--(1,N'test',1)
--,(2,N' testing',1)
--,(11,N'candy',2)
--,(12,N'ca',2);
select A.*, LEN(B.value)-LEN(A.value) as difference
from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category
--drop table tbl
Update: I noticed that you have oddly positioned the space at the end. SQL server most times does not count the trailing spaces when calculating length. So here's the hack on above query
select A.*, LEN(B.value+'>')-LEN(A.value+'>') as difference
from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category
As pointed out in comments, that Id's may not be consecutive, in such cases
try this :
create table #temp ( rownum int PRIMARY KEY IDENTITY(1,1), id int, value nvarchar(100), category int)
insert into #temp (id, value, category)
select id, value, category from tbl order by id asc
select A.id, A.value, A.category, LEN(B.value+'>')-LEN(A.value+'>') as difference
from #temp A LEFT JOIN #temp B on A.rownum +1 =B.rownum and A.category=B.category

How to enumerate returned rows in SQL?

I was wondering if it would be possible to enumerate returned rows. Not according to any column content but just yielding a sequential integer index. E.g.
select ?, count(*) as usercount from users group by age
would return something along the lines:
1 12
2 78
3 4
4 42
it is for https://data.stackexchange.com/
try:
SELECT
ROW_NUMBER() OVER(ORDER BY age) AS RowNumber
,count(*) as usercount
from users
group by age
If it's Oracle, use rownum.
SELECT SOMETABLE.*, ROWNUM RN
FROM SOMETABLE
WHERE SOMETABLE.SOMECOLUMN = :SOMEVALUE
ORDER BY SOMETABLE.SOMEOTHERCOLUMN;
The final answer will entirely depend on what database you're using.
For MySql:
SELECT #row := #row + 1 as row FROM anytable a, (SELECT #row := 0) r
use rownumber function available in sql server
SELECT
ROW_NUMBER() OVER (ORDER BY columnNAME) AS 'RowNumber',count(*) as usercount
FROM users
How you'd do that depends on your database server. In SQL Server, you could use row_number():
select row_number() over (order by age)
, age
, count(*) as usercount
from users
group by
age
order by
age
But it's often easier and faster to use client side row numbers.
for Mysql
set #row:=0;
select #row:=#row+1 as row, a.* from table_name as a;
In contrast to majority of other answers, and in accordance of the actual OP question, to
enumerate returned rows (...) NOT according to any column content
but rather in the order of returned query, one could use a dummy ordinal variable to ORDER BY in an ROW_NUMBER function, e.g.
ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS row_num
where one could actually use anything as an argument to the SELECT-statement, like SELECT 100, SELECT ‘A’, SELECT NULL, etc..
This way, the row numbers will be enumerated in the same order the data was added to the table.