I have a table containing many columns, I have to make my selection according to these two columns:
TIME ID
-216 AZA
215 AZA
56 EA
-55 EA
66 EA
-03 AR
03 OUI
-999 OP
999 OP
04 AR
87 AR
The expected output is
TIME ID
66 EA
03 OUI
87 AR
I need to select the rows with no matches. There are rows which have the same ID, and almost the same time but inversed with a little difference. For example the first row with the TIME -216 matches the second record with time 215. I tried to solve it in many ways, but everytime I find myself lost.
First step -- find rows with duplicate IDs. Second step -- filter for rows which are near-inverse duplicates.
First step:
SELECT t1.TIME, t2.TIME, t1.ID FROM mytable t1 JOIN mytable
t2 ON t1.ID = t2.ID AND t1.TIME > t2.TIME;
The second part of the join clause ensures we only get one record for each pair.
Second step:
SELECT t1.TIME,t2.TIME,t1.ID FROM mytable t1 JOIN mytable t2 ON t1.ID = t2.ID AND
t1.TIME > t2.TIME WHERE ABS(t1.TIME + t2.TIME) < 3;
This will produce some duplicate results if eg. (10, FI), (-10, FI) and (11, FI) are in your table as there are two valid pairs. You can possibly filter these out as follows:
SELECT t1.TIME,MAX(t2.TIME),t1.ID FROM mytable t1 JOIN mytable t2 ON
t1.ID = t2.ID AND t1.TIME > t2.TIME WHERE ABS(t1.TIME + t2.TIME) < 3 GROUP BY
t1.TIME,t1.ID;
But it's unclear which result you want to drop. Hopefully this points you in the right direction, though!
Does this help?
create table #RawData
(
[Time] int,
ID varchar(3)
)
insert into #rawdata ([time],ID)
select -216, 'AZA'
union
select 215, 'AZA'
union
select 56, 'EA'
union
select -55, 'EA'
union
select 66, 'EA'
union
select -03, 'AR'
union
select 03, 'OUI'
union
select -999, 'OP'
union
select 999, 'OP'
union
select 04, 'AR'
union
select 87, 'AR'
union
-- this value added to illustrate that the algorithm does not ignore this value
select 156, 'EA'
--create a copy with an ID to help out
create table #Data
(
uniqueId uniqueidentifier,
[Time] int,
ID varchar(3)
)
insert into #Data(uniqueId,[Time],ID) select newid(),[Time],ID from #RawData
declare #allowedDifference int
select #allowedDifference = 1
--find duplicates with matching inverse time
select *, d1.Time + d2.Time as pairDifference from #Data d1 inner join #Data d2 on d1.ID = d2.ID and (d1.[Time] + d2.[Time] <=#allowedDifference and d1.[Time] + d2.[Time] >= (-1 * #allowedDifference))
-- now find all ID's ignoring these pairs
select [Time],ID from #data
where uniqueID not in (select d1.uniqueID from #Data d1 inner join #Data d2 on d1.ID = d2.ID and (d1.[Time] + d2.[Time] <=3 and d1.[Time] + d2.[Time] >= -3))
Related
Table 1
Loc_Id
Label_Id
Active_Date
Inactive_Date
1
1001
2022/05/13
9999/12/31
2
1001
2018/05/20
2022/05/12
3
1001
2012/06/14
2018/05/12
Table 2
Label_Id
Tab2_Active_Date
Tab2_Inactive_Date
1001
2022/05/13
9999/12/31
1001
2018/05/22
2022/05/12
1001
2012/06/14
2018/05/12
I want to know which records in Table2 have Tab2_Active Date > Active Date in Table 1 and Tab2_Inactive Date < Inactive Date in Table 1.
For example in this the scenario the date Tab2_Active Date 2018/05/22 mentioned in Table 2 is greater than 2018/05/20 mentioned in table 1.
So the o/p will be
Loc_Id
Tab2_Active_Date
Tab2_Inactive_Date
2
2018/05/22
2022/05/12
Since I only have only Ids to join as the keys for 2 tables and I need to compare the dates, I cannot take dates to join the tables which results in inaccurate data.
Create table #T1
(
Loc_Id int,
Label_Id int,
Active_Date date,
Inactive_Date date
)
Create table #T2
(
Label_Id int,
Active_Date date,
Inactive_Date date
)
Insert into #T1
Select 1, 1001, '2022-05-13', '9999-12-31'
union
Select 2, 1001, '2022-05-20', '2022-05-12'
union
Select 3, 1001, '2022-06-14', '2018-05-12'
union
Select 4, 1001, '2022-07-14', '2018-08-13'
Insert into #T2
Select 1001, '2022-05-13', '9999-12-31'
union
Select 1001, '2022-05-22', '2022-05-12'
union
Select 1001, '2022-06-14', '2018-05-12'
union
Select 1001, '2022-06-14', '2018-05-12'
union
Select 1001, '2022-07-14', '2018-08-12'
;with Cte as
(
Select Label_Id, Active_Date, Inactive_Date from #T2
EXCEPT
Select Label_Id, Active_Date, Inactive_Date from #T1
)
Select t1.Loc_Id, t2.Active_Date, t2.Inactive_Date
from #T1 t1
inner join Cte t2 on t1.Label_Id = t2.Label_Id and (t2.Active_Date > t1.Active_Date and t2.Inactive_Date = t1.Inactive_Date)
union
Select t1.Loc_Id, t2.Active_Date, t2.Inactive_Date
from #T1 t1
inner join Cte t2 on t1.Label_Id = t2.Label_Id and (t2.Inactive_Date < t1.Inactive_Date and t2.Active_Date = t1.Active_Date)
Drop table #T1
Drop table #T2
Here's what I came up with
with t1 as (
select * from (values
(1, 1001, '2022-05-13', '9999-12-31'),
(2, 1001, '2018-05-20', '2022-05-12'),
(3, 1001, '2012-06-14', '2018-05-12')
) as x(Loc_ID, Label_Id, Active_Date, Inactive_Date)
),
t2 as (
select * from (values
(1001, '2022-05-13', '9999-12-31'),
(1001, '2018-05-22', '2022-05-12'),
(1001, '2012-06-14', '2018-05-12')
) as x(Label_Id, Active_Date, Inactive_Date)
)
select t1.*, '||', t2.*
from t1
join t2
on t2.Active_Date >= t1.Active_Date
and t2.Inactive_Date <= t1.Inactive_Date
and (
t1.Active_Date <> t2.Active_Date
or t1.Inactive_Date <> t2.Inactive_Date
)
Ignoring the CTEs (that's just a way to get the data into a tabular structure), the join criteria in the SELECT statement say that there must be partial overlap in the interval (which are the first two predicates on only one of active_date or inactive_date) but not complete overlap (which is the compound predicate saying that at least one of active_date or inactive_date must not match).
so I have a statement I believe should work... However it feels pretty suboptimal and I can't for the life of me figure out how to optimise it.
I have the following tables:
Transactions
[Id] is PRIMARY KEY IDENTITY
[Hash] has a UNIQUE constraint
[BlockNumber] has an Index
Transfers
[Id] is PRIMARY KEY IDENTITY
[TransactionId] is a Foreign Key referencing [Transactions].[Id]
TokenPrices
[Id] is PRIMARY KEY IDENTITY
TokenPriceAttempts
[Id] is PRIMARY KEY IDENTITY
[TransferId] is a Foreign Key referencing [Transfers].[Id]
What I want to do, is select all the transfers, with a few bits of data from their related transaction (one transaction to many transfers), where I don't currently have a price stored in TokenPrices related to that transfer.
In the first part of the query, I am getting a list of all the transfers, and calculating the DIFF between the nearest found token price. If one isn't found, this is null (what I ultimately want to select). I am allowing 3 hours either side of the transaction timestamp - if nothing is found within that timespan, it will be null.
Secondly, I am selecting from this set, ensuring first that diff is null as this means the price is missing, and finally that token price attempts either doesn't have an entry for attempting to get a price, or if it does than it has fewer than 5 attempts listed and the last attempt was more than a week ago.
The way I have laid this out results in essentially 3 of the same / similar SELECT statements within the WHERE clause, which feels hugely suboptimal...
How could I improve this approach?
WITH [transferDateDiff] AS
(
SELECT
[t1].[Id],
[t1].[TransactionId],
[t1].[From],
[t1].[To],
[t1].[Value],
[t1].[Type],
[t1].[ContractAddress],
[t1].[TokenId],
[t2].[Hash],
[t2].[Timestamp],
ABS(DATEDIFF(SECOND, [tp].[Timestamp], [t2].[Timestamp])) AS diff
FROM
[dbo].[Transfers] AS [t1]
LEFT JOIN
[dbo].[Transactions] AS [t2]
ON [t1].[TransactionId] = [t2].[Id]
LEFT JOIN
(
SELECT
*
FROM
[dbo].[TokenPrices]
)
AS [tp]
ON [tp].[ContractAddress] = [t1].[ContractAddress]
AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp])
AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp])
WHERE
[t1].[Type] < 2
)
SELECT
[tdd].[Id],
[tdd].[TransactionId],
[tdd].[From],
[tdd].[To],
[tdd].[Value],
[tdd].[Type],
[tdd].[ContractAddress],
[tdd].[TokenId],
[tdd].[Hash],
[tdd].[Timestamp]
FROM
[transferDateDiff] AS tdd
WHERE
[tdd].[diff] IS NULL AND
(
(
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
)
= 0 OR
(
(
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
)
< 5 AND
(
DATEDIFF(DAY,
(
SELECT
MAX([tpa].[Created])
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
),
CURRENT_TIMESTAMP
) >= 7
)
)
)
Here is an attempt to help simplify. I stripped out all the [brackets] that really are not required unless you are running into something like a reserved keyword, or columns with spaces in their name (bad to begin with).
Anyhow, your main query had 3 instances of a select per ID. To eliminate that, I did a LEFT JOIN to a subquery that pulls all transfers of type < 2 AND JOINS to the price attempts ONCE. This way, the result will have already pre-aggregated the count(*) and Max(Created) done ONCE for the same basis of transfers in question with your WITH CTE declaration. So you dont have to keep running the 3 queries each time, and you dont have to query the entire table of ALL transfers, just those with same underlying type < 2 condition. The result subquery alias "PQ" (preQuery)
This now simplifies the readability of the outer WHERE clause from the redundant counts per Id.
WITH transferDateDiff AS
(
SELECT
t1.Id,
t1.TransactionId,
t1.From,
t1.To,
t1.Value,
t1.Type,
t1.ContractAddress,
t1.TokenId,
t2.Hash,
t2.Timestamp,
ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) AS diff
FROM
dbo.Transfers t1
LEFT JOIN dbo.Transactions t2
ON t1.TransactionId = t2.Id
LEFT JOIN dbo.TokenPrices tp
ON t1.ContractAddress = tp.ContractAddress
AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
WHERE
t1.Type < 2
)
SELECT
tdd.Id,
tdd.TransactionId,
tdd.From,
tdd.To,
tdd.Value,
tdd.Type,
tdd.ContractAddress,
tdd.TokenId,
tdd.Hash,
tdd.Timestamp
FROM
transferDateDiff tdd
LEFT JOIN
( SELECT
t1.Id,
COUNT(*) Attempts,
MAX(tpa.Created) MaxCreated
FROM
dbo.Transfers t1
JOIN dbo.TokenPriceAttempts tpa
on t1.Id = tpa.TransferId
WHERE
t1.Type < 2
GROUP BY
t1.Id ) PQ
on tdd.Id = PQ.Id
WHERE
tdd.diff IS NULL
AND ( PQ.Attempts IS NULL
OR PQ.Attempts = 0
OR ( PQ.Attempts < 5
AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7
)
)
REVISED to remove the WITH CTE into a single query
SELECT
t1.Id,
t1.TransactionId,
t1.From,
t1.To,
t1.Value,
t1.Type,
t1.ContractAddress,
t1.TokenId,
t2.Hash,
t2.Timestamp
FROM
-- Now, this pre-query is left-joined to token price attempts
-- so ALL Transfers of type < 2 are considered
( SELECT
t1.Id,
coalesce( COUNT(*), 0 ) Attempts,
MAX(tpa.Created) MaxCreated
FROM
dbo.Transfers t1
LEFT JOIN dbo.TokenPriceAttempts tpa
on t1.Id = tpa.TransferId
WHERE
t1.Type < 2
GROUP BY
t1.Id ) PQ
-- Now, we can just directly join to transfers for the rest
JOIN dbo.Transfers t1
on PQ.Id = t1.Id
-- and the rest from the WITH CTE construct
LEFT JOIN dbo.Transactions t2
ON t1.TransactionId = t2.Id
LEFT JOIN dbo.TokenPrices tp
ON t1.ContractAddress = tp.ContractAddress
AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
WHERE
ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) IS NULL
AND ( PQ.Attempts = 0
OR ( PQ.Attempts < 5
AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7 )
)
I don't understand why you are doing this:
LEFT JOIN
(
SELECT
*
FROM
[dbo].[TokenPrices]
)
AS [tp]
ON [tp].[ContractAddress] = [t1].[ContractAddress]
AND [tp].[Timestamp] >= DATEADD(HOUR, - 3, [t2].[Timestamp])
AND [tp].[Timestamp] <= DATEADD(HOUR, 3, [t2].[Timestamp])
Isn't this a
LEFT JOIN [dbo].[TokenPrices] as TP ...
This:
SELECT
COUNT(*)
FROM
[dbo].[TokenPriceAttempts] tpa
WHERE
[tpa].[TransferId] = [tdd].[Id]
could be another CTE instead of being a sub...
In fact any of your sub queries could be CTE's, that's part of a CTE is making things easier to read.
,TPA
AS
(
SELECT COUNT(*)
FROM [dbo].[TokenPriceAttempts] tpa
WHERE [tpa].[TransferId] = [tdd].[Id]
)
I have two tables, Table 1 and Table 2. Table 1 have columns "start" and "end" . Table 2 has column "position" and "Sequence". I would like to extract the sequences from Table 2 from position = start to position = end and the create a new column with the concatenated string.
Table 1
Start
End
100
104
105
109
Table 2
Position
Seq
100
A
101
T
102
C
103
T
104
G
105
T
106
T
107
G
108
T
109
G
My final result needs to be
Start
End
Sequence
100
104
ATCTG
105
109
TTGTG
I tried concatenating the values in the Table 2 using the below statement
SELECT Sequence = (Select '' + Seq
from Table2
where Position >= 100 and Position <= 104
order by Position FOR XML PATH('')
)
You don't state what DBMS you are using so here is a SQL Server solution using a CTE and FOR XML to perform the transpose:
; WITH SequenceCTE AS
(
SELECT [Start],
[End],
Seq
FROM Table1 a
JOIN Table2 b
ON b.Position >= a.[Start] AND
b.Position <= a.[End]
)
SELECT DISTINCT
a.[Start],
a.[End],
(
SELECT STUFF(',' + Seq,1,1,'')
FROM SequenceCTE b
WHERE a.[Start] = b.[Start] AND
a.[End] = b.[end]
FOR XML PATH ('')
)
FROM SequenceCTE a
In standard SQL, you can do something like this:
select t1.start, t1.end,
listagg(t2.position, '') within group (order by t2.seq) as sequence
from table1 t1 join
table2 t2
on t2.position between t1.start and t2.end
group by t1.start, t1.end;
Most databases support aggregate string concatenation, but the function may have a different name and slightly different syntax.
Note that start and end are poor names for columns because they are SQL keywords -- as is sequence in most databases.
You can generate row numbers for your first table which can later be used to group the ranges after joining on those numbers:
with to_id as (select row_number(*) over (order by t1.start) id, t1.* from table1 t1),
ranges as (select t3.id, t2.* from table2 t2 join to_id t3 on t3.start <= t2.position and t2.position <= t3.end)
select t3.start, t3.end, group_concat(r1.seq, '') from ranges r1 join to_id t3 on r1.id = t3.id group by r1.id;
Look into how crosstab queries are done.
I need to reference a field from the outer query in a derived table . The issue is that I need to limit the max date that is fetched from the derived table using a value from the outer table (A in this case) because the outer table is a temp work table that is populated with specific values by a process.
Below approach is incorrect as it cannot reference the outer table correctly. Is there a better way to write this?
Below is the example of how I want it to work:
SELECT A.EMP, X.SCHEDULE, A.DATE FROM CUST A, (SELECT T1.EMP, T1.NAME, CASE WHEN T1.USER1 = '3' THEN T2.USER4 ELSE T1.USER4 END AS Schedule
FROM TEMP1 T1, TEMP2 T2, TEMP3 T3
WHERE T1.EMP = T3.EMP
AND T2.VALUE= T3.VALUE
AND T1.DATE = (SELECT MAX(T1A.DATE) FROM TEMP1 T1A
WHERE T1A.EMP = T1.EMP
AND T1A.DATE <= A.DATE)
AND T2.DATE = (SELECT MAX(T2A.DATE) FROM TEMP2 T2A
WHERE T2A.VALUE= T2.VALUE
AND T2A.DATE <= A.DATE)
AND T3.DATE = (SELECT MAX(T3A.DATE) FROM TEMP3 T3A
WHERE T3A.EMP = T3.EMP
AND T3A.VALUE = T3.VALUE
AND T3A.DATE <= A.DATE)) X
WHERE A.EMP = X.EMP
AND X.EMP IN ('1','2');
Below is some sample data and results:
TABLE CUST
EMP DATE VALUE
1 1/1/17 R
2 2/1/17 R
TABLE TEMP1
EMP DATE USER1 USER4
1 3/2/16 3 4
1 5/1/17 3 3
2 2/1/17 9 2
TABLE TEMP2
DATE VALUE USER4
1/1/01 S 100
1/1/03 P 200
1/3/07 R 300
8/1/17 R 350
TABLE TEMP3
EMP DATE VALUE
1 3/2/16 R
1 5/1/17 R
2 2/1/17 R
The sample output should be:
EMP SCHEDULE DATE
1 300 1/1/17
2 2 2/1/17
I tried with your sample data and i got the output. I have rewritten your query and got the output expected.
SELECT A.emp, A.tdate,
CASE WHEN T1.USER1 = '3' THEN T2.USER4 ELSE T1.USER4 END AS Schedule
FROM CUST A, Temp1 t1, temp2 t2, temp3 t3
WHERE A.emp=t1.emp
AND A.tvalue = t2.tvalue
AND A.emp = t3.emp
AND A.tvalue = t3.tvalue
AND t1.tdate <=(SELECT max(tdate) from temp1 t where tdate<=A.tdate and t.emp=A.emp)
AND t2.tdate <= (SELECT max(tdate) from temp2 t where tdate <= A.tdate and A.tvalue = t.tvalue)
AND t3.tdate <=(SELECT max(tdate) from temp3 t where tdate <= A.tdate and A.tvalue = t.tvalue and t.emp=A.emp)
If you are going to have only one date in temp tables that is less than the date in cust table, then use the below query
SELECT A.emp, A.tdate,
CASE WHEN T1.USER1 = '3' THEN T2.USER4 ELSE T1.USER4 END AS Schedule
FROM CUST A, Temp1 t1, temp2 t2, temp3 t3
WHERE A.emp=t1.emp
AND A.tvalue = t2.tvalue
AND A.emp = t3.emp
AND A.tvalue = t3.tvalue
AND t1.tdate <=A.tdate
AND t2.tdate <= A.tdate
AND t3.tdate <= A.tdate
NOTE:- Column names like date and value willl throw error while table creation. They are reserved keywords
As "date" is a SQL reserved word (for a data type in Oracle) I would never use that as a column name. Below I have used DATECOL instead.
I think it is much easier to just compare the dates via a simple set of joins.
See this working here at SQL Fiddle
CREATE TABLE CUST
(EMP int, DATECOL date, VALUE varchar2(1))
;
INSERT ALL
INTO CUST (EMP, DATECOL, VALUE)
VALUES (1, to_date('01-Jan-2017','dd-mon-yyyy'), 'R')
INTO CUST (EMP, DATECOL, VALUE)
VALUES (2, to_date('01-Feb-2017','dd-mon-yyyy'), 'R')
SELECT * FROM dual
;
CREATE TABLE TEMP1
(EMP int, DATECOL date, USER1 int, USER4 int)
;
INSERT ALL
INTO TEMP1 (EMP, DATECOL, USER1, USER4)
VALUES (1, to_date('02-Mar-2016','dd-mon-yyyy'), 3, 4)
INTO TEMP1 (EMP, DATECOL, USER1, USER4)
VALUES (1, to_date('01-May-2017','dd-mon-yyyy'), 3, 3)
INTO TEMP1 (EMP, DATECOL, USER1, USER4)
VALUES (2, to_date('01-Feb-2017','dd-mon-yyyy'), 9, 2)
SELECT * FROM dual
;
CREATE TABLE TEMP2
(DATECOL date, VALUE varchar2(1), USER4 int)
;
INSERT ALL
INTO TEMP2 (DATECOL, VALUE, USER4)
VALUES (to_date('01-Jan-2001','dd-mon-yyyy'), 'S', 100)
INTO TEMP2 (DATECOL, VALUE, USER4)
VALUES (to_date('01-Jan-2003','dd-mon-yyyy'), 'P', 200)
INTO TEMP2 (DATECOL, VALUE, USER4)
VALUES (to_date('03-Jan-2007','dd-mon-yyyy'), 'R', 300)
INTO TEMP2 (DATECOL, VALUE, USER4)
VALUES (to_date('01-Aug-2017','dd-mon-yyyy'), 'R', 350)
SELECT * FROM dual
;
CREATE TABLE TEMP3
(EMP int, DATECOL date, VALUE varchar2(1))
;
INSERT ALL
INTO TEMP3 (EMP, DATECOL, VALUE)
VALUES (1, to_date('02-Mar-2016','dd-mon-yyyy'), 'R')
INTO TEMP3 (EMP, DATECOL, VALUE)
VALUES (1, to_date('01-May-2017','dd-mon-yyyy'), 'R')
INTO TEMP3 (EMP, DATECOL, VALUE)
VALUES (2, to_date('01-Feb-2017','dd-mon-yyyy'), 'R')
SELECT * FROM dual
;
Query 1:
SELECT
T1.EMP
--, T1.NAME
, CASE WHEN T1.USER1 = '3'
THEN
T2.USER4 ELSE
T1.USER4
END AS Schedule
, c.datecol
, t1.datecol t1date
, t2.datecol t2date
, t3.datecol t3date
FROM TEMP1 T1
INNER JOIN cust c ON T1.EMP = c.EMP
INNER JOIN TEMP3 T3 ON T1.EMP = T3.EMP
INNER JOIN TEMP2 T2 ON T3.VALUE = T2.VALUE
WHERE t1.datecol <= c.datecol
AND t2.datecol <= c.datecol
AND t3.datecol <= c.datecol
Results:
| EMP | SCHEDULE | DATECOL | T1DATE | T2DATE | T3DATE |
|-----|----------|----------------------|----------------------|----------------------|----------------------|
| 1 | 300 | 2017-01-01T00:00:00Z | 2016-03-02T00:00:00Z | 2007-01-03T00:00:00Z | 2016-03-02T00:00:00Z |
| 2 | 2 | 2017-02-01T00:00:00Z | 2017-02-01T00:00:00Z | 2007-01-03T00:00:00Z | 2017-02-01T00:00:00Z |
Also, over 25 years ago SQL standardized on a way better method of joining tables together. This simple trick to remember is stop using commas between table names in the FROM clause. This helps to ensure explicit join syntax is adopted.
Suppose that I have the following table:
ID: STR:
01 abc
02 abcdef
03 abx
04 abxy
05 abxyz
06 abxyv
I need to use an SQL query that returns the ID column and the occurrences of the corresponding string as a prefix for string in other rows. E.g. The desired result for the table above:
ID: OCC:
01 2
02 1
03 4
04 3
05 1
06 1
You could JOIN the table with itself and GROUP BY the ID to get you the result.
SELECT t1.ID, COUNT(*)
FROM ATable t1
INNER JOIN ATable t2 ON t2.Str LIKE t1.Str + '%'
GROUP BY
t1.ID
Some notes:
You want to make sure you have an index on the Str column
Depending on the amount of data, your DBMS might choke on the amount it has to handle. Worst case, you are asking for the SQR(#Rows) in your table.
Can can do that in SQL Server's T-SQL with the following code. Caution: I do not guarantee how this will perform though with a large dataset!
Declare #Table table
(
Id int,
String varchar(10)
)
Insert Into #Table
( Id, String )
Values ( 1, 'abc' ),
( 2, 'abcdef' ),
( 3, 'abx' ),
( 4, 'abxy' ),
( 5, 'abxyz' ),
( 6, 'abxyv' )
Select t.Id,
t.String
From #Table as t
Inner Join #Table as t2 On t2.String Like t.String + '%'
Order By t.Id
Select t.Id,
Count(*) As 'Count'
From #Table as t
Inner Join #Table as t2 On t2.String Like t.String + '%'
Group By t.Id
Order By t.Id