How to concatenate strings in Teradata - sql

I use Teradata 14, with STRTOK and the other new functions, but I am not allowed to write my own functions.
In the following table, each person has many skills. How can I concatenate those skills?
team person
1 Mike (swi)
1 Nick (dri)
1 Mike (coo)
2
3 Kate (swi)
3 Kate (coo)
3 Kate (dri)
3 Wend (fly)
4 Pete (jum)
The desired table is:
team person
1 Mike (swi coo), Nick (dri),
2
3 Kate (swi coo dri), Wend (fly),
4 Pete (jum),
How can I concatenate the strings?

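Without UDFs, you can do this with recursive queries. The query below aggregates the skills per person; use a similar approach to get the end result per team. Because the table has no column that records the original order of the skills, they are ordered alphabetically here (so Mike gets "coo swi").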
CREATE Volatile Table TempTable1
as
(
SELECT
team
,trim(substr(person,1,Index(person,'(')-1)) as name   -- 'Mike (swi)' -> 'Mike'
,substr(person,Index(person,'(')+1,3) as skill           -- 'Mike (swi)' -> 'swi'
,Row_Number() Over(Partition by team,name order by skill) as rnk
from
MainTable)
WITH DATA
Primary Index(team,name)
ON COMMIT Preserve Rows;
CREATE VOLATILE TABLE temp_table2 (team,name)
as
(WITH RECURSIVE temp_table3 (team,name,skill,rnk,lev)
AS
(
-- seed: the first skill of every person
SELECT team,name,cast(skill as varchar(1000)),rnk,1 as lev
from TempTable1
where rnk = 1
UNION ALL
-- recursion: append the next skill after the list accumulated so far
SELECT t1.team,t1.name,t2.skill||' '||t1.skill,t1.rnk,t2.lev+1
FROM
TempTable1 t1
Inner join
temp_table3 t2
on t1.team = t2.team
AND t1.name = t2.name
AND t1.rnk = t2.rnk + 1
)
-- keep only the longest list per person, e.g. 'Mike (coo swi)'
SELECT team,name||' ('||skill||')' as new_name
from temp_table3
qualify rank() over (partition by team,name order by lev desc) = 1)
WITH DATA
ON COMMIT PRESERVE ROWS;
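For illustration, here is a minimal runnable sketch of the whole two-step idea (skills per person, then persons per team) using Python's built-in sqlite3 module as a stand-in for Teradata; the function name concat_skills is hypothetical. Assumptions: SQLite's instr/substr replace Teradata's INDEX/SUBSTR, skills are kept in insertion order (rowid) rather than sorted, the trailing comma of the desired output is not produced, and the bundled SQLite is 3.25 or later (needed for window functions).

```python
import sqlite3

# Sketch only: SQLite stands in for Teradata 14; the recursive-CTE technique is the same.
SQL = """
WITH RECURSIVE
base AS (            -- split 'Mike (swi)' into name = 'Mike', skill = 'swi'
    SELECT rowid AS rid, team,
           trim(substr(person, 1, instr(person, '(') - 1)) AS name,
           substr(person, instr(person, '(') + 1, 3) AS skill
    FROM MainTable
    WHERE instr(person, '(') > 0
),
ranked AS (          -- number each person's skills in insertion order
    SELECT rid, team, name, skill,
           row_number() OVER (PARTITION BY team, name ORDER BY rid) AS rnk
    FROM base
),
person_rec(team, name, skills, rnk, first_rid) AS (
    SELECT team, name, skill, rnk, rid FROM ranked WHERE rnk = 1
    UNION ALL        -- step 1: append the next skill to the person's string
    SELECT r.team, r.name, p.skills || ' ' || r.skill, r.rnk, p.first_rid
    FROM ranked r
    JOIN person_rec p ON r.team = p.team AND r.name = p.name AND r.rnk = p.rnk + 1
),
per_person AS (      -- keep the longest chain, then number persons per team
    SELECT p.team, p.name || ' (' || p.skills || ')' AS entry,
           row_number() OVER (PARTITION BY p.team ORDER BY p.first_rid) AS pos
    FROM person_rec p
    WHERE p.rnk = (SELECT max(x.rnk) FROM ranked x
                   WHERE x.team = p.team AND x.name = p.name)
),
team_rec(team, people, pos) AS (
    SELECT team, entry, pos FROM per_person WHERE pos = 1
    UNION ALL        -- step 2: append the next person's entry to the team's string
    SELECT pp.team, t.people || ', ' || pp.entry, pp.pos
    FROM per_person pp
    JOIN team_rec t ON pp.team = t.team AND pp.pos = t.pos + 1
),
team_final AS (      -- keep the longest chain per team
    SELECT t.team, t.people FROM team_rec t
    WHERE t.pos = (SELECT max(y.pos) FROM per_person y WHERE y.team = t.team)
)
SELECT d.team, coalesce(f.people, '') AS people   -- teams without skills get ''
FROM (SELECT DISTINCT team FROM MainTable) d
LEFT JOIN team_final f ON f.team = d.team
ORDER BY d.team
"""


def concat_skills(rows):
    """rows: (team, person) tuples such as (1, 'Mike (swi)'); person may be None."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MainTable (team INTEGER, person TEXT)")
    con.executemany("INSERT INTO MainTable (team, person) VALUES (?, ?)", rows)
    return con.execute(SQL).fetchall()
```

The second recursive CTE (team_rec) is the "similar approach" applied one level up: it walks the per-person strings of each team exactly as the first one walks the skills of each person.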

Related

deleting specific duplicate and original entries in a table based on date

I have a table called "main" which has 4 columns: ID, name, DateID and Sign.
I want to create a query that deletes entries from this table if the same ID appears twice within a certain DateID range.
I have a WHERE clause that searches the previous 3 weeks:
where DateID =((SELECT MAX( DateID)
WHERE DateID < ( SELECT MAX( DateID )-3))
E.g. the dataset I'm working with:
id     name   DateID  sign
12345  Paul   1915    Up
23658  Danny  1915    Down
37868  Jake   1916    Up
37542  Elle   1917    Up
12345  Paul   1917    Down
87456  John   1918    Up
78563  Luke   1919    Up
23658  Danny  1920    Up
In the case above, both entries for ID 12345 would need to be removed.
However, the entries for ID 23658 would need to be kept, as their DateIDs are more than 3 apart.
How would this be possible?
You can use window functions for this.
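The requirement is not entirely clear, but LAG and a conditional COUNT seem to fit what you need: LAG fetches the previous DateID for the same id, and the windowed COUNT then flags every row of an id that has at least one entry within 3 DateIDs of its previous entry.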
DELETE t
FROM (
SELECT *,
CountWithinDate = COUNT(CASE WHEN t.PrevDate >= t.DateId - 3 THEN 1 END) OVER (PARTITION BY t.id)
FROM (
SELECT *,
PrevDate = LAG(t.DateID) OVER (PARTITION BY t.id ORDER BY t.DateID)
FROM YourTable t
) t
) t
WHERE CountWithinDate > 0;
db<>fiddle
Note that you do not need to re-join the table, you can delete directly from the t derived table.
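A minimal sketch of the same LAG idea in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). SQLite cannot delete through a derived table, so the sketch deletes by id instead; like the windowed COUNT above, this removes every row of an id once any two consecutive entries are within the gap. The table name YourTable and the parameter gap are assumptions.

```python
import sqlite3

# Sketch only: the LAG() test runs in a subquery that returns the ids to delete.
SQL_DELETE = """
DELETE FROM YourTable
WHERE id IN (
    SELECT id
    FROM (
        SELECT id, DateID,
               LAG(DateID) OVER (PARTITION BY id ORDER BY DateID) AS PrevDate
        FROM YourTable
    )
    WHERE PrevDate >= DateID - :gap   -- previous entry of this id is close enough
)
"""


def delete_close_duplicates(rows, gap=3):
    """rows: (id, name, DateID, Sign) tuples; returns the rows that survive."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (id INTEGER, name TEXT, DateID INTEGER, Sign TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    con.execute(SQL_DELETE, {"gap": gap})
    return con.execute(
        "SELECT id, name, DateID, Sign FROM YourTable ORDER BY DateID, id"
    ).fetchall()
```

The first row of each id has no previous DateID, so LAG returns NULL and the comparison never flags it on its own.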
Hope this works:
DELETE FROM test_tbl
WHERE id IN (
SELECT T1.id
FROM test_tbl T1
WHERE EXISTS (SELECT 1 FROM test_tbl T2 WHERE T1.id = T2.id AND ABS(T2.dateid - T1.dateid) < 3 AND T1.dateid <> T2.dateid)
)
In case you need more logic for data processing, I would suggest using a stored procedure.

SQL query: how to display rows based on the name

I have the table data listed below:
name | score
andy | 1
leon | 2
aaron | 3
I want to list them as below: even though there is no data for jacky, his name should be listed with the score set to 0.
aaron 3
andy 1
jacky 0
leon 2
You didn't specify your DBMS, but the following is 100% standard ANSI SQL:
select v.name, coalesce(t.score, 0) as score
from (
values ('andy'),('leon'),('aaron'),('jacky')
) as v(name)
left join your_table t on t.name = v.name;
The VALUES clause builds a "virtual table" that contains the names you are interested in. This is then used in a LEFT JOIN, so that all names from the virtual table are returned, plus the existing scores from your (unnamed) table. For non-existing scores NULL is returned, which is turned into 0 using coalesce().
If you only want to specify the missing names, you can use a UNION in the virtual table:
select v.name, coalesce(t.score, 0) as score
from (
select t1.name
from your_table t1
union
select *
from ( values ('jacky')) as x
) as v(name)
left join your_table t on t.name = v.name;
I fixed the query and can list the data, but jacky is still missing; I only get the result shown below. The DBMS is SQL Server 2008.
data
name score scoredate
andy 1 2021-08-10 01:23:16
leon 2 2021-08-10 03:25:16
aaron 3 2021-08-10 06:25:16
andy 4 2021-08-10 11:25:16
leon 5 2021-08-10 13:25:16
result set
name | score
aaron | 3
andy | 5
leon | 7
select v.name as Name,
coalesce(sum(t.score),0) as Score
from (
values ('aaron'), ('andy'), ('jacky'), ('leon')
) as v(name)
left join Score t on t.name=v.name
where scoredate>='2021-08-10 00:00:00'
and scoredate<='2021-08-10 23:59:59'
group by v.name
order by v.name asc
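The likely reason jacky is still missing: the WHERE conditions on scoredate reject the rows where the LEFT JOIN found no match (scoredate is NULL there), which effectively turns it into an inner join. Moving the date filter into the ON clause keeps those rows. Below is a sketch in SQLite via Python's sqlite3; a CTE with a column list plays the role of the VALUES derived table, the table name Score follows the query above, and the half-open upper bound is a stylistic choice. The daily_scores name is hypothetical.

```python
import sqlite3

SQL = """
WITH v(name) AS (VALUES ('aaron'), ('andy'), ('jacky'), ('leon'))
SELECT v.name AS Name,
       coalesce(sum(t.score), 0) AS Score
FROM v
LEFT JOIN Score t
       ON t.name = v.name
      -- the date filter lives in ON, so names without matching rows survive
      AND t.scoredate >= '2021-08-10 00:00:00'
      AND t.scoredate <  '2021-08-11 00:00:00'
GROUP BY v.name
ORDER BY v.name
"""


def daily_scores(rows):
    """rows: (name, score, scoredate) tuples; returns (Name, Score) per listed name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Score (name TEXT, score INTEGER, scoredate TEXT)")
    con.executemany("INSERT INTO Score VALUES (?, ?, ?)", rows)
    return con.execute(SQL).fetchall()
```

The same change (date conditions moved from WHERE into ON) applies unchanged to the SQL Server query.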
Your question lacks a lot of information, such as where Jacky's name comes from. If you have a list of names that you know are not in the table, just use UNION ALL:
select name, score
from t
union all
select 'Jacky', 0;

Fuzzy lookup in SQL to match names

I am stuck at a problem where I need to populate historical data using Fuzzy match. I'm using SQL Server 2014 Developer Edition
MainTbl.UNDERWRITER_CODE is where data needs to be populated in place of NULL. This data needs to come from the LKP table. The matching criterion is MainTbl.UNDERWRITER_NAME against LKP.UNDERWRITER_NAME.
Sample:
CREATE TABLE MainTbl(UNDERWRITER_CODE int, UNDERWRITER_NAME varchar(100))
INSERT INTO MainTbl VALUES
(NULL,'dylan.campbell'),
(NULL,'dylanadmin'),
(NULL,'dylanc'),
(002,'Dylan Campbell'),
(002,'dylan.campbell'),
(002,'dylanadmin'),
(NULL,'scott.noffsinger'),
(001,'Scott Noffsinger')
CREATE TABLE LKP(UNDERWRITER_CODE int, UNDERWRITER_NAME varchar(100))
INSERT INTO LKP VALUES
(002,'Dylan Campbell'),
(001,'Scott Noffsinger')
expected output:
2 dylan.campbell
2 dylanadmin
2 dylanc
2 Dylan Campbell
2 dylan.campbell
2 dylanadmin
1 scott.noffsinger
1 Scott Noffsinger
SQL is not really designed for such fuzzy string comparisons. However, SQL Server has a function called difference(), which works for your data:
select mt.*, l.*
from maintbl mt outer apply
(select top (1) lkp.*
from lkp
order by difference(mt.underwriter_name, lkp.underwriter_name) desc
) l;
Here is a db<>fiddle.
UPDATE T1 SET T1.UNDERWRITER_CODE = T2.UNDERWRITER_CODE
FROM MainTbl T1
INNER JOIN LKP T2
ON T1.UNDERWRITER_NAME LIKE CONCAT('%', LEFT( LOWER(T2.UNDERWRITER_NAME)
,CHARINDEX(' '
,LOWER(T2.UNDERWRITER_NAME)
) - 1
)
, '%'
)
Output
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=23a3a55cc1ab1741f6e70dd210db0471
Explanation
Step 1:
SELECT *
,CONCAT('%', LEFT( LOWER(T2.UNDERWRITER_NAME)
,CHARINDEX(' '
,LOWER(T2.UNDERWRITER_NAME)
) - 1
)
, '%'
) AS JOIN_COL
FROM LKP T2
Output of above Query
UNDERWRITER_CODE UNDERWRITER_NAME JOIN_COL
2 Dylan Campbell %dylan%
1 Scott Noffsinger %scott%
The JOIN_COL pattern above is then used in the join condition with the LIKE operator.
Step 2:
SELECT T2.UNDERWRITER_CODE,T1.UNDERWRITER_NAME
FROM MainTbl T1
INNER JOIN LKP T2
ON T1.UNDERWRITER_NAME LIKE CONCAT('%', LEFT( LOWER(T2.UNDERWRITER_NAME)
,CHARINDEX(' '
,LOWER(T2.UNDERWRITER_NAME)
) - 1
)
, '%'
)
Output of above query:
UNDERWRITER_CODE UNDERWRITER_NAME
2 dylan.campbell
2 dylanadmin
2 dylanc
2 Dylan Campbell
2 dylan.campbell
2 dylanadmin
1 scott.noffsinger
1 Scott Noffsinger
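A runnable sketch of the same first-word LIKE match in SQLite via Python's sqlite3. Assumptions: it only fills rows whose code is NULL, it skips LKP names that contain no space (in the T-SQL version CHARINDEX returns 0 for such a name and LEFT(..., -1) raises an error), and SQLite's LIKE, which is case-insensitive for ASCII, stands in for a case-insensitive collation. The function name fill_codes_by_first_name is hypothetical.

```python
import sqlite3

# Sketch only: SQLite stand-in for the T-SQL UPDATE ... LIKE CONCAT(...) answer.
SQL_UPDATE = """
UPDATE MainTbl
SET UNDERWRITER_CODE = (
    SELECT l.UNDERWRITER_CODE
    FROM LKP l
    WHERE instr(l.UNDERWRITER_NAME, ' ') > 0          -- skip names without a space
      AND MainTbl.UNDERWRITER_NAME LIKE
          '%' || lower(substr(l.UNDERWRITER_NAME, 1, instr(l.UNDERWRITER_NAME, ' ') - 1)) || '%'
    ORDER BY l.UNDERWRITER_CODE                        -- deterministic pick if several match
    LIMIT 1
)
WHERE UNDERWRITER_CODE IS NULL
"""


def fill_codes_by_first_name(main_rows, lkp_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MainTbl (UNDERWRITER_CODE INTEGER, UNDERWRITER_NAME TEXT)")
    con.execute("CREATE TABLE LKP (UNDERWRITER_CODE INTEGER, UNDERWRITER_NAME TEXT)")
    con.executemany("INSERT INTO MainTbl VALUES (?, ?)", main_rows)
    con.executemany("INSERT INTO LKP VALUES (?, ?)", lkp_rows)
    con.execute(SQL_UPDATE)
    return con.execute(
        "SELECT UNDERWRITER_CODE, UNDERWRITER_NAME FROM MainTbl ORDER BY rowid"
    ).fetchall()
```

Matching on the first word only is cheap but brittle: two underwriters sharing a first name would collide, which is where a similarity measure helps.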
First, fuzzy lookup is a little vague. There are a number of algorithms that are used for fuzzy matching including the Levenshtein Distance, Longest Common Subsequence, and some others referenced in the "See Also" section of this Wikipedia page about Approximate String Matching.
To rephrase what you are attempting to do: you are updating the UNDERWRITER_CODE column in MainTbl with the UNDERWRITER_CODE that matches the most similar UNDERWRITER_NAME in LKP. Fuzzy algorithms can be used for measuring similarity. Note my post here. For the sample data you provided, we can use Phil Factor's T-SQL Levenshtein functions and match based on the lowest Levenshtein value like so:
SELECT TOP (1) WITH TIES
UNDERWRITER_CODE_NULL = m.UNDERWRITER_CODE,
LKP_UN = m.UNDERWRITER_NAME, l.UNDERWRITER_NAME, l.UNDERWRITER_CODE,
MinLev = dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME)
FROM dbo.MainTbl AS m
CROSS JOIN dbo.LKP AS l
WHERE m.UNDERWRITER_CODE IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY m.UNDERWRITER_NAME
ORDER BY dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME))
Returns:
UNDERWRITER_CODE_NULL LKP_UN UNDERWRITER_NAME UNDERWRITER_CODE MinLev
--------------------- ------------------ ------------------ ---------------- -----------
NULL dylan.campbell Dylan Campbell 2 1
NULL dylanadmin Dylan Campbell 2 8
NULL dylanc Dylan Campbell 2 8
NULL scott.noffsinger Scott Noffsinger 1 1
We can use this logic to update UNDERWRITER_CODE like so:
WITH FuzzyCompare AS
(
SELECT TOP (1) WITH TIES
UNDERWRITER_CODE_NULL = m.UNDERWRITER_CODE,
LKP_UN = m.UNDERWRITER_NAME, l.UNDERWRITER_NAME, l.UNDERWRITER_CODE,
MinLev = dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME)
FROM dbo.MainTbl AS m
CROSS JOIN dbo.LKP AS l
WHERE m.UNDERWRITER_CODE IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY m.UNDERWRITER_NAME
ORDER BY dbo.LEVENSHTEIN(m.UNDERWRITER_NAME, l.UNDERWRITER_NAME))
)
UPDATE fc
SET fc.UNDERWRITER_CODE_NULL = fc.UNDERWRITER_CODE
FROM FuzzyCompare AS fc
JOIN dbo.MainTbl AS m ON fc.UNDERWRITER_NAME = m.UNDERWRITER_NAME;
After this update, SELECT * FROM dbo.MainTbl returns:
UNDERWRITER_CODE UNDERWRITER_NAME
---------------- -------------------
2 dylan.campbell
2 dylanadmin
2 dylanc
2 Dylan Campbell
2 dylan.campbell
2 dylanadmin
1 scott.noffsinger
1 Scott Noffsinger
This should get you started; depending on the amount and kind of data you are dealing with, you will need to be very selective about which algorithms you use. Do your homework and test, test, test!
Let me know if you have questions.

SQL - Returning unique row based on criteria and a priority

I have a data table that looks in practice like this:
Team Shirt Number Name
1 1 Seaman
1 13 Lucas
2 1 Bosnic
2 14 Schmidt
2 23 Woods
3 13 Tubilandu
3 14 Lev
3 15 Martin
I want to remove duplicates of team by the following logic: if there is a "1" shirt number, use that. If not, look for a 13. If not, look for a 14, and then any other number.
I realise it is probably quite basic but I don't seem to be making any progress with case statements. I know it's something with sub-queries and case statements but I'm struggling and any help gratefully received!
I'm using SSMS (SQL Server).
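Since you didn't specify a DBMS, I'll assume ROW_NUMBER() is available (it is in SQL Server). Rank the rows of each team by priority and delete everything except the first:
WITH t AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Team
ORDER BY (CASE WHEN Shirt_Number = 1
THEN 1
WHEN Shirt_Number = 13
THEN 2
WHEN Shirt_Number = 14
THEN 3
ELSE 4
END), Shirt_Number
) AS Seq
FROM YourTable   -- replace with your table name
)
DELETE FROM t
WHERE Seq > 1;   -- keeps the highest-priority row (Seq = 1) of each team
This assumes other shirt numbers (for example 2 to 12) can occur; if they cannot, ordering by Shirt_Number alone is enough.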
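The same priority logic as a runnable SQLite sketch via Python's sqlite3 (SQLite 3.25 or later for ROW_NUMBER). SQLite cannot DELETE through a CTE, so the rows to drop are found by rowid instead. The table name YourTable is a placeholder, and Shirt_Number is added as a tie-breaker so that the "then any" case keeps the lowest remaining number (an assumption; any deterministic rule works). The function name keep_priority_shirt is hypothetical.

```python
import sqlite3

# Sketch only: rank each team's rows by priority and delete every row ranked after 1.
SQL_DELETE = """
DELETE FROM YourTable
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               row_number() OVER (
                   PARTITION BY Team
                   ORDER BY CASE Shirt_Number WHEN 1 THEN 1
                                              WHEN 13 THEN 2
                                              WHEN 14 THEN 3
                                              ELSE 4 END,
                            Shirt_Number       -- tie-break for the "any" case
               ) AS Seq
        FROM YourTable
    )
    WHERE Seq > 1
)
"""


def keep_priority_shirt(rows):
    """rows: (Team, Shirt_Number, Name) tuples; returns the surviving row per team."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Team INTEGER, Shirt_Number INTEGER, Name TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    con.execute(SQL_DELETE)
    return con.execute(
        "SELECT Team, Shirt_Number, Name FROM YourTable ORDER BY Team"
    ).fetchall()
```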
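I think you are looking for the PARTITION BY clause. The solution below works in SQL Server. Note that because of the WHERE filter, a team that has none of the shirt numbers 1, 13 and 14 is left out entirely.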
create table #eray
(team int, shirtnumber int, name varchar(200))
insert into #eray values
(1, 1, 'Seaman'),
(1, 13, 'Lucas'),
(2, 1, 'Bosnic'),
(2, 14, 'Schmidt')
;with cte as (
Select Team, ShirtNumber, Name,
ROW_NUMBER() OVER (PARTITION BY Team ORDER BY ShirtNumber ASC) AS rn
From #eray
where ShirtNumber in (1,13,14)
)
select * from cte where rn=1
If you have a table of teams, you can use cross apply:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by (case shirt_number when 1 then 1 when 13 then 2 when 14 then 3 else 4 end)
) ts;
If you have no numbers between 2 and 12, you can simplify this to:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by shirt_number
) ts;

Group parents with same children

EDIT: This is way harder to explain than I thought; I'm constantly editing based on comments. Thank you all for taking an interest.
I have a table like this
ID Type ParentID
1 ChildTypeA 1
2 ChildTypeB 1
3 ChildTypeC 1
4 ChildTypeD 1
5 ChildTypeA 2
6 ChildTypeB 2
7 ChildTypeC 2
8 ChildTypeA 3
9 ChildTypeB 3
10 ChildTypeC 3
11 ChildTypeD 3
12 ChildTypeA 4
13 ChildTypeB 4
14 ChildTypeC 4
and I want to group parents that have the same children, meaning the same number of children of the same type.
From the parent's point of view, there is a finite set of possible configurations (at most 10).
If any parent has same set of children (by ChildType), I want to group them together (in what I call a configuration).
ChildTypeA-D = ConfigA
ChildTypeA-C = ConfigB
ChildTypeA, B, E, F = ConfigX
etc.
The output I need is parents grouped by Configurations.
Config Group ParentID
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
I have no idea where to even begin.
I named your table t. Please check whether this is what you are looking for.
It shows both matched and unmatched parents.
It's looking for parentids with the same number of rows (t1.cnt = t2.cnt) and that all the rows are matched (having COUNT(*) = t1.cnt).
You can try it here
;with t1 as (select parentid, type, id, count(*) over (partition by parentid) cnt from t),
t3 as
(
select t1.parentid parentid1, t2.parentid parentid2, count(*) cn, t1.cnt cnt1, t2.cnt cnt2, ROW_NUMBER () over (order by t1.parentid) rn
from t1 join t1 as t2 on t1.type = t2.type and t1.parentid <> t2.parentid and t1.cnt = t2.cnt
group by t1.parentid, t2.parentid, t1.cnt, t2.cnt
having COUNT(*) = t1.cnt
),
notFound as (
select t1.parentid, ROW_NUMBER() over(order by t1.parentid) rn
from t1
where not exists (select 1 from t3 where t1.parentid = t3.parentid1)
group by t1.parentid
)
select 'Config'+char((select min(rn)+64 from t3 as t4 where t3.parentid1 in (t4.parentid1 , t4.parentid2))) config, t3.parentid1
from t3
union all
select 'Config'+char((select max(rn)+64+notFound.rn from t3)) config, notFound.parentid
from notFound
OUTPUT
config parentid1
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
If id 14 was ChildTypeZ then parentid 2 and 4 wouldn't match. This would be the output:
config parentid1
ConfigA 1
ConfigA 3
ConfigC 2
ConfigD 4
I happen to have a similar task. The data I'm working with is on a bigger scale, so I had to find an efficient approach. I've found two working approaches.
One is pure SQL; here's the core query. It gives you the smallest ParentID with the same collection of children, which you can then use as a group ID (you can also enumerate it with row_number). As a small note, I'm using a CTE here, but in the real world I'd suggest putting the grouped parents into a temporary table and adding indexes to it as well.
;with cte_parents as (
-- You can also use different statistics to narrow the search
select
[ParentID],
count(*) as cnt,
min([Type]) as min_Type,
max([Type]) as max_Type
from Table1
group by
[ParentID]
)
select
h1.ParentID,
k.ParentID as GroupID
from cte_parents as h1
outer apply (
select top 1
h2.[ParentID]
from cte_parents as h2
where
h2.cnt = h1.cnt and
h2.min_Type = h1.min_Type and
h2.max_Type = h1.max_Type and
not exists (
select *
from (select tt.[Type] from Table1 as tt where tt.[ParentID] = h2.[ParentID]) as tt1
full join (select tt.[Type] from Table1 as tt where tt.[ParentID] = h1.[ParentID]) as tt2 on
tt2.[Type] = tt1.[Type]
where
tt1.[Type] is null or tt2.[Type] is null
)
order by
h2.[ParentID]
) as k
ParentID GroupID
----------- --------------
1 1
2 2
3 1
4 2
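For a quick experiment outside SQL Server, here is a sketch of the same grouping in SQLite via Python's sqlite3. Instead of the FULL JOIN test it uses a double NOT EXISTS (a swapped-in but equivalent way to check that two parents have the same set of types) plus the child-count check. As in the query above, duplicate types are only compared through the count, so {A, A, B} and {A, B, B} would still be grouped together. The function name group_parents is hypothetical.

```python
import sqlite3

# Sketch only: GroupID = smallest ParentID with the same child count and type set.
SQL = """
WITH p AS (SELECT DISTINCT ParentID FROM Table1)
SELECT h1.ParentID,
       (SELECT min(h2.ParentID)
        FROM p h2
        WHERE (SELECT count(*) FROM Table1 c WHERE c.ParentID = h2.ParentID)
            = (SELECT count(*) FROM Table1 c WHERE c.ParentID = h1.ParentID)
          -- no child type of h1 is missing from h2 ...
          AND NOT EXISTS (SELECT 1 FROM Table1 a
                          WHERE a.ParentID = h1.ParentID
                            AND NOT EXISTS (SELECT 1 FROM Table1 b
                                            WHERE b.ParentID = h2.ParentID
                                              AND b.Type = a.Type))
          -- ... and no child type of h2 is missing from h1
          AND NOT EXISTS (SELECT 1 FROM Table1 b
                          WHERE b.ParentID = h2.ParentID
                            AND NOT EXISTS (SELECT 1 FROM Table1 a
                                            WHERE a.ParentID = h1.ParentID
                                              AND a.Type = b.Type))
       ) AS GroupID
FROM p h1
ORDER BY h1.ParentID
"""


def group_parents(rows):
    """rows: (ID, Type, ParentID) tuples; returns (ParentID, GroupID) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (ID INTEGER, Type TEXT, ParentID INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    return con.execute(SQL).fetchall()
```

Every parent matches itself, so the subquery always returns a GroupID; a parent with a unique configuration simply gets its own ParentID.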
The other one is a bit trickier, and you have to be careful when using it, but surprisingly it works quite well. The idea is to concatenate the children into one big string and then group by these strings. You can use any available concatenation method (the FOR XML PATH trick, a CLR aggregate, or STRING_AGG if you have SQL Server 2017 or later). The important part is that the concatenation must be ordered, so that each string represents its group precisely. I have created a special CLR function (dbo.f_ConcatAsc) for this.
;with cte1 as (
select
ParentID,
dbo.f_ConcatAsc([Type], ',') as group_data
from Table1
group by
ParentID
), cte2 as (
select
dbo.f_ConcatAsc(ParentID, ',') as parent_data,
group_data,
row_number() over(order by group_data) as rn
from cte1
group by
group_data
)
select
cast(p.value as int) as ParentID,
c.rn as GroupID,
c.group_data
from cte2 as c
cross apply string_split(c.parent_data, ',') as p
ParentID GroupID group_data
----------- -------------------- --------------------------------------------------
2 1 ChildTypeA,ChildTypeB,ChildTypeC
4 1 ChildTypeA,ChildTypeB,ChildTypeC
1 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD
3 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD