SQL & SQL Server - Denormalized table, Distinct and Order By Complex Query

I've been working on this lately and couldn't work it out. I have the following table:
ID Language Text
---- -------- --------
1 spanish Hola
1 english Hello
2 spanish Chau
2 english Goodbye
2 french Au revoir
3 english Thank you
I need to get each ID once, with the Spanish text; if there is no Spanish text, I should get the English one, and so on.
So if I run this query I should get:
ID Language Text
---- -------- --------
1 spanish Hola
2 spanish Chau
3 english Thank you
I cannot use
Select ID, Language, Text From table Where Language = 'spanish'
because when there is no Spanish text I would not retrieve that ID, and I need one record per ID. I thought of using something like this:
select Distinct(Id), Text from table
order by FIELD(Language, 'Spanish', 'English', 'French', 'Italian')
But it didn't work. I get:
'FIELD' is not a recognized built-in function name.
Can someone help me?
Thank you all very much!

For this type of prioritization, you can use row_number():
Select t.*
From (select t.*,
             row_number() over (partition by id
                                order by (case when Language = 'spanish' then 1
                                               when Language = 'english' then 2
                                               when Language = 'french' then 3
                                               else 4
                                          end)
                               ) as seqnum
      from [table] t  -- "table" is a reserved word, so it has to be bracketed
     ) t
where seqnum = 1;
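As a quick, runnable check of the row_number() approach, here is a minimal sketch that uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server (window functions need SQLite 3.25 or later). The table name words and the spanish/english/french priority order are assumptions for illustration; T-SQL details such as bracketed identifiers differ.

```python
import sqlite3

# Sample rows from the question: (id, language, text).
SAMPLE_ROWS = [
    (1, "spanish", "Hola"),
    (1, "english", "Hello"),
    (2, "spanish", "Chau"),
    (2, "english", "Goodbye"),
    (2, "french", "Au revoir"),
    (3, "english", "Thank you"),
]

# Number each id's rows by language priority and keep only row 1.
QUERY = """
SELECT id, language, text
FROM (
    SELECT w.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE language
                            WHEN 'spanish' THEN 1
                            WHEN 'english' THEN 2
                            WHEN 'french'  THEN 3
                            ELSE 4
                        END
           ) AS seqnum
    FROM words AS w
) AS ranked
WHERE seqnum = 1
ORDER BY id
"""


def preferred_texts(rows):
    """Return one (id, language, text) tuple per id, best language first."""
    conn = sqlite3.connect(":memory:")
    try:
        # "words" is a hypothetical table name used only for this sketch.
        conn.execute("CREATE TABLE words (id INTEGER, language TEXT, text TEXT)")
        conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in preferred_texts(SAMPLE_ROWS):
        print(row)
```

Because languages outside the CASE list still get priority 4, every id appears exactly once, even if it has no Spanish, English, or French text.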

This is another option:
SELECT i.ID, w.Text
FROM (
    SELECT ID
    FROM Words
    GROUP BY ID) i(ID)
CROSS APPLY (
    SELECT TOP 1 [Text]
    FROM Words
    WHERE ID = i.ID AND [Language] IN ('spanish', 'english')
    ORDER BY (CASE [Language] WHEN 'spanish' THEN 1
                              ELSE 2
              END)
) w([Text])
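For each ID in the Words table, the CROSS APPLY finds the highest-priority Text that satisfies the criteria set by the OP. Because CROSS APPLY behaves like an inner join, an ID that only has text in some other language (for example French) is dropped from the result; use OUTER APPLY, or extend the IN list and the CASE expression, if such IDs must still appear.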
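SQLite has no CROSS APPLY, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1 to play the role of SELECT TOP 1 inside the APPLY. It runs with Python's sqlite3 module; the table name words and the French-only id 4 are hypothetical additions. Unlike CROSS APPLY, the scalar subquery returns NULL instead of dropping the row, so it behaves more like OUTER APPLY.

```python
import sqlite3

# (id, language, text); id 4 only has a French text (hypothetical extra row).
ROWS = [
    (1, "spanish", "Hola"),
    (1, "english", "Hello"),
    (2, "spanish", "Chau"),
    (2, "english", "Goodbye"),
    (2, "french", "Au revoir"),
    (3, "english", "Thank you"),
    (4, "french", "Bonjour"),
]

# For each distinct id, pick the top-priority Spanish/English text (or NULL).
QUERY = """
SELECT i.id,
       (SELECT w.text
        FROM words AS w
        WHERE w.id = i.id
          AND w.language IN ('spanish', 'english')
        ORDER BY CASE w.language WHEN 'spanish' THEN 1 ELSE 2 END
        LIMIT 1) AS text
FROM (SELECT DISTINCT id FROM words) AS i
ORDER BY i.id
"""


def top_text_per_id(rows):
    """Return (id, text) pairs; text is None when no Spanish/English row exists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (id INTEGER, language TEXT, text TEXT)")
        conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_text_per_id(ROWS))
```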

select COALESCE(SP.id, EN.id, FR.id) as id,
       COALESCE(SP.Language, EN.Language, FR.Language) as Language,
       COALESCE(SP.text, EN.text, FR.text) as text
from (select distinct id from table) as IDs
left join (select id, Language, text from table where Language = 'spanish') as SP on IDs.id = SP.id
left join (select id, Language, text from table where Language = 'english') as EN on IDs.id = EN.id
left join (select id, Language, text from table where Language = 'french') as FR on IDs.id = FR.id
where COALESCE(SP.id, EN.id, FR.id) is not null -- skips IDs that have none of these three languages
This solution (or an equivalent replacement for COALESCE) works on any SQL variant I know of.
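The LEFT JOIN + COALESCE pattern can be checked with the sketch below, which runs it in SQLite through Python's sqlite3 module. The table name words and the Italian-only id 5 are assumptions, added to show that an id with none of the three listed languages is filtered out by the WHERE clause.

```python
import sqlite3

ROWS = [
    (1, "spanish", "Hola"),
    (1, "english", "Hello"),
    (2, "spanish", "Chau"),
    (2, "english", "Goodbye"),
    (2, "french", "Au revoir"),
    (3, "english", "Thank you"),
    (5, "italian", "Grazie"),  # hypothetical row in an unlisted language
]

# One LEFT JOIN per language; COALESCE takes the first non-NULL in priority order.
QUERY = """
SELECT COALESCE(sp.id, en.id, fr.id) AS id,
       COALESCE(sp.language, en.language, fr.language) AS language,
       COALESCE(sp.text, en.text, fr.text) AS text
FROM (SELECT DISTINCT id FROM words) AS ids
LEFT JOIN (SELECT id, language, text FROM words WHERE language = 'spanish') AS sp
       ON ids.id = sp.id
LEFT JOIN (SELECT id, language, text FROM words WHERE language = 'english') AS en
       ON ids.id = en.id
LEFT JOIN (SELECT id, language, text FROM words WHERE language = 'french') AS fr
       ON ids.id = fr.id
WHERE COALESCE(sp.id, en.id, fr.id) IS NOT NULL
ORDER BY 1
"""


def coalesce_texts(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (id INTEGER, language TEXT, text TEXT)")
        conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(coalesce_texts(ROWS))
```

Each extra language costs one more LEFT JOIN and one more COALESCE argument, which is why the row_number() version scales more comfortably.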

Related

SQL Server - Find similarities in column and write them into new column

I have a big table with data like this:
ID Title
-- ------------------------
1 01_SOMESTRING_038
2 01_SOMESTRING K5038
3 01_SOMESTRING-648
4 K-OTHERSTRING_T_73474
5 K-OTHERSTRING_T_ffk
6 ABC
7 DEF
The task is to find similarities in that column and write each found similarity to a new column.
So the desired output would be like this:
ID Title Similarity
-- ------------------------ -----------------
1 01_SOMESTRING_038 01_SOMESTRING
2 01_SOMESTRING K5038 01_SOMESTRING
3 01_SOMESTRING-648 01_SOMESTRING
4 K-OTHERSTRING_T_73474 K-OTHERSTRING_T_
5 K-OTHERSTRING_T_ffk K-OTHERSTRING_T_
6 ABC NULL
7 DEF NULL
How can I achieve that in MS SQL Server 17?
Any help is much appreciated. Thanks!
EDIT: The strings are not always split by delimiters such as "-" and "_".
To handle competing similarities, I would set a minimum length for the similarity, for instance 10.
Try the following, which uses a recursive CTE to generate every prefix of each title, then groups those prefixes to find the longest one shared with another title:
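WITH TITLE_EXPAND AS (
    -- Anchor: the 1-character prefix of every title
    SELECT
        1 MatchLen
        ,CAST(SUBSTRING(Title,1,1) AS NVARCHAR(255)) MatchString
        ,Title
        ,ID
    FROM
        [SourceDataTable]
    UNION ALL
    -- Recursive step: extend each prefix by one character
    SELECT
        MatchLen + 1
        ,CAST(SUBSTRING(Title,1,MatchLen+1) AS NVARCHAR(255))
        ,Title
        ,ID
    FROM
        TITLE_EXPAND
    WHERE
        MatchLen < LEN(Title)
)
SELECT DISTINCT
    SDT.ID
    ,SDT.title
    -- Longest prefix that at least one other title shares
    ,FIRST_VALUE(MatchString) OVER (PARTITION BY SDT.ID ORDER BY SC.MatchLen DESC, SC.MatchCount DESC) Similarity
FROM
    [SourceDataTable] SDT
LEFT JOIN
    (SELECT
        *
        ,COUNT(*) OVER (PARTITION BY MatchString, MatchLen) MatchCount
    FROM
        TITLE_EXPAND) SC
ON
    SDT.ID = SC.ID
    AND
    SC.MatchCount > 1
    -- To enforce a minimum similarity length (e.g. 10), add: AND SC.MatchLen >= 10
ORDER BY SDT.ID
OPTION (MAXRECURSION 0); -- the default limit of 100 recursions fails on titles longer than about 100 characters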
Here, SourceDataTable is your source table. Similarity is the longest prefix that the title shares with at least one other title, or NULL if there is none.
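The same prefix-expansion idea can be tried end to end in SQLite with Python's sqlite3 module, as sketched below. The table name source_data, the lowercase column names, and the min_len parameter (which applies the question's minimum-length rule in the join) are assumptions; substr() and length() are SQLite's counterparts of SUBSTRING() and LEN().

```python
import sqlite3

TITLES = [
    (1, "01_SOMESTRING_038"),
    (2, "01_SOMESTRING K5038"),
    (3, "01_SOMESTRING-648"),
    (4, "K-OTHERSTRING_T_73474"),
    (5, "K-OTHERSTRING_T_ffk"),
    (6, "ABC"),
    (7, "DEF"),
]

QUERY = """
WITH RECURSIVE title_expand(match_len, match_string, title, id) AS (
    -- every 1-character prefix
    SELECT 1, substr(title, 1, 1), title, id FROM source_data
    UNION ALL
    -- extend each prefix by one character until the whole title is covered
    SELECT match_len + 1, substr(title, 1, match_len + 1), title, id
    FROM title_expand
    WHERE match_len < length(title)
),
counted AS (
    -- how many titles share this exact prefix
    SELECT *, COUNT(*) OVER (PARTITION BY match_string, match_len) AS match_count
    FROM title_expand
)
SELECT DISTINCT s.id, s.title,
       FIRST_VALUE(c.match_string) OVER (
           PARTITION BY s.id ORDER BY c.match_len DESC, c.match_count DESC
       ) AS similarity
FROM source_data AS s
LEFT JOIN counted AS c
       ON s.id = c.id AND c.match_count > 1 AND c.match_len >= ?
ORDER BY s.id
"""


def similarities(rows, min_len=1):
    """Return (id, title, longest shared prefix or None) for each row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE source_data (id INTEGER, title TEXT)")
        conn.executemany("INSERT INTO source_data VALUES (?, ?)", rows)
        return conn.execute(QUERY, (min_len,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in similarities(TITLES, min_len=10):
        print(row)
```

Raising min_len above the length of a group's shared prefix turns that group's Similarity into NULL, which is one way to suppress short, accidental matches.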

Query for transitive word translation

I'm having trouble constructing what I guess should be a CTE query that returns a word translation.
I have two tables - WORDS:
ID | WORD | LANG
1 'Cat' 'ENG'
2 'Kot' 'POL'
3 'Katze' 'GER'
and CONNECTIONS:
ID | WORD_A | WORD_B
1 1 2
2 1 3
As ENG->POL and ENG->GER translations already exist, I would like to get a POL->GER translation.
So for any given word, I have to check its connected translations; if the target language is not there, I should check the connected translations of those translations, and so on, until either a translation is found or nothing is returned.
I have no idea where to start. It wouldn't be a problem if a constant number of transitions were required, but it might just as well require a chain like POL->ITA->...->ENG->GER.
This is a complicated graph walking problem. The idea is to set up a connections2 relationship that has words and languages in both directions, and then use this for walking the graph.
To get all German translations for Polish words, you can use:
with words AS (
select 1 as id, 'Cat' as word, 'ENG' as lang union all
select 2, 'Kot', 'POL' union all
select 3, 'Katze', 'GER'
),
connections as (
select 1 as id, 1 as word_a, 2 as word_b union all
select 2 as id, 1 as word_a, 3 as word_b
),
connections2 as (
select c.word_a as id_a, c.word_b as id_b, wa.lang as lang_a, wb.lang as lang_b
from connections c join
words wa
on c.word_a = wa.id join
words wb
on c.word_b = wb.id
union -- remove duplicates
select c.word_b, c.word_a, wb.lang as lang_a, wa.lang as lang_b
from connections c join
words wa
on c.word_a = wa.id join
words wb
on c.word_b = wb.id
),
cte as (
select id as pol_word, id as other_word, 'POL' as other_lang, 1 as lev, ',POL,' as langs
from words
where lang = 'POL'
union all
select cte.pol_word, c2.id_b, c2.lang_b, lev + 1, langs || c2.lang_b || ','
from cte join
connections2 c2
on cte.other_word = c2.id_a  -- follow edges out of the current word, not out of every word in its language
where langs not like '%,' || c2.lang_b || ',%' or c2.lang_b = 'GER'
)
select *
from cte
where cte.other_lang = 'GER';
Here is a db<>fiddle.
Presumably, you want the shortest path. For that, you would use this query (after the CTEs):
select *
from cte
where cte.other_lang = 'GER' and
cte.lev = (select min(cte2.lev) from cte cte2 where cte2.pol_word = cte.pol_word and cte2.other_lang = 'GER');
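Here is a runnable sketch of the graph walk in SQLite via Python's sqlite3 module, combining the recursive CTE and the shortest-path filter. It assumes two details: the recursive step joins on the current word's id (so two Polish words cannot borrow each other's translations), and the minimum level is taken only over rows in the target language. The names walk, start_word, and translate_transitive() are hypothetical, and the :src/:dst parameters generalize the hard-coded 'POL' and 'GER'.

```python
import sqlite3

QUERY = """
WITH RECURSIVE
connections2(id_a, id_b, lang_a, lang_b) AS (
    -- every connection in both directions, with the language of each end
    SELECT c.word_a, c.word_b, wa.lang, wb.lang
    FROM connections AS c
    JOIN words AS wa ON c.word_a = wa.id
    JOIN words AS wb ON c.word_b = wb.id
    UNION
    SELECT c.word_b, c.word_a, wb.lang, wa.lang
    FROM connections AS c
    JOIN words AS wa ON c.word_a = wa.id
    JOIN words AS wb ON c.word_b = wb.id
),
walk(start_word, other_word, other_lang, lev, langs) AS (
    SELECT id, id, lang, 1, ',' || lang || ','
    FROM words
    WHERE lang = :src
    UNION ALL
    -- follow edges from the current word, never revisiting a language
    SELECT walk.start_word, c2.id_b, c2.lang_b, walk.lev + 1,
           walk.langs || c2.lang_b || ','
    FROM walk
    JOIN connections2 AS c2 ON walk.other_word = c2.id_a
    WHERE walk.langs NOT LIKE '%,' || c2.lang_b || ',%'
)
SELECT ws.word, wt.word, walk.lev
FROM walk
JOIN words AS ws ON ws.id = walk.start_word
JOIN words AS wt ON wt.id = walk.other_word
WHERE walk.other_lang = :dst
  AND walk.lev = (SELECT MIN(w2.lev) FROM walk AS w2
                  WHERE w2.start_word = walk.start_word AND w2.other_lang = :dst)
ORDER BY 1, 2
"""

WORDS = [(1, "Cat", "ENG"), (2, "Kot", "POL"), (3, "Katze", "GER")]
CONNECTIONS = [(1, 1, 2), (2, 1, 3)]


def translate_transitive(words, connections, src, dst):
    """Return (source word, target word, level) for the shortest paths found."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (id INTEGER, word TEXT, lang TEXT)")
        conn.execute("CREATE TABLE connections (id INTEGER, word_a INTEGER, word_b INTEGER)")
        conn.executemany("INSERT INTO words VALUES (?, ?, ?)", words)
        conn.executemany("INSERT INTO connections VALUES (?, ?, ?)", connections)
        return conn.execute(QUERY, {"src": src, "dst": dst}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(translate_transitive(WORDS, CONNECTIONS, "POL", "GER"))
```

The level counts the starting word as 1, so Kot -> Cat -> Katze is reported as level 3, matching the lev column of the original query.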
I recommend changing your table design to something along these lines:
ID | WORD_ID | WORD | LANG
1 | 1 | 'Cat' | 'ENG'
2 | 1 | 'Kot' | 'POL'
3 | 1 | 'Katze' | 'GER'
Now suppose that you want to go from Polish to German, and you are starting off with the Polish word for cat, Kot. Then, you may use the following query:
SELECT t2.WORD
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.WORD_ID = t2.WORD_ID AND t2.LANG = 'GER'
WHERE
t1.WORD = 'Kot' AND t1.LANG = 'POL'
The idea here is that you want one of your tables to maintain a mapping of every logically distinct word to some common id or reference. Then, you only need an incoming word and language, and a desired output language, to be able to do the mapping.
Note: For simplicity I am only showing a table with a single word across a handful of languages, but the logic should still work with multiple words and languages. Your current table may not be normalized, but optimizing that might be another task separate from your immediate question.
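The shared-WORD_ID design can be sketched in SQLite with Python's sqlite3 module as follows. The second concept (WORD_ID 2: Dog, Pies, Hund) and the helper translate_via_word_id() are hypothetical, added to show that the self-join keeps different words apart.

```python
import sqlite3

# (ID, WORD_ID, WORD, LANG); WORD_ID 2 is a hypothetical second concept.
ROWS = [
    (1, 1, "Cat", "ENG"),
    (2, 1, "Kot", "POL"),
    (3, 1, "Katze", "GER"),
    (4, 2, "Dog", "ENG"),
    (5, 2, "Pies", "POL"),
    (6, 2, "Hund", "GER"),
]

# Self-join on the shared WORD_ID to map a word into the target language.
QUERY = """
SELECT t2.word
FROM yourTable AS t1
INNER JOIN yourTable AS t2
        ON t1.word_id = t2.word_id AND t2.lang = ?
WHERE t1.word = ? AND t1.lang = ?
"""


def translate_via_word_id(rows, word, src_lang, dst_lang):
    """Return all words in dst_lang that share a WORD_ID with (word, src_lang)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yourTable (id INTEGER, word_id INTEGER, word TEXT, lang TEXT)")
        conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
        # Parameter order follows the placeholders: target lang, word, source lang.
        return [r[0] for r in conn.execute(QUERY, (dst_lang, word, src_lang))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(translate_via_word_id(ROWS, "Kot", "POL", "GER"))
```

With this layout any language pair is a single join, regardless of how the translations were originally entered.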

Select on 3 columns without duplicate

I have a table with 3 columns in SQL Server:
Article_Id LanguageCode Text
---------------------------------------------
1 FRA Sac en papier
1 GER Tragtasche
2 GER Pizzapapier
3 FRA Couteau
The result of the query would need to be:
Article_Id LanguageCode Text
-------------------------------------------
1 FRA Sac en papier
2 GER Pizzapapier
3 FRA Couteau
Each Article_Id should appear only once, preferably with LanguageCode FRA, and with GER if no FRA row exists.
I tried DISTINCT, but it didn't give the right result.
This is a prioritization query. With just two languages, I think the easiest method is:
select a.*
from atable a
where a.LanguageCode = 'FRA'
union all
select a.*
from atable a
where a.LanguageCode = 'GER' and
      not exists (select 1
                  from atable a2
                  where a2.article_id = a.article_id and a2.LanguageCode = 'FRA'
                 );
If you have more than two languages, the above gets cumbersome. So, use row_number():
select a.*
from (select a.*,
             row_number() over (partition by article_id
                                order by charindex(languagecode, 'FRA,GER')
                               ) as seqnum
      from atable a
     ) a
where seqnum = 1;
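The charindex() is a shortcut so you don't have to write a long CASE expression. As written, it should be fine if we assume that all language codes have the same length. Be aware that charindex() returns 0 for a code that is not in the list, so such a code would sort ahead of FRA and GER.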
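A runnable sketch of the position-in-a-list trick follows, using SQLite through Python's sqlite3 module. SQLite's instr(haystack, needle) stands in for CHARINDEX(needle, haystack) (note the reversed argument order). The extra sort key that pushes unlisted codes behind FRA and GER, and the hypothetical article 4 with an ITA row, are additions for illustration; in T-SQL the same guard could be written as a CASE expression on charindex(...) = 0.

```python
import sqlite3

# (Article_Id, LanguageCode, Text) from the question, plus a hypothetical
# article 4 that also has a code (ITA) missing from the priority list.
ARTICLES = [
    (1, "FRA", "Sac en papier"),
    (1, "GER", "Tragtasche"),
    (2, "GER", "Pizzapapier"),
    (3, "FRA", "Couteau"),
    (4, "ITA", "Coltello"),
    (4, "GER", "Messer"),
]

# First sort key: 0 for listed codes, 1 for unlisted ones (instr() returned 0).
# Second sort key: position of the code in the priority string.
QUERY = """
SELECT article_id, LanguageCode, Text
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (
               PARTITION BY article_id
               ORDER BY instr('FRA,GER', LanguageCode) = 0,
                        instr('FRA,GER', LanguageCode)
           ) AS seqnum
    FROM atable AS a
) AS ranked
WHERE seqnum = 1
ORDER BY article_id
"""


def best_per_article(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE atable (article_id INTEGER, LanguageCode TEXT, Text TEXT)")
        conn.executemany("INSERT INTO atable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in best_per_article(ARTICLES):
        print(row)
```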
I tried DISTINCT, but it didn't give the right result.
DISTINCT removes duplicate rows. You want to remove duplicate Article_Ids and show the "first" row for each one. So start there:
select Article_Id, min(LanguageCode) as LanguageCode
from T
group by Article_Id
This works here because 'FRA' sorts alphabetically before 'GER'. To get the rows in table A that match rows in table B, use EXISTS:
select * from T as main
where exists (
select 1 from T
where Article_Id = main.Article_Id
group by Article_Id
having min(LanguageCode) = main.LanguageCode
)
You can get there with a join, too, by joining your table to the first query, but being comfortable with correlated subqueries will save you work over time.
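Both routes can be compared side by side in SQLite with Python's sqlite3 module, as in the sketch below: the correlated EXISTS from the answer and the join variant it mentions, which joins the table to the GROUP BY query. The outer alias main is renamed to outer_row here to avoid any confusion with SQLite's built-in main schema name; like the original, the sketch relies on FRA sorting before GER.

```python
import sqlite3

ARTICLES = [
    (1, "FRA", "Sac en papier"),
    (1, "GER", "Tragtasche"),
    (2, "GER", "Pizzapapier"),
    (3, "FRA", "Couteau"),
]

# Correlated EXISTS: keep a row if its code is the minimum for its article.
EXISTS_QUERY = """
SELECT * FROM T AS outer_row
WHERE EXISTS (
    SELECT 1 FROM T
    WHERE Article_Id = outer_row.Article_Id
    GROUP BY Article_Id
    HAVING MIN(LanguageCode) = outer_row.LanguageCode
)
ORDER BY Article_Id
"""

# Join variant: join the table to the per-article minimum.
JOIN_QUERY = """
SELECT t.*
FROM T AS t
JOIN (SELECT Article_Id, MIN(LanguageCode) AS LanguageCode
      FROM T
      GROUP BY Article_Id) AS firsts
  ON t.Article_Id = firsts.Article_Id
 AND t.LanguageCode = firsts.LanguageCode
ORDER BY t.Article_Id
"""


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE T (Article_Id INTEGER, LanguageCode TEXT, Text TEXT)")
        conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(EXISTS_QUERY, ARTICLES))
    print(run(JOIN_QUERY, ARTICLES))
```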

Filter data based on multiple rows SQL

This is probably a simple SQL query, but I'm finding it a little tricky, as it's been a while since I've written SQL.
ID NAME VALUE
--- ------ -------
1 Country Brazil
1 Country India
2 Country US
2 EmpLevel 1
3 EmpLevel 3
Pseudo Query:
Select *
from table_name
where (country = US or country = Brazil)
and (Employee_level = 1 or Employee_level = 3)
This query should return
ID NAME VALUE
--- ------ -------
2 Country US
2 EmpLevel 1
(As record with ID - 2 has Country as 'US' and EmpLevel '1')
I went through a couple of SO posts as well:
Multiple row SQL Where clause
SQL subselect filtering based on multiple sub-rows
Evaluation of multiples 'IN' Expressions in 'WHERE' clauses in mysql
I assume your expected result for the country should be US instead of Brazil. Here's one option using a join with conditional aggregation:
select y.*
from yourtable y join (
select id
from yourtable
group by id
having max(case when name = 'Country' then value end) in ('US','Brazil') and
max(case when name = 'EmpLevel' then value end) in ('1','3')
) y2 on y.id = y2.id
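The conditional-aggregation filter can be exercised in SQLite with Python's sqlite3 module, as sketched below with the question's rows. Note the assumption it shares with the original: MAX() picks one value per attribute, so an id with several countries (like id 1) is judged only by the alphabetically largest one.

```python
import sqlite3

ROWS = [
    (1, "Country", "Brazil"),
    (1, "Country", "India"),
    (2, "Country", "US"),
    (2, "EmpLevel", "1"),
    (3, "EmpLevel", "3"),
]

# Pivot each id's attributes with MAX(CASE ...) and filter in HAVING.
QUERY = """
SELECT y.*
FROM yourtable AS y
JOIN (
    SELECT id
    FROM yourtable
    GROUP BY id
    HAVING MAX(CASE WHEN name = 'Country' THEN value END) IN ('US', 'Brazil')
       AND MAX(CASE WHEN name = 'EmpLevel' THEN value END) IN ('1', '3')
) AS y2 ON y.id = y2.id
ORDER BY y.id, y.name
"""


def matching_rows(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yourtable (id INTEGER, name TEXT, value TEXT)")
        conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(matching_rows(ROWS))
```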

SQL Query - Display Count & All ID's With Same Name

I'm trying to display the number of table entries with the same name and the unique IDs associated with each of those entries.
So I have a table like so...
Table Names
------------------------------
ID Name
0 John
1 Mike
2 John
3 Mike
4 Adam
5 Mike
I would like the output to be something like:
Name | Count | IDs
---------------------
Mike 3 1,3,5
John 2 0,2
Adam 1 4
I have the following query, which does this except that it doesn't display the unique IDs:
select name, count(*) as ct from names group by name order by ct desc;
select name,
count(id) as ct,
group_concat(id) as IDs
from names
group by name
order by ct desc;
You can use GROUP_CONCAT for that (available in MySQL and SQLite, but not in SQL Server).
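Since SQLite also provides group_concat(), the query can be run as-is through Python's sqlite3 module, as in the sketch below. The order of the concatenated IDs inside each group is not guaranteed, which is why a caller (and the checks for this sketch) should not rely on it.

```python
import sqlite3

NAMES = [(0, "John"), (1, "Mike"), (2, "John"), (3, "Mike"), (4, "Adam"), (5, "Mike")]

# Count and comma-separated IDs per name, biggest groups first.
QUERY = """
SELECT name,
       COUNT(id) AS ct,
       group_concat(id) AS ids
FROM names
GROUP BY name
ORDER BY ct DESC
"""


def name_counts(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE names (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO names VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, ct, ids in name_counts(NAMES):
        print(name, ct, ids)
```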
Depending on the version of MSSQL you are using (2005+), you can use the FOR XML PATH option.
SELECT
Name,
COUNT(*) AS ct,
STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
FROM names i
WHERE i.Name = n.Name ORDER BY i.ID FOR XML PATH(''))
, 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC
Before SQL Server 2017 (which added STRING_AGG), this is the closest thing to group_concat you'll get on MSSQL unless you use the SQLCLR option (which I have no experience with). The STUFF function removes the leading comma. Also, don't give the inner SELECT's column an alias: FOR XML PATH wraps each value in an element named after the alias (an alias of TD returns each value as <TD>value</TD>).
Given the input above, here's the result I get:
Name ct IDs
Mike 3 1,3,5
John 2 0,2
Adam 1 4
EDIT: DISCLAIMER
This technique will not work as intended for string fields that could contain special characters (such as ampersands &, less-than <, greater-than >, and other characters that XML must escape), because FOR XML PATH returns them entity-encoded (for example, & comes back as &amp;). It is therefore best suited to simple integer values, although it can still be used for text if you are ABSOLUTELY SURE there are no characters that would need to be escaped. Otherwise, read the solution posted HERE to ensure these characters get properly escaped.
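Here is another SQL Server method, using a recursive CTE. Be aware that it generates one row for every increasing subset of each name's IDs (2^n rows for a name with n IDs) and keeps only the longest, so it suits small groups only: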
; with MyCTE(name,ids, name_id, seq)
as(
select name, CAST( '' AS VARCHAR(8000) ), -1, 0
from Data
group by name
union all
select d.name,
CAST( ids + CASE WHEN seq = 0 THEN '' ELSE ', ' END + cast(id as varchar) AS VARCHAR(8000) ),
CAST( id AS int),
seq + 1
from MyCTE cte
join Data d
on cte.name = d.name
where d.id > cte.name_id
)
SELECT name, ids
FROM ( SELECT name, ids,
RANK() OVER ( PARTITION BY name ORDER BY seq DESC )
FROM MyCTE ) D ( name, ids, rank )
WHERE rank = 1
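To see what the recursive CTE actually produces, here is a sketch that ports it to SQLite and runs it with Python's sqlite3 module. SQLite uses WITH RECURSIVE and || for concatenation where T-SQL uses WITH and +, and the RANK() column is named rnk here; otherwise the logic follows the query above. The extra ROW_COUNTS query is an addition that counts the intermediate rows per name, showing the 2^n growth (8 rows for Mike's 3 IDs).

```python
import sqlite3

NAMES = [(0, "John"), (1, "Mike"), (2, "John"), (3, "Mike"), (4, "Adam"), (5, "Mike")]

CTE = """
WITH RECURSIVE my_cte(name, ids, name_id, seq) AS (
    -- one empty starting row per name
    SELECT name, '', -1, 0 FROM data GROUP BY name
    UNION ALL
    -- append any larger id of the same name to an existing chain
    SELECT d.name,
           cte.ids || CASE WHEN cte.seq = 0 THEN '' ELSE ', ' END || d.id,
           d.id,
           cte.seq + 1
    FROM my_cte AS cte
    JOIN data AS d ON cte.name = d.name
    WHERE d.id > cte.name_id
)
"""

# Keep the longest chain per name, as the T-SQL version does with RANK().
LONGEST = CTE + """
SELECT name, ids
FROM (SELECT name, ids,
             RANK() OVER (PARTITION BY name ORDER BY seq DESC) AS rnk
      FROM my_cte)
WHERE rnk = 1
ORDER BY name
"""

# How many intermediate rows the recursion builds for each name.
ROW_COUNTS = CTE + "SELECT name, COUNT(*) FROM my_cte GROUP BY name ORDER BY name"


def run(query, rows=NAMES):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(LONGEST))
    print(run(ROW_COUNTS))
```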