Query for transitive word translation - sql

I have trouble constructing what I guess should be a CTE query, which would return a word translation.
I have two tables - WORDS:
ID | WORD | LANG
1 'Cat' 'ENG'
2 'Kot' 'POL'
3 'Katze' 'GER'
and CONNECTIONS:
ID | WORD_A | WORD_B
1 1 2
2 1 3
As ENG->POL, and ENG->GER translations already exist, I would like to receive a POL->GER translation.
So for any given word, I have to check its connected translations; if the target language is not there, I should check the connected translations of those translations, and so on, until either a translation is found or nothing is returned.
I have no idea where to start. It wouldn't be a problem if a constant number of transitions were required, but it might just as well require a chain like POL->ITA->...->ENG->GER.

This is a complicated graph walking problem. The idea is to set up a connections2 relationship that has words and languages in both directions, and then use this for walking the graph.
To get all German translations for Polish words, you can use:
with words AS (
select 1 as id, 'Cat' as word, 'ENG' as lang union all
select 2, 'Kot', 'POL' union all
select 3, 'Katze', 'GER'
),
connections as (
select 1 as id, 1 as word_a, 2 as word_b union all
select 2 as id, 1 as word_a, 3 as word_b
),
connections2 as (
select c.word_a as id_a, c.word_b as id_b, wa.lang as lang_a, wb.lang as lang_b
from connections c join
words wa
on c.word_a = wa.id join
words wb
on c.word_b = wb.id
union -- remove duplicates
select c.word_b, c.word_a, wb.lang as lang_a, wa.lang as lang_b
from connections c join
words wa
on c.word_a = wa.id join
words wb
on c.word_b = wb.id
),
cte as (
select id as pol_word, id as other_word, 'POL' as other_lang, 1 as lev, ',POL,' as langs
from words
where lang = 'POL'
union all
select cte.pol_word, c2.id_b, c2.lang_b, lev + 1, langs || c2.lang_b || ','
from cte join
connections2 c2
on cte.other_word = c2.id_a -- follow the connections of the word just reached, not of every word in its language
where langs not like '%,' || c2.lang_b || ',%' or c2.lang_b = 'GER'
)
select *
from cte
where cte.other_lang = 'GER';
Here is a db<>fiddle.
Presumably, you want the shortest path. For that, you would use this query (after the CTEs):
select *
from cte
where cte.other_lang = 'GER' and
cte.lev = (select min(cte2.lev) from cte cte2 where cte2.pol_word = cte.pol_word and cte2.other_lang = 'GER');
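For anyone who wants to try the idea locally, here is a minimal sketch of the same kind of walk, run through Python's sqlite3 module. It is not the query above: it follows word ids rather than languages, it uses SQLite syntax (`with recursive`, `instr`), and the names `edges`, `walk`, `path` and `translate` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table words (id integer primary key, word text, lang text);
    create table connections (id integer primary key, word_a integer, word_b integer);
    insert into words values (1, 'Cat', 'ENG'), (2, 'Kot', 'POL'), (3, 'Katze', 'GER');
    insert into connections values (1, 1, 2), (2, 1, 3);
""")

QUERY = """
with recursive
edges as (  -- every connection, usable in both directions
    select word_a as id_a, word_b as id_b from connections
    union
    select word_b, word_a from connections
),
walk as (
    -- start at the source word; path records the word ids visited so far
    select id as word_id, 0 as lev, ',' || id || ',' as path
    from words
    where word = ? and lang = ?
    union all
    select e.id_b, walk.lev + 1, walk.path || e.id_b || ','
    from walk join edges e on e.id_a = walk.word_id
    where instr(walk.path, ',' || e.id_b || ',') = 0  -- skip words already visited
)
select w.word, walk.lev
from walk join words w on w.id = walk.word_id
where w.lang = ?
order by walk.lev  -- fewest hops first
limit 1
"""

def translate(word, source_lang, target_lang):
    """Return (translation, hops) for the shortest path, or None if unreachable."""
    return conn.execute(QUERY, (word, source_lang, target_lang)).fetchone()

print(translate("Kot", "POL", "GER"))  # ('Katze', 2)
```

Tracking visited word ids in `path` is what keeps the recursion from looping once connections can be followed in both directions.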

I recommend changing your table design to something along these lines:
ID | WORD_ID | WORD | LANG
1 | 1 | 'Cat' | 'ENG'
2 | 1 | 'Kot' | 'POL'
3 | 1 | 'Katze' | 'GER'
Now suppose that you want to go from Polish to German, and you are starting off with the Polish word for cat, Kot. Then, you may use the following query:
SELECT t2.WORD
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.WORD_ID = t2.WORD_ID AND t2.LANG = 'GER'
WHERE
t1.WORD = 'Kot' AND t1.LANG = 'POL'
The idea here is that you want one of your tables to maintain a mapping of every logically distinct word to some common id or reference. Then, you only need an incoming word and language, and a desired output language, to be able to do the mapping.
Note: For simplicity I am only showing a table with a single word across a handful of languages, but the logic should still work with multiple words and languages. Your current table may not be normalized, but optimizing that might be another task separate from your immediate question.
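As a rough check that the shared WORD_ID design holds up with more than one word, here is a small sketch using Python's sqlite3 module. The table name `translations`, the function `translate`, and the second word group (Dog / Pies / Hund) are assumptions added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table translations (id integer primary key, word_id integer, word text, lang text);
    insert into translations values
        (1, 1, 'Cat', 'ENG'), (2, 1, 'Kot',  'POL'), (3, 1, 'Katze', 'GER'),
        (4, 2, 'Dog', 'ENG'), (5, 2, 'Pies', 'POL'), (6, 2, 'Hund',  'GER');
""")

def translate(word, source_lang, target_lang):
    """Return every word in target_lang that shares a word_id with the given word."""
    rows = conn.execute(
        """
        select t2.word
        from translations t1
        join translations t2
          on t2.word_id = t1.word_id and t2.lang = ?
        where t1.word = ? and t1.lang = ?
        """,
        (target_lang, word, source_lang),
    ).fetchall()
    return [row[0] for row in rows]

print(translate("Kot", "POL", "GER"))   # ['Katze']
print(translate("Pies", "POL", "ENG"))  # ['Dog']
```

Because every translation of a concept carries the same word_id, no graph walk is needed: one self-join gets from any language to any other.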

Related

Split record into 2 records with distinct values based on a unique id

I have a table with some IDs that correspond to duplicate data that I would like to get rid of. They are linked by a groupid number. Currently my data looks like this:
|GroupID|NID1 |NID2 |
|S1 |644763|643257|
|T2 |4759 |84689 |
|W3 |96676 |585876|
In order for the software to run, I need the data in the following format:
|GroupID|NID |
|S1 |644763|
|S1 |643257|
|T2 |4759 |
|T2 |84689 |
|W3 |96676 |
|W3 |585876|
Thank you for your time.
You want union all:
select groupid, nid1 as nid
from t
union all -- use "union" instead if you don't want duplicate rows
select groupid, nid2
from t;
In Oracle 12C+, you can use lateral joins:
select t.groupid, v.nid
from t cross apply
(select t.nid1 as nid from dual union all
select t.nid2 as nid from dual
) v;
This can be more efficient than union all because it scans the table only once.
You can also express this as:
select t.groupid,
(case when n.n = 1 then t.nid1 when n.n = 2 then t.nid2 end) as nid
from t cross join
(select 1 as n from dual union all select 2 from dual) n;
A little more complicated, but still only one scan of the table.
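To compare the two shapes on the sample data, here is a small sketch using Python's sqlite3 module. SQLite has no `dual` table and no `cross apply`, so only the union all and cross join variants are shown, and the names `union_rows` and `cross_join_rows` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (groupid text, nid1 integer, nid2 integer);
    insert into t values ('S1', 644763, 643257), ('T2', 4759, 84689), ('W3', 96676, 585876);
""")

# Variant 1: one pass over the table per column.
union_rows = conn.execute("""
    select groupid, nid1 as nid from t
    union all
    select groupid, nid2 from t
    order by groupid
""").fetchall()

# Variant 2: pair each row with the numbers 1 and 2, then pick the matching column.
cross_join_rows = conn.execute("""
    select t.groupid,
           case n.n when 1 then t.nid1 else t.nid2 end as nid
    from t cross join (select 1 as n union all select 2) n
    order by t.groupid, n.n
""").fetchall()

for row in cross_join_rows:
    print(row)
```

Both variants return the same six rows; only the order within each group is guaranteed in the second one, because it sorts on n.n as well.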

Recursive query using postgreSQL

My database contains data (images, for example), and this data can be modified by a program (image processing, for example), so I get a new image derived from the other, and this image can be modified as well, and so on.
Two images can also be used to create a new one, for example: image a + image b = image c.
So in my database I have a table called "Derived from" which contains 2 columns (previous_id, new_id): previous_id is the image before image processing and new_id is the result. So I can have a "change history" like this:
+------------------+------------------+
| id_previous | id_new |
+------------------+------------------+
| a | c |
| b | c |
| c | d |
| d | e |
+------------------+------------------+
So my question is:
Is it possible to write a recursive query that returns the full history of a data ID?
Something like this:
Select * from derived_from where id_new = 'e'
Should return (d,c,b,a)
Thank you for your help
Yes, you can achieve this with a recursive CTE:
with recursive r as (
select id_previous
from derived_from
where id_new = 'e'
union
select d.id_previous
from derived_from d
join r on id_new = r.id_previous
)
select id_previous
from r
http://rextester.com/NZKT73800
Notes:
UNION can stop the recursion even when you have loops. With UNION ALL, you should handle loops yourself, unless you are really sure you have no loops.
This will give you separate rows (one for each "ascendant"). You can aggregate them too, but separate rows are typically much easier to consume than comma-separated lists or arrays.
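To illustrate the note about loops, here is a small sketch using Python's sqlite3 module (SQLite also supports `with recursive`). The extra row e -> a is a hypothetical addition that closes a cycle; it is not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table derived_from (id_previous text, id_new text);
    insert into derived_from values ('a', 'c'), ('b', 'c'), ('c', 'd'), ('d', 'e');
    insert into derived_from values ('e', 'a');  -- hypothetical row that closes a loop
""")

ancestors = [row[0] for row in conn.execute("""
    with recursive r(id_previous) as (
        select id_previous from derived_from where id_new = 'e'
        union  -- discards rows already produced, so the loop cannot recurse forever
        select d.id_previous
        from derived_from d join r on d.id_new = r.id_previous
    )
    select id_previous from r
""")]

print(sorted(ancestors))  # ['a', 'b', 'c', 'd', 'e']
```

With the loop in place, 'e' shows up among its own ancestors, but the query still terminates. Swapping `union` for `union all` on this data would recurse without end unless visited ids are tracked some other way.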
You can use a recursive CTE:
with recursive cte as (
select df.id_new, df.id_previous as parent
from derived_from df
where df.id_new = 'e'
union all
select cte.id_new, df.id_previous
from cte join
derived_from df
on cte.parent = df.id_new
)
select id_new, array_agg(parent)
from cte
group by id_new;

SQL & SQL Server - Denormalized table, Distinct and Order By Complex Query

I've been working lately in this and I couldn't work it out. I have the following table:
ID Language Text
---- -------- --------
1 spanish Hola
1 english Hello
2 spanish Chau
2 english Goodbye
2 french Au revoir
3 english Thank you
I need to get each ID once and the text in Spanish but if there wasn't any text in Spanish I should get the English one and so on.
So if I run this query I should get:
ID Language Text
---- -------- --------
1 spanish Hola
2 spanish Chau
3 english Thank you
I can not use
Select ID, Language, Text From table Where Language = 'spanish'
Because if there is no Spanish text for an ID I would not retrieve that ID, and I need one record per ID. I thought of maybe using something like this:
select Distinct(Id), Text from table
order by FIELD(Language, 'Spanish', 'English', 'French', 'Italian')
But it didn't work. I get:
'FIELD' is not a recognized built-in function name.
Can someone help me?
Thank you all very much!
For this type of prioritization, you can use row_number():
Select t.*
From (select t.*,
row_number() over (partition by id
order by (case when Language = 'Spanish' then 1
when Language = 'English' then 2
else 3
end)
) as seqnum
from table t
) t
where seqnum = 1;
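Here is a small sketch of the row_number() approach against the sample data, run through Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the table name `texts` and the variable `preferred` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table texts (id integer, language text, text text);
    insert into texts values
        (1, 'spanish', 'Hola'), (1, 'english', 'Hello'),
        (2, 'spanish', 'Chau'), (2, 'english', 'Goodbye'), (2, 'french', 'Au revoir'),
        (3, 'english', 'Thank you');
""")

preferred = conn.execute("""
    select id, language, text
    from (select t.*,
                 row_number() over (
                     partition by id
                     -- lower number = higher priority language
                     order by case language when 'spanish' then 1
                                            when 'english' then 2
                                            else 3 end
                 ) as seqnum
          from texts t) ranked
    where seqnum = 1
    order by id
""").fetchall()

for row in preferred:
    print(row)
# (1, 'spanish', 'Hola')
# (2, 'spanish', 'Chau')
# (3, 'english', 'Thank you')
```

Adding another fallback language only means adding one more `when` branch to the `case` expression.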
This is another option:
SELECT i.ID, w.Text
FROM (
SELECT ID
FROM Words
GROUP BY ID) i(ID)
CROSS APPLY (
SELECT TOP 1 [Text]
FROM Words
WHERE ID = i.ID AND [Language] IN ('spanish', 'english')
ORDER BY (CASE [Language] WHEN 'spanish' THEN 1
ELSE 2
END)
) w([Text])
For each ID contained in the Words table we perform a CROSS APPLY to find the matching Text that satisfies the criteria set by the OP.
select COALESCE(SP.id, EN.id, FR.id) as id,
COALESCE(SP.Language, EN.Language, FR.Language) as Language,
COALESCE(SP.text, EN.text, FR.text) as text
from (select distinct id from table) as IDs
left join (select id, Language, text from table where Language = 'spanish') as SP on IDs.id = SP.id
left join (select id, Language, text from table where Language = 'english') as EN on IDs.id = EN.id
left join (select id, Language, text from table where Language = 'french') as FR on IDs.id = FR.id
where COALESCE(SP.id, EN.id, FR.id) is not null -- avoids null results for other languages (if there are)
This solution (or the equivalent replacement for COALESCE) works on any SQL variant that I know of.

How can I select unique rows in a database over two columns?

I have found similar solutions online but none that I've been able to apply to my specific problem.
I'm trying to "unique-ify" data from one table to another. In my original table, data looks like the following:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
1 3 EF
1 3 GH
The user IDs are composed of two parts, USERIDP1 and USERIDP2 concatenated. I want to transfer all the rows that correspond to a user who has QUALIFIER=TRUE in ANY row they own, but ignore users who do not have a TRUE QUALIFIER in any of their rows.
To clarify, all of User 12's rows would be transferred, but not User 13's. The output would then look like:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
So basically, I need to find rows with distinct user ID components (involving two unique fields) that also possess a row with QUALIFIER=TRUE and copy all and only all of those users' rows.
This nested query could do it, although it may be slow for large tables.
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE_NAME AS Y WHERE Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE)
It could be written as an inner join with itself too:
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN YOUR_TABLE_NAME AS Y ON Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE
For a large table, create a new auxiliary table containing only USERIDP1 and USERIDP2 columns for rows that have QUALIFIER = TRUE and then join this table with your original table using inner join similar to the second option above. Remember to create appropriate indexes.
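Here is a minimal sketch of the EXISTS variant doing the actual copy, run through Python's sqlite3 module. The table names `source` and `target` are placeholders, and QUALIFIER is stored as the text 'TRUE' or NULL, which is only a guess at how the real column is typed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table source (useridp1 integer, useridp2 integer, qualifier text, data text);
    create table target (useridp1 integer, useridp2 integer, qualifier text, data text);
    insert into source values (1, 2, 'TRUE', 'AB'), (1, 2, null, 'CD'),
                              (1, 3, null,   'EF'), (1, 3, null, 'GH');

    -- copy every row of any user that has at least one qualifying row
    insert into target
    select x.*
    from source x
    where exists (select 1
                  from source y
                  where y.useridp1 = x.useridp1
                    and y.useridp2 = x.useridp2
                    and y.qualifier = 'TRUE');
""")

copied = conn.execute("select * from target order by data").fetchall()
print(copied)  # [(1, 2, 'TRUE', 'AB'), (1, 2, None, 'CD')]
```

Comparing the two id columns separately avoids any ambiguity that could come from concatenating them into a single key.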
This should do the trick. Partition by both id columns directly rather than by a concatenation of them, which in SQL Server turns into integer addition when the ids are integers and can make distinct users collide.
SELECT 1 as id1,2 as id2,'TRUE' as qualifier,'AB' as data into #sampled
UNION ALL SELECT 1,2,NULL,'CD'
UNION ALL SELECT 1,3,NULL,'EF'
UNION ALL SELECT 1,3,NULL,'GH'
;WITH data as
(
SELECT
id1
,id2
,qualifier
,data
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY id1, id2) as num_qualifier
from #sampled
)
SELECT
id1
,id2
,qualifier
,data
from data
where num_qualifier > 0
Select *
from yourTable
INNER JOIN (Select UserIDP1, UserIDP2 FROM yourTable WHERE Qualifier=TRUE) B
ON yourTable.UserIDP1 = B.UserIDP1 and YourTable.UserIDP2 = B.UserIDP2
How about a subquery as a where clause?
SELECT *
FROM theTable t1
WHERE CAST(t1.useridp1 AS VARCHAR) + CAST(t1.useridp2 AS VARCHAR) IN
(SELECT CAST(t2.useridp1 AS VARCHAR) + CAST(t2.useridp2 AS VARCHAR)
FROM theTable t2
WHERE t2.qualifier = TRUE
);
This is a solution in MySQL, but I believe it should transfer to SQL Server pretty easily. Use a subquery to pick out the (id1, id2) combinations with at least one true 'qualifier' row, then join that to the original table on (id1, id2).
mysql> SELECT u1.*
FROM users u1
JOIN (SELECT id1,id2
FROM users
WHERE qualifier
GROUP BY id1, id2) u2
USING(id1, id2);
+------+------+-----------+------+
| id1 | id2 | qualifier | data |
+------+------+-----------+------+
| 1 | 2 | 1 | aa |
| 1 | 2 | 0 | bb |
+------+------+-----------+------+
2 rows in set (0.00 sec)

How to select the top n from a union of two queries where the resulting order needs to be ranked by individual query?

Let's say I have a table with usernames:
Id | Name
-----------
1 | Bobby
20 | Bob
90 | Bob
100 | Joe-Bob
630 | Bobberino
820 | Bob Junior
I want to return a list of n matches on name for 'Bob' where the resulting set first contains exact matches followed by similar matches.
I thought something like this might work
SELECT TOP 4 a.* FROM
(
SELECT * from Usernames WHERE Name = 'Bob'
UNION
SELECT * from Usernames WHERE Name LIKE '%Bob%'
) AS a
but there are two problems:
It's an inefficient query since the sub-select could return many rows (looking at the execution plan shows a join happening before top)
(Almost) more importantly, the exact match(es) will not appear first in the results since the resulting set appears to be ordered by primary key.
I am looking for a query that will return (for TOP 4)
Id | Name
---------
20 | Bob
90 | Bob
(and then 2 results from the LIKE query, e.g. 1 Bobby and 100 Joe-Bob)
Is this possible in a single query?
You could use a case to place the exact matches on top:
select top 4 *
from Usernames
where Name like '%Bob%'
order by
case when Name = 'Bob' then 1 else 2 end
Or, if you're worried about performance and have an index on (Name):
select top 4 *
from (
select 1 as SortOrder
, *
from Usernames
where Name = 'Bob'
union all
select 2
, *
from Usernames
where Name like '%Bob%'
and Name <> 'Bob'
and 4 >
(
select count(*)
from Usernames
where Name = 'Bob'
)
) as SubqueryAlias
order by
SortOrder
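Here is a small sketch of the first approach (a single scan ordered by a case expression) on the sample names, run through Python's sqlite3 module. SQLite uses `limit` rather than `top`, and the secondary sort on id is an addition to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table usernames (id integer primary key, name text);
    insert into usernames values
        (1, 'Bobby'), (20, 'Bob'), (90, 'Bob'),
        (100, 'Joe-Bob'), (630, 'Bobberino'), (820, 'Bob Junior');
""")

top4 = conn.execute("""
    select id, name
    from usernames
    where name like '%Bob%'
    order by case when name = 'Bob' then 1 else 2 end,  -- exact matches first
             id                                          -- then a stable order
    limit 4
""").fetchall()

print(top4)  # [(20, 'Bob'), (90, 'Bob'), (1, 'Bobby'), (100, 'Joe-Bob')]
```

The union all variant gives the same ordering; it mainly pays off when the exact-match branch can use an index on Name.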
A slight modification to your original query should solve this. You could add in an additional UNION that matches WHERE Name LIKE 'Bob%' and give this priority 2, changing the '%Bob%' priority to 3, and you'd get an even better search IMHO.
SELECT TOP 4 a.* FROM
(
SELECT *, 1 AS Priority from Usernames WHERE Name = 'Bob'
UNION ALL
SELECT *, 2 from Usernames WHERE Name LIKE '%Bob%' AND Name <> 'Bob' -- exclude exact matches so they are not returned twice
) AS a
ORDER BY Priority ASC
This might do what you want with better performance.
SELECT TOP 4 a.* FROM
(
SELECT TOP 4 *, 1 AS Sort from Usernames WHERE Name = 'Bob'
UNION ALL
SELECT TOP 4 *, 2 AS Sort from Usernames WHERE Name LIKE '%Bob%' and Name <> 'Bob'
) AS a
ORDER BY Sort
This works for me:
SELECT TOP 4 * FROM (
SELECT 1 as Rank , I, name FROM Foo WHERE Name = 'Bob'
UNION ALL
SELECT 2 as Rank,i,name FROM Foo WHERE Name LIKE '%Bob%' AND Name <> 'Bob'
) as Q1
ORDER BY Q1.Rank, Q1.I
SET ROWCOUNT 4
SELECT * from Usernames WHERE Name = 'Bob'
UNION
SELECT * from Usernames WHERE Name LIKE '%Bob%'
SET ROWCOUNT 0
The answer from Will A got me over the line, but I'd like to add a quick note, that if you're trying to do the same thing and incorporate "FOR XML PATH", you need to write it slightly differently.
I was specifying XML attributes and so had things like :
SELECT Field_1 as [@attr_1]
What you have to do is remove the "@" symbol in the subqueries and then add it back in the outer query. Like this:
SELECT top 1 a.SupervisorName as [@SupervisorName]
FROM
(
SELECT (FirstNames + ' ' + LastName) AS [SupervisorName],1 as OrderingVal
FROM ExamSupervisor SupervisorTable1
UNION ALL
SELECT (FirstNames + ' ' + LastName) AS [SupervisorName],2 as OrderingVal
FROM ExamSupervisor SupervisorTable2
) as a
ORDER BY a.OrderingVal ASC
FOR XML PATH('Supervisor')
This is a cut-down version of my final query, so it doesn't really make sense, but you should get the idea.