Join by a prefix, similar to SQL's ON ... LIKE - sql

How does one perform a join/merge using a prefix (of varying length) of a column as a key? I am trying to translate the following SQL code:
SELECT a.person_id, a.tn_code, b.list_id
FROM tblA a
INNER JOIN tblB b
ON tn_code LIKE TnCode + "%"
tblA
person_id tn_code
1 C18.4
2 M8820/9
3 X20.
...
tblB
ListID TnCode
1.01 A0.1
1.01 A0.2
...
I have ideas such as preparing a new key TnCode_prefix = gsub("^(.*)\\.(.*)$", "\\1", TnCode) and then joining on the new column, or using data.table's rolling join, but they are only approximate translations. Is there an exact equivalent in R?
I am aware of using sqldf and simply passing the original SQL statement to sqldf, but I'm wondering if there is another way.

What about creating a prefix on the fly and joining on that? I used dplyr to create the prefix and do the join.
library(dplyr)
# Fake Data
set.seed(1093)
tblA = data.frame(person_id=sample(1:10, 50, replace=TRUE),
tn_code = paste0(sample(paste0(paste0(rep(LETTERS[1:3],3),c(40:42,401:403,421:423))), 50, replace=TRUE),
".", sample(160:170, 50, replace=TRUE)))
tblB = data.frame(ListID=paste0(sample(1:10, 50, replace=TRUE),".",
sample(10:20, 50, replace=TRUE)),
TnCode = paste0(sample(paste0(paste0(rep(LETTERS[1:3],3),c(40:42,401:403,421:423))), 50, replace=TRUE),
".", sample(160:170, 50, replace=TRUE)))
# Join on the part of tn_code and TnCode that precedes the dot
newTbl = tblA %>% mutate(join_prefix=gsub("(.*)\\..*", "\\1", tn_code)) %>%
left_join(tblB %>% mutate(join_prefix=gsub("(.*)\\..*", "\\1", TnCode)),
by="join_prefix")
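For comparison, the exact semantics of the original ON ... LIKE join can be checked outside R. What follows is only a minimal sketch using Python's built-in sqlite3 module, not an R solution. The function name prefix_join and the tblB rows with TnCode C18 and M88 are made up for illustration. It compares the leading characters of tn_code with TnCode via substr() instead of LIKE, so a % or _ inside a code is not treated as a wildcard.

```python
import sqlite3


def prefix_join(rows_a, rows_b):
    """Inner-join tblA to tblB where TnCode is a prefix of tn_code."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblA (person_id INTEGER, tn_code TEXT)")
    con.execute("CREATE TABLE tblB (ListID TEXT, TnCode TEXT)")
    con.executemany("INSERT INTO tblA VALUES (?, ?)", rows_a)
    con.executemany("INSERT INTO tblB VALUES (?, ?)", rows_b)
    query = """
        SELECT a.person_id, a.tn_code, b.ListID
        FROM tblA a
        INNER JOIN tblB b
          -- keep the pair when the first length(TnCode) characters match
          ON substr(a.tn_code, 1, length(b.TnCode)) = b.TnCode
        ORDER BY a.person_id, b.ListID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# tblA rows come from the question; the last two tblB rows are hypothetical.
rows_a = [(1, "C18.4"), (2, "M8820/9"), (3, "X20.")]
rows_b = [("1.01", "A0.1"), ("1.01", "A0.2"), ("2.05", "C18"), ("3.10", "M88")]
print(prefix_join(rows_a, rows_b))
# [(1, 'C18.4', '2.05'), (2, 'M8820/9', '3.10')]
```

Note that a key built from the text before the dot would also pair C18.4 with a TnCode such as C18.9, which the LIKE join would not, so the two approaches agree only when TnCode values contain no dot.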

Related

PostgreSQL - left join generate_series() and table

I use generate series to create a series of numbers from 0 to 200.
I have a table that contains dirt areas in mm² in a column called polutionmm2. I need to left join this table to the generated series, but the dirt area must be in cm², so it has to be divided by 100. I could not make this work because I can't figure out how to join a table to a series that has no name.
This is what I have so far:
select generate_series(0,200,1) as x, cast(p.polutionmm2/100 as char(8)) as metric
from x
left join polutiondistributionstatistic as p on metric = x
error: relation X does not exist
Here is some sample data: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=3d7d851887adb938819d6cf3e5849719
What I need is the first column (x) counting all the way from 0 to 200 and, where there is a matching value, that value shown in the second column.
Like this:
x, metric
0, 0
1, 1
2, 2
3, null
4, 4
5, null
... , ...
... , ...
200, null
You can put generate_series() in the FROM. So, I think you want something like this:
select gs.x, cast(p.polutionmm2/100 as char(8)) as metric
from generate_series(0,200,1) gs(x) left join
polutiondistributionstatistic p
on gs.x = (p.polutionmm2/100);
I imagine there is also more to your query, because this doesn't do much that is useful.
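The same shape of query can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module and a recursive CTE as a stand-in for generate_series(), which is an assumption made only so the example runs anywhere. The function name series_left_join and the sample areas are hypothetical, and the cast to char(8) is left out.

```python
import sqlite3


def series_left_join(areas_mm2, upper=200):
    """Left-join the numbers 0..upper to the dirt areas converted to cm²."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE polutiondistributionstatistic (polutionmm2 INTEGER)")
    con.executemany(
        "INSERT INTO polutiondistributionstatistic VALUES (?)",
        [(a,) for a in areas_mm2],
    )
    query = """
        WITH RECURSIVE gs(x) AS (       -- stand-in for generate_series(0, upper, 1)
            SELECT 0
            UNION ALL
            SELECT x + 1 FROM gs WHERE x < ?
        )
        SELECT gs.x, p.polutionmm2 / 100 AS metric   -- integer division: mm² to cm²
        FROM gs
        LEFT JOIN polutiondistributionstatistic AS p
          ON gs.x = p.polutionmm2 / 100
        ORDER BY gs.x
    """
    rows = con.execute(query, (upper,)).fetchall()
    con.close()
    return rows


# Hypothetical areas in mm²; they map to 0, 1, 2 and 4 cm².
rows = series_left_join([0, 100, 200, 400])
print(rows[:6])
# [(0, 0), (1, 1), (2, 2), (3, None), (4, 4), (5, None)]
```

Every value of x appears once as long as no two areas fall on the same cm² value; duplicates in the table would repeat the corresponding x.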

How to select columns from more than three tables using dplyr?

I am developing a Shiny app using reactive programming. Reactive objects are functions, so in order to refer to a table I have to put () after its name.
An algorithm I've worked out is easy to express in SQL syntax (in this case with the sqldf package). Here is one query as an example:
ratios_135_final <- sqldf("select
b.tot_cap_after_stress*100/c.rwa_0_after_stress as \"n1.0_after_stress\",
b.osn_cap_after_stress*100/c.rwa_2_after_stress as \"n1.2_after_stress\",
b.bas_cap_after_stress*100/c.rwa_1_after_stress as \"n1.1_after_stress\",
a.\"REGN\", d.\"NAME\", a.date, f.buff
from ratios a
inner join capital_final b on (a.\"REGN\" = b.\"REGN\")
inner join rwa_final c on (a.\"REGN\" = c.\"REGN\")
inner join names d on (a.\"REGN\" = d.\"REGN\")
inner join buffer_bank f on (a.\"REGN\" = f.\"REGN\") ")
As you can see, I refer to 5 tables to build the query. But I can't write, for instance, ...*from ratios()*. I tried to learn dplyr syntax, but as far as I can tell dplyr does not provide any functions for working with three or more tables.
Could you help me handle this problem?
Thanks in advance.
This is the equivalent code; however, it assumes that "REGN" is the only column that exists in multiple tables. If other column names are shared among the tables, it will need further modification.
ratios_135_final <-
ratios %>%
inner_join(capital_final, by = "REGN") %>%
inner_join(rwa_final, by = "REGN") %>%
inner_join(names, by = "REGN") %>%
inner_join(buffer_bank, by = "REGN") %>%
mutate(n1.0_after_stress = tot_cap_after_stress * 100 / rwa_0_after_stress,
n1.2_after_stress = osn_cap_after_stress * 100 / rwa_2_after_stress,
n1.1_after_stress = bas_cap_after_stress * 100 / rwa_1_after_stress) %>%
# Joining by "REGN" keeps a single REGN column, so no .x suffix or rename is needed
select(n1.0_after_stress, n1.2_after_stress, n1.1_after_stress, REGN, NAME, date, buff)
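The chain of joins can also be checked on toy data. The following is a rough sketch in Python's built-in sqlite3 module rather than dplyr. It uses only four of the five tables and one of the three ratios to stay short, and the function name stress_ratios, the column types, and all values are invented for illustration.

```python
import sqlite3


def stress_ratios(ratios, capital_final, rwa_final, names):
    """Chain inner joins on REGN and compute one ratio from the joined rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ratios (REGN INTEGER, date TEXT)")
    con.execute("CREATE TABLE capital_final (REGN INTEGER, tot_cap_after_stress REAL)")
    con.execute("CREATE TABLE rwa_final (REGN INTEGER, rwa_0_after_stress REAL)")
    con.execute("CREATE TABLE names (REGN INTEGER, NAME TEXT)")
    con.executemany("INSERT INTO ratios VALUES (?, ?)", ratios)
    con.executemany("INSERT INTO capital_final VALUES (?, ?)", capital_final)
    con.executemany("INSERT INTO rwa_final VALUES (?, ?)", rwa_final)
    con.executemany("INSERT INTO names VALUES (?, ?)", names)
    query = """
        SELECT b.tot_cap_after_stress * 100 / c.rwa_0_after_stress AS n1_0_after_stress,
               a.REGN, d.NAME, a.date
        FROM ratios a
        INNER JOIN capital_final b ON a.REGN = b.REGN
        INNER JOIN rwa_final c ON a.REGN = c.REGN
        INNER JOIN names d ON a.REGN = d.REGN
        ORDER BY a.REGN
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical data: REGN 2 is missing from rwa_final and names, so it drops out.
result = stress_ratios(
    ratios=[(1, "2020-01-01"), (2, "2020-01-01")],
    capital_final=[(1, 50), (2, 70)],
    rwa_final=[(1, 400)],
    names=[(1, "Bank A")],
)
print(result)
# [(12.5, 1, 'Bank A', '2020-01-01')]
```

Each additional table is one more join on the shared key, which is the same pattern the piped inner_join() calls follow.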

Oracle: outer join(+) with or clause replacement

I have an enormous select that schematically looks like this:
SELECT c_1, c_2, ..., c_j FROM t_1, t_2, ..., t_k
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
ORDER BY o
where j, k and l are in the hundreds and e_mn is a column from some table. I need to add new columns A_1 and A_2 to the select from a new table T. The new columns are connected to the former select via a column, call it B, from a table R. I want those rows where A_1 = B or A_2 = B, or those rows where there is no corresponding A_i for the value B.
Suppose I only had to deal with tables T and R; then I would want this:
SELECT * FROM R
LEFT OUTER JOIN T
ON (A_1 = B OR A_2 = B)
To mimic this behaviour I'd want something like this in the big select:
SELECT c_1, c_2, ..., c_j, A_1, A_2 FROM t_1, t_2, ..., t_k, T
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
AND (B = A_1(+) OR B = A_2(+))
ORDER BY o
This is, however, syntactically incorrect, since the (+) operator cannot be used with an OR clause. And if I leave out the (+)'s, I lose those rows where there is no corresponding A_i for B.
What are my options here? Can I somehow find a way to do this without changing the whole body of the select? I doubt there is a reasonable way to do this, nevertheless I'd appreciate any help.
Thanks.
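To make the target behaviour of the two-table query above concrete, here is a small sketch in Python's built-in sqlite3 module. It is not Oracle and does not show how to splice the condition into the (+) syntax; it only illustrates which rows the LEFT OUTER JOIN with OR is expected to return. The function name left_join_or and the sample rows are hypothetical.

```python
import sqlite3


def left_join_or(r_rows, t_rows):
    """Left outer join R to T where B equals either A_1 or A_2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE R (B INTEGER)")
    con.execute("CREATE TABLE T (A_1 INTEGER, A_2 INTEGER)")
    con.executemany("INSERT INTO R VALUES (?)", [(b,) for b in r_rows])
    con.executemany("INSERT INTO T VALUES (?, ?)", t_rows)
    query = """
        SELECT R.B, T.A_1, T.A_2
        FROM R
        LEFT OUTER JOIN T
          ON (T.A_1 = R.B OR T.A_2 = R.B)
        ORDER BY R.B
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical rows: B = 1 matches via A_1, B = 2 via A_2, B = 3 has no match.
print(left_join_or([1, 2, 3], [(1, 9), (8, 2)]))
# [(1, 1, 9), (2, 8, 2), (3, None, None)]
```

The row for B = 3 is the one that disappears when the (+) markers are simply dropped.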

SQL - WHERE (X, Y) IN (A, B)

I have some kind of blockage currently.
My theoretic query looks something like this:
SELECT * FROM Table WHERE X in (a, b, c) AND Y IN (d, e, f)
So basically, I want all rows having multiple columns match, meaning:
X, Y
1, 2
3, 4
5, 6
7, 8
9, 10
If I want to get all rows where (X=1, Y=2) or (X=5, Y=6), so X and Y are correlated, how would I do that?
(MS SQL2005+)
Why not something simple like the following?
WHERE (X = 1 AND Y = 2) OR (X = 5 AND Y = 6) ...
Or, if you're looking for rows (based on your example) where Y should be X + 1, then:
WHERE Y = X + 1
If you have thousands of OR clauses like the above, then I would suggest you populate a criterion table ahead of time, and rewrite your query as a join. Suppose you have such a table Criteria(X, Y) then your query becomes much simpler:
SELECT Table.*
FROM Table
INNER JOIN Criteria ON Table.X = Criteria.X AND Table.Y = Criteria.Y
Don't forget to add an index / foreign keys as necessary to the new table.
If for some reason you prefer to not create a table ahead of time, you can use a temporary table or table variable and populate it within your procedure.
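The criteria-table approach can be sketched end to end as follows. This is an illustration in Python's built-in sqlite3 module, not MS SQL. The table name Points stands in for Table (a reserved word), and the function name match_pairs is hypothetical. The criteria go into a temporary table that is then joined on both columns.

```python
import sqlite3


def match_pairs(rows, pairs):
    """Return the rows whose (X, Y) pair appears in the list of wanted pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Points (X INTEGER, Y INTEGER)")
    con.executemany("INSERT INTO Points VALUES (?, ?)", rows)
    # Temporary criteria table, populated just before the query runs
    con.execute("CREATE TEMP TABLE Criteria (X INTEGER, Y INTEGER)")
    con.executemany("INSERT INTO Criteria VALUES (?, ?)", pairs)
    query = """
        SELECT p.X, p.Y
        FROM Points p
        INNER JOIN Criteria c ON p.X = c.X AND p.Y = c.Y
        ORDER BY p.X, p.Y
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Rows from the question, with the two correlated pairs as criteria.
rows = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
print(match_pairs(rows, [(1, 2), (5, 6)]))
# [(1, 2), (5, 6)]
```

A pair such as (1, 6) returns nothing here, whereas X IN (1, 5) AND Y IN (2, 6) would accept it if such a row existed.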
If X and Y are in a table then a JOIN would be cleanest:
SELECT * FROM Table t
INNER JOIN XandY xy
ON t.X = xy.X AND t.Y = xy.Y
If they're not in a table, I would strongly suggest putting them in one. IN only works with single-value sets, and there's no way to line up results using multiple IN clauses.

MSSQL - how to get the full list from an interval of IDs

I have two tables
Created_Labels:
IF label_ID_from = label_ID_to (only one label was created), the column label_number holds the number of the created label.
IF label_ID_from <> label_ID_to (several labels were created), label_number is NULL and the next two columns hold the interval of created labels, with IDs from the table below.
Labels (list of existing labels):
How can I get the complete list of created label_numbers (get labels with ID 105, 110, 111, 112..120, 200, 201, 202..210, 394, 554)?
SELECT
L.ID
, L.label_number
FROM
Labels L
JOIN
Created_Labels CL
ON
L.ID BETWEEN CL.label_ID_from
AND CL.label_ID_to
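A quick way to see what this range join returns is to run it on a few rows. The sketch below uses Python's built-in sqlite3 module instead of MS SQL. The column types, the label numbers such as L105, and the function name expand_label_ranges are assumptions, since the original table contents are not shown.

```python
import sqlite3


def expand_label_ranges(created, labels):
    """List every label whose ID falls inside one of the created intervals."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Created_Labels "
        "(label_number TEXT, label_ID_from INTEGER, label_ID_to INTEGER)"
    )
    con.execute("CREATE TABLE Labels (ID INTEGER, label_number TEXT)")
    con.executemany("INSERT INTO Created_Labels VALUES (?, ?, ?)", created)
    con.executemany("INSERT INTO Labels VALUES (?, ?)", labels)
    query = """
        SELECT L.ID, L.label_number
        FROM Labels L
        JOIN Created_Labels CL
          ON L.ID BETWEEN CL.label_ID_from AND CL.label_ID_to
        ORDER BY L.ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical data: one single label (105) and one interval (110 to 112).
created = [("L105", 105, 105), (None, 110, 112)]
labels = [(i, f"L{i}") for i in range(104, 114)]
print(expand_label_ranges(created, labels))
# [(105, 'L105'), (110, 'L110'), (111, 'L111'), (112, 'L112')]
```

Because BETWEEN is inclusive, the single-label rows where label_ID_from equals label_ID_to are covered by the same join; overlapping intervals would return a label more than once.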
Did you try this?
SELECT distinct label_number
FROM Created_Labels c;
OR
SELECT distinct l.label_number
FROM Created_Labels c,
labels l
WHERE c.label_number = l.label_number(+)
AND c.label_number is null
The second query above uses Oracle's (+) left outer join syntax.