How do I create a join query that uses two or more columns?
I'm trying to do something like this, but I can't find any examples of how to join on multiple columns:
let logMaster = Table1
let logClient = Table1
logMaster
| join kind=innerunique (logClient) on ($left.field1 == $right.field1 && $left.field2 == $right.field2)
I've tried comma-separated conditions (which I think the documentation kind of hints at), as well as && and AND, but none of them seem to work.
use the "and" keyword, here is an example:
let logMaster = datatable(a:string, b:string, c:long) ["a", "b", 5, "a", "v", 10] ;
let logClient = datatable(a:string, b:string, d:long) ["a", "b", 5, "a", "y", 10] ;
logMaster
| join kind=innerunique (logClient) on $left.a == $right.a and $left.b == $right.b
As a side note, using the "lookup" operator will likely give you better performance and remove the duplicate join columns.
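For example, a minimal sketch of the same join written with lookup, using the datatables above (lookup defaults to kind=leftouter; kind=inner keeps only matching rows and drops the duplicated a and b from the right side):
logMaster
| lookup kind=inner (logClient) on a, b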
Using a relational DB as an example: given the two tables below, rows in tableA and tableB with the same Id represent the same "thing" in different "states". So for ID=1, this thing has gone through stages 1 and 2, but for ID=2, it has only gone through stage 1.
tableA (Id, columnA, columnB)
1, "a", "b"
2, "x", "y"
3, "z", "a"
tableB (Id, columnA, columnB)
1, "e", "f"
I want to find all the rows from tableA that don't exist in tableB.
select a.*
from tableA a
left join
tableB b
on a.Id = b.Id
where b.Id is null
So the SQL above will return rows 2 and 3 from tableA.
How can I do similar things in CouchDB? Say I have 4 docs that look like below.
{ "_id":"a-1", "type":"A", "correlation_id": "1" }
{ "_id":"a-2", "type":"A", "correlation_id": "2" }
{ "_id":"a-3", "type":"A", "correlation_id": "3" }
{ "_id":"b-1", "type":"B", "correlation_id": "1" }
How can I create a view that only shows docs a-2 and a-3? I don't want to filter; I just want to show all the docs that haven't got a type-B counterpart. I can build the equivalent of a GROUP BY with COUNT(*) as a view, but I can't do the equivalent of GROUP BY ... HAVING COUNT(*) = 1.
I'm using CouchDB 3.0.
You could write a view:
function (doc) {
  // emit each document's type as the view key
  emit(doc.type, 1);
}
and then query the view using the key "A", with include_docs=true if you want the whole content of those documents.
If you want not just "A" but "everything except B", you can query from the start of the key range up to "B", then from just past "B" to the end, and get all the documents that way.
Depending on your setup, it might be easier to query the view with group_level=1 so you get all the keys, then loop through them, excluding "B", to fetch the rest of the info you're interested in.
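Concretely, the two range requests could look like this (a sketch: the design-document and view names types/by_type are made up, and "B\ufff0" uses a high Unicode character to start the second range just past "B", which assumes no other type name begins with "B"):
GET /mydb/_design/types/_view/by_type?endkey="B"&inclusive_end=false&include_docs=true
GET /mydb/_design/types/_view/by_type?startkey="B\ufff0"&include_docs=true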
When I use the Ibis API to query Impala, for some reason Ibis forces the join to become a subquery (and when you join 4-5 tables it suddenly becomes super slow). It simply won't join normally, because of the column-name overlap problem on joins. I want a way to quickly rename the columns; isn't that how SQL usually works?
i0 = impCon.table('shop_inventory')
s0 = impCon.table('shop_expenditure')
s0 = s0.relabel({'element_date': 'spend_element_date', 'element_shop_item': 'spend_shop_item'})
jn = i0.inner_join(s0, [i0['element_date'] == s0['spend_element_date'], i0['element_shop_item'] == s0['spend_shop_item']])
jn = jn.materialize()
jn.execute(limit=900)
Then Ibis generates SQL that wraps everything in subqueries, without me asking for it:
SELECT *
FROM (
SELECT `element_date`, `element_shop_item`, `element_address`, `element_expiration`,
`element_category`, `element_description`
FROM dbp.`shop_inventory`
) t0
INNER JOIN (
SELECT `element_shop_item` AS `spend_shop_item`, `element_comm` AS `spend_comm`,
`element_date` AS `spend_date`, `element_amount`,
`element_spend_type`, `element_shop_item_desc`
FROM dbp.`shop_spend`
) t1
ON (`element_shop_item` = t1.`spend_shop_item`) AND
(`element_category` = t1.`spend_category`) AND
(`element_subcategory` = t1.`spend_subcategory`) AND
(`element_comm` = t1.`spend_comm`) AND
(`element_date` = t1.`spend_date`)
LIMIT 900
Why is this so difficult?
It should ideally be as simple as:
jn = i0.inner_join(s0, [s0['element_date'].as('spend_date') == i0['element_date']])
to generate a single: SELECT s0.element_date AS spend_date, i0.element_date FROM dbp.shop_inventory i0 INNER JOIN dbp.shop_spend s0 ON s0.element_date = i0.element_date
right?
Are we never allowed to have the same column names on tables that are being joined? I am pretty sure that in raw SQL you can just use "X AS Y" without needing a subquery.
I spent the last few hours struggling with this same issue. A better solution I found is the following: do the join keeping the variable names the same, and then, before you materialize, select only a subset of the variables so that there isn't any overlap.
So in your code it would look something like this:
jn = i0.inner_join(s0, [i0['element_date'] == s0['element_date'], i0['element_shop_item'] == s0['element_shop_item']])
# keep all of i0's columns, plus only the non-overlapping columns you need from s0
expr = jn[i0, s0['variable_of_interest_1'], s0['variable_of_interest_2']]
expr = expr.materialize()
See here for more resources
https://docs.ibis-project.org/sql.html
I have an enormous select that schematically looks like this:
SELECT c_1, c_2, ..., c_j FROM t_1, t_2, ..., t_k
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
ORDER BY o
where j, k, and l are in the hundreds and e_mn is a column from some table. I need to add new columns A_1 and A_2 to the select, coming from a new table T. The new columns are connected to the former select via a column, call it B, from a table R. I want the rows where A_1 = B or A_2 = B, as well as the rows where there is no A_i corresponding to the value B.
Suppose I only had to deal with tables T and R; then I would want this:
SELECT * FROM R
LEFT OUTER JOIN T
ON (A_1 = B OR A_2 = B)
To mimic this behaviour I'd want something like this in the big select:
SELECT c_1, c_2, ..., c_j, A_1, A_2 FROM t_1, t_2, ..., t_k, T
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
AND (B = A_1(+) OR B = A_2(+))
ORDER BY o
This is, however, syntactically incorrect, since the (+) operator cannot be used in an operand of OR (Oracle rejects it with ORA-01719). And if I leave out the (+)'s, I lose the rows where there is no A_i corresponding to B.
What are my options here? Can I somehow find a way to do this without changing the whole body of the select? I doubt there is a reasonable way to do this; nevertheless, I'd appreciate any help.
Thanks.
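For what it's worth, one possible way out (a sketch only, not tested against a select of this size): ANSI join syntax has no restriction on OR in the ON clause, and it can be bolted on without touching the existing (+) joins by wrapping the original select in an inline view. B (from table R, which I assume is already among the joined tables) has to be selected in the inner query:
SELECT big.*, T.A_1, T.A_2
FROM (
    SELECT c_1, c_2, ..., c_j, B
    FROM t_1, t_2, ..., t_k
    WHERE e_11 = e_12(+)
      AND e_21 = e_22(+)
      AND ...
      AND e_l1 = e_l2(+)
) big
LEFT OUTER JOIN T
    ON (T.A_1 = big.B OR T.A_2 = big.B)
ORDER BY o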
I am looking for advice on optimizing the following sample query and processing its result. The SQL variant in use is the internal FileMaker ExecuteSQL engine, which is limited to the SELECT statement with the following syntax:
SELECT [DISTINCT] {* | column_expression [[AS] column_alias],...}
FROM table_name [table_alias], ...
[ WHERE expr1 rel_operator expr2 ]
[ GROUP BY {column_expression, ...} ]
[ HAVING expr1 rel_operator expr2 ]
[ UNION [ALL] (SELECT...) ]
[ ORDER BY {sort_expression [DESC | ASC]}, ... ]
[ OFFSET n {ROWS | ROW} ]
[ FETCH FIRST [ n [ PERCENT ] ] { ROWS | ROW } {ONLY | WITH TIES } ]
[ FOR UPDATE [OF {column_expression, ...}] ]
The query:
SELECT item1 AS val, interval, interval_next FROM meddata
WHERE fk = 12 AND active1 = 1 UNION
SELECT item2 AS val, interval, interval_next FROM meddata
WHERE fk = 12 AND active2 = 1 UNION
SELECT item3 AS val, interval, interval_next FROM meddata
WHERE fk = 12 AND active3 = 1 UNION
SELECT item4 AS val, interval, interval_next FROM meddata
WHERE fk = 12 AND active4 = 1 ORDER BY val
This may give the following result as a sample:
val,interval,interval_next
Artelac,0,1
Artelac,3,6
Celluvisc,1,3
Celluvisc,12,24
What I am looking to achieve (in addition to suggestions for optimization) is a result formatted like this:
val,interval,interval_next,interval,interval_next,interval,interval_next,interval,interval_next ->etc
Artelac,0,1,3,6
Celluvisc,1,3,12,24
Preferably I would like this processed result to be produced by the SQL engine.
Possible?
Thank you.
EDIT: I included the column names in the result for clarity, though they are not part of the result. I wish to illustrate that there may be an arbitrary number of 'interval' and 'interval_next' columns in the result.
I do not think you need to optimise your query; it looks fine to me.
You are looking for something like PIVOT in T-SQL, which is not supported in FQL. Your biggest issue is going to be the variable number of columns returned.
I think the best approach is to get your intermediate result and use a FileMaker script or Custom Function to pivot it.
An alternative is to get the list of distinct val values and loop through them (with a CF or script), running an FQL statement for each one, as sketched below. You will not be able to combine the per-value results with UNION, as it requires the same number of columns in every SELECT.
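A rough sketch of the per-value statement, reusing the structure of the query above (here 'Artelac' stands in for the loop's current val):
SELECT interval, interval_next FROM meddata
WHERE fk = 12 AND active1 = 1 AND item1 = 'Artelac' UNION
SELECT interval, interval_next FROM meddata
WHERE fk = 12 AND active2 = 1 AND item2 = 'Artelac' UNION
SELECT interval, interval_next FROM meddata
WHERE fk = 12 AND active3 = 1 AND item3 = 'Artelac' UNION
SELECT interval, interval_next FROM meddata
WHERE fk = 12 AND active4 = 1 AND item4 = 'Artelac' ORDER BY interval
The script then appends each returned interval, interval_next pair onto the output row for that val.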
How does one perform a join/merge using a prefix (of varying length) of a column as a key? I am trying to translate the following SQL code:
SELECT a.person_id, a.tn_code, b.ListID
FROM tblA a
INNER JOIN tblB b
ON a.tn_code LIKE b.TnCode + '%'
tblA
person_id tn_code
1 C18.4
2 M8820/9
3 X20.
...
tblB
ListID TnCode
1.01 A0.1
1.01 A0.2
...
I have ideas such as preparing a new key, TnCode_prefix = gsub("^(.*)\\.(.*)$", "\\1", TnCode), and then joining on the new column, or using data.table's rolling join, but these are only approximate translations. Is there an exact equivalent in R?
I am aware of using sqldf and simply passing the original SQL statement to sqldf, but I'm wondering if there is another way.
What about creating a prefix on the fly and joining on that? I used dplyr to create the prefix and do the join.
library(dplyr)

# Fake data
set.seed(1093)
tblA = data.frame(person_id = sample(1:10, 50, replace = TRUE),
                  tn_code = paste0(sample(paste0(rep(LETTERS[1:3], 3), c(40:42, 401:403, 421:423)), 50, replace = TRUE),
                                   ".", sample(160:170, 50, replace = TRUE)))
tblB = data.frame(ListID = paste0(sample(1:10, 50, replace = TRUE), ".",
                                  sample(10:20, 50, replace = TRUE)),
                  TnCode = paste0(sample(paste0(rep(LETTERS[1:3], 3), c(40:42, 401:403, 421:423)), 50, replace = TRUE),
                                  ".", sample(160:170, 50, replace = TRUE)))

# Join on the part of tn_code and TnCode before the dot
newTbl = tblA %>%
  mutate(join_prefix = gsub("(.*)\\..*", "\\1", tn_code)) %>%
  left_join(tblB %>% mutate(join_prefix = gsub("(.*)\\..*", "\\1", TnCode)),
            by = "join_prefix")