SQL Query: Update From - sql

Is this correct SQL:
UPDATE T1alias
SET T1alias.Row2 = T2alias.Row2
FROM
(
T1 AS T1alias
INNER JOIN
T2 AS T2alias
ON T1alias.Row1 = T2alias.Row1
)
This query seems to return the right results, but I don't understand why.
I mean, the FROM clause refers to a completely different dataset than the table T1 that is to be updated.
For example:
T1 T2
---------------------- ----------------------
| Row1 | Row2 | Row3 | | Row1 | Row2 | Row3 |
---------------------- ----------------------
| 1 | 2 | 3 | | 1 | 7 | 8 |
---------------------- ----------------------
| 4 | 5 | 6 | | 9 | 10 | 11 |
---------------------- ----------------------
T1 INNER JOIN T2 ON T1alias.Row1 = T2alias.Row1
-------------------------------------------------------------
| T1.Row1 | T1.Row2 | T1.Row3 | T2.Row1 | T2.Row2 | T2.Row3 |
-------------------------------------------------------------
| 1 | 2 | 3 | 1 | 7 | 8 |
-------------------------------------------------------------
So how can I UPDATE T1 from the joined table?
In my opinion these are completely different datasets.
I would understand the SQL query if it looked like this:
UPDATE T1alias
SET T1alias.Row2 = T2alias.Row2
FROM
(
T1 AS T1alias
INNER JOIN
T2 AS T2alias
ON T1alias.Row1 = T2alias.Row1
) AS T1T2JoinedAlias
WHERE T1T2JoinedAlias.Row1 = T1alias.Row1
Could someone explain this to me, please?
(I'm working on Microsoft SQL Server 2008 R2.)

If you look at the execution plan of your SQL statement you will understand what is going on.
In my case, the query optimiser scans both tables specified in the FROM clause and retrieves the rows that fulfil the inner join.
These rows are then passed along the chain to the Table Update physical operator, which is told to perform an update on T1. You tell it to do this by writing "UPDATE T1alias" in your query, and you tell it which field(s) to update with your SET clause.
The query optimiser chooses an execution plan after the algebrizer has bound the names in your query and built a query tree, so whether you get the same execution plan as I did depends on a number of factors, including whether you have indexes on the tables.
Hope this helps.
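To make the two steps of that plan concrete, here is a minimal Python sketch. It is only a toy model of the idea, not how SQL Server is implemented: the function name update_from_join and the list-of-dicts tables are assumptions made for illustration. The point it shows is that each joined row still carries the identity of the T1 row it came from, so the update operator knows which T1 row to write to.

```python
# Toy model of the plan described above: join first, then update the target table.
# The data mirrors the T1 and T2 tables from the question.
t1 = [{"Row1": 1, "Row2": 2, "Row3": 3}, {"Row1": 4, "Row2": 5, "Row3": 6}]
t2 = [{"Row1": 1, "Row2": 7, "Row3": 8}, {"Row1": 9, "Row2": 10, "Row3": 11}]


def update_from_join(target, source):
    """Emulate: UPDATE T1alias SET Row2 = T2alias.Row2 FROM T1 JOIN T2 ON Row1."""
    # Step 1: scan both tables and keep the pairs that satisfy the join condition.
    # Each pair holds a reference to the original T1 row, not a copy of it.
    joined = [(a, b) for a in target for b in source if a["Row1"] == b["Row1"]]
    # Step 2: the update step writes to the T1 row carried by each pair.
    for a, b in joined:
        a["Row2"] = b["Row2"]
    return len(joined)


updated = update_from_join(t1, t2)
print(updated, t1)
```

Only the T1 row with Row1 = 1 has a join partner, so only that row changes. The row with Row1 = 4 never appears in the join output and is left alone.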

Related

MS Access String Replacement Query

I have 2 tables, Table1 and Table2. I need to replace a string or a series of strings (separated by commas) in Table1 referred from Table2.
I did a query on this but no luck:
TableNew: Iif(Instr([Table1.ColumnX1],[Table2.ColumnY1],Replace([Table1.ColumnX1],[Table2.ColumnY1],[Table2.ColumnY2]),[Table1.ColumnX1])
What I wanted to achieve is this. In Table1, ColumnX1 contains:
A,B,C,1,2,3,4,D,E,F,5,6
Then in Table2 I have:
+----------+-----------+
| ColumnY1 | ColumnY2 |
+----------+-----------+
| A | Z |
| B | Y |
| C | X |
| D | W |
| E | V |
| F | U |
+----------+-----------+
After running the query, the result would be:
Z,Y,X,1,2,3,4,W,V,U,5,6
I would like this to run for each row in Table1.
Thanks in advance.
You can use a query such as the following to modify the values held by Table1:
update table1 inner join table2 on instr(1, table1.columnx1, table2.columny1) > 0
set table1.columnx1 = replace(table1.columnx1, table2.columny1, table2.columny2)
Note that the join in the above query cannot be displayed by the MS Access query designer. It is nevertheless valid SQL that can be executed by the JET database engine used by MS Access.
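The result the question asks for can be checked with a small Python sketch. It only models the intended outcome (apply every Table2 pair whose ColumnY1 occurs in the value), not how the JET engine executes the join; the function name apply_replacements is an assumption made for illustration.

```python
def apply_replacements(value, pairs):
    """Apply each (old, new) pair to value, one after another."""
    for old, new in pairs:
        # Mirrors the join condition InStr(1, value, old) > 0.
        if old in value:
            # Mirrors Replace(value, old, new).
            value = value.replace(old, new)
    return value


# The (ColumnY1, ColumnY2) rows of Table2 from the question.
pairs = [("A", "Z"), ("B", "Y"), ("C", "X"), ("D", "W"), ("E", "V"), ("F", "U")]

result = apply_replacements("A,B,C,1,2,3,4,D,E,F,5,6", pairs)
print(result)
```

Like Replace in the query, this is a plain substring replacement. If ColumnX1 can hold longer tokens such as "AB", or if a replacement value also appears in ColumnY1, splitting on the commas and replacing whole tokens would be the safer variant.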

MariaDB: How to use "INSERT ... SELECT" with a WITH statement

Note: This involves ColumnStore.
At work, we have a big SQL statement that takes too much memory to execute on prod. I'm currently working on reducing the memory the query consumes. I've tried different approaches, but nothing has solved the issue so far, except for WITH ... AS (...), for some reason. However, I need to combine this with an INSERT INTO ....
This is the code I'm trying to get working
TRUNCATE db1.myTable;
INSERT INTO db1.myTable(`all`, `needed`, `columns`)
(WITH everything AS (
SELECT all, needed, columns
FROM db1.mainTable T1
JOIN db1.secondTable T2
ON (T1.someCol = T2.someCol)
JOIN db2.thirdTable T3
ON (T1.anotherCol = T3.anotherCol)
LEFT JOIN db1.fourthTable T4
ON (T4.anotherCol = T1.anotherCol)
WHERE T2.yetAnotherCol >= (some_SELECT_subquery)
AND T1.valid = 1
) SELECT * FROM everything);
EXPLAIN (WITH everything AS ... returns
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16000000000000 | |
| 2 | PRIMARY | T1 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where with pushed condition |
| 2 | PRIMARY | T2 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
| 2 | PRIMARY | T3 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
| 2 | PRIMARY | T4 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where |
| 3 | SUBQUERY | some_SELECT_subquery | ALL | NULL | NULL | NULL | NULL | 2000 | Using where with pushed condition |
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
5 rows in set (0,21 sec)
If I only use the WITH-statement, I can get it to work. As in, I don't use the INSERT INTO. No issues at all, and the query is even faster this way. I also did a quick test, trying to divide the query into several WITHs, but gave up since I believe I messed up the syntax. I'm not too good with SQL, and even less so with JOINs (junior developer).
When I combine the WITH-statement with an INSERT INTO ..., MariaDB responds with ERROR 1064 (42000) at line 3: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ') SELECT * FROM everything)' at line 1. I've also tried adding a semicolon after ... valid = 1, merging the two last lines, positioning the open parentheses after ... AS on a new line, and some other things I could think of that might be syntax related. No luck.
My current thought is that you can't combine INSERT INTO ... SELECT ... with a WITH .... At least not having the WITH at the beginning, where the SELECT should be. This is what I can gather from the docs.
So, in short, my question is: can I combine INSERT INTO ... SELECT with a WITH-statement at all? If not, can I achieve something similar with another technique?
Are there any other ways I can improve the memory utilization of my query? I'd rather not mess with configuration options for MariaDB or Docker, but if that's the only possibility, I'll consider it.
Have you tried this?
TRUNCATE db1.myTable;
WITH everything AS (
SELECT all, needed, columns
FROM db1.mainTable T1
JOIN db1.secondTable T2
ON (T1.someCol = T2.someCol)
JOIN db2.thirdTable T3
ON (T1.anotherCol = T3.anotherCol)
LEFT JOIN db1.fourthTable T4
ON (T4.anotherCol = T1.anotherCol)
WHERE T2.yetAnotherCol >= (some_SELECT_subquery)
AND T1.valid = 1
) INSERT INTO db1.myTable SELECT * FROM everything;
Although I didn't find an answer to my original question, we decided to work around the problem by reducing the amount of data gathered in the subquery. I didn't disclose this in the original question because that wasn't a solution I was aware of when posting the question. We'll just call the SQL from a Python script where we can loop over the week numbers we'd like to fetch.
WHERE T2.ID >= (SELECT ID - {week_number} FROM db1.secondTable WHERE NOW() BETWEEN monday AND sunday) AND T1.valid = 1);
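The loop from Python can be sketched as follows. This is a minimal illustration under stated assumptions, not the script we actually run: it uses the standard-library sqlite3 module with an in-memory database instead of MariaDB, and the tables main_table and my_table, the week column and the function load_in_chunks are all hypothetical stand-ins.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: the real script
# would use a MariaDB connection instead).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER, week INTEGER, valid INTEGER);
    CREATE TABLE my_table (id INTEGER, week INTEGER);
""")
# 40 sample rows spread over weeks 0..3; every fifth row is marked invalid.
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?, ?)",
    [(i, i % 4, 1 if i % 5 else 0) for i in range(40)],
)


def load_in_chunks(conn, week_numbers):
    """Refill my_table one week at a time so each statement handles less data."""
    conn.execute("DELETE FROM my_table")
    for week in week_numbers:
        # The week number is passed as a bound parameter rather than
        # formatted into the SQL string.
        conn.execute(
            "INSERT INTO my_table (id, week) "
            "SELECT id, week FROM main_table WHERE week = ? AND valid = 1",
            (week,),
        )
        conn.commit()


load_in_chunks(conn, range(4))
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(total)
```

Committing after each chunk keeps every INSERT ... SELECT small, at the cost of the load no longer being a single atomic statement.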

Why is this Query not Updateable?

I was looking to provide an answer to this question in which the OP has two tables:
Table1
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | |
| 2 | |
| 3 | |
+--------+--------+
Table2
+----+--------+--------+--------+
| ID | testID | stepID | status |
+----+--------+--------+--------+
| 1 | 1 | 1 | pass |
| 2 | 1 | 2 | fail |
| 3 | 1 | 3 | pass |
| 4 | 2 | 1 | pass |
| 5 | 2 | 2 | pass |
| 6 | 3 | 1 | fail |
+----+--------+--------+--------+
Here, the OP wants to update the status field for each testID in Table1 with pass if all stepID records associated with that testID in Table2 have a status of pass; otherwise Table1 should be updated with fail for that testID.
In this example, the result should be:
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | fail |
| 2 | pass |
| 3 | fail |
+--------+--------+
I wrote the following SQL code in an effort to accomplish this:
update Table1 a inner join
(
select
b.testID,
iif(min(b.status)=max(b.status) and min(b.status)='pass','pass','fail') as v
from Table2 b
group by b.testID
) c on a.testID = c.testID
set a.testStatus = c.v
However, MS Access reports the all-too-familiar, 'operation must use an updateable query' response.
I know that a query is not updateable if there is a one-to-many relationship between the record being updated and the set of values, but in this case, the aggregated subquery would yield a one-to-one relationship between the two testID fields.
Which left me asking, why is this query not updateable?
You're joining in a query with an aggregate (Max).
Aggregates are not updateable. In Access, in an update query, every part of the query has to be updateable (with the exception of simple expressions, and subqueries in WHERE part of your query), which means your query is not updateable.
You can work around this by using domain aggregates (DMin and DMax) instead of real ones, but this query will take a large performance hit if you do.
You can also work around it by rewriting your aggregates to take place in an EXISTS or NOT EXISTS clause, since that's part of the WHERE clause thus doesn't need to be updateable. That would likely minimally affect performance, but means you have to split this query in two: 1 query to set all the fields to "pass" that meet your condition, another to set them to "fail" if they don't.
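The two-query approach can be sketched like this. It runs against SQLite through Python's standard sqlite3 module rather than against Access, so the SQL dialect is an assumption and Access may need small syntax changes; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (testID INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, testID INTEGER,
                         stepID INTEGER, status TEXT);
    INSERT INTO Table1 (testID) VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES
        (1, 1, 1, 'pass'), (2, 1, 2, 'fail'), (3, 1, 3, 'pass'),
        (4, 2, 1, 'pass'), (5, 2, 2, 'pass'), (6, 3, 1, 'fail');
""")

# Query 1: a test fails if at least one of its steps is not 'pass'.
conn.execute("""
    UPDATE Table1 SET Status = 'fail'
    WHERE EXISTS (SELECT 1 FROM Table2 b
                  WHERE b.testID = Table1.testID AND b.status <> 'pass')
""")

# Query 2: a test passes if no step with another status exists.
conn.execute("""
    UPDATE Table1 SET Status = 'pass'
    WHERE NOT EXISTS (SELECT 1 FROM Table2 b
                      WHERE b.testID = Table1.testID AND b.status <> 'pass')
""")

result = dict(conn.execute("SELECT testID, Status FROM Table1"))
print(result)
```

Note that the NOT EXISTS query also marks a test as pass when it has no steps at all in Table2; add an extra EXISTS condition if such tests should stay empty.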

Oracle 10 SQL: FULL JOIN through Cross Reference Table

http://sqlfiddle.com/#!4/24637/1
I have three tables, (better details/data shown in sqlfiddle link), one replacing another, and a cross reference table in between. One of the fields in each of the table uses the cross reference (version), and another one of the fields in each of the tables is the same (changeID).
I need a query that, when passed a list of new_version values, returns each new_version + new_changeType along with the equivalent original_version + old_changeType (if there is an old version equivalent), PLUS any old changeIDs that were 'missed' in the conversion of data.
TABLES (fields on the same line are equivalent)
OLD_table | XREF_table | NEW_Table
original_version | original_version |
changeID | | changeID
OLD_changeType | |
| new_version | new_version
| | NEW_changeType
DATA
111,1,CT1 | 111,AAA | AAA,1,ONE
111,2,CT2 | 222,BBB | AAA,2,TWO
222,1,CT1 | 333,DDD | BBB,1,ONE
222,2,CT2 | | BBB,2,TWO
222,3,CT3 | | CCC,1,ONE
333,1,CT1 | |
444,1,CT1 | |
If passed the following list, the result set should look like so. (Order doesn't matter.)
AAA,BBB,CCC
| NEW_VERSION | NEW_CHANGE_TYPE| ORIGINAL_VERSION | CHANGEID | OLD_CHANGE_TYPE |
|-------------|----------------|------------------|----------|-----------------|
| AAA | ONE | 111 | 1 | CT1 |
| AAA | TWO | 111 | 2 | CT2 |
| BBB | ONE | 222 | 1 | CT1 |
| BBB | TWO | 222 | 2 | CT2 |
| CCC | ONE | (null) | (null) | (null) |
| (null) | (null) | 222 | 3 | CT3 |
I'm having trouble getting ALL the data required. With the queries below I either 1) miss a row or 2) get additional rows not matching the requirements.
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
LEFT OUTER JOIN XREF_TABLE b on a.new_version = b.new_version
FULL OUTER JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (b.new_version in ('AAA','BBB','CCC') or b.new_version is null);
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
FULL JOIN XREF_TABLE b on a.new_version = b.new_version
FULL JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (a.new_version in ('AAA','BBB','CCC'));
The first returns one 'extra' row with the 333,DDD data, which is not specified in the input.
The second returns one row fewer (the row with the changeID from the old table that was "missed" when this data was converted over).
Any thoughts or suggestions on how to solve this?
First inner join old_table and xref_table, as you are not interested in any old_table entries without an xref_table entry. Then full outer join new_table. In your WHERE clause be aware that new_table.new_version can be null, so use coalesce to use xref_table.new_version in this case to limit your results to AAA, BBB and CCC. That's all.
select
coalesce(n.new_version, x.new_version) as new_version,
n.change_type,
o.original_version,
o.changeid,
o.old_change_type
from old_table o
inner join xref_table x
on x.original_version = o.original_version
full outer join new_table n
on n.new_version = x.new_version
and n.changeid = o.changeid
where coalesce(n.new_version, x.new_version) in ('AAA','BBB','CCC')
order by 1,2,3,4,5
;
Here is your fiddle: http://sqlfiddle.com/#!4/24637/11.
BTW: It is better to avoid arbitrary aliases like a, b and c that don't indicate which table is meant. They make the query harder to understand. Use the table's first letter(s) or an acronym instead.
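The join order described above (inner join old_table and xref_table first, then full outer join new_table) can be checked against the sample data without an Oracle instance. The sketch below uses Python's standard sqlite3 module. Because older SQLite versions have no FULL OUTER JOIN, the full join is emulated here with a LEFT JOIN plus a UNION ALL of the unmatched rows; that emulation and the CTE name ox are assumptions of this sketch, not part of the Oracle query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (original_version INTEGER, changeid INTEGER,
                            old_change_type TEXT);
    CREATE TABLE xref_table (original_version INTEGER, new_version TEXT);
    CREATE TABLE new_table (new_version TEXT, changeid INTEGER, change_type TEXT);
    INSERT INTO old_table VALUES
        (111, 1, 'CT1'), (111, 2, 'CT2'), (222, 1, 'CT1'), (222, 2, 'CT2'),
        (222, 3, 'CT3'), (333, 1, 'CT1'), (444, 1, 'CT1');
    INSERT INTO xref_table VALUES (111, 'AAA'), (222, 'BBB'), (333, 'DDD');
    INSERT INTO new_table VALUES
        ('AAA', 1, 'ONE'), ('AAA', 2, 'TWO'), ('BBB', 1, 'ONE'),
        ('BBB', 2, 'TWO'), ('CCC', 1, 'ONE');
""")

QUERY = """
WITH ox AS (
    -- Step 1: inner join old_table and xref_table.
    SELECT o.original_version, o.changeid, o.old_change_type, x.new_version
    FROM old_table o
    JOIN xref_table x ON x.original_version = o.original_version
)
-- Step 2a: every requested new_table row, with its old counterpart if any.
SELECT n.new_version, n.change_type,
       ox.original_version, ox.changeid, ox.old_change_type
FROM new_table n
LEFT JOIN ox ON ox.new_version = n.new_version AND ox.changeid = n.changeid
WHERE n.new_version IN ('AAA', 'BBB', 'CCC')
UNION ALL
-- Step 2b: old rows that were 'missed', i.e. have no new_table partner.
SELECT ox.new_version, NULL,
       ox.original_version, ox.changeid, ox.old_change_type
FROM ox
WHERE ox.new_version IN ('AAA', 'BBB', 'CCC')
  AND NOT EXISTS (SELECT 1 FROM new_table n
                  WHERE n.new_version = ox.new_version
                    AND n.changeid = ox.changeid)
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

As in the query above, the missed row reports BBB as its new_version (taken from xref_table) rather than null, and the 333,DDD row is filtered out.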

Joining tables if the reference exists

I got a PostgreSQL database with 4 tables:
Table A
---------------------------
| ID | B_ID | C_ID | D_ID |
---------------------------
| 1 | 1 | NULL | NULL |
---------------------------
| 2 | NULL | 1 | NULL |
---------------------------
| 3 | 2 | 2 | 1 |
---------------------------
| 4 | NULL | NULL | 2 |
---------------------------
Table B
-------------
| ID | DATA |
-------------
| 1 | 123 |
-------------
| 2 | 456 |
-------------
Table C
-------------
| ID | DATA |
-------------
| 1 | 789 |
-------------
| 2 | 102 |
-------------
Table D
-------------
| ID | DATA |
-------------
| 1 | 654 |
-------------
| 2 | 321 |
-------------
I'm trying to retrieve a result set that joins the data from table B and the data from table C, but only for rows where at least one of the two IDs is not null.
SELECT "Table_A"."ID", "Table_A"."ID_B", "Table_A"."ID_C", "Table_A"."ID_D", "Table_B"."DATA", "Table_C"."DATA"
FROM "Table_A"
LEFT JOIN "Table_B" on "Table_A"."ID_B" = "Table_B"."ID"
LEFT JOIN "Table_C" on "Table_A"."ID_C" = "Table_C"."ID"
WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
Is this recommended, or would it be better to split this into multiple queries?
Is there a way to do an inner join between these tables?
The result I expect is:
-------------------------------------------------
| ID | ID_B | ID_C | ID_D | DATA (B) | DATA (C) |
-------------------------------------------------
| 1 | 1 | NULL | NULL | 123 | NULL |
-------------------------------------------------
| 2 | NULL | 1 | NULL | NULL | 789 |
-------------------------------------------------
| 3 | 2 | 2 | NULL | 456 | 102 |
-------------------------------------------------
EDIT: ID_B, ID_C, ID_D are foreign keys to the tables table_b, table_c, table_d
The WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL; can be replaced by the corresponding clause on the B and C tables: WHERE "Table_B"."ID" IS NOT NULL OR "Table_C"."ID" IS NOT NULL;. This also works if table_a.id_b and table_a.id_c are not FKs to the B and C tables. Without FKs, the original clause would let a table_a row such as {5, 5, 5, 5} through with NULL data from both the B and C tables.
SELECT ta."ID" AS a_id
, ta."ID_B" AS b_id
, ta."ID_C" AS c_id
, ta."ID_D" AS d_id
, tb."DATA" AS bdata
, tc."DATA" AS cdata
FROM "Table_A" ta
LEFT JOIN "Table_B" tb on ta."ID_B" = tb."ID"
LEFT JOIN "Table_C" tc on ta."ID_C" = tc."ID"
WHERE tb."ID" IS NOT NULL OR tc."ID" IS NOT NULL
;
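The difference between the two WHERE clauses only shows up when a table_a row points at IDs that do not exist in B or C. The sketch below demonstrates that with Python's standard sqlite3 module instead of PostgreSQL; the lower-case table names, the missing FK constraints and the extra {5, 5, 5, 5} row are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, data INTEGER);
    CREATE TABLE table_c (id INTEGER PRIMARY KEY, data INTEGER);
    -- No foreign keys are declared here, so dangling references are allowed.
    CREATE TABLE table_a (id INTEGER PRIMARY KEY,
                          id_b INTEGER, id_c INTEGER, id_d INTEGER);
    INSERT INTO table_b VALUES (1, 123), (2, 456);
    INSERT INTO table_c VALUES (1, 789), (2, 102);
    INSERT INTO table_a VALUES
        (1, 1, NULL, NULL), (2, NULL, 1, NULL), (3, 2, 2, 1),
        (4, NULL, NULL, 2), (5, 5, 5, 5);
""")

BASE = """
    SELECT ta.id, tb.data, tc.data
    FROM table_a ta
    LEFT JOIN table_b tb ON ta.id_b = tb.id
    LEFT JOIN table_c tc ON ta.id_c = tc.id
    WHERE {condition}
    ORDER BY ta.id
"""

# Filter on the referencing columns, as in the question.
rows_a = conn.execute(
    BASE.format(condition="ta.id_b IS NOT NULL OR ta.id_c IS NOT NULL")
).fetchall()

# Filter on the joined tables, as in the query above.
rows_bc = conn.execute(
    BASE.format(condition="tb.id IS NOT NULL OR tc.id IS NOT NULL")
).fetchall()

print(rows_a)
print(rows_bc)
```

With foreign keys enforced, a row like {5, 5, 5, 5} cannot exist and both filters return the same rows.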
Since you have foreign key constraints in place, referential integrity is guaranteed and the query in your Q is already the best answer.
Also indexes on Table_B.ID and Table_C.ID are given.
If matching cases in Table_A are rare (less than ~ 5 %, depending on row width and data distribution), a partial multi-column index would help performance:
CREATE INDEX table_a_special_idx ON "Table_A" ("ID_B", "ID_C")
WHERE "ID_B" IS NOT NULL OR "ID_C" IS NOT NULL;
In PostgreSQL 9.2 a covering index (index-only scan in Postgres parlance) might help even more - in which case you would include all columns of interest in the index (not in my example). Depends on several factors like row width and frequency of updates in your table.
Given your requirements, your query seems good to me.
An alternative would be to use nested selects in the projection, but depending on your data, indexes and constraints, that might be slower, as nested selects usually result in nested loops, whereas joins can be performed as merge joins or nested loops:
SELECT
"Table_A"."ID",
"Table_A"."ID_B",
"Table_A"."ID_C",
"Table_A"."ID_D",
(SELECT "DATA" FROM "Table_B" WHERE "Table_A"."ID_B" = "Table_B"."ID"),
(SELECT "DATA" FROM "Table_C" WHERE "Table_A"."ID_C" = "Table_C"."ID")
FROM "Table_A"
WHERE "Table_A"."ID_B" IS NOT NULL OR "Table_A"."ID_C" IS NOT NULL;
If Postgres does scalar subquery caching (as Oracle does), then nested selects might help in case you have a lot of data repetition in Table_A
Generally speaking, the recommended way is to do it in one query only and let the database do as much work as possible, especially if you add other operations like sorting (order by) or pagination (limit ... offset ...) later. We have done some measurements, and there is no way to sort/paginate faster in Java/Scala if you use any of the higher-level collections like lists etc.
RDBMS deal very well with single complex statements, but they have difficulties in handling many small queries. For example, if you query the "one" and the "many relation" in one query, it will be faster than doing this in 1 + n select statements.
As for the outer join, we have done measurements, and there is no real performance penalty compared with inner joins. So if your data model and/or your query require an outer join, just do it. If it turns out to be a performance problem, you can tune it later.
As for your null comparisons, it might indicate that your data model could be optimized, but that is just a guess. Chances are that you can improve the design so that null is not allowed in these columns.