INSERT INTO SELECT CROSS JOIN Composite Primary Key - sql

I'm performing an INSERT INTO SELECT statement in SQL Server. The situation is that there are two Primary keys of two different tables, without anything in common, that are both foreign keys of a third table, forming a composite primary key in that last table. This can usually be accomplished with a cross join - for example,
Table1.ID(PK)
Table2.Code(PK)
-- Composite PK for Table3
Table3.ID(FK)
Table3.Code(FK)
INSERT INTO Table3
SELECT ID, Code
FROM Table1
CROSS JOIN Table2
WHERE Some_conditions...
I'm getting a "Cannot insert duplicate key row" error. It will not allow Table2.Code to be repeated in Table3, since it is a unique ID, even though the primary key of Table3 is Table1.ID combined with Table2.Code. Hence, the following pairs should be recognized as different PK values in Table3 for example: {1024, PSV} and {1027, PSV}.
Is there a way to fix this, or have I designed this database incorrectly?
I have considered creating a third unique ID for Table3, but it is highly impractical in this scenario.

This will help you locate the problem:
SELECT ID, Code
FROM Table1
CROSS JOIN Table2
WHERE Some_conditions...
GROUP BY ID, Code
HAVING COUNT(*) > 1

I presume you are getting this error because table 2 has multiple rows with the same code for the same ID.
For example, table 2 might have two or more rows of ID 1024 and code 'PSV'.
A simple solution to fix this would be to modify your code as follows:
INSERT INTO Table3
SELECT DISTINCT ID, Code
FROM Table1
CROSS JOIN Table2
WHERE Some_conditions...

SQL Server had created a unique, non-clustered index for Table3 that was preventing the INSERT INTO statement from executing. I disabled it with SQL Server Management Studio Object Explorer and it allowed me to enter the rows.
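To illustrate the difference between the composite key and an extra unique index on Code alone, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the index name ux_table3_code is hypothetical, since the question does not show the real index or how it was created.

```python
import sqlite3

def build(with_extra_index):
    """Create the three tables in memory; optionally add a unique index on Table3.Code alone."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (ID INTEGER PRIMARY KEY);
        CREATE TABLE Table2 (Code TEXT PRIMARY KEY);
        CREATE TABLE Table3 (
            ID   INTEGER REFERENCES Table1 (ID),
            Code TEXT    REFERENCES Table2 (Code),
            PRIMARY KEY (ID, Code)          -- composite key: only the pair must be unique
        );
        INSERT INTO Table1 (ID) VALUES (1024), (1027);
        INSERT INTO Table2 (Code) VALUES ('PSV');
    """)
    if with_extra_index:
        # Hypothetical extra index: makes Code unique on its own in Table3.
        con.execute("CREATE UNIQUE INDEX ux_table3_code ON Table3 (Code)")
    return con

def cross_insert(con):
    """Run the INSERT INTO ... SELECT ... CROSS JOIN and return Table3's rows."""
    con.execute(
        "INSERT INTO Table3 (ID, Code) "
        "SELECT ID, Code FROM Table1 CROSS JOIN Table2"
    )
    return con.execute("SELECT ID, Code FROM Table3 ORDER BY ID").fetchall()

# With only the composite primary key, both pairs are accepted.
rows_ok = cross_insert(build(with_extra_index=False))

# With the extra unique index on Code, the second 'PSV' row is rejected.
try:
    cross_insert(build(with_extra_index=True))
    extra_index_error = None
except sqlite3.IntegrityError as exc:
    extra_index_error = str(exc)

print(rows_ok)
print(extra_index_error)
```

If the index was not created on purpose, dropping it is probably cleaner than leaving it disabled, but that depends on why it exists in the first place.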

Related

Why query with "in" and "on" statement runs infinitely

I have three tables; table3 is basically the intermediate table between table1 and table2. When I execute a query that contains "in" and joins table1 and table3, it just keeps running and I never get a result. If I use id=134 instead of id in (134,267,390,4234 ... ), the result comes up. I don't understand why "in" has this effect. Does anyone have an idea?
Query statement:
select count(*) from table1, table3 on id=table3.table1_id where table3.table2_id = 123 and id in (134,267,390,4234) and item = 30;
table structure:
table1:
id integer primary key,
item integer
table2:
id integer,
item integer
table3:
table1_id integer,
table2_id integer
-- the DB was 0.8 TB without indices; with the three indices it is now 2.5 TB
indices on: table1.item, table3.table1_id, table3.table2_id
env: Linux, sqlite 3.7.17
from table1, table3 is a cross join on most databases, and with the size of your data a cross join would be enormous. In SQLite3, however, it's an inner join. From the SQLite SELECT docs:
Side note: Special handling of CROSS JOIN. There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite.
That's not your problem in this specific instance, but let's not tempt fate; always write out your joins explicitly.
select count(*)
from table1
join table3 on id=table3.table1_id
where table3.table2_id = 123
and id in (134,267,390,4234);
Since you're just counting, you don't need any data from table1 but the ID. table3 has table1_id, so there's no need to join with table1. We can do this entirely with the table3 join table.
select count(*)
from table3
where table2_id = 123
and table1_id in (134,267,390,4234);
SQLite generally uses at most one index per table in a query. For this to be performant on such a large data set, you need a composite index of both columns: table3(table1_id, table2_id). Presumably you don't want duplicates, so this should take the form of a unique index. That will cover queries for just table1_id and queries for both table1_id and table2_id, so you should drop your table1_id index to save space and time.
create unique index table3_unique on table3(table1_id, table2_id);
The composite index will not help queries which use only table2_id, so keep your existing table2_id index.
Your query should now run lickety-split.
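As a rough check of the rewrite above, the following sketch builds a tiny version of the schema with Python's built-in sqlite3 module and compares the joined count with the table3-only count. The sample rows are made up for illustration, and the two counts only agree when every table3.table1_id refers to an existing table1 row, which the answer assumes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, item INTEGER);
    CREATE TABLE table3 (table1_id INTEGER, table2_id INTEGER);
    CREATE UNIQUE INDEX table3_unique ON table3 (table1_id, table2_id);
    -- Made-up sample data, not from the question.
    INSERT INTO table1 (id, item) VALUES (134, 30), (267, 30), (390, 30), (500, 30);
    INSERT INTO table3 (table1_id, table2_id)
    VALUES (134, 123), (267, 123), (390, 999), (500, 123);
""")

ids = "(134, 267, 390, 4234)"

# Count via an explicit join with table1.
joined = con.execute(
    "SELECT count(*) FROM table1 JOIN table3 ON id = table3.table1_id "
    "WHERE table3.table2_id = 123 AND id IN " + ids
).fetchone()[0]

# Count using only the join table.
table3_only_sql = (
    "SELECT count(*) FROM table3 "
    "WHERE table2_id = 123 AND table1_id IN " + ids
)
table3_only = con.execute(table3_only_sql).fetchone()[0]

# Ask SQLite how it intends to run the table3-only query.
# The last column of each plan row is the human-readable detail.
plan = con.execute("EXPLAIN QUERY PLAN " + table3_only_sql).fetchall()
for row in plan:
    print(row[-1])

print(joined, table3_only)
```

The exact wording of the plan differs between SQLite versions, so it is worth running EXPLAIN QUERY PLAN against the real database to confirm that table3_unique is actually picked.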
For more, read about the SQLite Query Optimizer.
A terabyte is a lot of data. While SQLite technically can handle this, it might not be the best choice. It's great for small and simple databases, but it's missing a lot of features. You should look into a more powerful database such as PostgreSQL. It is not a magic bullet and all the same principles apply, but it is much more appropriate for data at that scale.

Joining multiple tables with single join clause (sqlite)

So I'm learning SQL (sqlite flavour) and looking through the sqlite JOIN-clause documentation, I figure that these two statements are valid:
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
(or even, but that's beside the point:
SELECT *
FROM table1
JOIN (table2 JOIN table3 USING (id)) USING (id)
)
Now I've seen the second one (chained join) a lot in SO questions on JOIN clauses, but rarely the first (grouped table-query). Both queries execute in SQLiteStudio for the non-simplified case.
A minimal example is provided here based on this code
CREATE TABLE table1 (
id INTEGER PRIMARY KEY,
field1 TEXT
)
WITHOUT ROWID;
CREATE TABLE table2 (
id INTEGER PRIMARY KEY,
field2 TEXT
)
WITHOUT ROWID;
CREATE TABLE table3 (
id INTEGER PRIMARY KEY,
field3 TEXT
)
WITHOUT ROWID;
INSERT INTO table1 (field1, id)
VALUES ('FOO0', 0),
('FOO1', 1),
('FOO2', 2),
('FOO3', 3);
INSERT INTO table2 (field2, id)
VALUES ('BAR0', 0),
('BAR2', 1),
('BAR3', 3);
INSERT INTO table3 (field3, id)
VALUES ('PIP0', 0),
('PIP1', 1),
('PIP2', 2);
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id);
Could someone explain why one would use one over the other and if they are not equivalent for certain input data, provide an example? The first certainly looks cleaner (at least less redundant) to me.
INNER JOIN ON vs WHERE clause has been suggested as a possible duplicate. While it touches on the use of , as a join operator, I feel the questions and especially the answers are more focussed on the readability aspect and use of WHERE vs JOIN. My question is more about the general validity and possible differences in outcome (given the necessary input to induce the difference).
SQLite does not enforce a proper join syntax. It sees the join operator ([INNER] JOIN, LEFT [OUTER] JOIN, etc., even the comma of the outdated 1980s join syntax) separate from the condition (ON, USING). That is not good, because it makes joins more prone to errors. The SQLite docs are hence a very bad reference for learning joins. (And SQLite itself a bad system for learning them, because the DBMS doesn't detect standard SQL join violations.)
Stick to the syntax defined by the SQL standard (and don't ever use comma-separated joins):
FROM table [alias]
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
...
(Hope, I've got this right :-) And I also hope this is readable enough :-| I've omitted NATURAL JOIN and RIGHT [OUTER] JOIN here, because I don't recommend using them at all.)
For table you can place some table name or view or a subquery (the latter including parentheses, e.g. (select * from mytable)). Columns in USING have to be surrounded by parentheses (e.g. USING (a, b, c)). (You can of course use parentheses around ON conditions as well, if you find this more readable.)
In your case, a properly written query would be:
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
or
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
JOIN table3 t3 ON t3.id = t1.id
for instance. The example suggests three 1:1 related tables, though. In real life these are extremely rare and a more typical example would be
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.t1_id = t1.id
JOIN table3 t3 ON t3.t2_id = t2.id
After fixing the syntax, these are not the same for all tables; read the syntax & definitions of the join operators in the manual. Comma is a cross join with lower precedence than the keyword joins. Different DBMSs' SQL dialects have syntax variations, so read the manual. Some allow a naked join for a cross join.
using returns only one column for each specified column name & natural is using for all common columns; but other joins are based on cross join & return a column for every input column. So since here tables 2 & 3 have id columns the comma returns a table with 2 id columns. Then using (id) doesn't make sense since one operand has 2 id columns.
If only tables 1 & 3 have an id column, clearly the 2nd query can't join 1 & 2 using id.
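A small sketch of the point about duplicate id columns, run through Python's built-in sqlite3 module with the question's sample data (WITHOUT ROWID is left out here, which I assume does not affect the result). It only inspects the comma join on its own and the chained join; what JOIN (table2, table3) USING (id) does on top of that may depend on the SQLite version, so it is not asserted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, field2 TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, field3 TEXT);
    INSERT INTO table1 (field1, id) VALUES ('FOO0', 0), ('FOO1', 1), ('FOO2', 2), ('FOO3', 3);
    INSERT INTO table2 (field2, id) VALUES ('BAR0', 0), ('BAR2', 1), ('BAR3', 3);
    INSERT INTO table3 (field3, id) VALUES ('PIP0', 0), ('PIP1', 1), ('PIP2', 2);
""")

# The comma join by itself: every table2 row paired with every table3 row,
# and both id columns are kept in the result.
cur = con.execute("SELECT * FROM table2, table3")
comma_columns = [d[0] for d in cur.description]
comma_rows = cur.fetchall()

# The chained join: USING merges the id columns and keeps only matching ids.
cur = con.execute(
    "SELECT * FROM table1 JOIN table2 USING (id) JOIN table3 USING (id)"
)
chained_columns = [d[0] for d in cur.description]
chained_rows = sorted(cur.fetchall())

print(comma_columns, len(comma_rows))
print(chained_columns, chained_rows)
```

The comma join yields 3 x 3 rows carrying two id columns, which is why a following USING (id) has no single obvious column to match against, while the chained form keeps one id column and only the ids present in all three tables.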
There are always many ways to express things. In particular SQL DBMSs execute many different expressions the same way. Research re relational query implementation/optimization in general, in SQL & in your DBMS manual. Generally no simple query variations like these make a difference in execution for the simplest query engine. (We see that in SQLite cross join "is handled differently by the query optimizer".)
First learn to write straightforward queries & learn what the operators do & what their syntax & restrictions are.

How to sum or add two values by using SQL command

How do I add them together?
It needs to be in VB.NET.
The two value statements are as below:
(SELECT SUM(ChildName) FROM Child SA WHERE SA.Name=A.Name AND SA.Health_Status=1 AND SA.Parrent_ID IS NOT NULL) AS Present_CHILD
(SELECT SUM(LATE_COMING_CHILD) FROM LATE_COME SB WHERE SB.Name=A.Name) AS LATE_CHILD
You can use what is referred to as a "scalar subquery":
select (select Name from table1) + (select Name from table2)
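Here is a sketch of that idea applied to the two subqueries from the question, run through Python's built-in sqlite3 module. The column Child_Count is hypothetical (the question sums ChildName, which I assume stands for some numeric column), the sample rows are made up, and the COALESCE wrappers are my addition: without them the sum becomes NULL whenever one subquery finds no rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Child_Count is a hypothetical numeric column used for illustration.
    CREATE TABLE Child (Name TEXT, Health_Status INTEGER, Parrent_ID INTEGER, Child_Count INTEGER);
    CREATE TABLE LATE_COME (Name TEXT, LATE_COMING_CHILD INTEGER);
    INSERT INTO Child VALUES ('A', 1, 10, 3), ('A', 1, NULL, 5), ('A', 0, 10, 7), ('B', 1, 11, 2);
    INSERT INTO LATE_COME VALUES ('A', 4);
""")

PRESENT = ("(SELECT SUM(Child_Count) FROM Child SA WHERE SA.Name = :name "
           "AND SA.Health_Status = 1 AND SA.Parrent_ID IS NOT NULL)")
LATE = "(SELECT SUM(LATE_COMING_CHILD) FROM LATE_COME SB WHERE SB.Name = :name)"

def total(name):
    """Add the two scalar subqueries, treating a missing value as 0."""
    sql = "SELECT COALESCE(" + PRESENT + ", 0) + COALESCE(" + LATE + ", 0)"
    return con.execute(sql, {"name": name}).fetchone()[0]

def total_raw(name):
    """Same sum without COALESCE: NULL if either subquery returns no rows."""
    sql = "SELECT " + PRESENT + " + " + LATE
    return con.execute(sql, {"name": name}).fetchone()[0]

print(total("A"), total("B"), total_raw("B"))
```

The same SELECT text can be sent from VB.NET as a parameterized command; only the parameter syntax of the database driver would differ.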
In your example, if Table1 and Table2 have a reference that relate them to each other, you can add their fields. The best way is defining a foreign key for one of them, referring to primary key of other table.
For example, you can define a new column in Table2 named Table1Id, and rewrite the query as below:
SELECT Table1.Name+Table2.Name
FROM Table2
INNER JOIN Table1
ON Table2.Table1Id=Table1.Id
If there is no relation between Table1 and Table2, adding fields of these tables is meaningless.
In the edited situation, the query may be as follows:
SELECT SA.ChildName+' '+SB.LATE_COMING_CHILD AS AllNames
FROM LATE_COME SB
INNER JOIN Child SA
ON SB.ChildId=SA.Id
WHERE
SA.Health_Status=1
AND
SA.Parrent_ID IS NOT NULL
I don't understand why all the names of a person must be put in one field!
Additionally I suggest you learn SQL from scratch...

MS SQL Insert from multiple rows

I'm trying to insert values into a database from several tables:
insert dbo.Table1
(ID, IDTable2, IDCounter)
select
s.ID,
o.IDTable2,
o.IDCounter
from dbo.table3 as o, dbo.table2 as s
But the above code leads to the situation that I get duplicate values for the values from table3, where I just want one value from table3 per value from table2. (The tables don't have any relationship info.)
Thanks in advance.
Edit:
Thanks everyone. I managed to solve it by adding a reference id to table2 and then using an inner join.
You forgot the clause for joining the tables table2 and table3, thus building a cartesian product. There must be something linking the tables; otherwise it makes no sense to combine them as you do.
My advice: Stay away from that old join syntax where you list the tables comma-separated. Use up-to-date join syntax (INNER JOIN ON etc.) instead.
Having said this, I don't see how your select will make sense anyhow. ID is to become the ID of table2 and IDTable2 is to become the ID of table2, too?
You should not use a cross join (which is what the comma-separated table list gives you), as this will lead to table3 rows x table2 rows. Use an inner join instead.
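To make the row counts concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample data is made up, and treating table3.IDTable2 as the column that links to table2.ID is an assumption, since the question does not say which reference column was eventually added.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY);
    CREATE TABLE table3 (IDTable2 INTEGER, IDCounter INTEGER);
    INSERT INTO table2 (ID) VALUES (1), (2), (3);
    INSERT INTO table3 (IDTable2, IDCounter) VALUES (1, 100), (2, 200), (3, 300);
""")

# Comma-separated tables without a join condition: every combination of rows.
cross_rows = con.execute(
    "SELECT s.ID, o.IDTable2, o.IDCounter "
    "FROM table3 AS o, table2 AS s"
).fetchall()

# Inner join on the (assumed) reference column: one table3 row per table2 row.
inner_rows = con.execute(
    "SELECT s.ID, o.IDTable2, o.IDCounter "
    "FROM table3 AS o "
    "INNER JOIN table2 AS s ON o.IDTable2 = s.ID "
    "ORDER BY s.ID"
).fetchall()

print(len(cross_rows), inner_rows)
```

With three rows in each table, the comma form produces nine rows while the inner join produces three, which is the duplication described in the question.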

How to auto increment a value in one table when inserted a row in another table

I currently have two tables:
Table 1 has a unique ID and a count.
Table 2 has some data columns and one column where the value of the unique ID of Table 1 is inside.
When I insert a row of data in Table 2, the count for the row with the referenced unique id in Table 1 should be incremented.
Hope I made myself clear. I am very new to PostgreSQL and SQL in general, so I would appreciate any help how to do that. =)
You could achieve that with triggers.
Be sure to cover all kinds of write access appropriately if you do. INSERT, UPDATE, DELETE.
Also be aware that TRUNCATE on Table 2 or manual edits in Table 1 could break data integrity.
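For a feel of what the trigger route involves, here is a rough sketch run through Python's built-in sqlite3 module. Note that this is SQLite trigger syntax used as a stand-in: in PostgreSQL you would write a trigger function (for example in PL/pgSQL) and attach it with CREATE TRIGGER. The table, column, and trigger names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl1 (tbl1_id INTEGER PRIMARY KEY, ct INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE tbl2 (tbl2_id INTEGER PRIMARY KEY, tbl1_id INTEGER REFERENCES tbl1 (tbl1_id));

    -- One trigger per kind of write access (SQLite syntax, names are hypothetical).
    CREATE TRIGGER tbl2_ai AFTER INSERT ON tbl2 BEGIN
        UPDATE tbl1 SET ct = ct + 1 WHERE tbl1_id = NEW.tbl1_id;
    END;
    CREATE TRIGGER tbl2_ad AFTER DELETE ON tbl2 BEGIN
        UPDATE tbl1 SET ct = ct - 1 WHERE tbl1_id = OLD.tbl1_id;
    END;
    CREATE TRIGGER tbl2_au AFTER UPDATE OF tbl1_id ON tbl2 BEGIN
        UPDATE tbl1 SET ct = ct - 1 WHERE tbl1_id = OLD.tbl1_id;
        UPDATE tbl1 SET ct = ct + 1 WHERE tbl1_id = NEW.tbl1_id;
    END;

    INSERT INTO tbl1 (tbl1_id) VALUES (1), (2);
""")

def counts():
    """Return the stored counters as {tbl1_id: ct}."""
    return dict(con.execute("SELECT tbl1_id, ct FROM tbl1"))

con.execute("INSERT INTO tbl2 (tbl2_id, tbl1_id) VALUES (1, 1), (2, 1), (3, 2)")
after_insert = counts()

con.execute("UPDATE tbl2 SET tbl1_id = 2 WHERE tbl2_id = 1")  # move a row to another parent
after_update = counts()

con.execute("DELETE FROM tbl2 WHERE tbl2_id = 3")
after_delete = counts()

print(after_insert, after_update, after_delete)
```

Even in this small form there are three triggers to maintain, and a manual edit of tbl1.ct would still silently break the counters.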
I suggest you consider a VIEW instead to return aggregated results that are automatically up to date. Like:
CREATE VIEW tbl1_plus_ct AS
SELECT t1.*, t2.ct
FROM tbl1 t1
LEFT JOIN (
SELECT tbl1_id, count(*) AS ct
FROM tbl2
GROUP BY 1
) t2 USING (tbl1_id)
If you use a LEFT JOIN, all rows of tbl1 are included, even if there is no reference in tbl2. With a regular JOIN, those rows would be omitted from the VIEW.
For all or much of the table, it is fastest to aggregate tbl2 first in a subquery, then join to tbl1 - like demonstrated above.
Instead of creating a view, you could also just use the query directly, and if you only fetch a single row, or only few, this alternative form would perform better:
SELECT t1.*, count(t2.tbl1_id) AS ct
FROM tbl1 t1
LEFT JOIN tbl2 t2 USING (tbl1_id)
WHERE t1.tbl1_id = 123 -- for example
GROUP BY t1.tbl1_id -- being the primary key of tbl1!
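A quick sketch of the view in action, run through Python's built-in sqlite3 module as a stand-in for PostgreSQL (this particular view definition is accepted by both, as far as I know). The name column and the sample rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl1 (tbl1_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl2 (tbl2_id INTEGER PRIMARY KEY, tbl1_id INTEGER REFERENCES tbl1 (tbl1_id));

    CREATE VIEW tbl1_plus_ct AS
    SELECT t1.*, t2.ct
    FROM   tbl1 t1
    LEFT   JOIN (
        SELECT tbl1_id, count(*) AS ct
        FROM   tbl2
        GROUP  BY 1
    ) t2 USING (tbl1_id);

    INSERT INTO tbl1 (tbl1_id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tbl2 (tbl2_id, tbl1_id) VALUES (10, 1), (11, 1), (12, 2);
""")

def view_rows():
    """Read the view; ct is NULL (None) for tbl1 rows without any tbl2 reference."""
    return con.execute("SELECT * FROM tbl1_plus_ct ORDER BY tbl1_id").fetchall()

before = view_rows()

# No counter to maintain: the view reflects the new row on the next read.
con.execute("INSERT INTO tbl2 (tbl2_id, tbl1_id) VALUES (13, 3)")
after = view_rows()

print(before)
print(after)
```

Rows of tbl1 without any reference show NULL for ct in this form; wrapping it as COALESCE(t2.ct, 0) in the view would turn that into 0 if that is preferred.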