How to insert using a join in a query - SQL

i have a table like this
registrationId | standardId | courseId | marks
===============================================
5001 | 1 | 1 | 67
5001 | 1 | 2 | 87
and so on. My question: the standard name and the course name come from different tables, so at the moment I have to use three different queries: one to get the standardId for a given standard name, one to get the courseId for a given course name, and a third to update this table. Can it be done in one query?

Title says "insert", body says "update"; which one is it?
For UPDATE, something like this might do what you need:
update this_table tt set
tt.standardid = (select s.standardid
from standard s
where s.standard_name = :some_unique_standard_name
),
tt.courseid = (select c.courseid
from course c
where c.course_name = :some_unique_course_name
)
where tt.registrationid = :some_registration_id;
For INSERT:
insert into this_table
(registrationid, standardid, courseid, marks)
select
:some_registration_id,
(select s.standardid
from standard s
where s.standard_name = :some_unique_standard_name
),
(select c.courseid
from course c
where c.course_name = :some_unique_course_name
),
:marks_value
from dual;
In both cases, the subqueries must return a single value.
A colon (:) marks a bind variable whose value you need to provide.
The syntax is Oracle's; your database might differ (especially for the INSERT, as there's probably no DUAL table elsewhere).
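For databases without a DUAL table (Postgres, SQL Server, ...), roughly the same INSERT can be written without the dummy table. A sketch only, assuming standard_name and course_name each match exactly one row, and keeping the :placeholders from above purely as placeholders:
-- one row from each lookup table, combined with the literal values
insert into this_table (registrationid, standardid, courseid, marks)
select :some_registration_id, s.standardid, c.courseid, :marks_value
from standard s
cross join course c
where s.standard_name = :some_unique_standard_name
  and c.course_name = :some_unique_course_name;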

Related

Can I get duplicate results (from one table) in an INTERSECT operation between two tables?

I know the wording of the question is awkward, but I couldn't phrase it any better. Let me explain the situation.
There's table A which has a bunch of columns (a, b, c ... ) and I run a SELECT query on it like so:
SELECT a FROM A WHERE b IN ('....') (the ellipsis indicates a number of values to be matched to)
There's another table B which has a bunch of columns (d, e, f ... ) and I run a SELECT query on it like so:
SELECT d FROM B WHERE f = '...' (the ellipsis indicates a single value to be matched to)
Now I should say here that the two tables store different types of information about the same entity, but the columns a and d contain the exact same data (in this case, an ID). I want to find out the intersection of the two tables so I run this:
SELECT a FROM A WHERE b IN ('....') INTERSECT SELECT d FROM B WHERE f = '...'
Now here's the problem:
The first SELECT contains a set of values in the WHERE clause, right? So let's say the set is (1234, 2345,3456). Now, the result of this query when b is matched ONLY to 1234 is, let's say, abc. When it's matched to 2345, it's def, suppose. And matching to 3456, it gives abc.
Let's suppose these two results (abc and def) are also in the set of results from the second SELECT.
So, now, putting the entire set of values to be matched back into the WHERE clause, the INTERSECT operation will give me abc and def. But I want abc twice, since two values in the WHERE clause set match to the second SELECT.
Is there any way I can get that?
I hope it's not too complicated to understand my problem. This is a real-life problem I'm facing in my job.
Data structure and my code
Table A contains general information about a company:
company_id | branch_id | no_of_employees | city
Table B contains the financials of the company:
company_id | branch_id | revenue | profits
First SELECT:
SELECT branch_id FROM A WHERE CITY IN ('Dallas', 'Miami', 'New Orleans')
Now, running each city separately in the first SELECT, I get the branch_ids:
branch_id | city
23 | Dallas
45 | Miami
45 | New Orleans
Once again, it may seem impractical that two cities can have the same branch id, but please bear with me on this.
Second SELECT:
SELECT branch_id FROM B
WHERE REVENUE = 5000000
I know this is a little impractical, but for the purpose of this example, it suffices.
Running this query I get the following set:
11
23
45
22
10
So the INTERSECT will give me just 23 and 45. But I want 45 twice, since both Miami and New Orleans have that branch_id and that branch_id has generated a revenue of 5 million.
Directly from Microsoft's documentation (https://msdn.microsoft.com/en-us/library/ms188055.aspx):
"INTERSECT returns distinct rows that are output by both the left and right input queries operator."
So NO, it is not possible to get the same value twice when using INTERSECT, because the results will be DISTINCT. However, if you build an INNER JOIN correctly, you can do essentially the same thing as INTERSECT but keep the repeated results by NOT using DISTINCT or GROUP BY.
SELECT A.a
FROM A
INNER JOIN B
    ON A.a = B.d
    AND B.f = '....'
WHERE A.b IN ('....')
And for the specific example that you added in your edit:
SELECT A.branch_id
FROM A
INNER JOIN B
    ON A.branch_id = B.branch_id
    AND B.REVENUE = 5000000
WHERE A.CITY IN ('Dallas', 'Miami', 'New Orleans')
You overcomplicated your task a lot:
SELECT *
FROM A
WHERE CITY IN (...)
  AND EXISTS
      (
        SELECT 1
        FROM B
        WHERE B.REVENUE = 5000000
          AND B.branch_id = A.branch_id
      )
INTERSECT and EXCEPT both return row sets with DISTINCT applied.
Regular joining/filtering operations are not performed by INTERSECT or EXCEPT.

MS Access Database tables comparison

I am trying to compare three MS Access tables on any given field. For example, I have a Main table, which holds the records for school children. It has the fields Student ID and Name. Then there are 3 sub-tables, one per school, but they have some data discrepancy. So let's call these schools A, B and C. These schools have somehow mixed up Student ID with Name, so I need a way to return any Student ID which has a mismatch for Name. The Main table has Student ID as the PKey, and the others (A, B & C) have Student ID as the PKey as well. But the problem is that when I build relationships in Access, it only returns IDs that are common to all 3 tables - an INNER JOIN. I need an efficient way to match schools A -> B and A -> C and concatenate the results. I think JOINing each of these in pairs might take far too long. Please let me know if you have any other alternatives.
So, you have two problems:
You have bad data that needs to be fixed (Student_ID and Name mixed up).
Your schema is not good.
Addressing the data issue:
If your student_ids are all numeric, you could try something like:
UPDATE subA SET student_id = [name], [name]=student_id WHERE isnumeric([name]);
And repeat for the other mixed up sub tables.
Addressing the schema issue:
You have three "Subtables" one for each school. These three tables should be a single table, and "School" should be a field in that table. So your data looks something like:
+--------+------------+---------+
| School | Student_Id | Name |
+--------+------------+---------+
| A | 1 | John |
| A | 2 | Jasmine |
| B | 3 | Fred |
| C | 5 | Harold |
| C | 6 | Donna |
+--------+------------+---------+
This way you only join to a single table, and your data only grows in rows as new schools are brought into your database.
Second, if I'm reading your question correctly, you have both student_id and name in the main table as well as the three sub-tables? It seems like you should only keep these in a single table, maybe named student.
Lastly, you can combine the three subtables into a single view that will make it 9000% (guesstimate) easier to join for future queries, using a UNION query:
SELECT 'A' as school, student_id, name FROM subA
UNION ALL
SELECT 'B', student_id, name FROM subB
UNION ALL
SELECT 'C', student_id, name FROM subC
This will stack all three tables on top of each other and give you a schema similar to the example above. You can join to your main table like:
SELECT *
FROM mainTable
INNER JOIN
(
    SELECT 'A' AS school, student_id, name FROM subA
    UNION ALL
    SELECT 'B', student_id, name FROM subB
    UNION ALL
    SELECT 'C', student_id, name FROM subC
) AS subs
    ON mainTable.student_id = subs.student_id
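And to get back to the original goal (Student IDs whose Name disagrees between the main table and a school), here is one possible sketch on top of that same UNION. It assumes the main table is called mainTable and that both sides have student_id and name columns; Access can also be picky about unions inside derived tables, so saving the UNION as its own query and joining to that works just as well:
-- return every school row whose name does not match the main table
SELECT subs.school, subs.student_id, mainTable.name AS main_name, subs.name AS school_name
FROM mainTable
INNER JOIN
(
    SELECT 'A' AS school, student_id, name FROM subA
    UNION ALL
    SELECT 'B', student_id, name FROM subB
    UNION ALL
    SELECT 'C', student_id, name FROM subC
) AS subs
    ON mainTable.student_id = subs.student_id
WHERE mainTable.name <> subs.name;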

Return rows of a table that actually changed in an UPDATE

Using Postgres, I can perform an update statement and return the rows affected by the command.
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
RETURNING accounts.*
This will give me a list of all records that matched the WHERE clause; however, it will not tell me which rows actually changed as a result of the operation.
In this simplified use-case it would of course be trivial to simply add another guard, AND status != 'Closed'; however, my real-world use-case involves updating potentially dozens of fields from a merge table with 10,000+ rows, and I want to be able to detect which rows were actually changed and which are identical to their previous version. (The expectation is that very few rows will actually have changed.)
The best I've got so far is
UPDATE accounts
SET x=..., y=...
FROM accounts AS old, merge_accounts
WHERE old.uid = accounts.uid
  AND merge_accounts.uid = accounts.uid
RETURNING accounts, old
Which will return a tuple of old and new rows that can then be diff'ed inside my Java codebase itself - however this requires significant additional network traffic and is potentially error prone.
The ideal scenario is to be able to have postgres return just the rows that actually had any values changed - is this possible?
Here on github is a more real world example of what I'm doing, incorporating some of the suggestions so far.
Using Postgres 9.1, but can use 9.4 if required. The requirements are effectively:
Be able to perform an upsert of new data
Where we may only know the specific key/value pair to update on any given row
Get back a result containing just the rows that were actually changed by the upsert
Bonus - get a copy of the old records as well.
Since this question was opened I've gotten most of this working now, although I'm unsure if my approach is a good idea or not - it's a bit hacked together.
Only update rows that actually change
That saves expensive updates and expensive checks after the UPDATE.
To update every column with the new value provided (if anything changes):
UPDATE accounts a
SET (status, field1, field2) -- short syntax for ..
= (m.status, m.field1, m.field2) -- .. updating multiple columns
FROM merge_accounts m
WHERE m.uid = a.uid
AND (a.status IS DISTINCT FROM m.status OR
a.field1 IS DISTINCT FROM m.field1 OR
a.field2 IS DISTINCT FROM m.field2)
RETURNING a.*;
Due to PostgreSQL's MVCC model any change to a row writes a new row version. Updating a single column is almost as expensive as updating every column in the row at once. Rewriting the rest of the row comes at practically no cost, as soon as you have to update anything.
Details:
How do I (or can I) SELECT DISTINCT on multiple columns?
UPDATE a whole row in PL/pgSQL
Shorthand for whole rows
If the row types of accounts and merge_accounts are identical and you want to adopt everything from merge_accounts into accounts, there is a shortcut comparing the whole row type:
UPDATE accounts a
SET (status, field1, field2)
= (m.status, m.field1, m.field2)
FROM merge_accounts m
WHERE a.uid = m.uid
AND m IS DISTINCT FROM a
RETURNING a.*;
This even works for NULL values. Details in the manual.
But it's not going to work for your home-grown solution where (quoting your comment):
merge_accounts is identical, save that all non-pk columns are array types
It requires compatible row types, i.e. each column shares the same data type or there is at least an implicit cast between the two types.
For your special case
UPDATE accounts a
SET (status, field1, field2)
= (COALESCE(m.status[1], a.status) -- default to original ..
, COALESCE(m.field1[1], a.field1) -- .. if m.column[1] IS NULL
, COALESCE(m.field2[1], a.field2))
FROM merge_accounts m
WHERE m.uid = a.uid
AND (m.status[1] IS NOT NULL AND a.status IS DISTINCT FROM m.status[1]
OR m.field1[1] IS NOT NULL AND a.field1 IS DISTINCT FROM m.field1[1]
OR m.field2[1] IS NOT NULL AND a.field2 IS DISTINCT FROM m.field2[1])
RETURNING a.*
m.status IS NOT NULL works if columns that shouldn't be updated are NULL in merge_accounts.
m.status <> '{}' if you operate with empty arrays.
m.status[1] IS NOT NULL covers both options.
Related:
Return pre-UPDATE column values using SQL only
If you aren't relying on side-effects of the update, only update the records that need to change:
UPDATE accounts
SET status = merge_accounts.status,
field1 = merge_accounts.field1,
field2 = merge_accounts.field2,
etc.
FROM merge_accounts WHERE merge_accounts.uid = accounts.uid
AND NOT (status IS NOT DISTINCT FROM merge_accounts.status
AND field1 IS NOT DISTINCT FROM merge_accounts.field1
AND field2 IS NOT DISTINCT FROM merge_accounts.field2
)
RETURNING accounts.*
I would recommend using the information_schema.columns table to introspect the columns dynamically, and then use those within a plpgsql function to dynamically generate the UPDATE statement.
i.e. this DDL:
create table foo
(
id serial,
val integer,
name text
);
insert into foo (val, name) VALUES (10, 'foo'), (20, 'bar'), (30, 'baz');
And this query:
select column_name
from information_schema.columns
where table_name = 'foo'
order by ordinal_position;
will yield the columns for the table in the order that they were defined in the table DDL.
Essentially you would use the above SELECT within the function to dynamically build up your UPDATE statement by iterating over the results of the above SELECT in a FOR LOOP to dynamically build up both the SET and WHERE clauses.
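A rough sketch of that approach (not a tested drop-in solution): loop over information_schema.columns, assemble the SET list and the IS DISTINCT FROM conditions, and run the statement with EXECUTE. The accounts, merge_accounts and uid names come from the question; everything else here is illustrative.
CREATE OR REPLACE FUNCTION update_changed_accounts()
  RETURNS SETOF accounts AS
$$
DECLARE
    col       text;
    set_list  text := '';
    diff_list text := '';
BEGIN
    -- collect every column except the key
    FOR col IN
        SELECT column_name
        FROM   information_schema.columns
        WHERE  table_name = 'accounts'
        AND    column_name <> 'uid'
        ORDER  BY ordinal_position
    LOOP
        set_list  := set_list  || format('%I = m.%I, ', col, col);
        diff_list := diff_list || format('a.%I IS DISTINCT FROM m.%I OR ', col, col);
    END LOOP;

    -- trim the trailing ", " and " OR "
    set_list  := left(set_list,  length(set_list)  - 2);
    diff_list := left(diff_list, length(diff_list) - 4);

    RETURN QUERY EXECUTE
        'UPDATE accounts a SET ' || set_list
        || ' FROM merge_accounts m WHERE m.uid = a.uid AND (' || diff_list || ')'
        || ' RETURNING a.*';
END
$$ LANGUAGE plpgsql;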
Some variation of this?
SELECT * FROM old;
id | val
----+-----
1 | 1
2 | 2
4 | 5
5 | 1
6 | 2
SELECT * FROM new;
id | val
----+-----
1 | 2
2 | 2
3 | 2
5 | 1
6 | 1
SELECT * FROM old JOIN new ON old.id = new.id;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
2 | 2 | 2 | 2
5 | 1 | 5 | 1
6 | 2 | 6 | 1
(4 rows)
WITH sel AS (
SELECT o.id , o.val FROM old o JOIN new n ON o.id=n.id ),
upd AS (
UPDATE old SET val = new.val FROM new WHERE new.id=old.id RETURNING old.* )
SELECT * from sel, upd WHERE sel.id = upd.id AND sel.val <> upd.val;
id | val | id | val
----+-----+----+-----
1 | 1 | 1 | 2
6 | 2 | 6 | 1
(2 rows)
Refer to the SO answer and read the entire discussion.
If you are updating a single table and want to know if the row is actually changed you can use this query:
with rows_affected as (
update mytable set (field1, field2, field3)=('value1', 'value2', 3) where id=1 returning *
)
select count(*)>0 as is_modified from rows_affected
join mytable on mytable.id=rows_affected.id
where rows_affected is distinct from mytable;
And you can wrap your existing queries into this one without the need to modify the actual update statements.
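As a sketch of how that wrapping could look for the accounts/merge_accounts example from the question (assuming uid identifies a row in accounts): the outer query still sees the pre-update snapshot of the table, so the comparison returns only the rows whose values actually changed.
WITH rows_affected AS (
    UPDATE accounts
    SET    status = merge_accounts.status,
           field1 = merge_accounts.field1,
           field2 = merge_accounts.field2
    FROM   merge_accounts
    WHERE  merge_accounts.uid = accounts.uid
    RETURNING accounts.*
)
-- accounts here still holds the pre-update values for this statement
SELECT rows_affected.*
FROM   rows_affected
JOIN   accounts ON accounts.uid = rows_affected.uid
WHERE  rows_affected IS DISTINCT FROM accounts;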

How to get the sum of values per id and update existing records in another table

I have two tables like:
ID | TRAFFIC
fd56756 | 4398
645effa | 567899
894fac6 | 611900
894fac6 | 567899
and
USER | ID | TRAFFIC
andrew | fd56756 | 0
peter | 645effa | 0
john | 894fac6 | 0
I need to get SUM("TRAFFIC") from the first table AND set the traffic column in the second table where the first table's ID = the second table's ID. IDs in the first table are not unique and can be duplicated.
How can I do this?
Table names from your later comment. Chances are, you are reporting table and column names incorrectly.
UPDATE users u
SET "TRAFFIC" = sub.sum_traffic
FROM (
SELECT "ID", sum("TRAFFIC") AS sum_traffic
FROM stats.traffic
GROUP BY 1
) sub
WHERE u."ID" = sub."ID";
Aside: It's unwise to use mixed-case identifiers in Postgres. Use legal, lower-case identifiers, which do not need to be double-quoted, to make your life easier. Start by reading the manual here.
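To illustrate the aside: if the tables were created with unquoted, lower-case identifiers (shown here only as an assumption), the same statement needs no double quotes at all.
UPDATE users u
SET    traffic = sub.sum_traffic
FROM  (
   SELECT id, sum(traffic) AS sum_traffic
   FROM   stats.traffic
   GROUP  BY id
   ) sub
WHERE  u.id = sub.id;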
Something like this?
UPDATE users t2
SET traffic = t1.sum_traffic
FROM (SELECT id, sum(traffic) AS sum_traffic FROM stats.traffic GROUP BY id) t1
WHERE t1.id = t2.id;

Adding a string to the primary key?

I want to add a string to the primary key value when creating the table in SQL.
Example:
My primary key column should automatically generate values like below:
'EMP101'
'EMP102'
'EMP103'
How to achieve it?
Try this: (For SQL Server 2012)
UPDATE MyTable
SET EMPID = CONCAT('EMP' , EMPID)
Or this: (For SQL Server < 2012)
UPDATE MyTable
SET EMPID = 'EMP' + EMPID
SQLFiddle for SQL Server 2008
SQLFiddle for SQL Server 2012
Since you want to set auto-increment on a VARCHAR-type column, you can try this table schema:
CREATE TABLE MyTable
(EMP INT NOT NULL IDENTITY(1000, 1)
,[EMPID] AS 'EMP' + CAST(EMP AS VARCHAR(10)) PERSISTED PRIMARY KEY
,EMPName VARCHAR(20))
;
INSERT INTO MyTable(EMPName) VALUES
('AA')
,('BB')
,('CC')
,('DD')
,('EE')
,('FF')
Output:
| EMP | EMPID | EMPNAME |
----------------------------
| 1000 | EMP1000 | AA |
| 1001 | EMP1001 | BB |
| 1002 | EMP1002 | CC |
| 1003 | EMP1003 | DD |
| 1004 | EMP1004 | EE |
| 1005 | EMP1005 | FF |
See this SQLFiddle
Here you can see that EMPID is an auto-incremented column with a primary key.
Source: HOW TO SET IDENTITY KEY/AUTO INCREMENT ON VARCHAR COLUMN IN SQL SERVER (Thanks to #bvr)
The rule of thumb is: never use meaningful information in primary keys (like an employee number or Social Security number). Let the key just be a plain auto-incremented integer. However constant the data seems, it may change at some point (new legislation comes along and all SSNs are recalculated).
It seems the only reason you want to use a non-integer key is that the key is generated as a string concatenation with another column to make it unique.
From a best practice perspective, it is strongly recommended that integer primary keys are used, but often, this guidance is ignored.
Going through the following posts might be of help:
Should I design a table with a primary key of varchar or int?
SQL primary key: integer vs varchar
You can achieve it at least in two ways:
Generate new id on the fly when you insert a new record
Create INSTEAD OF INSERT trigger that will do that for you
If you have a table schema like this
CREATE TABLE Table1
([emp_id] varchar(12) primary key, [name] varchar(64))
For the first scenario you can use a query like this:
INSERT INTO Table1 (emp_id, name)
SELECT newid, 'John'
FROM
(
SELECT 'EMP' + CONVERT(VARCHAR(9), COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0) + 1) newid
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
) q
Here is SQLFiddle demo
For the second scenario you can use a trigger like this:
CREATE TRIGGER tg_table1_insert ON Table1
INSTEAD OF INSERT AS
BEGIN
    DECLARE @max INT
    SET @max =
        (SELECT COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0)
         FROM Table1 WITH (TABLOCKX, HOLDLOCK)
        )
    INSERT INTO Table1 (emp_id, name)
    SELECT 'EMP' + CONVERT(VARCHAR(9), @max + ROW_NUMBER() OVER (ORDER BY (SELECT 1))), name
    FROM INSERTED
END
Here is SQLFiddle demo
I am looking to do something similar but don't see an answer to my problem here.
I want a primary Key like "JonesB_01" as this is how we want our job number represented in our production system.
--ID | First_Name | Second_Name | Phone | Etc..
-- Bob Jones 9999-999-999
--ID = "Second_Name"+"F"irst Initial+"_(01-99)"
The number 01-99 has been included to allow for multiple instances of a customer with the same surname and first initial. In our industry it's not unusual for the same customer to have work done on multiple occasions, but they are not repeat business on an ongoing basis. I expect this convention to last a very long time. If we ever exceed it, then I can simply add a third integer.
I want this to auto populate to keep data entry as simple as possible.
I managed to get a solution to work using Excel formulas and a few helper cells, but I am new to SQL.
--CellA2 = JonesB_01 (=concatenate(D2+E2))
--CellB2 = "Bob"
--CellC2 = "Jones"
--CellD2 = "JonesB" (=if(B2="","",Concatenate(C2,Left(B2)))
--CellE2 = "_01" (=concatenate("_",Text(F2,"00"))
--CellF2 = "1" (=If(D2="","",Countif($D$2:$D2,D2))
Thanks.
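One way to sketch that "JonesB_01" convention in SQL Server (the Customers table name, the ID ordering column and the other column names are only assumptions based on the example above) is to derive the per-surname-and-initial counter with ROW_NUMBER():
-- Sketch only: build "JonesB_01"-style job numbers from existing rows.
SELECT Second_Name + LEFT(First_Name, 1) + '_' +
       RIGHT('0' + CAST(ROW_NUMBER() OVER (
                 PARTITION BY Second_Name, LEFT(First_Name, 1)
                 ORDER BY ID) AS varchar(2)), 2) AS job_number,
       First_Name,
       Second_Name
FROM Customers;
As the answers above point out, though, it is usually better to keep a plain integer primary key and treat a value like this as a derived or computed column rather than as the key itself.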
SELECT 'EMP' || TO_CHAR(NVL(MAX(TO_NUMBER(SUBSTR(A.EMP_NO, 4,3))), '000')+1) AS NEW_EMP_NO
FROM
(SELECT 'EMP101' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP102' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP103' EMP_NO
FROM DUAL
) A