SQL exclude values that are in another data frame column


Say I have two tables.
First_table
id   occupation
efg  carpenter
hjk  teacher
moo  scientist
dss  engineer
Second_table
id   state
efg  PA
loi  DE
moo  NY
nbw  MD
Now I want to write a query that gets rid of the rows of the first table, if first_table.id is in second_table.id. So the output would be
id   occupation
hjk  teacher
dss  engineer
One way I could do this is by writing a where clause with hard-coded conditions, such as
where first_table.id != 'moo' and first_table.id != 'efg'
but that would require me to write some logic to figure out which values to exclude, and I would rather keep all the logic in the query anyway.

This sounds like a job for not exists:
select f.*
from first_table f
where not exists (select 1 from second_table s where s.id = f.id);
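To check the not exists query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database and the variable names are assumptions made for illustration; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (id TEXT, occupation TEXT);
    CREATE TABLE second_table (id TEXT, state TEXT);
    INSERT INTO first_table VALUES
        ('efg', 'carpenter'), ('hjk', 'teacher'),
        ('moo', 'scientist'), ('dss', 'engineer');
    INSERT INTO second_table VALUES
        ('efg', 'PA'), ('loi', 'DE'), ('moo', 'NY'), ('nbw', 'MD');
""")

# Keep only the rows of first_table whose id has no match in second_table.
remaining = conn.execute("""
    SELECT f.id, f.occupation
    FROM first_table AS f
    WHERE NOT EXISTS (SELECT 1 FROM second_table AS s WHERE s.id = f.id)
    ORDER BY f.rowid
""").fetchall()

print(remaining)  # [('hjk', 'teacher'), ('dss', 'engineer')]
```

The rows for efg and moo are dropped because both ids appear in second_table.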

Related

Query rows and include rows with columns reversed

I'm trying to query a table. I want the results to include the FROM and TO columns, but then also include rows with these two values reversed. And then I want to eliminate all duplicates. (A duplicate is the same two cities in the same order.)
For example, given this data.
Trips
FROM                 TO
-------------------- --------------------
West Jordan          Taylorsville
Salt Lake City       Ogden
West Jordan          Taylorsville
Sandy                South Jordan
Taylorsville         West Jordan
I would want the following results.
West Jordan          Taylorsville
Taylorsville         West Jordan
Salt Lake City       Ogden
Ogden                Salt Lake City
Sandy                South Jordan
South Jordan         Sandy
I want to do this using C# and Entity Framework, but I could use raw SQL if I need to.
Is it possible to do this in a query, or do I need to manually perform some of this logic?

Not sure if I'm following, but doesn't a simple union work for your sample? (FROM and TO are reserved words, so they have to be quoted.)
select [from], [to]
from some_table
union
select [to], [from]
from some_table
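As a quick check of the union idea, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. SQLite, the double-quoted column names and the variable names are assumptions for illustration; on SQL Server the columns would be quoted with square brackets instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Trips ("From" TEXT, "To" TEXT);
    INSERT INTO Trips VALUES
        ('West Jordan', 'Taylorsville'),
        ('Salt Lake City', 'Ogden'),
        ('West Jordan', 'Taylorsville'),
        ('Sandy', 'South Jordan'),
        ('Taylorsville', 'West Jordan');
""")

# UNION (unlike UNION ALL) removes duplicate rows, so each ordered pair
# of cities appears exactly once, in both directions.
pairs = conn.execute("""
    SELECT "From", "To" FROM Trips
    UNION
    SELECT "To", "From" FROM Trips
    ORDER BY 1, 2
""").fetchall()

for origin, destination in pairs:
    print(origin, "->", destination)
```

With the sample data this yields six rows: the three distinct city pairs, each in both orders.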

I believe the first subquery should handle the first part of your question. The WHERE ID NOT IN will handle the second part of your question.
SELECT *
FROM
(
    SELECT *
    FROM Trips
    WHERE ID IN (
        SELECT t1.ID
        FROM Trips AS t1
        INNER JOIN Trips AS t2
            ON t2.[To] = t1.[From] AND t2.[From] = t1.[To]
    )
) AS reversed
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM Trips
    GROUP BY [From], [To]
)
I am assuming there is more to the table than just those fields. Usually you have a field (primary key) to uniquely identify the row. I am using ID for that field; replace it with whatever your table is using.

SAS left join getting more records than what in first table

table a
x    y              z
123  london         Data engineer
345  United states  Software engineer
678  South africa   Electrical engineer
Table b
X    U
123  David
345  Mike
910  Mark
678  Steve
121  Kyle
Output
X    Y              Z                    U
123  London         Data engineer        David
345  United states  Software engineer    Mike
678  South Africa   Electrical engineer  Steve
When I use a PROC SQL left join as below, I get more than 3 records. Could you please help me?
Proc SQL;
create table x as
select
a.*,
b.u
from
table a
left join table b
on a.x = b.x;
quit;

If a left join returns more records than are in the original table, then you have duplicate values of the join variable in the right-hand table or query.
The common ways to solve this are:
If the right-hand table has multiple records per ID [1], but they are guaranteed to have the same values for the variable(s) you are returning from that table, you can use distinct in a subquery to reduce them to a single record.
select * from a left join (select distinct x,y from b) b on a.x=b.x
If the right-hand table is returning values that you'd like to summarize (say, use the sum or the mean) when you have multiple records, do so in a subquery, grouping by the ID variable(s).
select * from a left join (select x, sum(y) as y_sum from b group by x) b on a.x=b.x
Otherwise, you will have to use some logic - either in the query, or before you join. This is where doing this in the SAS data step can sometimes be better; for example, if you want the most recent value, it's a bit easier to pre-process that in a data step (SQL can also do that, but it's a more complicated, and often slower, query).
[1] I use the term ID to refer to whatever you are joining the table on.
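The effect described above can be reproduced outside SAS. The following is a minimal sketch using Python's built-in sqlite3 module rather than PROC SQL; the duplicated row for x = 123 in table b is a hypothetical addition (the question's sample data has no duplicates), and the variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT, z TEXT);
    CREATE TABLE b (x INTEGER, u TEXT);
    INSERT INTO a VALUES
        (123, 'London', 'Data engineer'),
        (345, 'United States', 'Software engineer'),
        (678, 'South Africa', 'Electrical engineer');
    -- The second 123 row is a hypothetical duplicate added to show the effect.
    INSERT INTO b VALUES
        (123, 'David'), (123, 'David'), (345, 'Mike'),
        (910, 'Mark'), (678, 'Steve'), (121, 'Kyle');
""")

# Plain left join: the duplicated key in b produces an extra output row.
plain = conn.execute(
    "SELECT a.x, b.u FROM a LEFT JOIN b ON a.x = b.x ORDER BY a.x"
).fetchall()

# Left join against a de-duplicated subquery: one row per row of a.
deduped = conn.execute("""
    SELECT a.x, b1.u
    FROM a
    LEFT JOIN (SELECT DISTINCT x, u FROM b) AS b1 ON a.x = b1.x
    ORDER BY a.x
""").fetchall()

print(len(plain), len(deduped))  # 4 3
```

Comparing the row count of the join with the row count of the left table is a quick way to spot duplicate join keys.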

Can I get duplicate results (from one table) in an INTERSECT operation between two tables?

I know the wording of the question is awkward, but I couldn't phrase it any better. Let me explain the situation.
There's table A which has a bunch of columns (a, b, c ... ) and I run a SELECT query on it like so:
SELECT a FROM A WHERE b IN ('....') (the ellipsis indicates a number of values to be matched to)
There's another table B which has a bunch of columns (d, e, f ... ) and I run a SELECT query on it like so:
SELECT d FROM B WHERE f = '...' (the ellipsis indicates a single value to be matched to)
Now I should say here that the two tables store different types of information about the same entity, but the columns a and d contain the exact same data (in this case, an ID). I want to find out the intersection of the two tables so I run this:
SELECT a FROM A WHERE b IN ('....') INTERSECT SELECT d FROM B WHERE f = '...'
Now here's the problem:
The first SELECT contains a set of values in the WHERE clause, right? So let's say the set is (1234, 2345, 3456). Now, the result of this query when b is matched ONLY to 1234 is, let's say, abc. When it's matched to 2345, suppose it's def. And when it's matched to 3456, it gives abc again.
Let's suppose these two results (abc and def) are also in the set of results from the second SELECT.
So now, putting the entire set of values to be matched back into the WHERE clause, the INTERSECT operation will give me abc and def. But I want abc twice, since two values in the WHERE clause set match the second SELECT.
Is there any way I can get that?
I hope it's not too complicated to understand my problem. This is a real-life problem I'm facing in my job.
Data structure and my code
Table A contains general information about a company:
company_id | branch_id | no_of_employees | city
Table B contains the financials of the company:
company_id | branch_id | revenue | profits
First SELECT:
SELECT branch_id FROM A WHERE CITY IN ('Dallas', 'Miami', 'New Orleans')
Now, running each city separately in the first SELECT, I get the branch_ids:
branch_id | city
23 | Dallas
45 | Miami
45 | New Orleans
Once again, this seems impractical as to how two cities can have the same branch ids, but please bear with me on this.
Second SELECT:
SELECT branch_id FROM B
WHERE REVENUE = 5000000
I know this is a little impractical, but for the purpose of this example, it suffices.
Running this query I get the following set:
11
23
45
22
10
So the INTERSECT will give me just 23 and 45. But I want 45 twice, since both Miami and New Orleans have that branch_id and that branch_id has generated a revenue of 5 million.

Directly from Microsoft's documentation (https://msdn.microsoft.com/en-us/library/ms188055.aspx):
"INTERSECT returns distinct rows that are output by both the left and right input queries operator."
So no, it is not possible to get the same value twice when using INTERSECT, because the results will be DISTINCT. However, if you build an INNER JOIN correctly, you can do essentially the same thing as INTERSECT while keeping the repeated results, simply by not using DISTINCT or GROUP BY.
SELECT
    A.a
FROM
    A
    INNER JOIN B
        ON A.a = B.d
        AND B.f = '....'
WHERE A.b IN ('....')
And for the specific example that you edited in (note that branch_id must be qualified, because it exists in both tables):
SELECT
    A.branch_id
FROM
    A
    INNER JOIN B
        ON A.branch_id = B.branch_id
        AND B.REVENUE = 5000000
WHERE A.CITY IN ('Dallas', 'Miami', 'New Orleans')
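To see the difference between the two approaches on the question's example, here is a minimal sketch using Python's built-in sqlite3 module. Only the branch_id, city and revenue values come from the question; SQLite, the placeholder values in the other columns and the variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (company_id INTEGER, branch_id INTEGER,
                    no_of_employees INTEGER, city TEXT);
    CREATE TABLE B (company_id INTEGER, branch_id INTEGER,
                    revenue INTEGER, profits INTEGER);
    -- company_id, no_of_employees and profits are placeholder values.
    INSERT INTO A VALUES (1, 23, 10, 'Dallas'), (1, 45, 10, 'Miami'),
                         (1, 45, 10, 'New Orleans');
    INSERT INTO B VALUES (1, 11, 5000000, 0), (1, 23, 5000000, 0),
                         (1, 45, 5000000, 0), (1, 22, 5000000, 0),
                         (1, 10, 5000000, 0);
""")
cities = ("Dallas", "Miami", "New Orleans")

# INTERSECT applies DISTINCT, so branch 45 appears only once.
with_intersect = [row[0] for row in conn.execute("""
    SELECT branch_id FROM A WHERE city IN (?, ?, ?)
    INTERSECT
    SELECT branch_id FROM B WHERE revenue = 5000000
    ORDER BY branch_id
""", cities)]

# INNER JOIN keeps one row per matching row of A, so 45 appears twice.
with_join = [row[0] for row in conn.execute("""
    SELECT A.branch_id
    FROM A
    INNER JOIN B ON A.branch_id = B.branch_id AND B.revenue = 5000000
    WHERE A.city IN (?, ?, ?)
    ORDER BY A.branch_id
""", cities)]

print(with_intersect)  # [23, 45]
print(with_join)       # [23, 45, 45]
```

Note that the join would also multiply rows if B contained several matching rows for the same branch_id, which is not the case in this sample.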

You overcomplicated your task a lot:
SELECT *
FROM A
WHERE CITY IN (...)
AND EXISTS
(
SELECT 1 FROM B
WHERE B.REVENUE = 5000000
AND B.branch_id = A.branch_id
)
INTERSECT and EXCEPT both return row sets with DISTINCT applied.
They are set operators; they do not perform regular joining or filtering operations.

SQL Insert with value from different table

I have 2 tables storing information. For example:
Table 1 contains persons:
ID NAME CITY
1 BOB 1
2 JANE 1
3 FRED 2
The CITY column is an ID that refers to a different table:
ID NAME
1 Amsterdam
2 London
The problem is that I want to insert data that I receive in this format:
ID NAME CITY
1 PETER Amsterdam
2 KEES London
3 FRED London
Given that the list of cities is complete (I never receive a city that is not in my list), how can I insert the new persons received from outside into the table with the right ID for the city?
Should I replace the city names before I try to insert the rows, or is there a performance-friendly way (I might have to insert thousands of lines at once) to make SQL do this for me?
The SQL server I'm using is Microsoft SQL Server 2012.

First, load the data to be inserted into a staging table.
Then, you can just use a join:
insert into persons(id, name, city)
    select st.id, st.name, c.id
    from #StagingTable st left join
         cities c
         on st.city = c.name;
Note: persons.id should probably be an identity column, so it wouldn't be necessary to insert it.
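The staging-table approach can be sketched end to end with Python's built-in sqlite3 module. This is an illustration under assumptions: SQLite stands in for SQL Server, an ordinary table named staging replaces the #StagingTable temp table, and the variable names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE persons (id INTEGER, name TEXT, city INTEGER);
    CREATE TABLE staging (id INTEGER, name TEXT, city TEXT);
    INSERT INTO cities VALUES (1, 'Amsterdam'), (2, 'London');
""")

# Step 1: bulk-load the received rows, city names and all, into staging.
incoming = [(1, "PETER", "Amsterdam"), (2, "KEES", "London"), (3, "FRED", "London")]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", incoming)

# Step 2: one set-based insert that swaps each city name for its id.
conn.execute("""
    INSERT INTO persons (id, name, city)
    SELECT st.id, st.name, c.id
    FROM staging AS st
    LEFT JOIN cities AS c ON st.city = c.name
""")

rows = conn.execute("SELECT id, name, city FROM persons ORDER BY id").fetchall()
print(rows)  # [(1, 'PETER', 1), (2, 'KEES', 2), (3, 'FRED', 2)]
```

Because of the left join, a row with an unknown city name would still be inserted, with a NULL city; an inner join would skip such rows instead.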

insert into persons (ID, NAME, CITY) -- you don't need to include ID if it is auto-increment
values
(1, 'PETER', (select ID from city where NAME = 'Amsterdam')) -- the subquery looks up the city's ID by its name
If you want to add 1000 rows at a time, it would be better to use a stored procedure, like the one in this link.

UPDATE query that fixes orphaned records

I have an Access database with two tables that are related by PK/FK. Unfortunately, the tables have allowed duplicate/redundant records, which has made the database a bit screwy. I am trying to figure out a SQL statement that will fix the problem.
To better explain the problem and goal, I have created example tables to use as reference:
[image: http://img38.imageshack.us/img38/9243/514201074110am.png]
You'll notice there are two tables, a Student table and a TestScore table where StudentID is the PK/FK.
The Student table contains duplicate records for students John, Sally, Tommy, and Suzy. In other words, the Johns with StudentIDs 1 and 5 are the same person, the Sallys with StudentIDs 2 and 6 are the same person, and so on.
The TestScore table relates test scores with a student.
Ignoring how/why the Student table allowed duplicates, the goal I'm trying to accomplish is to update the TestScore table so that it replaces the StudentIDs that have been disabled with the corresponding enabled StudentIDs. So, all rows with StudentID = 1 (John) will be updated to 5, all rows with StudentID = 2 (Sally) will be updated to 6, and so on. Here's the resulting TestScore table that I'm shooting for (notice there is no longer any reference to the disabled StudentIDs 1-4):
[image: http://img163.imageshack.us/img163/1954/514201091121am.png]
Can you think of a query (compatible with MS Access's JET Engine) that can accomplish this goal? Or, maybe, you can offer some tips/perspectives that will point me in the right direction.
Thanks.

The only way to do this is through a series of queries and temporary tables.
First, I would create the following Make Table query that you would use to create a mapping of the bad StudentID to correct StudentID.
Select S1.StudentId As NewStudentId, S2.StudentId As OldStudentId
Into zzStudentMap
From Student As S1
Inner Join Student As S2
On S2.Name = S1.Name
Where S1.Disabled = False
And S2.StudentId <> S1.StudentId
And S2.Disabled = True
Next, you would use that temporary table to update the TestScore table with the correct StudentID.
Update TestScore
Inner Join zzStudentMap
On zzStudentMap.OldStudentId = TestScore.StudentId
Set StudentId = zzStudentMap.NewStudentId
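The two steps above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not the Access syntax: SQLite has no UPDATE ... INNER JOIN, so the update is written with a correlated subquery instead, and the sample rows and the Score column are made up because the original example tables were images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Disabled INTEGER);
    CREATE TABLE TestScore (StudentId INTEGER, Score INTEGER);
    -- Hypothetical sample rows: ids 1 and 2 are disabled duplicates of 5 and 6.
    INSERT INTO Student VALUES
        (1, 'John', 1), (2, 'Sally', 1), (5, 'John', 0), (6, 'Sally', 0);
    INSERT INTO TestScore VALUES (1, 90), (2, 85), (5, 70), (6, 95);

    -- Step 1: map each disabled StudentId to the enabled one with the same name.
    CREATE TABLE zzStudentMap AS
    SELECT S1.StudentId AS NewStudentId, S2.StudentId AS OldStudentId
    FROM Student AS S1
    INNER JOIN Student AS S2 ON S2.Name = S1.Name
    WHERE S1.Disabled = 0
      AND S2.StudentId <> S1.StudentId
      AND S2.Disabled = 1;

    -- Step 2: repoint test scores that still reference a disabled StudentId.
    UPDATE TestScore
    SET StudentId = (SELECT m.NewStudentId FROM zzStudentMap AS m
                     WHERE m.OldStudentId = TestScore.StudentId)
    WHERE StudentId IN (SELECT OldStudentId FROM zzStudentMap);
""")

ids = [row[0] for row in conn.execute("SELECT StudentId FROM TestScore ORDER BY Score")]
print(ids)  # [5, 6, 5, 6]
```

Matching on Name alone assumes that names identify a person uniquely, as in the question's example; real data would likely need a stricter matching rule.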

The most common technique to identify duplicates in a table is to group by the fields that represent duplicate records:
ID  FIRST_NAME  LAST_NAME
1   Brian       Smith
3   George      Smith
25  Brian       Smith
In this case we want to remove one of the Brian Smith records, or in your case, update the ID field so they both have the value of 25 or 1 (completely arbitrary which one to use).
SELECT min(id) AS id, first_name, last_name
FROM example
GROUP BY first_name, last_name
Using min on ID will return:
ID  FIRST_NAME  LAST_NAME
1   Brian       Smith
3   George      Smith
If you use max you would get
ID  FIRST_NAME  LAST_NAME
25  Brian       Smith
3   George      Smith
I usually use this technique to delete the duplicates, not update them:
DELETE FROM example
WHERE ID NOT IN (SELECT MAX (ID)
FROM example
GROUP BY first_name, last_name)
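The delete statement can be checked against the three sample rows with Python's built-in sqlite3 module. SQLite and the variable names are assumptions for illustration; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE example (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO example VALUES
        (1, 'Brian', 'Smith'), (3, 'George', 'Smith'), (25, 'Brian', 'Smith');
""")

# Keep the row with the highest id in each (first_name, last_name) group
# and delete every other row.
conn.execute("""
    DELETE FROM example
    WHERE id NOT IN (SELECT MAX(id) FROM example GROUP BY first_name, last_name)
""")

kept = conn.execute(
    "SELECT id, first_name, last_name FROM example ORDER BY id"
).fetchall()
print(kept)  # [(3, 'George', 'Smith'), (25, 'Brian', 'Smith')]
```

Swapping MAX for MIN would keep the row with id 1 instead of 25.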
