When to use INTERSECT in a query - SQL

I am a bit confused about when I have to use INTERSECT in SQL. The example I am given is the following:
I have two tables:
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
The example asks to find the name and address of all female actors who are also movie executives and have a net worth over 10000000. The book's solution is the following:
(SELECT name, address
FROM MovieStar
WHERE gender = 'F')
INTERSECT
(SELECT name, address
FROM MovieExec
WHERE netWorth > 10000000);
So my problem is: why do I have to use INTERSECT when I could use the "AND" operator, like this:
SELECT name, address
FROM MovieStar, MovieExec
WHERE gender = 'F' AND netWorth > 10000000
Is there a rule of thumb for figuring out when it is better to use INTERSECT or "AND"?

Use INTERSECT when it expresses what you want and gives correct results. Beyond that, always compare the execution plans and statistics, because the way each query arrives at its result may differ.
1)
SELECT name, address
FROM MovieStar
WHERE gender = 'F'
INTERSECT
SELECT name, address
FROM MovieExec
WHERE netWorth > 10000000;
It means: take the names and addresses from MovieStar where gender is 'F',
take the names and addresses from MovieExec where netWorth > 10000000, and find the records that are in both sets.
2)
SELECT ms.name, ms.address
FROM MovieStar AS ms, MovieExec AS me
WHERE gender = 'F' AND netWorth > 10000000
It means that you generate a CROSS JOIN, i.e. the Cartesian product (M x N records), and then keep only the records where gender = 'F' AND netWorth > 10000000. Nothing ties a row of ms to a row of me, so every female star is paired with every executive above the threshold, which is not the intersection.
I would guess that the first approach returns the result faster and uses less memory (though the query optimizer can do a lot).
When you should use INTERSECT:
you want the intersection of both sets and you cannot JOIN them explicitly
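To see the difference between the two queries on actual rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever engine the book uses, the sample rows are made up, and cert# is renamed to cert to keep the DDL simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate TEXT);
CREATE TABLE MovieExec (name TEXT, address TEXT, cert INTEGER, netWorth INTEGER);
-- Hypothetical sample rows, not taken from the book.
INSERT INTO MovieStar VALUES
  ('Ann',  '1 Elm St', 'F', '1970-01-01'),
  ('Bea',  '2 Oak St', 'F', '1980-01-01'),
  ('Carl', '3 Fir St', 'M', '1975-01-01');
INSERT INTO MovieExec VALUES
  ('Ann',  '1 Elm St', 1, 50000000),
  ('Bea',  '2 Oak St', 2, 1000),
  ('Carl', '3 Fir St', 3, 90000000);
""")

# 1) INTERSECT: rows present in both filtered sets.
intersect_rows = conn.execute("""
SELECT name, address FROM MovieStar WHERE gender = 'F'
INTERSECT
SELECT name, address FROM MovieExec WHERE netWorth > 10000000
""").fetchall()

# 2) Comma join with only the two filters: a filtered Cartesian product.
cross_rows = conn.execute("""
SELECT ms.name, ms.address
FROM MovieStar AS ms, MovieExec AS me
WHERE ms.gender = 'F' AND me.netWorth > 10000000
ORDER BY ms.name
""").fetchall()

# 3) The join only matches INTERSECT once the rows are tied together explicitly.
join_rows = conn.execute("""
SELECT DISTINCT ms.name, ms.address
FROM MovieStar AS ms
JOIN MovieExec AS me ON me.name = ms.name AND me.address = ms.address
WHERE ms.gender = 'F' AND me.netWorth > 10000000
""").fetchall()

print(intersect_rows)
print(cross_rows)
print(join_rows)
```

With this data, query 2 also returns Bea (who is not a rich executive) and returns every star twice, while queries 1 and 3 agree.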

Related

How to find every combination of features shared across multiple rows?

I am pretty new to using SQL (using StandardSQL via Big Query currently) and unfortunately my Google-fu could not find me a solution to this issue.
I'm working with a dataset where each row is a different person and each column is an attribute (name, age, gender, weight, ethnicity, height, bmi, education level, GPA, etc.). I am trying to 'cluster' these people into all of the feature combinations that match 5 or more people.
Originally I did this manually with 3 feature columns where I would essentially concatenate a 'cluster name' column and then have 7 select queries for each grouping with a >5 where clause, which I then UNIONed together:
gender
age
ethnicity
gender + age
gender + ethnicity
age + ethnicity
gender + age + ethnicity
^ unfortunately doing it this way just balloons the number of combinations and with my anticipated ~15 total features doing it this way seems really unfeasible. I'd also like to do this through a less manual approach so that if a new feature is added in the future it does not require major edits to include it in my cluster identification.
Is there a function or existing process that could accomplish something like this? I'd ideally like to be able to identify ALL combinations that meet my minimum user count per combination (so it's expected that the same rows would match multiple different clusters here). Any advice or help here would be appreciated! Thanks.
If only BQ supported grouping sets or cube, this would be simple. One method that is pretty generalizable enumerates the 7 groups and then uses bits to figure out what to aggregate:
select (case when n & 1 > 0 then gender end) as gender,
(case when n & 2 > 0 then age end) as age,
(case when n & 4 > 0 then ethnicity end) as ethnicity,
count(*)
from t cross join
unnest(generate_array(1, 7)) n
group by n, 1, 2, 3;
Another method which is trickier is to reconstruct the groups using rollup(). Something like this:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity);
Produces three of the groups you want. So:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity)
union all
select gender, null, ethnicity, count(*)
from t
group by gender, ethnicity
union all
select null, age, ethnicity, count(*)
from t
group by rollup (ethnicity, age);
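The above reconstructs six of your seven groups using rollup(). The age-only group still needs one more branch (select null, age, null, count(*) from t group by age), and each rollup() also emits a grand-total row with every key null, which you will probably want to filter out.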
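Since the question asks for something that does not need hand edits when a feature is added, here is a rough sketch of the bitmask idea with the SQL text generated from a list of column names. It runs on SQLite through Python's sqlite3 module rather than on BigQuery, so the recursive CTE stands in for unnest(generate_array(...)). The helper name build_cluster_query, the table t and the sample rows are all made up for illustration.

```python
import sqlite3

def build_cluster_query(columns, table="t"):
    """Build SQL that counts rows for every non-empty subset of `columns`.

    Hypothetical helper: `columns` and `table` are pasted into the SQL text,
    so they must be trusted identifiers, never user input.
    """
    cases = ", ".join(
        f"CASE WHEN (n & {1 << i}) > 0 THEN {col} END AS {col}"
        for i, col in enumerate(columns)
    )
    names = ", ".join(columns)
    last_mask = (1 << len(columns)) - 1  # e.g. 7 for three columns
    return f"""
        WITH RECURSIVE masks(n) AS (
            SELECT 1 UNION ALL SELECT n + 1 FROM masks WHERE n < {last_mask}
        )
        SELECT n, {names}, COUNT(*) AS people
        FROM (SELECT n, {cases} FROM {table} CROSS JOIN masks)
        GROUP BY n, {names}
        HAVING COUNT(*) >= ?
        ORDER BY n, {names}
    """

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (gender TEXT, age INTEGER, ethnicity TEXT)")
# Made-up rows: five identical people, plus a few that fall below the threshold.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("F", 30, "A")] * 5 + [("F", 40, "B")] * 2 + [("M", 30, "A")],
)

query = build_cluster_query(["gender", "age", "ethnicity"])
clusters = conn.execute(query, (5,)).fetchall()
for row in clusters:
    print(row)
```

Bit i of n decides whether column i takes part in the grouping, so a column that is masked out shows up as NULL. That also means a genuine NULL in the data cannot be told apart from "not part of this cluster" without looking at n. The number of masks doubles with every added column, so with 15 features each row is expanded 32767 times before grouping.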

Access SQL Group by with condition

I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, ZIP and place. But I only want to group them if the SumSquare (after GROUP BY) is < 1000. Rue de gare 2 should be grouped, while Bahnhofstrasse 23 should stay as separate lines.
As far as I know, MS Access doesn't allow a CASE statement. So my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me i have a syntax error. Could someone please help me?
In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING, which is a bit like a WHERE clause that is applied after the GROUP BY and SUM.
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON (Table1.Street = SumTheseStreets.Street) AND (Table1.ZIP = SumTheseStreets.ZIP) AND (Table1.Place = SumTheseStreets.Place)
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well",
but then it filters out the rows that DO match.
So this query only lists the streets that you want NOT summed.
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
UNION
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
(it needs to be UNION ALL if identical unsummed rows should stay separate, because UNION removes duplicates)
Good luck.
You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.*
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
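Here is a small runnable sketch of the same "sum the small groups, keep the big ones row by row" idea. It uses Python's sqlite3 module instead of Access, so the syntax is SQLite's, and because the table from the question did not survive, the ZIP, Place and Square values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER)")
# Invented rows: only the two street names come from the question.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("Rue de gare 2", "1000", "Lausanne", 300),
    ("Rue de gare 2", "1000", "Lausanne", 400),
    ("Bahnhofstrasse 23", "8000", "Zurich", 700),
    ("Bahnhofstrasse 23", "8000", "Zurich", 600),
])

rows = conn.execute("""
-- Groups whose total is below 1000 collapse into one summed row.
SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING SUM(Square) < 1000
UNION ALL
-- Rows of every other group are returned one by one.
SELECT t1.Street, t1.ZIP, t1.Place, t1.Square
FROM Table1 AS t1
JOIN (SELECT Street, ZIP, Place
      FROM Table1
      GROUP BY Street, ZIP, Place
      HAVING SUM(Square) >= 1000) AS big
  ON t1.Street = big.Street AND t1.ZIP = big.ZIP AND t1.Place = big.Place
""").fetchall()

result = sorted(rows)  # sort in Python so the output order is predictable
for row in result:
    print(row)
```

The second branch lists its four columns explicitly rather than using t1.*, so both halves of the UNION ALL have the same shape even if Table1 has more columns.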

SQL Assignment about joining tables

I am working on a SQL assignment in Oracle. There are two tables.
table1 is called Person10:
fields include: ID, Fname, Lname, State, DOH, JobTitle, Salary, Cat.
table2 is called StateInfo:
fields include: State, Statename, Capital, Nickname, Pop2010, pop2000, pop1990, sqmiles.
Question:
Create a view named A10T2 that will display the StateName, Capital and Nickname of the states that have at least 25 people in the Person10 table with a Cat value of N and an annual salary between $75,000 and $125,000. The three column headings should be StateName, Capital and Nickname. The rows should be sorted by the name of the state.
What I have :
CREATE VIEW A10T2 AS
SELECT StateName, Capital, Nickname
FROM STATEINFO INNER JOIN PERSON10 ON
STATEINFO.STATE = PERSON10.STATE
WHERE Person10.CAT = 'N' AND
Person10.Salary in BETWEEN (75000 AND 125000) AND
count(Person10.CAT) >= 25
ORDER BY STATE;
It gives me an error saying "missing expression". I may need a GROUP BY expression, but I don't know what I am doing wrong.
Yeah, I originally messed this up when I first answered, because I wrote it on the fly and didn't have a chance to test what I was putting down. An aggregate function (like SUM, AVG and COUNT) can't be used in WHERE, and that's probably why it's throwing the error. The count belongs in HAVING, which needs a GROUP BY on the selected columns and has to come before the ORDER BY. And you want to order your results by the state, so you would use StateName.
SELECT S.StateName, S.Capital, S.Nickname
FROM STATEINFO S
INNER JOIN PERSON10 P ON S.STATE = P.STATE
WHERE P.CAT = 'N'
AND P.Salary BETWEEN 75000 AND 125000
GROUP BY S.StateName, S.Capital, S.Nickname
HAVING count(P.CAT) >= 25
ORDER BY S.StateName;
Try moving your count() to HAVING instead of WHERE. You'll also need a GROUP BY clause containing StateName, Capital, and Nickname.
I know this link is Microsoft, not Oracle, but it should be helpful.
https://msdn.microsoft.com/en-us/library/ms180199.aspx?f=255&MSPPError=-2147217396
I'm no Oracle expert, but I'm pretty sure
Person10.Salary in BETWEEN (75000 AND 125000)
should be
Person10.Salary BETWEEN 75000 AND 125000
(no IN and no parentheses). That's how all other SQL dialects I know of work.
Also, move the COUNT() from the WHERE clause to a HAVING clause, which needs a GROUP BY and goes before the ORDER BY:
CREATE VIEW A10T2 AS
SELECT StateName, Capital, Nickname
FROM STATEINFO INNER JOIN PERSON10 ON
STATEINFO.STATE = PERSON10.STATE
WHERE Person10.CAT = 'N' AND
Person10.Salary BETWEEN 75000 AND 125000
GROUP BY StateName, Capital, Nickname
HAVING count(Person10.CAT) >= 25
ORDER BY StateName;
You can try using a subquery like this. Person10 has no StateName column, so the subquery has to match on State:
CREATE VIEW A10T2 AS
SELECT statename, capital, nickname
FROM stateinfo
WHERE state IN (SELECT state
FROM person10
WHERE Cat = 'N'
AND Salary BETWEEN 75000 AND 125000
GROUP BY state
HAVING COUNT(*) >= 25)
ORDER BY statename
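To check the clause order (WHERE, then GROUP BY, then HAVING, then ORDER BY) on real rows, here is a minimal sketch using Python's sqlite3 module in place of Oracle. The tables are cut down to the columns the view needs, and the people are generated in a loop, so none of this is the assignment's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the assignment's tables.
CREATE TABLE StateInfo (State TEXT, StateName TEXT, Capital TEXT, Nickname TEXT);
CREATE TABLE Person10 (ID INTEGER, State TEXT, Salary INTEGER, Cat TEXT);
INSERT INTO StateInfo VALUES
  ('OH', 'Ohio',  'Columbus', 'Buckeye State'),
  ('TX', 'Texas', 'Austin',   'Lone Star State');
""")

people = [(i, "OH", 80000, "N") for i in range(25)]         # exactly 25 qualifying rows
people += [(100 + i, "TX", 80000, "N") for i in range(24)]  # one short of the threshold
people += [(200, "TX", 200000, "N")]                        # salary out of range, not counted
conn.executemany("INSERT INTO Person10 VALUES (?, ?, ?, ?)", people)

conn.execute("""
CREATE VIEW A10T2 AS
SELECT s.StateName, s.Capital, s.Nickname
FROM StateInfo AS s
JOIN Person10 AS p ON p.State = s.State
WHERE p.Cat = 'N' AND p.Salary BETWEEN 75000 AND 125000   -- row filter, before grouping
GROUP BY s.StateName, s.Capital, s.Nickname
HAVING COUNT(*) >= 25                                      -- group filter, after grouping
ORDER BY s.StateName
""")

view_rows = conn.execute("SELECT * FROM A10T2").fetchall()
print(view_rows)
```

Texas has 25 people with Cat = 'N', but one of them is outside the salary range, so it is filtered out by WHERE before the count is taken and the state drops out in HAVING.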

DB2 SQL Join and Max value

The database I'm accessing has two tables I need to query using DB2 SQL, shown here as nametable and addresstable. The query is for finding all of the people with a certain balance due. The addresses are stored in a separate table to keep track of address changes. In addresstable, the latest address is determined by a sequence number (ADDRSEQUENCE). The AddressID field is present in both tables, and is what ties each person to specific addresses. The highest sequence number is the current address. I need that current address for each person and only that one. I know I'm going to have to use MAX somewhere for the sequence number, but I can't figure out how to position it given the join. Here's my current query, which of course returns all addresses...
SELECT NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
NAMETABLE.NMELASTBUS AS LASTNAME,
NAMETABLE.NAME_FIRST AS FIRSTNAME,
NAMETABLE.BALDUE AS BALANCEDUE,
ADDRESSTABLE.STREETNAME AS ADDR,
ADDRESSTABLE.ADDRLINE2 AS ADDRLINE2,
ADDRESSTABLE.CITYPARISH AS CITY,
ADDRESSTABLE.ADDRSTATE AS STATE,
ADDRESSTABLE.ZIPCODE AS ZIP,
ADDRESSTABLE.ADDIDSEQNO AS ADDRSEQUENCE
FROM NAMETABLE JOIN ADDRESSTABLE ON NAMETABLE.ADDRESSID = ADDRESSTABLE.ADDRESSID
WHERE NAMETABLE.BALANCEDUE >= '50.00'
You can do a sub-select on the MAX(ADDRSEQUENCE) like so:
SELECT
N.ACCTNUM AS ACCOUNTNUMBER
,N.NMELASTBUS AS LASTNAME
,N.NAME_FIRST AS FIRSTNAME
,N.BALDUE AS BALANCEDUE
,A.STREETNAME AS ADDR
,A.ADDRLINE2 AS ADDRLINE2
,A.CITYPARISH AS CITY
,A.ADDRSTATE AS STATE
,A.ZIPCODE AS ZIP
FROM NAMETABLE AS N
JOIN ADDRESSTABLE AS A
ON N.ADDRESSID = A.ADDRESSID
WHERE N.BALANCEDUE >= '50.00'
AND A.ADDRSEQUENCE = (
SELECT MAX(ADDRSEQUENCE)
FROM ADDRESSTABLE AS A2
WHERE A.ADDRESSID = A2.ADDRESSID
)
This is pretty quick in DB2.
You can use a row_number and partition by to do this. Something like this:
with orderedaddress as (
select row_number() over (partition by ADDRESSID order by ADDRSEQUENCE desc) as rown,
STREETNAME,ADDRESSID, ... from ADDRESSTABLE
)
select NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
...
oa.STREETNAME
...
from NAMETABLE JOIN orderedaddress oa on NAMETABLE.ADDRESSID = oa.ADDRESSID
where oa.rown = 1
and NAMETABLE.BALANCEDUE >= '50.00'
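For anyone who wants to try the correlated MAX approach without a DB2 instance, here is a small sketch using Python's sqlite3 module. The tables are trimmed to a few columns, the sequence column is called ADDRSEQUENCE as in the question's description, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down, hypothetical versions of the two tables.
CREATE TABLE NAMETABLE (ACCTNUM INTEGER, NMELASTBUS TEXT, BALDUE REAL, ADDRESSID INTEGER);
CREATE TABLE ADDRESSTABLE (ADDRESSID INTEGER, ADDRSEQUENCE INTEGER, STREETNAME TEXT);
INSERT INTO NAMETABLE VALUES
  (1, 'Smith', 75.0, 10),
  (2, 'Jones', 20.0, 20),
  (3, 'Lee',   50.0, 30);
INSERT INTO ADDRESSTABLE VALUES
  (10, 1, 'Old Rd'),   (10, 2, 'New Ave'),
  (20, 1, 'Main St'),
  (30, 1, 'First St'), (30, 2, 'Second St'), (30, 3, 'Third St');
""")

current = conn.execute("""
SELECT n.ACCTNUM, n.NMELASTBUS, n.BALDUE, a.STREETNAME, a.ADDRSEQUENCE
FROM NAMETABLE AS n
JOIN ADDRESSTABLE AS a ON a.ADDRESSID = n.ADDRESSID
WHERE n.BALDUE >= 50.00
  -- keep only the address row carrying the highest sequence for its ADDRESSID
  AND a.ADDRSEQUENCE = (SELECT MAX(a2.ADDRSEQUENCE)
                        FROM ADDRESSTABLE AS a2
                        WHERE a2.ADDRESSID = a.ADDRESSID)
ORDER BY n.ACCTNUM
""").fetchall()

for row in current:
    print(row)
```

The sketch compares the balance against the number 50.00 rather than the string '50.00', on the assumption that the balance column is numeric.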

Select exactly one row for each employee using unordered field as criteria

I have a data set that looks like the following.
EMPLID PHONE_TYPE PHONE
------ ---------- --------
100 HOME 111-1111
100 WORK 222-2222
101 HOME 333-3333
102 WORK 444-4444
103 OTHER 555-5555
I want to select exactly one row for each employee using the PHONE_TYPE field to establish preferences. I want the HOME phone number if the employee has one as is the case for employee 100 and 101. If the HOME number is not present, I want the WORK number (employee 102), and as a last resort I'll take the OTHER number as with employee 103. In reality my table has about a dozen values for the PHONE_TYPE field, so I need to be able to extend any solution to include more than just the three values I've shown in the example. Any thoughts? Thanks.
You need to add a phone_types table (Phone_Type TEXT(Whatever), Priority INTEGER). In this table, list each Phone_Type value once and assign a priority to it (in your example, HOME would be 1, WORK 2, OTHER 3 and so on).
Then, create a view that joins the Priority column from Phone_Types to your Phone_Numbers table (imagine we call it Phone_Numbers_Ex).
Now, you have several options for how to get the record from Phone_Numbers_Ex with the MIN(Priority) for a given EmplID, of which probably the clearest is:
SELECT * FROM Phone_Numbers_Ex P1 WHERE NOT EXISTS
(SELECT * FROM Phone_Numbers_Ex P2 WHERE P2.EmplID = P1.EmplID AND P2.Priority < P1.Priority)
Another way is to declare another view, or inner query, along the lines of SELECT EmplID, MIN(Priority) AS Priority FROM Phone_Numbers_Ex GROUP BY EmplID and then join this back to Phone_Numbers_Ex on both EmplID and Priority.
I forget, does Server 2000 support Coalesce? If it does, I think this will work:
Select Distinct EmplID, Coalesce(
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'HOME'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'WORK'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'OTHER')
) as Phone
From Employees e1
Your requirements may not be complete if an employee is allowed to have more than one phone number for a given phone type. I've added a phone_number_id just to make things unique and assumed that you would want the lowest id if the person has two phones of the same type. That's pretty arbitrary, but you can replace it with your own business logic.
I've also assumed some kind of a Phone_Types table that includes your priority for which phone number should be used. If you don't already have this table, you should probably add it. If nothing else, it lets you constrain the phone types with a foreign key.
SELECT
PN1.employee_id,
PN1.phone_type,
PN1.phone_number
FROM
Phone_Numbers PN1
INNER JOIN Phone_Types PT1 ON
PT1.phone_type = PN1.phone_type
WHERE
NOT EXISTS
(
SELECT *
FROM
Phone_Numbers PN2
INNER JOIN Phone_Types PT2 ON
PT2.phone_type = PN2.phone_type
WHERE
PN2.employee_id = PN1.employee_id AND
(
(PT2.priority < PT1.priority)
--OR (PT2.priority = PT1.priority AND PN2.phone_number_id < PN1.phone_number_id)
)
)
You could also implement this with a LEFT JOIN instead of the NOT EXISTS or you could use TOP if you were looking for the phone number for a single employee. Just do a TOP 1 ORDER BY priority, phone_number_id.
Finally, if you were to move up to SQL 2005 or SQL 2008, you could use a CTE with ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY priority, phone_number_id). That would allow you to get the top one for all employees by checking that the row number = 1.
As an alternative to g.d.d.c's answer, which uses queries in the SELECT clause, you could use left joins. You might get better performance, but you should test, of course.
SELECT
e1.emplId,
COALESCE(phoneHome.Phone, phoneWork.Phone, phoneOther.Phone) AS phone
FROM
employees e1
LEFT JOIN phone phoneHome
ON e1.emplId = phoneHome.emplId
AND phoneHome.phone_type = 'HOME'
LEFT JOIN phone phoneWork
ON e1.emplId = phoneWork.emplId
AND phoneWork.phone_type = 'WORK'
LEFT JOIN phone phoneOther
ON e1.emplId = phoneOther.emplId
AND phoneOther.phone_type = 'OTHER'
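The priority-table idea from the earlier answers extends to a dozen phone types without touching the query. A minimal sketch with Python's sqlite3 module follows. It uses the five rows from the question, while the table and column names (Phone_Types, Phone_Numbers, priority) are assumptions, and the NOT EXISTS subquery is correlated on the employee so that priorities are only compared within one person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed lookup table: lower priority number means more preferred.
CREATE TABLE Phone_Types (phone_type TEXT PRIMARY KEY, priority INTEGER);
CREATE TABLE Phone_Numbers (emplid INTEGER, phone_type TEXT, phone TEXT);
INSERT INTO Phone_Types VALUES ('HOME', 1), ('WORK', 2), ('OTHER', 3);
INSERT INTO Phone_Numbers VALUES
  (100, 'HOME',  '111-1111'),
  (100, 'WORK',  '222-2222'),
  (101, 'HOME',  '333-3333'),
  (102, 'WORK',  '444-4444'),
  (103, 'OTHER', '555-5555');
""")

preferred = conn.execute("""
SELECT pn1.emplid, pn1.phone_type, pn1.phone
FROM Phone_Numbers AS pn1
JOIN Phone_Types AS pt1 ON pt1.phone_type = pn1.phone_type
WHERE NOT EXISTS (
  -- a better (lower-priority-number) phone for the same employee
  SELECT 1
  FROM Phone_Numbers AS pn2
  JOIN Phone_Types AS pt2 ON pt2.phone_type = pn2.phone_type
  WHERE pn2.emplid = pn1.emplid
    AND pt2.priority < pt1.priority
)
ORDER BY pn1.emplid
""").fetchall()

for row in preferred:
    print(row)
```

Adding a new phone type is then one INSERT into Phone_Types. If an employee can have two numbers of the same type, this returns both, so a tie-breaker column would be needed in that case.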