Outer and union all on the same table? - sql

I recently came across a query that was written a long time ago for an Informix database.
The query seems strange and nonsensical to me.
I know this query returns all the rows from the city table together with the rows that match in the ocw table. If no record for a city appears in the ocw table, the returned code column for that city has a NULL value.
I also understand that UNION removes duplicates, whereas UNION ALL does not.
Is my understanding of OUTER and UNION ALL correct?
Can anyone explain what they were trying to achieve with this query, and is there a better way to do it?
SELECT * FROM city as c, OUTER ocw o
WHERE c.mutual = o.code
INTO temp city_ocw;
SELECT
name ,
year ,
mutual ,
0 animalId
FROM
city_ocw
WHERE
code IS NULL
GROUP BY
1, 2, 3 , 4
UNION ALL
SELECT
name ,
year ,
mutual ,
animalId
FROM
city_ocw
WHERE
NOT code IS NULL
GROUP BY
1, 2, 3 , 4
INTO TEMP city_ocw_final ;

@TheImpaler is right that grouping by 5 columns when your result set only has 4 columns doesn't make much sense, but I'll ignore that.
As I see it, your understanding of OUTER and UNION ALL is correct. The goal appears to be to generate a stacked result set with 2 versions of city joined to ocw, 1 with an actual animalId, and 1 with animalId = 0.
I'm not familiar with OUTER being used by itself (I always use it with LEFT/RIGHT/FULL), but would assume the default to be LEFT OUTER.
If no record for a city appears in the ocw table, the returned code column for that city has a NULL value.
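That is true. As far as I can tell, in the Informix OUTER syntax the condition WHERE c.mutual = o.code acts as the join condition, so cities without a match are kept. With an ANSI LEFT JOIN, the same condition placed in the WHERE clause would filter those cities out, so it has to go in the ON clause. You could rewrite the join as LEFT JOIN ocw o ON c.mutual = o.code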
The GROUP BY may have been done in the past for some aggregate column that no longer exists... perhaps that's column 5?
I think it could be redone as:
SELECT name,
year,
mutual,
0 as animalId
FROM city c
LEFT JOIN ocw o ON c.mutual = o.code
WHERE o.code IS NULL --cities without a match in ocw
UNION --UNION (unlike UNION ALL) removes duplicates, so the GROUP BY is not needed
SELECT name,
year,
mutual,
animalId
FROM city c
LEFT JOIN ocw o ON c.mutual = o.code
WHERE o.code IS NOT NULL --cities with a match in ocw
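The two branches can also be collapsed into a single pass over the outer join. What follows is only a minimal sketch, run through Python's sqlite3 module (SQLite, not Informix). It assumes that name, year and mutual live in city and that code and animalId live in ocw; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column layout; the real tables may differ.
    CREATE TABLE city (name TEXT, year INTEGER, mutual INTEGER);
    CREATE TABLE ocw  (code INTEGER, animalId INTEGER);
    INSERT INTO city VALUES ('Leiden', 2001, 10), ('Delft', 2002, 20);
    INSERT INTO ocw  VALUES (10, 7), (10, 7), (10, 8);
""")

# Unmatched cities (o.code IS NULL) get animalId 0, matched cities keep
# their real animalId. DISTINCT plays the role of GROUP BY 1, 2, 3, 4.
rows = conn.execute("""
    SELECT DISTINCT c.name, c.year, c.mutual,
           CASE WHEN o.code IS NULL THEN 0 ELSE o.animalId END AS animalId
    FROM city c
    LEFT JOIN ocw o ON c.mutual = o.code
    ORDER BY 1, 4
""").fetchall()

for row in rows:
    print(row)
```

Delft has no row in ocw and comes back once with animalId 0; Leiden comes back once per distinct animalId.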

Related

Transpose only certain data in SQL

My data looks like this:
Company  Year        Total  Comment
Comp A   01-01-2000  5,000  Checked
Comp A   01-01-2001  6,000  Checked
Comp B   05-05-2007  3,000  Not checked completely
Comp B   05-05-2008  4,000  Checked
Comp C   18-01-2003  1,500  Not checked completely
Comp C   18-01-2002  3,500  Not checked completely
I've been asked to transpose certain data, but I do not believe this can be done using SQL (Server) so that it looks like this:
Company  Base Date   Base Date-1  Comment Base Date       Comment Base Date-1
Comp A   01-01-2001  01-01-2000   Checked                 Checked
Comp B   05-05-2008  05-05-2007   Checked                 Not completely checked
Comp C   18-01-2003  18-01-2002   Not completely checked  Not completely checked
I have never built anything like this. Would Excel perhaps be a better alternative? How should I tackle this?
Is it possible using SELECT MAX(Base Date) and MIN(Base Date)? And how would I then handle the comment strings?
You can use a self join to do this. However, you should think about dates like February 29 as they only occur in leap years.
select t1.company,t1.year as basedate,t2.year as basedate_1,
t1.comment as comment_basedate,t2.comment as comment_basedate_1
from t t1
left join t t2 on t1.company=t2.company and dateadd(year,1,t2.year)=t1.year
Change the left join to an inner join if you only need results where both the date values exist for a company. This solution assumes there can only be one comment per day.
I'd assign a row number to each record, partitioned by company and ordered by year descending, through an analytic function in a common table expression, then use a left self join on the row number + 1 and company.
This assumes you only want 1 record per company, using the 2 most recent years, and that if only 1 record exists for a company, null values are acceptable for the second year. If not, we can change the left join to an inner join and eliminate those companies.
We use a common table expression (though an inline view would work as well) to assign a row number to each record. That value is then made available in our self join so we don't have to worry about different dates and max values. We then use our RowNumber (RN) and company to join the 2 desired records together. To save on some performance we limit 1 table to RN 1 and the second table to RN 2.
WITH CTE AS (
SELECT *, Row_Number() over (Partition by Company Order by Year Desc) RN FROM TABLE)
SELECT A.Company
, A.Year as Base_Date
, B.Year as Base_Date1
, A.comment as Base_Date_Comment
, B.Comment as Base_Date1_Comment
FROM CTE A
LEFT JOIN CTE B
on A.RN+1 = B.RN
and A.Company = B.Company
and B.RN = 2
WHERE A.RN = 1
Note that the limit on RN = 2 must be in the join condition: because this is an outer join, putting it in the WHERE clause would eliminate the companies without 2 years (in essence turning the left join into an inner join).
This approach makes all columns of the data available for each row.
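To see the row-number approach run end to end, here is a rough sketch using Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later), a hypothetical table name totals, and dates stored as ISO strings so that they sort correctly. Comp D is an invented company with a single row, added to show the NULLs produced by the left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'totals' is a hypothetical table name for the data in the question.
    CREATE TABLE totals (company TEXT, year TEXT, total INTEGER, comment TEXT);
    INSERT INTO totals VALUES
        ('Comp A', '2000-01-01', 5000, 'Checked'),
        ('Comp A', '2001-01-01', 6000, 'Checked'),
        ('Comp B', '2007-05-05', 3000, 'Not checked completely'),
        ('Comp B', '2008-05-05', 4000, 'Checked'),
        ('Comp D', '2010-03-03', 1000, 'Checked');
""")

# rn = 1 is the most recent row per company, rn = 2 the one before it.
rows = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY year DESC) AS rn
        FROM totals
    )
    SELECT a.company, a.year, b.year, a.comment, b.comment
    FROM cte a
    LEFT JOIN cte b ON b.company = a.company AND b.rn = 2
    WHERE a.rn = 1
    ORDER BY a.company
""").fetchall()

for row in rows:
    print(row)
```

Comp A and Comp B each come back as one row holding both dates and both comments, while Comp D keeps its single date and gets None in the second-year columns.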
If there are only two rows each, then that's pretty simple. If there are more than two rows, you could do something like this: essentially joining all rows, then making sure A represents the latest row and B represents the earliest row, so that Base Date is the most recent date as in your desired output.
SELECT A.Company, A.Year AS [Base Date], B.Year AS [Base Date 1],
A.Comment AS [Comment Base Date], B.Comment AS [Comment Base Date 1]
FROM MyTable A
INNER JOIN MyTable B ON A.Company = B.Company
WHERE A.Year = (SELECT MAX(C.YEAR) FROM MyTable C WHERE C.Company = A.Company)
AND B.Year = (SELECT MIN(C.YEAR) FROM MyTable C WHERE C.Company = B.Company)
There might be a more efficient way to do this with Row_Number or something.

Limit Query Result Using Count

I need to limit the results of my query so that it only pulls results where the total number of lines on the ID is less than 4, and am unsure how to do this without losing the select statement columns.
select fje.journalID, fjei.ItemID, fjei.acccount, fjei.debit, fjei.credit
from JournalEntry fje
inner join JournalEntryItem fjei on fjei.journalID = fje.journalID
inner join JournalEntryItem fjei2 on fjei.journalID = fjei2.journalID and
fjei.ItemID != fjei2.ItemID
order by fje.journalID
So if journalID 1 has 5 lines, it should be excluded, but if it has 4 lines, I should see it in my query. Just need a push in the right direction. Thanks!
A subquery with an alias has many names, but it's effectively a table. In your case, you would do something like this.
select your fields
from your tables
join (
select id, count(*) records
from wherever
group by id ) derivedTable on someTable.id = derivedTable.id
and records < 4
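Filled in for the journal tables, the derived-table idea might look roughly like the sketch below, run through Python's sqlite3 module. The column list is trimmed, the sample data is invented, and the cutoff is an assumption: the comparison uses <= with MAX_LINES = 4 to follow the example in the question (5 lines excluded, 4 lines kept).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JournalEntry (journalID INTEGER PRIMARY KEY);
    CREATE TABLE JournalEntryItem (
        ItemID INTEGER PRIMARY KEY, journalID INTEGER, debit REAL, credit REAL);
    INSERT INTO JournalEntry VALUES (1), (2);
""")
# Invented data: journal 1 gets 5 lines, journal 2 gets 3 lines.
conn.executemany(
    "INSERT INTO JournalEntryItem (journalID, debit, credit) VALUES (?, ?, ?)",
    [(1, 10.0, 0.0)] * 5 + [(2, 5.0, 0.0)] * 3,
)

MAX_LINES = 4  # assumed cutoff, passed in as a query parameter

# The derived table 'counts' holds one row per journal with its line count;
# joining to it keeps every selected column while filtering on the count.
rows = conn.execute("""
    SELECT fje.journalID, fjei.ItemID, fjei.debit, fjei.credit
    FROM JournalEntry fje
    JOIN JournalEntryItem fjei ON fjei.journalID = fje.journalID
    JOIN (
        SELECT journalID, COUNT(*) AS line_count
        FROM JournalEntryItem
        GROUP BY journalID
    ) counts ON counts.journalID = fje.journalID
    WHERE counts.line_count <= ?
    ORDER BY fje.journalID, fjei.ItemID
""", (MAX_LINES,)).fetchall()

print(len(rows), {r[0] for r in rows})
```

Only the three lines of journal 2 come back; journal 1 is dropped as a whole because its count is 5.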

All parent table rows left join child table directly

I have three tables: patron, patron_address, and patron_phone.
For every patron:
patron has 1-3 patron_address rows
patron_address has 0-4 patron_phone rows
I want to display all the rows from the patron table and all the phone numbers of phone_type = '4'.
However, when I use the query below, I only get rows that have a phone_type of 4, not all the patron rows.
I tried to get the Access 2007 query designer to do this, but something is off-kilter. Patron_address rows have an address_type. Only patron_address rows with address_type 1 have a child phone record.
So how do I get all the patron rows regardless of whether they have a patron_phone of phone_type 4?
SELECT
PATRON.patron_id, PATRON_PHONE.PHONE_TYPE,
PATRON_PHONE.PHONE_NUMBER, PATRON_ADDRESS.ADDRESS_TYPE
FROM
(PATRON
INNER JOIN
PATRON_ADDRESS ON PATRON.PATRON_ID = PATRON_ADDRESS.PATRON_ID)
LEFT JOIN
PATRON_PHONE ON PATRON_ADDRESS.ADDRESS_ID = PATRON_PHONE.ADDRESS_ID
WHERE
(((PATRON_PHONE.PHONE_TYPE) = '4'))
ORDER BY
PATRON.patron_id;
If I add the criterion that the address type must equal 1, I get absolutely nothing back, even though this combination exists in the database. Isn't the behavior I want the point of a left outer join? Thanks.
You have an INNER JOIN from the PATRON table to PATRON_ADDRESS. This means that there must be matching records in both tables (by PATRON_ID) to return a value.
SELECT PATRON.patron_id, PATRON_PHONE.PHONE_TYPE, PATRON_PHONE.PHONE_NUMBER, PATRON_ADDRESS.ADDRESS_TYPE
FROM PATRON
INNER JOIN PATRON_ADDRESS ON PATRON.PATRON_ID = PATRON_ADDRESS.PATRON_ID
LEFT JOIN PATRON_PHONE ON PATRON_ADDRESS.ADDRESS_ID = PATRON_PHONE.ADDRESS_ID
WHERE PATRON_PHONE.PHONE_TYPE = 4
ORDER BY PATRON.patron_id
You have a LEFT JOIN from address to phone, so there doesn't need to be a matching phone number.
Since you are filtering for a Phone Type of 4, it will ONLY allow records that do have a phone record where the PHONE_TYPE = 4.
Is your Phone Type field a number or text? SQL Server will try to convert them back and forth but others may not and give an error or just not match - I don't remember how Access might handle this situation.
If you remove the PHONE TYPE criteria, your address criteria should work.
If you want to get all patron records with address type 1 but only phone numbers that are TYPE = 4, move the PHONE_TYPE = 4 condition from the WHERE clause into the LEFT JOIN:
SELECT PATRON.patron_id, PATRON_PHONE.PHONE_TYPE, PATRON_PHONE.PHONE_NUMBER, PATRON_ADDRESS.ADDRESS_TYPE
FROM PATRON
INNER JOIN PATRON_ADDRESS ON PATRON.PATRON_ID = PATRON_ADDRESS.PATRON_ID
LEFT JOIN PATRON_PHONE ON PATRON_ADDRESS.ADDRESS_ID = PATRON_PHONE.ADDRESS_ID AND PATRON_PHONE.PHONE_TYPE = 4
WHERE PATRON_ADDRESS.ADDRESS_TYPE = 1
ORDER BY PATRON.patron_id
Access SQL:
SELECT PATRON.patron_id, P_PHONE.PHONE_TYPE, P_PHONE.PHONE_NUMBER, PATRON_ADDRESS.ADDRESS_TYPE
FROM (PATRON
INNER JOIN PATRON_ADDRESS ON PATRON.PATRON_ID = PATRON_ADDRESS.PATRON_ID)
LEFT JOIN [SELECT * FROM PATRON_PHONE WHERE PATRON_PHONE.PHONE_TYPE = 4 ]. AS P_PHONE ON PATRON_ADDRESS.ADDRESS_ID = P_PHONE.ADDRESS_ID
WHERE PATRON_ADDRESS.ADDRESS_TYPE = 1
ORDER BY PATRON.patron_id
Access has some goofy syntax for subqueries. You could create a separate query for the P_Phone subquery instead but the results would (should?) be the same.
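The difference between filtering in the WHERE clause and filtering in the join condition is easy to check on a toy dataset. The following is only a sketch using Python's sqlite3 module (SQLite, not Access), with three invented patrons: one with a type 4 phone, one with a type 2 phone, and one with no phone at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patron (patron_id INTEGER PRIMARY KEY);
    CREATE TABLE patron_address (
        address_id INTEGER PRIMARY KEY, patron_id INTEGER, address_type INTEGER);
    CREATE TABLE patron_phone (address_id INTEGER, phone_type INTEGER, phone_number TEXT);
    INSERT INTO patron VALUES (1), (2), (3);
    INSERT INTO patron_address VALUES (11, 1, 1), (12, 2, 1), (13, 3, 1);
    INSERT INTO patron_phone VALUES (11, 4, '555-0101'), (12, 2, '555-0102');
""")

# Filter in WHERE: rows where the phone columns are NULL fail the test,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT p.patron_id, ph.phone_number
    FROM patron p
    JOIN patron_address a ON a.patron_id = p.patron_id
    LEFT JOIN patron_phone ph ON ph.address_id = a.address_id
    WHERE a.address_type = 1 AND ph.phone_type = 4
    ORDER BY p.patron_id
""").fetchall()

# Filter in ON: only type 4 phones are joined, but every patron survives.
filter_in_on = conn.execute("""
    SELECT p.patron_id, ph.phone_number
    FROM patron p
    JOIN patron_address a ON a.patron_id = p.patron_id
    LEFT JOIN patron_phone ph ON ph.address_id = a.address_id AND ph.phone_type = 4
    WHERE a.address_type = 1
    ORDER BY p.patron_id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

The first query returns only patron 1; the second returns all three patrons, with None as the phone number for patrons 2 and 3.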

Help converting subquery to query with joins

I'm stuck on a query with a join. The client's site is running mysql4, so a subquery isn't an option. My attempts to rewrite using a join aren't going too well.
I need to select all of the contractors listed in the contractors table who are not in the contractors2label table with a given label ID and county ID. Yet, they might be listed in contractors2label with other label and county IDs.
Table: contractors
cID (primary, autonumber)
company (varchar)
...etc...
Table: contractors2label
cID
labelID
countyID
psID
This query with a subquery works:
SELECT company, contractors.cID
FROM contractors
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors.cID NOT IN (
SELECT contractors2label.cID FROM contractors2label
WHERE labelID <> 1 AND countyID <> 1
)
I thought this query with a join would be the equivalent, but it returns no results. A manual scan of the data shows I should get 34 rows, which is what the subquery above returns.
SELECT company, contractors.cID
FROM contractors
LEFT OUTER JOIN contractors2label ON contractors.cID = contractors2label.cID
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors2label.labelID <> 1
AND contractors2label.countyID <> 1
AND contractors2label.cID IS NULL
When doing a LEFT JOIN, you need to put all conditions of the JOIN into the ON clause.
In your example you get NULL for left joined columns that do not exist, but you then compare them to values again (<> 1) which does not work.
SELECT c.company, c.cID
FROM contractors c
LEFT JOIN contractors2label c2
ON ( c2.cID = c.cID AND c2.labelID <> 1 AND c2.countyID <> 1 )
WHERE c.complete = 1
AND c.archived = 0
AND c2.cID IS NULL
BTW: Using aliases (like c in my example) makes reading and writing your queries easier.
When you restrict on a where clause using the columns in a table that's LEFT joined, you are effectively removing the LEFT OUTER part of the join, because you're filtering on columns that have to be there. Try this instead:
SELECT company, contractors.cID
FROM contractors
LEFT OUTER JOIN contractors2label
ON (contractors.cID = contractors2label.cID
AND contractors2label.labelID <> 1
AND contractors2label.countyID <> 1)
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors2label.cID IS NULL
This does the restriction as part of the join, so nulls can still be used in the larger query.
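A quick way to convince yourself that the anti-join matches the subquery is to run both on a small dataset. This is just a sketch using Python's sqlite3 module (SQLite, not MySQL 4), with four invented contractors; the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contractors (
        cID INTEGER PRIMARY KEY, company TEXT, complete INTEGER, archived INTEGER);
    CREATE TABLE contractors2label (cID INTEGER, labelID INTEGER, countyID INTEGER);
    INSERT INTO contractors VALUES
        (1, 'Acme', 1, 0), (2, 'Bolt', 1, 0), (3, 'Crane', 1, 0), (4, 'Dune', 0, 0);
    INSERT INTO contractors2label VALUES (1, 1, 1), (2, 2, 2), (2, 1, 1);
""")

# Original NOT IN subquery from the question.
with_subquery = conn.execute("""
    SELECT company, cID FROM contractors
    WHERE complete = 1 AND archived = 0
      AND cID NOT IN (SELECT cID FROM contractors2label
                      WHERE labelID <> 1 AND countyID <> 1)
    ORDER BY cID
""").fetchall()

# Anti-join: the label/county tests live in ON, the IS NULL test in WHERE.
with_join = conn.execute("""
    SELECT c.company, c.cID
    FROM contractors c
    LEFT JOIN contractors2label c2
           ON c2.cID = c.cID AND c2.labelID <> 1 AND c2.countyID <> 1
    WHERE c.complete = 1 AND c.archived = 0 AND c2.cID IS NULL
    ORDER BY c.cID
""").fetchall()

# Failing version: comparing NULL columns with <> 1 in WHERE never succeeds.
broken_join = conn.execute("""
    SELECT c.company, c.cID
    FROM contractors c
    LEFT JOIN contractors2label c2 ON c2.cID = c.cID
    WHERE c.complete = 1 AND c.archived = 0
      AND c2.labelID <> 1 AND c2.countyID <> 1 AND c2.cID IS NULL
""").fetchall()

print(with_subquery, with_join, broken_join)
```

Both working queries return Acme and Crane, while the version with the conditions in the WHERE clause returns nothing, which matches the behaviour described in the question.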

A messy SQL statement

I have a case where I want to select any database entry that has an invalid Country, Region, or Area ID. By invalid, I mean an ID for a country, region, or area that no longer exists in my tables. I have four tables: Properties, Countries, Regions, Areas.
I was thinking to do it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Now, is my query right? And what do you suggest I do to achieve the same result with better performance?
Your query in fact is optimal.
The LEFT JOINs proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most 1 row from each table, and jumps to the next iteration right as it finds this row (i. e. if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
Update:
Tested on 512K rows in each table.
All corresponding ID's in dimension tables are CLUSTERED PRIMARY KEY's, all measure fields in Properties are indexed.
For each row in Property, PropertyID = CountryID = RegionID = AreaID, no actual missing rows (worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any - there should be as NOT IN is a nasty search, especially over a lot of items as it HAS to test every single one).
You can also make this faster by indexing the IDS being searched - in each master table (Country, Region, Area) they should be clustered primary keys.
Since this seems to be cleanup sql, this should be ok. But how about using foreign keys so that it does not bother you next time around?
Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
Subqueries in the conditions can be quite inefficient. Instead you can do left joins against the related tables. Where there is no matching record, you get a null value. You can use this in the condition to select only the records whose matching record is missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null
If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint to use the pkey indices of countries et al for the existence check... but whether that is an improvement depends on your data stats, you simply have to plug it into query analyzer and try it.
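One thing to keep in mind when switching between NOT IN and NOT EXISTS is that they treat NULLs differently, which is why the IS NOT NULL guards above matter. The sketch below uses Python's sqlite3 module (SQLite rather than SQL Server) with a cut-down, invented version of the Properties and Countries tables; the three-valued logic it shows is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (CountryID INTEGER);
    CREATE TABLE Properties (PropertyID INTEGER PRIMARY KEY, CountryID INTEGER);
    INSERT INTO Countries VALUES (1), (2);
    -- 100 is valid, 101 points at a missing country, 102 has no country set.
    INSERT INTO Properties VALUES (100, 1), (101, 9), (102, NULL);
""")

NOT_IN = """
    SELECT PropertyID FROM Properties
    WHERE CountryID NOT IN (SELECT CountryID FROM Countries)
    ORDER BY PropertyID
"""
NOT_EXISTS = """
    SELECT PropertyID FROM Properties p
    WHERE NOT EXISTS (SELECT 1 FROM Countries c WHERE c.CountryID = p.CountryID)
    ORDER BY PropertyID
"""
GUARDED = """
    SELECT PropertyID FROM Properties p
    WHERE p.CountryID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Countries c WHERE c.CountryID = p.CountryID)
    ORDER BY PropertyID
"""

not_in = conn.execute(NOT_IN).fetchall()          # NULL NOT IN (...) is unknown
not_exists = conn.execute(NOT_EXISTS).fetchall()  # a NULL key never matches
guarded = conn.execute(GUARDED).fetchall()        # guard skips the NULL key

# A NULL inside the lookup table makes every NOT IN test unknown.
conn.execute("INSERT INTO Countries VALUES (NULL)")
not_in_with_null = conn.execute(NOT_IN).fetchall()

print(not_in, not_exists, guarded, not_in_with_null)
```

With this data, NOT IN and the guarded NOT EXISTS both report only property 101, the unguarded NOT EXISTS also reports property 102, and NOT IN returns nothing at all once Countries contains a NULL.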