pgSQL FULL OUTER JOIN 'WHERE' Condition - sql

I am trying to create a single query to retrieve the current price and, if a sale is running, the special sale price. When there isn't a sale on, I want store_picture_monthly_price_special.price AS special_price to return null.
Before adding the 2nd WHERE condition the query executes as I expect it to: store_picture_monthly_price_special.price returns null since there is no sale running at present.
store_picture_monthly_reference | tenure | name | regular_price | special_price
3 | 12 | 12 Months | 299.99 | {Null}
2 | 3 | 3 Months | 79.99 | {Null}
1 | 1 | 1 Month | 29.99 | {Null}
pgSQL is treating the 2nd WHERE condition as "all or none". If there is no sale running there are no results.
Is it possible to tweak this query so that I get regular pricing every time, and the special sale price either as a dollar value when a special is running or as null otherwise? Does what I am trying to do require a sub-query?
This is the query how I presently have it:
SELECT store_picture_monthly.reference AS store_picture_monthly_reference , store_picture_monthly.tenure , store_picture_monthly.name , store_picture_monthly_price_regular.price AS regular_price , store_picture_monthly_price_special.price AS special_price
FROM ( store_picture_monthly INNER JOIN store_picture_monthly_price_regular ON store_picture_monthly_price_regular.store_picture_monthly_reference = store_picture_monthly.reference )
FULL OUTER JOIN store_picture_monthly_price_special ON store_picture_monthly.reference = store_picture_monthly_price_special.store_picture_monthly_reference
WHERE
( store_picture_monthly_price_regular.effective_date < NOW() )
AND
( NOW() BETWEEN store_picture_monthly_price_special.begin_date AND store_picture_monthly_price_special.end_date )
GROUP BY store_picture_monthly.reference , store_picture_monthly_price_regular.price , store_picture_monthly_price_regular.effective_date , store_picture_monthly_price_special.price
ORDER BY store_picture_monthly_price_regular.effective_date DESC
Table "store_picture_monthly"
reference bigint,
name text,
description text,
tenure bigint,
available_date timestamp with time zone,
available_membership_reference bigint
Table store_picture_monthly_price_regular
reference bigint ,
store_picture_monthly_reference bigint,
effective_date timestamp with time zone,
price numeric(10,2),
membership_reference bigint
Table store_picture_monthly_price_special
reference bigint,
store_picture_monthly_reference bigint,
begin_date timestamp with time zone,
end_date timestamp with time zone,
price numeric(10,2),
created_date timestamp with time zone DEFAULT now(),
membership_reference bigint

The description of the problem suggests that you want a LEFT JOIN, not a FULL JOIN. FULL JOINs are quite rare, particularly in databases with well defined foreign key relationships.
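In your case, the WHERE clause is undoing the FULL JOIN anyway: the condition on the regular-price table requires valid values from the first table (which reduces the join to a LEFT JOIN), and the condition on the sale dates requires a matching row in the special-price table, which is why you get no rows when no sale is running.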
SELECT spm.reference AS store_picture_monthly_reference,
spm.tenure, spm.name,
spmpr.price AS regular_price,
spmps.price AS special_price
FROM store_picture_monthly spm INNER JOIN
store_picture_monthly_price_regular spmpr
ON spmpr.store_picture_monthly_reference = spm.reference LEFT JOIN
store_picture_monthly_price_special spmps
ON spm.reference = spmps.store_picture_monthly_reference AND
NOW() BETWEEN spmps.begin_date AND spmps.end_date
WHERE spmpr.effective_date < NOW();
Notes:
I introduced table aliases so the query is easier to write and to read.
The condition on the dates for the sale are now in the ON clause.
I removed the GROUP BY. It doesn't seem necessary. If it is, you can use SELECT DISTINCT instead. And, I would investigate data problems if this is needed.
I am suspicious about the date comparisons. NOW() has a time component. The naming of the comparison columns suggests that they are just dates with no time.
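To see the difference between filtering the sale dates in WHERE and in ON, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The table names (plan, price_regular, price_special), the sample rows, and the today parameter standing in for NOW() are all assumptions made up for the illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan (reference INTEGER, name TEXT);
CREATE TABLE price_regular (plan_reference INTEGER, effective_date TEXT, price REAL);
CREATE TABLE price_special (plan_reference INTEGER, begin_date TEXT, end_date TEXT, price REAL);
INSERT INTO plan VALUES (1, '1 Month'), (2, '3 Months');
INSERT INTO price_regular VALUES (1, '2020-01-01', 29.99), (2, '2020-01-01', 79.99);
-- hypothetical sale on plan 1 only, running 2020-02-01 .. 2020-02-15
INSERT INTO price_special VALUES (1, '2020-02-01', '2020-02-15', 19.99);
""")

def where_version(today):
    """Sale-date test in WHERE: plans without a running sale are filtered out."""
    return conn.execute("""
        SELECT p.reference, r.price, s.price
        FROM plan p
        JOIN price_regular r ON r.plan_reference = p.reference
        LEFT JOIN price_special s ON s.plan_reference = p.reference
        WHERE r.effective_date < :today
          AND :today BETWEEN s.begin_date AND s.end_date
        ORDER BY p.reference
    """, {"today": today}).fetchall()

def on_version(today):
    """Sale-date test in ON: every plan is kept, special price is NULL if no sale."""
    return conn.execute("""
        SELECT p.reference, r.price, s.price
        FROM plan p
        JOIN price_regular r ON r.plan_reference = p.reference
        LEFT JOIN price_special s
               ON s.plan_reference = p.reference
              AND :today BETWEEN s.begin_date AND s.end_date
        WHERE r.effective_date < :today
        ORDER BY p.reference
    """, {"today": today}).fetchall()

print(where_version("2020-06-01"))  # no sale running
print(on_version("2020-06-01"))
print(on_version("2020-02-10"))     # sale running for plan 1
```

With no sale running, the WHERE version returns nothing, while the ON version still lists every plan with a null special price.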

Any time you put an ordinary comparison predicate in the WHERE clause on a column from the outer-joined table, the outer join effectively becomes an inner join. The outer join adds rows filled with nulls where nothing matches, but comparing null to anything never produces true, so the WHERE takes those entire rows out again.
Consider this simpler example:
SELECT * FROM
a LEFT JOIN b ON a.id = b.id
WHERE b.col = 'value'
Is identical to:
SELECT * FROM
a INNER JOIN b ON a.id = b.id
WHERE b.col = 'value'
To resolve this, move the predicate out of the where and into the ON
SELECT * FROM
a LEFT JOIN b ON a.id = b.id AND b.col = 'value'
You can also consider:
SELECT * FROM
a LEFT JOIN b ON a.id = b.id
WHERE b.col = 'value' OR b.col IS NULL
but this might pick up data you don't want, if b.col naturally contains some nulls; it cannot differentiate between nulls that are natively present in b.col and nulls that are introduced by a fail in the join to match a row from b with a row from a (unless we also look at the nullness of the joined id column)
A
id
1
2
3
B
id, col
1, value
3, null
--wrong, behaves like inner
A left join B ON a.id=b.id WHERE b.col = 'value'
1, 1, value
--maybe wrong, b.id 3 might be unwanted
A left join B ON a.id=b.id WHERE b.col = 'value' or b.col is null
1, 1, value
2, null, null
3, 3, null
--maybe right, simpler to maintain than the above
A left join B ON a.id=b.id AND b.col = 'value'
1, 1, value
2, null, null
3, null, null
In these last two results the row count is the same; the difference is whether b.id is null for row 3. If we were counting b.id, our count could end up wrong, so it's important to appreciate this nuance of join behavior. You might even want it: to exclude row 3 but include row 2, write A LEFT JOIN b ON a.id=b.id WHERE b.col = 'value' OR b.id IS NULL. This keeps row 2 but excludes row 3, because although the join does find a b.id of 3, that row satisfies neither predicate.
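The four variants discussed above can be checked side by side. This is a small sketch using Python's sqlite3 module with the same A and B rows as the example; the run helper is just a name chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER, col TEXT);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (1, 'value'), (3, NULL);
""")

def run(join_and_filter):
    """Run SELECT a.id, b.id, b.col with the given join/filter tail."""
    sql = "SELECT a.id, b.id, b.col FROM a " + join_and_filter + " ORDER BY a.id"
    return conn.execute(sql).fetchall()

# Predicate in WHERE: behaves like an inner join.
where_only = run("LEFT JOIN b ON a.id = b.id WHERE b.col = 'value'")

# WHERE ... OR b.col IS NULL: also keeps the native NULL in b row 3.
or_col_null = run("LEFT JOIN b ON a.id = b.id WHERE b.col = 'value' OR b.col IS NULL")

# Predicate in ON: all rows of a are kept, b columns are NULL when it fails.
in_on = run("LEFT JOIN b ON a.id = b.id AND b.col = 'value'")

# WHERE ... OR b.id IS NULL: keeps unmatched row 2, drops matched row 3.
or_id_null = run("LEFT JOIN b ON a.id = b.id WHERE b.col = 'value' OR b.id IS NULL")

for name, rows in [("where_only", where_only), ("or_col_null", or_col_null),
                   ("in_on", in_on), ("or_id_null", or_id_null)]:
    print(name, rows)
```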

Related

Outer and union all on the same table?

I have recently faced a query which has been written longtime ago for an Informix database.
This query seems a bit strange and nonsense to me.
I know This query returns all the rows from the city table with the rows that match in the ocw table. If no record for a city appears in the ocw table, the returned code column for that city has a NULL value.
I understand also that UNION removes duplicates, whereas UNION ALL does not.
Is my understanding about outer and union all correct?
Can anyone explain what they try to achieve with this query and is there a better way to do this?
SELECT * FROM city as c, OUTER ocw o
WHERE c.mutual = o.code
INTO temp city_ocw;
SELECT
name ,
year ,
mutual ,
0 animalId
FROM
city_ocw
WHERE
code IS NULL
GROUP BY
1, 2, 3 , 4
UNION ALL
SELECT
name ,
year ,
mutual ,
animalId
FROM
city_ocw
WHERE
NOT code IS NULL
GROUP BY
1, 2, 3 , 4
INTO TEMP city_ocw_final ;
@TheImpaler is right that grouping by 5 columns when your result set only has 4 columns doesn't make much sense, but I'll ignore that.
As I see it, your understanding of OUTER and UNION ALL is correct. The goal appears to be to generate a stacked result set with 2 versions of city joined to ocw, 1 with an actual animalId, and 1 with animalId = 0.
I'm not familiar with OUTER being used by itself (I always use it with LEFT/RIGHT/FULL), but would assume the default to be LEFT OUTER.
If no record for a city appears in the ocw table, the returned code column for that city has a NULL value.
That would be true, but the line WHERE c.mutual = o.code will make that unimportant. You could rewrite the join as LEFT JOIN ocw o ON c.mutual = o.code
The GROUP BY may have been done in the past for some aggregate column that no longer exists... perhaps that's column 5?
I think it could be redone as:
SELECT name,
year,
mutual,
0 as animalId
FROM city c
LEFT JOIN ocw o ON c.mutual = o.code
UNION --don't need the all since animalId ensures rows are different
SELECT name,
year,
mutual,
animalId
FROM city c
LEFT JOIN ocw o ON c.mutual = o.code
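For comparison, here is a hedged sketch of what the original two-branch query computes if its WHERE code IS NULL / NOT code IS NULL filters are kept, next to a single-pass equivalent that uses a CASE expression. It runs on Python's sqlite3 module, not Informix. The column layout (animalId living in ocw) and the sample rows are assumptions, since the question does not show the table definitions, and SELECT DISTINCT stands in for the GROUP BY 1, 2, 3, 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical schemas and rows, for illustration only
CREATE TABLE city (name TEXT, year INTEGER, mutual TEXT);
CREATE TABLE ocw (code TEXT, animalId INTEGER);
INSERT INTO city VALUES ('Paris', 2020, 'A'), ('Rome', 2021, 'B'), ('Oslo', 2022, 'Z');
INSERT INTO ocw VALUES ('A', 7), ('B', 9);
""")

# Two branches, as in the original: unmatched cities get animalId 0,
# matched cities keep their real animalId.
two_branch = conn.execute("""
    WITH city_ocw AS (
        SELECT c.name, c.year, c.mutual, o.code, o.animalId
        FROM city c LEFT JOIN ocw o ON c.mutual = o.code
    )
    SELECT DISTINCT name, year, mutual, 0 AS animalId
    FROM city_ocw WHERE code IS NULL
    UNION ALL
    SELECT DISTINCT name, year, mutual, animalId
    FROM city_ocw WHERE code IS NOT NULL
""").fetchall()

# Single pass: decide per row whether the join found a match.
single_pass = conn.execute("""
    SELECT DISTINCT c.name, c.year, c.mutual,
           CASE WHEN o.code IS NULL THEN 0 ELSE o.animalId END AS animalId
    FROM city c LEFT JOIN ocw o ON c.mutual = o.code
""").fetchall()

print(sorted(two_branch))
print(sorted(single_pass))
```

On this sample data both forms return the same rows; whether that holds for the real tables depends on where animalId actually comes from.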

Left outer join not fetching all records of left table

I have scenario here, have 2 tables say A and B.
Table A has emp_id and date, and table B has emp_id and two dates, appl_d and expr_d.
When I did the left join in Hive like this,
select A.emp_id
from A
LEFT JOIN B
ON a.emp_id=b.emp_id
where A.date between B.appl_d and B.expr_d
I see there is one employee in table A who is not in B. With a LEFT JOIN that emp_id should come through, but it does not, because in the WHERE condition both appl_d and expr_d are NULL...
How can I handle NULL's so that the particular emp_id should come into my result. I tried coalesce function also, but no luck... tried putting default value but still no luck...
Let me know for any details. Thanks in advance... and these dates are in string format..
The BETWEEN condition is never true for NULLs, so the left join is effectively transformed into an inner join. Add OR b.emp_id IS NULL (the join key) to let unjoined records through; there is no need to add the same condition for every column used in the BETWEEN.
select *
from A
LEFT JOIN B ON a.emp_id=b.emp_id
LEFT JOIN C on a.emp_id=c.emp_id
where ((A.date between B.appl_d and B.expr_d) OR b.emp_id is NULL)
and
((a.date between c.del_d and c.fin_d) OR c.emp_id is NULL)
And this is a test:
with
A as
(
select stack(3,100,'2019-01-13',
200,'2019-01-13',
300,'2019-01-13'
) as (emp_id, date)
),
B as (
select stack(1,100,'2019-12-30','3000-01-01') as (emp_id, appl_d, expr_d)
),
C as
(
select stack(1,100,'2015-06-07', '9999-12-31') as (emp_id, del_d, fin_d)
)
select A.*
from A
LEFT JOIN B ON a.emp_id=b.emp_id
LEFT JOIN C on a.emp_id=c.emp_id
where ((A.date between B.appl_d and B.expr_d) OR b.appl_d is NULL)
and
((a.date between c.del_d and c.fin_d) OR c.emp_id is NULL)
Result:
OK
200 2019-01-13
300 2019-01-13
Time taken: 84.475 seconds, Fetched: 2 row(s)
Obviously this approach does not work. emp_id=100 should be in the dataset returned.
And the question is interesting, I will continue investigating a bit later. You guys can use my test to find the working solution.
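One way to keep emp_id=100 is to move the range conditions from WHERE into the ON clauses, as in the other answers in this thread. The sketch below replays the test data above in Python's sqlite3 module, not Hive, so treat it as an illustration only: whether Hive accepts BETWEEN inside ON depends on the Hive version, which is an assumption to verify. The date column is renamed dt here purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (emp_id INTEGER, dt TEXT);
CREATE TABLE b (emp_id INTEGER, appl_d TEXT, expr_d TEXT);
CREATE TABLE c (emp_id INTEGER, del_d TEXT, fin_d TEXT);
INSERT INTO a VALUES (100, '2019-01-13'), (200, '2019-01-13'), (300, '2019-01-13');
INSERT INTO b VALUES (100, '2019-12-30', '3000-01-01');
INSERT INTO c VALUES (100, '2015-06-07', '9999-12-31');
""")

# Range test in WHERE with "OR key IS NULL": emp 100 matches b on emp_id,
# fails the BETWEEN, and is dropped from the result.
where_or_null = conn.execute("""
    SELECT a.emp_id
    FROM a
    LEFT JOIN b ON a.emp_id = b.emp_id
    LEFT JOIN c ON a.emp_id = c.emp_id
    WHERE (a.dt BETWEEN b.appl_d AND b.expr_d OR b.emp_id IS NULL)
      AND (a.dt BETWEEN c.del_d AND c.fin_d OR c.emp_id IS NULL)
    ORDER BY a.emp_id
""").fetchall()

# Range test in ON: every row of a survives; b/c columns are NULL
# whenever the key or the date range does not match.
range_in_on = conn.execute("""
    SELECT a.emp_id, b.appl_d, c.del_d
    FROM a
    LEFT JOIN b ON a.emp_id = b.emp_id AND a.dt BETWEEN b.appl_d AND b.expr_d
    LEFT JOIN c ON a.emp_id = c.emp_id AND a.dt BETWEEN c.del_d AND c.fin_d
    ORDER BY a.emp_id
""").fetchall()

print(where_or_null)
print(range_in_on)
```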

Transpose only certain data in SQL

My data looks like this:
Company Year Total Comment
Comp A 01-01-2000 5,000 Checked
Comp A 01-01-2001 6,000 Checked
Comp B 05-05-2007 3,000 Not checked completely
Comp B 05-05-2008 4,000 Checked
Comp C 18-01-2003 1,500 Not checked completely
Comp C 18-01-2002 3,500 Not checked completely
I've been asked to transpose certain data, but I do not believe this can be done using SQL (Server) so that it looks like this:
Company Base Date Base Date-1 Comment Base Date Comment Base Date-1
Comp A 01-01-2001 01-01-2000 Checked Checked
Comp B 05-05-2008 05-05-2007 Checked Not completely checked
Comp C 18-01-2003 18-01-2002 Not completely checked Not completely checked
I have never built anything like this. If I would then maybe Excel is a better alternative? How should I tackle this?
Is it possible using SELECT MAX(Base Date) and MIN(Base Date)? And how would I then tackle the strings like that..
You can use a self join to do this. However, you should think about dates like February 29 as they only occur in leap years.
select t1.company,t1.year as basedate,t2.year as basedate_1,
t1.comment as comment_basedate,t2.comment as comment_basedate_1
from t t1
left join t t2 on t1.company=t2.company and dateadd(year,1,t2.year)=t1.year
Change the left join to an inner join if you only need results where both the date values exist for a company. This solution assumes there can only be one comment per day.
I'd assign a row number to each record, partitioned by company and ordered by year desc, through an analytical function in a common table expression... then use a left self join... on the row number + 1 and company.
This assumes you only want 1 record per company using the 2 most recent years. and if only 1 record exists for a company null values are acceptable for the second year. If not we can change the left join to an inner and eliminate both records...
We use a common table expression (though an inline view would work as well) to assign a row number to each record. That value is then made available in our self join so we don't have to worry about different dates and max values. We then use our RowNumber (RN) and company to join the 2 desired records together. To save on some performance we limit 1 table to RN 1 and the second table to RN 2.
WITH CTE AS (
SELECT *, Row_Number() over (Partition by Company Order by Year Desc) RN FROM TABLE)
SELECT A.Company
, A.Year as Base_Date
, B.Year as Base_Date1
, A.comment as Base_Date_Comment
, B.Comment as Base_Date1_Comment
FROM CTE A
LEFT JOIN CTE B
on A.RN+1 = B.RN
and A.Company = B.Company
and B.RN = 2
WHERE A.RN = 1
Note the limit on RN=2 must be on the join since it's an outer join or we would eliminate the companies without 2 years. (in essence making the left join an inner)
This approach makes all columns of the data available for each row.
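The row-number approach can be tried out on the sample data from the question. This is a sketch in Python's sqlite3 module rather than SQL Server (window functions need SQLite 3.25 or later). The table name company_totals, the ISO date strings, and the extra single-row 'Comp D' are assumptions added for the illustration; Comp D is there only to show the left join producing nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_totals (company TEXT, year TEXT, total INTEGER, comment TEXT);
INSERT INTO company_totals VALUES
  ('Comp A', '2000-01-01', 5000, 'Checked'),
  ('Comp A', '2001-01-01', 6000, 'Checked'),
  ('Comp B', '2007-05-05', 3000, 'Not checked completely'),
  ('Comp B', '2008-05-05', 4000, 'Checked'),
  ('Comp C', '2003-01-18', 1500, 'Not checked completely'),
  ('Comp C', '2002-01-18', 3500, 'Not checked completely'),
  ('Comp D', '2010-03-03', 2000, 'Checked');  -- hypothetical single-year company
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY year DESC) AS rn
        FROM company_totals
    )
    SELECT a.company,
           a.year    AS base_date,
           b.year    AS base_date1,
           a.comment AS base_date_comment,
           b.comment AS base_date1_comment
    FROM cte a
    LEFT JOIN cte b
           ON b.company = a.company
          AND b.rn = a.rn + 1
          AND b.rn = 2           -- kept in ON so single-year companies survive
    WHERE a.rn = 1
    ORDER BY a.company
""").fetchall()

for row in rows:
    print(row)
```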
If there are only two rows each, then that's pretty simple. If there are more than two rows, you could do something like this -- essentially joining all rows, then making sure A represents the earliest row and B represents the latest row.
SELECT A.Company, A.Year AS [Base Date], B.Year AS [Base Date 1],
A.Comment AS [Comment Base Date], B.Comment AS [Comment Base Date 1]
FROM MyTable A
INNER JOIN MyTable B ON A.Company = B.Company
WHERE A.Year = (SELECT MIN(C.YEAR) FROM MyTable C WHERE C.Company = A.Company)
AND B.Year = (SELECT MAX(C.YEAR) FROM MyTable C WHERE C.Company = B.Company)
There might be a more efficient way to do this with Row_Number or something.

Handling null values with join across multiple tables

My mind is exploding right now.. I can't get any of this to work the way I want to! SQL is seriously such a pain in the butt. (/End Rant)
I have three tables that have some common columns to link with. I am trying to retrieve the ID from one table based on the name from the middle table, which in turn is based on the code from the farthest table. (Excuse my vocabulary, I am not skilled with SQL or its lingo.) If the farthest table has a code not found in the middle table, it should default to a certain value. The first table will then return the default ID for null values, etc.
Example,
tblCounty table has an ID and name column. I am to return the ID from tblCounty based on the name column matching the name column of tblCode.
tblCode has two columns name and code. tblCode returns the respective name based on the matching code column with tblAddress's code column.
tblAdress has many columns, but shares in common a code field.
My attempt,
INSERT INTO vendor (CountyID, Contact)
SELECT
(SELECT a.id
FROM county a
WHERE a.name = (CASE WHEN (SELECT TOP(1) c.countyID
FROM tblAdress c
INNER JOIN tblCode d ON c.CountyID = d.CodeID
WHERE d.CodeID = b.CountyID) IS NULL THEN '**NONE**'
ELSE (SELECT a.CodeName
FROM tblCode a
WHERE a.CodeID = b.CountyID) END)),
b.Contact
FROM
tblAdress b
The error I am receiving is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Now of course I googled this and looked at results on StackOverflow, but I was unable to apply what I found to my problem.
Vendor:
CountyID | ....
-------------------
1 | ...
2 | ...
3 | ...
2 | ...
tblCounty:
ID | Name | ...
----------------------
1 | **None**
2 | NYC
3 | Buffalo
tblCode:
Name | Code
--------------
**None** | **None**
NYC | 56A
Buffalo | 75B
tblAdress:
Code | ....
----------------
**None** | ....
56A | ......
75B | .....
56A | .....
Using the above tables, I want to transfer all data out of tblAdress into another table (vendor). In the process I will convert column Code to tblCode's column name via code comparison, then to tblCounty.ID via name comparison.
Essentially a catch all is needed. If a code in tblAddress does not exist in tblCode or the code is null in tblAddress, it will return a default value (None). Then tblCounty will convert that default value into ID = 1, then store it into the Vendor table.
Edit
(SELECT TOP(1)
c.ID
FROM
dbo.Address a
LEFT OUTER JOIN
dbo.tblCode cd ON ISNULL(CASE a.CountyID WHEN ' ' THEN '**None**' ELSE a.CountyID END, '**None**') = cd.CodeID
LEFT OUTER JOIN
dbo.tblCounty c ON c.NAME = cd.CodeName
WHERE a.CountyID = b.countyID)
Firstly, your database doesn't seem to be following the best practices of creating a database.
Ideally the design of the relationships and tables should prevent you from having to do null checks in joins, and the majority of the time a simple left join would do most of what you want. Could you use constraints and ISNULLs when the data is being added to ensure its integrity? Also, I would advise against joining tables on text like county if you can. It would be much more elegant to use an integer primary key.
I suggest that you make sure that your design is solid before progressing, as these problems may just multiply in the future.
That being said, if you are insistent on continuing the way you're going, the following query should do what you want:
SELECT tblCounty.ID,
ISNULL(tblAddress.Code, 'none')
--Whatever you want to select
FROM tblCounty
LEFT JOIN tblCode ON tblCounty.Name = tblCode.Name
LEFT JOIN tblAddress ON ISNULL(tblCode.Code, 'none') = ISNULL(tblAddress.Code, 'none')
Would this not get you the desired results?
select isnull(a.ID, '**NONE**') as CountyID, c.Contact
from tblCounty a
left join tblCode b on a.Name = b.Name
left join tblAddress c on b.Code = c.Code
OK, so let's try to build this query:
from tblCounty, you want the ID?
tblCounty and tblCode are linked via the name column? (bad idea - opens all sorts of issues - I'd rather use code or something!)
tblAdress is linked to tblCode via the code column
Right?
OK, so let's try this:
if you want to "link" two tables that have a column in common, and you want only rows that exist in both tables - use an INNER JOIN
if you want to "link" two tables that have a column in common, and you want all rows, even those that don't exist in the "right" table, use a LEFT OUTER JOIN
So I'd say you need something like this:
SELECT
c.ID, c.Name, ...(whatever other columns you want),
-- if there's no entry in `tblAddress`, then `a.Name` will be `NULL`
-- so just replace that `NULL` with your default value
ISNULL(a.Name, '*DEFAULT NAME*')
FROM
dbo.tblCounty c
INNER JOIN
dbo.tblCode cd ON c.Name = cd.Name
LEFT OUTER JOIN
dbo.tblAddress a ON cd.Code = a.Code
Update: OK, so I tried with your sample data - how about this query?
SELECT
c.ID, cd.Code,
a.StreetName
FROM
dbo.tblAdress a
LEFT OUTER JOIN
dbo.tblCode cd ON ISNULL(a.Code, 'None') = cd.COde
LEFT OUTER JOIN
dbo.tblCounty c ON c.NAME = cd.NAME

SQL Count on multiple joins with dynamic WHERE

My issue is that I have a Select statement that has a where clause that is generated on the fly. It is joined across 5 tables.
I basically need a Count of each DISTINCT instance of a USER ID in table 1 that falls into the scope of the WHERE. This has to be able to be executed in one statement as well. So, essentially, I can't do a global GROUP BY because of the other 4 tables' data I need returned.
If I could get a column that had the count that was duplicated where the primary key column is that would be perfect. Right now this is what I'm looking at as my query:
SELECT *
FROM TBL1 1
INNER JOIN TBL2 2 On 2.FK = 1.FK
INNER JOIN TBL3 3 On 3.PK = 2.PK INNER JOIN TBL4 4 On 4.PK = 3.PK
LEFT OUTER JOIN TBL5 5 ON 4.PK = 5.PK
WHERE 1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
4.Column
, 3.Column
, 3.Column2
, 1.Date_Time_In DESC
So instead of selecting all columns, I will be filtering it down to about 5 or 6 but with that I need something like a Total column that is the Distinct count of TBL1's Primary Key that applies the WHERE clause that has a possibility of growing and shrinking in size.
I almost wish there was a way to apply the same WHERE clause to a subselect because I realize that would work but don't know of a way other than creating a variable and just placing it in both places which I can't do either.
If you are using SQL Server 2005 or higher, you could use one of the AGGREGATE OVER functions.
SELECT *
, COUNT(UserID) OVER(PARTITION BY UserID) AS 'Total'
FROM TBL1 1
INNER JOIN TBL2 2 On 2.FK = 1.FK
INNER JOIN TBL3 3 On 3.PK = 2.PK INNER JOIN TBL4 4 On 4.PK = 3.PK
LEFT OUTER JOIN TBL5 5 ON 4.PK = 5.PK
WHERE 1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
4.Column, 3.Column, 3.Column2, 1.Date_Time_In DESC
Something like adding:
inner join (select pk, count(distinct user_id) as total from tbl1 WHERE Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00' group by pk) as tbl1too on 1.PK = tbl1too.PK
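Because window functions are evaluated after the WHERE clause, the filter only has to be written once. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) with a made-up single table visits instead of the five joined tables. Alongside the per-user row count it computes the number of distinct users in the filtered set with a DENSE_RANK trick, which is swapped in here because COUNT(DISTINCT ...) OVER is not available in SQLite; the trick assumes user_id is never null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical stand-in for TBL1
CREATE TABLE visits (pk INTEGER, user_id TEXT, date_time_in TEXT);
INSERT INTO visits VALUES
  (1, 'u1', '2010-11-16 09:00:00'),
  (2, 'u1', '2010-11-20 09:00:00'),
  (3, 'u2', '2010-11-18 09:00:00'),
  (4, 'u3', '2010-12-05 09:00:00');  -- outside the date range
""")

rows = conn.execute("""
    SELECT pk,
           user_id,
           -- rows per user within the filtered set
           COUNT(*) OVER (PARTITION BY user_id) AS rows_for_user,
           -- distinct users in the filtered set: ascending rank + descending rank - 1
           DENSE_RANK() OVER (ORDER BY user_id)
             + DENSE_RANK() OVER (ORDER BY user_id DESC) - 1 AS distinct_users
    FROM visits
    WHERE date_time_in BETWEEN ? AND ?
    ORDER BY pk
""", ("2010-11-15 12:00:00", "2010-11-30 12:00:00")).fetchall()

for row in rows:
    print(row)
```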