Selecting Max Value, However Prioritising Certain Values - sql

I have three tables that are joined. TableA has unique values for Column1 (ID) and joins to TableC on Column1, where the values are non-unique. I'm currently picking the TableC row with the max value of Column2 and then returning a value from TableB, which is simply joined to TableC.
However, I want to adjust this so that if the TableC row whose TableB.Column2 is 'KEYVALUE' has a TableC.Column2 greater than 0, that row is chosen as the max; if it is 0, the max is chosen normally by numeric value.
The current query I have is this:
Select [TableA].Column2,
FIRST_VALUE([TableB].Column2) OVER (PARTITION BY [TableA].Column2 ORDER BY MAX([TableC].Column2) Desc)
From [TableC] Left Join
[TableA]
On [TableA].Column1 = [TableC].Column1 Left Join
[TableB]
On [TableB].Column3 = [TableC].Column3
What I am expecting to happen is:
If TableC.Column2 > '0' where TableB.Column2 = 'KEYVALUE', then show TableB.Column2 for that row (joined on TableC.Column3); however, if TableC.Column2 = '0' where TableB.Column2 = 'KEYVALUE', then show the [TableB].Column2 that goes with MAX([TableC].Column2).
Sample Data:
Example Output:
S7000,KEYVALUE
S6500,OTHERVALUE1
Hope that all makes sense, thank you.

I find your conditions hard to follow, but you seem to want apply:
Select a.*, bc.column2
From a outer apply
(select top (1) b.column2
from c join
b
on c.column3 = b.column3
where c.column1 = a.column1
order by (case when c.column2 > 0 and b.column2 = 'KEYVALUE'
then 1
else 2
end),
c.column2 desc
) bc;
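The ordering trick in the query above can be tried on a small data set. This is only a minimal sketch under assumptions: it uses Python's built-in sqlite3 module, and because SQLite has no OUTER APPLY, a correlated subquery with LIMIT 1 stands in for it. The table contents are invented to match the example output in the question, not taken from the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (column1 INTEGER, column2 TEXT);
    CREATE TABLE b (column3 INTEGER, column2 TEXT);
    CREATE TABLE c (column1 INTEGER, column2 INTEGER, column3 INTEGER);
    INSERT INTO a VALUES (1, 'S7000'), (2, 'S6500');
    INSERT INTO b VALUES (10, 'KEYVALUE'), (20, 'OTHERVALUE1');
    -- S7000: the KEYVALUE row has 5 (> 0), OTHERVALUE1 has 9
    -- S6500: the KEYVALUE row has 0, OTHERVALUE1 has 3
    INSERT INTO c VALUES (1, 5, 10), (1, 9, 20), (2, 0, 10), (2, 3, 20);
""")

# Rows matching the priority rule sort first (1); everything else (2)
# falls back to plain descending order on c.column2.
query = """
    SELECT a.column2,
           (SELECT b.column2
            FROM c JOIN b ON c.column3 = b.column3
            WHERE c.column1 = a.column1
            ORDER BY CASE WHEN c.column2 > 0 AND b.column2 = 'KEYVALUE'
                          THEN 1 ELSE 2 END,
                     c.column2 DESC
            LIMIT 1) AS chosen
    FROM a
    ORDER BY a.column1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this made-up data, S7000 picks KEYVALUE even though OTHERVALUE1 has the larger number, while S6500 falls back to the numeric maximum and picks OTHERVALUE1.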

How to do "case when exists..." in spark sql

What I am trying to do is
case when exists (select 1 from table B where A.id = B.id and B.value in (1,2,3)) then 'Y' else 'N' end as Col_1
It seems like "left semi join" can take care of multiple matching issue, but my understanding is that "left semi join" does not allow using columns from the right (B) table, so how can I add condition "B.value in (1,2,3)"?
The normal way to do this is to left outer join to a summary of table b:
Select a.id, Case When IsNull(b.id) Then 'N' else 'Y' end as Col_1
From A Left Outer Join
(Select distinct id from tableb where value in (1,2,3)) b On A.id=b.id
That way you are not repeatedly executing a lookup query for every id in A.
Addition
Your comment indicated that you are trying to create multiple Y/N columns based on b values. Your example had a Y/N for col1 when there was a 1,2,3 and a Y/N for col2 when there was a 4,5,6.
You can get there easily with one summarization of table b :
Select a.id, Case When IsNull(b.val123) Then 'N' else 'Y' end as Col_1,
Case When IsNull(b.val456) Then 'N' Else 'Y' end as Col_2
From A Left Outer Join
(Select id, max(Case When value in (1,2,3) Then 'Y' End) as val123,
max(Case When value in (4,5,6) Then 'Y' End) as val456
From tableb
Group By id) b On A.id=b.id
This still accomplishes that lookup with only one summarization of table b.
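The single-summary approach above can be checked with a small runnable sketch. It assumes Python's built-in sqlite3 module rather than Spark, so the standard `IS NULL` test replaces `IsNull(...)`, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE tableb (id INTEGER, value INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO tableb VALUES (1, 2), (1, 5), (2, 6), (2, 9);
""")

# One pass over tableb produces a flag per value group; MAX ignores the
# NULLs yielded by CASE when no row of that id falls in the group.
query = """
    SELECT A.id,
           CASE WHEN b.val123 IS NULL THEN 'N' ELSE 'Y' END AS Col_1,
           CASE WHEN b.val456 IS NULL THEN 'N' ELSE 'Y' END AS Col_2
    FROM A
    LEFT OUTER JOIN (
        SELECT id,
               MAX(CASE WHEN value IN (1, 2, 3) THEN 'Y' END) AS val123,
               MAX(CASE WHEN value IN (4, 5, 6) THEN 'Y' END) AS val456
        FROM tableb
        GROUP BY id
    ) b ON A.id = b.id
    ORDER BY A.id
"""
flags = conn.execute(query).fetchall()
print(flags)
```

Here id 1 gets Y for both columns, id 2 gets N and Y, and id 3 (absent from tableb) gets N for both.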

Boolean - Does ID Exist in Table?

I have two tables... A master ID table and a results ID table with only a few IDs from the master table. I'm looking to create the following SQL Query:
Select
A.ID
(Case when B.ID is in A.ID 1 Else 0 End) as is_found
From
master_table as A
LEFT JOIN results_table as B
ON A.ID = B.ID
The resulting table should have all IDs from master table with a boolean column saying if the ID was found in the results table. Thank you for your help!!
I would use case . . . exists:
Select mt.id,
(case when exists (select 1 from results_table rt where rt.id = mt.id) then 1 else 0 end) as is_found
From master_table mt;
First, consider the case where results_table will have either zero or one matching row; in this case, the LEFT JOIN will always give one row for each ID, and B.ID will be NULL if there is no corresponding row in results_table.
We can therefore use a simple CASE to test this:
Select
A.ID,
CASE WHEN B.ID IS NOT NULL THEN 1 ELSE 0 END as is_found
From
master_table as A
LEFT JOIN results_table as B
ON A.ID = B.ID
If there may be more than one row in results_table for the same ID, the LEFT JOIN may in turn create several rows, one for each match.
The result of the CASE expression will be the same for every row with a given A.ID: if there are zero matches, the ID occurs once with value 0, and if there are one or more, every row for that ID has the value 1. So we can simply take distinct values of the entire query:
Select Distinct
A.ID,
CASE WHEN B.ID IS NOT NULL THEN 1 ELSE 0 END as is_found
From
master_table as A
LEFT JOIN results_table as B
ON A.ID = B.ID
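Both approaches above can be compared side by side. The following is a minimal sketch using Python's built-in sqlite3 module with invented rows, where ID 2 appears twice in results_table to exercise the duplicate case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_table (id INTEGER);
    CREATE TABLE results_table (id INTEGER);
    INSERT INTO master_table VALUES (1), (2), (3);
    INSERT INTO results_table VALUES (2), (2), (3);
""")

# EXISTS yields exactly one row per master ID, however many matches exist.
exists_rows = conn.execute("""
    SELECT mt.id,
           CASE WHEN EXISTS (SELECT 1 FROM results_table rt WHERE rt.id = mt.id)
                THEN 1 ELSE 0 END AS is_found
    FROM master_table mt
    ORDER BY mt.id
""").fetchall()

# LEFT JOIN repeats ID 2 once per match, so DISTINCT collapses the repeats.
join_rows = conn.execute("""
    SELECT DISTINCT A.id,
           CASE WHEN B.id IS NOT NULL THEN 1 ELSE 0 END AS is_found
    FROM master_table AS A
    LEFT JOIN results_table AS B ON A.id = B.id
    ORDER BY A.id
""").fetchall()

# Without DISTINCT the join returns more rows than master_table holds.
plain_join_count = conn.execute("""
    SELECT COUNT(*) FROM master_table AS A
    LEFT JOIN results_table AS B ON A.id = B.id
""").fetchone()[0]

print(exists_rows, join_rows, plain_join_count)
```

On this data the two queries agree, while the plain join without DISTINCT returns four rows for three master IDs.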

Combining SQL Queries to pass result from 1 as a parameter of the 2nd (SQL Server)

I am a little out of practice with SQL and I am trying to verify some data that has been converted in a system. Some of the queries I originally developed prior to the conversion are not proving out the work. I have been able to trace the source data back and verify that conversion was correct, but this is on an account by account basis. I would like to have a query to show the full dataset.
I have been able to work a solution down to 2 queries, but I cannot figure out how to combine them into one piece to show the full data set, where one value from the first query needs to be an element in the second query.
Query 1
select distinct
CreatedDate, AccountNum
From
Table1 A
Join
Table2 B on A.Column1 = B.Column1 and a.Column2 = b.Column2
Join
Table3 C on A.Column3 = C.Column3 and A.Column4 = C.Column4
where
Condition A and Condition B
Query 2
Select distinct
AccountNum, Responsible
From
Table3 D
Join
Table4 E on D.Column1 = E.Column2
where
StartDate <= 'DateValue' and EndDate > 'DateValue'
I would like to use the CreatedDate value from query 1 as the DateValue in query 2, but I have not found a solution to give the results I am looking for.
If I add a qualifier to each query, like account number, I end up with 1 result from query 1. I then put that CreatedDate into query 2 and I get the results I want. If I only have the account number on the 2nd query, I get two results: one from time period A to B with a Responsible value of X, and a 2nd from time period C to D with a Responsible value of Y, which is the period the CreatedDate value falls in. Everything I have tried to combine these queries ends up with a Responsible value of X (or no results), when I want that Y value.
I have not been able to successfully integrate the two queries, so that I can have that CreatedDate value passed as a parameter to figure out the Responsible value.
A solution that would work would be to create an intermediate table for the results of the 1st query and then join that table to 2nd query. However, I do not have access to create/insert/update tables/records on the database, so I cannot use this method.
I think you are looking for this
SELECT DISTINCT accountnum,
responsible
FROM table1 A
JOIN table2 B
ON A.column1 = B.column1
AND a.column2 = b.column2
JOIN table3 C
ON A.column3 = C.column3
AND A.column4 = C.column4
JOIN table4 D
ON D.column1 = C.column2
AND startdate <= createddate
AND enddate > createddate
where Condition A and Condition B
Note: You may have to add the proper table aliases to the columns.
Select distinct AccountNum, Responsible
From Table3 D
Join Table4 E on D.Column1 = E.Column2
Join (
select distinct CreatedDate, AccountNum
From Table1 A
Join Table2 B on A.Column1 = B.Column1 and a.Column2 = b.Column2
Join Table3 C on A.Column3 = C.Column3 and A.Column4 = C.Column4
where Condition A and Condition B
) X
on D.AccountNum=X.AccountNum
and D.StartDate <= X.CreatedDate and EndDate > X.CreatedDate
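The idea of joining to the first query as a derived table and comparing its CreatedDate against the date range can be sketched on toy data. Everything here is an assumption for illustration: the two tables `created` and `periods` are hypothetical stand-ins for the results of the asker's joins, dates are stored as ISO strings so they compare correctly as text, and Python's built-in sqlite3 module is used instead of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for the output of Query 1
    CREATE TABLE created (AccountNum INTEGER, CreatedDate TEXT);
    -- hypothetical stand-in for the Table3/Table4 join of Query 2
    CREATE TABLE periods (AccountNum INTEGER, Responsible TEXT,
                          StartDate TEXT, EndDate TEXT);
    INSERT INTO created VALUES (100, '2020-06-15');
    INSERT INTO periods VALUES
        (100, 'X', '2020-01-01', '2020-04-01'),
        (100, 'Y', '2020-04-01', '2020-09-01');
""")

# The derived table x supplies CreatedDate per account; the range
# predicate in the ON clause keeps only the period containing it.
result = conn.execute("""
    SELECT DISTINCT p.AccountNum, p.Responsible
    FROM periods p
    JOIN (SELECT DISTINCT CreatedDate, AccountNum FROM created) x
      ON p.AccountNum = x.AccountNum
     AND p.StartDate <= x.CreatedDate
     AND p.EndDate > x.CreatedDate
""").fetchall()
print(result)
```

Only the period with Responsible value Y contains the created date, so that is the single row returned.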
Another solution is to make the first query into a table-valued UDF:
Create function GetCreateDateAndAcctId([Parameters for 2 conditions here])
Returns table As
Return
select distinct CreatedDate, AccountNum
From Table1 a
Join Table2 b
on b.Column1 = a.Column1
and b.Column2 = a.Column2
Join Table3 c
on c.Column3 = a.Column3
and c.Column4 = a.Column4
where condition1 -- here put predicate
and condition2 -- using input parameters
Then, to use it, just include it as a table in your second query like this:
Select distinct AccountNum, Responsible
From Table3 d
Join Table4 e
on e.Column2 = d.Column1
outer apply dbo.GetCreateDateAndAcctId(Parameters) cd
where StartDate <= cd.CreatedDate and EndDate > cd.CreatedDate
If you do this, the logic for the first query remains in a separate database object, which gives you reusability (you can use it in any other process without copying it) and better maintainability (it's in only one place for fixing bugs, enhancements, etc.). Also, since it's an inline table-valued UDF, the SQL Server query processor will actually combine it with the second query's SQL into a single reusable compiled execution plan.

Hive SQL - Refining JOIN query to ignore Null values

I'm a little new with SQL so bear with me.
I have two tables, each with an ID column. Table A has a column titled role, Table B has a column titled outcome. I want to query these tables to find which rows based on the ID have role = 'PS' and outcome = 'DE'. Here is my code:
SELECT count(*)
FROM A JOIN B
ON (A.id = B.id
AND A.role = 'PS'
AND B.outcome = 'DE')
I've been searching the internet for a way to do this so that it doesn't include rows that have null values for either A.role or B.outcome.
The above code returns, let's say, 40,100, even though the total number of entries in B where B.outcome = 'DE' is only 40,000. So it is obviously including entries that do not fit my conditions. Is there a way to better refine my query?
Your query already excludes rows with a null value in A.role. After all, null = 'PS' is not true, and you're using an inner join.
There's an easy explanation of how you can retrieve more rows from the join than there are in B. Say you have these rows for A:
A.id A.role
1 'A'
1 'A'
And these rows for B:
B.id B.outcome
1 'A'
1 'A'
Then this query:
select *
from A
join B
on A.id = B.id and A.role = 'A' and B.outcome = 'A'
will return 4 rows. That's more than there are in table A or B!
So I'd investigate whether id is unique:
select count(*) from A group by id having count(*) > 1
select count(*) from B group by id having count(*) > 1
If either of these queries returns any rows, id is not unique in that table. Since a join repeats rows for each match, that would explain a large increase in the number of returned rows.
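The row multiplication described above is easy to reproduce. This minimal sketch uses Python's built-in sqlite3 module (an assumption; the question is about Hive) with the same two-row tables as the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, role TEXT);
    CREATE TABLE B (id INTEGER, outcome TEXT);
    INSERT INTO A VALUES (1, 'A'), (1, 'A');
    INSERT INTO B VALUES (1, 'A'), (1, 'A');
""")

# Each of the 2 rows in A matches each of the 2 rows in B.
joined = conn.execute("""
    SELECT COUNT(*) FROM A JOIN B
    ON A.id = B.id AND A.role = 'A' AND B.outcome = 'A'
""").fetchone()[0]

# Any row returned here identifies an id that occurs more than once.
dup_ids_a = conn.execute(
    "SELECT id, COUNT(*) FROM A GROUP BY id HAVING COUNT(*) > 1"
).fetchall()

print(joined, dup_ids_a)
```

Selecting id alongside the count, as done here, also tells you which ids are duplicated rather than only how many times.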

Update a column in a table with values from two other tables

I have to update a column in Table 'A' with values from Table 'B'. If any value in Table 'B' is null or empty then I have to get the value from table 'C'.
Manu
Use:
UPDATE A
SET column = (SELECT COALESCE(b.val, c.value)
FROM B b
JOIN C c ON c.col = b.col)
COALESCE will return the first non-null value from the list of columns, processing from left to right.
What's odd is you haven't provided how tables B and C relate to one another. If they don't relate in any way, you're looking at a cartesian product of the two tables (not ideal). My answer uses a JOIN, in hopes it is possible depending on the data.
Basically:
UPDATE a SET a.FIELD = (CASE WHEN b.FIELD IS NULL or b.FIELD = '' THEN c.FIELD ELSE b.FIELD END)
FROM TABLEA a
LEFT JOIN TABLEB b on a.id = b.someid
LEFT JOIN TABLEC c on a.id = c.someid
Joins may or may not be LEFT, depending on your data, and you may want to handle the case where both b.field and c.field are null.
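The fallback rule (use B unless it is null or empty, otherwise use C) can be sketched in a runnable form. This is an illustration under assumptions, not the answerers' exact statement: it runs on Python's built-in sqlite3 module, replaces the SQL Server `UPDATE ... FROM` with correlated subqueries, uses `NULLIF` to turn empty strings into NULL so `COALESCE` can handle both cases, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (id INTEGER, FIELD TEXT);
    CREATE TABLE TABLEB (someid INTEGER, FIELD TEXT);
    CREATE TABLE TABLEC (someid INTEGER, FIELD TEXT);
    INSERT INTO TABLEA VALUES (1, 'old'), (2, 'old'), (3, 'old');
    INSERT INTO TABLEB VALUES (1, 'from_b'), (2, ''), (3, NULL);
    INSERT INTO TABLEC VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
""")

# NULLIF(x, '') maps an empty string to NULL; COALESCE then falls
# through to the value from TABLEC.
conn.execute("""
    UPDATE TABLEA
    SET FIELD = COALESCE(
        NULLIF((SELECT b.FIELD FROM TABLEB b WHERE b.someid = TABLEA.id), ''),
        (SELECT c.FIELD FROM TABLEC c WHERE c.someid = TABLEA.id)
    )
""")
updated = conn.execute("SELECT id, FIELD FROM TABLEA ORDER BY id").fetchall()
print(updated)
```

Row 1 keeps the value from B, while rows 2 and 3 (empty and NULL in B) take the value from C. If both sources were NULL for a row, this sketch would set FIELD to NULL, which is the case the last answer suggests handling explicitly.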