SQL Condition based dataset - sql

I have a SQL Server database that I did not design.
The employees have degrees, licensures and credentials stored in a few different tables.
I have written the query to join all of this information together so I can see an over all result of what the data looks like. I have been asked to create a view for this data that returns only the highest degree they have obtained and the two highest certifications.
The problem is, as it is pre existing data, there is no hierarchy built into the data. All of the degrees and certifications are simply stored as a string associated with their employee number.
The first logical step was to create an adjacency list(I believe this is the correct term).
For example 'MD' is the highest degree you can obtain in our list. So I have given that the "ranking" of 1. The next lower degree is "ranked" as 2. and so forth.
I can join on the text field that contains these and return their associated rank.
The problem I am having is returning only the two highest based on this ranking.
If the employee has multiple degrees or certifications they are listed on a second or third row. From a logical standpoint, I need to group the employee ID, First name and Last name. Then some how concatenate the degrees, certifications and licensures based on the "ranking" I created for them. It is not a true hierarchy in the way that I am thinking about it because I only need to know the highest two and not necessarily the relationship between the results.
Another potential caveat is that the database must remain in SQL Server 2000 compatibility mode.
Any help that can be given would be much appreciated. Thank you.
select a.EduRank as 'Licensure Rank',
b.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(EmpEduc.EfeLevel),
RTRIM(EmpLicns.ElcLicenseID),
a.EduType,
b.EduType
from empcomp
join EmpPers on empcomp.eeceeid = EmpPers.eepEEID
join EmpEduc on empcomp.Eeceeid = EmpEduc.EfeEEID
join EmpLicns on empcomp.eeceeid = EmpLicns.ElcEEID
join yvDegreeRanks a on a.EduCode = EmpLicns.ElcLicenseID
join yvDegreeRanks b on b.EduCode = EmpEduc.EfeLevel

I think I can see what your problem is - however I'm not sure. Joining the tables together has given you "double rows". The "quick-and-dirty" way to solve this query, would be to use Subqueries other than Joins. Doing so, you can select only the TOP 1 Degree, and TOP 2 certifications.
EDIT : Can you try this query ?
SELECT *
FROM employSELECT tblLicensures.EduRank as 'Licensure Rank',
tblDegrees.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(tblDegrees.EfeLevel),
RTRIM(tblLicensures.ElcLicenseID),
tblLicensures.EduType,
tblDegrees.EduType
FROM EmpComp
LEFT OUTER JOIN EmpPers ON empcom.eeceeid = EmpPers.eepEEID
LEFT OUTER JOIN
-- Select TOP 2 Licensure Ranks
(
SELECT TOP 2 a.EduType, a.EduRank, EmpLicns.ElcEEID
FROM yvDegreeRanks a
INNER JOIN EmpLicns on a.EduCode = EmpLicns.ElcLicenseID
WHERE EmpLincs.ElcEEID = empcomp.eeceeid
ORDER BY a.EduRank ASC
) AS tblLicensures ON tblLicensures.ElcEEID = empcomp.Eeceeid
LEFT OUTER JOIN
-- SELECT TOP 1 Degree
(
SELECT TOP 1 b.EduType, b.EduRank, EmpEduc.EfeEEID, EmpEduc.EfeLevel
FROM yvDegreeRanks b
INNER JOIN EmpEduc on b.EduCode = EmpEduc.EfeLevel
WHERE EmpEduc.EfeEEID = empcomp.Eeceeid
ORDER BY b.EduRank ASC
) AS tblDegrees ON tblDegrees.EfeEEID = empcomp.Eeceeid

This is not the most elegant solution, but hopefully it will at least help you out in some way.
create table #dataset (
licensurerank [datatype],
degreerank [datatype],
employeeid [datatype],
firstname varchar,
lastname varchar,
efeLevel [datatype],
elclicenseid [datatype],
edutype1 [datatype],
edutype2 [datatype]
)
select distinct identity(int,1,1) [ID], EecEmpNo into #employeeList from EmpComp
declare
#count int,
#rows int,
#employeeNo int
select * from #employeeList
set #rows = ##rowcount
set #count = 1
while #count <= #ROWS
begin
select #employeeNo = EecEmpNo from #employeeList where id = #count
insert into #dataset
select top 2 a.EduRank as 'Licensure Rank',
b.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(EmpEduc.EfeLevel),
RTRIM(EmpLicns.ElcLicenseID),
a.EduType,
b.EduType
from empcomp
join EmpPers on empcomp.eeceeid = EmpPers.eepEEID
join EmpEduc on empcomp.Eeceeid = EmpEduc.EfeEEID
join EmpLicns on empcomp.eeceeid = EmpLicns.ElcEEID
join yvDegreeRanks a on a.EduCode = EmpLicns.ElcLicenseID
join yvDegreeRanks b on b.EduCode = EmpEduc.EfeLevel
where EmpComp.EecEmpNo = #employeeNo
set #count = #count + 1
end

Have tables for employees, types of degrees (including a rank), types of certs (including a rank), and join tables employees_degrees and employees_certs. [It might be better to put degrees and certs in one table with a flag is_degree, if all their other fields are the same.] You can extract the existing string values and replace them with FK ids into the degree and cert tables.
The query itself is harder, because PARTITION BY is not available in SQL Server 2000 (according to Google). UW's answer has at least two problems: you need LEFT JOINs because not all employees have degrees and certs, and there is no ORDER BY to show what you want to take the best of. TOP 2 subqueries are particularly difficult to use in this context. So for that, I can't yet give an answer.

Related

How to get different data from two different tables in SQL query?

I have two table named Soft and Web, table containing multiple data in that which data is different that data I want. For Ex :
In soft table containing 5 data i.e.
Also in Web table containing 5 data i.e.
Now I want output i.e.
I have done query but unfortunately didnt succed, lets see my query i.e.
SELECT DISTINCT soft.GSTNo AS SoftGST
,web.GSTNo AS WebGST
,soft.InvoiceNumber AS SoftInvoice
,web.InvoiceNumber AS WebInvoice
,soft.Rate AS SoftRate
,web.Rate AS WebRate
FROM soft
LEFT OUTER JOIN web ON web.GstNo = soft.GSTNo
AND web.InvoiceNumber = soft.invoicenumber
AND web.rate = soft.rate
Also I apply inner join bt same thing didnt work.
You can achieve this by
;WITH cte_soft AS
(SELECT * FROM soft
EXCEPT
SELECT * FROM web)
,cte_web AS
(SELECT * FROM web
EXCEPT
SELECT * FROM soft)
SELECT *
FROM
(SELECT gst softgst, NULL webgst, invoice softinvoice, NULL webinvoice, rate softrate, NULL webrate
FROM cte_soft
UNION ALL
SELECT NULL, gst, NULL, invoice, NULL , rate
FROM cte_web) tbl
ORDER BY coalesce(softgst, webgst),coalesce(softinvoice,webinvoice)
Fiddle
You can use full join:
SELECT s.gst as softgst, w.gst as webgst,
s.invoice as softinvoice, w.invoice as webinvoice,
s.rate as softrate, w.rate as webrate
FROM soft s FULL JOIN
web w
ON s.gst = w.gst AND s.invoice = w.invoice AND s.rate = w.rate
WHERE s.gst IS NULL OR w.gst IS NULL
ORDER BY COALESCE(s.gst, w.gst), COALESCE(s.invoice, w.invoice);
No subqueries are CTEs are needed. This is really just a slight variant of your query.

SELECT and JOIN column not in Group by function

I have to join two different table to get my result.
The table 'Resource' it is simple, while the table 'Dimension.[Code]' contains, among the others, a column with different values (i.e :
Code
SILO
GRADE
OTHER 1
OTHER2
This is the reason why a join twice that column to get two different columns called GRADE and SILO.
Now, I have a query that selects the maximum value of a grade within the group as follows:
`SELECT
R.[ID] -- If I inserted that here, it is not working obviously.
-- This cannot But this is the additional column I need (see later)
DD_SILO.[Value] DIR ,
max(R.[GRADE]) GRADE_DIR
FROM [Resource] R
LEFT JOIN
Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
group by DD_SILO.[Value]'
What I need is basically to have, beside GRADE AND SILO, also the ID name, which is contained into the [Resource] table.
Please notice that [Resource].ID = [Dimension].ID
I would have solved the problem with ROW_NUMBER () to select the highest within the group, avoiding then then 'group by', but as the query has to be inserted in a bigger one, that would take too much time to run. I am using Microsoft SQL Server 2016.
Could you use a virtual table something like: -
`
select
a.max_grade_silo,
a.max_grade_value,
(select max(r.id)
from [resource] r,
[dimension] d
where r.[ID] = d.[ID] and
d.[CODE]= 'SILO' and
r.[GRADE] = a.[max_grade_value]
),
max_grade_silo a
from
(SELECT
DD_SILO.[Value] DIR ,
max(R.[GRADE]) GRADE_DIR
FROM [Resource] R
LEFT JOIN
Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
group by DD_SILO.[Value]
) temp_result (max_grade_silo, max_grade_value)
'
Probably better to look at normalizing the tables?
SELECT
MAX(R.[ID]) as ID ,
DD_SILO.[Value] DIR ,
max(R.[GRADE]) GRADE_DIR
FROM [Resource] R
LEFT JOIN
Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
group by DD_SILO.[Value]

SQL Join taking too much time to run

This query shown below is taking almost 2 hrs to run and I want to reduce the execution time of this query. Any help would be really helpful for me.
Currently:
If Exists (Select 1
From PRODUCTS prd
Join STORE_RANGE_GRP_MATCH srg On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Range_Event_Id Not IN (Select distinct Range_Event_Id
From Last_Authorised_Range)
)
I have tried replacing the Not IN clause by Not Exists and Left join but no luck in runtime execution.
What I have used:
If Exists( Select top 1 *
From PRODUCTS prd
Join STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID
And srg.Match_Flag = 'Y'
And prd.Range_Event_Id = srg.LAR_Range_Event_Id
and srg.Range_Event_Id ='45655'
Where NOT EXISTS (Select top 1 *
From Last_Authorised_Range where Range_Event_Id=srg.Range_Event_Id)
)
Product table has 432837 records and the Store table also has almost the same number of records. This table I am creating in the stored procedure itself and then dropping it in the end in the stored procedure.
Create Table PRODUCTS
(
Range_Event_Id int,
Store_Range_Grp_Id int,
Ranging_Prod_No nvarchar(14) collate database_default,
Space_Break_Code nchar(1) collate database_default
)
Create Clustered Index Idx_tmpLAR_PRODUCTS
ON PRODUCTS (Range_Event_Id, Ranging_Prod_No, Store_Range_Grp_Id, Space_Break_Code)
Should I use non clustered index on this table or what all can I do to lessen the execution time? Thanks in advance
First, you don't need top 1 or distinct in exists and in subqueries. But this shouldn't affect performance.
This is the query, slightly re-arranged so I can understand it better:
Select 1
From PRODUCTS prd Join
STORE srg
On prd.Store_Range_Grp_Id = srg.Orig_Store_Range_Grp_ID and
prd.Range_Event_Id = srg.LAR_Range_Event_Id
Where srg.Match_Flag = 'Y'
srg.Range_Event_Id = 45655 and
Where NOT EXISTS (Select 1
From Last_Authorised_Range lar
where lar.Range_Event_Id = srg.Range_Event_Id)
)
Do note that I removed the double quotes around 45655. I presume this column is actually a number. If so, don't confuse yourself and the optimizer by using a string for the comparison.
Then, try indexes. I think the best indexes are:
store(Range_Event_Id, Match_Flag, Orig_Store_Range_Grp_ID, LAR_Range_Event_Id)
products(Store_Range_Grp_Id, Range_Event_Id) (or any index, clustered or otherwise, that starts with these two columns in either order)
Last_Authorised_Range(Range_Event_Id)
From what you describe as the volume of data, your query should not be taking hours. I think indexes can help.

SQL query: Iterate over values in table and use them in subquery

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
In the example I create two tables. See how outer table have an alias you use in the inner select?
SQL Fiddle Demo
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation to a function that is called in the SELECT clause and is executed for each row of the result set
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION DoComputation(#parameter1 int)
RETURNS int
AS
BEGIN
-- Do some calculations here and return the function result.
-- This example returns the value of #parameter1 squared.
-- You can add additional parameters to the function definition if needed
DECLARE #Result int
SET #Result = #parameter1 * #parameter1
RETURN #Result
END
Here is an example of using the example function above in a query.
SELECT v.value, DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY value

Providing Language FallBack In A SQL Select Statement

I have a table that represents an Object. It has many columns but also fields that require language support.
For simplicity let's say I have 3 tables:
MainObjectTable
LanguageDependantField1
LanguageDependantField2.
MainObjectTable has a PK int called ID, and both LanguageDependantTables have a foreign key link back to the MainObjectTable along with a language code and the date they were added.
I've created a stored procedure that accepts the MainObjectTable ID and a Language. It will return a single row containing the most recent items from the language tables. The select statement looks like
SELECT
MainObjectTable.VariousColumns,
LanguageDependantField1.Description,
LanguageDependantField2.SomeOtherText
FROM
MainObjectTable
OUTER APPLY
(SELECT TOP 1 LanguageDependantField1.Description
FROM LanguageDependantField1
WHERE LanguageDependantField1.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField1.Language_ID = #language
ORDER BY
LanguageDependantField1.[Default], LanguageDependantField1.CreatedDate DESC) LanguageDependantField1
OUTER APPLY
(SELECT TOP 1 LanguageDependantField2.SomeOtherText
FROM LanguageDependantField2
WHERE LanguageDependantField2.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField2.Language_ID = #language
ORDER BY
LanguageDependantField2.[Default] DESC, LanguageDependantField2.CreatedDate DESC) LanguageDependantField2
WHERE
MainObjectTable.ID = #MainObjectTableID
What I want to add is the ability to fallback to a default language if a row isn't found in the specified language. Let's say we use "German" as the selected language. Is it possible to return an English row from LanguageDependantField1 if the German does not exist presuming we have #fallbackLanguageID
Also am I right to use OUTER APPLY in this scenario or should I be using JOIN?
Many thanks for your help.
Try this:
SELECT MainObjectTable.VariousColumns,
COALESCE(PrefLang.Description,Fallback.Description,'Not Found Desc')
as Description,
COALESCE(PrefLang.SomeOtherText,FallBack.SomeOtherText,'Not found')
as SomeOtherText
FROM MainObjectTable
LEFT JOIN
(SELECT TOP 1 pl.Description,pl.SomeOtherText
FROM LanguageDependantField1 pl
WHERE pl.MainObjectTable_ID = MainObjectTable.ID
AND pl.Language_ID = #language
ORDER BY
pl.[Default], pl.CreatedDate DESC)
PrefLang ON 1=1
LEFT JOIN
(SELECT TOP 1 fb.Description,fb.SomeOtherText
FROM LanguageDependantField1 fb
WHERE fb.MainObjectTable_ID = MainObjectTable.ID
AND fb.Language_ID = #fallbackLanguageID
ORDER BY
fb.[Default], fb.CreatedDate DESC)
Fallback ON 1=1
WHERE
MainObjectTable.ID = #MainObjectTableID
Basically, make two queries, one to the preferred language and one to English (Default). Use the LEFT JOIN, so if the first one isn't found, the second query is used...
I don't have your actual tables, so there might be a syntax error in above, but hope it gives you the concept you want to try...
Yes, the use of Outer Apply is correct if you want to correlate the MainObjectTable table rows to the inner queries. You cannot use Joins with references in the derived table to the outer table. If you wanted to use Joins, you would need to include the joining column(s) and in this case pre-filter the results. Here is what that might look like:
With RankedLanguages As
(
Select LDF1.MainObjectTable_ID, LDF1.Language_ID, LDF1.Description, LDF1.SomeOtherText, ...
, Row_Number() Over ( Partition By LDF1.MainObjectTable_ID, LDF1.Language_ID
Order By LDF1.[Default] Desc, LDF1.CreatedDate Desc ) As Rnk
From LanguageDependantField1 As LDF1
Where LDF1.Language_ID In( #languageId, #defaultLanguageId )
)
Select M.VariousColumns
, Coalesce( SpecificLDF.Description, DefaultLDF.Description ) As Description
, Coalesce( SpecificLDF.SomeOtherText, DefaultLDF.SomeOtherText ) As SomeOtherText
, ...
From MainObjectTable As M
Left Join RankedLanguages As SpecificLDF
On SpecificLDF.MainObjectTable_ID = M.ID
And SpecifcLDF.Language_ID = #languageId
And SpecifcLDF.Rnk = 1
Left Join RankedLanguages As DefaultLDF
On DefaultLDF.MainObjectTable_ID = M.ID
And DefaultLDF.Language_ID = #defaultLanguageId
And DefaultLDF.Rnk = 1
Where M.ID = #MainObjectTableID