Problem with creating a SQL view with multiple joins - sql

I am trying to create a view which will be a base for Excel export in API. Basically, what it contains is information about particular projects. To said projects calculations can be added (it all happens on the form on frontend). Those calculations are called EBIT, EBIT+ and OVI. User can add either one, two or all of them, so for example there will be projects with only EBIT, but also projects with only for example EBIT, but also only with EBIT+ and OVI. View needs to return the project information with all chosen calculations in one row, so because some type of calculations wont be chosen by user there needs to be a type safety as well.
Code of my view:
CREATE VIEW [Signoff].[ExecelReport_uvw]
AS SELECT
project.ProjectName,
project.CreatedOn,
project.ProjectId,
subCategory.SubCategoryName,
projectStatus.StatusName,
overallCategory.CategoryName,
projectUserResponsible.UserName,
valueImprovementType.ValueImprovementTypeName,
OVI.OverallImprovementTypeName,
project.NameOfSuplier,
improvementCalculation.Baseline,
improvementCalculation.ImpactValue,
project.ContractStartDate,
project.ContractEndDate,
userbusinessController.UserName as businessControllerName,
businessControllerStatus.ApprovalStatusName businessControllerStatus,
userbusinesOwner.UserName as businessOwnerName,
businesOwnerStatus.ApprovalStatusName as businessOwnerStatus,
userbusinessCFO.UserName as businessCFOName,
businessCFOStatus.ApprovalStatusName as businessCFOStatus,
project.IsEbitda,
improvementCalculation.EBITDA
FROM [Signoff].[Project] as project
LEFT JOIN [Signoff].[OverallImprovementType] as OVI on project.OverallImprovementTypeId = OVI.OverallImprovementTypeId
LEFT JOIN [Signoff].[SubCategory] as subCategory on project.GPSubCategory = subCategory.SubCategoryId
LEFT JOIN [Signoff].[Category] as overallCategory on project.GPCategory = overallCategory.CategoryId
LEFT JOIN [Signoff].[ValueImprovementType] as valueImprovementType on project.ValueImprovementTypeId = valueImprovementType.ValueImprovementTypeId
LEFT JOIN [Signoff].[Status] as projectStatus on project.ProjectStatus = projectStatus.StatusId
LEFT JOIN [Signoff].[User] as projectUserResponsible on project.ProjectResponsible = projectUserResponsible.UserId
LEFT JOIN [Signoff].[ProjectUser] as projectUserbusinessControler on project.ProjectId = projectUserbusinessControler.ProjectId AND projectUserbusinessControler.ProjectRoleId = 'A36FC6CD-9ED7-4AA8-B1BE-355E48BDE25A'
LEFT JOIN [Signoff].[User] as userbusinessController on projectUserbusinessControler.ApproverId = userbusinessController.UserId
LEFT JOIN [Signoff].[ApprovalStatus] as businessControllerStatus on projectUserbusinessControler.ApprovalStatusId = businessControllerStatus.ApprovalStatusId
LEFT JOIN [Signoff].[ProjectUser] as projectUserbusinessOwner on project.ProjectId = projectUserbusinessOwner.ProjectId AND projectUserbusinessOwner.ProjectRoleId = 'E1E23E4F-1CA4-4869-9387-43CEDAEBBBB0'
LEFT JOIN [Signoff].[User] as userbusinesOwner on projectUserbusinessOwner.ApproverId = userbusinesOwner.UserId
LEFT JOIN [Signoff].[ApprovalStatus] as businesOwnerStatus on projectUserbusinessOwner.ApprovalStatusId = businesOwnerStatus.ApprovalStatusId
LEFT JOIN [Signoff].[ProjectUser] as projectUserbusinessCFO on project.ProjectId = projectUserbusinessCFO.ProjectId AND projectUserbusinessCFO.ProjectRoleId = 'DA17CF66-1D61-460E-BF87-5D86744DF22A'
LEFT JOIN [Signoff].[User] as userbusinessCFO on projectUserbusinessCFO.ApproverId = userbusinessCFO.UserId
LEFT JOIN [Signoff].[ApprovalStatus] as businessCFOStatus on projectUserbusinessCFO.ApprovalStatusId = businessCFOStatus.ApprovalStatusId
LEFT JOIN [Signoff].[ProjectImprovementCalculation] as projectImprovementCalculation on project.ProjectId = projectImprovementCalculation.ProjectId
LEFT JOIN [Signoff].[ImprovementCalculation] as improvementCalculation on projectImprovementCalculation.ImprovementCalculationId = improvementCalculation.ImprovementCalculationId
Improvement calculation table:
CREATE TABLE [Signoff].[ImprovementCalculation]
(
[ImprovementCalculationId] INT NOT NULL IDENTITY,
[Baseline] INT NOT NULL,
[TotalSpend] INT NOT NULL,
[ImpactValue] INT NOT NULL,
[ImpactPercentage] INT NOT NULL,
[EBITDA] INT NOT NULL,
[CalculationType] VARCHAR (255) NOT NULL
)
GO
ALTER TABLE [Signoff].[ImprovementCalculation]
ADD CONSTRAINT [PK_ImprovemntCalculation] PRIMARY KEY([ImprovementCalculationId]);
GO
Project Improvement Calculation table:
CREATE TABLE [Signoff].[ProjectImprovementCalculation]
(
[ProjectImprovementCalculationId] INT NOT NULL IDENTITY,
[ProjectId] UNIQUEIDENTIFIER NOT NULL,
[ImprovementCalculationId] INT NOT NULL,
)
GO
ALTER TABLE [Signoff].[ProjectImprovementCalculation]
ADD CONSTRAINT [PK_ProjectImprovementCalculation] PRIMARY KEY([ProjectImprovementCalculationId]);
GO
ALTER TABLE [Signoff].[ProjectImprovementCalculation]
ADD CONSTRAINT FK_ProjectProjectImprovementCalculation
FOREIGN KEY (ProjectId) REFERENCES [Signoff].[Project](ProjectId);
GO
ALTER TABLE [Signoff].[ProjectImprovementCalculation]
ADD CONSTRAINT FK_ImprovementCalculationProjectImprovementCalculation
FOREIGN KEY (ImprovementCalculationId) REFERENCES [Signoff].[ImprovementCalculation](ImprovementCalculationId);
GO
Just in case, although I don't think it's needed, the project table:
CREATE TABLE [Signoff].[Project]
(
[ProjectId] UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),
[ProjectName] NVARCHAR(50) NOT NULL,
[LegalEntity] UNIQUEIDENTIFIER NOT NULL,
[ValueImprovementTypeId] INT NOT NULL,
[OverallImprovementTypeId] INT NOT NULL,
[NameOfSuplier] NVARCHAR(50) NOT NULL,
[ContractStartDate] DATE NOT NULL,
[ContractEndDate] DATE NOT NULL,
[GPCategory] UNIQUEIDENTIFIER NOT NULL,
[GPSubCategory] UNIQUEIDENTIFIER NOT NULL,
[ProjectResponsible] UNIQUEIDENTIFIER NOT NULL,
[ProjectNumber] INT,
[FullProjectNumber] VARCHAR(55),
[ProjectStatus] UNIQUEIDENTIFIER NOT NULL DEFAULT '05c2f392-8b69-4915-a166-c4418889f9e8',
[IsCanceled] BIT NULL DEFAULT 0,
[IsEbitda] BIT NOT NULL DEFAULT 0,
[CreatedOn] DATETIME NOT NULL DEFAULT SYSDATETIME()
)
GO
ALTER TABLE [Signoff].[Project]
ADD CONSTRAINT [PK_Project] PRIMARY KEY([ProjectId]);
GO
ALTER TABLE [Signoff].[Project]
ADD CONSTRAINT [FK_ProjectStatus] FOREIGN KEY ([ProjectStatus]) REFERENCES [Signoff].[Status]([StatusId]);
GO
I have so came up with this solution, but it returns every single calculation in a different row in a table, and I want all calculations be in a single row with a project, so not what I am looking for:
LEFT JOIN [Signoff].[ProjectImprovementCalculation] as projectImprovementCalculation on project.ProjectId = projectImprovementCalculation.ProjectId
LEFT JOIN [Signoff].[ImprovementCalculation] as improvementCalculation on projectImprovementCalculation.ImprovementCalculationId = improvementCalculation.ImprovementCalculationId
Does anyone knows how to do it? Or I am approaching a problem from completely wrong way? If the information I have written is a bit chaotic, something isn't understandable, I can rephrase it.

I will assume that the available CalculationType values is fixed, that each project will have at most one improvement calculation per type, and that you wish to define fixed dedicated columns for BaseLine and ImpactValue each calculation type.
One approach is to use nested joins that will effectively LEFT JOIN the INNER JOINed combination of ProjectImprovementCalculation and ImprovementCalculation once for each calculation type. The results of each can then be referenced individually in the final select list.
Something like:
SELECT ...
IC_AAA.BaseLine, IC_AAA.ImpactValue,
IC_BBB.BaseLine, IC_BBB.ImpactValue,
...
FROM ...
LEFT JOIN Signoff.ProjectImprovementCalculation as PIC_AAA
JOIN Signoff.ImprovementCalculation as IC_AAA
ON IC_AAA.ImprovementCalculationId = PIC_AAA.ImprovementCalculationId
AND IC_AAA.CalculationType = 'AAA'
ON PIC_AAA.ProjectId = project.ProjectId
LEFT JOIN Signoff.ProjectImprovementCalculation as PIC_BBB
JOIN Signoff.ImprovementCalculation as IC_BBB
ON IC_BBB.ImprovementCalculationId = PIC_BBB.ImprovementCalculationId
AND IC_BBB.CalculationType = 'BBB'
ON PIC_BBB.ProjectId = project.ProjectId
...
The syntax is somewhat odd with two JOINs followed by two ON clauses. It would be clearer if parentheses were allowed, but (as far as I know) that is not part of the syntax.
There are several alternatives that accomplish the same. The following uses an OUTER APPLY:
SELECT ...
AAA.BaseLine, AAA.ImpactValue,
BBB.BaseLine, BBB.ImpactValue,
...
FROM ...
OUTER APPLY (
SELECT IC.*
FROM Signoff.ProjectImprovementCalculation as PIC
JOIN Signoff.ImprovementCalculation as IC
ON IC.ImprovementCalculationId = PIC.ImprovementCalculationId
AND IC.CalculationType = 'AAA'
WHERE PIC.ProjectId = project.ProjectId
) AAA
OUTER APPLY (
SELECT IC.*
FROM Signoff.ProjectImprovementCalculation as PIC
JOIN Signoff.ImprovementCalculation as IC
ON IC.ImprovementCalculationId = PIC.ImprovementCalculationId
AND IC.CalculationType = 'BBB'
WHERE PIC.ProjectId = project.ProjectId
) BBB
...
Using a Common Table Expression (CTE) can reduce some of the duplication as well as making the query a bit more readable.
;WITH ImprovementCTE AS (
SELECT PIC.ProjectId, IC.*
FROM Signoff.ProjectImprovementCalculation as PIC
JOIN Signoff.ImprovementCalculation as IC
ON IC.ImprovementCalculationId = PIC.ImprovementCalculationId
)
SELECT ...
AAA.BaseLine, AAA.ImpactValue,
BBB.BaseLine, BBB.ImpactValue,
...
FROM ...
LEFT JOIN ImprovementCTE AAA
ON AAA.ProjectId = project.ProjectId
AND AAA.CalculationType = 'AAA'
LEFT JOIN ImprovementCTE BBB
ON BBB.ProjectId = project.ProjectId
AND BBB.CalculationType = 'BBB'
...
You might also try using conditional aggregation in a single CROSS APPLY:
SELECT ...
IC.BaseLineAAA, IC.ImpactValueAAA,
IC.BaseLineBBB, IC.ImpactValueBBB,
...
FROM ...
CROSS APPLY (
SELECT
BaseLineAAA = SUM(CASE WHEN IC.CalculationType = 'AAA' THEN IC.BaseLine),
ImpactValueAAA = SUM(CASE WHEN IC.CalculationType = 'AAA' THEN IC.ImpactValue),
BaseLineBBB = SUM(CASE WHEN IC.CalculationType = 'BBB' THEN IC.BaseLine),
ImpactValueBBB = SUM(CASE WHEN IC.CalculationType = 'BBB' THEN IC.ImpactValue),
...
FROM Signoff.ProjectImprovementCalculation as PIC
JOIN Signoff.ImprovementCalculation as IC
ON IC.ImprovementCalculationId = PIC.ImprovementCalculationId
WHERE PIC.ProjectId = project.ProjectId
) IC
I expect there are additional approaches such as using a PIVOTs.
If the above appears to suit your needs, you should still run tests and examine the execution plans to see which performs best. Some may have a tendency to retrieve all ImprovementCalculation rows even when a subset of projects are selected.
To handle missing calculation types, you can use the ISNULL() function to provide a default. If you need to force a blank value in an otherwise numeric result, you might need to use something like ISNULL(CONVERT(VARCHAR(50), result), '').

Related

SQL selecting not surely existing column from on table or another

I use generalisation/specialize table Zadavatel (which contains primary key) as either table Sukromna_osoba or Firma (both contains foreign key that points on Zadavatel).
I need to select Sukromna_osoba table if Sukromna_osoba.meno = 'string' exists or Firma table if Firma.nazov_firmy = 'string' exists, both if both conditions are true. I also need this to be in one select.
CREATE TABLE Zadavatel (
id_zadavatela INTEGER,
adresa VARCHAR(25)
);
CREATE TABLE Sukromna_osoba (
id_sukromnej_osoby INTEGER,
meno VARCHAR(20),
mobil INTEGER,
email VARCHAR(20)
);
CREATE TABLE Firma (
id_firmy INTEGER,
nazov_firmy VARCHAR(20),
ico INTEGER,
bankove_spojenie INTEGER
);
id_zadavatela is primary key, and id_sukromnej_osoby and id_firmy are foreign keys which points at id_zadavatela.
I tried something like this:
SELECT PR.id_projektu, PR.popis, ZAD.id_zadavatela, FI.nazov_firmy
FROM Projekt PR JOIN Zamestnanec ZAM ON PR.manazer = ZAM.osobne_cislo
JOIN Zadavatel ZAD ON PR.zadavatel = ZAD.id_zadavatela
JOIN Firma FI ON ZAD.id_zadavatela = FI.id_firmy
WHERE ZAM.meno = 'Jan Novák' OR (
SELECT PR1.id_projektu, PR1.popis, ZAD1.id_zadavatela, SO1.meno
FROM Projekt PR1 JOIN Zamestnanec ZAM1 ON PR1.manazer = ZAM1.osobne_cislo
JOIN Zadavatel ZAD1 ON PR1.zadavatel = ZAD1.id_zadavatela
JOIN Sukromna_osoba SO1 ON ZAD1.id_zadavatela = SO1.id_sukromnej_osoby
WHERE ZAM1.meno = 'Jan Novák'
)
Since you do not know if it's a firm or a person, you could use left outer join on both, like this:
SELECT PR.id_projektu, PR.popis, ZAD.id_zadavatela, FI.nazov_firmy, SO.meno
FROM Projekt PR
JOIN Zamestnanec ZAM ON PR.manazer = ZAM.osobne_cislo
JOIN Zadavatel ZAD ON PR.zadavatel = ZAD.id_zadavatela
LEFT OUTER JOIN Firma FI ON ZAD.id_zadavatela = FI.id_firmy
LEFT OUTER JOIN Sukromna_osoba SO ON ZAD.id_zadavatela = SO.id_sukromnej_osoby
WHERE ZAM.meno = 'Jan Novák'
One of the two result columns, nazov_firmy or meno, will be NULL.

How can I efficiently query and index a view that involves a full outer join?

We have a data processing application that has two separate paths that should eventually produce similar results. We also have a database-backed monitoring service that compares and utilizes the results of this processing. At any point in time, either of the two paths may or may not have produced results for the operation, but I want to be able to query a view that tells me about any results that have been produced.
Here's a simplified example of the schema I started with:
create table LeftResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null
primary key ( DateId, EntityId ) )
go
create table RightResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null
primary key ( DateId, EntityId ) )
go
create view CombinedResults
as
select
DateId = isnull( l.DateId, r.DateId ),
EntityId = isnull( l.EntityId, r.EntityId ),
LeftValue = l.ProcessingValue,
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from LeftResult l
full outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
go
The problem with this is that Sql Server always chooses to scan the PK on LeftResult and RightResult rather than seek, even when queries to the view include DateId and EntityId as predicates. This seems to be due to the isnull() checks on the results. (I've even tried using index hints and forceseek, but without avail -- the query plan still shows a scan.)
However, I can't simply replace the isnull() results, since either the left or right side could be missing from the join (because the associated process hasn't populated the table yet).
I don't particularly want to duplicate the MaxValue logic across all of the consumers of the view (in reality, it's quite a bit more complex calculation, but the same idea applies.)
Is there a good strategy I can use to structure this view or queries against it so that the
query plan will utilize a seek rather than a scan?
try using left outer join for one of the tables, then union those results with the excluded rows from the other table.
like:
select (...)
from LeftResult l
left outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
(...)
UNION ALL
select (...)
from RightResult r
leftouter join LeftResult l
on l.DateId = r.DateId
and l.EntityId = r.EntityId
WHERE
l.dateid is null

SQL Join With Fallback

Given
CREATE TABLE Addresses
Id INT NOT NULL
Zip NVARCHAR(5) NULL
ZipPlus4 NVARCHAR(9) NULL
CREATE TABLE ZipLookup
Zip NVARCHAR(5) NULL
Code NVARCHAR(10) NULL
CREATE TABLE ZipPlus4Lookup
ZipPlus4 NVARCHAR(9) NULL
Code NVARCHAR(10) NULL
And data like
Addresses
1 | 92123 | 921234444
ZipLookup
92123 | Type A
ZipPlus4Lookup
921234444 | Type B
Is it possible to construct a query such that:
A given row in Addresses is outer joined to ZipPlus4Lookup if there is a match
Addresses.ZipPlus4 = ZipPlus4Lookup.ZipPlus4
Otherwise, the given row in Addresses is outer joined to ZipLookup if there is a match
Addresses.Zip = ZipLookup.Zip
Otherwise neither table is outer joined
In plain English, the Addresses table has a Zip and a ZipPlus4 column and I need to look up a code using the most precise match. If there's a match on Zip+4, use the code from that match. Otherwise, use the code from a Zip match.
I wish I had an attempted query to share, but with this one I don't know where to start.
This basic query will work:
SELECT
A.*,
Code = IsNull(Z4.Code, Z.Code)
FROM
dbo.Addresses A
LEFT JOIN dbo.ZipPlus4Lookup Z4
ON A.ZipPlus4 = Z4.ZipPlus4
LEFT JOIN dbo.ZipLookup Z
ON A.Zip = Z.Zip
AND Z4.ZipPlus4 IS NULL;
Or you could try something like this:
SELECT
A.*,
Z.Code
FROM
dbo.Addresses A
OUTER APPLY (
SELECT TOP 1 Code
FROM (
SELECT 0, Code FROM dbo.ZipPlus4Lookup Z4
WHERE A.ZipPlus4 = Z4.ZipPlus4
UNION ALL
SELECT 1, Code FROM dbo.ZipLookup Z
WHERE A.Zip = Z.Zip
) X (Seq, Code)
ORDER BY X.Seq
) Z;
They may have different performance characteristics. It's worth testing. My guess is the second query is unnecessary but it's still conceptually possible to be better.
See these in action in a SQL Fiddle.

Mysql - help me optimize this query

About the system:
-The system has a total of 8 tables
- Users
- Tutor_Details (Tutors are a type of User,Tutor_Details table is linked to Users)
- learning_packs, (stores packs created by tutors)
- learning_packs_tag_relations, (holds tag relations meant for search)
- tutors_tag_relations and tags and
orders (containing purchase details of tutor's packs),
order_details linked to orders and tutor_details.
For a more clear idea about the tables involved please check the The tables section in the end.
-A tags based search approach is being followed.Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searcheable). For details please check the section How tags work in this system? below.
Following is a simpler representation (not the actual) of the more complex query which I am trying to optimize:- I have used statements like explanation of parts in the query
============================================================================
select
SUM(DISTINCT( t.tag LIKE "%Dictatorship%" )) as key_1_total_matches,
SUM(DISTINCT( t.tag LIKE "%democracy%" )) as key_2_total_matches,
td.*, u.*, count(distinct(od.id_od)), `if (lp.id_lp > 0) then some conditional logic on lp fields else 0 as tutor_popularity`
from Tutor_Details AS td JOIN Users as u on u.id_user = td.id_user
LEFT JOIN Learning_Packs_Tag_Relations AS lptagrels ON td.id_tutor = lptagrels.id_tutor
LEFT JOIN Learning_Packs AS lp ON lptagrels.id_lp = lp.id_lp
LEFT JOIN `some other tables on lp.id_lp - let's call learning pack tables set (including
Learning_Packs table)`
LEFT JOIN Order_Details as od on td.id_tutor = od.id_author LEFT JOIN Orders as o on
od.id_order = o.id_order
LEFT JOIN Tutors_Tag_Relations as ttagrels ON td.id_tutor = ttagrels.id_tutor
JOIN Tags as t on (t.id_tag = ttagrels.id_tag) OR (t.id_tag = lptagrels.id_tag)
where `some condition on Users table's fields`
AND CASE WHEN ((t.id_tag = lptagrels.id_tag) AND (lp.id_lp > 0)) THEN `some
conditions on learning pack tables set` ELSE 1 END
AND CASE WHEN ((t.id_tag = wtagrels.id_tag) AND (wc.id_wc > 0)) THEN `some
conditions on webclasses tables set` ELSE 1 END
AND CASE WHEN (od.id_od>0) THEN od.id_author = td.id_tutor and `some conditions on Orders table's fields` ELSE 1 END
AND ( t.tag LIKE "%Dictatorship%" OR t.tag LIKE "%democracy%")
group by td.id_tutor HAVING key_1_total_matches = 1 AND key_2_total_matches = 1
order by tutor_popularity desc, u.surname asc, u.name asc limit
0,20
=====================================================================
What does the above query do?
Does AND logic search on the search keywords (2 in this example - "Democracy" and "Dictatorship").
Returns only those tutors for which both the keywords are present in the union of the two sets - tutors details and details of all the packs created by a tutor.
To make things clear - Suppose a Tutor name "Sandeepan Nath" has created a pack "My first pack", then:-
Searching "Sandeepan Nath" returns Sandeepan Nath.
Searching "Sandeepan first" returns Sandeepan Nath.
Searching "Sandeepan second" does not return Sandeepan Nath.
======================================================================================
The problem
The results returned by the above query are correct (AND logic working as per expectation), but the time taken by the query on heavily loaded databases is like 25 seconds as against normal query timings of the order of 0.005 - 0.0002 seconds, which makes it totally unusable.
It is possible that some of the delay is being caused because all the possible fields have not yet been indexed, but I would appreciate a better query as a solution, optimized as much as possible, displaying the same results
==========================================================================================
How tags work in this system?
When a tutor registers, tags are entered and tag relations are created with respect to tutor's details like name, surname etc.
When a Tutors create packs, again tags are entered and tag relations are created with respect to pack's details like pack name, description etc.
tag relations for tutors stored in tutors_tag_relations and those for packs stored in learning_packs_tag_relations. All individual tags are stored in tags table.
====================================================================
The tables
Most of the following tables contain many other fields which I have omitted here.
CREATE TABLE IF NOT EXISTS `users` (
`id_user` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL DEFAULT '',
`surname` varchar(155) NOT NULL DEFAULT '',
PRIMARY KEY (`id_user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=636 ;
CREATE TABLE IF NOT EXISTS `tutor_details` (
`id_tutor` int(10) NOT NULL AUTO_INCREMENT,
`id_user` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_tutor`),
KEY `Users_FKIndex1` (`id_user`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=51 ;
CREATE TABLE IF NOT EXISTS `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id_order`),
KEY `Orders_FKIndex1` (`id_user`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=275 ;
ALTER TABLE `orders`
ADD CONSTRAINT `Orders_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `users`
(`id_user`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `order_details` (
`id_od` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL DEFAULT '0',
`id_author` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_od`),
KEY `Order_Details_FKIndex1` (`id_order`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=284 ;
ALTER TABLE `order_details`
ADD CONSTRAINT `Order_Details_ibfk_1` FOREIGN KEY (`id_order`) REFERENCES `orders`
(`id_order`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `learning_packs` (
`id_lp` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_author` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id_lp`),
KEY `Learning_Packs_FKIndex2` (`id_author`),
KEY `id_lp` (`id_lp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=23 ;
CREATE TABLE IF NOT EXISTS `tags` (
`id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tag` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id_tag`),
UNIQUE KEY `tag` (`tag`),
KEY `id_tag` (`id_tag`),
KEY `tag_2` (`tag`),
KEY `tag_3` (`tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3419 ;
CREATE TABLE IF NOT EXISTS `tutors_tag_relations` (
`id_tag` int(10) unsigned NOT NULL DEFAULT '0',
`id_tutor` int(10) DEFAULT NULL,
KEY `Tutors_Tag_Relations` (`id_tag`),
KEY `id_tutor` (`id_tutor`),
KEY `id_tag` (`id_tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `tutors_tag_relations`
ADD CONSTRAINT `Tutors_Tag_Relations_ibfk_1` FOREIGN KEY (`id_tag`) REFERENCES
`tags` (`id_tag`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `learning_packs_tag_relations` (
`id_tag` int(10) unsigned NOT NULL DEFAULT '0',
`id_tutor` int(10) DEFAULT NULL,
`id_lp` int(10) unsigned DEFAULT NULL,
KEY `Learning_Packs_Tag_Relations_FKIndex1` (`id_tag`),
KEY `id_lp` (`id_lp`),
KEY `id_tag` (`id_tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `learning_packs_tag_relations`
ADD CONSTRAINT `Learning_Packs_Tag_Relations_ibfk_1` FOREIGN KEY (`id_tag`)
REFERENCES `tags` (`id_tag`) ON DELETE NO ACTION ON UPDATE NO ACTION;
===================================================================================
Following is the exact query (this includes classes also - tutors can create classes and search terms are matched with classes created by tutors):-
SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%" )) AS key_1_total_matches,
SUM(DISTINCT( t.tag LIKE "%democracy%" )) AS key_2_total_matches,
COUNT(DISTINCT( od.id_od )) AS tutor_popularity,
CASE
WHEN ( IF(( wc.id_wc > 0 ), ( wc.wc_api_status = 1
AND wc.wc_type = 0
AND wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'classes_published',
CASE
WHEN ( IF(( lp.id_lp > 0 ), ( lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'packs_published',
td . *,
u . *
FROM tutor_details AS td
JOIN users AS u
ON u.id_user = td.id_user
LEFT JOIN learning_packs_tag_relations AS lptagrels
ON td.id_tutor = lptagrels.id_tutor
LEFT JOIN learning_packs AS lp
ON lptagrels.id_lp = lp.id_lp
LEFT JOIN learning_packs_categories AS lpc
ON lpc.id_lp_cat = lp.id_lp_cat
LEFT JOIN learning_packs_categories AS lpcp
ON lpcp.id_lp_cat = lpc.id_parent
LEFT JOIN learning_pack_content AS lpct
ON ( lp.id_lp = lpct.id_lp )
LEFT JOIN webclasses_tag_relations AS wtagrels
ON td.id_tutor = wtagrels.id_tutor
LEFT JOIN webclasses AS wc
ON wtagrels.id_wc = wc.id_wc
LEFT JOIN learning_packs_categories AS wcc
ON wcc.id_lp_cat = wc.id_wp_cat
LEFT JOIN learning_packs_categories AS wccp
ON wccp.id_lp_cat = wcc.id_parent
LEFT JOIN order_details AS od
ON td.id_tutor = od.id_author
LEFT JOIN orders AS o
ON od.id_order = o.id_order
LEFT JOIN tutors_tag_relations AS ttagrels
ON td.id_tutor = ttagrels.id_tutor
JOIN tags AS t
ON ( t.id_tag = ttagrels.id_tag )
OR ( t.id_tag = lptagrels.id_tag )
OR ( t.id_tag = wtagrels.id_tag )
WHERE ( u.country = 'IE'
OR u.country IN ( 'INT' ) )
AND CASE
WHEN ( ( t.id_tag = lptagrels.id_tag )
AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( ( t.id_tag = wtagrels.id_tag )
AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1
AND wc.wc_type = 0
AND
wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
AND o.order_status = 'paid'
AND CASE
WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
ELSE 1
END
ELSE 1
END
GROUP BY td.id_tutor
HAVING key_1_total_matches = 1
AND key_2_total_matches = 1
ORDER BY tutor_popularity DESC,
u.surname ASC,
u.name ASC
LIMIT 0, 20
Please note - The provided database structure does not show all the fields and tables as in this query
=====================================================================================
The explain query output:-
Please see this screenshot
http://www.test.examvillage.com/Explain_query.jpg
Information on row counts, value distributions, indexes, size of the database, size of memory, disk layout - raid 0, 5, etc - how many users are hitting your database when queries are slow - what other queries are running. All these things factor into performance.
Also a print out of the explain plan output may shed some light on the cause if it's simply a query / index issue. The exact query would be needed as well.
You really should use some better formatting for the query.
Just add at least 4 spaces to the beginning of each row to get this nice code formatting.
SELECT * FROM sometable
INNER JOIN anothertable ON sometable.id = anothertable.sometable_id
Or have a look here: https://stackoverflow.com/editing-help
Could you provide the execution plan from mysql? You need to add "EXPLAIN" to the query and copy the result.
EXPLAIN SELECT * FROM ...complexquery...
will give you some useful hints (execution order, returned rows, available/used indexes)
Your question is, "how can I find tutors that match certain tags?" That's not a hard question, so the query to answer it shouldn't be hard either.
Something like:
SELECT *
FROM tutors
WHERE tags LIKE '%Dictator%' AND tags LIKE '%Democracy%'
That will work, if you modify your design to have a "tags" field in your "tutors" table, in which you put all the tags that apply to that tutor. It will eliminate layers of joins and tables.
Are all those layers of joins and tables providing real functionality, or just more programming headaches? Think about the functionality that your app REALLY needs, and then simplify your database design!!
Answering my own question.
The main problem with this approach was that too many tables were joined in a single query. Some of those tables like Tags (having large number of records - which can in future hold as many as all the English words in the vocabulary) when joined with so many tables cause this multiplication effect which can in no way be countered.
The solution is basically to make sure too many joins are not made in a single query. Breaking one large join query into steps, using the results of the one query (involving joins on some of the tables) for the next join query (involving joins on the other tables) reduces the multiplication effect.
I will try to provide better explanation to this later.

querying 2 tables with the same spec for the differences

I recently had to solve this problem and find I've needed this info many times in the past so I thought I would post it. Assuming the following table def, how would you write a query to find all differences between the two?
table def:
CREATE TABLE feed_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1)
CONSTRAINT feed_tbl_PK PRIMARY KEY (code)
CREATE TABLE data_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1)
CONSTRAINT data_tbl_PK PRIMARY KEY (code)
Here is my solution, as a view using three queries joined by unions. The diff_type specified is how the record needs updated: deleted from _data(2), updated in _data(1), or added to _data(0)
CREATE VIEW delta_vw AS (
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 0 as diff_type
FROM feed_tbl LEFT OUTER JOIN
data_tbl ON feed_tbl.code = data_tbl.code
WHERE (data_tbl.code IS NULL)
UNION
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 1 as diff_type
FROM data_tbl RIGHT OUTER JOIN
feed_tbl ON data_tbl.code = feed_tbl.code
where (feed_tbl.name <> data_tbl.name) OR
(data_tbl.status <> feed_tbl.status) OR
(data_tbl.update <> feed_tbl.update)
UNION
SELECT data_tbl.code, data_tbl.name, data_tbl.status, data_tbl.update, 2 as diff_type
FROM feed_tbl LEFT OUTER JOIN
data_tbl ON data_tbl.code = feed_tbl.code
WHERE (feed_tbl.code IS NULL)
)
UNION will remove duplicates, so just UNION the two together, then search for anything with more than one entry. Given "code" as a primary key, you can say:
edit 0: modified to include differences in the PK field itself
edit 1: if you use this in real life, be sure to list the actual column names. Dont use dot-star, since the UNION operation requires result sets to have exactly matching columns. This example would break if you added / removed a column from one of the tables.
select dt.*
from
data_tbl dt
,(
select code
from
(
select * from feed_tbl
union
select * from data_tbl
)
group by code
having count(*) > 1
) diffs --"diffs" will return all differences *except* those in the primary key itself
where diffs.code = dt.code
union --plus the ones that are only in feed, but not in data
select * from feed_tbl ft where not exists(select code from data_tbl dt where dt.code = ft.code)
union --plus the ones that are only in data, but not in feed
select * from data_tbl dt where not exists(select code from feed_tbl ft where ft.code = dt.code)
I would use a minor variation in the second union:
where (ISNULL(feed_tbl.name, 'NONAME') <> ISNULL(data_tbl.name, 'NONAME')) OR
(ISNULL(data_tbl.status, 'NOSTATUS') <> ISNULL(feed_tbl.status, 'NOSTATUS')) OR
(ISNULL(data_tbl.update, '12/31/2039') <> ISNULL(feed_tbl.update, '12/31/2039'))
For reasons I have never understood, NULL does not equal NULL (at least in SQL Server).
You could also use a FULL OUTER JOIN and a CASE ... END statement on the diff_type column along with the aforementioned where clause in querying 2 tables with the same spec for the differences
That would probably achieve the same results, but in one query.