What's the best way to amalgamate the 2 queries below - sql

I wrote the query below as part of a larger query to create a table. As I'm new to SQL, this was done in a very step-by-step manner so that I could easily understand the individual steps in the query and what each part was doing.
However, I've now been tasked to make the below 2 parts of the query more efficient by joining them together, and this is where I'm struggling.
I feel like I should be creating a single table rather than 2 and that the single table should contain all of the columns/values that I require. However, I am not at all sure of the syntax required to make this happen or the order in which I need to re-write the query.
Is anyone able to offer any advice?
Many thanks
sys_type as (select nvl(dw_start_date,sysdate) date_updated, id, descr
from scd2.scd2_table_a
inner join year_month_period
on 1=1
WHERE batch_end_date BETWEEN dw_start_date and NVL(dw_end_date,sysdate)),
sys_type_2 as (select -1 as sys_typ_id,
'Unassigned' as sys_typ_desc,
sysdate as date_updated
from dual
union
select id as sys_typ_id, descr as sys_typ_desc, date_updated
from sys_type),

Assuming you are using Oracle database, the queries above seem fine. I don't think you can make them more efficient just by 'joining' them (joining defined very loosely here. Is there a performance issue?
I think you can get better results by tuning your first inline query 'sys_type'.
You have a cartesian product there. Do you need that? Why don't you put the condition in the where clause as the join clause?
Basically
sys_type as (select nvl(dw_start_date,sysdate) date_updated, id, descr
from scd2.scd2_table_a
inner join year_month_period
on (batch_end_date BETWEEN dw_start_date and NVL(dw_end_date,sysdate)))

Related

Long SQL subquery trouble

I just registered and want to ask.
I learn sql queries not so long time and I got a trouble when I decided to move a table to another database. A few articles were read about building long subqueries , but they didn't help me.
Everything works perfect before that my action.
I just moved the table and tried to rewrite the query while whole day.
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.Shool)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved table List_School to database [DB_1] and I can't to bend with it.
I can't write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about create a few temporary tables but it can influence on speed of execution.
Sql gurus , please invest some your time on me. Thank you in advance.
I will be happy for each hint, which you give me.
There appear to be a number of issues. You are comparing the user column to the topic_name column. An expected meaning of those column names would suggest you are not comparing the correct columns. But that is a guess.
In the final subquery you have an ansi join on table List_School but no join columns which means the join witk k2 is a cartesian product (aka cross join) which is not what you would want in most situations. Again a guess as no details of actual problem data or error messages was provided.

Better way to do a multi-join in this SQL Query?

I am trying to pull data from a table (a.table) to join to another table (b.table). For me to do that, I need to join a third table (c.table) to reference between tables Plan_Code and Policy_Riders. Please see the code below
USE [CDS]
GO
SELECT riders.ExpiryDt--
,riders.TerminationDt
,[ModalPremium]--
,plan_code
FROM a.table as riders
JOIN c.table as policy
ON policy.Policy_Num = riders.Policy_Num
JOIN b.table AS plan_code
on policy.Plan_Code_ID = plan_code.Plan_Code_ID
WHERE plan_code.Plan_Code LIKE '%EIUL3%'
OR plan_code.Plan_Code LIKE '%LBIUL%'
OR plan_code.Plan_Code LIKE '%MEIUL3%'
GO
For me to get the field name Plan_code from b.table to my output, I need to join first a.table to c.table, then c.table to b.table. My question is that is there a better way to approach this query to join better between the three tables? Any help would be appreciated. Thank you!
First off, use a derived table for the filters:
...
JOIN (SELECT columns
FROM b.table
WHERE Plan_Code LIKE '%EIUL3%'
OR Plan_Code LIKE '%LBIUL%'
OR Plan_Code LIKE '%MEIUL3%'
) AS plan_code ON policy.Plan_Code_ID = plan_code.Plan_Code_ID
This will generally make sure those filters are applied against the smallest set of data, instead of after all the tables are joined. Another option would be to make the above a temp table, then join to it. Same concept, just helping the optimizer work efficiently. In smaller queries you'll see no difference, but in larger ones (especially those like this, with many filters from a single table) it will be night and day.
Second, your filters specifically. Using a LIKE with front and back wildcards %ex% is not good. It won't be able to use indexes. Use one ex% or the other %ex if possible.
Other than that, your joins are fine and are the correct approach to getting columns from each table.

Creating view ,SQL Query performance

I am trying to create view, But select statement from this view is taking more than 15 secs.How can i make it faster. My query for the view is below.
create view Summary as
select distinct A.Process_date,A.SN,A.New,A.Processing,
COUNT(case when B.type='Sold' and A.status='Processing' then 1 end) as Sold,
COUNT(case when B.type='Repaired' and A.status='Processing' then 1 end) as Repaired,
COUNT(case when B.type='Returned' and A.status='Processing' then 1 end) as Returned
from
(select distinct M.Process_date,M.SN,max(P.enter_date) as enter_date,M.status,
COUNT(case when M.status='New' then 1 end) as New,
COUNT(case when M.status='Processing' and P.cn is null then 1 end) as Processing
from DB1.dbo.Item_details M
left outer join DB2.dbo.track_data P on M.SN=P.SN
group by M.Process_date,M.SN,M.status) A
left outer join DB2.dbo.track_data B on A.SN=B.SN
where A.enter_date=B.enter_date or A.enter_date is null
group by A.Process_date,A.New,A.Processing,A.SN
After this view..my select query is
select distinct process_date,sum(New),sum(Processing),sum(sold),sum(repaired),sum(returned) from Summary where month(process_date)=03 and year(process_date)=2011
Please suggest me on what changes to be made for the query to perform faster.
Thank you
ARB
It is hard to give advices without seeing the actual data and the structure of the tables. I would rewrite the query keeping in mind these principles:
Use inner join instead of outer join if possible.
Get rid of case operator inside COUNT function. Build a query so you use conditions in WHERE section not in COUNT.
Try to not use aggregated values in GROUP BY. Currently you use aggregated values New and Processing for grouping. Use GROUP BY by existing table values if possible.
If the query gets too complicated, break it into smaller queries and combine results in the final query. Writing a store procedure may help in this case.
I hope this helps.
For tuning a database query, I shall add few items additional to what #Davyd has already listed:
Look at the tables and indexing on those tables. Putting the right index and avoiding the wrong ones always speed up the query.
Is there anything in the where condition that is not part of any index? At times we put index on a column and in the query we use a cast or convert on the column. So the underlying index is not effective. You may consider setting the index on the cast/convert of the column.
Look at the normal form conformity or over normalisation. 3.
Good luck.
If your are using Postgresql, I suggest you use a tool like "http://explain.depesz.com/" in order to see more clearly what part of your query is slow. Depending on what you get, you could either optimize your indexes, or rewrite part of your query. If your are using another database, I'm sure a similar tool exists.
If none of these ideas help, the final solution would be to create a "materialized query". There are plenty of infos on the web regarding this.
Good luck.

Avoiding a problematic nested query in MySQL

I have this SQL query which due to my own lack of knowledge and problem with mysql handling nested queries, is really slow to process. The query is...
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE Printers.PrinterGroup
IN (
SELECT DISTINCT Printers.PrinterGroup
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE PrintJobs.UserName='<username/>'
);
I would like to avoid splitting this into two queries and inserting the values of the subquery into the main query progamatically.
This is probably not exactly what you are looking for however, i will contribute my 2 cents. First off you should show us your schema and exactly what you are trying to accomplish with that query. However from the looks of it you are not using numeric IDs in the table and are instead using varchar fields to join tables, this is not really a good idea performance wise. Also i am not sure why you are doing:
(select PrinterName, UserName
from PrintJobs) AS Table1
instead of just joining on PrintJobs? Similar stuff for this one:
(select
PrinterName,
PrinterGroup
from Printers) as Table1
Maybe i am just not seeing it right. I would recommend that you simplify the query as much as possible and try it. Also tell us what exactly you are hoping to accomplish with the query and give us some schema to work with.
Removed the bad query from the answer.
This query you have is pretty messed up, not sure if this will handle everything you need but simplifying like this kills all the nested queries and it way faster. You can also use the EXPLAIN command to know how mysql will fetch your query.
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
AND Printers.Username = '<username/>'
;

Is there a better way to sort this query?

We generate a lot of SQL procedurally and SQL Server is killing us. Because of some issues documented elsewhere we basically do SELECT TOP 2 ** 32 instead of TOP 100 PERCENT.
Note: we must use the subqueries.
Here's our query:
SELECT * FROM (
SELECT [me].*, ROW_NUMBER() OVER( ORDER BY (SELECT(1)) )
AS rno__row__index FROM (
SELECT [me].[id], [me].[status] FROM (
SELECT TOP 4294967296 [me].[id], [me].[status] FROM
[PurchaseOrders] [me]
LEFT JOIN [POLineItems] [line_items]
ON [line_items].[id] = [me].[id]
WHERE ( [line_items].[part_id] = ? )
ORDER BY [me].[id] ASC
) [me]
) [me]
) rno_subq
WHERE rno__row__index BETWEEN 1 AND 25
Are there better ways to do this that anyone can see?
UPDATE: here is some clarification on the whole subquery issue:
The key word of my question is "procedurally". I need the ability to reliably encapsulate resultsets so that they can be stacked together like building blocks. For example I want to get the first 10 cds ordered by the name of the artist who produced them and also get the related artist for each cd.. What I do is assemble a monolithic subselect representing the cds ordered by the joined artist names, then apply a limit to it, and then join the nested subselects to the artist table and only then execute the resulting query. The isolation is necessary because the code that requests the ordered cds is unrelated and oblivious to the code selecting the top 10 cds which in turn is unrelated and oblivious to the code that requests the related artists.
Now you may say that I could move the inner ORDER BY into the OVER() clause, but then I break the encapsulation, as I would have to SELECT the columns of the joined table, so I can order by them later. An additional problem would be the merging of two tables under one alias; if I have identically named columns in both tables, the select me.* would stop right there with an ambiguous column name error.
I am willing to sacrifice a bit of the optimizer performance, but the 2**32 seems like too much of a hack to me. So I am looking for middle ground.
If you want top rows by me.id, just ask for that in the ROW_NUMBER's ORDER BY. Don't chase your tail around subqueries and TOP.
If you have a WHERE clause on a joined table field, you can have an outer JOIN. All the outer fields will be NULL and filtered out by the WHERE, so is effectively an inner join.
.
WITH cteRowNumbered AS (
SELECT [me].id, [me].status
ROW_NUMBER() OVER (ORDER BY me.id ASC) AS rno__row__index
FROM [PurchaseOrders] [me]
JOIN [POLineItems] [line_items] ON [line_items].[id] = [me].[id]
WHERE [line_items].[part_id] = ?)
SELECT me.id, me.status
FROM cteRowNumbered
WHERE rno__row__index BETWEEN 1 and 25
I use CTEs instead of subqueries just because I find them more readable.
Use:
SELECT x.*
FROM (SELECT po.id,
po.status,
ROW_NUMBER() OVER( ORDER BY po.id) AS rno__row__index
FROM [PurchaseOrders] po
JOIN [POLineItems] li ON li.id = po.id
WHERE li.pat_id = ?) x
WHERE x.rno__row__index BETWEEN 1 AND 25
ORDER BY po.id ASC
Unless you've omitted details in order to simplify the example, there's no need for all your subqueries in what you provided.
Kudos to the only person who saw through naysaying and actually tried the query on a large table we do not have access to. To all the rest saying this simply will not work (will return random rows) - we know what the manual says, and we know it is a hack - this is why we ask the question in the first place. However outright dismissing a query without even trying it is rather shallow. Can someone provide us with a real example (with preceeding CREATE/INSERT statements) demonstrating the above query malfunctioning?
Your update makes things much clearer. I think that the approach which you're using is seriously flawed. While it's nice to be able to have encapsulated, reusable code in your applications, front-end applications are a much different animal than a database. They typically deal with small structures and small, discrete process that run against those structures. Databases on the other hand often deal with tables that are measured in the millions of rows and sometimes more than that. Using the same methodologies will often result in code that simply performs so badly as to be unusable. Even if it works now, it's very likely that it won't scale and will cause major problems down the road.
Best of luck to you, but I don't think that this approach will end well in all but the smallest of databases.