How can I improve this SQL query?

I'm checking for the existence of a row in in_fmd. The ISBN I look up can be the ISBN parameter, or another ISBN from a cross-number table that may or may not have a matching row.
select count(*)
from in_fmd i
where (description='GN')
and ( i.isbn in
(
select bwi_isbn from bw_isbn where orig_isbn = ?
union all
select cast(? as varchar) as isbn
)
)
I don't actually care about the count of the rows, but rather mere existence of at least one row.
This used to be three separate queries, and I squashed it into one, but I think there's room for more improvement. It's PostgreSQL 8.1, if it matters.

Why bother with the UNION ALL? An OR does the same job:
select count(*)
from in_fmd i
where (description='GN')
and (
i.isbn in (
select bwi_isbn from bw_isbn where orig_isbn = ?
)
or i.isbn = cast(? as varchar)
)
I would probably use a LEFT JOIN-style query instead of the IN, but that's more personal preference:
select count(*)
from in_fmd i
left join bw_isbn
on bw_isbn.bwi_isbn = i.isbn
and bw_isbn.orig_isbn = ?
where (i.description='GN')
and (
bw_isbn.bwi_isbn is not null
or i.isbn = cast(? as varchar)
)
The inversion discussed over IM:
SELECT SUM(ct)
FROM (
select count(*) as ct
from in_fmd i
inner join bw_isbn
on bw_isbn.bwi_isbn = i.isbn
and bw_isbn.orig_isbn = ?
and i.isbn <> cast(? as varchar)
and i.description = 'GN'
UNION ALL -- plain UNION would merge the two counts into one row whenever they are equal
select count(*) as ct
from in_fmd i
where i.isbn = cast(? as varchar)
and i.description = 'GN'
) AS x

I don't actually care about the count of the rows, but rather mere existence of at least one row.
Then how about querying SELECT ... LIMIT 1 and checking in the calling program whether you get one result row or not?

Apart from what other posters have noted, just changing
select count(*)
to
exists(..)
would improve things quite a bit.
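As a rough illustration of the EXISTS suggestion, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and column names come from the question; the sample rows and the helper name gn_row_exists are made up for the example.

```python
import sqlite3

# Hypothetical miniature data set; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE in_fmd (isbn TEXT, description TEXT);
    CREATE TABLE bw_isbn (orig_isbn TEXT, bwi_isbn TEXT);
    INSERT INTO in_fmd VALUES ('111', 'GN'), ('222', 'XX');
    INSERT INTO bw_isbn VALUES ('999', '111');
""")

# EXISTS lets the database stop at the first matching row instead of counting them all.
EXISTS_SQL = """
    SELECT EXISTS (
        SELECT 1
        FROM in_fmd i
        WHERE i.description = 'GN'
          AND (i.isbn = ?
               OR i.isbn IN (SELECT bwi_isbn FROM bw_isbn WHERE orig_isbn = ?))
    )
"""

def gn_row_exists(isbn):
    """Return True if a 'GN' row exists for the ISBN or for one of its cross-references."""
    (found,) = conn.execute(EXISTS_SQL, (isbn, isbn)).fetchone()
    return bool(found)
```

The ISBN is bound twice because the query has two placeholders, just like the original. Whether this is actually faster on PostgreSQL 8.1 depends on the plan, so it is worth comparing with EXPLAIN ANALYZE.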

select count(*)
from in_fmd i
where description = 'GN'
and exists (select 1
from bw_isbn
where bw_isbn.bwi_isbn = i.isbn)

Related

Get the latest value for each couple (name,address)

I have the following table, for which I need to get the latest values for each (name, address) group.
This means I need max(time) for each (name, address) couple, and the last_read for that time value.
How can I do that in one select query?
I think the query below should work for you; you need to use GROUP BY and MAX:
select max([time]),
[name],
[address]
from yourTable
group by [name],
[address]
EDIT
You can try the other answers, or the version below, which uses a temp table:
select max([time]) as last_read_time,
[name],
[address]
into #temp_table
from yourTable
group by [name],
[address]
select yt.[name],
yt.[address],
yt.last_read
from yourTable as yt
inner join #temp_table t
on yt.[name] = t.[name]
and yt.[address] = t.[address]
and yt.[time] = t.last_read_time
IF OBJECT_ID('tempdb..#temp_table', 'U') IS NOT NULL
DROP TABLE #temp_table
In most databases, a correlated subquery is a good approach:
select t.*
from t
where t.time = (select max(t2.time)
from t t2
where t2.name = t.name and t2.address = t.address
);
In particular, this can take advantage of an index on (name, address, time).
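To make the correlated-subquery approach concrete, here is a small sketch run against an in-memory SQLite database from Python. The sample rows and the index name are invented for illustration; only the column names (name, address, time, last_read) come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, address TEXT, time INTEGER, last_read REAL);
    -- The index mentioned above; the index name is arbitrary.
    CREATE INDEX t_name_address_time ON t (name, address, time);
    INSERT INTO t VALUES
        ('a', 'x', 1, 10.0),
        ('a', 'x', 2, 11.0),
        ('a', 'y', 1, 20.0),
        ('b', 'x', 5, 30.0),
        ('b', 'x', 3, 29.0);
""")

# Keep only the rows whose time equals the maximum time of their (name, address) group.
latest = conn.execute("""
    SELECT t.name, t.address, t.time, t.last_read
    FROM t
    WHERE t.time = (SELECT MAX(t2.time)
                    FROM t t2
                    WHERE t2.name = t.name AND t2.address = t.address)
    ORDER BY t.name, t.address
""").fetchall()
```

With this sample data, latest holds one row per (name, address) pair. If two rows in a group share the same maximum time, both are returned, so a tie-breaking column may be needed.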
You can use row_number() if your DBMS supports it:
select * from
(select *, row_number() over(partition by name, address order by time desc) rn
from yourTable
) t where t.rn = 1
None of the above solutions worked.
I finally found the following query that works:
select
l2.MAX t,
l.name,
l.address,
l.last_read
from links_data l
inner join (select name,
address,
max(time) as "MAX"
from links_data
group by name, address) l2
on l.name = l2.name
and l.address = l2.address
and l.time = l2.MAX;
Thanks everyone.

How to optimize this UNION SQL Query?

I have a query which looks like this:
SELECT DISTINCT * FROM (
SELECT FundId AS Id,
PeriodYearMonth
FROM [Fund.Period] F
INNER JOIN (
SELECT * FROM (
SELECT FundId as Id,
MIN(PeriodYearMonth) AS MinPeriodYearMonth
FROM (
SELECT FundId,
PeriodYearMonth,
PublishedOn
FROM [Fund.Period] FP
UNION ALL --Changed to UNION ALL as it is more efficient and we wont ever need a UNION as the result set would never match
SELECT FundId,
MAX(PeriodYearMonth) + 1,
NULL
FROM [Fund.Period]
GROUP BY FundId
) FP WHERE PublishedOn IS NULL GROUP BY FundId
) MFP
) FP ON F.FundId = FP.Id AND (F.PeriodYearMonth = FP.MinPeriodYearMonth OR (f.PeriodYearMonth +1) = FP.MinPeriodYearMonth)
) FP
If possible, I would like to remove the UNION ALL. Does anyone know how this can be optimized?
Removed the UNION ALL:
SELECT DISTINCT * FROM (
SELECT FundId AS Id,
PeriodYearMonth
FROM [Fund.Period] F
INNER JOIN (
SELECT * FROM (
SELECT FundId as Id,
MIN(PeriodYearMonth) AS MinPeriodYearMonth,MAX(PeriodYearMonth) + 1 as PeriodYearMonth
FROM (
SELECT FundId,
PeriodYearMonth,
PublishedOn
FROM [Fund.Period] FP
) FP WHERE PublishedOn IS NULL
GROUP BY FundId
) MFP
) FP ON F.FundId = FP.Id AND (F.PeriodYearMonth = FP.MinPeriodYearMonth OR (f.PeriodYearMonth +1) = FP.MinPeriodYearMonth)
) FP

Why is Oracle REPLACE function not working for this string?

We have a pattern we use all the time and this is usually pretty straightforward.
sortOrder IN VARCHAR2 := 'Title'
query VARCHAR2(32767) := q'[
SELECT
Columns
FROM tables
ORDER BY {sortOrder}
]';
query := REPLACE(query, '{sortOrder}', sortOrder);
But for this string it is not working:
WITH appl_List
AS
(
SELECT DISTINCT
appls.admin_phs_ORG_code || TO_CHAR(appls.serial_num, 'FM000000') AS core_proj_number,
appls.Appl_ID
FROM TABLE(:portfolioTable) appls
),
g1SupportingProjCount AS
(
SELECT
gen1grants.silverchair_id,
COUNT(DISTINCT al.Appl_ID) AS ApplCount
FROM
appl_List al
JOIN cg_cited_reference_gen1_grant gen1grants
ON al.core_proj_number = gen1grants.ic_serial_num
JOIN cg_cited_reference_gen1 gen1refs
ON gen1grants.silverchair_id = gen1refs.silverchair_id
GROUP BY gen1grants.Silverchair_id
),
g1SupportedPubCount AS
(
SELECT
gen1grants.silverchair_id,
COUNT(DISTINCT gen1refs.gen1_wos_uid) AS PubCount
FROM
appl_List al
JOIN cg_cited_reference_gen1_grant gen1grants
ON al.core_proj_number = gen1grants.ic_serial_num
JOIN cg_cited_reference_gen1 gen1refs
ON gen1grants.silverchair_id = gen1refs.silverchair_id
GROUP BY gen1grants.Silverchair_id
),
g2SupportingProjCount AS
(
SELECT
gen2grants.silverchair_id,
COUNT(DISTINCT al.Appl_ID) AS ApplCount
FROM
appl_List al
JOIN cg_cited_reference_gen2_grant gen2grants
ON al.core_proj_number = gen2grants.ic_serial_num
JOIN cg_cited_reference_gen2 gen2refs
ON gen2grants.silverchair_id = gen2refs.silverchair_id
GROUP BY gen2grants.Silverchair_id
),
g2SupportedPubCount AS
(
SELECT
gen2grants.silverchair_id,
COUNT(DISTINCT gen2refs.gen2_wos_uid) AS PubCount
FROM
appl_List al
JOIN cg_cited_reference_gen2_grant gen2grants
ON al.core_proj_number = gen2grants.ic_serial_num
JOIN cg_cited_reference_gen2 gen2refs
ON gen2grants.silverchair_id = gen2refs.silverchair_id
GROUP BY gen2grants.Silverchair_id
),
portfolio_cg_ids AS
(
SELECT DISTINCT md.silverchair_id
FROM
(
SELECT silverchair_id
FROM cg_cited_reference_gen1_grant gen1Grants
JOIN Appl_List appls
ON appls.core_proj_number = gen1Grants.ic_serial_num
UNION
SELECT silverchair_id
FROM cg_cited_reference_gen2_grant gen2Grants
JOIN Appl_List appls
ON appls.core_proj_number = gen2Grants.ic_serial_num
) grant_sc_ids
JOIN cg_metadata md
ON grant_sc_ids.silverchair_id = md.silverchair_id
)
SELECT
silverchairId,
TITLE,
PMID,
PMCID,
publication_year as year,
referenceCount1Gen,
supportingProjectCount1Gen,
supportedPublicationCount1Gen,
referenceCount2Gen,
supportingProjectCount2Gen,
supportedPublicationCount2Gen,
COUNT(1) OVER() as TotalCount
FROM
(
SELECT
md.SILVERCHAIR_ID silverchairId,
md.TITLE,
md.PMID,
md.PMCID ,
md.publication_year as year,
g1RefCounts.referenceCount1Gen as referenceCount1Gen,
g1SupportingProjCount.ApplCount as supportingProjectCount1Gen,
g1SupportedPubCount.PubCount as supportedPublicationCount1Gen,
g2RefCounts.referenceCount2Gen as referenceCount2Gen,
g2SupportingProjCount.ApplCount as supportingProjectCount2Gen,
g2SupportedPubCount.PubCount as supportedPublicationCount2Gen,
--COUNT(1) OVER() as TotalCount
FROM cg_metadata md
-- BEGIN datascope to current portfolio
JOIN portfolio_cg_ids
ON portfolio_cg_ids.silverchair_id = md.silverchair_id
-- END datascope to current portfolio
LEFT JOIN g1SupportingProjCount
ON g1SupportingProjCount.Silverchair_id = md.silverchair_id
LEFT JOIN g2SupportingProjCount
ON g2SupportingProjCount.Silverchair_id = md.silverchair_id
LEFT JOIN g1SupportedPubCount
ON g1SupportedPubCount.Silverchair_id = md.silverchair_id
LEFT JOIN g2SupportedPubCount
ON g2SupportedPubCount.Silverchair_id = md.silverchair_id
OUTER APPLY
(
Select Count(*) as referenceCount1Gen
FROM cg_cited_reference_gen1 g1Refs
WHERE g1Refs.silverchair_id = md.silverchair_id
) g1RefCounts
OUTER APPLY
(
Select Count(*) as referenceCount2Gen
FROM cg_cited_reference_gen2 g2Refs
WHERE g2Refs.silverchair_id = md.silverchair_id
) g2RefCounts
) results
ORDER BY {sortOrder}
Are there cases where some kind of special char in the string can keep this from working?
I'm kind of perplexed. I've been using this pattern for like 3 years and I've never had this not work.
What could be breaking this?
The query has 4000+ characters.
The text is probably being truncated somewhere down the line.
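To illustrate the truncation hypothesis (it is only a hypothesis), here is a short Python sketch of the same placeholder pattern. The 4000-character cut-off is an assumption taken from the remark above, and apply_sort_order is a made-up helper, not anything from Oracle.

```python
PLACEHOLDER = "{sortOrder}"
LIMIT = 4000  # assumed cut-off, taken from the "4000+ characters" remark above

def apply_sort_order(query, sort_order):
    """Substitute the placeholder, failing loudly if it is not in the text."""
    if PLACEHOLDER not in query:
        raise ValueError("placeholder missing; was the query text truncated?")
    return query.replace(PLACEHOLDER, sort_order)

# A stand-in for the long query: padding followed by the ORDER BY clause at the very end.
long_query = "SELECT 1 FROM dual " + "-- padding\n" * 500 + "ORDER BY " + PLACEHOLDER

# If something upstream cuts the text at LIMIT characters, the placeholder is lost,
# and a plain replace silently becomes a no-op.
truncated = long_query[:LIMIT]
```

Because the placeholder sits at the end of the query, it is the first thing to disappear when the text is cut. Comparing the length of the string before and after each step in the PL/SQL code should show where, if anywhere, this happens.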

How to optimize the sql query

This query takes dynamic input in place of cg.ownerid IN (294777, 228649, 188464). As the number of values in the IN condition grows, the query takes too long to execute. Please suggest a way to optimize it.
For example, the query below takes 4 seconds; if I reduce the list to just IN (188464), it takes only 1 second.
SELECT *
FROM
(SELECT *,
Row_number() OVER(
ORDER BY datecreated DESC) AS rownum
FROM
(SELECT DISTINCT c.itemid,
(CASE WHEN (Isnull(c.password, '') <> '') THEN 1 ELSE 0 END) AS password,
c.title,
c.encoderid,
c.type,
(CASE WHEN c.author = 'education' THEN 'Discovery' ELSE c.type END) AS TYPE,
c.publisher,
c.description,
c.author,
c.duration,
c.copyright,
c.rating,
c.userid,
Stuff(
(SELECT DISTINCT ' ' + NAME AS [text()]
FROM firsttable SUB
LEFT JOIN secondtable AS rgc ON thirdtable = rgc.id
WHERE SUB.itemid = c.itemid
FOR xml path('')), 1, 1, '')AS [Sub_Categories]
FROM fourthtable AS cg
LEFT JOIN item AS c ON c.itemid = cg.itemid
WHERE Isnull(title, '') <> ''
AND c.active = '1'
AND c.systemid = '20'
AND cg.ownerid IN (294777,
228649,
188464)) AS a) AS b
WHERE rownum BETWEEN 1 AND 32
ORDER BY datecreated DESC
As I don't have further information, I would just suggest a first change to your WHERE clause: the conditions should be moved into subqueries, since they filter on columns of the tables you LEFT JOIN.
SELECT *
FROM(
SELECT *,
Row_number() OVER(
ORDER BY datecreated DESC) AS rownum
FROM
(SELECT DISTINCT c.itemid,
(CASE WHEN (Isnull(c.password, '') <> '') THEN 1 ELSE 0 END) AS password,
c.title,
c.encoderid,
c.type,
(CASE WHEN c.author = 'education' THEN 'Discovery' ELSE c.type END) AS TYPE,
c.publisher,
c.description,
c.author,
c.duration,
c.copyright,
c.rating,
c.userid,
Stuff(
(
SELECT DISTINCT ' ' + NAME AS [text()]
FROM firsttable SUB
LEFT JOIN secondtable AS rgc ON thirdtable = rgc.id
WHERE SUB.itemid = c.itemid
FOR xml path('')
), 1, 1, ''
) AS [Sub_Categories]
FROM (
SELECT cg.itemid
FROM fourthtable as cg
WHERE cg.ownerid IN (294777,228649, 188464)
) AS cg
LEFT JOIN (
SELECT DISTINCT c.itemid, c.password, c.title, c.encoderid, c.type, c.publisher, c.description, c.author, c.duration, c.copyright, c.rating, c.userid
FROM item as c
WHERE Isnull(c.title, '') <> ''
AND c.active = '1'
AND c.systemid = '20'
) AS c
ON c.itemid = cg.itemid
) AS a
) AS b
WHERE rownum BETWEEN 1 AND 32
ORDER BY datecreated DESC
I'm not quite sure everything is connected correctly, though: you're missing some aliases, which makes it hard for me to work through your query. But I think you'll get my idea. :-)
With this little information it's impossible to give any specific ideas, but the normal general things apply:
- Turn on statistics io, check what's causing most of the logical I/O, and try to solve that.
- Look at the actual plan and check if there's something that doesn't look OK, for example:
  - Clustered index / table scans (a new index could solve this)
  - Key lookups with a huge number of rows (adding more columns to the index could solve this, either as normal or included fields)
  - Spools (a new index could solve this)
  - A big difference between the estimated and actual number of rows (10x, 100x and so on)
To give any better hints, you should really include the actual plan and the table / index structure, at least for the essential parts, and say what "too much time" means (seconds, minutes, hours?).

Why are the results of the two queries different?

The first query:
SELECT u.id , prop1.id
FROM ( SELECT '9fbc6e9b59504c08ac643752c1e2d033' AS id ,
'|6813dbbfec6241a19b8d2316d2cb2a65,1|' AS customprop
UNION
SELECT 'f2271c45682f45fc84527c4afff0ab16' AS id ,
'|6813dbbfec6241a19b8d2316d2cb2a65,2|' AS customprop
) u
INNER JOIN ( SELECT ROW_NUMBER() OVER ( ORDER BY a.Id ) id ,
A.Id propId ,
B.NAME
FROM ( SELECT '6813dbbfec6241a19b8d2316d2cb2a65' AS id ,
CONVERT(XML, '<v>1,职业资格1</v><v>2,职业资格2</v>') AS value
) A
OUTER APPLY ( SELECT Name = N.v.value('.',
'nvarchar(Max)')
FROM A.[VALUE].nodes('/v') N ( v )
) B
) prop1 ON CHARINDEX('|' + prop1.propid + ','
+ CONVERT(NVARCHAR(10), prop1.id)
+ '|', u.customprop) > 0
GROUP BY u.id ,
prop1.id
The second query:
SELECT u.id ,prop1.id, count(*)
FROM ( SELECT '9fbc6e9b59504c08ac643752c1e2d033' AS id ,
'|6813dbbfec6241a19b8d2316d2cb2a65,1|' AS customprop
UNION
SELECT 'f2271c45682f45fc84527c4afff0ab16' AS id ,
'|6813dbbfec6241a19b8d2316d2cb2a65,2|' AS customprop
) u
INNER JOIN ( SELECT ROW_NUMBER() OVER ( ORDER BY a.Id ) id ,
A.Id propId ,
B.NAME
FROM ( SELECT '6813dbbfec6241a19b8d2316d2cb2a65' AS id ,
CONVERT(XML, '<v>1,职业资格1</v><v>2,职业资格2</v>') AS value
) A
OUTER APPLY ( SELECT Name = N.v.value('.',
'nvarchar(Max)')
FROM A.[VALUE].nodes('/v') N ( v )
) B
) prop1 ON CHARINDEX('|' + prop1.propid + ','
+ CONVERT(NVARCHAR(10), prop1.id)
+ '|', u.customprop) > 0
GROUP BY u.id ,
prop1.id
The SQL can be executed directly on SQL Server 2005.
The first query produces one row and the second query produces two rows.
I think both queries should produce two rows.
I have thought about this for three days and I really want to know why.
I'm Chinese and my English is poor. I hope you can understand my description.
Tough question, but the problem is with this line:
INNER JOIN ( SELECT ROW_NUMBER() OVER ( ORDER BY a.Id ) id ,
The ORDER BY is ambiguous and consequently, if it is executed multiple times (which it can be because of the INNER JOIN it is contained in), it may not always return the same ordering/assignment. This can cause a later join condition to match only one record instead of two, which is what happens in the query plan being used for the version without the count(*) column.
To fix this, you just need to add something to make the ordering assignment unique, like this:
INNER JOIN ( SELECT ROW_NUMBER() OVER ( ORDER BY a.Id, B.Name ASC ) id ,
Try it like this, it should work.
Your problem is with the ORDER BY clause of the ROW_NUMBER: since a.Id is not unique, the outcome is unpredictable. Make the ordering unique and your problem will go away. You can use something like:
...SELECT ROW_NUMBER() OVER ( ORDER BY newid() ) id...
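The effect of ties in the ORDER BY can be imitated outside the database. The sketch below is plain Python, not SQL Server: row_numbers is a made-up helper that mimics ROW_NUMBER() with a stable sort, and the two sample rows are shortened stand-ins for the data in the question.

```python
def row_numbers(rows, key):
    """Mimic ROW_NUMBER() OVER (ORDER BY key): number rows 1..n in sorted order."""
    ordered = sorted(rows, key=key)
    return {row: number for number, row in enumerate(ordered, start=1)}

# Two rows sharing the same id, as in the question (values shortened and hypothetical).
scan_a = [("6813", "1,first"), ("6813", "2,second")]
scan_b = list(reversed(scan_a))  # same rows, delivered in a different physical order

# Ordering by id alone: the tie is broken by arrival order, so the numbering differs.
by_id_a = row_numbers(scan_a, key=lambda r: r[0])
by_id_b = row_numbers(scan_b, key=lambda r: r[0])

# Ordering by (id, name): the key is unique, so arrival order no longer matters.
by_both_a = row_numbers(scan_a, key=lambda r: (r[0], r[1]))
by_both_b = row_numbers(scan_b, key=lambda r: (r[0], r[1]))
```

Here by_id_a and by_id_b disagree about which row gets number 1, while by_both_a and by_both_b agree. This is the same reason adding B.Name to the ORDER BY makes the join condition stable.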