I'm new to SQL Server Reporting Services, and was wondering the best way to do the following:
Query to get a list of popular IDs
Subquery on each item to get properties from another table
Ideally, the final report columns would look like this:
[ID] [property1] [property2] [SELECT COUNT(*)
FROM AnotherTable
WHERE ForeignID=ID]
There may be ways to construct a giant SQL query to do this all in one go, but I'd prefer to compartmentalize it. Is the recommended approach to write a VB function to perform the subquery for each row? Thanks for any help.
I would recommend using a SubReport. You would place the SubReport in a table cell.
Depending on how you want the output to look, a subreport could do, or you could group on ID, property1, property2 and show the items from your other table as detail items (assuming you want to show more than just count).
Something like
select t1.ID, t1.property1, t1.property2, t2.somecol, t2.someothercol
from table t1 left join anothertable t2 on t1.ID = t2.ID
#Carlton Jenke I think you will find an outer join a better performer than the correlated subquery in the example you gave. Remember that the subquery needs to be run for each row.
Simplest method is this:
select *,
(select count(*) from tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from tbl1 t1
here is a workable version (using table variables):
declare #tbl1 table
(
tbl1ID int,
prop1 varchar(1),
prop2 varchar(2)
)
declare #tbl2 table
(
tbl2ID int,
tbl1ID int
)
select *,
(select count(*) from #tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from #tbl1 t1
Obviously this is just a raw example - standard rules apply like don't select *, etc ...
UPDATE from Aug 21 '08 at 21:27:
#AlexCuse - Yes, totally agree on the performance.
I started to write it with the outer join, but then saw in his sample output the count and thought that was what he wanted, and the count would not return correctly if the tables are outer joined. Not to mention that joins can cause your records to be multiplied (1 entry from tbl1 that matches 2 entries in tbl2 = 2 returns) which can be unintended.
So I guess it really boils down to the specifics on what your query needs to return.
UPDATE from Aug 21 '08 at 22:07:
To answer the other parts of your question - is a VB function the way to go? No. Absolutely not. Not for something this simple.
Functions are very bad on performance, each row in the return set executes the function.
If you want to "compartmentalize" the different parts of the query you have to approach it more like a stored procedure. Build a temp table, do part of the query and insert the results into the table, then do any further queries you need and update the original temp table (or insert into more temp tables).
Related
I am quite a newbie with SQL queries but I need to modify a column of a table relatively to the column of another table. For now I have the following query working:
UPDATE table1
SET date1=(
SELECT last_day(max(date2))+1
FROM table2
WHERE id=123
)
WHERE id=123
AND date1=to_date('31/12/9999', 'dd/mm/yyyy');
The problem with this structure is that, I suppose, the SELECT query will be executed for every line of the table1. So I tried to create another query but this one has a syntax error somewhere after the FROM keyword:
UPDATE t1
SET t1.date1=last_day(max(t2.date2))+1
FROM table1 t1
INNER JOIN table2 t2
ON t1.id=t2.id
WHERE t1.id=123
AND t1.date1=to_date('31/12/9999', 'dd/mm/yyyy');
AND besides that I don't even know if this one is faster than the first one...
Do you have any idea how I can handle this issue?
Thanks a lot!
Kind regards,
Julien
The first code you wrote is fine. It won't be executed for every line of the table1 as you fear. It will do the following:
it will run the subquery to find a value you want to use in your UPDATE statement, searching through table2, but as you have stated the exact id from
the table, it should be as fast as possible, as long as you have
created an index on that (I guess a primary key) column
it will run the outer query, finding the single row you want to update. As before, it should be as fast as possible as you have stated the exact id, as long as there is an index on that column
To summarize, If those ID's are unique, both your subquery and your query should return only one row and it should execute as fast as possible. If you think that execution is not fast enough (at least that it takes longer than the amount of data would justify) check if those columns have unique values and if they have unique indexes on them.
In fact, it would be best to add those indexes regardless of this problem, if they do not exist and if these columns have unique values, as it would drastically improve all of the performances on these tables that search through these id columns.
Please try to use MERGE
MERGE INTO (
SELECT id,
date1
FROM table1
WHERE date1 = to_date('31/12/9999', 'dd/mm/yyyy')
AND id = 123
) t1
USING (
SELECT id,
last_day(max(date2))+1 max_date
FROM table2
WHERE id=123
GROUP BY id
) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET t1.date1 = t2.max_date
;
I have five results to retrieve from a table and I want to write a store procedure that will return all desired rows.
I can write the query like that temporarily:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I supposed I need to receive a list of Ids to split, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TALBE2)
This translates into what is your attempt. And judging by your attempt, this might be the simplest version for you to understand. Although, in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.
I have the following tables
nid timestamp title
82 1245157883 Home
61 1245100302 Minutes
132 1245097268 Sample Form
95 1245096985 Goals & Objectives
99 1245096952 Members
AND
pid src dst language
70 node/82 department/34-section-2
45 node/61/feed department/22-section-2/feed
26 node/15 department/department1/15-department1
303 node/101 department/101-section-4
These are fragments of the tables, and is missing the rest of the data (they are both quite large), but I am trying to join the dst column from the second table into the first one. They should match up on their "nid", but the second table has node/[nid] which makes this more complicated. I also want to ignore the ones that end in "feed" since they are not needed for what I am doing.
Much thanks
EDIT: I feel bad for not mentioning this, but the first table is an sql result from
select nid, MAX(timestamp) as timestamp, title from node_revisions group by nid ORDER BY timestamp DESC LIMIT 0,5
The second table has the name "url_alias"
try
select * from table1 inner join table2 on src=concat('node/',nid)
Edit
edited to reflect change in OP
select `nid`, MAX(`timestamp`) as `timestamp`, `title` from `node_revisions` inner join `url_alias` on `src`=concat('node/',`nid`) group by `nid` ORDER BY `timestamp` DESC LIMIT 0,5
I don't know what database you are using. However, I suggest you write a parsing function that returns the nid from that column. Then, you can have this kind of query (assuming GET_NID is the function you defined):
SELECT * from T1, T2
WHERE T1.nid = GET_NID( T2.node)
You have a few options.
write a function that converts src to an nid and join on t1.nid = f(t2.src) -- you didn't say what DBMS you use, but most have a way to do that. It will be slow, but that depends on how big the tables are.
Similar to that, make a view that has a computed field using that function -- same speed, but might be easier to understand.
Create a new nid field in t2 and use the function to populate it. Make insert and update triggers to keep it up to date, then join on that. This is better if you query this frequently.
Convert t2 so that it has a nid field and compute the src from that and another field that is a template that the nid needs to be inserted into.
I'd pull the node id in the second table into a separate column. Otherwise any attempt to join the two tables will result in a table scan with some processing on the src field (I assume you meant the src field and not the dst field) and performance will be problematic.
SELECT *
FROM (SELECT *, 'node/' + nid AS src FROM table1) t1
INNER JOIN table2 t2
ON t1.src = t2.src
You haven't specified with DBMS are you using. Most engines support the SQL-99 standard SIMILAR TO clause which is using regular expression for matching. Some engines also implement this, but use some other keywords instead of SIMILAR TO.
FirebirdSQL:
http://wiki.firebirdsql.org/wiki/index.php?page=SIMILAR+TO
PostgreSQL:
http://www.network-theory.co.uk/docs/postgresql/vol1/SIMILARTORegularExpressions.html
MySQL:
http://dev.mysql.com/doc/refman/5.0/en/regexp.html
Depending on the scenario you want to this for (if for example you are regularly going to be performing this JOIN and your 2nd table is rather large) you may want to look into a Materialized View.
Write a function that performs all the logic to extract the nid into a separate column. Aside from initial m-view creation, the function will only need to run when the basetable changes (insert, update, delete) compared to running the function against every row each time you query.
This allows a fairly simple join to the materialized view with standard benefits of tables such as Indexing.
NB: looks like I was beaten to it while writing :)
I got a query with five joins on some rather large tables (largest table is 10 mil. records), and I want to know if rows exists. So far I've done this to check if rows exists:
SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...
Using this query, in a stored procedure takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?
I got indexes on the fields that I'm joining on and the fields in the WHERE clause.
Any ideas?
switch to EXISTS predicate. In general I have found it to be faster than selecting top 1 etc.
So you could write like this IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 .. do your stuff
Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).
In MSSQL, you can use see a diagram of the execution path of any query you submit.
In Oracle and MySQL you can use the EXPLAIN keyword to get details about how the query is working.
But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.
4 options
Try COUNT(*) in place of TOP 1 tbl.id
An index per column may not be good enough: you may need to use composite indexes
Are you on SQL Server 2005? If som, you can find missing indexes. Or try the database tuning advisor
Also, it's possible that you don't need 5 joins.
Assuming parent-child-grandchild etc, then grandchild rows can't exist without the parent rows (assuming you have foreign keys)
So your query could become
SELECT TOP 1
tbl.Id --or count(*)
FROM
grandchildtable tbl
INNER JOIN
anothertable ON ... = ...
WHERE
tbl.xxx = ...
Try EXISTS.
For either for 5 tables or for assumed heirarchy
SELECT TOP 1 --or count(*)
tbl.Id
FROM
grandchildtable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
tbl.Id
FROM
mytable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
AND
EXISTS (SELECT *
FROM
yetanothertable T3
WHERE
tbl.key = T3.key /* AND T3 condition*/)
Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.
Select top 1 tbl.id
From
(
Select top 1 * from
table tbl1
Where Key = Key
) tbl1
inner join ...
After that you will likely need to provide more of the query to understand how it works.
Maybe you could offload/cache this fact-finding mission. Like if it doesn't need to be done dynamically or at runtime, just cache the result into a much smaller table and then query that. Also, make sure all the tables you're querying to have the appropriate clustered index. Granted you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.
Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.
Use the maximun row table first in every join and if more than one condition use
in where then sequence of the where is condition is important use the condition
which give you maximum rows.
use filters very carefully for optimizing Query.
can anybody explain me why this query takes 13 seconds:
SELECT Table1.Location, Table2.SID, Table2.CID, Table1.VID, COUNT(*)
FROM Table1 INNER JOIN
Table2 AS ON Table1.TID = Table2.TID
WHERE Table1.Last = Table2.Last
GROUP BY Table1.Location, Table2.SID, Table2.CID, Table1.VID
And this one only 1 second:
DECLARE #Test TABLE (Location INT, SID INT, CID INT, VID INT)
INSERT INTO #Test
SELECT Table1.Location, Table2.SID, Table2.CID, Table1.VID
FROM Table1 INNER JOIN
Table2 AS ON Table1.TID = Table2.TID
WHERE Table1.Last = Table2.Last
SELECT Location, SID, CID, VID, COUNT(*)
FROM #Test
GROUP BY Location, SID, CID, VID
When I remove the GROUP BY from the first query it needs only 1 second too. I also try to write a subselect and group the result, but it takes 13 seconds too. I don't unterstand this.
Compare the execution plans for the two queries.
The GROUP BY may perform better if you have an INDEX on each of the columns in the GROUP BY (either one each, or a combined single index of all the columns)
The reason your temporary version works better is probably because the GROUP BY is being performed on a much smaller subset of the data and therefore it is fast even without the index.
Your temp table method is by no means the wrong way of doing it. It is one of those situations where you weigh up the pro's and con's of each method. An index on the main table may slow up your inserts / updates and increase your database size. The temp table however may not perform adequately once the data size increases over time.
Could be the that in first query you group and count on larger query result than in second query where you work with already smaller dataset. Could be indexing problem. Once you fix it don't forget to re-check with larger result set because "worser" query can perform better with larger datasets.