sql join question - sql

I have the following tables
nid timestamp title
82 1245157883 Home
61 1245100302 Minutes
132 1245097268 Sample Form
95 1245096985 Goals & Objectives
99 1245096952 Members
AND
pid src dst language
70 node/82 department/34-section-2
45 node/61/feed department/22-section-2/feed
26 node/15 department/department1/15-department1
303 node/101 department/101-section-4
These are fragments of the tables, and is missing the rest of the data (they are both quite large), but I am trying to join the dst column from the second table into the first one. They should match up on their "nid", but the second table has node/[nid] which makes this more complicated. I also want to ignore the ones that end in "feed" since they are not needed for what I am doing.
Much thanks
EDIT: I feel bad for not mentioning this, but the first table is an sql result from
select nid, MAX(timestamp) as timestamp, title from node_revisions group by nid ORDER BY timestamp DESC LIMIT 0,5
The second table has the name "url_alias"

try
select * from table1 inner join table2 on src=concat('node/',nid)
Edit
edited to reflect change in OP
select `nid`, MAX(`timestamp`) as `timestamp`, `title` from `node_revisions` inner join `url_alias` on `src`=concat('node/',`nid`) group by `nid` ORDER BY `timestamp` DESC LIMIT 0,5

I don't know what database you are using. However, I suggest you write a parsing function that returns the nid from that column. Then, you can have this kind of query (assuming GET_NID is the function you defined):
SELECT * from T1, T2
WHERE T1.nid = GET_NID( T2.node)

You have a few options.
write a function that converts src to an nid and join on t1.nid = f(t2.src) -- you didn't say what DBMS you use, but most have a way to do that. It will be slow, but that depends on how big the tables are.
Similar to that, make a view that has a computed field using that function -- same speed, but might be easier to understand.
Create a new nid field in t2 and use the function to populate it. Make insert and update triggers to keep it up to date, then join on that. This is better if you query this frequently.
Convert t2 so that it has a nid field and compute the src from that and another field that is a template that the nid needs to be inserted into.

I'd pull the node id in the second table into a separate column. Otherwise any attempt to join the two tables will result in a table scan with some processing on the src field (I assume you meant the src field and not the dst field) and performance will be problematic.

SELECT *
FROM (SELECT *, 'node/' + nid AS src FROM table1) t1
INNER JOIN table2 t2
ON t1.src = t2.src

You haven't specified with DBMS are you using. Most engines support the SQL-99 standard SIMILAR TO clause which is using regular expression for matching. Some engines also implement this, but use some other keywords instead of SIMILAR TO.
FirebirdSQL:
http://wiki.firebirdsql.org/wiki/index.php?page=SIMILAR+TO
PostgreSQL:
http://www.network-theory.co.uk/docs/postgresql/vol1/SIMILARTORegularExpressions.html
MySQL:
http://dev.mysql.com/doc/refman/5.0/en/regexp.html

Depending on the scenario you want to this for (if for example you are regularly going to be performing this JOIN and your 2nd table is rather large) you may want to look into a Materialized View.
Write a function that performs all the logic to extract the nid into a separate column. Aside from initial m-view creation, the function will only need to run when the basetable changes (insert, update, delete) compared to running the function against every row each time you query.
This allows a fairly simple join to the materialized view with standard benefits of tables such as Indexing.
NB: looks like I was beaten to it while writing :)

Related

How do I put multiple criteria for a column in a where clause?

I have five results to retrieve from a table and I want to write a store procedure that will return all desired rows.
I can write the query like that temporarily:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I supposed I need to receive a list of Ids to split, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TALBE2)
This translates into what is your attempt. And judging by your attempt, this might be the simplest version for you to understand. Although, in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.

Is there some equivalent to subquery correlation when making a derived table?

I need to flatten out 2 rows in a vertical table (and then join to a third table) I generally do this by making a derived table for each field I need. There's only two fields, I figure this isn't that unreasonable.
But I know that the rows I want back in the derived table, are the subset that's in my join with my third table.
So I'm trying to figure out the best derived tables to make so that the query runs most efficiently.
I figure the more restrictive I make the derived table's where clause, the smaller the derived table will be, the better response I'll get.
Really what I want is to correlate the where clause of the derived table with the join with the 3rd table, but you can't do that in sql, which is too bad. But I'm no sql master, maybe there's some trick I don't know about.
The other option is just to make the derived table(s) with no where clause and it just ends up joining the entire table twice (once for each field), and when I do my join against them the join filters every thing out.
So really what I'm asking I guess is what's the best way to make a derived table where I know pretty much specifically what rows I want, but sql won't let me get at them.
An example:
table1
------
id tag value
-- ----- -----
1 first john
1 last smith
2 first sally
2 last smithers
table2
------
id occupation
-- ----------
1 carpenter
2 homemaker
select table2.occupation, firsttable.first, lasttable.last from
table2, (select value as first from table1 where tag = 'first') firsttable,
(select value as last from table1 where tag = 'last') lasttable
where table2.id = firsttable.id and table2.id = lasttable.id
What I want to do is make the firsttable where clause where tag='first' and id = table2.id
DERIVED tables are not to store the intermediate results as you expect. These are just a way to make code simpler. Using derived table doesnt mean that the derived table expression will be executed first and output of that will be used to join with remaining tables.Optimizer will automaticaly faltten derived tables in most of the cases.
However,There are cases where the optimizer might want to store the results of the subquery and thus materilize instead of flattening.It usually happens when you have some kind of aggregate functions or like that.But in your case the query is too simple and thus optimizer will flatten query
Also,storing derived table expression wont make your query fast it will in turn could make it worse.Your real problem is too much normalization.Fix that query will be just a join of two tables.
Why you have this kind of normalization?Why you are storing col values as rows.Try to denormalize table1 so that it has two columns first and last.That will be best solution for this.
Also, do you have proper indexes on id and tag column? if yes then a merge join is quite good for your query.
Please provide index details on these tables and the plan generated by your query.
Your query will be used like an inner join query.
select table2.occupation, first.valkue as first, last.value as last
from
table2
inner join table1 first
on first.tag = 'first'
and first.id =table2.id
inner join table1 last
on last.tag = 'last'
and table2.id = last.id
I think what you're asking for is a COMMON TABLE EXPRESSION. If your platform doesn't implement them, then a temporary table may be the best alternative.
I'm a little confused. Your query looks okay . . . although it looks better with proper join syntax.
select table2.occupation, firsttable.first, lasttable.last
from table2 join
(select value as first from table1 where tag = 'first') firsttable
on table2.id = firsttable.id join
(select value as last from table1 where tag = 'last') lasttable
on table2.id = lasttable.id
This query does what you are asking it to do. SQL is a declarative language, not a procedural language. This means that you describe the result set and rely on the database SQL compiler to turn it into the right set of commands. (That said, sometimes how a query is structured does make it easier or harder for some engines to produce efficient query plans.)

sql, query optimisation with and inner join?

I'm trying to optimise my query, it has an inner join and coalesce.
The join table, is simple a table with one field of integer, I've added a unique key.
For my where clause I've created a key for the three fields.
But when I look at the plan it still says it's using a table scan.
Where am I going wrong ?
Here's my query
select date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) as due
from billsndeposits a
inner join util_nums b on date(a.startdate, '+'||(b.n*a.interval)||'
'||a.intervaltype) <= coalesce(a.enddate, date('2013-02-26'))
where not (intervaltype = 'once' or interval = 0) and factid = 1
order by due, pid;
Most likely your JOIN expression cannot use any index and it is calculated by doing a NATURAL scan and calculate date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) for every row.
BTW: That is a really weird join condition in itself. I suggest you find a better way to join billsndeposits to util_nums (if that is actually needed).
I think I understand what you are trying to achieve. But this kind of join is a recipe for slow performance. Even if you remove date computations and the coalesce (i.e. compare one date against another), it will still be slow (compared to integer joins) even with an index. And because you are creating new dates on the fly you cannot index them.
I suggest creating a temp table with 2 columns (1) pid (or whatever id you use in billsndeposits) and (2) recurrence_dt
populate the new table using this query:
INSERT INTO TEMP
SELECT PID, date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype)
FROM billsndeposits a, util_numbs b;
Then create an index on recurrence_dt columns and runstats. Now your select statement can look like this:
SELECT recurrence_dt
FROM temp t, billsndeposits a
WHERE t.pid = a.pid
AND recurrence_dt <= coalesce(a.enddate, date('2013-02-26'))
you can add a exp_ts on this new table, and expire temporary data afterwards.
I know this adds more work to your original query, but this is a guaranteed performance improvement, and should fit naturally in a script that runs frequently.
Regards,
Edit
Another thing I would do, is make enddate default value = date('2013-02-26'), unless it will affect other code and/or does not make business sense. This way you don't have to work with coalesce.

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complainin: "only the first 1000 rows might be retrieved", this second approach would solve that problem?
3) What is the scope of the alias in the second subquery?... does the alias only lives in the parenthesis?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
Yes
Yes
Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
Alias declared inside subquery lives inside subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. Logically if that is the case, then SQL community would have replaced IN with EXISTS...
Also, please note that IN and EXISTS are not same, the results may be different when you use the two...
With IN, usually its a Full Table Scan of the inner table once without removing NULLs (so if you have NULLs in your inner table, IN will not remove NULLS by default)... While EXISTS removes NULL and in case of correlated subquery, it runs inner query for every row from outer query.
Assuming there are no NULLS and its a simple query (with no correlation), EXIST might perform better if the row you are finding is not the last row. If it happens to be the last row, EXISTS may need to scan till the end like IN.. so similar performance...
But IN and EXISTS are not interchangeable...

Best way to perform dynamic subquery in MS Reporting Services?

I'm new to SQL Server Reporting Services, and was wondering the best way to do the following:
Query to get a list of popular IDs
Subquery on each item to get properties from another table
Ideally, the final report columns would look like this:
[ID] [property1] [property2] [SELECT COUNT(*)
FROM AnotherTable
WHERE ForeignID=ID]
There may be ways to construct a giant SQL query to do this all in one go, but I'd prefer to compartmentalize it. Is the recommended approach to write a VB function to perform the subquery for each row? Thanks for any help.
I would recommend using a SubReport. You would place the SubReport in a table cell.
Depending on how you want the output to look, a subreport could do, or you could group on ID, property1, property2 and show the items from your other table as detail items (assuming you want to show more than just count).
Something like
select t1.ID, t1.property1, t1.property2, t2.somecol, t2.someothercol
from table t1 left join anothertable t2 on t1.ID = t2.ID
#Carlton Jenke I think you will find an outer join a better performer than the correlated subquery in the example you gave. Remember that the subquery needs to be run for each row.
Simplest method is this:
select *,
(select count(*) from tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from tbl1 t1
here is a workable version (using table variables):
declare #tbl1 table
(
tbl1ID int,
prop1 varchar(1),
prop2 varchar(2)
)
declare #tbl2 table
(
tbl2ID int,
tbl1ID int
)
select *,
(select count(*) from #tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from #tbl1 t1
Obviously this is just a raw example - standard rules apply like don't select *, etc ...
UPDATE from Aug 21 '08 at 21:27:
#AlexCuse - Yes, totally agree on the performance.
I started to write it with the outer join, but then saw in his sample output the count and thought that was what he wanted, and the count would not return correctly if the tables are outer joined. Not to mention that joins can cause your records to be multiplied (1 entry from tbl1 that matches 2 entries in tbl2 = 2 returns) which can be unintended.
So I guess it really boils down to the specifics on what your query needs to return.
UPDATE from Aug 21 '08 at 22:07:
To answer the other parts of your question - is a VB function the way to go? No. Absolutely not. Not for something this simple.
Functions are very bad on performance, each row in the return set executes the function.
If you want to "compartmentalize" the different parts of the query you have to approach it more like a stored procedure. Build a temp table, do part of the query and insert the results into the table, then do any further queries you need and update the original temp table (or insert into more temp tables).