Oracle11g, Multiple Pivots - sql

So I have a working query that pivots some data for me.
SELECT * FROM (
select requisitions.ACC_ID AS "Accession #"
,tests.TEST_ID
,results.RESULT_NUMERIC
FROM requisitions
inner join req_panels ON requisitions.acc_id = req_panels.acc_id
inner join results ON req_panels.rp_id = results.rp_id
inner join tests ON results.test_id = tests.test_id
WHERE results.TEST_ID IN (1,2,3,4)
AND requisitions.RECEIVED_DATE > TO_DATE('9/1/2013', 'MM/DD/YYYY')
ORDER BY requisitions.ACC_ID
)
pivot(
MAX(RESULT_NUMERIC)
for TEST_ID IN ('1' AS Test1,'2' AS Test2,'3' AS Test3,'4' AS Test4)
)
Now, I have to include a different type of result (RESULTS_ALPHA in results table) as a column for each ACC_ID. RESULT_ALPHA is a clob. For the test_id's already included in the code above RESULTS_ALPHA is empty. But it holds a value for another test, we'll call it "TestAlpha".
So what I have currently output from the code above is;
Acc_ID | Test 1 | Test 2 | Test 3 | Test 4
-------------------------------------------
000001 | 24 | 1.5 | 0.5 | 2.1
000002 | 15 | 2.1 | 0.3 | 1.3
And I need to get
Acc_ID | Test 1 | Test 2 | Test 3 | Test 4 | TestAlpha
--------------------------------------------------------
000001 | 24 | 1.5 | 0.5 | 2.1 | abcd
000002 | 15 | 2.1 | 0.3 | 1.3 | efgh
How can I accomplish this? Another pivot?
Thanks.

If you can use just the first 4000 characters of a CLOB field then you can just substring it:
SELECT * FROM (
select requisitions.ACC_ID AS "Accession #"
,tests.TEST_ID
,results.RESULT_NUMERIC
,dbms_lob.substr(results.RESULT_ALPHA, 4000, 1) as result_alpha
FROM requisitions
...
Of course that gives you a 4000-character-wide column in your output, but not much you can do about that really, unless you can set a lower length based on knowledge of what's in the column. (Though if it's less than 4K, it doesn't really make sense to store it as a CLOB; sounds like you have a mix of data in there though).
Even if the value is more than 4000 characters this will show the start of it. Whether that's acceptable, or useful, depends what you're doing with the result of the pivot.
What you're doing does seem to assume that the RESULTS_ALPHA is the same for all results records for each TEST_ID; or even for each ACC_ID. Which seems a bit wasteful if it's true.
I'm not sure if there a non-programmatic solution to get the full CLOB back.

Related

Recursive self join over file data

I know there are many questions about recursive self joins, but they're mostly in a hierarchical data structure as follows:
ID | Value | Parent id
-----------------------------
But I was wondering if there was a way to do this in a specific case that I have where I don't necessarily have a parent id. My data will look like this when I initially load the file.
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
Essentially, its a CSV file where each row in the table is a line in the file. Lines 1 and 5 identify an object header and lines 3, 4, 7, and 8 identify the rows belonging to the object. The object header lines can have only 40 attributes which is why the object is broken up across multiple sections in the CSV file.
What I'd like to do is take the table, separate out the record # column, and join it with itself multiple times so it achieves something like this:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,5,6,7,8,...
2 | *,record,abc,efg,hij,lmn,opq,rst
3 | ,,1,x,y,z,t,u,v,...
4 | ,,2,q,r,s,l,m,n,...
I know its probably possible, I'm just not sure where to start. My initial idea was to create a view that separates out the first and second columns in a view, and use the view as a way of joining in a repeated fashion on those two columns. However, I have some problems:
I don't know how many sections will occur in the file for the same
object
The file can contain other objects as well so joining on the first two columns would be problematic if you have something like
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
9 | ,4,Data,1,2,3,4,...
10 | *,record,lmn,opq,rst,...
11 | ,,1,t,u,v,...
In the above case, my plan could join rows from the Data object in row 9 with the first rows of the Formula object by matching the record value of 1.
UPDATE
I know this is somewhat confusing. I tried doing this with C# a while back, but I had to basically write a recursive decent parser to parse the specific file format and it simply took to long because I had to get it in the database afterwards and it was too much for entity framework. It was taking hours just to convert one file since these files are excessively large.
Either way, #Nolan Shang has the closest result to what I want. The only difference is this (sorry for the bad formatting):
+----+------------+------------------------------------------+-----------------------+
| ID | header | x | value
|
+----+------------+------------------------------------------+-----------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 |3,Formula,1,2,3,4,5,6,7,8 |
| 2 | ,, | ,1,x,y,z,t,u,v | ,1,x,y,z,t,u,v |
| 3 | ,, | ,2,q,r,s,l,m,n | ,2,q,r,s,l,m,n |
| 4 | *,record, | ,abc,efg,hij,lmn,opq,rst |*,record,abc,efg,hij,lmn,opq,rst |
| 5 | ,4, | ,Data,1,2,3,4 |,4,Data,1,2,3,4 |
| 6 | *,record, | ,lmn,opq,rst | ,lmn,opq,rst |
| 7 | ,, | ,1,t,u,v | ,1,t,u,v |
+----+------------+------------------------------------------+-----------------------------------------------+
I agree that it would be better to export this to a scripting language and do it there. This will be a lot of work in TSQL.
You've intimated that there are other possible scenarios you haven't shown, so I obviously can't give a comprehensive solution. I'm guessing this isn't something you need to do quickly on a repeated basis. More of a one-time transformation, so performance isn't an issue.
One approach would be to do a LEFT JOIN to a hard-coded table of the possible identifying sub-strings like:
3,Formula,
*,record,
,,1,
,,2,
,4,Data,
Looks like it pretty much has to be human-selected and hard-coded because I can't find a reliable pattern that can be used to SELECT only these sub-strings.
Then you SELECT from this artificially-created table (or derived table, or CTE) and LEFT JOIN to your actual table with a LIKE to get all the rows that use each of these values as their starting substring, strip out the starting characters to get the rest of the string, and use the STUFF..FOR XML trick to build the desired Line.
How you get the ID column depends on what you want, for instance in your second example, I don't know what ID you want for the ,4,Data,... line. Do you want 5 because that's the next number in the results, or do you want 9 because that's the ID of the first occurrance of that sub-string? Code accordingly. If you want 5 it's a ROW_NUMBER(). If you want 9, you can add an ID column to the artificial table you created at the start of this approach.
BTW, there's really nothing recursive about what you need done, so if you're still thinking in those terms, now would be a good time to stop. This is more of a "Group Concatenation" problem.
Here is a sample, but has some different with you need.
It is because I use the value the second comma as group header, so the ,,1 and ,,2 will be treated as same group, if you can use a parent id to indicated a group will be better
DECLARE #testdata TABLE(ID int,Line varchar(8000))
INSERT INTO #testdata
SELECT 1,'3,Formula,1,2,3,4,...' UNION ALL
SELECT 2,'*,record,abc,efg,hij,...' UNION ALL
SELECT 3,',,1,x,y,z,...' UNION ALL
SELECT 4,',,2,q,r,s,...' UNION ALL
SELECT 5,'3,Formula,5,6,7,8,...' UNION ALL
SELECT 6,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 7,',,1,t,u,v,...' UNION ALL
SELECT 8,',,2,l,m,n,...' UNION ALL
SELECT 9,',4,Data,1,2,3,4,...' UNION ALL
SELECT 10,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 11,',,1,t,u,v,...'
;WITH t AS(
SELECT *,REPLACE(SUBSTRING(t.Line,LEN(c.header)+1,LEN(t.Line)),',...','') AS data
FROM #testdata AS t
CROSS APPLY(VALUES(LEFT(t.Line,CHARINDEX(',',t.Line, CHARINDEX(',',t.Line)+1 )))) c(header)
)
SELECT MIN(ID) AS ID,t.header,c.x,t.header+STUFF(c.x,1,1,'') AS value
FROM t
OUTER APPLY(SELECT ','+tb.data FROM t AS tb WHERE tb.header=t.header FOR XML PATH('') ) c(x)
GROUP BY t.header,c.x
+----+------------+------------------------------------------+-----------------------------------------------+
| ID | header | x | value |
+----+------------+------------------------------------------+-----------------------------------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 | 3,Formula,1,2,3,4,5,6,7,8 |
| 3 | ,, | ,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v | ,,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v |
| 2 | *,record, | ,abc,efg,hij,lmn,opq,rst,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst,lmn,opq,rst |
| 9 | ,4, | ,Data,1,2,3,4 | ,4,Data,1,2,3,4 |
+----+------------+------------------------------------------+-----------------------------------------------+

Access query fails when WHERE statement is added to subquery referencing ODBC link

Original post
Given two tables structured like this:
t1 (finished goods) t2 (component parts)
sku | desc | fcst sku | part | quant
0001 | Car | 10000 0001 | wheel | 4
0002 | Boat | 5000 0001 | door | 2
0003 | Bike | 7500 0002 | hull | 1
0004 | Shirt | 2500 0002 | rudder | 1
... | ... | ... 0003 | wheel | 2
0005 | rotor | 2
... | ... | ...
I am trying to append wheel requirements to the forecast, while leaving all records in the forecast. My results would look like this:
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | 4 | 40000
0002 | Boat | 5000 | |
0003 | Bike | 7500 | 2 | 15000
0004 | Shirt | 2500 | |
... | ... | ... | ... | ...
The most efficient way to go about this in my eyes is something like this query:
SELECT
t1.sku,
t1.desc,
t1.fcst,
q.quant as wheels,
t1.fcst * q.quant as wheelfcst
FROM
t1
LEFT JOIN
(
SELECT *
FROM t2
WHERE part LIKE "wheel"
)
as q
ON t1.sku = q.sku
The problem is that it gives a very elaborate Invalid Operation. error when ran.
If I remove the WHERE statement: I get the wheel parts as desired but I also pull door, hull, and rudder quantities.
If I move the WHERE statement to the main query (WHERE q.part LIKE "wheel"): I only see goods that contain wheels, but boats are then missing from the results.
I have considered a UNION statement, taking the results of the previously mentioned moving the WHERE out of the subquery (WHERE q.part LIKE "wheel"), but there doesn't seem to be a good way to grab every final item that doesn't have a wheel component because each sku can have anywhere from 0 to many components.
Is there something I'm overlooking in my desired query, or is this something requiring a UNION approach?
EDIT #1 - To answer questions raised by Andre
The full error message is Invalid operation.
sku is the primary key of t1, and there are 1426 records.
t2 contains ~446,000 records, the primary key is a composite of sku and part.
The actual WHERE statement is a partial search. All "wheels" have the same suffix but different component item numbers.
Additionally, I am in Access 2007, it may be an issue related to software version.
Making the subquery into a temporary table works, but the goal is to avoid that procedure.
EDIT #2 - A flaw in my environment
I created a test scenario identical to the one I have posted here, and I get the same results as Andre. At this point, combining these results with the fact that the temporary table method does in fact work, I am led to believe that it is an issue with query complexity and record access. Despite the error message not being the typical Query is too complex. message.
EDIT #3 - Digging deeper into "Complexity"
My next test will be to make the where clause simpler. Sadly, the systems I work on update at lunch each day and I currently cannot reach any data servers. I hope to update my progress at a later point today.
EDIT #4 - Replacing the partial search
Ok, we're back from a meeting and ready to go. I've just ran six queries with three different WHERE clauses:
WHERE part LIKE "*heel" / WHERE component_item LIKE "*SBP" (Original large scale issue)
Works in small scale test, Invalid operation on large scale.
WHERE part LIKE "wheel" / WHERE component_item LIKE "VALIDPART" (Original small scale)
Works in small scale test, Invalid operation on large scale.
WHERE part LIKE "wh33l" / WHERE component_item LIKE "NOTVALIDPART"(Where statements that do not return any records)
Small Scale
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | |
0002 | Boat | 5000 | |
0003 | Bike | 10000 | |
0004 | Shirt | 5000 | |
Large Scale
sku |description |forecast |component_item |dip_rate
#####|RealItem1 | ###### | |
#####|RealItem2 | ###### | |
#####|RealItem3 | ###### | |
... |... | ... | |
Tl;dr The filter specifics did not make a difference unless the filter resulted in a subquery that returned 0 records.
EDIT #5 - An interesting result
Under the idea of trying every possible solution and test everything I can, I made a local temporary table which contained every field and every record from t2 (~25MB). Referencing this table instead of the ODBC link to t2 works with the partial search query (WHERE component_item LIKE "*SBP"). I am updating the title of this question to reflect that the issue is specific to a linked table.
I copied the sample data to Access 2010 and ran the query, it worked without problems.
What is the full error message you get? ("very elaborate Invalid Operation.")
How many records are in your tables?
Is sku primary key in t1?
In any case, I suggest changing the subquery to:
SELECT sku, quant
FROM t2
WHERE part = "wheel"
LIKE is only needed for partial searches.
A workaround
I have devised a set of queries that works by using one of my original hunches of a UNION statement. Hopefully this allows for anyone that stumbles across this issue to save a bit of time.
Setting up the UNION
My initial problem was with the idea that there was no way to select only wheel record and then join them to null records, because (using t1 as an example) the Boat has parts in t2, but none of them are wheels. I had to first devise a method to see which products had wheels without using a filter.
My intermediary solution:
Query: t1haswheel
SELECT
t1.sku,
t1.desc,
t1.fcst,
SUM(
IF(
t2.part = "wheel",
1, 0
) as haswheel
FROM
t1
LEFT JOIN
(
SELECT *
FROM t2
WHERE part LIKE "wheel"
)
as q
ON t1.sku = q.sku
GROUP BY
t1.sku,
t1.desc,
t1.fcst
This query returns every record from t1 followed by a number based on the number of wheel records there are in the part list for that item number. If there are no records in t2 with "wheel" in the field part, the query returns 0 for the record. This list is what I needed for the UNION statement in the original question.
UNION-ing it all together
At this point, all that is needed is a UNION which uses the summation field from the previous query (haswheel).
SELECT
q1.sku,
q1.desc,
q1.fcst,
t2.quant,
q1.fcst * t2.quant as wheelfcst
FROM
t1haswheel as q1
LEFT JOIN t2
ON q1.sku = t2.sku
WHERE
q1.haswheel > 0 AND
t2.part = "wheel"
UNION ALL
SELECT
q1.sku,
q1.desc,
q1.fcst,
null,
null
FROM
t1haswheel as q1
WHERE q1.haswheel = 0
This pulls in the correct results from records with wheels, and then attaches the records without wheels, while never using the WHERE statement in a subquery which references an ODBC linked table:
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | 4 | 40000
0003 | Bike | 7500 | 2 | 15000
... | ... | ... | ... | ...
0002 | Boat | 5000 | |
0004 | Shirt | 2500 | |
... | ... | ... | ... | ...

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

How do I do a WHERE NOT IN for Hierarchical data?

I have a table that is a list of paths between points. I want to create a query to return a list with pointID and range(number of point) from a given point. But have spent a day trying to figure this out and haven't go any where, does any one know how this should be done? ( I am writing this for MS-SQL 2005)
example
-- fromPointID | toPointID |
---------------|-----------|
-- 1 | 2 |
-- 2 | 1 |
-- 1 | 3 |
-- 3 | 1 |
-- 2 | 3 |
-- 3 | 2 |
-- 4 | 2 |
-- 2 | 4 |
with PointRanges ([fromPointID], [toPointID], [Range])
AS
(
-- anchor member
SELECT [fromPointID],
[toPointID],
0 AS [Range]
FROM dbo.[Paths]
WHERE [toPointID] = 1
UNION ALL
-- recursive members
SELECT P.[fromPointID],
P.[toPointID],
[Range] + 1 AS [Range]
FROM dbo.[Paths] AS P
INNER JOIN PointRanges AS PR ON PR.[toPointID] = P.[fromPointID]
WHERE [Range] < 5 -- This part is just added to limit the size of the table being returned
--WHERE P.[fromPointID] NOT IN (SELECT [toPointID] FROM PointRanges)
--Cant do the where statment I want to because it wont allow recurssion in the sub query
)
SELECT * FROM PointRanges
--Want this returned
-- PointID | Range |
-----------|-------|
-- 1 | 0 |
-- 2 | 1 |
-- 3 | 1 |
-- 4 | 2 |
Markus Jarderot's link gives a good answer for this. I end tried using his answer and also tried rewriting my problem in C# and linq but it ended up being more of a mathematical problem than a coding problem because I had a table with several thousands of point that interlinked. This is still something I am interested in and trying to get a better understanding of by reading books on mathematics and graph theory but if anyone else runs into this problem I think Markus Jarderot's link is the best answer you will find.

return latest version of a drupal node

I'm writing a drupal module, and I need to write a query that returns particular rows of one of my content_type tables. My query so far is:
SELECT DISTINCT pb.*, f.filepath FROM {content_type_promo_box} pb LEFT JOIN {files} f ON pb.field_promo_image_fid = f.fid
I realized as I was working that the table not only contains each cck field of this content type, it also contains multiple versions for each field. How do I limit my query to the rows that only contain values for the current versions of the nodes?
UPDATE: I need to clarify my question a little. I've been down the views path already, and I did think about using node_load (thanks for the answer, though, Jeremy!). Really, my question is more about how to write an appropriate SQL statement than it is about drupal specifically. I only want to return rows that contain the latest versions (vid is the greatest) for any particular node (nid). So here's an example:
-------------
| nid | vid |
-------------
| 45 | 3 |
| 23 | 5 |
| 45 | 9 |
| 23 | 12 |
| 45 | 36 |
| 33 | 44 |
| 33 | 78 |
-------------
My query should return the following:
-------------
| nid | vid |
-------------
| 23 | 12 |
| 45 | 36 |
| 33 | 78 |
-------------
Make sense? Thanks!
The node table stores the current version of the node, and revision ids are unique across all content. This makes for a pretty simple query:
SELECT m.*
FROM
{mytable} AS m
JOIN {node} AS n
ON m.vid = n.vid
If there is no content in {mytable} for the node, it will not be returned by the query; change to a RIGHT JOIN to return all nodes.
Assuming that (nid, vid) combination is unqiue:
SELECT m.*
FROM (
SELECT nid, MAX(vid) AS mvid
FROM mytable
GROUP BY
nid
) q
JOIN mytable m
ON (m.nid, m.vid) = (q.nid, q.mvid)
You may be better off using node_load() to get the load object rather than trying to write a query yourself.
Or even use views to do what you need.
The reason for doing this over writing a query for yourself is that Drupal and it's modules sit together as a framework. Most of the time you will want to use that framework to do what you want rather than side stepping it to write your own query. In future if you upgrade Drupal or a module node_load() will still work but your code may not.