Oracle sql joining derived table with union - sql

I have a query like this:
select product_code,
count(foo.container_id) as "quantity_of_containers",
max((trunc(foo.update_date - foo.creation_date))) as "max_days"
from product
inner join stock on stock.product_id = product.product_id
inner join (
select arch.container_id,
arch.creation_date,
arch.update_date
from arch
union all
select
container.container_id,
container.creation_date,
container.update_date
from container) foo on stock.container_id = foo.container_id
group by product_code
order by "max_days" desc
What is wrong with this query? It runs without error, but on closer inspection the result contains only records from container and not a single record from the arch table. Is there a different way to write this query with the same logic? I need records from both arch and container, since one is the archive table and the other is the current table. Unioning two separate queries, one on container and one on arch, does work, but I was looking for something faster.
Edit:
Here is some sample data to make clearer what I mean.
+---------------+----------+----------+
| product_code | quantity | max_days |
+---------------+----------+----------+
| 5999990024965 | 345 | 85 |
| 5999990027614 | 326 | 81 |
| 5999990023753 | 87 | 77 |
+---------------+----------+----------+
The table above is data from the arch table.
+---------------+----------+----------+
| product_code | quantity | max_days |
+---------------+----------+----------+
| 5999990082415 | 11 | 84 |
| 5999990059615 | 2 | 58 |
| 5999990023470 | 1 | 41 |
+---------------+----------+----------+
The table above is data from the container table.
However, when I run the query pasted above, I only get records from the container table.
And yes, stock.container_id really does match foo.container_id values from both arch and container.

This isn't an answer, but it's too long to go in a comment.
What do you get when you run the following query?
select *
from product
inner join stock on stock.product_id = product.product_id
inner join (
select arch.container_id,
arch.creation_date,
arch.update_date,
'arch' qry
from arch
union all
select
container.container_id,
container.creation_date,
container.update_date,
'container' qry
from container) foo on stock.container_id = foo.container_id
where foo.qry = 'arch';
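A minimal way to run that kind of check end to end, sketched here with Python's built-in sqlite3 module standing in for Oracle. The rows below are hypothetical, chosen only so that one stock row matches arch and one matches container. Counting the joined rows per qry tag shows directly which branch of the union all survives the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product   (product_id integer, product_code text);
    create table stock     (product_id integer, container_id integer);
    create table arch      (container_id integer, creation_date text, update_date text);
    create table container (container_id integer, creation_date text, update_date text);

    -- Hypothetical rows: container 10 lives in arch, container 20 in container.
    insert into product   values (1, '5999990024965');
    insert into stock     values (1, 10), (1, 20);
    insert into arch      values (10, '2019-01-01', '2019-03-27');
    insert into container values (20, '2019-02-01', '2019-02-12');
""")

# Tag each branch of the union, then count joined rows per tag.
rows_per_branch = conn.execute("""
    select foo.qry, count(*) as joined_rows
    from product
    inner join stock on stock.product_id = product.product_id
    inner join (
        select container_id, creation_date, update_date, 'arch' as qry from arch
        union all
        select container_id, creation_date, update_date, 'container' as qry from container
    ) foo on stock.container_id = foo.container_id
    group by foo.qry
    order by foo.qry
""").fetchall()

print(rows_per_branch)
```

If one tag is missing from the output when this is run against the real tables, the join keys or the data are the place to look, since the union all itself passes both branches through.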

I'm really sorry, but the cause seems to have been a software bug: after a server restart, the query returned the expected data. So the first query was actually correct.
Sorry for taking your time, and thank you, everyone.

Related

SQL structure for multiple queries of the same table (using window function, case, join)

I have a complex production SQL question. It's actually PrestoDB Hadoop, but conforms to common SQL.
I've got to get a bunch of metrics from a table, a little like this (sorry if the tables are mangled):
+--------+--------------+------------------+
| device | install_date | customer_account |
+--------+--------------+------------------+
| dev 1 | 1-Jun | 123 |
| dev 1 | 4-Jun | 456 |
| dev 1 | 10-Jun | 789 |
| dev 2 | 20-Jun | 50 |
| dev 2 | 25-Jun | 60 |
+--------+--------------+------------------+
I need something like this:
+--------+------------------+-------------------------+
| device | max_install_date | previous_account_number |
+--------+------------------+-------------------------+
| dev 1 | 10-Jun | 456 |
| dev 2 | 25-Jun | 50 |
+--------+------------------+-------------------------+
I can do two separate queries to get max install date and previous account number, like this:
select device, max(install_date) as max_install_date
from (select [a whole bunch of stuff], dense_rank() over(partition by device order by [something_else]) rnk
from some_table a
)
But how do you combine them into one query to get one line for each device? I have rank, with statements, case statements, and one join. They all work individually but I'm banging my head to understand how to combine them all.
I need to understand how to structure big queries.
ps. any good books you recommend on advanced SQL for data analysis? I see a bunch on Amazon but nothing that tells me how to construct big queries like this. I'm not a DBA. I'm a data guy.
Thanks.
You can use a correlated subquery:
select t.*
from table t
where install_date = (select max(install_date) from table t1 where t1.device = t.device);
This assumes install_date is stored in a reasonable date format, so that max() orders it chronologically.
I think you want:
select t.*
from (select t.*, max(install_date) over (partition by device) as max_install_date,
lag(customer_account) over (partition by device order by install_date) as prev_customer_account
from t
) t
where install_date = max_install_date;
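The window-function query above can be tried on the sample data from the question. This is a sketch using Python's sqlite3 module rather than PrestoDB; it assumes a SQLite build with window functions (3.25 or later). The table name installs and the year in the ISO dates are assumptions, since the question only gives day and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table installs (device text, install_date text, customer_account integer)")
conn.executemany("insert into installs values (?, ?, ?)", [
    ("dev 1", "2018-06-01", 123),   # year is hypothetical
    ("dev 1", "2018-06-04", 456),
    ("dev 1", "2018-06-10", 789),
    ("dev 2", "2018-06-20", 50),
    ("dev 2", "2018-06-25", 60),
])

# The inner query annotates every row with the device's latest date and the
# previous row's account; the outer query keeps only the latest row per device.
latest = conn.execute("""
    select device, max_install_date, prev_customer_account
    from (select t.*,
                 max(install_date) over (partition by device) as max_install_date,
                 lag(customer_account) over (partition by device order by install_date)
                     as prev_customer_account
          from installs t
         ) t
    where install_date = max_install_date
    order by device
""").fetchall()

for row in latest:
    print(row)
```

Both metrics come from a single pass over the table, so no join between two separate queries is needed.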

Multiple Self-Join based on GROUP BY results

I'm attempting to collect details about backup activity from a PostgreSQL DB table on a backup appliance (Avamar). The table has several columns including: client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
| 1 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z | 30900 | 11111111 |
| 2 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z | 30000 | 22222222 |
| 3 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z | 30000 | 22222222 |
| 4 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z | 30000 | 22222222 |
| 5 | server01 | Windows | Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z | 30000 | 33333333 |
| 6 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z | 30000 | 44444444 |
| 7 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z | 30900 | 55555555 |
| 8 | server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z | 30000 | 66666666 |
| 9 | server04 | Windows | Windows File System | Validate | 2017-12-05T03:00:00Z | 30000 | 66666666 |
Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I have created a SQL statement that does a GROUP BY of these three columns to get a list of "job" activity over time.
(http://sqlfiddle.com/#!15/f15556/1)
select
client_name,
dataset,
plugin_name
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
Each of these jobs can succeed or fail based on the status_code column. Using a self-join with subqueries I'm able to get the results of the Last Good backup along with its completed_ts (completed time), bytes_modified and more:
(http://sqlfiddle.com/#!15/f15556/16)
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
I can do the same thing separately to get the Last Attempt details by removing WHERE's status_code line: http://sqlfiddle.com/#!15/f15556/3. Note that most times LastGood and LastAttempted are the same row but sometimes they are not, depending if the last backup was successful.
What I'm having problems with is merging these two statements together (if possible). So I will get this result:
| client_name | dataset | plugin_name | lastgood | lastgood_bytes | lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
| server01 | Windows | Windows File System | 2017-12-04T01:00:00Z | 22222222 | 2017-12-05T01:00:00Z | 11111111 |
| server01 | Windows | Windows VSS | 2017-12-01T01:00:00Z | 33333333 | 2017-12-01T01:00:00Z | 33333333 |
| server02 | Windows | Windows File System | 2017-12-05T02:00:00Z | 44444444 | 2017-12-05T02:00:00Z | 44444444 |
| server03 | Windows | Windows File System | 2017-12-05T03:00:00Z | 66666666 | 2017-12-05T03:00:00Z | 66666666 |
I tried just adding another RIGHT JOIN to the end (http://sqlfiddle.com/#!15/f15556/4), but I get NULL rows. After some reading I see that the earlier joins are evaluated first, producing an intermediate result before the last join occurs, and by that point the data I need is lost, so I get NULL rows.
Using PostgreSQL 8 via groovy scripting. I also only have read-only access to the DB.
You apparently have two intermediate inner join output tables and you want to get columns from each about some things identified by a common key. So inner join them on the key.
select
g.client_name,
g.dataset,
g.plugin_name,
LastGood,
g.status_code,
LastGood_bytes,
LastAttempt,
l.status_code,
LastAttempt_bytes
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
) as g
join
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
select
a1.client_name,
a1.dataset,
a1.plugin_name,
a1.LastAttempt,
a3.status_code,
a3.bytes_modified as LastAttempt_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2 a2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as a1
on a3.client_name = a1.client_name and
a3.dataset = a1.dataset and
a3.plugin_name = a1.plugin_name and
a3.completed_ts = a1.LastAttempt
) as l
on l.client_name = g.client_name and
l.dataset = g.dataset and
l.plugin_name = g.plugin_name
order by client_name, dataset, plugin_name
This uses one of the applicable approaches from the question "Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs". The correspondence between the chunks of code might not be obvious, though: its intermediate joins are LEFT where yours are INNER, and its GROUP_CONCAT corresponds to your MAX. (It also lists more approaches, because of particulars of GROUP_CONCAT and of its query.)
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct cumulative LEFT JOIN approach: JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then left join that & q3--1:many--then GROUP BY & GROUP_CONCAT.
Whether this actually serves your purpose in general depends on your actual specification and constraints. Even if the two joins you link are what you want you need to explain exactly what you mean by "merge". You don't say what you want if the joins have different sets of values for the grouped columns. Force yourself to use the English language to say what rows go in the result based on what rows are in the input.
PS 1 You have undocumented/undeclared/unenforced constraints. Please declare when possible. Otherwise enforce by triggers. Document in question text if not in code. Constraints are fundamental to multiple subrow value instances in join & to group by.
PS 2 Learn the syntax and semantics of select. Learn what left/right outer join on returns: the rows that inner join on returns, plus unmatched left/right table rows extended by nulls.
PS 3 See also: Is there any rule of thumb to construct SQL query from a human-readable description?
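A condensed variant of the same idea, run against the sample rows from the question. This is a sketch in Python's sqlite3 module, not the PostgreSQL 8 setup described in the question. The helper last_run and the aliases a, g, ad and gd are names introduced here for illustration. It groups once for the last attempt and once for the last good run, inner joins the two on the job key, then joins back to the table for the details of each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table v_activities_2 (
        session_id integer, client_name text, dataset text, plugin_name text,
        type text, completed_ts text, status_code integer, bytes_modified integer
    )
""")
fs, vss = "Windows File System", "Windows VSS"
conn.executemany("insert into v_activities_2 values (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "server01", "Windows", fs,  "Scheduled Backup", "2017-12-05T01:00:00Z", 30900, 11111111),
    (2, "server01", "Windows", fs,  "Scheduled Backup", "2017-12-04T01:00:00Z", 30000, 22222222),
    (3, "server01", "Windows", fs,  "Scheduled Backup", "2017-12-03T01:00:00Z", 30000, 22222222),
    (4, "server01", "Windows", fs,  "Scheduled Backup", "2017-12-02T01:00:00Z", 30000, 22222222),
    (5, "server01", "Windows", vss, "Scheduled Backup", "2017-12-01T01:00:00Z", 30000, 33333333),
    (6, "server02", "Windows", fs,  "Scheduled Backup", "2017-12-05T02:00:00Z", 30000, 44444444),
    (7, "server02", "Windows", fs,  "Scheduled Backup", "2017-12-04T02:00:00Z", 30900, 55555555),
    (8, "server03", "Windows", fs,  "On-Demand Backup", "2017-12-05T03:00:00Z", 30000, 66666666),
    (9, "server04", "Windows", fs,  "Validate",         "2017-12-05T03:00:00Z", 30000, 66666666),
])

def last_run(extra_filter=""):
    """Latest completed_ts per job, optionally restricted by an extra condition."""
    return f"""
        select client_name, dataset, plugin_name, max(completed_ts) as ts
        from v_activities_2
        where type like '%Backup%' {extra_filter}
        group by client_name, dataset, plugin_name
    """

attempt_sql = last_run()
good_sql = last_run("and status_code in (30000, 30005)")

# a = last attempt per job, g = last good run per job,
# ad / gd = the detail rows that carry bytes_modified for each of them.
merged = conn.execute(f"""
    select a.client_name, a.dataset, a.plugin_name,
           g.ts, gd.bytes_modified, a.ts, ad.bytes_modified
    from ({attempt_sql}) as a
    join ({good_sql}) as g
      on g.client_name = a.client_name and g.dataset = a.dataset
     and g.plugin_name = a.plugin_name
    join v_activities_2 as gd
      on gd.client_name = g.client_name and gd.dataset = g.dataset
     and gd.plugin_name = g.plugin_name and gd.completed_ts = g.ts
    join v_activities_2 as ad
      on ad.client_name = a.client_name and ad.dataset = a.dataset
     and ad.plugin_name = a.plugin_name and ad.completed_ts = a.ts
    order by a.client_name, a.dataset, a.plugin_name
""").fetchall()

for row in merged:
    print(row)
```

Because the join between a and g is an inner join, a job that has never had a good run drops out of the result; whether that is wanted is exactly the kind of specification question raised above.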
Here is an alternative that also works, but it is harder to follow and likely more specific to my dataset: http://sqlfiddle.com/#!15/f15556/114
select
Actvty.client_name,
Actvty.dataset,
Actvty.plugin_name,
ActvtyGood.LastGood,
ActvtyGood.status_code as LastGood_status,
ActvtyGood.bytes_modified as LastGood_bytes,
ActvtyOnly.LastAttempt,
Actvty.status_code as LastAttempt_status,
Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty
-- 1. Get last attempt of each job (which may or may not match last good)
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name = ActvtyOnly.client_name and
Actvty.dataset = ActvtyOnly.dataset and
Actvty.plugin_name = ActvtyOnly.plugin_name and
Actvty.completed_ts = ActvtyOnly.LastAttempt
-- 4. join the list of good runs with the table of last attempts, there would never be a job that has a last good without also a last attempt.
join (
-- 3. join last good runs with the full table to get the additional details of each
select
ActvtyGoodSub.client_name,
ActvtyGoodSub.dataset,
ActvtyGoodSub.plugin_name,
ActvtyGoodSub.LastGood,
ActvtyAll.status_code,
ActvtyAll.bytes_modified
from v_activities_2 ActvtyAll
-- 2. Get last Good run of each job
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as ActvtyGoodSub
on ActvtyAll.client_name = ActvtyGoodSub.client_name and
ActvtyAll.dataset = ActvtyGoodSub.dataset and
ActvtyAll.plugin_name = ActvtyGoodSub.plugin_name and
ActvtyAll.completed_ts = ActvtyGoodSub.LastGood
) as ActvtyGood
on Actvty.client_name = ActvtyGood.client_name and
Actvty.dataset = ActvtyGood.dataset and
Actvty.plugin_name = ActvtyGood.plugin_name

Access query fails when WHERE statement is added to subquery referencing ODBC link

Original post
Given two tables structured like this:
t1 (finished goods) t2 (component parts)
sku | desc | fcst sku | part | quant
0001 | Car | 10000 0001 | wheel | 4
0002 | Boat | 5000 0001 | door | 2
0003 | Bike | 7500 0002 | hull | 1
0004 | Shirt | 2500 0002 | rudder | 1
... | ... | ... 0003 | wheel | 2
0005 | rotor | 2
... | ... | ...
I am trying to append wheel requirements to the forecast, while leaving all records in the forecast. My results would look like this:
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | 4 | 40000
0002 | Boat | 5000 | |
0003 | Bike | 7500 | 2 | 15000
0004 | Shirt | 2500 | |
... | ... | ... | ... | ...
The most efficient way to go about this in my eyes is something like this query:
SELECT
t1.sku,
t1.desc,
t1.fcst,
q.quant as wheels,
t1.fcst * q.quant as wheelfcst
FROM
t1
LEFT JOIN
(
SELECT *
FROM t2
WHERE part LIKE "wheel"
)
as q
ON t1.sku = q.sku
The problem is that it gives a very elaborate Invalid operation. error when run.
If I remove the WHERE statement: I get the wheel parts as desired but I also pull door, hull, and rudder quantities.
If I move the WHERE statement to the main query (WHERE q.part LIKE "wheel"): I only see goods that contain wheels, but boats are then missing from the results.
I have considered a UNION statement, taking the results of the previously mentioned moving the WHERE out of the subquery (WHERE q.part LIKE "wheel"), but there doesn't seem to be a good way to grab every final item that doesn't have a wheel component because each sku can have anywhere from 0 to many components.
Is there something I'm overlooking in my desired query, or is this something requiring a UNION approach?
EDIT #1 - To answer questions raised by Andre
The full error message is Invalid operation.
sku is the primary key of t1, and there are 1426 records.
t2 contains ~446,000 records, the primary key is a composite of sku and part.
The actual WHERE statement is a partial search. All "wheels" have the same suffix but different component item numbers.
Additionally, I am in Access 2007, it may be an issue related to software version.
Making the subquery into a temporary table works, but the goal is to avoid that procedure.
EDIT #2 - A flaw in my environment
I created a test scenario identical to the one I have posted here, and I get the same results as Andre. At this point, combining these results with the fact that the temporary table method does in fact work, I am led to believe that it is an issue with query complexity and record access. Despite the error message not being the typical Query is too complex. message.
EDIT #3 - Digging deeper into "Complexity"
My next test will be to make the where clause simpler. Sadly, the systems I work on update at lunch each day and I currently cannot reach any data servers. I hope to update my progress at a later point today.
EDIT #4 - Replacing the partial search
OK, we're back from a meeting and ready to go. I've just run six queries with three different WHERE clauses:
WHERE part LIKE "*heel" / WHERE component_item LIKE "*SBP" (Original large scale issue)
Works in small scale test, Invalid operation on large scale.
WHERE part LIKE "wheel" / WHERE component_item LIKE "VALIDPART" (Original small scale)
Works in small scale test, Invalid operation on large scale.
WHERE part LIKE "wh33l" / WHERE component_item LIKE "NOTVALIDPART" (WHERE statements that do not return any records)
Small Scale
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | |
0002 | Boat | 5000 | |
0003 | Bike | 10000 | |
0004 | Shirt | 5000 | |
Large Scale
sku |description |forecast |component_item |dip_rate
#####|RealItem1 | ###### | |
#####|RealItem2 | ###### | |
#####|RealItem3 | ###### | |
... |... | ... | |
Tl;dr The filter specifics did not make a difference unless the filter resulted in a subquery that returned 0 records.
EDIT #5 - An interesting result
Under the idea of trying every possible solution and test everything I can, I made a local temporary table which contained every field and every record from t2 (~25MB). Referencing this table instead of the ODBC link to t2 works with the partial search query (WHERE component_item LIKE "*SBP"). I am updating the title of this question to reflect that the issue is specific to a linked table.
I copied the sample data to Access 2010 and ran the query, it worked without problems.
What is the full error message you get? ("very elaborate Invalid Operation.")
How many records are in your tables?
Is sku primary key in t1?
In any case, I suggest changing the subquery to:
SELECT sku, quant
FROM t2
WHERE part = "wheel"
LIKE is only needed for partial searches.
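The difference the question describes between filtering in the outer WHERE and filtering before the outer join can be reproduced on the small sample. The sketch below uses Python's sqlite3 module, so it says nothing about Access 2007 or ODBC-linked tables, and placing the filter in the ON clause is an alternative formulation introduced here, not something tested in the thread. The column desc is renamed descr because desc is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (sku text primary key, descr text, fcst integer);
    create table t2 (sku text, part text, quant integer, primary key (sku, part));
    insert into t1 values ('0001', 'Car', 10000), ('0002', 'Boat', 5000),
                          ('0003', 'Bike', 7500), ('0004', 'Shirt', 2500);
    insert into t2 values ('0001', 'wheel', 4), ('0001', 'door', 2),
                          ('0002', 'hull', 1),  ('0002', 'rudder', 1),
                          ('0003', 'wheel', 2), ('0005', 'rotor', 2);
""")

select_list = "select t1.sku, t1.descr, t1.fcst, t2.quant, t1.fcst * t2.quant"

# Filter in the outer WHERE: it runs after the join, so rows whose part is
# not 'wheel' (or is NULL because nothing matched) are discarded.
filter_in_where = conn.execute(select_list + """
    from t1 left join t2 on t1.sku = t2.sku
    where t2.part = 'wheel'
    order by t1.sku
""").fetchall()

# Filter in the ON clause: it only restricts which t2 rows can match, so
# every t1 row is kept and non-wheel goods get NULLs.
filter_in_on = conn.execute(select_list + """
    from t1 left join t2 on t1.sku = t2.sku and t2.part = 'wheel'
    order by t1.sku
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

The second form returns the same rows as the filtered subquery in the question, without a derived table.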
A workaround
I have devised a set of queries that works by using one of my original hunches of a UNION statement. Hopefully this allows for anyone that stumbles across this issue to save a bit of time.
Setting up the UNION
My initial problem was the idea that there was no way to select only wheel records and then join them to null records, because (using t1 as an example) the Boat has parts in t2, but none of them are wheels. I first had to devise a method to see which products had wheels without using a filter.
My intermediary solution:
Query: t1haswheel
SELECT
t1.sku,
t1.desc,
t1.fcst,
SUM(
IIF(
t2.part = "wheel",
1, 0
)
) as haswheel
FROM
t1
LEFT JOIN t2
ON t1.sku = t2.sku
GROUP BY
t1.sku,
t1.desc,
t1.fcst
This query returns every record from t1 followed by a number based on the number of wheel records there are in the part list for that item number. If there are no records in t2 with "wheel" in the field part, the query returns 0 for the record. This list is what I needed for the UNION statement in the original question.
UNION-ing it all together
At this point, all that is needed is a UNION which uses the summation field from the previous query (haswheel).
SELECT
q1.sku,
q1.desc,
q1.fcst,
t2.quant,
q1.fcst * t2.quant as wheelfcst
FROM
t1haswheel as q1
LEFT JOIN t2
ON q1.sku = t2.sku
WHERE
q1.haswheel > 0 AND
t2.part = "wheel"
UNION ALL
SELECT
q1.sku,
q1.desc,
q1.fcst,
null,
null
FROM
t1haswheel as q1
WHERE q1.haswheel = 0
This pulls in the correct results from records with wheels, and then attaches the records without wheels, while never using the WHERE statement in a subquery which references an ODBC linked table:
sku | desc | fcst | wheels | wheelfcst
0001 | Car | 10000 | 4 | 40000
0003 | Bike | 7500 | 2 | 15000
... | ... | ... | ... | ...
0002 | Boat | 5000 | |
0004 | Shirt | 2500 | |
... | ... | ... | ... | ...
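The two-step workaround can be checked on the sample data. This sketch uses Python's sqlite3 module instead of Access, with a view standing in for the saved query t1haswheel, a CASE expression for the conditional sum, and descr in place of the desc column (a keyword in SQLite). Those substitutions are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (sku text primary key, descr text, fcst integer);
    create table t2 (sku text, part text, quant integer, primary key (sku, part));
    insert into t1 values ('0001', 'Car', 10000), ('0002', 'Boat', 5000),
                          ('0003', 'Bike', 7500), ('0004', 'Shirt', 2500);
    insert into t2 values ('0001', 'wheel', 4), ('0001', 'door', 2),
                          ('0002', 'hull', 1),  ('0002', 'rudder', 1),
                          ('0003', 'wheel', 2), ('0005', 'rotor', 2);

    -- Step 1: flag each finished good with its number of wheel records,
    -- without filtering t2 inside a subquery.
    create view t1haswheel as
    select t1.sku, t1.descr, t1.fcst,
           sum(case when t2.part = 'wheel' then 1 else 0 end) as haswheel
    from t1 left join t2 on t1.sku = t2.sku
    group by t1.sku, t1.descr, t1.fcst;
""")

# Step 2: goods with wheels get their quantity, the rest get NULLs.
rows = conn.execute("""
    select q1.sku, q1.descr, q1.fcst, t2.quant, q1.fcst * t2.quant as wheelfcst
    from t1haswheel as q1
    join t2 on q1.sku = t2.sku
    where q1.haswheel > 0 and t2.part = 'wheel'
    union all
    select q1.sku, q1.descr, q1.fcst, null, null
    from t1haswheel as q1
    where q1.haswheel = 0
""").fetchall()

# UNION ALL gives no ordering guarantee, so sort by sku for display.
result = sorted(rows, key=lambda r: r[0])
for row in result:
    print(row)
```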

SQL Composite key grouping issue

I've got a very frustrating SQL issue which I can't for the life of me solve: a derived query needs to return a composite key while also performing a MIN() aggregate on another field of that table. If I were performing the MIN() on one of the composite key columns it would be easy, but since I need to return both keys to the outer query along with the MIN() result, I can't work out how to do this. The entire query looks like this:
SELECT
p.name as productname
,tmp.packageid
,tmp.price
,ppk2.packageoptionid
,ppk2.selcomproductid
FROM (
SELECT ppk.productid, ppk.packageid, MIN(ppk.price) as price
FROM product_package ppk
INNER JOIN package pk ON ppk.packageid = pk.id
INNER JOIN [plan] pl ON pk.planid = pl.id
WHERE pk.networkid = 1
GROUP BY ppk.productid, ppk.packageid
) tmp
INNER JOIN product_package ppk2 ON (
ppk2.productid = tmp.productid
AND ppk2.packageid = tmp.packageid
)
INNER JOIN product p ON (p.id = ppk2.productid)
WHERE p.isenabled = 1;
Current Results:
--------------------------------------
productid | packageid | price
1 500 0
1 501 19.95
1 502 29.95
2 501 0
3 500 15
3 504 39.95
Desired Results:
--------------------------------------
productid | packageid | price
1 500 0
2 501 0
3 500 15
The derived query "tmp" is where my issue lies, as I need one unique row back for each product, namely the product/package combination with the lowest price, before joining onto the outer tables.
Any help would be greatly appreciated!
I've used this trick whenever I need a sub query along with the smallest of something. The idea is to combine the value and key together with the value in the most significant bits and take the min of that. Then take it apart in the outer select.
The best way to combine two values depends on which RDBMS you're using. You don't mention which one, so I'm just providing pseudocode:
select ..., (tmp.c >> 32) price
from
(select productid, min((price << 32) | packageid) c
from product_package
where networkid=1
group by productid) tmp
inner join product_package ppk on ppk.productid=tmp.productid
and ppk.packageid=(tmp.c & 0xFFFFFFFF)
inner join product p on p.id=ppk.productid
where p.isenabled=1
<< 32 means shift the value 32 bits to the left and | is the bitwise "or". So this is assuming packageid is defined as a 32bit integer (or number(4)). The & 0xFFFFFFFF is the bitwise "and" and the hex value for 32 bits used to mask and return just packageid.
Depending on your RDBMS you may need to find its specific syntax for these things, or if they aren't supported you can use plain math: << 32 is equivalent to multiplying by 4294967296, >> 32 to integer division by 4294967296, and & 0xFFFFFFFF to taking the remainder after dividing by 4294967296. If you're using MSSQL you can use convert(binary,price)+convert(binary,packageid) to combine them and substring(..) to separate.
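A small runnable sketch of the packing trick, using Python's sqlite3 module because SQLite supports <<, >>, | and & on integers. Shifting only works on integers, so this sketch assumes the price is stored as whole cents in a hypothetical price_cents column, and it leaves out the networkid filter and the other joins from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table product_package (productid integer, packageid integer, price_cents integer)"
)
conn.executemany("insert into product_package values (?, ?, ?)", [
    (1, 500, 0), (1, 501, 1995), (1, 502, 2995),
    (2, 501, 0),
    (3, 500, 1500), (3, 504, 3995),
])

# Pack price into the high 32 bits and packageid into the low 32 bits, take
# the minimum of the packed value per product, then unpack both halves.
cheapest = conn.execute("""
    select productid,
           c & 4294967295 as packageid,
           c >> 32        as price_cents
    from (
        select productid, min((price_cents << 32) | packageid) as c
        from product_package
        group by productid
    ) tmp
    order by productid
""").fetchall()
print(cheapest)

# The same split with plain arithmetic: quotient and remainder by 2**32.
packed = (1500 << 32) | 500
price_cents, packageid = divmod(packed, 4294967296)
print(price_cents, packageid)
```

When two packages of a product share the lowest price, the packed minimum picks the one with the smaller packageid, so the result stays one row per product.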
Well, I don't know the data you actually have in your table; I only have the data your query returns. You didn't answer my comment asking for a sample of your table's data and the DBMS you are using.
However, assuming the current data of your table is the one that comes out of your query, the following query will give you the "Desired Result" you've specified:
select t1.* from t t1
left join t t2
on t1.productid = t2.productid and t1.details > t2.details
where t2.details is null
In table words, the query turns this:
+-----------+-----------+---------+
| PRODUCTID | PACKAGEID | DETAILS |
+-----------+-----------+---------+
| 1 | 500 | 0 |
| 1 | 501 | 20 |
| 1 | 502 | 30 |
| 2 | 501 | 0 |
| 3 | 500 | 15 |
| 3 | 504 | 40 |
+-----------+-----------+---------+
Into this:
+-----------+-----------+---------+
| PRODUCTID | PACKAGEID | DETAILS |
+-----------+-----------+---------+
| 1 | 500 | 0 |
| 2 | 501 | 0 |
| 3 | 500 | 15 |
+-----------+-----------+---------+
Let me know if it's clear or not.
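The self-join above can be run as-is on the sample rows. The sketch below does so with Python's sqlite3 module; the table name t and column details follow the answer, and an order by is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (productid integer, packageid integer, details integer)")
conn.executemany("insert into t values (?, ?, ?)", [
    (1, 500, 0), (1, 501, 20), (1, 502, 30),
    (2, 501, 0),
    (3, 500, 15), (3, 504, 40),
])

# A row survives only if no other row of the same product has a smaller
# details value, i.e. the left join found nothing and t2.details is NULL.
lowest = conn.execute("""
    select t1.* from t t1
    left join t t2
      on t1.productid = t2.productid and t1.details > t2.details
    where t2.details is null
    order by t1.productid
""").fetchall()
print(lowest)
```

One caveat of this pattern: if two rows of the same product tie for the lowest value, both are returned.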
Easy (read: expensive) way: build two views: One that gets just the minimum ppk.price of each productid WHERE pk.networkid = 1, and group that by productid. Call it Product_MinPrice_VIEW or whatever.
Build a second view, Product_VIEW, that replaces all that sub-SELECT INNER JOIN work you're trying to get away with via an INNER JOIN on the results of the Product_MinPrice_VIEW you just made.
I swear, wrangling with sub-SELECTS, HAVINGS and GROUP-BYs is tedious and error prone. I can't stand it sometimes. Hopefully, this will get you far enough to develop a solution that can be later optimized and made more correct.
FINAL ANSWER
I have an extremely similar problem in the application I'm working on, and in the meantime (while I hit this site up for a better answer) I just passed the buck: I wrote some application-level code to deal with any duplicates and let the program's logic find the true minimum when one is encountered. Not pretty, but then again I don't have all day to try and figure it out!
I'm sorry my answer couldn't help you. Good luck!

return latest version of a drupal node

I'm writing a drupal module, and I need to write a query that returns particular rows of one of my content_type tables. My query so far is:
SELECT DISTINCT pb.*, f.filepath FROM {content_type_promo_box} pb LEFT JOIN {files} f ON pb.field_promo_image_fid = f.fid
I realized as I was working that the table not only contains each cck field of this content type, it also contains multiple versions for each field. How do I limit my query to the rows that only contain values for the current versions of the nodes?
UPDATE: I need to clarify my question a little. I've been down the views path already, and I did think about using node_load (thanks for the answer, though, Jeremy!). Really, my question is more about how to write an appropriate SQL statement than it is about drupal specifically. I only want to return rows that contain the latest versions (vid is the greatest) for any particular node (nid). So here's an example:
-------------
| nid | vid |
-------------
| 45 | 3 |
| 23 | 5 |
| 45 | 9 |
| 23 | 12 |
| 45 | 36 |
| 33 | 44 |
| 33 | 78 |
-------------
My query should return the following:
-------------
| nid | vid |
-------------
| 23 | 12 |
| 45 | 36 |
| 33 | 78 |
-------------
Make sense? Thanks!
The node table stores the current version of the node, and revision ids are unique across all content. This makes for a pretty simple query:
SELECT m.*
FROM
{mytable} AS m
JOIN {node} AS n
ON m.vid = n.vid
If there is no content in {mytable} for the node, it will not be returned by the query; change to a RIGHT JOIN to return all nodes.
Assuming that the (nid, vid) combination is unique:
SELECT m.*
FROM (
SELECT nid, MAX(vid) AS mvid
FROM mytable
GROUP BY
nid
) q
JOIN mytable m
ON (m.nid, m.vid) = (q.nid, q.mvid)
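The greatest-vid-per-nid query can be checked against the example rows from the question. This sketch uses Python's sqlite3 module, which accepts the row-value comparison in the ON clause (SQLite 3.15 or later); not every database does, and there the condition can be written as two comparisons joined with AND. The order by is added only for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (nid integer, vid integer)")
conn.executemany("insert into mytable values (?, ?)", [
    (45, 3), (23, 5), (45, 9), (23, 12), (45, 36), (33, 44), (33, 78),
])

# q holds the highest vid per nid; joining back keeps only those rows.
latest_revisions = conn.execute("""
    select m.*
    from (
        select nid, max(vid) as mvid
        from mytable
        group by nid
    ) q
    join mytable m
      on (m.nid, m.vid) = (q.nid, q.mvid)
    order by m.nid
""").fetchall()
print(latest_revisions)
```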
You may be better off using node_load() to get the load object rather than trying to write a query yourself.
Or even use views to do what you need.
The reason for doing this rather than writing a query yourself is that Drupal and its modules sit together as a framework. Most of the time you will want to use that framework to do what you want rather than sidestepping it with your own query. If you later upgrade Drupal or a module, node_load() will still work but your own query may not.