How to convert a SQL subquery to a join - sql

I have two tables with a 1:n relationship: "content" and "versioned-content-data" (for example, an article entity and all the versions created of that article). I would like to create a view that displays the top version of each "content".
Currently I use this query (with a simple subquery):
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
WHERE (version = (SELECT MAX(version) AS topversion
FROM mytable
WHERE (fk_idothertable = t1.fk_idothertable)))
The subquery is actually a query to the same table that extracts the highest version of a specific item. Notice that the versioned items will have the same fk_idothertable.
In SQL Server I tried to create an indexed view of this query but it seems I'm not able since subqueries are not allowed in indexed views. So... here's my question... Can you think of a way to convert this query to some sort of query with JOINs?
It seems like indexed views cannot contain:
subqueries
common table expressions
derived tables
HAVING clauses
I'm desperate. Any other ideas are welcome :-)
Thanks a lot!

This probably won't help if the table is already in production, but the right way to model this is to make version = 0 the current version and always increment the version of OLDER material. So when you insert a new version you would run:
UPDATE thetable SET version = version + 1 WHERE id = :id
INSERT INTO thetable (id, version, title, ...) VALUES (:id, 0, :title, ...)
Then this query would just be
SELECT id, title, ... FROM thetable WHERE version = 0
No subqueries, no MAX aggregation. You always know what the current version is. You never have to select max(version) in order to insert the new record.
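To try the "version 0 is always current" scheme out, here is a minimal sketch using Python's built-in sqlite3 module. The three-column layout of thetable and the add_version helper are assumptions made for illustration only; they are not from the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified layout: one row per (id, version).
conn.execute("CREATE TABLE thetable (id INTEGER, version INTEGER, title TEXT)")

def add_version(conn, item_id, title):
    """Age every existing version of the item by one, then insert the new row as version 0."""
    with conn:  # both statements run in one transaction
        conn.execute("UPDATE thetable SET version = version + 1 WHERE id = ?", (item_id,))
        conn.execute(
            "INSERT INTO thetable (id, version, title) VALUES (?, 0, ?)",
            (item_id, title),
        )

add_version(conn, 1, "draft")
add_version(conn, 1, "final")
add_version(conn, 2, "other")

# The current version of every item, with no subquery and no MAX().
current = conn.execute(
    "SELECT id, title FROM thetable WHERE version = 0 ORDER BY id"
).fetchall()
print(current)  # [(1, 'final'), (2, 'other')]
```

The UPDATE and INSERT are wrapped in a single transaction so that a reader should never observe two version-0 rows for the same id.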

Maybe something like this?
SELECT
t2.id,
t2.title,
t2.contenttext,
t2.fk_idothertable,
t2.version
FROM mytable t1, mytable t2
WHERE t1.fk_idothertable = t2.fk_idothertable
GROUP BY t2.id, t2.title, t2.contenttext, t2.fk_idothertable, t2.version
HAVING t2.version=MAX(t1.version)
Just a wild guess...

You might be able to turn the MAX into an aliased derived table that does the GROUP BY.
It might look something like this:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1 JOIN
(SELECT fk_idothertable, MAX(version) AS topversion
FROM mytable
GROUP BY fk_idothertable) as t2
ON t1.fk_idothertable = t2.fk_idothertable
AND t1.version = t2.topversion

I think FerranB was close but didn't quite have the grouping right:
with
latest_versions as (
select
max(version) as latest_version,
fk_idothertable
from
mytable
group by
fk_idothertable
)
select
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
from
mytable as t1
join latest_versions on (t1.version = latest_versions.latest_version
and t1.fk_idothertable = latest_versions.fk_idothertable);
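A quick way to check that joining against the grouped MAX returns the same rows as the correlated subquery is to run both on a small sample. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than SQL Server, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT, contenttext TEXT,
                      fk_idothertable INTEGER, version INTEGER);
INSERT INTO mytable VALUES
  (1, 'A v1', 'a1', 10, 1),
  (2, 'A v2', 'a2', 10, 2),
  (3, 'B v1', 'b1', 20, 1),
  (4, 'B v2', 'b2', 20, 2),
  (5, 'B v3', 'b3', 20, 3);
""")

# Original form: correlated subquery picking the top version per item.
subquery_sql = """
SELECT t1.id, t1.title, t1.fk_idothertable, t1.version
FROM mytable AS t1
WHERE t1.version = (SELECT MAX(version) FROM mytable
                    WHERE fk_idothertable = t1.fk_idothertable)
ORDER BY t1.id
"""

# Join form: match on BOTH the item key and the top version.
join_sql = """
SELECT t1.id, t1.title, t1.fk_idothertable, t1.version
FROM mytable AS t1
JOIN (SELECT fk_idothertable, MAX(version) AS topversion
      FROM mytable GROUP BY fk_idothertable) AS t2
  ON t1.fk_idothertable = t2.fk_idothertable AND t1.version = t2.topversion
ORDER BY t1.id
"""

# Joining on version alone also matches row 4, because 2 is the top version of item 10.
version_only_sql = """
SELECT t1.id
FROM mytable AS t1
JOIN (SELECT fk_idothertable, MAX(version) AS topversion
      FROM mytable GROUP BY fk_idothertable) AS t2
  ON t1.version = t2.topversion
ORDER BY t1.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
version_only_ids = [row[0] for row in conn.execute(version_only_sql)]
print(join_rows)         # [(2, 'A v2', 10, 2), (5, 'B v3', 20, 3)]
print(version_only_ids)  # [2, 4, 5]
```

Note that this only shows the two forms are equivalent as queries; it says nothing about whether SQL Server will accept either one in an indexed view.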

If SQL Server accepts LIMIT clause, I think the following should work:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable AS t1 ORDER BY t1.version DESC LIMIT 1;
(DESC sorts in descending order; LIMIT 1 keeps only the first row, and a DBMS usually optimizes well when it sees LIMIT. Note that this returns a single row overall, not one row per fk_idothertable.)

I don't know how efficient this would be, but:
SELECT t1.*, t2.version
FROM mytable AS t1
JOIN (
SELECT mytable.fk_idothertable, MAX(mytable.version) AS version
FROM mytable
GROUP BY mytable.fk_idothertable
) t2 ON t1.fk_idothertable = t2.fk_idothertable
AND t1.version = t2.version

Like this...I assume that the 'mytable' in the subquery was a different actual table...so I called it mytable2. If it was the same table then this will still work, but then I imagine that fk_idothertable will just be 'id'.
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
INNER JOIN (SELECT MAX(Version) AS topversion,fk_idothertable FROM mytable2 GROUP BY fk_idothertable) t2
ON t1.id = t2.fk_idothertable AND t1.version = t2.topversion
Hope this helps

Related

Extract all tables and respective columns from long SQL Query

The task I am trying to solve is to get all tables out of a long SQL query and its respective columns.
E.g.
SELECT
t1.id, t1.gender, t1.name,
t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
Wanted output:
{'table1': ['id', 'gender', 'name'], 'table2': ['age', 'salary']}
I considered using string splitting etc. getting all table names and based on the alias (if available) get the columns.
But this is getting way too complicated if there are multiple joins and maybe also UNIONs.
Is there an available library or smart way to do that?
If it's only for one query, I would advise using MS Excel and filtering on the table alias. Generate the select statement via MS Excel and you could create something like this:
SELECT
'table1:', t1.id, t1.gender, t1.name,
'table2:',t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
In case this helps:
Take the column names from ALL_TAB_COLUMNS and pivot them. This is still not the exact result you want.
select * from (
select TABLE_NAME,COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME in
('Table1','Table2'))
pivot
(
max(table_name)
for COLUMN_NAME in ('gender','name','age','salary')
)
order by 1;
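For simple queries like the one in the question, a rough regex-based sketch in Python can produce the wanted dictionary. The function name tables_and_columns and the keyword list are made up for this example. It only looks at the first SELECT list, only picks up alias-qualified columns (t1.id, not a bare id), and will not cope with subqueries, UNIONs or quoted identifiers; for anything beyond that, a real SQL parser is probably the better route.

```python
import re

# Words that can follow a table name but are not aliases.
NOT_ALIASES = {"LEFT", "RIGHT", "INNER", "OUTER", "FULL", "CROSS", "NATURAL",
               "JOIN", "ON", "USING", "WHERE", "GROUP", "ORDER", "HAVING", "UNION"}

def tables_and_columns(sql):
    """Map each table to the alias-qualified columns found in the first SELECT list."""
    alias_to_table = {}
    # Matches "FROM table1 t1", "JOIN table2 AS t2" or a bare "FROM table1".
    pattern = r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(\w+))?"
    for table, alias in re.findall(pattern, sql, flags=re.IGNORECASE):
        if not alias or alias.upper() in NOT_ALIASES:
            alias = table  # no alias given: columns are qualified by the table name
        alias_to_table[alias] = table

    result = {table: [] for table in alias_to_table.values()}
    select_list = re.search(r"\bSELECT\b(.*?)\bFROM\b", sql,
                            flags=re.IGNORECASE | re.DOTALL)
    if select_list:
        for alias, column in re.findall(r"\b(\w+)\.(\w+)\b", select_list.group(1)):
            table = alias_to_table.get(alias)
            if table is not None and column not in result[table]:
                result[table].append(column)
    return result

query = """
SELECT
t1.id, t1.gender, t1.name,
t2.age, t2.salary
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.id
"""
result = tables_and_columns(query)
print(result)  # {'table1': ['id', 'gender', 'name'], 'table2': ['age', 'salary']}
```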

Using MAX when selecting a high number of fields in a query

I understand some varieties of this question have been asked, but I could not find an answer to my specific scenario.
My query has over 50 fields being selected, and only one of them is an aggregate, using MAX(). On the GROUP BY clause, I would only like to pass two specific fields, name and UserID, not all 50 to make the query run. See small subset below.
SELECT
t1.name,
MAX(t2.id) as UserID,
t3.age,
t3.height,
t3.dob
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
LEFT JOIN table3 t3 ON t1.id = t3.id
GROUP BY t1.name, UserID
Is there any workaround or better approach to accomplish my goal?
The database is SQL Server and any help would be greatly appreciated.
Hmmm . . . What values do you want for the other fields? If you want the max() of one column for each id and code, you can do:
select t.*
from (select t.*, max(col) over (partition by id, code) as maxcol
from t
) t
where col = maxcol;
Given that id might be unique, you might want the maximum id as well as the other columns for each code:
select t.*
from (select t.*, max(id) over (partition by code) as maxid
from t
) t
where id = maxid;
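The window-function pattern can be tried on a tiny table. This sketch runs it in SQLite through Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table t, its columns and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, code TEXT, col INTEGER, note TEXT);
INSERT INTO t VALUES
  (1, 'x', 5, 'low'),
  (1, 'x', 9, 'high'),
  (2, 'y', 3, 'only');
""")

# Compute the per-group maximum next to every row, then keep the rows that match it.
# All other columns come along without being listed in a GROUP BY.
rows = conn.execute("""
SELECT id, code, col, note
FROM (SELECT t.*, MAX(col) OVER (PARTITION BY id, code) AS maxcol
      FROM t) AS ranked
WHERE col = maxcol
ORDER BY id
""").fetchall()
print(rows)  # [(1, 'x', 9, 'high'), (2, 'y', 3, 'only')]
```

If several rows in a group tie for the maximum, all of them are returned.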

Column ambiguously defined in subquery using rownums

I have to execute a SQL made from some users and show its results. An example SQL could be this:
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
This SQL works fine as it is, but I need to manually add pagination and show the rownum, so the SQL ends up like this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0
This throws an exception: "ORA-00918: column ambiguously defined" because both table1 and table2 contain a field with the same name ("id").
What could be the best way to avoid this?
Regards.
UPDATE
In the end, we had to go the ugly way and parse each incoming SQL statement before executing it. Basically, we resolved the asterisks to discover which fields we needed to add, and aliased every field with a unique id. This introduced a performance penalty, but our client understood it was the only option given the requirements.
I will mark Lex's answer as accepted, as it's the solution we ended up working on.
I think you have to specify aliases for (at least one of) table1.id and table2.id, and possibly for any other matching column names as well.
So instead of SELECT t1.*, t2.* FROM table1 t1, table2 t2, use something like:
SELECT t1.id t1id, t2.id t2id [rest of columns] FROM table1 t1, table2 t2
I'm not familiar with Oracle syntax, but I think you'll get the idea.
I was searching for an answer to something similar. I was referencing an aliased subquery that had a couple of NULL columns. I had to alias the NULL columns because I had more than one:
select a.*, t2.column, t2.column, t2.column
from (select t1.column, t1.column, NULL, NULL, t1.column from t1
where t1.column='VALUE') a
left outer join t2 on t2.column=a.column;
Once I aliased the NULL columns in the subquery, it worked fine.
If you could modify the query syntactically (or get the users to do so) to use explicit JOIN syntax with the USING clause, this would automatically fix the problem at hand:
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2 USING (id)
The USING clause does the same as ON t1.id = t2.id (or the implicit JOIN you have in the question), except that only one id column remains in the result, thereby eliminating your problem.
You would still run into problems if there are more columns with identical names that are not included in the USING clause. Aliases as described by @Lex are indispensable then.
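Both ideas (explicit aliases and USING) can be compared by looking at the column names each query produces. The sketch below uses SQLite through Python's sqlite3 module, so it is only an approximation of the Oracle case: SQLite is more lenient and does not raise ORA-00918. The column_names helper and the name and city columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, city TEXT);
INSERT INTO table1 VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO table2 VALUES (1, 'Oslo'), (2, 'Rome');
""")

def column_names(sql):
    """Return the result-set column names of a query."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]

# Explicit aliases give every output column a distinct name.
aliased = column_names("""
SELECT t1.id AS t1id, t1.name, t2.id AS t2id, t2.city
FROM table1 t1, table2 t2
WHERE t1.id = t2.id
""")

# With USING, SELECT * keeps a single copy of the join column.
using = column_names("SELECT * FROM table1 t1 JOIN table2 t2 USING (id)")

print(aliased)  # ['t1id', 'name', 't2id', 'city']
print(using)    # ['id', 'name', 'city']
```

Once every column name in the inner query is unique, wrapping it in outer pagination queries no longer has an ambiguous name to trip over.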
Use a null-replacement function (NVL) to fix this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE
NVL(t1.id,0) = NVL(t2.id,0)
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0

In SQL, how can I perform a "subtraction" operation?

Suppose I have two tables, which both have user ids. I want to perform an operation that would return all user IDS in table 1 that are not in table 2. I know there has to be some easy way to do this - can anyone offer some assistance?
It's slow, but you can normally accomplish this with something like NOT IN. (There are other constructs in various RDBMSs that do this in better ways; Oracle, for instance, has an EXISTS clause that can be used for this.)
But you could say:
select id from table1 where id not in (select id from table2)
There are a few ways to do it. Here's one approach using NOT EXISTS:
SELECT userid
FROM table1
WHERE NOT EXISTS
(
SELECT *
FROM table2
WHERE table1.userid = table2.userid
)
And here's another approach using a join:
SELECT table1.userid
FROM table1
LEFT JOIN table2
ON table1.userid = table2.userid
WHERE table2.userid IS NULL
The fastest approach depends on the database.
One way is to use EXCEPT, if your T-SQL dialect supports it. It is equivalent to performing a left join and a null test:
SELECT user_id FROM table1 LEFT JOIN table2 ON table1.user_id = table2.user_id WHERE table2.user_id IS NULL;
If it is
SQL Server:
SELECT id FROM table1
EXCEPT
SELECT id FROM table2
Oracle:
SELECT id FROM table1
MINUS
SELECT id FROM table2
Other databases: I'm not sure.
Try this:
SELECT id FROM table1 WHERE id NOT IN
(
SELECT id FROM table2
)
select ID from table1
where ID not in (select ID from table2)
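To compare the approaches side by side, here is a small sketch that runs NOT IN, NOT EXISTS, the LEFT JOIN / IS NULL test and EXCEPT against the same sample data. It uses SQLite through Python's sqlite3 module (SQLite spells the set operator EXCEPT, like SQL Server); the sample ids are made up and contain no NULLs or duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER);
CREATE TABLE table2 (id INTEGER);
INSERT INTO table1 VALUES (1), (2), (3), (4);
INSERT INTO table2 VALUES (2), (4);
""")

queries = {
    "not_in": """
        SELECT id FROM table1
        WHERE id NOT IN (SELECT id FROM table2)
        ORDER BY id""",
    "not_exists": """
        SELECT id FROM table1
        WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
        ORDER BY id""",
    "left_join": """
        SELECT table1.id FROM table1
        LEFT JOIN table2 ON table1.id = table2.id
        WHERE table2.id IS NULL
        ORDER BY table1.id""",
    "except": """
        SELECT id FROM table1
        EXCEPT
        SELECT id FROM table2
        ORDER BY id""",
}

# Run every variant and collect the ids it returns.
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)  # every variant prints [1, 3]
```

On this data all four agree. Which one is fastest depends on the database and its optimizer.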

SQL query for Top 5 of every category

I have a table that has three columns: Category, Timestamp and Value.
What I want is a SQL select that will give me the 5 most recent values of each category. How would I go about and do that?
I tried this:
select
a."Category",
b."Timestamp",
b."Value"
from
(select "Category" from "Table" group by "Category" order by "Category") a,
(select a."Category", c."Timestamp", c."Value" from "Table" c
where c."Category" = a."Category" limit 5) b
Unfortunately, it won't allow it because "subquery in FROM cannot refer to other relations of same query level".
I'm using PostgreSQL 8.3, by the way.
Any help will be appreciated.
SELECT t1.category, t1.timestamp, t1.value, COUNT(*) AS latest
FROM foo t1
JOIN foo t2 ON t1.category = t2.category AND t1.timestamp <= t2.timestamp
GROUP BY t1.category, t1.timestamp, t1.value
HAVING COUNT(*) <= 5;
Note: Try this out and see if it performs suitably for your needs. It will not scale well for large groups.
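A counting self-join of this kind, matching rows on category and counting how many rows in the same category are at least as recent, can be tried on a small sample. The sketch below uses SQLite through Python's sqlite3 module instead of PostgreSQL, keeps the top 2 rather than the top 5 to keep the data short, and assumes timestamps are unique within a category; the table foo and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (category TEXT, timestamp INTEGER, value INTEGER);
INSERT INTO foo VALUES
  ('a', 1, 10), ('a', 2, 20), ('a', 3, 30),
  ('b', 1, 5),  ('b', 2, 6);
""")

top_n = 2  # the question asks for 5; 2 keeps the sample small

# For each row, COUNT(*) is its rank within the category (1 = most recent).
rows = conn.execute("""
SELECT t1.category, t1.timestamp, t1.value, COUNT(*) AS latest
FROM foo t1
JOIN foo t2 ON t1.category = t2.category AND t1.timestamp <= t2.timestamp
GROUP BY t1.category, t1.timestamp, t1.value
HAVING COUNT(*) <= ?
ORDER BY t1.category, t1.timestamp DESC
""", (top_n,)).fetchall()

for row in rows:
    print(row)
# ('a', 3, 30, 1)
# ('a', 2, 20, 2)
# ('b', 2, 6, 1)
# ('b', 1, 5, 2)
```

The join compares every row with every other row in its category, which is why this does not scale well for large groups.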