Performance for multiple partitioned subqueries - sql

I have a database with one main table and multiple history/log tables that store the evolution over time of properties of some rows of the main table. These properties are not stored in the main table itself, but must be queried from the relevant history/log table. All these tables are big (on the order of gigabytes).
I want to dump the whole main table and join the last entry of all the history/log tables.
Currently I do it via subqueries as follows:
WITH
foo AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table1),
bar AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table2)
SELECT
...
FROM maintable mt
JOIN foo foo ON foo.itemid = mt.itemid AND foo.rownumber = 1
JOIN bar bar ON bar.itemid = mt.itemid AND bar.rownumber = 1
WHERE ...
The problem is that this is very slow. Is there a faster solution to this problem?
I am only allowed to perform read-only queries on this database: I cannot make any changes to it.

In current Oracle versions it is usually better to use lateral joins (LATERAL / CROSS APPLY), because the CBO (Oracle's cost-based optimizer) can transform them (DCL, the lateral view decorrelation transformation) and pick the optimal join method for your circumstances (table statistics, cardinality, etc.).
So it would be something like this:
SELECT
...
FROM maintable mt
CROSS APPLY (
SELECT *
FROM table1
WHERE table1.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
CROSS APPLY (
SELECT *
FROM table2
WHERE table2.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
WHERE ...
PS. You haven't specified your Oracle version, so my answer is for Oracle 12c and later, where CROSS APPLY and FETCH FIRST are available.
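To make the "latest entry per item" lookup concrete, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. It is not Oracle: SQLite has no CROSS APPLY, so correlated scalar subqueries with ORDER BY ... LIMIT 1 stand in for the lateral subqueries. The column names status and owner, the sample rows, and the two indexes are all assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical miniature of the schema in the question: one main table and
# two history tables. All names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintable (itemid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table1 (itemid INTEGER, date TEXT, status TEXT);
CREATE TABLE table2 (itemid INTEGER, date TEXT, owner TEXT);
-- Assumed indexes: a per-item lookup can use (itemid, date) directly.
CREATE INDEX table1_item_date ON table1 (itemid, date);
CREATE INDEX table2_item_date ON table2 (itemid, date);
INSERT INTO maintable VALUES (1, 'a'), (2, 'b');
INSERT INTO table1 VALUES (1, '2020-01-01', 'new'),
                          (1, '2020-02-01', 'used'),
                          (2, '2020-01-15', 'new');
INSERT INTO table2 VALUES (1, '2020-01-05', 'ann'),
                          (2, '2020-01-20', 'bob'),
                          (2, '2020-03-01', 'cid');
""")

# For each main row, fetch only the newest entry of each history table.
latest_sql = """
SELECT mt.itemid,
       (SELECT t1.status FROM table1 t1
         WHERE t1.itemid = mt.itemid
         ORDER BY t1.date DESC LIMIT 1) AS status,
       (SELECT t2.owner FROM table2 t2
         WHERE t2.itemid = mt.itemid
         ORDER BY t2.date DESC LIMIT 1) AS owner
FROM maintable mt
ORDER BY mt.itemid
"""
latest_rows = conn.execute(latest_sql).fetchall()
print(latest_rows)  # [(1, 'used', 'ann'), (2, 'new', 'cid')]
```

One difference to keep in mind: a scalar subquery yields NULL when an item has no history rows, which matches OUTER APPLY. CROSS APPLY drops such main-table rows, just like the inner joins in the question.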

Related

Foreach/per-item iteration in SQL

I'm new to SQL and I think I must just be missing something, but I can't find any resources on how to do the following:
I have a table with three relevant columns: id, creation_date, latest_id. latest_id refers to the id of another entry (a newer revision).
For each entry, I would like to find the min creation date of all entries with latest_id = this.id. How do I perform this type of iteration in SQL / reference the value of the current row in an iteration?
select
t.id, min(t2.creation_date) as min_creation_date
from
mytable t
left join
mytable t2 on t2.latest_id = t.id
group by
t.id
You could solve this with a loop, but that is nowhere close to the best strategy. Instead, try this:
SELECT tf.id, tf.Creation_Date
FROM
(
SELECT t0.id, t1.Creation_Date,
row_number() over (partition by t0.id order by t1.creation_date) rn
FROM [MyTable] t0 -- table prime
INNER JOIN [MyTable] t1 ON t1.latest_id = t0.id -- table 1
) tf -- table final
WHERE tf.rn = 1
This connects the id to the latest_id by joining the table to itself. Then it uses a windowing function to help identify the smallest Creation_Date for each match.
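The two answers above differ in one visible way: the LEFT JOIN keeps entries that have no older revisions, while the INNER JOIN drops them. A small sketch on an in-memory SQLite database (via Python's sqlite3) shows this. The sample rows are invented, and the inner-join variant uses MIN instead of ROW_NUMBER only to keep the sketch short; it returns the same ids and dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,
    creation_date TEXT,
    latest_id INTEGER
);
-- Rows 1 and 2 are older revisions pointing at row 3; row 4 has none.
INSERT INTO mytable VALUES
    (1, '2021-01-10', 3),
    (2, '2021-01-05', 3),
    (3, '2021-02-01', NULL),
    (4, '2021-03-01', NULL);
""")

# LEFT JOIN: every id appears, with NULL when nothing points at it.
left_join_rows = conn.execute("""
SELECT t.id, MIN(t2.creation_date) AS min_creation_date
FROM mytable t
LEFT JOIN mytable t2 ON t2.latest_id = t.id
GROUP BY t.id
ORDER BY t.id
""").fetchall()

# INNER JOIN: only ids that are referenced by at least one other row.
inner_join_rows = conn.execute("""
SELECT t.id, MIN(t2.creation_date) AS min_creation_date
FROM mytable t
INNER JOIN mytable t2 ON t2.latest_id = t.id
GROUP BY t.id
ORDER BY t.id
""").fetchall()

print(left_join_rows)
print(inner_join_rows)
```

Which one you want depends on whether entries without revisions should show up in the result at all.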

select rows in sql with latest date from 3 tables in each group

I'm creating PREDICATE system for my application.
My question is: how can I select the rows with the latest date in the "Taken On" column for each "QuizESId" value when the data is spread over several tables? I already understand how to do it with a single table; I learned that from this question:
select rows in sql with latest date for each ID repeated multiple times
Here is what I have already tried
SELECT tt.*
FROM myTable tt
INNER JOIN
(SELECT ID, MAX(Date) AS MaxDateTime
FROM myTable
GROUP BY ID) groupedtt ON tt.ID = groupedtt.ID
AND tt.Date = groupedtt.MaxDateTime
What I am confused about is how to select from 3 tables. I hope you can guide me; of course I need a solution with a good query and efficient performance.
Thanks
This is for SQL Server (you didn't specify exactly what RDBMS you're using):
if you want to get the "latest row for each QuizId" - this sounds like you need a CTE (Common Table Expression) with a ROW_NUMBER() value - something like this (updated: you obviously want to "partition" not just by QuizId, but also by UserName):
WITH BaseData AS
(
SELECT
mAttempt.Id AS Id,
mAttempt.QuizModelId AS QuizId,
mAttempt.StartedAt AS StartsOn,
mUser.UserName,
mDetail.Score AS Score,
RowNum = ROW_NUMBER() OVER (PARTITION BY mAttempt.QuizModelId, mUser.UserName
ORDER BY mAttempt.TakenOn DESC)
FROM
UserQuizAttemptModels mAttempt
INNER JOIN
AspNetUsers mUser ON mAttempt.UserId = mUser.Id
INNER JOIN
QuizAttemptDetailModels mDetail ON mDetail.UserQuizAttemptModelId = mAttempt.Id
)
SELECT *
FROM BaseData
WHERE QuizId = 10053
AND RowNum = 1
The BaseData CTE basically selects the data (as you did), but it also adds a ROW_NUMBER() column. This "partitions" your data into groups, based on the QuizModelId and the UserName, and numbers the rows inside each group, starting at 1, in the order given by the ORDER BY clause. You said you want to order by the "Taken On" date, but no such date is visible in your query, so I just guessed it might be on the UserQuizAttemptModels table; change and adapt as needed.
Now you can select from that CTE with your original WHERE condition and specify that you want only the first row of each group (for each quiz and user): the one with the most recent "Taken On" date value.
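Here is a rough, runnable sketch of the same ROW_NUMBER() pattern, using an in-memory SQLite database from Python instead of SQL Server (window functions need SQLite 3.25 or newer). The single attempts table and its rows are made up for illustration and flatten the three joined tables of the real query into one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, flattened stand-in for the three joined tables.
CREATE TABLE attempts (
    id INTEGER PRIMARY KEY,
    quiz_id INTEGER,
    user_name TEXT,
    taken_on TEXT,
    score INTEGER
);
INSERT INTO attempts VALUES
    (1, 10053, 'alice', '2019-05-01', 60),
    (2, 10053, 'alice', '2019-05-03', 80),
    (3, 10053, 'bob',   '2019-05-02', 70),
    (4, 20000, 'alice', '2019-05-04', 90);
""")

# Number the attempts inside each (quiz, user) group, newest first,
# then keep only row 1 of each group for the requested quiz.
latest_attempts = conn.execute("""
WITH base AS (
    SELECT id, quiz_id, user_name, score,
           ROW_NUMBER() OVER (PARTITION BY quiz_id, user_name
                              ORDER BY taken_on DESC) AS row_num
    FROM attempts
)
SELECT id, user_name, score
FROM base
WHERE quiz_id = 10053 AND row_num = 1
ORDER BY user_name
""").fetchall()

print(latest_attempts)  # [(2, 'alice', 80), (3, 'bob', 70)]
```

Alice's older attempt (id 1) is filtered out because it gets row number 2 inside her group, while Bob's only attempt is row 1 of his.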

optimizing row number query

I am using SQL Server 2008 R2 and have the query below:
select * from
(
select d.ID as ID,
....
....
ROW_NUMBER() OVER
(
ORDER BY #some field
) AS RowNum
from
/*some table*/
LEFT join
(select Device_ID,
Level,
ROW_NUMBER() over (partition by Device_ID order by id desc) as rn
from #sometable as de WITH (NOLOCK)
where #some condition
) t
where t.rn = 1)tmp on ID=tmp.Device_ID /* sort operation 1 */
/*some more joins */
WHERE /*some condition*/
) as DbD
where RowNum BETWEEN #SkipRowsLocal and (#SkipRowsLocal + #TakeRowsLocal - 1)
order by RowNum
I am trying to implement a pagination query based on the sample at
http://blog.sqlauthority.com/2013/04/14/sql-server-tricks-for-row-offset-and-paging-in-various-versions-of-sql-server/
but it executes very slowly. When I looked at the query plan, a sort operation was consuming almost 50% of the query time, and I guess it is the first sort operation, which I marked as 1. Basically, in subquery t I want to retrieve the latest value per device, and with the outer row number I want to fetch only, say, 40 records.
It basically sorts 10K rows and then takes 40 of them. Is there any way to improve this query?
Instead of a LEFT JOIN, try an OUTER APPLY to get the TOP 1 row for each device you are interested in. Think of it as: 'I get a record from the first table, then for that record I go and get the TOP 1 row for that device ID in the other table.' Remember that the second table is searched using the value from the first table inside the subquery, rather than through a join condition:
select * from
(
select d.ID as ID,
....
....
ROW_NUMBER() OVER
(
ORDER BY #some field
) AS RowNum
from
/*some table*/
OUTER APPLY
(select TOP 1 Device_ID,
Level
from #sometable as de WITH (NOLOCK)
where #some condition and Device_ID = [/*some table*/].id ORDER BY id DESC
) t
/*some more joins */
WHERE /*some condition*/
) as DbD
where RowNum BETWEEN #SkipRowsLocal and (#SkipRowsLocal + #TakeRowsLocal - 1)
order by RowNum
Something along those lines. You might want to build just the relevant parts of the query and compare them; this would seem to avoid the sort and, with an index, might well be quick.
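As a rough illustration of "page first, fetch the latest value per device by lookup", here is a sketch on an in-memory SQLite database driven from Python. SQLite has neither OUTER APPLY nor TOP, so a correlated scalar subquery with LIMIT 1 plays that role and LIMIT/OFFSET does the paging. The tables devices and device_events, the index, and the helper page() are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE device_events (
    id INTEGER PRIMARY KEY,
    device_id INTEGER,
    level INTEGER
);
-- Assumed index so "latest event of one device" is an index lookup.
CREATE INDEX device_events_dev ON device_events (device_id, id);
""")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [(i, f"dev{i:02d}") for i in range(1, 11)],
)
# Devices 1..9 get three events each (levels 1, 2, 3); device 10 has none.
conn.executemany(
    "INSERT INTO device_events (device_id, level) VALUES (?, ?)",
    [(d, lvl) for d in range(1, 10) for lvl in (1, 2, 3)],
)


def page(skip, take):
    """Return one page of devices with the level of their newest event."""
    return conn.execute(
        """
        SELECT d.id,
               (SELECT e.level FROM device_events e
                 WHERE e.device_id = d.id
                 ORDER BY e.id DESC LIMIT 1) AS level
        FROM devices d
        ORDER BY d.name
        LIMIT ? OFFSET ?
        """,
        (take, skip),
    ).fetchall()


print(page(0, 3))
print(page(8, 5))
```

Like OUTER APPLY, the lookup keeps devices that have no events at all (device 10 comes back with a NULL level). Whether this is actually faster than the ROW_NUMBER() version on SQL Server depends on the indexes and the plan, so compare both on the real data.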

Merge two unrelated views into a single view

Let's say I have in my first view (ClothingID, Shoes, Shirts)
and in the second view I have (ClothingID, Shoes, Shirts) HOWEVER
the data is completely unrelated, even the ID field is not related in any way.
I want them combined into 1 single view for reporting purposes.
so the 3rd view (the one I'm trying to make) should look like this: (ClothingID, ClothingID2, Shoes, Shoes2, Shirts, Shirts2)
so there's no relation AT ALL, I'm just putting them side by side, unrelated data into the same view.
Any help would be strongly appreciated
You want to combine the results, yet be able to tell the rows apart.
To duplicate all columns would be a bit of an overkill. Add a column with info about the source:
SELECT 'v1'::text AS source, clothingid, shoes, shirts
FROM view1
UNION ALL
SELECT 'v2'::text AS source, clothingid, shoes, shirts
FROM view2;
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_1
) v1
full outer join (
select *, row_number() OVER (ORDER BY ClothingID) AS row
from view_2
) v2 on v1.row = v2.row
I think a full outer join on the new, otherwise unrelated row column will do the job.
row_number() exists in PostgreSQL 8.4 and above.
If you have a lower version you can imitate row_number, as in the example below. It only works if ClothingID is unique within each view.
select v1.ClothingID, v2.ClothingID as ClothingID2, v1.Shoes, v2.Shoes as Shoes2,
v1.Shirts, v2.Shirts as Shirts2
from (
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
) v1
full outer join (
select *, (select count(*) from view_2 t2
where t2.ClothingID <= t.ClothingID) as row
from view_2 t
) v2 on v1.row = v2.row
Added after comment:
I've noticed and corrected mistake in preceding query.
I'll try to explain a bit. First of all we have to add row numbers to both views to make sure that there are no gaps in the ids. This is quite a simple way:
select *, (select count(*) from view_1 t1
where t1.ClothingID <= t.ClothingID) as row
from view_1 t
This consists of two things: a simple query selecting the rows (*):
select *
from view_1 t
and correlated subquery (read more on wikipedia):
(
select count(*)
from view_1 t1
where t1.ClothingID <= t.ClothingID
) as row
For each row of the outer query (here it's (*)), this counts the preceding rows, including the row itself. In other words: for each row in the view, count all rows whose ClothingID is less than or equal to the current row's ClothingID. For unique ClothingID values (which I've assumed) this gives you row numbering (ordered by ClothingID).
Live example on data.stackexchange.com - row numbering.
After that we can use both subqueries with row numbers to join them (full outer join on Wikipedia), live example on data.stackexchange.com - merge two unrelated views.
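Since the live examples may not be reachable, here is a small sketch of the counting trick on an in-memory SQLite database (via Python's sqlite3). The rows are invented, the alias row_no replaces row to stay clear of keywords, and the second table dup exists only to show what happens when ClothingID is not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE view_1 (ClothingID INTEGER, Shoes TEXT, Shirts TEXT);
INSERT INTO view_1 VALUES
    (40, 'boots', 'polo'),
    (7, 'sandals', 'tee'),
    (19, 'loafers', 'oxford');
-- Hypothetical table with a repeated id, to show the limitation.
CREATE TABLE dup (ClothingID INTEGER);
INSERT INTO dup VALUES (5), (5), (9);
""")

# Correlated count: how many rows have an id <= the current row's id?
numbered = conn.execute("""
SELECT t.ClothingID,
       (SELECT COUNT(*) FROM view_1 t1
         WHERE t1.ClothingID <= t.ClothingID) AS row_no
FROM view_1 t
ORDER BY t.ClothingID
""").fetchall()

# The same query on non-unique ids no longer yields 1, 2, 3.
dup_numbered = conn.execute("""
SELECT t.ClothingID,
       (SELECT COUNT(*) FROM dup t1
         WHERE t1.ClothingID <= t.ClothingID) AS row_no
FROM dup t
ORDER BY t.ClothingID
""").fetchall()

print(numbered)      # [(7, 1), (19, 2), (40, 3)]
print(dup_numbered)  # [(5, 2), (5, 2), (9, 3)]
```

With unique ids the counts are gap-free and can serve as the join column. With duplicates, both rows with id 5 get the number 2 and number 1 is missing, which is why the uniqueness assumption matters.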
You could use Rownumber as a join parameter, and 2 temp tables?
So something like:
-- SELECT ... INTO creates each temp table and fills it in one step
SELECT ROW_NUMBER() OVER (ORDER BY Clothing_ID ASC) AS [Row_ID], Clothing_ID, Shoes, Shirts
INTO #table1
FROM Table1
SELECT ROW_NUMBER() OVER (ORDER BY Clothing_ID ASC) AS [Row_ID], Clothing_ID, Shoes, Shirts
INTO #table2
FROM Table2
SELECT t1.Clothing_ID, t2.Clothing_ID, t1.Shoes, t2.Shoes, t1.Shirts, t2.Shirts
FROM #table1 t1
JOIN #table2 t2 ON t1.Row_ID = t2.Row_ID
I think that should be roughly sensible. Make sure you use the right join type (e.g. FULL OUTER JOIN) so that all rows from both queries appear.
If the views are unrelated, SQL will struggle to deal with it. You can do it, but there's a better and simpler way...
I suggest merging them one after the other, rather than side-by-side as you have suggested, ie a union rather than a join:
select 'view1' as source, ClothingID, Shoes, Shirts
from view1
union all
select 'view2', ClothingID, Shoes, Shirts
from view2
This would be the usual approach for this kind of situation, and is simple to code and understand.
Note the use of UNION ALL, which keeps every row, as opposed to UNION, which removes duplicates (usually at the cost of a sort or hash step). Neither guarantees a particular row order unless you add an ORDER BY.
Edited
Added a column indicating which view the row came from.
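The difference between UNION ALL and UNION is easy to check on a small example. The sketch below uses an in-memory SQLite database from Python, with two plain tables standing in for the views and invented rows, one of which appears in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Plain tables standing in for the two views; rows are invented.
CREATE TABLE view1 (ClothingID INTEGER, Shoes TEXT, Shirts TEXT);
CREATE TABLE view2 (ClothingID INTEGER, Shoes TEXT, Shirts TEXT);
INSERT INTO view1 VALUES (1, 'boots', 'polo'), (2, 'sandals', 'tee');
INSERT INTO view2 VALUES (1, 'boots', 'polo'), (3, 'loafers', 'oxford');
""")

# UNION ALL keeps the row that exists in both views twice.
union_all_rows = conn.execute("""
SELECT ClothingID, Shoes, Shirts FROM view1
UNION ALL
SELECT ClothingID, Shoes, Shirts FROM view2
""").fetchall()

# UNION removes the duplicate.
union_rows = conn.execute("""
SELECT ClothingID, Shoes, Shirts FROM view1
UNION
SELECT ClothingID, Shoes, Shirts FROM view2
""").fetchall()

# With a source column the two copies differ, so even UNION keeps both.
tagged_rows = conn.execute("""
SELECT 'view1' AS source, ClothingID, Shoes, Shirts FROM view1
UNION
SELECT 'view2', ClothingID, Shoes, Shirts FROM view2
""").fetchall()

print(len(union_all_rows), len(union_rows), len(tagged_rows))  # 4 3 4
```

So once the source column is added, UNION and UNION ALL return the same rows here, and UNION ALL simply skips the duplicate check.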
You can try the following:
SELECT *
FROM (SELECT row_number() over(), * FROM table1) t1
FULL JOIN (SELECT row_number() over(), * FROM table2) t2 using(row_number)

sql query to get earliest date

I have a table with columns id, name, score, date
and I want to run a SQL query to get the record where id = 2 with the earliest date in the data set.
Can you do this within the query or do you need to loop after the fact?
I want to get all of the fields of that record.
If you just want the date:
SELECT MIN(date) as EarliestDate
FROM YourTable
WHERE id = 2
If you want all of the information:
SELECT TOP 1 id, name, score, date
FROM YourTable
WHERE id = 2
ORDER BY Date
Avoid loops when you can. Loops often lead to cursors, and cursors are almost never necessary and very often really inefficient.
SELECT TOP 1 ID, Name, Score, [Date]
FROM myTable
WHERE ID = 2
Order BY [Date]
While using TOP or a sub-query both work, I would break the problem into steps:
Find target record
SELECT MIN( date ) AS date, id
FROM myTable
WHERE id = 2
GROUP BY id
Join to get other fields
SELECT mt.id, mt.name, mt.score, mt.date
FROM myTable mt
INNER JOIN
(
SELECT MIN( date ) AS date, id
FROM myTable
WHERE id = 2
GROUP BY id
) x ON x.date = mt.date AND x.id = mt.id
While this solution, using derived tables, is longer, it is:
Easier to test
Self documenting
Extendable
It is easier to test because parts of the query can be run standalone.
It is self-documenting because the query directly reflects the requirement, i.e. the derived table lists the row where id = 2 with the earliest date.
It is extendable because another condition, if required, can easily be added to the derived table.
Try
select * from dataset
where id = 2
order by date limit 1
Been a while since I did sql, so this might need some tweaking.
"limit" and "top" are not supported by every database server (Oracle, for example, supports neither).
You can try a more complex query in standard SQL:
select mt1.id, mt1."name", mt1.score, mt1."date" from mytable mt1
where mt1.id=2
and mt1."date"= (select min(mt2."date") from mytable mt2 where mt2.id=2)
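One practical difference between the "take the first row" answers and the MIN-based ones shows up when two records share the earliest date. Here is a small sketch on an in-memory SQLite database (via Python's sqlite3), with invented rows where id = 2 has two records on the same earliest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, name TEXT, score INTEGER, date TEXT);
-- Two rows with id = 2 share the earliest date (a tie).
INSERT INTO mytable VALUES
    (1, 'ann', 10, '2010-01-01'),
    (2, 'bob', 20, '2010-03-01'),
    (2, 'bob', 25, '2010-02-01'),
    (2, 'bob', 30, '2010-02-01');
""")

# ORDER BY ... LIMIT 1 (TOP 1 in SQL Server): always exactly one row,
# but which of the tied rows you get is not defined.
first_row = conn.execute("""
SELECT id, name, score, date
FROM mytable
WHERE id = 2
ORDER BY date
LIMIT 1
""").fetchall()

# Comparing against MIN(date): every row with the earliest date.
all_earliest = conn.execute("""
SELECT mt1.id, mt1.name, mt1.score, mt1.date
FROM mytable mt1
WHERE mt1.id = 2
  AND mt1.date = (SELECT MIN(mt2.date) FROM mytable mt2 WHERE mt2.id = 2)
ORDER BY mt1.score
""").fetchall()

print(first_row)
print(all_earliest)
```

If exactly one row is required, add a tie-breaker column to the ORDER BY; if all tied rows are wanted, the MIN-based form (or the derived-table join) returns them.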