I am executing a complex query in DB2 and the response time of which is quite high. After a lot of R&D, I found that the repetitive use of the max() function is causing hindrance in the execution time.
Thus I wish to know if there is an alternative for the max() function. I read a bit about rank() and was wondering if the same could be used, but wasn't able to get the result I wanted.
A part of my query is:
Select DISTINCT
(Select MAX(DATE(last_timestamp)) from Table_1 where ID = E.EID)
From Table_2 E
I am stuck at this for too long now. So any help would be appreciated.
row_number() is better than rank() since you don't want duplicates. Here it is. I'm thankful for this site as reference: https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/finding_the_maximum_row_and_more56?lang=en
SELECT DISTINCT last_timestamp
FROM (SELECT ROW_NUMBER() OVER(PARTITION BY A.ID ORDER BY A.last_timestamp DESC) AS rn, DATE(A.last_timestamp) as last_timestamp
FROM Table_1 A, From Table_2 E where A.ID = E.EID)
WHERE rn = 1;
The syntax might not be exact since I did not test it and I don't have DB2 installed.
The issue may be the conversion of the timestamp to a date - DATE() - before doing the MAX() function - not a DB2 expert, but would guess that this may end up creating a temp table of dates that is then MAXed, and indexes are not used...
Suggest that the following be tried:
Select DISTINCT
DATE(
Select MAX(last_timestamp)
from Table_1
where ID = E.EID
)
From Table_2 E
Of course the query optimizer may be smart enough to do this...
Perhaps you can join on the following CTE :
WITH t1(i,t) AS
( SELECT id, MAX(last_timestamp)
FROM Table_1
GROUP BY id )
SELECT
...
DATE(t1.t),
...
FROM Table_2
LEFT JOIN t1 ON Table_2.EID = t1.i
Related
I'm new to SQL and I think I must just be missing something, but I can't find any resources on how to do the following:
I have a table with three relevant columns: id, creation_date, latest_id. latest_id refers to the id of another entry (a newer revision).
For each entry, I would like to find the min creation date of all entries with latest_id = this.id. How do I perform this type of iteration in SQL / reference the value of the current row in an iteration?
select
t.id, min(t2.creation_date) as min_creation_date
from
mytable t
left join
mytable t2 on t2.latest_id = t.id
group by
t.id
You could solve this with a loop, but it's not anywhere close the best strategy. Instead, try this:
SELECT tf.id, tf.Creation_Date
FROM
(
SELECT t0.id, t1.Creation_Date,
row_number() over (partition by t0.id order by t1.creation_date) rn
FROM [MyTable] t0 -- table prime
INNER JOIN [MyTable] t1 ON t1.latest_id = t0.id -- table 1
) tf -- table final
WHERE tf.rn = 1
This connects the id to the latest_id by joining the table to itself. Then it uses a windowing function to help identify the smallest Creation_Date for each match.
I am trying to learn SQL-SERVER and I have created the below query:
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, *
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
I found that the only way to limit is with ROW_NUMBER().
Although when I try to run the join I have this error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [8156] [S0001]: The column 'DIALOG_ID' was specified multiple times for 'T'.
The problem: In the WITH, you do SELECT * which gets all columns from both tables db and dbs. Both have a column DIALOG_ID, so a column by that name ends up twice in the result set of the WITH.
Although until here that is all allowed, it is not good practice: why have the same data twice?
Things go wrong when SQL Server has to determine what SELECT * FROM T means: it expands SELECT * to the actual columns of T, but it finds a duplicate column name, and then it refuses to continue.
The fix (and also highly recommended in general): be specific about the columns that you want to output. If T has no duplicate columns, then SELECT * FROM T will succeed.
Note that the even-more-pure variant is to also be specific about what columns you select from T. By doing that it becomes clear at a glance what the SELECT produces, instead of having to guess or investigate when you look at the query later on (or when someone else does).
The updated code would look like this (fill in your column names as we don't know them):
WITH T AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num,
d.DIALOG_ID, d.SOME_OTHER_COL,
ds.DS_ID, ds.SOME_OTHER_COL_2
FROM test.db AS d
INNER JOIN test.dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT row_num, DIALOG_ID, SOME_OTHER_COL, DS_ID, SOME_OTHER_COL_2
FROM T
WHERE row_num <= 10;
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, d.*
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see just the two columns (subject_id and id) is enough to group the items together. They will help differentiate the group. But why am I not able to use the other columns in select clause. If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(1)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After you concatenate the arrays, you want the first element. The .* transforms the record referred to by a to the component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you an easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not required to select Time_1 column's value, this following query will work (As I can see values in column min_time and max_time is same for the same group)-
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
Below is for BigQuery Standard SQL and is most efficient way for such cases like in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases lead to Resources exceeded error.
Note: self join is also very ineffective way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id
I need help with this piece:
How can I write the following in HIVE...
SELECT *
FROM tableA
WHERE colA = (SELECT MAX(date_column) FROM tableA)
I need to query only the latest current records from the table. I am storing dates as strings in hive as "yyyy-mm-dd".
Avoid a JOIN, use the analytics and windowing features:
select * from (select *, rank() over (order by date_col desc) as rank
from tableA) S where S.rank = 1;
Something like this might work:
SELECT a.*
FROM tableA a
JOIN (SELECT MAX(date_column) AS max_date_column
FROM tableA) b
ON a.colA = b.max_date_column
Hope it helps
EDIT: I have no idea how I got to this old question, you probably solved it long ago :)
Note that in Hive 0.13+, you can use subqueries in WHERE statements.
I want to get first value in a field in Oracle when another corresponding field has max value.
Normally, we would do this using a query and a subquery. The subquery ordering by a field and the outer query with where rownum<=1.
But, I cannot do this because the table aliases persist only one level deep and this query is a part of another big query and I need to use some aliases from the outermost query.
Here's the query structure
select
(
select a --This should get first value of a after b's are sorted desc
from
(
select a,b from table1 where table1.ID=t2.ID order by b desc
)
where rownum<=1
)
) as "A",
ID
from
table2 t2
Now this is not gonna work because alias t2 wont be available at innermost query.
Real world analogy that comes to my mind is I have a table containing records for all employees of a company, their salaries(including past salaries) and the date from which the salary was effective. So, for each employee, there will multiple records. Now, I want to get latest salaries for all the employees.
With SQL server, I could have used SELECT TOP. But that's not available with Oracle and since where clauses execute before order by, I cannot use where rownum<=1 and order by in same query and expect correct results.
How do I do this?
Using your analogy of employees and their salaries, if I understand what you are trying to do, you could do something like this (haven't tested):
SELECT *
FROM (
SELECT employee_id,
salary,
effective_date,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) rowno
FROM employees
)
WHERE rowno=1
I would much rather see you connect the subquery up with a JOIN instead of embedding it in the SELECT. Cleaner SQL. Then you can use the windowing function that roartechs suggests.
Select t2.whatever, t1.a
From table2 t2
Inner Join (
Select tfirst.ID, tfirst.a
From (
Select ID, a,
ROW_NUMBER() Over (Partition BY ID ORDER BY b DESC) rownumber
FROM table1
) tfirst
WHERE tfirst.rownumber=1
) t1 on t2.ID=t1.ID