tune query to avoid using aggregate function with order by - sql

How can I rewrite the query below to avoid using an aggregate function in the ORDER BY?
select id from my_table where flow='REQ' and audit_time <= '04-Jul-2014' group by (id) order by min(audit_time);
Since I have a large data set, the above query is causing performance issues.
Below is some sample data:
pk | id | audit_time | flow
1 | 1 | 10-Jul-2014 | REQ
2 | 1 | 05-Jul-2014 | REQ
3 | 2 | 03-Jul-2014 | REQ
4 | 2 | 01-Jul-2014 | RES
5 | 1 | 04-Jul-2014 | RES
In the output, I want the list of unique ids sorted by time.

You don't need GROUP BY id if id is unique (it should be) and you are not joining any other table; then you only have to order by audit_time. And since you are not grouping, you don't need an aggregate function in the ORDER BY.
select id from my_table where flow='REQ' and audit_time <= '04-Jul-2014' order by audit_time;
If your id is not unique and you want to order by the minimum audit_time, then your query is the only way (IMO).
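To see the difference between the two variants when id repeats, here is a minimal sketch that runs both against the sample data. It assumes SQLite (via Python's sqlite3) as a stand-in for the real database and stores the dates as ISO strings so that text comparison follows chronological order; the function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (pk integer, id integer, audit_time text, flow text)"
)
# Sample rows from the question; dates rewritten as ISO strings (an assumption)
# so that plain text comparison matches chronological order.
conn.executemany(
    "insert into my_table values (?, ?, ?, ?)",
    [
        (1, 1, "2014-07-10", "REQ"),
        (2, 1, "2014-07-05", "REQ"),
        (3, 2, "2014-07-03", "REQ"),
        (4, 2, "2014-07-01", "RES"),
        (5, 1, "2014-07-04", "RES"),
    ],
)


def ids_by_first_audit(cutoff):
    """Unique ids with flow = 'REQ', ordered by each id's earliest audit_time."""
    rows = conn.execute(
        """
        select id
        from my_table
        where flow = 'REQ' and audit_time <= ?
        group by id
        order by min(audit_time)
        """,
        (cutoff,),
    ).fetchall()
    return [r[0] for r in rows]


def ids_without_grouping(cutoff):
    """Ungrouped variant: one output row per matching input row."""
    rows = conn.execute(
        """
        select id
        from my_table
        where flow = 'REQ' and audit_time <= ?
        order by audit_time
        """,
        (cutoff,),
    ).fetchall()
    return [r[0] for r in rows]


print(ids_by_first_audit("2014-07-04"))    # [2]
print(ids_by_first_audit("2014-07-10"))    # [2, 1]
print(ids_without_grouping("2014-07-10"))  # [2, 1, 1]
```

With the later cutoff, id 1 matches twice, so the ungrouped query repeats it; the grouped query returns each id once.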

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+------+-------+---------+---------+
| id | order| value | type | account |
+----+------+-------+---------+---------+
| 1 | 1 | a | 2 | 1 |
| 1 | 2 | b | 1 | 1 |
| 1 | 3 | c | 4 | 1 |
| 1 | 4 | d | 2 | 1 |
| 1 | 5 | e | 1 | 1 |
| 1 | 5 | f | 6 | 1 |
| 2 | 6 | g | 1 | 1 |
+----+------+-------+---------+---------+
I need to select all fields of this table, but get only one row for each combination of id + type (I don't care which row is returned for each combination). I have tried several approaches without success.
As soon as I use DISTINCT, I can't include the rest of the fields to make them available in a subquery. If I add ROWNUM in the subquery, all rows become different, so that doesn't work.
Any ideas?
My best query at the moment is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
                FROM MYTABLE
                GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
       select id, type, value, account,
              row_number() over (partition by id, type order by null) as rn
       from your_table
     )
where rn = 1
;
order by null means the rows within each group (partition) by (id, type) are ordered arbitrarily, so the ordering step, which is usually time-consuming, is trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
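As a rough illustration of the row_number() approach, the sketch below runs the same query shape against the sample table. It assumes SQLite 3.25 or later (via Python's sqlite3) rather than Oracle, and renames the order column to ord because ORDER is a reserved word; both are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ord" stands in for the question's "order" column (a reserved word).
conn.execute(
    "create table your_table "
    "(id integer, ord integer, value text, type integer, account integer)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        (1, 1, "a", 2, 1),
        (1, 2, "b", 1, 1),
        (1, 3, "c", 4, 1),
        (1, 4, "d", 2, 1),
        (1, 5, "e", 1, 1),
        (1, 5, "f", 6, 1),
        (2, 6, "g", 1, 1),
    ],
)

# Number the rows inside each (id, type) partition in arbitrary order,
# then keep only the row numbered 1 in each partition.
one_per_group = conn.execute(
    """
    select id, type, value, account
    from (
        select id, type, value, account,
               row_number() over (partition by id, type order by null) as rn
        from your_table
    )
    where rn = 1
    """
).fetchall()

keys = sorted((row[0], row[1]) for row in one_per_group)
print(keys)  # [(1, 1), (1, 2), (1, 4), (1, 6), (2, 1)]
```

Which value comes back for a combination that has several rows, such as (1, 2) with a and d, is not defined, which matches the "I don't care which row" requirement.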
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
  partition by id, type
  all rows per match
  pattern (^r)
  define r as null is null
);
This partitions the rows by id and type, it doesn't order them (which means random ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause (even when we don't care about the ordering) - order by null is customary, but it can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is "random order"). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.

More efficient way to query shortest string value associated with each value in another column in Hive QL

I have a table in Hive containing store names, order IDs, and User IDs (as well as some other columns including item ID). There is a row in the table for every item purchased (so there can be more than one row per order if the order contains multiple items). Order IDs are unique within a store, but not across stores. A single order can have more than one user ID associated with it.
I'm trying to write a query that will return a list of all stores and order IDs and the shortest user ID associated with each order.
So, for example, if the data looks like this:
STORE | ORDERID | USERID | ITEMID
------+---------+--------+-------
a | 1 | bill | abc
a | 1 | susan | def
a | 2 | jane | abc
b | 1 | scott | ghi
b | 1 | tony | jkl
Then the output would look like this:
STORE | ORDERID | USERID
------+---------+-------
a | 1 | bill
a | 2 | jane
b | 1 | tony
I've written a query that will do this, but I feel like there must be a more efficient way to go about it. Does anybody know a better way to produce these results?
This is what I have so far:
select
    users.store, users.orderid, users.userid
from
    (select
         store, orderid, userid, length(userid) as len
     from
         sales) users
join
    (select distinct
         store, orderid,
         min(length(userid)) over (partition by store, orderid) as len
     from
         sales) len on users.store = len.store
    and users.orderid = len.orderid
    and users.len = len.len
Try this; it will probably work for you. It achieves your goal with a single SELECT and no extra join:
select distinct
store, orderid,
first_value(userid) over(partition by store, orderid order by length(userid) asc) f_val
from
sales;
The result will be:
store orderid f_val
a 1 bill
a 2 jane
b 1 tony
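The first_value() query above can be tried on the sample data with a small sketch like the following. It assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for Hive, and adds an ORDER BY on the outer query only to make the printed result stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales (store text, orderid integer, userid text, itemid text)"
)
conn.executemany(
    "insert into sales values (?, ?, ?, ?)",
    [
        ("a", 1, "bill", "abc"),
        ("a", 1, "susan", "def"),
        ("a", 2, "jane", "abc"),
        ("b", 1, "scott", "ghi"),
        ("b", 1, "tony", "jkl"),
    ],
)

# first_value() picks the shortest userid in each (store, orderid) partition;
# DISTINCT then collapses the per-item rows into one row per order.
shortest = conn.execute(
    """
    select distinct
        store, orderid,
        first_value(userid) over (
            partition by store, orderid order by length(userid) asc
        ) as f_val
    from sales
    order by store, orderid
    """
).fetchall()

print(shortest)  # [('a', 1, 'bill'), ('a', 2, 'jane'), ('b', 1, 'tony')]
```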
Probably rank() is the best way:
select s.*
from (select s.*, rank() over (partition by store, orderid order by length(userid)) as seqnum
      from sales s
     ) s
where seqnum = 1;
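One thing to keep in mind with rank() is that it keeps ties: if two user IDs on the same order have the same length, both come back. The sketch below illustrates this, assuming SQLite 3.25 or later (via Python's sqlite3) instead of Hive, partitioning by store and orderid, and adding one hypothetical row (anna) that is not in the question's data. The row_number() variant with a tie-break on userid is an alternative shown for comparison, not something from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales (store text, orderid integer, userid text, itemid text)"
)
conn.executemany(
    "insert into sales values (?, ?, ?, ?)",
    [
        ("a", 1, "bill", "abc"),
        ("a", 1, "susan", "def"),
        ("a", 2, "jane", "abc"),
        ("b", 1, "scott", "ghi"),
        ("b", 1, "tony", "jkl"),
        ("b", 1, "anna", "mno"),  # hypothetical row: ties with tony on length
    ],
)

QUERY = """
    select store, orderid, userid
    from (
        select s.*,
               {fn}() over (
                   partition by store, orderid order by {ordering}
               ) as seqnum
        from sales s
    )
    where seqnum = 1
    order by store, orderid, userid
"""

# rank() gives every tied row the same number, so ties are all returned.
with_rank = conn.execute(
    QUERY.format(fn="rank", ordering="length(userid)")
).fetchall()

# row_number() with an extra sort key returns exactly one row per order.
with_row_number = conn.execute(
    QUERY.format(fn="row_number", ordering="length(userid), userid")
).fetchall()

print(with_rank)
# [('a', 1, 'bill'), ('a', 2, 'jane'), ('b', 1, 'anna'), ('b', 1, 'tony')]
print(with_row_number)
# [('a', 1, 'bill'), ('a', 2, 'jane'), ('b', 1, 'anna')]
```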

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using the PostgreSQL SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, along with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by wrapping the DISTINCT ON query in a subquery (here a CTE) and counting over its result:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table in three ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes it clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
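For reference, the join-based query can be checked against the sample data with a sketch like this one. It assumes SQLite (via Python's sqlite3) in place of PostgreSQL, writes the deliberate cartesian product as CROSS JOIN instead of "on true", and adds an ORDER BY so the printed rows come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer, my_field text, id_reference integer)"
)
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(1, "a", 1), (1, "b", 2), (2, "a", 3), (2, "c", 4), (3, "x", 5)],
)

latest = conn.execute(
    """
    select c.id_count as total, a.id, a.my_field, b.max_id_reference
    from my_table a
    join (
        -- latest version number for each id
        select id, max(id_reference) as max_id_reference
        from my_table
        group by id
    ) b
      on a.id = b.id and a.id_reference = b.max_id_reference
    cross join (
        -- single-row subquery: number of distinct ids
        select count(distinct id) as id_count
        from my_table
    ) c
    order by a.id
    """
).fetchall()

print(latest)  # [(3, 1, 'b', 2), (3, 2, 'c', 4), (3, 3, 'x', 5)]
```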

How to search max value from group in sql

I am just learning some SQL, so I have a question.
-I have a table with name TABL
-a variable :ccname which has a value "Bottle"
The table is as follows:
+----------+---------+-------+--------+
| Name | Price | QTY | CODE |
+----------+---------+-------+--------+
| Rope | 3.6 | 35 | 236 |
| Chain | 2.8 | 15 | 237 |
| Paper | 1.6 | 45 | 124 |
| Bottle | 4.5 | 41 | 478 |
| Bottle | 1.8 | 12 | 123 |
| Computer | 1450.75 | 71 | 784 |
| Spoon | 0.7 | 10 | 412 |
| Bottle | 1.3 | 15 | 781 |
| Rope | 0.9 | 14 | 965 |
+----------+---------+-------+--------+
Now I want to find the CODE of the row whose Name matches the variable :ccname and which has the highest quantity! So I translated it like this:
SELECT CODE
FROM TABL
GROUP BY :ccname
WHERE QTY=MAX(QTY)
In a perfect world that would return 478.
In the SQL world what should I write in order to get 478?
You probably want something like this:
SELECT code
FROM TABL
WHERE Name=:ccname
ORDER BY QTY DESC
LIMIT 1
The idea is that we find all rows of the table whose Name column equals the contents of the variable :ccname, order them by quantity in descending order, and finally select the first one, which has to be the one with the largest quantity because the rows are sorted in descending order.
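The filter, sort, and take-first idea can be tried out with a short sketch. It assumes SQLite (via Python's sqlite3), which uses the LIMIT spelling and supports :ccname as a named parameter; the helper name code_with_max_qty is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table TABL (Name text, Price real, QTY integer, CODE integer)")
conn.executemany(
    "insert into TABL values (?, ?, ?, ?)",
    [
        ("Rope", 3.6, 35, 236),
        ("Chain", 2.8, 15, 237),
        ("Paper", 1.6, 45, 124),
        ("Bottle", 4.5, 41, 478),
        ("Bottle", 1.8, 12, 123),
        ("Computer", 1450.75, 71, 784),
        ("Spoon", 0.7, 10, 412),
        ("Bottle", 1.3, 15, 781),
        ("Rope", 0.9, 14, 965),
    ],
)


def code_with_max_qty(ccname):
    """Return the CODE of the row with the largest QTY for the given Name."""
    row = conn.execute(
        """
        select CODE
        from TABL
        where Name = :ccname
        order by QTY desc
        limit 1
        """,
        {"ccname": ccname},
    ).fetchone()
    # No matching Name means no row at all.
    return row[0] if row else None


print(code_with_max_qty("Bottle"))  # 478
print(code_with_max_qty("Rope"))    # 236
```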
Try this:
SELECT CODE
FROM TABL
WHERE Name = :ccname
AND QTY = (SELECT MAX(QTY) FROM TABL WHERE Name = :ccname)
Use ORDER BY, a proper WHERE, and something to limit the result set to one row:
SELECT CODE
FROM TABL
WHERE name = :ccname
ORDER BY QTY DESC
FETCH FIRST 1 ROW ONLY;
Note: Some databases spell the ANSI standard FETCH FIRST 1 ROW ONLY as LIMIT or as SELECT TOP 1.
Depending on your specific database, you can use one of the following options to restrict your result set to a single value after ordering your existing columns through an ORDER BY clause:
SELECT TOP 1
LIMIT 1
FETCH FIRST 1 ROW ONLY
Syntax Examples
SELECT TOP 1 Code
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
or
SELECT Code
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
LIMIT 1
or
SELECT CODE
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
FETCH FIRST 1 ROW ONLY;
Using a join can also solve the question effectively:
Select t1.Code
From TABL As t1 Join (
    Select Name, Max(QTY) As MaxQTY
    From TABL
    Where Name = :ccname
    Group By Name
) As t2
On t1.QTY = t2.MaxQTY And t1.Name = t2.Name
Explanation:
You first calculate the maximum QTY for "Bottle" in the subquery, and then join it back to the table to select the row with that MaxQTY and the same name.
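A minimal sketch of the join approach on the sample data follows, assuming SQLite (via Python's sqlite3) and an explicit ON clause for the join condition. Unlike the LIMIT 1 style, this form returns every row that ties for the maximum, so the hypothetical helper returns a list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table TABL (Name text, Price real, QTY integer, CODE integer)")
conn.executemany(
    "insert into TABL values (?, ?, ?, ?)",
    [
        ("Rope", 3.6, 35, 236),
        ("Chain", 2.8, 15, 237),
        ("Paper", 1.6, 45, 124),
        ("Bottle", 4.5, 41, 478),
        ("Bottle", 1.8, 12, 123),
        ("Computer", 1450.75, 71, 784),
        ("Spoon", 0.7, 10, 412),
        ("Bottle", 1.3, 15, 781),
        ("Rope", 0.9, 14, 965),
    ],
)


def codes_with_max_qty(ccname):
    """Return the CODE of every row that has the maximum QTY for the Name."""
    rows = conn.execute(
        """
        select t1.CODE
        from TABL as t1
        join (
            -- one row: the largest QTY among rows with this Name
            select Name, max(QTY) as MaxQTY
            from TABL
            where Name = :ccname
            group by Name
        ) as t2
          on t1.QTY = t2.MaxQTY and t1.Name = t2.Name
        order by t1.CODE
        """,
        {"ccname": ccname},
    ).fetchall()
    return [r[0] for r in rows]


print(codes_with_max_qty("Bottle"))  # [478]
```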

SQL SELECT only rows where a max value is present, and the corresponding ID from another linked table

I have a simple Parts database which I'd like to use for calculating costs of assemblies, and I need to keep a cost history, so that I can update the costs for parts without the update affecting historic data.
So far I have the info stored in 2 tables:
tblPart:
PartID | PartName
1 | Foo
2 | Bar
3 | Foobar
tblPartCostHistory
PartCostHistoryID | PartID | Revision | Cost
1 | 1 | 1 | £1.00
2 | 1 | 2 | £1.20
3 | 2 | 1 | £3.00
4 | 3 | 1 | £2.20
5 | 3 | 2 | £2.05
What I want to end up with is just the PartID for each part, and the PartCostHistoryID where the revision number is highest, so this:
PartID | PartCostHistoryID
1 | 2
2 | 3
3 | 5
I've had a look at some of the other threads on here and I can't quite get it. I can manage to get the PartID along with the highest Revision number, but if I try to then do anything with the PartCostHistoryID I end up with multiple PartCostHistoryIDs per part.
I'm using MS Access 2007.
Many thanks.
Mihai's (very concise) answer will work assuming that the order of both [PartCostHistoryID] and [Revision] for each [PartID] is always ascending.
A solution that does not rely on that assumption would be
SELECT
tblPartCostHistory.PartID,
tblPartCostHistory.PartCostHistoryID
FROM
tblPartCostHistory
INNER JOIN
(
SELECT
PartID,
MAX(Revision) AS MaxOfRevision
FROM tblPartCostHistory
GROUP BY PartID
) AS max
ON max.PartID = tblPartCostHistory.PartID
AND max.MaxOfRevision = tblPartCostHistory.Revision
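The join on the per-part maximum revision can be checked with a small sketch. It assumes SQLite (via Python's sqlite3) rather than MS Access, leaves out the Cost column because the query does not use it, renames the subquery alias to latest, and adds an ORDER BY for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tblPartCostHistory "
    "(PartCostHistoryID integer, PartID integer, Revision integer)"
)
conn.executemany(
    "insert into tblPartCostHistory values (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2)],
)

current_costs = conn.execute(
    """
    select h.PartID, h.PartCostHistoryID
    from tblPartCostHistory as h
    inner join (
        -- highest revision number recorded for each part
        select PartID, max(Revision) as MaxOfRevision
        from tblPartCostHistory
        group by PartID
    ) as latest
      on latest.PartID = h.PartID
     and latest.MaxOfRevision = h.Revision
    order by h.PartID
    """
).fetchall()

print(current_costs)  # [(1, 2), (2, 3), (3, 5)]
```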
SELECT PartID, MAX(PartCostHistoryID) FROM tblPartCostHistory GROUP BY PartID
Here is a query:
select PartCostHistoryId, PartId from tblCost
where PartCostHistoryId in
(select PartCostHistoryId from
(select * from tblCost as tbl order by Revision desc) as tbl1
group by PartId
)
Here is SQL Fiddle http://sqlfiddle.com/#!2/19c2d/12