How to get the set size, first and last record in a db2 ordered set with one call - sql

I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance, it's based on criteria, lets say 1 day) and the ID of the first record, and the ID of the last record.
The old code was fetching the entire table, then just using the 1st record ID, and the last record ID, and size, and not making use of the rest. Now this code is timing out. It's a complex query of several joins.
IS there a way to just fetch the size of the set, 1st record, last record all in one select query ?
I've read that reordering the list in order to fetch the 1st record(so fetch with Desc, then change to Asc) is not efficient.
sample table 1 TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
sample table 2 TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by TR.tdID ASC
This results in multiple columns, (but it requires the group by)
123,123
234,234
456,456
What I want is:
123,456

As I mentioned in the comments, for this query you don't need Group BY and neither Order by, just do:
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
It should work as expected

Related

Oracle SQL self join performance

lets say I have a table called order with following data. I need to get the customer_name along with the no. of orders they have placed.
Table name: order
id | customer_name | item
1 | Siddhant | TV
2 | Siddhant | Mobile
3 | Sankalp | Football
Desired output:
customer_name | no_of_orders
Siddhant | 2
Sankalp | 1
I tried below 2 queries to get the result:
select customer_name, count(customer_name) as no_of_orders
from order
group by customer_name;
This gives me the correct result but takes around ~10.5 secs to run
select ord.customer_name, count(ord1.customer_name) as no_of_orders
from order ord
inner join order ord1 on ord1.customer_name = ord.customer_name
group by ord.customer_name;
This gives me the square(correct count) in the result but runs in ~2 secs. I can get the square root to get the actual count.
I understand why the second query gives the square of the actual count in the output but can someone explain why it runs so fast compared to the first query?
PS: I am running these in Oracle SQL Developer.
The first version is the one you should be using here:
SELECT customer_name, COUNT(customer_name) AS no_of_orders
FROM "order"
GROUP BY customer_name;
Absent a WHERE or HAVING clause, adding an index might not be too helpful here, because Oracle has to basically touch every record in the table in order to do the aggregation. As to why the second version appears to be faster, I speculate that the benchmarks you are using are not representative, because they are based on a fairly small table size. If you scale your table data to be in the tens of thousands of rows, I predict that the first version will be substantially faster.

Display related rows in same row in MSaccess

I have a set of related rows which I need to display in a single line. For example, the data I have is in different rows.
"ID" RecordDate "ExpType" "OrigBudget" "ActualCost"
1001 1-5-2017 Hardware $ 5000
1001 2-6-2017 Hardware $ 5200
The Original budget is approved at an earlier time for the same record but the Actual cost often differs and is recorded at a later date. I want the output as
ProjectID YearofEntry ExpenseType OrgBudget ActualCost <BR>
1001 2017 Hardware $ 5000 $ 5200 <BR>
I have tried group query to aggregate it based on ExpenseType and ProjectId but not successful in getting it into a single row so far.
if you always just have two rows for each ExpType - one with the original budget and one with the actual costs - you could simply use a GROUP BY:
SELECT ID AS ProjectID
,YEAR(RecordDate) AS YearofEntry
,ExpType AS ExpenseType
,MAX(OrigBudget) AS OrgBudget
,MAX(ActualCost) AS ActualCost
FROM yourtable
GROUP BY ID
,YEAR(RecordDate)
,ExpType
Try This:
SELECT ID,
Year([RecordDate]) AS YEARofEntry,
ExpType,
Sum(OrigBudget) AS SumOfOrigBudget,
Sum(ActualCost) AS SumOfActualCost
FROM youtable
GROUP BY ID,
Year([RecordDate]),
ExpType;

oracle - sql query select max from each base

I'm trying to solve this query where i need to find the the top balance at each base. Balance is in one table and bases are in another table.
This is the existing query i have that returns all the results but i need to find a way to limit it to 1 top result per baseID.
SELECT o.names.name t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY o.names.name, t.accounts.bidd.baseID;
accounts is a nested table.
this is the output
Name accounts.BIDD.baseID MAX(T.accounts.BALANCE)
--------------- ------------------------- ---------------------------
Jerard 010 1251.21
john 012 3122.2
susan 012 3022.2
fin 012 3022.2
dan 010 1751.21
What i want the result to display is calculate the highest balance for each baseID and only display one record for that baseID.
So the output would look only display john for baseID 012 because he has the highest.
Any pointers in the right direction would be fantastic.
I think the problem is cause of the "Name" column. since you have three names mapped to one base id(12), it is considering all three records as unique ones and grouping them individually and not together.
Try to ignore the "Name" column in select query and in the "Group-by" clause.
SELECT t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY t.accounts.bidd.baseID;

GROUP BY and aggregate function query

I am looking at making a simple leader board for a time trial. A member may perform many time trials, but I only want for their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date | Time | Distance
Bill Smith 01-01-11 1.14 100
Dave Jones 04-09-11 2.33 100
Bill Smith 02-03-11 1.1 100
My resulting answer from the example above would be:
Forename | Surname | Date | Time | Distance
Bill Smith 02-03-11 1.1 100
Dave Jones 04-09-11 2.33 100
I have this so far, but access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.Member
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
IF I remove the Date from the SELECT the query works (without the date). I have tried using FIRST upon the TimeTrials.Date, but that will return the first date which is normally incorrect.
Obviously putting the Date as part of the GROUP BY would not return the result set that I am after.
Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)

JavaDB: get ordered records in the subquery

I have the following "COMPANIES_BY_NEWS_REPUTATION" in my JavaDB database (this is some random data just to represent the structure)
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of last 5 news for every company. In order to do that I somehow feel that I need to user 'order by' and 'offset' in a subquery as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by DATE desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform sufficiently, that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.