What is the best practice for using an aggregate function on a view in a SQL query over a large amount of data?
select name , sum(value) from v_transactions Group by name
The view has 150,000 records.
If the view presents 150k rows but takes 5 minutes to aggregate them, the problem is the performance of the view, not the aggregation that you're doing.
SELECT * FROM v_transactions will probably take almost exactly the same amount of time.
It is possible that the view was only designed to be queried for a small number of rows
FROM v_transactions WHERE some_key = some_value
In which case, trying to query the view for all of its rows may be "misusing" it.
It is possible that the view is doing work unrelated to your query
Complicated calculations and joins to generate other fields
Again, if the view exists specifically to generate fields that you're not using, your use of the view may be considered "misuse".
In any case, as it's a view, there will be an alternative source to aggregate from. There could be other views or tables that v_transactions derives its data from.
Ask around, find out what views, tables and functions you have access to. You may or may not have access to what you want, but until you investigate you're not going to find out.
Another possibility is that the name field is the expensive calculation. Is there another field you can group by instead? Some id or key that corresponds to the name? This would give you several options to test for performance...
SELECT some_id, SUM(value) FROM v_transactions GROUP BY some_id
SELECT some_id, name, SUM(value) FROM v_transactions GROUP BY some_id, name
SELECT some_id, MAX(name), SUM(value) FROM v_transactions GROUP BY some_id
etc, etc
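The grouping variants above can be compared on a small stand-in data set. The sketch below uses Python's built-in sqlite3 module; the base table `transactions`, the lookup table `accounts`, and the way `v_transactions` derives `name` through a join are all assumptions made for illustration, not the asker's schema. It times each variant and keeps the results so the totals can be compared.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (some_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (some_id INTEGER, value INTEGER);
    -- Hypothetical view: name is looked up through a join.
    CREATE VIEW v_transactions AS
        SELECT t.some_id, a.name, t.value
        FROM transactions t JOIN accounts a ON a.some_id = t.some_id;
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"name_{i}") for i in range(100)])
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(i % 100, i) for i in range(10_000)])

variants = {
    "by_name": "SELECT name, SUM(value) FROM v_transactions GROUP BY name",
    "by_id": "SELECT some_id, SUM(value) FROM v_transactions GROUP BY some_id",
    "by_id_max_name": "SELECT some_id, MAX(name), SUM(value) "
                      "FROM v_transactions GROUP BY some_id",
}

results = {}
for label, sql in variants.items():
    start = time.perf_counter()
    results[label] = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    print(f"{label}: {len(results[label])} groups in {elapsed:.4f}s")

# Each name maps to exactly one id here, so the totals must agree.
totals_by_name = sorted(total for _, total in results["by_name"])
totals_by_id = sorted(total for _, total in results["by_id"])
```

On a real system the same three statements would be run against the actual view, ideally alongside the execution plan, since timings on a toy table say nothing about the production view.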
Related
I would like to ask whether it is possible to get a count of distinct values over a dynamically specified set of fields using ABAP.
The key of our CDS view has 9 fields, which is quite a lot, but it cannot be split because of historical decisions. What I need is code like the following:
select count(distinct (lv_requested_elements)) from CDS_VIEW;
or
select count(*) from (select distinct lv_requested_elements from CDS_VIEW);
I know that it is possible to read the select into memory and get sy-dbcnt but I want to be sure that there is no other option.
I assume that the simplest and most straightforward way is to read the smallest field into memory and then count the grouped (distinct) rows:
DATA(fields) = ` BLART, BLDAT, BUDAT`.
DATA: lt_count TYPE TABLE OF string.
SELECT (fields(6))
INTO TABLE @lt_count
FROM ('BKPF')
GROUP BY (fields).
DATA(count) = sy-dbcnt.
The CTE that was mentioned uses the same in-memory read, so you'll get no performance gain:
A common table expression creates a temporary tabular results set, which can be accessed during execution of the WITH statement
If you are going to count this key combination frequently, I propose creating a consumption or nested CDS view that does this on the fly.
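Outside ABAP, the idea from the question (count the distinct combinations on the database side, with the field list chosen at run time) can be sketched as follows. This uses Python's sqlite3 with a hypothetical table `bkpf_demo`; the helper `count_distinct`, the whitelist, and the sample rows are illustrative assumptions and not ABAP or CDS syntax.

```python
import sqlite3

# Column names cannot be bound as parameters, so they are checked
# against a whitelist before being placed into the statement.
ALLOWED = {"blart", "bldat", "budat"}

def count_distinct(conn, columns):
    """Count distinct combinations of the requested columns."""
    unknown = [c for c in columns if c not in ALLOWED]
    if not columns or unknown:
        raise ValueError(f"unsupported columns: {unknown or columns}")
    column_list = ", ".join(columns)
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {column_list} FROM bkpf_demo)"
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bkpf_demo (blart TEXT, bldat TEXT, budat TEXT)")
conn.executemany("INSERT INTO bkpf_demo VALUES (?, ?, ?)", [
    ("SA", "2020-01-01", "2020-01-02"),
    ("SA", "2020-01-01", "2020-01-03"),
    ("KR", "2020-01-01", "2020-01-02"),
    ("SA", "2020-01-01", "2020-01-02"),  # duplicate of the first row
])

print(count_distinct(conn, ["blart", "bldat", "budat"]))
```

Only the single count travels back to the caller here; whether the ABAP SQL release in use accepts an equivalent subquery form is something to check against its documentation.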
I have a table with invoice payments, which can be partial or full. I am comparing this calculated field to the total amount of the invoice. I have it twice in the query, once in the Select statement and again in the Where clause. Even if I remove one so it's only in either the Where or the Select, it takes more than an hour to run. If I remove the SUM entirely, it takes 10 seconds to run.
Is there a better method to get the sum? Should I use an indexed view? A temp table? Note that an invoice number is unique only to a vendor, not unique in general. The initial FROM is a view, if this makes a difference.
select distinct
transdate,
invoicedate,
PAY.OrderAccount,
v.VendorName,
invoiceamountmst,
(select sum(PAY1.settleamountcur) from [VIEW_INVOICE_PAYMENT] PAY1 where PAY.INVOICEID=PAY1.INVOICEID and PAY.OrderAccount=PAY1.OrderAccount) as "InvoiceSUM",
settleamountcur,
Currencycodeinvoice,
PAY.Description,
Voucher
from VIEW_INVOICE_PAYMENT PAY
inner join INVOICE on INVOICE_DOC_NO =invoiceid
JOIN VENDOR V on PAY.OrderAccount=v.VendorAccount
where TRANSDATE is not null
and (select sum(PAY1.settleamountcur) from [VIEW_INVOICE_PAYMENT] PAY1 where PAY.INVOICEID=PAY1.INVOICEID and PAY.OrderAccount=PAY1.OrderAccount)=total_cost_on_invoice
In this answer, when I refer to 'that select', I mean the sub-query in the middle: select sum(pay1.settleamountcur) ...
Note that the aliases in 'that select' look a little strange, e.g., select sum(PAY1.settleamountcur) from [VIEW_INVOICE_PAYMENT] AX1. Where does the PAY1 alias come from? I may have missed something. If that's a typo in your code, it could be doing bad things (if it even runs). Assuming it's not, however...
For your broader problem, I believe it will run that select once for every row returned by your overall query. Indeed, it may run it even more often, depending on where the filtering happens in the execution plan.
Note I'm assuming SQL Server in this answer - but it should apply to other databases as well.
A couple of options:
1. Instead of referring to the view, bring its underlying tables into your current query and modify the query accordingly.
2. Try removing the aggregation from the sub-query and instead do it over the whole data set, e.g., GROUP BY the relevant fields and SUM across the relevant fields. This can be combined with option 1.
3. Put the sub-query in a CTE, or in a sub-query within the FROM clause. This may make the engine treat it as a single table rather than running it many times (or it may not).
4. (Sometimes my preferred option for large tables) Get the relevant data from the view into a temporary table first, e.g.,
SELECT INVOICEId, OrderAccount, SUM(settleamountcur) AS total_settleamountcur
INTO #Temp
FROM [VIEW_INVOICE_PAYMENT]
GROUP BY INVOICEId, OrderAccount
-- Add any where/having clauses you can to filter
-- Consider creating temp table first with primary key, making joins easier for SQL Server
Then use the #Temp table instead of that select sub-query.
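The temporary-table option can be tried end to end on toy data. The sketch below uses Python's sqlite3 rather than SQL Server; the `payments` and `invoice` tables, the `vendor` column on `invoice`, and all sample values are assumptions standing in for the asker's view and tables. The payments are aggregated once into a keyed temporary table, which then replaces the correlated sub-query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (invoiceid TEXT, orderaccount TEXT, settleamountcur INTEGER);
    CREATE TABLE invoice (invoice_doc_no TEXT, vendor TEXT, total_cost_on_invoice INTEGER);
    INSERT INTO payments VALUES
        ('INV1', 'V1', 40), ('INV1', 'V1', 60),   -- paid in full, in two parts
        ('INV1', 'V2', 30),                       -- same number, other vendor, partial
        ('INV2', 'V1', 10);                       -- partial
    INSERT INTO invoice VALUES
        ('INV1', 'V1', 100), ('INV1', 'V2', 50), ('INV2', 'V1', 25);

    -- Create the temporary table with a primary key first, then fill it
    -- with one aggregated row per (invoice, vendor).
    CREATE TEMP TABLE pay_totals (
        invoiceid TEXT, orderaccount TEXT, total_settled INTEGER,
        PRIMARY KEY (invoiceid, orderaccount)
    );
    INSERT INTO pay_totals
        SELECT invoiceid, orderaccount, SUM(settleamountcur)
        FROM payments
        GROUP BY invoiceid, orderaccount;
""")

# The join against pay_totals replaces the per-row correlated sub-query.
fully_paid = conn.execute("""
    SELECT p.invoiceid, p.orderaccount, p.settleamountcur, t.total_settled
    FROM payments p
    JOIN pay_totals t
      ON t.invoiceid = p.invoiceid AND t.orderaccount = p.orderaccount
    JOIN invoice i
      ON i.invoice_doc_no = p.invoiceid AND i.vendor = p.orderaccount
    WHERE t.total_settled = i.total_cost_on_invoice
    ORDER BY p.settleamountcur
""").fetchall()
print(fully_paid)
```

Keying on both invoice number and vendor account reflects the note that an invoice number is unique only within a vendor.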
I have materialized views manually partitioned by month:
create materialized view MV_MVNAME_201001 as
select MONTH_ID, AMOUNT, NAME
from TABLE_NAME_201001
201002, 201003, ..., 201112, 2012, 2009
Now I need to query these views, reading only the ones that are required.
Is it possible, without involving the client side?
example query
select AMOUNT, NAME
from (
--union all mview
)
where month_id >= 201003
and month_id < 201101
This should read only MV_MVNAME_201003 .. MV_MVNAME_201012.
The materialized view is "materialized": it is a physical table with data in it.
The query that produces the materialized view is used only when you refresh the data, not when you query it.
Oracle doesn't know where the data came from (in your case, a union of several distinct tables) unless you specify it somehow, for example with a column.
But in your specific case you have the column month_id, on which you can partition the table.
When you specify a month or a range of months, it will scan only the corresponding partitions.
UPDATE:
Now I understand your question better, but I cannot give you a better answer. Your question has nothing to do with mviews; the mviews could just as well be tables and your problem would be the same. You want to select from only some of the tables, dynamically. This is what partitioning was created for. Probably old dogs know a trick...
I have data in two tables (see below for a sample) - how do I create a Crystal report (more of a "score card" really) displaying only sum(table1.column1) and sum(table2.column1) with no other details? When I try, one of the sums gets way too big, indicating it has been included in some inner loop in the calculations.
Table1:
Column1: Integer
Column2: Varchar(100)
...
Table2:
Column1: Integer
Column2: Varchar(50)
...
Note: there are no join keys; the only relation between the tables is that they relate to the same business area.
Add a grouping level for Table1.uid. Create a running total Table1Sum, sum on Table1.Column1, on change of group Table1.uid, reset never. Create a running total Table2Sum, sum on Table2.Column1, on every record, reset on change of group Table1.uid. Print both running totals in the report footer.
Place your queries in separate subreports. (This is what I'd probably do.)
The first one obviously requires (1) a unique key in Table1 and (2) printing the values in the footer. If those constraints won't work, two subreports should still work.
select t1.cnt, t2.cnt
from ( select count(*) cnt from table1 where... ) t1
, ( select count(*) cnt from table2 where... ) t2
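Why one of the sums "gets way too big", and how the derived-table form above avoids it, can be reproduced on toy data. The sketch below uses Python's sqlite3 and swaps COUNT(*) for SUM(column1) to match the question; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (10, 'x'), (20, 'y');
""")

# Joining the tables without a key pairs every row with every other row,
# so each table's values are repeated once per row of the other table.
inflated = conn.execute("""
    SELECT SUM(table1.column1), SUM(table2.column1)
    FROM table1, table2
""").fetchone()

# Aggregating each table in its own derived table avoids the repetition.
separate = conn.execute("""
    SELECT t1.total, t2.total
    FROM (SELECT SUM(column1) AS total FROM table1) t1,
         (SELECT SUM(column1) AS total FROM table2) t2
""").fetchone()

print("joined first:", inflated)
print("aggregated separately:", separate)
```

Each derived table returns exactly one row, so combining them yields a single row with both totals.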
If you want to avoid the sub-query approach, the only real route that I can think of is to use sub-reports.
2 ways I can think of:
Put each query in its own sub-report, and link them into your main report.
Put one query in your main report, and the other in a linked sub-report.
I answer this with the caveat that it will almost certainly be slower than simply using one query (as in Randy's answer), because Crystal Reports is not as efficient as the DB engine. It's also probably going to be harder to maintain. Basically, while you certainly can do it this way, I'm not sure I would.
You could use two SQL Expression fields. Each field needs to return a scalar value. You can correlate (link) each query with the main-report's query as well.
I've got 10 tables that I'm joining together to create a view. I'm only selecting the ID from each table, but in the view, each ID can show more than once, however the combination of all ID's will always be unique. Is there a way to create another column in this view that will be a unique ID?
I'd like to be able to store the unique ID and use it to query against the view in order to get all the other ID's.
I had a similar issue where I needed to establish a hierarchy across multiple tables. If you are using an integer as the id in each of the tables, you could simply convert the ids of each table to a varchar and prefix them with a different letter for each table. For instance
CREATE VIEW LocationHierarchy as
SELECT 'C' + CONVERT(VARCHAR,[Id]) as Id
,[Name]
,'S' + CONVERT(VARCHAR,[State]) as parent
FROM [City]
UNION
SELECT 'S' + CONVERT(VARCHAR,[Id]) as Id
,[Name]
,'C' + CONVERT(VARCHAR,[Suburb]) as parent
FROM [Suburb]
etc
The effectiveness of this solution will depend on how large your dataset is.
I think you can do that using ROW_NUMBER(), at least if you can guarantee an ordering of the view. For example:
SELECT
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) as UniqueId
FROM <lotsa joins>
As long as the order stays the same, and only fields are added at the end, the id will be unique.
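The ordering caveat can be made concrete with a small experiment. The sketch below uses Python's sqlite3 (window functions need SQLite 3.25 or later); the table `joined_ids` stands in for the joined view and its rows are invented for illustration. It records the generated id per row, then adds one row that sorts last and one that sorts in the middle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined_ids (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO joined_ids VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (2, 1, 1)])

NUMBERED = """
    SELECT ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS unique_id,
           col1, col2, col3
    FROM joined_ids
"""

def ids_by_row(connection):
    """Map each (col1, col2, col3) combination to its generated id."""
    return {row[1:]: row[0] for row in connection.execute(NUMBERED)}

before = ids_by_row(conn)

# A row that sorts after all existing rows leaves earlier ids untouched.
conn.execute("INSERT INTO joined_ids VALUES (3, 1, 1)")
after_append = ids_by_row(conn)

# A row that sorts in the middle shifts every id after it.
conn.execute("INSERT INTO joined_ids VALUES (1, 1, 5)")
after_middle = ids_by_row(conn)

print(before[(1, 2, 1)], after_append[(1, 2, 1)], after_middle[(1, 2, 1)])
```

The generated number is always unique within one result set, but it only identifies the same row over time while new rows keep sorting after the existing ones.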
Yes, recently I have had the same requirement.
When creating the view, keep the select statement as a TEMP_TABLE and then use Row_number() function on this TEMP_TABLE.
Here's an example:
CREATE VIEW VIEW_NM
AS
SELECT Row_number() OVER(ORDER BY COL1 DESC) AS 'RowNumber', COL1, COL2 FROM
(SELECT COL1,COL2 FROM TABLE1
UNION
SELECT COL1,COL2 FROM TABLE2
) AS TEMP_TABLE;
No, you cannot do that in a view. What you can do is create a temporary table to hold all this information and create a key or unique id there for each row.
The fact that you're attempting to do this points to much bigger problems in the database design and/or application architecture.
Since you have 10 tables, and I'm going to guess that the DB designer(s) just slapped ID INT IDENTITY on all of them, you could end up with about (2^31)^10 possible rows in the view.
The only way I can think of to cover that many values would be to convert all of the integers to zero-padded strings and concatenate them into one big CHAR.
My guess though is that your real problem isn't getting this ID for a view but some other thing that you're attempting to do, which is the question that you should be asking. Just a hunch though.
One possibility would be to call a function from your view that calculates a hash code on all of the ID columns. If you use a decent cryptographic hash algorithm, the odds of a collision are minuscule (probably more likely that the disk would deliver bad data). Even easier, you could of course just concatenate the different IDs into a single, much larger ID, perhaps as a binary or varbinary column.
Storing that ID and being able to query against it would be a bit more work. I can't think of a way to both compute it and store it from a view. You would probably need a stored procedure to create it first; the details depend heavily on the specifics of your app.
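The hashing idea can be sketched independently of any database. The example below uses Python's hashlib with SHA-256; the function name `combined_id`, the `|` separator, and the sample id tuples are assumptions for illustration, and a database-side function would have to apply the same rules to produce matching values.

```python
import hashlib

def combined_id(*ids):
    """Derive a fixed-length identifier from an ordered tuple of integer ids."""
    # The separator keeps (1, 23) and (12, 3) from producing the same input.
    joined = "|".join(str(int(i)) for i in ids)
    return hashlib.sha256(joined.encode("ascii")).hexdigest()

row_a = (1, 23, 7, 4, 9, 2, 8, 5, 3, 6)
row_b = (12, 3, 7, 4, 9, 2, 8, 5, 3, 6)

print(combined_id(*row_a))
print(combined_id(*row_b))
```

The result is 64 hexadecimal characters regardless of how large the individual ids are, and the same ten ids in the same order always give the same value, which is what makes it usable as a lookup key once it is stored.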