Sql query with group by takes too long

Sql query with group by takes too long - sql

I have a very simple query but it takes too long to load when I use Max and group by. Could you please propose an alternative?. I use Oracle 18g for running this query. a_num_ver, id, site_id is a primary key.
SELECT id
, site_id
, sub_id
, max(a_num_ver) as a_num_ver
, ae_no
, max(aer_ver) AS aer_ver
FROM table_1
GROUP BY id
, site_id
, sub_id
, ae_no

Try using parallel hints 4 OR 8 if that is allowed from DBA. I have tried a similar query in a table with around 296,292,720 rows. Without hints, it took around 2 minutes to execute. It comes down to 20 seconds with PARALLEL 8.
SELECT /*+ PARALLEL(8) */
id
, site_id
, sub_id
, max(a_num_ver) as a_num_ver
, ae_no
, max(aer_ver) AS aer_ver
FROM table_1
GROUP BY id
, site_id
, sub_id
, ae_no

Related

bigquery query exceeds memory limits, how to refactor?

I have a query where I am trying to get a count of items in table2 using table1. The following is a simplication of it. It worked fine most of the time, but then, there are some days where the data is structured in such a way where this sql code starts to cause over memory limit errors. I've been trying to debug it with the query planner. It says that it is running out of resources in the aggregate stage, but looking at the planner info, I still cannot understand why it is running out of memory. Can this query be re-written in another way that will make it use less memory? Here is the query planner failing stage image
-- create table
create table actions1(
start_date datetime --goes back 1 year
, end_date datetime
, action varchar(200)
, idA int64
, idB int64
);
create table actions2(
action2_date datetime
, action varchar(300)
, idA int64
, idB int64
);
-- the query
WITH filter_actions_helper AS (
SELECT
a1.action, a1.start_date, a2.start_date
, ARRAY_AGG(action2_date IGNORE NULL) action2col
FROM
actions1 a1
LEFT JOIN
actions2 a2
using(idA, idB)
GROUP BY
idA, idB
)
, filter_actions AS (
SELECT
idA, idB, start_date, end_date
, sum( if(exists( SELECT * FROM UNNEST(action2col) a WHERE a >= start_date), 1,0) ) engaged
FROM
filter_actions_helper f
GROUP BY
idA, idB
)
select * from filter_actions;

It appears that in the case of my query, I can break the large aggregation into views (CTE), calculate each individual aggregation and then join everything back together. Secondly, using COUNTIF also works, though there is no noticeable performance difference.

Row_number partition by performance

How to improve the performance when row_number Partitioned by used in Hive query.
select *
from
(
SELECT
'123' AS run_session_id
, tbl1.transaction_id
, tbl1.src_transaction_id
, tbl1.transaction_created_epoch_time
, tbl1.currency
, tbl1.event_type
, tbl1.event_sub_type
, tbl1.estimated_total_cost
, tbl1.actual_total_cost
, tbl1.tfc_export_created_epoch_time
, tbl1.authorizer
, tbl1.acquirer
, tbl1.processor
, tbl1.company_code
, tbl1.country_of_account
, tbl1.merchant_id
, tbl1.client_id
, tbl1.ft_id
, tbl1.transaction_created_date
, tbl1.event_pst_time
, tbl1.extract_id_seq
, tbl1.src_type
, ROW_NUMBER() OVER(PARTITION by tbl1.transaction_id ORDER BY tbl1.event_pst_time DESC) AS seq_num -- while writing back to the pfit events table, write each event so that event_pst_time populates in right way
FROM nest.nest_cost_events tbl1 --<hiveFinalDB>-- -- DB variables wont work, so need to change the DB accrodingly for testing and PROD deployment
WHERE extract_id_seq BETWEEN 275 - 60
AND 275
AND event_type in('ACT','CBR','SKU','CAL','KIT','BXT' )) tbl1
where seq_num=1;
This table is partitioned by src_type.
Now it is taking 20 mnts to process 154M records. I want to reduce to 10 mnts.
Any suggestions ?
Thanks

To return only the latest row [duplicate]

This question already has answers here:
How to get the last row of an Oracle table
(7 answers)
Closed 8 years ago.
I have a table storing transaction called TRANSFER . I needed to write a query to return only the newest entry of transaction for the given stock tag (which is a unique key to identify the material) so i used the following query
SELECT a.TRANSFER_ID
, a.TRANSFER_DATE
, a.ASSET_CATEGORY_ID
, a.ASSET_ID
, a.TRANSFER_FROM_ID
, a.TRANSFER_TO_ID
, a.STOCK_TAG
FROM TRANSFER a
INNER JOIN (
SELECT STOCK_TAG
, MAX(TRANSFER_DATE) maxDATE
FROM TRANSFER
GROUP BY STOCK_TAG
) b
ON a.STOCK_TAG = b.STOCK_TAG AND
a.Transfer_Date =b.maxDATE
But i end with a problem where when more than one transfer happens on the same transfer date it returns all the row where as i need only the latest . how can i get the latest row?
edited:
transfer_id transfer_date asset_category_id asset_id stock_tag
1 24/12/2010 100 111 2000
2 24/12/2011 100 111 2000

To avoid the potential situation of rows not being inserted in transfer_date order, and maybe for performance reasons, you might like to try:
select
TRANSFER_ID ,
TRANSFER_DATE ,
ASSET_CATEGORY_ID,
ASSET_ID ,
TRANSFER_FROM_ID ,
TRANSFER_TO_ID ,
STOCK_TAG
from (
SELECT
TRANSFER_ID ,
TRANSFER_DATE ,
ASSET_CATEGORY_ID,
ASSET_ID ,
TRANSFER_FROM_ID ,
TRANSFER_TO_ID ,
STOCK_TAG ,
row_number() over (
partition by stock_tag
order by transfer_date desc,
transfer_id desc) rn
FROM TRANSFER)
where rn = 1

Consider selecting MAX(TRANSFER_ID) in your subquery, assuming that TRANSFER_ID is an incrementing field, such that later transfers always have larger IDs than earlier transfers.

SQL query, select from 2 tables random

Hello all i have a problem that i just CANT get to work like i what it..
i want to show news and reviews (2 tables) and i want to have random output and not the same output
here is my query i really hope some one can explain me what i do wrong
SELECT
anmeldelser.billed_sti ,
anmeldelser.overskrift ,
anmeldelser.indhold ,
anmeldelser.id ,
anmeldelser.godkendt
FROM
anmeldelser
LIMIT 0,6
UNION ALL
SELECT
nyheder.id ,
nyheder.billed_sti ,
nyheder.overskrift ,
nyheder.indhold ,
nyheder.godkendt
FROM nyheder
ORDER BY rand() LIMIT 0,6

First off it looks like the column order for the two SELECT statements don't match which they need to for a UNION.
What does the following return?
SELECT
anmeldelser.billed_sti ,
anmeldelser.overskrift ,
anmeldelser.indhold ,
anmeldelser.id ,
anmeldelser.godkendt
FROM
anmeldelser
LIMIT 0,6
UNION ALL
SELECT
nyheder.billed_sti ,
nyheder.overskrift ,
nyheder.indhold ,
nyheder.id ,
nyheder.godkendt
FROM nyheder
ORDER BY rand() LIMIT 0,6
(which RDBMS are you using? the SQL you have is not valid for Sybase but there may be techniques depending on the 'flavour' of SQL you are using)

Since RAND() appears only in the ORDER BY clause, would it not only be evaluated once for the whole query, and not once per row?

The problem is the first table is not selecting random elements
SELECT temp.* FROM
(
SELECT
anmeldelser.id ,
anmeldelser.billed_sti ,
anmeldelser.overskrift ,
anmeldelser.indhold ,
anmeldelser.godkendt,
'News' as artType
FROM anmeldelser
UNION
SELECT
nyheder.id ,
nyheder.billed_sti ,
nyheder.overskrift ,
nyheder.indhold ,
nyheder.godkendt,
'Review' as artType
FROM nyheder
) temp
ORDER BY rand() LIMIT 0,6

SQL query ...multiple max value selection. Help needed

Business World 1256987 monthly 10 2009-10-28
Business World 1256987 monthly 10 2009-09-23
Business World 1256987 monthly 10 2009-08-18
Linux 4 U 456734 monthly 25 2009-12-24
Linux 4 U 456734 monthly 25 2009-11-11
Linux 4 U 456734 monthly 25 2009-10-28
I get this result with the query:
SELECT DISTINCT ljm.journelname,ljm. subscription_id,
ljm.frequency,ljm.publisher, ljm.price, ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory
lsh,lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
ORDER BY ljm.publisher
What I need is the latest date in each journal?
I tried this query:
SELECT DISTINCT ljm.journelname, ljm.subscription_id,
ljm.frequency, ljm.publisher, ljm.price,ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory lsh,
lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
AND ljd.receipt_date = (
SELECT max(ljd.receipt_date)
from lib_journal_details ljd)
But it gives me the maximum from the entire column. My needed result will have two dates (maximum of each magazine), but this query gives me only one?

You could change the WHERE statement to look up the last date for each journal:
AND ljd.receipt_date = (
SELECT max(subljd.receipt_date)
from lib_journal_details subljd
where subljd.journelname = ljd.journelname)
Make sure to give the table in the subquery a different alias from the table in the main query.

You should use Group By if you need the Max from date.
Should look something like this:
SELECT
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
, **MAX(ljd.receipt_date)**
FROM
lib_journals_master ljm
, lib_subscriptionhistory lsh
, lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
GROUP BY
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price

Something like this should work for you.
SELECT ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
,md.max_receipt_date
FROM lib_journals_master ljm
, ( SELECT journal_id
, max(receipt_date) as max_receipt_date
FROM lib_journal_details
GROUP BY journal_id) md
WHERE ljm.id = md.journal_id
/
Note that I have removed the tables from the FROM clause which don't contribute anything to the query. You may need to replace them if yopu simplified your scenario for our benefit.

Separate this into two queries one will get journal name and latest date
declare table #table (journalName as varchar,saleDate as datetime)
insert into #table
select journalName,max(saleDate) from JournalTable group by journalName
select all fields you need from your table and join #table with them. join on journalName.

Sounds like top of group. You can use a CTE in SQL Server:
;WITH journeldata AS
(
SELECT
ljm.journelname
,ljm.subscription_id
,ljm.frequency
,ljm.publisher
,ljm.price
,ljd.receipt_date
,ROW_NUMBER() OVER (PARTITION BY ljm.journelname ORDER BY ljd.receipt_date DESC) AS RowNumber
FROM
lib_journals_master ljm
,lib_subscriptionhistory lsh
,lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
AND ljm.subscription_id = ljm.subscription_id
)
SELECT
journelname
,subscription_id
,frequency
,publisher
,price
,receipt_date
FROM journeldata
WHERE RowNumber = 1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Sql query with group by takes too long - sql

Related

bigquery query exceeds memory limits, how to refactor?

Row_number partition by performance

To return only the latest row [duplicate]

SQL query, select from 2 tables random

SQL query ...multiple max value selection. Help needed

Categories

Resources