I want to page through results with an offset: records 0 to 10000, then 10000 to 20000, and so on. How do I modify this query to add an offset? Also, how can I improve this query's performance?
SELECT
CASE
WHEN c.DataHoraUltimaAtualizacaoILR >= e.DataHoraUltimaAtualizacaoILR AND c.DataHoraUltimaAtualizacaoILR >= t.DataHoraUltimaAtualizacaoILR THEN c.DataHoraUltimaAtualizacaoILR
WHEN e.DataHoraUltimaAtualizacaoILR >= c.DataHoraUltimaAtualizacaoILR AND e.DataHoraUltimaAtualizacaoILR >= t.DataHoraUltimaAtualizacaoILR THEN e.DataHoraUltimaAtualizacaoILR
WHEN t.DataHoraUltimaAtualizacaoILR >= c.DataHoraUltimaAtualizacaoILR AND t.DataHoraUltimaAtualizacaoILR >= e.DataHoraUltimaAtualizacaoILR THEN t.DataHoraUltimaAtualizacaoILR
ELSE c.DataHoraUltimaAtualizacaoILR
END AS 'updated_at',
p.Email,
c.ID_Cliente,
p.Nome,
p.DataHoraCadastro,
p.Sexo,
p.EstadoCivil,
p.DataNascimento,
getdate() as [today],
datediff (yy,p.DataNascimento,getdate()) as 'Idade',
datepart(month,p.DataNascimento) as 'MesAniversario',
e.Bairro,
e.Cidade,
e.UF,
c.CodLoja as codloja_cadastro,
t.DDD,
t.Numero
FROM
PessoaFisica p
LEFT JOIN
Cliente c ON (c.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Loja l ON (CAST(l.CodLoja AS integer) = CAST(c.CodLoja AS integer))
LEFT JOIN
PessoaEndereco pe ON (pe.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Endereco e ON (e.ID_Endereco = pe.ID_Endereco)
LEFT JOIN
PessoaTelefone pt ON (pt.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Telefone t ON (t.ID_Telefone = pt.ID_Telefone)
WHERE
p.Email IS NOT NULL
AND p.Email <> ''
--and p.Email = 'aline.salles@uol.com.br'
GROUP BY
p.Email, c.ID_Cliente, p.Nome, p.EstadoCivil, p.DataHoraCadastro,
c.CodLoja, p.Sexo, e.Bairro, p.DataNascimento, e.Cidade, e.UF,
t.DDD, t.Numero, c.DataHoraUltimaAtualizacaoILR, e.DataHoraUltimaAtualizacaoILR,
t.DataHoraUltimaAtualizacaoILR
ORDER BY
updated_at DESC
Overall Process
If you have access to a more modern SQL Server version, then you could set up a process to copy the raw data to a new database on a daily basis. This might initially be an exact copy of the source database, just for staging the data. Then build a transformation process, using stored procedures or perhaps SSIS for high performance. That process would transform your data into your desired end state and load it into the final database.
The copy process could be replication, but if your staging database is SQL Server 2005 or above, then you could also build a simple SSIS job to perform the copy. Run that job as a scheduled task (SQL Agent) on a daily basis. You could combine the two - load data, then transform - but if using SSIS, I recommend keeping these as separate SSIS packages, which will help with debugging problems. In the scheduled task you could run the two packages back-to-back.
Performance
You'll need good indexing on the tables, but indexing alone is not sufficient. Casting CodLoja to an integer will prevent SQL Server from using any index on that field. If you need to store those values as strings for some other reason, then consider adding a computed column,
ALTER TABLE xyz ADD CodLojaAsInt AS (CAST(CodLoja AS int))
Then place an index on that new computed column. The problem is that any function call in an ON or WHERE clause will cause SQL Server to scan the entire table and convert every single row, instead of seeking into an index.
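A minimal sketch of that approach, keeping the hypothetical table name xyz from above (marking the column PERSISTED stores the computed value so it is not recalculated on every read and is straightforward to index):

```sql
-- Computed integer version of the string column (hypothetical table/column names)
ALTER TABLE xyz ADD CodLojaAsInt AS (CAST(CodLoja AS int)) PERSISTED;

-- Index the computed column so joins can seek instead of scanning and casting
CREATE INDEX IX_xyz_CodLojaAsInt ON xyz (CodLojaAsInt);
```

The join would then compare l.CodLojaAsInt = c.CodLojaAsInt directly, with no CAST in the ON clause.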
After searching and looking over my problem again, @sfuqua helped me toward this solution. Basically I'll create some better-organized tables in my local DB, pull all the abstract/ugly data from the remote DB, and process it locally into the new tables.
I'm going to use Elasticsearch to speed up the indexing and queries.
It sounds like you're trying to emulate MySQL's SELECT ... LIMIT X,Y feature. SQL Server doesn't have that. In SQL Server 2005+, you can use ROW_NUMBER() in a subquery. Since you're on 2000, however, you're going to have to do it one of the hard ways.
The way I've always done it is like this:
SELECT ... FROM Table WHERE PK IN
    (SELECT TOP @PageSize PK FROM Table WHERE PK NOT IN
        (SELECT TOP @StartRow PK FROM Table ORDER BY SortColumn)
     ORDER BY SortColumn)
ORDER BY SortColumn
Although I recommend rewriting it to use EXISTS instead of IN and seeing which works better. You'll have to use EXISTS if you have compound primary keys.
That code and the other solutions are here.
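For comparison, the ROW_NUMBER() approach mentioned above for SQL Server 2005+ might be sketched like this (Table, PK, and SortColumn are the same placeholder names as in the answer; @StartRow and @PageSize are assumed parameters):

```sql
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY t.SortColumn) AS rn
    FROM Table t
) AS numbered
WHERE rn >  @StartRow
  AND rn <= @StartRow + @PageSize
ORDER BY SortColumn;
```

On SQL Server 2012+ the same paging is a one-liner: ORDER BY SortColumn OFFSET @StartRow ROWS FETCH NEXT @PageSize ROWS ONLY.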
Related
I'm working on an Oracle query that does a SELECT on a huge table; however, the joins with other tables seem to cost a lot in processing time.
I'm looking for tips on how to improve this query.
I'm attaching a version of the query and its explain plan.
Query
SELECT
l.gl_date,
l.REST_OF_TABLES,
(
SELECT
MAX(tt.task_id)
FROM
bbb.jeg_pa_tasks tt
WHERE
l.project_id = tt.project_id
AND l.task_number = tt.task_number
) task_id
FROM
aaa.jeg_labor_history l,
bbb.jeg_pa_projects_all p
WHERE
p.org_id = 2165
AND l.project_id = p.project_id
AND p.project_status_code = '1000'
Something to mention:
This query takes data from Oracle to send to a SQL Server database, so I need it to be this big; I can't narrow the scope of the query.
The purpose is to set it up as a SQL Server job with SSIS so it runs periodically.
One obvious suggestion is not to use a subquery in the SELECT clause.
Instead, you can try joining the tables:
SELECT
l.gl_date,
l.REST_OF_TABLES,
t.task_id
FROM
aaa.jeg_labor_history l
Join bbb.jeg_pa_projects_all p
On (l.project_id = p.project_id)
Left join (SELECT
tt.project_id,
tt.task_number,
MAX(tt.task_id) task_id
FROM
bbb.jeg_pa_tasks tt
Group by tt.project_id, tt.task_number) t
On (l.project_id = t.project_id
AND l.task_number = t.task_number)
WHERE
p.org_id = 2165
AND p.project_status_code = '1000';
Cheers!!
Since I don't know exactly how many rows this query returns or how many rows the table/view has, I can only offer a few simple tips that might help you get better query performance:
Check Indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner join, instead of outer join if possible.
Your view contains a lot of data, so you can also break it down and pull only the information you need from it.
I am having issues with my query's run time. I want the query to automatically pull the max ID for a column, because the table is indexed on that column. If I punch in the number manually, it runs in seconds, but I want the query to be more dynamic if possible.
I've tried placing the subquery in different places with no luck.
SELECT *
FROM TABLE A
JOIN TABLE B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID
AND B.ACTV_FLG = 1
WHERE A.WK_END_THU_ID_NU >= (SELECT DISTINCT MAX (WK_END_THU_ID_NU) FROM TABLE A)
AND A.WK_END_THU_END_YR_NU = YEAR(GETDATE())
AND A.LGCY_NATL_STR_NU IN (7731)
AND B.SLD_MENU_ITM_ID = 4314
I just want this to run faster. Maybe there is a different approach I should be taking?
I would move the subquery to the FROM clause and change the WHERE clause to only refer to A:
SELECT *
FROM A JOIN
     (SELECT MAX(WK_END_THU_ID_NU) as max_wet
      FROM A
     ) am
     ON A.WK_END_THU_ID_NU = am.max_wet JOIN
     B
     ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID AND
        B.ACTV_FLG = 1
WHERE A.WK_END_THU_END_YR_NU = YEAR(GETDATE()) AND
      A.LGCY_NATL_STR_NU IN (7731) AND
      A.SLD_MENU_ITM_ID = 4314; -- is the same as B
Then you want indexes. I'm pretty sure you want indexes on:
A(SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, WK_END_THU_ID_NU)
B(SLD_MENU_ITM_ID, ACTV_FLG)
I will note that moving the subquery to the FROM clause probably does not affect performance, because SQL Server is smart enough to only execute it once. However, I prefer table references in the FROM clause when reasonable. I don't think a window function would actually help in this case.
I'm converting ETL queries written for Netezza to Redshift, and I'm facing issues with ROWID because it's not supported in Redshift. As a workaround I have tried putting the key columns from which ROWID is generated into the predicates, but I'm confused about which columns to use when there are multiple join operations. So can anyone help me convert the query? I even tried the ROW_NUMBER() OVER () function, but that doesn't work either, because the generated row numbers won't be unique across all rows.
Here are the queries from netezza:
Query #1
CREATE TEMP TABLE TMPRY_DELTA_UPD_1000 AS
SELECT
nvl(PT.HOST_CRRNCY_SRRGT_KEY,-1) as HOST_CRRNCY_SRRGT_KEY,
delta1.ROWID ROW_ID
FROM TMPRY_POS_TX_1000 PT
LEFT JOIN TMPRY_TX_CSTMR_1000 TC ON PT.TX_SRRGT_KEY = TC.TX_SRRGT_KEY AND PT.UPDT_TMSTMP > '2017-01-01'
AND PT.INS_TMSTMP < '2017-01-01' AND PT.DVSN_NBR = 70
JOIN INS_EDW_CP.DM_TX_LINE_FCT delta1 ON PT.TX_SRRGT_KEY = delta1.TX_SRRGT_KEY
WHERE
(
delta1.HOST_CRRNCY_SRRGT_KEY <> PT.HOST_CRRNCY_SRRGT_KEY OR
)
AND PT.DVSN_NBR = 70;
Query #2
UPDATE INS_EDW_CP..DM_TX_LINE_FCT base
SET
base.HOST_CRRNCY_SRRGT_KEY = delta1.HOST_CRRNCY_SRRGT_KEY,
)
FROM TMPRY_DELTA_UPD_1000 delta1
WHERE base.ROWID = delta1.ROW_ID;
How can i convert query # 2?
Well, most of the time that I have seen joins on ROWID it was due to performance optimization, but in some cases there IS no unique combination of columns in the table.
Please talk to the people who own these data, run your own analysis of different key combinations, and then get back to us.
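Assuming TX_SRRGT_KEY uniquely identifies rows in DM_TX_LINE_FCT (an assumption you would need to verify against the data, per the advice above), query #2 could be sketched without ROWID by carrying that key through the temp table instead of ROW_ID:

```sql
-- Redshift UPDATE ... FROM, joining on the assumed business key.
-- TMPRY_DELTA_UPD_1000 would need to select PT.TX_SRRGT_KEY instead of delta1.ROWID.
UPDATE INS_EDW_CP.DM_TX_LINE_FCT base
SET HOST_CRRNCY_SRRGT_KEY = delta1.HOST_CRRNCY_SRRGT_KEY
FROM TMPRY_DELTA_UPD_1000 delta1
WHERE base.TX_SRRGT_KEY = delta1.TX_SRRGT_KEY;
```

If no single column is unique, the WHERE clause would need whatever combination of columns your analysis shows to be unique.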
See the query below, which takes nearly 10 minutes to run. Can it be rewritten to fetch results faster, mainly the WHERE clause condition?
Number of records:
Table tmp_TimeIntervals: 1440
Table tmp_Activities: 1299688
Code:
select
i.ID,
sl.SPID,
i.PeriodStart,
DATEDIFF(mi, i.PeriodStartUTC, sl.ENDTIME)
from
tmp_TimeIntervals i
join
tmp_Activities sl on (sl.StartTime <= i.PeriodStartUTC
and (sl.Endtime > i.PeriodStartUTC
and sl.Endtime < PeriodEndUTC))
Regards
In case you want the overlapping intervals between tmp_TimeIntervals and tmp_Activities, the SQL query may look as follows:
select
i.ID,
sl.SPID,
i.PeriodStart,
DATEDIFF(mi, i.PeriodStartUTC, sl.ENDTIME)
from tmp_TimeIntervals i
join tmp_Activities sl
on sl.StartTime <= i.PeriodEndUTC and sl.Endtime > i.PeriodStartUTC
and for an efficient run you definitely need the index tmp_Activities(StartTime, Endtime, SPID). SPID can be removed if it is the clustered index key.
create index ix_tmp_Activities on tmp_Activities(StartTime , Endtime) include (SPID)
Another index, tmp_TimeIntervals(PeriodEndUTC, PeriodStartUTC, PeriodStart, ID), can be helpful as well, since SQL Server can then use a merge join more easily. Again, ID can be removed from the index if it is the clustered index key.
create index ix_tmp_TimeIntervals on tmp_TimeIntervals(PeriodEndUTC, PeriodStartUTC) include (PeriodStart, ID)
If you are using a framework, then use its built-in method to fetch records in chunks.
A couple of things:
It would be good to check the estimated execution plan and see what is chewing up the cost.
Try placing a left outer join, as it can produce a subquery lookup that is faster than your inner join. The underlying concept is predicate pushdown.
Another consideration before creating an index: check the column lengths. If data writes are not frequent, consider creating a columnstore index; it can be roughly 10x faster than what you are getting now.
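As a sketch of that columnstore suggestion, reusing tmp_Activities from the question (hypothetical column list; note that nonclustered columnstore indexes require SQL Server 2012+, and made the table read-only before SQL Server 2016):

```sql
-- Columnstore index covering the columns the query scans
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_tmp_Activities_cs
    ON tmp_Activities (StartTime, Endtime, SPID);
```

This helps large scan-and-aggregate workloads most; for highly selective seeks the rowstore indexes suggested earlier are usually the better fit.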
I have a massive query that has been working just fine; however, due to the number of records now in my database, the query is taking longer and longer for the stored procedure to complete.
I had a hard enough time getting the query to work in the first place, and I'm not confident in myself to either A) simplify the query or B) break it into smaller queries/stored procedures.
Can any expert help me out?
SELECT
r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate into #temp_Agent
FROM
CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID
INNER JOIN (
SELECT
p.resourceFirstName,
p.resourceLastName,
MAX(e.eventDateTime) MaxeventDateTime
FROM
CCX1.db_cra.dbo.Resource p
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail e
ON p.resourceID = e.agentID
where
e.eventDateTime > (GETDATE() - 1)
GROUP BY
p.resourceFirstName,
p.resourceLastName
) d
ON r.resourceFirstName = d.resourceFirstName
AND r.resourceLastName = d.resourceLastName
AND a.eventDateTime = d.MaxeventDateTime
AND r.active = 1
where
a.eventDateTime >= (GETDATE() - 7)
ORDER BY
r.resourceLastName,
r.resourceFirstName ASC
Can't give the correct answer having only the query. But...
Consider putting an index on "eventDateTime".
It appears you are joining with a set of records within 1 day. That would make the 7 day filter in the outer query irrelevant. I don't have the ability to test, but maybe your query can be reduced to this? (pseudo code below)
Also consider different solutions. Maybe partition a table based on the datetime. Maybe have a separate database for reporting using star schemas or cube design.
What is being done with the temporary table #temp_Agent?
declare @max datetime = (select max(eventDateTime)
                         from CCX1.db_cra.dbo.AgentStateDetail
                         where active=1
                           and eventDateTime > getdate()-1);
if (@max is null)
    return; -- no records today
SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate
into #temp_Agent
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a ON r.resourceID = a.agentID
where r.active = 1
and a.eventDateTime = #max;
Without the full definition of your tables it is hard to troubleshoot why the query is slow, but I can give you a couple of tips that could help you improve the performance of the query:
Instead of using a temporary table such as #temp_Agent, it is preferable to create a local variable of type TABLE. You can achieve exactly the same result, and you can drastically increase performance because:
A table variable can be created with a primary key and unique constraints, which improves how SQL Server finds the information.
A table variable can have a clustered primary key, which also improves performance in certain scenarios, because the information is accessed in key order.
A temporary table requires SQL Server to resolve at runtime the column types that should be used to store the information returned by the query.
If you need to store information in temporary tables or variables, avoid storing unnecessary information in them. For example, if you only need two ID columns later in your process, avoid including extra columns that you could retrieve later.
If there is a bunch of information that you need to retrieve from multiple sources, you should consider using a view, which can also be indexed to improve retrieval of the information.
Avoid unnecessary sorting, grouping, conversions, and string concatenation. These operations can drastically degrade the performance of a query.
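As a sketch of the table-variable suggestion, keeping only the columns assumed to be needed downstream (a hypothetical column list; the real one depends on what #temp_Agent is used for):

```sql
-- Table variable with a primary key, holding only what later steps need
DECLARE @Agent TABLE (
    resourceID    int      NOT NULL PRIMARY KEY,
    eventDateTime datetime NOT NULL
);

INSERT INTO @Agent (resourceID, eventDateTime)
SELECT r.resourceID, MAX(a.eventDateTime)
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
    ON r.resourceID = a.agentID
WHERE r.active = 1
GROUP BY r.resourceID;
```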
As extra tips, you can make use of the SQL Server tools designed to help improve your database and objects:
Check the execution plan of your query (menu Query -> Include Actual Execution Plan, or Ctrl + M).
Run the Database Engine Tuning Advisor to analyze a trace file (see SQL Server Profiler) and add extra indexes to improve database performance.
Check with SQL Server Profiler that your queries are not generating deadlocks on the tables you use to get the information. It is good practice to use hints in your queries to avoid locking problems and other behaviors you may want to avoid in certain scenarios.
See the attached links to better understand what I mean:
Understanding Execution Plan
Usage of Hints
Tuning Options available in SQL Server
I hope the information helps.
Assuming this is SQL Server, try:
WITH CTE AS
(SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate,
RANK() OVER (PARTITION BY r.resourceFirstName, r.resourceLastName
ORDER BY a.eventDateTime DESC) RN
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID AND a.eventDateTime >= (GETDATE() - 1)
where r.active = 1)
SELECT resourceFirstName, resourceLastName, eventDateTime, eventType, reasonCode, extension, ciscoDate
into #temp_Agent
FROM CTE
WHERE RN=1
ORDER BY resourceLastName, resourceFirstName ASC