Working of WHERE condition in SAS PROC SQL while connecting to another database - sql

I am working on a table with more than 30 million records. The table is on Sybase and I am working in SAS. There is a feed_key (numeric) variable which contains the timestamp for the record entry. I want to pull records for a particular time frame.
proc sql;
  connect to sybase (user="id" pass="password" server=concho);
  create table table1 as
  select * from connection to sybase
  (
    select a.feed_key as feed_key,
           a.cm15,
           a.country_cd,
           a.se10,
           convert(char(10), a.se10) as se_num,
           a.trans_dt,
           a.appr_deny_cd,
           a.approval_cd,
           a.amount
    from abc.xyz a
    where a.country_cd in ('ABC')
      and a.appr_deny_cd in ('0','1','6')
      and a.approval_cd not in ('123456')
      and feed_key > 12862298
  );
  disconnect from sybase;
quit;
It is pulling the same number of records irrespective of whether I put the feed_key condition in or not, and it takes almost the same time to execute the query (16 minutes without the feed_key condition and 15 minutes with it).
Please clarify how the WHERE clause works in this case.
I believe the feed_key condition should have made the query run much faster, as more than 80% of the records do not match this condition.

If you're getting the same number of records back, it will take about the same amount of time to process the query.
This is because the I/O (transferring data back to SAS and storing it) is the most time-consuming part of the operation, which is also why the lack of an index doesn't make a big impact on the total time.
If you adjust your query so that it returns fewer rows, you will get faster processing.
You can tell when this is the case by looking at the SAS log, which shows how much time was used by the CPU (the rest is I/O):
NOTE: PROCEDURE SQL used (Total process time):
real time 11.07 seconds
cpu time 1.67 seconds
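
One quick way to confirm whether the feed_key filter actually changes how many rows qualify is to push a COUNT(*) through the same pass-through connection, so only a single number comes back to SAS. This is just a sketch reusing the connection options and filters from the question:

proc sql;
  connect to sybase (user="id" pass="password" server=concho);

  /* Only the count crosses the network, so this isolates the filter's
     selectivity from the cost of transferring 30 million rows. */
  select * from connection to sybase
  (
    select count(*) as n_rows
    from abc.xyz a
    where a.country_cd in ('ABC')
      and a.appr_deny_cd in ('0','1','6')
      and a.approval_cd not in ('123456')
      and a.feed_key > 12862298
  );

  disconnect from sybase;
quit;

If the count with the feed_key condition is about the same as the count without it, the rows being returned really are the same, and the nearly identical run times are exactly what you would expect.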

Related

Why does my query become slow when the number of records increases?

I have an insert statement in SQL Server.
I tried it for a smaller subset and it was fast.
I increased the number to 1 million records and it was still fast: 1 min 10 sec.
Now I doubled it and it seems stuck; it has been running for 10 minutes now with no results.
I included the plan when it had been running for 5 minutes:
https://www.brentozar.com/pastetheplan/?id=r15MPuC5u
Maybe someone can tell me how to improve the process.
PS. I added a non-clustered index on Tags (RepID).
Tags (iID) is a primary key.
Reps (RepID) is a primary key.
While I am writing this, the process finished at 11:47.
https://www.brentozar.com/pastetheplan/?id=HJd9uOCcu
Here is my code
insert into R3..Tags (iID, DT, RepID, Tag, xmiID, iBegin, iEnd, Confidence, Polarity, Uncertainty,
                      Conditional, Generic, HistoryOf, CodingScheme, Code, CUI, TUI, PreferredText,
                      ValueBegin, ValueEnd, Value, Deleted, sKey, RepType)
select T.iID, T.DT, T.RepID, T.Tag, T.xmiID, T.iBegin, T.iEnd, T.Confidence, T.Polarity, T.Uncertainty,
       T.Conditional, T.Generic, T.HistoryOf, T.CodingScheme, T.Code, T.CUI, T.TUI, T.PreferredText,
       T.ValueBegin, T.ValueEnd, T.Value, T.Deleted, T.sKey, R.RepType
from Recovery..tags T
inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between 2000001 and 4000000
(Couldn't fit this in a comment, so putting it here.)
I think there is pretty much nothing you can do about this; depending on your hardware, 11 minutes may actually not be bad. In the execution plan, everything looks OK.
For your information, the bottleneck in that insert statement is reading the data from the "Recovery..tags" table, which took 7 minutes of your query time. (It used a full scan, which is fine considering it needs to read 2 million rows and return a lot of columns.)
So the only thing you can do is find a way to speed up reading from the linked server "Recovery".
Linked servers are usually a source of poor performance, especially with large volumes of data, which can be due to a poor or busy network, etc.
Anyway, one solution is:
pull the data from the linked server directly into a table on the R3 server, then
change your query to run against that table (see the sketch below).
This can significantly improve your query time.
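
A minimal sketch of that staging approach, assuming (as above) that "Recovery" sits behind a linked server and using a hypothetical local staging table R3..Tags_Staging:

-- 1. Copy only the rows you need from the remote tags table in a single pass.
select T.*
into R3..Tags_Staging
from Recovery..tags T
where T.iID between 2000001 and 4000000;

-- 2. Run the join and the insert entirely on the local server against the staged copy.
--    (If reading Recovery..Reps remotely is also slow, stage it the same way.)
insert into R3..Tags (iID, DT, RepID, Tag, xmiID, iBegin, iEnd, Confidence, Polarity, Uncertainty,
                      Conditional, Generic, HistoryOf, CodingScheme, Code, CUI, TUI, PreferredText,
                      ValueBegin, ValueEnd, Value, Deleted, sKey, RepType)
select S.iID, S.DT, S.RepID, S.Tag, S.xmiID, S.iBegin, S.iEnd, S.Confidence, S.Polarity, S.Uncertainty,
       S.Conditional, S.Generic, S.HistoryOf, S.CodingScheme, S.Code, S.CUI, S.TUI, S.PreferredText,
       S.ValueBegin, S.ValueEnd, S.Value, S.Deleted, S.sKey, R.RepType
from R3..Tags_Staging S
inner join Recovery..Reps R on S.RepID = R.RepID;

The point is that the slow remote read happens once, as a straight bulk copy, instead of being interleaved with the join and the insert.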

Simple select from table takes 24 seconds in SQL Server 2014

I have a table named [cwbOrder] that currently has 1,277,469 rows. I am using SQL Server 2014 and I am doing these tests on a UAT environment; on production this query takes a little longer.
If I try selecting all of the rows using:
SELECT * FROM cwbOrder
It takes 24 seconds to retrieve all of the data from the table. I have read about how important it is to index columns used in the predicates (WHERE), but I still cannot understand how a simple select can take 24 seconds.
Using this table in other, more complex queries generates a lot of extra workload, even though I have created the JOINs on indexed columns. Additionally, I selected only 2 columns from this table, JOINED it to another table, and this operation still took a significantly long time. As an example please consider the below query:
Below I have attached the index structure of both tables, to illustrate the matter:
PK_cwbOrder is the index on the id_cwbOrder column in the cwbOrder table.
Edit 1: I have added the execution plan for the query in which I join the cwbOrder table with the cwbAction table.
Is there any way, considering the information above, that I can make this query faster?
There are many reasons why such a select could be slow:
The row size or number of rows could be very large, requiring a lot of time to read and transfer.
Other operations on the table could have locks on the table.
The database server or network could be very busy.
The "table" could really be a view that is running a complicated query.
You can test different aspects. For instance:
SELECT TOP 10 <one column here>
FROM cwbOrder o
This returns a very small result set and reads just a small part of the table. The following reads the entire table but returns only a small result set:
SELECT COUNT(*)
FROM cwbOrder o
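
If you want to separate the server-side work from the time spent streaming rows to the client, one more thing to try (a sketch, using the table from the question) is to turn on timing and I/O statistics before running the test queries:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- The Messages tab will then show CPU time, elapsed time, and logical reads.
SELECT * FROM cwbOrder;

If elapsed time is far larger than CPU time, most of the 24 seconds is being spent transferring and rendering 1.2 million rows rather than reading them, and no index will help a SELECT * with no WHERE clause.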

Fastest way to do SELECT * WHERE not null

I'm wondering what is the fastest way to get all non-null rows. I've thought of these:
SELECT * FROM table WHERE column IS NOT NULL
SELECT * FROM table WHERE column = column
SELECT * FROM table WHERE column LIKE '%'
(I don't know how to measure execution time in SQL and/or Hive, and from repeatedly trying on a 4M lines table in pgAdmin, I get no noticeable difference.)
You will never notice any difference in performance when running those queries on Hive because these operations are quite simple and run on mappers which are running in parallel.
Initializing/starting mappers takes a lot more time than any possible difference in the execution time of these queries, and it adds a lot of variability to the total execution time because mappers may be waiting for resources and not running at all.
But you can try to measure time, see this answer about how to measure execution time: https://stackoverflow.com/a/44872319/2700344
SELECT * FROM table WHERE column IS NOT NULL is the most straightforward (understandable/readable), though all of the queries are correct.
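
Since you mention testing on a 4M-row table in pgAdmin, one way to get a server-side timing there (this is the Postgres side; the linked answer covers Hive) is EXPLAIN ANALYZE, which executes the query and reports planning and execution time without the client rendering overhead. "table" and "column" below are the placeholders from the question, so substitute your real names:

EXPLAIN ANALYZE SELECT * FROM table WHERE column IS NOT NULL;
EXPLAIN ANALYZE SELECT * FROM table WHERE column = column;
EXPLAIN ANALYZE SELECT * FROM table WHERE column LIKE '%';

All three should come out as the same sequential scan with a slightly different filter expression, which is why no noticeable difference shows up in practice.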

Define maximum table size or rows per table - SQL Server 2012

Is there a way to define the maximum size per table or the maximum number of rows in a table? (for a SELECT INTO new_table operation)
I was working with SELECT INTO and JOINs on tables with approximately 70 million rows, and I made a mistake in the ON condition. As a consequence, the result of the join created a table larger than the database size limit. The DB crashed and went into recovery mode (which lasted for 2 days).
I would like to know how to avoid this kind of problem in the future. Is there any "good manners manual" for working with huge tables? Any kind of pre-defined configuration to prevent this problem?
I don't have the code, but as I said, it was basically a left join whose result was inserted into a new table through SELECT INTO.
PS: I don't have much experience with SQL Server or any other relational database.
Thank you.
SET ROWCOUNT 10000 would have made it so that no more than 10,000 rows would be inserted. However, while that can prevent the damage, it would also mask the mistake you made in your SQL query.
I'd say that before running any SELECT INTO, do a SELECT COUNT(*) to see how many rows your JOIN and WHERE clauses produce (see the sketch below). If the count is huge, or if it spends hours even coming up with a count, that's your warning sign.
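
A sketch of that workflow; every table and column name here is hypothetical, since the original query wasn't posted:

-- 1. Dry run: count the rows the join would produce, using the same ON/WHERE as the real query.
SELECT COUNT(*) AS rows_expected
FROM big_table a
LEFT JOIN other_table b
    ON a.key_col = b.key_col;

-- 2. Only if the count looks sane, run the real SELECT INTO.
--    A TOP cap acts as an extra safety net against a runaway join.
SELECT TOP (100000000) a.*, b.extra_col
INTO new_table
FROM big_table a
LEFT JOIN other_table b
    ON a.key_col = b.key_col;

An inflated count in step 1 warns you that the ON condition is wrong before anything is written to disk.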

Oracle 10 performance issue with SELECT * FROM

sql : select * from user_info where userid='1100907' and status='1'
userid is indexed, the table has fewer than 10,000 rows, and it has a LOB column.
The SQL takes 1 second (measured with "set timing on" in SQL*Plus). I tried listing all the column names instead of *, but it still took 1 second. After I removed the LOB column, the SQL took 0.99 seconds. When I reduced the number of columns by half, the time halved too.
Finally, select userid from user_info where userid='1100907' and status='1' takes 0.01 seconds.
Can someone figure it out?
Bear in mind that "wall clock" performance testing is unreliable. It is subject to ambient database conditions and, when outputting to SQL*Plus, depends on how long it takes to physically display the data. That might explain why selecting half the columns has such a substantial impact on elapsed time. How many columns does this table have?
Tuning starts with EXPLAIN PLAN. This tool will show you how the database will execute your query.
For instance, it is quicker to service this query
select userid from user_info
than this one
select * from user_info
because the database can satisfy the first query with information from the index on userid, without touching the table at all.
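
A quick sketch of how to check this from SQL*Plus, using the query from the question:

EXPLAIN PLAN FOR
  select * from user_info where userid = '1100907' and status = '1';

-- display the plan that was just generated
select * from table(dbms_xplan.display);

If the userid-only query shows an INDEX RANGE SCAN with no table access, while the SELECT * version adds a TABLE ACCESS BY INDEX ROWID step (plus LOB reads), that extra work is where the time is going.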
edit
"Can you tell me why sqlplus print
column names many many times other
than just returning result"
This is related to paging. SQL*Plus repeats the column headers every time it throws a page. You can suppress this behaviour with either of these SQL*Plus commands:
set heading off
or
set pages n
In that second case, make n very big (e.g. 2000) or zero.
Perhaps you have 100 columns in the user_info table? If so, how many of those columns do you actually need in the query?