Faster way to query Access table - sql

I have two Access tables, A and B:
Table A
Identifier BenefitBase PlanNav
1 131368.46 131368.46
2 201768.8 201768.79
3 54057.46 54057.46
4 7397.51 7397.51
5 9931.4 9931.4
6 178200 178200
Table B
p ValidityDate LockInAmount
1 2016-4 3.82
2 2016-4 19.97
3 2016-4 26.85
4 2016-6 34.95
I just want to create a query which extracts records from B where the "p" ID is not found in table A.
My current code is:
SELECT B.p, B.ValidityDate, B.LockInAmount
FROM B
WHERE (((B.p) Not In (select Identifier from A)));
To me, this code should work fine. However, the tables are large: B has 486,000 rows (each "p" repeats in this table for different dates), whereas A has about 19,000. Whenever I run the query, Access fills the query progress bar but freezes when it is nearly full.
Is there another way to do this?
Thanks

You could also use a left join to do the same thing Gustav does. It's easier for me to read, and I believe that it will operate with the same execution plan.
select B.p, B.ValidityDate, B.LockInAmount
from B
left join A on B.P = A.Identifier
where A.Identifier is null
And add to that the indexes recommended by Erik up above. (That said, if P and Identifier are primary keys on your tables then they are already indexed and you don't need to add the indexes)
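The equivalence between the NOT IN form and the left join form can be checked on the sample data. The following is only a sketch: it uses Python's sqlite3 module as a stand-in for Access (an assumption made so the example is runnable), and it adds one hypothetical row with p = 7 to B so that the result is not empty. A NOT EXISTS variant is included for comparison.

```python
import sqlite3

# SQLite stands in for Access here (an assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Identifier INTEGER, BenefitBase REAL, PlanNav REAL);
    CREATE TABLE B (p INTEGER, ValidityDate TEXT, LockInAmount REAL);
    INSERT INTO A VALUES
        (1, 131368.46, 131368.46), (2, 201768.8, 201768.79), (3, 54057.46, 54057.46),
        (4, 7397.51, 7397.51), (5, 9931.4, 9931.4), (6, 178200, 178200);
    INSERT INTO B VALUES
        (1, '2016-4', 3.82), (2, '2016-4', 19.97), (3, '2016-4', 26.85), (4, '2016-6', 34.95),
        (7, '2016-6', 1.0);  -- hypothetical extra row: p = 7 has no match in A
""")

# Original form from the question.
not_in_sql = """
    SELECT B.p, B.ValidityDate, B.LockInAmount
    FROM B
    WHERE B.p NOT IN (SELECT Identifier FROM A)
"""

# Left join form: keep only the B rows for which no A row was found.
left_join_sql = """
    SELECT B.p, B.ValidityDate, B.LockInAmount
    FROM B
    LEFT JOIN A ON B.p = A.Identifier
    WHERE A.Identifier IS NULL
"""

# Correlated NOT EXISTS form.
not_exists_sql = """
    SELECT B.p, B.ValidityDate, B.LockInAmount
    FROM B
    WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.Identifier = B.p)
"""

rows_not_in = conn.execute(not_in_sql).fetchall()
rows_left_join = conn.execute(left_join_sql).fetchall()
rows_not_exists = conn.execute(not_exists_sql).fetchall()

print(rows_not_in)
print(rows_left_join)
print(rows_not_exists)
```

All three queries return only the unmatched row here. One difference worth knowing: in standard SQL, if the subquery can return NULL, the NOT IN form yields no rows at all, while the left join and NOT EXISTS forms are unaffected.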

Since you don't know if the fields are indexed:
Create indexes for both fields (see this page by Microsoft for information on indexes):
Execute these queries to create the indexes (or use the GUI)
CREATE INDEX TblAIdentifier ON A(Identifier)
CREATE INDEX TblBP ON B(p)
As long as you create at least the first index, Access won't even need to open table A. It can just check the index to see which values are present.
You can use this answer together with the one provided by @Gustav.
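As a rough way to confirm that the indexes exist and to look at how an engine plans the anti-join, here is a sketch using Python's sqlite3 module. SQLite is an assumption standing in for Access, and its planner is not the Access one, so the printed plan only illustrates the idea of an index lookup and says nothing about Access itself.

```python
import sqlite3

# SQLite stands in for Access (assumption); the tables are left empty on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Identifier INTEGER, BenefitBase REAL, PlanNav REAL);
    CREATE TABLE B (p INTEGER, ValidityDate TEXT, LockInAmount REAL);
    CREATE INDEX TblAIdentifier ON A(Identifier);
    CREATE INDEX TblBP ON B(p);
""")

# List the indexes that now exist in the database.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask SQLite how it would run the left join form of the query.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT B.p, B.ValidityDate, B.LockInAmount
    FROM B
    LEFT JOIN A ON B.p = A.Identifier
    WHERE A.Identifier IS NULL
""").fetchall()

# The last column of each plan row is a human-readable description of the step.
for step in plan:
    print(step[-1])
```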

You could "reverse" the seek - first find those that have a match, then exclude these from Table B:
Select B.*
From B
Where B.ID Not In
(Select A.Id
From A, B
Where A.ID = B.ID)

SELECT B.p, B.ValidityDate, B.LockInAmount
FROM B
LEFT JOIN A ON B.p = A.Identifier
WHERE A.Identifier IS NULL;

Related

SQL SELECT query where the IDs were already found

I have 2 tables:
Table A has 3 columns (for example) with opportunity sales header data:
OPP_ID, CLOSE_DTTM, STAGE
Table B has 3 columns with the individual line items for the Opportunities:
OPP_LINE_ID, OPP_ID, AMOUNT_USD
I have a select statement that correctly parses through Table A and returns a list of Opportunities. What I would like to do is, without joining the data, to have a SELECT statement that will get data from Table B but only for the OPP_IDs that were found in my first query.
The result should be 2 views/resultset (one for each select query) and not just 1 combined view where Table B is joined to Table A.
The reason why I want to keep them separate is because I will have to perform a few manipulations to the result from table B and i don't want the result from table A affected.
A subquery is all you need:
SELECT OPP_ID, CLOSE_DTTM, STAGE
FROM a
WHERE a.opp_id IN (SELECT opp_id FROM b)
Presuming you're using this in some client side data access library that represents B's data in some 2 dimensional collection and you want to manipulate it without affecting/ having A's data present in that collection:
Identify the records in A:
SELECT * FROM a WHERE somecolumn = 'somevalue'
Identify the records in B that relate to A, but don't return A's data:
SELECT b.* FROM a JOIN b ON a.opp_id = b.opp_id WHERE a.somecolumn = 'somevalue'
Just because JOIN is used doesn't mean your end-consuming program has to know about A's data. You could also use IN, like the other answer does, but internally the database will rewrite them to be the same thing anyway
I tend to use exists for this type of query:
select b.*
from b
where exists (select 1 from a where a.opp_id = b.opp_id);
If you want two result sets, you need to run two queries. It is unclear what the second query is; perhaps it is the first query on A.
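The two-result-set approach can be sketched as follows, using Python's sqlite3 module. The table contents and the stage = 'Won' filter are hypothetical, chosen only to stand in for whatever the first query really selects. The second query uses EXISTS, so it returns only B's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (opp_id INTEGER, close_dttm TEXT, stage TEXT);
    CREATE TABLE b (opp_line_id INTEGER, opp_id INTEGER, amount_usd REAL);
    -- Hypothetical sample data.
    INSERT INTO a VALUES (1, '2020-01-01', 'Won'), (2, '2020-02-01', 'Lost'),
                         (3, '2020-03-01', 'Won');
    INSERT INTO b VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0), (13, 3, 20.0);
""")

# Result set 1: the opportunities (hypothetical filter on stage).
opportunities = conn.execute("""
    SELECT opp_id, close_dttm, stage
    FROM a
    WHERE stage = 'Won'
    ORDER BY opp_id
""").fetchall()

# Result set 2: line items for those opportunities, with no columns from a.
line_items = conn.execute("""
    SELECT b.*
    FROM b
    WHERE EXISTS (SELECT 1 FROM a
                  WHERE a.opp_id = b.opp_id AND a.stage = 'Won')
    ORDER BY b.opp_line_id
""").fetchall()

print(opportunities)
print(line_items)
```

Because the two queries run separately, manipulating the second result set in the client leaves the first one untouched.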

sql server does not take most restrictive condition for execution plan

We have a query with multiple joins where sql server 2016 does not take the optimal path and we cannot convince it without hints (which we prefer not to use)
Simplified the problem is as follows :
Table A (12 million rows)
Table B (type table, 5 rows)
Table C (12 million rows)
query (simplified to clarify)
SELECT
[A].[ID]
,[A].[DATE_CREATED]
,[A].[DATE_LAST_MODIFIED]
,[A].[CODE]
,[B].[CODE]
,[B].[DESCRIPTION]
,[C].[EVENT_ID]
,[C].[SOURCE_REFERENCE]
,[C].[EVTY_ID]
,[C].[BUSINESS_KEY]
,[C].[DATA]
,[C].[EVENT_DATE]
FROM A
JOIN B ON [B].[ID] = [A].[PSTY_ID] AND [B].[ACTIVE] = 1
JOIN C ON [C].[ID] = [B].[EVEN_ID] AND [C].[ACTIVE] = 1
WHERE [B].[CODE] = 'nopr' OR [B].[CODE] = 'inpr'
The selected codes from B correspond to ID values 1 and 2.
Table A contains at most 10 rows with PSTY_ID values 1 or 2; the rest are 3, 4 or 5.
There is a foreign key from A.PSTY_ID to B.ID.
There is a filtered index on table A for PSTY_ID values 1 and 2, with all selected columns as included columns.
The optimizer does not seem to recognize that we try to select values 1 and 2, and does not use the index or start with table B (trying to force with subqueries or changing table order do not help, only the hint OPTION (FORCE ORDER) can convince the optimizer, but this we do not want)
Only when we hard code the B.ID or A.PSTY_ID values 1 and 2 in the where clause the optimizer takes the correct path, starting with table B.
If we do not do this, it starts to join table A with table C, and only then with table B, leading to vastly more processing time (approx 50X)
We also tried declaring the values and using them as variables, but still no luck.
Would anyone know if this is a known issue, or if this can be worked around ?
Your filtered index will not be used in this case unless you include the values 1 and 2 in the WHERE clause. You cannot change this, even if you join with a table that has ONLY 1 and 2 in its rows.
A filtered index will never be used based on "assumptions" about what values some table (physical, or derived like a CTE or subquery) contains, and in fact your subquery did not help.
So if you want to use it, you should add a WHERE condition equivalent to that of the filtered index to your query.
Since you don't want to add this condition, but still want to change the join order so that it starts with table B, you can use a temporary table or table variable like this:
select [ID]
,[CODE]
,[DESCRIPTION]
,[EVEN_ID]
into #tmp
from B
where ([CODE] = 'nopr' OR [CODE] = 'inpr') and [ACTIVE] = 1
And now use this #tmp instead of B in your query.

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the 'id' column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all IDs are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.
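The row multiplication described above can be reproduced with a small sketch in Python's sqlite3 module. The tables here are simplified and hypothetical (one row in a, three in b, nine in c of which three match the WHERE clause); they are not the poster's actual tables, which are not shown in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the three tables.
    CREATE TABLE a (common_id TEXT);
    CREATE TABLE b (common_id TEXT, item INTEGER);
    CREATE TABLE c (emplid TEXT, admit_term TEXT);
    INSERT INTO a VALUES ('123');
    INSERT INTO b VALUES ('123', 1), ('123', 2), ('123', 3);
""")

# Nine rows in c for the same person: three with term '2171', six with another term.
terms = ["2171"] * 3 + ["2167"] * 6
conn.executemany("INSERT INTO c VALUES (?, ?)", [("123", t) for t in terms])

# Joins used as existence tests multiply the b rows: 1 x 3 x 3 = 9.
joined = conn.execute("""
    SELECT b.*
    FROM a
    JOIN b ON a.common_id = b.common_id
    JOIN c ON a.common_id = c.emplid
    WHERE c.admit_term = '2171'
""").fetchall()

# IN subqueries only test for existence, so each b row appears once.
filtered = conn.execute("""
    SELECT *
    FROM b
    WHERE common_id IN (SELECT common_id FROM a)
      AND common_id IN (SELECT emplid FROM c WHERE admit_term = '2171')
""").fetchall()

print(len(joined), len(filtered))
```

The joined result contains each of the three b rows three times, which is why adding DISTINCT appears to fix the output while still doing the extra work.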

SQL Performance: Subselects in joins or direct joins?

I have a question about the performance of queries on the tables below:
Table A -- Has only 5 customer IDs (5 rows, 1 column)
Table B -- Is the master base for all customers and their information (1 million rows and 500 columns)
Query 1:-
Select A.*,
B.Age
from A
left join B
on A.Customer_id = B.Customer_id;
Query 2:-
Select a.*,
B.Age
from A
left join
(select Customer_id,age from B) C
on A.Customer_id = C.Customer_id;
The main question of performance here is because of the presence of 500 columns in Table B.
I feel the 2nd query is better, as SQL won't have to create a temporary table during the join containing all columns from table B.
Please let me know if this is wrong.
"I feel the 2nd query is better, as SQL won't have to create a temporary table during the join containing all columns from table B."
You can tell whether Oracle does create a temporary table during the execution or not from the explain plan. You should also consider whether the Oracle kernel developers would not have got round such an obvious performance problem if it existed.
As it happens, there will be no temporary table, and there is nothing wrong with your first query. There is almost never a need to manipulate the query for performance reasons -- write queries that are the best encapsulation of the logic you require.
CREATE INDEX index_name ON table_b (customer_id)
then use
Select a.*,
B.Age
from A
left join (select Customer_id,
age
from B) C
on A.Customer_id = C.Customer_id;
500 columns is rather extensive.
Maybe you can create an index like:
CREATE INDEX index_name
ON table_b (customer_id,
age
);
A scalar subquery in the SELECT list can be faster than a join (whether a direct join or a joined subselect):
select
a.*,
(select b.age
from b
where b.customer_id = a.customer_id)
from a
Note:
It behaves like an outer join (it returns an empty age if the customer_id from a doesn't exist in b).
The subquery should return only one row from b per row from a.
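That the scalar subquery behaves like the left join can be checked with a short sketch in Python's sqlite3 module. SQLite is an assumption here (the thread is about Oracle), the data is hypothetical, and the example compares results only; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (customer_id INTEGER);
    CREATE TABLE b (customer_id INTEGER, age INTEGER);
    -- Hypothetical data: customer 3 has no row in b.
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 30), (2, 45);
""")

# Scalar subquery in the SELECT list.
scalar_rows = conn.execute("""
    SELECT a.customer_id,
           (SELECT b.age FROM b WHERE b.customer_id = a.customer_id) AS age
    FROM a
    ORDER BY a.customer_id
""").fetchall()

# Equivalent left join.
join_rows = conn.execute("""
    SELECT a.customer_id, b.age
    FROM a
    LEFT JOIN b ON a.customer_id = b.customer_id
    ORDER BY a.customer_id
""").fetchall()

print(scalar_rows)
print(join_rows)
```

Both forms give a NULL age (None in Python) for the customer that is missing from b. Which one is faster depends on the engine and its plan, so it is worth comparing the actual execution plans before preferring either.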

How to get names present in both views?

I have a very large view of 5 million records with repeated names, where each row has a unique transaction number. Another view of 9000 records contains unique names. Now I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query takes very long to run. Is there a shortcut method?
Did you try just using an INNER JOIN? This will return all rows that exist in both tables:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
If you need help learning JOIN syntax, here is a great visual explanation.
You can add the DISTINCT keyword which will remove any duplicate values that the query returns.
Use JOIN.
DISTINCT lets you return only unique records from the list, since you are joining to the other table and a record may have more than one match there.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
For faster performance, add an index on column NAME on both tables since you are joining through it.
To learn more about joins, see the link below:
Visual Representation of SQL Joins