Performance impact due to Join - sql

I have two tables, PRODUCT and ACCOUNT.
PRODUCT table columns:
product_id (PK)
subscription_id
ACCOUNT table columns:
account_nbr
subscription_id
(account_nbr and subscription_id are the primary key columns of the ACCOUNT table)
... Other columns
I have to find account_nbr and subscription_id for a product_id.
I get product_id as input. Using it I can get the subscription_id from the PRODUCT table, and
using the subscription_id I can get the account_nbr value from the ACCOUNT table.
Instead of getting the info in two queries, can it be done in one query?
Something like the below:
select distinct a.acct_nbr,p.subscription_id
from ACCOUNT a,PRODUCT p
where v.product_id = ' val 1' and
v.subscription_id = p.subscription_id
Will the performance of the above query be low compared to two separate queries?

I have to find account_nbr and subscription_id for a product_id.
So you're correct in your approach: you need to JOIN the two result sets together:
select a.account_nbr, p.subscription_id
from account a
join product p
on a.subscription_id = p.subscription_id
where p.product_id = :something
Some points:
You have an alias v in your query; I don't know where this came from.
Learn to use the ANSI join syntax; if you make a mistake it's a lot more obvious.
You're selecting a.acct_nbr, which doesn't exist as-is.
There's no need for a DISTINCT. ACCOUNT_NBR and SUBSCRIPTION_ID are the primary key of ACCOUNT.
Will the performance of the above query be low compared to two separate queries?
Probably not. If your query is correct and your tables are correctly indexed, it's highly unlikely that whatever you've coded could beat the SQL engine. The database is designed to select data quickly.
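If the tables are as described, the lookup by product_id is already covered by PRODUCT's primary key, but the join has to find ACCOUNT rows by subscription_id alone, which the composite primary key (account_nbr, subscription_id) does not lead with. A minimal sketch of a supporting index (the index name is made up):
-- hypothetical secondary index so the join can seek on subscription_id alone
create index account_subscription_idx on account (subscription_id);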

Related

Best approach to occurrences of IDs in one table and all elements in another table

Well, the query I need is simple, and maybe it has been asked in another question, but there is a performance aspect to what I need, so:
I have a users table with 10,000 rows; the table contains id, email and more data.
Another table called orders has many more rows, maybe 150,000.
Each order holds the id of the user that made it, and also a status. The status can be a number from 0 to 9 (or null).
My final requirement is to get every user with the id, email, some other column, and the number of orders with status 3 or 7. It doesn't matter whether it's 3 or 7, I just need the count.
But I need to do this query in a low-impact (performant) way.
What is the best approach?
I need to run this in Redash against Postgres 10.
This sounds like a join and group by:
select u.*, count(*)
from users u join
     orders o
     on o.user_id = u.user_id
where o.status in (3, 7)
group by u.user_id;
Postgres is usually pretty good about optimizing these queries -- and the above assumes that users(user_id) is the primary key -- so this should work pretty well.
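If users without any orders in status 3 or 7 should still appear with a count of 0, a left join variant works as well (a sketch under the same column-name assumptions; note the status filter has to move into the on clause):
select u.*, count(o.user_id) as order_count
from users u left join
     orders o
     on o.user_id = u.user_id and o.status in (3, 7)
group by u.user_id;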

duplicated rows in select query

I am trying to run a query to get all the customers from my database. These are my tables in a diagram:
When running the query by joining the Companies_Customers table and the Customers table on the customerId in both tables (not shown in the join table in the pic), I get duplicate rows, which is not the desired outcome.
This is normal from a database standpoint, since a customer can be related to different companies (companies can share a single customer).
My question is how do I get rid of the duplication via SQL.
There are two approaches to your problem.
Either only select data from Customers table:
SELECT * FROM Customers
Or select from both tables joined together, but without CompanyName and with GROUP BY CompanyCustomerId - although I strongly suggest the first approach.
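If the join is only there to restrict the result to customers linked to at least one company, a third option is a semi-join with EXISTS, which cannot produce duplicates by construction (a sketch; the CustomerId column name is assumed from the diagram):
SELECT *
FROM Customers c
WHERE EXISTS (SELECT 1
              FROM Companies_Customers cc
              WHERE cc.CustomerId = c.CustomerId)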

Which of these is preferable? Adding a column to the table or using sub-query to get data?

What I meant was: there's a Table A with 5 columns. I have a stored procedure (SP) that I use to get 3 columns from Table A and one column from Table B. Now, would it be better to add the column from Table B to Table A, or to use a sub-query in that SP to get that column from Table B?
Your question is still confusing and it very much looks like you haven't understood how to use a relational database. So let me try to explain:
Let's say you have two tables:
client
client_number
first_name
last_name
date_of_birth
...
order
order_number
client_number
order_date
...
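For concreteness, a minimal DDL sketch of these two tables (the column types and the foreign key are assumptions, and order is quoted because it is a reserved word in most databases):
create table client (
    client_number  integer primary key,
    first_name     varchar(100),
    last_name      varchar(100),
    date_of_birth  date
);
create table "order" (
    order_number   integer primary key,
    client_number  integer references client (client_number),  -- links each order to its client
    order_date     date,
    quantity       integer,
    item           varchar(100)
);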
These are two separate tables, so as to have a normalized relational database. The order table contains the client number, so you can look up the client's name and date of birth in the client table. The date of birth may be important in order to know whether the client is allowed to order certain articles. But you don't want to store the date of birth with every order - it doesn't change.
If you want to look up the date of birth, you can use a subquery:
select
    order_number,
    order_date,
    quantity,
    (
        select date_of_birth
        from client c
        where c.client_number = o.client_number
    ) as date_of_birth
from order o
where item = 'whisky';
but most often you would simply join the tables:
select
    o.order_number,
    o.order_date,
    o.quantity,
    c.date_of_birth,
    c.first_name,
    c.last_name
from order o
join client c on c.client_number = o.client_number;
You would, however, not change your tables and invite redundancy with all its problems just to avoid having to join. You design your database so that it is a well-formed relational database, not so that it makes your latest query easy to write. It is very, very common to use joins and subqueries, and having to use them usually shows that you built your database well.
I think this is a good time for you to look up database normalization, e.g. on Wikipedia: https://en.wikipedia.org/wiki/Database_normalization.

SQL query with loop

I am having trouble writing a SQL query in Access with foreign keys. There are two tables, 'Customers' (ID, Name, Surname) and 'Orders' (ID, Customer, Date, Volume). The ID fields are primary keys, and Orders.Customer is a foreign key linked to Customers.ID, so a customer can have many orders or none.
My goal is to search customers on many criteria, one of which is whether a customer has at least one order whose volume exceeds a certain quantity. I tried joins with SELECT DISTINCT, but it still gives me duplicate results; plus I had to create an empty order for every customer without orders when the query didn't use the above condition.
Does anyone have an idea about that? Maybe some special instruction on foreign keys, or 2 separate queries?
Based on the information you give, I can only give you hints on what I think you're doing/understanding wrong:
SELECT DISTINCT gives you unique records, not unique values, so if your statement selects all fields (*), DISTINCT won't help you much there.
My guess is you had to create an empty order for each customer because you used an INNER JOIN; try a LEFT OUTER JOIN instead.
For example:
SELECT DISTINCT Customers.*
FROM Customers
LEFT OUTER JOIN Orders
    ON (Orders.Customer = Customers.ID)
WHERE Orders.Volume > put_your_value
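Another way to express "has at least one such order" without DISTINCT is a correlated EXISTS subquery, which returns each customer at most once (a sketch using the column names from the question; put_your_value stays a placeholder):
SELECT Customers.*
FROM Customers
WHERE EXISTS (SELECT *
              FROM Orders
              WHERE Orders.Customer = Customers.ID
                AND Orders.Volume > put_your_value)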

SQL aggregation question

I have three tables:
unmatched_purchases table:
unmatched_purchases_id --primary key
purchases_id --foreign key to events table
location_id --which store
purchase_date
item_id --item purchased
purchases table:
purchases_id --primary key
location_id --which store
customer_id
credit_card_transactions:
transaction_id --primary key
trans_timestamp --timestamp of when the transaction occurred
item_id --item purchased
customer_id
location_id
All three tables are very large. The purchases table has 590,130,404 records. (Yes, half a billion.) Unmatched_purchases has 192,827,577 records. Credit_card_transactions has 79,965,740 records.
I need to find out how many purchases in the unmatched_purchases table match up with entries in the credit_card_transactions table. I need to do this for one location at a time (i.e. run the query for location_id = 123, then run it for location_id = 456). "Match up" is defined as:
1) same customer_id
2) same item_id
3) the trans_timestamp is within a certain window of the purchase_date
(e.g. if the purchase_date is Jan 3, 2005
and the trans_timestamp is 11:14 PM on Jan 2, 2005, that's close enough)
I need the following aggregated:
1) How many unmatched purchases are there for that location
2) How many of those unmatched purchases could have been matched with credit_card_transactions for a location.
So, what is a query (or queries) to get this information that won't take forever to run?
Note: all three tables are indexed on location_id
EDIT: As it turns out, the credit_card_transactions table has been partitioned based on location_id, so that will help speed this up for me. I'm asking our DBA if the others could be partitioned as well, but the decision is out of my hands.
CLARIFICATION: I will only need to run this on a few of our many locations, not all of them separately - I need to run it on 3 locations. We have 155 location_ids in our system, but some of them are not used in this part of our system.
Try this (I have no idea how fast it will be - that depends on your indices):
Select Count(*) TotalPurchases,
       Sum(Case When c.transaction_id Is Not Null
                Then 1 Else 0 End) MatchablePurchases
From unmatched_purchases u
Join purchases p
  On p.purchases_id = u.purchases_id
Left Join credit_card_transactions c
  On c.customer_id = p.customer_id
  And c.item_id = u.item_id
  And c.trans_timestamp - u.purchase_date < #DelayThreshold
Where u.location_id = #Location
At least, you'll need more indexes. I propose at least the following:
An index on unmatched_purchases.purchases_id, one on purchases.location_id and
another index on credit_card_transactions(location_id, customer_id, item_id, trans_timestamp).
Without those indexes, there is little hope IMO.
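A sketch of those proposed indexes as DDL (the index names are made up):
create index up_purchases_id_ix on unmatched_purchases (purchases_id);
create index p_location_id_ix on purchases (location_id);
create index cct_match_ix on credit_card_transactions
    (location_id, customer_id, item_id, trans_timestamp);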
I suggest you query ALL locations at once. It will cost you 3 full scans (each table once) + sorting. I bet this will be faster than querying locations one by one.
But if you don't want to guess, you at least need to examine the EXPLAIN PLAN and a 10046 trace of your query...
The query ought to be straightforward, but the tricky part is to get it to perform. I'd question why you need to run it once for each location when it would probably be more efficient to run it for every location in a single query.
The join would be a big challenge, but the aggregation ought to be straightforward. I would guess that your best hope performance-wise for the join would be a hash join on the customer and item columns, with a subsequent filter operation on the date range. You might have to fiddle with putting the customer and item join in an inline view and then try to stop the date predicate from being pushed into the inline view.
The hash join would be much more efficient with tables that are being equi-joined both having the same hash partitioning key on all join columns, if that can be arranged.
Whether to use the location index or not ...
Whether the index is worth using or not depends on the clustering factor for the location index, which you can read from the user_indexes table. Can you post the clustering factor along with the number of blocks that the table contains? That will give a measure of the way that values for each location are distributed throughout the table. You could also extract the execution plan for a query such as:
select some_other_column
from my_table
where location_id in (value 1, value 2, value 3)
... and see if Oracle thinks the index is useful.
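For the clustering factor and block counts mentioned above, the data dictionary queries would look something like this (Oracle; the upper-case table names are assumptions about how they are stored in the dictionary):
select index_name, clustering_factor
from   user_indexes
where  table_name = 'CREDIT_CARD_TRANSACTIONS';
select table_name, blocks, num_rows
from   user_tables
where  table_name in ('UNMATCHED_PURCHASES', 'PURCHASES', 'CREDIT_CARD_TRANSACTIONS');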