Say I have two tables: employees (with a character column id along with some other columns) and orders (with columns employeeid and partno, along with some other ids).
Is it more efficient to say
select *
from employees, orders
where id = employeeid
and id = '22222'
or
select *
from employees, orders
where id = '22222'
and employeeid = '22222'
or does it not really matter?
It looks like you're trying to do a join. The second form has no join condition at all: it filters each table separately, and the rows only line up because both filters use the same literal. The first would look like this once qualified:
select * from employees, orders where orders.employeeid = employees.id and employees.id = '22222'
The first equality specifies the condition used to pick the rows from the orders table that join with a particular row from the employees table. The second is a conventional WHERE selection criterion.
The second check is unnecessary. Oracle is smart enough to know that if A=B and B=C then A=C. You don't need to tell it.
The best way to write that query is
SELECT *
FROM employees e
INNER JOIN orders o ON o.employeeid = e.id
WHERE e.id = '22222'
Even if the best possible plan is to start by reading all the rows from orders where employeeid is '22222', Oracle will know to do it even though you didn't explicitly specify the condition
AND o.employeeid = '22222'
Oracle infers the condition from the others that you did specify.
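As a rough check that the redundant predicate does not change the result, here is a minimal sketch using Python's sqlite3 module. SQLite stands in for Oracle here, so it only shows that both forms return the same rows; it says nothing about Oracle's execution plan. The sample rows and the orderno column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (orderno INTEGER PRIMARY KEY, employeeid TEXT, partno TEXT);
    INSERT INTO employees VALUES ('11111', 'Ann'), ('22222', 'Bob');
    INSERT INTO orders VALUES (1, '22222', 'P-1'), (2, '22222', 'P-2'), (3, '11111', 'P-3');
""")

# Form 1: join condition plus a single filter on employees.id.
join_only = conn.execute("""
    SELECT e.id, o.orderno
    FROM employees e
    INNER JOIN orders o ON o.employeeid = e.id
    WHERE e.id = '22222'
    ORDER BY o.orderno
""").fetchall()

# Form 2: the same query with the implied filter on orders spelled out.
with_redundant = conn.execute("""
    SELECT e.id, o.orderno
    FROM employees e
    INNER JOIN orders o ON o.employeeid = e.id
    WHERE e.id = '22222'
      AND o.employeeid = '22222'
    ORDER BY o.orderno
""").fetchall()

print(join_only)       # [('22222', 1), ('22222', 2)]
print(with_redundant)  # same rows
```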
I am working from two tables in a dataset. Let's call the first one 'Demographic_Info', the other 'Study_Info'. The two tables both have a Subject_ID column. How can I run a query that will return all of the Subject_IDs where Sex = Male (from Demographic_Info) but also where the Study Case = Case (from Study_Info)?
Is this an inner join? Do I need to make a combined table?
I just don't know what function to use. I know how to select for each of these conditions in each table individually, but not how to run them against each other.
Yes, you will want to inner join and then use the where clause to filter on both tables.
select
s.Subject_ID
from `Study_Info` s
inner join `Demographic_Info` d on s.Subject_ID = d.Subject_ID
where d.Sex = 'Male'
and s.Study_Case = 'Case' -- Unclear from your question about the actual field name
The aliases s and d will be useful for organizing which table each field comes from (or if the same field occurs in both tables).
Alternatively, you could filter each table first and then perform the join.
with study as (select * from `Study_Info` where Study_Case = 'Case'),
demographics as (select * from `Demographic_Info` where Sex = 'Male')
select s.Subject_ID
from study s
inner join demographics d on s.Subject_ID = d.Subject_ID
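To see that the two approaches above return the same subjects, here is a small sketch using Python's sqlite3 module. The Study_Case column name is the same guess as in the answer, and the four sample subjects are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Demographic_Info (Subject_ID INTEGER PRIMARY KEY, Sex TEXT);
    CREATE TABLE Study_Info (Subject_ID INTEGER PRIMARY KEY, Study_Case TEXT);
    INSERT INTO Demographic_Info VALUES (1, 'Male'), (2, 'Female'), (3, 'Male'), (4, 'Male');
    INSERT INTO Study_Info VALUES (1, 'Case'), (2, 'Case'), (3, 'Control'), (4, 'Case');
""")

# Join first, then filter on columns from both tables.
join_then_filter = conn.execute("""
    SELECT s.Subject_ID
    FROM Study_Info s
    INNER JOIN Demographic_Info d ON s.Subject_ID = d.Subject_ID
    WHERE d.Sex = 'Male'
      AND s.Study_Case = 'Case'
    ORDER BY s.Subject_ID
""").fetchall()

# Filter each table in a CTE, then join the filtered sets.
filter_then_join = conn.execute("""
    WITH study AS (SELECT * FROM Study_Info WHERE Study_Case = 'Case'),
         demographics AS (SELECT * FROM Demographic_Info WHERE Sex = 'Male')
    SELECT s.Subject_ID
    FROM study s
    INNER JOIN demographics d ON s.Subject_ID = d.Subject_ID
    ORDER BY s.Subject_ID
""").fetchall()

print(join_then_filter)  # [(1,), (4,)]
print(filter_then_join)  # [(1,), (4,)]
```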
I'm trying to build an SQL query where I grab one table's information (WHERE shops.shop_domain = X) along with the COUNT of the customers table WHERE customers.product_id = 4242451.
The shops table DOES NOT have product.id in it, but the customers table DOES HAVE the shop_domain in it, hence my attempt to do some sort of join.
I essentially want to return the following:
shops.id
shops.name
shops.shop_domain
COUNT OF CUSTOMERS WHERE customers.product_id = '4242451'
Here is my not so lovely attempt at the query.
I think I have the idea right (maybe...) but I can't wrap my head around building this query.
SELECT shops.id, shops.name, shops.shop_domain, COUNT(customers.customer_id)
FROM shops
LEFT JOIN customers ON shops.shop_domain = customers.shop_domain
WHERE shops.shop_domain = 'myshop.com' AND
customers.product_id = '4242451'
GROUP BY shops.shop_id
Relevant database schemas:
shops:
id, name, shop_domain
customers:
id, name, product_id, shop_domain
You are close. The condition on customers needs to go in the ON clause, because this is a LEFT JOIN and customers is the second table:
SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
FROM shops s LEFT JOIN
customers c
ON s.shop_domain = c.shop_domain AND c.product_id = '4242451'
WHERE s.shop_domain = 'myshop.com'
GROUP BY s.id, s.name, s.shop_domain;
I am also inclined to include all three columns in the GROUP BY, although Postgres (and ANSI/ISO standards) are happy with just id if it is declared as the primary key in the table.
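Here is a minimal sketch of why the filter belongs in the ON clause, using Python's sqlite3 module with made-up rows. The shop has customers, but none for product '4242451'. It counts c.id because that is the column name in the posted schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            product_id TEXT, shop_domain TEXT);
    INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com');
    INSERT INTO customers VALUES
        (1, 'Ann', '111', 'myshop.com'),
        (2, 'Bob', '222', 'myshop.com');
""")

# Filter in ON: the shop row survives with NULL customer columns, so the count is 0.
filter_in_on = conn.execute("""
    SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c
           ON s.shop_domain = c.shop_domain AND c.product_id = '4242451'
    WHERE s.shop_domain = 'myshop.com'
    GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

# Filter in WHERE: it runs after the join and discards every joined row,
# so the LEFT JOIN behaves like an INNER JOIN and the shop disappears.
filter_in_where = conn.execute("""
    SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c ON s.shop_domain = c.shop_domain
    WHERE s.shop_domain = 'myshop.com'
      AND c.product_id = '4242451'
    GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

print(filter_in_on)     # [(1, 'My Shop', 'myshop.com', 0)]
print(filter_in_where)  # []
```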
A correlated subquery should be substantially cheaper (and simpler) for the purpose:
SELECT id, name, shop_domain
, (SELECT count(*)
FROM customers
WHERE shop_domain = s.shop_domain
AND product_id = 4242451) AS special_count
FROM shops s
WHERE shop_domain = 'myshop.com';
This way you only need to aggregate in the subquery, and need not worry about undesired effects on the outer query.
This assumes product_id is a numeric data type, so I use a numeric literal (4242451) instead of the string literal '4242451', which might cause problems otherwise.
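The correlated-subquery version can be tried out with a sketch like the following, again using Python's sqlite3 module. The sample rows and the helper name shop_with_count are hypothetical, and product_id is declared as an integer to match the assumption above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            product_id INTEGER, shop_domain TEXT);
    INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com'), (2, 'Empty', 'empty.com');
    INSERT INTO customers VALUES
        (1, 'Ann', 4242451, 'myshop.com'),
        (2, 'Bob', 4242451, 'myshop.com'),
        (3, 'Cy',  111,     'myshop.com');
""")

def shop_with_count(domain):
    """Return the shop row plus the count of its customers for product 4242451."""
    return conn.execute("""
        SELECT id, name, shop_domain,
               (SELECT count(*)
                FROM customers c
                WHERE c.shop_domain = s.shop_domain
                  AND c.product_id = 4242451) AS special_count
        FROM shops s
        WHERE shop_domain = ?
    """, (domain,)).fetchall()

print(shop_with_count("myshop.com"))  # [(1, 'My Shop', 'myshop.com', 2)]
print(shop_with_count("empty.com"))   # [(2, 'Empty', 'empty.com', 0)]
```

No GROUP BY is needed in the outer query, and a shop without matching customers still comes back with a count of 0.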
I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused about how to pull in the EmailAddress column from the Clients table, since in order to join it I have to bring in other tables that aren't otherwise being used. I assume there is an easier way to do this using subqueries, as that is what we are working on at the moment.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) t -- many databases require an alias on a derived table
This returns every column of the inner set, here col1 and col2.
So your inner query would consist of a GROUP BY clause and a SUM of the order value field. I won't write it for you, since you appear to be a student and it wouldn't be right for me to give you the entire answer. The key thing to understand is this set thinking: you can wrap the ENTIRE query in parentheses and treat it as a table, the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Clients Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItems SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) q -- Close subquery; many databases require an alias on a derived table
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that share a column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.
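To check the two-level aggregation end to end, here is a small sketch using Python's sqlite3 module. The column names follow the queries above; the ItemId column, the sample rows, and the integer prices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ClientId INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Shipments (ShipmentId INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE ShipItems (ItemId INTEGER PRIMARY KEY, ShipmentId INTEGER,
                            ShipItemPrice INTEGER, ShipItemDiscountAmount INTEGER,
                            Quantity INTEGER);
    INSERT INTO Clients VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO Shipments VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO ShipItems VALUES
        (1, 10, 10, 1, 2),    -- (10 - 1) * 2  = 18
        (2, 10, 5, 0, 1),     -- (5 - 0) * 1   = 5
        (3, 11, 100, 10, 1),  -- (100 - 10) * 1 = 90
        (4, 12, 20, 5, 3);    -- (20 - 5) * 3  = 45
""")

# Inner query: one row per client and shipment with the order total.
inner_sql = """
    SELECT c.EmailAddress, i.ShipmentId,
           SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) AS OrderTotal
    FROM ShipItems i
    JOIN Shipments s ON s.ShipmentId = i.ShipmentId
    JOIN Clients c ON c.ClientId = s.ClientId
    GROUP BY c.EmailAddress, i.ShipmentId
"""
per_shipment = conn.execute(inner_sql + " ORDER BY i.ShipmentId").fetchall()

# Outer query: treat the inner result as a table and take the largest total per client.
largest = conn.execute(f"""
    SELECT EmailAddress, MAX(OrderTotal) AS LargestOrder
    FROM ({inner_sql}) q
    GROUP BY EmailAddress
    ORDER BY EmailAddress
""").fetchall()

print(per_shipment)  # [('a@example.com', 10, 23), ('a@example.com', 11, 90), ('b@example.com', 12, 45)]
print(largest)       # [('a@example.com', 90), ('b@example.com', 45)]
```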
I am supposed to use the given database (it's pretty huge, so I used codeshare) to list the last names and customer numbers of the top 5% of customers for each branch. To find the top 5% of customers, I decided to use the NTILE function (100/5 = 20, hence NTILE(20)). The columns are pulled from two separate tables, so I used inner joins. For the life of me, I honestly cannot figure out where I am going wrong. I keep getting "missing expression" errors but do not know what exactly I am missing. Here is the database:
Database: https://codeshare.io/5XKKBj
ERD: https://drive.google.com/file/d/0Bzum6VJXi9lUX1d2ZkhudTE3QXc/view?usp=sharing
Here is my SQL Query so far.
SELECT
Ntile(20) over
(partition by Employee.Branch_no
order by sum(ORDERS.SUBTOTAL) desc
) As Top_5,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
FROM
CUSTOMER
INNER JOIN ORDERS
ON
CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
GROUP BY
ORDERS.SUBTOTAL,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME;
You need to join Employee and the GROUP BY must include all non-aggregated expressions. You can use a subquery to generate the subtotals and get the NTILE in the outer query, e.g.:
SELECT
Ntile(20) over
(partition by BRANCH_NO
order by sum_subtotal desc
) As Top_5,
CUSTOMER_NO,
LNAME
FROM (
SELECT
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME,
sum(ORDERS.SUBTOTAL) as sum_subtotal
FROM CUSTOMER
JOIN ORDERS
ON CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
JOIN EMPLOYEE
ON ORDERS.EMPLOYEE_NO = EMPLOYEE.EMPLOYEE_NO
GROUP BY
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
);
Note: you might want to include BRANCH_NO in the select list as well, otherwise the output will look confusing with duplicate customers (if a customer has ordered from employees in multiple branches).
Now, if you want to filter the above query to just get the top 5%, you can put the whole thing in another subquery and add a predicate on the Top_5 column, e.g.:
SELECT CUSTOMER_NO, LNAME
FROM (... the query above...)
WHERE Top_5 = 1;
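The nested structure above (aggregate, then NTILE, then filter) can be tried out with a sketch like this one in Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later) rather than Oracle, and the generated data is invented: 80 customers, 40 per branch, each with a single order whose subtotal grows with the customer number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMPLOYEE_NO INTEGER PRIMARY KEY, BRANCH_NO INTEGER);
    CREATE TABLE CUSTOMER (CUSTOMER_NO INTEGER PRIMARY KEY, LNAME TEXT);
    CREATE TABLE ORDERS (ORDER_NO INTEGER PRIMARY KEY, CUSTOMER_NO INTEGER,
                         EMPLOYEE_NO INTEGER, SUBTOTAL INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 1), (2, 2);
""")
for n in range(1, 81):
    conn.execute("INSERT INTO CUSTOMER VALUES (?, ?)", (n, f"L{n}"))
    # Customers 1-40 order from employee 1 (branch 1), the rest from employee 2 (branch 2).
    conn.execute(
        "INSERT INTO ORDERS (CUSTOMER_NO, EMPLOYEE_NO, SUBTOTAL) VALUES (?, ?, ?)",
        (n, 1 if n <= 40 else 2, n * 10),
    )

top_customers = conn.execute("""
    SELECT BRANCH_NO, CUSTOMER_NO, LNAME
    FROM (
        -- Rank the per-customer totals into 20 buckets within each branch.
        SELECT NTILE(20) OVER (PARTITION BY BRANCH_NO
                               ORDER BY sum_subtotal DESC) AS Top_5,
               BRANCH_NO, CUSTOMER_NO, LNAME
        FROM (
            -- Aggregate first, so the window function sees one row per customer and branch.
            SELECT e.BRANCH_NO, c.CUSTOMER_NO, c.LNAME,
                   SUM(o.SUBTOTAL) AS sum_subtotal
            FROM CUSTOMER c
            JOIN ORDERS o ON c.CUSTOMER_NO = o.CUSTOMER_NO
            JOIN EMPLOYEE e ON o.EMPLOYEE_NO = e.EMPLOYEE_NO
            GROUP BY e.BRANCH_NO, c.CUSTOMER_NO, c.LNAME
        ) AS totals
    ) AS ranked
    WHERE Top_5 = 1
    ORDER BY BRANCH_NO, CUSTOMER_NO
""").fetchall()

print(top_customers)
# [(1, 39, 'L39'), (1, 40, 'L40'), (2, 79, 'L79'), (2, 80, 'L80')]
```

With 40 customers per branch, each of the 20 buckets holds two rows, so bucket 1 is exactly the top 5% of each branch.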
When I write the query as follows, it returns the combination of all the records.
What's the mistake in the query?
SELECT ven.vendor_code, add.address1
FROM vendor ven INNER JOIN employee emp
ON ven.emp_fk = emp.id
INNER JOIN address add
ON add.emp_name = emp.emp_name;
With an inner join, you have to put all the links (relations) between the two tables in the ON clause.
Assuming the relations are good, you may test the following queries to see if they really make the combination of all records:
SELECT count(*)
from vendor ven
inner join employee emp on ven.emp_fk = emp.id
inner join address add on add.emp_name = emp.emp_name;
SELECT count(*)
from vendor ven, employee emp, address add;
If both queries return the same result (which I doubt), you really have what you say.
If not, as I assume, maybe you are missing a relation or a restriction to filter the number of results.
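The comparison can be sketched with Python's sqlite3 module and a few invented rows. Note that add is a reserved word in several engines, so this sketch uses addr as the alias instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE vendor (vendor_code TEXT, emp_fk INTEGER);
    CREATE TABLE address (emp_name TEXT, address1 TEXT);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO vendor VALUES ('V1', 1), ('V2', 2), ('V3', 1);
    INSERT INTO address VALUES ('Ann', '1 Main St'), ('Bob', '2 Oak Ave');
""")

# Row count with the join conditions in place.
joined = conn.execute("""
    SELECT count(*)
    FROM vendor ven
    INNER JOIN employee emp ON ven.emp_fk = emp.id
    INNER JOIN address addr ON addr.emp_name = emp.emp_name
""").fetchone()[0]

# Row count of the full Cartesian product (no join conditions at all).
cross = conn.execute("""
    SELECT count(*)
    FROM vendor ven, employee emp, address addr
""").fetchone()[0]

print(joined)  # 3
print(cross)   # 12  (3 vendors * 2 employees * 2 addresses)
```

If the two counts differ, as they do here, the joins are restricting the result and the query is not producing every combination.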