How to find the most frequent value in a select statement as a subquery? - sql

I am trying to get the most frequent Zip_Code for the Location ID from table B. Table A(transaction) has one A.zip_code per Transaction but table B(Location) has multiple Zip_code for one area or City. I am trying to get the most frequent B.Zip_Code for the Account using Location_D that is present in both table.I have simplified my code and changed the names of the columns for easy understanding but this is the logic for my query I have so far.Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date

This is what I come up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).

Related

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general, it works, but I was wondering, if there is some way to make it simplier or maybe my approach isn't correct and this code will not work in some conditions? Maybe it needs to be optimized.
I've read some information about not to rely on natural order of result set in databases and that information made my doubts to appear.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little tricker. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participants from participations) p;

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
)
This will return all the columns from the two columns provided in the inner set.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by C1.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) -- Close subquery
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is usefull, certainly when you select fields from the joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.

NTILE Function and Using Inner Join in Oracle

I am supposed to use the given Database(Its pretty huge so I used codeshare) to list last names and customer numbers of top 5% of customers for each branch. To find the top 5% of customers, I decided to use the NTILE Function, (100/5 = 20, hence NTILE 20). The columns are pulled from two separate tables so I used Inner joins. For the life of me, I honesly cannot figure out where I am going wrong. I keep getting "missing expression" errors but Do not know what exactly I am missing. Here is the Database
Database: https://codeshare.io/5XKKBj
ERD: https://drive.google.com/file/d/0Bzum6VJXi9lUX1d2ZkhudTE3QXc/view?usp=sharing
Here is my SQL Query so far.
SELECT
Ntile(20) over
(partition by Employee.Branch_no
order by sum(ORDERS.SUBTOTAL) desc
) As Top_5,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
FROM
CUSTOMER
INNER JOIN ORDERS
ON
CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
GROUP BY
ORDERS.SUBTOTAL,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME;
You need to join Employee and the GROUP BY must include all non-aggregated expressions. You can use a subquery to generate the subtotals and get the NTILE in the outer query, e.g.:
SELECT
Ntile(20) over
(partition by BRANCH_NO
order by sum_subtotal desc
) As Top_5,
CUSTOMER_NO,
LNAME
FROM (
SELECT
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME,
sum(ORDERS.SUBTOTAL) as sum_subtotal
FROM CUSTOMER
JOIN ORDERS
ON CUSTOMER.CUSTOMER_NO = ORDERS.CUSTOMER_NO
JOIN EMPLOYEE
ON ORDERS.EMPLOYEE_NO = EMPLOYEE.EMPLOYEE_NO
GROUP BY
EMPLOYEE.BRANCH_NO,
CUSTOMER.CUSTOMER_NO,
CUSTOMER.LNAME
);
Note: you might want to include BRANCH_NO in the select list as well, otherwise the output will look confusing with duplicate customers (if a customer has ordered from employees in multiple branches).
Now, if you want to filter the above query to just get the top 5%, you can put the whole thing in another subquery and add a predicate on the Top_5 column, e.g.:
SELECT CUSTOMER_NO, LNAME
FROM (... the query above...)
WHERE Top_5 = 1;

SQL grouping. How to select row with the highest column value when joined. No CTEs please

I've been banging my head against the wall for something that I think should be simple but just cant get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result but only if I leave out the value_id row
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retreive but not quite notice how it returns unique name_id and the highest multi_flag but I also need the value_id that belongs to such multi_flag / name_id grouping...
If I include the value_id in my SQL statement then it returns all rows and is no longer grouped
Notic ein the results below how it no longer returns the row for the highest multi_flag and how all the different values for name_id (Ex. name_id 1) are also returned
You can choose to use a sub-query, derived table or CTE to solve this problem. Performance will be depending on the amount of data you are querying. To achieve your goal of getting the max multiflag you must first get the max value based on the grouping you want to achieve this you can use a CTE or sub query. The below CTE will give the max multi_flag by value that you can use to get the max multi_flag and then you can use that to join back to your other tables. I have three joins in this example but this can be reduce and as far a performance it may be better to use a subquery but you want know until you get the se the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on a.VALUE_ID =b. m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE
You can use Lateral too, its an other solution
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(B.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON M.maxmultiflag = B.VALUE

comparison of two aggregate function without group by clause

suppose we have 3 tables,1st table is the master table below is the details
MasterTable(Id,col1,col2)
ChildTable1(Id,MasterId,Amount1)
ChildTable2(Id,MasterId,Amount2)
i have to find the id from mastertable where sum(chiletable1.Amount1) should be greater than sum(chiletable2.Amount2)
i have written the query using groupby and having clause which givs the right data,
is there any other way to write the same query without groupby keeping in mind the performance issue if there is billons of records in the table.
following is my query
select Mastertable.Id
from Mastertable,Childtable1,ChildTable2
where Mastertable.Id=Childtable1.MasterId
and ChildTable1.MasterId=ChildTable2.MasterId
group by MasterTable.Id
having SUM(Childtable1.Amount1) > SUM(Childtable2.Amount2)
To answer your question, no.
The reason is simple: Without a GROUP BY you can't have SUM()s. (except maybe a grand total)
(EDIT: Well, maybe a stored procedure with loops, cursors and the like, but that would definitely be an overkill! ;) )
Well, you're going to have to group at some point, so I don't know why you don't want to group at all.
You could try grouping by the IDs in the child tables:
SELECT M.Id
FROM Mastertable M
INNER JOIN (SELECT MasterID,
SUM(Amount) Amount
FROM Childtable1
GROUP BY MasterID) C1
ON M.Id = C1.MasterID
INNER JOIN (SELECT MasterID,
SUM(Amount) Amount
FROM Childtable2
GROUP BY MasterID) C2
ON M.Id = C2.MasterID
WHERE C1.Amount > C2.Amount
If there's either a foreign key back to MasterTable or an index on MasterID in the child tables this is likely going to be the fastest way without pre-aggregating via snapshots or some other method.