SQL: Distinct and OrderBy issue - sql

I'm currently working on a query that should return all the rows from the CONCENTRATOR table. However, it has to be sortable by all of the concentrator's columns, as well as by department name and type name.
Here are the concentrator's columns :
CONCENTRATOR_ID
NAME
INTERNALADDRESS
TYPE_ID
DEPARTMENT_ID
TYPE_ID and DEPARTMENT_ID link to the TYPE table and the DEPARTMENT table respectively; both of those tables have a NAME column.
Here are the constraints :
concentrators are sortable by id, name, address, type's name and department's name
distinct department names (if the same concentrator has 2 departments, return only one row)
To sum up, I need something like SELECT * on the concentrator columns, plus DISTINCT department.name, but that seems complicated. I tried lots of queries but couldn't find one that works.
Could anybody help me please?
The request I'm looking for should be something like this:
SELECT DISTINCT d.NAME as "department.name", t.NAME as "type.name", *
FROM "CONCENTRATOR" c
LEFT OUTER JOIN "CONCENTRATOR_GROUP" USING(CONCENTRATOR_ID)
LEFT OUTER JOIN "GROUP" g USING(GROUP_ID)
LEFT OUTER JOIN "TYPE" t USING(TYPE_ID)
LEFT OUTER JOIN "DEPARTMENT" d USING(DEPARTMENT_ID)
ORDER BY TRIM(UPPER(c.name)) ASC

There are a few things to note here. I'm really not fond of "natural joins" (or the USING shorthand in your query), as in my view they simply disguise useful detail, so I have not used them. As an example of that missing detail, I had to assume that the table "GROUP" joins via CONCENTRATOR_GROUP.
The table name "GROUP" isn't a great idea, as it is a very commonly used reserved word. I'd not recommend using such a word as a table name. Because of this, "GROUP" is quoted (it isn't normal to quote object names in Oracle, in my experience).
You talk about "distinct" as if it has some magical quality that I should intuitively understand. It doesn't, and I don't. Let's say there are just 2 departments; both are also "distinct":
DeptX
DeptY
So now let's assume there are 2 concentrators, both of these are "distinct" too:
ConcenA
ConcenB
Both concentrators are used in both departments, so we produce this query:
select distinct
c.name as c_name, d.name as d_name
from concentrators c
inner join departments d on c.dept_id=d.dept_id
The result is:
ConcenA DeptX
ConcenB DeptX
ConcenA DeptY
ConcenB DeptY
All 4 rows are "distinct"
The point is that "select distinct" is a "row operator", i.e. it considers the entire row to determine if any part of the row is different to all other rows. There are no subtleties or options to "select distinct", it always works the same way (over the entire row). So, with this in mind, we now know that "select distinct" simply is not going to be the right technique (and due to the technical definition of distinct you might also sense it isn't a good way to describe your problem either).
So, as "select distinct" isn't the right technique, one can typically turn to "group by" or "row_number()" instead, because these do give us subtleties and options.
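As a rough sketch of the difference, here is the same idea run against SQLite through Python's sqlite3 module. The link table concentrator_department is made up for the demonstration (it is not part of the schema in the question), and MIN() is just one possible rule for picking a department.

```python
import sqlite3

# Hypothetical link table: one row per (concentrator, department) pairing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concentrator_department (c_name TEXT, d_name TEXT)")
conn.executemany(
    "INSERT INTO concentrator_department VALUES (?, ?)",
    [
        ("ConcenA", "DeptX"),
        ("ConcenB", "DeptX"),
        ("ConcenA", "DeptY"),
        ("ConcenB", "DeptY"),
        ("ConcenA", "DeptX"),  # an exact duplicate of the first row
    ],
)

# DISTINCT compares whole rows, so only the exact duplicate disappears.
distinct_rows = conn.execute(
    "SELECT DISTINCT c_name, d_name "
    "FROM concentrator_department "
    "ORDER BY d_name, c_name"
).fetchall()

# GROUP BY lets us state a rule: one row per concentrator,
# keeping the alphabetically first department name.
one_per_concentrator = conn.execute(
    "SELECT c_name, MIN(d_name) "
    "FROM concentrator_department "
    "GROUP BY c_name "
    "ORDER BY c_name"
).fetchall()

print(distinct_rows)
print(one_per_concentrator)
```

DISTINCT still returns four rows, while the GROUP BY version returns one row per concentrator.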
Now, you haven't explained why or how you would choose just one department (in fact, to me, it sounds odd that you would choose just one), but below I offer one way to do this using row_number(). The "subtlety" being used is the ORDER BY, which gives the number 1 to the first department name in alphabetical order; all other departments get a number greater than 1. The numbering restarts for each CONCENTRATOR_ID because row_number() is "partitioned by" that field.
SELECT
department_name
, type_name
, NAME
, CONCENTRATOR_ID
, INTERNALADDRESS
, TYPE_ID
, DEPARTMENT_ID
FROM (
SELECT
d.NAME AS department_name
, t.NAME AS type_name
, c.CONCENTRATOR_ID
, c.NAME
, c.INTERNALADDRESS
, c.TYPE_ID
, c.DEPARTMENT_ID
, ROW_NUMBER() OVER (PARTITION BY c.CONCENTRATOR_ID
ORDER BY d.NAME, t.NAME, c.NAME) AS RN
FROM CONCENTRATOR c
LEFT OUTER JOIN CONCENTRATOR_GROUP cg
ON c.CONCENTRATOR_ID = cg.CONCENTRATOR_ID
LEFT OUTER JOIN "GROUP" g
ON cg.GROUP_ID = g.GROUP_ID
LEFT OUTER JOIN TYPE t
ON c.TYPE_ID = t.TYPE_ID
LEFT OUTER JOIN DEPARTMENT d
ON c.DEPARTMENT_ID = d.DEPARTMENT_ID
) sq
WHERE RN = 1 /* HERE is where we restrict output to one department per concentrator */
ORDER BY
NAME ASC
, CONCENTRATOR_ID
;
I had no reason to change the type of joins, so as you can see they remain left outer joins, but I suspect there may be no valid reason for some or all of them. Do use INNER JOIN where you can, as it is often more efficient.
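If you want to see the row_number() idea working on a small data set, here is a minimal sketch using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The schema is cut down, and the link table concentrator_department is an assumption added only so that one concentrator can have several departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concentrator (concentrator_id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical link table, not part of the schema in the question.
    CREATE TABLE concentrator_department (
        concentrator_id INTEGER,
        department_name TEXT
    );
    INSERT INTO concentrator VALUES (1, 'beta'), (2, 'Alpha'), (3, 'gamma');
    INSERT INTO concentrator_department VALUES
        (1, 'Sales'), (1, 'Accounting'), (2, 'Research');
""")

rows = conn.execute("""
    SELECT concentrator_id, name, department_name
    FROM (
        SELECT c.concentrator_id, c.name, cd.department_name,
               -- Number the departments of each concentrator alphabetically.
               ROW_NUMBER() OVER (PARTITION BY c.concentrator_id
                                  ORDER BY cd.department_name) AS rn
        FROM concentrator c
        LEFT JOIN concentrator_department cd
               ON cd.concentrator_id = c.concentrator_id
    ) sq
    WHERE rn = 1  -- keep one department per concentrator
    ORDER BY TRIM(UPPER(name)), concentrator_id
""").fetchall()

for row in rows:
    print(row)
```

Concentrator 1 has two departments but appears once, with 'Accounting'; concentrator 3 has none and still appears, with a NULL department, because of the left join.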

Related

How to avoid duplicates between two tables on using join?

I have two tables work_table and progress_table.
work_table has following columns:
id[primary key],
department,
dept_name,
dept_code,
created_time,
updated_time
progress_table has following columns:
id[primary key],
project_id,
progress,
progress_date
I need only the most recently updated progress value for each project, but right now I am getting duplicates.
Here is the tried code:
select
row_number() over (order by a.dept_code asc) AS sno,
a.dept_name,
b.project_id,
p.physical_progress,
DATE(b.updated_time) as updated_date,
b.created_time
from
masters.dept_users as a,
work_table as b
LEFT JOIN
progress as p on b.id = p.project_id
order by
a.dept_name asc
It shows duplicate values for progress with the same id. How can I resolve it? [The progress values are integers that are fed into the form.]
Having reformatted your query, some things become clear...
You've mixed , and JOIN syntax (why!?)
You start with the masters.dept_users table, but don't mention it in your description
You have no join predicate between dept_users and work_table
You calculate an sno, but have no partition by and never use it
Your query includes columns not mentioned in the table descriptions above
And to top it off, you use meaningless aliases like a and b? Please, for the love of others and of your future self (who will try to read this one day), make the aliases meaningful in some way.
You possibly want something like...
WITH
sorted_progress AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY project_id
ORDER BY progress_date DESC -- This may need to be updated_time, your question is very unclear
)
AS seq_num
FROM
progress
)
SELECT
<whatever>
FROM
masters.dept_users AS u
INNER JOIN
work_table AS w
ON w.user_id = u.id -- This is a GUESS, but you need to do SOMETHING here
LEFT JOIN
sorted_progress AS p
ON p.project_id = w.id -- Even this looks suspect, are you SURE that w.id is the project_id?
AND p.seq_num = 1
That at least shows how to get that latest progress record (p.seq_num = 1), but whether the other joins are correct is something you'll have to double (and triple) check for yourself.
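For what it's worth, here is a small runnable sketch of the "latest row per project" part only, using SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions). The sample rows are invented, the dept_users table is left out because its join key is unknown, and treating work_table.id as the project id is the same guess as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_table (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE progress (
        id INTEGER PRIMARY KEY,
        project_id INTEGER,
        progress INTEGER,
        progress_date TEXT
    );
    -- Invented sample data: project 1 has three updates, project 2 has none.
    INSERT INTO work_table VALUES (1, 'Roads'), (2, 'Water');
    INSERT INTO progress VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 40, '2020-02-01'),
        (3, 1, 75, '2020-03-01');
""")

latest = conn.execute("""
    WITH sorted_progress AS (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY project_id
                                  ORDER BY progress_date DESC) AS seq_num
        FROM progress p
    )
    SELECT w.id, w.dept_name, sp.progress, sp.progress_date
    FROM work_table w
    LEFT JOIN sorted_progress sp
           ON sp.project_id = w.id   -- assumes work_table.id is the project id
          AND sp.seq_num = 1         -- newest progress row only
    ORDER BY w.id
""").fetchall()

for row in latest:
    print(row)
```

Each work_table row comes back exactly once: project 1 with its newest progress value, project 2 with NULLs.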

Best Way to join 1 to many tables

I have two tables. The first holds the names of all members, and the second holds all projects and their team members in different roles.
Table 1 : [members] id, name
Table 2 : [projects] id, proj_name, sponsor (fk1_tbl_1), proj_mgr(fk2_tbl_1) , proj_co (fk3_tbl_1)
I created a query to show the project name and names of all project roles.
I am doing three joins with two sub-queries in order to achieve this.
I want to know if there are better ways to do this (in pure SQL, NOT procedural languages like PL/SQL).
select f.proj_name, f.proj_sponsor, f.proj_mgr, e.name proj_co
from
members e,
(
select
d.proj_name, d.proj_sponsor, c.name proj_mgr, d.co proj_co
from
members c,
(
select
b.proj_name, a.name proj_sponsor, b.proj_mgr mgr, b.proj_co co
from
members a, projects b
where
b.sponsor = a.id
) d
where
c.id = d.mgr
) f
where
e.id = f.proj_co
Use join and join again:
select p.*, ms.name as sponsor, mm.name as manager, mc.name as co_name
from projects p left join
members ms
on p.sponsor = ms.id left join
members mm
on p.proj_mgr = mm.id left join
members mc
on p.proj_co = mc.id;
Notes:
This uses left join in case any values are missing. The project will still be returned.
Never use commas in the FROM clause.
Always use proper, explicit, standard JOIN syntax.
Use meaningful table aliases, rather than arbitrary letters.
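Here is a minimal sketch of the repeated join, run against SQLite through Python's sqlite3 module. The table and column names follow the question's description; the sample rows are invented, and the second project deliberately has no coordinator to show what the left join does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        proj_name TEXT,
        sponsor INTEGER,
        proj_mgr INTEGER,
        proj_co INTEGER
    );
    -- Invented sample data.
    INSERT INTO members VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO projects VALUES
        (10, 'Apollo', 1, 2, 3),
        (11, 'Gemini', 2, 1, NULL);
""")

# One join to members per role, each with its own alias.
project_roles = conn.execute("""
    SELECT p.proj_name,
           ms.name AS sponsor,
           mm.name AS manager,
           mc.name AS coordinator
    FROM projects p
    LEFT JOIN members ms ON p.sponsor  = ms.id
    LEFT JOIN members mm ON p.proj_mgr = mm.id
    LEFT JOIN members mc ON p.proj_co  = mc.id
    ORDER BY p.id
""").fetchall()

for row in project_roles:
    print(row)
```

The project without a coordinator is still returned, with NULL in that column.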

What Join to use against 2 Tables for All Data

Hi, I am looking to find out what join I would use if I wanted to join 2 tables together. I currently have a list of all students (25 students in 1 class), and the other table only shows 7 of those names with their test results.
What I would like is a 1:1 join for the students with test results, with the students without results shown underneath, so that all in all I have 25 records.
If somebody could please advise on how I could achieve this please.
Thanks in advance.
It sounds like you want an OUTER JOIN.
For this example, we'll assume that there is a table named student and that it contains a column named id which is UNIQUE (or PRIMARY) KEY.
We'll also assume that there is another table named test_result which contains a column named student_id, and that column is a foreign key referencing the id column in student.
For demonstration purposes, we'll just make up some names for the other columns that might appear in these tables, name and score.
SELECT s.id
, s.name
, r.score
FROM student s
LEFT
JOIN test_result r
ON r.student_id = s.id
ORDER
BY r.student_id IS NULL
, r.score DESC
, s.id
Note that if student_id is not unique in test_result, there is potential to return multiple rows that match a row in student.
To get (at most) one row returned from test_result per student, we could use an inline view.
SELECT s.id
, s.name
, r.score
FROM student s
LEFT
JOIN ( SELECT t.student_id
, MAX(t.score) AS score
FROM test_result t
GROUP BY t.student_id
) r
ON r.student_id = s.id
ORDER
BY r.student_id IS NULL
, r.score DESC
, s.id
The expressions in the ORDER BY clause are designed to return the students that have matching row(s) in test_result first, followed by students that don't.
This is just a demonstration, and very likely excludes some important criteria, such as which test a score should be returned for. But without a sample schema and some example data, we're just guessing.
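To make the ordering concrete, here is a small sketch using SQLite through Python's sqlite3 module. The student and test_result tables follow the assumed names above, and the rows are invented; one student has two scores so that the inline view has something to collapse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test_result (student_id INTEGER, score INTEGER);
    -- Invented sample data: student 2 has two results, students 1 and 3 have none.
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cal'), (4, 'Dee');
    INSERT INTO test_result VALUES (2, 70), (2, 85), (4, 90);
""")

rows = conn.execute("""
    SELECT s.id, s.name, r.score
    FROM student s
    LEFT JOIN (
        -- At most one row per student: the highest score.
        SELECT t.student_id, MAX(t.score) AS score
        FROM test_result t
        GROUP BY t.student_id
    ) r ON r.student_id = s.id
    ORDER BY r.student_id IS NULL   -- students with results sort first
           , r.score DESC
           , s.id
""").fetchall()

for row in rows:
    print(row)
```

Every student appears exactly once; the two students with results come first, ordered by score, and the students without results follow with a NULL score.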
You are looking for a left outer join or a full outer join.
The left outer join will show all students and their tests if they have them.
select *
from Students as s
left outer join Tests as t
on s.StudentId = t.StudentId
The full outer join will show all students with their tests if they have them, and tests even if they do not have students.
select *
from Students as s
full outer join Tests as t
on s.StudentId = t.StudentId

Selecting same column twice from a single table but with different conditions

I would like to display names and numbers of employees with numbers and names of their bosses, like below:
There is only one table:
I tried this so far:
SELECT
ID,
Name,
Boss,
(SELECT Name FROM Employees WHERE ID IN (SELECT Boss FROM Employees))
FROM Employees
But it gives me an error:
"Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression"
You need a self-join; something like:
Select a.ID, a.Name, b.ID as Boss, b.Name as BossName
from Employees A
left join Employees B
on a.Boss = b.ID
One way to do this is to limit the sub-query to return the first result only using a TOP clause. You need to join it to the main table as well:
SELECT
e.ID,
e.Name,
e.Boss,
(SELECT top 1 Name FROM Employees b where b.ID = e.Boss) as BossName
FROM Employees e
I actually had to create a similar split in my Azure DB, separating genders (men and women) using their title.
Here is the bit of code.
select
(Select count(b.TITLE) from SalesLT.Customer b Where Title LIKE 'Mr%') as men -- note: 'Mr%' would also match 'Mrs.'
,(Select count(c.TITLE) from SalesLT.Customer c Where Title LIKE 'Ms%') as women
,(Select count(d.TITLE) from SalesLT.Customer d Where Title NOT LIKE 'Ms%' AND Title NOT LIKE 'Mr%' ) as unidentified
,count(a.TITLE) as Total
from SalesLT.Customer a
APH's answer is definitely the most effective solution. However, I would like to add a solution with a correlated subquery to show that you have been quite close to a working query.
SELECT
E.ID,
E.Name,
E.Boss,
(SELECT Name FROM Employees WHERE ID = E.Boss) AS BossName
FROM Employees E
The problem with your original query was that the condition in the subquery did not depend on the specific record. The subquery returned the name of every employee who is the boss of someone else.
Please note that this solution often performs worse than the join, because the subquery may be executed once for every row. Although the result is equivalent to that of the join, not every optimizer recognizes this and rewrites the subquery as a join.
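To check that the self-join and the correlated subquery really return the same thing, here is a quick sketch using SQLite through Python's sqlite3 module. The Employees rows are invented, since the table in the question was only shown as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Name TEXT, Boss INTEGER);
    -- Invented sample data: employee 1 has no boss.
    INSERT INTO Employees VALUES
        (1, 'Root', NULL), (2, 'Ann', 1), (3, 'Bob', 1), (4, 'Cy', 2);
""")

# Self-join: the same table under two aliases, employee and boss.
via_join = conn.execute("""
    SELECT e.ID, e.Name, e.Boss, b.Name AS BossName
    FROM Employees e
    LEFT JOIN Employees b ON e.Boss = b.ID
    ORDER BY e.ID
""").fetchall()

# Correlated subquery: the inner query depends on the outer row via e.Boss.
via_subquery = conn.execute("""
    SELECT e.ID, e.Name, e.Boss,
           (SELECT b.Name FROM Employees b WHERE b.ID = e.Boss) AS BossName
    FROM Employees e
    ORDER BY e.ID
""").fetchall()

print(via_join)
print(via_join == via_subquery)
```

Both queries give one row per employee, with a NULL boss name for the employee who has no boss.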

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for the Location_ID from table B. Table A (Transaction) has one A.Zip_Code per transaction, but table B (Location) has multiple Zip_Code values for one area or city. I am trying to get the most frequent B.Zip_Code for the account using Location_ID, which is present in both tables. I have simplified my code and changed the names of the columns for easy understanding, but this is the logic of the query I have so far. Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I come up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
from Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).
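Here is a rough, runnable version of the same idea using SQLite through Python's sqlite3 module. SQLite has no TOP, so the sketch swaps in LIMIT 1; it also adds a tie-breaker on the zip code, which is my own assumption about how ties should be resolved. The columns are trimmed down and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transaction_Table (
        Account_Number TEXT,
        Location_ID INTEGER,
        Cost REAL
    );
    CREATE TABLE Location_Table (Location_ID INTEGER, Zip_Code TEXT);
    -- Invented sample data: location 1 lists '10002' twice and '10001' once.
    INSERT INTO Transaction_Table VALUES
        ('A1', 1, 10.0), ('A1', 1, 5.0), ('B2', 2, 7.5);
    INSERT INTO Location_Table VALUES
        (1, '10001'), (1, '10002'), (1, '10002'), (2, '20001');
""")

rows = conn.execute("""
    SELECT tt.Account_Number,
           tt.Location_ID,
           SUM(tt.Cost) AS total_cost,
           (SELECT l.Zip_Code
            FROM Location_Table l
            WHERE l.Location_ID = tt.Location_ID
            GROUP BY l.Zip_Code
            ORDER BY COUNT(*) DESC, l.Zip_Code  -- most frequent first, ties by zip
            LIMIT 1) AS Location_Zip_Code       -- LIMIT 1 stands in for TOP 1
    FROM Transaction_Table tt
    GROUP BY tt.Account_Number, tt.Location_ID
    ORDER BY tt.Account_Number
""").fetchall()

for row in rows:
    print(row)
```

Each account row carries the most common zip code of its location, and no join is needed in the outer query.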