join and group by in SQL - sql

I have two tables that I was going to join, but I understand it's more efficient to use CREATE VIEW. This is what I have:
CREATE OR REPLACE VIEW view0_joinedTablesGrouped
AS
Select table1.*,table2.*
FROM table1
inner join table2 on table1.col =
table2.matchingcol
group by table2.matchingcol;
which causes the following error:
ERROR: column "table1.col" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 3: Select table.*,table2.*

Group By cannot do what you are trying to do.
Consider a simple table:
Name Age
-------
Ann 10
Bill 10
Chris 11
If you try to group by age with:
Select * from Table group by Age
What, exactly, do you expect to appear in the Name column for Age=10? Ann, or Bill or both or neither or ....? There is no good answer.
So, when you group by, every column in the output has to be an aggregate – that means a function of every row in the group.
So these are valid:
Select Age, Count(*) from Table group by Age
Select Age, Max( Length(Name)) from Table group by Age
Select Age, Max(Name) from Table group by Age
But this is impossible to do, and isn't valid:
Select Age,Name from Table group by Age
So your select * is the problem -- you can't just select column values because when you group by there's a whole group of column values for every output row, and you can't stuff all those values into one column of one row.
As for using a view, #systemjack's comment is correct.

Related

How to query how many different IDs use the same column value?

I have this homework assignment where I'm attempting to query a table to find the id numbers that are all using the same column value, let's say last name in this case. I'd like to find the ids that use the same last name more than once, and have a column that tells me the total number of unique IDs that used that same last name.
SELECT id, COUNT(*) as ID_count
FROM [table]
WHERE l_name IN
(
SELECT l_name
FROM [table]
GROUP BY l_name HAVING COUNT(*)>1
)
GROUP BY id;
This is what I have so far. It grants me the ID number, but the count(*) is not what I'm going for. What I'm instead trying to get is how many unique IDs have "Smith" as their last name, instead of all the occurrences of one specific ID that has used "Smith".
I've tried different things but I feel like I'm at a roadblock. Any hints or tips are nice; I don't need this problem solved 100%, but I feel as if I can't past the idea of using count(*).
Thanks all.
It sounds like you were already there WITHIN the inner query. Just add the count to it for the output.
SELECT
t1.id,
t1.l_name,
max( PQ.UniqCount ) UniqCount,
COUNT(*) as countForThisSingleID
FROM
[table] t1
JOIN
( SELECT
t.l_name,
COUNT( DISTINCT t.ID ) as UniqCount
FROM
[table] t
GROUP BY
t.l_name
HAVING
COUNT( DISTINCT t.ID ) > 1 ) PQ
on t1.l_name = PQ.l_name
group by
t1.id,
t1.l_name
order by
t1.l_name,
t1.id
So by doing a COUNT( DISTINCT ) on the inner pre-query (alias PQ), for each L_Name, you are getting a count of distinct IDs. I dont know if your [table] has multiple entries for the same ID in it or not, so applying the DISTINCT. Same for the HAVING clause. But at least now the inner pre-query gets the overall distinct counts for a given L_Name value.
Now, doing a JOIN to the outer table on that L_Name will get the corresponding count in the result query, along with showing the l_name that it qualified against. So if you have a table with 18 DISTINCT ID instances of John, 37 of Karen, 11 of Mike, your inner query will get those. Now joined to the outer, you will get the output of EACH instance of John and their corresponding IDs, then all Karen instance and Mike instances.
The count for the outer query is getting the count of the one ID (and name) times that it appears in the table. So if the table had ID = 5, L_Name = John and ID 5 appeared 3 times in the table, the output of his record might look like
ID L_Name countForThisSingleID UniqCount
5 John 3 18
72 John 8 18
127 John 2 18
etc...
Similarly the output would include all Karen's and Mike's within the table (and any others that qualify).
Again, without knowing if your [table] is a unique instance per ID such as a master customer lookup table where it would only appear once vs an order table where the ID may appear more than once for a single person's ID, not positive what your final answer is looking for.
But I think I have given you a bunch to chew on and run with.

Filter by number of occurrences in a SQL Table

Given the following table where the Name value might be repeated in multiple rows:
How can we determine how many times a Name value exists in the table and can we filter on names that have a specific number of occurrances.
For instance, how can I filter this table to show only names that appear twice?
You can use group by and having to exhibit names that appear twice in the table:
select name, count(*) cnt
from mytable
group by name
having count(*) = 2
Then if you want the overall count of names that appear twice, you can add another level of aggregation:
select count(*) cnt
from (
select name
from mytable
group by name
having count(*) = 2
) t
It sounds like you're looking for a histogram of the frequency of name counts. Something like this
with counts_cte(name, cnt) as (
select name, count(*)
from mytable
group by name)
select cnt, count(*) num_names
from counts_cte
group by cnt
order by 2 desc;
You need to use a GROUP BY clause to find counts of name repeated as
select name, count(*) AS Repeated
from Your_Table_Name
group by name;
If You want to show only those Which are repeated more than one times. Then use the below query which will show those occurrences which are there more than one times.
select name, count(*) AS Repeated
from Your_Table_Name
group by name having count(*) > 1;

Check if tables are identical using SQL in Oracle

I was asked this question during an interview for a Junior Oracle Developer position, the interviewer admitted it was a tough one:
Write a query/queries to check if the table 'employees_hist' is an exact copy of the table 'employees'. Any ideas how to go about this?
EDIT: Consider that tables can have duplicate records so a simple MINUS will not work in this case.
EXAMPLE
EMPLOYEES
NAME
--------
Jack Crack
Jack Crack
Jill Hill
These two would not be identical.
EMPLOYEES_HIST
NAME
--------
Jack Crack
Jill Hill
Jill Hill
If the tables have the same columns, you can use this; this will return no rows if the rows in both tables are identical:
(
select * from test_data_01
minus
select * from test_data_02
)
union
(
select * from test_data_02
minus
select * from test_data_01
);
Identical regarding what? Metadata or the actual table data too?
Anyway, use MINUS.
select * from table_1
MINUS
select * from table_2
So, if the two tables are really identical, i.e. the metadata and the actual data, it would return no rows. Else, it would prove that the data is different.
If, you receive an error, it would mean the metadata itself is different.
Update If the data is not same, and that one of the table has duplicates.
Just select the unique records from one of the table, and simply apply MINUS against the other table.
One possible solution, which caters for duplicates, is to create a subquery which does a UNION on the two tables, and includes the number of duplicates contained within each table by grouping on all the columns. The outer query can then group on all the columns, including the row count column. If the table match, there should be no rows returned:
create table employees (name varchar2(100));
create table employees_hist (name varchar2(100));
insert into employees values ('Jack Crack');
insert into employees values ('Jack Crack');
insert into employees values ('Jill Hill');
insert into employees_hist values ('Jack Crack');
insert into employees_hist values ('Jill Hill');
insert into employees_hist values ('Jill Hill');
with both_tables as
(select name, count(*) as row_count
from employees
group by name
union all
select name, count(*) as row_count
from employees_hist
group by name)
select name, row_count from both_tables
group by name, row_count having count(*) <> 2;
gives you:
Name Row_count
Jack Crack 1
Jack Crack 2
Jill Hill 1
Jill Hill 2
This tells you that both names appear once in one table and twice in the other, and therefore the tables don't match.
select name, count(*) n from EMPLOYEES group by name
minus
select name, count(*) n from EMPLOYEES_HIST group by name
union all (
select name, count(*) n from EMPLOYEES_HIST group by name
minus
select name, count(*) n from EMPLOYEES group by name)
You could merge the two tables and then subtract one of the tables from the result. If the result of the subtraction is an empty table then you know that the the tables must be the same since merge had no effect (every row and column were effectively the same)
How do I merge two tables with different column number while removing duplicates?
That link provides a good way to merge the two tables without duplicates without knowing what the columns are.
Ensure the rows are unique by adding a pseudo column
WITH t1 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees)
, t2 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees_hist)
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2
UNION ALL
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2)
Use row_number to make sure there are no duplicate rows. Now you can use minus and if there are no results, the tables are identical.
SELECT ROW_NUMBER() OVER (Order By Name), *
FROM tab1
MINUS
SELECT ROW_NUMBER() OVER (Order By Name), *
FROM tab2

help with query in DB2

i would like your help with my query.I have a table employee.details with the following columns:
branch_name, firstname,lastname, age_float.
I want this query to list all the distinct values of the age_float
attribute, one in each row of the result table, and beside each in the second field show the
number of people in the details table who had ages less than or equal to that value.
Any ideas? Thank you!
You can use OLAP functions:
SELECT DISTINCT age_float,
COUNT(lastname) OVER(ORDER BY age_float) AS number
FROM employee_details
COUNT(lastname) OVER(ORDER BY age_float) AS number orders rows by age, and returns employees count whose age <= current row age
or a simple join:
SELECT A.age_float, count(lastname)
FROM (SELECT DISTINCT age_float FROM employee_details) A
JOIN employee_details AS ED ON ED.age_float <= A.age_float
GROUP BY A.age_float

How can I select an entire row without selecting rows where a certain column value is duplicated?

How can I select an entire row without selecting rows where a certain column value is duplicated?
EDIT: Sorry my question is a bit vague. Assuming i have a table with two columns: names and score. There are duplicate values in names. I want to select distinct names and also select their scores.
Based on the edited information given, this will use the GROUP_CONCAT function to produce the distinct names and a comma-delimited list of scores. If more appropriate, you could substitute another aggregate function (e.g., MIN, MAX, SUM) for the GROUP_CONCAT.
SELECT Name, GROUP_CONCAT(Score)
FROM YourTable
GROUP BY Name
Using the following example data...
name score
----- -----
James 10
James 12
Lisa 45
John 42
...the following queries should return the third and fourth row.
select name, score
from table
where name in(select name
from table
group by name having count(*) = 1);
...less clear, but probable more efficient on MySQL.
select t1.name, t1.score
from (select name
from table
group by name having count(*) = 1
) t1
join table t2 on(t1.name = t2.name)
Try using the DISTINCT clause on the columns you don't want duplicates of.
based on your latest edit, try this:
select
name, SUM(score) AS TotalScore
from YourTable
group by name