SQL Select: Do rows matching id all have the same column value - sql

I have a table like this
sub_id reference
1 A
1 A
1 A
1 A
1 A
1 A
1 C
2 B
2 B
3 D
3 D
I want to make sure all the references in each group have the same reference.
Meaning, for example, all references in:
group 1 should be A
group 2 should be B
group 3 should be D
If they are not, then I would like to have returned a list of sub_id's.
So for the table above my result would be: 1
Ideally, given these conditions, reference would live in a separate table with sub_id as the PK, but I need to fix this for a massive dataset first, before I can move on to restructuring the database.

You could use the following method:
select t.sub_id
from YourTable t
group by t.sub_id
having max(t.reference) <> min(t.reference)
Change YourTable to suit.

Are you looking for simple aggregation?
select sub_id
from YourTable t
group by sub_id
having count(distinct reference) > 1;

The query you want:
SELECT sub_id
FROM test_sub
GROUP BY sub_id HAVING count(DISTINCT reference) > 1
;
Here is what I used to test it:
CREATE TABLE `test_sub` (
sub_id int(11) NOT NULL,
reference varchar(45) DEFAULT NULL
);
INSERT INTO test_sub (sub_id, reference) VALUES
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'A'),
(1, 'C'),
(2, 'B'),
(2, 'B'),
(3, 'D'),
(3, 'D'),
(3, 'D'),
(4, 'E'),
(4, 'E'),
(4, 'E'),
(5, 'F'),
(5, 'G')
;
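
To compare the two approaches from the answers above, here is a minimal sketch that runs both against the question's sample rows. It assumes an in-memory SQLite database driven from Python's standard sqlite3 module rather than the asker's actual DBMS; the variable names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name a DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sub (sub_id INTEGER NOT NULL, reference TEXT)")

# Sample rows from the question: group 1 has six 'A' rows and one 'C' row.
rows = [(1, "A")] * 6 + [(1, "C"), (2, "B"), (2, "B"), (3, "D"), (3, "D")]
conn.executemany("INSERT INTO test_sub (sub_id, reference) VALUES (?, ?)", rows)

# Approach 1: a group is inconsistent when its smallest and largest values differ.
min_max_ids = [r[0] for r in conn.execute(
    "SELECT sub_id FROM test_sub GROUP BY sub_id "
    "HAVING MAX(reference) <> MIN(reference) ORDER BY sub_id"
)]

# Approach 2: a group is inconsistent when it holds more than one distinct value.
distinct_ids = [r[0] for r in conn.execute(
    "SELECT sub_id FROM test_sub GROUP BY sub_id "
    "HAVING COUNT(DISTINCT reference) > 1 ORDER BY sub_id"
)]

print(min_max_ids, distinct_ids)  # [1] [1]
```

Both queries skip NULL references, because the aggregates ignore NULLs. A group that mixes 'A' with NULL would therefore not be reported by either one.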

Related

Need to get Average count

I would like to get the average count of product = A that a client has. Say the inner select returns 1, 2, 1, 4, 4, 4 for 6 clients.
I would like to see the result as 4, which means the avg product count a client can have is 4.
Can somebody please confirm the following.
E.g
Select avg(count)
From (
Select count(*) as count
From Table1
Where product = 'A'
Group by client)
as counts
Having sample data is important to getting assistance. It's still difficult to determine how your data looks. Let's assume it looks like this:
create table table1 (
client varchar(10),
product varchar(10)
);
insert into table1 values
('xxx', 'A'),
('bbb', 'A'),
('bbb', 'A'),
('ccc', 'A'),
('ddd', 'A'),
('ddd', 'A'),
('ddd', 'A'),
('ddd', 'A'),
('tt', 'A'),
('tt', 'A'),
('tt', 'A'),
('tt', 'A'),
('bdad', 'A'),
('bdad', 'A'),
('bdad', 'A'),
('bdad', 'A');
I don't have access to a DB2 database, but this query works on most DBMSs that support LIMIT. You may need to tweak it to fit DB2, for example by replacing limit 1 with fetch first 1 row only.
select purchased as most_common_value
from (
select client, count(*) as purchased
from table1
where product = 'A'
group by client
)z
group by purchased
order by count(client) desc
limit 1
Output of query is:
most_common_value
4
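
The question's AVG query and the answer's query compute different things, and a quick check makes that visible. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module (not DB2) and rebuilds the answer's sample data from per-client counts; `counts` and `per_client` are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (client TEXT, product TEXT)")

# Per-client row counts matching the question: 1, 2, 1, 4, 4, 4.
counts = {"xxx": 1, "bbb": 2, "ccc": 1, "ddd": 4, "tt": 4, "bdad": 4}
for client, n in counts.items():
    conn.executemany("INSERT INTO table1 VALUES (?, 'A')", [(client,)] * n)

# Inner query shared by both variants: number of product A rows per client.
per_client = (
    "SELECT client, COUNT(*) AS purchased "
    "FROM table1 WHERE product = 'A' GROUP BY client"
)

# The question's approach: arithmetic mean of the per-client counts.
average = conn.execute(f"SELECT AVG(purchased) FROM ({per_client})").fetchone()[0]

# The answer's approach: the count that occurs for the most clients.
most_common = conn.execute(
    f"SELECT purchased FROM ({per_client}) "
    "GROUP BY purchased ORDER BY COUNT(*) DESC LIMIT 1"
).fetchone()[0]

print(round(average, 2), most_common)  # 2.67 4
```

So AVG gives about 2.67 for this data, while the expected result of 4 is the most common per-client count, which is what the answer's query returns.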

SQL count total number of days by customer

I have a table customer with two columns: customer_id, and a date column named order_date that records the dates on which customers purchased a product. Now I want to count on how many days each customer came in and made a purchase. I tried the following but only got an error message saying sum(date) doesn't exist.
select customer_id, sum(order_date)
from customer;
How can I do this correctly?
---- Edit, adding the query to create table:
CREATE TABLE sales (
"customer_id" VARCHAR(1),
"order_date" DATE
);
INSERT INTO sales
("customer_id", "order_date")
VALUES
('A', '2021-01-01'),
('A', '2021-01-01'),
('A', '2021-01-07'),
('A', '2021-01-10'),
('A', '2021-01-11'),
('A', '2021-01-11'),
('B', '2021-01-01'),
('B', '2021-01-02'),
('B', '2021-01-04'),
('B', '2021-01-11'),
('B', '2021-01-16'),
('B', '2021-02-01'),
('C', '2021-01-01'),
('C', '2021-01-01'),
('C', '2021-01-07');
You'll want just this:
SELECT
customer_id,
COUNT( DISTINCT "order_date" ) AS count_days_they_bought_something
FROM
sales
GROUP BY
customer_id
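
To see what COUNT(DISTINCT ...) changes compared with a plain row count, here is a small sketch on the question's sample data. It assumes an in-memory SQLite database through Python's sqlite3 module, with dates stored as ISO text; the question's error message suggests a different DBMS, so treat this only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text here (an assumption made for SQLite).
conn.execute("CREATE TABLE sales (customer_id TEXT, order_date TEXT)")
rows = [
    ("A", "2021-01-01"), ("A", "2021-01-01"), ("A", "2021-01-07"),
    ("A", "2021-01-10"), ("A", "2021-01-11"), ("A", "2021-01-11"),
    ("B", "2021-01-01"), ("B", "2021-01-02"), ("B", "2021-01-04"),
    ("B", "2021-01-11"), ("B", "2021-01-16"), ("B", "2021-02-01"),
    ("C", "2021-01-01"), ("C", "2021-01-01"), ("C", "2021-01-07"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Days with at least one purchase: repeated dates count once per customer.
distinct_days = dict(conn.execute(
    "SELECT customer_id, COUNT(DISTINCT order_date) "
    "FROM sales GROUP BY customer_id ORDER BY customer_id"
))

# Plain row count for comparison: every purchase row counts.
order_rows = dict(conn.execute(
    "SELECT customer_id, COUNT(*) "
    "FROM sales GROUP BY customer_id ORDER BY customer_id"
))

print(distinct_days)  # {'A': 4, 'B': 6, 'C': 2}
print(order_rows)     # {'A': 6, 'B': 6, 'C': 3}
```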

Nested case statement with different conditions in T-SQL

I have below data
CREATE TABLE #EmployeeData
(
EmpID INT,
Designation VARCHAR(100),
Grade CHAR(1)
)
INSERT INTO #EmployeeData (EmpID, Designation, Grade)
VALUES (1, 'TeamLead', 'A'),
(2, 'Manager', 'B'),
(3, 'TeamLead', 'B'),
(4, 'SeniorTeamLead', 'A'),
(5, 'TeamLead', 'C'),
(6, 'Manager', 'C'),
(7, 'TeamLead', 'D'),
(8, 'SeniorTeamLead', 'B')
SELECT Designation,
CASE WHEN COUNT(DISTINCT Grade) > 1 THEN 'MultiGrade' ELSE Grade END
FROM #EmployeeData
GROUP BY Designation
Desired result:
Designation Grade
--------------------------
Manager MultiGrade
TeamLead MultiGrade
SeniorTeamLead A
Note:
If a designation has more than one grade, it is MultiGrade
If it has a single grade, show that particular grade
If the grades are a combination of A and B, the result should be A only
I tried with a query using case but I get this error:
Column '#EmployeeData.Grade' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Can anyone suggest the query to fetch the desired result?
As the error says, you need to aggregate the columns you are not grouping by. So use MAX and MIN (as Jeroen commented).
SELECT Designation
, CASE
WHEN MAX(Grade) = 'B' AND MIN(Grade) = 'A' THEN 'A'
WHEN MAX(Grade) <> MIN(Grade) THEN 'MultiGrade'
ELSE MIN(Grade)
END AS Grade
FROM #EmployeeData
GROUP BY Designation
ORDER BY Designation;
Your real world situation might be more complex, but the same principle applies.
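
The MIN/MAX idea can be checked against the question's sample rows with a short sketch. It assumes an in-memory SQLite database via Python's sqlite3 module instead of SQL Server, so the temp table #EmployeeData becomes a regular table named EmployeeData, and the output column is aliased GradeLabel purely for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeData (EmpID INTEGER, Designation TEXT, Grade TEXT)"
)
conn.executemany("INSERT INTO EmployeeData VALUES (?, ?, ?)", [
    (1, "TeamLead", "A"), (2, "Manager", "B"), (3, "TeamLead", "B"),
    (4, "SeniorTeamLead", "A"), (5, "TeamLead", "C"), (6, "Manager", "C"),
    (7, "TeamLead", "D"), (8, "SeniorTeamLead", "B"),
])

# Every non-grouped column is wrapped in an aggregate, which avoids the
# "not contained in an aggregate function or GROUP BY" error.
result = conn.execute("""
    SELECT Designation,
           CASE
               WHEN MAX(Grade) = 'B' AND MIN(Grade) = 'A' THEN 'A'
               WHEN MAX(Grade) <> MIN(Grade) THEN 'MultiGrade'
               ELSE MIN(Grade)
           END AS GradeLabel
    FROM EmployeeData
    GROUP BY Designation
    ORDER BY Designation
""").fetchall()

for designation, label in result:
    print(designation, label)
# Manager MultiGrade
# SeniorTeamLead A
# TeamLead MultiGrade
```

The first WHEN branch has to come before the general MultiGrade test, otherwise the A and B combination would be labelled MultiGrade as well.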

Running total over duplicate column values and no other columns

I want to compute a running total, but there is no unique column or id column to use in the OVER clause.
CREATE TABLE piv2([name] varchar(5), [no] int);
INSERT INTO piv2
([name], [no])
VALUES
('a', 1),
('a', 2),
('a', 3),
('a', 4),
('b', 1),
('b', 2),
('b', 3);
There are only 2 columns: name, which has duplicate values, and no, on which I want to compute the running total in SQL Server 2017.
expected result:
a 1
a 3
a 6
a 10
b 11
b 13
b 16
Any help?
The following query would generate the output you expect, at least for the exact sample data you did show us:
SELECT
name,
SUM(no) OVER (ORDER BY name, no) AS no_sum
FROM piv2;
If the order you intend to use for the rolling sum is something other than the order given by the name and no columns, then you should reveal that logic along with sample data.
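
Because the table has no unique column, it is worth seeing what happens if two rows tie on (name, no). The sketch below is an illustration under assumptions: it uses an in-memory SQLite database via Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than SQL Server 2017, and the explicit ROWS frame is an addition that does not appear in the answer above. The helper `running_totals` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "name" and "no" are quoted because NO is an SQL keyword.
conn.execute('CREATE TABLE piv2 ("name" TEXT, "no" INTEGER)')
conn.executemany(
    "INSERT INTO piv2 VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 1), ("b", 2), ("b", 3)],
)

def running_totals(frame=""):
    """Return the running totals, optionally with an explicit window frame."""
    sql = (
        'SELECT SUM("no") OVER (ORDER BY "name", "no" ' + frame + ") AS total "
        # All values are positive, so sorting by the total itself gives a
        # deterministic output order even when rows tie on (name, no).
        "FROM piv2 ORDER BY total"
    )
    return [row[0] for row in conn.execute(sql)]

ROWS_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"

original = running_totals()
print(original)  # [1, 3, 6, 10, 11, 13, 16]

# Add a row that ties with an existing one on (name, no).
conn.execute("INSERT INTO piv2 VALUES ('b', 3)")
with_default_frame = running_totals()
with_rows_frame = running_totals(ROWS_FRAME)
print(with_default_frame)  # [1, 3, 6, 10, 11, 13, 19, 19]
print(with_rows_frame)     # [1, 3, 6, 10, 11, 13, 16, 19]
```

With the default frame, tied rows are peers and share one total. The ROWS frame advances one row at a time, although which of the tied rows comes first is still arbitrary.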

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can I select the top count of TABLE1 (the most used departure id and the most used arrival id) and combine it with the respective ICAO code from TABLE2, so that I get the following from the provided example data:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what I try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
To see how SQL Server runs the query, use the toolbar button that includes the actual execution plan before you execute it.
In the graphical execution plan, each icon represents an operation performed by the SQL Server engine, and the arrows represent data flows. The direction of flow is from right to left, so the result is the leftmost icon.
Try this one:
select
(select icao
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select icao
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures
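
Both answers rely on the same idea: one independent "top 1 by count" subquery per column. Here is a minimal sketch of that idea on the question's data. It assumes an in-memory SQLite database via Python's sqlite3 module, so LIMIT 1 stands in for SQL Server's TOP 1; the table names follow the first answer, and the helper `top_icao` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Icao (id INTEGER PRIMARY KEY, icao TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Flight (dep_id INTEGER NOT NULL, arr_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO Icao VALUES (?, ?)", [
    (1, "LPPT"), (2, "LPFR"), (3, "LPMA"), (4, "LPPR"),
    (5, "LLGB"), (6, "LEPA"), (7, "LEMD"),
])
conn.executemany(
    "INSERT INTO Flight VALUES (?, ?)",
    [(1, 2)] + [(6, 2)] * 4 + [(3, 2)] * 3 + [(3, 4)] * 2 + [(3, 6)] * 2,
)

def top_icao(column):
    """Return the ICAO code whose id appears most often in the given column."""
    # column is a fixed identifier chosen in this script, never user input.
    sql = (
        "SELECT I.icao FROM Flight AS F "
        f"JOIN Icao AS I ON I.id = F.{column} "
        "GROUP BY I.icao ORDER BY COUNT(*) DESC LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]

most_arrivals = top_icao("arr_id")
most_departures = top_icao("dep_id")
print(most_arrivals, most_departures)  # LPFR LPMA
```

If two airports tie for the highest count, which one is returned is arbitrary unless a tie-breaker is added to the ORDER BY.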