SQL selecting where all distinct values exist in another column - sql

I have a table in which the first two rows are Company, Year. Each company will have some years, but not necessarily all of them:
ABC | 2010
ABC | 2011
ABC | 2012
BBC | 2011 //does not have all the years, don't want to select it
I'd like to select a list of companies which have ALL the years (not just some of them), but I'm having trouble writing a select query to do that. I imagine this is really easy but I can't figure it out for some reason.

try
select company
from your_table
group by company
having count(distinct year) = (select count(distinct year) from your_table)

Select * FROM Company Where CompanyId In(
select CompanyId From Company
group by CompanyId
having count(*) = (select count(distinct Year) from Company)
)
http://www.sqlfiddle.com/#!3/c2f81/11
Note that if you alread know how many years there should be, then obviously you would just say that number instead of doing a select distinct year.

SELECT Company
FROM Table
GROUP BY Company
HAVING COUNT(Distinct YEAR) = 3

Related

Checking conditions per group, and ranking most recent row?

I'm handling a table like so:
Name
Status
Date
Alfred
1
Jan 1 2023
Alfred
2
Jan 2 2023
Alfred
3
Jan 2 2023
Alfred
4
Jan 3 2023
Bob
1
Jan 1 2023
Bob
3
Jan 2 2023
Carl
1
Jan 5 2023
Dan
1
Jan 8 2023
Dan
2
Jan 9 2023
I'm trying to setup a query so I can handle the following:
I'd like to pull the most recent status per Name,
SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name
Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2, regardless of if the most recent one is 2 or not
WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )
And then maybe joining the above on a left join on Name?
But having these as two seperate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in one singular query, or is this the most effecient way?
You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.
You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.
Join 2 Queries
Select A.*,Coalesce(B.Ever2,"No") as Ever2
From (
Select * From testable x
Where date=(Select max(date) From test_table y
Where x.name=y.name)
) A Left Outer Join (
Select name,"Yes" as Ever2 From test_table
Where status=2
Group By name
) B On A.name=B.name
The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.
use of Window Functions
Select * From (
Select row_number() Over (Partition by name, order by date desc, status desc) as bestrow,
A.*,
Coalesce(max(Case When status=2 Then "Yes" End) Over (Partition By name Rows Unbounded Preceding And Unbounded Following), "No") as Ever2
From test_table A
)
Where bestrow=1
This second query type always reads and sorts the entire test_table so it might not be the most efficient.
Given that you have a different partitioning on the two aggregations, you could try going with window functions instead:
SELECT DISTINCT Name,
MAX(Date) OVER(
PARTITION BY Name, Status
) AS lastdate,
MAX(CASE WHEN Status = 2 THEN 1 ELSE 0 END) OVER(
PARTITION BY Name
) AS status2
FROM tab
I'd like to pull the most recent status per name […] Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2.
Snowflake has sophisticated aggregate functions.
Using group by, we can get the latest status with arrays and check for a given status with boolean aggregation:
select name, max(date) max_date,
get(array_agg(status) within group (order by date desc), 0) last_status,
boolor_agg(status = 2) has_status2
from mytable
group by name
We could also use window functions and qualify:
select name, date as max_date,
status as last_status,
boolor_agg(status = 2) over(partition by name) has_status2
from mytable
qualify rank() over(order by name order by date desc) = 1

Show duplicate rows(all columns of that row) where all columns are duplicate except one column

In below table, I need to select duplicate records where all columns are duplicate except Customer Type and Price for a particular week.
For e.g
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
2 John Motor Consumer 200
3 John Motor Consumer 200
3 John Motor Reseller 201
I am using below query but this query doesn't show me both costumer type, it just shows me consumer count(*) for a combination.
select Week, Customer, product, count(distinct Customer Type)
from table
group by Week, Customer, product
having count(distinct Customer Type) > 1
I would like to see below result, that shows me duplicate values and not just the count(*) of duplicate row. I am trying to see customers assigned to multiple customer types in a particular week for a product and at the same time show me all columns. It doesn't matter if the price is different.
Week Customer Product Customer Type Price
1 Alex Cycle Consumer 100
1 Alex Cycle Reseller 101
3 John Motor Consumer 200
3 John Motor Reseller 201
Thanks
Shaki
WITH CustomerDistribution_CTE (WeekC ,CustomerC, ProductC)
AS
(
select Week, Customer, product
from Your_Table_Name group by Week, Customer,
product having count(distinct CustomerType) > 1
)
SELECT Y.*
FROM CustomerDistribution_CTE C
inner join Your_Table_Name Y on C.WeekC =Y.Week
and C.CustomerC =Y.Customer and C.productC =Y.product
Note :Please replace "Your_Table_Name" with exact table name and Try.
One way to achieve this, using generic SQL, is to use a "derived table" like this:
select x.*
from tablex x
inner join (
select Week, Customer, Product
from tablex
group by Week, Customer, Product
having count(*) > 1
) d on x.Week = d.Week and x.Customer = d.Customer and x.Product = d.Product
You can do that by using DISTINCT like
select DISTINCT Customer,Product,Customer_Type,Price from Your_Table_Name
will look for DISTINCT combination.
Note: This query if of SQL Server
From the expected result that you have pasted, it looks like you are not concerned about the week.
If you have a ID (incremental PK), it would be much simpler like below
select * from table where ID not in
(select max(ID) from table group by Customer, Product, CustomerType having count(*) > 1 )
This is tested on MySQL. Do you have a ID column?
In case you don't have a ID column, try the below:
select max(week) week, Customer, Product, CustomerType, max(price) from device group by Customer, Product, CustomerType;
I have not verified this one.
This will return your expected result set:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify
count(*)
over (partition by Week, Customer, product) > 1
For other DBMSes you will need to nest your query:
select *
from
(
select ...,
count(*)
over (partition by Week, Customer, product) as cnt
from table
) as dt
where cnt > 1
Edit:
After re-reading your description above Select might be not exactly what you want, because it will also return rows with a single type. Then switch to:
select *
from table
-- Teradata syntax to filter the result of an OLAP-function
-- (similar to HAVING after GROUP BY)
qualify -- at least two different types:
min(Customer_Type) over (partition by Week, Customer, product)
<> max(Customer_Type) over (partition by Week, Customer, product)

How can I create a row for values that don't exist and fill the count with 0 values?

In SQL Server, I'm running a query on users age groups on data where, for some years, there are zero users per age group. For example there were users in 2013 in the "18-21" age group, so the query returns the next age group, "22-25", as the first row because there were no entries containing "18-21." Instead, I would like to return a row that contains 18-21, but has 0 as the value for number of users.
Currently, I have:
SELECT YEAR, AGE_GROUP, SUM(USERS) as usercount,
FROM USERS
WHERE YEAR = '2013'
AND PRIMARY_GROUP = 'NT'
GROUP BY YEAR, AGE_GROUP
This returns:
YEAR AGE_GROUP usercount
2014 22-25 200
2014 25-28 10
I want it to return:
YEAR AGE_GROUP usercount
2014 18-21 0
2014 22-25 200
2014 25-28 10
How can I create a row for specific values that don't exist and fill the count with 0 values?
For the record, I DO in fact have a column called 'users' in the users table. Confusing, I know, but it's a stupidly named schema that I took over. The Users table contains data ABOUT my users for reporting. It should probably have been named something like Users_Reporting.
I assume you have another table that contain all rows Age Group.
TABLE NAME: AGEGROUPS
AGE_GROUP
18-21
22-25
25-28
Try this:
SELECT '2014' AS YEAR, AG.AGE_GROUP, COALESCE(TB.usercount, 0) AS usercount
FROM (
SELECT YEAR, AGE_GROUP, SUM(USERS) as usercount,
FROM USERS
WHERE YEAR = '2014'
AND PRIMARY_GROUP = 'NT'
GROUP BY YEAR, AGE_GROUP
) AS TB
RIGHT JOIN AGEGROUPS AG ON TB.AGE_GROUP=AG.AGE_GROUP
You do this by joining with a "fake" list of age group+years:
SELECT AGE_GROUPS.YEAR, AGE_GROUPS.AGE_GROUP, COALESCE(SUM(USERS), 0) as usercount
FROM (
SELECT YEAR, AGE_GROUP
FROM (
SELECT '18-21' AS AGE_GROUP
UNION SELECT '22-25'
UNION SELECT '25-28'
) AGE_GROUPS, (SELECT DISTINCT YEAR FROM USERS) YEARS
) AGE_GROUPS
LEFT JOIN USERS ON (USERS.AGE_GROUP = AGE_GROUPS.AGE_GROUP AND USERS.YEAR = AGE_GROUPS.YEAR)
WHERE AGE_GROUPS.YEAR = '2014'
GROUP BY AGE_GROUPS.YEAR, AGE_GROUPS.AGE_GROUP
You can also simplify this, assuming that your USERS table has all possible age groups ignoring a specific year:
SELECT AGE_GROUPS.YEAR, AGE_GROUPS.AGE_GROUP, COALESCE(SUM(USERS), 0) as usercount
FROM (
SELECT YEAR, AGE_GROUP
FROM (SELECT DISTINCT AGE_GROUP FROM USERS) AGE_GROUPS, (SELECT DISTINCT YEAR FROM USERS) YEARS
) AGE_GROUPS
LEFT JOIN USERS ON (USERS.AGE_GROUP = AGE_GROUPS.AGE_GROUP AND USERS.YEAR = AGE_GROUPS.YEAR)
WHERE AGE_GROUPS.YEAR = '2014'
GROUP BY AGE_GROUPS.YEAR, AGE_GROUPS.AGE_GROUP
Assuming that all the years and age groups are in the table (some row for the table), then you can do this just with this table. The idea is to generate all the rows using a cross join and then use a left join to bring in the values you want:
select y.year, ag.age_group, count(u.year) as usercount
from (select 2013 as year) y cross join
(select distinct age_group from users) ag left join
users u
on u.year = y.year and u.age_group = ag.age_group and
u.primary_group = 'NT'
group by y.year, ag.age_group;
I don't know what sum(users) is supposed to be. If you do indeed have a column users.users, then use it with sum(). It looks like you really want count().
Try This...
SELECT YEAR, AGE_GROUP, isnull(Sum(USERS),0) as usercount,
FROM USERS
WHERE YEAR = '2013'
AND PRIMARY_GROUP = 'NT'
GROUP BY YEAR, AGE_GROUP

Find duplicate records in a table using SQL Server

I am validating a table which has a transaction level data of an eCommerce site and find the exact errors.
I want your help to find duplicate records in a 50 column table on SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
having count(*) > 1
group by shoppername
will return me
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
you can remove duplicates by replacing select statement by
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of multiple records use following command
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as there can be duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername,count(shoppername)
from sales
having count(shoppername)>1
group by shoppername)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I doubt that the result it not accurate? Seem like there are Three 'Sam' from the original table. But it is not critical to the question.
Then here we come for the question itself. Based on your table, the best way to show duplicate value is to use count(*) and Group by clause. The query would look like this
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1
The reason is that all columns together from your table uniquely identified each record, which means the records will be considered as duplicate only when all values from each column are exactly the same, also you want to show all fields for duplicate records, so the group by will not miss any column, otherwise yes because you can only select columns that participate in the 'group by' clause.
Now I would like to give you any example for With...Row_Number()Over(...), which is using table expression together with Row_Number function.
Suppose you have a nearly same table but with one extra column called Shipping Date, and the value may change even the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row# 2 is not a duplicate one if you still take all columns as a unit. But what if you want to treat them as duplicate as well in this case? You should use With...Row_Number()Over(...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier) --if you consider the one with late shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The above query will give result together with Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note this one is different from the one with 2016-01-01, and the reason why 2016-02-02 has been filtered out is PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier, and Shipping Date is NOT one of the column that need to be took care of for duplicate records, which means the one with 2016-02-02 still could be a perfect result for your question.
Now summarize it little bit, using count(*) and Group by clause together is the best choice when you only want to show all columns from Group byclause as the result, otherwise you will miss the columns that do not participate in group by.
While For With...Row_Number()Over(...), it is suitable in every scenario that you want to find duplicate records, however, it is little bit complicated to write the query and little bit over engineered compared to the former one.
If your purpose is to delete duplicate records from table, you have to use the later WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE one.
Hope this helps!
You can use below methods to find the output
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select *
from dbo.sales
group by shoppername
having(count(Item) > 1)
Select EventID,count() as cnt
from dbo.EventInstances
group by EventID
having count() > 1
The following is running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )

Split column in 3

Note: Tried a couple of the answers below (its in Teradata, so some of the answers are giving me syntax errors everywhere)
I hit a brick wall here.
I want to compare year by year in different columns
ID, Year, Revenue
1, 2009, 10
1, 2009, 20
1, 2010, 20
2, 2009, 5
2, 2010, 50
2, 2010, 1
How do I separate it by both ID and Year?
At the end I would like it to look like this
ID, Year, Sum
1, 2009, 30
1, 2009, 20
...
2, 2010, 51
(heavily edited for comprehension)
The best I can give you with the amount of detail you have provided is to break your table into subqueries:
select t1.yr - t2.yr from
(select yr
from the_table where yr = 2010) t1,
(select yr
from the_table where yr = 2010) t2
More detail could be given if we knew which type of database you are using, what the real structure of your table is, etc. but perhaps this will get you started.
something like this:
select id, t2009.year, t.2010.year, t2010.year-t.2009.year diff
from
( select id, year
from mytable
where year = 2009
) t2009
,
( select id, year
from mytable
where year = 2010
) t2010
You will most likely have to do a self-join
SELECT [what you are comparing] FROM [table] t1
[INNER/LEFT] JOIN [table] t2 ON t1.[someID] = t2.[someID]
WHERE t1.year = 2009 AND t2.year = 2010
In the someID would not necessarily have to be an ID, or even an indexed column, but it should be the column you are looking to compare across the years.
E.g. a table called 'Products' with columns/fields
ID
ProductName
Price
Year
You could do:
SELECT t1.ProductName, (t2.Price - t1.Price) As Price_change FROM Products t1
INNER JOIN Products t2 ON t1.ProductName = t2.ProductName
WHERE t1.year = 2009 AND t2.year = 2010
This would be faster is ProductName was a primary key or an indexed column. This would also be faster than using nested selects which are much much slower than joins (when joining on an index).
By your data and your desired output, I think you simply want this:
select ID, Year, SUM(Revenue)
from YourTable
GROUP BY ID, Year
Update
Now, if your first data sample is already a SELECT query, you need to:
select ID, Year, SUM(Revenue)
from (SELECT...) YourSelect
GROUP BY ID, Year
This looks like a good candidate for the ROLLUP command. It will give you automatic sums for the grouped-by columns:
GROUP BY ROLLUP (ID,Year)
More info here.