Displaying a distinct column value with three corresponding values from another column - SQL

I have a table employee with columns (state_cd, emp_id, emp_name, ai_cd). How can I display each distinct state_cd with 3 different values from ai_cd?
The result should be:
state_cd  ai_cd
--------  -----
TX        1
          2
          5
CA        9
          10
          11

This type of operation is normally better done in the application. But you can do it in the query if you really want to:
select (case when row_number() over (partition by state_cd order by ai_cd) = 1
             then state_cd
        end) as state_cd,
       ai_cd
from employee e
order by e.state_cd, e.ai_cd;
The order by is very important, because SQL result sets are unordered. Your result requires ordering to make sense.
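The query above shows every ai_cd per state. If you also want at most three ai_cd values per state_cd, as in the expected output, here is a minimal sketch (my assumption, not part of the original answer) that adds a row_number() filter:

select (case when rn = 1 then state_cd end) as state_cd,
       ai_cd
from (
    -- rank rows within each state; swap the inner order by for your
    -- database's random function if you really want a random three per state
    select e.state_cd,
           e.ai_cd,
           row_number() over (partition by e.state_cd order by e.ai_cd) as rn
    from employee e
) t
where rn <= 3
order by t.state_cd, t.ai_cd;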

Just group by state_cd and then count using count(DISTINCT ai_cd):
select state_cd
from (
    select state_cd, count(distinct ai_cd) as cnt
    from employee
    group by state_cd
) t
where cnt = 3;

Use window functions to select the value from a column based on the sum of another column, in an aggregate query

Consider this data (View on DB Fiddle):
id  dept  value
--  ----  -----
1   A     5
1   A     5
1   B     7
1   C     5
2   A     5
2   A     5
2   B     15
2   A     2
The base query I am running is pretty simple. Just get the total value by id and the most frequent dept.
SELECT
    id,
    MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
    SUM(value) AS value
FROM test
GROUP BY id;
id  dept_freq  value
--  ---------  -----
1   A          22
2   A          27
But I also need to get, for each id, the dept that concentrates the greatest value (so the greatest sum of value by id and dept, not the highest individual value in the original table).
Is there any way to use window functions to achieve that and do it directly in the base query above?
The expected output for this particular example would be:
id  dept_freq  dept_value  value
--  ---------  ----------  -----
1   A          A           22
2   A          B           27
I could achieve that with the query below and then join its result with the base query above:
SELECT * FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS row
    FROM (
        SELECT id, dept, SUM(value) AS value
        FROM test
        GROUP BY id, dept
    ) AS alias1
) AS alias2
WHERE alias2.row = 1;
id  dept  value  row
--  ----  -----  ---
1   A     10     1
2   B     15     1
But it is not easy to read/maintain and also seems pretty inefficient. So I thought it should be possible to achieve this using window functions directly in the base query, which might also help Postgres come up with a better query plan that does fewer passes over the data. But none of my attempts using OVER (PARTITION BY ...) and FILTER worked.
Step-by-step demo: db<>fiddle
You can fetch the dept for the highest values using the first_value() window function. Adding this before your mode() grouping should do it:
SELECT
    id,
    highest_value_dept,
    MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
    SUM(value) AS value
FROM (
    SELECT
        id,
        dept,
        value,
        FIRST_VALUE(dept) OVER (PARTITION BY id ORDER BY value DESC) AS highest_value_dept
    FROM test
) s
GROUP BY 1, 2
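Since the question asks for the dept with the greatest sum of value per id (rather than the highest individual value), here is an alternative minimal sketch of my own (not part of the answer above) that makes the per-(id, dept) sums explicit with a CTE and PostgreSQL's DISTINCT ON; the CTE names are just illustration:

WITH dept_totals AS (
    SELECT id, dept, SUM(value) AS dept_value_sum
    FROM test
    GROUP BY id, dept
),
top_dept AS (
    -- one row per id: the dept with the greatest summed value
    SELECT DISTINCT ON (id) id, dept AS dept_value
    FROM dept_totals
    ORDER BY id, dept_value_sum DESC
)
SELECT
    t.id,
    MODE() WITHIN GROUP (ORDER BY t.dept) AS dept_freq,
    td.dept_value,
    SUM(t.value) AS value
FROM test t
JOIN top_dept td ON td.id = t.id
GROUP BY t.id, td.dept_value
ORDER BY t.id;

On the sample data this yields (1, A, A, 22) and (2, A, B, 27), matching the expected output.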

SQL: arrange rows into columns

I have the following SQL query:
SELECT
    PatientMRN,
    OrderDate,
    OrdereName
FROM OrderItem
Some patients (MRN) have many orders, and I would like to have the orders related to each patient in columns rather than duplicate rows of the same MRN with different orders.
So instead of having:
MRN OrdereName
012 Name one
012 Name two
012 Name three
012 Name four
I want to have it as follow:
MRN OrdereName1 OrdereName2 OrdereName3
012 Name one Name two Name three
Unfortunately it is a confidential DB, so I can't attach it here.
If you know in advance the maximum number of orders per patient, you can use window functions and conditional aggregation:
select patientmrn,
       max(case when rn = 1 then orderename end) as orderename1,
       max(case when rn = 2 then orderename end) as orderename2,
       max(case when rn = 3 then orderename end) as orderename3
from (
    select t.*, row_number() over (partition by patientmrn order by orderename) rn
    from mytable t
) t
group by patientmrn
You can expand the select clause with more conditional max() as needed (additional columns show null values when there are no more orders to display).
You control how orders are spread across the columns using the order by clause of the row_number(). Here, orders are ordered by ascending name. If, for example, you wanted to order them by date, you would change the order by clause to order by orderdate, as in the sketch below.
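For instance, here is a sketch of the same query spreading orders by date instead of by name (OrderDate is taken from the question's original SELECT; mytable stands for the same source table as above):

select patientmrn,
       max(case when rn = 1 then orderename end) as orderename1,
       max(case when rn = 2 then orderename end) as orderename2,
       max(case when rn = 3 then orderename end) as orderename3
from (
    -- the earliest order goes to orderename1, the next to orderename2, and so on
    select t.*, row_number() over (partition by patientmrn order by orderdate) rn
    from mytable t
) t
group by patientmrn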

Merge and aggregate two columns SQL

I have a table with id, name and score and I am trying to extract the top scoring users. Each user may have multiple entries, and so I wish to SUM the score, grouped by user.
I have looked into JOIN operations, but they seem to be used when there are two separate tables, not with two 'views' of a single table.
The issue is that if the id field is present, the user will not have a name, and vice-versa.
A minimal example can be found at the following link: http://sqlfiddle.com/#!9/ce0629/11
Essentially, I have the following data:
id name score
--- ----- ------
1 '' 15
4 '' 20
NULL 'paul' 8
NULL 'paul' 11
1 '' 13
4 '' 17
NULL 'simon' 9
NULL 'simon' 12
What I want to end up with is:
id/name score
-------- ------
4 37
1 28
'simon' 21
'paul' 19
I can group by id easily, but it treats the NULLs as a single field, when really they are two separate users.
SELECT id, SUM(score) AS total FROM posts GROUP BY id ORDER BY total DESC;
id score
--- ------
NULL 40
4 37
1 28
Thanks in advance.
UPDATE
The target environment for this query is in Hive. Below is the query and output looking only at the id field:
hive> SELECT SUM(score) as total, id FROM posts WHERE id is not NULL GROUP BY id ORDER BY total DESC LIMIT 10;
...
OK
29735 87234
20619 9951
20030 4883
19314 6068
17386 89904
13633 51816
13563 49153
13386 95592
12624 63051
12530 39677
Running the query below gives the exact same output:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by coalesce(id, name) order by total desc limit 10;
Running the following query using the new calculated column name idname gives an error:
hive> select coalesce(id, name) as idname, sum(score) as total from posts group by idname order by total desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:83 Invalid table alias or column reference 'idname': (possible column names are: score, id, name)
Your id looks numeric. In some databases, using coalesce() on a numeric and a string can be a problem. In any case, I would suggest being explicit about the types:
select coalesce(cast(id as varchar(255)), name) as id_name,
       sum(score) as total
from posts
group by id_name
order by total desc;
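Note that the error you saw comes from referencing the select-list alias in GROUP BY, which Hive (your environment) does not accept. A sketch for Hive repeats the expression instead (CAST ... AS STRING is an assumption about how the numeric id should be converted):

SELECT COALESCE(CAST(id AS STRING), name) AS id_name,
       SUM(score) AS total
FROM posts
-- repeat the expression: Hive cannot reference the id_name alias here
GROUP BY COALESCE(CAST(id AS STRING), name)
ORDER BY total DESC
LIMIT 10;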
SELECT new_id, SUM(score) AS total
FROM (SELECT coalesce(id, name) AS new_id, score FROM posts) o
GROUP BY new_id
ORDER BY total DESC;
You could use a COALESCE to get the non-NULL value of either column:
SELECT
    COALESCE(id, name) AS id,
    SUM(score) AS total
FROM posts
GROUP BY COALESCE(id, name)
ORDER BY total DESC;

SQL Separating Distinct Values using single column

Does anyone happen to know a way of basically applying the DISTINCT command to only a single column? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still use the other columns' information. I would use DISTINCT on the full query, but the rows are all different because of certain columns' data. And I would need to output only the topmost Term between the duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, t.Term
from (
    select ID, Name, min(Term) as Term
    from Table
    group by ID, Name
) t
order by t.ID, t.Term
You can use row_number for this:
Select ID, Name, Term from (
    Select ID, Name, Term,
           ROW_NUMBER() OVER (PARTITION BY ID order by Term) as rn
    from Table
) as tbl
Where rn = 1
Order by determines the order from which the first row will be picked.

Find duplicate records in a table using SQL Server

I am validating a table that holds transaction-level data from an eCommerce site, and I want to find the exact errors.
I need your help to find duplicate records in a 50-column table in SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername, count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
will return me
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (
    select *, rn = row_number() over (partition by OrderNo, item order by OrderNo)
    from #temp1
)
select * from x
where rn > 1
You can remove the duplicates by replacing the select statement with:
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of duplicated records, use the following query:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as different rows can produce the same checksum (false positives).
Try this
with T1 AS
(
    SELECT LASTNAME, COUNT(1) AS 'COUNT'
    FROM Employees
    GROUP BY LastName
    HAVING COUNT(1) > 1
)
SELECT E.*, T1.[COUNT]
FROM Employees E
INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
    select shoppername, count(shoppername) as cnt
    from sales
    group by shoppername
    having count(shoppername) > 1
)
select t.*
from x
join sales t on x.shoppername = t.shoppername
order by t.shoppername
First of all, I suspect that the expected result is not accurate: there seem to be three 'Sam' rows in the original table. But that is not critical to the question.
Now for the question itself. Based on your table, the best way to show duplicate values is to use count(*) with a GROUP BY clause. The query would look like this:
SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
The reason is that all columns together uniquely identify each record, which means the records are considered duplicates only when the values in every column are exactly the same. You also want to show all fields for the duplicate records, so the GROUP BY must not miss any column; otherwise you could only select the columns that participate in the GROUP BY clause.
Now I would like to give an example of WITH ... ROW_NUMBER() OVER (...), which uses a common table expression together with the ROW_NUMBER function.
Suppose you have a nearly identical table but with one extra column called Shipping Date, whose value may differ even when the rest of the columns are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row #2 is not a duplicate if you still take all columns as a unit. But what if you want to treat it as a duplicate as well in this case? You should use WITH ... ROW_NUMBER() OVER (...), and the query would look like this:
WITH TABLEEXPRESSION AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]  -- the row with the later shipping date is treated as the duplicate
           ) AS Identifier
    FROM dbo.sales
)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier != 1  -- or use '> 1'
The above query will give result together with Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note this row is different from the one with 2016-01-01. The reason the 2016-02-02 row is the one flagged is the PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] clause behind the Identifier column: Shipping Date is NOT one of the columns that has to match for records to count as duplicates, which means the row with 2016-02-02 can still be a perfectly good result for your question.
To summarize a little: using count(*) together with a GROUP BY clause is the best choice when you only want to show the columns from the GROUP BY clause in the result; otherwise you will miss the columns that do not participate in the GROUP BY.
WITH ... ROW_NUMBER() OVER (...), on the other hand, is suitable in every scenario where you want to find duplicate records; however, it is a little more complicated to write and a bit over-engineered compared to the former.
If your purpose is to delete the duplicate records from the table, you have to use the latter WITH ... ROW_NUMBER() OVER (...) ... DELETE ... WHERE pattern, as sketched below.
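A minimal sketch of that delete pattern (SQL Server syntax, using the columns from the Shipping Date example above; the Dups name is just an illustration, and the first row of each duplicate group is kept):

WITH Dups AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]
           ) AS Identifier
    FROM dbo.sales
)
DELETE FROM Dups
WHERE Identifier > 1;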
Hope this helps!
You can use the methods below to find the output:
with Ctec AS
(
    select *, Row_Number() over (partition by name order by Name) as Rnk
    from Table_A
)
select Name from Ctec
where Rnk > 1
select name from Table_A
group by name
having count(*)>1
Select shoppername, count(Item) as cnt
from dbo.sales
group by shoppername
having count(Item) > 1
Select EventID, count(*) as cnt
from dbo.EventInstances
group by EventID
having count(*) > 1
The following is working code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )