Selecting unique records from database - sql

Running this query,
select * from table;
Returns the following
|branch | number |
-------------------
| 1 | 123 |
| 1 | 001 |
| 2 | 123 |
| 3 | 123 |
| 4 | 123 |
| 1 | 123 |
| 1 | 789 |
| 2 | 123 |
| 3 | 123 |
| 4 | 009 |
I want to find values that are unique to ONLY branch 1
| 1 | 001 |
| 1 | 789 |
Can this be done without the data being stored in separate tables? I've tried a few "select distinct" queries & don't seem to get the results I'm expecting.

SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number

If you do not need any aggregates, you can use distinct instead of group by:
select distinct branch
, number
from YourTable
where branch = 1

I guess what I'm trying to say is that I want to find all numbers that are unique to ONLY branch 1. If they are found in any other branch, I don't want to see them.
I guess this is what you want.
SELECT distinct number
FROM MyTable
WHERE branch=1 and number not in
( SELECT distinct number
FROM MyTable
WHERE branch != 1 )

Try this:
SELECT branch, number
FROM table
GROUP BY branch, number
Here is a SQLFiddle for you to have a look at
If you want to limit it to only branch 1, then just add a where clause.
SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number

To select all values that are unique in column number and have a branch value of 1 you can use the following code:
SELECT branch, number
FROM table1
WHERE number IN (
SELECT number
FROM table1
GROUP BY number
HAVING (COUNT(number ) = 1)
)
AND branch = 1
For a demo see http://sqlfiddle.com/#!2/97145/62

Related

Count redundant for each value in a column in SQL server

If I have the following table in a SQL Server 2019 database as follows:
|id | name | count |
+-----+--------+--------+
| 1 | rose | 1 |
| 2 | peter | 1 |
| 3 | ann | 1 |
| 4 | rose | 2 |
| 5 | ann | 2 |
| 6 | ann | 3 |
| 7 | mike | 1 |
I would like to find out if an inserted name already exists in the column "name" and how many times and right a count next to it as shown in "count" column. For example when ann was inserted the second time I put count value bext to it which is 2, and ann was inserted the third time I put 3 next to it.
How to do that using SQL?
Thank you
Here is one approach using the insert ... select syntax:
insert into mytable (name, cnt)
select v.name, (select count(*) + 1 from mytable t where t.name = v.name)
from (values (#name)) v(name)
The value to insert is given as #name in derived table values(). Then we use a subquery to count how many such names already exist in the table.
I would suggest that you calculate this information when you query the table, rather than when you insert rows:
select t.*,
row_number() over (partition by name order by id) as count
from t;
The value will always be correct -- even when the data is updated or rows are deleted.
Select Name, Count(*)
From Table
Group By Name

Find number of rows identical one some, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
This is how I understand your question. This assume there are only two task for the same combination.
SQL DEMO
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I add more column to make clear what are the same elements:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

Rows that have same value in a column, sum all values in another column and display 1 row

Example Table user:
ID | USER_ID | SCORE |
1 | 555 | 50 |
2 | 555 | 10 |
3 | 555 | 20 |
4 | 123 | 5 |
5 | 123 | 5 |
6 | 999 | 30 |
The result set should be like
ID | USER_ID | SCORE | COUNT |
1 | 555 | 80 | 3 |
2 | 123 | 10 | 2 |
3 | 999 | 30 | 1 |
Is it possible to generate a sql that can return the table above, so far I can only count the rows where certain user_id appear, but don't know how to sum and show for every user ?
You've included a column called "ID" in both the source data and desired results, but I'm going to assume that these ID values are not related and simply represent the row or line number - otherwise the question doesn't make sense.
In which case, you can simply use:
SELECT
USER_ID,
SUM(SCORE) AS SCORE,
COUNT(USER_ID) AS COUNT
FROM
<Table>
GROUP BY
USER_ID
If you really want to generate the ID column as well, then how you do this depends on the database platform being used. For example on Oracle you could use the ROWNUM pseudocolumn, on SQL Server you will need to use ROW_NUMBER() function (which also works for Oracle).
SELECT ID
,sum(SCORE)
,count(USER_ID)
FROM Table
GROUP BY
ID
I think COUNT is the number of scores per user_id, if so, then your sql request should be :
SELECT
ID,
USER_ID,
SUM(SCORE)AS SCORE,
COUNT(SCORE)AS COUNT
FROM
TABLE
GROUP BY
USER_ID

SQL : Getting duplicate rows along with other variables

I am working on Terradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need but I want it to take it just a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line not 5. For this line, I would like to select all the fields except for the price field where the cust_part = "GROUPINVC". For the total field I would like it to be 'sum(total) as new_total' and for the price, I would like it to be new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all; then filter out the rows you are not interested in in the outermost query.