How to remove duplicate rows in PostgreSQL? [closed] - sql

I would like to remove duplicate entries in PostgreSQL.
There is no unique constraint, so a row should count as a duplicate only when all of its columns match another row.
Say we have a table containing the following rows:
id | name  | age | started_date | Score
---|-------|-----|--------------|------
1  | tom   | 15  | 01/06/2022   | 5
2  | tom   | 15  | 01/06/2022   | 5
3  | henry | 10  | 01/06/2022   | 4
4  | john  | 11  | 01/06/2022   | 6
...
I would like to compare all columns together to identify the duplicate rows.
How can I achieve this in PostgreSQL?

PostgreSQL assigns a ctid pseudo-column to identify the physical location of each row. You could use that to identify different rows with the same values:
-- Create the table
CREATE TABLE my_table (num1 NUMERIC, num2 NUMERIC);
-- Create duplicate data
INSERT INTO my_table VALUES (1, 2);
INSERT INTO my_table VALUES (1, 2);
-- Remove duplicates
DELETE FROM my_table
WHERE ctid IN (SELECT ctid
               FROM (SELECT ctid,
                            ROW_NUMBER() OVER (
                                PARTITION BY num1, num2) AS rn
                     FROM my_table) t
               WHERE rn > 1);
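The same idea can be tried without a PostgreSQL server. The following is only a sketch using Python's built-in sqlite3 module: SQLite has no ctid, so its implicit rowid is used as a stand-in, and the ROW_NUMBER() filter is swapped for a simpler "keep the lowest rowid per group" rule. The extra (3, 4) row is an assumption added to show that non-duplicates survive.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; rowid plays the role of ctid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (num1 NUMERIC, num2 NUMERIC)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 2), (1, 2), (3, 4)],  # (3, 4) is a hypothetical non-duplicate row
)

# Keep the lowest rowid in each (num1, num2) group and delete the rest.
conn.execute(
    """
    DELETE FROM my_table
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM my_table GROUP BY num1, num2
    )
    """
)

remaining = conn.execute(
    "SELECT num1, num2 FROM my_table ORDER BY num1, num2"
).fetchall()
print(remaining)
```

Only one copy of each distinct (num1, num2) pair should be left afterwards.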

Let's say your table has 2 columns. You can identify the duplicates using:
select col1, col2, count(*) as cnt
from table1
group by col1, col2
having count(*) > 1
After this:
1) Insert this result into a temp table
2) Delete the duplicated rows from the main table
3) Insert one copy of each row from the temp table back into the main table
4) Drop the temp table.
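The temp-table steps can be sketched end to end as follows. This is an illustration under assumptions, not the answerer's exact script: it uses Python's sqlite3 module, a hypothetical temp table named dupes, and made-up sample rows. Note that the equality comparison used in the delete step would not match rows whose columns contain NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("tom", 15), ("tom", 15), ("henry", 10), ("john", 11), ("john", 11)],
)

# 1) Store one copy of every duplicated (col1, col2) combination.
conn.execute(
    """
    CREATE TEMP TABLE dupes AS
    SELECT col1, col2
    FROM table1
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    """
)
# 2) Delete every row that belongs to a duplicated combination.
conn.execute(
    """
    DELETE FROM table1
    WHERE EXISTS (
        SELECT 1 FROM dupes d
        WHERE d.col1 = table1.col1 AND d.col2 = table1.col2
    )
    """
)
# 3) Put a single copy of each combination back.
conn.execute("INSERT INTO table1 SELECT col1, col2 FROM dupes")
# 4) Drop the temp table.
conn.execute("DROP TABLE dupes")

rows = conn.execute(
    "SELECT col1, col2 FROM table1 ORDER BY col1"
).fetchall()
print(rows)
```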

Related

How to loop through selected data in SQL Server? [duplicate]

I want to select multiple rows of data from one table and insert them into another.
For example:
Select * from Table1
which would return
id | name
1 | Chad
2 | Mary
3 | Denise
I want to add these rows of data to Table2
Insert(id, name)
values(@id, @name)
Thank you!
INSERT INTO Table2(id,name)
SELECT t.id,t.name
FROM Table1 t
;
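To see the INSERT ... SELECT form copy every row in one statement, with no loop, here is a small sketch. It assumes Python's sqlite3 module as a stand-in for SQL Server and recreates the question's sample data; the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Table2 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "Chad"), (2, "Mary"), (3, "Denise")],
)

# One set-based statement copies all selected rows; no row-by-row loop needed.
conn.execute(
    "INSERT INTO Table2 (id, name) SELECT t.id, t.name FROM Table1 t"
)

copied = conn.execute("SELECT id, name FROM Table2 ORDER BY id").fetchall()
print(copied)
```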

sql query get Data on pre conditions [closed]

I need the list of fruits whose daily price is greater than the price in tableA. This is tableA:
ID | fruit | Price
----------------------------
1 | apple | 10
2 | banana| 7
3 | grapes| 6
Then I have a daily table like the one below:
ID | fruit | Price
----------------------------
1 | apple | 9
2 | banana| 5
3 | grapes| 9
4 | mango | 15
In this case I should get only grapes.
I think you can just join the daily and tableA tables on the fruit's ID, and then compare prices.
SELECT t1.*
FROM daily t1
INNER JOIN tableA t2
ON t1.ID = t2.ID
WHERE t1.price > t2.price
Note that we join on the ID rather than the fruit name, since in theory names may not be completely unique across a very large table of fruits.
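As a quick check of the join against the question's sample data, here is a sketch using Python's sqlite3 module (an assumption; the question does not name a database engine, and the column types are guesses).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ID INTEGER, fruit TEXT, Price INTEGER)")
conn.execute("CREATE TABLE daily (ID INTEGER, fruit TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(1, "apple", 10), (2, "banana", 7), (3, "grapes", 6)],
)
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [(1, "apple", 9), (2, "banana", 5), (3, "grapes", 9), (4, "mango", 15)],
)

# Inner join on ID, then keep rows where the daily price is higher.
# mango has no match in tableA, so the inner join drops it.
higher = conn.execute(
    """
    SELECT t1.ID, t1.fruit, t1.Price
    FROM daily t1
    INNER JOIN tableA t2 ON t1.ID = t2.ID
    WHERE t1.Price > t2.Price
    """
).fetchall()
print(higher)
```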
Just join by ID and add your additional condition (the price in dailyTable is greater than the price in TableA).
You don't need to join on the fruit column as well, but doing so won't change your result set.
SELECT TableA.*, dailyTable.Price
FROM TableA
INNER JOIN dailyTable
ON TableA.ID = dailyTable.ID
AND dailyTable.Price > TableA.Price
The fruit column is redundant data, so you shouldn't store it in the daily table.

Get row values as new columns [duplicate]

I was wondering if it would be possible to get all values of rows with the same ID and present them as new columns, via a query.
For example, if I have the following table:
ID | VALUE
1 | a
1 | b
1 | c
2 | a
2 | b
[...]
I want to present it as:
ID | VALUE1 | VALUE2 | VALUE3 [...]
1 | a | b | c
2 | a | b | -
Thank you for any help
A plain query wouldn't do it directly, unless you run 3 separate queries.
SELECT ID,VALUE1 FROM Table
SELECT ID,VALUE2 FROM Table
etc...
If the problem is that your values are spread across columns rather than stored one per row, then I would set up your table differently.
ID | VALUE
1 | a
1 | b
1 | c
2 | a
2 | b
[...]
You should set up the table attributes like that rather than like your first table.
If you are going to set up your tables differently, I would use INSERT statements.
INSERT INTO newTable (ID, VALUE)
SELECT ID,VALUE1 FROM oldTable
INSERT INTO newTable (ID, VALUE)
SELECT ID,VALUE2 FROM oldTable
etc.
Another possible way to do it is to display it in your application. Take PHP, for instance.
foreach ($sqlArray as $var) {
    // PHP concatenates strings with the . operator
    echo $var['id'] . ' | ' . $var['value1'] . "\n";
    echo $var['id'] . ' | ' . $var['value2'] . "\n";
    echo $var['id'] . ' | ' . $var['value3'] . "\n";
}
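For a known, fixed maximum number of values per ID, the layout asked for in the question can also be sketched in a single query with conditional aggregation. This is one possible approach rather than anything proposed in the thread: it assumes Python's sqlite3 module, a hypothetical table named vals, at most three values per ID, and that values are unique within an ID so they can be ranked alphabetically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b")],
)

# The inner query ranks each value within its id (1, 2, 3, ...);
# the outer query turns each rank into its own column.
pivoted = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN rn = 1 THEN value END) AS value1,
           MAX(CASE WHEN rn = 2 THEN value END) AS value2,
           MAX(CASE WHEN rn = 3 THEN value END) AS value3
    FROM (
        SELECT v.id, v.value,
               (SELECT COUNT(*) FROM vals AS v2
                 WHERE v2.id = v.id AND v2.value <= v.value) AS rn
        FROM vals AS v
    ) AS ranked
    GROUP BY id
    ORDER BY id
    """
).fetchall()
print(pivoted)
```

Missing positions come back as NULL (None in Python), which corresponds to the "-" in the desired output. The number of output columns still has to be fixed in advance.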

Add counter field in select result in Access Query [duplicate]

I have a table with some fields.
name | stud_id
ali | 100
has | 230
mah | 300
I want to select some of the records and show a row number beside each record.
1 | ali | 100
2 | has | 230
3 | mah | 300
How can I do it?
Thanks.
Select (select count(*) from Table1 as A where A.stud_id <= B.stud_id) as RowNo, B.*
from Table1 as B
order by B.stud_id
MS Access does not have a ROWNUM function, so this might help you.
But your ID needs to be sortable and unique.
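The correlated-count trick is not specific to Access, so it can be tried in any SQL engine. The sketch below assumes Python's sqlite3 module as a stand-in and uses the question's sample rows; each row's number is the count of rows whose stud_id is less than or equal to its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (name TEXT, stud_id INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("ali", 100), ("has", 230), ("mah", 300)],
)

# RowNo = how many rows have a stud_id <= this row's stud_id.
# This only yields 1, 2, 3, ... when stud_id is unique.
numbered = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM Table1 AS A
             WHERE A.stud_id <= B.stud_id) AS RowNo,
           B.name, B.stud_id
    FROM Table1 AS B
    ORDER BY B.stud_id
    """
).fetchall()
print(numbered)
```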

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) with count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
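The same seven rows can be checked outside SQL Server. This is a sketch that assumes Python's sqlite3 module and a regular table named bla in place of the #bla temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bla (id INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO bla VALUES (?, ?)",
    [
        (None, None), (1, None), (None, 1), (1, None),
        (None, 1), (1, None), (None, None),
    ],
)

# COUNT(*) counts rows; COUNT(col) counts rows where col IS NOT NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(id2) FROM bla"
).fetchone()
print(counts)
```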
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
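A small sketch of the DISTINCT variant, assuming Python's sqlite3 module and a hypothetical table named items with made-up rows. Only groups with more than one distinct non-null column_b value pass the HAVING filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (column_a TEXT, column_b INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("x", 1), ("x", 1), ("x", 2),   # two distinct values
        ("y", 5), ("y", 5),             # one distinct value, repeated
        ("z", None), ("z", 7),          # NULL is ignored by COUNT(DISTINCT ...)
    ],
)

multi = conn.execute(
    """
    SELECT column_a, COUNT(DISTINCT column_b)
    FROM items
    GROUP BY column_a
    HAVING COUNT(DISTINCT column_b) > 1
    """
).fetchall()
print(multi)
```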
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The documentation explains this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
COUNT(*) tells SQL Server to count all the rows in a table, including rows that contain NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
See the following test code, run on SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
    CustomerId int NULL
  , Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by specifying *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only non-NULL values (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows in a table, whereas COUNT(COLUMN_NAME) excludes the rows where that column is NULL, as the other answers here have already said.
The more interesting part is performance: COUNT(*) is often at least as fast as COUNT(COLUMN_NAME), because the engine does not have to check each value for NULL. Unless you need multiple counts or a more complex query, prefer COUNT(*), especially on large tables. The actual difference depends on the database engine and the available indexes.
Further elaborating on the answers given by @SQLMenace and @Brannon, here is an example using the GROUP BY clause, which the OP mentioned but which is not present in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id   |
+------+
|    1 |
|    2 |
| NULL |
|    2 |
| NULL |
|    3 |
|    1 |
|    4 |
| NULL |
|    2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id   | COUNT(*) |
+------+----------+
|    1 |        2 |
|    2 |        3 |
| NULL |        3 |
|    3 |        1 |
|    4 |        1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id   | COUNT(id) |
+------+-----------+
|    1 |         2 |
|    2 |         3 |
| NULL |         0 |
|    3 |         1 |
|    4 |         1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id   | COUNT(DISTINCT id) |
+------+--------------------+
| NULL |                  0 |
|    1 |                  1 |
|    2 |                  1 |
|    3 |                  1 |
|    4 |                  1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
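The three grouped counts can be compared side by side in one query. The sketch below assumes Python's sqlite3 module instead of the MySQL client used above, so the row order comes from an explicit ORDER BY (SQLite sorts NULL first in ascending order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [(1,), (2,), (None,), (2,), (None,), (3,), (1,), (4,), (None,), (2,)],
)

# Columns: id, COUNT(*), COUNT(id), COUNT(DISTINCT id) for each group.
grouped = conn.execute(
    """
    SELECT id, COUNT(*), COUNT(id), COUNT(DISTINCT id)
    FROM table1
    GROUP BY id
    ORDER BY id
    """
).fetchall()
for row in grouped:
    print(row)
```

The NULL group is the only one where the three counts disagree: COUNT(*) reports its rows, while the two column-based counts report zero.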
Some people prefer to use
Count(1) in place of a column name or *
to count the number of rows in a table, on the theory that it is faster because the engine never has to check whether a column exists in the table. In most current database engines, however, COUNT(1) and COUNT(*) are treated the same way, so a measurable speed difference is unlikely.
There is also no difference in the result when the counted column never contains NULL; if you want to count more than one column, you have to specify which columns you need to count.
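As far as results go, COUNT(1) and COUNT(*) agree, since the constant 1 is never NULL. The sketch below assumes Python's sqlite3 module and a hypothetical three-row table; it only compares the values returned and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (2,)])

# COUNT(1) counts every row, just like COUNT(*); COUNT(id) skips the NULL.
star, one, col = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id) FROM t"
).fetchone()
print(star, one, col)
```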
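As mentioned in the previous answers, Count(*) counts every row, even when its columns are NULL, whereas count(Columnname) counts a row only if that column has a value.
Avoiding * is a common recommendation for select lists (Select *); in count(*), though, the * does not fetch any columns, it simply means "count the rows".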