Count only a specific subset of elements in a Postgres DB - sql

I have a table with some identifiers that repeat themselves like
id
-------
djkfgh
kdfjhw
efkujh
dfsggs
djkfgh
djkfgh
efkujh
I also have a list of id's of interest, say ["djkfgh","dfsggs"]. I would like to count only those values that appear in the list, rather than all the distinct values of the column.

Select count(id) from table where id IN(subset);

Related

How to group by one column and limit to rows where another column has the same value for all rows in group?

I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
Updated demo with tests and more rows.
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid's from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1

In Snowflake, I want to count duplicates in a table based on all the columns in the table without typing out every column name

I have a table with 60 columns in it. I would like to identify how many duplicates there are in the table based on all the columns being identical.
I don't want to have to type out every field name in the SELECT or GROUP BY clauses. Is there a way to do that?
You can use an approach like this for each table:
SELECT
MD5(OBJECT_CONSTRUCT(SRC.*)::VARCHAR) DUP_MD5, SUM(1) AS TOTAL_COUNT
FROM <table> SRC
GROUP BY 1
HAVING SUM(1) > 1;

Return one result for each search value

How can I return a unique value based on a set of non unique values being searched?
For example:
If I wanted to return one phone number for a list of 4 people who can have more than one phone number - but I can only use one phone number for each person. It doesn't matter which phone number I use to reach them because any number that belongs to them will get me to them.
I don't think something like this exists - but if I could use something like the DISTINCT modifier except it would be called FIRST - it would solve my problem:
SELECT FIRST ID
FROM Sample_Table
WHERE ID in ("Bob", "Sam", "Kyle", "Jordan")
In picture - from this
I'd like that (or any) query to return
.
I'm using this type of query in a db where for 200 "ID"s there are millions of "Unique Values", so it is hard to get crafty.
EDIT The Unique value in my db has numbers and letters in each value
There is no such thing as a "first id" in a SQL table. Tables represent unordered sets. You can accomplish what you want (if I understand correctly) using aggregation:
SELECT ID, MIN(UniqueValue)
FROM Sample_Table
WHERE ID in ('Bob', 'Sam', 'Kyle', 'Jordan')
GROUP BY ID;
using group by and max method can help you:
select id
,uniquvalue
,max (typeofvalue)
from Sample_Table
group by
id
,uniquvalue

find unique rows using SQL?

I want to return all the rows from a table which are unique. I.e. if a certain field in two rows contain the same name, that name shouldn't be shown.
Since you want only the uniques names (and not an unique row for every names like you could have with DISTINCT), you have to use a GROUP BY and a HAVING (instead of a WHERE, because your parameter is the result of a function, not a variable) :
SELECT name FROM myTable GROUP BY name HAVING COUNT(name) = 1
SELECT DISTINCT column_name FROM table
If you want the complete rows, then use row_number() or distinct on:
select distinct on (name) t.*
from table t
order by name;

SQL Server Sum multiple rows into one - no temp table

I would like to see a most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how to then delete the duplicates. In other words I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think that I'll need to use OVER PARTITION BY but just not sure. Just trying to challenge myself in not doing it the temp table way. Working with SQL Server 2008 R2
Simply put, give me a concise stmt getting from my input to my output in the same table. So if my table name is People if I do a select * from People on it before the operation that I am asking in this question I get the first set above and then when I do a select * from People after the operation, I get the second set of data above.
Not sure why not using Temp table but here's one way to avoid it (tho imho this is an overkill):
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
WITH DUP_TABLE AS
(SELECT ROW_NUMBER()
OVER (PARTITION BY Person ORDER BY Person) As ROW_NO
FROM MyTable)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
First query updates every duplicate person to the summary value. Second query removes duplicate persons.
Demo: http://sqlfiddle.com/#!3/db7aa/11
All you're asking for is a simple SUM() aggregate function and a GROUP BY
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
The SUM() by itself would sum up the values in a column, but when you add a secondary column and GROUP BY it, SQL will show distinct values from the secondary column and perform the aggregate function by those distinct categories.