Access query finding the duplicates without using DISTINCT - sql

i have this query
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo;
and the PersonalInfo.k-commission is a multi value field. the CommissionAbsent shows duplicate values for each k-commission value. when i use DISTINCT i get an error saying that the keyword cannot be used with a multi value field.
now i want to remove the duplicates and show only one result for each. i tried using a WHERE but i dont know how.
edit: i have a lot more columnes and in the example i only showed the few i need.

You can use GROUP BY and COUNT to solve your problem, here is an example for it
SELECT clmn1, clmn2, COUNT(*) as count
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1;
the query groups the rows in the table by the clmn1 and clmn2 columns, and counts the number of occurrences of each group. The HAVING clause is then used to filter the groups and only return the groups that have a count greater than 1, which indicates duplicates.
If you want to select all, then you can do like this
SELECT *
FROM table
WHERE (clmn1, clmn2) IN (SELECT clmn1, clmn2
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1)

SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo
GROUP BY PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value]))
HAVING COUNT(*) > 1

Related

Print repeating values from a given column in a table as many times as they occur. SQL

Ok so we know we can check if a column has duplicate values by using using Group By along with Having keyword and an aggregate function like Count().
What if I want to do is to print those duplicate values, not once but as many times as they are present in the table?
So if a column contains: 1,2,2,2,3,4,5,5,6,7,8,8,8.
I want my query to return: 2,2,2,5,5,8,8,8.
Group by/distinct will only show 2, 5, 8. How can I get what I want?
Thank you.
You can compute a window count in a subquery, then use it to filter the resultset:
select val
from (select val, count(*) over(partition by val) cnt from mytable) t
where cnt > 1

How to find duplicate rows in Hive?

I want to find duplicate rows from one of the Hive table for which I was given two approaches.
First approach is to use following two queries:
select count(*) from mytable; // this will give total row count
second query is as below which will give count of distinct rows
select count(distinct primary_key1, primary_key2) from mytable;
With this approach, for one of my table total row count derived using first query is 3500 and second query gives row count 2700. So it tells us that 3500 - 2700 = 800 rows are duplicate. But this query doesn't tell which rows are duplicated.
My second approach to find duplicate is:
select primary_key1, primary_key2, count(*)
from mytable
group by primary_key1, primary_key2
having count(*) > 1;
Above query should list of rows which are duplicated and how many times particular row is duplicated. but this query shows zero rows which means there are no duplicate rows in that table.
So I would like to know:
If my first approach is correct - if yes then how do I find which rows are duplicated
Why second approach is not providing list of rows which are duplicated?
Is there any other way to find the duplicates?
Hive does not validate primary and foreign key constraints.
Since these constraints are not validated, an upstream system needs to
ensure data integrity before it is loaded into Hive.
That means that Hive allows duplicates in Primary Keys.
To solve your issue, you should do something like this:
select [every column], count(*)
from mytable
group by [every column]
having count(*) > 1;
This way you will get list of duplicated rows.
analytic window function row_number() is quite useful and can provide the duplicates based upon the elements specified in the partition by clause. A simply in-line view and exists clause will then pinpoint what corresponding sets of records contain these duplicates from the original table. In some databases (like TD, you can forgo the inline view using a QUALIFY pragma option)
SQL1 & SQL2 can be combined. SQL2: If you want to deal with NULLs and not simply dismiss, then a coalesce and concatenation might be better in the
SELECT count(1) , count(distinct coalesce(keypart1 ,'') + coalesce(keypart2 ,'') )
FROM srcTable s
3) Finds all records, not just the > 1 records. This provides all context data as well as the keys so it can be useful when analyzing why you have dups and not just the keys.
select * from srcTable s
where exists
( select 1 from (
SELECT
keypart1,
keypart2,
row_number() over( partition by keypart1, keypart2 ) seq
FROM srcTable t
WHERE
-- (whatever additional filtering you want)
) t
where seq > 1
AND t.keypart1 = s.keypart1
AND t.keypart2 = s.keypart2
)
Suppose your want get duplicate rows based on a particular column ID here. Below query will give you all the IDs which are duplicate in table in hive.
SELECT "ID"
FROM TABLE
GROUP BY "ID"
HAVING count(ID) > 1

What's the difference between select distinct count, and select count distinct?

I am aware of select count(distinct a), but I recently came across select distinct count(a).
I'm not very sure if that is even valid.
If it is a valid use, could you give me a sample code with a sample data, that would explain me the difference.
Hive doesn't allow the latter.
Any leads would be appreciated!
Query select count(distinct a) will give you number of unique values in a.
While query select distinct count(a) will give you list of unique counts of values in a. Without grouping it will be just one line with total count.
See following example
create table t(a int)
insert into t values (1),(2),(3),(3)
select count (distinct a) from t
select distinct count (a) from t
group by a
It will give you 3 for first query and values 1 and 2 for second query.
I cannot think of any useful situation where you would want to use:
select distinct count(a)
If the query has no group by, then the distinct is anomalous. The query only returns on row anyway. If there is a group by, then the aggregation columns should be in the select, to identify each row.
I mean, technically, with a group by, it would be answering the question: "how many different non-null values of a are in groups". Usually, it is much more useful to know the value per group.
If you want to count the number of distinct values of a, then use count(distinct a).

Query for showing whole records where a particular field is entered twice

I have a problem here in that I don't know how to execute this S Q L query as I am not sure of the correct syntax...
I am trying to select all records from a record set (populated by a table), where a particular field is entered twice...
please what query should i use to get all records which showing double records in a field.
Try doing something like this:
SELECT "YourFieldName(s)", COUNT(*) AS RecordCount
FROM "YourTableName"
GROUP BY "YourFieldName(s)"
HAVING COUNT(*) > 1
"YourFieldName(s)" can be multiple columns if that is what you are checking duplicates on.
Select columnname,count(*) as cnt
from dbo.table
group by columnname
having count(*) > 1
this gives you the rows which has duplicates in the column " columnname" . edit to your requirement.

SQL query selection help need

I have a table which has column called payment_txn_status
I have to write a query which shows distinct status code with their respective count.
My current query which is as below gives me only distinct status code but how to get count for each individual status code
select distinct payment_txn_status FROM tpayment_txn
Use a GROUP BY and a COUNT:
SELECT payment_txn_status, COUNT(*) AS num
FROM tpayment_txn
GROUP BY payment_txn_status