Group by and Select Distinct in SQL Server - sql

What are the differences between the following two queries?
SELECT distinct(Invalid_Emails), [leads_id]
FROM [dbo].[InvalidEmails_stg]
ORDER BY LEADS_ID DESC
vs
select invalid_emails, max(leads_id) as id
from invalidEmails_stg
group by invalid_emails
having count(*) < 2
order by id desc
The second one gave me fewer rows than the first.

The parentheses in the first query are probably what is confusing you. DISTINCT is not a function, so the parentheses do nothing; the query is the same as:
SELECT DISTINCT Invalid_Emails, leads_id
FROM [dbo].[InvalidEmails_stg]
ORDER BY LEADS_ID DESC;
This returns all pairs of Invalid_Emails/Leads_id that appear in the database. No matter how many times a given pair appears, it will be in the result set exactly one time.
This query:
select invalid_emails, max(leads_id) as id
from invalidEmails_stg
group by invalid_emails
having count(*) < 2
order by id desc;
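Returns one row per invalid_emails value, and only for values that occur exactly once in your data. Because it groups by invalid_emails alone, HAVING COUNT(*) < 2 filters out every email that appears in more than one row, even when those rows have different leads_id values. (With a single row per group, MAX(leads_id) is just that row's leads_id.)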
Here is a simple example:
invalid_emails leads_id
a#b.com 1
a#b.com 1
b#c.com 2
b#c.com 3
d#e.com 1
The first query will return:
a#b.com 1
b#c.com 2
b#c.com 3
d#e.com 1
a#b.com is returned once because duplicates are removed.
The second will return:
d#e.com 1
a#b.com and b#c.com are not returned because each appears in two rows.
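To see the difference concretely, here is a small sketch that runs both queries on the sample rows above. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (so the [dbo] schema and square brackets are dropped); DISTINCT, GROUP BY and HAVING behave the same way for this data. The extra invalid_emails column in the first ORDER BY is only a tie-breaker so the printed order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invalidEmails_stg (invalid_emails TEXT, leads_id INTEGER)")
conn.executemany(
    "INSERT INTO invalidEmails_stg VALUES (?, ?)",
    [("a#b.com", 1), ("a#b.com", 1), ("b#c.com", 2), ("b#c.com", 3), ("d#e.com", 1)],
)

# Query 1: one row per distinct (invalid_emails, leads_id) pair
distinct_rows = conn.execute(
    "SELECT DISTINCT invalid_emails, leads_id FROM invalidEmails_stg "
    "ORDER BY leads_id DESC, invalid_emails"
).fetchall()

# Query 2: one row per email, keeping only emails that occur in exactly one row
grouped_rows = conn.execute(
    "SELECT invalid_emails, MAX(leads_id) AS id FROM invalidEmails_stg "
    "GROUP BY invalid_emails HAVING COUNT(*) < 2 ORDER BY id DESC"
).fetchall()

print(distinct_rows)  # [('b#c.com', 3), ('b#c.com', 2), ('a#b.com', 1), ('d#e.com', 1)]
print(grouped_rows)   # [('d#e.com', 1)]
```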

In the first query:
SELECT distinct(Invalid_Emails),[leads_id]
FROM [dbo].[InvalidEmails_stg]
ORDER BY LEADS_ID DESC
you don't check the COUNT(*) < 2 condition.
In the second query:
select invalid_emails, max(leads_id) as id
from invalidEmails_stg
group by invalid_emails
having count(*)<2
order by id desc
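if an invalid_emails value appears in two or more rows, HAVING COUNT(*) < 2 filters it out of your result.
Another difference is NULL values. Rows whose Invalid_Emails is NULL appear in the first query's result (once per distinct leads_id). In the second query, GROUP BY puts all NULL emails into a single group, so HAVING COUNT(*) < 2 removes them unless only one row has a NULL email.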

The queries have a similar intent: to get invalid_emails by leads_id.
The second query uses an aggregate function to bring back only the maximum leads_id per email, and uses a HAVING clause to drop every email that appears more than once (rather than keeping one copy of it).

Related

Sort by one column, but get offset by another

Let's say I have a table with two columns:
| ID | A |
I want to sort by A, then get the 10 records after a given ID. What would be the best way to handle this in Postgres?
To clarify, I want to sort by A, but do my pagination by the ID. So if I had a table like:
1 | 'C'
2 | 'B'
3 | 'A'
4 | 'G'
5 | 'A'
6 | 'H'
So after sorting by A, I'd want the first three values after id=1, so:
1 | 'C'
4 | 'G'
6 | 'H'
The ordering of any column depends purely on the ORDER BY clause, and the terms "before" and "after" only make sense once there is a predetermined order. So, once the records are ordered by column "A", there's no guarantee that the ids will come out in the sequence 1, 4, 6 unless you also specify an ordering on id.
So, if you "want the first three values after id=1", there needs to be a way to determine the point where the id value becomes 1, after which all remaining rows are considered. To ensure that, you have to explicitly include id in the ORDER BY. A COUNT analytic function can come to our rescue to mark that point.
SELECT id, a
FROM (SELECT t.*,
             COUNT(CASE WHEN id = 1 THEN 1 END)  -- 1 is the :id argument
                 OVER (ORDER BY a, id) AS se     -- rows ordered first by a, then by id
      FROM t) s
WHERE se = 1      -- se becomes 1 at id = 1 and stays 1 for every later row,
                  -- so it marks that id = n has been reached
ORDER BY a, id    -- same ordering as in the COUNT analytic function above,
                  -- so LIMIT picks a well-defined set of rows
LIMIT 3;          -- the argument 3 or 10 that you wish to pass
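A quick way to check the COUNT-marker approach on the sample table, sketched with Python's sqlite3 module standing in for Postgres (SQLite has supported window functions since version 3.25). The helper name page_after and its parameters are hypothetical: start_id plays the role of the :id argument and page_size the LIMIT.

```python
import sqlite3

# SQLite stands in for Postgres; table t holds the sample rows from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "C"), (2, "B"), (3, "A"), (4, "G"), (5, "A"), (6, "H")],
)

def page_after(conn, start_id, page_size):
    """Return page_size rows in (a, id) order, starting at the row whose id is start_id."""
    sql = """
        SELECT id, a
        FROM (SELECT t.*,
                     -- running count of the marker row: 0 before it, 1 from it onward
                     COUNT(CASE WHEN id = ? THEN 1 END) OVER (ORDER BY a, id) AS se
              FROM t) s
        WHERE se = 1
        ORDER BY a, id
        LIMIT ?
    """
    return conn.execute(sql, (start_id, page_size)).fetchall()

print(page_after(conn, 1, 3))  # [(1, 'C'), (4, 'G'), (6, 'H')]
```

Sorted by (a, id) the rows are 3, 5, 2, 1, 4, 6, so the marker turns on at id 1 and the next three rows are 1, 4 and 6, matching the expected output in the question.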
I think this will do:
SELECT *
FROM MyTable
WHERE ID > givenId
ORDER BY A
LIMIT 10;
You don't want the A columns, so:
SELECT t.*
FROM t
WHERE t.id > ANY (SELECT id FROM t t2 WHERE t2.col = 'A')
ORDER BY col
LIMIT 10;
Note that this does not return any rows with A as the value. It also works when the comparison value is not sorted first.
This will work:
SELECT * from Table1 where "ID"=1
order by "A" desc limit 2;
Check: http://sqlfiddle.com/#!15/5854b/3
For your query:
SELECT * from Table1 where "ID"=1
order by "A" desc limit 10;

Using GROUP BY, select ID of record in each group that has lowest ID

I am creating a file organization system where you can add content items to multiple folders.
I am storing the data in a table that has a structure similar to the following:
ID TypeID ContentID FolderID
1 101 1001 1
2 101 1001 2
3 102 1002 3
4 103 1002 2
5 103 1002 1
6 104 1001 1
7 105 1005 2
I am trying to select the first record for each unique TypeID and ContentID pair. For the above table, I would want the results to be:
ID
1
3
4
6
7
As you can see, the pairs 101 1001 and 103 1002 were each added to two folders, yet I only want the record with the first folder they were added to.
When I try the following query, however, I only get results that have at least two entries with the same TypeID and ContentID:
select MIN(ID)
from table
group by TypeID, ContentID
results in
ID
1
4
If I change MIN(ID) to MAX(ID), I get the correct number of results, yet I get the record with the last folder they were added to and not the first folder:
ID
2
3
5
6
7
Am I using GROUP BY or the MIN wrong? Is there another way that I can accomplish this task of selecting the first record of each TypeID ContentID pair?
MIN() and MAX() should return the same amount of rows. Changing the function should not change the number of rows returned in the query.
Is this query part of a larger query? From looking at the sample data provided, I would assume that this code is only a snippet from a larger action you are trying to do. Do you later try to join TypeID, ContentID or FolderID with the tables the IDs are referencing?
If yes, this error is likely being caused by another part of your query and not this select statement. If you are using joins or multi-level select statements, you can get a different number of results if the reference tables do not contain a record for all the foreign IDs.
Another suggestion: check whether any of the values in your records are NULL. Although this should not affect the GROUP BY, I have sometimes encountered strange behavior when dealing with NULL values.
Use ROW_NUMBER
WITH CTE AS
(SELECT ID,TypeID,ContentID,FolderID,
ROW_NUMBER() OVER (PARTITION BY TypeID,ContentID ORDER BY ID) as rn FROM t
)
SELECT ID FROM CTE WHERE rn=1
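A sketch that loads the sample rows and compares MIN, MAX and the ROW_NUMBER approach, using Python's sqlite3 module as a stand-in for the asker's database. The table name items is hypothetical, since table is a reserved word. On this data the plain GROUP BY returns five rows for both MIN and MAX, in line with the point that changing the aggregate should not change the row count.

```python
import sqlite3

# SQLite stands in for the asker's database; "items" is a hypothetical table name
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (ID INTEGER, TypeID INTEGER, ContentID INTEGER, FolderID INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, 101, 1001, 1), (2, 101, 1001, 2), (3, 102, 1002, 3), (4, 103, 1002, 2),
        (5, 103, 1002, 1), (6, 104, 1001, 1), (7, 105, 1005, 2),
    ],
)

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# One row per (TypeID, ContentID) group, whichever aggregate is used
min_ids = ids("SELECT MIN(ID) FROM items GROUP BY TypeID, ContentID ORDER BY 1")
max_ids = ids("SELECT MAX(ID) FROM items GROUP BY TypeID, ContentID ORDER BY 1")

# ROW_NUMBER numbers the rows inside each group by ID; rn = 1 is the first record
rn_ids = ids("""
    WITH CTE AS (
        SELECT ID, TypeID, ContentID, FolderID,
               ROW_NUMBER() OVER (PARTITION BY TypeID, ContentID ORDER BY ID) AS rn
        FROM items
    )
    SELECT ID FROM CTE WHERE rn = 1 ORDER BY ID
""")

print(min_ids)  # [1, 3, 4, 6, 7]
print(max_ids)  # [2, 3, 5, 6, 7]
print(rn_ids)   # [1, 3, 4, 6, 7]
```

ROW_NUMBER is the more flexible of the two, because the ORDER BY inside OVER can use any column (for example FolderID or a timestamp) to decide which record counts as "first".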
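Use it with ORDER BY (note that this relies on MySQL's non-standard GROUP BY, which picks an indeterminate row from each group; most other databases reject SELECT * here):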
select *
from table
group by TypeID, ContentID
order by id
SQLFiddle: http://sqlfiddle.com/#!9/024016/12
Try first(id) instead of min(id):
select first(id)
from table
group by TypeID, ContentID
Does it work?

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing IDs, but I know this could never happen due to my three-column unique constraint on those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
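To make the DISTINCT ON semantics concrete, here is a pure-Python sketch of what the query computes: sort by the ORDER BY columns, then keep the first row of each (user_id, foo_id) run. It illustrates the result only, not how Postgres executes it; the rows are the sample DATASET from the dense-rank answer below, with effective_at as date values, and the function name distinct_on_latest is hypothetical.

```python
from datetime import date
from itertools import groupby

# (id, user_id, foo_id, effective_at), taken from the sample DATASET in this thread
rows = [
    (1, 1, 1, date(2001, 1, 1)),
    (2, 1, 1, date(2002, 1, 1)),
    (3, 1, 1, date(2003, 1, 1)),
    (4, 1, 2, date(2001, 1, 1)),
    (5, 2, 1, date(2001, 1, 1)),
]

def distinct_on_latest(rows):
    """Mimic DISTINCT ON (user_id, foo_id) ... ORDER BY user_id, foo_id, effective_at DESC."""
    # Sort as the ORDER BY does: key columns ascending, then effective_at descending
    ordered = sorted(rows, key=lambda r: (r[1], r[2], -r[3].toordinal()))
    # groupby yields consecutive runs with the same (user_id, foo_id); keep each run's first row
    return [next(group) for _, group in groupby(ordered, key=lambda r: (r[1], r[2]))]

print([r[0] for r in distinct_on_latest(rows)])  # [3, 4, 5]
```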
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
) sub
WHERE x = 1;
If you don't want to use a subquery based on a composite of all three keys, then you need to create a "dense rank" window-function field that ranks the rows of each (user_id, foo_id) subset by effective date, descending, into a rank_order field. Then query that and take the records where rank_order = 1. Since the ranking was by effective date, you get all fields of the record with the highest effective date for each foo and user.
DATASET (id, user_id, foo_id, effective_at)
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC (id, rank_order, user_id, foo_id, effective_at)
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
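The ranking steps above can be reproduced with a small sketch, using Python's sqlite3 module as a stand-in for Postgres and storing the dates as ISO-8601 text so they sort chronologically (an assumption of this sketch). It computes DENSE_RANK over each (user_id, foo_id) partition and keeps rank 1.

```python
import sqlite3

# SQLite stands in for Postgres; dates are ISO-8601 text so string order is date order
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_foos (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "foo_id INTEGER, effective_at TEXT)"
)
conn.executemany(
    "INSERT INTO user_foos VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2001-01-01"), (2, 1, 1, "2002-01-01"), (3, 1, 1, "2003-01-01"),
        (4, 1, 2, "2001-01-01"), (5, 2, 1, "2001-01-01"),
    ],
)

# Rank each (user_id, foo_id) partition by effective_at, newest first
ranked_sql = """
    SELECT id,
           DENSE_RANK() OVER (PARTITION BY user_id, foo_id
                              ORDER BY effective_at DESC) AS rank_order
    FROM user_foos
"""
ranks = sorted(conn.execute(ranked_sql).fetchall())

# Query the ranked rows and keep only the newest row of each partition
latest = [row[0] for row in conn.execute(
    f"SELECT id FROM ({ranked_sql}) ranked WHERE rank_order = 1 ORDER BY id"
)]

print(ranks)   # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
print(latest)  # [3, 4, 5]
```

Because of the UNIQUE (user_id, foo_id, effective_at) constraint there are no ties inside a partition, so ROW_NUMBER would return the same rows here; DENSE_RANK only differs when two rows share the top effective_at.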

SQL Nested Select Statement

I have the following SQL Code which is not giving me my desired results.
SELECT
POLICIES.CLIENTS_ID,
POLICIES.CLIENTCODE,
COUNT(POLICIES.POLICIES_ID) as [Total Policies],
(
SELECT
COUNT(POLICIES.POLICIES_ID)
FROM
POLICIES
WHERE
POLICIES.COVCODE = 'AUT'
) as [Auto Policies]
FROM
POLICIES
LEFT JOIN CLIENTS
ON CLIENTS.CLIENTS_ID = POLICIES.CLIENTS_ID
WHERE
POLICIES.CNR IS NULL
GROUP BY
POLICIES.CLIENTS_ID,
POLICIES.CLIENTCODE
ORDER BY
POLICIES.CLIENTS_ID
I get a result like this:
ID CODE Total Auto
3 ABCDE1 1 999999
4 ABCDE2 1 999999
5 ABCDE3 2 999999
6 ABCDE4 2 999999
I would like for the last column to COUNT the total auto policies that exists for that clientid rather than all of the auto policies that exist. I believe I need a nested select statement that somehow groups all like results on the clientid, but it ends up returning more than 1 row and throws the error.
If I add:
GROUP BY
POLICIES.CLIENTS_ID
I get:
Subquery returned more than 1 value. This is not permitted when the....
Any help would be appreciated greatly!
Thank you
You can use a CASE expression to do this. Instead of the subquery in your SELECT clause, use:
SUM(CASE WHEN POLICIES.COVCODE = 'AUT' THEN 1 ELSE 0 END) as [AUTO POLICIES]
As Martin Smith pointed out, if client_id has multiple client_codes, this will give you the count of records for each combination of client_id/client_code. If client_id is 1:1 with client_code, this will give you a count of records for each distinct client_id, which I suspect is the case from your example and question.
Unrelated: you have a LEFT JOIN to your Clients table, but you don't use the Clients table anywhere in the query. Consider removing it if you don't need to select or filter by any of its fields, since it's just unused overhead.
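A sketch comparing the original scalar subquery with the conditional SUM, using Python's sqlite3 module as a stand-in for SQL Server. The policy rows (including the HOM coverage code) are hypothetical, chosen so the Total column matches the question's output; the unused CLIENTS join is left out.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows below are hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE POLICIES (POLICIES_ID INTEGER, CLIENTS_ID INTEGER, "
    "CLIENTCODE TEXT, COVCODE TEXT, CNR TEXT)"
)
conn.executemany(
    "INSERT INTO POLICIES VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, "ABCDE1", "AUT", None),
        (2, 4, "ABCDE2", "HOM", None),
        (3, 5, "ABCDE3", "AUT", None),
        (4, 5, "ABCDE3", "AUT", None),
        (5, 6, "ABCDE4", "AUT", None),
        (6, 6, "ABCDE4", "HOM", None),
    ],
)

# Original shape: the scalar subquery is not correlated, so it counts every AUT policy
before = conn.execute("""
    SELECT CLIENTS_ID, CLIENTCODE, COUNT(POLICIES_ID),
           (SELECT COUNT(POLICIES_ID) FROM POLICIES WHERE COVCODE = 'AUT')
    FROM POLICIES
    WHERE CNR IS NULL
    GROUP BY CLIENTS_ID, CLIENTCODE
    ORDER BY CLIENTS_ID
""").fetchall()

# Conditional aggregation: each group sums only its own AUT rows
after = conn.execute("""
    SELECT CLIENTS_ID, CLIENTCODE, COUNT(POLICIES_ID),
           SUM(CASE WHEN COVCODE = 'AUT' THEN 1 ELSE 0 END)
    FROM POLICIES
    WHERE CNR IS NULL
    GROUP BY CLIENTS_ID, CLIENTCODE
    ORDER BY CLIENTS_ID
""").fetchall()

print(before)  # [(3, 'ABCDE1', 1, 4), (4, 'ABCDE2', 1, 4), (5, 'ABCDE3', 2, 4), (6, 'ABCDE4', 2, 4)]
print(after)   # [(3, 'ABCDE1', 1, 1), (4, 'ABCDE2', 1, 0), (5, 'ABCDE3', 2, 2), (6, 'ABCDE4', 2, 1)]
```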
What if you replace the subquery that gets the count with something like:
SUM(CASE WHEN POLICIES.COVCODE = 'AUT' THEN 1 ELSE 0 END) as [Auto Policies]

sql count mismatch

I don't understand this SQL query output:
SQL> select distinct(STATUS) from TMP_ORDER_ACTION_PSTN_CP_11035;
InDelivery_SOMBe
In Delivery
Complete
Amended
Cancelled
Failed InComplete
1) SQL> select count(*) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS='Complete';
1484
2) SQL> select count(*) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS != 'Complete';
3167
3) SQL> select count(*) from TMP_ORDER_ACTION_PSTN_CP_11035;
5091
The sum of the counts for queries 1 and 2 should equal the total count (query 3). Why does the sum differ from the whole count?
It seems like a dumb question, but I don't know why this is happening.
Please note that my question is not related to a NULL check at all. It is that 1 + 2 should equal 3, yet 1484 + 3167 != 5091. Why is the result different?
My guess is NULL values, which match none of your WHERE clauses, including the != 'Complete' one. Try
select count(*) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS is null;
WHERE STATUS = NULL is never true, and neither is WHERE NULL = NULL. You have to use IS NULL.
The sum of count for the 1 and 2 queries should be same as the total count(3 query). Why is the sum differing from the whole count?
No, because the records with NULL match neither query 1 nor query 2, but they are counted in query 3.
Query 1 + query 2 + the IS NULL count should equal query 3; if NULLs are the cause, the IS NULL query should return 5091 - 1484 - 3167 = 440.
WHERE STATUS = NULL won't work. Nothing equals NULL.
Try IS instead of =...
select count(*) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS IS null
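A minimal sketch of the three-valued logic involved, using Python's sqlite3 module and a hypothetical orders table with a few made-up STATUS values (SQLite handles NULL comparisons the same way Oracle does here). Rows with a NULL STATUS match neither = 'Complete' nor != 'Complete', and STATUS = NULL matches nothing.

```python
import sqlite3

# Hypothetical table and data: 2 Complete, 3 other statuses, 2 NULL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (STATUS TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("Complete",), ("Complete",), ("Amended",), ("Cancelled",),
     ("In Delivery",), (None,), (None,)],
)

def count(where=""):
    """COUNT(*) over orders with an optional WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM orders {where}").fetchone()[0]

complete = count("WHERE STATUS = 'Complete'")
not_complete = count("WHERE STATUS != 'Complete'")  # NULL != 'Complete' is unknown, not true
total = count()
is_null = count("WHERE STATUS IS NULL")
eq_null = count("WHERE STATUS = NULL")              # never true, so always 0

print(complete, not_complete, total)  # 2 3 7
print(is_null, eq_null)               # 2 0
```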
Try this, assuming p_key is the table's primary key:
select count(p_key) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS='Complete';
select count(p_key) from TMP_ORDER_ACTION_PSTN_CP_11035 where STATUS <> 'Complete';
select count(p_key) from TMP_ORDER_ACTION_PSTN_CP_11035 ;