Where Statement w/ Distinct - sql

I have a large table, but for the purposes of this question, let's assume I have the following column structure:
I'd like to have a Where statement that returns only rows where the e-mail address is distinct in that particular column.
Thoughts?

SELECT BillingEMail
FROM tableName
GROUP BY BillingEMail
HAVING COUNT(BillingEMail) = 1
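Alternatively, use HAVING COUNT(*) = 1 in place of the last line (the two differ only for NULL e-mails, which COUNT(BillingEMail) ignores).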
I don't know which RDBMS you are using (which is why I'm not suggesting analytic functions), but if you want to get all columns you can join the table to a subquery:
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT BillingEMail
FROM tableName
GROUP BY BillingEMail
HAVING COUNT(BillingEMail) = 1
)b ON a.BillingEMail = b.BillingEMail
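To try the join-with-subquery approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The AccountID column and the sample e-mail addresses are assumptions made up for illustration; only BillingEMail and tableName come from the answer above.

```python
import sqlite3


def rows_with_unique_email(rows):
    """Return the full rows whose BillingEMail occurs exactly once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column layout; the real table has more columns.
    conn.execute("CREATE TABLE tableName (AccountID INTEGER, BillingEMail TEXT)")
    conn.executemany("INSERT INTO tableName VALUES (?, ?)", rows)
    query = """
        SELECT a.AccountID, a.BillingEMail
        FROM tableName a
        INNER JOIN (
            SELECT BillingEMail
            FROM tableName
            GROUP BY BillingEMail
            HAVING COUNT(BillingEMail) = 1
        ) b ON a.BillingEMail = b.BillingEMail
        ORDER BY a.AccountID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample data: a@example.com appears twice, so both of its rows drop out.
sample = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "a@example.com"),
    (4, "c@example.com"),
]
unique_rows = rows_with_unique_email(sample)
print(unique_rows)
```

Only accounts 2 and 4 should come back, since their addresses are not shared with any other row.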

In most databases, you can do this:
select t.AccountId, t.BillingEmail
from (select t.*, count(*) over (partition by BillingEmail) as cnt
from t
) t
where cnt = 1
The advantage of this approach is that you can get as many columns as you like from the table.
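The window-function version can be checked the same way. This is a sketch only: it assumes an SQLite build with window-function support (3.25 or later), and the table contents are invented for illustration.

```python
import sqlite3


def unique_by_window(rows):
    """Keep rows whose BillingEmail partition contains exactly one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (AccountId INTEGER, BillingEmail TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT counted.AccountId, counted.BillingEmail
        FROM (
            SELECT t.*, COUNT(*) OVER (PARTITION BY BillingEmail) AS cnt
            FROM t
        ) AS counted
        WHERE counted.cnt = 1
        ORDER BY counted.AccountId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample data: account 10 and 30 share an address.
window_rows = unique_by_window([
    (10, "dup@example.com"),
    (20, "solo@example.com"),
    (30, "dup@example.com"),
])
print(window_rows)
```

Because the count is attached to every row rather than collapsing the groups, any other column of the table can be selected alongside it.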

I prefer JW's approach, but here is another one using NOT EXISTS.
SELECT AccountID, [Billing Email]
FROM table t1
WHERE NOT EXISTS (
-- Make sure that no other row contains the same
-- email, but a different Account ID.
SELECT 1
FROM table t2
WHERE t1.[Billing Email] = t2.[Billing Email]
AND t1.AccountID <> t2.AccountID
)

Related

Postgresql - Group By

I have a simple groupby scenario. Below is the output of the query.
Query is:
select target_date, type, count(*) from table_name group by target_date, type
The query and output is perfectly good.
My problem is that I am using this query in Grafana for plotting, with Postgres as the backend.
Because the "type2" category is missing on 01-10-2020 and 03-10-2020, it never gets plotted at all (in a side-by-side bar plot), even though "type2" is present on other days.
Grafana expects something like:
So whenever a category is missing for a date, we need a row with a count of 0.
This needs to be handled in the query, as the source data cannot be modified.
Any help here is appreciated.
You need to create a list of all the target_date/type combinations. That can be done with a CROSS JOIN of two DISTINCT selects of target_date and type. This list can be LEFT JOINed to table_name to get counts for each combination:
SELECT dates.target_date, types.type, COUNT(t.target_date)
FROM (
SELECT DISTINCT target_date
FROM table_name
) dates
CROSS JOIN (
SELECT DISTINCT type
FROM table_name
) types
LEFT JOIN table_name t ON t.target_date = dates.target_date AND t.type = types.type
GROUP BY dates.target_date, types.type
ORDER BY dates.target_date, types.type
You may use a calendar table approach here:
SELECT
t1.target_date,
t2.type,
COUNT(t3.target_date) AS count
FROM (SELECT DISTINCT target_date FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT type FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.target_date = t1.target_date AND
t3.type = t2.type
GROUP BY
t1.target_date,
t2.type
ORDER BY
t1.target_date,
t2.type;
The idea here is to cross join subqueries finding all distinct target dates and types, to generate a starting point for the query. Then, we left join this intermediate table to your actual table, and find the counts for each date and type.
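The zero-filling effect of the cross join is easy to see on a small data set. The sketch below runs the same query shape against an in-memory SQLite database through Python's sqlite3 module; the sample dates and types are invented, and SQLite stands in for Postgres purely for illustration.

```python
import sqlite3


def counts_with_zeros(rows):
    """Count rows per (target_date, type), emitting 0 for missing pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (target_date TEXT, type TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    query = """
        SELECT dates.target_date, types.type, COUNT(t.target_date) AS cnt
        FROM (SELECT DISTINCT target_date FROM table_name) AS dates
        CROSS JOIN (SELECT DISTINCT type FROM table_name) AS types
        LEFT JOIN table_name AS t
          ON t.target_date = dates.target_date AND t.type = types.type
        GROUP BY dates.target_date, types.type
        ORDER BY dates.target_date, types.type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample: type2 only occurs on the middle date.
filled = counts_with_zeros([
    ("2020-10-01", "type1"),
    ("2020-10-01", "type1"),
    ("2020-10-02", "type1"),
    ("2020-10-02", "type2"),
    ("2020-10-03", "type1"),
])
for row in filled:
    print(row)
```

COUNT(t.target_date) counts only the rows that the LEFT JOIN actually matched, which is why unmatched combinations come out as 0 rather than 1.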
select t.target_date, tmp.type, sum(case when t.type = tmp.type then 1 else 0 end)
from your_table t
cross join (select distinct type from your_table) tmp
group by t.target_date, tmp.type

SQL Server Join - With INFO_SCHEMA information

I have the first table:
select COLUMN_NAME
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
Don't mind the numbering in the Column_Name. I added it while testing because I need the order to remain as it is in the table, not sorted ASC or DESC.
Anyhow, I don't know how to use the row numbers on the left that the system provides to JOIN another table without a condition.
Here is Table 2:
You can see that the row numbers on the left are my linking value, but I don't know how to use that system index value as a condition in my JOIN.
Alternatively, if there is another way to join these two tables without a condition while keeping the Table 1 information in its correct position, unaffected by ORDER BY, that would be much appreciated.
Thank you!
-Chase
I guess you are looking for row_number. Use row_number to number the results of the two queries, then join on the matching row numbers. Your query would be something like:
with query_1 as (
select COLUMN_NAME
, rn = row_number() over (order by cast(left(COLUMN_NAME, 3) as int))
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
)
, query_2 as (
select
*, rn = row_number() over (order by (select null))
from
Table_2
)
select
*
from
query_1 q1
join query_2 q2 on q1.rn = q2.rn
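The same "number both sides, then join on the number" idea can be sketched outside SQL Server. The example below uses Python's sqlite3 module; the tables first_list and second_list, their columns, and the sample values are all hypothetical stand-ins for the INFORMATION_SCHEMA query and Table_2, and an SQLite build with window functions (3.25+) is assumed.

```python
import sqlite3


def pair_by_position(names, items):
    """Join two tables row-by-row using ROW_NUMBER() as the link."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY ids record insertion order for each table.
    conn.execute("CREATE TABLE first_list (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE second_list (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO first_list (name) VALUES (?)", [(n,) for n in names])
    conn.executemany("INSERT INTO second_list (item) VALUES (?)", [(i,) for i in items])
    query = """
        WITH query_1 AS (
            SELECT name, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM first_list
        ),
        query_2 AS (
            SELECT item, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM second_list
        )
        SELECT q1.name, q2.item
        FROM query_1 q1
        JOIN query_2 q2 ON q1.rn = q2.rn
        ORDER BY q1.rn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented column names and prices, paired purely by position.
pairs = pair_by_position(["Client", "Region", "Price"], ["Acme", "West", "9.99"])
print(pairs)
```

Unlike the `order by (select null)` trick in the answer, this sketch orders by an explicit id column, so the pairing is deterministic.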
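Alternatively, if each column name starts with its number followed by a hyphen, join on that numeric prefix directly:
select COLUMN_NAME from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
inner join Table_2 on Num = cast(LEFT(COLUMN_NAME, CHARINDEX('-', COLUMN_NAME) - 1) as int)
where TABLE_NAME = N'tbl_Client_List_Pricing'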
You could also use the sys.all_columns catalog view, which exposes each column's position as column_id, and JOIN it with Table2:
SELECT *
FROM sys.all_columns c
INNER JOIN Table2 t ON t.Num = c.column_id
WHERE OBJECT_NAME(object_id) = 'tbl_Client_List_Pricing'

Written a subquery that can return more than one field without using the Exists

The query below is supposed to pull records for fields with the max date.
I am getting an error
You have written a subquery that can return more than one field without using EXISTS reserved word in the Main query's FROM clause. Revise the SELECT statement of the subquery to request only one column.
Code:
SELECT *
FROM TableName
WHERE (((([Project_Name], [Date])) IN (SELECT Project_Name, MAX(Date)
FROM TableName
GROUP BY Project)));
You're probably thinking of a nested subquery used as a table, like the one below:
select a.*, b.col1, b.col2
from FirstTable A
join (select Id, firstcolumn as col1, secondcolumn as col2
from SecondTable) B on b.ID = a.ID
Works pretty much like a regular join except you are using a subquery. Hope that helps,
SELECT A.*
FROM TableName A
INNER JOIN (select Project_Name, max(Date) MaxDate
from TableName
group by Project_Name) B
ON A.[Project_Name] = B.[Project_Name]
AND A.[Date] = B.MaxDate
A version using EXISTS() looks like this:
SELECT *
FROM TableName AS A
WHERE EXISTS(
SELECT * FROM (
SELECT B.Project_Name, MAX( B.Date ) AS MaxDate
FROM TableName AS B
GROUP BY B.Project_Name ) AS C
WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A.Date
);
Although I have the feeling this will have poorer performance than a JOIN because the GROUP BY statement might have to be executed for each record and each call to the EXISTS() function...
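For a quick check of the "latest row per project" logic, here is a small sketch of the JOIN variant run through Python's sqlite3 module. The Status column and the sample rows are assumptions added for illustration, dates are stored as ISO strings so that MAX compares them correctly, and SQLite stands in for the database that produced the original error message.

```python
import sqlite3


def latest_rows(rows):
    """Return, for each project, the row carrying its maximum date."""
    conn = sqlite3.connect(":memory:")
    # Status is a hypothetical extra column to show that whole rows come back.
    conn.execute('CREATE TABLE TableName (Project_Name TEXT, "Date" TEXT, Status TEXT)')
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    query = """
        SELECT A.Project_Name, A."Date", A.Status
        FROM TableName AS A
        INNER JOIN (
            SELECT Project_Name, MAX("Date") AS MaxDate
            FROM TableName
            GROUP BY Project_Name
        ) AS B
          ON A.Project_Name = B.Project_Name
         AND A."Date" = B.MaxDate
        ORDER BY A.Project_Name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample: Alpha has two dated rows, Beta has one.
latest = latest_rows([
    ("Alpha", "2023-01-05", "draft"),
    ("Alpha", "2023-03-10", "final"),
    ("Beta", "2023-02-01", "draft"),
])
print(latest)
```

The derived table is computed once and joined, so each project contributes exactly the rows that share its maximum date (more than one if there are ties).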

Select rows having distinct values for two fields

Pardon me for the title. I have a table like this:
There will be thousands of rows, and I want to select the rows that share the same group_id but whose vr_debit and vr_credit values are not equal; in the image shown, none of the rows satisfy this criterion. If there are two rows, say (6, 500.000, 0) and (6, 0, 600.000), I want them in the result. Hope you get the idea.
Thank you.
Calculate each group using SUM() which is an aggregate function and filter them using HAVING clause.
SELECT GROUP_ID, SUM(vr_debit) totalDebit, SUM(vr_credit) totalCredit
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
If you want the individual (unaggregated) rows, you can join the table to the subquery:
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT GROUP_ID
FROM TableName
GROUP BY GROUP_ID
HAVING SUM(vr_debit) <> SUM(vr_credit)
) b ON a.GROUP_ID = b.GROUP_ID
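As a rough check of the HAVING-based filter, the sketch below loads a balanced group and an unbalanced one into an in-memory SQLite database via Python's sqlite3 module. Group 6 reuses the figures from the question; group 5 is invented so that there is a balanced group to exclude.

```python
import sqlite3


def unbalanced_rows(rows):
    """Return rows from groups whose total debit differs from total credit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (GROUP_ID INTEGER, vr_debit REAL, vr_credit REAL)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.GROUP_ID, a.vr_debit, a.vr_credit
        FROM TableName a
        INNER JOIN (
            SELECT GROUP_ID
            FROM TableName
            GROUP BY GROUP_ID
            HAVING SUM(vr_debit) <> SUM(vr_credit)
        ) b ON a.GROUP_ID = b.GROUP_ID
        ORDER BY a.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


mismatched = unbalanced_rows([
    (5, 100.0, 0.0),   # invented balanced group: debit 100, credit 100
    (5, 0.0, 100.0),
    (6, 500.0, 0.0),   # from the question: debit 500 vs credit 600
    (6, 0.0, 600.0),
])
print(mismatched)
```

Both rows of group 6 are returned, while group 5 is filtered out because its totals match.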
Perhaps:
SELECT group_ID,
vr_debit,
vr_credit
FROM
dbo.TableName T1
WHERE
EXISTS(
SELECT 1 FROM dbo.TableName T2
WHERE T1.group_ID = T2.group_ID
AND T1.vr_debit <> T2.vr_debit
AND T1.vr_credit<> T2.vr_credit
AND T1.vr_debit <> T2.vr_credit
)
You can also use this option:
SELECT *
FROM dbo.test64 t
WHERE EXISTS (
SELECT 1
FROM dbo.test64 t2
WHERE t.group_id = t2.group_id
HAVING SUM(t2.vr_debit) - SUM(t2.vr_credit) != 0
)

Using Multiple Columns in a SQL Subquery

My setup is I have two tables, Study and Activity_History. Activities run on studies so there is a 1:many relationship.
I want to be able to run a SQL query on an Activity_History table which will get me the activity and the previously run activity. I currently have this:
SELECT
*
FROM Activity_History AS A1
LEFT JOIN Activity_History AS A2
ON A2.Parent_Study_ID =
(
SELECT TOP 1 Parent_Study_ID
FROM Activity_History AS A3
WHERE A3.Parent_Study_ID = A1.Parent_Study_ID
AND A3.Activity_Date < A1.Activity_Date
ORDER BY Activity_Date DESC
)
This is not working. What's happening is that the Activity_Date part of the query has no effect, and it just returns the first matching Activity_Date in descending date order for every row. I think this is happening because my subquery uses Activity_Date in the WHERE clause, but it is not in the subquery's SELECT list.
Thanks for any help!
I'm assuming you're using SQL Server? If so, then this should work using ROW_NUMBER():
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Parent_Study_Id ORDER BY Activity_Date ) RN
FROM Activity_History
)
SELECT *
FROM CTE T1
LEFT JOIN CTE T2 ON T1.RN = T2.RN+1 AND T1.Parent_Study_Id = T2.Parent_Study_Id
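To see how the ROW_NUMBER() self-join lines each activity up with the one before it, here is a minimal sketch using Python's sqlite3 module instead of SQL Server. The Activity_Name column and the sample rows are hypothetical, and an SQLite build with window functions (3.25+) is assumed.

```python
import sqlite3


def with_previous_activity(rows):
    """Pair each activity with the previous one in the same study."""
    conn = sqlite3.connect(":memory:")
    # Activity_Name is an invented column used only to label the rows.
    conn.execute(
        "CREATE TABLE Activity_History "
        "(Parent_Study_ID INTEGER, Activity_Name TEXT, Activity_Date TEXT)"
    )
    conn.executemany("INSERT INTO Activity_History VALUES (?, ?, ?)", rows)
    query = """
        WITH CTE AS (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY Parent_Study_ID ORDER BY Activity_Date
                   ) AS RN
            FROM Activity_History
        )
        SELECT T1.Parent_Study_ID, T1.Activity_Name, T2.Activity_Name
        FROM CTE T1
        LEFT JOIN CTE T2
          ON T1.RN = T2.RN + 1
         AND T1.Parent_Study_ID = T2.Parent_Study_ID
        ORDER BY T1.Parent_Study_ID, T1.RN
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Invented sample: study 1 has three activities, study 2 has one.
history = with_previous_activity([
    (1, "screen", "2021-01-01"),
    (1, "enroll", "2021-02-01"),
    (1, "followup", "2021-03-01"),
    (2, "screen", "2021-01-15"),
])
for row in history:
    print(row)
```

The first activity of each study has no predecessor, so the LEFT JOIN leaves its previous-activity column as NULL (None in Python).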
In SQL Server 2005+, you can use OUTER APPLY instead of LEFT JOIN:
SELECT *
FROM Activity_History AS A1 OUTER APPLY (
SELECT TOP 1 Parent_Study_ID
FROM Activity_History AS A2
WHERE A2.Parent_Study_ID = A1.Parent_Study_ID
AND A2.Activity_Date < A1.Activity_Date
ORDER BY A2.Activity_Date DESC
) o