most optimal sql query for a simple table design


I'm trying to come up with the optimal query to solve this problem.
I have a simple table made up of the columns name (string) and organization_id (int). This table contains a list of names that belong to one or more organizations.
How can I get a list of all the names that belong to the organizations that both "Jim" and "Andy" belong to?
Example:
- John,1
- Jim,1
- Jim,2
- Andy,2
- Carl,2
- Jim,3
- Carl,3
- Andy,4
- John,4
- Jim,5
- Randy,5
- Andy,5
So the query should return Jim,2|Andy,2|Carl,2|Jim,5|Randy,5|Andy,5, because both Jim and Andy belong to organizations 2 and 5.
Any ideas?

A straightforward JOIN should do it:
SELECT DISTINCT t1.name
FROM Table1 t1
JOIN Table1 t2 ON t1.organization_id = t2.organization_id AND t2.name = 'Jim'
JOIN Table1 t3 ON t1.organization_id = t3.organization_id AND t3.name = 'Andy'
ORDER BY t1.name
An SQLfiddle to test with.
EDIT: An Oracle SQLfiddle with the same query.

To get the organizations that both "Jim" and "Andy" belong to, I like to use aggregation:
select organization_id
from t
group by organization_id
having sum(case when name = 'Jim' then 1 else 0 end) > 0 and
sum(case when name = 'Andy' then 1 else 0 end) > 0
You can then get all the people in these organizations using:
select *
from t
where organization_id in (select organization_id
from t
group by organization_id
having sum(case when name = 'Jim' then 1 else 0 end) > 0 and
sum(case when name = 'Andy' then 1 else 0 end) > 0
)
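As a quick check of the aggregation approach, here is a minimal sketch that runs it against the sample data from the question using Python's built-in sqlite3 module. The table name membership is an assumption made for the demo; the column names come from the question.

```python
import sqlite3

# In-memory database; "membership" is a hypothetical table name for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (name TEXT, organization_id INTEGER)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?)",
    [
        ("John", 1), ("Jim", 1), ("Jim", 2), ("Andy", 2), ("Carl", 2),
        ("Jim", 3), ("Carl", 3), ("Andy", 4), ("John", 4),
        ("Jim", 5), ("Randy", 5), ("Andy", 5),
    ],
)

# Inner query: organizations containing at least one 'Jim' row and one 'Andy' row.
# Outer query: every member of those organizations.
query = """
SELECT name, organization_id
FROM membership
WHERE organization_id IN (
    SELECT organization_id
    FROM membership
    GROUP BY organization_id
    HAVING SUM(CASE WHEN name = 'Jim' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN name = 'Andy' THEN 1 ELSE 0 END) > 0
)
ORDER BY organization_id, name
"""
result = conn.execute(query).fetchall()
print(result)
# [('Andy', 2), ('Carl', 2), ('Jim', 2), ('Andy', 5), ('Jim', 5), ('Randy', 5)]
```

Only organizations 2 and 5 survive the HAVING filter, which matches the expected output in the question apart from row order.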

Related

SQL - Get per column count of differences when comparing two tables

I have 2 similar tables as shown below with minor difference between some cells
Table A
Roll_ID | FirstName | LastName | Age
------------------------------------
1 | AAA | XXX | 31
2 | BBB | YYY | 32
3 | CCC | ZZZ | 33
Table B
Roll_ID | FirstName | LastName | Age
------------------------------------
1 | AAA | XXX | 35
2 | PPP | YYY | 36
3 | QQQ | WWW | 37
I would like to get an output that shows the count of different records on a per-column level.
For example the output of the query for the above scenario should be
Output
Roll_ID | FirstName | LastName | Age
------------------------------------
0 | 2 | 1 | 3
For this question we can assume that there will always be one column which will have non-null unique values (or one column which may be primary key). In above example Roll_ID is such a column.
My question is: What would be the most efficient way to get such an output? Is there anything to keep in mind when running such query for tables that may have millions of records from point of view of efficiency?

First you have to join the tables
SELECT *
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
Now just add the counts
SELECT
SUM(CASE WHEN table1.FirstName <> table2.FirstName THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN table1.LastName <> table2.LastName THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN table1.Age <> table2.Age THEN 1 ELSE 0 END) as AgeDiff
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
If an id that is missing from one of the tables should count as "different", then you would need something like this:
SELECT
SUM(CASE WHEN COALESCE(table1.FirstName,'x') <> COALESCE(table2.FirstName,'y') THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN COALESCE(table1.LastName,'x') <> COALESCE(table2.LastName,'y') THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN COALESCE(table1.Age,-1) <> COALESCE(table2.Age,-2) THEN 1 ELSE 0 END) as AgeDiff
FROM ( SELECT table1.Roll_id FROM table1
UNION
SELECT table2.Roll_id FROM table2
) base
LEFT JOIN table1 on table1.ROLL_ID = base.ROLL_ID
LEFT JOIN table2 on table2.ROLL_ID = base.ROLL_ID
Here we get all the roll_ids and then left join back to the tables. This is much better than a cross join if the roll_id column is indexed.
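To see the inner-join version produce the numbers from the question, here is a small sketch using Python's sqlite3 module. The table names table_a and table_b are assumptions for the demo; the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; columns and rows follow the question's example.
for table in ("table_a", "table_b"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(Roll_ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Age INTEGER)"
    )
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 31), (2, "BBB", "YYY", 32), (3, "CCC", "ZZZ", 33)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 35), (2, "PPP", "YYY", 36), (3, "QQQ", "WWW", 37)],
)

# One pass over the joined rows; each SUM counts mismatches for one column.
query = """
SELECT
    SUM(CASE WHEN a.FirstName <> b.FirstName THEN 1 ELSE 0 END) AS FirstNameDiff,
    SUM(CASE WHEN a.LastName  <> b.LastName  THEN 1 ELSE 0 END) AS LastNameDiff,
    SUM(CASE WHEN a.Age       <> b.Age       THEN 1 ELSE 0 END) AS AgeDiff
FROM table_a a
JOIN table_b b ON a.Roll_ID = b.Roll_ID
"""
diffs = conn.execute(query).fetchone()
print(diffs)  # (2, 1, 3)
```

Note that a plain <> comparison yields NULL when either side is NULL, so such rows fall into the ELSE branch and are not counted; the COALESCE variant above is one way to handle that case.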

SELECT SUM(IIF(ISNULL(A.FirstName, '') <> ISNULL(B.FirstName, ''), 1, 0)) AS FirstNameRecordDiff,
SUM(IIF(ISNULL(A.LastName, '') <> ISNULL(B.LastName, ''), 1, 0)) AS LastNameRecordDiff,
SUM(IIF(ISNULL(A.Age, 0) <> ISNULL(B.Age, 0), 1, 0)) AS AgeRecordDiff
FROM A
FULL OUTER JOIN B
ON B.Roll_ID = A.Roll_ID;
This query intentionally allows nulls to equal, assuming that a lack of data would mean the same thing to the end user.
As written, it would only work on SQL Server. To use it for MySQL or Oracle, the query would vary.

Sum a column and perform more calculations on the result? [duplicate]

In my query below I am counting occurrences in a table based on the Status column. I also want to perform calculations based on the counts I am returning. For example, let's say I want to add 100 to the Snoozed value... how do I do this? Below is what I thought would do it:
SELECT
pu.ID Id, pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed,
Snoozed + 100 AS Test
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name
I get this error:
Invalid column name 'Snoozed'.
How can I take the value of the previous SUM statement, add 100 to it, and return it as another column? What I was aiming for is an additional column labeled Test that has the Snooze count + 100.

You can't use one column to create another column in the way that you are attempting. You have 2 options:
Do the full calculation (as @forpas has mentioned in the comments above).
Use a temp table or table variable to store the data. This way you can get the first 5 columns and then add the last column, or you can select from the temp table and do the last column's calculation from there.

You can not use an alias as a column reference in the same query. The correct script is:
SELECT
pu.ID Id, pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END)+100 AS Snoozed
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name

MSSQL does not allow you to reference fields (or aliases) in the SELECT statement from within the same SELECT statement.
To work around this:
Use a CTE. Define the columns you want to select from in the CTE, and then select from them outside the CTE.
;WITH OurCte AS (
SELECT
5 + 5 - 3 AS OurInitialValue
)
SELECT
OurInitialValue / 2 AS OurFinalValue
FROM OurCte
Use a temp table. This is very similar in functionality to using a CTE, however, it does have different performance implications.
SELECT
5 + 5 - 3 AS OurInitialValue
INTO #OurTempTable
SELECT
OurInitialValue / 2 AS OurFinalValue
FROM #OurTempTable
Use a subquery. This tends to be more difficult to read than the above. I'm not certain what the advantage is to this - maybe someone in the comments can enlighten me.
SELECT
OurInitialValue / 2 AS OurFinalValue
FROM (
SELECT
5 + 5 - 3 AS OurInitialValue
) OurSubquery
Embed your calculations. (Opinion warning:) This is really sloppy and not a great approach, as you end up having to duplicate code and can easily throw columns out of sync if you update the calculation in one location and not the other.
SELECT
5 + 5 - 3 AS OurInitialValue
, (5 + 5 - 3) / 2 AS OurFinalValue

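You can't use a column alias in the same SELECT. Column aliases have no precedence or sequence among themselves; they are all assigned when the SELECT list is evaluated, which logically happens after GROUP BY and just before ORDER BY. That is why ORDER BY can use an alias but another expression in the same SELECT list cannot.
You must repeat the expression: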
SELECT
pu.ID Id,pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END)+ 100 AS Test
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name
If you don't want to repeat the code, use a subquery
SELECT
ID, Name, LeadCount, Working, Uninterested,Converted, Snoozed, Snoozed +100 AS test
FROM
(SELECT
pu.ID Id,pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed
FROM Prospects p
INNER JOIN ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE p.Store = '108'
GROUP BY pu.Name, pu.Id) t
ORDER BY Name
or a view
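The "compute once, reuse in an outer query" idea can be sketched as follows with Python's sqlite3 module. The leads table, its owner and snoozed_id columns, and the sample rows are all hypothetical stand-ins for the Prospects/ProspectsUsers join in the question; the derived-table and CTE forms are shown side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the joined Prospects data.
conn.execute("CREATE TABLE leads (owner TEXT, snoozed_id INTEGER)")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [("ann", 0), ("ann", 5), ("ann", 7), ("bob", 0)],
)

# Derived table: the alias "snoozed" is defined inside, then reused outside.
derived_query = """
SELECT owner, snoozed, snoozed + 100 AS test
FROM (SELECT owner,
             SUM(CASE WHEN snoozed_id > 0 THEN 1 ELSE 0 END) AS snoozed
      FROM leads
      GROUP BY owner) t
ORDER BY owner
"""

# Same idea written as a CTE.
cte_query = """
WITH counts AS (
    SELECT owner,
           SUM(CASE WHEN snoozed_id > 0 THEN 1 ELSE 0 END) AS snoozed
    FROM leads
    GROUP BY owner
)
SELECT owner, snoozed, snoozed + 100 AS test
FROM counts
ORDER BY owner
"""

derived_rows = conn.execute(derived_query).fetchall()
cte_rows = conn.execute(cte_query).fetchall()
print(derived_rows)  # [('ann', 2, 102), ('bob', 0, 100)]
```

Both forms return the same rows; the choice between them is mostly a matter of readability.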

SQL check link between two records (CASE, WHEN, THEN)

I have some records in a table:
ID Services
2 A
2 C
2 C1
2 D2
I'm trying to write a query that checks for a link between services.
For example: if service C exists for ID 2, then check whether service C1 also exists, and return Yes or No.
SELECT a. ID, a.service,
CASE
WHEN (a.service ='C') = (a.service = 'C1') THEN 'Yes'
ELSE 'No'
END
FROM t1 a

Try this query:
SELECT *
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE (t2.Services LIKE t1.Services + '%' OR
t1.Services LIKE t2.Services + '%') AND
t1.ID = t2.ID AND t1.Services <> t2.Services);
This returns A and D2 only.

Hmm... what about this? But now I have a problem checking the relationship for each ID independently...
SELECT a. ID, a.service,
CASE
WHEN a.service IN ('C','C1') THEN 'Yes'
ELSE 'No'
END
FROM t1 a

If I understand correctly, you can use aggregation:
SELECT ID,
(CASE WHEN SUM(CASE WHEN service = 'C' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN service = 'C1' THEN 1 ELSE 0 END) > 0
THEN 'Yes' ELSE 'No'
END) as c_c1_flag
FROM t1
GROUP BY ID;
The SUM(CASE . . . ) counts the number of rows that match the conditions. The > 0 simply says that at least one row exists.
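Here is a minimal sketch of that per-ID flag, run through Python's sqlite3 module. The rows for ID 2 come from the question; the rows for ID 3 are an assumption added only to show a 'No' result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, service TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        (2, "A"), (2, "C"), (2, "C1"), (2, "D2"),  # from the question
        (3, "A"), (3, "C"),                        # hypothetical: C without C1
    ],
)

# One row per ID: 'Yes' only when both a C row and a C1 row exist for that ID.
query = """
SELECT ID,
       CASE WHEN SUM(CASE WHEN service = 'C'  THEN 1 ELSE 0 END) > 0
             AND SUM(CASE WHEN service = 'C1' THEN 1 ELSE 0 END) > 0
            THEN 'Yes' ELSE 'No'
       END AS c_c1_flag
FROM t1
GROUP BY ID
ORDER BY ID
"""
flags = conn.execute(query).fetchall()
print(flags)  # [(2, 'Yes'), (3, 'No')]
```

Because the check is done per group, each ID is evaluated independently, which is the part the row-by-row CASE attempts could not do.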

sql Select id that matches combination of column

I have a token table
id | status
------------
1 | taken
1 | used
1 | deleted
2 | taken
2 | deleted
3 | taken
I need to count how many tokens are used (in use or used).
If a token is taken and deleted without being used then it should not be counted.
So the SQL would be something like
SELECT count(*) if the id's status is not (taken & deleted)
The desired number of used token in above example is 2 as
id 1 has been taken used and deleted -> count it
id 3 has been taken -> count it
id 2 has been taken and deleted without being used -> do not count it

A little bit verbose but efficient and still readable and maintainable:
SELECT COUNT(DISTINCT id)
FROM dbo.Token t
WHERE EXISTS
(
SELECT 1 FROM dbo.Token t1
WHERE t.id = t1.id
AND t1.status = 'used'
)
OR
(
EXISTS(
SELECT 1 FROM dbo.Token t1
WHERE t.id = t1.id
AND t1.status = 'taken'
)
AND NOT EXISTS(
SELECT 1 FROM dbo.Token t1
WHERE t.id = t1.id
AND t1.status = 'deleted'
)
)

Use aggregation and a having clause to get the list of eligible ids:
SELECT id
FROM token t
GROUP BY id
HAVING SUM(case when status = 'used' then 1 else 0 end) > 0 or
SUM(case when status = 'deleted' then 1 else 0 end) = 0;
To get the count, use a subquery or CTE:
SELECT COUNT(*)
FROM (SELECT id
FROM token t
GROUP BY id
HAVING SUM(case when status = 'used' then 1 else 0 end) > 0 or
SUM(case when status = 'deleted' then 1 else 0 end) = 0
) t
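The rule from the question (count an id if it was ever used, or if it was never deleted) can be checked against the sample data with a short sqlite3 sketch. The HAVING condition here is written directly from that rule and is an assumption about the intended logic, not a quote of any answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO token VALUES (?, ?)",
    [
        (1, "taken"), (1, "used"), (1, "deleted"),
        (2, "taken"), (2, "deleted"),
        (3, "taken"),
    ],
)

# Eligible ids: at least one 'used' row, or no 'deleted' row at all.
eligible_query = """
SELECT id
FROM token
GROUP BY id
HAVING SUM(CASE WHEN status = 'used'    THEN 1 ELSE 0 END) > 0
    OR SUM(CASE WHEN status = 'deleted' THEN 1 ELSE 0 END) = 0
ORDER BY id
"""
eligible_ids = [row[0] for row in conn.execute(eligible_query)]

# Wrap the same query in a derived table to get the count.
used_count = conn.execute(
    f"SELECT COUNT(*) FROM ({eligible_query}) t"
).fetchone()[0]

print(eligible_ids, used_count)  # [1, 3] 2
```

Id 2 is excluded because it has a 'deleted' row and no 'used' row, which gives the desired count of 2.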

Try this:
SELECT SUM(CASE WHEN (CHARINDEX('used', data.status) > 0) OR (data.status = 'taken') THEN 1 ELSE 0 END) as [count]
FROM
(
SELECT DISTINCT id, (SELECT STUFF((SELECT Distinct ',' + status
FROM token a
WHERE a.id = b.id
FOR XML PATH (''))
, 1, 1, '')) as status
FROM token b
) data

You need to take all three conditions into account, so a naive approach would be to compare each of the three with a CASE expression:
WITH grouped as
(
select id from #uses group by id
)
select grouped.id,
used =
CASE WHEN used.id is not null THEN 'YES'
WHEN taken.id is not null and deleted.id is null THEN 'YES'
ELSE 'NO'
END
from grouped
left join #uses taken on grouped.id = taken.id
and taken.use_status = 'taken'
left join #uses used on grouped.id = used.id
and used.use_status = 'used'
left join #uses deleted on grouped.id = deleted.id
and deleted.use_status = 'deleted'
The CASE expression stops at the first condition that is met, so you only need two WHENs and an ELSE to cover the conditions.
This is a naive approach, though, and assumes that you only ever have one row per id and use status type. You'd have to do some additional work if that wasn't the case.

if token has been taken and used -> do not count it
SELECT
SUM(DECODE(status, 'taken', 1, 0)) +
SUM(DECODE(status, 'used', 1, 0)) -
SUM(DECODE(status, 'deleted', 1, 0))
FROM
token t
WHERE
status <> 'used' OR
EXISTS(SELECT 1 FROM token t2 WHERE t2.id = t.id and t2.status = 'deleted')
if token has been taken and used -> count it
SELECT
COUNT(1)
FROM
token t
WHERE
status = 'taken' AND
(
EXISTS(SELECT 1 FROM token t2 WHERE t2.id = t.id and t2.status = 'used') OR
NOT EXISTS(SELECT 1 FROM token t2 WHERE t2.id = t.id and t2.status = 'deleted')
)

Coming back to this question, one solution could be with using Pivot
SELECT COUNT(id)
FROM (
SELECT id, status FROM Token
) src
PIVOT
(
COUNT(status) FOR status IN ([taken], [used], [deleted])
) pvt
WHERE (taken = 1 AND deleted = 0)OR (used = 1)

SQL Count with multiple conditions then join

Quick one,
I have a table, with the following structure
id lid taken
1 1 0
1 1 0
1 1 1
1 1 1
1 2 1
Pretty simple so far, right?
I need to query the taken/available counts for the lid of 1, which should return
taken available
2 2
I know I can simply do two counts and join them, but is there a more efficient way of doing this than two separate queries?
I was looking at the following type of format, but I cannot for the life of me get it to execute in SQL...
SELECT
COUNT(case taken=1) AS taken,
COUNT(case taken=0) AS available FROM table
WHERE
lid=1
Thank you SO much.

You can do this:
SELECT taken, COUNT(*) AS count
FROM table
WHERE lid = 1
GROUP BY taken
This will return two rows:
taken count
0 2
1 2
Each count corresponds to how many times that particular taken value was seen.

Your query is close; it just needs juggling a bit:
SELECT
SUM(case taken WHEN 1 THEN 1 ELSE 0 END) AS taken,
SUM(case taken WHEN 1 THEN 0 ELSE 1 END) AS available FROM table
WHERE
lid=1
Alternatively you could do:
SELECT
SUM(taken) AS taken,
COUNT(id) - SUM(taken) AS available
FROM table
WHERE
lid=1

SELECT
SUM(case WHEN taken=1 THEN 1 ELSE 0 END) AS taken,
SUM(case WHEN taken=0 THEN 1 ELSE 0 END) AS available
FROM table
WHERE lid=1
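For reference, here is a small sketch of the conditional-aggregation query run on the question's sample rows via Python's sqlite3 module. The table is named slots here as an assumption, since table itself is a reserved word in most SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "slots" is a hypothetical name; columns and rows follow the question.
conn.execute("CREATE TABLE slots (id INTEGER, lid INTEGER, taken INTEGER)")
conn.executemany(
    "INSERT INTO slots VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1), (1, 2, 1)],
)

# A single scan yields both counts; each CASE turns a matching row into 1.
query = """
SELECT
    SUM(CASE WHEN taken = 1 THEN 1 ELSE 0 END) AS taken,
    SUM(CASE WHEN taken = 0 THEN 1 ELSE 0 END) AS available
FROM slots
WHERE lid = 1
"""
taken, available = conn.execute(query).fetchone()
print(taken, available)  # 2 2
```

The row with lid = 2 is filtered out by the WHERE clause before aggregation, so it does not affect either count.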

Weird application of CTEs:
WITH lid AS (
SELECT DISTINCT lid FROM taken
)
, tak AS (
SELECT lid,taken , COUNT(*) AS cnt
FROM taken t0
GROUP BY lid,taken
)
SELECT l.lid
, COALESCE(a0.cnt, 0) AS available
, COALESCE(a1.cnt, 0) AS taken
FROM lid l
LEFT JOIN tak a0 ON a0.lid=l.lid AND a0.taken = 0
LEFT JOIN tak a1 ON a1.lid=l.lid AND a1.taken = 1
WHERE l.lid=1
;
