Selecting only such groups that contain certain value - sql

First of all, even though this SQL: How do you select only groups that do not contain a certain value? thread is almost identical to my problem, it doesn't fully dissipate my confusion about the problem.
Let's have a table "Contacts" like this one:
+----------------------+
| Department FirstName |
+----------------------+
| 100 Thomas |
| 200 Peter |
| 100 Jerry |
+----------------------+
First, I want to group the rows by the department number and show number of rows in each displayed group. This, I believe, can be easily done by the following query.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
This outputs 2 groups. First with dep.no. 100 containing 2 rows, second with 200 containing only one row.
But then, I want to extend the query to exclude any group that doesn't contain certain value in certain column (e.g. Thomas in FirstName). Here are my questions:
1) Reading the above-mentioned thread I was able to come up with this, which seems to work correctly:
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = "Thomas")
GROUP BY Department
Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.
2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING NOT SUM(FirstName = "Thomas") = 0
Q: How does this work? Why alteration: HAVING SUM(FirstName = "Thomas") > 0 doesn't work?
3) Q: Is there any simple and correct way to do this using the HAVING clause?
I expected, that simple "HAVING FirstName='Thomas'" after the GROUP BY would do the trick as it seems to follow a common language, but it does not.
Note that I want the whole groups to be chosen by the query so "WHERE FirstName='Thomas'" isn't s solution for my problem as it excludes all the rows that don't satisfy the condition before the grouping takes place (at least the way I understand it).

Q: How does this work? I understand the "WHERE Department IN" part,
but then I'd expect a value, but instead another nested query is
included, which to me doesn't make much sense as I'm only beginner
with SQL.
The nested query returns values which are used to match against Department
2) By accident I was able to come up with another query that also
seems to work, but feels weird and I also don't understand its
workings.
HAVING NOT SUM(FirstName = "Thomas") = 0
"Feels weird" because, well, it is. This is not a place for the SUM function.
EDIT: Why does this work?
The expression FirstName = "Thomas" gets evaluated as true or false (known as a Boolean expression). True numerically is equal to 1 and False converts to 0 (zero). By including SUM you then calculated the totals so really zero (still) means false and "not zero" is true. Then to make it weird(er) you included NOT which negated the whole thing and it becomes NOT TRUE = 0 or FALSE = FALSE (which is of course... TRUE)!!
EDIT: I think what could be more helpful to you is consideration of when to use WHERE and when to use HAVING (instead of the Boolean magic taking place).
From this answer:
WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows.
WHERE was appropriate for your example because first you want to "only return rows WHERE Department IN (100)" and then you want to "group those rows by Department" and get a COUNT of how many rows had been selected.

Related

COUNT Function: Result of a query

I'm having trouble with a part of the following question. Thank you in advance for your help. I have a hard time visualizing this "fake" database table. I was hoping someone could help me run through my logic and see if it's correct. If someone could just point me in the right direction that would be great!
About:
Sesame is a way to find online class for adults & activities for adults around you.
Imagine a database table named activities. It has four columns:
activity_id [int, non null]
activity_provider_id [int, non null]
area_id [int, nullable]
starts_at [timestamp, non null]
Question: Given the following query, which counts would you expect to return the highest and lowest values? Which counts would you expect to be the same? Why?
select
count(activity_id),
count(distinct activity_provider_id),
count(area_id),
count(distinct area_id),
count(*)

 from activities
My Solution
Highest values: count(*)
Reasoning: The Count(*) function returns the number of rows returned by a SELECT statement, including NULL and duplicates.
Lowest values: count(distinct activity_provider_id)
Reasoning: Less activity providers per activity per area*
Same: Unsure - Could someone just point me in the right direction?
count(*) takes in account all rows in the table, while count(some_col) only counts non-null values of some_col.
Since activity_id is a non nullable colum, one would expect the following expressions return the same, "highest" count:
count(activity_id)
count(*)
As for wich expression returns the lowest count out of the three remaining choices, it is not really possible to tell for sure from the information provided in the question. If actually depends whether that are more, or less, distinct areas than activity providers.
There even is an edge case where all expressions return the same case, if all activity providers (resp. areas) are not null and unique in the table.

Trouble getting newest record from grouped records

I'm pretty new to Access so I'm sure this is something simple. I'm not sure I even have the best subject.
I have an Owner and a Names table that contain data like this:
Owner Names
TMKFK NID ... NIDFK Last ModDate
7721011 45 45 Smith 1/18/15
7721011 137 137 Jones 2/1/15
7721012 45 45 Smith 1/18/15
I am trying to query them so that I get the TMKFK for the latest timestamped record in the Name table. This is used for a lookup from a form. So if I lookup Smi* I expect to get 7721012.
After a bunch of looking around on this site and elsewhere and looking at partition over I concluded the answer had to be using a subquery, but I can't quite figure out what to put where. This is where I got stuck:
SELECT Owner.TMKFK
FROM Owner INNER JOIN Names ON Owner.NID = Names.NIDFK
GROUP BY Owner.TMKFK, [Owner Name].Last, [Owner Name].M
WHERE (Owner.TMKFK=7721011 Or Owner.TMKFK=7721012)
AND Names.Last Like "Smith"
AND Names.ModDate=(SELECT Max(Names.ModDate) FROM Names);
This fails because the subquery returns the Max date from the entire table and not just the two records with the same TMKFK. A HAVING clause doesn't seem to make a difference. Re-ordering the fields in group by didn't make a difference.
The subquery to get the max date would need to be restricted to the owner in question. Something along these lines:
SELECT Owner.TMKFK
FROM Owner INNER JOIN Names ON Owner.NID = Names.NIDFK
WHERE (Owner.TMKFK=7721011 Or Owner.TMKFK=7721012)
AND Names.Last Like 'Smith%'
AND Names.ModDate=(SELECT Max(Names.ModDate)
FROM Names
WHERE NIDFK = Owner.NID
)
Don't think you need the GROUP BY. Don't know the Access syntax, but LIKE usually implies wildcards like % and the string should be single quoted. And if you want case-insensitive searching:
AND UPPER(Names.Last) LIKE UPPER('Smith%')

Missing records if add "NOT IN"

I have a table of doctor names and states.
f_name | l_name | state
MICHAEL | CRANE |
HAL | CRANE | MD
THOMAS | ROMINA | DE
And so on.
What I want is to get all doctors that are NOT in MD. However, if I write this expression I'm missing those with NULL values for state.
SELECT *
FROM doctors
WHERE state NOT IN ('MD')
I don't understand the issue. I was able to fix it by adding
OR state IS NULL
Obviously it has something to due with NOT IN (or IN) not handling NULL. Can anyone explain this for me? Is there an alternative for what I was trying to do?
Thanks
Yes, there is an alternative - you would use the NVL() function (or COALESCE() if you want to stick to the ANSI standard):
SELECT * FROM doctors
WHERE NVL(state, '##') NOT IN ('MD')
However you don't really need to use NOT IN here - it's only necessary when you have multiple values, e.g.:
SELECT * FROM doctors
WHERE NVL(state, '##') NOT IN ('MD','PA')
With one value you can just use = (or in this case, != or <>):
SELECT * FROM doctors
WHERE NVL(state, '##') != 'MD'
In Oracle SQL, NULL can't be compared to other values (not even other NULLs). So WHERE NULL = NULL, for example, will return zero rows. You do NULL comparisons with IS NULL and IS NOT NULL.
As noted already, you don't know that Michael Crane's state isn't Maryland. It's NULL, which can be read as representing "don't know". It might be Maryland, or it might not be. NOT IN ('MD') only finds those values known not to be 'MD'.
If you have a filter WHERE x, you can use MINUS to find exactly those records where x is not true (where x is either false or unknown).
select *
from doctors
minus
select *
from doctors
where state in ('MD');
This has one big advantage over anything involving IS NULL or NVL: it's immediately obvious exactly which records you don't want to see. You don't have to worry about accidentally missing one case where NULL isn't covered in your condition, and you don't have to worry about records that happen to match whatever dummy value you use with NVL.
It's generally not good for performance on Oracle, accessing the table twice, but for one-off queries, depending on the table size, the time saved writing the query can be more than the added execution time.
Inside database, null are not physical string values("null"), it simply says no value. So, if you compare NULLs to anything, it will not be equal or not equal. Even, two NULL's are not equal. You can only check whetherr a value is NULL or not but you can't compare it to other values.

MS SQL 2000 - How to efficiently walk through a set of previous records and process them in groups. Large table

I'd like to consult one thing. I have table in DB. It has 2 columns and looks like this:
Name...bilance
Jane...+3
Jane...-5
Jane...0
Jane...-8
Jane...-2
Paul...-1
Paul...2
Paul....9
Paul...1
...
I have to walk through this table and if I find record with different "name" (than was on previous row) I process all rows with the previous "name". (If I step on the first Paul row I process all Jane rows)
The processing goes like this:
Now I work only with Jane records and walk through them one by one. On each record I stop and compare it with all previous Jane rows one by one.
The task is to sumarize "bilance" column (in the scope of actual person) if they have different signs
Summary:
I loop through this table in 3 levels paralelly (nested loops)
1st level = search for changes of "name" column
2nd level = if change was found, get all rows with previous "name" and walk through them
3rd level = on each row stop and walk through all previous rows with current "name"
Can this be solved only using CURSOR and FETCHING, or is there some smoother solution?
My real table has 30 000 rows and 1500 people and If I do the logic in PHP, it takes long minutes and than timeouts. So I would like to rewrite it to MS SQL 2000 (no other DB is allowed). Are cursors fast solution or is it better to use something else?
Thank you for your opinions.
UPDATE:
There are lots of questions about my "summarization". Problem is a little bit more difficult than I explained. I simplified it just to describe my algorithm.
Each row of my table contains much more columns. The most important is month. That's why there are more rows for each person. Each is for different month.
"Bilances" are "working overtimes" and "arrear hours" of workers. And I need to sumarize + and - bilances to neutralize them using values from previous months. I want to have as many zeroes as possible. All the table must stay as it is, just bilances must be changed to zeroes.
Example:
Row (Jane -5) will be summarized with row (Jane +3). Instead of 3 I will get 0 and instead of -5 I will get -2. Because I used this -5 to reduce +3.
Next row (Jane 0) won't be affected
Next row (Jane -8) can not be used, because all previous bilances are negative
etc.
You can sum all the values per name using a single SQL statement:
select
name,
sum(bilance) as bilance_sum
from
my_table
group by
name
order by
name
On the face of it, it sounds like this should do what you want:
select Name, sum(bilance)
from table
group by Name
order by Name
If not, you might need to elaborate on how the Names are sorted and what you mean by "summarize".
I'm not sure what you mean by this line... "The task is to sumarize "bilance" column (in the scope of actual person) if they have different signs".
But, it may be possible to use a group by query to get a lot of what you need.
select name, case when bilance < 0 then 'negative' when bilance >= 0 then 'positive', count(*)
from table
group by name, bilance
That might not be perfect syntax for the case statement, but it should get you really close.

SQL Group By

If I have a set of records
name amount Code
Dave 2 1234
Dave 3 1234
Daves 4 1234
I want this to group based on Code & Name, but the last Row has a typo in the name, so this wont group.
What would be the best way to group these as:
Dave/Daves 9 1234
As a general rule if the data is wrong you should fix the data.
However if you want to do the report anyway you could come up with another criteria to group on, for example LEFT(Name, 4) would perform a grouping on the first 4 characters of the name.
You may also want to consider the CASE statement as a method (CASE WHEN name = 'Daves' THEN 'Dave' ELSE name), but I really don't like this method, especially if you are proposing to use this for anything else then a one-off report.
If it's a workaround, try
SELECT cname, SUM(amount)
FROM (
SELECT CASE WHEN NAME = 'Daves' THEN 'Dave' ELSE name END AS cname, amount
FROM mytable
)
GROUP BY cname
This if course will handle only this exact case.
For MySQL:
select
group_concat(distinct name separator '/'),
sum(amount),
code
from
T
group by
code
For MSSQL 2005+ group_concat() can be implemented as .NET custom aggregate.
Fix the typo? Otherwise grouping on the name is going to create a new group.
Fixing your data should be your highest priority instead of trying to devise ways to "work around" it.
It should also be noted that if you have this single typo in your data, it is likely that you have (or will have at some point in the future) even more screwy data that will not cleanly fit into your code, which will force you to invent more and more "work arounds" to deal with it, when you should be focusing on the cleanliness of your data.
If the name field is suppose to be a key then the assumption has to be that Dave and Daves are two different items all together, and thus should be grouped differently. If however it is a typo, then as other have suggested, fix the data.
Grouping on a freeform entered text field if that is what this is, will always have issues. Data entry is never 100%.
To me it makes more sense to group on the code alone if that is the key field and leave name out of the grouping all together.