MS-Access SQL DISTINCT GROUP BY

I am currently trying to SELECT the DISTINCT FirstNames in a GROUP, using Microsoft Access 2010.
The simplified relevant columns of my table look like this:
+----+-------------+-----------+
| ID | GroupNumber | FirstName |
+----+-------------+-----------+
|  1 |           1 | Peter     |
|  2 |           1 | Bob       |
|  3 |           1 | Peter     |
|  4 |           2 | Rosemary  |
|  5 |           2 | Jamie     |
|  6 |           3 | Peter     |
+----+-------------+-----------+
My actual table contains two columns to which I want to apply this process (separately), but I should be able to simply repeat the process for the other column. The GroupNumber column is a simplification: my table actually groups all rows in a ten-day interval together, but I've already solved that problem.
And I would like it to return this:
+-------------+------------+
| GroupNumber | FirstNames |
+-------------+------------+
|           1 | Peter      |
|           1 | Bob        |
|           2 | Rosemary   |
|           2 | Jamie      |
|           3 | Peter      |
+-------------+------------+
This means that I want all Distinct FirstNames for each Group.
A regular DISTINCT would ignore group boundaries and only mention Peter once. All aggregate functions reduce my output to only one value or don't work on strings at all. Access also doesn't support SELECTing columns that are not aggregates or in the GROUP BY statement.
All other answers I've found either want an aggregate, are not applicable to MS Access or are solved by working around the data in ways not applicable to my case. (Standardized languages are a nice thing, aren't they?)
My current (invalid) query looks like this:
SELECT GroupNumber,
DISTINCT FirstNames -- This is illegal, distinct applies to all
-- columns and doesn't respect groups.
FROM Example AS b
-- Complicated stuff to make the groups
GROUP BY GroupNumber;
This query is a one-time thing and is used to analyze a 58000-row Excel spreadsheet exported from another database (not my fault), so optimizing for runtime is not necessary.
I would like to achieve this purely through SQL and without VBA if at all possible.

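This should work, because DISTINCT applies to the combination of all selected columns (FirstName is the column name from your sample table):
SELECT DISTINCT GroupNumber, FirstName
FROM Example AS b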

Another solution is to group by both GroupNumber and FirstName at the same time:
SELECT GroupNumber, FirstName
FROM Example
GROUP BY GroupNumber, FirstName
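To see that both answers give the same result on the sample data, here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for Access (an assumption made purely so the example runs anywhere) and the FirstName column name from the question's sample table.

```python
import sqlite3

# SQLite stands in for Access here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Example (ID INTEGER PRIMARY KEY, GroupNumber INTEGER, FirstName TEXT)"
)
conn.executemany(
    "INSERT INTO Example VALUES (?, ?, ?)",
    [(1, 1, "Peter"), (2, 1, "Bob"), (3, 1, "Peter"),
     (4, 2, "Rosemary"), (5, 2, "Jamie"), (6, 3, "Peter")],
)

# DISTINCT applies to the whole (GroupNumber, FirstName) pair, not to one column.
distinct_rows = conn.execute(
    "SELECT DISTINCT GroupNumber, FirstName FROM Example "
    "ORDER BY GroupNumber, FirstName"
).fetchall()

# Grouping by both columns collapses exactly the same duplicates.
grouped_rows = conn.execute(
    "SELECT GroupNumber, FirstName FROM Example "
    "GROUP BY GroupNumber, FirstName ORDER BY GroupNumber, FirstName"
).fetchall()

print(distinct_rows)
print(distinct_rows == grouped_rows)
```

Peter shows up once in group 1 and once in group 3, which is what the question asks for. The ORDER BY is only there to make the two result lists comparable.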

Related

Remove duplicate rows in MS-Access

I am using Microsoft Access and in it, I have a table with data that is sometimes repeated. I'm not able to create an SQL query that removes duplicate data, leaving only distinct data in the table. Can someone help me?
My current table:
Date | Level | Name
---------+--------+--------
12/25/2021 | 2 | Jack
12/25/2021 | 2 | Jack
12/10/2021 | 3 | Ana
12/01/2021 | 1 | Lenon
12/01/2021 | 1 | Lenon
12/30/2021 | 3 | Ana
Expected result:
Date | Level | Name
---------+--------+--------
12/25/2021 | 2 | Jack
12/10/2021 | 3 | Ana
12/01/2021 | 1 | Lenon
12/30/2021 | 3 | Ana
PS: Ana appears twice in the expected result table because the dates of the two rows referring to Ana are different, so they are not duplicated values.
Just use select distinct:
select distinct t.*
from t;
I would add that tables should not have duplicate rows. Something is wrong with the table generation if you are getting duplicates -- either the query being used or the process for inserting rows into the table.
Alternatively, you can group by the Date, Level and Name columns. The square brackets guard against clashes with reserved words such as Date and Name:
SELECT [Date]
,[Level]
,[Name]
FROM <TableName>
GROUP BY [Date], [Level], [Name]
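Since the question asks for the duplicates to be gone from the table itself, one possible approach is to copy the distinct rows into a new table. The sketch below uses Python's sqlite3 as a stand-in for Access; the table name t comes from the answer above, and t_dedup is a hypothetical name. In Access, the rough counterpart would be a make-table query (SELECT DISTINCT ... INTO ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are quoted in case they clash with keywords.
conn.execute('CREATE TABLE t ("Date" TEXT, "Level" INTEGER, "Name" TEXT)')
rows = [
    ("12/25/2021", 2, "Jack"), ("12/25/2021", 2, "Jack"),
    ("12/10/2021", 3, "Ana"),
    ("12/01/2021", 1, "Lenon"), ("12/01/2021", 1, "Lenon"),
    ("12/30/2021", 3, "Ana"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Copy only the distinct rows into a new table (t_dedup is a made-up name).
conn.execute("CREATE TABLE t_dedup AS SELECT DISTINCT * FROM t")

deduped = conn.execute('SELECT "Date", "Level", "Name" FROM t_dedup').fetchall()
print(len(deduped))  # 4 rows remain; both Ana rows survive because their dates differ
```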

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons you are struggling to obtain the desired result for what should be a relatively straightforward request is that your data does not follow database normalisation rules; consequently, you are working against the natural operation of an RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less storage in your database, but it may also be indexed more efficiently (yielding faster queries), and it removes the ambiguity and potential for error surrounding typos and case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
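As a rough illustration of the normalised layout, here is a small Python sketch using sqlite3 in place of Access. The three employees, three questions and their results are made-up sample data, and Result is stored as 1/0 because SQLite has no separate Boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Result: 1 stands for True (correct), 0 for False (incorrect).
conn.execute(
    "CREATE TABLE Results (EmployeeID INTEGER, QuestionID INTEGER, Result INTEGER)"
)
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0), (1, 3, 1),
     (2, 1, 1), (2, 2, 1), (2, 3, 0),
     (3, 1, 0), (3, 2, 0), (3, 3, 1)],  # made-up sample data
)

# Number of correct answers per employee.
correct_per_employee = conn.execute(
    "SELECT EmployeeID, COUNT(*) FROM Results "
    "WHERE Result = 1 GROUP BY EmployeeID ORDER BY EmployeeID"
).fetchall()

# Number of incorrect answers per question, which is what the asker wanted.
incorrect_per_question = conn.execute(
    "SELECT QuestionID, COUNT(*) FROM Results "
    "WHERE Result = 0 GROUP BY QuestionID ORDER BY QuestionID"
).fetchall()

print(correct_per_employee)
print(incorrect_per_question)
```

Adding a fourth question only means inserting more rows; neither query changes.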
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
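If you are stuck with the wide table and want the total of incorrect results per field (as the question asks), a conditional sum per column is one way to do it. This is a sketch only: it uses Python's sqlite3 rather than Access, only three of the fifteen fields, and made-up rows. In Access SQL the CASE expression would be written with IIf inside Sum instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (EmployeeID INTEGER, A1 TEXT, A2 TEXT, A3 TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [(1, "correct", "incorrect", "correct"),
     (2, "correct", "correct", "incorrect"),
     (3, "incorrect", "incorrect", "correct")],  # made-up sample data
)

# One conditional sum per field; the field list is still hard-coded,
# which is exactly the maintenance problem described above.
fields = ["A1", "A2", "A3"]
select_list = ", ".join(
    f"SUM(CASE WHEN {f} = 'incorrect' THEN 1 ELSE 0 END) AS {f}" for f in fields
)
totals = conn.execute(f"SELECT {select_list} FROM YourTable").fetchone()

incorrect_per_field = dict(zip(fields, totals))
print(incorrect_per_field)
```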

Calculating Number of Columns that have no Null value

I want to make a table like the following:
+----+----------+----------+----------+----------------+
| ID | Sibling1 | Sibling2 | Sibling3 | Total_Siblings |
+----+----------+----------+----------+----------------+
|  1 | Tom      | Lisa     | Null     |              2 |
|  2 | Bart     | Jason    | Nelson   |              3 |
|  3 | George   | Null     | Null     |              1 |
|  4 | Null     | Null     | Null     |              0 |
+----+----------+----------+----------+----------------+
Sibling1, Sibling2 and Sibling3 are all nvarchar(50) (this is a requirement, so I can't change it).
How can I calculate the value of Total_Siblings using SQL so that it displays the number of siblings as shown above? I attempted to use (Sibling1 + Sibling2), but it does not give the result I want.
Cheers
A query like this would do the trick.
SELECT ID,Sibling1,Sibling2,Sibling3
,COUNT(Sibling1)+Count(Sibling2)+Count(Sibling3) AS Total
FROM MyTable
GROUP BY ID
A little explanation is probably required here. COUNT with a field name counts the number of non-null values. Since you are grouping by ID, each COUNT will only ever return 0 or 1. Now, if you're using anything other than MySQL, you'll have to replace
GROUP BY ID
with
GROUP BY ID,Sibling1,Sibling2,Sibling3
because most other databases require that you list every column that is not inside an aggregate function in the GROUP BY clause.
Also, as an aside, you may want to consider changing your database schema to store the siblings in another table, so that each person can have any number of siblings.
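For what it's worth, the same totals can also be computed row by row without any GROUP BY, by turning each "is it null?" check into 0 or 1. The sketch below runs both variants against the sample data using Python's sqlite3 (an assumption for the sake of a runnable example, not the asker's database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, "
    "Sibling1 TEXT, Sibling2 TEXT, Sibling3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, "Tom", "Lisa", None), (2, "Bart", "Jason", "Nelson"),
     (3, "George", None, None), (4, None, None, None)],
)

# Variant from the answer: COUNT ignores NULLs, grouped per row.
grouped = conn.execute(
    "SELECT ID, COUNT(Sibling1) + COUNT(Sibling2) + COUNT(Sibling3) "
    "FROM MyTable GROUP BY ID, Sibling1, Sibling2, Sibling3 ORDER BY ID"
).fetchall()

# Row-wise variant: each non-null column contributes 1.
per_row = conn.execute(
    "SELECT ID, "
    "(CASE WHEN Sibling1 IS NULL THEN 0 ELSE 1 END) + "
    "(CASE WHEN Sibling2 IS NULL THEN 0 ELSE 1 END) + "
    "(CASE WHEN Sibling3 IS NULL THEN 0 ELSE 1 END) AS Total_Siblings "
    "FROM MyTable ORDER BY ID"
).fetchall()

print(per_row)
```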
You can do this by adding up individual counts:
select id,sibling1,sibling2,sibling3
,count(sibling1)+count(sibling2)+count(sibling3) as total_siblings
from MyTable
group by id,sibling1,sibling2,sibling3;
However, your table structure makes this scale poorly (what if an id can have, say, 50 siblings?). If you store your data in a table (say, Siblings) with columns id and sibling, then this query would be as simple as:
select id,count(sibling)
from Siblings
group by id;
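A quick sketch of that one-row-per-sibling layout, again using Python's sqlite3 as a stand-in. The table name Siblings is an assumption; the rows are the question's data reshaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (id, sibling) pair instead of one column per sibling.
conn.execute("CREATE TABLE Siblings (id INTEGER, sibling TEXT)")
conn.executemany(
    "INSERT INTO Siblings VALUES (?, ?)",
    [(1, "Tom"), (1, "Lisa"),
     (2, "Bart"), (2, "Jason"), (2, "Nelson"),
     (3, "George")],
)

counts = conn.execute(
    "SELECT id, COUNT(sibling) FROM Siblings GROUP BY id ORDER BY id"
).fetchall()
print(counts)
```

Note that id 4 has no rows in this layout, so it does not appear in the result at all. To report a 0 for it you would need a separate table of people and a LEFT JOIN from it.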

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by the person's needs. The people who have only one need are sorted correctly, but the people who have multiple needs are sorted exactly as the string passed to the database (i.e. housing, rent, food, water), which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following (using the first version of the users_needs table, which stores the need as text):
SELECT users.name, users_needs.need
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
ORDER BY users_needs.need;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need) as number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
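Here is a runnable sketch of those two queries against the users and users_needs tables defined earlier (the version of users_needs that stores the need as text). It uses Python's sqlite3 purely for convenience; the asker's actual database is not stated, so treat the engine as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE users_needs (need TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "joe"), (2, "peter"), (3, "steve"), (4, "clint")],
)
conn.executemany(
    "INSERT INTO users_needs VALUES (?, ?)",
    [("housing", 1), ("water", 1), ("food", 1),
     ("housing", 2), ("rent", 2), ("water", 2),
     ("housing", 3)],
)

# One row per (user, need), sorted by need: no comma-separated strings involved.
sorted_by_need = conn.execute(
    "SELECT users.name, users_needs.need FROM users "
    "INNER JOIN users_needs ON users_needs.user_id = users.user_id "
    "ORDER BY users_needs.need, users.name"
).fetchall()

# LEFT JOIN keeps users without any needs (clint) with a count of 0.
needs_per_user = conn.execute(
    "SELECT users.name, COUNT(users_needs.need) AS number_of_needs FROM users "
    "LEFT JOIN users_needs ON users_needs.user_id = users.user_id "
    "GROUP BY users.user_id, users.name "
    "ORDER BY number_of_needs, users.name"
).fetchall()

print(sorted_by_need)
print(needs_per_user)
```

The extra users.name in each ORDER BY is only there to make the output order deterministic.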
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the number of commas plus one, which you can get from the difference in length once the commas are removed:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
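The same length-difference trick can be tried out in SQLite, which happens to have length() and replace() functions that behave like Len and Replace here. In this sketch the people table, its name column and the sample rows are all hypothetical; only the Needs column comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "people" and "name" are made-up; Needs holds the comma-separated string.
conn.execute("CREATE TABLE people (name TEXT, Needs TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("a", "housing"), ("b", "housing,rent,food,water"), ("c", "rent,food")],
)

# Number of commas = length with commas - length without commas; needs = commas + 1.
need_counts = conn.execute(
    "SELECT name, length(Needs) - length(replace(Needs, ',', '')) + 1 AS need_count "
    "FROM people ORDER BY name"
).fetchall()
print(need_counts)
```

One caveat: an empty Needs string would also come out as 1, so rows with no needs at all have to be handled separately.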
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .

Need select query

Consider the following table structure with data -
AdjusterID | CompanyID | FirstName | LastName | EmailID
============================================================
1001 | Sterling | Jane | Stewart | janexxx#sterlin.com
1002 | Sterling | David | Boon | dav#sterlin.com
1003 | PHH | Irfan | Ahmed | irfan#phh.com
1004 | PHH | Rahul | Khanna | rahul#phh.com
============================================================
AdjusterID is the primary key. A company can have a number of adjusters.
I need a query that lists a single adjuster per company, i.e. I need to get the following result:
========================================================
1001 | Sterling | Jane | Stewart | janexxx#sterlin.com
1003 | PHH | Irfan | Ahmed | irfan#phh.com
========================================================
If anyone could help me, that would be great.
One way:
SELECT * FROM Adjusters
WHERE AdjusterID IN(SELECT min(AdjusterID)
FROM Adjusters GROUP BY CompanyID)
There are a handful of other ways involving unions and iteration, but this one is simple enough to get you started.
Edit: this assumes you want the adjuster with the lowest ID, as per your example
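A small sketch of the MIN-per-company approach on the sample data, run through Python's sqlite3 as a stand-in engine. The EmailID column is left out only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Adjusters (AdjusterID INTEGER PRIMARY KEY, "
    "CompanyID TEXT, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Adjusters VALUES (?, ?, ?, ?)",
    [(1001, "Sterling", "Jane", "Stewart"), (1002, "Sterling", "David", "Boon"),
     (1003, "PHH", "Irfan", "Ahmed"), (1004, "PHH", "Rahul", "Khanna")],
)

# The subquery yields the lowest AdjusterID per company; the outer query
# fetches the full rows for exactly those IDs.
one_per_company = conn.execute(
    "SELECT * FROM Adjusters "
    "WHERE AdjusterID IN (SELECT MIN(AdjusterID) FROM Adjusters GROUP BY CompanyID) "
    "ORDER BY AdjusterID"
).fetchall()
print(one_per_company)
```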
I know the answer from Jeremy is a valid one, so I will not repeat it. But you may try another one using a so-called tie-breaker:
--//using a tie-breaker. Should be very fast on the PK field
--// but it would be good to have an index on CompanyID
SELECT t.*
FROM MyTable t
WHERE t.AdjusterID = (SELECT TOP 1 x.AdjusterID
FROM MyTable x
WHERE x.CompanyID = t.CompanyID
ORDER BY x.AdjusterID)
It could be better performance-wise. It becomes even more useful if the table has another column and you want to select not just any one adjuster per company but the best one, using that column as the ranking criterion. In that case, instead of ORDER BY AdjusterID, you would order by that other column (or columns).
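To illustrate ranking by a different column, the sketch below picks, for each company, the adjuster whose LastName sorts first. That ranking column is chosen only for demonstration. It runs on Python's sqlite3, where LIMIT 1 plays the role of TOP 1, and AdjusterID is added as a secondary sort key so ties are broken deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Adjusters (AdjusterID INTEGER PRIMARY KEY, "
    "CompanyID TEXT, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Adjusters VALUES (?, ?, ?, ?)",
    [(1001, "Sterling", "Jane", "Stewart"), (1002, "Sterling", "David", "Boon"),
     (1003, "PHH", "Irfan", "Ahmed"), (1004, "PHH", "Rahul", "Khanna")],
)

# Correlated subquery: for each row t, find the best-ranked adjuster of the
# same company and keep t only if it is that adjuster.
best_per_company = conn.execute(
    "SELECT t.* FROM Adjusters t "
    "WHERE t.AdjusterID = ("
    "  SELECT x.AdjusterID FROM Adjusters x "
    "  WHERE x.CompanyID = t.CompanyID "
    "  ORDER BY x.LastName, x.AdjusterID LIMIT 1) "
    "ORDER BY t.AdjusterID"
).fetchall()
print(best_per_company)
```

With this ranking, Sterling is represented by David Boon rather than Jane Stewart, which shows how the ORDER BY inside the subquery decides who wins.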