Transpose to Count columns of Boolean values on Access SQL - sql

Ok, so I have a Student table that has 6 fields (StudentID, HasBamboo, HasFlower, HasAloe, HasFern, HasCactus). The "HasPlant" fields are boolean: 1 for having the plant, 0 for not having it.
I want to find the average number of plants that a student has. There are hundreds of students in the table. I know this could involve transposing of some sort, plus counting the boolean values and taking an average. I did look at the question "SQL to transpose row pairs to columns in MS ACCESS database" for information on transposing (I've never done it before), but I'm thinking there would perhaps be too many columns.
My first thought was using a for loop, but I'm not sure those exist in SQL in Access. Maybe a SELECT/FROM/WHERE/IN type structure?
Just hints on the logic and some possible reading material would be greatly appreciated.

You could just get individual totals per category:
SELECT COUNT(*) FROM STUDENTS WHERE HasBamboo
Do the same for each plant column, add the totals up, and divide by
SELECT COUNT(*) FROM STUDENTS
It's not a great database design though... Better normalized would be:
Table Students; fields StudentID, StudentName
Table Plants; fields PlantID, PlantName
Table OwnedPlants; fields StudentID,PlantID
The last table then stores one record for each plant a student owns. You could then easily add further information in the right place (apartment number to Students; Latin name to Plants; date acquired to OwnedPlants) without completely redesigning the table structure and adding lots of fields (DateAcquiredBamboo, DateAcquiredFlower, etc.).
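A minimal sketch of both approaches, run against an in-memory SQLite database from Python rather than Access (an assumption made so the example is runnable). The sample rows and the contents of OwnedPlants are made up for illustration. Note that SQLite stores the flags as 0/1 as in the question; in Access itself a Yes/No field stores True as -1, so the summing variant would need Abs() there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        HasBamboo INTEGER, HasFlower INTEGER, HasAloe INTEGER,
        HasFern INTEGER, HasCactus INTEGER
    );
    -- Illustrative data: the three students own 2, 1 and 3 plants.
    INSERT INTO Students VALUES
        (1, 1, 0, 1, 0, 0),
        (2, 0, 0, 0, 0, 1),
        (3, 1, 1, 1, 0, 0);
""")

plant_columns = ["HasBamboo", "HasFlower", "HasAloe", "HasFern", "HasCactus"]

# Approach from the answer: one COUNT per plant column, summed,
# then divided by the number of students.
total_owned = sum(
    conn.execute(f"SELECT COUNT(*) FROM Students WHERE {col}").fetchone()[0]
    for col in plant_columns
)
student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
average_by_counts = total_owned / student_count

# Same result in one query: add the 0/1 flags per row and average the sums.
average_single_query = conn.execute(
    "SELECT AVG(HasBamboo + HasFlower + HasAloe + HasFern + HasCactus) "
    "FROM Students"
).fetchone()[0]

# Normalized design: one OwnedPlants row per (student, plant) pair.
conn.executescript("""
    CREATE TABLE OwnedPlants (StudentID INTEGER, PlantID INTEGER);
    INSERT INTO OwnedPlants VALUES (1, 1), (1, 3), (2, 5), (3, 1), (3, 2), (3, 3);
""")
average_normalized = conn.execute(
    "SELECT (SELECT COUNT(*) FROM OwnedPlants) * 1.0 "
    "     / (SELECT COUNT(*) FROM Students)"
).fetchone()[0]

print(average_by_counts, average_single_query, average_normalized)
```

With the normalized tables, adding a sixth kind of plant means adding a row to Plants, and the averaging query does not change.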

Related

MS Access duplication

My MS Access query is returning duplicated rows. The data is entered from a Windows Forms app and lives in two separate tables, a name table and a score table. I'm using a basic query to show each row.
The issue is that the data is duplicating itself in the query output.
Image of the access table
This is the SQL query:
SELECT DISTINCT score.score, name.Name
FROM name, score;
Table Schema
What SQL code could I use to stop the duplication error?
Thanks in advance
You need to add a long integer field StudID to the Scores table, then do data entry to put the appropriate student identifier value into each Scores record. This establishes a primary/foreign key relationship between the tables that can be used in a query with a LEFT or RIGHT JOIN on the key fields. Without a join condition, the query pairs every name with every score, which is where the duplication comes from.
This assumes a student can have multiple scores (for what?). If there can be multiple scores, you probably also need a date field ScoreDate and possibly a Subject field.
If a student can have only 1 score, you don't need the Scores table: put a Score field in the Students table.
I strongly advise against using reserved words as names for anything. Name is a reserved word.
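To illustrate the difference, here is a small sketch using SQLite from Python as a stand-in for Access (an assumption for runnability). The table and column names Students, StudentName, and ScoreID, and the sample rows, are hypothetical; only StudID and Score come from the advice above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudID INTEGER PRIMARY KEY, StudentName TEXT);
    CREATE TABLE Scores (
        ScoreID INTEGER PRIMARY KEY,
        StudID  INTEGER REFERENCES Students(StudID),  -- foreign key
        Score   INTEGER
    );
    INSERT INTO Students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Scores (StudID, Score) VALUES (1, 90), (1, 75), (2, 60);
""")

# Listing both tables with no join condition pairs every student with
# every score (a cross join): 3 students x 3 scores = 9 rows.
cross = conn.execute(
    "SELECT Students.StudentName, Scores.Score FROM Students, Scores"
).fetchall()

# Joining on the key returns each score once, next to its own student.
# LEFT JOIN also keeps students that have no score yet (Score is NULL).
joined = conn.execute("""
    SELECT Students.StudentName, Scores.Score
    FROM Students
    LEFT JOIN Scores ON Students.StudID = Scores.StudID
    ORDER BY Students.StudID, Scores.Score
""").fetchall()

print(len(cross))
print(joined)
```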

Row Stores vs Column Stores

Assuming that the database is already populated with data, and that each of the following SQL statements is the one and only query that an application will perform, why is it better to use row-wise or column-wise record storage for the following queries?...
1) SELECT * FROM Person
2) SELECT * FROM Person WHERE id=5
3) SELECT AVG(YEAR(DateOfBirth)) FROM Person
4) INSERT INTO Person (ID,DateOfBirth,Name,Surname) VALUES(2e25,'1990-05-01','Ute','Muller')
In those examples Person.id is the primary key.
The article Row Store and Column Store Databases gives a general discussion on this, but I am specifically concerned about the four queries above.
SELECT * FROM ... queries are better served by row stores, since a column store has to access numerous files (one per column) to reassemble each row.
A column store is good for aggregation over a large volume of data, or when you have queries that only need a few fields from a wide table.
Therefore:
1st query: row-wise
2nd query: row-wise
3rd query: column-wise
4th query: row-wise
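A toy Python sketch of the two layouts may make this reasoning concrete. It is only an in-memory illustration, not how any particular database engine is implemented, and every person other than Ute Muller is made up.

```python
from datetime import date

# Row-wise layout: each record is stored together.
row_store = [
    (1, date(1985, 3, 12), "Jan", "Novak"),
    (5, date(1978, 11, 30), "Ana", "Silva"),
]

# Column-wise layout: each column is stored together.
column_store = {
    "ID": [1, 5],
    "DateOfBirth": [date(1985, 3, 12), date(1978, 11, 30)],
    "Name": ["Jan", "Ana"],
    "Surname": ["Novak", "Silva"],
}

def insert_row_store(store, record):
    """Query 4 on a row store: one append. Returns the number of writes."""
    store.append(record)
    return 1

def insert_column_store(store, record):
    """Query 4 on a column store: one append per column."""
    for column, value in zip(store, record):
        store[column].append(value)
    return len(store)

new_person = (2, date(1990, 5, 1), "Ute", "Muller")
row_writes = insert_row_store(row_store, new_person)
column_writes = insert_column_store(column_store, new_person)

# Query 2 (SELECT * WHERE id=5): the row store returns one stored record;
# the column store has to stitch the record together from every column.
person_from_rows = next(r for r in row_store if r[0] == 5)
position = column_store["ID"].index(5)
person_from_columns = tuple(column_store[c][position] for c in column_store)

# Query 3 (AVG(YEAR(DateOfBirth))): the column store touches one column only.
birth_years = [d.year for d in column_store["DateOfBirth"]]
average_birth_year = sum(birth_years) / len(birth_years)

print(row_writes, column_writes, person_from_rows == person_from_columns)
```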
I have no idea what you are asking. You have this statement:
INSERT INTO Person (ID, DateOfBirth, Name, Surname)
VALUES('2e25', '1990-05-01', 'Ute', 'Muller');
This suggests that you have a table with four columns, one of which is an id. Each person is stored in their own row.
You then have three queries. The first cannot be optimized. The second is optimized, assuming that id is a primary key (a reasonable assumption). The third requires a full table scan -- although that could be ameliorated with an index only on DateOfBirth.
If the data is already in this format, why would you want to change it?
This is a very simple data structure. Three of your four query examples access all columns. I see no reason why you would not use a regular row-store table structure.

SQL - Selecting columns based on attributes of the column

I am currently designing a SQL database to house a large amount of biological data. The main table has over 100 columns, where each row is a particular sampling event and each column is a species name. Values are the number of individuals found of that species for that sampling event.
Often, I would like to aggregate species together based on their taxonomy. For example: suppose Sp1, Sp2, and Sp3 belong to Family1; Sp4, Sp5, and Sp6 belong to Family2; and Family1 and Family2 belong to Class1. How do I structure the database so I can simply query a particular Family or Class, instead of listing 100+ columns each time?
My first thought was to create a second table that lists the attributes of each column from the first table, such that the primary key in the second table corresponds to the column headers in table 1, and the columns in table 2 are the categories I would want to select by (such as Family, Feeding type, life stage, etc.). However, I'm not sure how to write a query that can join tables in such a way.
I'm a newbie to SQL, and am not sure if I'm going about this in completely the wrong way. How can I structure my data/write queries to accomplish my goal?
Thanks in advance.
No, no, no. Don't make species columns in the table.
Instead, where you have one row now, you want multiple rows. It would have columns such as:
id: auto generated sequential number
sampleId: whatever each row in the current table belongs to
speciesId: reference to the species table
columns of data for that species on that sampling
The species table could then have a hierarchy, the entire hierarchy with genus, family, order, and so on.
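As a rough sketch of that layout, here it is in SQLite driven from Python (an assumption for runnability; the DDL would differ slightly in other databases). The table names Species and SampleCounts, the columns name, family, class_name and individuals, and all sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per species, carrying its taxonomy.
    CREATE TABLE Species (
        speciesId  INTEGER PRIMARY KEY,
        name       TEXT,
        family     TEXT,
        class_name TEXT
    );
    -- One row per (sampling event, species) instead of one column per species.
    CREATE TABLE SampleCounts (
        id          INTEGER PRIMARY KEY,   -- auto-generated sequential number
        sampleId    INTEGER,
        speciesId   INTEGER REFERENCES Species(speciesId),
        individuals INTEGER
    );
    INSERT INTO Species VALUES
        (1, 'Sp1', 'Family1', 'Class1'),
        (2, 'Sp2', 'Family1', 'Class1'),
        (4, 'Sp4', 'Family2', 'Class1');
    INSERT INTO SampleCounts (sampleId, speciesId, individuals) VALUES
        (100, 1, 4), (100, 2, 1), (100, 4, 7),
        (101, 1, 2), (101, 4, 3);
""")

# Aggregate by family without naming a single species.
family_totals = conn.execute("""
    SELECT c.sampleId, s.family, SUM(c.individuals)
    FROM SampleCounts AS c
    JOIN Species AS s ON s.speciesId = c.speciesId
    GROUP BY c.sampleId, s.family
    ORDER BY c.sampleId, s.family
""").fetchall()

# Aggregating by class only changes the grouping column.
class_totals = conn.execute("""
    SELECT c.sampleId, s.class_name, SUM(c.individuals)
    FROM SampleCounts AS c
    JOIN Species AS s ON s.speciesId = c.speciesId
    GROUP BY c.sampleId, s.class_name
    ORDER BY c.sampleId
""").fetchall()

print(family_totals)
print(class_totals)
```

Other attributes such as feeding type or life stage would become further columns on the species table and could be grouped on in the same way.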

MS-Access 2007: Query for names that have two or more different values in another field

Hello & thank you in advance.
I have an access db that has the following information about mammals we captured. Each capture has a unique ID, which is the capture table's primary key: "capture_id". The mammals (depending on species) have ear tags that we use to track them from year to year and day to day. These are in a field called "id_code". I have the sex of the mammal as it was recorded at capture time in another field called sex.
I want a query that will return all instances of an id_code IF the sex changes even once for that id.
Example: Animal E555 was caught 4 times, 3 times someone recorded this animal as a F and once as a M.
I've managed to get it to display this info by stacking about 5 queries on top of each other (Query for recaptured animals -> Query for all records of animals from 1st query -> Query for unique combo of id & sex (via just using those two columns & requiring "Unique Values") -> Query that pulls only duplicate id values from that last one and pulls back up all capture records of those ids). However, this is clearly not the right way to do it: the result is not updateable (which I need, since this is for data quality control), and for some reason it also returns duplicates of each of those records...
I realize that this could be solved two other ways:
Using R to pull up these records (I want none of this data to have to leave the database though, because we're working on getting it into one place after 35 years of collecting! And my boss can't use R and I'm seasonal, so I want him to just have to open a query)
Creating a table that tracks all animal IDs as an animal index. However, this would make entering the data more difficult and would also require someone to go back through 20,000 records and create a brand new animal ID for every one, because you can't give ear tags to voles & things, so they don't get a unique identifier in the field.
Help!
It is quite simple to do with a single query. As a bonus, the query will be updatable, not duplicated, and simple to use:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code In
    (select id_code from
        (select distinct id_code, sex from [mammals]) a
     group by id_code
     having count(*)>1
    );
The reason you see a sub-query inside a sub-query is that Access does not support COUNT(DISTINCT). With any other "normal" database you would write:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code In
    (select id_code
     from [mammals]
     group by id_code
     having count(DISTINCT Sex)>1
    );
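As a quick check that the two forms agree, here is a sketch that runs both against SQLite from Python (SQLite accepts both, so it stands in for Access and for a "normal" database at once; that substitution is an assumption). The capture rows are invented to match the E555 example from the question, and E777 is a hypothetical animal whose recorded sex never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mammals (
        ID INTEGER PRIMARY KEY,
        id_code TEXT,
        sex TEXT,
        date_recorded TEXT
    );
    INSERT INTO mammals (id_code, sex, date_recorded) VALUES
        ('E555', 'F', '2010-06-01'),
        ('E555', 'F', '2010-06-02'),
        ('E555', 'M', '2011-06-01'),
        ('E555', 'F', '2012-06-03'),
        ('E777', 'M', '2010-06-01'),
        ('E777', 'M', '2011-06-05');
""")

# Access-compatible form: DISTINCT in an inner sub-query, COUNT(*) outside.
nested_form = conn.execute("""
    SELECT ID, id_code, sex FROM mammals
    WHERE id_code IN
        (SELECT id_code FROM
            (SELECT DISTINCT id_code, sex FROM mammals) AS a
         GROUP BY id_code
         HAVING COUNT(*) > 1)
    ORDER BY ID
""").fetchall()

# Form for databases that support COUNT(DISTINCT ...).
count_distinct_form = conn.execute("""
    SELECT ID, id_code, sex FROM mammals
    WHERE id_code IN
        (SELECT id_code FROM mammals
         GROUP BY id_code
         HAVING COUNT(DISTINCT sex) > 1)
    ORDER BY ID
""").fetchall()

print(nested_form)
```

Both queries return all four E555 captures and none of the E777 ones.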

How to retrieve a column value using a variable

I have a single SQL table with columns Mary, Joe, Pat and Mick.
I have rows of values for each person's weekly expenses.
I want a single sproc to query a person's weekly expense value,
i.e. a sproc that takes two variables #PersonsName and #WeekNumber and returns a single value.
It won't work for me. It works if I specify the person's name in the query, but not if I pass the person's name to the query.
I'm pulling my hair out. Is this something to do with Dynamic SQL?
Not sure if you are really digging into performance, scalability etc., as your scenario seems like a small personal tool. But still, as juergen suggested, you would do better to design the schema in a different way. Your current design won't scale easily if you want to add one more person to your database.
Instead, you can have something like
a persons table (person name, person id)
an expenses table (week number, person id, expense)
This scales well, and with an index on person id your queries can be faster.
Suggestions apart,
the question "Can I pass variable to select statement as column name in SQL Server" seems to provide an answer to your question.
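For what it's worth, here is a sketch of the two-table design, using SQLite from Python in place of a SQL Server stored procedure (an assumption for runnability). The column names PersonId, PersonName, WeekNumber and Expense, the helper weekly_expense, and the amounts are all hypothetical. Because the person's name is now a value in a row rather than a column name, it can be passed as an ordinary parameter and no dynamic SQL is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (
        PersonId   INTEGER PRIMARY KEY,
        PersonName TEXT UNIQUE
    );
    CREATE TABLE Expenses (
        WeekNumber INTEGER,
        PersonId   INTEGER REFERENCES Persons(PersonId),
        Expense    REAL,
        PRIMARY KEY (WeekNumber, PersonId)
    );
    CREATE INDEX idx_expenses_person ON Expenses (PersonId);
    INSERT INTO Persons VALUES (1, 'Mary'), (2, 'Joe'), (3, 'Pat'), (4, 'Mick');
    INSERT INTO Expenses VALUES
        (1, 1, 40.0), (1, 2, 25.5),
        (2, 1, 32.0), (2, 2, 18.0);
""")

def weekly_expense(connection, person_name, week_number):
    """Return one person's expense for one week, or None if not recorded."""
    row = connection.execute(
        """
        SELECT e.Expense
        FROM Expenses AS e
        JOIN Persons AS p ON p.PersonId = e.PersonId
        WHERE p.PersonName = ? AND e.WeekNumber = ?
        """,
        (person_name, week_number),  # bound parameters, not string pasting
    ).fetchone()
    return row[0] if row else None

print(weekly_expense(conn, "Joe", 2))
print(weekly_expense(conn, "Pat", 1))
```

Adding a fifth person is then an INSERT into the persons table rather than an ALTER TABLE.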