SQL: unique constraint and aggregate function - sql

I'm currently learning (MS) SQL and have been trying out various aggregate function samples. My question: is there any scenario (a sample query that uses an aggregate function) where having a unique constraint on a column helps when using an aggregate function?
Please note: I'm not trying to solve a specific problem, just trying to see whether such a scenario exists in real-world SQL programming.

One theoretical scenario comes to mind immediately. The unique constraint is backed by a unique index, so if you aggregate only that field, scanning the index is cheaper than scanning the table because the index is narrower. That holds only if the query uses no other fields, so that the index is covering; otherwise the plan tips away from the nonclustered (NC) index.
So the index added to enforce the unique constraint can automatically help a query, but the example is a bit contrived.
Put the unique constraint on the field if you need the field to be unique. If you need indexes to help query performance, consider them separately, or add a unique index on that field and include other fields to make it covering (less useful, but more useful than the unique index on a single field).
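A minimal T-SQL sketch of that scenario, assuming a hypothetical Orders table (the table, column, constraint, and index names are illustrative and not from the question). The aggregate touches only the unique column, so the optimizer is free to scan the narrow unique index instead of the whole table; whether it actually does depends on the plan it picks.

```sql
-- Hypothetical table: all names are illustrative only.
CREATE TABLE Orders (
    OrderId      INT IDENTITY(1,1) PRIMARY KEY,
    OrderNumber  INT NOT NULL
        CONSTRAINT UQ_Orders_OrderNumber UNIQUE,  -- backed by a unique index
    CustomerName VARCHAR(100) NOT NULL,
    Notes        VARCHAR(4000) NULL
);

-- Touches only OrderNumber, so the unique index alone can answer it.
SELECT COUNT(*)         AS OrderCount,
       MIN(OrderNumber) AS FirstNumber,
       MAX(OrderNumber) AS LastNumber
FROM Orders;

-- Considered separately: a covering index for queries that also read CustomerName.
CREATE UNIQUE NONCLUSTERED INDEX UX_Orders_OrderNumber_Covering
    ON Orders (OrderNumber)
    INCLUDE (CustomerName);
```

Comparing the execution plans with and without the constraint is the simplest way to see whether the index is being used.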

Take the following two tables: one holds the subject name and subject ID, and the other holds the marks each student obtained in a particular subject.
Table1(SubjectId int unique, subject_name varchar, MaxMarks int)
Table2(Id int, StudentId int, SubjectId int, Marks int)
To find the average of the marks obtained in Science (SubjectId = 2) by all students who attempted it, I would run the following query. Because Table1.SubjectId is unique, each row of Table2 joins to exactly one subject row, so the join cannot duplicate marks and skew the average.
SELECT AVG(t2.Marks) AS AvgMarks, t1.MaxMarks
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.SubjectId = t1.SubjectId
WHERE t1.SubjectId = 2
GROUP BY t1.MaxMarks;

Postgres data base need suggestion creating an index for table

I have a table with the structure below. To fetch data from the table I have the search criteria listed below, and I am writing a single SQL query for the requirement (sample query below). I need to create an index on the table that covers all the search criteria. It would be helpful if somebody could advise me.
Table structure(columns):
applicationid varchar(15),
trans_tms timestamp,
SSN varchar,
firstname varchar,
lastname varchar,
DOB date,
Zipcode smallint,
adddetais json
The search criteria come from an API and fall into 4 categories. All 4 categories are mandatory: I will always receive values for all 4 categories for a single applicant.
Search criteria:
ssn & last name (the last name needs to go through a function, i.e. soundex(lastname)=soundex('inputvalue')).
ssn & DOB
ssn&zipcode
firstname&lastname&DOB.
Query:
The sample query I am trying to write is:
Select *
from table
where (ssn='aaa' and soundex(lastname)=soundex('xxx'))
or (ssn='aaa' and dob=xxx)
or (ssn='aaa' and zipcode = 'xxx')
or (firstname='xxx' and lastname='xxx' and dob= xxxx);
For performance I need to create an index on the table, possibly a composite one. Any suggestion will be helpful.
Some approaches I would follow:
Yes, you are correct: a composite (multicolumn) index benefits AND conditions on two columns. However, the indexes for the given conditions would overlap on columns.
Documentation: https://www.postgresql.org/docs/10/indexes-multicolumn.html
You can use a UNION instead of OR.
Reference: https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/
If several conditions can be combined, e.g. ssn must be 'aaa' in every combination that uses it, then rewriting the WHERE clause with fewer ORs is preferable.
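The two suggestions (fewer ORs, and UNION in place of the remaining OR) could be sketched in PostgreSQL as follows. This is an illustration under assumptions, not a tuned design: the table name applicants, the primary key on applicationid, the index names, and the literal values are all hypothetical. soundex() comes from the fuzzystrmatch extension. The UNION runs over applicationid only, because the json column has no equality operator and so cannot take part in a UNION directly.

```sql
-- soundex() is provided by the fuzzystrmatch extension.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Hypothetical table built from the columns in the question;
-- the primary key on applicationid is an assumption.
CREATE TABLE applicants (
    applicationid varchar(15) PRIMARY KEY,
    trans_tms     timestamp,
    ssn           varchar,
    firstname     varchar,
    lastname      varchar,
    dob           date,
    zipcode       smallint,
    adddetais     json
);

-- Three of the four criteria start with ssn, so one index serves them all.
CREATE INDEX applicants_ssn_idx ON applicants (ssn);
-- The fourth criterion gets its own composite index.
CREATE INDEX applicants_name_dob_idx ON applicants (firstname, lastname, dob);

SELECT a.*
FROM applicants a
WHERE a.applicationid IN (
    -- ssn is factored out of the three ssn-based criteria
    SELECT applicationid
    FROM applicants
    WHERE ssn = 'aaa'
      AND (soundex(lastname) = soundex('xxx')
           OR dob = DATE '2000-01-01'
           OR zipcode = 12345)
    UNION
    SELECT applicationid
    FROM applicants
    WHERE firstname = 'xxx'
      AND lastname = 'xxx'
      AND dob = DATE '2000-01-01'
);
```

Running EXPLAIN ANALYZE on both the OR form and this form against realistic data would show whether the rewrite pays off.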

Assign unique ID to duplicates in Access

I had a very big excel spreadsheet that I moved into Access to try to deal with it easier. I'm very much a novice. I'm trying to use SQL via Access.
I need to assign a unique identifier to duplicates. I've seen people use DENSE_RANK in SQL but I can't get it to work in Access.
Here's what I'm trying to do: I have a large amount of patient and sample data (20k rows). My columns are called FULL_NAME, SAMPLE_NUM, and DATE_REC. Some patients have come in more than once and have multiple samples. I want to give each patient a unique ID that I want to call PATIENT_ID.
I can't figure out how to do this, aside from typing it out on each row. I would greatly appreciate help as I really don't know what I'm doing and there is no one at my work who can help.
To illustrate the previous answers' textual explanation, consider the following SQL action queries, which can be run in an Access query window one by one or as VBA string queries with DAO's CurrentDb.Execute or DoCmd.RunSQL. The ALTER statements can be run in MSAccess.exe.
Create a Patients table (make-table query)
SELECT DISTINCT s.FULL_NAME INTO myPatientsTable
FROM mySamplesTable s
WHERE s.FULL_NAME IS NOT NULL;
Add an autonumber field to new Patients table as a Primary Key
ALTER TABLE myPatientsTable ADD COLUMN PATIENT_ID AUTOINCREMENT NOT NULL PRIMARY KEY;
Add a blank Patient_ID column to Samples table
ALTER TABLE mySamplesTable ADD COLUMN PATIENT_ID INTEGER;
Update Patient_ID Column in Samples table using FULL_NAME field
UPDATE mySamplesTable s
INNER JOIN myPatientsTable p
ON s.[FULL_NAME] = p.[FULL_NAME]
SET s.PATIENT_ID = p.PATIENT_ID;
Maintain the third-normal-form principles of relational databases and remove the FULL_NAME field from the Samples table
ALTER TABLE mySamplesTable DROP COLUMN FULL_NAME;
Then in a separate query, add a foreign key constraint on PATIENT_ID
ALTER TABLE mySamplesTable
ADD CONSTRAINT PatientRelationship
FOREIGN KEY (PATIENT_ID)
REFERENCES myPatientsTable (PATIENT_ID);
Sounds like FULL_NAME is currently the unique identifier. However, names make very poor unique identifiers and name parts should be in separate fields. Are you sure you don't have multiple patients with same name, e.g. John Smith?
You need a PatientInfo table and then the SampleData table. Do a query that pulls DISTINCT patient info (apparently this is only one field - FULL_NAME) and create a table that generates unique ID with autonumber field. Then build a query that joins tables on the two FULL_Name fields and updates a new field in SampleData called PatientID. Delete the FULL_Name field from SampleData.
The command to number rows in your table is [1]
ALTER TABLE MyTable ADD COLUMN ID AUTOINCREMENT;
Anyway, as June7 pointed out, it might not be a good idea to combine records based on patient name alone, as there might be duplicates. A better way is to treat each record as a unique patient for now and have a way to fix the patient ID when a patient comes back. I would suggest going this way:
Create two new columns in your samples table:
ID, with autoincrement as per the query above
patientID, into which you copy the values from the ID column. For now they will be the same, but in future they will diverge.
Copy the patientID and patientName columns into a separate table, patients.
Now you can delete the patientName column from the samples table.
Add a column imported to the patients table to indicate that there might be other records that belong to this patient.
When a patient comes back, open their record, update all other info such as address and phone, and look for all samples records that might belong to them. If there are any, fix the patient ID in those records.
Now you can switch the imported indicator, because this patient's data is up to date.
After fixing patientID in the samples records, you will end up with patients that have no records in the samples table, so you can go and delete them.
Unless you already have a natural key you will be corrupting this data when you run the distinct query and build a key from it. From your posting I would guess a natural key would be SAMPLE_NUM. Another problem is that if you roll up by last name you will almost certainly be combining different patients into one.

Database Technology - postgresql

I am quite new to postgresql. Could any expert help me solve this problem please.
Consider the following PostgreSQL tables created for a university system recording which students take which modules:
CREATE TABLE module (id bigserial, name text);
CREATE TABLE student (id bigserial, name text);
CREATE TABLE takes (student_id bigint, module bigint);
Rewrite the SQL to include sensible primary keys.
CREATE TABLE module
(
m_id bigserial,
name text,
CONSTRAINT m_key PRIMARY KEY (m_id)
);
CREATE TABLE student
(
s_id bigserial,
name text
CONSTRAINT s_key PRIMARY KEY (s_id)
);
CREATE TABLE takes
(
student_id bigint,
module bigint,
CONSTRAINT t_key PRIMARY KEY (student_id)
);
Given this schema I have the following questions:
Write an SQL query to count how many students are taking DATABASE.
SELECT COUNT(name)
FROM student
WHERE module = 'DATABASE' AND student_id=s_id
Write an SQL query to show the student IDs and names (but nothing else) of all student taking DATABASE
SELECT s_id, name
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
Write an SQL query to show the student IDs and names (but nothing else) of all students not taking DATABASE.
SELECT s_id, name
FROM Student, take
WHERE student_id = s_id AND module != 'DATABASE'
Above are my answers. Please correct me if I am wrong and please comment the reason. Thank you for your expertise.
This looks like homework so I'm not going to give a detailed answer. A few hints:
I found one case where you used ´ quotes instead of ' apostrophes. This suggests you're writing SQL in something like Microsoft Word, which does so-called "smart quotes". Don't do that. Use a sensible text editor. If you're on Windows, Notepad++ is a popular choice. (Fixed it when reformatting the question, but wanted to mention it.)
Don't use the legacy comma-join syntax FROM table1, table2, table3 WHERE .... It's horrible to read and it's much easier to make mistakes with. You should never have been taught it in the first place. Also, qualify your columns: take.module, not just module. Always write explicit ANSI joins, e.g. in your example above:
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
becomes
FROM student
INNER JOIN take
ON take.module = 'DATABASE'
AND take.student_id = student.s_id;
(if the table names are long you can use aliases like FROM student s then s.s_id)
Query 3 is totally wrong. Imagine take has two rows for a student, one where the student is taking database and one where they're taking cooking. Your query will still return a result for them, even though they're taking database. (It would also return the same student ID multiple times, which you don't want.) Think about subqueries. You will need to query the student table, using a NOT EXISTS (SELECT .... FROM take ...) to filter out students who are taking database. The rest you get to figure out on your own.
Also, your schemas don't actually enforce the constraint that a student may only take DATABASE once at a time. Either add that, or consider in your queries the possibility that a student might be registered for DATABASE twice.
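One way to add that constraint, sketched in PostgreSQL with the column names from the question (the NOT NULL and foreign key clauses are my additions, not part of the original schema): a composite primary key on takes lets a student take many modules while allowing each (student, module) pair only once.

```sql
CREATE TABLE module (
    m_id bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE student (
    s_id bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE takes (
    student_id bigint NOT NULL REFERENCES student (s_id),
    module     bigint NOT NULL REFERENCES module (m_id),
    -- one row per (student, module) pair; a second registration is rejected
    PRIMARY KEY (student_id, module)
);
```

A primary key on student_id alone, as in the question, would instead limit every student to a single module.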

Alternative to having an aggregate concatenation function

I have the following tables in SQL Server 2005
ReceiptPoint: ID (PK), Name
GasIndexLocation: Location (PK)
ReceiptPointIndexLocation: ReceiptPointId (FK), IndexLocation (FK), StartDate, EndDate
On any given day, a ReceiptPoint can be mapped to 1 or more IndexLocations. I need to be able to, for a given day, see what IndexLocations are being compared against. For a ReceiptPoint with multiple IndexLocations, it should show up as some kind of delimited list (for displaying to a user).
I have two problems with this.
1) ReceiptPointIndexLocation currently has no primary key. Is the best way to handle this to just add an incrementing index field to the table?
2) How can I get the list of IndexLocations used on a given day? From what I can tell, it would require using a non-existent string concatenation aggregate function.
Is there a totally different way I should go about organizing my data? This is still in the development stage, so I'm open to changing it.
1) The primary key for ReceiptPointIndexLocation should be all four columns. However, this wouldn't stop entries with overlapping dates for the same location and receipt. There's no need for an identity column when you can identify rows with a composite key.
2) SQL:
SELECT (SELECT t.indexlocation + ','
FROM RECEIPTPOINTINDEXLOCATION t
WHERE t.startdate >= @start -- @start and @end are date parameters
AND t.enddate <= @end
FOR XML PATH(''))
There are several ways to aggregate strings in SQL. You should read the article Concatenating Row Values in Transact-SQL. The most effective way is the FOR XML PATH('') trick.
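A sketch of the FOR XML PATH('') trick applied to the tables in the question, in syntax that SQL Server 2005 accepts. The @day variable and its value are assumptions for illustration, and IndexLocation is assumed to be a character column. STUFF removes the leading separator, and the correlated subquery builds one list per ReceiptPoint.

```sql
-- SQL Server 2005 has no DECLARE-with-initializer, so declare and set separately.
DECLARE @day DATETIME;
SET @day = '20090601';  -- hypothetical day to report on

SELECT rp.ID,
       rp.Name,
       STUFF((SELECT ', ' + rpil.IndexLocation
              FROM ReceiptPointIndexLocation rpil
              WHERE rpil.ReceiptPointId = rp.ID
                AND rpil.StartDate <= @day   -- mapping is active on @day
                AND rpil.EndDate   >= @day
              ORDER BY rpil.IndexLocation
              FOR XML PATH('')), 1, 2, '') AS IndexLocations
FROM ReceiptPoint rp;
```

ReceiptPoints with no mapping on that day come back with NULL in IndexLocations. Characters such as & or < in a location name would come out XML-escaped with this simple form.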
I recommend adding an incrementing index to the table to use as a primary key. SQL Server does not add a primary key for you if you leave it out, so I highly recommend explicitly defining the column. That way, you can refer to individual rows by the primary key instead of having to search for both cross-reference values.
Also, watch the clustering on the cross-reference table. Don't cluster on the incrementing primary key if you add one. You probably want to cluster on both of the foreign keys, and you get to choose which foreign key makes more sense to put first. For example:
ALTER TABLE ReceiptPointIndexLocation
ADD CONSTRAINT UK_ReceiptPointIndexLocation_ReceiptPointId_IndexLocation
UNIQUE CLUSTERED (ReceiptPointId, IndexLocation)
Alternatively, if the combination of (ReceiptPointId, IndexLocation) is unique enough, you can make that the primary key.

Generate unique ID to share with multiple tables SQL 2008

I have a couple of tables in a SQL 2008 server that I need to generate unique ID's for. I have looked at the "identity" column but the ID's really need to be unique and shared between all the tables.
So if I have, say, (5) five tables of the flavour "asset infrastructure" and I want to run with a unique ID between them as a combined group, I need some sort of generator that looks at all (5) five tables and issues the next ID which is not duplicated in any of those (5) five tables.
I know this could be done with some sort of stored procedure but I'm not sure how to go about it. Any ideas?
The simplest solution is to set your identity seeds and increment on each table so they never overlap.
Table 1: Seed 1, Increment 5
Table 2: Seed 2, Increment 5
Table 3: Seed 3, Increment 5
Table 4: Seed 4, Increment 5
Table 5: Seed 5, Increment 5
The identity column mod 5 will tell you which table the record is in. You will use up your identity space five times faster so make sure the datatype is big enough.
Why not use a GUID?
You could let them each have an identity that seeds from numbers far enough apart never to collide.
GUIDs would work but they're butt-ugly, and non-sequential if that's significant.
Another common technique is to have a single-column table with an identity that dispenses the next value each time you insert a record. If the tables all pull from that common sequence, a second column indicating which table each value was dispensed to is likely to be useful.
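A minimal T-SQL sketch of such a dispenser table, under assumptions: the names AssetIdDispenser and IssuedTo, the columns of Table1, and the sample values are all hypothetical. Each insert into the dispenser hands out the next ID, and SCOPE_IDENTITY() reads it back for use in the real table.

```sql
-- Hypothetical dispenser: one row per ID ever issued.
CREATE TABLE AssetIdDispenser (
    ID       INT IDENTITY(1,1) PRIMARY KEY,
    IssuedTo SYSNAME NOT NULL  -- which table received this ID
);

-- Hypothetical asset table; note its ID is NOT an identity column.
CREATE TABLE Table1 (
    ID   INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

DECLARE @newId INT;

-- Take the next shared ID, then use it for the real row.
INSERT INTO AssetIdDispenser (IssuedTo) VALUES ('Table1');
SET @newId = SCOPE_IDENTITY();

INSERT INTO Table1 (ID, Name) VALUES (@newId, 'Pump 7');
```

The same two steps, with a different IssuedTo value, would be repeated for each of the other four tables.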
You realize there are logical design issues with this, right?
Reading into the design a bit, it sounds like what you really need is a single table called "Asset" with an identity column, and then either:
a) 5 additional tables for the subtypes of assets, each with a foreign key to the primary key on Asset; or
b) 5 views on Asset that each select a subset of the rows and then appear (to users) like the 5 original tables you have now.
If the columns on the tables are all the same, (b) is the better choice; if they're all different, (a) is the better choice. This is a classic DB spin on the supertype / subtype relationship.
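Option (a) might look like the following T-SQL sketch. Only the Asset table comes from the answer; the subtype tables Pipe and Valve and all column names are made up for illustration. The identity lives on Asset alone, so every subtype row shares one ID space by construction.

```sql
-- Supertype: the only table that generates IDs.
CREATE TABLE Asset (
    AssetId   INT IDENTITY(1,1) PRIMARY KEY,
    AssetType VARCHAR(20) NOT NULL  -- e.g. 'Pipe' or 'Valve'
);

-- Hypothetical subtypes: the primary key is also a foreign key to Asset.
CREATE TABLE Pipe (
    AssetId  INT PRIMARY KEY REFERENCES Asset (AssetId),
    Diameter DECIMAL(6,2) NOT NULL
);

CREATE TABLE Valve (
    AssetId   INT PRIMARY KEY REFERENCES Asset (AssetId),
    ValveKind VARCHAR(30) NOT NULL
);

-- Insert the supertype row first, then the subtype row with the same ID.
DECLARE @assetId INT;
INSERT INTO Asset (AssetType) VALUES ('Pipe');
SET @assetId = SCOPE_IDENTITY();
INSERT INTO Pipe (AssetId, Diameter) VALUES (@assetId, 12.50);
```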
Alternately, you could do what you're talking about and recreate the IDENTITY functionality yourself with a stored proc that wraps INSERT access on all 5 tables. Note that you'll have to put a TRANSACTION around it if you want guarantees of uniqueness, and if this is a popular table, that might make it a performance bottleneck. If that's not a concern, a proc like that might take the form:
CREATE PROCEDURE InsertAsset_Table1
AS
BEGIN
    BEGIN TRANSACTION;
    -- SELECT MIN INTEGER NOT ALREADY USED IN ANY OF THE FIVE TABLES
    -- INSERT INTO Table1 WITH THAT ID
    COMMIT TRANSACTION; -- or roll back on error, etc.
END
Again, SQL is highly optimized for helping you out if you choose the patterns I mention above, and NOT optimized for this kind of thing (there's overhead with creating the transaction AND you'll be issuing shared locks on all 5 tables while this process is going on). Compare that with using the PK / FK method above, where SQL Server knows exactly how to do it without locks, or the view method, where you're only inserting into 1 table.
I found this when searching on Google. I am facing a similar problem for the first time. I had the idea of a dedicated ID table specifically to generate the IDs, but I was unsure whether it was considered OK design. So I just wanted to say THANKS for the confirmation. It looks like it is an adequate solution, although not ideal.
I have a very simple solution. It should be good for cases when the number of tables is small:
create table T1(ID int primary key identity(1,2), rownum varchar(64))
create table T2(ID int primary key identity(2,2), rownum varchar(64))
insert into T1(rownum) values('row 1')
insert into T1(rownum) values('row 2')
insert into T1(rownum) values('row 3')
insert into T2(rownum) values('row 1')
insert into T2(rownum) values('row 2')
insert into T2(rownum) values('row 3')
select * from T1
select * from T2
drop table T1
drop table T2
This is a common problem, for example when using a table of people (called PERSON, singular please) where each person is categorized, for example as Doctor, Patient, Employee, Nurse etc.
It makes a lot of sense to create a table for each of these categories that contains their specific category information, like an employee's start date and salary and a nurse's qualifications and number.
A Patient, for example, may have many nurses and doctors that work on him, so a many-to-many table that links Patient to other people in the PERSON table facilitates this nicely. In this table there should be some description of the relationship between these people, which leads us back to the categories for people.
Since a Doctor and a Patient could end up with the same Primary Key ID in their own tables, it becomes very useful to have a globally unique ID or Object ID (OID).
A good way to do this, as suggested, is to have a table designated to auto-increment the primary key. Perform an INSERT on that table first to obtain the OID, then use it for the new PERSON.
I like to go a step further. When things get ugly (some new developer gets his hands on the database or, even worse, a really old developer), it's very useful to add more meaning to the OID.
Usually this is done programmatically, not with the database engine, but if you use a BIGINT for all the Primary Key IDs then you have lots of room to prefix a number with a visually identifiable sequence. For example, all Doctor IDs could begin with 100, all Patients with 110, all Nurses with 120.
To that I would append, say, a Julian date or a Unix date+time, and finally append the auto-increment ID.
This would result in numbers like:
110,2455892,00000001
120,2455892,00000002
100,2455892,00000003
since the Julian date 100yrs from now is only 2492087, you can see that 7 digits will adequately store this value.
A BIGINT is a 64-bit (8-byte) signed integer with a range of -9.22x10^18 to 9.22x10^18 (-2^63 to 2^63 - 1). Notice the exponent is 18. That's 18 digits you have to work with.
Using this design, you are limited to 100 million OIDs, 999 categories of people and dates up to... well past the shelf life of your database, but I suspect that's good enough for most solutions.
The operations required to create an OID like this are all multiplication and division, which avoids all the gear grinding of text manipulation.
The disadvantage is that INSERTs require more than a simple TSQL statement, but the advantage is that when you are tracking down errant data or even being clever in your queries, your OID is visually telling you a lot more than a random number or, worse, an eyesore like a GUID.
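The arithmetic behind the prefix scheme above can be sketched in T-SQL (SQL Server 2008 syntax). The variable names are illustrative; the digit widths (3 for the category, 7 for the Julian day, 8 for the sequence) follow the layout described in this answer.

```sql
DECLARE @category BIGINT = 110;      -- patient prefix
DECLARE @julian   BIGINT = 2455892;  -- Julian day number, 7 digits
DECLARE @seq      BIGINT = 1;        -- auto-increment part, up to 8 digits

-- Compose: shift the category left by 7 digits, add the date,
-- then shift everything left by 8 digits and add the sequence.
DECLARE @oid BIGINT = (@category * 10000000 + @julian) * 100000000 + @seq;

-- Decompose using only integer division and modulo.
SELECT @oid                          AS Oid,        -- 110245589200000001
       (@oid / 100000000) / 10000000 AS Category,   -- 110
       (@oid / 100000000) % 10000000 AS JulianDay,  -- 2455892
       @oid % 100000000              AS Seq;        -- 1
```

The divisors are kept small enough to be INT literals on purpose: a literal larger than the INT range is typed as a decimal in T-SQL, which would turn the integer division into a decimal one.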