SQL Server ranking weirdness using FREETEXTTABLE across multiple columns

SQL Server ranking weirdness using FREETEXTTABLE across multiple columns - sql-server-2012

I have been struggling to get my head around how SQL Server full text search ranks my results.
Consider the following FREETEXTTABLE search:
DECLARE #SearchTerm varchar(55) = 'Peter Alex'
SELECT ftt.[RANK], v.*
FROM FREETEXTTABLE (vMembersFTS, (Surname, FirstName, MiddleName, MemberRef, Passport), #SearchTerm) ftt
INNER JOIN vMembersFTS v ON v.ID = ftt.[KEY]
ORDER BY ftt.[RANK] DESC;
This returns the following results and rankings:
RANK ID MemberRef Passport FirstName MiddleName Surname Salutation
----- ---- ---------- ----------- ----------- ------------ ---------- ------------
18 2 AB-002 Pete Peters
18 9 AB-006 George Alex Mr Alex
18 13 AB-009 Peter David Alex Mr Alex
14 3 AB-003 Peter Alex Jones
As you may be able to tell from the results posted above, the last row, although having, what I consider, a good match on both 'Peter' and 'Alex', appears with a rank of only 14 where the result in the first row has only a single match on 'Peter' (admittedly the surname is 'Peters').
This is a contrived example, but goes some way to illustrate my frustrations and lack of knowledge.
I have spent quite a bit of time researching, but I am feeling a bit out of my depth now. I'm sure that I'm doing something stupid such as searching across multiple columns.
I welcome your help and support. Thanks in advance.
Thanks,
Kaine
(BTW I am using SQL Server 2012)
Here is the SQL you can use to repeat the test yourself:
-- Create the Contacts table.
CREATE TABLE dbo.Contacts
(
ID int NOT NULL PRIMARY KEY,
FirstName varchar(55) NULL,
MiddleName varchar(55) NULL,
Surname varchar(55) NOT NULL,
Salutation varchar(55) NULL,
Passport varchar(55) NULL
);
GO
-- Create the Members table.
CREATE TABLE dbo.Members
(
ContactsID int NOT NULL PRIMARY KEY,
MemberRef varchar(55) NOT NULL
);
GO
-- Create the FTS view.
CREATE VIEW dbo.vMembersFTS WITH SCHEMABINDING AS
SELECT c.ID,
m.MemberRef,
ISNULL(c.Passport, '') AS Passport,
ISNULL(c.FirstName, '') AS FirstName,
ISNULL(c.MiddleName, '') AS MiddleName,
c.Surname,
ISNULL(c.Salutation, '') AS Salutation
FROM dbo.Contacts c
INNER JOIN dbo.Members AS m ON m.ContactsID = c.ID
GO
-- Create the view index for FTS.
CREATE UNIQUE CLUSTERED INDEX IX_vMembersFTS_ID ON dbo.vMembersFTS (ID);
GO
-- Create the FTS catalogue and stop-list.
CREATE FULLTEXT CATALOG ContactsFTSCatalog WITH ACCENT_SENSITIVITY = OFF;
CREATE FULLTEXT STOPLIST ContactsSL FROM SYSTEM STOPLIST;
GO
-- Create the member full-text index.
CREATE FULLTEXT INDEX ON dbo.vMembersFTS
(Surname, Firstname, MiddleName, Salutation, MemberRef, Passport)
KEY INDEX IX_vMembersFTS_ID
ON ContactsFTSCatalog
WITH STOPLIST = ContactsSL;
GO
-- Insert some data.
INSERT INTO Contacts VALUES (1, 'John', NULL, 'Smith', NULL, NULL);
INSERT INTO Contacts VALUES (2, 'Pete', NULL, 'Peters', NULL, NULL);
INSERT INTO Contacts VALUES (3, 'Peter', 'Alex', 'Jones', NULL, NULL);
INSERT INTO Contacts VALUES (4, 'Philip', NULL, 'Smith', NULL, NULL);
INSERT INTO Contacts VALUES (5, 'Harry', NULL, 'Dukes', NULL, NULL);
INSERT INTO Contacts VALUES (6, 'Joe', NULL, 'Jones', NULL, NULL);
INSERT INTO Contacts VALUES (7, 'Alex', NULL, 'Phillips', 'Mr Phillips', NULL);
INSERT INTO Contacts VALUES (8, 'Alexander', NULL, 'Paul', 'Alex', NULL);
INSERT INTO Contacts VALUES (9, 'George', NULL, 'Alex', 'Mr Alex', NULL);
INSERT INTO Contacts VALUES (10, 'James', NULL, 'Castle', NULL, NULL);
INSERT INTO Contacts VALUES (11, 'John', NULL, 'Alexander', NULL, NULL);
INSERT INTO Contacts VALUES (12, 'Robert', NULL, 'James', 'Mr James', NULL);
INSERT INTO Contacts VALUES (13, 'Peter', 'David', 'Alex', 'Mr Alex', NULL);
INSERT INTO Members VALUES (1, 'AB-001');
INSERT INTO Members VALUES (2, 'AB-002');
INSERT INTO Members VALUES (3, 'AB-003');
INSERT INTO Members VALUES (5, 'AB-004');
INSERT INTO Members VALUES (8, 'AB-005');
INSERT INTO Members VALUES (9, 'AB-006');
INSERT INTO Members VALUES (11, 'AB-007');
INSERT INTO Members VALUES (12, 'AB-008');
INSERT INTO Members VALUES (13, 'AB-009');
-- Run the FTS query.
DECLARE #SearchTerm varchar(55) = 'Peter Alex'
SELECT ftt.[RANK], v.*
FROM FREETEXTTABLE (vMembersFTS, (Surname, FirstName, MiddleName, MemberRef, Passport), #SearchTerm) ftt
INNER JOIN vMembersFTS v ON v.ID = ftt.[KEY]
ORDER BY ftt.[RANK] DESC;

The rank is assigning based on the order in your query:
DECLARE #SearchTerm varchar(55) = 'Peter Alex'
SELECT ftt.[RANK], v.*
FROM FREETEXTTABLE (vMembersFTS, (Surname, FirstName, MiddleName, MemberRef, Passport), #SearchTerm) ftt
INNER JOIN vMembersFTS v ON v.ID = ftt.[KEY]
ORDER BY ftt.[RANK] DESC;
So in your case, a match on SurName trumps FirstName, and both trump MiddleName.
Your top 3 results have a rank of 18 as all three match on Surname. The last record has a rank of 14 for matching on FirstName and MiddleName but not SurName.
You can find details on the rank calculations here: https://technet.microsoft.com/en-us/library/ms142524(v=sql.105).aspx
If you want to allocate equal weight to these you can, but you'd have to use CONTAINSTABLE and not FREETEXTTABLE.
Info can be found here: https://technet.microsoft.com/en-us/library/ms189760(v=sql.105).aspx

Related

How to list total number of scholarships per department in SQL

I have 2 tables that look like this where I want to query how many scholarships (from Tuition table) each department (from Student table) has distributed:
I am thinking a join is necessary but am not sure how to do so.

Create tables
create table students (
sid int auto_increment primary key,
name varchar(100),
email varchar(100),
department varchar(100)
);
create table tutions (
id int auto_increment primary key,
sid int,
cost int,
scholarships int,
duedate timestamp default current_timestamp
);
Sample data
insert into students (name, email, department)
values
('John Doe', 'john#abc.xyz', 'B'),
('Jane Doe', 'jane#abc.xyz', 'A'),
('Jack Doe', 'jack#abc.xyz', 'C'),
('Jill Doe', 'jill#abc.xyz', 'B');
insert into tutions (sid, cost, scholarships)
values
(1, 1000, 2),
(2, 1000, 1),
(3, 1000, 7),
(4, 1000, 2);
Query (department-wise total scholarships)
SELECT department, sum(scholarships) as scholarships
FROM students s
JOIN tutions t ON s.sid = t.sid
GROUP BY department
Output
Running SQL Fiddle

Not sure It's something you want? And not sure scholarships is a number or name of scholarship? So I doubt it's a name as varchar string type.
### dummy record
CREATE TABLE students (
psu_id INTEGER PRIMARY KEY,
firstname VARCHAR NOT NULL,
lastname VARCHAR NOT NULL,
email VARCHAR NOT NULL,
department VARCHAR NOT NULL
);
CREATE TABLE tuition (
tuition_id INTEGER PRIMARY KEY,
student_id INTEGER NOT NULL,
semeter_cost INTEGER NOT NULL,
scholarships VARCHAR NOT NULL,
due_date DATE NOT NULL
);
INSERT INTO students VALUES (1, 'John', 'Hello', 'Jonh#email.com', 'Engineering');
INSERT INTO students VALUES (2, 'Bella', 'Fuzz', 'Bella#email.com', 'Computer');
INSERT INTO students VALUES (3, 'Sunny', 'World', 'Sunny#email.com', 'Science');
INSERT INTO tuition VALUES (1, 1, 4000, 'first_class_en', '2022-05-09' );
INSERT INTO tuition VALUES (2, 2, 3000, 'nobel', '2022-05-09' );
INSERT INTO tuition VALUES (3, 3, 5000, 'hackathon', '2022-05-09' );
INSERT INTO tuition VALUES (4, 1, 4500, 'second_class_en', '2022-05-09' );
-----------------
### query
SELECT s.department, count(t.scholarships)
FROM students s
JOIN tuition t
ON s.psu_id = t.student_id
GROUP BY s.department
### output
department, total_scholarships
Computer|1
Engineering|2
Science|1

Oracle simple list results to comma separated list with quotes?

I would like to get results of this simple query
select PersonID from Persons where city ='Miami'
as a flat comma separated list with quotes.
desired results should be '3','5','7','8'
http://sqlfiddle.com/#!4/95e86d/1/0
I tried list_add but im getting: ORA-00904: "STRING_AGG": invalid identifier
CREATE TABLE Persons (
PersonID NUMBER,
FirstName varchar(255),
City varchar(255)
);
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (1, 'Tom B.','Stavanger');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (2, 'Jerry M.','Train City');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (3, 'Eric g.','Miami');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (4, 'Bar Y.','Manhattan');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (5, 'John K.','Miami');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (6, 'Foo F.','Washington');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (7, 'Alen D.','Miami');
INSERT INTO Persons (PersonID, FirstName, City)
VALUES (8, 'John K.','Miami');

select listagg( '''' || PersonID || '''', ',' )
within group (order by personID)
from Persons
where city ='Miami'
should work.
I'd have to question why you're trying to produce this particular result. If you're going to generate this string so that you can subsequently use it in the IN list of another dynamically generated SQL statement, you're probably better off taking a step back because there are much better ways to solve the problem.
SQL Fiddle example

joining three tables together using Inner Joins

Using Table aliases, list the first name, last name and start date of students enrolled on the java fundamentals module:
I am having some trouble when running the query below.
SELECT stu.StudFName, stu.StudLName, enrol.StartDate
From Student stu
INNER JOIN Enrolment enrol
ON stu.StudID = enrol.StudID
INNER JOIN Module mod
ON enrol.ModCode = mod.ModCode
WHERE mod.ModName = 'Java Fundamentals'
Structure:
CREATE TABLE Student
(StudID INTEGER PRIMARY KEY,
StudFName VARCHAR(10) NOT NULL,
StudLName VARCHAR(10) NOT NULL,
DoB DATE NOT NULL,
Sex CHAR(1) NOT NULL CHECK (Sex IN ('M', 'F')),
Email VARCHAR(30) UNIQUE);
CREATE TABLE Staff
(StaffID INTEGER PRIMARY KEY,
Title VARCHAR(4) CHECK (Title IN ('Prof', 'Dr', 'Mr', 'Mrs', 'Miss')),
StaffFName VARCHAR(10) NOT NULL,
StaffLName VARCHAR(10) NOT NULL,
Email VARCHAR(30) UNIQUE,
Department VARCHAR(25) DEFAULT 'Not Assigned',
Extension INTEGER CHECK (Extension BETWEEN 0001 AND 9999));
CREATE TABLE Module
(ModCode CHAR(4) PRIMARY KEY,
ModName VARCHAR(25) NOT NULL,
ModCredits INTEGER NOT NULL CHECK (ModCredits IN (15, 30, 45, 60)),
ModLevel CHAR(3) NOT NULL CHECK (ModLevel IN ('UG1', 'UG2', 'UG3', 'MSc')),
ModLeader INTEGER NOT NULL,
Foreign Key (ModLeader) REFERENCES Staff (StaffID));
CREATE TABLE Enrolment
(ModCode CHAR(4) NOT NULL,
StudID INTEGER NOT NULL,
StartDate DATE NOT NULL,
PRIMARY KEY (ModCode, StudID),
Foreign Key (StudID) REFERENCES Student (StudID),
Foreign Key (ModCode) REFERENCES Module (ModCode));

The answer is... there is no trouble with your query.
It works just fine.
Check here
INSERT INTO Student (StudID, StudFName, StudLName, DoB, Sex, Email) VALUES
(1, 'Jack', 'Black', TO_DATE('2015/01/01', 'yyyy/mm/dd'), 'M', 'jack#email.com');
INSERT INTO Student (StudID, StudFName, StudLName, DoB, Sex, Email) VALUES
(2, 'Andrew', 'Wiggin', TO_DATE('2015/01/01', 'yyyy/mm/dd'), 'M', 'andrew#email.com');
INSERT INTO Student (StudID, StudFName, StudLName, DoB, Sex, Email) VALUES
(3, 'Bob', 'Marley', TO_DATE('2015/01/01', 'yyyy/mm/dd'), 'M', 'bob#email.com');
INSERT INTO Staff (StaffID, Title, StaffFName, StaffLName, Email, Extension) VALUES
(1, 'Prof', 'Joe', 'Smith', 'stuff#emal.com', 0001);
INSERT INTO Module (ModCode, ModName, ModCredits, ModLevel, ModLeader) VALUES
(1, 'Java Fundamentals', 30, 'UG1', 1);
INSERT INTO Module (ModCode, ModName, ModCredits, ModLevel, ModLeader) VALUES
(2, 'C# Fundamentals', 15, 'UG2', 1);
INSERT INTO Enrolment (ModCode, StudID, StartDate) VALUES
(1, 1, TO_DATE('2015/01/01', 'yyyy/mm/dd'));
INSERT INTO Enrolment (ModCode, StudID, StartDate) VALUES
(1, 2, TO_DATE('2015/01/02', 'yyyy/mm/dd'));
INSERT INTO Enrolment (ModCode, StudID, StartDate) VALUES
(2, 3, TO_DATE('2015/01/03', 'yyyy/mm/dd'));
-------------------------------------------------------------
SELECT stu.StudFName, stu.StudLName, enrol.StartDate
From Student stu
INNER JOIN Enrolment enrol ON stu.StudID = enrol.StudID
INNER JOIN Module mod ON enrol.ModCode = mod.ModCode
WHERE mod.ModName = 'Java Fundamentals'
Result
STUDFNAME STUDLNAME STARTDATE
----------------------------------------
Jack Black January, 01 2015
Andrew Wiggin January, 02 2015

Relational algebra one to many, division

I have seen many examples of division but none helped me understand how to solve the following problem:
Relations are:
Patient (Pid, Did, Pname, Disease, Severity)
Doctor (Did, Cname, Dname, Specialty)
Clinic (Cname, Mname, Type, City)
Question is: Find all pairs of patient ID and city such that the patient doesn't suffer from flu, and was treated in all the clinics in that city.
From what I've seen I need to divide the patients without flu by....? I think that it should be a table with all the clinics in a certain city, but how do I get that, and for each possible city?
Thanks

so... this is kind of complicated... pay special attention to the sample data I included in the tables.
http://sqlfiddle.com/#!2/04eac4/15/0
http://sqlfiddle.com/#!2/04eac4/17/0
I have no idea where division was supposed to take place...?
First we need to know how many clinics are in each city:
select city, count(cid) as c_count from clinic group by city
Then we need to know where each patient has been treated for anything besides the flu...
select patient.pid, patient.pname, clinic.city,
count(distinct clinic.cid) as p_c_count
from patient
join patient_doctor
on patient.pid = patient_doctor.pid
join doctor
on patient_doctor.did = doctor.did
join clinic
on doctor.cid = clinic.cid
where patient_doctor.disease not in ('Flu')
group by patient.pid, patient.pname, clinic.city
This joins all the tables together so that we can link PID and CID. Then we want only information that on patients that does not include a diagnosis of the flu. After, we group by the patient information and the clinic city so that we can count the number of clinics that patient went to for each city. Keep in mind, Patient A can go to Clinic 1 in the first city and be treated for the flu and even if he gets treated for the flu at a different clinic in a different city, the way you phrased the question allows that person to show up in results for the first city/clinic.... Does that makes any sense?
Then we need to join those two results. If the number of clinics in each city is equal to the number of clinics a patient visited per city, without being diagnosed with the flu, we get this:
select p_info.pid, c_ct_info.city
from (select city, count(cid) as c_count from clinic group by city) as c_ct_info
join (
select patient.pid, patient.pname, clinic.city,
count(distinct clinic.cid) as p_c_count
from patient
join patient_doctor
on patient.pid = patient_doctor.pid
join doctor
on patient_doctor.did = doctor.did
join clinic
on doctor.cid = clinic.cid
where patient_doctor.disease not in ('Flu')
group by patient.pid, patient.pname, clinic.city ) as p_info
on c_ct_info.city = p_info.city
where c_count = p_c_count;
Now. This excludes anyone who just visited one clinic and wasn't diagnosed with the flu but didn't go any of the other clinics in that city.
Just in case that sqlfiddle dies one day... here is the tables and data I used:
Create table Clinic (
CID int not null primary key,
Cname varchar(100),
Mname varchar(100),
Type tinyint,
City varchar(100)
);
insert into Clinic values (1,'Clinic 1','Mname 1', 1, 'City 1');
insert into Clinic values (2,'Clinic 2','Mname 2', 1, 'City 1');
insert into Clinic values (3,'Clinic 3','Mname 3', 1, 'City 2');
Create table Doctor (
DID int not null primary key,
Dname varchar(100),
Specialty varchar(100),
CID int,
foreign key (CID) references Clinic (CID)
);
insert into Doctor values (1, 'House', 'Internal Medicine', 1);
insert into Doctor values (2, 'Dr Who', 'General Practice', 1);
insert into Doctor values (3, 'Dr Dave', 'General Practice', 1);
insert into Doctor values (4, 'Dr Four', 'General Practice', 2);
insert into Doctor values (5, 'Dr Five', 'General Practice', 3);
insert into Doctor values (6, 'Dr Six', 'General Practice', 3);
create Table Patient (
PID int not null primary key,
PName varchar(100)
);
insert into Patient values (1, 'P. One');
insert into Patient values (2, 'P. Two');
insert into Patient values (3, 'P. Three');
insert into Patient values (4, 'P. Four');
insert into Patient values (5, 'P. Five');
Create table Patient_Doctor (
PDID int not null auto_increment primary key,
PID int not null,
DID int,
Disease varchar(100),
Severity tinyint,
foreign key (PID) references Patient (PID),
foreign key (DID) references Doctor (DID)
);
insert into Patient_Doctor values (null, 1, 1, 'Flu', 1);
insert into Patient_Doctor values (null, 1, 4, 'Flu', 1);
insert into Patient_Doctor values (null, 1, 5, 'Flu', 1);
-- shouldn't be in our results because they were diagnosed with the flu in each city
insert into Patient_Doctor values (null, 2, 2, 'Other', 1);
insert into Patient_Doctor values (null, 2, 4, 'Other', 1);
insert into Patient_Doctor values (null, 2, 5, 'Other', 1);
-- should be in our results because they attended every clinic in every city and
-- did not get diagnosed with the flu at any location
insert into Patient_Doctor values (null, 3, 1, 'Other', 1);
insert into Patient_Doctor values (null, 3, 4, 'Other', 1);
insert into Patient_Doctor values (null, 3, 6, 'Flu', 1);
-- should show up in our results for City 1 because they attended all the clinics in that
-- city and were not diagnosed with the flu. this person should NOT show up in our results
-- for city 2 because they were diagnosed with the flu at a clinic there.
insert into Patient_Doctor values (null, 4, 3, 'Other', 1);
-- should NOT show up in any results. although they weren't diagnosed with the flu, they
-- did not attend each clinic in that city.
insert into Patient_Doctor values (null, 5, 2, 'Flu', 1);
-- should NOT show up in results... meets none of the criteria
if we would add this data:
insert into Patient values (6, 'P. Six');
insert into Patient_Doctor values (null, 6, 5, 'Other', 1);
we should expect to see that person in our results because there's only one clinic in city 2 and they were not diagnosed with the flu there...
Things I changed:
Patient_Doctor: PID, DID, Disease, Severity
Patient: PID, PName
Clinic and Doctor are now joined by CID instead of CName
I highly suggest you create another table for Disease so that you are not repeating the same information over and over again. This table would include the fields Disease_ID and Disease_Description. The disease would then be linked to the patient by the Disease_ID.

Display 2 columns for each header

In SQL Server 2008 I have a table People (Id, Gender, Name).
Gender is either Male or Female. There can be many people with the same name.
I would like to write a query that displays for each gender the top 2 names
by count and their count, like this:
Male Female
Adam 23 Rose 34
Max 20 Jenny 15
I think that PIVOT might be used but all the examples I have seen display only one column for each header.

Here is an example on SQL Fiddle -- http://sqlfiddle.com/#!3/b3477/1
This uses an couple of common table expressions to separate the genders.
create table People
(
Id int,
Gender varchar(50),
Name varchar(50)
)
;
insert into People values (1, 'Male', 'Bob');
insert into People values (2, 'Male', 'Bob');
insert into People values (3, 'Male', 'Bill');
insert into People values (4, 'Male', 'Chuck');
insert into People values (5, 'Female', 'Anne');
insert into People values (6, 'Female', 'Anne');
insert into People values (7, 'Female', 'Bobbi');
insert into People values (8, 'Female', 'Jane');
with cteMale as
(
select Name as 'MaleName', Count(*) as Num, ROW_NUMBER() over(order by count(*) desc, Name) RowNum
from People
where Gender = 'Male'
group by Name
)
,
cteFemale as
(
select top 2 Name as 'FemaleName', Count(*) as Num, ROW_NUMBER() over(order by count(*) desc, Name) RowNum
from People
where Gender = 'Female'
group by Name
)
select a.MaleName, a.Num as MaleNum, b.femaleName, b.Num as FemaleNum
from cteMale a
join cteFemale b on
a.RowNum = b.RowNum
where a.RowNum <= 2

Use a windowing function. Below is a complete solution using a temporary table #people.
-- use temp db
use tempdb;
go
-- drop test table
--drop table #people;
--go
-- create test table
create table #people (my_id int, my_gender char(1), my_name varchar(25));
go
-- clear test table
delete from #people;
-- three count
insert into #people values
(23, 'M', 'Adam'),
(34, 'F', 'Rose');
go 3
-- two count
insert into #people values
(20, 'M', 'Max'),
(15, 'F', 'Jenny');
go 2
-- one count
insert into #people values
(20, 'M', 'John'),
(15, 'F', 'Julie');
go
-- grab top two by gender
;
with cte_Get_Top_Two as
(
select ROW_NUMBER() OVER(PARTITION BY my_gender ORDER BY count() DESC) AS my_window,
my_gender, my_name, count() as total
from #people
group by my_gender, my_name
)
select * from cte_Get_Top_Two where my_window in (1, 2)
go
Here is the output.
PS: You can drop my_id from the table since it does not relate to your problem but does not change solution.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Server ranking weirdness using FREETEXTTABLE across multiple columns - sql-server-2012

Related

How to list total number of scholarships per department in SQL

Oracle simple list results to comma separated list with quotes?

joining three tables together using Inner Joins

Relational algebra one to many, division

Display 2 columns for each header

Categories

Resources