Related
I have the following tables:
Students(id, name, surname)
Courses(course id)
Course_Signup(id, student_id, course_id, year)
Grades(signup_id, mark)
I want to display all the students(id, name, surname) with their final grade (where final grade = avg of the grades of all courses), but only for the students that have passed all the courses for which they have sign-up in the current year.
This is what I tried:
SELECT s."id", s."name", s."surname", AVG(g."mark") AS "finalGrade"
FROM "STUDENT" s,
"course sign-up" csn
join "GRADES" g
on csn."id" = g."signup_id"
WHERE csn."year" >= '01-01-2022'
HAVING "finalGrade" >= 5.00
GROUP BY s."id"
However, after adding the last 2 lines, regarding the finalGrade condition, I get an invalid identifier error. Why is that?
Uh, oh. Did you really create tables using lower letter case names enclosed into double quotes? If so, get rid of them (the sooner, the better) because they only cause problems.
Apart from that, uniformly use joins - in your from clause there's the student table which isn't joined to any other table and results in cross join.
Don't compare dates to strings; use date literal (as I did), or to_date function with appropriate format model.
As of error you got: you can't reference expression's alias ("finalGrade") as is in the having clause - use the whole expression.
Also, group by should contain all non-aggregated columns from the select column list.
This "fixes" error you got, but - I suggest you consider everything I said:
SELECT s."id", s."name", s."surname", AVG(g."mark") AS "finalGrade"
FROM "STUDENT" s,
"course sign-up" csn
join "GRADES" g
on csn."id" = g."signup_id"
WHERE csn."year" >= date '2022-01-01'
GROUP BY s."id", s."name", s."surname"
HAVING AVG(g."mark") >= 5.00
I am trying to create a new column "PolicyStatus" with a case statement. The the value of the case statement is dependent upon counting the number of occurrences of two variables and both of those variables have to occur a specific number of times.
Here is an example:
Grouping by OpportuntiyID, a policy is "Within Policy" if there are at least 3 QuoteID AND There are at least 3 Within7 that have a value of "Yes". Below is a sample data and my code thus far. Something is not going right with my case statement but I am not sure how to correct it.
The error that I am getting is:
Msg 102, Level 15, State 1, Line 35
Incorrect syntax near ')'.
This is referring to the case statement directly.
I thank you all in advanced for your assistance in correcting this.
OpportunityID,QUOTEID,DaysPassed,Within7
0060c00001QF5jiAAD,a060c00001REnnuAAD,-81,No
0060c00001QF5jiAAD,a060c00001REnpqAAD,-81,No
0060c00001QF5jiAAD,a060c00001REnsBAAT,-81,No
0060c00001QGz6JAAT,NULL,NULL,No
0060c00001QHxZhAAL,a060c00001cVlyzAAC,-32,No
0060c00001QHxZhAAL,a060c00001cVlzEAAS,-32,No
0060c00001QHxZhAAL,a060c00001cVm02AAC,-32,No
0060c00001QHxZhAAL,a060c00001cVm0bAAC,-32,No
0060c00001QHxZhAAL,a060c00001cUhzeAAC,0,Yes
0060c00001QIaK3AAL,a060c00001cV9YeAAK,4,Yes
0060c00001QIaK3AAL,a060c00001cV9YZAA0,4,Yes
0060c00001QIFfZAAX,a060c00001REtIEAA1,43,Yes
0060c00001QIk2UAAT,a060c00001cUYm9AAG,18,Yes
0060c00001QIk2UAAT,a060c00001cUYmEAAW,18,Yes
0060c00001QIk2UAAT,a060c00001cUYmTAAW,18,Yes
0060c00001QIwgaAAD,a060c00001cVMekAAG,0,Yes
0060c00001QIWoPAAX,a060c00001cW8eAAAS,-35,No
0060c00001QIYKbAAP,a060c00001cUYHkAAO,-65,No
0060c00001QIYKbAAP,a060c00001cUYKtAAO,-65,No
0060c00001QIYKbAAP,a060c00001RDzNYAA1,6,Yes
0060c00001QL7bkAAD,a060c00001cUQCmAAO,21,Yes
0060c00001QL7bkAAD,a060c00001cUQCXAA4,21,Yes
0060c00001QL7bkAAD,a060c00001cUQDkAAO,21,Yes
0060c00001QLWg6AAH,NULL,NULL,No
0060c00001QxJSgAAN,NULL,NULL,No
0060c00001QxOe4AAF,NULL,NULL,No
0060c00001Rae5dAAB,a060c00001cUvOLAA0,3,Yes
0060c00001Rb5RoAAJ,a060c00001cUWgEAAW,8,Yes
0060c00001Rb5RoAAJ,a060c00001cUWgJAAW,8,Yes
0060c00001Rb8wuAAB,a060c00001cUvEaAAK,-1,Yes
0060c00001Rb8wuAAB,a060c00001cUvEGAA0,-1,Yes
SELECT
[OPPORTUNITYID]
, COUNT([QUOTEID]) AS 'CountOfQuotes'
, CASE WHEN COUNT([QUOTEID]) >= 3 AND COUNT([WITHIN7] = 'YES') >=3 THEN 'Within Policy'
ELSE 'Breached Policy' END AS 'PolicyStatus'
FROM [DB].dbo.[TABLE]
GROUP BY [OPPORTUNITYID]
ORDER BY [OpportunityID]
GO
You need to do conditional aggregation on the WITHIN7 column, counting how many times 'YES' appears. One option is to count a CASE expression.
SELECT
[OPPORTUNITYID],
COUNT([QUOTEID]) AS [CountOfQuotes],
CASE WHEN COUNT([QUOTEID]) >= 3 AND
COUNT(CASE WHEN [WITHIN7] = 'YES' THEN 1 END) >= 3
THEN 'Within Policy'
ELSE 'Breached Policy' END AS [PolicyStatus]
FROM [DB].dbo.[TABLE]
GROUP BY [OPPORTUNITYID]
ORDER BY [OpportunityID];
I have select:
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
wm_concat(px_dtct) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
but i get error ORA-06502: PL/SQL: : character string buffer too small if i'll remove one string, because sometimes (when v.accs= 3570) count(*) = 215
but when i try to skip using wm_concat for v.accs= 3570 for example this way:
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
(case when v.accs = 3570 then wm_concat(px_dtct) else 'too many' end) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
i still have the same error message. But why?
You concatenate results from a query. This query can result in a lot of rows so eventually you will run out of string length. Maybe concatenation is not the way to go here. Depends on what you want to achieve of course.
Why? Because you still use wm_concat for accs=3570... swap the THEN and ELSE part of your CASE expression
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
(case when v.accs = 3570 then 'too many' else wm_concat(px_dtct) end) npx_DTCT
from table v group by accs, currency, amount, drcr_ind
First, as it has already been told, you have to switch then and else clauses in your query.
Then, I guess you should also similarily process your second wm_concat, the one that works with ids.
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,
(case when v.accs = 3570 then 'too many' else wm_concat(ids) end) npx_IDS,
(case when v.accs = 3570 then 'too many' else wm_concat(px_dtct) end) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
And, finally, why do you think that only v.accs = 3570 is able to bring 06502 error in front of you? I suppose you should handle all of them.
I would like to know if there is a way to match people between two separate systems, using (mostly) SQL.
We have two separate Oracle databases where people are stored. There is no link between the two (i.e. cannot join on person_id); this is intentional. I would like to create a query that checks to see if a given group of people from system A exists in system B.
I am able to create tables if that makes it easier. I can also run queries and do some data manipulation in Excel when creating my final report. I am not very familiar with PL/SQL.
In system A, we have information about people (name, DOB, soc, sex, etc.). In system B we have the same types of information about people. There could be data entry errors (person enters an incorrect spelling), but I am not going to worry about this too much, other than maybe just comparing the first 4 letters. This question deals with that problem more specifically.
They way I thought about doing this is through correlated subqueries. So, roughly,
select a.lastname, a.firstname, a.soc, a.dob, a.gender
case
when exists (select 1 from b where b.lastname = a.lastname) then 'Y' else 'N'
end last_name,
case
when exists (select 1 from b where b.firstname = a.firstname) then 'Y' else 'N'
end first_name,
case [etc.]
from a
This gives me what I want, I think...I can export the results to Excel and then find records that have 3 or more matches. I believe that this shows that a given field from A was found in B. However, I ran this query with just three of these fields and it took over 3 hours to run (I'm looking in 2 years of data). I would like to be able to match on up to 5 criteria (lastname, firstname, gender, date of birth, soc). Additionally, while soc number is the best choice for matching, it is also the piece of data that tends to be missing the most often. What is the best way to do this? Thanks.
You definitely want to weigh the different matches. If an SSN matches, that's a pretty good indication. If a firstName matches, that's basically worthless.
You could try a scoring method based on weights for the matches, combined with the phonetic string matching algorithms you linked to. Here's an example I whipped up in T-SQL. It would have to be ported to Oracle for your issue.
--Score Threshold to be returned
DECLARE #Threshold DECIMAL(5,5) = 0.60
--Weights to apply to each column match (0.00 - 1.00)
DECLARE #Weight_FirstName DECIMAL(5,5) = 0.10
DECLARE #Weight_LastName DECIMAL(5,5) = 0.40
DECLARE #Weight_SSN DECIMAL(5,5) = 0.40
DECLARE #Weight_Gender DECIMAL(5,5) = 0.10
DECLARE #NewStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO #NewStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Sanders','234-62-3442','M' )
DECLARE #OldStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO #OldStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Stickler','234-62-3442','M' ), --3/4 Match
( 'Albert','Sanders','523-42-3441','M' ), --2/4 Match
( 'Benne','Sanders','234-53-2334','F' ), --2/4 Match
( 'Ben','Sanders','234623442','M' ), --SSN has no dashes
( 'Ben','Sanders','234-62-3442','M' ) --perfect match
SELECT
'NewID' = ns.ID,
'OldID' = os.ID,
'Weighted Score' =
(CASE WHEN ns.FirstName = os.FirstName THEN #Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN #Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN #Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN #Weight_Gender ELSE 0 END)
,
'RAW Score' = CAST(
((CASE WHEN ns.FirstName = os.FirstName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN 1 ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN 1 ELSE 0 END) ) AS varchar(MAX))
+
' / 4',
os.FirstName ,
os.LastName ,
os.SSN ,
os.Gender
FROM #NewStuff ns
--make sure that at least one item matches exactly
INNER JOIN #OldStuff os ON
os.FirstName = ns.FirstName OR
os.LastName = ns.LastName OR
os.SSN = ns.SSN OR
os.Gender = ns.Gender
where
(CASE WHEN ns.FirstName = os.FirstName THEN #Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN #Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN #Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN #Weight_Gender ELSE 0 END)
>= #Threshold
ORDER BY ns.ID, 'Weighted Score' DESC
And then, here's the output.
NewID OldID Weighted Raw First Last SSN Gender
1 5 1.00000 4 / 4 Ben Sanders 234-62-3442 M
1 1 0.60000 3 / 4 Ben Stickler 234-62-3442 M
1 4 0.60000 3 / 4 Ben Sanders 234623442 M
Then, you would have to do some post processing to evaluate the validity of each possible match. If you ever get a 1.00 for weighted score, you can assume that it's the right match, unless you get two of them. If you get a last name and SSN (a combined weight of 0.8 in my example), you can be reasonably certain that it's correct.
Example of HLGEM's JOIN suggestion:
SELECT a.lastname,
a.firstname,
a.soc,
a.dob,
a.gender
FROM TABLE a
JOIN TABLE b ON SOUNDEX(b.lastname) = SOUNDEX(a.lastname)
AND SOUNDEX(b.firstname) = SOUNDEX(a.firstname)
AND b.soc = a.soc
AND b.dob = a.dob
AND b.gender = a.gender
Reference: SOUNDEX
I would probably use joins instead of correlated subqueries but you will have to join on all the fields, so not sure how much that might improve things. But since correlated subqueries often have to evaluate row-by-row and joins don't it could improve things a good bit if you have good indexing. But as with all performance tuning only trying the techinque will let you knw ofor sure.
I did a similar task looking for duplicates in our SQL Server system and I broke it out into steps. So first I found everyone where the names and city/state were an exact match. Then I looked for additional possible matches (phone number, ssn, inexact name match etc. AS I found a possible match between two profiles, I added it to a staging table with a code for what type of match found it. Then I assigned a confidence amount to each type of match and added up the confidence for each potential match. So if the SOC matches, you might want a high confidence, same if the name is eact and the gender is exact and the dob is exact. Less so if the last name is exact and the first name is not exact, etc. By adding a confidence, I was much better able to see which possible mathes were more likely to be the same person. SQl Server also has a soundex function which can help with names that are slightly different. I'll bet Oracle has something similar.
After I did this, I learned how to do fuzzy grouping in SSIS and was able to generate more matches with a higher confidence level. I don't know if Oracle's ETL tools havea a way to do fuzzy logic, but if they do it can really help with this type of task. If you happen to also have SQL Server, SSIS can be run connecting to Oracle, so you could use fuzzy grouping yourself. It can take a long time to run though.
I will warn you that name, dob and gender are not likely to ensure they are the same person especially for common names.
Are there indexes on all of the columns in table b in the WHERE clause? If not, that will force a full scan of the table for each row in table a.
You can use soundex but you can also use utl_match for fuzzy comparing of string, utl_match makes it possible to define a treshold: http://www.psoug.org/reference/utl_match.html
This issue is a continuation of my earlier query. It's still not working.
It's about an ORDER BY clause. I am trying to sort using a variable called "sortby".
Here, now the ORDER BY clause is selected as a separate column using the DECODE() function (as suggested in the answer by #devio in the original version of this question).
Let’s say sortby = ‘memberCount’ in this case, I passed it as first argument in decode(); memberCount is a COLUMN in the grptest table.
select distinct gl.group_id,
decode('memberCount', 'name', gl.group_name_key,
'description', gl.group_description_key,
'memberCount', gl.member_count)
as p_sortby,
gl.group_name,
gl.group_description,
gl.status_code,
gl.member_count,
(select grpp.group_name
from grptest_relationship grel join grptest grpp
on grel.parent_group_id = grpp.group_id
where grel.child_group_id = gl.group_id) as parent_group_name,
gl.group_name_key,
gl.group_description_key
from grptest gl
where gl.group_org_id = '3909'
and (gl.group_name_key like '%' || 'GROUP' || '%')
order by 2;
It doesn’t work.
But if I pass ‘name’ as first argument in decode above, it works.
That’s my original issue about why it doesn’t apply on memberCount.
I commented:
What is the error you get? Or what is the erroneous behaviour you get? In my adaptation of your question to my database, I had to ensure that the numeric column was converted to a character type before the DECODE() was acceptable - the other two columns were character columns. With that done, and with the minor issue that sorting numbers alphabetically places '8' after '79' and before '80', I got an appropriate result.
Rohit asked:
Thanks for the inputs. I guess I am confused about the minor issue you mentioned that "that sorting numbers alphabetically places '8' after '79' and before '80'". I couldn't get what is the thing here? Also, could you please help in my query of how "to ensure that the numeric column was converted to a character type before the DECODE() was acceptable". Can you please modify my query above in this respect?
The table I used is for the 'table of elements':
-- Tables for storing information about chemical elements and chemical compounds
-- See: http://www.webelements.com/ for elements.
-- See: http://ie.lbl.gov/education/isotopes.htm for isotopes.
CREATE TABLE elements
(
atomic_number INTEGER NOT NULL UNIQUE
CHECK (atomic_number > 0 AND atomic_number < 120),
symbol CHAR(3) NOT NULL UNIQUE,
name CHAR(20) NOT NULL UNIQUE,
atomic_weight DECIMAL(8,4) NOT NULL,
stable CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (stable IN ('Y', 'N'))
);
It's an interesting table because it has three genuine candidate keys (atomic number, name and symbol are each unique), and depending on context (isotopes vs chemicals), you are better off using atomic number or symbol as the joining key.
The queries I used were:
select decode('atomic_number',
'name', name,
'symbol', symbol,
'atomic_number', atomic_number||''),
name, symbol, atomic_number
from elements
order by 1;
select decode('name',
'name', name,
'symbol', symbol,
'atomic_number', atomic_number||''),
name, symbol, atomic_number
from elements
order by 1;
select decode('symbol',
'name', name,
'symbol', symbol,
'atomic_number', atomic_number||''),
name, symbol, atomic_number
from elements
order by 1;
These demonstrated the three orderings - by symbol, by name, and by atomic number.
Part of the result set for the atomic number ordering was:
77 Iridium Ir 77
78 Platinum Pt 78
79 Gold Au 79
8 Oxygen O 8
80 Mercury Hg 80
81 Thallium Tl 81
Because the atomic number was coerced into a string, the sort was in string order, and when regarded as a string, '8' appears after '79' and before '80', as shown. One way of avoiding that problem would be:
select decode('atomic_number',
'name', name,
'symbol', symbol,
'atomic_number', lpad(atomic_number, 3)),
name, symbol, atomic_number
from elements
order by 1;
Producing the following (which, though it isn't obvious, has an extra blank at the start of the first column):
77 Iridium Ir 77
78 Platinum Pt 78
79 Gold Au 79
80 Mercury Hg 80
81 Thallium Tl 81
82 Lead Pb 82
This uses the knowledge that space precedes any digit in the (ASCII, Latin-1, Unicode) sort sequence, and that atomic numbers are not more than 3 digits. Alternatively, I could have used 'LPAD(atomic_number, 3, '0')' to zero-pad the data. I tested with IBM Informix Dynamic Server (IDS) 11.50.FC3W2 on Solaris 10. IDS is very tolerant of type mismatches and automatically converts the atomic_number argument to LPAD into a string. Other DBMS may not be so tolerant; you'd have to explicitly cast the value.
Going back to the question...
Assuming memberCount is a numeric column and the values are not more than 4 digits long (adjust appropriately if they are longer), the query can be written:
select distinct gl.group_id,
decode('memberCount', 'name', gl.group_name_key,
'description', gl.group_description_key,
'memberCount', LPAD(gl.member_count, 4))
as p_sortby,
gl.group_name,
gl.group_description,
gl.status_code,
gl.member_count,
(select grpp.group_name
from grptest_relationship grel join grptest grpp
on grel.parent_group_id = grpp.group_id
where grel.child_group_id = gl.group_id) as parent_group_name,
gl.group_name_key,
gl.group_description_key
from grptest gl
where gl.group_org_id = '3909'
and (gl.group_name_key like '%' || 'GROUP' || '%')
order by 2;
Or you might need:
LPAD(CAST(memberCount AS CHAR(4)), 4)
or some other slightly DBMS-specific incantation that achieves the same general effect.
Since you didn't provide a schema (much less sample data) for the query, I don't have your table in my database, so I can't demo your query working
I know that
order by p_sortby
wont work, but could you try
order by decode('memberCount',
'name', gl.group_name_key,
'description', gl.group_description_key,
'memberCount', gl.member_count)
EDIT:
I remembered another way:
select * from (
select column1, decode(....) as column2, .... from table1
) t1
order by 2
and this way might even be a faster one