Oracle Query - Use of Analytical functions - sql

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:
Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))
The data in the flat file looks like this:
'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'
You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number.
The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients who are like Tom Smith: patients with more than one medical record number.
I came up with the query below. It works perfectly fine, but I wanted to know if there is a better way to write it using Oracle analytic functions. Thank you in advance.
SELECT a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
FROM data a, data b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.date_of_birth = b.date_of_birth
AND a.medical_record_number <> b.medical_record_number
GROUP BY a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number

It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.
with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
res as (select firstname,
lastname,
date_of_birth,
medical_record_number,
count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
from data)
select distinct firstname,
lastname,
date_of_birth,
medical_record_number
from res
where cnt_med_rec_nums > 1;
*btw, the group by in your example query is not necessary; it would make much more sense to switch it out for a distinct - it makes your intent much clearer, since you're wanting to get a distinct set of records.

You can probably simplify the query a bit using a HAVING clause rather than doing a self-join
SELECT a.firstname,
a.lastname,
a.date_of_birth,
MIN(a.medical_record_number) lowest_medical_record_number,
MAX(a.medical_record_number) highest_medical_record_number
FROM data a
GROUP BY a.firstname,
a.lastname,
a.date_of_birth
HAVING COUNT( DISTINCT a.medical_record_number ) > 1
I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).
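As a rough illustration of the HAVING approach and the comma-separated list mentioned above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python. SQLite and its GROUP_CONCAT function are stand-ins chosen so the example is self-contained (an assumption, not part of the original answer); on Oracle the list would more likely be built with LISTAGG. Dates are stored as ISO strings for simplicity.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
rows = [
    ("jane", "jones", "2001-02-02", "MRN-11111", "2009-03-03", "diabetes"),
    ("jane", "jones", "2001-02-02", "MRN-11111", "2009-01-03", "asthma"),
    ("jane", "jones", "1975-05-05", "MRN-88888", "2009-02-17", "flu"),
    ("tom", "smith", "2002-04-12", "MRN-22222", "2009-03-03", "diabetes"),
    ("tom", "smith", "2002-04-12", "MRN-33333", "2009-01-03", "asthma"),
    ("tom", "smith", "2002-04-12", "MRN-33333", "2009-02-07", "asthma"),
    ("jack", "thomas", "1991-08-10", "MRN-44444", "2009-03-07", "asthma"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (firstname TEXT, lastname TEXT, date_of_birth TEXT, "
    "medical_record_number TEXT, diagnosis_date TEXT, diagnosis_code TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", rows)

# One row per patient (name + date of birth) that has more than one distinct MRN.
query = """
SELECT firstname, lastname, date_of_birth,
       COUNT(DISTINCT medical_record_number) AS mrn_count,
       GROUP_CONCAT(DISTINCT medical_record_number) AS mrn_list
FROM data
GROUP BY firstname, lastname, date_of_birth
HAVING COUNT(DISTINCT medical_record_number) > 1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only Tom Smith is returned for the sample data, because the two Jane Jones rows with different dates of birth count as different patients. The order of the values inside mrn_list is not guaranteed.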

Related

Find string from table in cell in BigQuery --> Query exceeded resource limits

I have two tables in BigQuery:
City list: Table: invertible-fin-XXX238.Reports.City
Station names: Table: invertible-fin-XXX238.Reports.Station
Most of the station names contain city names. Now I want to extract the city from the Station table.
Here some example data:
City: Berlin
Stationname: inStore_Berlin_Alexanderplatz
Stationname: Berlin Schönefeld Airport
Stationname: Train Station Franchise Berlin
I tried the INSTR Function, but had no success (the INSTR works only with Legacy SQL and there I couldn’t use SUBSELECTS).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
Therefore I tried it with WHERE LIKE. Below the SQL Code:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive, and I get the error “Query exceeded resource limits for tier 1. Tier 18 or higher required.”
Could someone please help me write a more resource-friendly query?
Thanks in advance,
Philipp
Below are a few of many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() function if the city names are spelled in the same case in both tables.
While the above versions look more straightforward, I would prefer the one below, as it lets you control how the city is extracted from the station name. In the pattern r'([^ _]+)' you should list all the characters that you observe acting as delimiters in the station column. This way a city is matched only when it is a whole token, not when it is part of a longer name.
Of course, you should first check whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
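To make the difference between substring matching and token matching concrete, here is a small sketch in plain Python rather than BigQuery. It uses the same delimiter pattern as the query above; the station "Shop_Berlinchen_Markt" and the helper names are hypothetical and exist only to show a city name embedded in a longer word.

```python
import re

cities = ["Berlin", "Hamburg"]
stations = [
    "inStore_Berlin_Alexanderplatz",
    "Berlin Schönefeld Airport",
    "Train Station Franchise Berlin",
    "Shop_Berlinchen_Markt",  # hypothetical: city name inside a longer word
]

def tokens(station):
    # Split on the same delimiters as r'([^ _]+)': space and underscore.
    return re.findall(r"[^ _]+", station.lower())

def match_by_token(station):
    # A city matches only if it equals a whole token.
    station_tokens = set(tokens(station))
    return [c for c in cities if c.lower() in station_tokens]

def match_by_substring(station):
    # Equivalent to LIKE '%city%': matches anywhere in the string.
    return [c for c in cities if c.lower() in station.lower()]

for s in stations:
    print(s, match_by_substring(s), match_by_token(s))
```

For the hypothetical station, the substring version reports Berlin while the token version reports no city, which is the behaviour the delimiter-based query is meant to give.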
I also wonder how the performance for a sub-query would be in this case. For instance:
WITH City AS(
SELECT 'Berlin' As Name UNION ALL
SELECT 'Hamburg'
),
StationNames AS(
SELECT 'inStore_Berlin_Alexanderplatz' AS Name UNION ALL
SELECT 'Berlin Schönefeld Airport' UNION ALL
SELECT 'Train Station Franchise Berlin' UNION ALL
SELECT 'Train Station Hamburg' UNION ALL
SELECT 'Train Station Pluton'
)
SELECT
Name StationName,
(SELECT Name FROM City c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM StationNames s
Or in your case:
SELECT
Name StationName,
(SELECT Name FROM `invertible-fin-XXX238.Reports.City` c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM `invertible-fin-XXX238.Reports.Station` s
I know the common wisdom for most databases is that a JOIN performs better than a sub-query, but BigQuery has many different optimization techniques for storing and querying data, so I was curious how different the performance would be in this case.

Need to arrange employee names as per their city column wise

I have written a query which extracts the data from different columns group by city name.
My query is as follows:
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid;
The output of the query is employee names in a single column grouped by their cities.
Now I would like to enhance the above query to classify the employees by their city names in different columns.
I tried using pivot to make this work. Here is my pivot query:
select * from (
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid
) pivot
(for city in (select city from employees))
I get some syntax issue saying missing expression and I am not sure how to use pivot to achieve the below expected output.
Expected Output:
DFW      CH     NY
-------  -----  --------
TripeH   John   Hitman
Batista  Cena   Yokozuna
Rock     James  Mysterio
Appreciate if anyone can guide me in the right direction.
Unfortunately what you are trying to do is not possible, at least not in "straight" SQL - you would need dynamic SQL, or a two-step process (in the first step generating a string that is a new SQL statement). Complicated.
The problem is that you are not including a fixed list of city names (as string literals). You are trying to create columns based on whatever you get from (select city from employees). Thus the number of columns and the name of the columns is not known until the Oracle engine reads the data from the table, but before the engine starts it must already know what all the columns will be. Contradiction.
Note also that if this was possible, you almost surely would want (select distinct city from employees).
ADDED: The OP asks a follow-up question in a comment (see below).
The ideal arrangement is for the cities to be in their own, smaller table, and the "city" in the employees table to have a foreign key constraint so that the "city" thing is manageable. You don't want one HR clerk to enter New York, another to enter New York City and a third to enter NYC for the same city. One way or the other, first try your code by replacing the subquery that follows the operator IN in the pivot clause with simply the comma-separated list of string literals for the cities: ... IN ('DFW', 'CH', 'NY'). Note that the order in which you put them in this list will be the order of the columns in the output. I didn't check the entire query to see if there are any other issues; try this and let us know what happens.
Good luck!
Maybe you need to transpose your result. See this link . I think DECODE or CASE works best for your case:
select
(CASE WHEN CITY='DFW' THEN first_name END) DFW,
(CASE WHEN CITY='CH' THEN first_name END) CH,
(CASE WHEN CITY='NY' THEN first_name END) NY
FROM employees
order by first_name
Normally I would "edit" my first answer, but the question has changed so much, it's quite different from the original one so my older answer can't be "edited" - this now needs a completely new answer.
You can do what you want with pivoting, as I show below. Wondering why you want to do this in basic SQL and not by using reporting tools, which are written specifically for reporting needs. There's no way you need to keep your data in the pivoted format in the database.
You will see 'York' twice in the Chicago column; you will recognize that's on purpose (you will see I had a duplicate row in the "test" table at the top of my code); this is to demonstrate a possible defect of your arrangement.
Before you ask if you could get the list but without the row numbers - first, if you are simply generating a set of rows, those are not ordered. If you want things ordered for reporting purposes, you can do what I did, and then select "'DFW'", "'CHI'", "'NY'" from the query I wrote. Relational theory and the SQL standard do not guarantee the row order will be preserved, but Oracle apparently does preserve it, at least in current versions; you can use that solution at your own risk.
max(name) in the pivot clause may look odd to the uninitiated; one of the weird limitations of the PIVOT operator in Oracle is that it requires an aggregate function to be used, even if it's over a set of exactly one element.
Here's the code:
with t (city, name) as -- setting up input data for testing
(
select 'DFW', 'Smith' from dual union all
select 'CHI', 'York' from dual union all
select 'DFW', 'Matsumoto' from dual union all
select 'NY', 'Abu Osman' from dual union all
select 'DFW', 'Adams' from dual union all
select 'CHI', 'Wilson' from dual union all
select 'CHI', 'Arenas' from dual union all
select 'NY', 'Theodore' from dual union all
select 'CHI', 'McGhee' from dual union all
select 'NY', 'Zhou' from dual union all
select 'NY' , 'Simpson' from dual union all
select 'CHI', 'Narayanan' from dual union all
select 'CHI', 'York' from dual union all
select 'NY', 'Perez' from dual
)
select * from
(
select row_number() over (partition by city order by name) rn,
city, name
from t
)
pivot (max(name) for city in ('DFW', 'CHI', 'NY') )
order by rn
/
And the output:
        RN 'DFW'     'CHI'     'NY'
---------- --------- --------- ---------
         1 Adams     Arenas    Abu Osman
         2 Matsumoto McGhee    Perez
         3 Smith     Narayanan Simpson
         4           Wilson    Theodore
         5           York      Zhou
         6           York

6 rows selected.
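For engines without a PIVOT operator, the same idea (number the names within each city, then collapse each row number into one output row) can be sketched with conditional aggregation. The example below is an assumption-laden illustration, not the Oracle code above: it runs the query in an in-memory SQLite database from Python, and it needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Same test rows as in the Oracle example, including the duplicate 'York'.
people = [
    ("DFW", "Smith"), ("CHI", "York"), ("DFW", "Matsumoto"), ("NY", "Abu Osman"),
    ("DFW", "Adams"), ("CHI", "Wilson"), ("CHI", "Arenas"), ("NY", "Theodore"),
    ("CHI", "McGhee"), ("NY", "Zhou"), ("NY", "Simpson"), ("CHI", "Narayanan"),
    ("CHI", "York"), ("NY", "Perez"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", people)

# MAX(CASE ...) plays the role of the aggregate required by PIVOT:
# within one rn there is at most one name per city.
query = """
SELECT rn,
       MAX(CASE WHEN city = 'DFW' THEN name END) AS dfw,
       MAX(CASE WHEN city = 'CHI' THEN name END) AS chi,
       MAX(CASE WHEN city = 'NY'  THEN name END) AS ny
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY city ORDER BY name) AS rn,
           city, name
    FROM t
)
GROUP BY rn
ORDER BY rn
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

As with PIVOT, the list of cities is still hard-coded in the query, so this does not remove the need for dynamic SQL when the set of cities is unknown in advance.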

SQL: multiple counts from same table

I am having a real problem trying to get a query with the data I need. I have tried a few methods without success. I can get the data with 4 separate queries, but I just can't get them into 1 query. All data comes from 1 table. I will list as much info as I can.
My data looks like this. I have a customerID and 3 columns that record who has worked on the record for that customer as well as the assigned acct manager
RecID  Customer  CreatedBy    LastUser     AcctMan
1      1374      Bob Jones    Mary Willis  Bob Jones
2      1375      Mary Willis  Bob Jones    Bob Jones
3      1376      Jay Scott    Mary Willis  Mary Willis
4      1377      Jay Scott    Mary Willis  Jay Scott
5      1378      Bob Jones    Jay Scott    Jay Scott
I want the query to return the following data. See below for a description of how each is obtained.
Employee     Created  Modified  Mod Own  Created Own
Bob Jones    2        1         1        1
Mary Willis  1        2         1        0
Jay Scott    2        1         1        1
Created = Counts the number of records created by each Employee
Modified = Number of records where the Employee is listed as Last User
(except where they created the record)
Mod Own = Number of records for each where the LastUser = Acctman
(account manager)
Created Own = Number of Records created by the employee where they are
the account manager for that customer
I can get each of these from a query, just need to somehow combine them:
Select CreatedBy, COUNT(CreatedBy) as Created
FROM [dbo].[Cust_REc] GROUP By CreatedBy
Select LastUser, COUNT(LastUser) as Modified
FROM [dbo].[Cust_REc] Where LastUser != CreatedBy GROUP By LastUser
Select AcctMan, COUNT(AcctMan) as CreatePort
FROM [dbo].[Cust_REc] Where AcctMan = CreatedBy GROUP By AcctMan
Select AcctMan, COUNT(AcctMan) as ModPort
FROM [dbo].[Cust_REc] Where AcctMan = LastUser AND NOT AcctMan = CreatedBy GROUP By AcctMan
Can someone see a way to do this? I may have to join the table to itself, but my attempts have not given me the correct data.
The following will give you the results you're looking for.
select
e.employee,
create_count=(select count(*) from customers c where c.createdby=e.employee),
mod_count=(select count(*) from customers c where c.lastmodifiedby=e.employee),
create_own_count=(select count(*) from customers c where c.createdby=e.employee and c.acctman=e.employee),
mod_own_count=(select count(*) from customers c where c.lastmodifiedby=e.employee and c.acctman=e.employee)
from (
select employee=createdby from customers
union
select employee=lastmodifiedby from customers
union
select employee=acctman from customers
) e
Note: there are other approaches that are more efficient than this but potentially far more complex as well. Specifically, I would bet there is a master Employee table somewhere that would prevent you from having to do the inline view just to get the list of names.
This seems pretty straightforward. Try this:
select a.employee,b.created,c.modified ....
from (select distinct created_by as employee from data) as a
inner join
(select created_by,count(*) as created from data group by created_by) as b
on a.employee = b.created_by
inner join ....
This highly inefficient query may be a rough start to what you are looking for. Once you validate the data then there are things you can do to tidy it up and make it more efficient.
Also, I don't think you need the DISTINCT on the UNION part because the UNION will return DISTINCT values unless UNION ALL is specified.
SELECT
Employees.EmployeeID,
Created =(SELECT COUNT(*) FROM Cust_Rec WHERE Cust_Rec.CreatedBy=Employees.EmployeeID),
Modified =(SELECT COUNT(*) FROM Cust_Rec WHERE Cust_Rec.LastUser=Employees.EmployeeID AND Cust_Rec.CreatedBy<>Employees.EmployeeID),
ModOwn =
CASE WHEN Employees.IsManager = 0 THEN NULL ELSE
(SELECT COUNT(*) FROM Cust_Rec WHERE AcctMan=Employees.EmployeeID)
END,
CreatedOwn=(SELECT COUNT(*) FROM Cust_Rec WHERE AcctMan=Employees.EmployeeID AND CreatedBy=Employees.EmployeeID)
FROM
(
SELECT
EmployeeID,
IsManager=CASE WHEN EXISTS(SELECT AcctMan FROM Cust_Rec WHERE AcctMan=EmployeeID) THEN 1 ELSE 0 END
FROM
(
SELECT DISTINCT
EmployeeID
FROM
(
SELECT EmployeeID=CreatedBy FROM Cust_Rec
UNION
SELECT EmployeeID=LastUser FROM Cust_Rec
UNION
SELECT EmployeeID=AcctMan FROM Cust_Rec
)AS Z
)AS Y
)
AS Employees
I had the same issue with the Modified column. All the other columns worked okay. DCR example would work well with the join on an employees table if you have it.
SELECT CreatedBy AS [Employee],
COUNT(CreatedBy) AS [Created],
--Couldn't get modified to pull the right results
SUM(CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END) [Mod Own],
SUM(CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END) [Created Own]
FROM Cust_Rec
GROUP BY CreatedBy
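The Modified column is awkward in a single GROUP BY CreatedBy because it has to be counted per LastUser, not per creator. One way around this, sketched here under the assumption that no separate employee table exists, is to build the employee list with a UNION and count all four conditions against every record. The sketch runs in an in-memory SQLite database from Python purely so that it is self-contained; the column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust_Rec (RecID INTEGER, Customer INTEGER,
                       CreatedBy TEXT, LastUser TEXT, AcctMan TEXT);
INSERT INTO Cust_Rec VALUES
  (1, 1374, 'Bob Jones',   'Mary Willis', 'Bob Jones'),
  (2, 1375, 'Mary Willis', 'Bob Jones',   'Bob Jones'),
  (3, 1376, 'Jay Scott',   'Mary Willis', 'Mary Willis'),
  (4, 1377, 'Jay Scott',   'Mary Willis', 'Jay Scott'),
  (5, 1378, 'Bob Jones',   'Jay Scott',   'Jay Scott');
""")

# Pair every employee with every record, then count the records that
# satisfy each of the four definitions from the question.
query = """
WITH Employees AS (
    SELECT CreatedBy AS Employee FROM Cust_Rec
    UNION SELECT LastUser FROM Cust_Rec
    UNION SELECT AcctMan FROM Cust_Rec
)
SELECT e.Employee,
       SUM(CASE WHEN r.CreatedBy = e.Employee THEN 1 ELSE 0 END) AS Created,
       SUM(CASE WHEN r.LastUser = e.Employee
                 AND r.CreatedBy <> e.Employee THEN 1 ELSE 0 END) AS Modified,
       SUM(CASE WHEN r.LastUser = e.Employee
                 AND r.AcctMan = e.Employee THEN 1 ELSE 0 END) AS ModOwn,
       SUM(CASE WHEN r.CreatedBy = e.Employee
                 AND r.AcctMan = e.Employee THEN 1 ELSE 0 END) AS CreatedOwn
FROM Employees AS e
CROSS JOIN Cust_Rec AS r
GROUP BY e.Employee
ORDER BY e.Employee
"""
counts = {row[0]: row[1:] for row in conn.execute(query)}
for employee, values in counts.items():
    print(employee, values)
```

With the five sample rows and the definitions as written in the question, Mary Willis gets a Modified count of 3 (records 1, 3, and 4) rather than the 2 shown in the expected output; the other values match. The cross join is only reasonable for small tables; with a real employee table, a join per role would scale better.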

How to get records from both tables using ms access query

I have 2 Tables in Ms Access
tbl_Master_Employee
tbl_Emp_Salary
I want to show all the employees in the employee table together with their linked rows in the employee salary table.
The two tables are linked by the ID column coluqEID, which exists in both tables.
In the second table, I have a date column. I need a query that fetches records from both tables for a particular date.
I tried the following query:
select coluqEID as EmployeeID , colEName as EmployeeName,"" as Type, "" as Amt
from tbl_Master_Employee
union Select b.coluqEID as EmployeeID, b.colEName as EmployeeName, colType as Type, colAmount as Amt
from tbl_Emp_Salary a, tbl_Master_Employee b
where a.coluqEID = b.coluqEID and a.colDate = #12/09/2013#
However, it shows duplicates.
Query4
EmployeeID  EmployeeName  Type     Amt
1           LAKSHMANAN
1           LAKSHMANAN    Advance  100
2           PONRAJ
2           PONRAJ        Advance  200
3           VIJAYAN
4           THIRUPATHI
5           VIJAYAKUMAR
6           GOVINDAN
7           TAMILMANI
8           SELVAM
9           ANAMALAI
10          KUMARAN
How would I rewrite my query to avoid duplicates, or what would be a different way to not show duplicates?
The problem with your query is that you are using union when what you want is a join. The union is first going to list all employees with the first part:
select coluqEID as EmployeeID , colEName as EmployeeName,"" as Type, "" as Amt
from tbl_Master_Employee
and then adds to that list all employee records where they have a salary with a certain date.
Select b.coluqEID as EmployeeID, b.colEName as EmployeeName, colType as Type,
colAmount as Amt
from tbl_Emp_Salary a, tbl_Master_Employee b
where a.coluqEID = b.coluqEID and a.colDate = #12/09/2013#
Is your goal to get a list of all employees and only display salary information for those who have a salary entry on a certain date? Some sample data would be useful. Assuming the data in the linked SQL Fiddle, this query should produce what you want:
Select a.coluqEID as EmployeeID, a.colEName as EmployeeName,
b.colType as Type, b.colAmount as Amt
FROM tbl_Master_Employee as a
LEFT JOIN (select coluqEID, colType, colAmount FROM tbl_Emp_Salary
where colDate = #12/09/2013#) as b ON a.coluqEID = b.coluqEID;
The first step is to create a select that will get you just the salaries that you want by date. You can then perform a join on this as if you were performing a separate query. You use a LEFT JOIN because you want all of the records from one side, the employees, and only the records that match your criteria from the second side, your salaries.
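The two steps described above (filter the salaries by date first, then LEFT JOIN the employees to that filtered set) can be tried out with a small sketch. It uses an in-memory SQLite database from Python instead of MS Access, which is an assumption made only so the example is runnable anywhere. Only three of the employees are loaded, the dates are arbitrary ISO strings, and the salary row for employee 3 on another date is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Master_Employee (coluqEID INTEGER, colEName TEXT);
CREATE TABLE tbl_Emp_Salary (coluqEID INTEGER, colType TEXT,
                             colAmount INTEGER, colDate TEXT);
INSERT INTO tbl_Master_Employee VALUES
  (1, 'LAKSHMANAN'), (2, 'PONRAJ'), (3, 'VIJAYAN');
INSERT INTO tbl_Emp_Salary VALUES
  (1, 'Advance', 100, '2013-12-09'),
  (2, 'Advance', 200, '2013-12-09'),
  (3, 'Advance', 300, '2013-11-01');  -- hypothetical row on another date
""")

# The derived table keeps only salaries for the requested date; the LEFT JOIN
# then keeps every employee and fills the salary columns with NULL otherwise.
query = """
SELECT e.coluqEID AS EmployeeID, e.colEName AS EmployeeName,
       s.colType AS Type, s.colAmount AS Amt
FROM tbl_Master_Employee AS e
LEFT JOIN (SELECT coluqEID, colType, colAmount
           FROM tbl_Emp_Salary
           WHERE colDate = ?) AS s
       ON e.coluqEID = s.coluqEID
ORDER BY e.coluqEID
"""
report = conn.execute(query, ("2013-12-09",)).fetchall()
for row in report:
    print(row)
```

Each employee appears exactly once: with the salary details when a row exists for the date, and with empty (NULL) salary columns otherwise. An employee with several salary rows on the same date would still appear once per row.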
I believe you will need a join; however, as for your question about duplicates:
select DISTINCT coluqEID as EmployeeID
Adding the DISTINCT keyword returns only unique rows.

SQL help: select the last 3 comments for EACH student?

I have two tables to store student data for a grade-school classroom:
Behavior_Log has the columns student_id, comments, date
Student_Roster has the columns student_id, firstname, lastname
The database is used to store daily comments about student behavior, and sometimes the teacher makes multiple comments about a student in a given day.
Now let's say the teacher wants to be able to pull up a list of the last 3 comments made for EACH student, like this:
Jessica 7/1/09 talking
Jessica 7/1/09 passing notes
Jessica 5/3/09 absent
Ciboney 7/2/09 great participation
Ciboney 4/30/09 absent
Ciboney 2/22/09 great participation
...and so on for the whole class
The single SQL query must return a set of comments for each student to eliminate the human-time-intensive need for the teacher to run separate queries for each student in the class.
I know that this sounds similar to
SQL Statement Help - Select latest Order for each Customer but I need to display the last 3 entries for each person, I can't figure out how to get from here to there.
Thanks for your suggestions!
A slightly modified solution from this article in my blog:
Analytic functions: SUM, AVG, ROW_NUMBER
SELECT sc.student_id, sc.date, sc.comments
FROM (
SELECT student_id, date, comments, (@r := @r + 1) AS rn
FROM (
SELECT @_student_id := -1
) vars,
(
SELECT *
FROM
behavior_log a
ORDER BY
student_id, date DESC
) ao
WHERE CASE WHEN @_student_id <> student_id THEN @r := 0 ELSE 0 END IS NOT NULL
AND (@_student_id := student_id) IS NOT NULL
) sc
JOIN Student_Roster sr
ON sr.student_id = sc.student_id
WHERE rn <= 3
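On engines that support window functions (MySQL 8.0 and later, for example), the same per-student numbering can be expressed with ROW_NUMBER() instead of session variables. The following is a minimal sketch of that alternative, not the blog article's method. It runs against an in-memory SQLite database from Python (SQLite 3.25 or later) so it is self-contained; the last names and the fourth, oldest comment per student are made up to show that extra rows are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student_Roster (student_id INTEGER, firstname TEXT, lastname TEXT);
CREATE TABLE Behavior_Log (student_id INTEGER, comments TEXT, date TEXT);
INSERT INTO Student_Roster VALUES (1, 'Jessica', 'A'), (2, 'Ciboney', 'B');
INSERT INTO Behavior_Log VALUES
  (1, 'talking',             '2009-07-01'),
  (1, 'passing notes',       '2009-07-01'),
  (1, 'absent',              '2009-05-03'),
  (1, 'late',                '2009-01-15'),  -- hypothetical 4th comment
  (2, 'great participation', '2009-07-02'),
  (2, 'absent',              '2009-04-30'),
  (2, 'great participation', '2009-02-22'),
  (2, 'late',                '2009-01-10');  -- hypothetical 4th comment
""")

# Number each student's comments from newest to oldest, keep the first three.
query = """
SELECT sr.firstname, ranked.date, ranked.comments
FROM (
    SELECT student_id, date, comments,
           ROW_NUMBER() OVER (PARTITION BY student_id
                              ORDER BY date DESC) AS rn
    FROM Behavior_Log
) AS ranked
JOIN Student_Roster AS sr ON sr.student_id = ranked.student_id
WHERE ranked.rn <= 3
ORDER BY sr.student_id, ranked.date DESC
"""
last_three = conn.execute(query).fetchall()
for row in last_three:
    print(row)
```

Comments made on the same day are ordered arbitrarily relative to each other; a timestamp or an auto-increment column would be needed for a stable order within a day.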
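A different approach would be to use the group_concat function in a single correlated subselect. Note that a LIMIT on that subselect would restrict the number of groups, not the number of comments, so the concatenated list is trimmed to its first three lines with substring_index instead.
select s.firstname, (
select substring_index(
group_concat( concat( date, ', ', comments ) order by date desc separator '\n' ),
'\n', 3 )
from Behavior_Log
where student_id = s.student_id ) as last_comments
from Student_Roster s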