Need to arrange employee names as per their city column wise - sql

I have written a query which extracts the data from different columns group by city name.
My query is as follows:
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid;
The output of the query is employee names in a single column grouped by their cities.
Now I would like to enhance the above query to classify the employees by their city names in different columns.
I tried using pivot to make this work. Here is my pivot query:
select * from (
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid
) pivot
(for city in (select city from employees))
I get some syntax issue saying missing expression and I am not sure how to use pivot to achieve the below expected output.
Expected Output:
DFW      CH     NY
-------  -----  --------
TripeH   John   Hitman
Batista  Cena   Yokozuna
Rock     James  Mysterio
Appreciate if anyone can guide me in the right direction.

Unfortunately what you are trying to do is not possible, at least not in "straight" SQL - you would need dynamic SQL, or a two-step process (in the first step generating a string that is a new SQL statement). Complicated.
The problem is that you are not including a fixed list of city names (as string literals). You are trying to create columns based on whatever you get from (select city from employees). Thus the number of columns and the names of the columns are not known until the Oracle engine reads the data from the table, but before the engine starts it must already know what all the columns will be. Contradiction.
Note also that if this were possible, you would almost surely want (select distinct city from employees).
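As a rough illustration of the two-step process mentioned above, the first step can build the text of the pivot statement from a list of cities, and the second step runs that text. This is only a sketch: the client language (Python), the helper name build_pivot_sql, and the inner query with row_number() are all assumptions, not something taken from the question.

```python
def build_pivot_sql(cities):
    """Return the text of an Oracle PIVOT statement with one column per city.

    Hypothetical helper: the statement shape is an assumption for illustration.
    """
    # Quote each city as a SQL string literal, doubling embedded single quotes.
    literals = ", ".join("'" + c.replace("'", "''") + "'" for c in cities)
    return (
        "select * from (\n"
        "  select row_number() over (partition by city order by first_name) rn,\n"
        "         city, first_name\n"
        "  from employees\n"
        ")\n"
        f"pivot (max(first_name) for city in ({literals}))\n"
        "order by rn"
    )


# Step 1 would normally fetch this list with: select distinct city from employees
cities = ["DFW", "CH", "NY"]  # hard-coded stand-in for that query's result

# Step 2 would send this generated text to the database as a new statement.
sql = build_pivot_sql(cities)
print(sql)
```

The point of the sketch is only that the column list is fixed text by the time the database parses the statement.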
ADDED: The OP asks a follow-up question in a comment (see below).
The ideal arrangement is for the cities to be in their own, smaller table, and the "city" in the employees table to have a foreign key constraint so that the "city" thing is manageable. You don't want one HR clerk to enter New York, another to enter New York City and a third to enter NYC for the same city. One way or the other, first try your code by replacing the subquery that follows the operator IN in the pivot clause with simply the comma-separated list of string literals for the cities: ... IN ('DFW', 'CH', 'NY'). Note that the order in which you put them in this list will be the order of the columns in the output. I didn't check the entire query to see if there are any other issues; try this and let us know what happens.
Good luck!

Maybe you need to transpose your result. See this link. I think DECODE or CASE works best for your case:
select
(CASE WHEN city = 'DFW' THEN first_name END) DFW,
(CASE WHEN city = 'CH' THEN first_name END) CH,
(CASE WHEN city = 'NY' THEN first_name END) NY
FROM employees
order by first_name

Normally I would "edit" my first answer, but the question has changed so much, it's quite different from the original one so my older answer can't be "edited" - this now needs a completely new answer.
You can do what you want with pivoting, as I show below. I do wonder why you want to do this in basic SQL rather than with a reporting tool; such tools are written specifically for reporting needs. There is certainly no need to keep your data in the pivoted format in the database.
You will see 'York' twice in the CHI column; that is on purpose (there is a duplicate row in the "test" table at the top of my code), to demonstrate a possible defect of your arrangement.
Before you ask whether you could get the list without the row numbers: first, if you are simply generating a set of rows, those are not ordered. If you want things ordered for reporting purposes, you can do what I did, and then select "'DFW'", "'CHI'", "'NY'" from the query I wrote. Relational theory and the SQL standard do not guarantee the row order will be preserved, but Oracle apparently does preserve it, at least in current versions; you can use that solution at your own risk.
max(name) in the pivot clause may look odd to the uninitiated; one of the weird limitations of the PIVOT operator in Oracle is that it requires an aggregate function to be used, even if it's over a set of exactly one element.
Here's the code:
with t (city, name) as -- setting up input data for testing
(
select 'DFW', 'Smith' from dual union all
select 'CHI', 'York' from dual union all
select 'DFW', 'Matsumoto' from dual union all
select 'NY', 'Abu Osman' from dual union all
select 'DFW', 'Adams' from dual union all
select 'CHI', 'Wilson' from dual union all
select 'CHI', 'Arenas' from dual union all
select 'NY', 'Theodore' from dual union all
select 'CHI', 'McGhee' from dual union all
select 'NY', 'Zhou' from dual union all
select 'NY' , 'Simpson' from dual union all
select 'CHI', 'Narayanan' from dual union all
select 'CHI', 'York' from dual union all
select 'NY', 'Perez' from dual
)
select * from
(
select row_number() over (partition by city order by name) rn,
city, name
from t
)
pivot (max(name) for city in ('DFW', 'CHI', 'NY') )
order by rn
/
And the output:
        RN 'DFW'     'CHI'     'NY'
---------- --------- --------- ---------
         1 Adams     Arenas    Abu Osman
         2 Matsumoto McGhee    Perez
         3 Smith     Narayanan Simpson
         4           Wilson    Theodore
         5           York      Zhou
         6           York
6 rows selected.
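The role of max(name) may be easier to see when the pivot is written out as conditional aggregation: each output cell is the MAX over at most one non-null value per (rn, city). The following is a minimal sketch of that reading, run against SQLite through Python rather than Oracle (an assumption made only so the example is self-contained; it needs an SQLite build with window functions, 3.25 or later), with a shortened version of the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (city text, name text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("DFW", "Smith"), ("CHI", "York"), ("DFW", "Adams"),
     ("CHI", "Wilson"), ("CHI", "Arenas"), ("NY", "Zhou")],
)

# One CASE expression per city; MAX collapses the rows that share the same rn.
rows = conn.execute("""
    select rn,
           max(case when city = 'DFW' then name end) as dfw,
           max(case when city = 'CHI' then name end) as chi,
           max(case when city = 'NY'  then name end) as ny
    from (
        select row_number() over (partition by city order by name) as rn,
               city, name
        from t
    )
    group by rn
    order by rn
""").fetchall()

for row in rows:
    print(row)
```

Cities with fewer names simply produce NULL (None in Python) in the later rows, which matches the blank cells in the output above.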

SQL query that will return results separated by commas

I have two tables on MS SQL Server. The first holds the cities; the second holds student names and the city IDs (from the first table) of the cities where each student lives.
I need a sql query that will return results like this:
Stella Paris
Bob Moscow,New York
Mary Paris,New York
WITH CITIES(ID,CITY_NAME)AS
(
SELECT 1,'MOSCOW' UNION ALL
SELECT 2,'PARIS' UNION ALL
SELECT 3,'NEW YORK'
),
STUDENTS(ID,STUDENT_NAME,LIVE_IN) AS
(
SELECT 1,'STELLA','2' UNION ALL
SELECT 2,'BOB','1,3' UNION ALL
SELECT 3,'MARY','2,3'
)
SELECT X.ID,X.STUDENT_NAME,STRING_AGG(X.CITY_NAME,',')CITY_NAME FROM
(
SELECT S.ID,S.STUDENT_NAME,VALUE AS CITY_ID,C.CITY_NAME
FROM STUDENTS AS S
CROSS APPLY string_split(S.LIVE_IN,',')
JOIN CITIES AS C ON VALUE=C.ID
)X
GROUP BY X.ID,X.STUDENT_NAME;
For SQL Server 2017 and later, as pointed out by @Squirrel, you can use STRING_SPLIT and STRING_AGG.
Steps you can use:
Use two different queries:
first, return the student records;
then loop through them to get the city records;
update the array data as required.
A better suggestion would be to redesign the database so that a table JOIN can be used.
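One possible shape for the redesign suggested above is a junction table with one row per (student, city) pair, so the comma-separated LIVE_IN column is no longer needed. The sketch below uses SQLite through Python only to stay self-contained; the table name student_cities is hypothetical, and group_concat stands in for SQL Server's STRING_AGG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cities   (id integer primary key, city_name text);
    create table students (id integer primary key, student_name text);
    -- hypothetical junction table: one row per (student, city) pair
    create table student_cities (
        student_id integer references students(id),
        city_id    integer references cities(id)
    );
    insert into cities   values (1, 'Moscow'), (2, 'Paris'), (3, 'New York');
    insert into students values (1, 'Stella'), (2, 'Bob'), (3, 'Mary');
    insert into student_cities values (1, 2), (2, 1), (2, 3), (3, 2), (3, 3);
""")

# Plain joins replace the string splitting; aggregation builds the list.
rows = conn.execute("""
    select s.student_name, group_concat(c.city_name, ',') as city_names
    from students s
    join student_cities sc on sc.student_id = s.id
    join cities c          on c.id = sc.city_id
    group by s.id, s.student_name
    order by s.id
""").fetchall()

# group_concat does not promise an order, so sort the names on the client.
result = {name: sorted(names.split(",")) for name, names in rows}
print(result)
```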

Find string from table in cell in BigQuery --> Query exceeded resource limits

I have two tables in BigQuery:
City List: Table: invertible-fin-XXX238.Reports.City
StationNames: invertible-fin-XXX238.Reports.Station
Most of the StationNames contain city names. Now I want to extract the city from the Station table.
Here is some example data:
City: Berlin
Stationname: inStore_Berlin_Alexanderplatz
Stationname: Berlin Schönefeld Airport
Stationname: Train Station Franchise Berlin
I tried the INSTR Function, but had no success (the INSTR works only with Legacy SQL and there I couldn’t use SUBSELECTS).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
Therefore I tried it with WHERE LIKE. Below the SQL Code:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive, and I got the error “Query exceeded resource limits for tier 1. Tier 18 or higher required.”
Could someone please help me write a more resource-friendly query?
Thanks in advance,
Philipp
Below are a few of the many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() function if the city names are spelled in the same case in both tables.
While the above versions look more straightforward, I would prefer the one below, as it lets you control how the city is extracted from station: in r'([^ _]+)' you should list all the characters that you observe being used as delimiters in the station column. This way a city is matched only when it is a whole token, not part of a longer name.
Of course, you should first check whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
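To see what the token approach buys you, here is a small sketch of the same logic outside BigQuery, in Python (chosen only so it can be run anywhere). The function name match_cities and the station "Train Station Berlinchen" are made up for illustration; the other station names come from the question.

```python
import re


def match_cities(stations, cities):
    """Return (city, station) pairs where the city is a whole token of the station."""
    wanted = {c.lower(): c for c in cities}
    pairs = []
    for station in stations:
        # Same idea as REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)'):
        # split on the delimiters (space and underscore) seen in the data.
        for token in re.findall(r"[^ _]+", station.lower()):
            if token in wanted:
                pairs.append((wanted[token], station))
    return pairs


stations = [
    "inStore_Berlin_Alexanderplatz",
    "Berlin Schönefeld Airport",
    "Train Station Franchise Berlin",
    "Train Station Berlinchen",  # hypothetical: contains "berlin" only as a substring
]
matches = match_cities(stations, ["Berlin"])
print(matches)
```

A LIKE '%berlin%' comparison would also match the last station; the token comparison does not.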
I also wonder how the performance for a sub-query would be in this case. For instance:
WITH City AS(
SELECT 'Berlin' As Name UNION ALL
SELECT 'Hamburg'
),
StationNames AS(
SELECT 'inStore_Berlin_Alexanderplatz' AS Name UNION ALL
SELECT 'Berlin Schönefeld Airport' UNION ALL
SELECT 'Train Station Franchise Berlin' UNION ALL
SELECT 'Train Station Hamburg' UNION ALL
SELECT 'Train Station Pluton'
)
SELECT
Name StationName,
(SELECT Name FROM City c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM StationNames s
Or in your case:
SELECT
Name StationName,
(SELECT Name FROM `invertible-fin-XXX238.Reports.City` c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM `invertible-fin-XXX238.Reports.Station` s
I know it's common wisdom for most databases that a JOIN performs better than sub-queries, but BigQuery has lots of different optimization techniques for storing and querying data, so I was curious to know how different the performance would be in this case.

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID  therapeutic_class  generic_name
1   YG4                insulin
1   CJ6                maleate
1   MG9                glargine
2   C4C                diaoxy
2   KR3                supplies
3   YG4                insulin
3   CJ6                maleate
3   MG9                glargine
I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:
Count  Combination_generic         combination_therapeutic
2      insulin, maleate, glargine  YG4, CJ6, MG9
1      supplies, diaoxy            C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique IDs existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. That solution is more complicated; I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST            GN_LIST
--- ------------------ ------------------------------
  1 GY6,C4C            supplies,diaoxy
  2 GY6,KJ*,MH4        insulin,glargine,maleate
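The note above about matching the sets of pairs themselves, rather than strings aggregated from them, can be illustrated outside the database. The sketch below is not the more complicated SQL solution the answer alludes to; it is a client-side Python illustration of the idea, using the same simulated data, in which each patient is reduced to a set of (therapeutic_class, generic_name) pairs and identical sets are counted.

```python
from collections import Counter, defaultdict

# Same simulated rows as test_data above: (id, therapeutic_class, generic_name)
rows = [
    (1, "GY6", "insulin"), (1, "MH4", "maleate"), (1, "KJ*", "glargine"),
    (2, "GY6", "supplies"), (2, "C4C", "diaoxy"),
    (3, "GY6", "insulin"), (3, "MH4", "maleate"), (3, "KJ*", "glargine"),
]

# Collect the set of pairs for every patient id.
pairs_by_id = defaultdict(set)
for patient_id, therapeutic_class, generic_name in rows:
    pairs_by_id[patient_id].add((therapeutic_class, generic_name))

# frozenset is hashable, so equal sets of pairs land in the same bucket
# regardless of row order and without any length limit on a joined string.
counts = Counter(frozenset(pairs) for pairs in pairs_by_id.values())

for combo, cnt in counts.items():
    print(cnt, sorted(combo))
```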
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;

Oracle Query - Use of Analytical functions

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:
Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))
The data in the flat file looks like this:
'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'
You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number.
The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients
who are like Tom Smith – patients with more than one medical record number.
I came up with the query below. It works perfectly fine, but I wanted to know if there is a better way to write it using Oracle analytic functions. Thank you in advance.
SELECT a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
FROM data a, data b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.date_of_birth = b.date_of_birth
AND a.medical_record_number <> b.medical_record_number
GROUP BY a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.
with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
res as (select firstname,
lastname,
date_of_birth,
medical_record_number,
count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
from data)
select distinct firstname,
lastname,
date_of_birth,
medical_record_number
from res
where cnt_med_rec_nums > 1;
*btw, the group by in your example query is not necessary; it would make much more sense to switch it out for a distinct - it makes your intent much clearer, since you're wanting to get a distinct set of records.
You can probably simplify the query a bit using a HAVING clause rather than doing a self-join
SELECT a.firstname,
a.lastname,
a.date_of_birth,
MIN(a.medical_record_number) lowest_medical_record_number,
MAX(a.medical_record_number) highest_medical_record_number
FROM data a
GROUP BY a.firstname,
a.lastname,
a.date_of_birth
HAVING COUNT( DISTINCT a.medical_record_number ) > 1
I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).
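For the comma-separated variant mentioned above, here is a minimal sketch run against SQLite through Python (an assumption made to keep it self-contained; in Oracle, LISTAGG would play the role of group_concat). The table keeps only the four columns the query needs, loaded with the rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table data (
        firstname text, lastname text,
        date_of_birth text, medical_record_number text
    )
""")
conn.executemany("insert into data values (?, ?, ?, ?)", [
    ("jane", "jones", "2/2/2001", "MRN-11111"),
    ("jane", "jones", "2/2/2001", "MRN-11111"),
    ("jane", "jones", "5/5/1975", "MRN-88888"),
    ("tom", "smith", "4/12/2002", "MRN-22222"),
    ("tom", "smith", "4/12/2002", "MRN-33333"),
    ("tom", "smith", "4/12/2002", "MRN-33333"),
    ("jack", "thomas", "8/10/1991", "MRN-44444"),
])

# One row per patient; HAVING keeps only patients with more than one number.
rows = conn.execute("""
    select firstname, lastname, date_of_birth,
           group_concat(distinct medical_record_number) as record_numbers
    from data
    group by firstname, lastname, date_of_birth
    having count(distinct medical_record_number) > 1
""").fetchall()

for row in rows:
    print(row)
```

Note that the two jane jones rows with different dates of birth count as different patients, so only tom smith is returned.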

SQL Server CTE use IDs from single column with EXCEPT?

Having received kindness the other day from someone whose eyes were less bleary than mine I thought I'd give it another shot. Thanks in advance for your assistance.
I have a single SQL Server (2012) table named Contacts. That table has four columns I am currently concerned with. The table has a total of 71,454 rows. There are two types of records in the table; Companies and Employees. Both use the same column, named (Client ID), for their primary key. The existence of a Company Name is what differentiates between Company and Employee data. Employees have no associated Company Name. There are 29,021 Companies leaving 42,433 Employees.
There may be 0-n number of Employees associated with any one Company. I am attempting to create output that will reflect the relationship between Companies and Clients, if there are any. I would like to use the Company ID (Client ID column) as my anchor data set.
Not sure my definition is correct but the thought was to create a CTE of the known Companies by virtue of a given Company Name. Then, use the remaining Client IDs but use the EXCEPT clause to filter the already-retrieved Client IDs out of the result set.
Here is the code I currently have:
;
WITH cte ( BaseID, Client_id, Company_name,
First_name, Last_name, [level] )
AS ( SELECT Client_id AS BaseID ,
Client_id ,
Company_name ,
First_name ,
Last_name ,
1
FROM dbo.Conv_client_clean
WHERE ( COMPANY_NAME IS NOT NULL
OR COMPANY_NAME != ''
)
UNION ALL
SELECT c.BaseID ,
children.Client_id ,
children.Company_name ,
children.First_name ,
children.Last_name ,
cte.[level] + 1
FROM dbo.Conv_client_clean children
INNER JOIN cte c ON c.Client_id = children.CLIENT_ID
EXCEPT
SELECT children.Client_id
FROM cte
)
SELECT BaseID ,
Client_id ,
Company_name ,
first_name ,
Last_name ,
[Level]
FROM cte
OPTION ( MAXRECURSION 0 );
In this instance I receive the following error;
Msg 252, Level 16, State 1, Line 3
Recursive common table expression 'cte' does not contain a top-level UNION ALL operator.
Any suggestions?
Thanks!
In a recursive CTE, you cannot have further set operations (UNION, EXCEPT, UNION ALL, INTERSECT) after the UNION ALL that refers to the CTE itself. What you can try is to change the query as below and check:
...
UNION ALL
SELECT c.BaseID ,
children.Client_id ,
children.Company_name ,
children.First_name ,
children.Last_name ,
cte.[level] + 1
FROM dbo.Conv_client_clean children
WHERE children.Client_id NOT IN (SELECT Client_id FROM cte)
As mentioned to Kiran, I was able to concoct an 'old fashioned' approach that is good enough for now.
Thank you everyone for your kind attention.
I'm not sure what you are trying to do with level. It seems that it will be 1 for companies and 2 for employees. If that's the case, you don't even need recursion. The first part of your cte creates a list of companies. That's fine. Now use that to join back to the original table to show all the employees too.
WITH
cte( BaseID, ClientID, Company_name, First_name, Last_name )AS(
SELECT Base_ID,
Base_ID AS Client_id ,
Company_name,
First_name,
Last_name
FROM dbo.Conv_client_clean
WHERE COMPANY_NAME IS NOT NULL
OR COMPANY_NAME <> ''
)
select c2.Base_id, c2.Client_id,
c1.Company_Name, c2.First_Name, c2.Last_Name,
case when c2.client_id is null then 1 else 2 end Level
from cte c1
join Conv_client_clean c2
on c1.BaseID = isnull( c2.Client_ID, c2.Base_id )
order by c1.BaseID, c2.Base_id;
Here's where I fiddled with it.
Unfortunately anything besides UNION ALL, after you've made your recursive reference, will not work. And if you think about it, it makes sense.
Recursion is conceptually identical to the following where recursion continues until max depth is reached or a query returns no results upon which another execution could act.
WITH Anchor AS (select...)
,recurse1 as (<Some body referring to Anchor>)
,recurse2 as (<Identical body except referring to recurse1>)
,recurse3 as (<Identical body except referring to recurse2>)
...
select * from Anchor
union all
select * from recurse1
union all
select * from recurse2
...
The problem is that set operators apply to EVERYTHING that precedes them. In your case, EXCEPT operates on everything to its left side, which includes the anchor query. Afterwards, when looking for the anchor to which the recursive part must be applied, the query compiler doesn't find a 'top-level UNION ALL operator' any more, because it has been consumed as part of the left side of your recursive query.
It wouldn't help to contrive some syntax akin to parentheses that could delimit the scope of the left side of your set operation, because you would then build a case of 'multiple recursive references', which is also illegal.
BOTTOM LINE: the only set operator that works in the recursive part of your query is UNION ALL, because it simply concatenates the right side. It doesn't require knowledge of the left side to determine which rows to include.
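The conceptual unrolling described above can be mimicked in a few lines of Python. This is only a mental model, not how SQL Server is implemented: the function name, the parent-to-children mapping, and the data are invented for the illustration. Each pass of the recursive step sees only the rows produced by the previous pass, and its output is appended to the result, which is all that UNION ALL needs.

```python
def evaluate_recursive_cte(anchor_rows, recursive_step):
    """Conceptual model: anchor UNION ALL step(anchor) UNION ALL step(step(anchor)) ..."""
    result = list(anchor_rows)
    previous = list(anchor_rows)
    while previous:  # stop when a pass returns no rows
        current = recursive_step(previous)
        result.extend(current)  # UNION ALL: plain concatenation, no look-back
        previous = current
    return result


# Hypothetical hierarchy: node -> list of child nodes.
children = {1: [2, 3], 2: [4], 3: [], 4: []}


def step(previous_rows):
    # The recursive member only ever receives the previous level's rows.
    return [(child, level + 1)
            for node, level in previous_rows
            for child in children[node]]


rows = evaluate_recursive_cte([(1, 1)], step)
print(rows)
```

An operator such as EXCEPT would need the whole accumulated result on its left side, which this kind of level-by-level evaluation never hands to the recursive step.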