SQL query to clean up/omit missing values depending on another table

I'm looking for a query that is able to omit certain values (which are missing in another table). Trying to explain it using an example:
Table 1 - Person
ID  Name
1   Jane
2   Joe
3   Jose
Table 2 - Schedule
Date  Employees
1/1   Jane,Joe,Jose
2/1   Alice,Jane
3/1   Joe,Bob,Jose
4/1   Alice,Bob
Expected result - missing values omitted
Date  Employees
1/1   Jane,Joe,Jose
2/1   Jane
3/1   Joe,Jose
4/1
Is that even possible to achieve with SQL, and if so, how?
Disclaimer: I do not have any impact on the design of the tables. I know that the structure is far from ideal, but there is no way to change it.

You want a normalized schedule table. You can create that on-the-fly with a recursive query or a combination of a lateral cross join and unnesting an array that you create from the substrings. Put this in a CTE (WITH clause) and then do your aggregation.
With an array and UNNEST
with good_schedule as
(
select s.date, e.employee_name
from schedule s
cross join lateral unnest(string_to_array(employees, ',')) as e(employee_name)
)
select s.date, string_agg(p.name, ',' order by p.name) as employees
from good_schedule s
left outer join person p on p.name = s.employee_name
group by s.date
order by s.date;
With a recursive CTE
with recursive good_schedule(date, employees, employee_name, pos) as
(
select date, employees, split_part(employees, ',', 1), 1
from schedule s
union all
select date, employees, split_part(employees, ',', pos+1) as employee_name, pos+1
from good_schedule
where split_part(employees, ',', pos+1) <> ''
)
select s.date, string_agg(p.name, ',' order by p.name) as employees
from good_schedule s
left outer join person p on p.name = s.employee_name
group by s.date
order by s.date;
Demo: https://dbfiddle.uk/A42E5oYh
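SQLite has neither `split_part` nor `unnest`. The following is therefore a minimal sketch of the same split, filter and re-aggregate idea using Python's built-in `sqlite3` module. A recursive CTE peels off one name per iteration with `instr` and `substr`. The table and column names follow the question; the in-memory database and the helpers `build_demo` and `clean_schedule` are hypothetical. `group_concat` does not guarantee the order of the names, so compare them as sets.

```python
import sqlite3

def build_demo():
    """Create the question's two tables in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE schedule (date TEXT, employees TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Jane"), (2, "Joe"), (3, "Jose")])
    conn.executemany("INSERT INTO schedule VALUES (?, ?)",
                     [("1/1", "Jane,Joe,Jose"), ("2/1", "Alice,Jane"),
                      ("3/1", "Joe,Bob,Jose"), ("4/1", "Alice,Bob")])
    return conn

CLEAN_SQL = """
WITH RECURSIVE split(date, name, rest) AS (
    -- anchor: no name yet, the whole list plus a trailing comma still to process
    SELECT date, '', employees || ',' FROM schedule
    UNION ALL
    -- recursive step: take the text before the first comma, keep the remainder
    SELECT date,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT s.date, group_concat(p.name, ',') AS employees
FROM split s
LEFT JOIN person p ON p.name = s.name   -- unknown names join to NULL
WHERE s.name <> ''                      -- drop the empty anchor rows
GROUP BY s.date                         -- group_concat skips the NULLs
ORDER BY s.date
"""

def clean_schedule(conn):
    """Return (date, cleaned comma-separated names) for every schedule date."""
    return conn.execute(CLEAN_SQL).fetchall()
```

Calling `clean_schedule(build_demo())` returns one row per date. As with the PostgreSQL version's `string_agg`, 4/1 comes back as `None` because none of its names exist in `person`.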

Related

How to aggregate different CTEs in outer query SQL

I am trying to join two CTEs to get the difference in performance between countries, grouped on id. Here is my example.
Every campaign can be run in different countries, so how can I group at the end to get one row per campaign id?
CTE 1: (planned)
select
country
, campaign_id
, sum(sales) as planned_sales
from table x
group by 1,2
CTE 2: (Actual)
select
country
, campaign_id
, sum(sales) as actual_sales
from table y
group by 1,2
outer select
select
cte1.country,
planned_sales,
actual_sales,
planned_sales - actual_sales as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id
This should do it:
select
cte1.campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
sum(cte1.planned_sales) - sum(cte2.actual_sales) as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id and cte1.country = cte2.country
group by 1
I would suggest using a full join, so that rows appearing in only one of the two CTEs are still included, not just rows that match in both. Your query is basically correct, but it needs a GROUP BY.
select campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
(coalesce(sum(cte1.planned_sales), 0) -
coalesce(sum(cte2.actual_sales), 0)
) as diff
from cte1 full join
cte2
using (campaign_id, country)
group by campaign_id;
That said, there is no reason why the CTEs should aggregate by both campaign and country. They could aggregate by campaign id alone, which simplifies the query and improves performance.
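As a minimal sketch of that suggestion, the following aggregates each side by campaign id only and then combines the two sides. It runs on Python's built-in `sqlite3`. Older SQLite versions lack `FULL JOIN`, so the full join is emulated here with a `UNION` of campaign ids plus two `LEFT JOIN`s. This is a swapped-in technique, not the answer's exact query. The tables `planned` and `actual` and their sample rows are hypothetical stand-ins for `table x` and `table y`.

```python
import sqlite3

DIFF_SQL = """
WITH p AS (SELECT campaign_id, SUM(sales) AS planned_sales
           FROM planned GROUP BY campaign_id),
     a AS (SELECT campaign_id, SUM(sales) AS actual_sales
           FROM actual GROUP BY campaign_id),
     -- every campaign id that appears on either side (full-join emulation)
     ids AS (SELECT campaign_id FROM p UNION SELECT campaign_id FROM a)
SELECT ids.campaign_id,
       COALESCE(p.planned_sales, 0) AS planned_sales,
       COALESCE(a.actual_sales, 0) AS actual_sales,
       COALESCE(p.planned_sales, 0) - COALESCE(a.actual_sales, 0) AS diff
FROM ids
LEFT JOIN p ON p.campaign_id = ids.campaign_id
LEFT JOIN a ON a.campaign_id = ids.campaign_id
ORDER BY ids.campaign_id
"""

def campaign_diff():
    conn = sqlite3.connect(":memory:")
    for table in ("planned", "actual"):
        conn.execute(f"CREATE TABLE {table} (country TEXT, campaign_id INTEGER, sales INTEGER)")
    # hypothetical sample data: campaign 2 has no actuals, campaign 3 was never planned
    conn.executemany("INSERT INTO planned VALUES (?, ?, ?)",
                     [("US", 1, 100), ("DE", 1, 50), ("US", 2, 30)])
    conn.executemany("INSERT INTO actual VALUES (?, ?, ?)",
                     [("US", 1, 90), ("DE", 1, 70), ("FR", 3, 20)])
    rows = conn.execute(DIFF_SQL).fetchall()
    conn.close()
    return rows

print(campaign_diff())
# [(1, 150, 160, -10), (2, 30, 0, 30), (3, 0, 20, -20)]
```

Because each side is already reduced to one row per campaign before the join, the join cannot multiply rows across countries.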

How to return a number of rows incrementally based on column values

I have departments and issues tables. For every department there are approval levels.
So if, say, the HR department has 3 approval levels, I want the drop-down to return a new alias column with the values Y1, Y2, Y3.
Similarly, if Finance has 2, it should return Y1 and Y2.
Is this possible in SQL?
As of now, the alias column returns, say, Y3 for HR, but I want that split into rows Y1, Y2, Y3. Is it possible via SQL?
Generate a sequence from 1 to the maximum approval levels in a CTE.
WITH CTE as (
SELECT LEVEL n
FROM DUAL
CONNECT BY LEVEL <= (select MAX(approval_level) from p_it_Departments )
)
SELECT 'Y'||c.n as approval
,d.approval_level
,d.dept_name
FROM p_it_issues i
INNER JOIN p_it_Departments d ON i.related_dept_id=d.dept_id
INNER JOIN CTE c ON c.n <= d.approval_level
ORDER BY dept_name
You could also add a DISTINCT to the last SELECT to eliminate the duplicates that were present in your original query as well.
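The same sequence-and-join idea can be tried without Oracle. Here is a minimal sketch on Python's built-in `sqlite3`, where a recursive CTE stands in for `CONNECT BY LEVEL` (SQLite has no `CONNECT BY`). The `departments` table and its two rows are hypothetical, and the issues table is left out for brevity.

```python
import sqlite3

APPROVAL_SQL = """
WITH RECURSIVE
    lim(m) AS (SELECT MAX(approval_level) FROM departments),
    seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq, lim WHERE n < lim.m   -- stop at the highest level
    )
SELECT 'Y' || seq.n AS approval, d.approval_level, d.dept_name
FROM departments d
JOIN seq ON seq.n <= d.approval_level   -- one row per level the department has
ORDER BY d.dept_name, seq.n
"""

def approval_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE departments (dept_name TEXT, approval_level INTEGER)")
    conn.executemany("INSERT INTO departments VALUES (?, ?)", [("HR", 3), ("Finance", 2)])
    rows = conn.execute(APPROVAL_SQL).fetchall()
    conn.close()
    return rows

for row in approval_rows():
    print(row)
```

Finance produces Y1 and Y2, and HR produces Y1, Y2 and Y3, which matches the behaviour the question asks for.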
OK, this would not have fit properly in a comment, but I figured it out, so I wanted to share.
with cte as(
SELECT
ROW_NUMBER() OVER(partition by d.dept_name ORDER BY d.dept_name ASC ) AS Row#,
d.approval_level, d.dept_name
FROM p_it_issues i, p_it_Departments d where i.related_dept_id=d.dept_id
)
select 'Y'||cte.Row# from cte;
This prints what I wanted to display.

Recursive CTE Concept Confusion

I am trying to understand the concepts of using CTE in my SQL code. I have gone through a number of online posts explaining the concept but I cannot grasp how it iterates to present the hierarchical data. One of the widely used examples to explain the R-CTE is the Employee and ManagerID Example as below:
USE AdventureWorks
GO
WITH Emp_CTE AS (
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.ContactID, e.LoginID, e.ManagerID, e.Title, e.BirthDate
FROM HumanResources.Employee e
INNER JOIN Emp_CTE ecte ON ecte.EmployeeID = e.ManagerID
)
SELECT *
FROM Emp_CTE
GO
The anchor query grabs the manager. After that, I can't understand how it brings back the other employees if the recursive query calls the anchor query again and again, and the anchor query only has a single record: the manager.
So you want to understand a recursive CTE.
It's simple really.
First there's the seed query which gets the original records.
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
In your case, it's the employees without a manager, which would be the boss(es).
To demonstrate with a simplified example:
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
101 boss NULL The Boss
The second query looks for employees that have the previous record as a manager.
SELECT e.EmployeeID, e.ContactID, e.LoginID, e.ManagerID, e.Title, e.BirthDate
FROM HumanResources.Employee e
INNER JOIN Emp_CTE ecte ON ecte.EmployeeID = e.ManagerID
Since it's a recursive CTE, the CTE uses itself in the second query.
You could see it as a loop, where it uses the previous records to get the next.
For the first iteration of that recursive loop you could get something like this:
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
102 head1 101 Top Manager 1
103 head2 101 Top Manager 2
For the second iteration it would use the records from that first iteration to find the next.
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
104 bob 102 Department Manager 1
105 hilda 102 Department Manager 2
108 john 103 Department Manager 4
109 jane 103 Department Manager 5
For the 3rd iteration it would use the records from the 2nd iteration.
...
And this continues until there are no more employees to join on the ManagerID.
Then after all the looping, the CTE will return all the records that were found through all those iterations.
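To watch the iterations happen, here is a minimal sketch using Python's built-in `sqlite3` and the simplified rows from the tables above. The extra `depth` column is an addition for illustration only. Anchor rows get depth 0 and each pass of the recursive member adds 1, so rows with the same depth came from the same iteration. The table and column names are simplified assumptions, not the AdventureWorks schema.

```python
import sqlite3

HIERARCHY_SQL = """
WITH RECURSIVE emp_cte(employee_id, login_id, manager_id, depth) AS (
    -- anchor: the boss(es), i.e. rows without a manager
    SELECT employee_id, login_id, manager_id, 0
    FROM employee WHERE manager_id IS NULL
    UNION ALL
    -- recursive member: employees whose manager was found in the previous pass
    SELECT e.employee_id, e.login_id, e.manager_id, c.depth + 1
    FROM employee e
    JOIN emp_cte c ON c.employee_id = e.manager_id
)
SELECT employee_id, login_id, depth FROM emp_cte ORDER BY depth, employee_id
"""

def walk_hierarchy():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_id INTEGER, login_id TEXT, manager_id INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
        (101, "boss", None),
        (102, "head1", 101), (103, "head2", 101),
        (104, "bob", 102), (105, "hilda", 102),
        (108, "john", 103), (109, "jane", 103),
    ])
    rows = conn.execute(HIERARCHY_SQL).fetchall()
    conn.close()
    return rows

for row in walk_hierarchy():
    print(row)
```

The boss comes out at depth 0, head1 and head2 at depth 1, and the four department managers at depth 2. This mirrors the three result tables above.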
Well, a short introduction to recursive CTEs:
A recursive CTE is really iterative rather than recursive. The anchor query gets an initial result set, and from that set we can dive deeper. Try these simple cases:
Just a counter, not even a JOIN needed...
The 1 of the anchor will lead to a 2 in the UNION ALL. This 2 is passed into the UNION ALL again and will be returned as a 3 and so on...
WITH recCTE AS
(
SELECT 1 AS MyCounter
UNION ALL
SELECT recCTE.MyCounter+1
FROM recCTE
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE;
A counter of 2 columns
This is exactly the same as above. But we have two columns and deal with them separately.
WITH recCTE AS
(
SELECT 1 AS MyCounter1, 10 AS MyCounter2
UNION ALL
SELECT recCTE.MyCounter1+1,recCTE.MyCounter2+1
FROM recCTE
WHERE recCTE.MyCounter1<10
)
SELECT * FROM recCTE;
Now we have two rows in the initial query
Run on its own, the initial query returns two rows, both with MyCounter = 1 and with two different values in the Nmbr column.
WITH recCTE AS
(
SELECT MyCounter=1, Nmbr FROM(VALUES(1),(10)) A(Nmbr)
UNION ALL
SELECT recCTE.MyCounter+1, recCTE.Nmbr+1
FROM recCTE
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE ORDER BY MyCounter,Nmbr;
Now we get 20 rows back, not 10 as in the examples before. This is because both rows of the anchor are processed independently.
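The 20-row claim can be checked in SQLite through Python's built-in `sqlite3`. This is a translation of the T-SQL above, not the original query. SQLite names the columns of a bare `VALUES` list `column1`, `column2` and so on, which is why `column1` appears below.

```python
import sqlite3

TWO_ANCHOR_SQL = """
WITH RECURSIVE recCTE(MyCounter, Nmbr) AS (
    -- anchor: two rows, both with MyCounter = 1 (Nmbr = 1 and Nmbr = 10)
    SELECT 1, column1 FROM (VALUES (1), (10))
    UNION ALL
    -- recursive member: each anchor row is extended on its own
    SELECT MyCounter + 1, Nmbr + 1 FROM recCTE WHERE MyCounter < 10
)
SELECT MyCounter, Nmbr FROM recCTE ORDER BY MyCounter, Nmbr
"""

def two_anchor_rows():
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(TWO_ANCHOR_SQL).fetchall()
    conn.close()
    return rows

rows = two_anchor_rows()
print(len(rows), rows[:4])
# 20 [(1, 1), (1, 10), (2, 2), (2, 11)]
```

Each anchor row grows its own chain of 10 rows, one chain for Nmbr 1 to 10 and one for Nmbr 10 to 19.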
We can use the recursive CTE in a JOIN
In this example we will create a derived set first, then we will join this to the recursive CTE. Guess why the first row carries "X" instead of "A"?
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F'),(7,'G'),(8,'H'),(9,'I'),(10,'J')) A(id,Letter))
,recCTE AS
(
SELECT MyCounter=1, Nmbr,'X' AS Letter FROM(VALUES(1),(10)) A(Nmbr)
UNION ALL
SELECT recCTE.MyCounter+1, recCTE.Nmbr+1, SomeSet.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.id=recCTE.MyCounter+1
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE ORDER BY MyCounter,Nmbr;
This uses a self-referencing join to simulate your hierarchy, but with a single gap-less chain:
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',3),(5,'E',4),(6,'F',5),(7,'G',6),(8,'H',7),(9,'I',8),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE;
And now almost the same as before, but with several elements sharing the same "previous".
This is, in principle, your hierarchy:
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',2),(5,'E',2),(6,'F',3),(7,'G',3),(8,'H',4),(9,'I',1),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE
Conclusion
The key points
The anchor query must return at least one row (otherwise the CTE returns nothing), but it may return many
The second part must match the column list (as any UNION ALL query)
The second part must refer to the cte in its FROM-clause
either directly, or
through a JOIN
The second part will be called over and over using the result of the call before
Each row is handled separately (a hidden RBAR)
You can start with a Manager (top-most-node) and walk down by querying for employees with this manager id, or
You can start with the lowest level of the hierarchy (the rows whose id is not used as a manager id by any other row) and move up the chain
As it is a hidden RBAR, you can use it for row-by-row actions such as string accumulation.
An example for the last statement
See how the column LetterPath is built.
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',2),(5,'E',2),(6,'F',3),(7,'G',3),(8,'H',4),(9,'I',1),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter,CAST(Letter AS VARCHAR(MAX)) AS LetterPath FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter,recCTE.LetterPath + SomeSet.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE
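The key points above also mention walking the other way: start from the lowest rows and move up. Here is a minimal sketch of that, again using Python's built-in `sqlite3` with the same SomeSet data. The anchor picks the leaves (rows whose id no other row uses as `previous`), and the recursive member follows `previous` upward. The `path` column accumulates letters row by row, the same way LetterPath does. SQLite uses `||` instead of T-SQL's `+` for concatenation, and the table name `some_set` is an assumption.

```python
import sqlite3

UPWARD_SQL = """
WITH RECURSIVE chain(leaf_id, id, previous, path) AS (
    -- anchor: leaf rows, i.e. no other row points at them as 'previous'
    SELECT s.id, s.id, s.previous, s.letter
    FROM some_set s
    WHERE NOT EXISTS (SELECT 1 FROM some_set c WHERE c.previous = s.id)
    UNION ALL
    -- recursive member: step to the parent and append its letter
    SELECT ch.leaf_id, s.id, s.previous, ch.path || s.letter
    FROM some_set s
    JOIN chain ch ON s.id = ch.previous
)
SELECT leaf_id, path FROM chain
WHERE previous IS NULL          -- keep only rows that reached the top
ORDER BY leaf_id
"""

def leaf_paths():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_set (id INTEGER, letter TEXT, previous INTEGER)")
    conn.executemany("INSERT INTO some_set VALUES (?, ?, ?)", [
        (1, "A", None), (2, "B", 1), (3, "C", 2), (4, "D", 2), (5, "E", 2),
        (6, "F", 3), (7, "G", 3), (8, "H", 4), (9, "I", 1), (10, "J", 9),
    ])
    rows = conn.execute(UPWARD_SQL).fetchall()
    conn.close()
    return rows

print(leaf_paths())
# [(5, 'EBA'), (6, 'FCBA'), (7, 'GCBA'), (8, 'HDBA'), (10, 'JIA')]
```

Every leaf climbs independently, which is the "each row is handled separately" point seen from the other end of the hierarchy.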
It's all about the recursive step. First, the root is used for the first step of the recursion:
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
This provides the first set of records.
The second set of records is queried based on the first set (the anchor), so it returns all employees whose manager is in the first set.
The second step of the recursion is based on the second result set, not the anchor.
The third step is based on the third result set, and so on.

BigQuery Standard SQL count original rows after CROSS JOIN UNNEST

I have a table with a repeated field that requires a CROSS JOIN UNNEST, and I want to get the count of the original, nested rows. For example:
SELECT studentId, COUNT(1) as studentCount
FROM myTable
CROSS JOIN UNNEST(classes) AS c
WHERE c.id in ('1', '2')
Right now, if a student is in class 1 and 2 it will count that student twice in studentCount.
I know I can use COUNT(DISTINCT studentId) to work around this, but that ends up being a lot slower than a simple count. It doesn't take advantage of the fact that there's exactly one row per student.
So is there any way to compute the count of the original rows before unnesting (but after the WHERE clause) while still including the unnest in the query?
Note this must be in Standard SQL.
I understood your "challenge" as showing only students in classes with id 1 and 2, while still showing the total count of students in all classes. If that's it, see below.
#standardSQL
SELECT studentId, studentCount
FROM myTable
CROSS JOIN (SELECT COUNT(1) studentCount FROM myTable)
WHERE studentId IN (
SELECT studentID FROM UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
)
You can test / play with it using dummy data as below:
#standardSQL
WITH myTable AS (
SELECT 1 AS studentId, [STRUCT<id STRING>('1'),STRUCT('2'),STRUCT('3')] AS classes UNION ALL
SELECT 2, [STRUCT<id STRING>('4'),STRUCT('5')]
)
SELECT studentId, studentCount
FROM myTable
CROSS JOIN (SELECT COUNT(1) studentCount FROM myTable)
WHERE studentId IN (
SELECT studentID FROM UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
)
If your desired output is different from what I guessed, you might still find the above useful for calculating studentCount.
Given just the original constraints (that unnesting is required and that you need to count the number of students), you can use a query of this form:
SELECT studentId, (SELECT COUNT(*) FROM myTable) AS studentCount
FROM myTable
CROSS JOIN UNNEST(classes) AS c
WHERE c.id in ('1', '2')
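The difference between the two counts the question contrasts can be modelled in plain Python. This shows the semantics only; it is not BigQuery code. Counting after the unnest yields one row per matching class. Counting original rows means counting each student at most once if any of their classes matches. In BigQuery that second form would typically keep the filter inside a subquery, for example `WHERE EXISTS (SELECT 1 FROM UNNEST(classes) AS c WHERE c.id IN ('1', '2'))`, but its performance has not been measured here. The `students` data and the `count_rows` helper are hypothetical.

```python
def count_rows(students, wanted_ids):
    """Return (rows after unnest + filter, original rows with any match)."""
    # CROSS JOIN UNNEST + WHERE: one row per (student, matching class) pair
    unnested = sum(1 for s in students for c in s["classes"] if c in wanted_ids)
    # original-row semantics: a student counts once if any class matches
    original = sum(1 for s in students if any(c in wanted_ids for c in s["classes"]))
    return unnested, original

students = [
    {"studentId": 1, "classes": ["1", "2", "3"]},
    {"studentId": 2, "classes": ["4", "5"]},
    {"studentId": 3, "classes": ["2"]},
]
print(count_rows(students, {"1", "2"}))  # (3, 2)
```

Student 1 is in both class 1 and class 2. It therefore contributes two unnested rows but only one original row.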

SQL query for grouping monthly period ranges

I'm having some trouble building a query that will group my items into monthly ranges according to whether or not they exist in a month. I'm using PostgreSQL.
For example I have a table with data as this:
Name Period(text)
Ana 2010/09
Ana 2010/10
Ana 2010/11
Ana 2010/12
Ana 2011/01
Ana 2011/02
Peter 2009/05
Peter 2009/06
Peter 2009/07
Peter 2009/08
Peter 2009/12
Peter 2010/01
Peter 2010/02
Peter 2010/03
John 2009/05
John 2009/06
John 2009/09
John 2009/11
John 2009/12
and I want the result query to be this:
Name Start End
Ana 2010/09 2011/02
Peter 2009/05 2009/08
Peter 2009/12 2010/03
John 2009/05 2009/06
John 2009/09 2009/09
John 2009/11 2009/12
Is there any way to achieve this?
This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.
Assuming that the month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential number. The values will be a constant for months that are in a row.
select name, min(period), max(period)
from (select t.*,
(cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
row_number() over (partition by name order by period)
) as grp
from names t
) t
group by grp, name;
Here is a SQL Fiddle illustrating this.
Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number().
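The month-number-minus-row-number trick can be checked outside the database. The following is a minimal Python sketch of the same arithmetic applied to the question's data. Like the `row_number()` version, it assumes no duplicate periods per name. `month_ranges` is a hypothetical helper name. `itertools.groupby` only groups adjacent items, which is exactly what is needed here: the key stays constant within a run of consecutive months and changes after a gap.

```python
from itertools import groupby

def _island_key(indexed_period):
    """Month number minus sequence number: constant within a run of adjacent months."""
    seq, period = indexed_period
    return int(period[:4]) * 12 + int(period[5:7]) - seq

def month_ranges(rows):
    """rows: iterable of (name, 'YYYY/MM'); returns (name, start, end) per run."""
    result = []
    # sorting by (name, period) mirrors PARTITION BY name ORDER BY period;
    # 'YYYY/MM' strings sort chronologically
    for name, group in groupby(sorted(rows), key=lambda r: r[0]):
        periods = [period for _, period in group]
        for _, run in groupby(enumerate(periods), key=_island_key):
            run = [period for _, period in run]
            result.append((name, run[0], run[-1]))
    return result

data = [("Ana", p) for p in ["2010/09", "2010/10", "2010/11", "2010/12", "2011/01", "2011/02"]]
data += [("Peter", p) for p in ["2009/05", "2009/06", "2009/07", "2009/08",
                                 "2009/12", "2010/01", "2010/02", "2010/03"]]
data += [("John", p) for p in ["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]]

for row in month_ranges(data):
    print(row)
```

The output contains the same six ranges as the expected result in the question, sorted by name.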
I don't know if there is an easier way (there probably is), but I can't think of one right now:
with parts as (
select name,
to_date(replace(period,'/',''), 'yyyymm') as period
from names
), flagged as (
select name,
period,
case
when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
else 1
end as group_flag
from parts
), grouped as (
select flagged.*,
coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);
The first common table expression (parts) simply converts the period into a date so that it can be used in date arithmetic.
The second CTE (flagged) sets a flag each time the gap (in months) between the current row and the previous one is not exactly one.
The third CTE then accumulates those flags into a running sum, which gives a unique group number for each run of consecutive rows.
The final select then simply gets the start and end period for each group. I didn't bother to convert the period back to the original format though.
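The flag-then-running-sum idea from those CTEs can also be sketched in plain Python for a single name's periods. This illustrates the logic; it is not the answer's SQL. One difference: here the first row starts group 1, whereas in the SQL the `lag` default leaves the first row unflagged (group 0 after `coalesce`). That changes the numbering but not the grouping. `flag_groups` is a hypothetical helper.

```python
def flag_groups(periods):
    """periods: chronologically sorted 'YYYY/MM' strings for one name.
    Returns (start, end) for each run of consecutive months."""
    groups = {}
    group_nr = 0
    prev_month = None
    for period in periods:
        month = int(period[:4]) * 12 + int(period[5:7])
        # flag a new group when the gap to the previous row is not exactly one month
        if prev_month is None or month - prev_month != 1:
            group_nr += 1          # the running sum of the flags
        groups.setdefault(group_nr, []).append(period)
        prev_month = month
    return [(members[0], members[-1]) for members in groups.values()]

print(flag_groups(["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]))
# [('2009/05', '2009/06'), ('2009/09', '2009/09'), ('2009/11', '2009/12')]
```

Converting to a month number first (the job of the parts CTE) is what makes the year boundary work. 2009/12 to 2010/01 is a gap of exactly one.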
SQLFiddle example that also shows the intermediate result of the flagged CTE: http://sqlfiddle.com/#!15/8c0aa/2
Well, one common way to do this is with recursive SQL:
with recursive cte1 as (
select
"Name" as name,
("Period"||'/01')::date as period
from Table1
), cte2 as (
select
c.name, c.period as s, c.period as e
from cte1 as c
where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')
union all
select
c.name, c.s as s, t.period
from cte2 as c
inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'
)
select
c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2
I'm not sure about performance of this one, you have to test it.
sql fiddle demo