SQL How to pull in all records that don't contain - sql

This is a bit of a tricky question to explain, but I'll try my best.
The essence of the question is that I have an employee salary table with these columns: Employee ID, Month of Salary, Salary (Currency).
I want to run a select that will show me all of the employees that don't have a record for X month.
I have attached an image to help visualise this, and here is an example of what I would want from this data:
Let's say from this small example that I want to see all of the employees that weren't paid on the 1st October 2021. From looking I know that employee 3 was the only one paid and 1 and 2 were not paid. How would I be able to query this on a much larger range of data without knowing which month it could be that they weren't paid?

You need to join your EmployeeSalary table against a list of expected EmployeeID/MonthOfSalary values and determine the gaps, i.e. the instances where there is no matching record in the EmployeeSalary table. A LEFT OUTER JOIN can be used here: whenever there is no matching record in your EmployeeSalary table, the LEFT OUTER JOIN gives you NULL for its columns.
The following query shows how to perform the LEFT OUTER JOIN. Note, however, that I've joined your table to itself to get the list of EmployeeID and MonthOfSalary values. You would be better off taking these from other tables, i.e. I assume you have an Employee table with all the IDs in it, which would be more efficient (and more accurate) to use than building the ID list from the EmployeeSalary table (like I've done).
SELECT EmployeeList.EmployeeID, MonthList.MonthOfSalary
FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) MonthList
CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) EmployeeList
LEFT OUTER JOIN EmployeeSalary
ON MonthList.MonthOfSalary = EmployeeSalary.MonthOfSalary
AND EmployeeList.EmployeeID = EmployeeSalary.EmployeeID
WHERE EmployeeSalary.EmployeeID IS NULL
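As a rough check of the pattern above, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented for illustration (they are not the data from the question's image), and the pairing of months with employees is written as an explicit CROSS JOIN.

```python
import sqlite3

# Hypothetical sample data: employee 3 is paid every month,
# employees 1 and 2 have no row for October 2021.
rows = [
    (1, "2021-08-01", 1000), (1, "2021-09-01", 1000),
    (2, "2021-08-01", 1200), (2, "2021-09-01", 1200),
    (3, "2021-08-01", 900), (3, "2021-09-01", 900), (3, "2021-10-01", 900),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeSalary (EmployeeID INTEGER, MonthOfSalary TEXT, Salary REAL)"
)
conn.executemany("INSERT INTO EmployeeSalary VALUES (?, ?, ?)", rows)

# Every (month, employee) pair is expected; the LEFT OUTER JOIN leaves NULLs
# where no salary row exists, and the WHERE clause keeps only those gaps.
missing = conn.execute("""
    SELECT EmployeeList.EmployeeID, MonthList.MonthOfSalary
    FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) MonthList
    CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) EmployeeList
    LEFT OUTER JOIN EmployeeSalary
        ON MonthList.MonthOfSalary = EmployeeSalary.MonthOfSalary
        AND EmployeeList.EmployeeID = EmployeeSalary.EmployeeID
    WHERE EmployeeSalary.EmployeeID IS NULL
    ORDER BY EmployeeList.EmployeeID, MonthList.MonthOfSalary
""").fetchall()

print(missing)  # [(1, '2021-10-01'), (2, '2021-10-01')]
```

Note that a month in which nobody was paid would not appear in MonthList at all, which is one reason a separate list of expected months is more reliable.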

You first need to get the latest paid month for each employee, then calculate the difference between that month and today, and filter on it. The filter can be done with a where clause on the computed delay.
I propose the following starting point, which you might need to adapt, at least to cast some values according to your column types.
with latest_pay as (
-- For each employee, get the latest paid month
-- (only grouped or aggregated columns can be selected here)
select Employee_ID, max(Month) as latest_pay_month
from your_table
group by Employee_ID
)
-- Look for employees not paid for more than 'your_threshold' months
select Employee_ID, latest_pay_month, datediff(month, latest_pay_month, getdate()) as latest_paid_month_delay
from latest_pay
where datediff(month, latest_pay_month, getdate()) > your_threshold
Btw, I know it's an example, but avoid column names such as Month, which can be confused with SQL keywords and lead to errors.
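A minimal sketch of the latest-paid-month idea, run against SQLite via Python's sqlite3 module. SQLite has no datediff or getdate, so the month difference is computed from year and month numbers and the reference date is passed in as a parameter. The table name, column names and data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (employee_id INTEGER, pay_month TEXT, amount REAL)")
conn.executemany("INSERT INTO salary VALUES (?, ?, ?)", [
    (1, "2021-08-01", 1000), (1, "2021-09-01", 1000),
    (2, "2021-06-01", 1200),
    (3, "2021-09-01", 900), (3, "2021-10-01", 900),
])


def unpaid_longer_than(conn, as_of, threshold_months):
    """Employees whose latest pay month is more than threshold_months before as_of."""
    return conn.execute("""
        WITH latest_pay AS (
            -- latest paid month per employee
            SELECT employee_id, MAX(pay_month) AS latest_pay_month
            FROM salary
            GROUP BY employee_id
        ),
        delays AS (
            -- whole-month difference: (year * 12 + month) of as_of minus that of the last pay
            SELECT employee_id, latest_pay_month,
                   (CAST(strftime('%Y', :as_of) AS INTEGER) * 12
                    + CAST(strftime('%m', :as_of) AS INTEGER))
                 - (CAST(strftime('%Y', latest_pay_month) AS INTEGER) * 12
                    + CAST(strftime('%m', latest_pay_month) AS INTEGER)) AS months_since_pay
            FROM latest_pay
        )
        SELECT employee_id, latest_pay_month, months_since_pay
        FROM delays
        WHERE months_since_pay > :threshold
        ORDER BY employee_id
    """, {"as_of": as_of, "threshold": threshold_months}).fetchall()


print(unpaid_longer_than(conn, "2021-10-01", 1))  # [(2, '2021-06-01', 4)]
```

This only finds employees who have stopped being paid; it does not detect a gap in the middle of an otherwise continuous history.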

This is ideally where you would use a calendar table - having one available is handy for tasks such as this where you need to find missing dates.
You can build one on the fly, as I have done in this example; however, you would normally have a permanent table to use.
In order to determine which rows are missing, you need to generate a list of expected rows; an outer join to your actual data will then reveal the missing rows.
So here we have a CTE that generates a list of dates (based on a date range you can set), followed by another that gives a list of all the EmployeeId values.
You expect each EmployeeId to have a row for each month, so we do a cross join to generate the list of expected results. We then outer join with the actual data and filter to the null rows; these are the employees who have not been paid for that month.
See example DB<>Fiddle
declare @from date='20210101', @to date='20211001';
with dates as (
select DateAdd(month,n,@from) dt from (
select top(100) Row_Number() over(order by (select null))-1 n from master.dbo.spt_values
)v
), e as (select distinct employeeId from t)
select dt, e.EmployeeId
from dates d cross join e
left join t on DatePart(month,d.dt)=DatePart(month,t.PaidDate) and t.EmployeeId=e.EmployeeId
where d.dt<=@to
and t.EmployeeId is null
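For comparison, here is a hedged sketch of the same calendar idea in SQLite, driven from Python's sqlite3 module. It is not the SQL Server query above: the month list comes from a recursive CTE instead of spt_values, and the join compares year and month together so that ranges spanning more than one year do not match the wrong year. The function name and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (EmployeeId INTEGER, PaidDate TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2021-08-01"), (1, "2021-09-01"),                      # no October row
    (2, "2021-08-01"), (2, "2021-10-01"),                      # no September row
    (3, "2021-08-01"), (3, "2021-09-01"), (3, "2021-10-01"),   # complete
])


def missing_payments(conn, date_from, date_to):
    """Return (month_start, EmployeeId) pairs with no payment row in that month."""
    return conn.execute("""
        WITH RECURSIVE dates(dt) AS (
            -- on-the-fly calendar: one row per month start in the range
            SELECT date(:date_from, 'start of month')
            UNION ALL
            SELECT date(dt, '+1 month') FROM dates
            WHERE date(dt, '+1 month') <= :date_to
        ),
        e AS (SELECT DISTINCT EmployeeId FROM t)
        SELECT d.dt, e.EmployeeId
        FROM dates d
        CROSS JOIN e
        LEFT JOIN t
            ON strftime('%Y-%m', t.PaidDate) = strftime('%Y-%m', d.dt)
            AND t.EmployeeId = e.EmployeeId
        WHERE t.EmployeeId IS NULL
        ORDER BY d.dt, e.EmployeeId
    """, {"date_from": date_from, "date_to": date_to}).fetchall()


print(missing_payments(conn, "2021-08-01", "2021-10-01"))
# [('2021-09-01', 2), ('2021-10-01', 1)]
```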

Related

SELECT rows containing latest values

How do I SELECT the column1 rows whose column2 has the latest date and is not null?
For example, I need to return just line five (employee3).
How about this?
SELECT Employee, MAX(Resignation) Resignation
FROM table
WHERE Resignation IS NOT NULL
GROUP BY Employee
Or, if your table has more columns than you've shown,
SELECT a.*
FROM table a
JOIN (
SELECT Employee, MAX(Resignation) Resignation
FROM table
WHERE Resignation IS NOT NULL
GROUP BY Employee
) b ON a.Employee = b.Employee AND a.Resignation = b.Resignation
This is the "find detail rows with extreme values" query pattern.
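A small sketch of that pattern on invented data, using SQLite through Python's sqlite3 module. The table name employment and the extra Department column are hypothetical; they stand in for "more columns than you've shown".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employment (Employee TEXT, Department TEXT, Resignation TEXT)")
conn.executemany("INSERT INTO employment VALUES (?, ?, ?)", [
    ("employee1", "Sales", "2015-03-01"), ("employee1", "Sales", None),
    ("employee2", "IT", None),
    ("employee3", "IT", "2014-06-30"), ("employee3", "Ops", "2016-01-15"),
])

# The subquery finds the extreme value per employee; joining back on both
# columns recovers the full detail row that carries that value.
latest_rows = conn.execute("""
    SELECT a.Employee, a.Department, a.Resignation
    FROM employment a
    JOIN (
        SELECT Employee, MAX(Resignation) AS Resignation
        FROM employment
        WHERE Resignation IS NOT NULL
        GROUP BY Employee
    ) b ON a.Employee = b.Employee AND a.Resignation = b.Resignation
    ORDER BY a.Employee
""").fetchall()

print(latest_rows)
# [('employee1', 'Sales', '2015-03-01'), ('employee3', 'Ops', '2016-01-15')]
```

employee1 is still returned here even though one of their rows has a NULL resignation date; this query only ignores the NULL rows, it does not exclude employees who have them.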
Updated with a re-interpretation of the question:
I think you mean:
Return the most recent resignation date for all employees who are currently Resigned. Currently resigned is defined as "having all the same employee records with a resignation date populated for that employee". A single employee record with a NULL resignation date means the employee is still employed; regardless of how many times they have resigned!
This can be accomplished with a NOT EXISTS using a correlated subquery, and a MAX along with a GROUP BY.
First we get a list of all the employees who are not resigned.
Then we compare our full set to the set of employees who are not resigned and only keep the employees who are not in that list. Next we group by employee and get the max resignation.
SELECT Employee, Max(Resignation) Resignation
FROM Table A
WHERE NOT EXISTS (SELECT 1
FROM Table B
WHERE A.Employee = B.Employee
and B.Resignation is null) --Correlation occurs on the line above: A refers to the outer query
GROUP BY Employee
Correlation occurs on the line WHERE A.Employee = B.Employee, as A.Employee refers to table A in the outer query, one level removed from table B.
Pretty sure there would be a way to do this with an apply join as well; but I'm not as familiar with that syntax yet.
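A minimal sketch of the NOT EXISTS version on invented data, using SQLite through Python's sqlite3 module. The table name employment and the rows are assumptions; employee1 has resigned once but has an open (NULL) record, so under the interpretation above they count as still employed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employment (Employee TEXT, Resignation TEXT)")
conn.executemany("INSERT INTO employment VALUES (?, ?)", [
    ("employee1", "2015-03-01"), ("employee1", None),           # rehired, still employed
    ("employee2", None),                                         # never resigned
    ("employee3", "2014-06-30"), ("employee3", "2016-01-15"),   # resigned twice
])

# Keep only employees for whom no row with a NULL resignation exists,
# then take the most recent resignation date per employee.
resigned = conn.execute("""
    SELECT Employee, MAX(Resignation) AS Resignation
    FROM employment A
    WHERE NOT EXISTS (SELECT 1
                      FROM employment B
                      WHERE A.Employee = B.Employee
                        AND B.Resignation IS NULL)
    GROUP BY Employee
""").fetchall()

print(resigned)  # [('employee3', '2016-01-15')]
```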

Count transactions within a month only once

I have a situation like the one below:
I have two database tables. The first table, which I will call TB1 contains all the salaries that the client credits & also the date when the transaction is made.
The second table, which I will call TB2, contains all the products the client has in the bank.
My purpose is to find the number of salaries the client has got before the date he/she got a product (OVERDRAFT in my case) in our bank.
Till now, everything works fine and I have made the query to extract the necessary data.
The only problem, is that I need to improve the query. So, if a certain client has got more than 1 salary (for example every 15 days) within the same month of the same year, the salary is counted only once.
How can I do that PLEASE?
The query is like below:
SELECT TB1.customer_id, COUNT(TB1.customer_id)
FROM table_1 TB1
JOIN
( SELECT TB2.CUSTOMER_ID, TB2.OD_START_DATE
FROM table_2 TB2
JOIN table_2 TB2_MAX
ON TB2.CUSTOMER_ID = TB2_MAX.CUSTOMER_ID
HAVING TB2.od_start_date = MAX(TB2.od_start_date)
GROUP BY TB2.customer_id, TB2.od_start_date
) TB2
ON TB1.CUSTOMER_ID = TB2.CUSTOMER_ID
WHERE TB1.DATE_FROM < TB2.OD_START_DATE
GROUP BY TB1.CUSTOMER_ID
PS: DATE_FROM field contains the date when the transaction is made, while OD_START_DATE field contains the date when the LATEST product is opened.
The JOIN in your inner query is redundant. You simply need the MAX date for each customer.
In your outer query you should be counting DATE_FROM, not Customer_Id. Since you want to count transactions within the same month only once, convert DATE_FROM to a year-month combination and use DISTINCT so each month is counted once.
SELECT TB1.customer_id, COUNT(DISTINCT TO_CHAR(TB1.DATE_FROM,'YYYYMM'))
FROM table_1 TB1
JOIN
( SELECT CUSTOMER_ID, MAX(OD_START_DATE) AS OD_START_DATE
FROM table_2
GROUP BY customer_id
) TB2
ON TB1.CUSTOMER_ID = TB2.CUSTOMER_ID
WHERE TB1.DATE_FROM < TB2.OD_START_DATE
GROUP BY TB1.CUSTOMER_ID
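To see the effect of the DISTINCT year-month count, here is a rough sketch on made-up rows, run in SQLite through Python's sqlite3 module. TO_CHAR(..., 'YYYYMM') is Oracle syntax, so this sketch substitutes SQLite's strftime('%Y%m', ...); the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (customer_id INTEGER, date_from TEXT);
    CREATE TABLE table_2 (customer_id INTEGER, od_start_date TEXT);
    INSERT INTO table_1 VALUES
        (1, '2021-01-01'), (1, '2021-01-15'),  -- two salaries in the same month
        (1, '2021-02-01'),
        (1, '2021-04-01'),                     -- after the latest overdraft, ignored
        (2, '2021-01-10');
    INSERT INTO table_2 VALUES
        (1, '2020-12-01'), (1, '2021-03-05'),  -- latest overdraft for customer 1
        (2, '2021-02-01');
""")

# Count each year-month at most once per customer, considering only
# salaries credited before the customer's latest overdraft start date.
counts = conn.execute("""
    SELECT tb1.customer_id,
           COUNT(DISTINCT strftime('%Y%m', tb1.date_from)) AS salary_months
    FROM table_1 tb1
    JOIN (SELECT customer_id, MAX(od_start_date) AS od_start_date
          FROM table_2
          GROUP BY customer_id) tb2
      ON tb1.customer_id = tb2.customer_id
    WHERE tb1.date_from < tb2.od_start_date
    GROUP BY tb1.customer_id
    ORDER BY tb1.customer_id
""").fetchall()

print(counts)  # [(1, 2), (2, 1)]
```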

SQL: Get the first value

I have two tables:
patients(ID, Firstname, Lastname, ...)
records(ID, Date, Time, Version)
I want to (inner) join these tables, so I have the records with patient data, but in the column for Version I want always the first value that was recorded for the patient (so with the minimum of date and time dependent on the patient (id)). I tried with subquery but HANA doesn't allow ORDER-BY or LIMIT clause in subqueries.
How can I implement this with SQL? (HANA SQL)
Kind regards and thanks in advance.
HANA supports window functions, so you can join against a derived table that picks the first version:
select p.*, r.id, r.date, r.time, r.version
from patients p
join (
select id, date, time, version, patient_id,
row_number() over (partition by patient_id order by date, time) as rn
from records
) r on p.id = r.patient_id and r.rn = 1
The above assumes that the records table has a column patient_id that contains the id of the patient (in the patients table) to which that record belongs.
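A small sketch of the row_number approach, run against SQLite through Python's sqlite3 module instead of HANA (window functions need SQLite 3.25 or newer). The columns rec_date, rec_time and patient_id, and all the rows, are assumptions made for illustration; the ordering here is by date and time, as the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER, firstname TEXT);
    CREATE TABLE records (id INTEGER, patient_id INTEGER,
                          rec_date TEXT, rec_time TEXT, version TEXT);
    INSERT INTO patients VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO records VALUES
        (10, 1, '2021-03-02', '09:00', 'v2'),
        (11, 1, '2021-03-01', '14:30', 'v1'),
        (12, 1, '2021-03-01', '08:15', 'v0'),   -- earliest record for patient 1
        (20, 2, '2021-05-05', '10:00', 'v7');
""")

# Number each patient's records from earliest to latest and keep row 1,
# which avoids ORDER BY / LIMIT inside a correlated subquery.
first_versions = conn.execute("""
    SELECT p.id, p.firstname, r.version
    FROM patients p
    JOIN (
        SELECT patient_id, version,
               ROW_NUMBER() OVER (PARTITION BY patient_id
                                  ORDER BY rec_date, rec_time) AS rn
        FROM records
    ) r ON p.id = r.patient_id AND r.rn = 1
    ORDER BY p.id
""").fetchall()

print(first_versions)  # [(1, 'Ada', 'v0'), (2, 'Ben', 'v7')]
```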

How to include missing rows in sql return

I am currently trying to do a query like this:
(Pseudocode)
SELECT
NAME, SUM(VALUE), MONTH
FROM TABLE
WHERE MONTH BETWEEN 12 MONTHS AGO AND NOW
GROUP BY MONTH, NAME
The problem I am getting is that a name exists in a few of the months, but not all of them, so if I filter this down to return the values for only one name, I sometimes get only 3 or 4 rows rather than the 12 I expect to see.
My question is: is there a way to return rows that still include the name and every month within the range, with the value set to zero where the row is missing from the previous result?
My first thought was to just union another select onto it, but I can't seem to get the logic to work with the group by, as well as the where clauses for limiting the names.
If you have data for all months, you can take the following approach. Generate all the rows (using a cross join), then bring in the data you want:
select m.month, n.name, sum(t.value)
from (select distinct month from table) m cross join
(select distinct name from table) n left join
table t
on t.month = m.month and t.name = n.name
group by m.month, n.name;
This will return the missing sums as NULL values. If you want zero, then use coalesce(sum(t.value), 0).
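The cross join plus coalesce combination can be tried on a few invented rows with SQLite through Python's sqlite3 module. The table name sales and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (name TEXT, month TEXT, value INTEGER);
    INSERT INTO sales VALUES
        ('alice', '2021-01', 10), ('alice', '2021-02', 5), ('alice', '2021-02', 7),
        ('bob',   '2021-01', 3),  ('bob',   '2021-03', 4);
""")

# Build every (month, name) pair first, then left join the real data so that
# pairs without rows still appear; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    SELECT m.month, n.name, COALESCE(SUM(t.value), 0) AS total
    FROM (SELECT DISTINCT month FROM sales) m
    CROSS JOIN (SELECT DISTINCT name FROM sales) n
    LEFT JOIN sales t
        ON t.month = m.month AND t.name = n.name
    GROUP BY m.month, n.name
    ORDER BY n.name, m.month
""").fetchall()

for row in totals:
    print(row)
# ('2021-01', 'alice', 10)
# ('2021-02', 'alice', 12)
# ('2021-03', 'alice', 0)
# ('2021-01', 'bob', 3)
# ('2021-02', 'bob', 0)
# ('2021-03', 'bob', 4)
```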
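You can use something like the following query to generate the past 12 months as separate rows. It selects from dual (a single row), because CONNECT BY LEVEL over a multi-row table such as all_objects multiplies the rows at every level, and months_between takes the later date first so that the result is positive:
SELECT add_months(trunc(add_months(sysdate, -12), 'MONTH'), LEVEL - 1) AS month_in_range
FROM dual
CONNECT BY LEVEL <= 1 + months_between(TRUNC (sysdate, 'MONTH'), add_months(sysdate, -12));
and then do an outer join between your table and this.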
I ended up implementing a left outer join similar to @paqogomez's comment. As my team is already maintaining a time table, it's very easy to get the month list for an outer join.
SELECT NAME, SUM(VALUE), TIME.MONTH
FROM (SELECT DISTINCT MONTH FROM TIME_TABLE
WHERE MONTH BETWEEN 12 MONTHS AGO AND NOW) TIME
LEFT OUTER JOIN TABLE ON (TIME.MONTH = TABLE.MONTH)
GROUP BY TIME.MONTH, NAME

Several MAX values based on another column value

I'm trying to write a SQL query in MS ACCESS and I've narrowed it down to the table below, but can't seem to get the last thing right without making several extremely large queries.
Here's the structure of the table I'm trying to query:
The results I want: MemberId and year where that MemberId had the most visits in that year. (That is, which MemberId had the most visits in 2014, which had the most visits in 2015, etc., and I also want the relevant year to be shown in the result.)
Thanks!
Sounds like you need to determine MAX(Visits) by year in a subquery, then JOIN to that:
SELECT a.*,b.Max_Visits
FROM YourTable a
JOIN (SELECT Year,MAX(Visits) AS Max_Visits
FROM YourTable
GROUP BY Year
) b
ON a.Year = b.Year
AND a.Visits = b.Max_Visits
If you want to see all members and not just those that had the most visits per year, you can change from JOIN to LEFT JOIN
If there's a tie, this returns both members.
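The tie behaviour is easy to see on a few made-up rows. This sketch runs the same join in SQLite through Python's sqlite3 module, not in MS Access; the table name visits and the data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (MemberId INTEGER, Year INTEGER, Visits INTEGER);
    INSERT INTO visits VALUES
        (1, 2014, 12), (2, 2014, 30), (3, 2014, 7),
        (1, 2015, 20), (2, 2015, 20), (3, 2015, 4);   -- members 1 and 2 tie in 2015
""")

# The subquery finds the highest visit count per year; joining back on
# year and count returns every member who reached that count.
top_members = conn.execute("""
    SELECT a.MemberId, a.Year, a.Visits
    FROM visits a
    JOIN (SELECT Year, MAX(Visits) AS Max_Visits
          FROM visits
          GROUP BY Year) b
      ON a.Year = b.Year AND a.Visits = b.Max_Visits
    ORDER BY a.Year, a.MemberId
""").fetchall()

print(top_members)  # [(2, 2014, 30), (1, 2015, 20), (2, 2015, 20)]
```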