SQL LEFT JOIN finding non-zero value as zero - sql

I have the following query with many LEFT JOIN clauses. It has 7 result columns, the last two of which are numbers. I'm expecting the count_non_zero column to always be equal to the count_total column (given the data I currently have):
WITH temp_table AS (
SELECT
attr.company_name_id AS option_id,
attr.company_name AS option_name,
uj.internship_company_name_id,
AVG(CASE WHEN s.salary > 0 THEN s.salary END) AS average,
COUNT(CASE WHEN s.salary > 0 THEN attr.company_name END) as count_non_zero,
COUNT(attr.company_name_id) as count_total
FROM company_name attr
LEFT JOIN user_job_internship uj ON uj.internship_company_name_id = attr.company_name_id
AND attr.approved_by_administrator = 1
LEFT JOIN salary_internship s ON uj.user_job_internship_id = s.user_job_id
AND uj.job_type_id = 4
LEFT JOIN [user] u ON u.user_id = uj.user_id AND u.include_in_student_site_results = 1
AND u.site_instance_id IN (1)
LEFT JOIN user_education_mba_school mba ON u.user_id = mba.user_id
AND mba.mba_graduation_year_id NOT IN (8)
GROUP BY attr.company_name_id, attr.company_name, uj.internship_company_name_id)
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY average DESC) AS row, *
FROM temp_table WHERE count_total >= 3) sub
WHERE row >= 1 AND row <= 25 ORDER BY average DESC;
I run this query to prove that no values in the 'salary' column are returning a value of 0.
SELECT s.* FROM user_job_internship uj, salary_internship s
where internship_company_name_id = 440
AND uj.user_job_internship_id = s.user_job_id
I'm thinking something is messing up the results and causing count_non_zero to get counts that do not exist. Does anyone have any thoughts?

I am assuming your count_total is greater than your count_non_zero. That is to be expected because you are using outer join to join user_job_internship and salary_internship.
Your query is including companies that do not have any internships. A company will not be included in the count_non_zero if either the salary is 0 or if there is no internship at all.
Change those two joins to inner joins and you should get your expected result.
The other option is to change your count_total to ignore companies that don't have any internships:
count(case when s.user_job_id is not null then attr.company_name_id end) as count_total
You have one other slight risk. Your count_non_zero is counting company_name whereas your count_total is counting company_name_id. You could have problems if the company_name column allows NULL values.
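To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the asker's SQL Server database. The tables company, internship, and salary and their rows are made up for illustration; only the counting behaviour over LEFT JOIN is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema (not the asker's real tables)
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE internship (internship_id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE salary (internship_id INTEGER, salary REAL);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'NoInterns');
    INSERT INTO internship VALUES (10, 1), (11, 1), (12, 1);
    -- internship 12 has no salary row at all
    INSERT INTO salary VALUES (10, 5000), (11, 6000);
""")

rows = conn.execute("""
    SELECT c.name,
           -- skips rows where s.salary is NULL (no match) or not > 0
           COUNT(CASE WHEN s.salary > 0 THEN c.company_id END) AS count_non_zero,
           -- counts every joined row, because c.company_id is never NULL
           COUNT(c.company_id) AS count_total,
           -- counts only rows that actually matched a salary record
           COUNT(s.internship_id) AS count_with_salary
    FROM company c
    LEFT JOIN internship i ON i.company_id = c.company_id
    LEFT JOIN salary s ON s.internship_id = i.internship_id
    GROUP BY c.company_id, c.name
    ORDER BY c.company_id
""").fetchall()

for row in rows:
    print(row)
```

In this toy data no salary is 0, yet count_total still exceeds count_non_zero for both companies, because the outer joins produce rows whose salary is NULL.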

Related

Join 2 tables on multiple case conditions

I am using pgAdmin on a Postgres db. I am trying to achieve the following result (amounts are random):
In order to do that, I need to query the 2 tables: accounts and transactions
I am not sure how to get the sum(amount) results into 1 column. I have tried the following:
select SUM(
CASE WHEN debit_account_id = 1 then amount
when credit_account_id = 1 then amount * (-1) else 0 end),
SUM(
CASE WHEN debit_account_id = 2 then amount
when credit_account_id = 2 then amount * (-1) else 0 end)
from transactions
where entity_id = 1
and so on up to account_id 6. This gives me the correct sums for each account, but each result is in a new column. How can I combine this so the result looks like the example above?
You can use UNION ALL.
select debit_account_id account_id, -amount from transactions
union all
select credit_account_id account_id, amount from transactions;
Now you have the data together in one column.
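As a rough sketch of how the UNION ALL result can then be summed per account, the following uses Python's sqlite3 module in place of Postgres. The accounts columns (id, account_name) and the sample rows are assumptions. Amounts are signed the way the question's CASE expression signs them (debits positive, credits negative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema; the question does not show the real one
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        debit_account_id INTEGER,
        credit_account_id INTEGER,
        amount INTEGER
    );
    INSERT INTO accounts VALUES (1, 'Cash'), (2, 'Sales');
    INSERT INTO transactions VALUES (1, 1, 2, 100), (2, 1, 2, 50), (3, 2, 1, 30);
""")

balances = conn.execute("""
    SELECT a.account_name, SUM(x.signed_amount) AS balance
    FROM (
        -- one row per (account, signed amount): debits positive, credits negative
        SELECT debit_account_id AS account_id, amount AS signed_amount FROM transactions
        UNION ALL
        SELECT credit_account_id AS account_id, -amount AS signed_amount FROM transactions
    ) x
    JOIN accounts a ON a.id = x.account_id
    GROUP BY a.id, a.account_name
    ORDER BY a.id
""").fetchall()

print(balances)
```

Each account ends up as one row with one balance column, rather than one column per account.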
I'd sum the debits and the credits for each account in different queries and join them on the accounts table:
SELECT account_name, sum_credit - sum_debit AS balance
FROM accounts a
JOIN (SELECT credit_account_id, SUM(amount) AS sum_credit
FROM transactions
GROUP BY credit_account_id) c ON a.id = c.credit_account_id
JOIN (SELECT debit_account_id, SUM(amount) AS sum_debit
FROM transactions
GROUP BY debit_account_id) d ON a.id = d.debit_account_id
I would recommend a lateral join for this:
select a.account_name,
sum(v.signed_amount) as total_amount
from transactions t cross join lateral
(values (t.debit_account_id, t.amount),
(t.credit_account_id, - t.amount)
) v(account_id, signed_amount) join
accounts a
on a.id = v.account_id
group by a.account_name;
I don't see entity_id in any of the tables, so I don't know where that comes from.

Summing all - / + values from a column by a specific year

I have joined 2 tables. One table has all the values (+/- amounts) and the other has mainly dimensional data. After joining them, I want to run a query that sums all negative and all positive values for a specific year.
Problem seems to be happening on the third line. Any thoughts?
select sum(sales_amount)
from salesInfo s inner joint dimInfo d
where sales_amount <0 and year = '2019';
The query is not generating due to an error being thrown on line 3:
error - ORA-00905: missing keyword 00905. 00000 - "missing keyword"
The issue is likely the missing ON clause. Oracle SQL requires one for an INNER JOIN, unlike some other database dialects, which treat such a join as equivalent to a cross join.
Alternatively, you can use Oracle's NATURAL JOIN to join on matching named columns between tables:
from salesInfo s natural join dimInfo d
Either way, you can then run a conditional aggregate and even group by year:
select year,
sum(case when sales_amount < 0 then sales_amount end) as negative_sales,
sum(case when sales_amount > 0 then sales_amount end) as positive_sales
from salesInfo s
inner join dimInfo d on s.some_id = d.some_id
group by year
Add an ON clause after the JOIN statement to specify the JOIN condition
SELECT sum(sales_amount)
FROM salesInfo s
INNER JOIN dimInfo d
ON d.<column_name> = s.<column_name>
WHERE sales_amount < 0 and year = '2019'
Just use conditional aggregation:
select sum(case when sales_amount < 0 then sales_amount end) as neg_sum,
sum(case when sales_amount > 0 then sales_amount end) as pos_sum
from salesInfo s inner join
dimInfo d
on ? = ? -- whatever your `JOIN` conditions are here
where year = 2019;
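A small runnable sketch of this conditional aggregation, using Python's sqlite3 module rather than Oracle. The join column dim_id and the sample rows are invented for illustration, since the question does not show the real keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- dim_id is a hypothetical join key; the real one is not shown in the question
    CREATE TABLE salesInfo (sale_id INTEGER PRIMARY KEY, dim_id INTEGER, sales_amount INTEGER);
    CREATE TABLE dimInfo (dim_id INTEGER PRIMARY KEY, year INTEGER);
    INSERT INTO dimInfo VALUES (1, 2018), (2, 2019);
    INSERT INTO salesInfo VALUES (1, 1, 40), (2, 2, 100), (3, 2, -25), (4, 2, -5), (5, 2, 60);
""")

neg_sum, pos_sum = conn.execute("""
    SELECT SUM(CASE WHEN s.sales_amount < 0 THEN s.sales_amount END) AS neg_sum,
           SUM(CASE WHEN s.sales_amount > 0 THEN s.sales_amount END) AS pos_sum
    FROM salesInfo s
    INNER JOIN dimInfo d ON d.dim_id = s.dim_id
    WHERE d.year = 2019
""").fetchone()

print(neg_sum, pos_sum)
```

A CASE with no ELSE yields NULL for non-matching rows, and SUM ignores NULLs, so each column only adds up the rows it is meant to.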

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
For each employee, I want to get the record from the times table with the event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there's 100 records in the employee table. The problem is peculiar to Access as I can't use the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query is running multiple correlated subqueries, including a nested one in a WHERE clause. A correlated subquery calculates its value separately for each row of the outer query.
Similar to @LeeMac, simply join all your tables to an aggregate query for the max event_time grouped by employee_id, which will run once across all rows. Below, times is the base FROM table, joined to the aggregate query and to the employees and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id
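The join-to-aggregate idea can be tried out with a sketch like the one below. It uses Python's sqlite3 module, not Access, so the date literal becomes a bound ISO-format string and the nested-parenthesis join syntax is not needed. The sample rows are made up. This variant starts from employees with LEFT JOINs so that employees with no qualifying time record would still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE times (id INTEGER PRIMARY KEY, employee_id INTEGER, event_time TEXT);
    CREATE TABLE time_notes (id INTEGER PRIMARY KEY, time_id INTEGER, note TEXT);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO times VALUES
        (1, 1, '2018-01-30 09:00:00'),
        (2, 1, '2018-01-30 14:00:00'),
        (3, 1, '2018-01-30 15:00:00'),
        (4, 2, '2018-01-29 08:00:00');
    INSERT INTO time_notes VALUES (1, 2, 'left early');
""")

cutoff = "2018-01-30 14:21:48"
rows = conn.execute("""
    SELECT e.id, e.name, t.id AS last_time_id, n.id AS time_note_id
    FROM employees e
    LEFT JOIN (
        -- computed once for all employees instead of once per row
        SELECT employee_id, MAX(event_time) AS max_event_time
        FROM times
        WHERE event_time <= ?
        GROUP BY employee_id
    ) agg ON agg.employee_id = e.id
    LEFT JOIN times t
           ON t.employee_id = agg.employee_id AND t.event_time = agg.max_event_time
    LEFT JOIN time_notes n ON n.time_id = t.id
    ORDER BY e.id
""", (cutoff,)).fetchall()

for row in rows:
    print(row)
```

An index on times (employee_id, event_time) would be a natural thing to try alongside this, though whether it helps in Access would need to be measured.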

Eliminate duplicate rows from query output

I have a large SELECT query with multiple JOINS and WHERE clauses. Despite specifying DISTINCT (also have tried GROUP BY) - there are duplicate rows returned. I am assuming this is because the query selects several IDs from several tables. At any rate, I would like to know if there is a way to remove duplicate rows from a result set, based on a condition.
I am looking to remove duplicates from results if x.ID appears more than once. The duplicate rows all appear grouped together with the same IDs.
Query:
SELECT e.Employee_ID, ce.CC_ID as CCID, e.Manager_ID, e.First_Name, e.Last_Name, e.Last_Login,
e.Date_Created AS Date_Created, e.Employee_Password AS Password, e.EmpLogin,
ISNULL((SELECT TOP 1 1 FROM Gift g
JOIN Type t ON g.TypeID = t.TypeID AND t.Code = 'Reb'
WHERE g.Manager_ID = e.Manager_ID),0) RebGift,
i.DateCreated as ImportDate
FROM #EmployeeTemp ct
JOIN dbo.Employee e ON ct.Employee_ID = e.Employee_ID
INNER JOIN dbo.Manager m ON e.Manager_ID = m.Manager_ID
LEFT JOIN EmployeeImp i ON e.Employee_ID = i.Employee_ID AND i.Active = 1
INNER JOIN CreditCard_Updates ce ON m.Manager_ID = ce.Manager_ID
LEFT JOIN Manager m2 ON m2.Manager_ID = ce.Modified_By
WHERE ce.CCType ='R' AND m.isT4L = 1
AND CHARINDEX(e.first_name, Selected_Emp) > 0
AND ce.Processed_Flag = @isProcessed
I don't have enough reputation to add a comment, so I'll just try to help you in an answer proper (even though this is more of a comment).
It seems like what you want to do is select distinctly on just one column.
Here are some answers which look like that:
SELECT DISTINCT on one column
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

COUNT() columns by a specific value

I want to run a query on a SQL Compact 4.0 DB table with 2 COUNT() columns. The first column should count all rows ( COUNT(*) ), and the second one should count a row only when the decimal value of a specific column is greater than or equal to 3.0.
I got this far:
SELECT COUNT(a.number) AS Participant, COUNT(b.specificColumn) AS Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
This way the second COUNT() will obviously only count rows, that actually have a value != NULL
I don't think you can do it using a count. Try using a case statement. Not tested:
SELECT COUNT(a.number) AS Participant,
SUM(case when b.specificColumn >= 3 then 1 else 0 end) as Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
SELECT COUNT(a.number) AS Participant,
SUM(CASE WHEN b.specificColumn IS NULL THEN 0
WHEN b.specificColumn >= 3 THEN 1
ELSE 0 END) AS Approved
FROM person AS a
LEFT OUTER JOIN test AS b
ON b.number = a.number
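For comparison, here is a minimal sketch of the same threshold count, run against SQLite through Python's sqlite3 module (not SQL Compact 4.0, so treat the dialect as an assumption). It uses COUNT over a CASE with no ELSE, which yields NULL for non-qualifying rows so that COUNT skips them. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (number INTEGER PRIMARY KEY);
    CREATE TABLE test (number INTEGER, specificColumn REAL);
    INSERT INTO person VALUES (1), (2), (3), (4);
    -- person 4 has no test row; person 2 is below the threshold
    INSERT INTO test VALUES (1, 3.0), (2, 2.5), (3, 4.5);
""")

participants, approved = conn.execute("""
    SELECT COUNT(a.number) AS Participant,
           COUNT(CASE WHEN b.specificColumn >= 3 THEN 1 END) AS Approved
    FROM person AS a
    LEFT OUTER JOIN test AS b ON b.number = a.number
""").fetchone()

print(participants, approved)
```

The SUM(CASE ... THEN 1 ELSE 0 END) form gives the same number whenever there is at least one row. On an empty result it returns NULL, whereas the COUNT form returns 0.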