SQL Server operating with null value - sql

I'm trying to average two columns, where the second one may contain NULL values. Can someone help me work out how to perform an operation on these values?
select
nib, primeiroTitular, segundoTitular,
((YEAR(GETDATE()) - YEAR(P1.dataNascimento)) +
(YEAR(GETDATE()) - YEAR(P2.dataNascimento)))/2 as média
from
ContasBancarias
left join
Pessoas P1 on P1.bi = ContasBancarias.primeiroTitular
left join
Pessoas P2 on P2.bi = ContasBancarias.segundoTitular

Here is what you can do. I changed the calculation between the two dates to use the DATEDIFF() function. Next time, try to indent your code when you ask a question; it makes it easier for people to read.
SELECT nib
     , primeiroTitular
     , segundoTitular
       -- DATEDIFF(datepart, startdate, enddate): the birth date goes first so the age is positive.
       -- A NULL birth date is replaced by GETDATE(), so that holder contributes an age of 0.
     , ( DATEDIFF(YEAR, CASE
                            WHEN P1.dataNascimento IS NULL
                                THEN GETDATE()
                            ELSE P1.dataNascimento
                        END, GETDATE())
       + DATEDIFF(YEAR, CASE
                            WHEN P2.dataNascimento IS NULL
                                THEN GETDATE()
                            ELSE P2.dataNascimento
                        END, GETDATE()) ) / 2 AS média
FROM ContasBancarias
LEFT JOIN Pessoas P1
    ON P1.bi = ContasBancarias.primeiroTitular
LEFT JOIN Pessoas P2
    ON P2.bi = ContasBancarias.segundoTitular

Assuming only the second column can be NULL, try this:
([Column1] + ISNULL(Column2,0))/2 AS [Average]

In the interest of accuracy, I would recommend NOT INCLUDING any NULLs in your equation, since NULL means "don't know the value", "don't know what the value is", or "don't know if this value is relevant at all". To assign a zero value to NULL is to skew the results.
For the most accurate results, use a WHERE clause to exclude any row containing NULL from the calculation, or handle the NULL explicitly with the ISNULL function.
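The difference between these approaches can be sketched with a small runnable example. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the snippet is self-contained), and hypothetical precomputed ages in place of the DATEDIFF results. COALESCE is used because both engines support it; in SQL Server, ISNULL(age2, 0) would behave the same way here.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (nib TEXT, age1 INTEGER, age2 INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("A", 40, 30), ("B", 50, None)],  # account B has no second holder
)

rows = conn.execute(
    """
    SELECT nib,
           (age1 + age2) / 2.0              AS naive,       -- NULL if age2 is NULL
           (age1 + COALESCE(age2, 0)) / 2.0 AS zero_filled, -- counts the missing age as 0
           CASE WHEN age2 IS NULL THEN age1 * 1.0
                ELSE (age1 + age2) / 2.0
           END                              AS known_only   -- averages only the known ages
    FROM accounts
    ORDER BY nib
    """
).fetchall()

for row in rows:
    print(row)
```

For account B, the naive expression yields NULL, the zero-filled one halves the only known age, and the CASE version returns that age unchanged. Which of these is "right" depends on what the average is supposed to mean.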

Related

Query only takes the first value of subquery

I am creating an SQL function for a report. The parameters I receive are several strings containing primary keys separated by commas, for example:
@ID_Model_list = '1,2'
@ID_Station_list = '1,4,7,8,10'
In my SQL query I use a subquery to convert the variables I receive into a column, for example:
SELECT CAST(VALUE AS INT) AS ID FROM string_split(@ID_Model_list,',')
which would be the same as: SELECT CAST(value AS int) AS ID FROM string_split('1,2',',')
Result: a single ID column containing the rows 1 and 2.
If I add the code above to my query, it only takes the first value of the column returned by the subquery:
CREATE FUNCTION V_Scrap_Report
(@ID_Model_list varchar, @ID_Station_list varchar, @fecha datetime, @fechafin datetime)
RETURNS TABLE
AS RETURN
(SELECT S.IDScrap
, S.fecha
, M.modelo
, E.estacion
, C.correccion
, S.elemento
, P.nombre
, P.numeroparte
, Sp.cantidad
FROM dbo.Scrap S
FULL OUTER JOIN dbo.Modelo M ON S.IDModelo = M.IDModelo
FULL OUTER JOIN dbo.Estacion E ON E.IDEstacion = S.IDEstacion
FULL OUTER JOIN dbo.Scrapcorreccion Sc ON S.IDScrap = Sc.IDScrap
FULL OUTER JOIN dbo.Correccion C ON C.IDCorrecion = Sc.IDCorrecion
FULL OUTER JOIN dbo.Scraparte Sp ON S.IDScrap = Sp.IDScrap
JOIN dbo.Parte P ON Sp.IDParte = P.IDParte
WHERE S.fecha >= @fecha
AND S.fecha <= DATEADD(HOUR,23.9999,@fechafin)
AND S.IDModelo = (SELECT CAST(VALUE AS INT) AS ID FROM string_split(@ID_Model_list,','))
AND S.IDEstacion = (SELECT VALUE FROM string_split(@ID_Station_list,',')))
The above function only returns results where S.IDModelo = 1 AND S.IDEstacion = 1. It does not take into account that these combinations also exist:
S.IDModelo = 2 AND S.IDEstacion = 1
S.IDModelo = 1 AND S.IDEstacion = 4
S.IDModelo = 1 AND S.IDEstacion = 7
S.IDModelo = 1 AND S.IDEstacion = 8
S.IDModelo = 2 AND S.IDEstacion = 10
When I call the function I do it like this:
SELECT * FROM V_Scrap_Report('1,2','1,4,7,8,10','2022-07-18','2022-07-20')
Oddly, if I change ... V_Scrap_Report('1,2'... to ... V_Scrap_Report('2,1'... it only returns
S.IDModelo = 2 AND S.IDEstacion = 1
What could be missing in the query so that it does not skip matches?
The comments and Bohemian's answer give you a few specific things that are wrong with your query that you need to look at, but I think what you really need is a different understanding of what you're doing. So...
A select returns a set. (Technically a bag because it can contain duplicates, but we'll ignore that).
A set can have zero members, or one member, or more members. There's a set of integers greater than 1 and less than 4, and that set is {2, 3}. There's a set of integers less than 1 and greater than 4, and that set is {} aka "the empty set".
So a set is a collection of zero or more things, not one thing.
When you're comparing things, you can't compare a collection of things with just one thing. Suppose I have a bag of oranges in my left hand, and one orange in my right hand. Does the orange in my right hand taste the same as the bag of oranges in my left hand? The question doesn't really make sense, does it? You have to compare the one orange in my right hand with each orange in the bag, individually.
So if I ask SQL to evaluate whether orange_in_right_hand = { first_orange_in_bag, second_orange_in_bag, third_orange_in_bag } what do you want it to do? It's a question that doesn't really make sense.
You have a situation like this in your query:
where S.IDEstacion = (SELECT VALUE FROM string_split(@ID_Station_list,','))
You do a similar comparison with @ID_Model_list.
The left hand side of that operation is one value. The right hand side of that operation is the result of a select, which can return more than one value. In this case, the output of the string_split function. You are asking SQL to determine whether the one thing is equal to potentially many things.
This doesn't really make sense, as we saw with the oranges. And because it doesn't make sense, SQL Server will raise an error whenever the subquery returns more than one value.
So why aren't you getting an error? Because in your specific case, the set returned by string_split will happen to only have one member, because you have a bug in the code.
Let's look at @ID_station_list. Your input to string_split is the @ID_Station_list parameter, which you have declared as a varchar. But you didn't say how long it is. In this case, that means it will be treated as being one character long:
declare @ID_station_list varchar;
set @ID_station_list = '1,4,7,8,10';
select @ID_station_list;
What do you think this will return? It will return the string value '1'. All of the other characters got thrown away, because you didn't say @ID_station_list was a varchar big enough to hold the value you gave it. You just said it was a varchar, and SQL will assume you meant varchar(1) in this case.
So the value you are passing to the string_split function is just the value '1'. So you get one value back when you split this string.
SQL Server will then look at that and think "well, ok, you are asking me whether a single value is equal to the result of a select, which you really shouldn't be doing because it doesn't make sense, but in this particular case I will do it for you without telling you the problem because there was only one member in the set".
If you fix your parameter declaration by, say, making @ID_station_list a varchar(100) and giving it the value '1,2,3', you'll get an error.
So how should you compare the single value IDEstacion with the set returned by string_split? You tell SQL to check whether the value is in the set (using IN) instead of checking whether it equals the set (using =). Hence Stu's comment.
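The contrast between equality and membership can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the asker's SQL Server setup: SQLite has no string_split, so the list is split in Python and loaded into a helper table (wanted_stations is a hypothetical name), and the scrap rows are made up. One behavioral difference matters here: SQLite silently uses a single row of a multi-row scalar subquery, whereas SQL Server raises an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scrap (id_scrap INTEGER, id_estacion INTEGER)")
conn.executemany(
    "INSERT INTO scrap VALUES (?, ?)",
    [(1, 1), (2, 4), (3, 7), (4, 9)],  # made-up rows; station 9 is not wanted
)

# Split the comma-separated list in Python and load it into a helper table.
station_list = "1,4,7,8,10"
conn.execute("CREATE TABLE wanted_stations (id INTEGER)")
conn.executemany(
    "INSERT INTO wanted_stations VALUES (?)",
    [(int(v),) for v in station_list.split(",")],
)

# '=' against a multi-row subquery: SQLite compares against one row only;
# SQL Server would reject this with an error instead.
eq_rows = conn.execute(
    "SELECT id_scrap FROM scrap "
    "WHERE id_estacion = (SELECT id FROM wanted_stations) ORDER BY id_scrap"
).fetchall()

# IN tests membership in the whole set.
in_rows = conn.execute(
    "SELECT id_scrap FROM scrap "
    "WHERE id_estacion IN (SELECT id FROM wanted_stations) ORDER BY id_scrap"
).fetchall()

print(eq_rows)
print(in_rows)
```

The IN query returns every scrap row whose station appears anywhere in the list, which is the behavior the report needs.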
If you put where clause conditions on an outer joined table, you effectively convert the join to an inner join.
Move such conditions to the join condition:
...
FROM dbo.Scrap S
FULL OUTER JOIN dbo.Modelo M ON S.IDModelo = M.IDModelo
-- IN rather than =, because the subquery can return more than one row
AND S.IDModelo IN (SELECT CAST(VALUE AS INT) AS ID FROM string_split(@ID_Model_list,','))
FULL OUTER JOIN dbo.Estacion E ON E.IDEstacion = S.IDEstacion
AND S.IDEstacion IN (SELECT CAST(VALUE AS INT) FROM string_split(@ID_Station_list,','))
FULL OUTER JOIN dbo.Scrapcorreccion Sc ON S.IDScrap = Sc.IDScrap
FULL OUTER JOIN dbo.Correccion C ON C.IDCorrecion = Sc.IDCorrecion
FULL OUTER JOIN dbo.Scraparte Sp ON S.IDScrap = Sp.IDScrap
JOIN dbo.Parte P ON Sp.IDParte = P.IDParte
WHERE S.fecha >= @fecha
AND S.fecha <= DATEADD(HOUR,23.9999,@fechafin)

DB2 Coalesce not returning expected value

I'm trying to coalesce two last_record values, but I get a NULL result even though I know one of the values is non-NULL. When I query these values on their own, the expected non-NULL value is returned, but when they go through COALESCE I get NULL.
Portion of code:
select rds.*,
case when row_num=coalesce(bo.last_record, boa.last_record)
then closing - (rolling_debit - debit) else debit end Aged_Debt
from rolling_debit_sum rds
inner join balance_overflow bo
on rds.client_number = bo.client_number
inner join balance_overflow_aft boa
on rds.client_number = boa.client_number
where row_num >= coalesce(bo.last_record, boa.last_record)
I know that last_record is not NULL in at least one of the two cases, yet the query returns NULL for both. Any ideas what the issue might be?
The issue was within the join.
An outer join was required rather than inner.
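One way this can happen is sketched below, assuming (hypothetically) that each client appears in only one of the two overflow tables. The snippet uses Python's built-in sqlite3 module rather than DB2, with made-up rows, purely so it is self-contained. With inner joins a row survives only if it matches in both tables, so COALESCE never gets a chance to fall back; with left joins the unmatched side is NULL and COALESCE picks the other value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rolling_debit_sum    (client_number INTEGER, row_num INTEGER);
    CREATE TABLE balance_overflow     (client_number INTEGER, last_record INTEGER);
    CREATE TABLE balance_overflow_aft (client_number INTEGER, last_record INTEGER);
    INSERT INTO rolling_debit_sum    VALUES (1, 5), (2, 7);
    INSERT INTO balance_overflow     VALUES (1, 5);   -- only client 1
    INSERT INTO balance_overflow_aft VALUES (2, 7);   -- only client 2
    """
)

# Same query twice; only the join type changes.
query = """
    SELECT rds.client_number,
           COALESCE(bo.last_record, boa.last_record) AS last_record
    FROM rolling_debit_sum rds
    {join} balance_overflow bo      ON rds.client_number = bo.client_number
    {join} balance_overflow_aft boa ON rds.client_number = boa.client_number
    ORDER BY rds.client_number
"""

inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
outer_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # no client matches in both tables
print(outer_rows)  # each client gets the value from whichever table has it
```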

SELECT statement that only shows rows where there is a NULL in a specific column

I've been racking my brain over this issue; the code I have makes sense to me but still doesn't work.
Here is the question:
Give me a list of the names of all the unused (potential) caretakers and the names and types of all unclaimed pieces of art (art that does not yet have a caretaker).
Here is how the tables are set up:
CareTakers: CareTakerID, CareTakerName
Donations: DonationID, DonorID, DonatedMoney, ArtName, ArtType, ArtAppraisedPrice, ArtLocationBuilding, ArtLocationRoom, CareTakerID
Donors: DonorID, DonorName, DonorAddress
Here is the code I have:
SELECT
CareTakerName, ArtName, ArtType
FROM
CareTakers
JOIN
Donations ON CareTakers.CareTakerID = Donations.CareTakerID
WHERE
Donations.CareTakerID = ''
Any help would be very much appreciated!
I would suggest two queries for the reasons I noted in my comment on the OP above... However, since you requested one query, the following should get you what you asked for, although the result sets are not depicted side-by-side.
SELECT
CareTakerName, ArtName, ArtType
FROM
CareTakers
LEFT JOIN
Donations ON CareTakers.CareTakerID = Donations.CareTakerID
WHERE
NULLIF(Donations.CareTakerID,'') IS NULL
UNION -- Returns a stacked result set
SELECT
CareTakerName, ArtName, ArtType
FROM
CareTakers
RIGHT JOIN
Donations ON CareTakers.CareTakerID = Donations.CareTakerID
WHERE
NULLIF(CareTakers.CareTakerID,'') IS NULL
If this is not sufficient, I can supply two separate queries as I suggested above.
*EDIT: Included NULLIF with '' criteria to treat blank and NULL equally in the where clause.
Use a LEFT JOIN:
SELECT CareTakerName, ArtName, ArtType
FROM CareTakers
LEFT JOIN Donations ON CareTakers.CareTakerID = Donations.CareTakerID
WHERE Donations.CareTakerID IS NULL
Donations.CareTakerID = '' is not the same as testing for NULL. That's testing for an empty string.
You want
Donations.CareTakerID is NULL
Also note that
Donations.CaretakerID = NULL
will not give you what you want either (a common mistake.)
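The three comparisons can be tried side by side. The following is a small sketch using Python's built-in sqlite3 module with made-up donation rows (the snake_case table and column names are illustrative, not the asker's schema). The NULL comparison rules shown are standard SQL and apply to SQL Server under its default ANSI_NULLS setting as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (art_name TEXT, caretaker_id INTEGER)")
conn.executemany(
    "INSERT INTO donations VALUES (?, ?)",
    [("Vase", 1), ("Portrait", None), ("Bust", None)],  # two unclaimed pieces
)

def art_where(condition):
    """Return art names matching a WHERE condition (trusted literal, demo only)."""
    sql = f"SELECT art_name FROM donations WHERE {condition} ORDER BY art_name"
    return [name for (name,) in conn.execute(sql)]

empty_string = art_where("caretaker_id = ''")    # compares with an empty string
equals_null = art_where("caretaker_id = NULL")   # comparison with NULL is never true
is_null = art_where("caretaker_id IS NULL")      # the correct NULL test

print(empty_string, equals_null, is_null)
```

Only the IS NULL version finds the unclaimed pieces; the other two return no rows at all.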
First, you need to know what a NULL value is. Is it zero, a blank space, or something else? The answer is: none of these.
NULL is not a value; it only means that a value wasn't provided when the row was created.
SELECT d.ArtName, d.ArtType
, (SELECT CareTakerName FROM CareTakers c WHERE c.CareTakerID = d.CareTakerID) AS CareTakerName
FROM Donations d
WHERE ISNULL(d.CareTakerID, 0) = 0
*I like to use a "default" value for a NULL column
More information here: SQL NULL Values

Query taking too long - Optimization

I am having an issue with the following query returning results a bit too slow and I suspect I am missing something basic. My initial guess is the 'CASE' statement is taking too long to process its result on the underlying data. But it could be something in the derived tables as well.
The question is, how can I speed this up? Are there any glaring errors in the way I am pulling the data? Am I running into a sorting or looping issues somewhere? The query runs for about 40 seconds, which seems quite long. C# is my primary expertise, SQL is a work in progress.
Note that I am not asking you to "write my code" or "fix my code", just for a pointer in the right direction; I can't seem to figure out where the slowdown occurs. Each derived table runs very quickly (less than a second) on its own, the joins seem correct, and the result set returns exactly what I need. It's just too slow, and I'm sure there are better SQL scripters out there ;) Any tips would be greatly appreciated!
SELECT
hdr.taker
, hdr.order_no
, hdr.po_no as display_po
, cust.customer_name
, hdr.customer_id
, 'INCORRECT-LARGE ORDER' + CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK'
END AS Status
FROM
(myDb_view_oe_hdr hdr
LEFT OUTER JOIN myDb_view_customer cust
ON hdr.customer_id = cust.customer_id)
LEFT OUTER JOIN wpd_view_sales_territory_by_customer territory
ON cust.customer_id = territory.customer_id
LEFT OUTER JOIN
(select
order_no,
SUM(ext_price_calc) as ext_price_calc
from
(select
hdr.order_no,
line.item_id,
(line.qty_ordered - isnull(qty_canceled,0)) * unit_price as ext_price_calc
from myDb_view_oe_hdr hdr
left outer join myDb_view_oe_line line
on hdr.order_no = line.order_no
where
line.delete_flag = 'N'
AND line.cancel_flag = 'N'
AND hdr.projected_order = 'N'
AND hdr.delete_flag = 'N'
AND hdr.cancel_flag = 'N'
AND line.item_id not in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%', 'FUEL','NET-FUEL', 'CONVENIENCE-FEE')) as line
group by order_no) as order_total
on hdr.order_no = order_total.order_no
LEFT OUTER JOIN
(select
order_no,
count(order_no) as convenience_count
from oe_line with (nolock)
left outer join inv_mast inv with (nolock)
on oe_line.inv_mast_uid = inv.inv_mast_uid
where inv.item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%')
and oe_line.delete_flag <> 'Y'
group by order_no) as fee_count
on hdr.order_no = fee_count.order_no
INNER JOIN
(select
order_no,
unit_price
from oe_line line with (nolock)
where line.inv_mast_uid in (select inv_mast_uid from inv_mast with (nolock) where item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%'))) as fee_price
ON fee_count.order_no = fee_price.order_no
WHERE
hdr.projected_order = 'N'
AND hdr.cancel_flag = 'N'
AND hdr.delete_flag = 'N'
AND hdr.completed = 'N'
AND territory.territory_id = 'CUSTOMERTERRITORY'
AND ext_price_calc > 600.00
AND hdr.carrier_id <> '100004'
AND fee_count.convenience_count is not null
AND CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK' END <> 'OK'
Just as a clue to the right direction for optimization:
When you do an OUTER JOIN to a query with calculated columns, you are guaranteeing not only a full table scan, but that those calculations must be performed against every row in the joined table. It appears that you can actually do your join to oe_line without the column calculations (i.e. by filtering ext_price_calc to a specific range).
You don't need most of the subqueries that are in your query; the master query can be recrafted to use regular table join syntax. Joins to subqueries containing subqueries present a challenge to the SQL optimizer that it may not be able to meet. But by using regular joins, the optimizer has a much better chance of identifying more efficient query strategies.
You don't tag which SQL engine you're using. Every database has proprietary extensions that may allow for speedier or more efficient queries. It would be easier to provide useful feedback if you indicated whether you were using MySQL, SQL Server, Oracle, etc.
Regardless of the database you're using, reviewing the query plan is always a good place to start. This will tell you where most of the I/O and time in your query is being spent.
Just on general principle, make sure your statistics are up-to-date.
It may not be solvable by any of us without the real stuff to test with.
If that's the case and nobody else posts the answer, I can still help. Here is how to troubleshoot it:
(1) Take joins and pieces out one by one.
(2) This will cause errors. Remove or fake the references to get rid of them.
(3) See how that works.
(4) Put items back before you try taking something else out.
(5) Keep track...
(6) Also be aware of where removing something might drastically reduce the result set.
You might find you're missing an index or some other smoking gun.
I was having the same problem and I was able to solve it by indexing one of the tables and setting a primary key.
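The effect of adding an index can be checked by comparing query plans before and after. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, since the question does not say which engine is in use; other engines expose plans through their own tools. The oe_line table here is a simplified, hypothetical version of the one in the question, filled with made-up rows, and idx_oe_line_order_no is an illustrative index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oe_line (order_no INTEGER, item_id TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO oe_line VALUES (?, ?, ?)",
    [(n, f"ITEM-{n % 10}", 1.0) for n in range(1000)],  # made-up rows
)

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM oe_line WHERE order_no = 42"

before = plan(lookup)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_oe_line_order_no ON oe_line (order_no)")
after = plan(lookup)   # the planner can now search via the new index

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the index.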
I strongly suspect that the problem lies in the number of joins you're doing. A lot of databases do joins basically by systematically checking all possible combinations of the various tables as being valid - so if you're joining table A and B on column C, and A looks like:
Name:C
Fred:1
Alice:2
Betty:3
While B looks like:
C:Pet
1:Alligator
2:Lion
3:T-Rex
When you do the join, it checks all 9 possibilities:
Fred:1:1:Alligator
Fred:1:2:Lion
Fred:1:3:T-Rex
Alice:2:1:Alligator
Alice:2:2:Lion
Alice:2:3:T-Rex
Betty:3:1:Alligator
Betty:3:2:Lion
Betty:3:3:T-Rex
And goes through and deletes the non-matching ones:
Fred:1:1:Alligator
Alice:2:2:Lion
Betty:3:3:T-Rex
... which means with three entries in each table, it creates nine temporary records, sorts through them all, and deletes six of them ... all before it actually sorts through the results for what you're after (so if you are looking for Betty's Pet, you only want one row on that final result).
... and you're doing how many joins and sub-queries?
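The counting in this example can be reproduced directly. The sketch below uses Python's built-in sqlite3 module with the same three-row tables; the table names a and b are illustrative. It shows the number of candidate combinations versus the rows that survive the join condition. It illustrates the logical model only: an engine with suitable indexes or a hash join can often avoid materializing every combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (name TEXT, c INTEGER);
    CREATE TABLE b (c INTEGER, pet TEXT);
    INSERT INTO a VALUES ('Fred', 1), ('Alice', 2), ('Betty', 3);
    INSERT INTO b VALUES (1, 'Alligator'), (2, 'Lion'), (3, 'T-Rex');
    """
)

# Every combination of rows, before any matching: 3 x 3 candidates.
candidates = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

# The join keeps only the combinations where the C columns agree.
matches = conn.execute(
    "SELECT a.name, b.pet FROM a JOIN b ON a.c = b.c ORDER BY a.c"
).fetchall()

print(candidates)
print(matches)
```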

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL, but I am having some slight conversion problems. This was a query of mine that samples data from four tables and spits them all out in one result.
I am at a loss as to how to express this in PostgreSQL, and specifically in Django, but I am leaving that for another question. Bonus points if you can Django-fy it, but no worries if you just do it in pure SQL.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the select was where my first troubles began. It seems to be an IF true/false THEN stuff ELSE other stuff END IF, yet I can't get the syntax right. I tried to use Navicat's SQL builder, but it constantly wanted me to place everything I had selected into the GROUP BY, which I think is all kinds of wrong.
In summary, what I am looking for is a way to make this MySQL query work in PostgreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes and I am still working on my ORDER BY so till then we're just gonna cop out. Thanks again!
Have a look at this link: GROUP BY
"When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column."
You need to include all the select columns in the group by that are not part of the aggregate functions.
A few things:
Drop the backticks
Use a CASE expression instead of IF(): CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END
Change your timestampdiff to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created) (you will need to then count the number of hours in the resulting interval. It would be much easier to compare timestamps)
Fix your GROUP BY and ORDER BY
Try replacing the IF with a CASE:
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name every column or calculated column you use in the GROUP BY clause.
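Both points, the CASE expression inside SUM and the fully listed GROUP BY, can be seen together in a small sketch. It uses Python's built-in sqlite3 module with a cut-down, hypothetical version of the links and votes tables and made-up rows. SQLite itself is lenient about ungrouped columns, so the query is written the way PostgreSQL requires: every selected column that is not inside an aggregate also appears in GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE links (id INTEGER, title TEXT);
    CREATE TABLE votes (link_id INTEGER, user_id INTEGER, karma_delta INTEGER);
    INSERT INTO links VALUES (1, 'first'), (2, 'second');
    INSERT INTO votes VALUES (1, 1, 1), (1, 2, 1), (2, 2, -1);
    """
)

rows = conn.execute(
    """
    SELECT links.id,
           links.title,
           SUM(votes.karma_delta) AS karma,
           SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END) AS user_vote
    FROM links
    JOIN votes ON votes.link_id = links.id
    GROUP BY links.id, links.title   -- every non-aggregated column is listed
    ORDER BY links.id
    """
).fetchall()

for row in rows:
    print(row)
```

The user_vote column sums only the votes cast by user 1, which is what the MySQL IF() expression did in the original query.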