SQL query to add or subtract values based on another field - sql

I need to calculate the net total of a column-- sounds simple. The problem is that some of the values should be negative, as are marked in a separate column. For example, the table below would yield a result of (4+3-5+2-2 = 2). I've tried doing this with subqueries in the select clause, but it seems unnecessarily complex and difficult to expand when I start adding in analysis for other parts of my table. Any help is much appreciated!
Sign Value
Pos 4
Pos 3
Neg 5
Pos 2
Neg 2

Using a CASE statement should work in most versions of sql:
SELECT SUM( CASE
WHEN t.Sign = 'Pos' THEN t.Value
ELSE t.Value * -1
END
) AS Total
FROM YourTable AS t

Try this:
SELECT SUM(IF(sign = 'Pos', Value, Value * (-1))) as total FROM table

I am adding rows from a single field in a table based on values from another field in the same table using oracle 11g as database and sql developer as user interface.
This works:
SELECT COUNTRY_ID, SUM(
CASE
WHEN ACCOUNT IN 'PTBI' THEN AMOUNT
WHEN ACCOUNT IN 'MLS_ENT' THEN AMOUNT
WHEN ACCOUNT IN 'VAL_ALLOW' THEN AMOUNT
WHEN ACCOUNT IN 'RSC_DEV' THEN AMOUNT * -1
END) AS TI
FROM SAMP_TAX_F4
GROUP BY COUNTRY_ID;

select a= sum(Value) where Sign like 'pos'
select b = sum(Value) where Signe like 'neg'
select total = a-b
this is abit sql-agnostic, since you didnt say which db you are using, but it should be easy to adapat it to any db out there.

Related

Using the total of a column of the queried table in a case when (Hive)

Simplified example:
In hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN ‘0’
When value>0.9*sum(value) over () THEN ‘1’
ELSE ‘2’
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute this only once. Added twist, I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN ‘0’
When value>0.9*(select * from total)THEN ‘1’
ELSE ‘2’
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables ?
Don't worry about that. Let the optimizer worry about it. But, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
and change the order of the WHEN clauses in the CASE expression because as they are, the 2nd case will never succeed.
Since I/O is the major factor the slows down Hive queries, we should strive to reduce the num of stages to get better performance.
So it's better not to use a sub-query or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case window clause is the recommended way to reduce repetition of code.
Both the windowing and the sum aggregation will be computed only once. You can run explain select..., confirming that only ONE meaningful MR stage will be launched.
Edit:
1. A simple select clause on a subquery is not sth to worry about. It can be pushed down to the last phase of the subquery, so as to avoid additional MR stage.
2. Two identical aggregations residing in the same query block will only be evaluated once. So don’t worry about potential repeated calculation.

PostgreSQL - Handling empty query result

I am quite new to SQL and I am currently working on some survey results with PostgreSQL. I need to calculate percentages of each option from 5-point scale for all survey questions. I have a table with respondentid, questionid, question response value. Demographic info needed for filtering datacut is retrieved from another table. Then query is passed to result table. All queries texts for specific datacuts are generated by VBA script.
It works OK in general, however there's one problematic case - when there are no respondents for specific cut and I receive empty table as query result. If respondent count is greater than 0 but lower than calculation threshold (5 respondents) I am getting table full of NULLs which is OK. For 0 respondents I get 0 rows as result and nothing is passed to result table and it causes some displacement in final table. I am able to track such cuts as I am also calculating respondent number for overall datacut and storing it in another table. But is there anything I can do at this point - generate somehow table full of NULLs which could be inserted into result table when needed?
Thanks in advance and sorry for clumsiness in code.
WITH ItemScores AS (
SELECT
rsp.questionid,
CASE WHEN SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) < 5 THEN
NULL
ELSE
ROUND(SUM(CASE WHEN rsp.respvalue = 5 THEN 1 ELSE 0 END)/CAST(SUM(CASE
WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS DECIMAL),2)
END AS 5spercentage,
... and so on for frequencies of 1s,2s,3s and 4s
SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS QuestionTotalAnswers
FROM (
some filtering applied here [...]
) AS rsp
GROUP BY rsp.questionid
ORDER BY rsp.questionid;
INSERT INTO results_items SELECT * from ItemScores;
If you want to ensure that the questionid column won't be empty, then you must call a cte with its plain values and then left join with the table that actually you are using to make the aggregations, calcs etc. So it will generate for sure the first list and then join its values.
The example of its concept would be something like:
with calcs as (
select questionid, sum(respvalue) as sum_per_question
from rsp
group by questionid)
select distinct rsp.questionid, calcs.sum_per_question
from rsp
left join calcs on rsp.questionid = calcs.questionid

Emulate subquery with no main table in access

I can do this in SQL Server:
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
but in Access the same query show me the next error:
Query input must contain at least one table or query
So which could be the best form to emulate this? Make a query with any other table looks dirty for me.
EDIT 1:, HR.employees It may no have data, but i want show constants ('HERRAMIENTA ELÉCTRICA',''0') and 0 in the third column, maybe using isnull and this is not the problem here.
Why not to select directly:
select 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
IIF(ISNULL(sum(empid)), 0, sum(empid)) AS STOCK
from HR.employees
This simply doesn't work in Access. You need a FROM clause.
So you need to have a dummy table with one record, even if you don't use a single field from that table.
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
FROM Dummy_Table
Using this example as empty table:
with employ as
(select 2 as col from dual
minus
select 2 as col from dual)
The query is this one:
select 'HERRAM' as tipo,
0 as deprec,
coalesce(sum(col), 0) as STOCK
from employ;
coalesce(x, value) sets the column to value when X is null
In Access, you can use a system table, and Val and Nz for the zero value:
SELECT TOP 1
'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
Val(Nz((select sum(empid) from HR.employees), 0)) AS STOCK
FROM
MSysObjects

Doing Math with 2 Subquerys

I have two subquerys both calculating sums. I would like to do an Artithmetic Minus(-) with the result of both Querys . eg Query1: 400 Query2: 300 Result should be 100.
Obvious a basic - in the query does not work. The minus works as MINUS on sets. How can I solve this? Do you have any ideas?
SELECT CustumersNo FROM Custumers WHERE
(
SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo
-
SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo
)
> 500
Using Informix - sorry missed that point
To get the original syntax to work, you would need to surround the sub-selects in parentheses:
SELECT CustumersNo
FROM Custumers
WHERE ((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo)
-
(SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo)
) > 500
Note that aggregates are defined to ignore nulls in the values they aggregate in standard SQL. However, the SUM of an empty set of rows is NULL, not zero.
You can get inventive and devise ways to always have a value for each customer listed in the roe table, such as:
SELECT CustomersNo
FROM (SELECT CustomersNo, SUM(value) AS net_credit
FROM (SELECT credit AS CustomersNo, +value
UNION
SELECT debit AS CustomersNo, -value
) AS x
GROUP BY CustomersNo
) AS y
WHERE net_credit > 500;
You can also do that with an appropriate HAVING clause if you wish. Note that this avoids issues with customers who have credit entries but no debit entries or vice versa; all the entries that are present are treated appropriately.
Your misspelling (or unorthodox spelling) of 'customers' is nearly as good as 'costumers'.
Something like what you tried should work. It may be a syntax problem, and it may depend on what type of SQL you are using. However, an approach like this would be more efficient:
Update: I see you were having a problem with nulls, so I updated it to handle nulls properly.
select CustumersNo from (
select CustumersNo,
sum(coalesce(roecredit.value,0)) - sum(coalesce(roedebit.value,0))
as balance
FROM Custumers
join roe roecredit on roe.credit = Custumers.CustumersNo
join roe roedebit on roe.debit = Custumers.CustumersNo
group by CustumersNo
)
where balance > 500
Caveat: I don't have experience with Informix specifically.

Sql Server equivalent of a COUNTIF aggregate function

I'm building a query with a GROUP BY clause that needs the ability to count records based only on a certain condition (e.g. count only records where a certain column value is equal to 1).
SELECT UID,
COUNT(UID) AS TotalRecords,
SUM(ContractDollars) AS ContractDollars,
(COUNTIF(MyColumn, 1) / COUNT(UID) * 100) -- Get the average of all records that are 1
FROM dbo.AD_CurrentView
GROUP BY UID
HAVING SUM(ContractDollars) >= 500000
The COUNTIF() line obviously fails since there is no native SQL function called COUNTIF, but the idea here is to determine the percentage of all rows that have the value '1' for MyColumn.
Any thoughts on how to properly implement this in a MS SQL 2005 environment?
You could use a SUM (not COUNT!) combined with a CASE statement, like this:
SELECT SUM(CASE WHEN myColumn=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
Note: in my own test NULLs were not an issue, though this can be environment dependent. You could handle nulls such as:
SELECT SUM(CASE WHEN ISNULL(myColumn,0)=1 THEN 1 ELSE 0 END)
FROM AD_CurrentView
I usually do what Josh recommended, but brainstormed and tested a slightly hokey alternative that I felt like sharing.
You can take advantage of the fact that COUNT(ColumnName) doesn't count NULLs, and use something like this:
SELECT COUNT(NULLIF(0, myColumn))
FROM AD_CurrentView
NULLIF - returns NULL if the two passed in values are the same.
Advantage: Expresses your intent to COUNT rows instead of having the SUM() notation.
Disadvantage: Not as clear how it is working ("magic" is usually bad).
I would use this syntax. It achives the same as Josh and Chris's suggestions, but with the advantage it is ANSI complient and not tied to a particular database vendor.
select count(case when myColumn = 1 then 1 else null end)
from AD_CurrentView
How about
SELECT id, COUNT(IF status=42 THEN 1 ENDIF) AS cnt
FROM table
GROUP BY table
Shorter than CASE :)
Works because COUNT() doesn't count null values, and IF/CASE return null when condition is not met and there is no ELSE.
I think it's better than using SUM().
Adding on to Josh's answer,
SELECT COUNT(CASE WHEN myColumn=1 THEN AD_CurrentView.PrimaryKeyColumn ELSE NULL END)
FROM AD_CurrentView
Worked well for me (in SQL Server 2012) without changing the 'count' to a 'sum' and the same logic is portable to other 'conditional aggregates'. E.g., summing based on a condition:
SELECT SUM(CASE WHEN myColumn=1 THEN AD_CurrentView.NumberColumn ELSE 0 END)
FROM AD_CurrentView
It's 2022 and latest SQL Server still doesn't have COUNTIF (along with regex!). Here's what I use:
-- Count if MyColumn = 42
SELECT SUM(IIF(MyColumn = 42, 1, 0))
FROM MyTable
IIF is a shortcut for CASE WHEN MyColumn = 42 THEN 1 ELSE 0 END.
Not product-specific, but the SQL standard provides
SELECT COUNT() FILTER WHERE <condition-1>,
COUNT() FILTER WHERE <condition-2>, ...
FROM ...
for this purpose. Or something that closely resembles it, I don't know off the top of my hat.
And of course vendors will prefer to stick with their proprietary solutions.
Why not like this?
SELECT count(1)
FROM AD_CurrentView
WHERE myColumn=1
I had to use COUNTIF() in my case as part of my SELECT columns AND to mimic a % of the number of times each item appeared in my results.
So I used this...
SELECT COL1, COL2, ... ETC
(1 / SELECT a.vcount
FROM (SELECT vm2.visit_id, count(*) AS vcount
FROM dbo.visitmanifests AS vm2
WHERE vm2.inactive = 0 AND vm2.visit_id = vm.Visit_ID
GROUP BY vm2.visit_id) AS a)) AS [No of Visits],
COL xyz
FROM etc etc
Of course you will need to format the result according to your display requirements.
SELECT COALESCE(IF(myColumn = 1,COUNT(DISTINCT NumberColumn),NULL),0) column1,
COALESCE(CASE WHEN myColumn = 1 THEN COUNT(DISTINCT NumberColumn) ELSE NULL END,0) AS column2
FROM AD_CurrentView