Grouping multiple column operations in 1 table scan - sql

I need to translate SAS code (PROC SQL) to (postgres) SQL, especially the calculated keyword in SAS that allow a variable defined in the query to be re-used directly in the same query for another variable computation:
SELECT
id,
sum( case
when (sales > 0) then 1
when (sales = 0) then 0
else -1
end) as pre_freq,
(case
when calculated pre_freq > 0 then calculated pre_freq
else 1
end) as freq
FROM my_table
GROUP BY id
This is not possible (AFAIK) in SQL, so I need to break down each step of the computation.
I was wondering what was the best option, knowing that, from my understanding, it is better to have more computations and fewer table scans, i.e. make as much as computation during a scan, instead of multiple table scans with a small computation steps.
In this particular exemple I could use:
SELECT
id
, greatest(1, sum( case
when (sales > 0) then 1
when (sales = 0) then 0
else -1
end) as freq
FROM
my_table
GROUP BY id
or:
SELECT
id
, (case when sum(case
when (sales > 0) then 1
when (sales < 0) then -1
else 0
end) > 0 then sum(case
when (sales > 0) then 1
when (sales < 0) then -1
else 0
end) else 1 end) as freq
FROM
my_table
GROUP BY id
... which is starting to be hard to read...
Is there anyway to define a variable for a snippet of SQL code that will be repeated?
More generally speaking that this illustration, was is the best (most efficient) approach?

calculated is a nice feature of proc sql. However, you cannot re-use aliases in databases in general (this is not a Postgres-specific limitation). A simple method is to use a subquery or CTE:
select id, pre_freq,
(case when pre_freq > 0 then pre_freq
else 1
end) as freq
from (select id,
sum(case when (sales > 0) then 1
when (sales = 0) then 0
else -1
end) as pre_freq,
from my_table t
group by id
) t;
However, the simplest solution is to use sign():
select id, sum(sign(sales)) as pre_freq,
greatest(sum(sign(sales)), 1) as freq
from my_table t
group by id;
Note: This is slightly different. It basically ignores NULL values. If you really need to treat NULL as -1, then use coalesce().

Related

How to count defined values (e.g. -1) in each column of a big table?

My task is to analyze a big table (250 columns, millions of rows). I need to find out, how many defined values (e.g. -1) there are in each column. I have a solution that loops through the columns of my table and uses the methods described in the following links:
Fastest way to count exact number of rows in a very large table?
https://learn.microsoft.com/de-de/archive/blogs/martijnh/sql-serverhow-to-quickly-retrieve-accurate-row-count-for-table
However, I have to do:
select column into #tab from MyBigTable where column = -1
And then apply the methods to #tab.
Do you see any way how this can be efficiently dealt with?
You could conditionally aggregate
select sum(case when col1 = -1 then 1 else 0 end) col1sum,
sum(case when col2 = -1 then 1 else 0 end) col2sum,
...
...
sum(case when coln = -1 then 1 else 0 end) colnsum
from yourtable

SQL or operator how to use with having

i have a table which i am joining with with operator. I can have 2 combinations id that table FDEL - 1 or 0 and FDVE 1 or 0, what i would like to do is to dispay if - item has fdve, or item has fdel (and count) but it doesnt work (i can see all fdve, or all fdel)
select
lpad(purchase_id,10,0) as purchase_id,
sum(has_label_fdel) as FDEL_count,
case when LABELS like '%FDVE%' then 1 else 0 end as HAS_LABEL_FDVE,
sum(has_label_fdve) as FDVE_count
from
"SRC_ORACLEIWP"."PURCHASE_ANALYSIS_RULES"
group by
lpad(purchase_id,10,0),has_label_fdve
having FDVE_count>0 -- FDEL_count>0
You want one resut row per product, so group by product only. Use SUMfor counting and MAX for the aggregated yes/no.
select
lpad(purchase_id,10,0) as padded_purchase_id,
max(has_label_fdve) as has_labels_fdve,
sum(has_label_fdve) as fdve_count,
max(has_label_fdel) as has_labels_fdel,
sum(has_label_fdel) as fdel_count
from src_oracleiwp.purchase_analysis_rules
group by padded_purchase_id
having has_labels_fdve = 1
or has_labels_fdel = 1
order by padded_purchase_id;
I've changed your alias names slightly, so they are digfferent from the columns you have (because such ambiguities can sometimes lead to problems).
The check on labels like '%FDVE%' is unnecessary, because you already have the has_label_fdve flag, which is always 0 or 1. Or so it seems. If the flags can be null, use COALESCE on them or do use LIKE expressions.
If you don't have has_label_fdve and has_label_fdel yet, use the labels column instead:
select
lpad(purchase_id,10,0) as padded_purchase_id,
max(case when labels like '%FDVE%' then 1 else 0 end) as has_labels_fdve,
sum(case when labels like '%FDVE%' then 1 else 0 end) as fdve_count,
max(case when labels like '%FDEL%' then 1 else 0 end) as has_labels_fdel,
sum(case when labels like '%FDEL%' then 1 else 0 end) as fdel_count
from src_oracleiwp.purchase_analysis_rules
group by padded_purchase_id
having has_labels_fdve = 1
or has_labels_fdel = 1
order by padded_purchase_id;

How to get output in sql two rates into two column in same table

Please help on this. Sometimes my title maybe wrong. Actually i'm unable to explain the problem in word. See below images. Image 1 is db table structure. Image 2 is what I expect result.
I used mentioned query and got result as below image. Also I need to remove 'NULL's and same URGENT_LEVEL values in one row. How i do that? Using ms-sql server.
select TRACKING_NUMBER,URGENT_LEVEL,
case when FROM_KM = '0' then Charge end as 'Under1km' ,
case when FROM_KM='1' then Charge end as '1-100KM'
from my_table
where TRACKING_NUMBER = 'TEST001'
After query
You can use conditional aggregation:
select tracking_number, urgent_level,
sum(case when to_km - from_km <= 1 then charge else 0 end) as charge_under_1,
sum(case when to_km - from_km > 1 and to_km - from_km <= 100 then charge else 0 end) as charge_1_100
from t
group by tracking_number, urgent_level;
I don't know which database you are using, but you need to use group by clause, and something similar to case expression (in oracle) as below :
select TRACKING_NUMBER
, URGENT_LEVEL
, max(case when FROM_KM = 0 and TO_KM = 1 then CHARGE end) as "UNDER 1KM"
, max(case when FROM_KM = 1 and TO_KM = 100 then CHARGE end) as "1-100KM"
from your_table
where TRACKING_NUMBER = 'TEST001'
group by TRACKING_NUMBER, URGENT_LEVEL
order by "UNDER 1KM"
;

Sum a column and perform more calculations on the result? [duplicate]

This question already has an answer here:
How to use an Alias in a Calculation for Another Field
(1 answer)
Closed 3 years ago.
In my query below I am counting occurrences in a table based on the Status column. I also want to perform calculations based on the counts I am returning. For example, let's say I want to add 100 to the Snoozed value... how do I do this? Below is what I thought would do it:
SELECT
pu.ID Id, pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed,
Snoozed + 100 AS Test
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name
I get this error:
Invalid column name 'Snoozed'.
How can I take the value of the previous SUM statement, add 100 to it, and return it as another column? What I was aiming for is an additional column labeled Test that has the Snooze count + 100.
You can't use one column to create another column in the same way that you are attempting. You have 2 options:
Do the full calculation (as #forpas has mentioned in the comments above)
Use a temp table or table variable to store the data, this way you can get the first 5 columns, and then you can add the last column or you can select from the temp table and do the last column calculations from there.
You can not use an alias as a column reference in the same query. The correct script is:
SELECT
pu.ID Id, pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END)+100 AS Snoozed
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name
MSSQL does not allow you to reference fields (or aliases) in the SELECT statement from within the same SELECT statement.
To work around this:
Use a CTE. Define the columns you want to select from in the CTE, and then select from them outside the CTE.
;WITH OurCte AS (
SELECT
5 + 5 - 3 AS OurInitialValue
)
SELECT
OurInitialValue / 2 AS OurFinalValue
FROM OurCte
Use a temp table. This is very similar in functionality to using a CTE, however, it does have different performance implications.
SELECT
5 + 5 - 3 AS OurInitialValue
INTO #OurTempTable
SELECT
OurInitialValue / 2 AS OurFinalValue
FROM #OurTempTable
Use a subquery. This tends to be more difficult to read than the above. I'm not certain what the advantage is to this - maybe someone in the comments can enlighten me.
SELECT
5 + 5 - 3 AS OurInitialValue
FROM (
SELECT
OurInitialValue / 2 AS OurFinalValue
) OurSubquery
Embed your calculations. opinion warning This is really sloppy, and not a great approach as you end up having to duplicate code, and can easily throw columns out-of-sync if you update the calculation in one location and not the other.
SELECT
5 + 5 - 3 AS OurInitialValue
, (5 + 5 - 3) / 2 AS OurFinalValue
You can't use a column alias in the same select. The column alias do not precedence / sequence; they are all created after the eval of the select result, just before group by and order by.
You must repeat code :
SELECT
pu.ID Id,pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END)+ 100 AS Test
FROM
Prospects p
INNER JOIN
ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE
p.Store = '108'
GROUP BY
pu.Name, pu.Id
ORDER BY
Name
If you don't want to repeat the code, use a subquery
SELECT
ID, Name, LeadCount, Working, Uninterested,Converted, Snoozed, Snoozed +100 AS test
FROM
(SELECT
pu.ID Id,pu.Name Name,
COUNT(*) LeadCount,
SUM(CASE WHEN Status = 'Working' THEN 1 ELSE 0 END) AS Working,
SUM(CASE WHEN Status = 'Uninterested' THEN 1 ELSE 0 END) AS Uninterested,
SUM(CASE WHEN Status = 'Converted' THEN 1 ELSE 0 END) AS Converted,
SUM(CASE WHEN SnoozedId > 0 THEN 1 ELSE 0 END) AS Snoozed
FROM Prospects p
INNER JOIN ProspectsUsers pu on p.OwnerId = pu.SalesForceId
WHERE p.Store = '108'
GROUP BY pu.Name, pu.Id) t
ORDER BY Name
or a view

SQL 2 counts with different filter

I have a table and I need calculate two aggregate functions with different conditions in one statement. How can I do this?
Pseudocode below:
SELECT count(CoumntA) *< 0*, count(CoumntA) * > 0*
FROM dbo.TableA
This is the same idea as tombom's answer, but with SQL Server syntax:
SELECT
SUM(CASE WHEN CoumntA < 0 THEN 1 ELSE 0 END) AS LessThanZero,
SUM(CASE WHEN CoumntA > 0 THEN 1 ELSE 0 END) AS GreaterThanZero
FROM TableA
As #tombom demonstrated, this can be done as a single query. But it doesn't mean that it should be.
SELECT
SUM(CASE WHEN CoumntA < 0 THEN 1 ELSE 0 END) AS less_than_zero,
SUM(CASE WHEN CoumntA > 0 THEN 1 ELSE 0 END) AS greater_than_zero
FROM
TableA
The time when this is not so good is...
- There is an index on CoumntA
- Most values (50% or more feels about right) are exactly zero
In that case, two queries will be faster. This is because each query can use the index to quickly home in on the section to be counted. In the end only counting the relevant records.
The example I gave, however, scans the whole table every time. Only once, but always the whole table. This is worth it when you're counting most of the records. In your case it looks liek you're counting most or all of them, and so this is probably a good way of doing it.
It is possible to do this in one select statement.
The way I've done it before is like this:
SELECT SUM(CASE WHEN ColumnA < 0 THEN 1 END) AS LessThanZero,
SUM(CASE WHEN ColumnA > 0 THEN 1 END) AS GreaterThanZero
FROM dbo.TableA
This is the correct MS SQL syntax and I believe this is a very efficient way of doing it.
Don't forget you are not covering the case when ColumnA = 0!
select '< 0' as filter, COUNT(0) as cnt from TableA where [condition 1]
union
select '> 0' as filter, COUNT(0) as cnt from TableA where [condition 2]
Be sure that condition 1 and condition 2 create a partition on the original set of records, otherwise same records could be counted in both groups.
For SQL Server, one way would be;
SELECT COUNT(CASE WHEN CoumntA<0 THEN 1 ELSE NULL END),
COUNT(CASE WHEN CoumntA>0 THEN 1 ELSE NULL END)
FROM dbo.TableA
Demo here.
SELECT
SUM(IF(CoumntA < 0, 1, 0)) AS lowerThanZero,
SUM(IF(CoumntA > 0, 1, 0)) AS greaterThanZero
FROM
TableA
Is it clear what's happening? Ask, if you have any more questions.
A shorter form would be
SELECT
SUM(CoumntA < 0) AS lowerThanZero,
SUM(CoumntA > 0) AS greaterThanZero
FROM
TableA
This is possible, since in MySQL a true condition is equal 1, a false condition is equal 0
EDIT: okay, okay, sorry, don't know why I thought it's about MySQL here.
See the other answers about correct syntax.