Group Numerical Data Within Specific Ranges - sql

I want to group the data from one of the tables in a SQL database. The field to be used for grouping is numerical, and the groups I want do not follow any mathematical formula, e.g.:
Group 0: value unknown (NULL)
Group 1: values 0 - 499
Group 2: values 500 - 999
Group 3: values 1000 - 1999
Group 4: values 2000 - 4999
Group 5: values 5000+
How can I achieve this with T-SQL?
By the way, I want to do this to display in a Crystal Report, so if there is a better way of doing this in Crystal Reports rather than through the SQL SELECT statement, then please advise.

You can use a CASE expression to create a custom group. If you define the CASE in a subquery, you don't have to repeat the definition. Note that the subquery needs its own FROM clause; YourTable below stands in for your table name:
select Grp
     , sum(col2)
     , avg(col3)
from (
    select case
               when col1 is null then 0
               when col1 between 0 and 499 then 1
               when col1 between 500 and 999 then 2
               ...
           end as Grp
         , *
    from YourTable
) as SubQueryAlias
group by Grp
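Filled in for all six groups from the question, it could look like the following sketch (MyTable, Amount and OtherCol are placeholder names; substitute your own table, grouping column and aggregated column):
select Grp
     , count(*)      as RowsInGroup
     , sum(OtherCol) as Total
from (
    select case
               when Amount is null               then 0
               when Amount between 0    and 499  then 1
               when Amount between 500  and 999  then 2
               when Amount between 1000 and 1999 then 3
               when Amount between 2000 and 4999 then 4
               when Amount >= 5000               then 5
           end as Grp
         , *
    from MyTable
) as SubQueryAlias
group by Grp
order by Grp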

Related

SQL COUNT with condition and without - using JOIN

My goal is something like the following table:
Key | Count since date X | Count total
1 | 4 | 28
With two simple selects I could get these values (the key of the table consists of 3 columns: t$ncmp, t$trav, t$seqn):
1. SELECT COUNT(*) FROM db.table WHERE t$date >= sysdate-2 GROUP BY t$ncmp, t$trav, t$seqn
2. SELECT COUNT(*) FROM db.table GROUP BY t$ncmp, t$trav, t$seqn
How can I join these statements?
What I tried:
SELECT n.t$trav, COUNT(n.t$trav), m.total FROM db.table n
LEFT JOIN (SELECT t$ncmp, t$trav, t$seqn, COUNT(*) as total FROM db.table
GROUP BY t$ncmp, t$trav, t$seqn) m
ON (n.t$ncmp = m.t$ncmp AND n.t$trav = m.t$trav AND n.t$seqn = m.t$seqn)
WHERE n.t$date >= sysdate-2
GROUP BY n.t$ncmp, n.t$trav, n.t$seqn
I tried different variants, but always got errors like 'group by is missing' or 'unknown qualifier'.
Now this at least executes, but total is always 2.
T$TRAV COUNT(N.T$TRAV) TOTAL
4 2 2
29 3 2
51 1 2
62 2 2
16 1 2
....
If it matter, I will run this as an OPENQUERY from MSSQLSERVER to Oracle-DB.
I'd try
GROUP BY n.t$trav, m.total
You typically GROUP BY the same columns as you SELECT, except those that are arguments to aggregate (set) functions.
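Put into the full statement from your question, that suggestion would look roughly like this (untested):
SELECT n.t$trav, COUNT(n.t$trav), m.total
FROM db.table n
LEFT JOIN (SELECT t$ncmp, t$trav, t$seqn, COUNT(*) AS total
           FROM db.table
           GROUP BY t$ncmp, t$trav, t$seqn) m
  ON (n.t$ncmp = m.t$ncmp AND n.t$trav = m.t$trav AND n.t$seqn = m.t$seqn)
WHERE n.t$date >= sysdate-2
GROUP BY n.t$trav, m.total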
"My goal is something like the following table"
If so, you seem to want conditional aggregation:
select key, count(*) as total,
sum(case when datecol >= date 'xxxx-xx-xx' then 1 else 0 end) as total_since_x
from t
group by key;
I'm not sure how this relates to your sample queries, though; I don't see the relationship between that code and your question.
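If t$ncmp, t$trav and t$seqn together form the key, the same idea mapped onto your table would look roughly like this (untested sketch; adjust the sysdate-2 condition to whatever defines "date X"):
SELECT t$ncmp, t$trav, t$seqn,
       COUNT(*) AS count_total,
       SUM(CASE WHEN t$date >= sysdate-2 THEN 1 ELSE 0 END) AS count_since_x
FROM db.table
GROUP BY t$ncmp, t$trav, t$seqn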

Need sum of a column from a filter condition for each row

I need the total sum of Defect between the MAIN_DT column and the 365 days (one year) before it, if any, for a given ID.
The value needs to be populated for each row.
I have tried the queries below, and also tried to use CSUM, but it's not working:
1) select sum(Defect) as "sum",Id,MAIN_DT
from check_diff
where MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by 2,3;
2)select Defect,
Type1,
Type2,
Id,
MAIN_DT,
ADD_MONTHS(TIM_MAIN_DT,-12) year_old,
CSUM(Defect,MAIN_DT)
from check_diff
where
MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by id;
The expected output is as below:
Defect Type1 Type2 Id main_dt sum
1 a a 1 3/10/2017 1
99 a a 1 4/10/2018 99
0 a b 1 7/26/2018 99
1 a b 1 11/21/2018 100
1 a c 2 12/20/2018 1
Teradata doesn't support RANGE frames for cumulative sums, but you can rewrite it using a correlated scalar subquery:
select Defect, Id, MAIN_DT,
       ( select sum(t2.Defect)
         from check_diff as t2
         where t2.Id = t1.Id
           and t2.MAIN_DT > ADD_MONTHS(t1.MAIN_DT,-12)
           and t2.MAIN_DT <= t1.MAIN_DT
       ) as "sum"
from check_diff as t1
Performance might be bad depending on the overall number of rows and the number of rows per ID.
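For comparison only: on a database that does support RANGE frames with an interval offset (PostgreSQL 11+, for example), the windowed form of roughly the same calculation would look like the sketch below. It is not valid Teradata syntax, and the exact boundary handling differs slightly from the ADD_MONTHS version:
SELECT Defect, Id, MAIN_DT,
       SUM(Defect) OVER (PARTITION BY Id
                         ORDER BY MAIN_DT
                         RANGE BETWEEN INTERVAL '1' YEAR PRECEDING
                                   AND CURRENT ROW) AS "sum"
FROM check_diff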

Closest value (MS Access)

Is it possible to get the closest (smaller) value of the same ID in an MS Access database?
It should only need one expression, but I don't know how to write it properly.
It should look like this:
ID Value Value2
1 10 0
2 5 0
1 20 10
1 50 20
2 15 5
...and so on..........
I have tried to use Dlookup and Max functions but I failed.
Thank you for any help.
You seem to want the previous value. There is no way to do this unless you have a column specifying the ordering. If so:
select t.*,
       (select top 1 t2.value
        from t as t2
        where t2.id = t.id
          and t2.? < t.?
        order by t2.? desc
       ) as prev_value
from t;
The ? is for the column that specifies the ordering.
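For the sample data, assuming the table is named t with columns ID and Value, and Value itself provides the ordering, the placeholders could be filled in like this (rows that have no smaller value come back as Null; in Access you can wrap the result in Nz() if you need the 0 shown in the example):
select t.ID, t.Value,
       (select top 1 t2.Value
        from t as t2
        where t2.ID = t.ID
          and t2.Value < t.Value
        order by t2.Value desc
       ) as Value2
from t;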

SQL aggregate rows with same id , specific value in secondary column

I'm looking to filter out rows in the database (PostgreSQL) if a certain value occurs in the status column. The idea is to sum the amount column only for references whose rows all have status 1. The query should not select a reference at all if it also has a status of 2, or any other status for that matter. status refers to the state of the transaction.
Current data table:
reference | amount | status
1 100 1
2 120 1
2 -120 2
3 200 1
3 -200 2
4 450 1
Result:
amount | status
550 1
I've simplified the data example but I think it gives a good idea of what I'm looking for.
I'm unsuccessful in selecting only references that only have status 1.
I've tried sub-queries, using the HAVING clause and other methods without success.
Thanks
Here's a way using NOT EXISTS: sum the rows whose status is 1 and for which no other row with the same reference has a status other than 1.
select sum(amount) from mytable t1
where status = 1
and not exists (
select 1 from mytable t2
where t2.reference = t1.reference
and t2.status <> 1
)
SELECT SUM(amount)
FROM mytable
WHERE reference NOT IN (
    SELECT reference
    FROM mytable
    WHERE status <> 1
)
The subquery SELECTs all references that must be excluded, then the main query sums everything except them.
select sum (amount) as amount
from (
select sum(amount) as amount
from t
group by reference
having not bool_or(status <> 1)
) s;
amount
--------
550
You could use window functions to count occurrences of a status different from 1 for each reference:
SELECT SUM(amount) AS amount
FROM (SELECT *, COUNT(*) FILTER (WHERE status <> 1) OVER (PARTITION BY reference) cnt
      FROM mytable) AS sub
WHERE cnt = 0;

SQL group table by "leading rows" without pl/sql

I have this table (short example) with two columns
1 a
2 a
3 a3
4 a
5 a
6 a6
7 a
8 a8
9 a
and I would like to group/partition them into groups separated by those "leading" rows, ideally adding another column like this so I can address those groups easily:
1 a 0
2 a 0
3 a3 3
4 a 3
5 a 3
6 a6 6
7 a 6
8 a8 8
9 a 8
The problem is that the setup of the table is dynamic, so I can't use the lag or lead functions statically. Any ideas how to do this without PL/pgSQL in Postgres 9.5?
Assuming the leading part is a single character, the expression right(data, -1) works to extract the group name; adapt it to your actual prefix.
The solution uses two window functions, which can't be nested, so we need a subquery or a CTE.
SELECT id, data
, COALESCE(first_value(grp) OVER (PARTITION BY grp_nr ORDER BY id), '0') AS grp
FROM (
SELECT *, NULLIF(right(data, -1), '') AS grp
, count(NULLIF(right(data, -1), '')) OVER (ORDER BY id) AS grp_nr
FROM tbl
) sub;
Produces your desired result exactly.
NULLIF(right(data, -1), '') to get the effective group name or NULL if none.
count() only counts non-null values, so we get a higher count for every new group in the subquery.
In the outer query, we take the first grp value per grp_nr as group name and default to '0' with COALESCE for the first group without name (which has a NULL as group name so far).
We could use min() or max() as outer window function as well, since there is only one non-null value per partition anyway. first_value() is probably cheapest since the rows are sorted already.
Note that the group name grp is of data type text. You may want to cast it to integer if those suffixes are clean, reliable integer numbers.
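For example, the outer expression from the query above could simply be cast (assuming the suffixes really are integers):
COALESCE(first_value(grp) OVER (PARTITION BY grp_nr ORDER BY id), '0')::int AS grp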
This can be achieved by setting rows containing a plain 'a' to one value and all other rows to a different value, then using a cumulative sum to get the desired number for the rows. The group number moves to the next number whenever a new value appears in the val column, and all the following rows with a plain 'a' keep the same group number as the one before, and so on.
I assume you just need a distinct number for each group and that the actual number doesn't matter.
select id, val, sum(ex) over(order by id) cm_sum
from (select t.*
,case when val = 'a' then 0 else 1 end ex
from t) x
The result for the query above with the data in question, would be
id val cm_sum
--------------
1 a 0
2 a 0
3 a3 1
4 a 1
5 a 1
6 a6 2
7 a 2
8 a8 3
9 a 3
With the given data, you can use a cumulative max:
select . . .,
coalesce(max(substr(col2, 2)) over (order by col1), '0')
If you don't strictly want the maximum, then it gets a bit more difficult. The ANSI solution is to use the IGNORE NULLS option of LAG(). However, Postgres does not (yet) support that. An alternative is:
select . . ., coalesce(substr(reft.col2, 2), '0')
from (select . . .,
             max(case when col2 like 'a_%' then col1 end) over (order by col1) as ref_col1
      from t
     ) tt join
     t reft
     on tt.ref_col1 = reft.col1
You can also try this:
with mytable as (
    select split_part(t,' ',1)::integer id, split_part(t,' ',2) myvalue
    from (select unnest(string_to_array($$1 a;2 a;3 a3;4 a;5 a;6 a6;7 a;8 a8;9 a$$, ';')) t) a
)
select id, myvalue, myresult
from mytable
join (
    select COALESCE(NULLIF(substr(myvalue,2),''),'0') myresult,
           idmin id_down,
           COALESCE(lead(idmin) over (order by myvalue), 999999999999) id_up
    from (
        select myvalue, min(id) idmin
        from mytable
        group by 1
    ) a
) b
on id between id_down and id_up - 1