Teradata: how can I use COUNT(DISTINCT)? - sql

I saw some similar questions but I still couldn't figure it out.
There are many columns, but in short, I want to count distinct values of C1 per G1.
I thought I could just use COUNT(DISTINCT)...?
Please help me with this problem.
Thank you so much in advance.
G1  C1  expected result
1   A   2
1   A   2
1   B   2
2   A   3
2   B   3
2   A   3
2   C   3
3   A   1
3   A   1
3   A   1
3   A   1

Most (all?) databases do not support using COUNT(DISTINCT ...) as an analytic function. So in this case, I would suggest just joining to a subquery which finds the distinct counts:
SELECT t1.G1, t1.C1, t2.cnt
FROM yourTable t1
INNER JOIN
(
    SELECT G1, COUNT(DISTINCT C1) AS cnt
    FROM yourTable
    GROUP BY G1
) t2
    ON t2.G1 = t1.G1
ORDER BY t1.G1, t1.C1;
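As a runnable sketch, here is the same join-to-a-subquery query against SQLite via Python's built-in sqlite3 module; the table name yourTable and the sample rows mirror the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (G1 INTEGER, C1 TEXT);
INSERT INTO yourTable VALUES
    (1,'A'),(1,'A'),(1,'B'),
    (2,'A'),(2,'B'),(2,'A'),(2,'C'),
    (3,'A'),(3,'A'),(3,'A'),(3,'A');
""")

# Join every row back to its group's distinct count of C1.
rows = conn.execute("""
SELECT t1.G1, t1.C1, t2.cnt
FROM yourTable t1
INNER JOIN (
    SELECT G1, COUNT(DISTINCT C1) AS cnt
    FROM yourTable
    GROUP BY G1
) t2 ON t2.G1 = t1.G1
ORDER BY t1.G1, t1.C1
""").fetchall()
# Every row carries its group's distinct count: G1=1 -> 2, G1=2 -> 3, G1=3 -> 1
```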

You can get a windowed count distinct, but you always need two functions, one possible way is:
SELECT G1, C1, max(dr) over (partition by G1) as cnt
FROM
(
    SELECT G1, C1,
           dense_rank() over (partition by G1 order by C1) AS dr
    FROM yourTable
) as dt
Depending on your data and actual query this might perform better than Tim's query :-)
Of course, this can be modified for NULLable columns by flagging the 1st occurrence:
SELECT G1, C1, sum(flag) over (partition by G1) as cnt
FROM
(
    SELECT G1, C1,
           case
              when lag(C1) over (partition by G1 order by C1) = C1
              then null
              else 1
           end as flag
    FROM yourTable
) as dt
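The MAX(DENSE_RANK()) variant can be sketched the same way with sqlite3 (SQLite supports window functions from 3.25 on); table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (G1 INTEGER, C1 TEXT);
INSERT INTO yourTable VALUES
    (1,'A'),(1,'A'),(1,'B'),
    (2,'A'),(2,'B'),(2,'A'),(2,'C'),
    (3,'A'),(3,'A'),(3,'A'),(3,'A');
""")

# dense_rank over C1 within each G1 peaks at the number of distinct C1 values,
# so MAX(dr) over the partition is the windowed distinct count.
rows = conn.execute("""
SELECT G1, C1, MAX(dr) OVER (PARTITION BY G1) AS cnt
FROM (
    SELECT G1, C1,
           DENSE_RANK() OVER (PARTITION BY G1 ORDER BY C1) AS dr
    FROM yourTable
) AS dt
ORDER BY G1, C1
""").fetchall()
```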

You can use the query below, though note it returns one count per group rather than a count on every row:
SELECT COUNT(DISTINCT C1) FROM TableName GROUP BY G1

If your database doesn't support count(distinct) as a window function, you can use the handy trick of summing two dense_rank()s:
select t.*,
(-1 +
dense_rank() over (partition by g1 order by c1 asc) +
dense_rank() over (partition by g1 order by c1 desc)
) as expected
from t;
Given that count(distinct) as a window function is easily implemented this way, I am surprised that many databases do not support this functionality directly.
One nuance: This counts NULL as a valid value. You don't have NULL values in your sample data so I don't think this affects you. But, if you wanted an exact equivalent:
select t.*,
( (case when count(*) over (partition by g1) = count(c1) over (partition by g1)
then -1 else -2
end) +
dense_rank() over (partition by g1 order by c1 asc) +
dense_rank() over (partition by g1 order by c1 desc)
) as expected
from t;
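The two-dense_rank trick is easy to check with sqlite3 against the question's sample data (a sketch; the table name yourTable is assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (G1 INTEGER, C1 TEXT);
INSERT INTO yourTable VALUES
    (1,'A'),(1,'A'),(1,'B'),
    (2,'A'),(2,'B'),(2,'A'),(2,'C'),
    (3,'A'),(3,'A'),(3,'A'),(3,'A');
""")

# For any row, rank ascending + rank descending - 1 equals the number of
# distinct values in the partition, regardless of which value the row holds.
rows = conn.execute("""
SELECT G1, C1,
       -1 + DENSE_RANK() OVER (PARTITION BY G1 ORDER BY C1 ASC)
          + DENSE_RANK() OVER (PARTITION BY G1 ORDER BY C1 DESC) AS expected
FROM yourTable
ORDER BY G1, C1
""").fetchall()
```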

Related

Oracle get rank for only latest date

I have an origin table A:

dt         c1  value
2022/10/1  1   1
2022/10/2  1   2
2022/10/3  1   3
2022/10/1  2   4
2022/10/2  2   6
2022/10/3  2   5
Currently I get the latest dt's percent_rank with:
select * from
(
select
*,
percent_rank() over (partition by c1 order by value) as prank
from A
) as pt
where pt.dt = Date'2022-10-3'
Demo: https://www.db-fiddle.com/f/rXynTaD5nmLqFJdjDSCZpL/0
the expected result looks like:

dt         c1  value  prank
2022/10/3  1   3      1
2022/10/3  2   5      0.5
Which means that at 2022-10-3, the value in the c1=1 group has a percent_rank of 100% over its history, while in the c1=2 group it is 50%.
But this SQL will sort every partition, which I believe has O(n log n) time complexity.
I just need the latest date's rank, and I thought I could get that by calculating count(last_value > value)/count(), which costs O(n).
Any suggestions?
Rather than hard-coding the maximum date, you can use the ROW_NUMBER() analytic function:
SELECT *
FROM (
SELECT t.*,
PERCENT_RANK() OVER (PARTITION BY c1 ORDER BY value) AS prank,
ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY dt DESC) AS rn
FROM table_name t
) t
WHERE rn = 1
Which, for the sample data:
CREATE TABLE table_name (dt, c1, value) AS
SELECT DATE '2022-10-01', 1, 1 FROM DUAL UNION ALL
SELECT DATE '2022-10-02', 1, 2 FROM DUAL UNION ALL
SELECT DATE '2022-10-03', 1, 3 FROM DUAL UNION ALL
SELECT DATE '2022-10-01', 2, 4 FROM DUAL UNION ALL
SELECT DATE '2022-10-02', 2, 6 FROM DUAL UNION ALL
SELECT DATE '2022-10-03', 2, 5 FROM DUAL;
Outputs:
DT                   C1  VALUE  PRANK  RN
2022-10-03 00:00:00  1   3      1      1
2022-10-03 00:00:00  2   5      .5     1
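The same PERCENT_RANK/ROW_NUMBER combination runs unchanged on SQLite, so here is a self-contained sqlite3 sketch using the question's sample data (dates stored as ISO text rather than Oracle DATE literals):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (dt TEXT, c1 INTEGER, value INTEGER);
INSERT INTO table_name VALUES
    ('2022-10-01',1,1),('2022-10-02',1,2),('2022-10-03',1,3),
    ('2022-10-01',2,4),('2022-10-02',2,6),('2022-10-03',2,5);
""")

# rn = 1 picks each partition's latest date without hard-coding it;
# prank is computed over the whole history of the partition.
rows = conn.execute("""
SELECT dt, c1, value, prank
FROM (
    SELECT t.*,
           PERCENT_RANK() OVER (PARTITION BY c1 ORDER BY value) AS prank,
           ROW_NUMBER()   OVER (PARTITION BY c1 ORDER BY dt DESC) AS rn
    FROM table_name t
) t
WHERE rn = 1
ORDER BY c1
""").fetchall()
# rows -> [('2022-10-03', 1, 3, 1.0), ('2022-10-03', 2, 5, 0.5)]
```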
But this SQL will sort every partition, which I believe has O(n log n) time complexity.
Whatever you do you will need to iterate over the entire result-set.
I just need the latest date's rank and I thought I could do that by calculating count(last_value > value)/count().
Then you will need to find the last value, which (unless you are hardcoding the last date) will involve an index- or table-scan over all the values in each partition plus a sort of those values; finding the count of the greater values will then require a second index- or table-scan. You can profile both solutions, but I expect you will find that using analytic functions is equally efficient, if not better, than trying to use aggregate functions.
For example:
SELECT c1,
dt,
value,
( SELECT ( COUNT(CASE WHEN value <= t.value THEN 1 END) - 1 )
/ ( COUNT(*) - 1 )
FROM table_name c
WHERE c.c1 = t.c1
) AS prank
FROM table_name t
WHERE dt = DATE '2022-10-03'
It is going to access the table twice, and you are likely to find that the I/O costs of the table accesses far outweigh any potential savings from using a different method. Moreover, if you look at the explain plan (fiddle), the query is still performing an aggregate sort, so there are no cost savings, only additional costs, from this method.
Try this:
select t.c1, t.dt, t.value
from TABLENAME t
inner join (
    select c1, max(dt) as MaxDate
    from TABLENAME
    group by c1
) tm on t.c1 = tm.c1 and t.dt = tm.MaxDate
order by t.dt desc;
Or as simple as
SELECT * from TABLENAME ORDER BY dt DESC;
I fiddled with it a bit; it is almost the same answer as the one MT0 already posted.
select dt, c1, val, prank*100 as percent_rank
from (select t1.*,
             percent_rank() over (partition by c1 order by val) as prank,
             row_number() over (partition by c1 order by dt desc) rn
      from t1)
where rn = 1;
result
DT C1 VAL PERCENT_RANK
2022-10-03 1 3 100
2022-10-03 2 5 50
http://sqlfiddle.com/#!4/ec60a/23
I used row_number() = 1 to get the latest date, and also expressed prank as a percentage.
Is this what you desire?

In SQL, how to increment a variable in a case statement

So I have a table A as follows
Message code trig timestamp
a x 1 T1
a x 1 T2
a x 0 T3
b y 1 T4
b y 1 T5
a x 1 T6
I want the following result
Message code trig timestamp groupbycolumn
a x 1 T1 1
a x 1 T2 1
a x 0 T3 2
b y 1 T4 3
b y 1 T5 3
a x 1 T6 4
I need to group the rows according to message, code and trig, but ordered by the timestamp. So whenever a new combination of message, code and trig values appears, it should get a new number in the groupby column. Note that (a, x, 1) in the first line has groupby value 1, while the same combination in the last line has 4.
declare @chngeVal int;
set @chngeVal = 0;
select n.Message, n.code, n.trig,
       case when n.Message <> n.nextMessage or n.code <> n.nextCode or n.trig <> n.nextTrig
            then @chngeVal + 1
            else @chngeVal
       end as groupbycolumn,
       n.timestamp
from (select Message, code, trig, timestamp,
             lead(Message) over (order by timestamp asc) as nextMessage,
             lead(code) over (order by timestamp asc) as nextCode,
             lead(trig) over (order by timestamp asc) as nextTrig
      from A) n
If I could get the case to do @chngeVal = @chngeVal + 1 it would work, but I cannot do that inside a case expression. Would anybody know how to change the value of a variable within a query?
Any idea would be much appreciated.
I broke the solution into a three-part query using two CTEs:
createIds produces ids I use to identify the rows in the next two parts.
firstrows gets only the rows that start each group, and determines the unique id for each group as well as the row id that starts the next group (NextGroupRowId).
Finally, I produce the result by joining firstrows to the range of rows from createIds whose RowId falls between the first row's RowId and NextGroupRowId - 1.
My feeling is that this is inefficient as heck, and there's a way to do this with a recursive CTE. But since you started using window functions I just went in that direction.
WITH createIds AS (
SELECT *
, ROW_NUMBER() OVER(ORDER BY [timestamp]) AS RowId
, DENSE_RANK() OVER(ORDER BY Message, code, trig DESC) AS GroupId
FROM src
)
, firstrows AS (
SELECT a.RowId
, ROW_NUMBER() OVER (ORDER BY a.RowId) AS OrderedGroupId
, LEAD(a.RowId, 1, NULL) OVER (ORDER BY a.RowId) NextGroupRowId
FROM createIds a
LEFT JOIN createIds b ON b.RowId = a.RowId - 1
WHERE a.GroupId != b.GroupId OR b.GroupId IS NULL
)
SELECT a.[Message], a.code, a.trig, a.[timestamp], r1.OrderedGroupId
FROM firstrows r1
INNER JOIN createIds a ON a.RowId >= r1.RowId AND (r1.NextGroupRowId IS NULL OR a.RowId < r1.NextGroupRowId)
ORDER BY a.[timestamp]
You can use the difference of row_number()s, or lag() and cumulative sums:
select t.*,
sum(case when message = prev_message and code = prev_code and trig = prev_trig
then 0 else 1
end) over (order by timestamp) as groupbycolumn
from (select t.*,
lag(message) over (order by timestamp) as prev_message,
lag(code) over (order by timestamp) as prev_code,
lag(trig) over (order by timestamp) as prev_trig
from a
) a
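The lag()-plus-cumulative-sum approach above runs as-is on SQLite; a self-contained sqlite3 sketch with the question's data follows (the timestamp column is renamed ts here to sidestep the keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (message TEXT, code TEXT, trig INTEGER, ts TEXT);
INSERT INTO a VALUES
    ('a','x',1,'T1'),('a','x',1,'T2'),('a','x',0,'T3'),
    ('b','y',1,'T4'),('b','y',1,'T5'),('a','x',1,'T6');
""")

# Each row contributes 1 to the running sum when any of the three columns
# changed versus the previous row (or on the very first row, where the
# NULL comparison falls through to the ELSE branch), yielding the group number.
rows = conn.execute("""
SELECT t.message, t.code, t.trig, t.ts,
       SUM(CASE WHEN message = prev_message AND code = prev_code
                 AND trig = prev_trig
            THEN 0 ELSE 1 END) OVER (ORDER BY ts) AS groupbycolumn
FROM (SELECT a.*,
             LAG(message) OVER (ORDER BY ts) AS prev_message,
             LAG(code)    OVER (ORDER BY ts) AS prev_code,
             LAG(trig)    OVER (ORDER BY ts) AS prev_trig
      FROM a) t
ORDER BY ts
""").fetchall()
# groupbycolumn comes out as 1, 1, 2, 3, 3, 4 -- matching the expected result
```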

Cumulative sum over a table

What is the best way to perform a cumulative sum over a table in Postgres, in a way that gives the best performance and flexibility in case more fields/columns are added to the table?
Table

id  a   b   d
1   59  15  181
2       16  268
3           219
4           102

Cumulative

id  a   b   d
1   59  15  181
2       31  449
3           668
4           770
You can use window functions, but you need additional logic to avoid values where there are NULLs:
SELECT id,
(case when a is not null then sum(a) OVER (ORDER BY id) end) as a,
(case when b is not null then sum(b) OVER (ORDER BY id) end) as b,
(case when d is not null then sum(d) OVER (ORDER BY id) end) as d
FROM table;
This assumes that the first column that specifies the ordering is called id.
Window functions for running sum.
SELECT sum(a) OVER (ORDER BY d) as "a",
sum(b) OVER (ORDER BY d) as "b",
sum(d) OVER (ORDER BY d) as "d"
FROM table;
If you have more than one running sum, make sure the orders are the same.
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresonline.com/journal/archives/119-Running-totals-and-sums-using-PostgreSQL-8.4-Windowing-functions.html
It's important to note that if you want your columns to appear as the aggregate table in your question (each field uniquely ordered), it'd be a little more involved.
Update: I've modified the query to do the required sorting, without a given common field.
SQL Fiddle: (1) Only Aggregates, or (2) Source Data Beside Running Sum
WITH
rcd AS (
select row_number() OVER() as num,a,b,d
from tbl
),
sorted_a AS (
select row_number() OVER(w1) as num, sum(a) over(w2) a
from tbl
window w1 as (order by a nulls last),
w2 as (order by a nulls first)
),
sorted_b AS (
select row_number() OVER(w1) as num, sum(b) over(w2) b
from tbl
window w1 as (order by b nulls last),
w2 as (order by b nulls first)
),
sorted_d AS (
select row_number() OVER(w1) as num, sum(d) over(w2) d
from tbl
window w1 as (order by d nulls last),
w2 as (order by d nulls first)
)
SELECT sorted_a.a, sorted_b.b, sorted_d.d
FROM rcd
JOIN sorted_a USING(num)
JOIN sorted_b USING(num)
JOIN sorted_d USING(num)
ORDER BY num;
I think what you are really looking for is this:
SELECT id
, sum(a) OVER (PARTITION BY a_grp ORDER BY id) as a
, sum(b) OVER (PARTITION BY b_grp ORDER BY id) as b
, sum(d) OVER (PARTITION BY d_grp ORDER BY id) as d
FROM (
SELECT *
, count(a IS NULL OR NULL) OVER (ORDER BY id) as a_grp
, count(b IS NULL OR NULL) OVER (ORDER BY id) as b_grp
, count(d IS NULL OR NULL) OVER (ORDER BY id) as d_grp
FROM tbl
) sub
ORDER BY id;
The expression count(col IS NULL OR NULL) OVER (ORDER BY id) forms groups of consecutive non-null rows for a, b and d in the subquery sub.
In the outer query we run cumulative sums per group. NULL values form their own group and stay NULL automatically. No additional CASE statement necessary.
SQL Fiddle (with some added values for column a to demonstrate the effect).
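The count(col IS NULL OR NULL) grouping trick happens to work in SQLite too, so the query can be sketched end-to-end with sqlite3 against the question's table (column id is the assumed ordering column, as in the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER, a INTEGER, b INTEGER, d INTEGER);
INSERT INTO tbl VALUES
    (1,59,15,181),(2,NULL,16,268),(3,NULL,NULL,219),(4,NULL,NULL,102);
""")

# col IS NULL OR NULL evaluates to 1 for NULL values and NULL otherwise, so
# COUNT(...) increments at every NULL, partitioning consecutive non-NULL runs.
rows = conn.execute("""
SELECT id,
       SUM(a) OVER (PARTITION BY a_grp ORDER BY id) AS a,
       SUM(b) OVER (PARTITION BY b_grp ORDER BY id) AS b,
       SUM(d) OVER (PARTITION BY d_grp ORDER BY id) AS d
FROM (SELECT *,
             COUNT(a IS NULL OR NULL) OVER (ORDER BY id) AS a_grp,
             COUNT(b IS NULL OR NULL) OVER (ORDER BY id) AS b_grp,
             COUNT(d IS NULL OR NULL) OVER (ORDER BY id) AS d_grp
      FROM tbl) sub
ORDER BY id
""").fetchall()
# NULL cells stay NULL; the running sums match the "Cumulative" table above
```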

How do I get total count of result set to appear in every record of output?

Using Oracle 11g
How can I write query to include a 4th column which displays the total rows returned?
Use the window function:
SELECT col1, col2, col3, COUNT(*) OVER () AS total_rows
FROM mytable
If you had meant the total so far then:
select c1, c2, c3,
count(*) over (order by c1 range unbounded preceding) as total_rows
from mytable
order by c1
This will get a result like:
C1 C2 C3 TOTAL_ROWS
A B C 1
A B D 2
B D E 3
...

How to return rows only if there is more than one result

In an Oracle 10g database, I would like to build a SQL query that returns the result rows only if there is more than one row in the result.
Is it possible, and how?
You need to count the number of returned results. If you don't want to use a GROUP BY, you can use the following:
SELECT *
FROM (SELECT col1,
col2,
COUNT(*) OVER() cnt
FROM your_table
WHERE <conditions> )
WHERE cnt > 1
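A small sqlite3 sketch of the COUNT(*) OVER () filter (the table your_table, its columns, and the conditions are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (col1 INTEGER, col2 TEXT);
INSERT INTO your_table VALUES (1,'A'),(1,'B'),(2,'C');
""")

# col1 = 1 matches two rows, so both pass the cnt > 1 filter.
many = conn.execute("""
SELECT col1, col2
FROM (SELECT col1, col2, COUNT(*) OVER () AS cnt
      FROM your_table
      WHERE col1 = 1)
WHERE cnt > 1
""").fetchall()

# col1 = 2 matches only one row, so nothing is returned.
none = conn.execute("""
SELECT col1, col2
FROM (SELECT col1, col2, COUNT(*) OVER () AS cnt
      FROM your_table
      WHERE col1 = 2)
WHERE cnt > 1
""").fetchall()
```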
select *
from (select column1, column2, column3,
             (select count(*) from table1 where '*XYZ Condition*') as rowCount
      from table1
      where '*XYZ Condition*')
where rowCount > 1;
You just need to put the same condition in both places in the query, i.e. '*XYZ Condition*' is the same in both WHERE clauses.
Do you need a result like the following?
c1 c2
1 AA
1 BB
2 CC
result:
c1 c2
1 AA,BB
2 CC
The following can meet your requirements.
select c1, ltrim(sys_connect_by_path(c2, ','), ',')
from (
    select c1, c2,
           row_number() over (partition by c1 order by c2) rn,
           count(*) over (partition by c1) cnt
    from XXX -- XXX: your table
) a
where level = cnt
start with rn = 1
connect by prior c1 = c1 and prior rn = rn - 1