Element-wise quotient of two columns in SQL

How can I combine the columns returned by two SELECT statements to give their element-wise quotient?
Query 1:
SELECT COUNT(*) AS count
FROM table1
WHERE col2 = 1 AND col3 > 5
GROUP BY col4
ORDER BY col4
Query 2:
SELECT COUNT(*) AS count
FROM table1
WHERE col2 = 1
GROUP BY col4
ORDER BY col4
So if they return something like:
Query 1 Query 2
count count
-----------------------
1 5
2 4
I will get:
quotient
-------
0.2
0.5

With the 4-column version of the question, we can assume that the quotient is between groups with the same value in col4. So, the answer becomes:
SELECT col4, SUM(CASE WHEN col3 > 5 THEN 1.0 ELSE 0.0 END) / COUNT(*) AS quotient
FROM table1
WHERE col2 = 1
GROUP BY col4;
I've retained col4 in the output because I don't think the ratios (quotients) will be useful without something to identify which quotient belongs to which group, though strictly speaking the question doesn't ask for that column in the output. The 1.0 and 0.0 literals keep the division from being truncated on engines that divide integers as integers.
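A minimal sketch of the conditional-aggregation approach above, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical and were chosen so the two groups give the 0.2 and 0.5 from the question; SQLite is only a stand-in for whatever engine you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col2 INTEGER, col3 INTEGER, col4 INTEGER)")

# Hypothetical data: group 1 has five rows with col2 = 1 (one has col3 > 5),
# group 2 has four such rows (two have col3 > 5).
conn.executemany(
    "INSERT INTO table1 (col2, col3, col4) VALUES (?, ?, ?)",
    [(1, 9, 1), (1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1),
     (1, 7, 2), (1, 8, 2), (1, 1, 2), (1, 2, 2),
     (0, 9, 1)],  # excluded by the WHERE clause because col2 <> 1
)

# The decimal literals force real division; SQLite truncates integer / integer.
rows = conn.execute("""
    SELECT col4,
           SUM(CASE WHEN col3 > 5 THEN 1.0 ELSE 0.0 END) / COUNT(*) AS quotient
    FROM table1
    WHERE col2 = 1
    GROUP BY col4
    ORDER BY col4
""").fetchall()

print(rows)  # [(1, 0.2), (2, 0.5)]
```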

In this case, you don't need two separate queries at all. In MySQL, where a comparison evaluates to 1 or 0, you can sum the condition directly:
SELECT SUM(col3 > 5) / COUNT(*)
FROM table1
WHERE col2 = 1
GROUP BY col4
ORDER BY col4

In case your actual queries cannot be simplified as per the other answers, you can join the subqueries, like this:
select 1.0 * j1.count / j2.count as quotient  -- 1.0 * avoids integer division
from (
    SELECT col4, COUNT(*) AS count
    FROM table1
    WHERE col2 = 1 AND col3 > 5
    GROUP BY col4
) j1
join (
    SELECT col4, COUNT(*) AS count
    FROM table1
    WHERE col2 = 1
    GROUP BY col4
) j2 on j1.col4 = j2.col4
order by j2.col4
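One thing to watch with the joined-subquery form: an inner join drops any col4 group that has no rows with col3 > 5, so that group gets no quotient at all rather than 0. The sketch below compares the inner join with a LEFT JOIN plus COALESCE variant. It runs on SQLite via Python's sqlite3; the sample rows and the cnt alias are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col2 INTEGER, col3 INTEGER, col4 INTEGER)")
# Hypothetical data: group 1 has one row with col3 > 5, group 2 has none.
conn.executemany(
    "INSERT INTO table1 (col2, col3, col4) VALUES (?, ?, ?)",
    [(1, 9, 1), (1, 2, 1), (1, 1, 2), (1, 2, 2)],
)

numerator = "SELECT col4, COUNT(*) AS cnt FROM table1 WHERE col2 = 1 AND col3 > 5 GROUP BY col4"
denominator = "SELECT col4, COUNT(*) AS cnt FROM table1 WHERE col2 = 1 GROUP BY col4"

# Inner join: group 2 disappears because the numerator subquery has no row for it.
inner_rows = conn.execute(f"""
    SELECT j2.col4, 1.0 * j1.cnt / j2.cnt AS quotient
    FROM ({numerator}) j1
    JOIN ({denominator}) j2 ON j1.col4 = j2.col4
    ORDER BY j2.col4
""").fetchall()

# Left join from the denominator side: every group is kept, missing counts become 0.
left_rows = conn.execute(f"""
    SELECT j2.col4, 1.0 * COALESCE(j1.cnt, 0) / j2.cnt AS quotient
    FROM ({denominator}) j2
    LEFT JOIN ({numerator}) j1 ON j1.col4 = j2.col4
    ORDER BY j2.col4
""").fetchall()

print(inner_rows)  # [(1, 0.5)]
print(left_rows)   # [(1, 0.5), (2, 0.0)]
```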

Related

How can I find groups with more than one row and list the rows in each such group?

I have a table "mytable" in a database.
Given a subset of the table's columns, I would like to group by that subset and find the groups with more than one row.
For example, if the table is
col1 col2 col3
1 1 1
1 1 2
1 2 1
2 2 1
2 2 3
2 1 1
I am interested in finding the groups by col1 and col2 that have more than one row, which are:
col1 col2 col3
1 1 1
1 1 2
and
col1 col2 col3
2 2 1
2 2 3
I was wondering how to write a SQL query for that purpose?
Is the following the best way to do that?
First get the col1 and col2 values of such groups:
SELECT col1, col2, COUNT(*)
FROM mytable
GROUP BY col1, col2
HAVING COUNT(*) > 1
Then based on the output of the previous query, manually write a query for each group:
SELECT *
FROM mytable
WHERE col1 = val1 AND col2 = val2
If there are many such groups, then I will have to manually write many queries, which can be a disadvantage.
I am using SQL Server.
Thanks.
This is a common problem. One solution is to get the "keys" in a derived table and join to that to get the rows.
declare @test as table (col1 int, col2 int, col3 int)
insert into @test values (1,1,1),(1,1,2),(1,2,1),(2,2,1),(2,2,3),(2,1,1)
select t.*
from @test t
inner join (
    -- keys of the groups that contain more than one row
    select col1, col2
    from @test
    group by col1, col2
    having count(*) > 1
) k
on k.col1 = t.col1 and k.col2 = t.col2
col1 col2 col3
----------- ----------- -----------
1 1 1
1 1 2
2 2 1
2 2 3
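The derived-table join can be tried without a SQL Server instance. Here is a small sketch that uses SQLite through Python's sqlite3 as a stand-in, with the question's six rows; the table name mytable comes from the question and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 1), (2, 2, 3), (2, 1, 1)],
)

# k holds the (col1, col2) keys that occur more than once;
# joining back to the table returns every row of those groups.
rows = conn.execute("""
    SELECT t.*
    FROM mytable t
    INNER JOIN (
        SELECT col1, col2
        FROM mytable
        GROUP BY col1, col2
        HAVING COUNT(*) > 1
    ) k ON k.col1 = t.col1 AND k.col2 = t.col2
    ORDER BY t.col1, t.col2, t.col3
""").fetchall()

print(rows)  # [(1, 1, 1), (1, 1, 2), (2, 2, 1), (2, 2, 3)]
```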
The window function sum() over() may help here
Example
with cte as (
Select *
,Cnt = sum(1) over (partition by Col1,Col2)
From YourTable
)
Select *
From cte
Where Cnt>=2
Another option (less performant)
Select top 1 with ties *
From YourTable
Order By case when sum(1) over (partition by Col1,Col2) > 1 then 1 else 2 end

Oracle query - Selecting unique row number based on order of another column

I'm trying to find the best way to query a table with two columns, one a number and the other a date, selecting rows and ordering by the date column.
Table1:
col1 (NUMBER)  col2 (DATE)
1              02/2019
2              02/2019
3              02/2019
4              03/2019
2              04/2019
3              05/2019
I'm doing a query like this:
select col1, col2
from table1
order by col2 asc, col1 asc
fetch first 10 rows only;
The result also includes values from later dates and repeats col1 values, like this:
col1 (NUMBER)  col2 (DATE)
1              02/2019
2              02/2019
3              02/2019
4              03/2019
2              04/2019
3              05/2019
But I would like a filter to limit to only a sequential col1 value like this:
col1 (NUMBER)  col2 (DATE)
1              02/2019
2              02/2019
3              02/2019
4              03/2019
ignoring values that would come in a "next batch", without the risk of repeating col1 values or of getting col1 values that have a bigger col2 value than a previous result.
Any ideas on the best way to do this?
If I understand correctly, you can use a cumulative max():
select col1, col2
from (select t1.*,
max(col1) over (order by col2, col1 rows between unbounded preceding and 1 preceding) as running_max
from table1 t1
) t1
where running_max is null or col1 > running_max;
This returns rows whose value is greater than the values on the preceding rows.
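To see the running-max filter at work, here is a minimal sketch using SQLite (3.25 or later, for window functions) through Python's sqlite3 in place of Oracle. The dates are stored as 'YYYY-MM' text so that they sort chronologically, which is an assumption of this example; the rows are those from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2019-02"), (2, "2019-02"), (3, "2019-02"),
     (4, "2019-03"), (2, "2019-04"), (3, "2019-05")],
)

# running_max is the largest col1 seen on all earlier rows (NULL on the first row).
# A row survives only if its col1 exceeds everything that came before it.
rows = conn.execute("""
    SELECT col1, col2
    FROM (SELECT t1.*,
                 MAX(col1) OVER (ORDER BY col2, col1
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS running_max
          FROM table1 t1)
    WHERE running_max IS NULL OR col1 > running_max
    ORDER BY col2, col1
""").fetchall()

print(rows)
# [(1, '2019-02'), (2, '2019-02'), (3, '2019-02'), (4, '2019-03')]
```

The later rows with col1 = 2 and col1 = 3 are dropped because the running maximum is already 4 by the time they appear.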
EDIT:
If you want to return rows only up to the first time there is a decline, then:
select t1.*
from (select t1.*,
             sum(case when prev_col1 > col1 then 1 else 0 end) over (order by col2, col1) as num_decreases
      from (select t1.*,
                   lag(col1) over (order by col2, col1) as prev_col1
            from table1 t1
           ) t1
     ) t1
where num_decreases = 0;

Sum of columnA where another columnB is specific value without showing columnB

I have a table that I'm grouping data together from. I'm running into a problem where I want the sum of a number from Column2 where Column3 has a specific value without showing Column3
Table X:
Col1 Col2 Col3 Col4
A 4 tt 6y
B 5 tt 6y
C 4 ee 7y
A 3 ee 7u
A 4 ee 6y
B 5 tt 8u
C 4 tt 7y
A 3 xx 8u
My Select grouping is
select Col1, Sum(Col2), Col4
from x
group by Col1, Col4
I need to add 2 new columns in the group, the sum of column Col2 where Col3 is tt and another is the sum of Col2 where Col3 is ee. I do not need to show the value of Col3 and do not want to group by Col3.
I have looked at a partition by but I can't figure out how to specify the partition to the value of the column.
You need conditional aggregation:
Select
    Col1, Col4,
    Sum(Col2) sumcol2,
    Sum(case when col3 = 'tt' then Col2 else 0 end) sumtt,
    Sum(case when col3 = 'ee' then Col2 else 0 end) sumee
from x
group by Col1, Col4
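As a quick check of the conditional aggregation, the sketch below loads the question's eight rows into SQLite through Python's sqlite3 and runs the same query. The table name x and the added ORDER BY are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (Col1 TEXT, Col2 INTEGER, Col3 TEXT, Col4 TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?)",
    [("A", 4, "tt", "6y"), ("B", 5, "tt", "6y"), ("C", 4, "ee", "7y"),
     ("A", 3, "ee", "7u"), ("A", 4, "ee", "6y"), ("B", 5, "tt", "8u"),
     ("C", 4, "tt", "7y"), ("A", 3, "xx", "8u")],
)

# Each CASE contributes Col2 only when Col3 matches, so Col3 never has to
# appear in the GROUP BY or in the output.
rows = conn.execute("""
    SELECT Col1, Col4,
           SUM(Col2) AS sumcol2,
           SUM(CASE WHEN Col3 = 'tt' THEN Col2 ELSE 0 END) AS sumtt,
           SUM(CASE WHEN Col3 = 'ee' THEN Col2 ELSE 0 END) AS sumee
    FROM x
    GROUP BY Col1, Col4
    ORDER BY Col1, Col4
""").fetchall()

for row in rows:
    print(row)
# ('A', '6y', 8, 4, 4)
# ('A', '7u', 3, 0, 3)
# ('A', '8u', 3, 0, 0)
# ('B', '6y', 5, 5, 0)
# ('B', '8u', 5, 5, 0)
# ('C', '7y', 8, 4, 4)
```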
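HAVING works like a WHERE condition applied after grouping, but it can only reference grouped columns or aggregates, so most engines reject HAVING Col3 LIKE 'ee' when Col3 is not in the GROUP BY. If you only need the sum for a single Col3 value, filter the rows with WHERE before grouping.
Something like:
SELECT Col1, Sum(Col2), Col4
FROM x
WHERE Col3 = 'ee'
GROUP BY Col1, Col4
As I am not sitting at my SQL machine, I cannot test it, so test it yourself.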

select query to fetch rows corresponding to all values in a column

Consider this example table "Table1".
Col1 Col2
A 1
B 1
A 4
A 5
A 3
A 2
D 1
B 2
C 3
B 4
I am trying to fetch those values from Col1 which correspond to all values in Col2 (in this case, 1,2,3,4,5). Here the query should return 'A', as none of the others have all of the values 1,2,3,4,5 in Col2.
Note that the values in Col2 are decided by other parameters in the query and they will always return some numeric values. Out of those values the query needs to fetch values from Col1 corresponding to all in Col2. The values in Col2 could be 11,12,1,2,3,4 for instance (meaning not necessarily in sequence).
I have tried the following select query:
select distinct Col1 from Table1 where Col1 in (1,2,3,4,5);
select distinct Col1 from Table1 where Col1 exists (select distinct Col2 from Table1);
and different variations of these. The problem is that I need to apply an 'and' across the Col2 values, not an 'or': return a value from Col1 whose Col2 values 'contain' all values between 1 and 5.
Appreciate any suggestion.
You could use analytic ROW_NUMBER() function.
SELECT col1
FROM
(SELECT col1,
col2,
row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
FROM your_table
WHERE col2 IN (1,2,3,4,5)
)
WHERE rn =5;
UPDATE As requested by OP, some explanation about how the query works.
The inner sub-query gives you the following resultset:
SQL> SELECT col1,
2 col2,
3 row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
4 FROM t
5 WHERE col2 IN (1,2,3,4,5);
C COL2 RN
- ---------- ----------
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
B 1 1
B 2 2
B 4 3
C 3 1
D 1 1
10 rows selected.
The PARTITION BY clause groups the rows by col1, and ORDER BY sorts col2 within each group, so the sub-query numbers the rows of each col1 group in order. Because the WHERE clause keeps only col2 values 1 to 5, a group reaches row number 5 only if it has five matching rows. So in the outer query all you need to do is filter with WHERE rn = 5. Note that this assumes each (col1, col2) pair appears at most once; duplicate rows would inflate the row number.
You can use the LISTAGG function, like this:
SELECT Col1
FROM
(select Col1,listagg(Col2,',') within group (order by Col2) Col2List from Table1
group by Col1)
WHERE Col2List = '1,2,3,4,5'
You can also use the following:
SELECT COL1
FROM TABLE_NAME
GROUP BY COL1
HAVING COUNT(COL1) = 5
   AND SUM(
         (CASE WHEN COL2 = 1 THEN 1 ELSE 0 END)
       + (CASE WHEN COL2 = 2 THEN 1 ELSE 0 END)
       + (CASE WHEN COL2 = 3 THEN 1 ELSE 0 END)
       + (CASE WHEN COL2 = 4 THEN 1 ELSE 0 END)
       + (CASE WHEN COL2 = 5 THEN 1 ELSE 0 END)
       ) = 5
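The answers above count rows, so they can be fooled when a (Col1, Col2) pair occurs more than once. One possible variation, not taken from any of the answers, is to count distinct Col2 values instead. The sketch below runs on SQLite through Python's sqlite3 with the question's data plus a hypothetical group 'E' that contains the value 1 five times, added only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 TEXT, Col2 INTEGER)")
data = [("A", 1), ("B", 1), ("A", 4), ("A", 5), ("A", 3), ("A", 2),
        ("D", 1), ("B", 2), ("C", 3), ("B", 4)]
data += [("E", 1)] * 5  # hypothetical duplicates, not part of the question's table
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", data)

# Counting rows: 'E' passes because it has five matching rows, all with Col2 = 1.
by_row_count = conn.execute("""
    SELECT Col1 FROM Table1
    WHERE Col2 IN (1, 2, 3, 4, 5)
    GROUP BY Col1
    HAVING COUNT(*) = 5
    ORDER BY Col1
""").fetchall()

# Counting distinct values: only groups that contain all five values pass.
by_distinct = conn.execute("""
    SELECT Col1 FROM Table1
    WHERE Col2 IN (1, 2, 3, 4, 5)
    GROUP BY Col1
    HAVING COUNT(DISTINCT Col2) = 5
    ORDER BY Col1
""").fetchall()

print(by_row_count)  # [('A',), ('E',)]
print(by_distinct)   # [('A',)]
```

If the required values come from another query rather than a fixed list, the literal 5 would have to be replaced by the number of distinct required values.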

T-SQL Eliminating duplicate rows while ignoring certain columns

I'm struggling to find the proper statements to select non-duplicate entries that are duplicates only for particular columns. As an example, in the following table I only care about rows that have unique values in col1, col2, and col3 and the values in col4 and col5 do not matter. This means I would consider row 1 and row 2 to be duplicates and row 4 and row 5 to be duplicates:
col1 col2 col3 col4 col5
A 2 p 0 2
A 2 p 1 8
A 3 r 4 12
B 0 f 3 1
B 0 f 6 5
And I would want to select only the following:
col1 col2 col3 col4 col5
A 2 p 0 2
A 3 r 4 12
B 0 f 3 1
Is there a way to combine multiple DISTINCT statements to achieve this or specify certain columns to ignore when comparing rows for duplicates?
You have to choose which row to keep from each set of duplicates; you can use the ROW_NUMBER() function for this:
SELECT col1, col2, col3, col4, col5
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY col1, col2, col3 ORDER BY col4) AS RowRank
      FROM table
     ) sub
WHERE RowRank = 1
You can change the ORDER BY section to change which row you keep and which you toss; ordering by col4 ascending keeps the rows shown in the question. The ROW_NUMBER() function just assigns a number to each row. In this example, you want to preserve each combination of col1, col2, col3, so you PARTITION BY them, meaning that numbering starts at 1 for each combination. You can run just the inner query to get the idea.
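A small sketch of the ROW_NUMBER() approach on the question's five rows, using SQLite (3.25 or later) through Python's sqlite3 as a stand-in for SQL Server. The table name t and the alias row_rank are placeholders chosen for this example, and the ordering by col4 ascending is what reproduces the desired output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (col1 TEXT, col2 INTEGER, col3 TEXT, col4 INTEGER, col5 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [("A", 2, "p", 0, 2), ("A", 2, "p", 1, 8), ("A", 3, "r", 4, 12),
     ("B", 0, "f", 3, 1), ("B", 0, "f", 6, 5)],
)

# Numbering restarts at 1 for each (col1, col2, col3) combination;
# keeping row_rank = 1 keeps the row with the smallest col4 in each combination.
rows = conn.execute("""
    SELECT col1, col2, col3, col4, col5
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY col4) AS row_rank
          FROM t) AS sub
    WHERE row_rank = 1
    ORDER BY col1, col2, col3
""").fetchall()

for row in rows:
    print(row)
# ('A', 2, 'p', 0, 2)
# ('A', 3, 'r', 4, 12)
# ('B', 0, 'f', 3, 1)
```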
Alternatively, you could use GROUP BY and aggregate functions, e.g.:
SELECT col1, col2, col3, MAX(col4), MAX(col5)
FROM table
GROUP BY col1, col2, col3
The downside here is that the MAX() of col4 and col5 might come from different rows, so you're not necessarily returning one single row from your original table, but if you don't care which row you return then it doesn't matter.