How to select entry with greater value in postgresql - sql

I have two or more values like:
c1|c2 |c3 |c4
--+---+---+---
1 | Z | B | 29
2 | Z | B | 19
and I want to have the entry with the larger c4 value:
1 | Z | B | 29
I tried to query the max value from c4, after a group by of c2 and c3, but this doesn't work.

Postgres specific solution:
select distinct on (c2,c3) c1, c2, c3, c4
from the_table
order by c2,c3,c4 desc
ANSI SQL solution:
select c1,c2,c3,c4
from (
select c1,c2,c3,c4,
row_number() over (partition by c2,c3 order by c4 desc) as rn
from the_table
) t
where rn = 1;
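To check the row_number() approach without a PostgreSQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in (it needs version 3.25 or later for window functions), the helper name top_per_group is made up for illustration, and the second group (Y, A) is hypothetical data added to show that each group gets its own winner.

```python
import sqlite3

def top_per_group(rows):
    """Return the row with the largest c4 for each (c2, c3) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (c1 INTEGER, c2 TEXT, c3 TEXT, c4 INTEGER)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT c1, c2, c3, c4
        FROM (
            SELECT c1, c2, c3, c4,
                   row_number() OVER (PARTITION BY c2, c3 ORDER BY c4 DESC) AS rn
            FROM the_table
        ) t
        WHERE rn = 1
        ORDER BY c2, c3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The first two rows come from the question; the (Y, A) rows are hypothetical.
sample = [(1, "Z", "B", 29), (2, "Z", "B", 19), (3, "Y", "A", 5), (4, "Y", "A", 7)]
print(top_per_group(sample))  # [(4, 'Y', 'A', 7), (1, 'Z', 'B', 29)]
```

If two rows tie on c4 within a group, row_number() picks one of them arbitrarily unless the ORDER BY includes a tiebreaker.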

You can order results in descending order by c4 and output only one row (see LIMIT clause):
SELECT *
FROM table_name
ORDER BY c4 DESC
LIMIT 1

Related

Looking for an alternative SQL statement

Given the following table with 2 columns:
c1 c2
------------
a1 | b1
a1 | b1
a2 | b2
a2 | b3
a3 | b3
I want to return those values from column c2 where the value of c2 column appears multiple times for the same c1 value. I am doing the following SQL query to return the required result:
SELECT DISTINCT ( c2 ) AS c
FROM ( SELECT c1 , c2 , COUNT (*) AS rowcount
FROM table
GROUP BY c1 , c2 HAVING rowcount > 1 )
Result:
c
---
b1
Is there any alternative SQL statement of the above query?
Based on your description, you can use:
select distinct c1
from (select t.*, count(*) over (partition by c2) as cnt
from t
) t
where cnt >= 2;
Based on your sample results:
select c1
from t
group by c1
having count(*) >= 2;
And based on the revised question:
select c2
from t
group by c2
having count(*) >= 2;
Use count in the HAVING clause instead of a subquery:
select c1
from table
group by c1
having count(c2) > 1
Most answers above will work if you want all the values in c1 that appear more than once in the table (even with the same value on c2).
If you want to measure only values of c1 that may have multiple DISTINCT values on c2 you can use:
SELECT c1
FROM table
GROUP BY c1
HAVING COUNT(DISTINCT c2) > 1
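The answers differ in what they count, which is easiest to see by running them against the question's sample rows. The sketch below uses Python's sqlite3 module as a stand-in database; the table is named t because table is a reserved word, and the helper run is hypothetical.

```python
import sqlite3

# Sample rows from the question.
ROWS = [("a1", "b1"), ("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b3")]

def run(query):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    out = [row[0] for row in conn.execute(query)]
    conn.close()
    return out

# c1 values with more than one row, regardless of what c2 holds.
any_repeat = run("SELECT c1 FROM t GROUP BY c1 HAVING COUNT(c2) > 1 ORDER BY c1")

# c1 values that have more than one DISTINCT c2 value.
distinct_repeat = run(
    "SELECT c1 FROM t GROUP BY c1 HAVING COUNT(DISTINCT c2) > 1 ORDER BY c1"
)

# c2 values repeated for the same c1, as the question asks, without a subquery.
same_pair_repeat = run(
    "SELECT DISTINCT c2 FROM t GROUP BY c1, c2 HAVING COUNT(*) > 1 ORDER BY c2"
)

print(any_repeat)        # ['a1', 'a2']
print(distinct_repeat)   # ['a2']
print(same_pair_repeat)  # ['b1']
```

Only the last query reproduces the b1 result shown in the question; the other two answer slightly different questions.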

Selecting the last row in a partition in HIVE

I have a table t1:
c1 | c2 | c3| c4
1 1 1 A
1 1 2 B
1 1 3 C
1 1 4 D
1 1 4 E
1 1 4 F
2 2 1 A
2 2 2 A
2 2 3 A
I want to select the last row of each c1, c2 pair. So (1,1,4,F) and (2,2,3,A) in this case. My idea is to do something like this:
create table t2 as
select *, row_number() over (partition by c1, c2 order by c3) as rank
from t1
create table t3 as
select a.c1, a.c2, a.c3, a.c4
from t2 a
inner join
(select c1, c2, max(rank) as maxrank
from t2
group by c1, c2
)
on a.c1=b.c1 and a.c2=b.c1
where a.rank=b.maxrank
Would this work? (Having environment issues so can't test myself)
Just use a subquery:
select t1.*
from (select t1.*, row_number() over (partition by c1, c2 order by c3 desc) as rank
from t1
) t1
where rank = 1;
Note the use of desc for the order by.
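One caveat with the sample data: three rows tie on c3 = 4 (D, E and F), so ordering by c3 alone does not guarantee that F comes back. The sketch below runs the subquery approach in SQLite through Python (a stand-in for Hive, not Hive itself) and adds c4 desc as a tiebreaker. Treating the greatest c4 as the "last" row is an assumption, and the window column is named rn rather than rank to avoid clashing with the function name.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, 1, 1, "A"), (1, 1, 2, "B"), (1, 1, 3, "C"),
    (1, 1, 4, "D"), (1, 1, 4, "E"), (1, 1, 4, "F"),
    (2, 2, 1, "A"), (2, 2, 2, "A"), (2, 2, 3, "A"),
]

def last_rows():
    """Return the last row of each (c1, c2) pair, breaking c3 ties on c4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
    query = """
        SELECT c1, c2, c3, c4
        FROM (
            SELECT t1.*,
                   row_number() OVER (
                       PARTITION BY c1, c2
                       ORDER BY c3 DESC, c4 DESC  -- c4 is the assumed tiebreaker
                   ) AS rn
            FROM t1
        ) AS ranked
        WHERE rn = 1
        ORDER BY c1, c2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(last_rows())  # [(1, 1, 4, 'F'), (2, 2, 3, 'A')]
```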

Double increment where 2nd increment reflects 1st in sql for encounter data

I am building healthcare 837 encounters and need to set increments on the HL segments.
C1 based on what is set on Criteria1 and C2 based on Criteria2.
C2 will never have the same number as C1 and vice versa.
C1 I was able to pull using row_number() over(order by (select Criteria1))
It's the C2 I am having a problem with.
C1 | C2 | Criteria1 | Criteria2
1 | 2 | ID1 | NID1
1 | 3 | ID1 | NID2
1 | 4 | ID1 | NID3
5 | 6 | ID2 | NID4
5 | 7 | ID2 | NID5
5 | 8 | ID2 | NID6
9 |10 | ID3 | NID7
Simplified query:
SELECT cm.Criteria1, cm.Criteria2, cj.C1
FROM [dbo].[TBL1] cm
JOIN (
SELECT cm.Criteria1,
row_number() over(order by (select Criteria1)) as C1
FROM [dbo].[TBL1] cm
GROUP BY cm.Criteria1) cj on cj.Criteria1 = cm.Criteria1
GROUP BY cm.Criteria1, cm.Criteria2, cj.C1 Order by cj.C1
This seems to work but I didn't check many edge cases (fun with windowing!):
with tbl1 as (
select 'ID1' as Criteria1, 'NID1' as Criteria2
union
select 'ID1', 'NID2'
union
select 'ID2', 'NID4'
union
select 'ID2', 'NID5'
union
select 'ID3', 'NID7'
)
select
rank() over (order by Criteria1) + dense_rank() over (order by Criteria1) - 1 as C1,
rank() over (order by Criteria1) + row_number() over (partition by Criteria1 order by Criteria2) + dense_rank() over (order by Criteria1) - 1 as C2,
Criteria1,
Criteria2
from
tbl1
To break it down a little:
Let's call each set of Criteria1 rows a "partition" as in SQL parlance.
The requirement is thus:
C1 is always equal to the number of rows in all the previous partitions + 1 for the current partition, plus the number of previous partitions.
C2 is always equal to the number of rows in all the previous partitions + 1 for the current partition, plus the number of previous partitions, plus the number of all the previous rows within the partition + 1 for the current row.
RANK() over (order by Criteria1) gives you the number of rows in all the previous partitions + 1.
DENSE_RANK() over (order by Criteria1) - 1 gives you the number of previous partitions.
ROW_NUMBER() over (partition by Criteria1 order by Criteria2) gives you the number of previous rows within the partition + 1 for the current row.
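The breakdown can be checked against the seven rows from the question. This is a rough sketch using SQLite through Python as a stand-in for SQL Server; the column aliases and the helper name numbering are made up for illustration. It selects the three building blocks separately and adds them up in Python so each term stays visible.

```python
import sqlite3

# Criteria1 / Criteria2 pairs from the question's expected output.
ROWS = [
    ("ID1", "NID1"), ("ID1", "NID2"), ("ID1", "NID3"),
    ("ID2", "NID4"), ("ID2", "NID5"), ("ID2", "NID6"),
    ("ID3", "NID7"),
]

def numbering():
    """Return (C1, C2, Criteria1, Criteria2) built from the three window terms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (Criteria1 TEXT, Criteria2 TEXT)")
    conn.executemany("INSERT INTO tbl1 VALUES (?, ?)", ROWS)
    query = """
        SELECT rank() OVER (ORDER BY Criteria1) AS rows_before_plus_1,
               dense_rank() OVER (ORDER BY Criteria1) - 1 AS prev_partitions,
               row_number() OVER (
                   PARTITION BY Criteria1 ORDER BY Criteria2
               ) AS pos_in_partition,
               Criteria1, Criteria2
        FROM tbl1
        ORDER BY Criteria1, Criteria2
    """
    out = []
    for rows_before, prev_parts, pos, crit1, crit2 in conn.execute(query):
        c1 = rows_before + prev_parts  # shared by every row of the partition
        c2 = c1 + pos                  # steps past C1 within the partition
        out.append((c1, c2, crit1, crit2))
    conn.close()
    return out

for row in numbering():
    print(row)
# C1 comes out as 1, 1, 1, 5, 5, 5, 9 and C2 as 2, 3, 4, 6, 7, 8, 10.
```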
It is not really clear what exactly you are trying to get, but it seems below is what you are looking for:
with counts as (select count(distinct cm.criteria1) c1, count(distinct cm.criteria2) c2 from dbo.tbl1 cm)
select cj1.c1, cj2.c2, cm.criteria1, cm.criteria2
from dbo.tbl1 cm
inner join (
select cm1.criteria1,
row_number() over ( order by cm1.criteria1)as c1
from dbo.tbl1 cm1 group by cm1.criteria1) cj1
on cj1.criteria1 = cm.criteria1
inner join (
select cm2.criteria2,
(select counts.c1 from counts) + row_number() over ( order by cm2.criteria2) as c2
from dbo.tbl1 cm2 group by cm2.criteria2) cj2
on cj2.criteria2 = cm.criteria2
group by cm.criteria1, cm.criteria2, cj1.c1, cj2.c2
order by cj1.c1, cj2.c2

how to group data based on its sequence and group by other columns

I have a table with 3 columns c1,c2,c3 in Oracle like below:
c1 c2 c3
1 34 2
2 34 2
3 34 2
4 24 2
5 24 2
6 34 2
7 34 2
8 34 1
I need to group c1 and get the min and max value (of c1) for each run of consecutive rows that share the same c2 and c3.
i.e., I need the result as below:
c1_min c1_max c2 c3
1 3 34 2
4 5 24 2
6 7 34 2
8 8 34 1
There are a number of ways to approach a gaps-and-islands problem. As an alternative to Sylvain's lag version - not better, just different - you can use a trick with row numbers calculated analytically based on your grouping fields. This adds a 'chain' pseudocolumn to the table values, which will be unique for each contiguous group of c2/c3 pairs:
select c1, c2, c3,
dense_rank() over (partition by c2, c3 order by c1)
- dense_rank() over (partition by null order by c1) as chain
from t42
order by c1, c2, c3;
(I can't take credit for this - I first saw it here). You can then use that as an inline view to calculate the min and max for each group:
select min(c1) as c1_min, max(c1) as c1_max, c2, c3
from (
select c1, c2, c3,
dense_rank() over (partition by c2, c3 order by c1)
- dense_rank() over (partition by null order by c1) as chain
from t42
)
group by c2, c3, chain
order by c1_min;
C1_MIN C1_MAX C2 C3
---------- ---------- ---------- ----------
1 3 34 2
4 5 24 2
6 7 34 2
8 8 34 1
SQL Fiddle showing the intermediate stage too.
You can use other analytic functions like row_number() instead of dense_rank(); they may give slightly different results for some data, but you get the same result with this sample.
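As a quick check of the chain trick outside Oracle, the sketch below runs an equivalent query in SQLite through Python. SQLite is a stand-in here, the helper name islands is hypothetical, and the partition by null clause is replaced by a plain over (order by c1), which ranks over the whole table in the same way.

```python
import sqlite3

# Sample rows from the question: (c1, c2, c3).
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]

def islands():
    """Return (c1_min, c1_max, c2, c3) for each contiguous run of c2/c3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t42 (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.executemany("INSERT INTO t42 VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT min(c1) AS c1_min, max(c1) AS c1_max, c2, c3
        FROM (
            SELECT c1, c2, c3,
                   dense_rank() OVER (PARTITION BY c2, c3 ORDER BY c1)
                   - dense_rank() OVER (ORDER BY c1) AS chain
            FROM t42
        ) AS chained
        GROUP BY c2, c3, chain
        ORDER BY c1_min
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(islands())
# [(1, 3, 34, 2), (4, 5, 24, 2), (6, 7, 34, 2), (8, 8, 34, 1)]
```

The chain value stays constant inside a run because both ranks advance by one per row there; it drops whenever rows from another c2/c3 pair are interleaved.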
If I understand correctly, you want to group consecutive rows together. This is far from trivial, or at least I can't think of a simple way to do it right now. For ease of understanding, I will break the query into several steps:
Step 1:
The first thing is to identify your "groups" boundaries. Using the LAG analytic function might help you here:
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
ORDER BY "c1"
Step 2:
The second step is to number each of your groups. A simple running SUM over the CLK flag will do the trick. That leads to:
SELECT SUM(CLK) OVER (ORDER BY "c1"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) GRP,
V.*
FROM (
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
) V
ORDER BY "c1";
Final step:
Finally, you can wrap that in a simple GROUP BY query to obtain the desired output:
SELECT MIN("c1"), MAX("c1"), "c2", "c3" FROM
(
SELECT SUM(CLK) OVER (ORDER BY "c1"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) GRP,
V.*
FROM (
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
) V
)
GROUP BY GRP, "c2", "c3"
ORDER BY GRP
See http://sqlfiddle.com/#!4/7d57c/10
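To see what steps 1 and 2 produce for the sample rows, here is a small sketch that runs them in SQLite through Python. SQLite stands in for Oracle, the quoted identifiers are dropped, and the helper name flags_and_groups is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (c1, c2, c3).
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]

def flags_and_groups():
    """Return (c1, clk, grp): the boundary flag and its running total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT c1, clk,
               SUM(clk) OVER (ORDER BY c1
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS grp
        FROM (
            SELECT c1,
                   -- LAG is NULL on the first row, so that row falls to ELSE 1.
                   CASE WHEN LAG(c2, 1) OVER (ORDER BY c1) = c2
                         AND LAG(c3, 1) OVER (ORDER BY c1) = c3
                        THEN 0
                        ELSE 1
                   END AS clk
            FROM t
        ) AS v
        ORDER BY c1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for row in flags_and_groups():
    print(row)
# clk is 1, 0, 0, 1, 0, 1, 0, 1 and grp is 1, 1, 1, 2, 2, 3, 3, 4.
```

Each 1 in clk marks the start of a new run, so the running total gives every run its own group number for the final GROUP BY.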

SQL Server : unsure how to retrieve selected records

In this example, can I retrieve only rows BB, DD, and FF using T-SQL syntax and a single select statement?
C1 | C2 | C3 | C4
-----------------
AA | KK | 11 | 99
BB | KK | 11 | 99
CC | KK | 22 | 99
DD | KK | 22 | 99
EE | KK | 33 | 99
FF | KK | 33 | 99
Ok, so this is what I ended up with to solve my problem: SELECT distinct [C4], [C1], [C2], [C3] FROM [Table] where [C4] = 'MyValue' order by [C3] desc.
Give this a shot:
SELECT C1, C2, C3, C4 FROM mytable WHERE C1 IN ('BB', 'DD', 'FF')
If you want one row per distinct C3 value, the query should be:
SELECT MAX(C1), C2, C3, C4 FROM mytable GROUP BY C2, C3, C4
Try this
SELECT t.C1, t.C2, t.C3, t.C4 FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY C3 ORDER BY C1 DESC) AS seqnum
FROM MyTable
) t
WHERE seqnum = 1
This would work for your particular example:
SELECT
C1 = MAX(C1),
C2,
C3,
C4
FROM atable
GROUP BY
C2,
C3,
C4
;
If picking the right value from C1 should follow a more complex logic than just getting the MAX() one, you'll probably need to use @bvr's suggestion (tuning the ORDER BY clause properly).
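For the sample in the question, the MAX() and ROW_NUMBER() queries return the same three rows. The sketch below confirms that in SQLite through Python, used here as a stand-in for SQL Server; the helper name run and the ORDER BY C3 added to make the comparison deterministic are assumptions.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("AA", "KK", 11, 99), ("BB", "KK", 11, 99), ("CC", "KK", 22, 99),
    ("DD", "KK", 22, 99), ("EE", "KK", 33, 99), ("FF", "KK", 33, 99),
]

def run(query):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (C1 TEXT, C2 TEXT, C3 INTEGER, C4 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result

by_max = run("""
    SELECT MAX(C1), C2, C3, C4
    FROM mytable
    GROUP BY C2, C3, C4
    ORDER BY C3
""")

by_row_number = run("""
    SELECT t.C1, t.C2, t.C3, t.C4
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY C3 ORDER BY C1 DESC) AS seqnum
        FROM mytable
    ) t
    WHERE seqnum = 1
    ORDER BY t.C3
""")

print(by_max)
# [('BB', 'KK', 11, 99), ('DD', 'KK', 22, 99), ('FF', 'KK', 33, 99)]
print(by_max == by_row_number)  # True
```

The two only agree here because C2 and C4 are constant within each C3 value; with other data, the ROW_NUMBER() version returns whole rows while MAX() aggregates C1 on its own.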