How to get a count of unique rows? - sql

I'm trying to get a count of rows that differ only in one column so I can find out which row is "historically" correct by determining the most frequently occurring combination. The rows will look something like this:
RowAVal1 | RowAVal2 | RowAVal3 | DiffVal1
RowAVal1 | RowAVal2 | RowAVal3 | DiffVal1
RowAVal1 | RowAVal2 | RowAVal3 | DiffVal2
RowAVal1 | RowAVal2 | RowBVal1 | DiffVal1
For this example, for the RowAVal1 | RowAVal2 | RowAVal3 combination, the rows with DiffVal1 would be the historically correct combination because it appears the most. I need to figure out how to count these rows.

If I understand correctly, you want the most common value of the fourth column for combinations of the first three. This is called the mode in statistics and is easy to calculate with aggregation and window functions:
select t.*
from (select col1, col2, col3, col4, count(*) as cnt,
row_number() over (partition by col1, col2, col3 order by count(*) desc) as seqnum
from t
group by col1, col2, col3, col4
) t
where seqnum = 1;
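As a minimal sketch, the query above can be tried against the sample rows from the question using Python's built-in sqlite3 module. The in-memory database, the function name most_common_combinations, and the trailing ORDER BY are assumptions for illustration only, and window functions need SQLite 3.25 or later.

```python
import sqlite3

def most_common_combinations(rows):
    """Return, for each (col1, col2, col3) group, the most frequent col4 and its count."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("create table t (col1 text, col2 text, col3 text, col4 text)")
    conn.executemany("insert into t values (?, ?, ?, ?)", rows)
    query = """
        select col1, col2, col3, col4, cnt
        from (select col1, col2, col3, col4, count(*) as cnt,
                     row_number() over (partition by col1, col2, col3
                                        order by count(*) desc) as seqnum
              from t
              group by col1, col2, col3, col4
             ) s
        where seqnum = 1
        order by col1, col2, col3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample rows copied from the question.
sample = [
    ("RowAVal1", "RowAVal2", "RowAVal3", "DiffVal1"),
    ("RowAVal1", "RowAVal2", "RowAVal3", "DiffVal1"),
    ("RowAVal1", "RowAVal2", "RowAVal3", "DiffVal2"),
    ("RowAVal1", "RowAVal2", "RowBVal1", "DiffVal1"),
]
print(most_common_combinations(sample))
```

If two values of the fourth column tie for the highest count, row_number() picks one of them arbitrarily; rank() would keep all tied rows instead.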

Related

SQL - Create a formatted output with placeholder rows

For reasons of our IT department, I am stuck doing this entirely within an SQL query.
Simplified, I have this as an input table:
And I need to create this:
And I am just not sure where to start with this. In my normal C# way of thinking it's easy. Column1 is ordered; if the value in Col1 is new, then add a new row to the output and put the contents of column1 in the output. Then, whilst the contents of the input Column1 are unchanged, keep adding the contents of column2 to new rows.
In SQL... nope, I just cannot see the right way to start!
This is a presentation issue that can easily be handled in the application or presentation layer. In SQL this can be clunky. The goal of a database is not to render a UI but to store and retrieve data quickly and efficiently, in order to serve as many clients as possible under the same hardware and software resource constraints.
The query that could do this can look like:
with
y as (
select col1, row_number() over(order by col1) as r1
from (select distinct col1 as col1 from t) x
),
z as (
select
t.col1, y.r1, t.col2,
row_number() over(partition by t.col1 order by t.col2) as r2
from t
join y on y.col1 = t.col1
)
select col1, col2
from (
select col1, null as col2, r1, 0 as r2 from y
union all
select null, col2, r1, r2 from z
) w
order by r1, r2
As you see, it looks clunky and bloated.
You need a header row for each group which will consist of col1 and null and all the rows of the table with null as col1.
You can do it with UNION ALL and conditional sorting:
select
case when t.col2 is null then t.col1 end col1,
t.col2
from (
select col1, col2 from tablename
union all
select distinct col1, null from tablename
) t
order by
t.col1,
case when t.col2 is null then 1 else 2 end,
t.col2
See the demo (for MySQL, but it is standard SQL).
Results:
| col1 | col2 |
| ---- | ----- |
| SetA | |
| | BH101 |
| | BH102 |
| | BH103 |
| SetB | |
| | BH201 |
| | BH202 |
| | BH203 |
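The UNION ALL approach can be checked with a small sketch using Python's built-in sqlite3 module. The input rows are inferred from the results table above, and the in-memory database and the function name with_header_rows are assumptions for illustration.

```python
import sqlite3

def with_header_rows(rows):
    """Return one header row per col1 value, followed by that group's col2 rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("create table tablename (col1 text, col2 text)")
    conn.executemany("insert into tablename values (?, ?)", rows)
    query = """
        select case when t.col2 is null then t.col1 end as col1,
               t.col2
        from (select col1, col2 from tablename
              union all
              select distinct col1, null from tablename
             ) t
        order by t.col1,
                 case when t.col2 is null then 1 else 2 end,
                 t.col2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Input inferred from the results table: two sets with three items each.
sample = [
    ("SetA", "BH101"), ("SetA", "BH102"), ("SetA", "BH103"),
    ("SetB", "BH201"), ("SetB", "BH202"), ("SetB", "BH203"),
]
for row in with_header_rows(sample):
    print(row)
```

The sort uses the underlying t.col1 rather than the displayed value, which is what keeps each header directly above its own detail rows.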
I agree, formatting should be done outside of SQL, but if you have no choice, here is some SQL Server code that will generate your output
select *
from (
select top 100
case
when col2 is null then ' '+col1
else '' end as firstCol,
IsNull(col2,'') as Col2
from dbo.test t1
group by col1,col2 with rollup
order by col1,col2
) x
where x.firstcol is not null

ORACLE conditional COUNT query

001 | 9441 | P021948
001 | 9442 | P021948
001 | 9443 | P021950
001 | 9444 | P021951
001 | 9445 | P021952
001 | 9446 | P021948
In the above table I am looking to COUNT the values in the third column, treating rows whose second-column values are within 1 of each other (+/- 1) as a single occurrence.
In other words, I am trying to achieve a count of 2 for P021948 because values 9441 and 9442 are within 1 of each other and record 9446 is outside of that range. My intent is to achieve a total count of 5 given these conditions.
How could I go about querying?
Any advice is greatly appreciated!
Hmmm, I'm thinking you want to count the "islands" that are separated by a value of more than 1. If so:
select count(*)
from (select t.*, lag(col2) over (partition by col1, col3 order by col2) as prev_col2
from t
) t
where prev_col2 is null or col2 - prev_col2 > 1;
Here is a rextester illustration of the query and the result.
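For a quick local check of the lag() approach, here is a minimal sketch using Python's built-in sqlite3 module with the rows from the question. The column names col1, col2, col3 follow the answer's query; the in-memory database and the function name count_islands are assumptions, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

def count_islands(rows):
    """Count runs of rows per (col1, col3) whose col2 values are consecutive."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("create table t (col1 text, col2 integer, col3 text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    query = """
        select count(*)
        from (select t.*,
                     lag(col2) over (partition by col1, col3 order by col2) as prev_col2
              from t
             ) s
        where prev_col2 is null or col2 - prev_col2 > 1
    """
    (total,) = conn.execute(query).fetchone()
    conn.close()
    return total

# Rows copied from the question.
sample = [
    ("001", 9441, "P021948"),
    ("001", 9442, "P021948"),
    ("001", 9443, "P021950"),
    ("001", 9444, "P021951"),
    ("001", 9445, "P021952"),
    ("001", 9446, "P021948"),
]
print(count_islands(sample))  # 5
```

A row starts a new island when it has no predecessor in its partition or when the gap to the predecessor exceeds 1, so P021948 contributes 2 and each of the other three codes contributes 1.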
select column1, column3,
sum(case when lag(column3, 1, 0) over(order by column3)=column3 or
lead(column3, 1, 0) over(order by column3)=column3 then 1 else 0 end)
from yourtable
group by column1, column3

Group Concat in Redshift

I have a table like this:
| Col1 | Col2 |
|:-----------|------------:|
| 1 | a;b; |
| 1 | b;c; |
| 2 | c;d; |
| 2 | d;e; |
I want the result to be something like this:
| Col1 | Col2 |
|:-----------|------------:|
| 1 | a;b;c;|
| 2 | c;d;e;|
Is there some way to write a set function that collects the unique values in a column into an array and then displays them? I am using Redshift, which is mostly based on PostgreSQL, apart from its unsupported PostgreSQL functions.
Have a look at Redshift's listagg() function which is similar to MySQL's group_concat. You would need to split the items first and then use listagg() to give you a list of values. Do take note, though, that, as the documentation states:
LISTAGG does not support DISTINCT expressions
(Edit: As of 11th October 2018, DISTINCT is now supported. See the docs.)
So you will have to take care of that yourself. Assuming you have the following table set up:
create table _test (col1 int, col2 varchar(10));
insert into _test values (1, 'a;b;'), (1, 'b;c;'), (2, 'c;d;'), (2, 'd;e;');
Fixed number of items in Col2
Perform as many split_part() operations as there are items in Col2:
select
col1
, listagg(col2, ';') within group (order by col2)
from (
select col1, split_part(col2, ';', 1) as col2 from _test
union select col1, split_part(col2, ';', 2) as col2 from _test
)
group by col1
;
Varying number of items in Col2
You would need a helper here. If there are more rows in the table than items in Col2, a workaround with row_number() could work (but is expensive for large tables):
with _helper as (
select
(row_number() over())::int as part_number
from
_test
),
_values as (
select distinct
col1
, split_part(col2, ';', part_number) as col2
from
_test, _helper
where
length(split_part(col2, ';', part_number)) > 0
)
select
col1
, listagg(col2, ';') within group (order by col2) as col2
from
_values
group by
col1
;
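The logic both queries implement (split each packed value, drop empty and duplicate items, then re-join per group) can be sketched in plain Python to make the expected result concrete. This is only an illustration of the idea, not Redshift code, and the function name merge_delimited is hypothetical.

```python
from collections import defaultdict

def merge_delimited(rows, sep=";"):
    """Collect the distinct items of each key's packed strings and re-join them in sorted order."""
    groups = defaultdict(set)
    for key, packed in rows:
        for item in packed.split(sep):
            if item:  # skip the empty piece left by the trailing separator
                groups[key].add(item)
    return {key: sep.join(sorted(items)) for key, items in sorted(groups.items())}

# Same rows as the _test table above.
sample = [(1, "a;b;"), (1, "b;c;"), (2, "c;d;"), (2, "d;e;")]
print(merge_delimited(sample))
```

As with listagg(), the joined string has no trailing separator, so group 1 comes out as a;b;c rather than a;b;c;.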

How to retrieve 2nd latest date from a table

I am trying to retrieve second latest date from a table. For example, consider this as my table:
COL1| COL2| COL3
---------------------
A | 1 | 25-JUN-14
B | 1 | 25-JUN-14
C | 1 | 25-JUN-14
A | 1 | 24-JUN-14
B | 1 | 24-JUN-14
C | 1 | 24-JUN-14
A | 1 | 23-JUN-14
B | 1 | 23-JUN-14
C | 1 | 23-JUN-14
I came up with this query, which gets the result I want (the 2nd latest date).
SELECT sub.COL1, sub.COL2, MAX(sub.COL3)
FROM (SELECT t.COL1, t.COL2, t.COL3
FROM test t
GROUP BY t.COL1, t.COL2, t.COL3
HAVING MAX(t.COL3) < (
SELECT MAX(COL3)
FROM test sub
WHERE sub.COL1=t.COL1 AND sub.COL2=t.COL2
GROUP BY COL1, COL2)) sub
GROUP BY sub.COL1, sub.COL2;
As you can see, it's a big and messy statement with multiple nested subqueries just to get the 2nd latest date. I would love to learn an elegant solution to my problem rather than this mess. Appreciate your help.. :)
PS: I am not allowed to use 'WITH' command.. :(
If I understand correctly, you can do:
select t.*
from (select t.*,
dense_rank() over (order by col3 desc) as seqnum
from test t
) t
where seqnum = 2;
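A minimal sketch of the dense_rank() query, run through Python's built-in sqlite3 module. The dates are stored as ISO strings (an assumption, so that they sort correctly as text in SQLite), the function name second_latest_rows is hypothetical, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

def second_latest_rows(rows):
    """Return all rows whose col3 is the second most recent distinct date."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("create table test (col1 text, col2 integer, col3 text)")
    conn.executemany("insert into test values (?, ?, ?)", rows)
    query = """
        select col1, col2, col3
        from (select t.*,
                     dense_rank() over (order by col3 desc) as seqnum
              from test t
             ) s
        where seqnum = 2
        order by col1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The question's table, with dates rewritten as ISO strings.
sample = [
    (name, 1, day)
    for day in ("2014-06-25", "2014-06-24", "2014-06-23")
    for name in ("A", "B", "C")
]
print(second_latest_rows(sample))
```

dense_rank() gives every row sharing a date the same rank without gaps, so seqnum = 2 selects the second latest date even though the latest date appears on several rows.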
You can try like this:-
SELECT col1, col2, MAX(col3)
FROM TEST
WHERE col3 < (SELECT MAX(col3)
FROM TEST)
GROUP BY col1, col2;
Sql Fiddle Demo

How to deal with subquery which returns more than one value

Here is my table:
+----+------+------+
| ID | Col1 | Col2 |
+----+------+------+
| 11 | 156 | 48 |
| 12 | 5 | 22 |
| 13 | 156 | 32 |
+----+------+------+
What I want to do is
SELECT ID FROM Table1 WHERE Col1 = (SELECT MAX(col1) FROM Table1)
but since the result will be assigned to a declared variable inside a stored procedure, this will give the error "Subquery returned more than 1 value".
If this error happens, I want to take the matching IDs (11 and 13 here), select the Min(col2) among just those rows, and return that single ID.
Is it possible to catch the two IDs? If yes, how can I do this?
Alternate answer which will work on basically every DB. (Just use LIMIT instead of TOP in some cases.)
SELECT TOP 1 ID FROM Table1 ORDER BY Col1 DESC, Col2 ASC
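As a quick check of the ordering trick, here is a minimal sketch using Python's built-in sqlite3 module, which uses LIMIT 1 in place of TOP 1. The in-memory database and the function name id_of_max_col1 are assumptions for illustration.

```python
import sqlite3

def id_of_max_col1(rows):
    """Return the ID with the highest Col1, breaking ties by the lowest Col2."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("create table Table1 (ID integer, Col1 integer, Col2 integer)")
    conn.executemany("insert into Table1 values (?, ?, ?)", rows)
    # LIMIT 1 is the SQLite equivalent of SQL Server's TOP 1.
    row = conn.execute(
        "select ID from Table1 order by Col1 desc, Col2 asc limit 1"
    ).fetchone()
    conn.close()
    return row[0] if row else None

# Rows copied from the question.
sample = [(11, 156, 48), (12, 5, 22), (13, 156, 32)]
print(id_of_max_col1(sample))  # 13
```

Because the sort handles the tie, no subquery is needed and the "more than 1 value" error cannot occur.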
Last edit: I noticed your sql-server tag. However, this will only work on PostgreSQL :(
This will give you the exact answer you are looking for.
SELECT DISTINCT ON (Col1) ID
FROM Table1
WHERE Col1 = (SELECT MAX(col1) FROM Table1)
ORDER BY Col1, Col2 ASC