Distinct particular field in select query - sql

I have a table with the sample values below.
|Id|Keyword|insertedon|
|:-|:------|:---------|
|1 | abcd | 13/12/20 |
|2 | cdef | 14/12/20 |
|3 | abcd | 14/12/20 |
|4 | defg | 14/12/20 |
From the above table I need the distinct keyword values, ordered by insertedon descending.
I need the 5 most recent results.
Expected Result:
defg
abcd
cdef
Please let me know how to achieve this.

You get the top 5 results with TOP(5) in SQL Server. You'd order the keywords by their last insertedon date:
select top(5) keyword
from mytable
group by keyword
order by max(insertedon) desc;
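As a rough way to try the grouping approach above without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It makes several assumptions that are not in the question: SQLite has no TOP, so LIMIT 5 stands in for it; the dates are stored as ISO strings so that MAX() compares them chronologically; and MAX(id) is added as a tie-breaker, because three of the sample keywords share the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, keyword TEXT, insertedon TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "abcd", "2020-12-13"),
        (2, "cdef", "2020-12-14"),
        (3, "abcd", "2020-12-14"),
        (4, "defg", "2020-12-14"),
    ],
)

# One row per keyword, newest first; MAX(id) breaks ties on the date (assumption).
top_keywords = [
    row[0]
    for row in conn.execute(
        """
        SELECT keyword
        FROM mytable
        GROUP BY keyword
        ORDER BY MAX(insertedon) DESC, MAX(id) DESC
        LIMIT 5
        """
    )
]
print(top_keywords)  # ['defg', 'abcd', 'cdef']
```

Without the tie-breaker, the order of keywords that share the same latest date is not guaranteed.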

If you are looking for latest entries based on insertedon column, you can find using the group by clause, something like this:
select keyword, max(insertedon)
from table
group by keyword
order by 2 desc

You can just use select distinct:
select distinct keyword
from t;
If you wanted a full row, you could use row_number():
select t.*
from (select t.*,
row_number() over (partition by keyword order by newid()) as seqnum
from t
) t
where seqnum = 1;
EDIT:
For the edited version, you can use:
select distinct keyword
from (select top (5) keyword
from t
order by insertedon desc
) k

Assign a row number within each keyword, ordered by the date column descending, and then select the rows with row number 1.
Query
;with cte as(
select [rn] = row_number() over(
partition by [keyword]
order by [insertedon] desc, [id] desc
), *
from [your_table_name] -- placeholder: substitute the actual table name
)
select [keyword] from cte
where [rn] = 1;

You can use the analytical functions as follows:
select t.* from
(select t.*,
row_number() over (partition by keyword order by insertedon desc) as rn,
Dense_rank() over (order by insertedon desc) as dr
from t ) t where rn = 1 and dr <= 5;

Related

Group sequential repeated values sqlite

I have data that repeats sequentially:
A
A
A
B
B
B
A
A
A
I need to group them like this
A
B
A
What is the best approach to do so using sqlite?
Assuming that you have a column that defines the ordering of the rows, say id, you can address this gaps-and-islands problem with window functions:
select col, count(*) cnt, min(id) first_id, max(id) last_id
from (
select t.*,
row_number() over(order by id) rn1,
row_number() over(partition by col order by id) rn2
from mytable t
) t
group by col, rn1 - rn2
order by min(id)
I added a few columns to the resultset that give more information about the content of each group.
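To see why grouping by rn1 - rn2 separates the runs, here is a small sketch run through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later) and an id column generated in insertion order; the table contents are the nine sample values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col TEXT)")
# ids 1..9 are assigned in insertion order (assumption: id defines the ordering).
conn.executemany("INSERT INTO mytable (col) VALUES (?)", [(c,) for c in "AAABBBAAA"])

# The difference between the two row numbers is constant within each run.
diffs = conn.execute(
    """
    SELECT id, col,
           ROW_NUMBER() OVER (ORDER BY id)
             - ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS grp
    FROM mytable
    ORDER BY id
    """
).fetchall()

# Grouping by (col, rn1 - rn2) therefore yields one row per run.
islands = conn.execute(
    """
    SELECT col, COUNT(*) AS cnt, MIN(id) AS first_id, MAX(id) AS last_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (ORDER BY id) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS rn2
        FROM mytable t
    ) t
    GROUP BY col, rn1 - rn2
    ORDER BY MIN(id)
    """
).fetchall()
print(islands)  # [('A', 3, 1, 3), ('B', 3, 4, 6), ('A', 3, 7, 9)]
```

Note that the B run and the second A run both have a difference of 3, which is why col has to be part of the GROUP BY as well.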
If you have defined a column that defines the order of the rows, like an id, you can use window function LEAD():
select col
from (
select col, lead(col, 1, '') over (order by id) next_col
from tablename
)
where col <> next_col
Results:
| col |
| --- |
| A |
| B |
| A |
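The LEAD() variant can be checked the same way. This is only a sketch through Python's sqlite3 module, assuming SQLite 3.25 or later and an id column assigned in insertion order; it keeps the last row of each run, because that is the row whose next value differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO tablename (col) VALUES (?)", [(c,) for c in "AAABBBAAA"])

# A row survives when the next row (by id) holds a different value;
# the default '' makes the very last row survive as well.
runs = [
    row[0]
    for row in conn.execute(
        """
        SELECT col
        FROM (
            SELECT id, col, LEAD(col, 1, '') OVER (ORDER BY id) AS next_col
            FROM tablename
        )
        WHERE col <> next_col
        ORDER BY id
        """
    )
]
print(runs)  # ['A', 'B', 'A']
```

The empty-string default only works as long as '' is not a real value in col.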

redshift: how to find row_number after grouping and aggregating?

Suppose I have a table of customer purchases ("my_table") like this:
--------------------------------------
customerid | date_of_purchase | price
-----------|------------------|-------
1 | 2019-09-20 | 20.23
2 | 2019-09-21 | 1.99
1 | 2019-09-21 | 123.34
...
I'd like to be able to find the nth highest spending customer in this table (say n = 5). So I tried this:
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (partition by customerid order by total_pay desc) as rn
from my_table group by customerid order by total_pay desc)
select * from cte where rn = 5;
But this gives me nonsense results. For some reason rn doesn't seem to be unique (for example there are a bunch of customers with rn = 1). I don't understand why. Isn't rn supposed to be just a row number?
Remove the partition by in the definition of row_number():
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (order by total_pay desc) as rn
from my_table
group by customerid
)
select *
from cte
where rn = 5;
You are already aggregating by customerid, so each customer has only one row. With partition by customerid, every partition then contains exactly one row, so rn is always 1.
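The difference can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module rather than Redshift, so it is only an approximation: the rows for customer 3 are made up for illustration, n is 2 instead of 5 because the sample is small, and the totals are computed in a separate CTE first, which is a portability choice for this sketch and not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (customerid INTEGER, date_of_purchase TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2019-09-20", 20.23),
        (2, "2019-09-21", 1.99),
        (1, "2019-09-21", 123.34),
        (3, "2019-09-22", 50.00),  # hypothetical extra customer
        (3, "2019-09-23", 10.00),  # hypothetical extra customer
    ],
)

totals_cte = """
    WITH totals AS (
        SELECT customerid, SUM(price) AS total_pay
        FROM my_table
        GROUP BY customerid
    )
"""

# Partitioning by customerid: one row per partition, so rn is 1 everywhere.
partitioned = conn.execute(
    totals_cte
    + """
    SELECT customerid,
           ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY total_pay DESC) AS rn
    FROM totals
    ORDER BY customerid
    """
).fetchall()

# No partition: rn numbers the customers from highest to lowest total.
ranked = conn.execute(
    totals_cte
    + """
    SELECT customerid, ROW_NUMBER() OVER (ORDER BY total_pay DESC) AS rn
    FROM totals
    ORDER BY rn
    """
).fetchall()

n = 2
nth_customer = [cid for cid, rn in ranked if rn == n]
print(partitioned)   # [(1, 1), (2, 1), (3, 1)]
print(ranked)        # [(1, 1), (3, 2), (2, 3)]
print(nth_customer)  # [3]
```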

Min Date from one column multiple rows

My apologies, I should have included every column and the complete problem, not just a portion of it.
I have a table A which stores all invoices issued (id 1) and payments received (id 4) from clients. Sometimes clients pay in 2-3 installments. I want to find the date difference between the invoice being issued and the last payment collected for that invoice. My data looks like this:
a.cltid|a.Invnum|a.Cash|a.Date |a.type|a.status
70 |112 |-200 |2012-03-01|4 |P
70 |112 |-500 |2012-03-12|4 |P
90 |124 |-550 |2012-01-20|4 |P
70 |112 |700 |2012-02-20|1 |p
55 |101 |50 |2012-01-15|1 |d
90 |124 |550 |2012-01-15|1 |P
I am running
Select *, Datediff(dd,T.date,P.date)
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=1
group by a.cltid, a.invnumber,a.cash)T
join
Select *
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=4
group by a.cltid, a.invnumber,a.cash)P
on
T.invnumb=P.invnumber and T.cltid=P.cltid
How can I make it work so that it shows me the following?
70|112|-500|2012-03-12|4|P 70|112|700|2012-02-20|1|p|22
90|124|-550|2012-01-20|4|P 90|124|550|2012-01-15|1|P|5
Edited***
You can use row_number to assign sequence number within each cltid in the order of decreasing date and then filter to get the first row for each cltid which will be the row with latest date for that cltid:
select *
from (
select A.*,
row_number() over (
partition by a.cltid order by a.date desc
) rn
from table.A as A
) t
where rn = 1;
It will return one row (with latest date) for each client. If you want to return all the rows which have latest date, use rank() instead.
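Here is a small sketch of the row_number() versus rank() difference, run through Python's sqlite3 module (SQLite 3.25 or later assumed). The first six rows are the sample data from the question; the last row is a hypothetical addition, so that client 90 has two rows sharing the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE a (cltid INTEGER, Invnum INTEGER, Cash INTEGER, '
    '"date" TEXT, type INTEGER, status TEXT)'
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?, ?)",
    [
        (70, 112, -200, "2012-03-01", 4, "P"),
        (70, 112, -500, "2012-03-12", 4, "P"),
        (90, 124, -550, "2012-01-20", 4, "P"),
        (70, 112, 700, "2012-02-20", 1, "p"),
        (55, 101, 50, "2012-01-15", 1, "d"),
        (90, 124, 550, "2012-01-15", 1, "P"),
        (90, 125, -100, "2012-01-20", 4, "P"),  # hypothetical tie on the latest date
    ],
)

def latest_per_client(window_fn):
    """Return (cltid, date) for rows numbered 1 by the given window function."""
    return conn.execute(
        f"""
        SELECT cltid, "date"
        FROM (
            SELECT a.*,
                   {window_fn}() OVER (PARTITION BY cltid ORDER BY "date" DESC) AS rn
            FROM a
        ) t
        WHERE rn = 1
        ORDER BY cltid
        """
    ).fetchall()

with_row_number = latest_per_client("ROW_NUMBER")  # exactly one row per client
with_rank = latest_per_client("RANK")              # every row tied for the latest date
print(with_row_number)
print(with_rank)
```

With row_number(), which of the two tied rows for client 90 is kept is arbitrary unless the order by includes a tie-breaker.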
Use a ranking function to get all the columns:
select a.*
from (select a.*,
row_number() over (partition by cltid order by date desc) as seqnum
from a
) a
where seqnum = 1;
Use aggregation if you only want the date. The issue with your query is that the group by clause has too many columns:
select a.cltid, max(a.date) as date
from table.A as A
group by a.cltid;
And the fact that min() returns the first date not the last date.
There are many ways to do this. Here are some of them:
test setup: http://rextester.com/VGUY60367
common table expression with row_number() version:
with cte as (
select *
, rn = row_number() over (
partition by cltid, Invnum
order by [date] desc
)
from a
)
select cltid, Invnum, Cash, [date]
from cte
where rn = 1
cross apply version:
select distinct
a.cltid
, a.Invnum
, x.Cash
, x.[date]
from a
cross apply (
select top 1
cltid, Invnum
, [date]
, Cash
from a as i
where i.cltid =a.cltid
and i.Invnum=a.Invnum
order by i.[date] desc
) as x;
top with ties version:
select top 1 with ties
*
from a
order by
row_number() over (
partition by cltid, Invnum
order by [date] desc
)
all return:
+-------+--------+---------------------+------+
| cltid | Invnum | date | Cash |
+-------+--------+---------------------+------+
| 70 | 112 | 12.03.2012 00:00:00 | -500 |
| 90 | 124 | 20.01.2012 00:00:00 | -550 |
+-------+--------+---------------------+------+
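You can get the latest date for each invoice like this. Group only by the invoice key: including a.date in the group by would put every distinct date in its own group, so max() would have nothing to aggregate.
Select
a.cltid, a.invnumber, max(a.date) [date]
from
YourTable a
group by
a.cltid, a.invnumber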

Need to delete duplicate records from the table using row_number()

I have a table test with data as follows, and I want to delete the row with TrsId 124. I have millions of entries in my DB; this is just a sample scenario. The goal is to delete duplicate entries from the table.
--------------------------------------------
TrsId | ID | Name |
--------------------------------------------
123 | 1 | ABC |
124 | 1 | ABC |
I am trying something like
delete from test
select T.* from
(
select ROW_NUMBER() over (partition by ID order by name) as r,
Trsid,
ID,
name
from test
) t
where r = 2
An update instead of a delete would also be OK for me:
update test set id=NULL
select T.* from
(
select ROW_NUMBER() over (partition by ID order by name) as r,
Trsid,
ID,
name
from test
) t
where r = 2
But if I run the delete query, it deletes all the records from table test. And if I run the update, it updates both records.
I don't know what I am doing wrong here.
WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION by ID ORDER BY name) AS Row
FROM test
)
DELETE FROM cte
WHERE Row > 1
Use the below query.
;WITH cte_1
AS (SELECT ROW_NUMBER() OVER(PARTITION BY ID,NAME ORDER BY TrsId ) Rno,*
FROM YourTable)
DELETE
FROM cte_1
WHERE RNO>1
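A likely reason the attempts in the question affect every row is that "delete from test" followed by a select is parsed as two separate statements, so the delete runs with no filter at all. Deleting through a CTE, as above, is SQL Server syntax. For engines that do not allow it, a possible variant is to delete by key using a subquery. The sketch below tries that variant with Python's sqlite3 module (SQLite 3.25 or later assumed); the third row is a hypothetical non-duplicate added to show that unrelated rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (TrsId INTEGER PRIMARY KEY, ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (123, 1, "ABC"),
        (124, 1, "ABC"),
        (125, 2, "XYZ"),  # hypothetical row that has no duplicate
    ],
)

# Number the rows within each (ID, Name) group and delete everything after
# the first one, identified by its TrsId.
conn.execute(
    """
    DELETE FROM test
    WHERE TrsId IN (
        SELECT TrsId
        FROM (
            SELECT TrsId,
                   ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY TrsId) AS rno
            FROM test
        )
        WHERE rno > 1
    )
    """
)
remaining = conn.execute("SELECT TrsId, ID, Name FROM test ORDER BY TrsId").fetchall()
print(remaining)  # [(123, 1, 'ABC'), (125, 2, 'XYZ')]
```

Ordering by TrsId inside the window makes the choice deterministic: the lowest TrsId in each group is the row that is kept.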
WITH cte_DUP AS (
SELECT * FROM (
select <col1,col2,col3..coln>, row_number()
over(partition by <col1,col2,col3..coln>
order by <col1,col2,col3..coln> ) rownumber
from <your table> ) AB WHERE rownumber > 1)
DELETE FROM cte_DUP WHERE ROWNUMBER > 1
To find duplicate records, we can write a query like the one below:
;WITH dup_val
AS (SELECT a,
b,
Row_number()
OVER(
partition BY a, b
ORDER BY b, NAME)AS [RANK]
FROM table_name)
SELECT *
FROM dup_val
WHERE [rank] <> 1;

How to select max of count in PostgreSQL

I have table in PostgreSQL with the following schema:
Category | Type
------------+---------
A | 0
C | 11
B | 5
D | 1
D | 0
F | 2
E | 11
E | 9
. | .
. | .
How can I select, for each category, the type with the maximum number of occurrences? The following gives me all of them:
SELECT
category,
type,
COUNT(*)
FROM
table
GROUP BY
category,
type
ORDER BY
category,
count
DESC
My expected result is something like this:
Cat |Type |Count
--------+-------+------
A |0 |5
B |5 |30
C |2 |20
D |3 |10
That is the type with max occurrence in each category with count of that type.
You can use the following query:
SELECT category, type, cnt
FROM (
SELECT category, type, cnt,
RANK() OVER (PARTITION BY category
ORDER BY cnt DESC) AS rn
FROM (
SELECT category, type, COUNT(type) AS cnt
FROM mytable
GROUP BY category, type ) t
) s
WHERE s.rn = 1
The above query uses your own query as posted in the OP and applies RANK() windowed function to it. Using RANK() we can specify all records coming from the initial query having the greatest COUNT(type) value.
Note: If more than one type has the maximum number of occurrences for a specific category, then all of them will be returned by the above query, as a consequence of using RANK.
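The tie behaviour can be seen on a few rows. This is a sketch through Python's sqlite3 module rather than PostgreSQL (SQLite 3.25 or later assumed), and the data is invented for illustration: category B has two types that occur once each. The ROW_NUMBER() query adds type as a tie-breaker, which is an assumption of this sketch, so that its single pick per category is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, type INTEGER)")
# Invented sample: in category B, types 5 and 6 are tied with one occurrence each.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", 0), ("A", 0), ("A", 1), ("B", 5), ("B", 6)],
)

counts = """
    SELECT category, type, COUNT(type) AS cnt
    FROM mytable
    GROUP BY category, type
"""

# RANK() gives every tied row the same rank, so both B rows come back.
with_rank = conn.execute(
    f"""
    SELECT category, type, cnt
    FROM (
        SELECT category, type, cnt,
               RANK() OVER (PARTITION BY category ORDER BY cnt DESC) AS rn
        FROM ({counts}) t
    ) s
    WHERE rn = 1
    ORDER BY category, type
    """
).fetchall()

# ROW_NUMBER() always returns one row per category.
with_row_number = conn.execute(
    f"""
    SELECT category, type, cnt
    FROM (
        SELECT category, type, cnt,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY cnt DESC, type) AS rn
        FROM ({counts}) t
    ) s
    WHERE rn = 1
    ORDER BY category
    """
).fetchall()
print(with_rank)        # [('A', 0, 2), ('B', 5, 1), ('B', 6, 1)]
print(with_row_number)  # [('A', 0, 2), ('B', 5, 1)]
```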
If I understand correctly, you can use window functions:
SELECT category, type, cnt
FROM (SELECT category, type, COUNT(*) as cnt,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(*) DESC) as seqnum
FROM table
GROUP BY category, type
) ct
WHERE seqnum = 1;
SELECT
category,
type,
COUNT(*)
FROM
table
GROUP BY
category,
type
HAVING
COUNT(*) = (SELECT MAX(C) FROM (SELECT COUNT(*) AS C FROM A GROUP BY A) AS Q)
EDITED:
I apologize to readers,
COUNT(*) = (SELECT MAX(COUNT(*)) FROM table GROUP BY category,type)
is the ORACLE version, postgresql version is:
COUNT(*) = (SELECT MAX(C) FROM (SELECT COUNT(*) AS C FROM A GROUP BY A) AS Q)
SELECT category, MAX(Occurence)
FROM (SELECT t.category AS category, COUNT(*) AS Occurence FROM table t GROUP BY t.category, t.type) AS q
GROUP BY category;
SELECT
category,
type,
COUNT(*) AS count
FROM
table
GROUP BY
category,
type
ORDER BY
category ASC