Pulling multiple entries based on ROW_NUMBER - sql

I got the row_num column from a partition. I want each Type to match with at least one Sent and one Resent. For example, Jon's row is removed below because there is no Resent. Kim's Sheet row is also removed because again, there is no Resent. I tried using a CTE to take all columns for a Code if row_num = 2 but Kim's Sheet row obviously shows up because they're all under one Code. If anyone could help, that'd be great!
Edit: I'm using SSMS 2018. There are multiple Statuses other than Sent and Resent.
What my table looks like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 123 | Jon | Sheet | Sent | 1 |
| 221 | Kim | Sheet | Sent | 1 |
| 221 | Kim | Book | Resent | 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
What I want it to look like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 221 | Kim | Book | Resent| 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
Here is my CTE code:
WITH CTE AS
(
SELECT *
FROM #MyTable
)
SELECT *
FROM #MyTable
WHERE Code IN (SELECT Code FROM CTE WHERE row_num = 2)

If sent and resent are the only values for status, then you can use:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and
t2.type = t.type and
t2.status <> t.status
);
You can also phrase this with window functions:
select t.*
from (select t.*,
min(status) over (partition by name, type) as min_status,
max(status) over (partition by name, type) as max_status
from t
) t
where min_status <> max_status;
Both of these can be tweaked if other status values are possible. However, based on your question and sample data, that does not seem necessary.
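Since the question's edit says other Status values do exist, here is one possible tweak, sketched with Python's built-in sqlite3 so it can be run as-is. It keeps every row of a name/type group that contains at least one Sent and at least one Resent. The table name t, the extra Pending row, and the use of SQLite instead of SQL Server are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (code INTEGER, name TEXT, type TEXT, status TEXT, row_num INTEGER);
INSERT INTO t VALUES
  (123, 'Jon', 'Sheet', 'Sent',    1),
  (221, 'Kim', 'Sheet', 'Sent',    1),
  (221, 'Kim', 'Book',  'Resent',  1),
  (221, 'Kim', 'Book',  'Sent',    2),
  (221, 'Kim', 'Book',  'Sent',    3),
  (221, 'Kim', 'Sheet', 'Pending', 2);  -- hypothetical extra status, not in the question
""")

# Keep a row only if its (name, type) group has a 'Sent' row AND a 'Resent' row.
query = """
SELECT t.*
FROM t
WHERE EXISTS (SELECT 1 FROM t s
              WHERE s.name = t.name AND s.type = t.type AND s.status = 'Sent')
  AND EXISTS (SELECT 1 FROM t r
              WHERE r.name = t.name AND r.type = t.type AND r.status = 'Resent')
ORDER BY t.code, t.type, t.row_num;
"""
kept_rows = conn.execute(query).fetchall()
for row in kept_rows:
    print(row)
```

On this data only Kim's three Book rows come back. A plain "status differs" test would also have kept Kim's Sheet rows here, because Pending differs from Sent.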

FIDDLE
CREATE TABLE Table1(ID integer,Name VARCHAR(10),Type VARCHAR(10),Status VARCHAR(10),row_num integer);
INSERT INTO Table1 VALUES
(123,'Jon','Sheet','Sent',1),
(221,'Kim','Sheet','Sent',1),
(221,'Kim','Book','Resent',1),
(221,'Kim','Book','Sent',2),
(221,'Kim','Book','Sent',3);
-- keep a row only if the same Name/Type also has the opposite status
SELECT t1.*
FROM Table1 t1
WHERE EXISTS (
    select 1
    from Table1 t2
    where t2.Name = t1.Name
    and t2.Type = t1.Type
    and t2.Status = case when t1.Status = 'Sent'
                         then 'Resent'
                         else 'Sent' end);

It would be easier if you provided scripts to create the table and load this test data, but try something like:
with a1 as (
select
name, type,
row_number() over (partition by code, Name, type, status order by (select null)) as rn -- SQL Server requires an ORDER BY here
from #MyTable
), a2 as (
select * from a1 where rn > 1
)
select t.*
from #MyTable as t
inner join a2 on t.name = a2.name and t.type = a2.type;
Here you
calculate another row number, partitioned by code, name, type and status,
then keep the rows where this new row number is > 1,
and finally join that back to the original table to get the rows you are interested in.
The syntax may vary on MSSQL, but you should give it a try. And please use better names than mine ;-)
This solution is quite generic because it doesn't rely on specific status values; none are hardcoded. You can easily control what matters by changing the partitions. Note that rn > 1 detects a Status that repeats within a Type, which fits the sample data but is not the same test as "has both Sent and Resent".
Fiddle

Related

How can I drop a row if repeated by category?

How would I go about dropping rows which have a duplicate, keeping one row based on its category?
For example, let's consider a sample table
Item | location | Status
------------------------
123 | A | done
123 | A | not_done
123 | B | Other
435 | D | Other
So essentially what I want to get to would be this table
Item | location | Status
------------------------
123 | A | done
435 | D | Other
I am not interested in the other status or location IF the status is done. If it is not "done" then I would show the following one.
Any clues if it is possible to create something like this in an SQL query?
Identify rows with done if any and prioritize them.
select * except (rn)
from (
select item, location, status
, row_number() over (
partition by item
order by case status when 'done' then 0 else 1 end
) as rn
from t
)
where rn = 1
(I didn't try it, excuse syntax errors please.)
Yes you can do it via exists condition like below after enabling standard SQL first.
select * from
yourtable A
where A.status = 'done'
or not exists
(
select 1 from yourtable B
where A.item = B.item
and B.status = 'done'
)
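To check the "prioritize done" idea against the sample table, here is a small sketch using Python's built-in sqlite3. It picks exactly one row per item, preferring a done row when there is one. The table name t and the use of SQLite's implicit rowid as a tie-breaker are assumptions for illustration; they are not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (item INTEGER, location TEXT, status TEXT);
INSERT INTO t VALUES
  (123, 'A', 'done'),
  (123, 'A', 'not_done'),
  (123, 'B', 'Other'),
  (435, 'D', 'Other');
""")

# For each item, find the single best row: 'done' sorts first,
# then insertion order (rowid) breaks ties. Keep only that row.
query = """
SELECT a.item, a.location, a.status
FROM t a
WHERE a.rowid = (
    SELECT b.rowid
    FROM t b
    WHERE b.item = a.item
    ORDER BY CASE b.status WHEN 'done' THEN 0 ELSE 1 END, b.rowid
    LIMIT 1)
ORDER BY a.item;
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

This returns one row for item 123 (the done one) and one for item 435, which matches the desired table in the question.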

How can I create header records by taking values from one of several line items?

I have a set of sorted line items. They are sorted first by ID then by Date:
| ID | DESCRIPTION | Date |
| --- | ----------- |----------|
| 100 | Red |2019-01-01|
| 101 | White |2019-01-01|
| 101 | White_v2 |2019-02-01|
| 102 | Red_Trim |2019-01-15|
| 102 | White |2019-01-16|
| 102 | Blue |2019-01-20|
| 103 | Red_v3 |2019-01-14|
| 103 | Red_v3 |2019-03-14|
I need to insert rows in a SQL Server table, which represents a project header, so that the first row for each ID provides the Description and Date in the destination table. There should only be one row in the destination table for each ID.
For example, the source table above would result in this at the destination:
| ID | DESCRIPTION | Date |
| --- | ----------- |----------|
| 100 | Red |2019-01-01|
| 101 | White |2019-01-01|
| 102 | Red_Trim |2019-01-15|
| 103 | Red_v3 |2019-01-14|
How do I collapse the source so that I take only the first row for each ID from source?
I prefer to do this with a transformation in SSIS but can use SQL if necessary. Actually, solutions for both methods would be most helpful.
This question is distinct from Trouble using ROW_NUMBER() OVER (PARTITION BY …)
in that this seeks to identify an approach. The asker of that question has adopted one approach, of more than one available as identified by answers here. That question is about how to make that particular approach work.
You can use row_number() :
select t.*
from (select t.*, row_number() over (partition by id order by date) as seq
from table t
) t
where seq = 1;
A correlated subquery will help here:
SELECT *
FROM yourtable t1
WHERE [Date] = (SELECT min([Date]) FROM yourtable WHERE id = t1.id)
use first_value window function
select * from (select *,
first_value(DESCRIPTION) over(partition by id order by Date) as des,
row_number() over(partition by id order by Date) rn
from table
) a where a.rn =1
You can use the ROW_NUMBER() window function to do this. For example:
select *
from (
select
id, description, date,
row_number() over(partition by id order by date) as rn
from t
) x
where rn = 1
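Because the goal is to insert one header row per ID, the first-row selection can feed an INSERT ... SELECT directly. Below is a minimal sketch using Python's built-in sqlite3 and the correlated MIN(date) approach. The table names line_items and project_header are hypothetical, and SQLite stands in for SQL Server here. If two rows of the same ID shared the earliest date, this form would produce two candidates, so it assumes the first date per ID is unique (as in the sample).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE line_items (id INTEGER, description TEXT, date TEXT);
CREATE TABLE project_header (id INTEGER PRIMARY KEY, description TEXT, date TEXT);

INSERT INTO line_items VALUES
  (100, 'Red',      '2019-01-01'),
  (101, 'White',    '2019-01-01'),
  (101, 'White_v2', '2019-02-01'),
  (102, 'Red_Trim', '2019-01-15'),
  (102, 'White',    '2019-01-16'),
  (102, 'Blue',     '2019-01-20'),
  (103, 'Red_v3',   '2019-01-14'),
  (103, 'Red_v3',   '2019-03-14');

-- Copy only the earliest-dated line item of each id into the header table.
INSERT INTO project_header (id, description, date)
SELECT s.id, s.description, s.date
FROM line_items s
WHERE s.date = (SELECT MIN(m.date) FROM line_items m WHERE m.id = s.id);
""")

headers = conn.execute(
    "SELECT id, description, date FROM project_header ORDER BY id"
).fetchall()
for row in headers:
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why MIN works on the TEXT column in this sketch.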

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table: expected result:
+--------+-----+--- +--------+-----+
|category|value|... |category|value|
+--------+-----+--- +--------+-----+
| 1 | a | | 1 | a |
| 1 | a | | 2 | b |
| 1 | b | | 3 | a # or b
| 2 | a | +--------+-----+
| 2 | b |
| 2 | c |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | b |
| 3 | b |
+--------+-----+---
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
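For anyone who wants to try the correlated-subquery form on the sample data, here is a runnable sketch with Python's built-in sqlite3 (SQLite being the engine the question plans to use). As there is no separate categories table in the sample, a derived table of distinct categories plays that role; that substitution is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, 'a'), (1, 'a'), (1, 'b'),
     (2, 'a'), (2, 'b'), (2, 'c'), (2, 'b'),
     (3, 'a'), (3, 'a'), (3, 'b'), (3, 'b')],
)

# For each category, count rows per value and take the value with the highest count.
query = """
SELECT c.category,
       (SELECT t2.value
        FROM some_table t2
        WHERE t2.category = c.category
        GROUP BY t2.value
        ORDER BY COUNT(*) DESC
        LIMIT 1) AS mode_value
FROM (SELECT DISTINCT category FROM some_table) c
ORDER BY c.category;
"""
modes = conn.execute(query).fetchall()
for row in modes:
    print(row)
```

Category 3 is a tie, so either a or b may come back for it, which the question explicitly allows.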
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value`=(
SELECT `value`
FROM `some_table`
WHERE `category`=`the_category`
GROUP BY `value`
ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
You can replace a part of this with WHERE `id`=( SELECT `id` if the table has a unique/primary key column, then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
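This returns the count of each value in each category.
select category, value, max(value_count) as value_count
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
order by category;
Actually we only need the most frequent value per category. In SQLite, when max() is used in an aggregate query, the bare column value is taken from the row that holds the maximum, so this should work there; other databases may reject such a query.
I could not test it, but IMHO it should work.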

SQL delete almost identical rows

I have a table that has 5 columns, and instead of an update I inserted all the rows again (stupid mistake). How can I get rid of the duplicated records? They are identical except for the id. I can't remove all records; I want to delete the duplicated half of them.
ex. table:
+-----+-------+--------+-------+
| id | name | name2 | user |
+-----+-------+--------+-------+
| 1 | nameA | name2A | u1 |
| 12 | nameA | name2A | u1 |
| 2 | nameB | name2B | u2 |
| 192 | nameB | name2B | u2 |
+-----+-------+--------+-------+
How to do this?
I'm using Microsoft Sql Server.
Try the following.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY Name, Name2, [User])
That is untested so may need adapting. The following video will provide you with some more information about this query.
Video
This is a more specific query than @TechDo's, as I find duplicates where name, name2 and user are all identical, not only name.
with duplicates as
(
select t.id, ROW_NUMBER() over (partition by t.name, t.name2, t.[user] order by t.id) as RowNumber
from YourTable t
)
delete duplicates
where RowNumber > 1
SQLFiddle demo to try it yourself: DEMO
Please try:
with c as
(
select
*, row_number() over(partition by name, name2, [user] order by id) as n
from YourTable
)
delete from c
where n > 1;
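The "keep one id per group, delete the rest" idea can be tried out on the sample rows with Python's built-in sqlite3. This sketch keeps the lowest id of each (name, name2, user) group, like the ROW_NUMBER versions do. The table name my_table is hypothetical, and SQLite stands in for SQL Server, so treat it as an illustration of the logic, not of T-SQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, name2 TEXT, "user" TEXT);
INSERT INTO my_table VALUES
  (1,   'nameA', 'name2A', 'u1'),
  (12,  'nameA', 'name2A', 'u1'),
  (2,   'nameB', 'name2B', 'u2'),
  (192, 'nameB', 'name2B', 'u2');

-- Delete every row whose id is not the smallest id of its duplicate group.
DELETE FROM my_table
WHERE id NOT IN (
    SELECT MIN(id)
    FROM my_table
    GROUP BY name, name2, "user");
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY id")]
print(remaining)
```

It is worth running the inner SELECT on its own first, or wrapping the DELETE in a transaction, to confirm which ids survive before deleting anything for real.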

divide a column into two based on another column value - ORACLE

First, I hope the title expresses the issue; otherwise, any suggestion is welcome. My issue is that I have the following table structure:
+----+------+------------------+-------------+
| ID | Name | recipient_sender | user |
+----+------+------------------+-------------+
| 1 | A | 1 | X |
| 2 | B | 2 | Y |
| 3 | A | 2 | Z |
| 4 | B | 1 | U |
| | | | |
+----+------+------------------+-------------+
Whereby in the column recipient_sender the value 1 means the user is recipient, the value 2 means the user is sender.
I need to present data in the following way:
+----+------+-----------+---------+
| ID | Name | recipient | sender |
+----+------+-----------+---------+
| 1 | A | X | Z |
| 2 | B | U | Y |
+----+------+-----------+---------+
I've tried self-join but it did not work. I cannot use MAX with CASE WHEN, as the number of records is too big.
Note: Please ignore the bad table design as it's just a simplified example of the real one
Please try:
SELECT
MIN(ID) ID,
Name,
max(case when recipient_sender=1 then user else null end) recipient,
max(case when recipient_sender=2 then user else null end) sender
From yourTable
group by Name
maybe you can try this:
select min(id) id,
name,
max(decode(recipient_sender, 1, user, '')) recipient,
max(decode(recipient_sender, 2, user, '')) sender
from t
group by name
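The conditional-aggregation pattern can be checked against the sample rows with Python's built-in sqlite3, with 1 mapped to recipient and 2 to sender as the question defines them. The table name messages is hypothetical and SQLite stands in for Oracle, so this is only a sketch of the logic. The user column is double-quoted because USER has a special meaning in Oracle and some other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER, name TEXT, recipient_sender INTEGER, "user" TEXT);
INSERT INTO messages VALUES
  (1, 'A', 1, 'X'),
  (2, 'B', 2, 'Y'),
  (3, 'A', 2, 'Z'),
  (4, 'B', 1, 'U');
""")

# One output row per name: each CASE yields NULL for the non-matching role,
# and MAX ignores NULLs, so it picks the single matching user per role.
query = """
SELECT MIN(id) AS id,
       name,
       MAX(CASE WHEN recipient_sender = 1 THEN "user" END) AS recipient,
       MAX(CASE WHEN recipient_sender = 2 THEN "user" END) AS sender
FROM messages
GROUP BY name
ORDER BY name;
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The result has recipient X and sender Z for name A, and recipient U and sender Y for name B, matching the table the question asks for.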
You can check a demo here on SQLFiddle.
You can select values with this query
SELECT t.id,
t.name,
case
when t.recipient_sender = 1 then
t.user
ELSE
t2.user
END as recipient,
case
when t.recipient_sender = 2 then
t.user
ELSE
t2.user
END as sender
FROM your_table t
JOIN your_table t2
ON t.name = t2.name
AND t.id != t2.id
After this query you can add the DISTINCT keyword or GROUP the rows.
This query joins the table to itself on the NAME column, but if you have some identifier for the message, join on that instead.
Create a new table (with a better structure):
create table <newtable> as
select
recip.id,
recip.name,
recip.user as recipient,
(select s.user from <tablename> s where s.name = recip.name and s.recipient_sender = 2) as sender
from <tablename> recip
where recip.recipient_sender = 1
Sorry, I have no Oracle here to test it.