How can I drop a row if repeated by category?

How would I go about dropping rows that are duplicated and keeping only one row per item, chosen by its category?
For example, consider this sample table:
Item | location | Status
------------------------
123 | A | done
123 | A | not_done
123 | B | Other
435 | D | Other
So essentially what I want to get to would be this table
Item | location | Status
------------------------
123 | A | done
435 | D | Other
I am not interested in the other statuses or locations IF the status is done. If no row is "done", then I would show the next one.
Any clues on whether it is possible to do something like this in an SQL query?

Identify rows with done if any and prioritize them.
select * except rn
from (
select item, location, status
, row_number() over (
partition by item
order by case status when 'done' then 0 else 1 end
) as rn
from t
)
where rn = 1
(I didn't try it, excuse syntax errors please.)
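To check the ranking idea above against the sample data, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. SQLite has no `select * except`, so the columns are listed explicitly, and the extra `location` tie-breaker is an assumption added only to make the pick deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item INTEGER, location TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(123, "A", "done"), (123, "A", "not_done"), (123, "B", "Other"), (435, "D", "Other")],
)

# Rank rows within each item so that a 'done' row (if any) comes first.
# The location tie-breaker is an assumption, not part of the original answer.
query = """
SELECT item, location, status
FROM (
    SELECT item, location, status,
           ROW_NUMBER() OVER (
               PARTITION BY item
               ORDER BY CASE status WHEN 'done' THEN 0 ELSE 1 END, location
           ) AS rn
    FROM t
) AS ranked
WHERE rn = 1
ORDER BY item
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Window functions need SQLite 3.25 or newer, which recent Python builds bundle.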

Yes, you can do this with a NOT EXISTS condition like the one below, after enabling standard SQL first. Note that an item without any 'done' row keeps all of its rows.
select * from
yourtable A
where A.status = 'done'
or not exists
(
-- another row for the same item that is 'done'
select 1 from yourtable B
where A.item = B.item
and B.status = 'done'
)
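One way to phrase the NOT EXISTS idea so that it returns the expected rows for the sample data is sketched below with Python's sqlite3 module. The rule used here (keep a row if it is 'done', or if its item has no 'done' row at all) is my reading of the question, and the column name `item` is taken from the question's table rather than from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (item INTEGER, location TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(123, "A", "done"), (123, "A", "not_done"), (123, "B", "Other"), (435, "D", "Other")],
)

# Keep 'done' rows, plus every row of an item that has no 'done' row at all.
query = """
SELECT a.item, a.location, a.status
FROM yourtable AS a
WHERE a.status = 'done'
   OR NOT EXISTS (
        SELECT 1
        FROM yourtable AS b
        WHERE b.item = a.item
          AND b.status = 'done'
   )
ORDER BY a.item
"""
kept = conn.execute(query).fetchall()
print(kept)
```

Unlike the ROW_NUMBER approach, this does not limit an item to one row: an item with several rows and no 'done' row would keep all of them.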

Related

How to traverse postgresql in the form of linked list?

I have a table in the form of linked list.
| unique_id | next |
| -------- | ------- |
| 1 | 3 |
| 2 | null |
| 3 | 2 |
Here unique_id is the id of the row, and next is the id of the row it points to. There is another table that keeps track of the head. Let's say the row with unique_id=1 is the head. I want to query my table so that it takes the head from the head table and returns the data in the same order as the linked list, 1->3->2, as an array of rows.
Expected Result : [{unique_id:1, next:3},{unique_id:3, next:2},{unique_id:2, next:null}]
Sorry, I'm unable to render the above table properly that's why It's in the form of code.

Try a recursive query - and add a "path" column to the query:
CREATE TABLE
-- your input ....
indata(unique_id,next) AS (
SELECT 1,3
UNION ALL SELECT 2,null
UNION ALL SELECT 3,2
)
;
\pset null NULL
WITH RECURSIVE recursion AS (
SELECT
unique_id
, next
, unique_id::TEXT||'>'||next::TEXT AS path -- TEXT in both branches so the column types match
FROM indata
WHERE unique_id=1 -- need a starting filter
UNION ALL
SELECT
c.unique_id
, c.next
, p.path||'>'||COALESCE(c.next::TEXT,'NULL') -- PostgreSQL has COALESCE, not NVL
FROM recursion p
JOIN indata c ON p.next = c.unique_id
)
SELECT * FROM recursion;
-- out unique_id | next | path
-- out -----------+------+------------
-- out 1 | 3 | 1>3
-- out 3 | 2 | 1>3>2
-- out 2 | NULL | 1>3>2>NULL

A bit late with the answer, but here is my version, without adding columns.
with recursive headtablelist as (
select unique_id, next
from headtable
where unique_id = 1
union
select e.unique_id, e.next
from headtable e
inner join headtablelist s on s.next = e.unique_id
)
select * from headtablelist;
Demo in sqldaddy.io
More information about recursive queries can be found here.
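Neither recursive query above states an ORDER BY, so the row order is not formally guaranteed. The sketch below adds a depth counter and sorts by it, and it reads the starting id from a separate head table as the question describes. It uses Python's sqlite3 module rather than PostgreSQL, and the `headtable(head_id)` layout and the `depth` column are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indata (unique_id INTEGER PRIMARY KEY, next INTEGER)")
conn.executemany("INSERT INTO indata VALUES (?, ?)", [(1, 3), (2, None), (3, 2)])
# Hypothetical head table: one row holding the id of the list head.
conn.execute("CREATE TABLE headtable (head_id INTEGER)")
conn.execute("INSERT INTO headtable VALUES (1)")

# Walk the list from the head, counting hops so the result can be sorted.
# Note: a cycle in the data would make this recursion run forever.
query = """
WITH RECURSIVE walk(unique_id, next, depth) AS (
    SELECT i.unique_id, i.next, 0
    FROM indata AS i
    JOIN headtable AS h ON h.head_id = i.unique_id
    UNION ALL
    SELECT c.unique_id, c.next, w.depth + 1
    FROM walk AS w
    JOIN indata AS c ON c.unique_id = w.next
)
SELECT unique_id, next FROM walk ORDER BY depth
"""
ordered = [{"unique_id": u, "next": n} for u, n in conn.execute(query)]
print(ordered)
```

The same WITH RECURSIVE shape should carry over to PostgreSQL, where the final result could be turned into a JSON array if needed.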

Pulling multiple entries based on ROW_NUMBER

I got the row_num column from a partition. I want each Type to match with at least one Sent and one Resent. For example, Jon's row is removed below because there is no Resent. Kim's Sheet row is also removed because again, there is no Resent. I tried using a CTE to take all columns for a Code if row_num = 2 but Kim's Sheet row obviously shows up because they're all under one Code. If anyone could help, that'd be great!
Edit: I'm using SSMS 2018. There are multiple Statuses other than Sent and Resent.
What my table looks like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 123 | Jon | Sheet | Sent | 1 |
| 221 | Kim | Sheet | Sent | 1 |
| 221 | Kim | Book | Resent | 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
What I want it to look like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 221 | Kim | Book | Resent| 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
Here is my CTE code:
WITH CTE AS
(
SELECT *
FROM #MyTable
)
SELECT *
FROM #MyTable
WHERE Code IN (SELECT Code FROM CTE WHERE row_num = 2)

If sent and resent are the only values for status, then you can use:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and
t2.type = t.type and
t2.status <> t.status
);
You can also phrase this with window functions:
select t.*
from (select t.*,
min(status) over (partition by name, type) as min_status,
max(status) over (partition by name, type) as max_status
from t
) t
where min_status <> max_status;
Both of these can be tweaked if other status values are possible. However, based on your question and sample data, that does not seem necessary.
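Since the question's edit says other statuses do exist, here is one possible tweak, sketched with Python's sqlite3 module: count Sent and Resent rows per (name, type) group with windowed sums and keep only groups that have at least one of each. The 'Cancelled' row is a hypothetical extra status added to show that it does not fool the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (code INTEGER, name TEXT, type TEXT, status TEXT, row_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (123, "Jon", "Sheet", "Sent", 1),
        (221, "Kim", "Sheet", "Sent", 1),
        (221, "Kim", "Sheet", "Cancelled", 2),  # hypothetical extra status
        (221, "Kim", "Book", "Resent", 1),
        (221, "Kim", "Book", "Sent", 2),
        (221, "Kim", "Book", "Sent", 3),
    ],
)

# A group qualifies only if it contains at least one Sent and one Resent row.
query = """
SELECT code, name, type, status, row_num
FROM (
    SELECT t.*,
           SUM(CASE WHEN status = 'Sent' THEN 1 ELSE 0 END)
               OVER (PARTITION BY name, type) AS sent_count,
           SUM(CASE WHEN status = 'Resent' THEN 1 ELSE 0 END)
               OVER (PARTITION BY name, type) AS resent_count
    FROM t
) AS counted
WHERE sent_count > 0 AND resent_count > 0
ORDER BY row_num
"""
matched = conn.execute(query).fetchall()
print(matched)
```

Rows with other statuses inside a qualifying group are kept by this sketch; add `AND status IN ('Sent', 'Resent')` to the outer WHERE if they should be dropped.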

FIDDLE
CREATE TABLE Table1(ID integer,Name VARCHAR(10),Type VARCHAR(10),Status VARCHAR(10),row_num integer);
INSERT INTO Table1 VALUES
('123','Jon','Sheet','Sent','1'),
('221','Kim','Sheet','Sent','1'),
('221','Kim','Book','Resent','1'),
('221','Kim','Book','Sent','2'),
('221','Kim','Book','Sent','3');
SELECT t1.*
FROM Table1 t1
WHERE EXISTS (
select 1
from Table1 t2
where t2.Name=t1.Name
and t2.Type=t1.Type
and t2.Status = case when t1.Status='Sent'
then 'Resent'
else 'Sent' end)

It would be easier if you provided some scripts to create the table and insert the test data, but try something like
with a1 as (
select
name, type,
row_number() over (partition by code, name, type, status order by (select null)) as rn -- SQL Server requires an ORDER BY here
from #MyTable
), a2 as (
select * from a1 where rn > 1
)
select t.*
from #MyTable as t
inner join a2 on t.name = a2.name and t.type = a2.type;
Here you
1. calculate another row number, partitioned by code, name, type and status,
2. then fetch the rows where this new row number is > 1,
3. and finally join that back to the original table to get the rows you are interested in.
The syntax may vary on MSSQL, but you should give it a try. And please use better names than mine ;-)
This solution is quite generic because it doesn't rely on specific statuses; they're not hardcoded. You can easily control what matters by changing the partitions.
Fiddle

How do I transform the specific row value into column headers in hive [duplicate]

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).
I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
+--------+----------+-----------+
| 1 | B | 3 |
+--------+----------+-----------+
| 2 | A | 9 |
+--------+----------+-----------+
| 2 | C | 40 |
+--------+----------+-----------+
How do I query the database to return something like
+--------+------+-----+-----+
| hostid | A | B | C |
+--------+------+-----+-----+
| 1 | 10 | 3 | 0 |
+--------+------+-----+-----+
| 2 | 9 | 0 | 40 |
+--------+------+-----+-----+

I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.
I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:
select * from history;
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
| 1 | B | 3 |
| 2 | A | 9 |
| 2 | C | 40 |
+--------+----------+-----------+
This will be our goal, the pretty pivot table:
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).
When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):
1. select the columns of interest, i.e. y-values and x-values
2. extend the base table with extra columns -- one for each x-value
3. group and aggregate the extended table -- one group for each y-value
4. (optional) prettify the aggregated table
Let's apply these steps to your problem and see what we get:
Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.
Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:
create view history_extended as (
select
history.*,
case when itemname = "A" then itemvalue end as A,
case when itemname = "B" then itemvalue end as B,
case when itemname = "C" then itemvalue end as C
from history
);
select * from history_extended;
+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A | B | C |
+--------+----------+-----------+------+------+------+
| 1 | A | 10 | 10 | NULL | NULL |
| 1 | B | 3 | NULL | 3 | NULL |
| 2 | A | 9 | 9 | NULL | NULL |
| 2 | C | 40 | NULL | NULL | 40 |
+--------+----------+-----------+------+------+------+
Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.
Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:
create view history_itemvalue_pivot as (
select
hostid,
sum(A) as A,
sum(B) as B,
sum(C) as C
from history_extended
group by hostid
);
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.
Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:
create view history_itemvalue_pivot_pretty as (
select
hostid,
coalesce(A, 0) as A,
coalesce(B, 0) as B,
coalesce(C, 0) as C
from history_itemvalue_pivot
);
select * from history_itemvalue_pivot_pretty;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
And we're done -- we've built a nice, pretty pivot table using MySQL.
Considerations when applying this procedure:
- what value to use in the extra columns. I used itemvalue in this example
- what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
- what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
- using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)
Known limitations:
- this solution doesn't allow n columns in the pivot table -- each pivot column needs to be manually added when extending the base table. So for 5 or 10 x-values, this solution is nice. For 100, not so nice. There are some solutions with stored procedures generating a query, but they're ugly and difficult to get right. I currently don't know of a good way to solve this problem when the pivot table needs to have lots of columns.
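One way around the fixed-column limitation is to generate the column list in application code. The sketch below does this with Python's sqlite3 module instead of MySQL: it reads the distinct itemname values, builds one conditional SUM per value, and runs the resulting query. The helper names `quote_ident` and `pivot_history` are hypothetical, and the identifier quoting shown is SQLite's (MySQL would use backticks).

```python
import sqlite3


def quote_ident(name):
    # Quote an identifier for SQLite by doubling embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def pivot_history(conn):
    # Step 1: discover the x-values (one pivot column per itemname).
    names = [row[0] for row in conn.execute(
        "SELECT DISTINCT itemname FROM history ORDER BY itemname")]
    if not names:
        return ["hostid"], []
    # Steps 2-4: one conditional SUM per x-value, NULLs replaced by 0.
    # The itemname values are bound as parameters, never pasted into the SQL.
    columns = ", ".join(
        "COALESCE(SUM(CASE WHEN itemname = ? THEN itemvalue END), 0) AS "
        + quote_ident(name)
        for name in names
    )
    sql = "SELECT hostid, " + columns + " FROM history GROUP BY hostid ORDER BY hostid"
    cursor = conn.execute(sql, names)
    header = [col[0] for col in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (hostid INTEGER, itemname TEXT, itemvalue INTEGER)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 3), (2, "A", 9), (2, "C", 40)],
)
header, rows = pivot_history(conn)
print(header)
print(rows)
```

Because the columns are discovered at run time, adding a new itemname to the table adds a column to the result without touching the code.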

SELECT
hostid,
sum( if( itemname = 'A', itemvalue, 0 ) ) AS A,
sum( if( itemname = 'B', itemvalue, 0 ) ) AS B,
sum( if( itemname = 'C', itemvalue, 0 ) ) AS C
FROM
history
GROUP BY
hostid;

Another option, especially useful if you have many items you need to pivot, is to let MySQL build the query for you:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'ifnull(SUM(case when itemname = ''',
itemname,
''' then itemvalue end),0) AS `',
itemname, '`'
)
) INTO @sql
FROM
history;
SET @sql = CONCAT('SELECT hostid, ', @sql, '
FROM history
GROUP BY hostid');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
FIDDLE
I added some extra values to see it working.
GROUP_CONCAT results are truncated to group_concat_max_len, which defaults to 1024, so if you have a really big query, raise this parameter before running it:
SET SESSION group_concat_max_len = 1000000;
Test:
DROP TABLE IF EXISTS history;
CREATE TABLE history
(hostid INT,
itemname VARCHAR(5),
itemvalue INT);
INSERT INTO history VALUES(1,'A',10),(1,'B',3),(2,'A',9),
(2,'C',40),(2,'D',5),
(3,'A',14),(3,'B',67),(3,'D',8);
hostid A B C D
1 10 3 0 0
2 9 0 40 5
3 14 67 0 8

Taking advantage of Matt Fenwick's idea, which helped me solve the problem (many thanks), let's reduce it to a single query:
select
hostid,
coalesce(sum(case when itemname = "A" then itemvalue end), 0) as A,
coalesce(sum(case when itemname = "B" then itemvalue end), 0) as B,
coalesce(sum(case when itemname = "C" then itemvalue end), 0) as C
from history
group by hostid

I edited Agung Sagita's answer to use joins instead of subqueries.
I'm not sure how much difference there is between the two approaches, but here it is as another reference.
SELECT DISTINCT T1.hostid, T2.VALUE AS A, T3.VALUE AS B, T4.VALUE AS C
FROM TableTest AS T1
LEFT JOIN TableTest T2 ON T2.hostid=T1.hostid AND T2.ITEMNAME='A'
LEFT JOIN TableTest T3 ON T3.hostid=T1.hostid AND T3.ITEMNAME='B'
LEFT JOIN TableTest T4 ON T4.hostid=T1.hostid AND T4.ITEMNAME='C'

use subquery
SELECT hostid,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS T1
GROUP BY hostid
However, this fails if a subquery returns more than one row; in that case, use an aggregate function inside the subquery.

If you can use MariaDB, there is a very easy solution.
Since MariaDB 10.0.2, there has been a storage engine called CONNECT that can convert the results of another query or table into a pivot table, just like what you want.
You can have a look at the docs.
First of all, install the CONNECT storage engine.
The pivot column of our table is itemname, and the data for each item is in the itemvalue column, so we can get the pivot table with this query:
create table pivot_table
engine=connect table_type=pivot tabname=history
option_list='PivotCol=itemname,FncCol=itemvalue';
Now we can select what we want from the pivot_table:
select * from pivot_table
More details here

My solution :
select h.hostid, sum(ifnull(h.A,0)) as A, sum(ifnull(h.B,0)) as B, sum(ifnull(h.C,0)) as C from (
select
hostid,
case when itemName = 'A' then itemvalue end as A,
case when itemName = 'B' then itemvalue end as B,
case when itemName = 'C' then itemvalue end as C
from history
) h group by hostid
It produces the expected results in the submitted case.

I make that into Group By hostId then it will show only first row with values,
like:
A B C
1 10
2 3

I figured out one way to make my reports convert rows to columns almost dynamically, using simple queries. You can see and test it online here.
The number of columns in the query is fixed, but the values are dynamic and based on the row values. I use one query to build the table header and another one to get the values:
SELECT distinct concat('<th>',itemname,'</th>') as column_name_table_header FROM history order by 1;
SELECT
hostid
,(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue else '' end) as col1
,(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue else '' end) as col2
,(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue else '' end) as col3
,(case when itemname = (select distinct itemname from history a order by 1 limit 3,1) then itemvalue else '' end) as col4
FROM history order by 1;
You can summarize it, too:
SELECT
hostid
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue end) as A
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue end) as B
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue end) as C
FROM history group by hostid order by 1;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
Results of RexTester:
http://rextester.com/ZSWKS28923
For a real example of use, the report below shows the departure and arrival hours of boats/buses in columns, as a visual schedule. You will see one additional, unused column at the end, which does not disturb the visualization:
** ticketing system for selling tickets online and in person

This isn't the exact answer you are looking for, but it was a solution that I needed on my project, and I hope it helps someone. This will list 1 to n row items separated by commas. GROUP_CONCAT makes this possible in MySQL.
select
cemetery.cemetery_id as "Cemetery_ID",
GROUP_CONCAT(distinct(names.name)) as "Cemetery_Name",
cemetery.latitude as Latitude,
cemetery.longitude as Longitude,
c.Contact_Info,
d.Direction_Type,
d.Directions
from cemetery
left join cemetery_names on cemetery.cemetery_id = cemetery_names.cemetery_id
left join names on cemetery_names.name_id = names.name_id
left join cemetery_contact on cemetery.cemetery_id = cemetery_contact.cemetery_id
left join
(
select
cemetery_contact.cemetery_id as cID,
group_concat(contacts.name, char(32), phone.number) as Contact_Info
from cemetery_contact
left join contacts on cemetery_contact.contact_id = contacts.contact_id
left join phone on cemetery_contact.contact_id = phone.contact_id
group by cID
)
as c on c.cID = cemetery.cemetery_id
left join
(
select
cemetery_id as dID,
group_concat(direction_type.direction_type) as Direction_Type,
group_concat(directions.value , char(13), char(9)) as Directions
from directions
left join direction_type on directions.type = direction_type.direction_type_id
group by dID
)
as d on d.dID = cemetery.cemetery_id
group by Cemetery_ID
This cemetery has two common names, so the names are stored in different rows connected by a single cemetery id but two name ids, and the query produces something like this:
CemeteryID Cemetery_Name Latitude
1 Appleton,Sulpher Springs 35.4276242832293

You can use a couple of LEFT JOINs. Kindly use this code
SELECT t.hostid,
COALESCE(t1.itemvalue, 0) A,
COALESCE(t2.itemvalue, 0) B,
COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1
ON t1.hostid = t.hostid
AND t1.itemname = 'A'
LEFT JOIN history t2
ON t2.hostid = t.hostid
AND t2.itemname = 'B'
LEFT JOIN history t3
ON t3.hostid = t.hostid
AND t3.itemname = 'C'
GROUP BY t.hostid

I'm sorry to say this, and maybe I'm not solving your problem exactly, but PostgreSQL is 10 years older than MySQL and is extremely advanced compared to MySQL, and there are many ways to achieve this easily. Install PostgreSQL and execute this query
CREATE EXTENSION tablefunc;
then voila! And here's extensive documentation: PostgreSQL: Documentation: 9.1: tablefunc or this query
CREATE EXTENSION hstore;
then again voila! PostgreSQL: Documentation: 9.0: hstore

SQL group by can't find correct phrase

I have a simple design
id | grpid | main
-----------------
1 | 1 | 1
2 | 1 | 0
3 | 1 | 0
4 | 2 | 0
5 | 2 | 1
6 | 2 | 0
The question to answer is
What is the "id" of the main in each group?
The result should be
id
---
1
5
Seriously at the moment, I'm not able to answer it on my own. Pls assist me.

Maybe I'm oversimplifying it here, but couldn't you just do this:
select id,
grpid
from table
where main = 1;

The simplest way to do this is:
select id from <table_name> where main=1
But since you mentioned that you want the id per group (grouped by grpid), the query below will also work.
select min(id) as id from <table_name> group by grpid, main having main = 1
Group by the group id and main, then keep only the groups where main is 1. Every selected column must be either grouped or aggregated, hence min(id). You will get the desired result.

If you want to add a column for its corresponding "MainId" then you can do this perhaps?
SELECT f.id, f.grpid, f.main, t.MainId
FROM foo f
CROSS APPLY (
SELECT grpid, id AS MainId
FROM foo f1
WHERE main = 1
AND f.grpid = f1.grpid) t

Oracle: DISTINCT or GROUP BY row consistency

I have following table:
Name Parent Status
A P1 0
A P2 1
B PB -1
Will the following query guarantee that each row of the result corresponds to a single row of the table?
SELECT
DISTINCT Name, Parent, Status
FROM
MyTable
For example, could the result set contain:
A, P1, 1
This doesn't match any row in the table. How can I write an SQL statement that selects ANY and AT MOST ONE row for each name?

SQL Fiddle
SELECT DISTINCT will get every row and then discard any duplicate rows in the result set.
The data you've given has duplicates in a column but no duplicate rows - so all rows will be returned:
Query 1:
SELECT DISTINCT
Name,
Parent,
Status
FROM MyTable
Results:
| NAME | PARENT | STATUS |
|------|--------|--------|
| A | P2 | 1 |
| B | PB | -1 |
| A | P1 | 0 |
For example, could the result set contain:
A, P1, 1
No, you can see from the above results that it does not. However, you can make a query that does:
Query 2:
SELECT Name,
MIN( Parent ),
MAX( Status )
FROM MyTable
GROUP BY Name
Results:
| NAME | MIN(PARENT) | MAX(STATUS) |
|------|-------------|-------------|
| A | P1 | 1 |
| B | PB | -1 |
In answer to your final question:
How can I write an SQL statement that selects ANY and AT MOST ONE row for each name?
This query orders the rows randomly and then selects the (randomly) first one for each name:
Query 3:
WITH Randomness AS (
SELECT Name,
Parent,
Status,
ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_ID
FROM MyTable
)
SELECT Name,
Parent,
Status
FROM Randomness
WHERE Random_ID = 1
Results:
| NAME | PARENT | STATUS |
|------|--------|--------|
| A | P1 | 0 |
| B | PB | -1 |
If you run Query 3 a second time then you may get the other A row returned (or not - it's random).
Or if you want to be rather silly and completely random then you can select a random parent and a random status for each name (such that the Parent and Status do not have to come from the same row of the original table).
Query 4:
SELECT Name,
MIN( Parent ) KEEP ( DENSE_RANK FIRST ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_Parent,
MIN( Status ) KEEP ( DENSE_RANK FIRST ORDER BY SYS.DBMS_RANDOM.VALUE() ) AS Random_Status
FROM MyTable
GROUP BY Name
Results:
| NAME | RANDOM_PARENT | RANDOM_STATUS |
|------|---------------|---------------|
| A | P1 | 1 |
| B | PB | -1 |
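The idea behind Query 3 (number the rows of each name in random order and keep the first) can be tried outside Oracle as well. Below is a minimal sketch using Python's sqlite3 module, where SQLite's `random()` stands in for `SYS.DBMS_RANDOM.VALUE()`; that substitution is an assumption of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT, Parent TEXT, Status INTEGER)")
source = [("A", "P1", 0), ("A", "P2", 1), ("B", "PB", -1)]
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", source)

# Number the rows of each name in random order and keep only the first one,
# so Parent and Status always come from the same original row.
query = """
SELECT Name, Parent, Status
FROM (
    SELECT Name, Parent, Status,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY random()) AS rn
    FROM MyTable
) AS numbered
WHERE rn = 1
ORDER BY Name
"""
picked = conn.execute(query).fetchall()
print(picked)
```

Each run returns exactly one row per name, and every returned row is a real row of the table; which of the two A rows appears can change between runs.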

Please try:
SELECT
Name,
Parent,
Status
FROM(
select
Name,
Parent,
Status,
ROW_NUMBER()
OVER (PARTITION BY Name order by Status desc) RNum
From YourTable
)x where RNum=1
SQL Fiddle Demo
