Display duplicate values of two columns in different rows - SQL

I have a table in which a particular record can have two newspaper publishing dates. Both dates are stored in the single column NewsPaperDate, so the record occupies two rows in which all the remaining values are duplicated. I need to write a query that shows the two NewsPaperDate values in a single row, in two columns NewsPaperDate1 and NewsPaperDate2, together with the remaining values. Can anyone help with this? The database is SQL Server.
The table structure was shown in a screenshot.

You need to join the table to itself. There are different ways of doing this, but based on your screenshot you could do:
select
    a.yonja_no,
    a.newspaper_date as newspaperdate1,
    b.newspaper_date as newspaperdate2
from newspapertable a
inner join newspapertable b
    on a.yonja_no = b.yonja_no
   and a.newspapere_s > b.newspapere_s
;
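For illustration, here is the same self-join run against SQLite through Python's built-in sqlite3 module, which stands in for SQL Server. The names newspapertable, yonja_no and newspaper_date follow the answer, but the sample rows are invented. The sketch orders each pair by the dates themselves instead of newspapere_s, which is not shown, so it assumes the two dates of a record always differ. Dates are stored as ISO-8601 text so that < compares them chronologically.

```python
import sqlite3


def pair_dates(conn):
    """Return one row per yonja_no with its two newspaper dates side by side."""
    # Self-join: a and b are two copies of the same table. Requiring
    # a.newspaper_date < b.newspaper_date keeps only one of the two
    # orderings, so each pair of dates appears exactly once.
    sql = """
        SELECT a.yonja_no,
               a.newspaper_date AS newspaperdate1,
               b.newspaper_date AS newspaperdate2
        FROM newspapertable AS a
        JOIN newspapertable AS b
          ON a.yonja_no = b.yonja_no
         AND a.newspaper_date < b.newspaper_date
        ORDER BY a.yonja_no
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newspapertable (yonja_no INTEGER, newspaper_date TEXT)")
# Hypothetical sample data: each yonja_no was published on two dates.
conn.executemany(
    "INSERT INTO newspapertable VALUES (?, ?)",
    [
        (101, "2013-01-12"),
        (101, "2013-01-05"),
        (102, "2013-02-01"),
        (102, "2013-02-03"),
    ],
)
print(pair_dates(conn))
# [(101, '2013-01-05', '2013-01-12'), (102, '2013-02-01', '2013-02-03')]
```

If a record can have three or more dates, this join returns one row per pair of dates rather than one row per record.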

Here is the query with some sample data:
create table tab1(newspaperDate int, b int, c int);
INSERT INTO tab1 VALUES(1,2,3);
INSERT INTO tab1 VALUES(2,2,3);
INSERT INTO tab1 VALUES(3,3,4);
SELECT t1.newspaperDate AS date1, t2.newspaperDate AS date2, t1.b AS b1, t1.c AS c1
FROM tab1 t1, tab1 t2
WHERE t1.newspaperDate < t2.newspaperDate AND t1.b = t2.b;
OUTPUT
| DATE1 | DATE2 | B1 | C1 |
---------------------------
| 1 | 2 | 2 | 3 |

Joining a table to itself is a good approach for your query. For more on self-joins, read this:
http://www.thunderstone.com/site/texisman/joining_a_table_to_itself.html

Related

How to delete rows where more than 1 column matches another table?

I have two tables. One (let's call it table1) looks a bit like this:
account_number | offer_code
---------------|-----------
1 | 123
1 | 456
2 | 123
The other table (let's call it table2) looks a bit like this:
account_number | offer_code
---------------|-----------
1 | 123
I want to delete all rows from table1 where the account_number AND the offer_code match a row in table2. So afterwards table1 would look like this:
account_number | offer_code
---------------|-----------
1 | 456
2 | 123
I've tried the following, but it doesn't run:
DELETE
FROM TABLE1 A
INNER JOIN
TABLE2 B
ON A.ACCOUNT_NUMBER = B.ACCOUNT_NUMBER
AND A.OFFER_CODE = B.OFFER_CODE
;
I've also tried the following. It seems to run, but the sheer volume of data in both tables (65.5m rows in table1 and 9m in table2) means it takes an impractically long time (I was forced to kill the query after 3 hours).
DELETE
FROM TABLE1
WHERE CONCAT(ACCOUNT_NUMBER, OFFER_CODE) IN
(
SELECT CONCAT(ACCOUNT_NUMBER, OFFER_CODE)
FROM TABLE2
)
;
Does anyone know an efficient way to accomplish this?
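Large UPDATE and DELETE operations are expensive for a database, because every affected row has to be logged and every index on the table has to be maintained. Depending on your application, you can instead build a new table holding only the rows you want to keep and then swap the tables. Check this carefully first: indexes, grants, triggers and most constraints on table1 are not carried over to the new table, and MINUS also removes duplicate rows from table1.
create table table1_tmp as
select * from table1
minus
select * from table2;
alter table table1 rename to table1_tmp2;
alter table table1_tmp rename to table1;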
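Here is a minimal sketch of the copy-and-swap idea, run against SQLite through Python's sqlite3 module with the sample rows from the question. SQLite has no MINUS operator, so EXCEPT (which likewise returns distinct rows) stands in for it. The old table survives as table1_tmp2, so the result can be checked before that table is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (account_number INTEGER, offer_code INTEGER);
    CREATE TABLE table2 (account_number INTEGER, offer_code INTEGER);
    INSERT INTO table1 VALUES (1, 123), (1, 456), (2, 123);
    INSERT INTO table2 VALUES (1, 123);
""")

# Build the surviving rows in one set-based pass instead of deleting row by row,
# then swap the new table in under the old name.
conn.executescript("""
    CREATE TABLE table1_tmp AS
        SELECT account_number, offer_code FROM table1
        EXCEPT
        SELECT account_number, offer_code FROM table2;
    ALTER TABLE table1 RENAME TO table1_tmp2;
    ALTER TABLE table1_tmp RENAME TO table1;
""")

result = conn.execute(
    "SELECT account_number, offer_code FROM table1 ORDER BY account_number, offer_code"
).fetchall()
print(result)  # [(1, 456), (2, 123)]
```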

Remove duplicate rows based on specific columns

I have a table that contains these columns:
ID (varchar)
SETUP_ID (varchar)
MENU (varchar)
LABEL (varchar)
I want to remove all duplicates from the table based on two columns (SETUP_ID, MENU).
Table I have:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
2 | 10 | main | txt |
3 | 11 | second | txt |
4 | 11 | second | txt |
5 | 12 | third | txt |
Table I want:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
3 | 11 | second | txt |
5 | 12 | third | txt |
You can achieve this with a common table expression (CTE):
with cte as (
    select id, setup_id, menu,
           row_number() over (partition by setup_id, menu order by id) as rownum
    from atable
)
delete from atable
where id in (select id from cte where rownum >= 2);
This keeps the row with the lowest id in each (setup_id, menu) group, which gives you your desired output.
See the documentation on common table expressions for details.
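This sketch checks the CTE approach on the sample data from the question. It runs the query in SQLite through Python's sqlite3 module and needs SQLite 3.25 or later for window functions. To keep things simple, id is an INTEGER here instead of varchar as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, setup_id TEXT, menu TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?, ?)",
    [(1, "10", "main", "txt"), (2, "10", "main", "txt"),
     (3, "11", "second", "txt"), (4, "11", "second", "txt"),
     (5, "12", "third", "txt")],
)

# Number the rows inside each (setup_id, menu) group, lowest id first,
# then delete every row whose number is 2 or higher.
conn.execute("""
    WITH cte AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY setup_id, menu ORDER BY id) AS rownum
        FROM atable
    )
    DELETE FROM atable
    WHERE id IN (SELECT id FROM cte WHERE rownum >= 2)
""")

remaining = conn.execute("SELECT id FROM atable ORDER BY id").fetchall()
print(remaining)  # [(1,), (3,), (5,)]
```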
Assuming a table named tbl where both setup_id and menu are defined NOT NULL and id is the PRIMARY KEY.
EXISTS will do nicely:
DELETE FROM tbl t0
WHERE EXISTS (
SELECT FROM tbl t1
WHERE t1.setup_id = t0.setup_id
AND t1.menu = t0.menu
AND t1.id < t0.id
);
This deletes every row where a dupe with lower id is found, effectively only keeping the row with the smallest id from each set of dupes. An index on (setup_id, menu) or even (setup_id, menu, id) will help performance with big tables a lot.
If there is no PK and no reliable UNIQUE (combination of) column(s), you can fall back to using the ctid. If NULL values can be involved, you need to specify how to deal with those.
Consider:
Delete duplicate rows from small table
How to delete duplicate rows without unique identifier
How do I (or can I) SELECT DISTINCT on multiple columns?
After cleaning up duplicates, add a UNIQUE constraint to prevent new dupes:
ALTER TABLE tbl ADD CONSTRAINT tbl_setup_id_menu_uni UNIQUE (setup_id, menu);
If you had an index on (setup_id, menu), drop that now. It's superseded by the UNIQUE constraint.
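Here is a sketch of the EXISTS approach followed by the uniqueness guard, run in SQLite through Python's sqlite3 module on the question's sample rows. SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so the sketch substitutes a unique index on (setup_id, menu) for the UNIQUE constraint. The table name tbl follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, setup_id TEXT NOT NULL,"
    " menu TEXT NOT NULL, label TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, "10", "main", "txt"), (2, "10", "main", "txt"),
     (3, "11", "second", "txt"), (4, "11", "second", "txt"),
     (5, "12", "third", "txt")],
)

# Delete every row for which a duplicate with a lower id exists.
# Inside the subquery, tbl is the outer (candidate) row and t1 the other copy.
conn.execute("""
    DELETE FROM tbl
    WHERE EXISTS (
        SELECT 1 FROM tbl AS t1
        WHERE t1.setup_id = tbl.setup_id
          AND t1.menu = tbl.menu
          AND t1.id < tbl.id
    )
""")
remaining = conn.execute("SELECT id FROM tbl ORDER BY id").fetchall()

# A unique index stands in for the UNIQUE constraint to block new dupes.
conn.execute("CREATE UNIQUE INDEX tbl_setup_id_menu_uni ON tbl (setup_id, menu)")
try:
    conn.execute("INSERT INTO tbl (setup_id, menu, label) VALUES ('10', 'main', 'txt')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(remaining, rejected)  # [(1,), (3,), (5,)] True
```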
I found a solution that suits me best.
Here it is in case anyone needs it:
DELETE FROM table_name
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY setup_id,
menu
ORDER BY id ) AS row_num
FROM table_name ) t
WHERE t.row_num > 1 );
link: https://www.postgresql.org/docs/current/queries-union.html
https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT
Let's say the table name is a:
select distinct on (setup_id, menu) a.* from a order by setup_id, menu, id;
Key point: The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
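This means the ORDER BY must start with setup_id, menu. Further expressions (such as id) can follow to decide which row of each group is kept. Without an ORDER BY, which row is kept from each group is unpredictable.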
For the opposite (the rows that DISTINCT ON discards), use EXCEPT:
EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.
SELECT * FROM a
EXCEPT
(SELECT DISTINCT ON (setup_id, menu) a.* FROM a ORDER BY setup_id, menu, id);
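SQLite has no DISTINCT ON, so this sketch (run through Python's sqlite3 module) uses GROUP BY with MIN(id) in its place. That selects the lowest-id row of each (setup_id, menu) group, which is the same set DISTINCT ON returns with the ORDER BY above. EXCEPT then lists the rows that would be discarded. The sketch only reads the table, and the table name a follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, setup_id TEXT, menu TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [(1, "10", "main", "txt"), (2, "10", "main", "txt"),
     (3, "11", "second", "txt"), (4, "11", "second", "txt"),
     (5, "12", "third", "txt")],
)

# Rows to keep: the lowest id in each (setup_id, menu) group.
keep_sql = "SELECT * FROM a WHERE id IN (SELECT MIN(id) FROM a GROUP BY setup_id, menu)"
# Rows to drop: every row EXCEPT the kept ones (the "opposite" query).
drop_sql = "SELECT * FROM a EXCEPT " + keep_sql

kept = conn.execute(keep_sql + " ORDER BY id").fetchall()
dropped = conn.execute(drop_sql + " ORDER BY id").fetchall()
print([row[0] for row in kept])     # [1, 3, 5]
print([row[0] for row in dropped])  # [2, 4]
```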
You can try something along these lines to delete all but the first row in case of duplicates (please note that this is not tested in any way!):
DELETE FROM your_table WHERE id IN (
    SELECT unnest(duplicate_ids[2:]) FROM (
        SELECT array_agg(id ORDER BY id) AS duplicate_ids FROM your_table
        GROUP BY SETUP_ID, MENU
        HAVING COUNT(*) > 1
    ) AS dups
);
The above collects the ids of each group of duplicate rows (COUNT(*) > 1) in an array sorted by id (array_agg(id ORDER BY id)). It then takes all but the first element of that array ([2:]) and "explodes" the remaining id values into rows (unnest).
The outer query deletes every id that ends up in that result, so the row with the lowest id in each group is kept.
For MySQL, a similar question has already been answered here: Find and remove duplicate rows by two columns.
Check whether any of those approaches helps.
I like the one below for MySQL. Note, however, that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions:
ALTER IGNORE TABLE your_table ADD UNIQUE (SETUP_ID, MENU);
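-- Both columns must match for a row to count as a duplicate, so the conditions are joined with AND
DELETE t1
FROM table_name t1
JOIN table_name t2
  ON t2.setup_id = t1.setup_id AND t2.menu = t1.menu AND t2.id < t1.id;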
There are many ways to find and delete duplicate rows based on conditions, but I like the inner join method. It performs well even on large tables, especially with an index on the compared columns. Please check the following:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.<ColumnName1> = T2.<ColumnName1> AND T1.<ColumnName2> = T2.<ColumnName2>;
In your case, you can write it as follows:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.setup_id = T2.setup_id AND
T1.menu = T2.menu;
Let me know if you face any issues or need more help.

SQL Select Where Opposite Match Does Not Exist

I'm trying to compare two columns and check for records whose reversal does not exist. In other words, I'm looking for instances where 1->3 exists but 3->1 does not. If 1->2 and 2->1 both exist, 1 can still be part of the results.
Table = Betweens
start_id | end_id
1 | 2
2 | 1
1 | 3
1 would be added because it is the start of a pair (1,3) whose opposite (3,1) is not present. It only gets added at the third entry, because 1->2 and 2->1 are opposites of each other.
So in the end, the query should return only names where the reversal does not exist.
I then want to join another table that maps each of those numbers to its name.
Table = Names
id | name
1 | Mars
2 | Earth
3 | Jupiter
So results will just be the names of those that don't have an opposite.
You can use a not exists condition:
select t1.start_id, t1.end_id
from the_table t1
where not exists (select *
from the_table t2
where t2.end_id = t1.start_id
and t2.start_id = t1.end_id);
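To cover the second half of the question (turning the unmatched pairs into names), the NOT EXISTS filter can be joined to the Names table. This sketch runs in SQLite through Python's sqlite3 module. It assumes only the start_id of each unmatched pair should be named, which matches the question's explanation that 1 is added. The SQL Server answer below also includes end_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Betweens (start_id INTEGER, end_id INTEGER);
    CREATE TABLE Names (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Betweens VALUES (1, 2), (2, 1), (1, 3);
    INSERT INTO Names VALUES (1, 'Mars'), (2, 'Earth'), (3, 'Jupiter');
""")

# Keep a pair only if its reversal is missing, then look up the start's name.
rows = conn.execute("""
    SELECT DISTINCT n.name
    FROM Betweens AS t1
    JOIN Names AS n ON n.id = t1.start_id
    WHERE NOT EXISTS (
        SELECT 1 FROM Betweens AS t2
        WHERE t2.end_id = t1.start_id
          AND t2.start_id = t1.end_id
    )
    ORDER BY n.name
""").fetchall()
print(rows)  # [('Mars',)]
```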
I'm not sure about your data volume, but the query below returns the desired result in SQL Server.
create table TableBetweens
(start_id INT,
end_id INT
)
INSERT INTO TableBetweens VALUES(1,2)
INSERT INTO TableBetweens VALUES(2,1)
INSERT INTO TableBetweens VALUES(1,3)
create table TableNames
(id INT,
NAME VARCHAR(50)
)
INSERT INTO TableNames VALUES(1,'Mars')
INSERT INTO TableNames VALUES(2,'Earth')
INSERT INTO TableNames VALUES(3,'Jupiter')
SELECT *
FROM TableNames c
WHERE c.id IN (
SELECT nameid1.nameid
FROM (SELECT a.start_id, a.end_id
FROM TableBetweens a
LEFT JOIN TableBetweens b
-- compare the columns directly: with CONCAT, the pair (1,12) would falsely match (2,11)
ON a.start_id = b.end_id AND a.end_id = b.start_id
WHERE b.end_id IS NULL
AND b.start_id IS NULL) filterData
UNPIVOT
(
nameid
FOR id IN (start_id, end_id)
) AS nameid1
)

variable in SQL

I have a DB2 table with two columns, A and B, storing alphanumeric values. I want to find whether a value (MyValue) lies between A and B. I want my result to be: MyValue | A | B
I could use: SELECT 'MyValue',A,B FROM TABLENAME WHERE(A < 'MyValue' AND B > 'MyValue'). However, I have to search for more than 10000 values with a single query, and I want the result in the format below.
For example, the values are W140686, 0032090, 0045790, etc.
Expected Result:
MyValue | A | B
W140686 | W000000 | W999999
0032090 | 0000001 | 0500000
0045790 | 0000001 | 0500000
..
..
..
Any help / suggestion would be highly appreciated.
Thanks in advance.
The BETWEEN predicate includes the end points. In this case, the result set will include the string 'MyValue' if it's equal to either the string in column A or the string in column B.
SELECT 'MyValue',A,B
FROM TABLENAME
WHERE 'MyValue' BETWEEN A AND B;
If you want to exclude the end points:
SELECT 'MyValue',A,B
FROM TABLENAME
WHERE 'MyValue' > A AND 'MyValue' < B;
For performance, you might want an index on the pair of columns {A, B}. Check the execution plan to see whether it is used.
If you need to supply 10000 values to the WHERE clause, you should probably store them in a temporary table. Join the temporary table to TABLENAME.
-- PostgreSQL-style syntax. In DB2, use DECLARE GLOBAL TEMPORARY TABLE SESSION.my_values (...) instead.
create temp table my_values (
MyValue varchar(10) primary key
);
-- Insert the 10000 values here.
insert into my_values values
('W140686'),('0032090'), ('0045790');
select T1.MyValue, T2.a, T2.b
from my_values T1
inner join TABLENAME T2
on T1.MyValue > T2.a and T1.MyValue < T2.b
The join predicate is the same as the WHERE clause in my earlier queries.
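Here is a sketch of the temporary-table join, run in SQLite through Python's sqlite3 module instead of DB2. The two ranges in tablename come from the expected result in the question, and the three search values come from its example. With the real data, you would load all 10000 values into my_values the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ranges, copied from the expected result in the question.
conn.execute("CREATE TABLE tablename (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("W000000", "W999999"), ("0000001", "0500000")],
)

# Load the search values into a temporary table instead of a huge IN list.
conn.execute("CREATE TEMP TABLE my_values (MyValue TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO my_values VALUES (?)",
    [("W140686",), ("0032090",), ("0045790",)],
)

# The join predicate is the exclusive range test from the WHERE clause above.
rows = conn.execute("""
    SELECT t1.MyValue, t2.a, t2.b
    FROM my_values AS t1
    JOIN tablename AS t2
      ON t1.MyValue > t2.a AND t1.MyValue < t2.b
    ORDER BY t1.MyValue
""").fetchall()
for row in rows:
    print(" | ".join(row))
```

Because the values are compared as strings, 'W140686' falls only in the W range, and the numeric-looking values fall only in the 0000001 to 0500000 range.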

Adding Row Numbers To a SELECT Query Result in SQL Server Without use Row_Number() function

I need to add row numbers to a SELECT query without using the ROW_NUMBER() function, and without using user-defined functions or stored procedures.
Select (obtain the row number) as [Row], field1, field2, fieldn from aTable
UPDATE
I am using the SAP B1 DIAPI to run the query, and this system does not allow the ROW_NUMBER() function in the SELECT statement.
Bye.
I'm not sure if this will work for your particular situation or not, but can you execute this query with a stored procedure? If so, you can:
A) Create a temp table with all your normal result columns, plus a Row column as an auto-incremented identity.
B) Select-Insert your original query, sans the row column (SQL will fill this in automatically for you)
C) Select * on the temp table for your result set.
Not the most elegant solution, but will accomplish the row numbering you are wanting.
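Steps A to C can be sketched in SQLite through Python's sqlite3 module. There, an INTEGER PRIMARY KEY column is filled in automatically, much like an identity column in SQL Server. The table atable and the columns field1 and field2 come from the question's pseudo-query; the data and the ordering by field1 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table standing in for aTable from the question.
conn.execute("CREATE TABLE atable (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?)",
    [("c", "x"), ("a", "y"), ("b", "z")],
)

# A) Temp table with the normal result columns plus an auto-filled row column.
conn.execute(
    "CREATE TEMP TABLE numbered (row_num INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)"
)
# B) Insert the original query's rows without the row column; SQLite assigns
#    row_num = 1, 2, 3, ... in the order the rows are inserted.
conn.execute(
    "INSERT INTO numbered (field1, field2) "
    "SELECT field1, field2 FROM atable ORDER BY field1"
)
# C) Select everything from the temp table as the result set.
rows = conn.execute(
    "SELECT row_num, field1, field2 FROM numbered ORDER BY row_num"
).fetchall()
print(rows)  # [(1, 'a', 'y'), (2, 'b', 'z'), (3, 'c', 'x')]
```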
This query will give you the row number:
SELECT
(SELECT COUNT(*) FROM #table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM #table t1
However, there are some restrictions. Your table must have one column (in the example it is field) whose values are unique and sortable, so that you can use it as a reference. For example:
CREATE TABLE #table
(
field INT,
otherField VARCHAR(10)
)
INSERT INTO #table(field,otherField) VALUES (1,'a')
INSERT INTO #table(field,otherField) VALUES (4,'b')
INSERT INTO #table(field,otherField) VALUES (6,'c')
INSERT INTO #table(field,otherField) VALUES (7,'d')
SELECT * FROM #table
returns
field | otherField
------------------
1 | a
4 | b
6 | c
7 | d
and
SELECT
(SELECT COUNT(*) FROM #table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM #table t1
returns
row_number | field | otherField
-------------------------------
1 | 1 | a
2 | 4 | b
3 | 6 | c
4 | 7 | d
This solution needs no functions or stored procedures, but, as I said, it has these restrictions. Still, it may be enough for you.
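The correlated COUNT(*) trick is not specific to SQL Server. Here it is in SQLite through Python's sqlite3 module, with the sample rows from the answer. The table is named t because # is not valid in an unquoted SQLite name, and the alias is row_num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field INTEGER, otherField TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (4, "b"), (6, "c"), (7, "d")],
)

# For each row, count how many rows have a field value <= its own.
# Because field is unique, that count is the row's position in field order.
rows = conn.execute("""
    SELECT (SELECT COUNT(*) FROM t AS t2 WHERE t2.field <= t1.field) AS row_num,
           field,
           otherField
    FROM t AS t1
    ORDER BY field
""").fetchall()
print(rows)  # [(1, 1, 'a'), (2, 4, 'b'), (3, 6, 'c'), (4, 7, 'd')]
```

Without a suitable index, the subquery scans the table once per row. Duplicate field values would also share the same number, which is why the reference column has to be unique.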
RRUZ, you might be able to hide the use of a function by wrapping your query in a View. It would be transparent to the caller. I don't see any other options, besides the ones already mentioned.