I am looking for the best way to find duplicated rows where one row has a NULL in a specific column and another has some INTEGER value, as shown below.
Result
This is what I expect the query to return:
TREATY_NUMBER | SECTION_NUMBER | DT_PERIOD_START | INVOLVEMENT
--------------|----------------|-----------------|------------
1             | 001            | 20190101        | NULL
1             | 001            | 20190101        | 58
1             | 001            | 20200101        | NULL
1             | 001            | 20200101        | 58
2             | 001            | 20200101        | NULL
2             | 001            | 20200101        | 77
2             | 001            | 20200101        | NULL
2             | 001            | 20210101        | 77
I was trying something like this: first find all TREATY_NUMBERs that have an INTEGER value, then join them back to the same table to get all the data.
select distinct v.*
from TREATY v
join (select TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START, max(INVOLVEMENT) INV
      from TREATY
      group by TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START
      having count(*) > 1) a
  on a.TREATY_NUMBER = v.TREATY_NUMBER and a.DT_PERIOD_START = v.DT_PERIOD_START
where a.INV is not null
But in this case I also get rows that have only an INTEGER value and no NULL value at all.
This is what I currently get from the query:
TREATY_NUMBER | SECTION_NUMBER | DT_PERIOD_START | INVOLVEMENT
--------------|----------------|-----------------|------------
1             | 001            | 20190101        | NULL
1             | 001            | 20190101        | 58
1             | 001            | 20200101        | NULL
1             | 001            | 20200101        | 58
2             | 001            | 20200101        | NULL
2             | 001            | 20200101        | 77
2             | 001            | 20200101        | NULL
2             | 001            | 20210101        | 77
6038          | 001            | 20200101        | 6
6038          | 001            | 20200101        | 7
6038          | 001            | 20200101        | 8
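One way to keep only the groups that contain both a NULL and an integer INVOLVEMENT is to compare COUNT(*) with COUNT(INVOLVEMENT) per group, since COUNT of a column skips NULLs. Here is a minimal sketch of that idea, using SQLite as a stand-in for the real database; the table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TREATY (
    TREATY_NUMBER INT, SECTION_NUMBER TEXT,
    DT_PERIOD_START TEXT, INVOLVEMENT INT
);
INSERT INTO TREATY VALUES
 (1, '001', '20190101', NULL), (1, '001', '20190101', 58),
 (1, '001', '20200101', NULL), (1, '001', '20200101', 58),
 (2, '001', '20200101', NULL), (2, '001', '20200101', 77),
 (6038, '001', '20200101', 6), (6038, '001', '20200101', 7),
 (6038, '001', '20200101', 8);
""")

# COUNT(*) counts every row; COUNT(INVOLVEMENT) skips NULLs.
# A group has both a NULL and an integer exactly when the two counts
# differ and COUNT(INVOLVEMENT) is still greater than zero.
rows = conn.execute("""
SELECT v.*
FROM TREATY v
JOIN (SELECT TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START
      FROM TREATY
      GROUP BY TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START
      HAVING COUNT(*) > COUNT(INVOLVEMENT)   -- at least one NULL
         AND COUNT(INVOLVEMENT) > 0          -- at least one integer
     ) a
  ON  a.TREATY_NUMBER   = v.TREATY_NUMBER
  AND a.SECTION_NUMBER  = v.SECTION_NUMBER
  AND a.DT_PERIOD_START = v.DT_PERIOD_START
ORDER BY v.TREATY_NUMBER, v.DT_PERIOD_START
""").fetchall()
for r in rows:
    print(r)   # the 6038 group (integers only) is filtered out
```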
So I have a table as follows:
ID create_date
001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021
ID is unique for the table.
I have another table (below) where these IDs appear multiple times alongside another variable, titled code_id.
ID code_id date data
001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx
What I want to do is create a new table (preferably via a CTE, but open to join options) which shows the distinct count of code_id within 5 and within 30 days of table1.create_date.
In other words: how many different code_ids appear for each ID within x days of create_date, where x is 5 and 30 respectively.
Here is the resulting table I seek:
ID distinct_code_id_5_day distinct_code_id_30_day distinct_code_id_total
001 2 3 3
002 1 2 3
003 0 0 0
004 0 0 1
005 2 3 4
In the case of ID = 001, we count all code_ids that appeared from 01/01/2021 through 01/05/2021 inclusive for distinct_code_id_5_day, and from 01/01/2021 through 01/30/2021 inclusive for distinct_code_id_30_day.
You should be able to solve this with a join and a couple of iff() calls with date math:
with ids as (
    select split(value, ' ') x, x[0] id, x[1]::date create_date
    from table(split_to_table('001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021', '\n'))
), data as (
    select split(value, ' ') x, x[0] id, x[1] code_id, x[2]::date date, x[3] data
    from table(split_to_table('001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx', '\n'))
)
select id, count(distinct code5), count(distinct code30), count(distinct code_id)
from (
    select a.id
         , iff(a.create_date + 5 >= b.date, b.code_id, null) code5
         , iff(a.create_date + 30 >= b.date, b.code_id, null) code30
         , b.code_id
    from ids a
    left outer join data b on a.id = b.id
)
group by 1
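The same pattern can be sanity-checked outside Snowflake. In this sketch, SQLite stands in, dates are rewritten as ISO strings so plain comparison works, and a CASE expression plays the role of iff(); the LEFT JOIN is what keeps IDs with no activity (like 003) in the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ids (id TEXT, create_date TEXT);
INSERT INTO ids VALUES
 ('001','2021-01-01'), ('002','2021-02-04'), ('003','2021-07-22'),
 ('004','2021-01-29'), ('005','2021-03-01');
CREATE TABLE data (id TEXT, code_id TEXT, date TEXT);
INSERT INTO data VALUES
 ('001','A','2021-01-01'), ('002','W','2021-02-08'), ('002','B','2021-03-06'),
 ('001','A','2021-01-19'), ('002','C','2021-05-01'), ('004','D','2021-12-01'),
 ('001','K','2021-01-02'), ('001','J','2021-01-15'), ('005','A','2021-03-01'),
 ('005','A','2021-03-01'), ('005','B','2021-03-05'), ('005','B','2021-03-30'),
 ('005','C','2021-03-30'), ('005','D','2021-04-01');
""")

# A code only counts toward a window when its event date falls within
# create_date + N days; COUNT(DISTINCT ...) ignores the NULLs produced
# by the CASE for out-of-window rows.
rows = conn.execute("""
SELECT a.id,
       COUNT(DISTINCT CASE WHEN b.date <= date(a.create_date, '+5 days')
                           THEN b.code_id END)  AS cnt_5_day,
       COUNT(DISTINCT CASE WHEN b.date <= date(a.create_date, '+30 days')
                           THEN b.code_id END)  AS cnt_30_day,
       COUNT(DISTINCT b.code_id)                AS cnt_total
FROM ids a
LEFT JOIN data b ON a.id = b.id
GROUP BY a.id
ORDER BY a.id
""").fetchall()
for r in rows:
    print(r)
```

This reproduces the expected table, including the all-zero row for ID 003.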
For a given pair of cat_id, subcat_id, I want to count the occurrences of the same brand_id and store the incremental count in a new column called counter:
cat_id subcat_id product_code customer_id quantity brand_id
----------------------------------------------------------------
123 456 AB,CD 111 2 1
123 456 CD 111 3 1
123 456 AB 222 2 1
123 789 AB,CD 111 2 2
123 789 CD 111 3 2
123 789 AB 222 2 2
The result should be:
cat_id subcat_id product_code customer_id quantity brand_id counter
---------------------------------------------------------------------------
123 456 AB,CD 111 2 1 1
123 456 CD 111 3 1 2
123 456 AB 222 2 1 3
123 789 AB,CD 111 2 2 1
123 789 CD 111 3 2 2
123 789 AB 222 2 2 3
It looks like you want row_number():
select t.*,
       row_number() over (partition by cat_id, subcat_id order by customer_id, quantity) as counter
from t;
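To see that this produces the requested counter, here is a runnable sketch against SQLite (window functions need SQLite 3.25+); table name and ordering columns follow the answer above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (cat_id INT, subcat_id INT, product_code TEXT,
                customer_id INT, quantity INT, brand_id INT);
INSERT INTO t VALUES
 (123, 456, 'AB,CD', 111, 2, 1), (123, 456, 'CD', 111, 3, 1),
 (123, 456, 'AB',    222, 2, 1), (123, 789, 'AB,CD', 111, 2, 2),
 (123, 789, 'CD',    111, 3, 2), (123, 789, 'AB',    222, 2, 2);
""")

# row_number() restarts at 1 for each (cat_id, subcat_id) partition,
# numbering rows by customer_id, then quantity.
rows = conn.execute("""
SELECT t.*,
       row_number() OVER (PARTITION BY cat_id, subcat_id
                          ORDER BY customer_id, quantity) AS counter
FROM t
ORDER BY cat_id, subcat_id, counter
""").fetchall()
for r in rows:
    print(r)
```

The counter column comes out as 1, 2, 3 within each (cat_id, subcat_id) group, matching the expected result.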
I would like to write an SQL query with the following conditions:
Check if there is a debit amount between 99% and 101% of the credit amount (and vice versa)
where the customer is the same and the date is today.
Let's say I have the table below:
Customer Debit Credit Amount Processing_Date
1001 D 100 01/12/2020
1001 C 100.02 01/12/2020
1002 D 102 01/12/2020
1002 C 102 01/12/2020
1004 D 106 01/12/2020
1004 C 135 01/12/2020
1005 D 111 01/12/2020
1006 D 123 01/12/2020
In this case I want only the first 4 records to be displayed.
Can someone suggest what the SQL query should look like to obtain such a result?
Thank you for your time.
You can try the approach below to compute the ratio and filter on it. I have used a constant for today; you can use GETDATE() instead.
DECLARE @table table(customerid int, debit char(1), credit char(1),
                     amt money, dateval date)
INSERT INTO @table
values
 (1001,'D',null,100   ,'01/12/2020')
,(1001,null,'C',100.02,'01/12/2020')
,(1002,'D',null,102   ,'01/12/2020')
,(1002,null,'C',102   ,'01/12/2020')
,(1004,'D',null,106   ,'01/12/2020')
,(1004,null,'C',135   ,'01/12/2020')
,(1005,'D',null,111   ,'01/12/2020')
,(1006,'D',null,123   ,'01/12/2020');

;With cte_customerId as
(
    select customerId
          ,sum(case when debit  is not null then amt end) as debit
          ,sum(case when credit is not null then amt end) as credit
    from @table
    WHERE DATEVAL = '01/12/2020'
    group by customerid
)
SELECT * FROM @table where customerid in
(
    SELECT customerid FROM cte_customerId
    where (credit/debit) between 0.99 and 1.01
       or (debit/credit) between 0.99 and 1.01
)
+------------+-------+--------+--------+------------+
| customerid | debit | credit | amt | dateval |
+------------+-------+--------+--------+------------+
| 1001 | D | NULL | 100.00 | 2020-01-12 |
| 1001 | NULL | C | 100.02 | 2020-01-12 |
| 1002 | D | NULL | 102.00 | 2020-01-12 |
| 1002 | NULL | C | 102.00 | 2020-01-12 |
+------------+-------+--------+--------+------------+
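The same idea ported to SQLite for a portable check (no table variables there, so a plain table stands in; the ledger table name is made up for the demo, but the columns and values follow the answer above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger (customerid INT, debit TEXT, credit TEXT,
                     amt REAL, dateval TEXT);
INSERT INTO ledger VALUES
 (1001, 'D', NULL, 100,    '2020-01-12'),
 (1001, NULL, 'C', 100.02, '2020-01-12'),
 (1002, 'D', NULL, 102,    '2020-01-12'),
 (1002, NULL, 'C', 102,    '2020-01-12'),
 (1004, 'D', NULL, 106,    '2020-01-12'),
 (1004, NULL, 'C', 135,    '2020-01-12'),
 (1005, 'D', NULL, 111,    '2020-01-12'),
 (1006, 'D', NULL, 123,    '2020-01-12');
""")

# Per customer: total debit vs total credit for today, keeping only
# customers whose ratio falls in the 99%-101% band in either direction.
# Customers with only one side (1005, 1006) produce NULL ratios and
# drop out automatically.
rows = conn.execute("""
WITH cte AS (
    SELECT customerid,
           SUM(CASE WHEN debit  IS NOT NULL THEN amt END) AS debit,
           SUM(CASE WHEN credit IS NOT NULL THEN amt END) AS credit
    FROM ledger
    WHERE dateval = '2020-01-12'
    GROUP BY customerid
)
SELECT * FROM ledger
WHERE customerid IN (SELECT customerid FROM cte
                     WHERE credit / debit BETWEEN 0.99 AND 1.01
                        OR debit / credit BETWEEN 0.99 AND 1.01)
""").fetchall()
print([r[0] for r in rows])   # only 1001 and 1002 survive
```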
I have a table that has player_id and team_id.
I want to find all players who played on the same 3 or more teams.
The expected output would be :
player1, player2, number_of_teams
So far I have something like:
SELECT player_id as player1, player_id as player2, count(team_id) as number_of_teams
FROM player_history
WHERE ....
Sample Data:
player_id | team_id
--------------------
001 | 23
001 | 15
001 | 21
002 | 23
002 | 21
002 | 15
002 | 34
003 | 23
003 | 15
003 | 34
003 | 21
004 | 12
004 | 11
004 | 23
should return:
player1 | player2 | number_of_teams
-----------------------------------
001 | 002 | 3
001 | 003 | 3
002 | 003 | 4
What you should do is join the table with itself on the same team but different players; then group the result and count.
Since I assume there are more than 2 players on each team, and you're looking for different players in the same year as implied (though not really specified) in your question, I took the liberty of adding that to the join conditions.
You can, of course, remove it.
SELECT
    p1,
    p2,
    COUNT(team_id) as total
FROM
    (
        SELECT DISTINCT
            h1.team_id,
            h1.player_id as p1,
            h2.player_id as p2
        FROM
            player_history h1
            INNER JOIN player_history h2 ON h1.team_id = h2.team_id AND h1.player_id < h2.player_id AND h1.year = h2.year
    ) sameteam
GROUP BY
    p1,
    p2
HAVING
    total >= 3
Notice that your example result doesn't fit the example data; player 4 should not be on the list.
SQLFiddle here
Hope it helps!
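A runnable check of the self-join idea against SQLite, with the year condition dropped since the sample data has no year column; the h1.player_id < h2.player_id condition keeps each unordered pair exactly once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player_history (player_id TEXT, team_id INT);
INSERT INTO player_history VALUES
 ('001', 23), ('001', 15), ('001', 21),
 ('002', 23), ('002', 21), ('002', 15), ('002', 34),
 ('003', 23), ('003', 15), ('003', 34), ('003', 21),
 ('004', 12), ('004', 11), ('004', 23);
""")

# Pair up players on the same team (each unordered pair once),
# then count how many distinct teams each pair shares.
rows = conn.execute("""
SELECT h1.player_id AS player1,
       h2.player_id AS player2,
       COUNT(DISTINCT h1.team_id) AS number_of_teams
FROM player_history h1
JOIN player_history h2
  ON h1.team_id = h2.team_id AND h1.player_id < h2.player_id
GROUP BY player1, player2
HAVING COUNT(DISTINCT h1.team_id) >= 3
ORDER BY player1, player2
""").fetchall()
for r in rows:
    print(r)
```

This reproduces the expected pairs: (001, 002) with 3 shared teams, (001, 003) with 3, and (002, 003) with 4; player 004 shares only one team with anyone, so no pair involving 004 appears.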
I have a table "price_hist" in Amazon Redshift (PostgreSQL-based) which has product and price data for 10 countries, collected twice a day. I want only the latest data for each day for each product.
For example, below is the table:
Country Product Price(string) Created_On
US 001 $2,300 2015/02/16 00:46:20
US 001 $2,300 2015/02/16 13:27:12
DK 006 kr1,700 2015/02/16 00:46:20
DK 006 kr1,700 2015/02/16 13:27:12
US 002 $5,300 2015/02/15 00:46:20
US 002 $5,300 2015/02/15 13:27:12
US 001 $2,200 2015/02/15 00:46:20
US 001 $2,200 2015/02/15 13:27:12
DK 007 kr28 2015/02/15 00:46:20
DK 007 kr28 2015/02/15 13:27:12
US 001 $2,100 2015/02/14 00:46:20
US 002 $5,200 2015/02/14 13:27:12
DK 007 kr9,100 2015/02/14 00:46:20
DK 007 kr9,100 2015/02/14 13:27:12
Now I want a query which always shows data for today and yesterday, with the price difference and a flag saying whether the product was available yesterday or not.
Required Output :
Country Product P_today p_yesterday p_change flag created_on
US 001 2300 2200 100 Both 2015/02/16 13:27:12
US 002 0 5300 -5300 Removed 2015/02/15 13:27:12
DK 006 1700 0 1700 Added 2015/02/16 13:27:12
DK 007 0 9100 -9100 Removed 2015/02/15 13:27:12
where column P_change shows the price change between today's and yesterday's products, and
flag reflects new products added in today's data and the ones which were removed.
You can do it with something like this:
select country, product, P_today, P_yesterday, (P_today - P_yesterday) as P_change,
       case
           when P_today > 0 and P_yesterday > 0 then 'both'
           when P_today = 0 and P_yesterday > 0 then 'removed'
           when P_today > 0 and P_yesterday = 0 then 'added'
       end
from
    (select coalesce(q1.country, q2.country) as country,
            coalesce(q1.product, q2.product) as product,
            coalesce(q1.price, 0) as P_today,
            coalesce(q2.price, 0) as P_yesterday
     from
         (select * from product
          where created_on in (select max(created_on) from product
                               where date_trunc('day', created_on) = '2015-02-16 00:00:00+00'
                               group by product, country)) as q1
         full outer join
         (select * from product
          where created_on in (select max(created_on) from product
                               where date_trunc('day', created_on) = '2015-02-15 00:00:00+00'
                               group by product, country)) as q2
         on q1.country = q2.country and q1.product = q2.product) as t
I tested it and it gave me something similar to what you are looking for, see below:
country | product | p_today | p_yesterday | p_change | case
---------+---------+---------+-------------+----------+---------
US | 001 | 2300 | 2300 | 0 | both
US | 002 | 0 | 2300 | -2300 | removed
DK | 006 | 700 | 0 | 700 | added
DK | 007 | 0 | 2300 | -2300 | removed
Hope that helps.
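To sanity-check the flag logic without FULL OUTER JOIN support (older SQLite lacks it), the today-vs-yesterday comparison can be emulated with a LEFT JOIN plus a UNION ALL branch for yesterday-only products. The latest_price table name and values below are made up for the demo, assuming one already de-duplicated row per product per day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE latest_price (country TEXT, product TEXT, price REAL, day TEXT);
-- one (already de-duplicated) row per product per day
INSERT INTO latest_price VALUES
 ('US', '001', 2300, '2015-02-16'), ('US', '001', 2200, '2015-02-15'),
 ('DK', '006', 1700, '2015-02-16'),
 ('US', '002', 5300, '2015-02-15'),
 ('DK', '007', 9100, '2015-02-15');
""")

# The LEFT JOIN covers products present today; the UNION ALL branch adds
# products present yesterday but missing today (the full-outer part).
rows = conn.execute("""
SELECT q1.country AS country, q1.product AS product,
       q1.price AS p_today,
       COALESCE(q2.price, 0) AS p_yesterday,
       q1.price - COALESCE(q2.price, 0) AS p_change,
       CASE WHEN q2.price IS NULL THEN 'added' ELSE 'both' END AS flag
FROM (SELECT * FROM latest_price WHERE day = '2015-02-16') q1
LEFT JOIN (SELECT * FROM latest_price WHERE day = '2015-02-15') q2
  ON q1.country = q2.country AND q1.product = q2.product
UNION ALL
SELECT q2.country, q2.product, 0, q2.price, -q2.price, 'removed'
FROM (SELECT * FROM latest_price WHERE day = '2015-02-15') q2
LEFT JOIN (SELECT * FROM latest_price WHERE day = '2015-02-16') q1
  ON q1.country = q2.country AND q1.product = q2.product
WHERE q1.product IS NULL
ORDER BY country, product
""").fetchall()
for r in rows:
    print(r)
```

Products in both days come out as 'both' with the price delta, today-only as 'added', and yesterday-only as 'removed', mirroring the CASE in the answer above.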