SQL count duplicates in another column based on one field per row

I am building out a customer retention report. We identify customers by their email. Here is some sample data from our table:
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
| Email | BrandNewCustomer | RecurringCustomer | ReactivatedCustomer | OrderCount | TotalOrders | Date_Created | Customer_Name | Customer_Address | Customer_City | Customer_State | Customer_Zip | Customer_Country |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
| zyw#marketplace.amazon.com | 1 | 0 | 0 | 1 | 1 | 41:50.0 | Sha | 990 | BRO | NY | 112 | US |
| zyu#gmail.com | 1 | 0 | 0 | 1 | 1 | 57:25.0 | Zyu | 181 | Mia | FL | 330 | US |
| ZyR#aol.com | 1 | 0 | 0 | 1 | 1 | 10:19.0 | Day | 581 | Myr | SC | 295 | US |
| zyr#gmail.com | 1 | 0 | 0 | 1 | 1 | 25:19.0 | Nic | 173 | Was | DC | 200 | US |
| zy#gmail.com | 1 | 0 | 0 | 1 | 1 | 19:18.0 | Kim | 675 | MIA | FL | 331 | US |
| zyou#gmail.com | 1 | 0 | 0 | 1 | 1 | 40:29.0 | zoe | 160 | Mob | AL | 366 | US |
| zyon#yahoo.com | 1 | 0 | 0 | 1 | 1 | 17:21.0 | Zyo | 84 | Sta | CT | 690 | US |
| zyo#gmail.com | 1 | 0 | 0 | 2 | 2 | 02:03.0 | Zyo | 432 | Ell | GA | 302 | US |
| zyo#gmail.com | 1 | 0 | 0 | 1 | 2 | 12:54.0 | Zyo | 432 | Ell | GA | 302 | US |
| zyn#icloud.com | 1 | 0 | 0 | 1 | 1 | 54:56.0 | Zyn | 916 | Nor | CA | 913 | US |
| zyl#gmail.com | 0 | 1 | 0 | 3 | 3 | 31:27.0 | Ser | 123 | Mia | FL | 331 | US |
| zyk#marketplace.amazon.com | 1 | 0 | 0 | 1 | 1 | 44:00.0 | Myr | 101 | MIA | FL | 331 | US |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
We define a customer by email, so all orders with the same email are grouped under one customer, and our calculations run on top of that.
Now I am trying to find customers whose email has changed. To do this, we want to match customers up by their address.
So for each row (that is, per email), I want another column called something like Orders_With_Same_Address_Different_Email. How would I do that?
I have tried something with DENSE_RANK, but it doesn't seem to work:
SELECT DISTINCT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,(DENSE_RANK() over (partition by Email order by (case when email <> email then Customer_Address end) asc)
+DENSE_RANK() over ( partition by Email order by (case when email <> email then Customer_Address end) desc)
- 1) as Orders_With_Same_Name_Different_Email
--*
FROM Customers

Try counting the email partitioned by address, not by email:
select Email,
-- ...
-- 1 when the address appears on more than one row, 0 otherwise
Orders_With_Same_Name_Different_Email = iif(
count(email) over (partition by Customer_Address) > 1,
1, 0)
from Customers;
But this is a lesson in why you wouldn't use an email as an identifier for a client. Address is a bad idea as well. Use something that won't change. That usually means making an internal identifier, such as something that auto-increments:
alter table #customers
add customerId int identity(1,1) primary key not null
Now customerId = 1 will always refer to that particular customer.

You can group by Customer_Address and check the count. This assumes that each customer has one address.
select * from Customers where
Customer_Address in (
select Customer_Address
from Customers group by Customer_Address
having count(distinct Email)
> 1)
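To get the per-row column the question asks for, one option is a correlated subquery that counts the other rows sharing the address but not the email. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module (an assumption, since the original database is not stated), and the example.com emails and the trimmed two-column table are made up for illustration.

```python
import sqlite3

# Hypothetical, trimmed-down Customers table: only the two columns that matter here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Email TEXT, Customer_Address TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("zyo@example.com", "432"),       # two orders, same email, same address
        ("zyo@example.com", "432"),
        ("kim@example.com", "675"),       # address shared with a different email
        ("kimberly@example.com", "675"),
        ("zoe@example.com", "160"),       # address used by one email only
    ],
)

# For every row, count the rows that share the address but carry another email.
query = """
SELECT c.Email,
       c.Customer_Address,
       (SELECT COUNT(*)
          FROM Customers o
         WHERE o.Customer_Address = c.Customer_Address
           AND o.Email <> c.Email) AS Orders_With_Same_Address_Different_Email
  FROM Customers c
 ORDER BY c.rowid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Repeat orders from the same email (the two zyo rows) get 0 here, which a plain count over the address partition would not distinguish.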

If I understand what you want to do, this is how I would solve it:
Note, you don't need the having clause in the CTE but depending on your data it could make it faster. (That is, if you have a large dataset.)
WITH email2addr AS
(
select email, count(distinct customer_address) as addr_cnt
from customers
group by email
having count(distinct customer_address) > 1
)
SELECT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,CASE when coalesce(email2addr.addr_cnt,1) > 1 then 'Y' ELSE 'N' END as has_more_than_1_email
from customers
left join email2addr on customers.email = email2addr.email

Related

SQL Merge two Tables

Let's say I have these 2 tables:
ArticleTBL
+---------+----------+-------------+------------+
|articleid| typeid | price | user |
+---------+----------+-------------+------------+
| 0 | 2 | 1 | 122 |
| 1 | 3 | 2 | 344 |
| 2 | 3 | 1 | 455 |
| 3 | 1 | 4 | 34 |
+---------+----------+-------------+------------+
TypeTBL
+---------+----------+-------------+
|typeid | type | factory |
+---------+----------+-------------+
| 0 | wooden | factry1 |
| 1 | plastic | factry2 |
| 2 | metal | factry3 |
| 3 | sth. | factry4 |
+---------+----------+-------------+
How do I request all this information only with articleid for each row?
Isn't this what you want?
SELECT a.articleid,
a.price,
a.user,
t.typeid,
t.type,
t.factory
FROM ArticleTBL a
INNER JOIN TypeTBL t
ON a.typeid = t.typeid
WHERE a.articleid = 0
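If the goal is the combined information for every article rather than a single one, the same join can be run without the WHERE filter. A minimal sketch using the sample rows from the question, run on an in-memory SQLite database via Python's sqlite3 module (an assumption, the question does not name its database; the column user is quoted because it is a reserved word in several engines):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ArticleTBL (articleid INTEGER, typeid INTEGER, price INTEGER, "user" INTEGER);
CREATE TABLE TypeTBL (typeid INTEGER, type TEXT, factory TEXT);
INSERT INTO ArticleTBL VALUES (0, 2, 1, 122), (1, 3, 2, 344), (2, 3, 1, 455), (3, 1, 4, 34);
INSERT INTO TypeTBL VALUES (0, 'wooden', 'factry1'), (1, 'plastic', 'factry2'),
                           (2, 'metal', 'factry3'), (3, 'sth.', 'factry4');
""")

# One output row per article, with the matching type columns attached.
rows = conn.execute("""
SELECT a.articleid, a.price, a."user", t.typeid, t.type, t.factory
  FROM ArticleTBL a
 INNER JOIN TypeTBL t ON a.typeid = t.typeid
 ORDER BY a.articleid
""").fetchall()
for row in rows:
    print(row)
```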

Selecting the first instance of a vendor, part combination

I am trying to create an indicator for if a particular transaction was the first time a part was purchased from a particular vendor.
I have a dataset that looks like this:
| transaction_id | vendor_id | part_id | trans_date |
|:--------------:|:---------:|:-------:|:-----------------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 |
| 1Po.4Ot, | a | 473 | 4/22/2016 |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 |
| 2Lz&7Hu& | a | 873 | 12/20/2017 |
| 8Lz)5Is# | b | 743 | 10/22/2016 |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 |
| 0Ra&8Hb& | a | 653 | 10/4/2017 |
| 4Wc-8Of* | c | 333 | 8/3/2017 |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 |
| 6Qh!1Ha- | c | 333 | 3/28/2017 |
| 2Ol%4Rs# | c | 333 | 5/2/2017 |
| 1Gg#8Cm% | c | 333 | 11/15/2016 |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 |
| 1Gy/7Zw, | a | 443 | 10/12/2018 |
| 2Gz,4Gp. | b | 103 | 1/5/2018 |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 |
| 5Hl-8Ds! | a | 903 | 12/8/2017 |
| 8Ws$3Vy* | b | 873 | 1/13/2018 |
What I am looking to do is determine if the transaction_id was the first time (sorted by trans_date), that the part_id was purchased from a vendor_id. I would imagine the ideal output to look like this:
| transaction_id | vendor_id | part_id | trans_date | first_time |
|:--------------:|:---------:|:-------:|:-----------------:|:----------:|
| 9Bx*2Pc' | a | 873 | 10/12/2018 | N |
| 1Po.4Ot, | a | 473 | 4/22/2016 | Y |
| 9Sk"7Kv/ | b | 123 | 7/23/2016 | Y |
| 2Lz&7Hu& | a | 873 | 12/20/2017 | Y |
| 8Lz)5Is# | b | 743 | 10/22/2016 | Y |
| 5Sc'6Jl/ | a | 113 | 10/6/2016 | Y |
| 0Ra&8Hb& | a | 653 | 10/4/2017 | Y |
| 4Wc-8Of* | c | 333 | 8/3/2017 | N |
| 8Vv+9Yo/ | c | 333 | 12/7/2016 | N |
| 6Qh!1Ha- | c | 333 | 3/28/2017 | N |
| 2Ol%4Rs# | c | 333 | 5/2/2017 | N |
| 1Gg#8Cm% | c | 333 | 11/15/2016 | Y |
| 0Lw(6Pv/ | d | 873 | 8/13/2017 | Y |
| 1Gy/7Zw, | a | 443 | 10/12/2018 | Y |
| 2Gz,4Gp. | b | 103 | 1/5/2018 | Y |
| 5Dj)6Wc+ | a | 893 | 12/17/2016 | Y |
| 5Hl-8Ds! | a | 903 | 12/8/2017 | Y |
| 8Ws$3Vy* | b | 873 | 1/13/2018 | Y |
So far, I have tried (which was influenced by this post):
WITH
first_instance AS (
SELECT
tbl_trans.*,
ROW_NUMBER() OVER (PARTITION BY vendor_id||part_id ORDER BY trans_date) AS row_nums
FROM
tbl_trans
)
SELECT
x.*,
CASE WHEN y.row_nums = 1 THEN 'Y' ELSE 'N' END AS first_time_indicator
FROM
tbl_trans x
LEFT JOIN first_instance y
But I am met with:
ORA-00905: missing keyword
I have created a SQL Fiddle with this data and the query so far for testing. How can I determine whether a transaction was a first-time purchase for a part/vendor combination?
Use window functions:
select t.*,
(case when row_number() over (partition by vendor_id, part_id order by trans_date) = 1
then 'Y' else 'N'
end) as first_time
from tbl_trans t;
You don't need a join.
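A small way to try the window-function approach outside Oracle is sketched below. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, uses made-up transaction ids t1 to t5, and stores trans_date as ISO text so that it sorts chronologically; dates kept as m/d/yyyy strings would not sort correctly.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for the asker's Oracle database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_trans "
    "(transaction_id TEXT, vendor_id TEXT, part_id INTEGER, trans_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_trans VALUES (?, ?, ?, ?)",
    [
        ("t1", "a", 873, "2018-10-12"),  # later purchase of part 873 from vendor a
        ("t2", "a", 873, "2017-12-20"),  # first purchase of part 873 from vendor a
        ("t3", "c", 333, "2017-08-03"),
        ("t4", "c", 333, "2016-11-15"),  # first purchase of part 333 from vendor c
        ("t5", "d", 873, "2017-08-13"),  # same part as t1/t2, but another vendor
    ],
)

# Number the rows inside each (vendor, part) group by date; row 1 is the first purchase.
rows = conn.execute("""
SELECT transaction_id,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY vendor_id, part_id
                                    ORDER BY trans_date) = 1
            THEN 'Y' ELSE 'N' END AS first_time
  FROM tbl_trans
 ORDER BY transaction_id
""").fetchall()
for row in rows:
    print(row)
```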
Apart from row_number, there are several other ways to get the desired result with analytic functions.
You can use first_value analytical function as follows:
Select t.*,
Case
when first_value(trans_date)
over (partition by vendor_id, part_id order by trans_date) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
The same way, you can also use min as follows:
Select t.*,
Case
when min(trans_date)
over (partition by vendor_id, part_id) = trans_date
then 'Y'
else 'N'
end as first_time
From your_table t;
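One difference worth knowing between these variants is how they treat ties. If two transactions for the same vendor and part share the earliest trans_date, the min (and first_value) comparison flags both, while row_number flags only one of them. A sketch of that behaviour, assuming SQLite 3.25+ via Python's sqlite3 module and three made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_trans "
    "(transaction_id TEXT, vendor_id TEXT, part_id INTEGER, trans_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_trans VALUES (?, ?, ?, ?)",
    [
        ("t1", "a", 1, "2017-01-01"),  # tied for earliest date
        ("t2", "a", 1, "2017-01-01"),  # tied for earliest date
        ("t3", "a", 1, "2017-06-01"),
    ],
)

# row_number assigns 1 to exactly one row per group, even when dates tie.
rn_flags = [r[0] for r in conn.execute("""
SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY vendor_id, part_id
                                    ORDER BY trans_date) = 1
            THEN 'Y' ELSE 'N' END
  FROM tbl_trans
""")]

# Comparing against the group minimum flags every row that carries that minimum.
min_flags = [r[0] for r in conn.execute("""
SELECT CASE WHEN MIN(trans_date) OVER (PARTITION BY vendor_id, part_id) = trans_date
            THEN 'Y' ELSE 'N' END
  FROM tbl_trans
""")]

print(rn_flags.count("Y"), min_flags.count("Y"))
```

Which behaviour is right depends on whether same-day purchases should all count as the first purchase.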

Counting based on group of 1st column

I am using the following query to count how many bill dates each BAN has:
select replace(c.usertoken, '-', '') as BAN
, to_char(to_date(bi.name,'YYYY-MM-DD'),'dd-mm-yy') as Billdate_dmy
, (replace(c.usertoken, '-', '') ||':'|| to_char(to_date(bi.name,'YYYY-MM-DD'),'dd-mm-yy')) as BAN_Billdate_dmy
, count(c.usertoken) as Number_Of_Bills
from customer c
, service s
, document d
, bill bi
, batch ba
, billrun br
where c.ID = s.CUSTOMER_SERVICE_ID
and s.ID = d.SERVICE_DOCUMENT_ID
and bi.ID = d.BILL_DOCUMENT_ID
and d.BATCH = ba.ID
and ba.BILLRUN = br.ID
and br.STATUS = 'APPROVED'
and c.brand='rogers'
and d.VERSIONEDCONTENTFOLDER='cbu'
group by c.usertoken, bi.name
order by c.usertoken
Output of the above query
+-----------+-----------+--------------------+-------+
| BAN | Bill_date | BAN_Billdate | Count |
+-----------+-----------+--------------------+-------+
| 100001247 | 25-09-19 | 100001247:25-09-19 | 1 |
| 100001247 | 25-10-19 | 100001247:25-10-19 | 1 |
| 100002583 | 15-10-19 | 100002583:15-10-19 | 1 |
| 100004753 | 25-09-19 | 100004753:25-09-19 | 1 |
| 100004753 | 25-10-19 | 100004753:25-10-19 | 1 |
| 100005719 | 25-09-19 | 100005719:25-09-19 | 1 |
| 100005719 | 25-10-19 | 100005719:25-10-19 | 1 |
| 100006311 | 06-09-19 | 100006311:06-09-19 | 1 |
| 100009596 | 25-09-19 | 100009596:25-09-19 | 1 |
| 100009596 | 25-10-19 | 100009596:25-10-19 | 1 |
+-----------+-----------+--------------------+-------+
However I was expecting the following output
+-----------+----------+--------------------+-------+
| BAN | Billdate | BAN_Billdate | Count |
+-----------+----------+--------------------+-------+
| 100001247 | 25-09-19 | 100001247:25-09-19 | 2 |
| 100001247 | 25-10-19 | 100001247:25-10-19 | 2 |
| 100002583 | 15-10-19 | 100002583:15-10-19 | 3 |
| 100004753 | 25-09-19 | 100004753:25-09-19 | 3 |
| 100004753 | 25-10-19 | 100004753:25-10-19 | 3 |
| 100005719 | 25-09-19 | 100005719:25-09-19 | 2 |
| 100005719 | 25-10-19 | 100005719:25-10-19 | 2 |
| 100006311 | 06-09-19 | 100006311:06-09-19 | 1 |
| 100009596 | 25-09-19 | 100009596:25-09-19 | 2 |
| 100009596 | 25-10-19 | 100009596:25-10-19 | 2 |
+-----------+----------+--------------------+-------+
Please advise what changes I should make to the query so that the count column reflects the expected values.
I don't want to touch your query and the archaic join syntax. Please learn proper SQL grammar with JOIN and ON clauses for joins.
That said, you seem to want a window function to sum the counts:
select sum(count(*)) over (partition by ban, to_date(bi.name, 'YYYY-MM-DD'))
I'm not sure that aggregation is really useful, if you are only getting one row per group. In that case, you might want to remove the group by and use:
select count(*) over (partition by ban, to_date(bi.name, 'YYYY-MM-DD'))
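The expected output in the question looks like a total per BAN repeated on each of its rows. If that reading is right, the window would be partitioned by the BAN alone. The sketch below illustrates that idea on a made-up single table bills(ban, bill_date) instead of the six-table join, and assumes SQLite 3.25+ through Python's sqlite3 module rather than the asker's Oracle setup. The grouped counts sit in a derived table purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined customer/bill data.
conn.execute("CREATE TABLE bills (ban TEXT, bill_date TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?)",
    [
        ("100001247", "2019-09-25"),
        ("100001247", "2019-10-25"),
        ("100006311", "2019-09-06"),
    ],
)

# Inner query: one row per (ban, bill_date) with its own count.
# Outer query: add up those counts across all rows of the same ban.
rows = conn.execute("""
SELECT ban,
       bill_date,
       n                                 AS bills_on_date,
       SUM(n) OVER (PARTITION BY ban)    AS bills_for_ban
  FROM (SELECT ban, bill_date, COUNT(*) AS n
          FROM bills
         GROUP BY ban, bill_date)
 ORDER BY ban, bill_date
""").fetchall()
for row in rows:
    print(row)
```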

How to fix correlated subquery wrongly selected data?

I want to select the batchids from the table below for which every record has subId '23' and substatus 'READY'. If any record in a batch has a different subId or substatus, that batch should not be selected.
Table:
+---------+----------+--------+-------+-----------+
| batchid | dcn | dcnseq | subId | substatus |
+---------+----------+--------+-------+-----------+
| 10001 | 10001001 | 1 | 23 | READY |
| 10001 | 10001001 | 2 | 23 | READY |
| 10001 | 10001002 | 1 | 23 | READY |
| 10001 | 10001003 | 1 | 23 | READY |
| 10001 | 10001004 | 1 | 23 | READY |
| 10001 | 10001004 | 2 | 23 | READY |
| 10001 | 10001004 | 3 | 23 | READY |
| 10002 | 10001005 | 1 | 23 | READY |
| 10002 | 10001005 | 2 | 23 | READY |
| 10002 | 10001006 | 1 | 23 | READY |
| 10002 | 10001007 | 1 | 23 | READY |
| 10002 | 10001008 | 1 | 23 | READY |
| 10002 | 10001008 | 2 | 23 | READY |
| 10002 | 10001009 | 1 | 23 | READY |
+---------+----------+--------+-------+-----------+
I am using the query below to achieve this:
select distinct batchid from fm o
where o.subId='23' and o.substatus='READY'
and o.dcnseq='1' and o.batchid in
(
select a.batchid from
(
select i.batchid, SUM(case when i.subId='23' and i.substatus='READY' then 0 else 1 end)match from fm i
where i.batchid=o.batchid
group by i.batchid
having SUM(case when i.subId='23' and i.substatus='READY' then 0 else 1 end)=0
)a
)
Result:
+---------+
| batchid |
+---------+
| 10001 |
| 10002 |
+---------+
It's working perfectly. Now I changed the substatus value of one record to 'HOLD':
+---------+----------+--------+-------+-----------+
| batchid | dcn | dcnseq | subId | substatus |
+---------+----------+--------+-------+-----------+
| 10001 | 10001001 | 1 | 23 | HOLD |
| 10001 | 10001001 | 2 | 23 | READY |
| 10001 | 10001002 | 1 | 23 | READY |
| 10001 | 10001003 | 1 | 23 | READY |
| 10001 | 10001004 | 1 | 23 | READY |
| 10001 | 10001004 | 2 | 23 | READY |
| 10001 | 10001004 | 3 | 23 | READY |
| 10002 | 10001005 | 1 | 23 | READY |
| 10002 | 10001005 | 2 | 23 | READY |
| 10002 | 10001006 | 1 | 23 | READY |
| 10002 | 10001007 | 1 | 23 | READY |
| 10002 | 10001008 | 1 | 23 | READY |
| 10002 | 10001008 | 2 | 23 | READY |
| 10002 | 10001009 | 1 | 23 | READY |
+---------+----------+--------+-------+-----------+
Now result is:
+---------+
| batchid |
+---------+
| 10002 |
+---------+
This is also correct. But sometimes '10001' is still picked in this same case. It happens when the table has a lot of batchids. I have tried to find the mistake, but I can't.
I think your query is too complicated. Just use aggregation and having:
select batchid
from fm
group by batchid
having min(subid) = max(subid) and max(subid) = 23 and
min(substatus) = max(substatus) and max(substatus)= 'READY';
I don't know if your other conditions are important. They are in your query but not mentioned in the question.
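To see the aggregation version reject a batch with one 'HOLD' record, here is a cut-down run. It is a sketch on an in-memory SQLite database via Python's sqlite3 module (the question does not say which database it uses), with only four of the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fm (batchid INTEGER, dcn INTEGER, dcnseq INTEGER, "
    "subId INTEGER, substatus TEXT)"
)
conn.executemany(
    "INSERT INTO fm VALUES (?, ?, ?, ?, ?)",
    [
        (10001, 10001001, 1, 23, "HOLD"),   # the one record that breaks batch 10001
        (10001, 10001001, 2, 23, "READY"),
        (10002, 10001005, 1, 23, "READY"),
        (10002, 10001005, 2, 23, "READY"),
    ],
)

# min = max means the column holds a single value inside the batch;
# the second comparison pins that single value to the wanted one.
rows = conn.execute("""
SELECT batchid
  FROM fm
 GROUP BY batchid
HAVING MIN(subId) = MAX(subId) AND MAX(subId) = 23
   AND MIN(substatus) = MAX(substatus) AND MAX(substatus) = 'READY'
 ORDER BY batchid
""").fetchall()
print(rows)
```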
Selecting batchid's where (subid=23 and substatus = 'READY'), and no other values for subid and/or substatus exist for that batchid.
select batchid
from fm
where subId=23 and substatus='READY'
group by batchid
except
select batchid
from fm
where not(subId=23 and substatus='READY' )
group by batchid
The simplest solution is with NOT EXISTS:
select distinct f.batchid
from fm f
where not exists (
select 1 from fm
where batchid = f.batchid and (coalesce(subid, 0) <> 23 or coalesce(substatus, '') <> 'READY')
)
coalesce() is needed only for the case there may exist nulls in the columns subId and substatus.
If there are not any nulls then the where clause can be simplified to:
where batchid = f.batchid and (subid <> 23 or substatus <> 'READY')
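The effect of coalesce() can be checked with a batch that contains a NULL substatus. In the sketch below (in-memory SQLite via Python's sqlite3 module, made-up rows), the simplified condition lets batch 10001 through because a comparison with NULL is neither true nor false, while the coalesce() version excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fm (batchid INTEGER, subId INTEGER, substatus TEXT)")
conn.executemany(
    "INSERT INTO fm VALUES (?, ?, ?)",
    [
        (10001, 23, None),      # NULL substatus: should disqualify batch 10001
        (10001, 23, "READY"),
        (10002, 23, "READY"),
        (10002, 23, "READY"),
    ],
)

with_coalesce = conn.execute("""
SELECT DISTINCT f.batchid
  FROM fm f
 WHERE NOT EXISTS (
       SELECT 1 FROM fm
        WHERE batchid = f.batchid
          AND (COALESCE(subId, 0) <> 23 OR COALESCE(substatus, '') <> 'READY'))
 ORDER BY f.batchid
""").fetchall()

# Same query without coalesce: NULL <> 'READY' evaluates to NULL, so the
# offending row is never found and the batch is kept.
without_coalesce = conn.execute("""
SELECT DISTINCT f.batchid
  FROM fm f
 WHERE NOT EXISTS (
       SELECT 1 FROM fm
        WHERE batchid = f.batchid
          AND (subId <> 23 OR substatus <> 'READY'))
 ORDER BY f.batchid
""").fetchall()

print(with_coalesce, without_coalesce)
```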
See the demo.
Results:
> | batchid |
> | ------: |
> | 10001 |
> | 10002 |

Moving data to correct record

I have a table where the data needs to be corrected. Below is an example of one record. Basically, the value in the Selling row's close_units needs to move to the Agent to Agent Ref row's close_units. I have tried every approach I can think of, but I can't figure it out. I am sure it is fairly simple and I am just looking at it the wrong way. Any help is greatly appreciated!
Current (bad) data:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 1 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 0 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
Expected result:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 0 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 1 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
The following query will copy the value to the "Agent to Agent Ref" row:
update my_table t1 set close_units = (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Selling'
)
where ComType = 'Agent to Agent Ref';
And this one will reset the "Selling" value to zero:
update my_table t1
set close_units = 0
where ComType = 'Selling'
and exists (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Agent to Agent Ref'
)
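The order of the two statements matters: the copy has to run before the reset, otherwise the value being copied is already 0. A sketch of both steps inside one transaction, using an in-memory SQLite database through Python's sqlite3 module (an assumption, the question does not name its database) and only the columns involved. The outer table is referenced by its name instead of an alias to keep the statements portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (sale_no INTEGER, payeeID INTEGER, "
    "ComType TEXT, close_units INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (7586, 1001, "Listing", 1),
        (7586, 2001, "Selling", 1),
        (7586, 3254, None, 0),
        (7586, 4684, "Agent to Agent Ref", 0),
    ],
)

with conn:  # commits both updates together, or rolls both back on error
    # Step 1: copy the Selling value onto the Agent to Agent Ref row of the same sale.
    conn.execute("""
        UPDATE my_table
           SET close_units = (SELECT t2.close_units
                                FROM my_table t2
                               WHERE t2.sale_no = my_table.sale_no
                                 AND t2.ComType = 'Selling')
         WHERE ComType = 'Agent to Agent Ref'
    """)
    # Step 2: zero the Selling row, but only for sales that have a referral row.
    conn.execute("""
        UPDATE my_table
           SET close_units = 0
         WHERE ComType = 'Selling'
           AND EXISTS (SELECT 1
                         FROM my_table t2
                        WHERE t2.sale_no = my_table.sale_no
                          AND t2.ComType = 'Agent to Agent Ref')
    """)

rows = conn.execute(
    "SELECT payeeID, close_units FROM my_table ORDER BY payeeID"
).fetchall()
print(rows)
```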