For this problem I would be happy with a solution either in R (ideally with dplyr but other methods would also be OK) or pure SQL.
I have data consisting of individuals (ID) and email addresses, plus a binary indicator of whether the email address is the individual's primary email address (1) or not (0).
all IDs have one and only one primary email address
IDs can have several non-primary email addresses (or none)
IDs can have the same email address as both primary and non-primary
For example:
ID Email Primary
1 1 A 1
2 1 A 0
3 1 B 0
4 2 A 1
5 2 A 0
6 3 C 1
7 4 D 1
8 4 C 0
9 5 E 1
10 5 F 0
(The actual dataset has around half a million rows)
I wish to identify IDs where an email address is non-primary, but is primary for a different ID. That is, I want to select rows where:
Primary is 0
There exists another row where that email address is primary, but for a different ID
Thus, in the data above, I want to select row 2 (because the email address is non-primary, but primary in row 4 for a different ID), row 5 (because it is non-primary, but primary in row 1 for a different ID), and row 8 (because it is non-primary, but primary in row 6 for a different ID).
For R users, here is the toy dataframe above:
structure(list(ID = c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5), Email = c("A", "A", "B", "A", "A", "C", "D", "C", "E", "F"), Primary = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0)), class = "data.frame", row.names = c(NA, -10L))
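You can select rows where:
Primary = 0
The number of distinct IDs for that Email is greater than 1
There is at least one Primary = 1 for that Email
Note that these conditions do not check that the primary row belongs to a different ID than the non-primary row. If one ID holds an email as both primary and non-primary while another ID holds it only as non-primary, the first ID's non-primary row is selected as well. This case does not occur in the toy data.
Using dplyr, you can do this as: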
library(dplyr)
df %>%
  group_by(Email) %>%
  filter(Primary == 0, n_distinct(ID) > 1, any(Primary == 1))
# ID Email Primary
# <dbl> <chr> <dbl>
#1 1 A 0
#2 2 A 0
#3 4 C 0
Since your data is large, a data.table solution would be helpful:
library(data.table)
setDT(df)[, .SD[Primary == 0 & uniqueN(ID) > 1 & any(Primary == 1)], Email]
In SQL, you can use exists for this:
select t.*
from mytable t
where t.primary = 0
  and exists (
    select 1
    from mytable t1
    where t1.email = t.email
      and t1.id <> t.id
      and t1.primary = 1
  )
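As a quick check, the exists query can be run against the toy data from the question. This is only a sketch using Python's built-in sqlite3 module, not necessarily the database you are on. The table name mytable comes from the answer; quoting "primary" as an identifier is my assumption, needed because it is an SQL keyword.

```python
import sqlite3

# Toy data from the question: (id, email, primary flag).
rows = [
    (1, "A", 1), (1, "A", 0), (1, "B", 0),
    (2, "A", 1), (2, "A", 0),
    (3, "C", 1),
    (4, "D", 1), (4, "C", 0),
    (5, "E", 1), (5, "F", 0),
]

conn = sqlite3.connect(":memory:")
# "primary" is an SQL keyword, so it is quoted as an identifier (assumption).
conn.execute('CREATE TABLE mytable (id INTEGER, email TEXT, "primary" INTEGER)')
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

query = """
SELECT t.id, t.email, t."primary"
FROM mytable t
WHERE t."primary" = 0
  AND EXISTS (
        SELECT 1
        FROM mytable t1
        WHERE t1.email = t.email
          AND t1.id <> t.id          -- the primary row must belong to another ID
          AND t1."primary" = 1
      )
ORDER BY t.id
"""
selected = conn.execute(query).fetchall()
print(selected)
```

The three rows returned correspond to rows 2, 5 and 8 of the example. Because the subquery compares t1.id <> t.id per row, it only matches a primary address held by a different ID.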
I am trying to process data within the same table.
Input:
Table
id sort value
1 1 1
2 1 8
3 2 0
4 1 2
What I want to achieve is, for each row, to obtain the first value encountered (in id order) among all rows that share its sort, with the result ordered by id.
Output
Table
id sort value new
1 1 1 1
2 1 8 1
3 2 0 0
4 1 2 1
I tried to self-join the table, but I constantly get "relation not found". I also tried a case statement, but I don't see how I can refer to the same table, and I get the same "relation not found" error.
The beauty of SQL is that many requirements (yours included) can be described in words in much the same way they are finally coded:
with t(id, sort, value) as (values
    (1, 1, 1),
    (2, 1, 8),
    (3, 2, 0),
    (4, 1, 2)
)
select t.*
     , first_value(value) over (partition by sort order by id) as "new"
from t
order by id
id sort value new
1 1 1 1
2 1 8 1
3 2 0 0
4 1 2 1
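If you want to try the window function locally, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table name t mirrors the CTE above, and the rest is scaffolding of my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, sort INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 8), (3, 2, 0), (4, 1, 2)],
)

# For each row, take the first value (by id) among rows with the same sort.
query = """
SELECT id, sort, value,
       first_value(value) OVER (PARTITION BY sort ORDER BY id) AS "new"
FROM t
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each partition is one sort value, and ORDER BY id inside the OVER clause decides which row counts as "first".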
I have a messaging system with a database table like the one below. UID is the user's ID, and SID is the ID of the store that is sending the message.
UID SID Content
1 10 "blah"
1 11 ...
2 10 ...
3 12 ...
3 12 ...
3 10 ...
I want to group users with the number of messages they have received per store. So the output will be
UID NumUniqueSIDs
1 1 (corresponds to uid = 1, sid = 10)
1 1 (corresponds to uid = 1, sid = 11)
2 1 (corresponds to UID = 2, sid = 10)
3 2 (corresponds to UID = 3, sid = 12)
3 1 (corresponds to UID = 3, sid = 10)
I have been unable to come up with a query that accomplishes this. Does anyone know how this can be done?
The following query produces the correct results:
SELECT uid, count(sid) as NumUniqueSIDs from Messages group by uid, sid
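To see which store each count belongs to, it can help to also select sid. A small sketch with Python's built-in sqlite3 module on the sample rows from the question; the column alias NumMessages and the ORDER BY are my additions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Messages (uid INTEGER, sid INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?)",
    [(1, 10, "blah"), (1, 11, "..."), (2, 10, "..."),
     (3, 12, "..."), (3, 12, "..."), (3, 10, "...")],
)

# One output row per (uid, sid) pair, with the number of messages in that pair.
query = """
SELECT uid, sid, COUNT(sid) AS NumMessages
FROM Messages
GROUP BY uid, sid
ORDER BY uid, sid
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Grouping by both columns is what produces one row per user and store; the count is then the number of messages within each pair.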
I have an input:
id
1
2
3
4
5
6
7
8
9
10
I want to get the odd and even values in two separate columns, like this:
id col
1 2
3 4
5 6
7 8
9 10
Here id and col are separate columns: id contains the odd numbers and col contains the even numbers from the input.
SELECT MIN(id) as id, MAX(id) as col
FROM YourTable
GROUP BY FLOOR((id+1)/2)
For IDs 1 and 2, (id+1)/2 are 2/2 = 1 and 3/2 = 1.5, respectively, and FLOOR then returns 1 for both of them. Similarly, for 3 and 4, this is 2, and so on. So it groups all the input rows into pairs based on this formula. Then it uses MIN and MAX within each group to get the lower and higher IDs of the pairs.
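The pairing can be checked with a short sketch using Python's built-in sqlite3 module. One assumption to flag: FLOOR is not available in every SQLite build, so the sketch relies on SQLite's integer division, which gives the same result as FLOOR for positive integer ids. The table name YourTable comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?)", [(i,) for i in range(1, 11)])

# (id + 1) / 2 is integer division here, so ids 1 and 2 both map to group 1,
# ids 3 and 4 to group 2, and so on.
query = """
SELECT MIN(id) AS id, MAX(id) AS col
FROM YourTable
GROUP BY (id + 1) / 2
ORDER BY 1
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Note that this approach assumes the ids are consecutive; if an id is missing, MIN and MAX of that group coincide and the row shows the same value in both columns.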
Join the table to itself:
select *
from yourTable tA
left join yourTable tB on tA.id = (tB.id - 1)
where tA.id % 2 <> 0
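If you use SQL, you can try:
SELECT CASE WHEN column % 2 = 1
            THEN column
            ELSE null
       END AS odds,
       CASE WHEN column % 2 = 0
            THEN column
            ELSE null
       END AS even
FROM yourtable
but this is not exactly what you asked for: each row carries its value in one column and null in the other, so the odd and even values are not paired on the same row.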
To show odd:
Select * from MEN where (RowID % 2) = 1
To show even:
Select * from MEN where (RowID % 2) = 0
Now, just join those two result sets and that's it.
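One way the join of the two result sets might look is sketched below with Python's built-in sqlite3 module. The table nums, its id column, and the join condition e.id = o.id + 1 are my assumptions for illustration; they are not taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (id INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, 11)])

# Odd rows on the left, even rows on the right, paired where even = odd + 1.
# LEFT JOIN keeps an odd row even if its even partner is missing.
query = """
SELECT o.id AS id, e.id AS col
FROM (SELECT id FROM nums WHERE id % 2 = 1) AS o
LEFT JOIN (SELECT id FROM nums WHERE id % 2 = 0) AS e
       ON e.id = o.id + 1
ORDER BY o.id
"""
joined = conn.execute(query).fetchall()
print(joined)
```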
I have this database:
Id TsOffersId OffersId
----------- ----------- -----------
0 0 0
1 0 9
2 0 16
3 1 0
4 1 9
5 1 16
6 1 20
7 2 0
8 2 9
I get from the input some values for the "OffersId", let's say I get 0, 9, and 16.
In this case I need to match only the rows with a TsOffersId value of 0, because:
TsOffersId = 1 is in a row with an OffersId value different than 0, 9, and 16
TsOffersId = 2 isn't in any row with an OffersId = 16
Any elegant solution?
You can do this with aggregation and a having clause. Here is one way:
select TsOffersId
from databasetable t
group by TsOffersId
having count(distinct OffersId) = 3 and
count(distinct case when OffersId in (0, 9, 16) then OffersId end) = 3;
The first condition checks that there are three distinct values (so there are no extras). The second checks that there are three distinct values when you only look at specific values (so all values are there).
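Here is a small sketch that runs the query on the sample rows from the question, using Python's built-in sqlite3 module. The table name databasetable is the placeholder from the answer; everything else is scaffolding of my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE databasetable (Id INTEGER, TsOffersId INTEGER, OffersId INTEGER)"
)
conn.executemany(
    "INSERT INTO databasetable VALUES (?, ?, ?)",
    [(0, 0, 0), (1, 0, 9), (2, 0, 16),
     (3, 1, 0), (4, 1, 9), (5, 1, 16), (6, 1, 20),
     (7, 2, 0), (8, 2, 9)],
)

# First condition: exactly three distinct offers (no extras).
# Second condition: all three wanted offers are present.
query = """
SELECT TsOffersId
FROM databasetable
GROUP BY TsOffersId
HAVING COUNT(DISTINCT OffersId) = 3
   AND COUNT(DISTINCT CASE WHEN OffersId IN (0, 9, 16) THEN OffersId END) = 3
"""
matches = conn.execute(query).fetchall()
print(matches)
```

TsOffersId = 1 fails the first condition (four distinct offers) and TsOffersId = 2 fails both (only two offers), so only TsOffersId = 0 remains. For a different input list, the literal 3 would need to match the number of values passed in.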
I have a table with 3 columns as below
id a b
=================
1 1 2
2 1 3
3 1 4
4 2 4
5 2 5
6 3 4
7 3 5
I want to return rows in which neither the a value nor the b value repeats one already shown.
I have tried GROUP BY (a, b), but the result is not what I want.
I want to group by a and show, for each group, the first row whose b value has not already been used.
In my example, a is grouped into {1, 2, 3},
and b should show {2, 4, 5}, not {2, 4, 4}, because 4 would be duplicated.
id a b
=================
1 1 2
4 2 4
7 3 5
How can I do this?
Sorry, I’m not good at English.
Thx for help.
This code goes from your example data to your example results. Seems strange though and I doubt this is what you're looking for. If you give more detail, then you can get a better answer.
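CREATE TABLE Example
(
    id INT NOT NULL,
    a INT NOT NULL,
    b INT NOT NULL
)
GO
-- Sample rows (note: these differ from the table in the question)
INSERT Example
VALUES
    (1, 1, 2)
  , (2, 1, 3)
  , (3, 1, 4)
  , (4, 2, 3)
  , (5, 2, 4)
  , (6, 3, 4)
-- One row per value of a, with the smallest id and smallest b in each group
SELECT
    MIN(id) AS id
  , a
  , MIN(b) AS b
FROM
    Example
GROUP BY
    a
DROP TABLE Example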