How can I select unique rows in a database over two columns? - sql

I have found similar solutions online but none that I've been able to apply to my specific problem.
I'm trying to "unique-ify" data from one table to another. In my original table, data looks like the following:
USERIDP1  USERIDP2  QUALIFIER  DATA
1         2         TRUE       AB
1         2                    CD
1         3                    EF
1         3                    GH
The user IDs are composed of two parts, USERIDP1 and USERIDP2 concatenated. I want to transfer all the rows that correspond to a user who has QUALIFIER=TRUE in ANY row they own, but ignore users who do not have a TRUE QUALIFIER in any of their rows.
To clarify, all of user 12's rows (USERIDP1=1, USERIDP2=2) would be transferred, but none of user 13's (USERIDP1=1, USERIDP2=3). The output would then look like:
USERIDP1  USERIDP2  QUALIFIER  DATA
1         2         TRUE       AB
1         2                    CD
So basically, I need to find the distinct user IDs (the combination of the two ID fields) that have at least one row with QUALIFIER=TRUE, and copy all of those users' rows and only those.

Although this nested query will be very slow for large tables, this could do it.
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE_NAME AS Y
              WHERE Y.USERIDP1 = X.USERIDP1
                AND Y.USERIDP2 = X.USERIDP2
                AND Y.QUALIFIER = TRUE)
It could be written as an inner join with itself too:
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN YOUR_TABLE_NAME AS Y
        ON Y.USERIDP1 = X.USERIDP1
       AND Y.USERIDP2 = X.USERIDP2
       AND Y.QUALIFIER = TRUE
For a large table, create a new auxiliary table containing only USERIDP1 and USERIDP2 columns for rows that have QUALIFIER = TRUE and then join this table with your original table using inner join similar to the second option above. Remember to create appropriate indexes.
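A rough sketch of that auxiliary-table approach, reusing the YOUR_TABLE_NAME placeholder from above (the QUALIFIED_USERS table and index names are mine; adjust the column types and the QUALIFIER comparison to your schema and dialect):
-- Auxiliary table holding only the qualifying user IDs
CREATE TABLE QUALIFIED_USERS (USERIDP1 INT NOT NULL, USERIDP2 INT NOT NULL);

INSERT INTO QUALIFIED_USERS (USERIDP1, USERIDP2)
SELECT DISTINCT USERIDP1, USERIDP2
FROM YOUR_TABLE_NAME
WHERE QUALIFIER = TRUE;   -- or = 1 / = 'TRUE', depending on how the flag is stored

-- Index to support the join back to the big table
CREATE INDEX IX_QUALIFIED_USERS ON QUALIFIED_USERS (USERIDP1, USERIDP2);

SELECT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN QUALIFIED_USERS AS Q
        ON Q.USERIDP1 = X.USERIDP1
       AND Q.USERIDP2 = X.USERIDP2;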

This should do the trick. If the id fields are stored as integers (as in the sample data below), you need to convert/cast them to varchar before concatenating them for the PARTITION BY; otherwise + performs integer addition.
SELECT 1 as id1,2 as id2,'TRUE' as qualifier,'AB' as data into #sampled
UNION ALL SELECT 1,2,NULL,'CD'
UNION ALL SELECT 1,3,NULL,'EF'
UNION ALL SELECT 1,3,NULL,'GH'
;WITH data as
(
SELECT
id1
,id2
,qualifier
,data
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY CAST(id1 AS varchar(20)) + '|' + CAST(id2 AS varchar(20))) as num_qualifier
from #sampled
)
SELECT
id1
,id2
,qualifier
,data
from data
where num_qualifier > 0
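A simpler variant of the same idea (my own sketch, against the same #sampled temp table): PARTITION BY accepts multiple columns, so the id concatenation and casting can be dropped entirely:
;WITH flagged AS
(
SELECT
id1
,id2
,qualifier
,data
-- number of TRUE rows for this (id1, id2) user
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY id1, id2) AS num_qualifier
FROM #sampled
)
SELECT id1, id2, qualifier, data
FROM flagged
WHERE num_qualifier > 0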

SELECT yourTable.*
FROM yourTable
INNER JOIN (SELECT DISTINCT UserIDP1, UserIDP2
            FROM yourTable
            WHERE Qualifier = TRUE) B
        ON yourTable.UserIDP1 = B.UserIDP1
       AND yourTable.UserIDP2 = B.UserIDP2
The DISTINCT in the derived table keeps users with more than one TRUE row from duplicating the joined output, and yourTable.* keeps B's columns out of the result.

How about a subquery as a where clause?
SELECT *
FROM theTable t1
WHERE CAST(t1.useridp1 AS VARCHAR) + CAST(t1.useridp2 AS VARCHAR) IN
(SELECT CAST(t2.useridp1 AS VARCHAR) + CAST(t2.useridp2 AS VARCHAR)
 FROM theTable t2
 WHERE t2.qualifier = TRUE
);
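If your database supports row constructors in IN (MySQL, PostgreSQL and Oracle do; SQL Server does not), a sketch that avoids the casts, and also the risk of ids like (1, 23) and (12, 3) concatenating to the same string:
SELECT *
FROM theTable t1
WHERE (t1.useridp1, t1.useridp2) IN
      (SELECT t2.useridp1, t2.useridp2
       FROM theTable t2
       WHERE t2.qualifier = TRUE);  -- adjust this comparison for your dialect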

This is a solution in MySQL, but I believe it should transfer to SQL Server pretty easily (a translated sketch follows the output below). Use a subquery to pick out the (id1, id2) combinations with at least one TRUE qualifier row; then join that to the original table on (id1, id2).
mysql> SELECT u1.*
FROM users u1
JOIN (SELECT id1,id2
FROM users
WHERE qualifier
GROUP BY id1, id2) u2
USING(id1, id2);
+------+------+-----------+------+
| id1  | id2  | qualifier | data |
+------+------+-----------+------+
|    1 |    2 |         1 | aa   |
|    1 |    2 |         0 | bb   |
+------+------+-----------+------+
2 rows in set (0.00 sec)
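A sketch of the same query in SQL Server syntax (USING is not supported there, and the bare WHERE qualifier needs an explicit comparison; adjust it to however qualifier is stored, e.g. = 'TRUE' or = 1):
SELECT u1.*
FROM users u1
INNER JOIN (SELECT id1, id2
            FROM users
            WHERE qualifier = 'TRUE'
            GROUP BY id1, id2) u2
        ON u2.id1 = u1.id1
       AND u2.id2 = u1.id2;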

Related

SQL Select from 1 table rows with 2 specific column values that are not equal

I have a table
id  number  name  update_date
1   123     asd   08.05.18
2   412     ddd   08.05.18
3   123     dsa   14.05.18
4   125     dsa   05.05.18
The whole table consists of rows like that. I need to select rows 1 and 3 because I need different update_dates for the same number. How do I do that? I need to see the changes for a specific number between the two update dates 08.05.18 and 14.05.18; I have more update dates in my table.
I tried:
SELECT *
FROM legal_entity_history a
JOIN legal_entity_history b ON a.BIN = b.BIN
WHERE ( a.update_date <> b.update_date AND
a.update_date = "08.05.18" AND
b.update_date = "14.05.18" )
A relatively simple method is:
select leh.*
from legal_entity_history leh
where exists (select 1
from legal_entity_history leh2
where leh2.number = leh.number and leh2.update_date <> leh.update_date
);
For performance, you want an index on legal_entity_history(number, update_date).
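A sketch of that index, assuming the table and column names used above (quote number if it is a reserved word in your database):
CREATE INDEX idx_leh_number_date ON legal_entity_history (number, update_date);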
TRY THIS: Assuming that the same number does not appear more than once under the same update_date, you can achieve this using GROUP BY with HAVING as below:
SELECT t.*
FROM test t
INNER JOIN (SELECT number
FROM test
GROUP BY number
HAVING COUNT(DISTINCT update_date) > 1) t1 ON t1.number = t.number
OUTPUT:
id  number  name  update_date
1   123     asd   08.05.18
3   123     dsa   14.05.18
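Since the question asks specifically about the changes between 08.05.18 and 14.05.18 while the table holds more dates, a sketch restricting the first approach to just those two dates (assuming update_date is stored as text in that format):
SELECT t.*
FROM legal_entity_history t
WHERE t.update_date IN ('08.05.18', '14.05.18')
  AND EXISTS (SELECT 1
              FROM legal_entity_history t2
              WHERE t2.number = t.number
                AND t2.update_date IN ('08.05.18', '14.05.18')
                AND t2.update_date <> t.update_date);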

SQL select entries from table where attribute equals parameter, else select all entries

Is it possible in SQL (Oracle) to select all entries from a table where an attribute equals a parameter, and if no such entry exists, to select all the other entries instead?
like in this example:
COD | Name
----|---------
1   | Monday
2   | Thursday
3   | Saturday
parameter=3
when COD equals the parameter (COD=3), return the entry for that COD (including COD and Name)
else
return all the other entries different from the parameter (including COD and Name), like 1 Monday and 2 Thursday
Is this possible with SQL (Oracle), or do I need something like PL/SQL?
I'd use a correlated query and a non-correlated query:
SELECT COD, NAME
FROM TABLE a
WHERE EXISTS (SELECT 1 FROM TABLE b WHERE b.COD = a.COD AND b.COD = 3)
OR NOT EXISTS (SELECT 1 FROM TABLE c WHERE c.COD = 3)
I'm not sure if I'm following your logic entirely, however.
And, actually, in cases where it's all from one table it can be simplified to just:
SELECT COD, NAME
FROM TABLE a
WHERE a.COD = 3
OR NOT EXISTS (SELECT 1 FROM TABLE c WHERE c.COD = 3)
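If the table is large, an alternative sketch that scans it only once using an analytic function (my own variant, not part of the answer above; TABLE again stands for the real table name and 3 for the parameter):
SELECT COD, NAME
FROM (SELECT t.COD,
             t.NAME,
             -- 1 if any row in the table matches the parameter, else 0
             MAX(CASE WHEN t.COD = 3 THEN 1 ELSE 0 END) OVER () AS has_match
      FROM TABLE t)
WHERE (has_match = 1 AND COD = 3)
   OR has_match = 0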
Another option, written as pseudocode (plain SQL has no IF, so this would need to be adapted into a PL/SQL block):
IF EXISTS (SELECT 1 FROM TABLE WHERE COD = 3)
THEN
    SELECT COD, NAME FROM TABLE WHERE COD = 3;
ELSE
    SELECT COD, NAME FROM TABLE;
END IF;

SQL - Computing overlap between Interests

I have a schema (millions of records with proper indexes in place) that looks like this:
groups | interests
------ | ---------
user_id | user_id
group_id | interest_id
A user can like 0..many interests and belong to 0..many groups.
Problem: Given a group ID, I want to get all the interests of all the users that do not belong to that group and that share at least one interest with anyone who belongs to the provided group.
Since the above might be confusing, here's a straightforward example (SQLFiddle):
| 1 | 2 | 3 | 4 | 5 |  (User IDs)
|---|---|---|---|---|
| A |   | A |   |   |
| B | B | B |   | B |
|   | C |   |   |   |
|   |   | D | D |   |
In the above example users are labeled with numbers while interests have characters.
If we assume that users 1 and 2 belong to group -1, then users 3 and 5 would be interesting:
user_id  interest_id
-------  -----------
3        A
3        B
3        D
5        B
I already wrote a dumb and very inefficient query that correctly returns the above:
SELECT * FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "interests" WHERE "interest_id" IN (
SELECT "interest_id" FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
) AND "user_id" NOT IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
);
But all my attempts to translate that into a proper joined query have been fruitless: either the query returns far more rows than it should, or it takes 10x as long as the subquery version, like:
SELECT "iii"."user_id" FROM "interests" AS "iii"
WHERE EXISTS
(
SELECT "ii"."user_id", "ii"."interest_id" FROM "groups" AS "gg"
INNER JOIN "interests" AS "ii" ON "gg"."user_id" = "ii"."user_id"
WHERE EXISTS
(
SELECT "i"."interest_id" FROM "groups" AS "g"
INNER JOIN "interests" AS "i" ON "g"."user_id" = "i"."user_id"
WHERE "group_id" = -1 AND "i"."interest_id" = "ii"."interest_id"
) AND "group_id" != -1 AND "ii"."user_id" = "iii"."user_id"
);
I've been struggling trying to optimize this query for the past two nights...
Any help or insight that gets me in the right direction would be greatly appreciated. :)
PS: Ideally, one query that returns an aggregated count of common interests would be even nicer:
user_id  totalInterests  commonInterests
-------  --------------  ---------------
3        3               1/2 (either is fine, but 2 is better)
5        1               1
However, I'm not sure how much slower it would be compared to doing it in code.
Using the following to set up test tables
--drop table Interests ----------------------------
CREATE TABLE Interests
(
InterestId char(1) not null
,UserId int not null
)
INSERT Interests values
('A',1)
,('A',3)
,('B',1)
,('B',2)
,('B',3)
,('B',5)
,('C',2)
,('D',3)
,('D',4)
-- drop table Groups ---------------------
CREATE TABLE Groups
(
GroupId int not null
,UserId int not null
)
INSERT Groups values
(-1, 1)
,(-1, 2)
SELECT * from Groups
The following query would appear to do what you want:
DECLARE @GroupId int
SET @GroupId = -1
;WITH cteGroupInterests (InterestId)
as (-- List of the interests referenced by the target group
select distinct InterestId
from Groups gr
inner join Interests nt
on nt.UserId = gr.UserId
where gr.GroupId = @GroupId)
-- Aggregate interests for each user
SELECT
UserId
,count(OwnInterestId) OwnInterests
,count(SharedInterestId) SharedInterests
from (-- Subquery lists all interests for each user
select
nt.UserId
,nt.InterestId OwnInterestId
,cte.InterestId SharedInterestId
from Interests nt
left outer join cteGroupInterests cte
on cte.InterestId = nt.InterestId
where not exists ( -- Correlated subquery: is "this" user in the target group?
select 1
from Groups gr
where gr.GroupId = @GroupId
and gr.UserId = nt.UserId)) xx
group by UserId
having count(SharedInterestId) > 0
It appears to work, but I'd want to do more elaborate tests, and I've no idea how well it'd work against millions of rows. Key points are:
The CTE creates a named result set referenced by the later query; building an actual temp table might be a performance boost (see the sketch after this list)
Correlated subqueries can be tricky, but indexes and not exists should make this pretty quick
I was lazy and left out all the underscores, sorry
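A minimal sketch of that temp-table variant (same logic, SQL Server syntax, reusing the @GroupId variable declared above; the temp table and index names are mine):
-- Materialize the target group's interests once, then index them
SELECT DISTINCT nt.InterestId
INTO #GroupInterests
FROM Groups gr
INNER JOIN Interests nt ON nt.UserId = gr.UserId
WHERE gr.GroupId = @GroupId;

CREATE INDEX IX_GroupInterests ON #GroupInterests (InterestId);

-- The main query stays the same, with #GroupInterests in place of cteGroupInterests:
-- left outer join #GroupInterests cte on cte.InterestId = nt.InterestId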
This is a bit confounding. I think the best approach is exists and not exists:
select i.*
from interest i
where not exists (select 1
from groups g
where i.user_id = g.user_id and
g.group_id = $group_id
) and
exists (select 1
from groups g join
interest i2
on g.user_id = i2.user_id
where g.group_id = $group_id and
g.user_id <> i.user_id and
i.interest_id = i2.interest_id
);
The first subquery is saying that the user is not in the group. The second is saying that the interest is shared with someone who is in the group.
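For the aggregated counts asked for in the PS, a sketch that keeps the not exists filter from this answer and borrows the left-join counting idea from the earlier answer, using the question's table names ($group_id again stands for the provided group ID; untested against large data):
WITH group_interests AS (
    -- every interest owned by at least one member of the target group
    SELECT DISTINCT i.interest_id
    FROM groups g
    JOIN interests i ON i.user_id = g.user_id
    WHERE g.group_id = $group_id
)
SELECT i.user_id,
       COUNT(*) AS totalInterests,
       COUNT(gi.interest_id) AS commonInterests
FROM interests i
LEFT JOIN group_interests gi ON gi.interest_id = i.interest_id
WHERE NOT EXISTS (SELECT 1
                  FROM groups g
                  WHERE g.user_id = i.user_id
                    AND g.group_id = $group_id)
GROUP BY i.user_id
HAVING COUNT(gi.interest_id) > 0;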

Assign a random order to each group

I want to expand each row in TableA into 4 rows. The result holds all the columns from TableA plus two additional columns: SetID, ranging from 0 to 3 and unique within each TableA row's group, and Random, a random permutation of SetID within the same group.
I use SQLite and would prefer a pure SQL solution.
Table A:
Description
-----------
A
B
Desired output:
Description | SetID | Random
------------|-------|-------
A           | 0     | 2
A           | 1     | 0
A           | 2     | 3
A           | 3     | 1
B           | 0     | 3
B           | 1     | 2
B           | 2     | 0
B           | 3     | 1
My attempt so far creates 4 rows for each row in TableA but doesn't produce the permutation correctly: wrong contains a random number from 0 to 3, but I need exactly one each of 0, 1, 2 and 3 for each unique value in Description, in random order.
SELECT
Description,
SetID,
abs(random()) % 4 AS wrong
FROM
TableA
LEFT JOIN
TableB
ON
1 = 1
Table B:
SetID
-----
0
1
2
3
Use a cross join
SELECT Description,
SetID,
abs(random()) % 4 AS wrong
FROM TableA
CROSS JOIN TableB
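If your SQLite is 3.25 or newer (window function support), a pure-SQL sketch that produces a true per-Description permutation instead of independent random numbers, reusing the TableA/TableB cross join above:
SELECT Description,
       SetID,
       -- number the four SetID rows of each Description in a random order, zero-based
       ROW_NUMBER() OVER (PARTITION BY Description ORDER BY random()) - 1 AS Random
FROM TableA
CROSS JOIN TableB;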
Consider a solution in your specialty, R. As you know, R maintains excellent database packages, one of which is RSQLite. Additionally, R can run commands via the connection without the need to import very large datasets.
Your solution is essentially a random sampling without replacement. Simply have R run the sampling and concatenate list items into an SQL string.
The code below creates a table in the SQLite database: R sends the CREATE TABLE command to the SQL engine, with no import or export of data. Should you need a fresh permutation for every four rows, run an iterative loop in a defined function that outputs the SQL string. For append queries, change CREATE TABLE ... AS to an INSERT INTO ... SELECT statement.
library(RSQLite)
sqlite <- dbDriver("SQLite")
conn <- dbConnect(sqlite,"C:\\Path\\To\\Database\\File\\newexample.db")
# SAMPLE WITHOUT REPLACEMENT
randomnums <- as.list(sample(0:3, 4, replace=F))
# SQL CONCATENATION
sql <- sprintf("CREATE TABLE PermutationsTable AS
SELECT a.Description, b.SetID,
(select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=0
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=1
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=2
union select %d from TableB WHERE TableB.SetID = b.SetID AND TableB.SetID=3)
As RandomNumber
from TableA a, TableB b;",
randomnums[[1]], randomnums[[2]],
randomnums[[3]], randomnums[[4]])
# RUN QUERY
dbSendQuery(conn, sql)
dbDisconnect(conn)
You will notice a nested union subquery; it is used to produce the inline random numbers for each row. Also, to return all possible combinations from both tables, no JOIN clause is needed: simply list the tables in the FROM clause (an implicit cross join).

merge adjacent repeated rows into one

I want to merge adjacent repeated rows into one.
For example, I have a table demo with two columns:
data | order
-----|------
A    | 1
A    | 2
B    | 3
B    | 4
A    | 5
I want the result to be :
A
B
A
How can I achieve this with a single SELECT query in Oracle?
Please try something like this:
select *
from table t1
where not exists(select * from table t2 where t2.order = t1.order - 1 and t1.data = t2.data)
The answer suggested by Dmitry above works in principle; to make it run in Oracle you need some modifications: order is a reserved keyword, so you need to quote it, as follows.
select *
from Table1 t1
where not exists (
    select *
    from Table1 t2
    where t2."order" = t1."order" - 1
      and t1."data" = t2."data"
)
order by "order"
Working Fiddle at http://sqlfiddle.com/#!4/cc816/3
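If the order values can have gaps (the t1."order" - 1 trick assumes they are consecutive), an alternative sketch using the LAG analytic function with the same quoted identifiers:
select "data"
from (
    select "data",
           "order",
           -- the previous row's "data" when walking the table in "order"
           lag("data") over (order by "order") as prev_data
    from demo
)
where prev_data is null or prev_data <> "data"
order by "order";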
You can group by a column
Take a look at http://docs.oracle.com/javadb/10.6.1.0/ref/rrefsqlj32654.html
Example from the official Oracle site:
SELECT AVG (flying_time), orig_airport
FROM Flights
GROUP BY orig_airport