mysql query performance - sql

Can somebody give me a hint on this one?
I have a table, let's say tblA, where I have id1 and id2 as columns and index(id1,id2).
I want to select the id1s whose id2s belong to several sets. So I would want to say
select id1 from tblA
where id2 in (val1,val2,val3 ...)
union
select id1 from tblA
where id2 in (val4,val2,val3 ...)
union
(...)*
Let's say we have in table A the following:
(1,1)
(1,2)
(1,3)
(1,4)
(1,5)
(2,1)
(2,2)
(2,3)
Now I want all the id1s that have id2 in (3,4).
So what I want to get is id1 = 1.
2 shouldn't appear because although we have a relation (2,3) we don't have (2,4).
Any ideas how to perform this query? I guess the approach above has a performance problem if the (...) grows too much. Thanks.
greets

You should create a temporary table like this:
CREATE TABLE temp (id INT NOT NULL PRIMARY KEY) ENGINE MEMORY;
, fill it with the values you are searching for (3 and 4 in your example):
INSERT
INTO temp
VALUES (3), (4)
and issue this query:
SELECT ad.id1
FROM (
SELECT DISTINCT id1
FROM a
) ad
WHERE NOT EXISTS
(
SELECT NULL
FROM temp
WHERE NOT EXISTS
(
SELECT NULL
FROM a
WHERE a.id1 = ad.id1
AND a.id2 = temp.id
)
)
You should create a composite index on (id1, id2) for this to work efficiently.
For each id1, this probes each value in temp against a at most once, and returns false as soon as it finds the first value in temp that has no matching row for that id1.
Here's the plan for the query:
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 2, 'Using where'
3, 'DEPENDENT SUBQUERY', 'temp', 'ALL', '', '', '', '', 2, 'Using where'
4, 'DEPENDENT SUBQUERY', 'a', 'eq_ref', 'PRIMARY', 'PRIMARY', '8', 'ad.id1,test.temp.id', 1, 'Using index'
2, 'DERIVED', 'a', 'range', '', 'PRIMARY', '4', '', 3, 'Using index for group-by'
, no temporary, no filesort.
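To make the double NOT EXISTS easier to follow, here is a minimal sketch that runs the same query shape against the sample data from the question. It uses Python's sqlite3 module rather than MySQL, and the lookup table is named wanted instead of temp; both are assumptions made only so the example is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (
        id1 INTEGER NOT NULL,
        id2 INTEGER NOT NULL,
        PRIMARY KEY (id1, id2)
    );
    INSERT INTO a VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(2,1),(2,2),(2,3);
    -- 'wanted' plays the role of the temp table in the answer (hypothetical name).
    CREATE TABLE wanted (id INTEGER NOT NULL PRIMARY KEY);
    INSERT INTO wanted VALUES (3),(4);
""")

# Keep an id1 only if there is no wanted value that lacks a matching (id1, id2) row.
division_sql = """
    SELECT ad.id1
    FROM (SELECT DISTINCT id1 FROM a) ad
    WHERE NOT EXISTS (
        SELECT NULL
        FROM wanted w
        WHERE NOT EXISTS (
            SELECT NULL
            FROM a
            WHERE a.id1 = ad.id1
              AND a.id2 = w.id
        )
    )
    ORDER BY ad.id1
"""

matching_ids = [row[0] for row in conn.execute(division_sql)]
print(matching_ids)  # [1]
```

id1 = 2 is dropped because the wanted value 4 has no (2,4) row.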

The union is gonna kill your performance. Use something like this:
select id1 from tblA where id2 in (val1,val2,val3 ...) or id2 in (val4,val2,val3)

Can you combine all the sets into one large set?
If the order is not important, this would seem to be the fastest way.

First, remember that
select id1 from tblA where id2 in (val1, val2, val3) union
select id1 from tblA where id2 in (val4, val5, val6)
should give the same result as
select distinct id1 from tblA where id2 in (val1, val2, val3, val4, val5, val6)
(distinct is needed because union removes duplicate rows), so you can perhaps improve efficiency by formulating a single query rather than using a union.
Secondly (and independent of the above) you should add an index on id2 to tblA. Without it the id2 values are randomly distributed through both the existing index and the table data, so the optimizer will have no option but to perform a linear scan - of the index, if you are lucky.
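As a quick check of the first point, the sketch below compares a UNION of two IN queries with a single merged IN on the sample data from the question. It runs on Python's sqlite3 module, and the two value sets (1,2) and (4,5) are picked only for illustration. Note that UNION removes duplicates, so the single query needs DISTINCT to return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (id1 INTEGER NOT NULL, id2 INTEGER NOT NULL);
    CREATE INDEX idx_id1_id2 ON tblA (id1, id2);
    INSERT INTO tblA VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(2,1),(2,2),(2,3);
""")

# Two sets queried separately and combined with UNION (illustrative values).
union_sql = """
    SELECT id1 FROM tblA WHERE id2 IN (1, 2)
    UNION
    SELECT id1 FROM tblA WHERE id2 IN (4, 5)
"""
# The same sets merged into one IN list; DISTINCT mirrors UNION's de-duplication.
single_sql = "SELECT DISTINCT id1 FROM tblA WHERE id2 IN (1, 2, 4, 5)"
# Without DISTINCT, every matching row comes back.
plain_sql = "SELECT id1 FROM tblA WHERE id2 IN (1, 2, 4, 5)"

union_ids = sorted(row[0] for row in conn.execute(union_sql))
single_ids = sorted(row[0] for row in conn.execute(single_sql))
plain_rows = conn.execute(plain_sql).fetchall()

print(union_ids, single_ids, len(plain_rows))  # [1, 2] [1, 2] 6
```

Both forms return id1 values that match any of the listed id2 values, not all of them.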

But all these queries give back both ids from column id1! I think Robert meant that as a result he just wants "1" from column id1:
id1 | id2
1 | 1
1 | 2
1 | 3
1 | 4 --> id1s that have id2 with 3 and 4
1 | 5
2 | 1
2 | 2
2 | 3
Because id1=2 does not have 3 AND 4 it should not be a result.
Please correct me if I misunderstood...
I tried to write a statement but could not get just id1=1 back. I am also very interested in an efficient solution to this!

You need to create a separate index on column 'id2', because the combined index on (id1,id2) will not be used when looking up by id2 only.
This query does what you described:
SELECT id1 FROM tblA WHERE id2 IN (?,?,?,?)
GROUP BY id1 HAVING COUNT(DISTINCT id2)=4
NOTE: You need to adjust the count in the HAVING clause to the number of distinct values in the IN clause. Here I used four '?' to represent four values, which is why I wrote COUNT(DISTINCT id2)=4. DISTINCT keeps duplicate (id1,id2) rows from inflating the count.
For the scenario you mentioned in the comment, the query looks like this:
SELECT id1 FROM tblA WHERE id2 IN (3,4)
GROUP BY id1 HAVING COUNT(DISTINCT id2)=2
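Because the HAVING count has to track the length of the IN list, it can help to build both from the same list in application code. Below is a minimal sketch using Python's sqlite3 module; the helper name ids_matching_all is hypothetical, and COUNT(DISTINCT id2) is used on the assumption that (id1,id2) pairs might repeat.

```python
import sqlite3

def ids_matching_all(conn, wanted):
    """Return the id1 values that have a row for every id2 in `wanted`."""
    wanted = sorted(set(wanted))              # de-duplicate so the count is right
    placeholders = ",".join("?" * len(wanted))
    sql = (
        f"SELECT id1 FROM tblA WHERE id2 IN ({placeholders}) "
        "GROUP BY id1 HAVING COUNT(DISTINCT id2) = ? ORDER BY id1"
    )
    # The last parameter is the number of distinct wanted values.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (id1 INTEGER NOT NULL, id2 INTEGER NOT NULL);
    CREATE INDEX idx_id2 ON tblA (id2);
    INSERT INTO tblA VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(2,1),(2,2),(2,3);
""")

print(ids_matching_all(conn, [3, 4]))     # [1]
print(ids_matching_all(conn, [1, 2, 3]))  # [1, 2]
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as bound parameters.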

Related

An association x-reference table

I've got a SQL table like below where one value is linked to a second value and vice versa.
ROW ID1 ID2
1 1 2
2 2 1
3 3 4
4 4 3
....
This might be some bad design but this is what I'm stuck with. I need to produce a SQL query in SQL Server to return only the following (doesn't matter which order):
ROW ID1 ID2
1 1 2
3 3 4
....
OR
ROW ID1 ID2
2 2 1
4 4 3
....
I've got a list of ID's (1, 2, 3, 4) which I used to query the table against ID1 field or ID2 field, but it always returns all the rows because those IDs exist in both columns.
I've tried looking at eliminating one row by looking if the one field it exists in the other column, but then I get no results. Obviously.
The one solution that could work is to look at the rownum field and take only the even or odd rows, but this feels hacky. Also, there might be other values in that list that are not part of my IN list, so that could possibly miss some rows?
Is there anything elegant to consider from a T-SQL perspective?
Here's one (quite cumbersome but pretty effective) way to do it.
First, Create and populate sample table (Please save us this step in future questions):
CREATE TABLE Table1 (
[ROW] int,
[ID1] int,
[ID2] int
);
INSERT INTO Table1 ([ROW], [ID1], [ID2]) VALUES
(1, 1, 2),
(2, 2, 1),
(3, 3, 4),
(4, 4, 3),
(5, 1, 4);
Note: The last row is not part of the sample data you've provided, but I assumed you would also like the results to include pairs where only one row holds the connection between Id1 and Id2.
Then, use a couple of common table expressions to get the minimum row number of any pair of Id1 and Id2, regardless of the order of the ids, and then query the original table joined to the second cte:
WITH CTE1 AS
(
SELECT Row,
IIF(Id1 < Id2, Id1, Id2) As Small,
IIF(Id1 < Id2, Id2, Id1) As Big
FROM Table1
), CTE2 AS
(
SELECT Min(Row) As MinRow
FROM CTE1
GROUP BY Small, Big
)
SELECT Row, Id1, Id2
FROM Table1
JOIN CTE2
ON Row = MinRow;
Results:
Row Id1 Id2
1 1 2
3 3 4
5 1 4
You can see a live demo on DB<>Fiddle
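The same idea can be tried outside SQL Server. The sketch below runs it on Python's sqlite3 module with the sample rows above; as assumptions for portability, the ROW column is renamed row_no and IIF is written as a CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- row_no stands in for the ROW column (renamed here as an assumption).
    CREATE TABLE Table1 (row_no INTEGER, id1 INTEGER, id2 INTEGER);
    INSERT INTO Table1 VALUES (1,1,2),(2,2,1),(3,3,4),(4,4,3),(5,1,4);
""")

dedupe_sql = """
    WITH cte1 AS (
        -- Normalise each pair so (1,2) and (2,1) both become (small=1, big=2).
        SELECT row_no,
               CASE WHEN id1 < id2 THEN id1 ELSE id2 END AS small,
               CASE WHEN id1 < id2 THEN id2 ELSE id1 END AS big
        FROM Table1
    ), cte2 AS (
        -- Keep the lowest row number of every normalised pair.
        SELECT MIN(row_no) AS min_row
        FROM cte1
        GROUP BY small, big
    )
    SELECT t.row_no, t.id1, t.id2
    FROM Table1 t
    JOIN cte2 ON t.row_no = cte2.min_row
    ORDER BY t.row_no
"""

unique_pairs = conn.execute(dedupe_sql).fetchall()
print(unique_pairs)  # [(1, 1, 2), (3, 3, 4), (5, 1, 4)]
```

Using MAX(row_no) instead of MIN would return the other variant from the question (rows 2 and 4).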

Combine SQL Select Lines into "Groups"

I have the following problem,
I have an audit system which saves the IDs of at most three companies per audit on one database row, but an audit can have up to 15 companies.
Criteria to determine if group audit
IF
A record has data in Audited_Company2 AND/OR Audited_Company3
THEN
Find additional records where:
Auditor_ID AND Audit_Type AND Audit_Date all match the record found above AND
IF
Records matching criteria are found:
1.Take Audited_Company1-3 from all other matching records and insert them into Audited_Company4-15
Do not show any audits that have had Audited_Company fields merged into another record in the view
However, I have no idea how to merge two or more SQL rows in a select result as they asked.
You can use UNPIVOT to normalize your data and then PIVOT to denormalize it again. Below is a sample query that combines the two. It uses ROW_NUMBER() with appropriate partitioning and ordering conditions to assign sequence numbers to the data for use in the final pivot.
DECLARE @Unnormalized TABLE (Id INT, Value1 VARCHAR(100), Value2 VARCHAR(100), Value3 VARCHAR(100))
INSERT @Unnormalized
VALUES
(1, 'A', 'B', 'C'),
(1, 'Z', 'Y', NULL),
(1, 'X', NULL, NULL),
(2, 'Red', 'Green', NULL),
(2, 'Blue', NULL, NULL)
SELECT P.Id, Value1 = P.[1], Value2 = P.[2], P.[3], P.[4], P.[5], P.[6], P.[7], P.[8], P.[9]
FROM (
SELECT U.Id, U.Value, Sequence = ROW_NUMBER() OVER(PARTITION BY U.ID ORDER BY U.Value)
FROM (SELECT * FROM @Unnormalized) D
UNPIVOT (Value FOR Col IN (Value1, Value2, Value3)) U
) A
PIVOT (MAX(Value) FOR Sequence IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) P
Results:
Id Value1 Value2 3 4 5 6 7 8 9
1 A B C X Y Z NULL NULL NULL
2 Blue Green Red NULL NULL NULL NULL NULL NULL
You should be able to expand on the above to suit your specific needs.
For more information, take a look at the documentation at FROM - Using PIVOT and UNPIVOT.
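If it helps to see the three steps without the PIVOT syntax, here is a rough plain-Python sketch of the same normalize, renumber and spread sequence on the sample rows above. The function name merge_groups and the width parameter are hypothetical; like the PIVOT column list, anything beyond width values per Id is silently dropped.

```python
from collections import defaultdict

rows = [
    (1, "A", "B", "C"),
    (1, "Z", "Y", None),
    (1, "X", None, None),
    (2, "Red", "Green", None),
    (2, "Blue", None, None),
]

def merge_groups(rows, width=9):
    """Collapse all rows that share an Id into one row of `width` value slots."""
    values_by_id = defaultdict(list)
    # "Unpivot": one list entry per non-NULL value, keyed by Id.
    for row_id, *values in rows:
        values_by_id[row_id].extend(v for v in values if v is not None)

    merged = []
    for row_id in sorted(values_by_id):
        # "Row number": order the values, then "pivot" them into fixed slots.
        ordered = sorted(values_by_id[row_id])[:width]
        padding = [None] * (width - len(ordered))
        merged.append((row_id, *ordered, *padding))
    return merged

merged = merge_groups(rows)
for line in merged:
    print(line)
```

For the audit case, the grouping key would be the combination of Auditor_ID, Audit_Type and Audit_Date, and width would be 15.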

I need all activities in SQL assigned to one ID, but currently each activity has three different IDs [duplicate]

Let's say we have a views table where each activity has three user IDs.
ID1 - user cookie on the server side
ID2 - user cookie on the browser side
ID3 - logged in user
I need to assign activity to one id.
Example:
ID1 ID2 ID3
-----------
1 A
1 A
1 B I
2 B I
3 C I
During the third activity the user registered an account, so now I know that all the activities concern one user. ID1 and ID2 are cookie IDs, which are uncertain. ID3 is present only when the user is registered and logged in. Users don't have to be registered or logged in; they can use the site without an account, but I need information about all of a user's activities.
How can I count all views and assign them to one ID?
In the above example we can see that 5 views were generated by one user.
Is it possible to JOIN another table having the same three ID?
I'm not sure I understood the question correctly but could it look like that :
SELECT ID3 AS "User ID", COUNT(*) AS "Number of views" FROM VIEWS GROUP BY ID3
I tried to play with it a bit but could not achieve what you want (limited time).
Two solutions I think you can work with:
window functions SELECT gen_random_uuid(), count(id1) over (partition by ...)
union
you can hash the columns together to form a distinct key for any 3 combinations
example in the link.
You can assign a sequential id using dense_rank():
select dense_rank() over (order by id1, id2, id3) as my_id
from t;
And you can join to another table, but be careful because of NULL values. The first column always appears to be filled, so:
select . . .
from t1 join
t2
on t1.id1 = t2.id1 and
t1.id2 is not distinct from t2.id2 and
t1.id3 is not distinct from t2.id3
The only caveat is that is not distinct from does not work well with indexes, so this might be slower than you expect on a larger table. If that is an issue, ask a new question. This has gotten pretty far from your original question.
EDIT:
After reflecting on this problem, you have a graph problem with three possible connectors. You should assign a unique id to each row in your original data. Then you can use a recursive CTE to solve this.
Here is how:
with recursive ids as (
select *
from (values (1, 1, 'A', NULL),
(2, 1, 'A', NULL),
(3, 1, 'B', 'I'),
(4, 2, 'B', 'I'),
(5, 3, 'C', 'I'),
(6, 5, NULL, NULL)
) v(id, id1, id2, id3)
),
pairs as (
select distinct a.id as ida, b.id as idb
from ids a join
ids b
on a.id1 = b.id1 or a.id2 = b.id2 or a.id3 = b.id3
),
cte as (
select ida as ida, idb as idb, array[ida] as ids, 1 as lev
from pairs
union all
select cte.ida, pairs.idb, cte.ids || pairs.ida, lev + 1
from cte join
pairs
on cte.idb = pairs.ida and
not cte.ids @> array[pairs.ida]
)
select distinct on (ida) cte.*
from cte
order by ida, idb ;
This adds a new column id which is the unique id for each row.
Here is a db<>fiddle.
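The graph view of the problem can also be sketched outside the database. The snippet below is not the recursive CTE above but a small union-find in Python, offered as one possible alternative: rows that share any non-NULL id1, id2 or id3 end up in the same group, and each group is labelled with its smallest row id. The function name assign_users is hypothetical; the rows are the sample values from the query.

```python
from collections import Counter

rows = [
    (1, 1, "A", None),
    (2, 1, "A", None),
    (3, 1, "B", "I"),
    (4, 2, "B", "I"),
    (5, 3, "C", "I"),
    (6, 5, None, None),
]

def assign_users(rows):
    """Map each row id to the smallest row id of its connected group."""
    parent = {row_id: row_id for row_id, *_ in rows}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Always keep the smaller row id as the root.
            parent[max(root_x, root_y)] = min(root_x, root_y)

    first_row_with = {}
    for row_id, *ids in rows:
        for column, value in enumerate(ids):
            if value is None:
                continue  # NULL never links two rows
            key = (column, value)  # keep id1, id2 and id3 values apart
            if key in first_row_with:
                union(row_id, first_row_with[key])
            else:
                first_row_with[key] = row_id

    return {row_id: find(row_id) for row_id, *_ in rows}

user_of_row = assign_users(rows)
views_per_user = Counter(user_of_row.values())
print(user_of_row)     # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 6}
print(views_per_user)  # Counter({1: 5, 6: 1})
```

The first five views collapse into one user, matching the expectation in the question, and the extra sixth row stays on its own.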

Need to get rows where combination of two columns both exist and don't exist

Trying to figure out if it's possible to write a single, set based query to return what I want with data in one single table. The below is just an example, and I need something that could easily work if most (but not all) of combinations 1 to 9 (or 1 to 20 etc) exist.
Table AllCovered has two columns. ID1 and ID2. There are 16 rows in this table, each containing a combination of the numbers 1 to 4 (so 1,1 1,2 1,3 1,4 2,1 .... 4,3 4,4)
Table SomeGaps has the same structure but only has 12 rows, again each row is a combination of 1 to 4, but with some of the combinations missing.
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM AllCovered
GROUP BY ID1, ID2
- this query returns 16 rows, each combination with 1 in the 3rd column (THIS)
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM SomeGaps
GROUP BY ID1, ID2
- this returns the 12 rows. How can I create a query that will return 16 rows, one for each combination, but with 0 in THIS for the combinations that are missing in SomeGaps?
ID1 ID2 THIS
1 1 1
1 2 0 (1,2 combination does NOT exist in SomeGaps)
1 3 1
1 4 1
2 1 1
2 2 0 (2,2 combination does NOT exist in SomeGaps)
Obviously I've tried using a crossjoin to get all combinations of ID1 and ID2 but the COUNT is, as expected, vastly inflated.
Hope this makes sense. Apologies if it's an easy solution, I can't seem to crack it!
You can do this by cross-joining all the distinct values for the two columns. Then use left outer join and aggregation to get the counts for all combinations:
select ac1.id1, ac2.id2, count(sg.id1) as cnt
from (select distinct id1 from AllCovered) ac1 cross join
(select distinct id2 from AllCovered) ac2 left join
SomeGaps sg
on sg.id1 = ac1.id1 and sg.id2 = ac2.id2
group by ac1.id1, ac2.id2;
I'm probably missing something obvious, but I'll take a bite anyway:
create table #AllCovered (id1 int, id2 int);
insert #AllCovered values
(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),(4,1),(4,2),(4,3),(4,4);
create table #gaps (id1 int, id2 int);
insert #gaps values(1,1),(1,2),(1,3),(1,4),(2,1),(2,4),(3,1),(3,2),(3,3),(4,1),(4,2),(4,4);
select #AllCovered.id1, #AllCovered.id2,
count(#gaps.id1) as this
from #AllCovered
left outer join #gaps
on #AllCovered.id1 = #gaps.id1 and #AllCovered.id2 = #gaps.id2
group by #AllCovered.id1, #AllCovered.id2;
drop table #AllCovered, #gaps
From your narrative, there are no duplicate combinations of (id1, id2) in either table, and AllCovered contains all possible combinations. Otherwise I would use distinct subqueries to fabricate AllCovered.
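For anyone who wants to try the cross join plus left join variant without a SQL Server instance, here is a minimal sketch on Python's sqlite3 module. It uses regular tables instead of # temp tables, and the four missing combinations are the ones from the sample data in the answer above; both are assumptions made only for illustration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AllCovered (id1 INTEGER, id2 INTEGER)")
conn.execute("CREATE TABLE SomeGaps (id1 INTEGER, id2 INTEGER)")

all_pairs = list(product(range(1, 5), repeat=2))   # the 16 combinations of 1..4
missing = {(2, 2), (2, 3), (3, 4), (4, 3)}         # illustrative gaps
conn.executemany("INSERT INTO AllCovered VALUES (?, ?)", all_pairs)
conn.executemany(
    "INSERT INTO SomeGaps VALUES (?, ?)",
    [pair for pair in all_pairs if pair not in missing],
)

# Build every (id1, id2) combination, then count matching SomeGaps rows.
# COUNT(sg.id1) ignores the NULLs produced by unmatched left-join rows.
gaps_sql = """
    SELECT ac1.id1, ac2.id2, COUNT(sg.id1) AS this
    FROM (SELECT DISTINCT id1 FROM AllCovered) ac1
    CROSS JOIN (SELECT DISTINCT id2 FROM AllCovered) ac2
    LEFT JOIN SomeGaps sg
           ON sg.id1 = ac1.id1 AND sg.id2 = ac2.id2
    GROUP BY ac1.id1, ac2.id2
    ORDER BY ac1.id1, ac2.id2
"""

counts = {(id1, id2): this for id1, id2, this in conn.execute(gaps_sql)}
zero_pairs = sorted(pair for pair, this in counts.items() if this == 0)
print(len(counts), zero_pairs)  # 16 [(2, 2), (2, 3), (3, 4), (4, 3)]
```

Counting a column from the left-joined table, rather than COUNT(*), is what turns the missing combinations into 0 instead of 1.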