SQL multiple columns WHERE condition

I have a table:
Column A  Column B  Column C
--------  --------  --------
1         a         d
2         b         e
3         c         f
and a very large .csv file (about 1M rows) that also contains columns A, B and C:
Column A  Column B  Column C
--------  --------  --------
4         a         d
5         b         w
6         c         f
I need to extract the rows from the table where (B = 'a' and C = 'd') or (B = 'b' and C = 'w') or (B = 'c' and C = 'f'), i.e. where the (B, C) pair also appears in the CSV file. The result will be:
Column A  Column B  Column C
--------  --------  --------
1         a         d
3         c         f
I've tried a query like the one described above, but with 1M rows it is too large for a single request.

You are basically trying to join on the columns (b, c) manually. However, SQL knows how to do a join more efficiently.
Import the CSV file into a temporary table, analyze it, create a multi-column index on its (b, c) columns, then do a join like this:
SELECT R.*
FROM realTable R
JOIN csvTable C
ON R.b = C.b AND R.c = C.c;
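A minimal sketch of that approach, using SQLite through Python's built-in sqlite3 module rather than whatever engine the question targets. The table names realTable and csvTable follow the answer; the sample CSV text, the index name csv_bc and the helper matching_rows are hypothetical. For a real 1M-row file, a bulk loader would usually be faster than row-by-row inserts.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the large CSV file (header row + data rows).
CSV_TEXT = "a,b,c\n4,a,d\n5,b,w\n6,c,f\n"


def matching_rows(conn: sqlite3.Connection, csv_text: str) -> list:
    """Return rows of realTable whose (b, c) pair also appears in the CSV."""
    cur = conn.cursor()
    # 1. Import the CSV into a temporary table instead of building a huge WHERE clause.
    cur.execute("CREATE TEMP TABLE csvTable (a INTEGER, b TEXT, c TEXT)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    cur.executemany("INSERT INTO csvTable VALUES (?, ?, ?)", reader)
    # 2. Multi-column index on the join columns, then gather planner statistics.
    cur.execute("CREATE INDEX csv_bc ON csvTable (b, c)")
    cur.execute("ANALYZE")
    # 3. Let the engine match the pairs with an ordinary join.
    return cur.execute(
        "SELECT R.* FROM realTable R "
        "JOIN csvTable C ON R.b = C.b AND R.c = C.c "
        "ORDER BY R.a"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE realTable (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO realTable VALUES (?, ?, ?)",
    [(1, "a", "d"), (2, "b", "e"), (3, "c", "f")],
)
result = matching_rows(conn, CSV_TEXT)
print(result)  # [(1, 'a', 'd'), (3, 'c', 'f')]
```

If the CSV can contain the same (b, c) pair more than once, the join returns duplicate rows; filtering realTable with WHERE EXISTS (...) against csvTable avoids that.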

Related

How to merge the result from the same column

I want to merge the values of columns C and D across rows that share the same ID in column A. The data:
Column A  Column B  Column C  Column D
--------  --------  --------  --------
A         A         B         B
A         A         C         C
And I want it to look like this:
Column A  Column B  Column C  Column D
--------  --------  --------  --------
A         A         B C       B C
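Try an aggregation query like the one below (MySQL 8.0 syntax):
SELECT `Column A`, `Column B`,
       GROUP_CONCAT(`Column C` ORDER BY `Column C` SEPARATOR ' ') AS `Column C`,
       GROUP_CONCAT(`Column D` ORDER BY `Column D` SEPARATOR ' ') AS `Column D`
FROM `table`
GROUP BY `Column A`, `Column B`;
Column names containing spaces must be quoted with backticks. Because every column that is not in the GROUP BY is wrapped in GROUP_CONCAT, the query is valid under MySQL 8.0's default sql_mode=only_full_group_by. A plain concat(`Column C`, `Column D`) would instead join C and D within a single row (giving "BB") and, as a non-aggregated column, fails with ERROR 1055 (42000) under that mode; clearing sql_mode for the session hides the error but still returns one arbitrary row per group, not the merged values.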
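A small sketch of the same aggregation, run on SQLite via Python's sqlite3 so it needs no MySQL server. SQLite spells the function group_concat(X, ' ') instead of MySQL's GROUP_CONCAT(X SEPARATOR ' '), and it does not guarantee the order of the concatenated values; the ordered subquery is a common idiom, not a documented guarantee. The table t and the columns col_a to col_d are hypothetical stand-ins for the question's "Column A" to "Column D".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question's Column A .. Column D.
conn.execute("CREATE TABLE t (col_a TEXT, col_b TEXT, col_c TEXT, col_d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A", "A", "B", "B"), ("A", "A", "C", "C")],
)

# One output row per (col_a, col_b) group; the col_c and col_d values of the
# group are joined with a space. Every non-grouped column is inside an aggregate.
merged = conn.execute(
    """
    SELECT col_a, col_b,
           group_concat(col_c, ' ') AS col_c,
           group_concat(col_d, ' ') AS col_d
    FROM (SELECT * FROM t ORDER BY col_c)
    GROUP BY col_a, col_b
    """
).fetchall()
print(merged)
```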

How to fill in missing rows in a table with default values in sqlite

I have a table with 3 columns (a, b, c) and I want to make sure that for each possible combination of values in the first two columns, there is a row containing that combination. For example if this is my table:
a b c
--- --- ---
P X 1
Q Y 2
Q Z 3
R Y 4
S Y 5
S Z 6
The unique values in column a are P, Q, R, S, and the unique values in column b are X, Y, Z. So I want to create a query that returns 12 rows (4×3) that fills in missing values in column c with a default value like 0, for example:
a b c
--- --- ---
P X 1
P Y 0
P Z 0
Q X 0
Q Y 2
Q Z 3
R X 0
R Y 4
R Z 0
S X 0
S Y 5
S Z 6
The way I'm currently doing it is this:
select a, b, ifnull(c, 0)
from (select distinct a from table),
(select distinct b from table)
left join table using (a, b)
Unfortunately, this query is very slow, since the table contains about ten thousand rows. If I precompute the query and store the result in a table, accessing it is faster, but it takes a lot of space, most of which is probably just zeros in column c. Is there any way to make this query faster?
For this query:
select a.a, b.b, coalesce(c.c, 0)
from (select distinct a from table) a cross join
(select distinct b from table) b left join
table c
using (a, b);
You want indexes on:
(a, b)
(b)
The first index can be used for the select distinct a and for the join. The second can be used for the select distinct b.
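A runnable sketch of the answer's query and indexes on SQLite (via Python's sqlite3), using the sample data from the question. The table is named t here because table is a reserved word; t and the index names t_ab and t_b are hypothetical. On real data, EXPLAIN QUERY PLAN can confirm whether SQLite actually uses the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("P", "X", 1), ("Q", "Y", 2), ("Q", "Z", 3),
     ("R", "Y", 4), ("S", "Y", 5), ("S", "Z", 6)],
)
# (a, b) serves SELECT DISTINCT a and the join lookup; (b) serves SELECT DISTINCT b.
conn.execute("CREATE INDEX t_ab ON t (a, b)")
conn.execute("CREATE INDEX t_b ON t (b)")

# Every (a, b) combination, with c defaulting to 0 where no row exists.
rows = conn.execute(
    """
    SELECT a.a, b.b, coalesce(c.c, 0)
    FROM (SELECT DISTINCT a FROM t) a
    CROSS JOIN (SELECT DISTINCT b FROM t) b
    LEFT JOIN t c USING (a, b)
    ORDER BY a.a, b.b
    """
).fetchall()
for row in rows:
    print(*row)
```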

Suppress rows with reverse/swapped values

I would like to query a database table in which some rows are reversed copies of other rows. The table looks like this:
Src Trgt ValueA ValueB
A B 1,44 5
B A 1,44 5 <--
C D 1,23 8
D C 1,23 8 <--
F G 5,12 9
G F 5,12 9 <--
What I want is a query that returns all rows except those that appear again with the source and target values swapped. The rows that should not be returned are the ones that have the same ValueA and ValueB as another row, but with Src and Trgt swapped (the ones marked in the table above).
So, the desired results would look like this:
Src Trgt ValueA ValueB
A B 1,44 5
C D 1,23 8
F G 5,12 9
I think this is what you want:
select t.*
from t
where t.src < t.trgt
union all
select t.*
from t
where t.src > t.trgt and
not exists (select 1
from t t2
where t2.src = t.trgt and t2.trgt = t.src and
t2.ValueA = t.ValueA and t2.ValueB = t.ValueB
);
For each pair of twins it keeps the row whose src sorts before trgt and filters out the copy with the first two columns switched; a row with src > trgt is kept only if it has no twin.
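A quick check of the UNION ALL query on SQLite (Python's sqlite3), with the question's data plus one hypothetical extra row (Z, Y) that has no twin, to show that such a row is kept. Storing values like 1,44 as REAL is an assumption about the column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (src TEXT, trgt TEXT, ValueA REAL, ValueB INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A", "B", 1.44, 5), ("B", "A", 1.44, 5),
     ("C", "D", 1.23, 8), ("D", "C", 1.23, 8),
     ("F", "G", 5.12, 9), ("G", "F", 5.12, 9),
     ("Z", "Y", 2.0, 1)],  # hypothetical row without a twin
)

kept = conn.execute(
    """
    SELECT t.* FROM t WHERE t.src < t.trgt
    UNION ALL
    SELECT t.* FROM t
    WHERE t.src > t.trgt
      AND NOT EXISTS (SELECT 1 FROM t t2
                      WHERE t2.src = t.trgt AND t2.trgt = t.src
                        AND t2.ValueA = t.ValueA AND t2.ValueB = t.ValueB)
    ORDER BY src
    """
).fetchall()
print(kept)
# [('A', 'B', 1.44, 5), ('C', 'D', 1.23, 8), ('F', 'G', 5.12, 9), ('Z', 'Y', 2.0, 1)]
```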
EDIT:
Another approach, if you just want one row per combination, is:
select least(src, trgt) as src, greatest(src, trgt) as trgt, ValueA, ValueB
from t
group by least(src, trgt), greatest(src, trgt), ValueA, ValueB;
This runs the risk of returning a row that is not in the original data: if a row has no twin and src > trgt, it comes back with src and trgt swapped.
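A sketch of the grouping approach on SQLite (Python's sqlite3, hypothetical table t), where the multi-argument scalar min() and max() play the role of least() and greatest(). The hypothetical extra row (Z, Y) with no twin illustrates the risk just mentioned: it comes back as (Y, Z).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (src TEXT, trgt TEXT, ValueA REAL, ValueB INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A", "B", 1.44, 5), ("B", "A", 1.44, 5),
     ("C", "D", 1.23, 8), ("D", "C", 1.23, 8),
     ("F", "G", 5.12, 9), ("G", "F", 5.12, 9),
     ("Z", "Y", 2.0, 1)],  # hypothetical row without a twin
)

# Normalize each pair to (smaller, larger), then collapse twins with GROUP BY.
pairs = conn.execute(
    """
    SELECT min(src, trgt) AS src, max(src, trgt) AS trgt, ValueA, ValueB
    FROM t
    GROUP BY min(src, trgt), max(src, trgt), ValueA, ValueB
    ORDER BY 1
    """
).fetchall()
print(pairs)
# [('A', 'B', 1.44, 5), ('C', 'D', 1.23, 8), ('F', 'G', 5.12, 9), ('Y', 'Z', 2.0, 1)]
```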
SELECT *
FROM ztable zt
WHERE zt.source < zt.target -- pick only one of the twins
OR NOT EXISTS( -- OR :if it is NOT part of a twin
SELECT *
FROM ztable nx
WHERE nx.source = zt.target
AND nx.target = zt.source
);
This assumes that rows with source = target are either not present or not wanted (such a row matches itself in the NOT EXISTS subquery and is dropped).
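A sketch of the NOT EXISTS version on SQLite (Python's sqlite3). The rows are hypothetical: three twin pairs, an untwinned (Z, Y) that is kept, and a self-pair (X, X) that is dropped because it matches itself in the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ztable (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO ztable VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"),
     ("F", "G"), ("G", "F"), ("Z", "Y"), ("X", "X")],
)

survivors = conn.execute(
    """
    SELECT * FROM ztable zt
    WHERE zt.source < zt.target          -- pick only one of the twins
       OR NOT EXISTS (SELECT * FROM ztable nx  -- or keep it if it has no twin
                      WHERE nx.source = zt.target
                        AND nx.target = zt.source)
    ORDER BY zt.source
    """
).fetchall()
print(survivors)  # [('A', 'B'), ('C', 'D'), ('F', 'G'), ('Z', 'Y')]
```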

SQL sum() with multiple level joins

Let's say I have the following table relationships (via keys existing on both ends)
table a -> table b
table b -> table c
table c -> table d
table d -> table e
table e -> table f
And I want to group by a key on table a and sum() values from table f as if both tables were directly joined.
The problem is that, if I do that, values get duplicated, because every path from a -> b -> c -> d -> e -> f is repeated (as Andomar said: some information repeats because there are multiple routes from A to F).
Is there a way around that, or is my only choice to create a middle table containing the table a -> table f relationship?
Details:
Table a
id1 | id2
Table b
id2 | id3
Table c
id3 | value
select a.id1, sum(value) from a
inner join b on a.id2 = b.id2
inner join c on b.id3 = c.id3
group by a.id1
Data example:
After the joins, the rows are:
a b c value
1 2 2 20
1 3 2 20
1 4 2 20
If I do the sum(), I will get 60 but I want to get 20
Thanks
I'm assuming that some information repeats because there are multiple routes from A to F. If there is a unique key in F, you can un-duplicate the routes using a subquery:
SELECT SubQuery.AValue, sum(SubQuery.FValue)
FROM (
SELECT a.value as AValue, f.key, f.value as FValue
FROM a
INNER JOIN b ON b.key = a.key
INNER JOIN c ON c.key = b.key
INNER JOIN d ON d.key = c.key
INNER JOIN e ON e.key = d.key
INNER JOIN f ON f.key = e.key
GROUP BY a.value, f.key, f.value
) as SubQuery
GROUP BY SubQuery.AValue
The subquery ensures each row in F is only counted once.
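A sketch that reproduces the question's numbers on SQLite (Python's sqlite3). Here table c plays the role of F, and its id3 column is assumed to be a unique key (declared PRIMARY KEY), which is what the subquery's extra GROUP BY relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables and data from the question's "Details" section.
conn.executescript(
    """
    CREATE TABLE a (id1 INTEGER, id2 INTEGER);
    CREATE TABLE b (id2 INTEGER, id3 INTEGER);
    CREATE TABLE c (id3 INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO a VALUES (1, 2), (1, 3), (1, 4);
    INSERT INTO b VALUES (2, 2), (3, 2), (4, 2);
    INSERT INTO c VALUES (2, 20);
    """
)

# Summing straight over the joins counts c's row once per route: 3 * 20.
naive = conn.execute(
    """
    SELECT a.id1, sum(c.value)
    FROM a
    JOIN b ON a.id2 = b.id2
    JOIN c ON b.id3 = c.id3
    GROUP BY a.id1
    """
).fetchall()

# Collapse the routes first: one row per (id1, row of c), keyed by c's unique id3.
deduped = conn.execute(
    """
    SELECT sub.id1, sum(sub.value)
    FROM (
        SELECT a.id1, c.id3, c.value
        FROM a
        JOIN b ON a.id2 = b.id2
        JOIN c ON b.id3 = c.id3
        GROUP BY a.id1, c.id3, c.value
    ) AS sub
    GROUP BY sub.id1
    """
).fetchall()

print(naive)    # [(1, 60)]
print(deduped)  # [(1, 20)]
```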

Multiple NOT distinct

I've got an MS Access database and need an SQL query that selects all the rows whose value in one column is not distinct, while still keeping all the values from those rows.
In this case more than ever, an example is worth a thousand words:
Table:
A B C
1 x q
2 y w
3 y e
4 z r
5 z t
6 z y
SQL magic
Result:
B C
y w
y e
z r
z t
z y
Basically, it removes the rows whose value in column B is unique but keeps every row whose B value occurs more than once. I can GROUP BY B with COUNT > 1 to find the non-distinct values, but the result then lists only one row per B value, not the two or more rows I need.
Any help?
Thanks.
Select B, C
From Table
Where B In
(Select B From Table
Group By B
Having Count(*) > 1)
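The IN-subquery answer, checked on SQLite through Python's sqlite3 with the sample data from the question. SQLite stands in for Access here, the table name my_table is borrowed from the next answer, and the ORDER BY A is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (A INTEGER, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "x", "q"), (2, "y", "w"), (3, "y", "e"),
     (4, "z", "r"), (5, "z", "t"), (6, "z", "y")],
)

# Keep every row whose B value occurs more than once.
rows = conn.execute(
    """
    SELECT B, C
    FROM my_table
    WHERE B IN (SELECT B FROM my_table GROUP BY B HAVING COUNT(*) > 1)
    ORDER BY A
    """
).fetchall()
print(rows)  # [('y', 'w'), ('y', 'e'), ('z', 'r'), ('z', 't'), ('z', 'y')]
```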
Another way of returning the results you want would be this:
select *
from
my_table
where
B in
(select B from my_table group by B having count(*) > 1)
select distinct
  t1.B, t1.C
from
  my_table t1,
  my_table t2
where
  t1.B = t2.B
  and
  t1.C <> t2.C
-- Access requires <> rather than != (thanks, Dave!)
Something like that?
Join the values of B that occur more than once (found with GROUP BY B and HAVING COUNT(*) > 1) back to the original table to retrieve the C values.
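A sketch of that join-back approach on SQLite (Python's sqlite3) with the question's sample data; the table name my_table comes from an earlier answer, and the derived-table alias dup is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (A INTEGER, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "x", "q"), (2, "y", "w"), (3, "y", "e"),
     (4, "z", "r"), (5, "z", "t"), (6, "z", "y")],
)

# The derived table lists each repeated B once; joining it back restores all rows.
rows = conn.execute(
    """
    SELECT t.B, t.C
    FROM my_table AS t
    INNER JOIN (
        SELECT B FROM my_table GROUP BY B HAVING COUNT(*) > 1
    ) AS dup ON t.B = dup.B
    ORDER BY t.A
    """
).fetchall()
print(rows)  # [('y', 'w'), ('y', 'e'), ('z', 'r'), ('z', 't'), ('z', 'y')]
```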