I have append queries A and B that write to Table C. I am creating a report based on C, but I need only the distinct rows. I could select the distinct rows from C, but I would rather get rid of the duplicates as I go (i.e. so that Table C does not accumulate 1,000,000,000+ records over time with each append), so that the report has ALL the unique records from C, past, present, and future, until the end user deletes them.
My question is simply this: is there any way to append only distinct rows to Table C (not an append of SELECT DISTINCT, but an append that leaves the table itself holding only distinct rows)?
If that is not directly possible, can it be done with VBA?
Use constraints to enforce this behavior, in this case a (composite) primary key: https://support.office.com/en-us/article/Add-or-change-a-table-s-primary-key-in-Access-07b4a84b-0063-4d56-8b00-65f2975e4379
This ensures that you can't insert duplicate values into your table, which means that you won't have to delete the duplicates later on.
Define the primary key over all the columns that together make a record unique.
Before adding a primary key or constraint, make sure to clean up your data by removing all duplicate rows. The easiest way is probably to create a new table and fill it with a SELECT DISTINCT ... from your current table.
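As a rough illustration of the clean-up and the key in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Access (an assumption; Access reports key violations differently). The table TableC_old and the helper try_append are hypothetical names introduced only for this example.

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical existing table that already contains a duplicate row.
    CREATE TABLE TableC_old (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    INSERT INTO TableC_old VALUES
        ('a', 'x', '1'), ('a', 'x', '1'), ('b', 'y', '2');

    -- New table with a composite primary key over the identifying columns.
    CREATE TABLE TableC (
        Column1 TEXT, Column2 TEXT, Column3 TEXT,
        PRIMARY KEY (Column1, Column2, Column3)
    );

    -- Clean-up step: copy only the distinct rows across.
    INSERT INTO TableC
    SELECT DISTINCT Column1, Column2, Column3 FROM TableC_old;
""")


def try_append(row):
    """Attempt to insert one row; return False if the key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO TableC VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


appended_duplicate = try_append(("a", "x", "1"))  # already present
appended_new = try_append(("c", "z", "3"))        # genuinely new
row_count = conn.execute("SELECT COUNT(*) FROM TableC").fetchone()[0]
```

In this sketch the duplicate is rejected by the key itself, so the table never needs a later de-duplication pass.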
Consider using one of three query forms (LEFT JOIN ... IS NULL, NOT EXISTS, or NOT IN) to append only the data not currently in the table. The examples below assume that a primary key, ID, determines distinctness.
INSERT INTO TableC (Column1, Column2, Column3)
SELECT a.Column1, a.Column2, a.Column3
FROM QueryA a
LEFT JOIN TableC c
ON a.ID = c.ID
WHERE c.ID IS NULL;
INSERT INTO TableC (Column1, Column2, Column3)
SELECT a.Column1, a.Column2, a.Column3
FROM QueryA a
WHERE NOT EXISTS
(SELECT 1 FROM TableC c
WHERE a.ID = c.ID);
INSERT INTO TableC (Column1, Column2, Column3)
SELECT a.Column1, a.Column2, a.Column3
FROM QueryA a
WHERE a.ID NOT IN
(SELECT c.ID FROM TableC c);
Now, if no single column but multiple fields denote uniqueness, add to the JOIN or WHERE clauses:
...
FROM QueryA a
LEFT JOIN TableC c
ON a.Column1 = c.Column1 AND a.Column2 = c.Column2 AND a.Column3 = c.Column3
WHERE c.Column1 IS NULL;
...
WHERE NOT EXISTS
(SELECT 1 FROM TableC c
WHERE a.Column1 = c.Column1 AND a.Column2 = c.Column2 AND a.Column3 = c.Column3);
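...
WHERE a.Column1 & '|' & a.Column2 & '|' & a.Column3 NOT IN
(SELECT c.Column1 & '|' & c.Column2 & '|' & c.Column3 FROM TableC c);
Note that NOT IN has to compare the combination of columns as a single value (here concatenated with a separator that is assumed not to occur in the data). Testing each column with its own NOT IN would also reject rows whose individual values merely appear somewhere in TableC.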
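To see the multi-column NOT EXISTS form behave as an idempotent append, here is a minimal sketch run against SQLite through Python's sqlite3 module (a stand-in for Access, which is an assumption). QueryA is modelled as a plain table, and the DISTINCT in the SELECT is an addition of this sketch to guard against duplicates inside the source itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableC (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    -- QueryA is a saved query in the original; a table stands in for it here.
    CREATE TABLE QueryA (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    INSERT INTO TableC VALUES ('a', 'x', '1');
    INSERT INTO QueryA VALUES ('a', 'x', '1'), ('a', 'y', '1'), ('b', 'x', '2');
""")

# Append only the rows whose full column combination is not yet in TableC.
APPEND_NEW = """
    INSERT INTO TableC (Column1, Column2, Column3)
    SELECT DISTINCT a.Column1, a.Column2, a.Column3
    FROM QueryA a
    WHERE NOT EXISTS
        (SELECT 1 FROM TableC c
         WHERE a.Column1 = c.Column1
           AND a.Column2 = c.Column2
           AND a.Column3 = c.Column3)
"""


def append_new_rows():
    """Run the append once and return how many rows were inserted."""
    with conn:
        return conn.execute(APPEND_NEW).rowcount


first_run = append_new_rows()   # inserts the two unseen combinations
second_run = append_new_rows()  # nothing left to insert
rows = sorted(conn.execute("SELECT * FROM TableC").fetchall())
```

The row ('a', 'y', '1') shares two of its values with a row already in TableC, yet it is still appended because the comparison is made on the whole combination.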
So I have some tables with millions of rows of data, and the current query I have is like the following:
WITH first_table AS
(
SELECT
A.column1, A.column2, B.column1 AS column3, C.column1 AS column4
FROM
tableA AS A
LEFT JOIN
tableB AS B ON A.id = B.id
LEFT JOIN
tableC AS C ON A.id = C.id
UNION ALL
SELECT
D.column1, D.column2, NULL AS column3, D.column4
FROM
tableD AS D
UNION ALL
...
)
SELECT
column1, column2, column3, column4, A.col5, A.col6... until A.col20
FROM
first_table
LEFT JOIN
tableA AS A ON first_table.id = A.id
I'm basically appending at least two tables to table A and then joining table A again in the final SELECT statement. I do this because I need about 30 columns from table A, and I don't want to pad the appended SELECTs with NULL values, since I only need 4 or 5 columns from the tables appended to the main one (tableA).
I was wondering whether it would be better to avoid the join and instead select all the columns I need inside the WITH statement, or whether I should keep my code as it is. The goal is to improve query performance and execution time.
Thanks.
In this query, each row of table a could have hundreds of rows of table b associated with it, so the array_agg contains all of those values. I'd like to be able to set a limit for it, but inside array_agg I can use order by and there's no way to set a limit.
select a.column1, array_agg(b.column2)
from a left join b using (id)
where a.column3 = 'value'
group by a.column1
I could use the "slice" syntax on the array but that's quite expensive since it first has to retrieve all the rows then discard the other ones. What's the proper efficient way to do this?
I would use a lateral join.
select a.column1, array_agg(b.column2)
from a left join lateral
(select id, column2 from b where b.id=a.id order by something limit 10) b using (id)
where a.column3 = 'value'
group by a.column1
Since the "id" restriction is already inside the lateral query, you could make the join condition on true rather than using (id). I don't know which is less confusing.
I think you need to number the rows first and then aggregate:
select a.column1, array_agg(a.column2)
from (select a.column1, b.column2,
row_number() over (partition by a.column1 order by a.column1) as seqnum
from a left join
b
using (id)
where a.column3 = 'value'
) a
where seqnum <= 10
group by a.column1
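The row-numbering approach can be tried out with a minimal sketch against SQLite through Python's sqlite3 module. This assumes a bundled SQLite of version 3.25 or later for window functions; the sample data, the cap LIMIT_PER_GROUP, and the ordering by column2 are hypothetical. Grouping into lists is done in Python here because SQLite has no array_agg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, column1 TEXT, column3 TEXT);
    CREATE TABLE b (id INTEGER, column2 INTEGER);
    INSERT INTO a VALUES
        (1, 'first', 'value'), (2, 'second', 'value'), (3, 'skipped', 'other');
    INSERT INTO b VALUES (1, 10), (1, 11), (1, 12), (1, 13), (2, 20), (3, 30);
""")

LIMIT_PER_GROUP = 2  # hypothetical cap; the question does not give a number

# Number the joined rows within each column1 group, then keep the first N.
TOP_N = """
    SELECT column1, column2
    FROM (SELECT a.column1, b.column2,
                 row_number() OVER (PARTITION BY a.column1
                                    ORDER BY b.column2) AS seqnum
          FROM a LEFT JOIN b USING (id)
          WHERE a.column3 = 'value') t
    WHERE seqnum <= ?
    ORDER BY column1, column2
"""

# Collect the capped rows into one list per column1 value.
limited = {}
for column1, column2 in conn.execute(TOP_N, (LIMIT_PER_GROUP,)):
    limited.setdefault(column1, []).append(column2)
```

Unlike the lateral join, this form still reads every matching row of b before discarding the extras, so it addresses the result shape more than the cost.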
I couldn't figure out how to remove mirror results like this:
select
b.column1 as result1,
c.column2 as result2
from table a
left join table b on a.column1 = b.column1
left join table c on a.column2 = c.column1;
The results I get are the following:
result1|result2
b1 |b22
b5 |b66
b74 |b31
......
b22 |b1
b66 |b5
b31 |b74
How can I get only the first combination? If there is a combination b1-b22, I don't need b22-b1.
I've tried distinct on the sum of b.column1 and c.column1 (I cast them to int) and it works, but I don't believe it's the best way, since different combinations could produce the same sum and I would lose some data.
This is a bit tricky if you don't have all combinations. Your sample results do not have null, so I will change the joins to inner joins and then use distinct on:
select distinct on (least(b.column1, c.column2), greatest(b.column1, c.column2))
b.column1 as result1, c.column2 as result2
from table a join
table b
on a.column1 = b.column1 join
table c
on a.column2 = c.column1
order by least(b.column1, c.column2), greatest(b.column1, c.column2);
Actually as your query is phrased, the joins don't seem needed at all. So you might consider:
select distinct on (least(a.column1, a.column2), greatest(a.column1, a.column2))
a.column1 as result1, a.column2 as result2
from table a
order by least(a.column1, a.column2), greatest(a.column1, a.column2);
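The least/greatest idea can be checked with a minimal sketch against SQLite through Python's sqlite3 module. SQLite has no DISTINCT ON, so this sketch swaps in SELECT DISTINCT over the normalized pair, with the two-argument min() and max() standing in for least() and greatest(). The pairs table is a hypothetical stand-in for the joined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined result shown in the question.
    CREATE TABLE pairs (result1 TEXT, result2 TEXT);
    INSERT INTO pairs VALUES
        ('b1', 'b22'), ('b5', 'b66'), ('b74', 'b31'),
        ('b22', 'b1'), ('b66', 'b5'), ('b31', 'b74');
""")

# Put each pair into a canonical (smaller, larger) order, then de-duplicate.
UNIQUE_PAIRS = """
    SELECT DISTINCT min(result1, result2) AS lo,
                    max(result1, result2) AS hi
    FROM pairs
    ORDER BY lo, hi
"""
unique_pairs = conn.execute(UNIQUE_PAIRS).fetchall()
```

One difference from DISTINCT ON: this returns each pair in its normalized order (b31, b74) rather than in the orientation of whichever original row came first.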
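One method is to add an inequality condition in the last join. Note that with a left join the mirrored rows are not dropped; they remain with a NULL result2, so switch to an inner join if they should disappear entirely: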
select
b.column1 as result1,
c.column2 as result2
from table a
left join table b on a.column1 = b.column1
left join table c on a.column2 = c.column1 and b.column1 < c.column2
"Remove mirror results" sounds like you are looking to delete rows. That seems best addressed with an EXISTS clause:
delete from rtab r1
where r1.result1 > r1.result2
and exists (select null
from rtab r2
where r1.result1 = r2.result2
and r1.result2 = r2.result1
);
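A minimal sketch of the same delete, run against SQLite through Python's sqlite3 module (an assumption; the answer does not name a database). The table alias on the DELETE target is dropped in favour of the table name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rtab (result1 TEXT, result2 TEXT);
    INSERT INTO rtab VALUES
        ('b1', 'b22'), ('b22', 'b1'),
        ('b5', 'b66'), ('b66', 'b5'),
        ('b9', 'b8');  -- has no mirrored partner
""")

# Delete the "larger first" row of each mirrored pair; rows without a
# mirror are left alone even when result1 > result2.
with conn:
    deleted = conn.execute("""
        DELETE FROM rtab
        WHERE result1 > result2
          AND EXISTS (SELECT 1
                      FROM rtab r2
                      WHERE rtab.result1 = r2.result2
                        AND rtab.result2 = r2.result1)
    """).rowcount

remaining = sorted(conn.execute("SELECT * FROM rtab").fetchall())
```

The result1 > result2 test is what keeps exactly one row of each pair: the surviving row always has result1 < result2, so it can never match the delete condition itself.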
I have a query in Access that I'm rebuilding in SQL Server.
Access:
DELETE DISTINCT * from [TableA] INNER JOIN TableB
ON [TableA].[Column1]=[TableB].[column1]
AND [TableA].[Column2]=[TableB].[column2]
I know I could use
Delete from tableA where ID in (
Select * from [TableA] INNER JOIN TableB
ON [TableA].[Column1]=[TableB].[column1]
AND [TableA].[Column2]=[TableB].[column2])
But I get an error saying "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS"
My goal is to delete the distinct records matched by the Access query shown at the top.
You want to delete the rows in TableA that are in TableB, according to the column matches. How about doing this:
delete from tableA
where exists (select 1
from tableB
where tableA.Column1 = tableB.Column1 and tableA.column2 = tableB.column2
);
This seems to be the intent of what you are trying to do.
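A minimal sketch of this correlated delete, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption). The ID column and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT);
    CREATE TABLE tableB (Column1 TEXT, Column2 TEXT);
    INSERT INTO tableA (Column1, Column2) VALUES
        ('k1', 'v1'), ('k1', 'v1'), ('k1', 'v2'), ('k2', 'v1');
    -- tableB holds the same match twice to show that this does no harm.
    INSERT INTO tableB VALUES ('k1', 'v1'), ('k1', 'v1'), ('k3', 'v3');
""")

# Delete every tableA row that has at least one two-column match in tableB.
with conn:
    deleted = conn.execute("""
        DELETE FROM tableA
        WHERE EXISTS (SELECT 1
                      FROM tableB
                      WHERE tableA.Column1 = tableB.Column1
                        AND tableA.Column2 = tableB.Column2)
    """).rowcount

remaining = conn.execute(
    "SELECT Column1, Column2 FROM tableA ORDER BY ID").fetchall()
```

Because EXISTS only asks whether a match is present, repeated matches in tableB need no DISTINCT, which is why the DISTINCT from the Access version has no counterpart here.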
In the subquery you have to select only the ID column from the respective table; that is the only column you need.
DELETE a
FROM tableA a
JOIN (SELECT DISTINCT Column1 ,column2
FROM tableA
WHERE EXISTS (SELECT 1
FROM tableB
WHERE tableA.Column1 = tableB.Column1
AND tableA.column2 = tableB.column2)) b
ON a.Column1 = b.Column1
AND a.column2 = b.column2
I have two tables A(column1,column2) and B(column1,column2).
How can I ensure that a value from A(column1) is not already contained in B(column1) before inserting it into that column?
My query will look like this:
insert into B.column1 values()
where
...
I want to fill B.column1 with data from A.column1.
What should I put in the where clause?
Insert Into B(column1)
Select A.Column1
From A
Where A.Column1 not in (Select Column1 From B)
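One caveat with NOT IN is worth checking before relying on it: if B.column1 contains a NULL, the NOT IN test matches no rows at all, while NOT EXISTS is unaffected. The following minimal sketch shows the difference against SQLite through Python's sqlite3 module (an assumption; the question does not name a database), with hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (column1 TEXT, column2 TEXT);
    CREATE TABLE B (column1 TEXT, column2 TEXT);
    INSERT INTO A VALUES ('x', 'a1'), ('y', 'a2'), ('z', 'a3');
    -- One row of B has a NULL in column1.
    INSERT INTO B VALUES ('x', 'b1'), (NULL, 'b2');
""")

# NOT IN: the comparison with NULL is unknown, so no row qualifies.
not_in = conn.execute("""
    SELECT column1 FROM A
    WHERE column1 NOT IN (SELECT column1 FROM B)
""").fetchall()

# NOT EXISTS: only asks whether an equal row is present.
not_exists = conn.execute("""
    SELECT column1 FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.column1 = A.column1)
    ORDER BY column1
""").fetchall()
```

If B.column1 is declared NOT NULL, both forms select the same rows, and the NOT IN version above works as written.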
I would use the MINUS operator (EXCEPT in some databases) to select all values of A(column1) that are not in B(column1), and then insert the result into table B.
insert into B
select a.column1, a.column2 from a
left join b
on a.column1 = b.column1
where b.column1 is null