Compare two tables in postgres - sql

I have two tables with identical definition:
create table tabA (
user_id int,
contact boolean,
promote boolean
);
create table tabB (
user_id int,
contact boolean,
promote boolean
);
I want to compare the contact and promote columns and see whether the row data differs between the two tables. For example:
row from tabA: 1,T,T
row from tabB: 1,T,F
These rows differ, so I want to catch that and select only the rows where the values are not equal.

SELECT * FROM tabA, tabB
WHERE tabA.user_id = tabB.user_id
AND (
tabA.contact != tabB.contact
OR
tabA.promote != tabB.promote
);

If the columns can contain NULL values, you need to use null-safe operators:
SELECT user_id, a.contact AS a_contact, a.promote AS a_promote
, b.contact AS b_contact, b.promote AS b_promote
FROM tabA a
JOIN tabB b USING (user_id)
WHERE a.contact IS DISTINCT FROM b.contact OR
a.promote IS DISTINCT FROM b.promote;
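To see why the null-safe operator matters, here is a minimal sketch that runs both comparisons against the same data. It uses Python's sqlite3 module with an in-memory database as a stand-in for Postgres; the sample rows and the helper name mismatched_ids are made up for illustration, and SQLite spells the null-safe comparison IS NOT where Postgres uses IS DISTINCT FROM.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabA (user_id int, contact boolean, promote boolean);
    CREATE TABLE tabB (user_id int, contact boolean, promote boolean);
    INSERT INTO tabA VALUES (1, 1, 1), (2, 1, NULL), (3, 0, 0);
    INSERT INTO tabB VALUES (1, 1, 0), (2, 1, 1),    (3, 0, 0);
""")

def mismatched_ids(comparison):
    """Return user_ids whose contact or promote differ under the given operator."""
    sql = f"""
        SELECT a.user_id
        FROM tabA a
        JOIN tabB b ON a.user_id = b.user_id
        WHERE a.contact {comparison} b.contact
           OR a.promote {comparison} b.promote
        ORDER BY a.user_id
    """
    return [row[0] for row in conn.execute(sql)]

# NULL != 1 evaluates to NULL, so user 2 is silently dropped.
plain = mismatched_ids("!=")
# IS NOT is SQLite's counterpart of IS DISTINCT FROM and compares NULLs safely.
null_safe = mismatched_ids("IS NOT")

print(plain)      # [1]
print(null_safe)  # [1, 2]
```

User 2 differs only because one side is NULL, which is exactly the case the plain != comparison misses.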

Another option is to use Postgres' record comparison capability:
select *
from taba a
full join tabb b using (user_id)
where a is distinct from b;

To find out whether the table contents differ, compare the three column values returned by the following query:
select (
select count(*) from (
select * from a
union
select * from b
) m
) merged,
(select count(*) from a) in_a,
(select count(*) from b) in_b;
If the value in the merged column is not equal to the values in the in_a and in_b columns, then tables a and b differ in at least one row. This assumes neither table contains duplicate rows, because UNION removes duplicates.
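A small sketch of the three-count check, run through Python's sqlite3 module on an in-memory database rather than Postgres. The sample rows and the variable tables_differ are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (user_id int, contact boolean, promote boolean);
    CREATE TABLE b (user_id int, contact boolean, promote boolean);
    INSERT INTO a VALUES (1, 1, 1), (2, 0, 0);
    INSERT INTO b VALUES (1, 1, 0), (2, 0, 0);
""")

# UNION keeps each distinct row once, so identical tables give merged = in_a = in_b.
merged, in_a, in_b = conn.execute("""
    SELECT (SELECT count(*) FROM (SELECT * FROM a UNION SELECT * FROM b)) AS merged,
           (SELECT count(*) FROM a) AS in_a,
           (SELECT count(*) FROM b) AS in_b
""").fetchone()

tables_differ = not (merged == in_a == in_b)
print(merged, in_a, in_b, tables_differ)  # 3 2 2 True
```

The check only says that a difference exists; it does not say which rows differ, so it works best as a cheap first test before one of the join-based queries.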

Related

How do I join two tables together (one to many relationship), but only select the 3rd match from the second table?

I have two tables, table A and table B. There are multiple entries in table B for each entry in table A when joining them together, but I only want to match the 3rd value from table B, which is neither the maximum nor the minimum of the values. The values can be ordered, and it will always be the 3rd value after ordering. Is there a way to do this? Thank you!
WITH
ranked_b AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY key ORDER BY val) AS key_rank
FROM
table_b
)
SELECT
*
FROM
table_a
INNER JOIN
ranked_b
ON ranked_b.key = table_a.key
AND ranked_b.key_rank = 3
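Here is a runnable sketch of the ROW_NUMBER() approach, using Python's sqlite3 module (window functions need SQLite 3.25 or later). The column val and the sample data are assumptions; the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key int);
    CREATE TABLE table_b (key int, val int);
    INSERT INTO table_a VALUES (1), (2);
    -- key 1 has four values, key 2 has only two
    INSERT INTO table_b VALUES (1, 50), (1, 10), (1, 30), (1, 40),
                               (2, 7), (2, 9);
""")

rows = conn.execute("""
    WITH ranked_b AS (
        SELECT key, val,
               ROW_NUMBER() OVER (PARTITION BY key ORDER BY val) AS key_rank
        FROM table_b
    )
    SELECT table_a.key, ranked_b.val
    FROM table_a
    INNER JOIN ranked_b
            ON ranked_b.key = table_a.key
           AND ranked_b.key_rank = 3
""").fetchall()

print(rows)  # [(1, 40)]
```

Because of the inner join, key 2 drops out entirely: it has fewer than three rows in table_b. Switching to a LEFT JOIN would keep it with a NULL value instead.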
Consider the approach below (BigQuery syntax):
select key,
array_agg(value order by value limit 3)[safe_ordinal(3)] as value
from tableA
left join tableB
on key = foreignkey
group by key
You can use a correlated subquery (the ORDER BY is what makes "the 3rd value" well defined):
select a.*,
(select b.value
from b
where b.key = a.key
order by b.value
limit 1 offset 2
) as third_value
from a;

Postgres, return default row if id doesn't exist in table

I'm selecting rows from a table where the id is in an array. This works, but I also want to return default values for rows that do not exist in the table.
I currently have
SELECT
id,
column1
FROM
table_name
WHERE
id = ANY(ids_array_variable)
However, if some of the ids in the array do not exist in the table, the result is short a few rows. I need it to still return a default record of {id, default_value} for those ids, so that the result always has the same number of entries as ids_array_variable.
Use a left join:
select a.id,
coalesce(t.column1, 'some default') as column1
from unnest(ids_array_variable) as a(id)
left join table_name t on t.id = a.id;
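The same idea as a runnable sketch with Python's sqlite3 module. SQLite has no unnest(), so a VALUES list plays the role of the unnested array here; the helper lookup_with_defaults and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id int, column1 text);
    INSERT INTO table_name VALUES (1, 'one'), (3, 'three');
""")

def lookup_with_defaults(ids, default="some default"):
    """Return one (id, column1) row per requested id, filling gaps with default."""
    if not ids:
        return []
    # One "(?)" per id builds the VALUES list that replaces unnest(array).
    placeholders = ", ".join("(?)" for _ in ids)
    sql = f"""
        WITH a(id) AS (VALUES {placeholders})
        SELECT a.id, COALESCE(t.column1, ?) AS column1
        FROM a
        LEFT JOIN table_name t ON t.id = a.id
        ORDER BY a.id
    """
    return conn.execute(sql, [*ids, default]).fetchall()

print(lookup_with_defaults([1, 2, 3]))
# [(1, 'one'), (2, 'some default'), (3, 'three')]
```

The list of ids drives the query and the table is only joined in, which is why the result always has one row per requested id.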
If the table serviceconfig has no row with the key 'my_key', use the default value
(thanks to @a-horse-with-no-name):
SELECT COALESCE(KV.value, def.val) as value
FROM (
select def.val from unnest(ARRAY ['default']) as def(val)) as def
LEFT JOIN
(SELECT *
FROM serviceconfig
WHERE key = 'my_key') KV on true;

Compare the results of a ROW COUNT

I have 2 databases on the same server and I need to compare the records in each one, since one of the databases is not importing all the information.
I was trying to do a row count, but it's not working.
Currently I am exporting batches of approximately 100,000 rows and looking them up in Excel.
Let's say I want a query that counts the rows for each ID in TABLE A and compares that count with the count for the same ID in TABLE B. Since the IDs are the same, the counts should match, and I want the query to return the IDs where the counts differ.
--this table will contain the count of occurrences of each ID in tableA
declare @TableA_Results table(
ID bigint,
Total bigint
)
insert into @TableA_Results
select ID,count(*) from database1.TableA
group by ID
--this table will contain the count of occurrences of each ID in tableB
declare @TableB_Results table(
ID bigint,
Total bigint
);
insert into @TableB_Results
select ID,count(*) from database2.TableB
group by ID
--this table will contain the IDs that don't have the same count in both tables
declare @Discordances table(
ID bigint,
TotalA bigint,
TotalB bigint
)
insert into @Discordances
select TA.ID,TA.Total,TB.Total
from @TableA_Results TA
inner join @TableB_Results TB on TA.ID=TB.ID and TA.Total!=TB.Total
--the final output
select * from @Discordances
The question is vague, but maybe this SQL Code might help nudge you in the right direction.
It grabs the IDs and Counts of each ID from database one, the IDs and counts of IDs from database two, and compares them, listing out all the rows where the counts are DIFFERENT.
WITH DB1Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseOne.dbo.TableOne
GROUP BY ID
), DB2Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseTwo.dbo.TableTwo
GROUP BY ID
)
SELECT a.ID, a.CountOfIDs AS DBOneCount, b.CountOfIDs AS DBTwoCount
FROM DB1Counts a
INNER JOIN DB2Counts b ON a.ID = b.ID
WHERE a.CountOfIDs <> b.CountOfIDs
This SQL refers to the tables using the "Database.Schema.Table" notation, so replace "DatabaseOne" and "DatabaseTwo" with the names of your two databases, and replace TableOne and TableTwo with the names of your tables (I'm assuming they're the same). It sets up two selects, one for each database, each grouping by ID to get the count per ID. It then joins these two selects on ID and returns all rows where the counts are different. Note that the inner join only reports IDs that exist in both tables.
You could full outer join two aggregate queries and pull out ids that are either missing in one table, or for which the record count is different:
select coalesce(ta.id, tb.id), ta.cnt, tb.cnt
from
(select id, count(*) cnt from tableA group by id) ta
full outer join (select id, count(*) cnt from tableB group by id) tb
on ta.id = tb.id
where
coalesce(ta.cnt, -1) <> coalesce(tb.cnt, -1)
You seem to want aggregation and a full join:
select coalesce(a.id, b.id) as id, a.cnt, b.cnt
from (select id, count(*) as cnt
from a
group by id
) a full join
(select id, count(*) as cnt
from b
group by id
) b
on a.id = b.id
where coalesce(a.cnt, 0) <> coalesce(b.cnt, 0);
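A runnable sketch of the per-ID count comparison, using Python's sqlite3 module. Note the swap: FULL JOIN is only available in newer SQLite releases, so this version gets the same result with UNION ALL and conditional sums instead of the full join shown above. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id int);
    CREATE TABLE b (id int);
    INSERT INTO a VALUES (1), (1), (2), (3), (5);
    INSERT INTO b VALUES (1), (2), (2), (4), (5);
""")

# Tag each row with its source table, then sum the tags per id.
# An id missing from one table ends up with a count of 0 on that side,
# which is what coalesce(cnt, 0) achieves in the full-join version.
mismatches = conn.execute("""
    SELECT id, SUM(in_a) AS cnt_a, SUM(in_b) AS cnt_b
    FROM (
        SELECT id, 1 AS in_a, 0 AS in_b FROM a
        UNION ALL
        SELECT id, 0, 1 FROM b
    )
    GROUP BY id
    HAVING SUM(in_a) <> SUM(in_b)
    ORDER BY id
""").fetchall()

print(mismatches)  # [(1, 2, 1), (2, 1, 2), (3, 1, 0), (4, 0, 1)]
```

IDs 3 and 4 exist in only one table and are still reported, which an inner join on ID would miss. ID 5 has matching counts and is left out.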

Value present in more than one table

I have 3 tables. All of them have a column - id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in the third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
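A sketch of the pairwise EXISTS check, run with Python's sqlite3 module on in-memory tables. The helpers make_db and shared_id_exists are hypothetical names, and SQLite is only a stand-in for PostgreSQL here.

```python
import sqlite3

def make_db(a_ids, b_ids, c_ids):
    """Build an in-memory database with tables a, b and c holding the given ids."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        conn.execute(f"CREATE TABLE {name} (id int)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    return conn

def shared_id_exists(conn):
    """True if any id appears in at least two of the three tables."""
    row = conn.execute("""
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """).fetchone()
    return bool(row[0])

# Duplicates inside one table are fine; only cross-table overlap counts.
print(shared_id_exists(make_db([1, 1], [2], [3])))     # False
print(shared_id_exists(make_db([1, 1], [2], [3, 2])))  # True
```

Each EXISTS can stop at its first matching row, which fits the requirement that the query exit at the first occurrence.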

Logical AND between table elements in T-SQL

I have n tables, all with the same fields: Username and Value. The same Username can have multiple records in each table, but the combination Username/Value is unique within each one.
I want to join the tables into a single one which contains all the users who appear on all the tables with all the different (Username/Value) pairs.
Example
Table A: {(User1,Value1);(User1,Value2);(User2,Value2);(User3,Value4)}
Table B: {(User1,Value4);(User3,Value5)}
Table C: {(User1,Value5);(User1,Value2);(User2,Value7);(User3,Value8)}
Desired output
Table D: {(User1,Value1);(User1,Value2);(User1,Value4);(User1,Value5);(User3,Value4);(User3,Value5);(User3,Value8)}
Now I'm doing multiple joins (using perl) like this
SELECT *
INTO $target_table
FROM (SELECT *
FROM $table1
WHERE bname IN (SELECT DISTINCT bname FROM $table2)
UNION
SELECT *
FROM $table2
WHERE bname IN (SELECT DISTINCT bname FROM $table1)
) UN
and then doing the same join between a third table and target_table and so on, but I think there should be a better way.
Any hints?
You can use UNION for this:
SELECT username, value
FROM $table1
UNION
SELECT username, value
FROM $table2
...
SELECT username, value
FROM $tablex
This will return you distinct records. If you are interested in duplicates, use UNION ALL.
Given your edits, it appears you only want to return records if the user is in all the tables.
Breaking that down, you need to do a few things. First, combine all your records together again, but this time denote which table each are coming from. Then you need to know the count of tables each user is in. Finally you need to check that number against the overall number of tables.
Here's one way using a few CTEs:
WITH CTE AS (
SELECT username, value, 1 AS tbl
FROM t1
UNION
SELECT username, value, 2 AS tbl
FROM t2
UNION
SELECT username, value, 3 AS tbl
FROM t3
),
CTECnt AS (
SELECT username, COUNT(DISTINCT tbl) tblCnt
FROM CTE
GROUP BY username
),
CTEMaxCnt AS (
SELECT COUNT(DISTINCT tbl) MaxCnt
FROM CTE
)
SELECT C.username, C.value
FROM CTE C
JOIN CTECnt C2 ON C.username = C2.username
JOIN CTEMaxCnt C3 ON C2.tblCnt = C3.MaxCnt
With Combined As
(
Select 'A' As TableName, Username, Value
From TableA
Union All
Select 'B', Username, Value
From TableB
Union All
Select 'C', Username, Value
From TableC
)
Select C.Username, C.Value
From Combined As C
Join (
Select C1.Username
From Combined As C1
Group By C1.Username
Having Count(Distinct C1.TableName) = 3
) As Z
On Z.Username = C.Username
Group By C.Username, C.Value
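To check this approach against the example in the question, here is a sketch that loads the sample tables A, B and C into an in-memory SQLite database via Python's sqlite3 module and runs the same query. SQLite stands in for SQL Server, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Username text, Value text);
    CREATE TABLE TableB (Username text, Value text);
    CREATE TABLE TableC (Username text, Value text);
    INSERT INTO TableA VALUES ('User1','Value1'), ('User1','Value2'),
                              ('User2','Value2'), ('User3','Value4');
    INSERT INTO TableB VALUES ('User1','Value4'), ('User3','Value5');
    INSERT INTO TableC VALUES ('User1','Value5'), ('User1','Value2'),
                              ('User2','Value7'), ('User3','Value8');
""")

rows = conn.execute("""
    WITH Combined AS (
        SELECT 'A' AS TableName, Username, Value FROM TableA
        UNION ALL
        SELECT 'B', Username, Value FROM TableB
        UNION ALL
        SELECT 'C', Username, Value FROM TableC
    )
    SELECT C.Username, C.Value
    FROM Combined AS C
    JOIN (
        -- users that appear in all three source tables
        SELECT C1.Username
        FROM Combined AS C1
        GROUP BY C1.Username
        HAVING COUNT(DISTINCT C1.TableName) = 3
    ) AS Z ON Z.Username = C.Username
    GROUP BY C.Username, C.Value
    ORDER BY C.Username, C.Value
""").fetchall()

for row in rows:
    print(row)
```

This returns the seven pairs listed as Table D in the question: User2 is dropped because it is missing from Table B, and (User1,Value2) appears once even though it is in both A and C.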