How to find doubles in master-child table? - sql

I need help with a query to find doubles. Let met explain the situation by example:
tableA (the master table) has a key field keyA with these values :
keyA
1
2
3
etc
tableB (the client table) has a foreign key field keyA and a value field, fieldB
keyA fieldB
1 a
1 b
2 a
2 b
3 a
3 c
4 a
4 b
4 c
etc
So, the values for fieldB in child table tableB are:
for tableA.keyA = 1 are: a and b
for tableA.keyA = 2 are: a and b
for tableA.keyA = 3 are: a and c
for tableA.keyA = 4 are: a, b and c
Now, given a value for keyA I need to find all records in tableA that have matching records in tableB for the field fieldB.
For example, if I search with keyA = 1 then
tableA.keyA = 2 is OK because both have same tableB.fieldB (a and b versus a and b)
tableA.keyA = 3 is not OK because both have not same tableB.fieldB (a and b versus a and c)
tableA.keyA = 4 is not OK because both have not same tableB.fieldB (a and b versus a, b and c)
I need a query that can give me this result. I hope someone can help me with this or can point me into the right direction.

Try this simple query , hope this will solve your problem
DECLARE #vkey int = 1
;WITH cte_test AS (
SELECT keyA,(SELECT ','+fieldb FROM tableB t1 WHERE t1.keyA = t.keyA FOR XML path('')) AS rslt
from tableB t
GROUP BY t.keyA)
SELECT t2.*
FROM cte_test t1
INNER JOIN cte_test t2 ON t1.[rslt] = t2.[rslt] AND t2.[keyA] <> t1.[keyA]
WHERE t1.[keyA] = #vkey
If there is no other item have the same combination , then there is no records in the result, otherwise it will return the matched items.

Assuming there are no duplicates, you can do this with a self-join and aggregation:
select c.keyA, c2.keyA
from (select c.*, count(*) over (partition by keyA) as numBs
from clientTable c
) c join
(select c.*, count(*) over (partition by keyA) as numBs
from clientTable c
) c2
on c2.fieldB = c.fieldB and
c2.keyA <> c.keyA and
c.keyA = 1 -- or whatever key you want to check
where c.numBs = c2.numBs
group by c.keyA, c2.keyA, c.numBs, c2.numBs
having count(*) = c.numBs;
The idea is to count the number of fieldB values for each keyA. These need to be equal (where c.numBs = c2.numBs) and to check that all match (having count(*) = c.numBs).

Related

Select only those record which is either matching or not present in another table

Let's say there is a table A:
Name Class Number
A 1 50
B 2 30
C 3 20
A 1 10
And table B:
Name Class Number
A 1 50
C 3 20
Where the primary key is the Name column.
Question
I would like to keep only records in table A that match one of the following condition:
record from table A exists in table B and values are identical
record from table A does not exist in table B
Expected Result
Name Class Number
A 1 50 // kept because all values match record in table B
B 2 30 // kept because record doesn't exist in table B
C 3 20 // kept because all values match record in table B
I think that this should do it:
select a.*
from tablea a
where not exists (
select 1
from tableb b
where
a.name = b.name
and (a.class <> b.class or a.number <> b.number)
)
This phrases as: select records in table a for which no other record exists in table b with the same id and a different class or number.
Otherwise, we can also express the two conditions explicitly, like so:
select a.*
from tablea a
where
exists (
select 1
from tableb b
where a.name = b.name and a.class = b.class and a.number = b.number
)
or not exists (
select 1 from tableb b where a.name = b.name
)

SQL Server: how can I count values in one field different from another, with multiple values for same identifier?

I have two tables with one common column, and an identifying value that can be duplicate (several observations of same document).
An example:
TableA:
A_identifier | Value
-------------+-------
1 | A
1 | B
TableB:
B_identifier | A_identifier | Value
-------------+--------------+-------
1 | 1 | A
2 | 1 | B
3 | 1 | B
4 | 1 | C
The above example illustrates the type of situation I am looking for in my data - we have a case in TableA with multiple values, of which some are the same in TableB and some are not. So TableA.Value and TableB.Value represent the same concept.
I want to know for each TableA.A_identifier, how many rows of TableB have different values than TableA.Value. If there was only one observation per A_identifier, this could be solved with a not, but the multiple possible values prevent this.
What I have thought about doing is something like this (which does not work):
select distinct
b.B_identifier, a.A_identifier
from
TableB b
join
TableA a in b.A_identifer = a.A_identifier and b.Value != a.Value
While the query technically works, it returns the wrong result - it counts all the cases where the values in TableA and TableB are different in a given row. However, I want it to only count the values in TableB which are not present at all in TableA for each A_identifier.
I tried replacing the != with not in which is what I would do for a static parameter. This syntax is not supported.
I hope my question makes sense, and that somebody can help. Thank you in advance.
Try this query if it works for you,
SELECT COUNT(b.A_identifier)
FROM TableB b
LEFT JOIN TableA a
ON b.A_identifier = a.A_identifier
AND b.Value = a.Value
WHERE a.A_identifier IS NULL -- filters out inexisting value
AND EXISTS (SELECT 1
FROM TableA c
WHERE b.A_identifier = c.A_identifier) -- shows only A_identifier
-- that is present in TableA
However, if you want to get the count for each Value
SELECT b.A_identifier, b.Value, TOTAL_COUNT = COUNT(b.A_identifier)
FROM TableB b
LEFT JOIN TableA a
ON b.A_identifier = a.A_identifier
AND b.Value = a.Value
WHERE a.A_identifier IS NULL
AND EXISTS (SELECT 1
FROM TableA c
WHERE b.A_identifier = c.A_identifier)
GROUP BY b.A_identifier, b.Value
Use NOT EXISTS
select t1.A_identifier, count(t2.value)
from TableA t1
left join TableB t2 on t1.A_identifier = t2.A_identifier and
NOT EXISTS (
select 1
from TableA t3
where t3.A_identifier = t2.A_identifier and
t3.Value = t2.Value
)
group by t1.A_identifier
How about the following SQL?
select distinct TableA.A_identifier,
(select count(*) from TableB
where TableB.A_identifier = TableA.A_identifier
and not exists(
select * from TableA where A_identifier = TableB.A_identifier
and Value = TableB.Value)
)
as TableB_Rows
from TableA

select sql query to merge results

I have a table old_data and a table new_data. I want to write a select statement that gives me
Rows in old_data stay there
New rows in new_data get added to old_data
unique key is id so rows with id in new_data should update existing ones in old_data
I need to write a select statement that would give me old_data updated with new data and new data added to it.
Example:
Table a:
id count
1 2
2 19
3 4
Table b:
id count
2 22
5 7
I need a SELECT statement that gives me
id count
1 2
2 22
3 4
5 7
Based on your desired results:
SELECT
*
FROM
[TableB] AS B
UNION ALL
SELECT
*
FROM
[TableA] AS A
WHERE
A.id NOT IN (SELECT id FROM [TableB])
I think this would work pretty neatly with COALESCE:
SELECT a.id, COALESCE(b.count, a.count)
FROM a
FULL OUTER JOIN b
ON a.id = b.id
Note - if your RDBMS does not contain COALESCE, you can write out the function using CASE as follows:
SELECT a.id,
CASE WHEN b.count IS NULL THEN a.count
ELSE b.count END AS count
FROM ...
You can write a FULL OUTER JOIN as follows:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
UNION ALL
SELECT *
FROM b
LEFT a
ON b.id = a.id
You have to use UPSERT to update old data and add new data in Old_data table and select all rows from Old_data. Check following and let me know what you think about this query
UPDATE [old_data]
SET [count] = B.[count]
FROM [old_data] AS A
INNER JOIN [new_Data] AS B
ON A.[id] = B.[id]
INSERT INTO [old_data]
([id]
,[count])
SELECT A.[id]
,A.[count]
FROM [new_Data] AS A
LEFT JOIN [old_data] AS B
ON A.[id] = B.[id]
WHERE B.[id] IS NULL
SELECT *
FROM [old_data]

Select table as json array

I have three tables in PostgreSQL: A, B, C.
I want to get a row from table A with a specific id, plus all records from tables B and C with matching id as aggregated JSON.
For example:
Table A Table B Table C
---------------------------------------------------------------
id / colum1 / colum2 id/ colum 1 id / column1
1 someValue, somValue 1 someVal1 1 someVal1
1 someVal2 1 someVal2
The expected output for id = 1 would be:
a.column1 a.column2 ARRAY_JSON_B ARRAY_JSON_C
------------------------------------------------------------------------------
someValue someValue [{colum1:'someVal1'}, [{colum1:'someVal1'},
{colum1:'someVal2'}] {colum1:'someVal2'}]
This requires Postgres 9.3 or later.
Simple case
I suggest to use the simpler json_agg() that's meant for this purpose, in LATERAL joins:
SELECT *
FROM a
LEFT JOIN LATERAL (SELECT json_agg(b) AS array_json_b FROM b WHERE id = a.id) b ON true
LEFT JOIN LATERAL (SELECT json_agg(c) AS array_json_c FROM c WHERE id = a.id) c ON true
WHERE id = 1;
LEFT JOIN LATERAL ... ON true keeps rows in the result that have no match on the left side of the join. Details:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Subtle difference: This query returns NULL where no match is found in b or c, #stas' query with correlated subqueries returns an empty array instead. May or may not be important.
Actual answer
Your example in the question excludes the redundant id column in b and c from the result - which makes sense. To achieve this, you can't use #stas' simple correlated subquery. While it would still work for a single column instead of the whole row, it would lose the column name and produce a simple array. Also, it would not work for more than one column.
Use json_object_agg() for a single selected column (which also allows to chose the tag name freely):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_object_agg('colum1', colum1) AS array_json_b
FROM b WHERE id = a.id
) b ON true
LEFT JOIN LATERAL (
SELECT json_object_agg('colum1', colum1) AS array_json_c
FROM c WHERE id = a.id
) c ON true
WHERE id = 1;
Or use a subselect for any selection (col1 and col2 in this example):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_b
FROM (SELECT col1, col2 FROM b WHERE id = a.id) x
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_c
FROM (SELECT col1, col2 FROM c WHERE id = a.id) x
) c ON true
WHERE id = 1;
Related:
Return multiple columns of the same row as JSON array of objects
How do I return a jsonb array and array of objects from my data?
select
a.*,
to_json(array(select b from b where b.id = a.id)) array_json_b,
to_json(array(select c from c where c.id = a.id)) array_json_c
from a
where
a.id = 1;
I hope your Postgresql version is 9.3 or higher. There is a clever function to_json which can convert anything to json. So we take an array of all related rows from b and convert it. Same with c.

Select 2 Rows from Table when COUNT of another table

Here is the code that I currently have:
SELECT `A`.*
FROM `A`
LEFT JOIN `B` ON `A`.`A_id` = `B`.`value_1`
WHERE `B`.`value_2` IS NULL
AND `B`.`userid` IS NULL
ORDER BY RAND() LIMIT 2
What it currently is supposed to do is select 2 rows from A when the 2 rows A_id being selected are not in value_1 or value_2 in B. And the rows in B are specific to individual users with userid.
What I need to do is make it also so that also checks if there are already N rows in B matching a A_id (either in value_1, or value_2) and userid, and if there are more than N rows, it doesn't select the A row.
The following would handle your first request:
Select ...
From A
Left Join B
On ( B.value_1 = A.A_id Or B.value_2 = A.A_id )
And B.userid = #userid
Where B.<non-nullable column> Is Null
Part of the trick is moving your criteria into the ON clause of the Left Join. I'm not sure how the second part of your request fits with the first part. If there are no rows in B that match on value_1 or value_2 for the given user, then by definition that row count will be zero. Is it that you want it be the situation where there can only be a maximum number of rows in B matching on the given criteria? If so, then I'd write my query like so:
Select ...
From A
Where (
Select Count(*)
From B B2
Where ( B2.value_1 = A.A_id Or B2.value_2 = A.A_id )
And B2.userid = #userid
) <= #MaxItems