Improving a query to find out-of-sync values between two tables

I have the following query:
SELECT
tableOneId,
SUM(a+b+c) AS tableOneData,
MIN(d) AS tableTwoData
FROM
tableTwo JOIN tableOne ON tableOneId = tableTwoId
GROUP BY
tableOneId
All of the mentioned columns are declared as numeric(30,6) NOT NULL.
In tableOne, the sum of columns a, b, and c across all rows with a given id should equal column d for that id in tableTwo.
A simple example of this:
Table One (id here should read tableOneId to match above query)
id=1, a=1, b=0, c=0
id=1, a=0, b=2, c=0
id=2, a=1, b=0, c=0
Table Two (id here should read tableTwoId to match above query)
id=1, d=3
id=2, d=1
My first iteration used SUM(d)/COUNT(*) but division is messy so I'm currently using MIN(d). What would be a better way to write this query?

Try this:
SELECT
tableOneId,
tableOneData,
d AS tableTwoData
FROM tableTwo
JOIN (select tableOneId, sum(a + b + c) AS tableOneData
from tableone
group by 1) x ON tableOneId = tableTwoId
where tableOneData <> d;
This will return all rows that have incorrect data in table 2.
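As a rough check, the subquery approach above can be run against the sample data from the question. This is only a sketch using Python's built-in sqlite3 module rather than the asker's database; the row with id 3 is a hypothetical out-of-sync entry added for illustration, and GROUP BY names the column explicitly instead of using the positional form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableOne (tableOneId INTEGER NOT NULL, a NUMERIC NOT NULL,
                       b NUMERIC NOT NULL, c NUMERIC NOT NULL);
CREATE TABLE tableTwo (tableTwoId INTEGER PRIMARY KEY, d NUMERIC NOT NULL);
-- ids 1 and 2 come from the question; id 3 is a hypothetical mismatch
INSERT INTO tableOne VALUES (1, 1, 0, 0), (1, 0, 2, 0), (2, 1, 0, 0), (3, 1, 1, 0);
INSERT INTO tableTwo VALUES (1, 3), (2, 1), (3, 5);
""")

# Aggregate tableOne first, then join one row per id against tableTwo.
mismatches = conn.execute("""
SELECT tableOneId, tableOneData, d AS tableTwoData
FROM tableTwo
JOIN (SELECT tableOneId, SUM(a + b + c) AS tableOneData
      FROM tableOne
      GROUP BY tableOneId) x ON tableOneId = tableTwoId
WHERE tableOneData <> d
""").fetchall()
print(mismatches)
```

Because the aggregation happens before the join, d is never repeated per tableOne row, so neither MIN(d) nor SUM(d)/COUNT(*) is needed.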

select tableOneId, SUM(a) + SUM(b) + SUM(c) as tableOneData, d as tableTwoData
from tableTwo JOIN tableOne ON tableOneId = tableTwoId
GROUP BY tableOneId, d

Related

SQL Finding duplicate values in two of the three columns of each row

Let's say we have three columns: A, B, and C.
I would like to filter the results as follows:
The values of A and B are the same (duplicated) for > 1 (more than 1) row, and the value of C is always different.
In the attached image, the values that appear selected would meet the conditions mentioned above.
What I've tried:
SELECT
a.notation as A, a.gene as B, b.id as C
FROM
`db-dummy`.sgdata c
join `db-dummy`.g_info a on a.rec_id = c.gen_id
join `db-dummy`.spec_data b on b.rec_id = c.spec_id
GROUP BY A, B
HAVING COUNT(*) > 1;
I thought that using GROUP BY and HAVING COUNT(*) > 1 I could get the desired result, but I get the following error:
SQL Error [1055] [42000]: (conn=1632) Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'db-dummy.b.spec_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

If you had a single table, I would suggest just using exists. But because you have a join, use window functions. If you are looking for different values of id:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
MIN(b.id) OVER (PARTITION BY a.notation, a.gene) as min_id,
MAX(b.id) OVER (PARTITION BY a.notation, a.gene) as max_id
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE min_id <> max_id;
If you are just looking for multiple rows for a given A and B, then you can use:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
COUNT(*) OVER (PARTITION BY a.notation, a.gene) as cnt
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE cnt > 1;
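The difference between the two window-function queries can be seen on a small data set. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later) and assumes a hypothetical, already-joined table named joined_rows with columns A, B, C in place of the three-way join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- joined_rows is a hypothetical stand-in for the result of the three-table join
CREATE TABLE joined_rows (A TEXT, B TEXT, C INTEGER);
INSERT INTO joined_rows VALUES
  ('n1', 'g1', 10), ('n1', 'g1', 11),   -- same A and B, different C
  ('n2', 'g2', 20),                     -- single row
  ('n3', 'g3', 30), ('n3', 'g3', 30);   -- same A and B, same C
""")

# Pairs (A, B) that occur with more than one distinct C.
different_c = conn.execute("""
SELECT A, B, C
FROM (SELECT A, B, C,
             MIN(C) OVER (PARTITION BY A, B) AS min_id,
             MAX(C) OVER (PARTITION BY A, B) AS max_id
      FROM joined_rows) x
WHERE min_id <> max_id
ORDER BY A, B, C
""").fetchall()

# Pairs (A, B) that occur on more than one row, whatever C is.
multiple_rows = conn.execute("""
SELECT A, B, C
FROM (SELECT A, B, C,
             COUNT(*) OVER (PARTITION BY A, B) AS cnt
      FROM joined_rows) x
WHERE cnt > 1
ORDER BY A, B, C
""").fetchall()

print(different_c)
print(multiple_rows)
```

Only the MIN/MAX version excludes the ('n3', 'g3') group, where the pair repeats but C does not vary.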

SELECT * FROM `db-dummy`.sgdata a
LEFT JOIN
(SELECT COUNT(Id) as count, notation, gene
FROM `db-dummy`.sgdata
GROUP BY notation, gene
HAVING COUNT(id) > 1) b
on a.notation = b.notation AND a.gene = b.gene

Summarized table in PostgreSQL for better performance

I am using PostgreSQL as my database. I have a table MASTER(A, B, C, D, N1, N2, N3, N4, N5, N6) where the primary key is (A, B, C, D) and N1, N2, N3, N4, N5, N6 are the numeric columns.
I have a query as below to get the summarized data of each A selected from each list in MASTERCOMB.
SELECT MASTERCOMB.A
,STATS.sumn1
,STATS.sumn2
,STATS.sumn3
,STATS.sumn4
,STATS.sumn5
,STATS.sumn6
FROM (WITH
sum1 AS (SELECT A, SUM(N1) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N1) DESC LIMIT $2),
sum2 AS (SELECT A, SUM(N2) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N2) DESC LIMIT $2),
sum3 AS (SELECT A, SUM(N3) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N3) DESC LIMIT $2),
sum4 AS (SELECT A, SUM(N4) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N4) DESC LIMIT $2),
sum5 AS (SELECT A, SUM(N5) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N5) DESC LIMIT $2),
sum6 AS (SELECT A, SUM(N6) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N6) DESC LIMIT $2)
SELECT DISTINCT COALESCE(sum1.A, sum2.A, sum3.A, sum4.A, sum5.A, sum6.A) A
FROM sum1
FULL OUTER JOIN sum2 ON sum2.A = sum1.A
FULL OUTER JOIN sum3 ON sum3.A = sum1.A
FULL OUTER JOIN sum4 ON sum4.A = sum1.A
FULL OUTER JOIN sum5 ON sum5.A = sum1.A
FULL OUTER JOIN sum6 ON sum6.A = sum1.A) MASTERCOMB
LEFT JOIN (SELECT A
,SUM(N1) sumn1
,SUM(N2) sumn2
,SUM(N3) sumn3
,SUM(N4) sumn4
,SUM(N5) sumn5
,SUM(N6) sumn6
FROM MASTER WHERE B = $1 GROUP BY A) AS STATS
ON STATS.A = MASTERCOMB.A
This is just one kind of query with B in the WHERE clause. I may have to query with different combinations like 'WHERE C = $3' OR 'WHERE D = $4'. In rare cases I may have to query with combinations of multiple conditions on B, C and D together.
As the table grows, the performance of the queries could drop, so I am considering two approaches:
Approach #1:
Create Summary Tables SMRY_A_B, SMRY_A_C, SMRY_A_D
On each insert, update and delete of MASTER table, SUM the values and insert/update/delete respective tables
Approach #2:
Create a Summary table SMRY_A_B_C_D with primary key (A, B, C, D)
On each insert, update and delete of MASTER table, SUM the values and insert/update/delete SMRY_A_B_C_D table
possible values for SMRY_A_B_C_D could be
(valA, valB, 'N/A', 'N/A', sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
(valA, 'N/A', valC, 'N/A', sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
(valA, 'N/A', 'N/A', valD, sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
Questions:
Which approach is better to go with?
Or should I skip both approaches and query the master table directly? If so, how should I optimize the query?
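For illustration only, Approach #1 could look roughly like the following. This is a sketch in SQLite via Python's sqlite3 module, not PostgreSQL trigger syntax; the trigger names are hypothetical, only two of the six numeric columns are shown, and only SMRY_A_B is maintained. An update trigger would presumably combine the two steps (subtract the OLD values, add the NEW ones).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (a TEXT, b TEXT, c TEXT, d TEXT, n1 NUMERIC, n2 NUMERIC,
                     PRIMARY KEY (a, b, c, d));
CREATE TABLE smry_a_b (a TEXT, b TEXT, sumn1 NUMERIC, sumn2 NUMERIC,
                       PRIMARY KEY (a, b));

-- On insert: create the summary row if it is missing, then add the new values.
CREATE TRIGGER master_after_insert AFTER INSERT ON master
BEGIN
    INSERT INTO smry_a_b (a, b, sumn1, sumn2)
    SELECT NEW.a, NEW.b, 0, 0
    WHERE NOT EXISTS (SELECT 1 FROM smry_a_b WHERE a = NEW.a AND b = NEW.b);
    UPDATE smry_a_b
    SET sumn1 = sumn1 + NEW.n1, sumn2 = sumn2 + NEW.n2
    WHERE a = NEW.a AND b = NEW.b;
END;

-- On delete: subtract the removed values from the summary row.
CREATE TRIGGER master_after_delete AFTER DELETE ON master
BEGIN
    UPDATE smry_a_b
    SET sumn1 = sumn1 - OLD.n1, sumn2 = sumn2 - OLD.n2
    WHERE a = OLD.a AND b = OLD.b;
END;
""")

def summary_rows(conn):
    """Return the current contents of the summary table, ordered by key."""
    return conn.execute(
        "SELECT a, b, sumn1, sumn2 FROM smry_a_b ORDER BY a, b").fetchall()

conn.executemany("INSERT INTO master VALUES (?, ?, ?, ?, ?, ?)", [
    ("a1", "b1", "c1", "d1", 1, 10),
    ("a1", "b1", "c2", "d1", 2, 20),
    ("a2", "b1", "c1", "d1", 5, 50),
])
conn.execute("DELETE FROM master WHERE a = 'a1' AND c = 'c2'")
print(summary_rows(conn))
```

The trade-off is that every write to MASTER pays for the extra summary update, in exchange for reads that no longer need to aggregate.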

Compare fields from different rows

First off I am using SQL Server.
I am joining a table on itself like in the example below:
SELECT t.theDate,
s.theDate,
t.bitField,
s.bitField,
t.NAME,
s.NAME
FROM table1 t
INNER JOIN table1 s ON t.NAME = s.NAME
If I take an arbitrary row X from the resulting dataset, can I compare values in any field on row X to values in any field on row X-1 or row X+1?
Example: I want to compare t.theDate on row 5 to s.theDate on row 4 or s.theDate on row 3.
Sample data looks like:
Desired results:
I want to pull all pairs of rows where the t.bitfield and s.bitfield are opposite and t.theDate and s.theDate are opposite.
From the image, that would be rows (3 & 4), (5 & 6), (7 & 8) ... etc.
I really appreciate any help!
Can it be done?

Variant 1: It looks like you want a ranking function.
if object_id('tempdb..#TmpOrderedTable') is not null drop table #TmpOrderedTable
select *, row_number() over (order by columnlist, (select 0)) rn
into #TmpOrderedTable
from table1 t
select *
from #TmpOrderedTable t0
inner join #TmpOrderedTable tplus on tplus.rn = t0.rn + 1 -- next one
inner join #TmpOrderedTable tminus on tminus.rn = t0.rn - 1 -- previous one
Variant 2:
To get scalar values you can use the window functions lag and lead, or a subquery.
Variant 3:
You can use a self-join, but you have to specify a unique, non-arbitrary key if you don't want duplicates.
Variant 4:
You can use apply.
Your question isn't too clear, so I hope this is what you were after.
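The lag/lead variant might look something like the sketch below. It runs on SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server, although SQL Server 2012 and later provide LAG and LEAD with the same basic form. The sample rows and the choice to order by theDate within each NAME are assumptions, since the question's sample data is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (theDate TEXT, bitField INTEGER, NAME TEXT);
-- hypothetical sample rows
INSERT INTO table1 VALUES
  ('2020-01-01', 1, 'x'),
  ('2020-01-02', 0, 'x'),
  ('2020-01-03', 1, 'x');
""")

# For each row, expose the previous and next theDate within the same NAME.
rows = conn.execute("""
SELECT theDate, bitField,
       LAG(theDate)  OVER (PARTITION BY NAME ORDER BY theDate) AS prevDate,
       LEAD(theDate) OVER (PARTITION BY NAME ORDER BY theDate) AS nextDate
FROM table1
ORDER BY theDate
""").fetchall()

for row in rows:
    print(row)
```

The first row has no previous value and the last row has no next value, so those columns come back as NULL (None in Python).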

How about this?
WITH ts as (
SELECT t.theDate as theDate1, s.theDate as theDate2,
t.bitField as bitField1, s.bitField as bitField2,
t.NAME -- there is only one name
FROM table1 t INNER JOIN
table1 s
ON t.NAME = s.NAME
)
SELECT ts.*
FROM ts
WHERE EXISTS (SELECT 1
FROM ts ts2
WHERE ts2.name = ts.name AND
ts2.theDate1 = ts.theDate2 AND
ts2.theDate2 = ts.theDate1 AND
ts2.bitField1 = ts.bitField2 AND
ts2.bitField2 = ts.bitField1
);

Join Multiple Rows into 1 row different columns SQL Server

I am working on a query and would love some help.
I will provide a simplified version of the query in hopes that it communicates what I am attempting to do.
Given the following Tables:
TableA (RecordNumber, TableAID, SomeValue)
TableB (RecordNumber, X, Y, Z)
TableC (RecordNumber, D, E, F, G)
The result set I am looking for:
TableB.RecordNumber, X, Y, Z, D, E, F, G, SomeValue1, SomeValue2, SomeValue3, SomeValue4
My query currently is
Select
TableB.RecordNumber, X, Y, Z, D, E, F, G, SomeValue
from TableB
inner join
TableC on TableB.RecordNumber = TableC.RecordNumber
inner join
TableA on TableB.RecordNumber = TableA.RecordNumber
I realize that this is returning 1 row per SomeValue in TableA.
What I would like to do is combine each row for a RecordNumber into 1 row populating the SomeValueX with the SomeValue value from row X for that record number.
Thoughts?

You can't have a variable number of columns without using dynamic SQL but you can have a variable number of comma-separated values in a single column.
SELECT b.RecordNumber, X, Y, Z, D, E, F, G,
STUFF((
SELECT ',' + SomeValue
FROM TableA a
WHERE a.RecordNumber = b.RecordNumber
ORDER BY SomeValue
FOR XML PATH('')
), 1, 1, '') AS SomeValues
FROM TableB b
INNER JOIN TableC c
ON b.RecordNumber = c.RecordNumber
If TableA.SomeValue is not of a character or string data type, you would also want to cast it to a varchar of an appropriate length.
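To see the shape of the result, here is a rough equivalent that can be run locally. Note that it swaps in SQLite's group_concat aggregate (via Python's sqlite3 module) for the STUFF / FOR XML PATH idiom, which is specific to SQL Server; the sample rows are made up, and group_concat does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (RecordNumber INTEGER, TableAID INTEGER, SomeValue TEXT);
CREATE TABLE TableB (RecordNumber INTEGER, X TEXT, Y TEXT, Z TEXT);
CREATE TABLE TableC (RecordNumber INTEGER, D TEXT, E TEXT, F TEXT, G TEXT);
-- hypothetical sample rows
INSERT INTO TableB VALUES (1, 'x1', 'y1', 'z1'), (2, 'x2', 'y2', 'z2');
INSERT INTO TableC VALUES (1, 'd1', 'e1', 'f1', 'g1'), (2, 'd2', 'e2', 'f2', 'g2');
INSERT INTO TableA VALUES (1, 1, 'p'), (1, 2, 'q'), (2, 3, 'r');
""")

# One output row per RecordNumber; all SomeValue entries collapse into one column.
rows = conn.execute("""
SELECT b.RecordNumber, X, Y, Z, D, E, F, G,
       (SELECT group_concat(SomeValue, ',')
        FROM TableA a
        WHERE a.RecordNumber = b.RecordNumber) AS SomeValues
FROM TableB b
INNER JOIN TableC c ON b.RecordNumber = c.RecordNumber
ORDER BY b.RecordNumber
""").fetchall()

for row in rows:
    print(row)
```

As in the SQL Server version, TableA is read in a correlated subquery rather than joined, so the outer query stays at one row per record.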

Fetching rows from two sql tables

I have two tables RecordMaster and Dummy
Both have columns like Mobile_Number and Insert_Date
I want the following:
1) From the Dummy table, fetch the rows whose Mobile_Number and Insert_Date are the same as in RecordMaster.
2) From the Dummy table, fetch the rows whose Mobile_Number and Insert_Date are different from those in RecordMaster.
3) After that, for condition 1), fetch only the rows whose Cpv_Status is not null.
Cpv_Status is a column in the Dummy table.
Please help.

To meet your 1) and 3) needs (include the WHERE clause only if you need it):
SELECT d.*
FROM Dummy d
INNER JOIN RecordMaster r
ON r.mobile_number = d.mobile_number
AND r.insert_date = d.insert_date
WHERE d.Cpv_Status IS NOT NULL
For 2):
SELECT d.*
FROM Dummy d
WHERE NOT EXISTS
(SELECT 1
FROM RecordMaster r
WHERE r.mobile_number = d.mobile_number
AND r.insert_date = d.insert_date
)
To insert these:
INSERT INTO RecordMaster(mobile_number, insert_date)
SELECT d.mobile_number, insert_date
FROM Dummy d
WHERE NOT EXISTS
(SELECT 1
FROM RecordMaster r
WHERE r.mobile_number = d.mobile_number
AND r.insert_date = d.insert_date
)
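The two SELECT statements above can be tried on a small data set. This is a sketch using Python's sqlite3 module; the sample rows and the text types chosen for the columns are assumptions, since the question does not give the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RecordMaster (Mobile_Number TEXT, Insert_Date TEXT);
CREATE TABLE Dummy (Mobile_Number TEXT, Insert_Date TEXT, Cpv_Status TEXT);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO RecordMaster VALUES (?, ?)", [
    ("111", "2020-01-01"),
    ("222", "2020-01-02"),
])
conn.executemany("INSERT INTO Dummy VALUES (?, ?, ?)", [
    ("111", "2020-01-01", "ok"),   # matches, status set
    ("222", "2020-01-02", None),   # matches, status missing
    ("333", "2020-01-03", "ok"),   # no such number in RecordMaster
    ("111", "2020-02-01", None),   # number exists, date differs
])

# 1) and 3): Dummy rows with a match in RecordMaster and a non-null status.
matching = conn.execute("""
SELECT d.*
FROM Dummy d
INNER JOIN RecordMaster r
        ON r.Mobile_Number = d.Mobile_Number
       AND r.Insert_Date = d.Insert_Date
WHERE d.Cpv_Status IS NOT NULL
""").fetchall()

# 2): Dummy rows with no match in RecordMaster.
non_matching = conn.execute("""
SELECT d.*
FROM Dummy d
WHERE NOT EXISTS (SELECT 1
                  FROM RecordMaster r
                  WHERE r.Mobile_Number = d.Mobile_Number
                    AND r.Insert_Date = d.Insert_Date)
ORDER BY d.Mobile_Number, d.Insert_Date
""").fetchall()

print(matching)
print(non_matching)
```

A row counts as matching only when both the number and the date agree, which is why ('111', '2020-02-01') lands in the second result.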

The following query returns all records in Dummy that match a record in RecordMaster:
SELECT b.Mobile_Number, b.Insert_Date, b.Cpv_Status
FROM RecordMaster a, Dummy b
WHERE a.Mobile_Number = b.Mobile_Number AND a.Insert_Date = b.Insert_Date
The following query returns the records in Dummy that have no matching record in RecordMaster:
SELECT a.Mobile_Number, a.Insert_Date, a.Cpv_Status
FROM Dummy a
WHERE STR(a.Mobile_Number) + STR(a.Insert_Date) NOT IN
(SELECT STR(Mobile_Number) + STR(Insert_Date) FROM RecordMaster)
If you need both of these results combined in one result set, use UNION like this:
SELECT b.Mobile_Number, b.Insert_Date, b.Cpv_Status
FROM RecordMaster a, Dummy b
WHERE a.Mobile_Number = b.Mobile_Number AND a.Insert_Date = b.Insert_Date
UNION
SELECT a.Mobile_Number, a.Insert_Date, a.Cpv_Status
FROM Dummy a
WHERE STR(a.Mobile_Number) + STR(a.Insert_Date) NOT IN
(SELECT STR(Mobile_Number) + STR(Insert_Date) FROM RecordMaster)
Lastly, you can apply any filter you want to the final result set like this:
SELECT * FROM (
SELECT b.Mobile_Number, b.Insert_Date, b.Cpv_Status
FROM RecordMaster a, Dummy b
WHERE a.Mobile_Number = b.Mobile_Number AND a.Insert_Date = b.Insert_Date
UNION
SELECT a.Mobile_Number, a.Insert_Date, a.Cpv_Status
FROM Dummy a
WHERE STR(a.Mobile_Number) + STR(a.Insert_Date) NOT IN
(SELECT STR(Mobile_Number) + STR(Insert_Date) FROM RecordMaster)
) x WHERE Cpv_Status IS NOT NULL

We Keep Coding
