how can build SQL query for below mentioned problem? - sql

I have below mentioned two data sets, Table A and Table B, and I am trying to get to Table C as output dataset using Table A and TableB, can you help me with SQL query to come up wit this out put. I am mainly trying to calculate DueAmount column in TableC, and logic to derive this column is mentioned in Calculation column
Data Screenshot:
TableA, TableB and Output screenshot
I thought about trying the logic in which Table A can be expanded to multiple row for each period , and then join TableA with TableB, but I am looking for some logic which will be more efficient for large number of ID's.

The below will generate the results desired:
SELECT p.PeriodId, CASE WHEN b.BAmt < a.AAmt THEN 0 ELSE b.BAmt - a.AAmt END AS DueAmt
FROM (SELECT PeriodId FROM TableA UNION SELECT PeriodId FROM TableB) p
CROSS APPLY (SELECT SUM(Amount) AS AAmt FROM TableA WHERE PeriodId <= p.PeriodId) a
CROSS APPLY (SELECT SUM(Amount) AS BAmt FROM TableB WHERE PeriodId <= p.PeriodId) b

Related

duplicate query result when join table

I face issue about duplicate data when join table, here my sample data table I have
-- Table A
I want to join with
-- Table B
this my query notation for join both table,
select a.trans_id, name
from tableA a
inner join tableB b
on a.ID_Trans = b.trans_id
and this the result, why I get the duplicating data which should show only two lines of data, please help me to solve this case.
Firstly, as you have been told multiple times in the comments, this is working exactly as you have written, and (more importantly) as intended. You have 2 rows in tableA and those 2 rows match 2 rows in your table tableB according to the ON clause. This means that each join operation, for the each of the rows in tableA, results in 2 rows as well; thus 4 rows (2 * 2 = 4).
Considering that your table, TableA only has one column then it seems that you should be cleaning up that data and deleting the duplicates. There are plenty of examples on how to do that already (example).
Perhaps the column you show us in TableA is one many, and thus instead you have a denormalisation issue, and instead there should be another table with the details of Id_trans and a PRIMARY KEY or UNIQUE CONSTRAINT/INDEX on it. Then you would join fron that table to TableB.
Finally, what you might be after is an EXISTS, which would look like this:
SELECT B.trans_id, B.[name]
FROM dbo.TableB B
WHERE EXISTS(SELECT 1
FROM dbo.TableA A
WHERE A.ID_Trans = B.trans_id); --Odd that it's called ID_Trans in one table, and Trans_ID in another
As the comments mentioned your query does exactly what you asked it to do but I think you wanted something like:
select a.trans_id, a.name, b.name
from tableA a
inner join tableB b on a.trans_id = b.trans_id
group by a.trans_id, a.name, b.name
Since there are two rows in both table with same ID join will make them four. You can use distinct to remove duplicates:
select distinct a.trans_id, name
from tableA a
inner join tableB b
on a.id_trans = b.trans_id
But I would suggest to use exists:
select trans_id, name
from tableB b
exists (select 1 from tableA a where a.trans_id=b.trans_id)

INTERSECT two table of size 500ml rows in vertica

I am very new to vertica db and hence looking for different efficient ways for comparing two tables of average size 500ml-800ml rows in vertica. I have a process that gets the data from vertica view and dump in to SQL server for later merge to final table in sql server. for few large tables combine it is dumping about 3bl rows daily. Instead of dumping all data I want to take daily snapshot, and compare it with previous days snapshot on vertica side only and then push changed rows only in to SQL SEREVER.
lets say previous snapshot is stored in tableA, today's snapshot stored in tableB. PK on both table is column named OrderId.
Simplest way I can think of is
Select * from tableB
Where OrderId NOT IN (
SELECT * from tableA
INTERSECT
SELECT * from tbleB
)
So my questions are:
Is there any other/better option in vertica to get only changed rows between two tables? Or should I
even consider doing this compare on vertica side?
How much doing such comparison should take?
What should I consider to improve the performance of such query?
If your columns have no NULL values, then a massive LEFT JOIN would seem to do what you want:
select b.*
from tableB b left join
tableA a
on b.OrderId = a.OrderId and
b.col1 = a.col1 and
. . . -- for all the columns you care about
However, I think you want except:
select b.*
from tableB b
except
select a.*
from tableA a;
I imagine this would have reasonable performance.
Do you have a primary key in the two tables?
Then my technique, for a complete Change Data Capture, is:
SELECT
'I' AS to_do
, newrows.*
FROM tb_today newrows
LEFT
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.id IS NULL
UNION ALL
SELECT
'U' AS to_do
, newrows.*
FROM tb_today newrows
JOIN tb_yesterday oldrows
WHERE oldrows.fname <> newrows.fname
OR oldrows.lnamd <> newrows.lname
OR oldrows.bdate <> newrwos.bdate
OR oldrows.sal <> newrows.sal
[...]
OR oldrows.lastcol <> newrows.lastcol
UNION ALL
SELECT
'D' AS to_do
, oldrows.*
FROM tb_yesterday oldrows
LEFT
JOIN tb_today oldrows USING(id)
WHERE newrows.id IS NULL
;
Just leave out the last leg of the UNION SELECT if you don't want to cater for DELETEs ('D')
Good luck
you also do it nicely using joins:
SELECT b.*
FROM tableB AS b
LEFT JOIN tableA AS a ON a.id = b.id
WHERE a.id IS NULL
so above query return only diff from TableB to TableA i.e. data which is present in both table will be skipped...

Select data from multiple table without join sql

I have 2 SQL table.Table A and Table B. Both of this table have 10000 records respectively.
In Table A have 3 Column=>
ColumnA,ColumnB,ColumnC
In Table B have 3 Column=>
ColumnD,ColumnE,ColumnF
My Requirement is to select ColumnA and ColumnD with their original record(10000).
My question is how can I select only ColumnA and ColumnD.
The first problem is I cant join this two table because this two table stands as separately.
The second problem is I cant Union this two table because my requirement is to get Two column but when I union, I only get one column with combining two column.
You could create an on-the-fly join column via ROW_NUMBER, and then join on that:
WITH cte1 AS (
SELECT ColumnA, ROW_NUMBER() OVER (ORDER BY ColumnA) rn
FROM TableA
),
cte2 AS (
SELECT ColumnD, ROW_NUMBER() OVER (ORDER BY ColumnD) rn
FROM TableB
)
SELECT t1.ColumnA, t2.ColumnD
FROM cte t1
INNER JOIN cte t2
ON t1.rn = t2.rn;
Of course, this would just pair the 10K records using the arbitrary orderings in the A and D columns. If you have some specific logic for how these two columns should be paired up, then let us know. The bottom line is that you can't easily get away from the concepts of join or union to bring these two columns together.

Get all rows that are not exist in the appropriate table depending on the date in sql

I have two tables: TableA and TableB (as in the following picture):
The result should be as in the following figure:
What is the best way to get result (as in the table Result) using mssql query?
Thanks.
If I understand correctly, you want the date/value pairs that don't exist.
Generate the list of all date/value pairs using a cross join. Then filter out the ones you don't want:
select b.value, d.date
from tableb b cross join
(select distinct date from tablea a) d
where not exists (select 1 from tablea a where a.date = d.date and a.value = b.value)

SQL MS Access Summing Results Based on Two Table Criteria

I've been asked to find if totals from one table exceed a certain value. However the identifier I need to group these is stored in another table. So I've figured how to isolate what I need and then copy into excel. However I do understand the principle of summing in SQL, as I've made my own queries that look like this:
SELECT * FROM
(SELECT ID, SUM(Table1.Amount) AS Subtotal FROM Table1 WHERE LineNumber = 1 GROUP BY ID) AS a, Table2 AS b
WHERE a.ID = b.ID
AND a.Subtotal > b.Threshold
In this case, the totals I need to sum all have the same LineNumber from the original table (Table1) so it's easy to compare to a value in a different table. What I want to do is be able to is join a subquery back to the original table, then GROUPBY the identifier and sum from the other table, adding a criteria that the original ID from table one must be a duplicate:
SELECT * FROM
((Table1 as t
INNER JOIN
(SELECT ID FROM Table1
GROUP BY ID HAVING COUNT(ID) >1) as b
ON t.ID = b.ID
Joining criteria back to the original table, I need to do this because the table I need to sum doesn't contain ID,
INNER JOIN Table2 as c
ON t.ID2 = c.ID2
WHERE c.LineNumber = 4
This is where I'm stuck. I want to sum all the LineNumber = 4 for each ID. Again, I had to join using ID2 because ID1 isn't in Table2
Perhaps I'm making this too complicated? Any suggestions would be welcome
Thanks!