I have two tables with the same fields, but a final value that is calculated slightly differently. I need to combine the data from these two tables into one but need to prioritise one record over another when there is a match. Do you know how this might be possible?
Below is a mock-up of two matching records:
ID  Balance      Type    CCY  Payment  Final_Balance
28  1068376.037  F - CC  GBP  78124    990252.0367
28  1068376.037  F - DD  GBP  982905   85470.08293
Apologies if the format comes out poorly, I'm unsure how to format table data.
I have thousands of records in these two tables, but for a handful of records the same information appears in both tables. Essentially, where there is a match I want to select F - CC over F - DD so that I end up with unique records in my final table.
Thanks
I personally use ROW_NUMBER() for things like this, but there may be a better solution.
You can run this SQL to see how the final answer is built up step by step:
declare @t1 table (id int)
declare @t2 table (id int, txt varchar(2))
insert into @t1
select 1 union
select 2
insert into @t2
select 1, 'FC' union
select 1, 'FD' union
select 2, 'FC' union
select 2, 'FD'
-- Step 1: number the rows within each id; 'FC' sorts before 'FD', so it gets 1
select *, row_number() over (partition by id order by txt) as we_want_the_ones
from @t2
-- Step 2: keep only the first row per id
select * from (
select id, txt, row_number() over (partition by id order by txt) as we_want_the_ones
from @t2
) z
where we_want_the_ones = 1
-- Step 3: join the de-duplicated rows to the other table
select *
from @t1 a
join (
select * from (
select id, txt, row_number() over (partition by id order by txt) as we_want_the_ones
from @t2
) z
where we_want_the_ones = 1
) b on a.id = b.id
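To try the ROW_NUMBER() idea against the original question's layout (two tables with the same columns, matching on ID), here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The table names table_a and table_b and the extra rows for IDs 30 and 31 are hypothetical. Instead of relying on alphabetical order, the CASE expression states the priority explicitly, so F - CC wins whenever both tables share an ID.

```python
import sqlite3

# Hypothetical stand-ins for the asker's two tables (same columns in both).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, balance REAL, type TEXT, ccy TEXT,
                      payment REAL, final_balance REAL);
CREATE TABLE table_b (id INTEGER, balance REAL, type TEXT, ccy TEXT,
                      payment REAL, final_balance REAL);
INSERT INTO table_a VALUES (28, 1068376.037, 'F - CC', 'GBP', 78124, 990252.0367),
                           (30, 500.0, 'F - CC', 'GBP', 100, 400.0);
INSERT INTO table_b VALUES (28, 1068376.037, 'F - DD', 'GBP', 982905, 85470.08293),
                           (31, 700.0, 'F - DD', 'GBP', 200, 500.0);
""")

# Stack both tables, number the rows within each id so the preferred
# type comes first, then keep only row 1 per id.
query = """
SELECT id, type, final_balance
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE type WHEN 'F - CC' THEN 1 ELSE 2 END
           ) AS rn
    FROM (
        SELECT * FROM table_a
        UNION ALL
        SELECT * FROM table_b
    ) AS u
) AS ranked
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows that exist in only one table come through unchanged, since they are alone in their partition.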
My understanding of the question is that you have two tables (A and B) which have the exact same columns. You want to UNION these tables into one dataset, but sometimes you have rows in the two tables which "match" each other. In this case you only take one of the rows based on some priority.
From your example it seems that..
Match: Occurs when the ID is the same.
Priority: Is based on the Type column, prioritized by lower alphabetical order.
Also I'm assuming SQL Server, since that's what I prefer and you didn't say.
Hopefully all that is correct.. Now, here is how I would approach it.
I would start by performing the UNION of the two tables, taking all records without worrying about matching yet, and putting them in a temp table to use later.
SELECT ID, Balance, Type, CCY, Payment, Final_Balance
INTO #AllRecords
FROM A
UNION
SELECT ID, Balance, Type, CCY, Payment, Final_Balance
FROM B
Next, I would GROUP BY the fields which determine a match, then use MIN or MAX to get the correct value for priority columns. By my understanding of your problem that means..
SELECT ID, MIN(Type) AS Type
FROM #AllRecords
GROUP BY ID
With that query you now have the natural key for all the records you want to display in your final result. All that is left to do is look up the rest of the columns using those keys; we can do this by using that query as a subquery.
SELECT r.ID, r.Balance, r.Type, r.CCY, r.Payment, r.Final_Balance
FROM #AllRecords r
INNER JOIN (
SELECT ID, MIN(Type) AS Type
FROM #AllRecords
GROUP BY ID ) final ON r.ID = final.ID AND r.Type = final.Type
So all together the resulting query is..
SELECT ID, Balance, Type, CCY, Payment, Final_Balance
INTO #AllRecords
FROM A
UNION
SELECT ID, Balance, Type, CCY, Payment, Final_Balance
FROM B
SELECT r.ID, r.Balance, r.Type, r.CCY, r.Payment, r.Final_Balance
FROM #AllRecords r
INNER JOIN (
SELECT ID, MIN(Type) AS Type
FROM #AllRecords
GROUP BY ID ) final ON r.ID = final.ID AND r.Type = final.Type
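The same three steps (UNION into a temp table, MIN(Type) per ID, join back) can be checked with a small sketch in Python's sqlite3 module standing in for SQL Server. The tables a and b are trimmed to three columns and the rows for IDs 29 and 40 are made up for illustration; the derived table is aliased chosen rather than final purely as a naming choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (ID INTEGER, Type TEXT, Final_Balance REAL);
CREATE TABLE b (ID INTEGER, Type TEXT, Final_Balance REAL);
INSERT INTO a VALUES (28, 'F - CC', 990252.0367), (29, 'F - CC', 10.0);
INSERT INTO b VALUES (28, 'F - DD', 85470.08293), (40, 'F - DD', 20.0);

-- Step 1: union everything into a temp table.
CREATE TEMP TABLE AllRecords AS
SELECT ID, Type, Final_Balance FROM a
UNION
SELECT ID, Type, Final_Balance FROM b;
""")

# Steps 2 and 3: find MIN(Type) per ID, then join back for the other columns.
rows = conn.execute("""
SELECT r.ID, r.Type, r.Final_Balance
FROM AllRecords r
INNER JOIN (
    SELECT ID, MIN(Type) AS Type
    FROM AllRecords
    GROUP BY ID
) chosen ON r.ID = chosen.ID AND r.Type = chosen.Type
ORDER BY r.ID
""").fetchall()
print(rows)
```

One caveat of this approach: if both tables contain a row with the same ID and Type but different balances, UNION keeps both (it only removes exact duplicates), and both survive the join.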
Table C(id, type) has a list of all unique client ids, with and without transactions. Every id is unique and has a single type.
Table T(date, id, type, money) is the transaction table; the id is not unique here.
Table C has more unique ids than T, because not all clients make transactions.
The unique ids in table T are a subset of the ids in table C.
SQL for AVG(money) and STD(money) per type for T table:
SELECT
type,
AVG(money) AS avg_for_active_clients,
STDEV(money) AS stdev_for_active_clients,
COUNT(DISTINCT id) as cnt_active_clients
FROM (
SELECT id , type, sum(money) as money
FROM T
GROUP BY id, type
) A
GROUP BY type
SQL for AVG(money) and STD(money) per type for C table:
SELECT
type,
AVG(money) AS avg_for_all_clients,
STDEV(money) stdev_for_all_clients,
COUNT(DISTINCT id) as cnt_all_clients
FROM (
SELECT C.id, C.type , COALESCE(A.money, 0) as money FROM C
LEFT JOIN (
SELECT id , sum(money) as money
FROM T
GROUP BY id
) A
ON C.id = A.id
) B
GROUP BY type
Is it possible to combine the two SQL statements above into a single SQL statement?
My database is Redshift.
You can combine your select #1 with your select #2 vertically or horizontally.
To combine them vertically you can use UNION ALL. For example:
select #1
union all
select #2
To combine them horizontally you can use FULL JOIN. For example:
select *
from (
select #1
) x
full join (
select #2
) y on y.type = x.type
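As an alternative to stacking or joining the two result sets, both sets of statistics can come from one pass over C LEFT JOIN the per-client totals, because AVG ignores the NULLs that inactive clients produce while AVG(COALESCE(..., 0)) counts them as zero. This is a sketch in Python's sqlite3, not Redshift, with made-up rows; it assumes each client's type in T matches its type in C, and it leaves out the standard-deviation columns because SQLite has no built-in STDEV. In the real query they would be added the same way, once over A.money and once over COALESCE(A.money, 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE C (id INTEGER, type TEXT);
CREATE TABLE T (date TEXT, id INTEGER, type TEXT, money REAL);
INSERT INTO C VALUES (1, 'x'), (2, 'x'), (3, 'x'), (4, 'y');
INSERT INTO T VALUES ('2020-01-01', 1, 'x', 10), ('2020-01-02', 1, 'x', 20),
                     ('2020-01-01', 2, 'x', 30), ('2020-01-03', 4, 'y', 5);
""")

rows = conn.execute("""
SELECT C.type,
       AVG(A.money)              AS avg_for_active_clients,  -- NULLs (inactive) ignored
       COUNT(A.id)               AS cnt_active_clients,      -- counts only matched clients
       AVG(COALESCE(A.money, 0)) AS avg_for_all_clients,     -- inactive counted as 0
       COUNT(DISTINCT C.id)      AS cnt_all_clients
FROM C
LEFT JOIN (
    SELECT id, SUM(money) AS money
    FROM T
    GROUP BY id
) A ON C.id = A.id
GROUP BY C.type
ORDER BY C.type
""").fetchall()
for row in rows:
    print(row)
```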
Can anyone please check whether the code below is correct? In cte_1, I’m taking all dimensions and metrics from t1 except value1, value2 and value3. In cte_2, I’m assigning a unique row number to each row of t2. In cte_3, I’m taking all distinct dimensions and metrics by joining on two keys, Date and Ad. In cte_4, I’m keeping only the rows with row number 1. I’m getting sum(value1), sum(value2) and sum(value3) correct, but sum(value4) is incorrect.
WITH cte_1 AS
(SELECT *except(value1, value2, value3) FROM t1 where Date >"2020-02-16" and Publisher ="fb")
-- Find unique row number from t2--
,cte_2 as(
SELECT ROW_NUMBER() OVER(ORDER BY Date) distinct_row_number, * FROM t2)
,cte_3 as
(SELECT cte_2.*,cte_1.*except(Date) FROM cte_2 join cte_1
on cte_2.Date = cte_1.Date
and cte_2.Ad= cte_1.Ad)
,cte_4 AS (
(SELECT *
FROM
(
SELECT *,
row_number() OVER (PARTITION BY distinct_row_number ORDER BY Date) as rn
FROM cte_3 ) T
where rn = 1 ))
select sum(value1),sum(value2),sum(value3),sum(value4) from cte_4
Please see the sample table below:
Whilst your data does not seem compliant with the query you shared, since it is lacking the field named Ad and other fields have different names, such as Date and ReportDate, I was able to identify some issues and propose improvements.
First, your temp table cte_1 only applies a filter in the WHERE clause, so you could instead put that filter in a subquery in the FROM clause of your last step, such as:
SELECT * FROM (SELECT field1,field2,field3 FROM t1 WHERE Date > DATE(2020, 2, 16) )
Second, in cte_2 you need to select all the columns you will need from table t2. Otherwise your table will contain only the row number and can't be joined with other tables, since it carries no other information. So if you need the row number, select it together with the other columns, including your primary key if you will perform any join later. The syntax would be as follows:
SELECT field1, field2, ROW_NUMBER() OVER(ORDER BY Date) FROM t2
Third, in cte_3 I assume you want to perform an INNER JOIN. In that case you need to make sure the keys are present in both tables, in your case Date and Ad, which I could not find in your data. Furthermore, you cannot have duplicated column names when joining two tables and selecting all the columns. In your case Brand, value1, value2 and value3 appear in both tables, which will cause an error. So you need to specify where these fields come from, either by selecting them one by one or by using an EXCEPT clause.
Finally, cte_4 and your final SELECT could be merged into one step. Essentially, you are selecting only one row of data ordered by Date, then summing value1, value2 and value3 individually based on the partition by date. Moreover, you are not selecting any identifier alongside the sums, which means your result will contain only the final sums. In general, when performing an aggregation such as SUM(), the primary key(s) are selected as well. This could have been done in one step as follows, using only the data from t2:
SELECT ReportDate, Brand, sum(value1) as sum_1, sum(value2) as sum_2, sum(value3) as sum_3, sum(value4) as sum_4 FROM (SELECT t2.*, ROW_NUMBER() OVER(PARTITION BY Date ORDER BY Date) as rn FROM t2)
WHERE rn=1
GROUP BY ReportDate, Brand
UPDATE:
Based on your explanation in the comments, I was able to create a more specific query. The fields ReportDate, Brand, Portfolio, Campaign and value1, value2, value3 come from t2, while value4 comes from t1. The sum is computed over the rows whose row number equals 1, which is why t1 and t2 are joined before ROW_NUMBER() is applied. Finally, the last SELECT statement does not select rn, and the data is aggregated by ReportDate, Brand, Portfolio and Campaign.
WITH cte_1 AS (
SELECT t2.ReportDate, t2.Brand, t2.Portfolio, t2.Campaign,
t2.value1, t2.value2, t2.value3, t1.value4
FROM t2 LEFT JOIN t1 on t2.ReportDate = t1.ReportDate and t1.placement=t2.Ad
),
cte_2 AS(
SELECT *, ROW_NUMBER() OVER(PARTITION BY ReportDate ORDER BY ReportDate) as rn FROM cte_1
)
SELECT ReportDate, Brand, Portfolio, Campaign, SUM(value1) as sum1, SUM(value2) as sum2, SUM(value3) as sum3,
SUM(value4) as sum4
FROM cte_2
WHERE rn=1
GROUP BY 1,2,3,4
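A likely reason for a wrong sum(value4) is join fan-out: when one t2 row matches several t1 rows, its values are repeated before SUM runs. The sketch below (Python's sqlite3 standing in for BigQuery, with hypothetical rows) shows the inflated totals and how keeping one row per (ReportDate, Ad) with ROW_NUMBER() brings them back down. The partition key (ReportDate, Ad) is an assumption about what identifies a unique row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t2 (ReportDate TEXT, Ad TEXT, value1 INTEGER);
CREATE TABLE t1 (ReportDate TEXT, placement TEXT, value4 INTEGER);
INSERT INTO t2 VALUES ('2020-02-17', 'ad1', 100), ('2020-02-17', 'ad2', 50);
-- ad1 has two matching rows in t1, so the join fans out.
INSERT INTO t1 VALUES ('2020-02-17', 'ad1', 7), ('2020-02-17', 'ad1', 7),
                      ('2020-02-17', 'ad2', 3);
""")

joined = """
SELECT t2.ReportDate AS ReportDate, t2.Ad AS Ad,
       t2.value1 AS value1, t1.value4 AS value4
FROM t2 LEFT JOIN t1
  ON t2.ReportDate = t1.ReportDate AND t1.placement = t2.Ad
"""

# Summing straight after the join counts ad1 twice.
naive = conn.execute(f"SELECT SUM(value1), SUM(value4) FROM ({joined})").fetchone()

# Keep one row per (ReportDate, Ad) before summing.
deduped = conn.execute(f"""
SELECT SUM(value1), SUM(value4)
FROM (
    SELECT j.*, ROW_NUMBER() OVER (
        PARTITION BY ReportDate, Ad ORDER BY ReportDate) AS rn
    FROM ({joined}) AS j
) AS numbered
WHERE rn = 1
""").fetchone()
print(naive, deduped)
```

Picking rn = 1 only makes sense if the repeated t1 rows really are duplicates; if they carry different value4 figures, aggregating t1 per key before the join would be the safer choice.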
I’m trying to do an analysis of the different combinations of taxes per invoice to identify how many scenarios exist.
In the tax table, column 1 is invoiceNo, column 2 is taxType. These form the composite key. There can be 1 or more taxType per invoiceNo. Example of data:
https://i.imgur.com/bcQc7vY_d.jpg?maxwidth=640&shape=thumb&fidelity=medium (Sorry, but I’m new, so I can’t add a picture.)
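I want to be able to report on the unique taxType combinations across all invoiceNo values. I.e., invoice 1 (A) is unique combination 1, invoice 2 (AB) is unique combination 2, invoice 3 (A) is disregarded because A was already returned for invoice 1, and invoice 4 (BC) is unique combination 3.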
Not sure if this makes sense! Finding it hard to articulate what I’m after!
Expected output would be:
A
AB
BC
The original version of this question was tagged MySQL, so this answers the question.
If I understand correctly, you can use group_concat():
select distinct group_concat(taxtype order by taxtype separator '')
from t
group by invoiceno;
This works with the table you have given and would work with those combinations of tax types even if they repeat, but if there are more tax codes, an AC combination, or if some of the given combinations are missing, it might need to be a little different. You could develop this to suit your conditions, or give some more info: do invoices have three codes (ABC)? Do invoices have just B or just C codes? I notice that the BC invoice etc
WITH CTE (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY a.InvoiceNo),a.InvoiceNo,a.TaxType,b.TaxType
FROM UniqueCombo a INNER JOIN UniqueCombo b ON a.InvoiceNo=b.InvoiceNo
)
,
CTE2 (RN,InvoiceNo,TT1,TT2)
AS
(
SELECT * FROM CTE WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY InvoiceNo HAVING COUNT(InvoiceNo)=1
)
)
SELECT TT1 FROM CTE2 WHERE RN IN
(
SELECT MAX(RN) FROM CTE WHERE TT1=TT2 GROUP BY TT1,TT2 HAVING COUNT(InvoiceNo)>1
)
UNION
SELECT TT1+''+TT2 FROM CTE WHERE RN IN
(
SELECT MAX(RN)-1 FROM CTE WHERE TT1<>TT2 GROUP BY InvoiceNo
)
You can try STRING_AGG. Something like:
SELECT DISTINCT TaxTypeString
FROM
(
SELECT InvoiceNo, STRING_AGG(TaxType, '') WITHIN GROUP (ORDER BY TaxType) AS TaxTypeString
FROM t
GROUP BY InvoiceNo
) x
ORDER BY TaxTypeString
The nested query, called x, should give you one row per invoice number, in the format you want. Then you select the distinct tax-type strings from it.
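Whichever aggregate is used, the important detail is that each invoice's tax types are concatenated in a fixed order; otherwise invoice 2 could come out as BA and be counted as a different combination from AB. As a sketch, this does the ordering in SQL and the concatenation in Python, rather than with group_concat or STRING_AGG, because ordered-aggregate syntax varies between SQLite versions. The table name tax and its rows are hypothetical, modelled on the example in the question.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tax (invoiceNo INTEGER, taxType TEXT, PRIMARY KEY (invoiceNo, taxType));
INSERT INTO tax VALUES (1, 'A'), (2, 'B'), (2, 'A'), (3, 'A'), (4, 'C'), (4, 'B');
""")

# Sort by invoice, then tax type, so each invoice's types arrive in a fixed
# order (the job ORDER BY does inside an ordered aggregate).
rows = conn.execute(
    "SELECT invoiceNo, taxType FROM tax ORDER BY invoiceNo, taxType"
).fetchall()

# One concatenated string per invoice; the set keeps only distinct combinations.
combos = set()
for _, group in groupby(rows, key=lambda r: r[0]):
    combos.add("".join(tax_type for _, tax_type in group))

result = sorted(combos)
print(result)
```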
I have two tables with data. Both tables have a CUSTOMER_ID column (which is numeric). I am trying to get a list of all the unique values for CUSTOMER_ID and know whether or not the CUSTOMER_ID exists in both tables or just one (and which one).
I can easily get a list of the unique CUSTOMER_ID:
SELECT tblOne.CUSTOMER_ID
FROM tblOne
UNION
SELECT tblTwo.CUSTOMER_ID
FROM tblTwo
I can't just add an identifier column to the SELECT statement (like: SELECT tblOne.CUSTOMER_ID, "Table1" AS DataSource) because then the records wouldn't be unique and it would return both sets of data.
I feel I need to add it somewhere else in this query but am not sure how.
Edit for clarity:
For the union query output I need an additional column that can tell me if the unique value I am seeing exists in: (1) both tables, (2) table one, or (3) table two.
If the CUSTOMER_ID appears in both tables then we'll have to arbitrarily pick which table to call the source. The following query uses "tblOne" as the [SourceTable] in that case:
SELECT
CUSTOMER_ID,
MIN(Source) AS SourceTable,
COUNT(*) AS TableCount
FROM
(
SELECT DISTINCT
CUSTOMER_ID,
"tblOne" AS Source
FROM tblOne
UNION ALL
SELECT DISTINCT
CUSTOMER_ID,
"tblTwo" AS Source
FROM tblTwo
)
GROUP BY CUSTOMER_ID
Gord Thompson's answer is correct. But, it is not necessary to do a distinct in the subqueries. And, you can return a single column with the information you are looking for:
select customer_id,
iif(min(which) = max(which), min(which), "both") as DataSource
from (select customer_id, "tblone" as which
from tblOne
UNION ALL
select customer_id, "tbltwo" as which
from tblTwo
) t
group by customer_id
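Here is the min/max labelling trick run in Python's sqlite3 as a stand-in for Access. SQLite uses single-quoted strings and CASE rather than IIf, so those two details differ from the Access version; the sample IDs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblOne (CUSTOMER_ID INTEGER);
CREATE TABLE tblTwo (CUSTOMER_ID INTEGER);
INSERT INTO tblOne VALUES (1), (2), (2);
INSERT INTO tblTwo VALUES (2), (3);
""")

# If every row for a customer has the same label, MIN = MAX and that label
# is the source; otherwise the customer appears in both tables.
rows = conn.execute("""
SELECT customer_id,
       CASE WHEN MIN(which) = MAX(which) THEN MIN(which) ELSE 'both' END AS DataSource
FROM (
    SELECT CUSTOMER_ID AS customer_id, 'tblone' AS which FROM tblOne
    UNION ALL
    SELECT CUSTOMER_ID AS customer_id, 'tbltwo' AS which FROM tblTwo
) AS t
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
print(rows)
```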
We could add an identifier column with the integer data type and then do an outer query:
SELECT
CUSTOMER_ID,
sum(TableFlag)
FROM
(
SELECT
DISTINCT CUSTOMER_ID,
1 AS TableFlag
FROM tblOne
UNION
SELECT
DISTINCT CUSTOMER_ID,
2 AS TableFlag
FROM tblTwo
)
GROUP BY CUSTOMER_ID
So if the sum is 1 the customer comes from tblOne, if it is 2 it comes from tblTwo, and if it is 3 it exists in both.
If you want to add a third table to the union, give it a value of 4, so that each combination has a unique sum (each table gets the next power of two: 1, 2, 4, ...).
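The bit-flag idea extends to any number of tables as long as each one gets a distinct power of two. This sketch, in Python's sqlite3 with a hypothetical third table tblThree and made-up IDs, sums the flags per customer and then decodes each sum with a bitwise AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblOne (CUSTOMER_ID INTEGER);
CREATE TABLE tblTwo (CUSTOMER_ID INTEGER);
CREATE TABLE tblThree (CUSTOMER_ID INTEGER);
INSERT INTO tblOne VALUES (1), (2), (2);
INSERT INTO tblTwo VALUES (2), (3);
INSERT INTO tblThree VALUES (3), (4);
""")

# UNION (not UNION ALL) removes repeated (id, flag) pairs, so each table
# contributes its flag at most once per customer.
rows = conn.execute("""
SELECT CUSTOMER_ID, SUM(TableFlag) AS Presence
FROM (
    SELECT CUSTOMER_ID, 1 AS TableFlag FROM tblOne
    UNION
    SELECT CUSTOMER_ID, 2 FROM tblTwo
    UNION
    SELECT CUSTOMER_ID, 4 FROM tblThree
) AS flagged
GROUP BY CUSTOMER_ID
ORDER BY CUSTOMER_ID
""").fetchall()

# Decode each sum back into table names by testing individual bits.
names = {1: "tblOne", 2: "tblTwo", 4: "tblThree"}
decoded = {cid: [n for bit, n in names.items() if presence & bit] for cid, presence in rows}
print(rows)
print(decoded)
```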
I have two tables that differ only in that one is an archive and the other holds the current records. These tables record sales in the company. Among other fields, both have id, name and price_of_sale. I need to select, across both tables, the highest and lowest price for a given name. I tried this query:
select name, max(price_of_sale), min(price_of_sale)
from wapzby
group by name
union
select name, max(price_of_sale), min(price_of_sale)
from wpzby
group by name
order by name
but this query returns two records per name: one from the current table and one from the archive table. I want a single row per name with the smallest and largest price across both tables at once. How do I write this query?
Here are two options (MSSQL compliant).
Note: UNION ALL combines the sets without eliminating duplicates, which is simpler (and cheaper) behavior than UNION.
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM
(
SELECT Name, Price_Of_Sale
FROM wapzby
UNION ALL
SELECT Name, Price_Of_Sale
FROM wpzby
) as subQuery
GROUP BY Name
ORDER BY Name
This one computes the max and min within each table before combining the sets; it may perform better this way.
SELECT Name, MAX(MaxPrice) as MaxPrice, MIN(MinPrice) as MinPrice
FROM
(
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wapzby
GROUP BY Name
UNION ALL
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wpzby
GROUP BY Name
) as subQuery
GROUP BY Name
ORDER BY Name
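To confirm that the two options return the same result, here is a sketch in Python's sqlite3 with a few made-up sales rows; table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wapzby (id INTEGER, name TEXT, price_of_sale REAL);
CREATE TABLE wpzby  (id INTEGER, name TEXT, price_of_sale REAL);
INSERT INTO wapzby VALUES (1, 'pen', 2.0), (2, 'pen', 5.0), (3, 'ink', 9.0);
INSERT INTO wpzby  VALUES (4, 'pen', 1.0), (5, 'cup', 4.0);
""")

# Option 1: combine all rows first, then aggregate once.
combined_then_grouped = conn.execute("""
SELECT name, MAX(price_of_sale) AS MaxPrice, MIN(price_of_sale) AS MinPrice
FROM (
    SELECT name, price_of_sale FROM wapzby
    UNION ALL
    SELECT name, price_of_sale FROM wpzby
) AS subQuery
GROUP BY name
ORDER BY name
""").fetchall()

# Option 2: aggregate per table, then take the max of maxes / min of mins.
grouped_then_combined = conn.execute("""
SELECT name, MAX(MaxPrice) AS MaxPrice, MIN(MinPrice) AS MinPrice
FROM (
    SELECT name, MAX(price_of_sale) AS MaxPrice, MIN(price_of_sale) AS MinPrice
    FROM wapzby GROUP BY name
    UNION ALL
    SELECT name, MAX(price_of_sale), MIN(price_of_sale)
    FROM wpzby GROUP BY name
) AS subQuery
GROUP BY name
ORDER BY name
""").fetchall()
print(combined_then_grouped)
print(grouped_then_combined)
```

Both work because the max of per-table maxima is the overall maximum (and likewise for minima), so the second form only changes where the work happens.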
In SQL Server you could use a subquery:
SELECT [name],
MAX([price_of_sale]) AS [MAX price_of_sale],
MIN([price_of_sale]) AS [MIN price_of_sale]
FROM (
SELECT [name],
[price_of_sale]
FROM [dbo].[wapzby]
UNION
SELECT [name],
[price_of_sale]
FROM [dbo].[wpzby]
) u
GROUP BY [name]
ORDER BY [name]
Is this more like what you want?
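SELECT
a.name,
a.max_price AS a_max_price,
a.min_price AS a_min_price,
b.max_price AS b_max_price,
b.min_price AS b_min_price
FROM
(SELECT name, MAX(price_of_sale) AS max_price, MIN(price_of_sale) AS min_price
FROM wapzby GROUP BY name) a
INNER JOIN
(SELECT name, MAX(price_of_sale) AS max_price, MIN(price_of_sale) AS min_price
FROM wpzby GROUP BY name) b
ON a.name = b.name
ORDER BY
a.name
This returns each table's max and min side by side, one row per name, without the need for a union. Names that appear in only one of the tables are dropped by the inner join.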
SELECT MAX(value) FROM tabl1 UNION SELECT MAX(value) FROM tabl2;
SELECT MIN(value) FROM tabl1 UNION SELECT MIN(value) FROM tabl2;
SELECT (SELECT MAX(value) FROM table1 WHERE trn_type='CSL' and till='TILL01') as summ, (SELECT MAX(value) FROM table2 WHERE trn_type='CSL' and till='TILL01') as summ_hist