Amalgamating SQL queries stored as views / Combining tables

I have several summary queries stored as Views...
...and would like to join them together into one combined output as follows:
..so I can use it as a pivot table in Excel.
Date is the only common denominator in this case.
I can do this in Excel using SUMIFS but would prefer to manage it in the SQL before it arrives in Excel.
Can anyone help?

Without a matching ID, the best I can think of is to order by ROW_NUMBER(), which gives a slightly verbose query:
WITH cte1 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Dogs IS NULL THEN 1 END) r1
FROM View1
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Region IS NULL THEN 1 END) r2
FROM View2
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE
ORDER BY CASE WHEN Bed IS NULL THEN 1 END) r3
FROM View3
)
SELECT COALESCE(cte1.Date, cte2.Date, cte3.Date) Date,
Dogs, D_Qty, Region, R_Qty, Bed, B_Qty
FROM cte1
FULL OUTER JOIN cte2
ON cte1.Date = cte2.Date AND r1=r2
FULL OUTER JOIN cte3
ON cte1.Date = cte3.Date AND r1=r3
OR cte2.Date = cte3.Date AND r2=r3
ORDER BY Date, COALESCE(r1,r2,r3)
An SQLfiddle to test with.
You may consider adding an order column to your views, using ROW_NUMBER() OVER (PARTITION BY DATE ORDER BY <whatever order is in them>). That would eliminate all the CTEs and give you a stable ordering of things.

If you can add one more column to View1, View2 and View3, then you can solve your issue easily.
Check this

Related

Why are these two SQL queries so different in efficiency?

I have to use SQL for my internship, and while I know the gist of it, I do not really have a background in programming, nor do I know what makes code efficient.
Query #1
SELECT DISTINCT
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
FROM
(SELECT *
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM
TABLE) AS b
) AS c
LEFT JOIN
(SELECT
*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS d
FROM
TABLE) AS e
) AS f ON c.[ID] = f.[ID] AND a = d - 1
ORDER BY
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
Query #2
SELECT DISTINCT
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM TABLE) AS b
LEFT JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS c
FROM TABLE) AS d ON b.[ID] = d.[ID] AND a = c - 1
ORDER BY
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
Queries #1 and #2 return the same result, which is expected, but query #1 has a runtime of roughly 5 seconds while query #2 has a runtime of roughly 1 minute and 35 seconds. In other words, the second query takes a good 1.5 minutes longer to run than the first and I am really curious to know why.
The correct way to write this query uses lead(). I'm pretty sure the select distinct is not needed, so this does what you want:
SELECT stat, event,
LEAD(stat) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) as next_stat,
LEAD(event) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) as next_event
FROM TABLE t
ORDER BY stat, event;
The two queries you have written should be the same in SQL Server. Apparently, the extra subqueries are confusing the optimizer. You would need to learn about execution plans to understand this better.
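To see that the LEAD() form and the row-number self-join pair each row with the same "next" row, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ is needed for window functions). The `events` table and its rows are made up for illustration; only the column names mirror the question. It says nothing about timing on SQL Server, only that the two shapes agree on results.

```python
import sqlite3

# Hypothetical sample data; only the column names mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, procdt TEXT, proctime TEXT, stat TEXT, event TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", "09:00", "A", "open"),
        (1, "2020-01-01", "10:00", "B", "review"),
        (1, "2020-01-02", "08:00", "C", "close"),
        (2, "2020-01-01", "09:30", "A", "open"),
        (2, "2020-01-03", "11:00", "C", "close"),
    ],
)

# Self-join on consecutive row numbers, the shape used in the question.
self_join_sql = """
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY procdt, proctime) AS rn
    FROM events
)
SELECT cur.stat, cur.event, nxt.stat, nxt.event
FROM numbered AS cur
LEFT JOIN numbered AS nxt
       ON cur.id = nxt.id AND cur.rn = nxt.rn - 1
"""

# Single pass with LEAD(), the shape used in the answer.
lead_sql = """
SELECT stat, event,
       LEAD(stat)  OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_stat,
       LEAD(event) OVER (PARTITION BY id ORDER BY procdt, proctime) AS next_event
FROM events
"""


def sort_key(row):
    # None cannot be compared with str, so map it to "" for sorting only.
    return tuple("" if value is None else value for value in row)


self_join_rows = sorted(conn.execute(self_join_sql).fetchall(), key=sort_key)
lead_rows = sorted(conn.execute(lead_sql).fetchall(), key=sort_key)

for row in lead_rows:
    print(row)
```

Note that neither query here uses DISTINCT, so repeated transitions show up once per occurrence.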

How to get the records from inner query results with the MAX value

The results are below. I need to get the records (seller and purchaser) with the max count, grouped by purchaser (marked in yellow).
You can use window functions:
with q as (
<your query here>
)
select q.*
from (select q.*,
row_number() over (order by seller desc) as seqnum_s,
row_number() over (order by purchaser desc) as seqnum_p
from q
) q
where seqnum_s = 1 or seqnum_p = 1;
Try this:
SELECT COUNT,seller,purchaser FROM YourTable ORDER BY seller,purchaser DESC
SELECT T2.MaxCount,T2.purchaser,T1.Seller FROM <Yourtable> T1
Inner JOIN
(
Select Max(Count) as MaxCount, purchaser
FROM <Yourtable>
GROUP BY Purchaser
)T2
On T2.Purchaser=T1.Purchaser AND T2.MaxCount=T1.Count
First you select the Seller from the table, which will give you a list of all 5 sellers. Then you write another query that selects only the Purchaser and the Max(Count), grouped by Purchaser, which will give you the two yellow-marked lines. Join the two queries on the fields Purchaser and Max(Count), and add the columns from the joined query to your first query.
I can't think of a faster way, but this works pretty fast even with rather large queries. You can further order the fields as needed.
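The join-against-the-grouped-maximum approach described above can be tried out with a small sketch in Python's sqlite3 module. The `sales` table, the `cnt` column name (used instead of `Count` to avoid clashing with the function name) and the five rows are assumptions for illustration, since the original data was only shown as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inner query results in the question.
conn.execute("CREATE TABLE sales (cnt INTEGER, seller TEXT, purchaser TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (5, "s1", "p1"),
        (9, "s2", "p1"),
        (3, "s3", "p1"),
        (7, "s4", "p2"),
        (2, "s5", "p2"),
    ],
)

sql = """
SELECT t2.max_cnt, t2.purchaser, t1.seller
FROM sales AS t1
INNER JOIN (
    -- One row per purchaser carrying that purchaser's highest count.
    SELECT MAX(cnt) AS max_cnt, purchaser
    FROM sales
    GROUP BY purchaser
) AS t2
  ON t2.purchaser = t1.purchaser
 AND t2.max_cnt   = t1.cnt
ORDER BY t2.purchaser
"""

top_rows = conn.execute(sql).fetchall()
print(top_rows)
```

If two sellers tie for the maximum count with the same purchaser, this join returns both rows.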

Should I put a row number filter in join condition or in a prior CTE?

I have a subscription table and a payments table that I need to join.
I am trying to decide between 2 options and performance is a key consideration.
Which of the two OPTIONS below will perform better?
I am using Impala, and these tables are large (multiple millions of rows). I only need one row for every id and date grouping (hence the row_number() analytic function).
I have shortened the queries to illustrate my question:
OPTION 1:
WITH cte
AS (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
),
payment
AS (
SELECT *
FROM cte
WHERE sameday_rownum = 1
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
OPTION 2:
WITH payment
AS (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
AND p.sameday_rownum = 1
An "Option 0" also exists: a far more traditional "derived table", which simply does not require any CTE.
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
) p ON s.id = p.id
AND p.sameday_rownum = 1
Options 0, 1 and 2 are all likely to produce identical or very similar explain plans (although I'm more confident about that statement for SQL Server than for Impala).
Adopting a CTE does - in itself - not make a query more efficient or better performing, so the syntax alteration between option 1 and 2 isn't major. I prefer option 0 myself as I prefer to use CTEs for specific tasks (e.g. recursion).
What you should do is use explain plans to study what each option produces.
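As a rough illustration of that workflow, here is a sketch that runs Option 1 and Option 2 against a tiny made-up dataset in SQLite (via Python's sqlite3), checks that they return the same rows, and prints each plan with SQLite's EXPLAIN QUERY PLAN. The table contents, the `plan` column and the `pay_date` column name are assumptions. SQLite's planner tells you nothing about Impala's; on Impala you would put its own EXPLAIN statement in front of each query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the two tables.
conn.executescript("""
CREATE TABLE subscription (id INTEGER, plan TEXT);
CREATE TABLE payments (id INTEGER, pay_date TEXT, purchase_number INTEGER,
                       payment_amount REAL);
INSERT INTO subscription VALUES (1, 'basic'), (2, 'pro');
INSERT INTO payments VALUES
    (1, '2021-01-01', 1, 10.0),
    (1, '2021-01-01', 2, 5.0),
    (1, '2021-01-02', 3, 7.0),
    (2, '2021-01-01', 1, 20.0);
""")

windowed = """
    SELECT *,
           SUM(payment_amount) OVER (PARTITION BY id, pay_date) AS sameday_total,
           ROW_NUMBER() OVER (PARTITION BY id, pay_date
                              ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
"""

# Option 1: filter on the row number in a second CTE, then join.
option1 = f"""
WITH cte AS ({windowed}),
     payment AS (SELECT * FROM cte WHERE sameday_rownum = 1)
SELECT s.id, s.plan, p.pay_date, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
"""

# Option 2: filter on the row number inside the join condition.
option2 = f"""
WITH payment AS ({windowed})
SELECT s.id, s.plan, p.pay_date, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id AND p.sameday_rownum = 1
"""

option1_rows = sorted(conn.execute(option1).fetchall())
option2_rows = sorted(conn.execute(option2).fetchall())

# Each plan row is (id, parent, notused, detail); the detail text is the useful part.
plan1 = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + option1)]
plan2 = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + option2)]
print("Option 1 plan:", plan1)
print("Option 2 plan:", plan2)
```

Comparing the two printed plans side by side is the point; the exact plan text depends on the engine and version.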

Is there an optimised way in SQL Server to write this code? I am trying to find the 2nd duplicate

I am trying to find the second duplicate row with the query below. Is there a more efficient way to write it in SQL Server?
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id,AN_KEY ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER]
)
select *
INTO dbo.#UpSingle
from CTE
where RN=2
UPDATE:
As GurV pointed out - this query doesn't solve the problem. It will only give you the items that have exactly two duplicates, but not the row where the second duplicate lies.
I am just going to leave this here for reference purposes.
Original Answer
Why not try something like this from another SO post: Finding duplicate values in a SQL table
SELECT
id, AN_KEY, COUNT(*)
FROM
[data].[dbo].[TRANSFER]
GROUP BY
id, AN_KEY
HAVING
COUNT(*) = 2
I gather from your original SQL that the columns you would want to group by are id and AN_KEY.
Here is another way to get the second duplicate row (in order of increasing ENTITYID, of course):
select *
from [data].[dbo].[TRANSFER] a
where [ENTITYID] = (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] b
where [ENTITYID] > (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] c
where b.id = c.id
and b.an_key = c.an_key
)
and a.id = b.id
and a.an_key = b.an_key
)
Provided there is an index on id, an_key and ENTITYID columns, performance of both your query and this should be acceptable.
Let me assume that this query does what you want:
WITH CTE AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, AN_KEY
ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER] t
)
SELECT *
INTO dbo.#UpSingle
FROM CTE
WHERE RN = 2;
For performance, you want a composite index on [data].[dbo].[TRANSFER](id, AN_KEY, ENTITYID).
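For a quick sanity check that the ROW_NUMBER() query and the nested MIN() query pick the same "second duplicate", here is a small sketch in Python's sqlite3. The `transfer` table and its six rows are invented, and it assumes ENTITYID is unique within each (id, AN_KEY) group. The index mirrors the composite index suggested above, although on data this small it has no measurable effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of [data].[dbo].[TRANSFER].
conn.executescript("""
CREATE TABLE transfer (entityid INTEGER, id INTEGER, an_key TEXT);
INSERT INTO transfer VALUES
    (10, 1, 'x'), (11, 1, 'x'), (12, 1, 'x'),   -- three copies
    (20, 2, 'y'),                               -- no duplicate
    (30, 3, 'z'), (31, 3, 'z');                 -- two copies
CREATE INDEX ix_transfer ON transfer (id, an_key, entityid);
""")

# Second row of each (id, an_key) group by increasing entityid.
row_number_sql = """
WITH cte AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id, an_key ORDER BY entityid) AS rn
    FROM transfer t
)
SELECT entityid, id, an_key FROM cte WHERE rn = 2
ORDER BY entityid
"""

# Same idea without window functions: the smallest entityid above the group minimum.
nested_min_sql = """
SELECT entityid, id, an_key
FROM transfer a
WHERE entityid = (
    SELECT MIN(b.entityid)
    FROM transfer b
    WHERE b.entityid > (
        SELECT MIN(c.entityid)
        FROM transfer c
        WHERE b.id = c.id AND b.an_key = c.an_key
    )
    AND a.id = b.id AND a.an_key = b.an_key
)
ORDER BY entityid
"""

second_by_row_number = conn.execute(row_number_sql).fetchall()
second_by_nested_min = conn.execute(nested_min_sql).fetchall()
print(second_by_row_number)
```

Groups with a single row, such as (2, 'y') here, return nothing from either query.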

How to join two tables in SQL Server without duplication

Hi I have two tables A and B
Table A:
Order Pick up
100 Toronto
100 Mississauga
100 Scarborough
Table B
Order Drop off
100 Oakvile
100 Hamilton
100 Milton
Please let me know how I can get this output (i.e. I just want to place the fields from B on the right-hand side of A):
Order pickup Dropoff
100 Toronto oakvile
100 Mississauga Hamilton
100 Scarborough Milton
How can I write a query for this? I tried to join on a.rownum = b.rownum, but no luck.
As the OP has not mentioned any RDBMS, I am taking the liberty of assuming SQL Server 2008. If needed, the following query can easily be converted to any other RDBMS.
select A.[Order],
ROW_NUMBER() OVER(ORDER BY A.[Pick up]) rn1,
A.[Pick up]
into A1
FROM A
;
select B.[Order],
ROW_NUMBER() OVER(ORDER BY B.[Drop off]) rn2,
B.[Drop off]
into B1
FROM B
;
Select A1.[Order],
A1.[Pick up],
B1.[Drop off]
FROM A1
INNER JOIN B1 on A1.rn1=B1.rn2
SQL FIDDLE to Test
From the use of rownum, I'm presuming that you are using Oracle. You can attempt the following:
select a.Order as "order", a.Pickup, b.DropOff
from (select a.*, rownum as seqnum
from a
) a join
(select b.*, rownum as seqnum
from b
) b
on a.order = b.order and a.seqnum = b.seqnum;
(This assumes that all orders match up exactly.)
I must emphasize that although this might seem to work (and it should work on small examples), it will not work in general. And, it will not work on data that has deleted records. And, it probably won't work on parallel systems. If you have a small amount of data, I'd suggest dumping it in Excel and doing the work there -- that way, you can see if the pairs make sense.
Also, if you do have a column that specifies the ordering, then basically the same structure will work:
select coalesce(a.Order, b.Order) as "order", a.Pickup, b.DropOff
from (select a.*,
row_number() over (partition by "order" order by <ordering field>) as seqnum
from a
) a join
(select b.*,
row_number() over (partition by "order" order by <ordering field>) as seqnum
from b
) b
on a.order = b.order and a.seqnum = b.seqnum;
I'd use a CTE along with the ROW_NUMBER windowing function.
WITH keyed_A AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Pick Up]
FROM A
), keyed_B AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
,[Order]
,[Drop Off]
FROM B
)
SELECT
a.[Pick Up]
,b.[Drop Off]
FROM keyed_A AS a
INNER JOIN keyed_B AS b
ON a.id = b.id
;
The CTE can be thought of as a virtual table with an id that crosses the two tables. The OVER clause with the windowing function ROW_NUMBER can be used to create an id in the CTE. Since we are relying on the physical storage of the records (not a good idea, please add keys to the tables), we can ORDER BY (SELECT NULL), which means: just use the order in which the rows are read.
SQLFiddle to test
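For comparison, here is a sketch of the safer variant in which each table carries an explicit ordering column, so the pairing does not depend on physical storage order. It uses Python's sqlite3; the `seq` column and the renamed columns (`order_no`, `pick_up`, `drop_off`) are assumptions added for illustration, while the city names are taken from the question as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical versions of tables A and B with an added ordering column, seq.
conn.executescript("""
CREATE TABLE a (seq INTEGER, order_no INTEGER, pick_up TEXT);
CREATE TABLE b (seq INTEGER, order_no INTEGER, drop_off TEXT);
INSERT INTO a VALUES (10, 100, 'Toronto'), (20, 100, 'Mississauga'), (30, 100, 'Scarborough');
INSERT INTO b VALUES (5, 100, 'Oakvile'), (6, 100, 'Hamilton'), (9, 100, 'Milton');
""")

sql = """
WITH keyed_a AS (
    SELECT order_no, pick_up,
           ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY seq) AS rn
    FROM a
), keyed_b AS (
    SELECT order_no, drop_off,
           ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY seq) AS rn
    FROM b
)
-- Pair the n-th pick-up with the n-th drop-off of the same order.
SELECT ka.order_no, ka.pick_up, kb.drop_off
FROM keyed_a AS ka
INNER JOIN keyed_b AS kb
        ON ka.order_no = kb.order_no AND ka.rn = kb.rn
ORDER BY ka.order_no, ka.rn
"""

paired_rows = conn.execute(sql).fetchall()
for row in paired_rows:
    print(row)
```

The `seq` values do not need to match between the two tables; only their relative order within each order number matters.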