Row_Number() returning duplicate rows - sql

This is my query,
SELECT top 100
UPPER(COALESCE(A.DESCR,C.FULL_NAME_ND)) AS DESCR,
COALESCE(A.STATE, (SELECT TOP 1 STATENAME
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATENAME,
COALESCE(A.STATECD, (SELECT TOP 1 CODE
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATECD,
COALESCE(A.COUNTRYCD, B.CODE) AS COUNTRYCODE
FROM
M_CITY A
JOIN
M_COUNTRYMASTER B ON A.COUNTRYCD = B.CODE
JOIN
[GEODATASOURCE-CITIES-FREE] C ON B.ALPHA2CODE = C.CC_FIPS
WHERE
EXISTS (SELECT 1
FROM [GEODATASOURCE-CITIES-FREE] Z
WHERE B.ALPHA2CODE=Z.CC_FIPS)
ORDER BY
A.CODE
This works perfectly fine, but when I add ROW_NUMBER() OVER (ORDER BY A.CODE) I get the same rows duplicated multiple times.
e.g
SELECT top 100
UPPER(COALESCE(A.DESCR,C.FULL_NAME_ND)) AS DESCR,
COALESCE(A.STATE, (SELECT TOP 1 STATENAME
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATENAME,
COALESCE(A.STATECD, (SELECT TOP 1 CODE
FROM M_STATEMASTER
WHERE COUNTRYCODE = B.CODE)) AS STATECD,
COALESCE(A.COUNTRYCD, B.CODE) AS COUNTRYCODE,
ROW_NUMBER() OVER(ORDER BY A.CODE) AS RN -- i made a change here
FROM
M_CITY A
JOIN
M_COUNTRYMASTER B ON A.COUNTRYCD = B.CODE
JOIN
[GEODATASOURCE-CITIES-FREE] C ON B.ALPHA2CODE = C.CC_FIPS
WHERE
EXISTS (SELECT 1
FROM [GEODATASOURCE-CITIES-FREE] Z
WHERE B.ALPHA2CODE=Z.CC_FIPS)
ORDER BY
A.CODE
Another try: when I use ROW_NUMBER() OVER(ORDER BY newid()) AS RN, it takes a long time to execute.
Remember: CODE is the PK of table M_CITY, and there is no key in the [GEODATASOURCE-CITIES-FREE] table.
Another thing, about JOIN (inner join): a join returns the matched rows, right?
e.g:
table 1 with 20 rows,
table2 with 30 rows ,
table 3 with 30 rows
If I join these 3 tables on a certain key, then the maximum number of rows I can get is 20, am I right?

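Your first query doesn't work fine; it just appears to. TOP only returns a well-defined set of rows when the ORDER BY is deterministic, and A.CODE is no longer unique once M_CITY is joined to [GEODATASOURCE-CITIES-FREE], so an arbitrary 100 of the tied rows are returned.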
When you add ROW_NUMBER(), the query plan changes . . . and the ordering of the result set changes as well. I would suggest that you fix the original query to use a stable sort.
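On the side question about row counts: an inner join is not capped at the size of the smallest table. A minimal sketch of that, using Python's sqlite3 module rather than SQL Server, with made-up tables `city` and `geo` standing in for M_CITY and [GEODATASOURCE-CITIES-FREE] (the names and data are assumptions, and window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: city has a primary key, geo has no key at all.
    CREATE TABLE city (code INTEGER PRIMARY KEY, countrycd TEXT);
    CREATE TABLE geo  (cc_fips TEXT, full_name TEXT);
    INSERT INTO city VALUES (1, 'IN'), (2, 'IN');
    INSERT INTO geo  VALUES ('IN', 'Delhi'), ('IN', 'Mumbai'), ('IN', 'Pune');
""")

# Every city row matches all three geo rows, so the join yields 2 x 3 = 6 rows.
# Ordering by code alone would leave three-way ties; the second sort key
# (full_name) is what makes the numbering repeatable.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY a.code, c.full_name) AS rn,
           a.code,
           c.full_name
    FROM city a
    JOIN geo c ON a.countrycd = c.cc_fips
    ORDER BY rn
""").fetchall()

for row in numbered:
    print(row)
```

Two city rows and three geo rows produce six joined rows, which is why the same A.CODE shows up several times in the real query.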

Related

Return only one row based on search

Query
select
a.id,
a.ba,
b.status,
b.custid
from balist as a
inner join customer as b
on a.ba = b.ba
I have a table "balist" that holds a list of (ba) values, and I inner join table "customer" on (ba). Right now my output looks like this:
id | ba         | status | custid
---+------------+--------+-----------------
1  | ba-1234455 | A      | 123-321-123-321a
2  | ba-1234455 | I      | 123-321-123-321a
3  | ba-1234457 | A      | 123-321-123-321b
4  | ba-1234458 | A      | 123-321-123-321c
5  | ba-1234459 | I      | 123-321-123-321d
I want to return all A and I statuses, but drop a row with status I when the same ba also has a row with status A, like this:
id | ba         | status | custid
---+------------+--------+-----------------
1  | ba-1234455 | A      | 123-321-123-321a
3  | ba-1234457 | A      | 123-321-123-321b
4  | ba-1234458 | A      | 123-321-123-321c
5  | ba-1234459 | I      | 123-321-123-321d
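You could use ROW_NUMBER() to filter the resulting rows: number the rows within each ba so that status A sorts before status I, then keep only the first row per ba, e.g.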
SELECT
id,ba,status,custid
FROM (
SELECT
a.id,
a.ba,
b.status,
b.custid,
ROW_NUMBER() OVER (
PARTITION BY a.ba
ORDER BY b.status ASC
) as rn
FROM
balist as a
INNER JOIN
customer as b ON a.ba = b.ba
) AS t
WHERE rn=1
Let me know if this works for you.
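To try the idea out, here is a small sketch using Python's sqlite3 module. The table layout is an assumption (one row per ba in balist, possibly several status rows per ba in customer), so the ids will not match the question exactly; it only shows that an I row disappears when the same ba also has an A row. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: balist has one row per ba, customer may have several.
    CREATE TABLE balist   (id INTEGER, ba TEXT);
    CREATE TABLE customer (ba TEXT, status TEXT, custid TEXT);
    INSERT INTO balist VALUES
        (1, 'ba-1234455'), (3, 'ba-1234457'),
        (4, 'ba-1234458'), (5, 'ba-1234459');
    INSERT INTO customer VALUES
        ('ba-1234455', 'A', '123-321-123-321a'),
        ('ba-1234455', 'I', '123-321-123-321a'),
        ('ba-1234457', 'A', '123-321-123-321b'),
        ('ba-1234458', 'A', '123-321-123-321c'),
        ('ba-1234459', 'I', '123-321-123-321d');
""")

# 'A' sorts before 'I', so rn = 1 is the A row when one exists, else the I row.
rows = conn.execute("""
    SELECT ba, status
    FROM (
        SELECT a.id, a.ba, b.status, b.custid,
               ROW_NUMBER() OVER (PARTITION BY a.ba ORDER BY b.status ASC) AS rn
        FROM balist AS a
        INNER JOIN customer AS b ON a.ba = b.ba
    ) AS t
    WHERE rn = 1
    ORDER BY ba
""").fetchall()

for row in rows:
    print(row)
```

The trick relies on the alphabetical order of the two status codes; with other codes you would sort on a CASE expression instead.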

Combine rows from Multiple tables into single table

I have one parent table, Products, with multiple child tables: Hoses, Steeltubes, ElectricCables, FiberOptics.
ProductId is the primary key field in the Products table.
ProductId is the foreign key field in Hoses, Steeltubes, ElectricCables, FiberOptics.
The Products table has a 1-to-many relationship with the child tables.
I want to combine the results of all the tables.
For example, product P1 has the PK field ProductId, which is used in all child tables as the FK.
If the Hoses table has 4 records with ProductId 50 and the Steeltubes table has 2 records with ProductId 50, a left join produces the cartesian product of those records and shows 8 rows, but it should be 4 rows.
;with HOSESTEELCTE
as
(
select '' as ModeType, '' as FiberOpticQty , '' as NumberFibers, '' as FiberLength, '' as CableType , '' as Conductorsize , '' as Voltage,'' as ElecticCableLength , s.TubeMaterial , s.TubeQty, s.TubeID , s.WallThickness , s.DWP ,s.Length as SteelLength , h.HoseSeries, h.HoseLength ,h.ProductId
from Hoses h
left join
(
--'' as HoseSeries,'' as HoseLength ,
select TubeMaterial , TubeQty, TubeID , WallThickness , DWP , Length,ProductId from SteelTubes
) s on (s.ProductId = h.ProductId)
) select * from HOSESTEELCTE
Assuming there are no relationships between the child tables and you simply want a list of all child entities that make up a product, you could generate a CTE with as many rows per product as the largest number of entries across the child tables. In the example below I have used a dates table (dimdate) as a convenient source of rows.
So for this data:
create table products(pid int);
insert into products values
(1),(2);
create table hoses (pid int,descr varchar(2));
insert into hoses values (1,'h1'),(1,'h2'),(1,'h3'),(1,'h4');
create table steeltubes (pid int,descr varchar(2));
insert into steeltubes values (1,'t1'),(1,'t2');
create table electriccables(pid int,descr varchar(2));
truncate table electriccables
insert into electriccables values (1,'e1'),(1,'e2'),(1,'e3'),(2,'e1');
this cte
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050105)
select * from cte
creates a cartesian join (one of the rare occasions where an implicit join helps) of pid to rn
result
rn                   pid
-------------------- -----------
1                    1
2                    1
3                    1
4                    1
1                    2
2                    2
3                    2
4                    2
And if we add the child tables
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050106)
select c.pid,h.descr hoses,s.descr steeltubes,e.descr electriccables from cte c
left join (select h.*, row_number() over(partition by h.pid order by h.descr) rn from hoses h) h on h.rn = c.rn and h.pid = c.pid
left join (select s.*, row_number() over(partition by s.pid order by s.descr) rn from steeltubes s) s on s.rn = c.rn and s.pid = c.pid
left join (select e.*, row_number() over(partition by e.pid order by e.descr) rn from electriccables e) e on e.rn = c.rn and e.pid = c.pid
where h.rn is not null or s.rn is not null or e.rn is not null
order by c.pid,c.rn
we get this
pid         hoses steeltubes electriccables
----------- ----- ---------- --------------
1           h1    t1         e1
1           h2    t2         e2
1           h3    NULL       e3
1           h4    NULL       NULL
2           NULL  NULL       e1
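If you don't have a dates table handy, a variation on the same idea is to build the (pid, rn) list from the child tables themselves. This is a sketch of that swapped-in approach, not the query above: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), the sample data from this answer, and a CTE I have called `spine`, which is just an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hoses          (pid INT, descr TEXT);
    CREATE TABLE steeltubes     (pid INT, descr TEXT);
    CREATE TABLE electriccables (pid INT, descr TEXT);
    INSERT INTO hoses          VALUES (1,'h1'),(1,'h2'),(1,'h3'),(1,'h4');
    INSERT INTO steeltubes     VALUES (1,'t1'),(1,'t2');
    INSERT INTO electriccables VALUES (1,'e1'),(1,'e2'),(1,'e3'),(2,'e1');
""")

rows = conn.execute("""
    WITH h AS (SELECT pid, descr,
                      ROW_NUMBER() OVER (PARTITION BY pid ORDER BY descr) AS rn
               FROM hoses),
         s AS (SELECT pid, descr,
                      ROW_NUMBER() OVER (PARTITION BY pid ORDER BY descr) AS rn
               FROM steeltubes),
         e AS (SELECT pid, descr,
                      ROW_NUMBER() OVER (PARTITION BY pid ORDER BY descr) AS rn
               FROM electriccables),
         -- Every (pid, rn) pair that occurs in any child table, de-duplicated.
         spine AS (SELECT pid, rn FROM h
                   UNION SELECT pid, rn FROM s
                   UNION SELECT pid, rn FROM e)
    SELECT c.pid, h.descr, s.descr, e.descr
    FROM spine c
    LEFT JOIN h ON h.pid = c.pid AND h.rn = c.rn
    LEFT JOIN s ON s.pid = c.pid AND s.rn = c.rn
    LEFT JOIN e ON e.pid = c.pid AND e.rn = c.rn
    ORDER BY c.pid, c.rn
""").fetchall()

for row in rows:
    print(row)
```

Because the row numbers restart for each pid, each child row lines up with at most one row from every other child table, so nothing is multiplied.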
In fact, 8 rows is the expected result: your four Hoses records are joined with the first SteelTubes record and then again with the second, making 4 + 4 = 8.
The very fact that you expect 4 records to be in the result instead of 8 shows that you want to use some kind of grouping. You can group your inner query issued for SteelTubes by ProductId, but then you will need to use aggregate functions for the other columns. Since you have only explained the structure of the desired output, but not the semantics, I am not able with my current knowledge about your problem to determine what aggregations you need.
Once you find out the answer for the first table, you will be able to easily add the other tables into the selection as well, but in case of large data you might get some scaling problems, so you might want to have a table where you store these groups, maintain it when something changes and use it for these selections.

SQL - Select highest value when data across 3 tables

I have 3 tables:
Person (with a column PersonKey)
Telephone (with columns Tel_NumberKey, Tel_Number, Tel_NumberType e.g. 1=home, 2=mobile)
xref_Person+Telephone (columns PersonKey, Tel_NumberKey, CreatedDate, ModifiedDate)
I'm looking to get the most recent (e.g. the highest Tel_NumberKey) from the xref_Person+Telephone for each Person and use that Tel_NumberKey to get the actual Tel_Number from the Telephone table.
The problem I am having is that I keep getting duplicates for the same Tel_NumberKey. I also need to be sure I get both the home and mobile from the Telephone table, which I've been looking to do via 2 individual joins for each Tel_NumberType - again getting duplicates.
Been trying the following but to no avail:
-- For HOME
SELECT
p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1 -- e.g. Home phone number
AND pn.Tel_NumberKey = (SELECT MAX(pn1.Tel_NumberKey) AS Tel_NumberKey
FROM Person AS p1
INNER JOIN xref_Person+Telephone AS x1 ON p1.PersonKey = x1.PersonKey
INNER JOIN Telephone AS pn1 ON x1.Tel_NumberKey = pn1.Tel_NumberKey
WHERE pn1.Tel_NumberType = 1
AND p1.PersonKey = p.PersonKey
AND pn1.Tel_Number = pn.Tel_Number)
ORDER BY
p.PersonKey
And have been looking over the following links but again keep getting duplicates.
SQL select max(date) and corresponding value
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Server: SELECT only the rows with MAX(DATE)
Am sure this must be possible but been at this a couple of days and can't believe its that difficult to get the most recent / highest value when referencing 3 tables. Any help greatly appreciated.
select *
from
( SELECT p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
, row_number() over (partition by p.PersonKey order by pn.Tel_NumberKey desc) rn
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1
) tt
where tt.rn = 1
ORDER BY
tt.PersonKey
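To get both the home and the mobile number in one pass, the same ROW_NUMBER() idea can be partitioned by person and number type. Below is a rough sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample rows are invented, the date columns of the xref table are left out, and the table name is quoted because of the `+` in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonKey INTEGER PRIMARY KEY);
    CREATE TABLE Telephone (
        Tel_NumberKey  INTEGER PRIMARY KEY,
        Tel_Number     TEXT,
        Tel_NumberType INTEGER            -- 1 = home, 2 = mobile
    );
    CREATE TABLE "xref_Person+Telephone" (PersonKey INTEGER, Tel_NumberKey INTEGER);

    -- Invented sample data: person 1 has two home numbers and one mobile.
    INSERT INTO Person VALUES (1), (2);
    INSERT INTO Telephone VALUES
        (10, '555-0100', 1), (11, '555-0101', 1),
        (12, '555-0199', 2), (13, '555-0200', 1);
    INSERT INTO "xref_Person+Telephone" VALUES (1, 10), (1, 11), (1, 12), (2, 13);
""")

# rn = 1 marks the highest Tel_NumberKey per person and type;
# the outer query folds the two types into one row per person.
rows = conn.execute("""
    SELECT PersonKey,
           MAX(CASE WHEN Tel_NumberType = 1 THEN Tel_Number END) AS home,
           MAX(CASE WHEN Tel_NumberType = 2 THEN Tel_Number END) AS mobile
    FROM (
        SELECT x.PersonKey, t.Tel_Number, t.Tel_NumberType,
               ROW_NUMBER() OVER (PARTITION BY x.PersonKey, t.Tel_NumberType
                                  ORDER BY t.Tel_NumberKey DESC) AS rn
        FROM "xref_Person+Telephone" AS x
        JOIN Telephone AS t ON t.Tel_NumberKey = x.Tel_NumberKey
    )
    WHERE rn = 1
    GROUP BY PersonKey
    ORDER BY PersonKey
""").fetchall()

for row in rows:
    print(row)
```

Note that the partition does not include the phone number itself; partitioning by the number would give every distinct number its own rn = 1 and bring the duplicates back.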
You have to use the max() function and then order by rownum in descending order, like this:
select f.empno
from(select max(empno) empno from emp e
group by rownum)f
order by rownum desc
It will give you all employees from the highest employee number to the lowest. Now apply it to your case and let me know.

SQL display two results side-by-side

I have two tables, and am doing an ordered select on each of them. I would like to see the results of both orders in one result.
Example (simplified):
"SELECT * FROM table1 ORDER BY visits;"
name|# of visits
----+-----------
AA | 5
BB | 9
CC | 12
.
.
.
"SELECT * FROM table2 ORDER BY spent;"
name|$ spent
----+-------
AA | 20
CC | 30
BB | 50
.
.
.
I want to display the results as two columns so I can visually get a feeling if the most frequent visitors are also the best buyers. (I know this example is bad DB design and not a real scenario. It is an example)
I want to get this:
name by visits|name by spent
--------------+-------------
AA | AA
BB | CC
CC | BB
I am using SQLite.
Select A.Name as NameByVisits, B.Name as NameBySpent
From (Select C.*, RowId as RowNumber From (Select Name From Table1 Order by visits) C) A
Inner Join
(Select D.*, RowId as RowNumber From (Select Name From Table2 Order by spent) D) B
On A.RowNumber = B.RowNumber
Try this
select
COALESCE(ts.rn,tv.rn),
ts.name,
tv.name
from
(select *, (select count(*) rn from spent s where s.value>=spent.value ) rn from spent) ts
full outer join
(select *, (select count(*) rn from visits v where v.visits>=visits.visits ) rn from visits) tv
on ts.rn = tv.rn
order by COALESCE(ts.rn,tv.rn)
It creates a rank for each entry in the source table, and joins the two on their rank. If there are duplicate ranks they will return duplicates in the results.
I know it is not a direct answer, but I was searching for it so in case someone needs it: this is a simpler solution for when the results are only one per column:
select
(select roleid from role where rolename='app.roles/anon') roleid, -- the name of the subselect will be the name of the column
(select userid from users where username='pepe') userid; -- same here
Result:
roleid | userid
--------------------------------------+--------------------------------------
31aa33c4-4e66-4da3-8525-42689e46e635 | 12ad8c95-fbef-4287-9834-7458a4b250ee
For RDBMS that support common table expressions and window functions (e.g., SQL Server, Oracle, PostgreSQL), I would use:
WITH most_visited AS
(
SELECT ROW_NUMBER() OVER (ORDER BY num_visits) AS num, name, num_visits
FROM visits
),
most_spent AS
(
SELECT ROW_NUMBER() OVER (ORDER BY amt_spent) AS num, name, amt_spent
FROM spent
)
SELECT mv.name, ms.name
FROM most_visited mv INNER JOIN most_spent ms
ON mv.num = ms.num
ORDER BY mv.num
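Since the question is about SQLite: versions 3.25 and later also support window functions, so the same pattern should work there. A small sketch with Python's sqlite3 module, using the table1/table2 data from the question (the column names `visits` and `spent` are taken from its ORDER BY clauses; the CTE names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, visits INTEGER);
    CREATE TABLE table2 (name TEXT, spent  INTEGER);
    INSERT INTO table1 VALUES ('AA', 5),  ('BB', 9),  ('CC', 12);
    INSERT INTO table2 VALUES ('AA', 20), ('CC', 30), ('BB', 50);
""")

# Number each ordered list independently, then pair rows by position.
rows = conn.execute("""
    WITH by_visits AS (
        SELECT name, ROW_NUMBER() OVER (ORDER BY visits) AS num FROM table1
    ),
    by_spent AS (
        SELECT name, ROW_NUMBER() OVER (ORDER BY spent) AS num FROM table2
    )
    SELECT v.name AS name_by_visits, s.name AS name_by_spent
    FROM by_visits v
    JOIN by_spent s ON v.num = s.num
    ORDER BY v.num
""").fetchall()

for row in rows:
    print(row)
```

With an inner join, rows are only shown up to the length of the shorter list.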
Just join table1 and table2 with name as the key, like below:
select a.name,
b.name,
a.NumOfVisitField,
b.TotalSpentField
from table1 a
left join table2 b on a.name = b.name

How to display the record with the highest value in Oracle?

I have 4 tables with the following structure:
Table artist:
artistID lastname firstname nationality dateofbirth datedcease
Table work:
workId title copy medium description artist ID
Table Trans:
TransactionID Date Acquired Acquistionprice datesold askingprice salesprice customerID workID
Table Customer:
customerID lastname Firstname street city state zippostalcode country areacode phonenumber email
The first question is: which artist has the most works of art sold, and how many of that artist's works have been sold?
My SQL query is this:
SELECT * From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc;
I get the whole table, but I only want to show the first row, which has the highest count. How do I do that?
I have tried inserting WHERE ROWNUM <= 1, but it shows the artist with ID 1.
Question 2 is: sales of which artist's works have resulted in the highest average profit (i.e. the average of the profits made on each sale of works by an artist), and what is that amount?
My SQL query is:
SELECT A1.artistid, A1.firstname FROM
(
SELECT
(salesPrice - AcquisitionPrice) as profit,
w1.artistid as ArtistID
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
on W1.workid = T1.workid
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artistID = TEMP1.artistID
GROUP BY A1.artistid
HAVING MAX(PROFIT) = AVG(PROFIT);
I'm not able to execute it.
I have also tried the query below, but I keep getting the error "missing right parenthesis":
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
ORDER BY avgProfit DESC
LIMIT 1
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artisid = TEMP1.artistid
Sometimes ORA-00907: missing right parenthesis means exactly that: we have a left bracket without a matching right one. But it can also be thrown by a syntax error in a part of a statement bounded by parentheses.
It's that second cause here: LIMIT is a MySQL clause which Oracle does not recognise. You can use an analytic function here:
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
select artistid
, avgProfit
, rank() over (order by avgProfit desc) as rnk
from (
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
)
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artistid = TEMP1.artistid
where TEMP1.rnk = 1
This uses the RANK() function, which will return more than one row if several artists achieve the same average profit. You might want to use ROW_NUMBER() instead. Analytic functions can be very powerful.
You can apply ROW_NUMBER(), RANK() and DENSE_RANK() to any top-n problem. You can use one of them to solve your first problem too.
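To see the difference between the two functions on ties, here is a toy example run through Python's sqlite3 module instead of Oracle (window functions need SQLite 3.25 or later). The `avg_profit` table and its numbers are invented; it stands in for the result of the inner aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented data: artists 1 and 2 tie for the best average profit.
    CREATE TABLE avg_profit (artistid INTEGER, avgProfit REAL);
    INSERT INTO avg_profit VALUES (1, 500.0), (2, 500.0), (3, 120.0);
""")

# RANK() gives both tied artists rank 1, so both rows come back.
top_by_rank = conn.execute("""
    SELECT artistid
    FROM (SELECT artistid,
                 RANK() OVER (ORDER BY avgProfit DESC) AS rnk
          FROM avg_profit)
    WHERE rnk = 1
    ORDER BY artistid
""").fetchall()

# ROW_NUMBER() hands out 1 exactly once; artistid is the tie-breaker here.
top_by_row_number = conn.execute("""
    SELECT artistid
    FROM (SELECT artistid,
                 ROW_NUMBER() OVER (ORDER BY avgProfit DESC, artistid) AS rn
          FROM avg_profit)
    WHERE rn = 1
""").fetchall()

print(top_by_rank, top_by_row_number)
```

Without the extra `artistid` sort key, which of the tied artists gets row number 1 would be arbitrary.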
"however the avg profit is null."
That's probably a data issue. If one of the numbers in (salesPrice - AcquisitionPrice) is null the result will be null, and won't be included in the average. If all the rows for an artist are null the AVG() will be null.
As it happens, Oracle's default sort order puts NULL last for ascending sorts and first for descending ones. Because the ORDER BY in the analytic clause sorts by avgProfit DESC, the NULL results land at rank 1. The solution is to use NULLS LAST in that ORDER BY:
, rank() over (order by avgProfit desc nulls last) as rnk
This will guarantee you a non-null result at the top (providing at least one of your artists has values in both columns).
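The AVG() behaviour with NULLs is easy to check on a toy table. The sketch below uses Python's sqlite3 module, and the `sales` table and its values are made up. It only illustrates the aggregate; SQLite's default placement of NULLs in a sort differs from Oracle's, so the ranking part is deliberately left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented data: artist 1 has one complete sale and one with no sales price,
    -- artist 2 has only a sale with no sales price.
    CREATE TABLE sales (artistid INTEGER, salesPrice REAL, acquisitionPrice REAL);
    INSERT INTO sales VALUES
        (1, 300.0, 100.0),
        (1, NULL,  100.0),
        (2, NULL,   50.0);
""")

# NULL - 100 is NULL, and AVG() skips NULL inputs:
# artist 1 averages over the single non-null profit, artist 2 has nothing to average.
rows = conn.execute("""
    SELECT artistid, AVG(salesPrice - acquisitionPrice) AS avgProfit
    FROM sales
    GROUP BY artistid
    ORDER BY artistid
""").fetchall()

for row in rows:
    print(row)
```

Artist 2 ends up with a NULL average, which is the kind of row that NULLS LAST pushes to the bottom of the ranking.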
1st question - Oracle does not guarantee the order by which rows are retrieved. Hence you must first order and then limit the ordered set.
SELECT * from (
SELECT A1.* From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc
) WHERE ROWNUM = 1
2nd question: I believe (haven't tested) that you have that LIMIT 1 wrong. In Oracle that keyword is used with BULK COLLECT in PL/SQL, not in a plain SELECT.