SELECTing only one copy of a row with a specific key that is coming from multiple tables - sql

I am new to SQL so bear with me. I am returning data from multiple tables. Followed is my SQL (let me know if there is a better approach):
SELECT [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description], [DailyTaskHours].[ActivityDate], [Application].[AppName], [SupportCatagory].[Catagory], [DailyTaskHours].[PK_DailyTaskHours],n [NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory], [DailyTaskHours], [Application], [SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
ORDER BY [DailyTaskHours].[ActivityDate] DESC
This is what is being returned:
This is nearly correct. I only want it to return one copy of PK_NonScrumStory though and I can't figure out how. Essentially, I only want it to return one copy so one of the top two rows would not be returned.

You could group by the NonScrumStore columns, and then aggregate the other columns like this:
SELECT [NonScrumStory].[IncidentNumber],
[NonScrumStory].[Description],
MAX( [DailyTaskHours].[ActivityDate]),
MAX( [Application].[AppName]),
MAX([SupportCatagory].[Catagory]),
MAX([DailyTaskHours].[PK_DailyTaskHours]),
[NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory],
[DailyTaskHours],
[Application],
[SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
group by [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description],[NonScrumStory].[PK_NonScrumStory]
ORDER BY 3 DESC

From the screenshot it seems DISTINCT should have solved your issue but if not you could use the ROW_NUMBER function.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [NonScrumStory].[PK_NonScrumStory] ORDER BY [DailyTaskHours].[ActivityDate] DESC) AS RowNum,
[NonScrumStory].[IncidentNumber], [NonScrumStory].[Description], [DailyTaskHours].[ActivityDate], [Application].[AppName], [SupportCatagory].[Catagory], [DailyTaskHours].[PK_DailyTaskHours],n [NonScrumStory].[PK_NonScrumStory]
FROM [NonScrumStory], [DailyTaskHours], [Application], [SupportCatagory]
WHERE ([NonScrumStory].[UserId] = 26)
AND ([NonScrumStory].[PK_NonScrumStory] = [DailyTaskHours].[NonScrumStoryId])
AND ([NonScrumStory].[CatagoryId] = [SupportCatagory].[PK_SupportCatagory])
AND ([NonScrumStory].[ApplicationId] = [Application].[PK_Application])
AND ([NonScrumStory].[Deleted] != 1)
AND [DailyTaskHours].[ActivityDate] >= '1/1/1990'
)
SELECT * FROM CTE WHERE RowNum = 1 ORDER BY [ActivityDate] DESC

I believe if you add DISTINCT to your query that should solve your problem. Like so
SELECT DISTINCT [NonScrumStory].[IncidentNumber], [NonScrumStory].[Description],...

Related

"ORA-00923: FROM keyword not found where expected\n what should I fix

I have an oracle query as follows but when I make changes to pagination the results are different. what should i pass for my code
SELECT *
FROM (
SELECT b.*,
ROWNUM r__
FROM (
select a.KODE_KLAIM,
a.NO_SS,
a.LA,
a.NAMA_TK,
a.KODE_K,
(
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K and rownum=1
) TEM_LAHIR,
(
select TO_CHAR(tk.TLAHIR, 'DD/MM/RRRR')
from KN.VW_KTK tk
where tk.KODE_K = a.KODE_K
and rownum=1
) TLAHIR
from PN.KLAIM a
where nvl(a.STATUS_BATAL,'X') = 'T'
and A.NOMOR IS NOT NULL
and A.TIPE_KLAIM = 'JPN01'
)b
)
where 1 = 1
WHERE ROWNUM < ( ( ? * ? ) + 1 )
WHERE r__ >= ( ( ( ? - 1 ) * ? ) + 1 )
but i run this query i have result ORA-00900: invalid SQL statement
You have three WHERE clauses at the end (and no ORDER BY clause). To make it syntactically valid you could change the second and third WHERE clauses to AND.
However, you mention pagination so what you probably want is to use:
SELECT *
FROM (
SELECT b.*,
ROWNUM r__
FROM (
select ...
from ...
ORDER BY something
)b
WHERE ROWNUM < :page_size * :page_number + 1
)
WHERE r__ >= ( :page_number - 1 ) * :page_size + 1
Note: You can replace the named bind variables with anonymous bind variables if you want.
Or, if you are using Oracle 12 or later then you can use the OFFSET x ROWS FETCH FIRST y ROWS ONLY syntax:
select ...
from ...
ORDER BY something
OFFSET (:page_number - 1) * :page_size ROWS
FETCH FIRST :page_size ROWS ONLY;
Additionally, you have several correlated sub-queries such as:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K and rownum=1
This will find the first matching row that the SQL engine happens to read from the datafile and is effectively finding a random row. If you want a specific row then you need an ORDER BY clause and you need to filter using ROWNUM AFTER the ORDER BY clause has been applied.
From Oracle 12, the correlated sub-query would be:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K
ORDER BY something
FETCH FIRST ROW ONLY

Improve SQL query performance. UNION vs OR in this situation

Problem
So the situation that I am facing here with this SQL Query, is that it is taking about 12 seconds to run making the screen super slow.
Goal
Do the necessary changes in order to improve the performance and make it faster. I was thinking about instead of the OR in the Where clause to use the UNION?
SELECT Tool.*, Interview.*
FROM Tool
INNER JOIN Interview ON Interview.Id = Tool.InterviewId
WHERE (Tool.ToolTypeId = #ToolTypeId
AND Tool.Is_Active = 1
AND Tool.InterviewId = #InterviewId
AND Tool.ToolId = #ToolId
AND Tool.CustomerId = #CustomerId)
OR Tool.Id = (
SELECT TOP 1 SubTool.Id
FROM Tool SubTool
INNER JOIN Interview subInterview ON subInterview.Id = SubTool.ToolId
WHERE SubTool.ToolTypeId = #ToolTypeId
AND SubTool.Is_Active = 1
AND SubTool.InterviewId != #InterviewId
AND SubTool.ToolId = #ToolId
AND subTool.CustomerId = #CustomerId
AND convert(datetime, subTool.DateTime, 120) < #ToolDateTime
ORDER BY subTool.DateTime DESC, subTool.StartDate DESC,
subTool.EndDate, subTool.Id DESC
)
ORDER BY Tool.StartDate, Tool.Id
NOTE: I believe the actual query output is not necessary in this case, since we are looking for some structural issues that might be impacting the performance.
I would suggest rephrasing the query to eliminate the subquery in the WHERE clause.
If you are looking for one row in the result set regardless of conditions, you can use:
SELECT TOP (1) Tool.*, Interview.*
FROM Tool JOIN
Interview
ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId AND
Tool.Is_Active = 1
Tool.ToolId = #ToolId AND
Tool.CustomerId = #CustomerId AND
(Tool.InterviewId = #InterviewId OR
convert(datetime, Tool.DateTime, 120) < #ToolDateTime
)
ORDER BY (CASE WHEN Tool.InterviewId = #InterviewId THEN 1 ELSE 2 END),
Tool.DateTime DESC, sTool.StartDate DESC, Tool.EndDate, Tool.Id DESC;
Your final ORDER BY suggests that you are expecting more than one row for the first condition. So, you can use a subquery and window functions:
SELECT ti.*
FROM (SELECT Tool.*, Interview.*,
COUNT(*) OVER (PARTITION BY (CASE WHEN Tool.InterviewId = #InterviewId THEN 1 ELSE 0 END)) as cnt_interview_match,
ROW_NUMBER() OVER (ORDER BY subTool.DateTime DESC, subTool.StartDate DESC, subTool.EndDate, subTool.Id DESC) as seqnum
FROM Tool JOIN
Interview
ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId AND
Tool.Is_Active = 1
Tool.ToolId = #ToolId AND
Tool.CustomerId = #CustomerId AND
(Tool.InterviewId = #InterviewId OR
convert(datetime, Tool.DateTime, 120) < #ToolDateTime
)
) ti
WHERE InterviewId = #InterviewId OR
(cnt_interview_match = 0 AND seqnum = 1);
Note that the subquery requires that the columns have different names, so you might need to fiddle with that.
Then, you want an index on TOOL(ToolTypeId, Is_Active, ToolId, CustomerId, InterviewId, DateTime). I assume that Interview(Id) is already indexed as the primary key of the table.
My interpretation of what your query does (and by inference, what you want it to do) is different from #GordonLinoff's.
Gordon's queries could be paraphrased as...
If there are rows with Tool.InterviewId = #InterviewId...
Return those rows and only those rows
If there are no such rows to return...
Return the latest row for other InterviewId's
My understanding of what your query actually does is that you want both possibilities returned together, at all times.
This ambiguity is an example of why you really should include example data.
For example, this would be a re-wording or your existing query (using UNION as per your own suggestion)...
WITH
filter_tool_customer AS
(
SELECT Tool.*, Interview.*
FROM Tool
INNER JOIN Interview ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId
AND Tool.Is_Active = 1
AND Tool.ToolId = #ToolId
AND Tool.CustomerId = #CustomerId
),
matched AS
(
SELECT *
FROM filter_tool_customer
WHERE InterviewId = #InterviewId
),
mismatched AS
(
SELECT TOP 1 *
FROM filter_tool_customer
WHERE InterviewId <> #InterviewId
AND DateTime < CONVERT(VARCHAR(20), #ToolDateTime, 120)
ORDER BY DateTime DESC,
StartDate DESC,
EndDate,
Id DESC
),
combined AS
(
SELECT * FROM matched
UNION ALL
SELECT * FROM mismatched
)
SELECT * FROM combined ORDER BY StartDate, Id

Impala SQL Query

Error Message :
select list expression not produced by aggregation output (missing
from GROUP BY clause?): CASE WHEN (flag = 1) THEN date_add(lead_ctxdt,
-1) ELSE ctx_date END lot_endt
code :
select c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt, min(c.ctx_date) as lot_stdt,
case when (flag = 1 ) then date_add(lead_ctxdt, -1)
else ctx_date
end as lot_endt
from
(
select p.*,
case when (ctx_regimen <> lead_ctx) then 1
else 0
end as flag
from
(
select a.*, lead(a.ctx_regimen, 1) over(partition by enrolid order by ctx_date) as lead_ctx,
lead(ctx_date, 1) over (partition by enrolid order by ctx_date) as lead_ctxdt
from
(
select enrolid, ctx_date, group_concat(distinct ctx_codes) as ctx_regimen
from lotinfo
where ctx_date between ctx_date and date_add(ctx_date, 5)
group by enrolid, ctx_date
) as a
) as p
) as c
group by c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt
I want to get the lead_ctx date minus one as the date when the flag is 1
So i found the answer by executing a couple of times the minor changes. Let me tell you, that when you are trying to min or max alongside you have group_conact in the same query then in Impala this doesn't work. You have to write it in two queries per one more sub query and the min() of something in the outer query or vice versa.
Thank you #dnoeth for letting me understand I have the answer with me already.

Query with equation

I have 3 queries that return 3 values. I'd Like to join the queries to perform the following expression:
(MgO + CaO)/SiO2
How can I do that?
MgO:
SELECT sampled_date, result
FROM AF_VW WHERE element = 'MgO' AND ROWNUM = 1 ORDER BY sampled_date DESC;
CaO:
SELECT sampled_date, result
FROM AF_VW WHERE element = 'CaO' AND ROWNUM = 1 ORDER BY sampled_date DESC;
SiO2:
SELECT sampled_date, result
FROM AF_VW WHERE element = 'SiO2' AND ROWNUM = 1 ORDER BY sampled_date DESC;
with x as (
SELECT sampled_date, result, element,
row_number() over(partition by element order by sampled_date desc) rn
FROM AF_VW)
, y as (
select
case when element = 'MgO' then result end as MGO,
case when element = 'CaO' then result end as CaO,
case when element = 'SiO2' then result end as SiO2
FROM x where rn = 1)
select (mgo+cao)/sio2 from y;
You can use row_number function instead of rownum and then select the results for the 3 elements.
This is a bit long for a comment.
The queries in your question are probably not doing what you expect. Oracle evaluates the WHERE clause before the order by. So, the following chooses one arbitrary row with MgO and then does the trivial ordering of the one row by date:
SELECT sampled_date, result
FROM AF_VW
WHERE element = 'MgO' AND ROWNUM = 1
ORDER BY sampled_date DESC;
Really, to get the equivalent result, you would need to emulate the same, unstable logic. Unstable, because the results are not guaranteed to be the same if the query is run multiple times:
with mg as (
SELECT sampled_date, result
FROM AF_VW
WHERE element = 'MgO' AND ROWNUM = 1
),
coa as (
SELECT sampled_date, result
FROM AF_VW
WHERE element = 'CaO' AND ROWNUM = 1
),
sio2 as (
SELECT sampled_date, result
FROM AF_VW
WHERE element = 'SiO2' AND ROWNUM = 1
)
select (mgo.result + cao.result) / sio2.result
from mgo cross join cao cross join sio2;
I suspect you really want the most recent sample date, which is what VKP's answer provides. I just thought you should know that is not what your current queries are doing.

SQL ROW_NUMBER with INNER JOIN

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity
You need to stick it in a table expression to filter on ROW_NUMBER. You won't be able to use * as it will complain about the column name starRating appearing more than once so will need to list out the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN
You can use a with clause. Please try the following
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10;