I am new to the SQL world and working with the query below; the table contains 3,000,000+ records. Can you please suggest how to reduce the query run time, or another query that gives the same result?
I tried two queries:
#1
SELECT *
FROM (SELECT ID,
Priority,
Agent_Name,
Urgency,
Status,
Agent_Group_Name,
Country,
Region,
Due_by,
Type,
Created_Date,
Resolved_Date,
Closed_Date,
Resolution_Status,
Requester_Location,
WH_Region,
ExecDate,
Date,
Full_Date,
Datatype,
Department_Name,
Requester_Emails,
ROW_NUMBER()
OVER (
PARTITION BY ID
ORDER BY Execdate DESC ) nn
FROM weekly_tickets
WHERE Created_date >= '2022-01-01 12:00:00 AM') sub_table
WHERE sub_table.nn = 1
#2
WITH cte
AS (SELECT ID,
Priority,
Agent_Name,
Urgency,
Status,
Category,
Item_Category,
Agent_Group_Name,
What_is_the_Impact_,
Country,
Impact,
Region,
Resolution_Time_in_Bhrs,
Sub_Category,
Due_by,
Type,
Issue_Owner,
Created_Date,
Number_of_Users,
Approval_Status,
Resolved_Date,
Closed_Date,
How_is_the_issue_affecting_the_service_,
Number_of_Users_staffed,
Resolution_Status,
Sites,
Requester_Location,
Number_of_Users_affected,
WH_Region,
CampaignOriginId,
ExecDate,
Date,
Full_Date,
AgeEvol,
Datatype,
Department_Name,
Requester_Emails,
ROW_NUMBER()
OVER (
PARTITION BY ID
ORDER BY Execdate DESC ) nn
FROM weekly_tickets
WHERE Created_date >= '2022-01-01 12:00:00 AM')
SELECT *
FROM cte
WHERE cte.nn = 1
Always read the execution plan, or show it to someone else if it's too complex for you. This query's plan seems pretty obvious to reconstruct with a high degree of certainty. I assume the following optimization steps:
Scan the table and filter by Created_date, returning all columns required by the next steps (here, all columns used by the SELECT clause),
Order by ID, ExecDate DESC
Segment
[..]
The filter comes from the WHERE clause. The ordering and segmenting come from the OVER clause. I also assume that the date filter significantly reduces the number of rows returned. Since there are fewer rows to process, the query should perform better.
All that means you should start with the following index:
CREATE INDEX IX_Predicted ON weekly_tickets(Created_Date) INCLUDE (ID,
Priority,
Agent_Name,
Urgency,
Status,
Category,
Item_Category,
Agent_Group_Name,
What_is_the_Impact_,
Country,
Impact,
Region,
Resolution_Time_in_Bhrs,
Sub_Category,
Due_by,
Type,
Issue_Owner,
Number_of_Users,
Approval_Status,
Resolved_Date,
Closed_Date,
How_is_the_issue_affecting_the_service_,
Number_of_Users_staffed,
Resolution_Status,
Sites,
Requester_Location,
Number_of_Users_affected,
WH_Region,
CampaignOriginId,
ExecDate,
Date,
Full_Date,
AgeEvol,
Datatype,
Department_Name,
Requester_Emails);
You could create an index to help with performance:
CREATE NONCLUSTERED INDEX index_POC_ticket
ON dbo.weekly_tickets (ID, Execdate DESC)
WITH (DROP_EXISTING = ON);
Edit: added DESC to the Execdate column in the index, since I believe it should support the ordering in your window function.
If ID is the primary key, I don't believe you need to include it in the above index (maybe someone more knowledgeable can comment here).
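As a rough illustration of why covering the query with an index helps, here is a minimal SQLite sketch run from Python (toy schema and data, not the poster's table). It shows the plan switching from a full table scan to a covering-index search once an index on the filter column, extended with the selected columns, exists:

```python
import sqlite3

# Minimal sketch (toy schema, not the poster's table): a covering index on the
# filtered column -- roughly what INCLUDE achieves in SQL Server -- lets the
# engine answer the query from the index instead of scanning the table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weekly_tickets (ID INTEGER, Created_Date TEXT, ExecDate TEXT)")
con.executemany(
    "INSERT INTO weekly_tickets VALUES (?, ?, ?)",
    [(i % 10, f"2022-0{1 + i % 9}-01", f"2022-0{1 + i % 9}-15") for i in range(100)],
)

query = "SELECT ID, ExecDate FROM weekly_tickets WHERE Created_Date >= '2022-03-01'"

before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
con.execute("CREATE INDEX IX_Created ON weekly_tickets (Created_Date, ID, ExecDate)")
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before[0][-1])  # a full table scan
print(after[0][-1])   # a search using the covering index
```

SQLite has no INCLUDE clause, so the extra columns are simply appended to the key; the effect on the plan is the same idea as SQL Server's included columns answering the query without base-table lookups.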
I should think that your query can be rewritten as:
SELECT ID,
Priority,
Agent_Name,
Urgency,
STATUS,
Agent_Group_Name,
Country,
Region,
Due_by,
Type,
Created_Date,
Resolved_Date,
Closed_Date,
Resolution_Status,
Requester_Location,
WH_Region,
ExecDate,
Date,
Full_Date,
Datatype,
Department_Name,
Requester_Emails
FROM weekly_tickets AS wt
WHERE Created_date >= '2022-01-01 12:00:00 AM'
AND NOT EXISTS(SELECT *
               FROM weekly_tickets AS t
               WHERE wt.ID = t.ID
                 AND wt.Execdate < t.Execdate);
And it can be much more efficient with the following index:
CREATE INDEX X ON weekly_tickets (ID, Execdate);
NOT EXISTS may or may not be faster here — test it!
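Both forms can be checked for equivalence on a toy table. The sketch below (SQLite from Python, with a hypothetical three-column cut of the schema) shows the ROW_NUMBER() query from the question and the NOT EXISTS rewrite returning the same latest-row-per-ID result:

```python
import sqlite3

# Sketch on a hypothetical mini-schema: both the ROW_NUMBER() dedup and the
# NOT EXISTS rewrite keep one row per ID -- the one with the latest ExecDate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weekly_tickets (ID INTEGER, ExecDate TEXT, Status TEXT)")
con.executemany("INSERT INTO weekly_tickets VALUES (?, ?, ?)", [
    (1, "2022-01-10", "Open"),
    (1, "2022-02-10", "Closed"),   # latest row for ID 1
    (2, "2022-01-05", "Open"),     # only row for ID 2
])

row_number_version = con.execute("""
    SELECT ID, ExecDate, Status
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ExecDate DESC) nn
          FROM weekly_tickets)
    WHERE nn = 1 ORDER BY ID""").fetchall()

not_exists_version = con.execute("""
    SELECT ID, ExecDate, Status
    FROM weekly_tickets wt
    WHERE NOT EXISTS (SELECT 1 FROM weekly_tickets t
                      WHERE t.ID = wt.ID AND t.ExecDate > wt.ExecDate)
    ORDER BY ID""").fetchall()

print(row_number_version)  # [(1, '2022-02-10', 'Closed'), (2, '2022-01-05', 'Open')]
assert row_number_version == not_exists_version
```

Note the join condition inside NOT EXISTS must be on the *same* ID (`t.ID = wt.ID`); comparing different IDs would discard almost every row.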
Related
I need to track the monthly median transaction price of certain products, but so far my query isn't working at all.
This is the raw table and the result that I expect to see:
the raw:
the result expected:
So far I have tried writing a query similar to this:
SELECT
DISTINCT year_confirm, month_confirm, item_info, median_price
FROM (SELECT
item_info,
product_price,
extract(month from created_date) as month_confirm,
extract(year from created_date) as year_confirm,
PERCENTILE_CONT(product_price, 0.5) OVER (PARTITION BY item_info)
AS median_price
FROM
table_name
order by item_info asc, year_confirm asc, month_confirm asc
)
but the result shows the same median price for the product in every month. Is there any solution for this? Thank you.
Regarding your SQL, adding the columns you want to the PARTITION BY clause will help.
SELECT DISTINCT item_info, month_confirm, year_confirm,
       PERCENTILE_CONT(product_price, 0.5) OVER (PARTITION BY item_info, month_confirm, year_confirm) as median_price
FROM (SELECT item_info,
             product_price,
             extract(month from created_date) as month_confirm,
             extract(year from created_date) as year_confirm
      FROM table_name)
ORDER BY item_info asc, year_confirm asc, month_confirm asc
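The effect of the PARTITION BY change can be sanity-checked outside the database. This small Python sketch (made-up sample rows) computes the medians by hand and shows why partitioning by item alone repeats one value across every month:

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows: (item_info, created_date, product_price).
rows = [
    ("widget", "2023-01-05", 10.0),
    ("widget", "2023-01-20", 30.0),
    ("widget", "2023-02-11", 50.0),
]

# Partitioning by item only (the original query) mixes all months together:
# a single median is repeated for every month of that item.
overall = statistics.median(price for _, _, price in rows)

# Partitioning by (item, year, month) -- the fix -- yields one median per month.
groups = defaultdict(list)
for item, date, price in rows:
    year, month = date[:4], date[5:7]
    groups[(item, year, month)].append(price)
per_month = {key: statistics.median(prices) for key, prices in groups.items()}

print(overall)    # 30.0 -- one value for all months
print(per_month)  # {('widget', '2023', '01'): 20.0, ('widget', '2023', '02'): 50.0}
```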
I am currently using Postgres, and there is this SQL view that is used to generate employee check-in and check-out times based on date. But it's flawed.
CREATE OR REPLACE VIEW view_test
AS
SELECT row_number() OVER () AS id,
a.created_at::date AS created_at,
date_part('year'::text, a.created_at) AS year,
date_part('month'::text, a.created_at) AS month,
date_part('day'::text, a.created_at) AS day,
date_part('dow'::text, a.created_at) AS dow,
a.company_id,
a.employee_id,
e.employee_type,
array_agg(
CASE
WHEN a.activity_type = 1
THEN a.created_at::time(0)
ELSE NULL::time
END) AS time_in,
array_agg(
CASE
WHEN a.activity_type = 2
THEN a.created_at::time(0)
ELSE NULL::time
END) AS time_out
FROM attendance_table a
LEFT JOIN employee_table e ON e.id = a.employee_id
GROUP BY a.created_at, date_part('day'::text, a.created_at),
a.employee_id, a.company_id, e.employee_type
ORDER BY date_part('year'::text, a.created_at), date_part('month'::text, a.created_at),
date_part('day'::text, a.created_at), a.employee_id;
This generates this result:
I am trying to generate it such that time_in and time_out are consolidated based on created_at (date) and employee_id. The idea is that for each date, I would know the employee's check-in and check-out times. Caveat: {NULL} should not appear if there are records in the array.
view and fields of the data I am trying to manipulate
https://docs.google.com/spreadsheets/d/1hn3w0mnezrV6_f-fPAKPZHuqdDjBBc_ArcgeBmgblq4/edit?usp=sharing
How can I modify the window-function SQL view above so that time_in and time_out are consolidated based on created_at (date) and employee_id? (The last screenshot is the desired output.) Caveat: {NULL} should not appear if there are records in the array.
UPDATE (very close answer but not quite solved):
CREATE OR REPLACE VIEW view_group
as
SELECT
created_at,
year,
month,
day,
dow,
company_id,
employee_id,
array_agg(check_in) as time_in,
array_agg(check_out) as time_out
FROM
view_test, unnest(time_in) check_in, unnest(time_out) check_out
GROUP BY 1,2,3,4,5,6,7
it produces this:
I managed to concatenate the arrays using unnest. But notice that NULL is still there. How do I remove NULL (while keeping it when there are no records at all)? Or anything else that makes sense and is possible.
Without knowing your original data I can only work with your current result.
To transform your result into the expected one, you can simply add a grouping step:
SELECT
created_at,
year,
month,
day,
dow,
company_id,
employee_id,
MAX(time_in) as time_in,
MAX(time_out) as time_out
FROM
-- <your query>
GROUP BY 1,2,3,4,5,6,7
Edit: OP added the original data.
You need a simple pivot step. This can be achieved using a conditional aggregation (GROUP BY and FILTER clauses):
demo:db<>fiddle
SELECT
created_at::date,
company_id,
employee_id,
MAX(created_at::time) FILTER (WHERE activity_type = 1),
MAX(created_at::time) FILTER (WHERE activity_type = 2)
FROM
attendance
GROUP BY created_at::date, company_id, employee_id
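The same conditional-aggregation pivot can be sketched in SQLite from Python (toy table and column names assumed from the question; SQLite supports the FILTER clause on aggregates just as Postgres does):

```python
import sqlite3

# Sketch of the FILTER-based pivot: one row per employee per day, with the
# check-in (activity_type = 1) and check-out (activity_type = 2) pulled out
# by conditional aggregation.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE attendance
               (created_at TEXT, company_id INTEGER,
                employee_id INTEGER, activity_type INTEGER)""")
con.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?)", [
    ("2023-05-01 08:02:00", 1, 7, 1),  # check in
    ("2023-05-01 17:30:00", 1, 7, 2),  # check out
])

rows = con.execute("""
    SELECT date(created_at), company_id, employee_id,
           MAX(time(created_at)) FILTER (WHERE activity_type = 1) AS time_in,
           MAX(time(created_at)) FILTER (WHERE activity_type = 2) AS time_out
    FROM attendance
    GROUP BY date(created_at), company_id, employee_id""").fetchall()

print(rows)  # [('2023-05-01', 1, 7, '08:02:00', '17:30:00')]
```

When an employee has no check-out for a day, the filtered aggregate simply returns NULL for that column, which matches the question's caveat.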
I have a sessions table with traffic_id, date, start_time, session_id, page, platform, page_views, revenue, segment_id, and customer_id columns. Each customer_id can have multiple session_ids with different revenue/date/start_time/page/platform/page_views/segment_id values. Sample data is shown below.
traffic_id|date|start_time|session_id|page|platform|page_views|revenue|segment_id|customer_id
303|1/1/2017|05:23:33|123457080|homepage|mobile|581|37.40|1|310559
I would like to know the max session revenue per customer and the session sequence number as the table shown below.
Customer_id|Date|Maximum_session_revenue|Session_id|Session_Sequence
138858|1/13/17|100.44|123458749|5
I thought I could just use a subquery to do the job. But all the ranking values are 1, and session_id and date are wrong. Please help!
SELECT max(revenue),customer_id, date, session_id, session_sequence
FROM (
SELECT
revenue,
date,
customer_id,
session_id,
RANK() OVER(partition by customer_id ORDER BY date,start_time ASC) AS session_sequence
FROM sessions
) AS a
group by customer_id
;
Your query should generate an error because the GROUP BY columns and SELECT columns are inconsistent.
Presumably you want the maximum revenue and the sequence number where that occurs.
SELECT s.*
FROM (SELECT s.*,
RANK() OVER (partition by customer_id ORDER BY date, start_time ASC) AS session_sequence,
MAX(revenue) OVER (PARTITION BY customer_id) as max_revenue
FROM sessions
) s
WHERE revenue = max_revenue;
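The answer's pattern can be sketched on a toy sessions table in SQLite from Python (made-up rows): the window MAX attaches each customer's peak revenue to every row, and the outer filter keeps only the row(s) where that peak occurs, with the sequence number intact:

```python
import sqlite3

# Toy data: three sessions for one customer; the second one has the peak revenue.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sessions
               (customer_id INTEGER, date TEXT, start_time TEXT,
                session_id INTEGER, revenue REAL)""")
con.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?, ?)", [
    (1, "2017-01-01", "09:00", 100, 20.0),
    (1, "2017-01-02", "10:00", 101, 80.0),  # max revenue, session sequence 2
    (1, "2017-01-03", "11:00", 102, 50.0),
])

rows = con.execute("""
    SELECT customer_id, session_id, revenue, session_sequence
    FROM (SELECT s.*,
                 RANK() OVER (PARTITION BY customer_id
                              ORDER BY date, start_time) AS session_sequence,
                 MAX(revenue) OVER (PARTITION BY customer_id) AS max_revenue
          FROM sessions s)
    WHERE revenue = max_revenue""").fetchall()

print(rows)  # [(1, 101, 80.0, 2)]
```

Unlike the original GROUP BY attempt, no column here is aggregated away, so date, session_id, and the sequence number all stay attached to the winning row.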
I am facing a simple problem with an SQL query that I do not know how to tackle.
I have a table with the following structure
CITY COUNTRY DATES TEMPERATURE
Note that for a given country, I can have several cities. And, for a given city, I have several rows giving me the TEMPERATURE at each available DATE. This is just a time series.
I would like to write a query which gives me, for every city, the DATE where the TEMPERATURE is the MIN and the DATE where the TEMPERATURE is the MAX. The query should return something like this:
CITY COUNTRY DATE_MIN_TEMPERATURE MIN_TEMPERATURE DATE_MAX_TEMPERATURE MAX_TEMPERATURE
Any idea on how to achieve this?
Best regards,
Deny
Oracle provides keep/dense_rank first for this purpose:
select city,
min(temperature) as min_temperature,
max(date) keep (dense_rank first order by temperature asc) as min_temperature_date,
max(temperature) as max_temperature,
max(date) keep (dense_rank first order by temperature desc) as max_temperature_date
from t
group by city;
Note that this returns only one date if there are ties. If you want to handle that, more logic is needed:
select city, min(temperature) as min_temperature,
listagg(case when seqnum_min = 1 then date end, ',') within group (order by date) as mindates,
max(temperature) as max_temperature,
listagg(case when seqnum_max = 1 then date end, ',') within group (order by date) as maxdates
from (select t.*,
rank() over (partition by city order by temperature) as seqnum_min,
rank() over (partition by city order by temperature desc) as seqnum_max
from t
) t
where seqnum_min = 1 or seqnum_max = 1
group by city;
In Oracle 11 and above, you can use PIVOT. In the solution below I use LISTAGG to show all the dates in case of ties. Another option is, in the case of ties, to show the most recent date when the extreme temperature was reached; if that is preferred, simply replace LISTAGG(dt, ....) (including the WITHIN GROUP clause) with MAX(dt). However, in that case the first solution offered by Gordon (using the first function) is more efficient anyway - no need for pivoting.
Note that I changed "date" to "dt" - DATE is a reserved word in Oracle. I also show the rows by country first, then city (the more logical ordering). I created test data in a WITH clause, but the solution is everything below the comment line.
with
inputs ( city, country, dt, temperature ) as (
select 'Palermo', 'Italy' , date '2014-02-13', 3 from dual union all
select 'Palermo', 'Italy' , date '2002-01-23', 3 from dual union all
select 'Palermo', 'Italy' , date '1998-07-22', 42 from dual union all
select 'Palermo', 'Italy' , date '1993-08-24', 30 from dual union all
select 'Maseru' , 'Lesotho', date '1994-01-11', 34 from dual union all
select 'Maseru' , 'Lesotho', date '2004-08-13', 12 from dual
)
-- >> end test data; solution (SQL query) begins with the next line
select country, city,
"'min'_DT" as date_min_temp, "'min'_TEMP" as min_temp,
"'max'_DT" as date_max_temp, "'max'_TEMP" as max_temp
from (
select city, country, dt, temperature,
case when temperature = min(temperature)
over (partition by city, country) then 'min'
when temperature = max(temperature)
over (partition by city, country) then 'max'
end as flag
from inputs
)
pivot ( listagg(to_char(dt, 'dd-MON-yyyy'), ', ')
within group (order by dt) as dt, min(temperature) as temp
for flag in ('min', 'max'))
order by country, city -- ORDER BY is optional
;
COUNTRY CITY DATE_MIN_TEMP MIN_TEMP DATE_MAX_TEMP MAX_TEMP
------- ------- ------------------------ ---------- -------------- ----------
Italy Palermo 23-JAN-2002, 13-FEB-2014 3 22-JUL-1998 42
Lesotho Maseru 13-AUG-2004 12 11-JAN-1994 34
2 rows selected.
Instead of the keep/dense_rank first function you can also use FIRST_VALUE and LAST_VALUE. Note that LAST_VALUE needs an explicit window frame, because the default frame ends at the current row:
select distinct city,
       MIN(temperature) OVER (PARTITION BY city) as min_temperature,
       FIRST_VALUE(date) OVER (PARTITION BY city ORDER BY temperature) AS min_temperature_date,
       MAX(temperature) OVER (PARTITION BY city) as max_temperature,
       LAST_VALUE(date) OVER (PARTITION BY city ORDER BY temperature
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS max_temperature_date
FROM t;
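One caveat worth checking with LAST_VALUE: it is sensitive to the window frame, since the default frame only extends to the current row. A small SQLite sketch (toy data, hypothetical column names, `dt` instead of the reserved word `date`) compares the default frame with an explicit full-partition frame:

```python
import sqlite3

# Two readings for one city; the hotter one is on 1998-07-22.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (city TEXT, dt TEXT, temperature REAL)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("Palermo", "2002-01-23", 3.0),
    ("Palermo", "1998-07-22", 42.0),
])

# Default frame: each row's frame ends at the current row, so LAST_VALUE
# just echoes the current row's date -- one distinct value per row.
default_frame = con.execute("""
    SELECT DISTINCT LAST_VALUE(dt) OVER (PARTITION BY city ORDER BY temperature)
    FROM t""").fetchall()

# Explicit full-partition frame: every row sees the whole partition, so
# LAST_VALUE is the date of the highest temperature for all rows.
full_frame = con.execute("""
    SELECT DISTINCT LAST_VALUE(dt) OVER (PARTITION BY city ORDER BY temperature
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM t""").fetchall()

print(default_frame)  # two distinct rows -- each row sees itself as "last"
print(full_frame)     # one row: the date of the hottest reading
```

FIRST_VALUE does not have this problem, because the first row of the partition is always inside the default frame.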
I’m using Oracle and trying to find the maximum transaction count (and associated date) for each station.
This is the code I have, but it returns each transaction count and date for each station rather than just the maximum. If I take the date part out of the outer query, it returns just the maximum transaction count for each station, but I need to know the date when it happened. Does anyone know how to get it to work?
Thanks!
SELECT STATION_ID, STATION_NAME, MAX(COUNTTRAN), TESTDATE
FROM
(
SELECT COUNT(TRANSACTION_ID) AS COUNTTRAN, STATION_ID,
STATION_NAME, TO_CHAR(TRANSACTION_DATE, 'HH24') AS TESTDATE
FROM STATION_TRANSACTIONS
WHERE COUNTRY = 'GB'
GROUP BY STATION_ID, STATION_NAME, TO_CHAR(TRANSACTION_DATE, 'HH24')
)
GROUP BY STATION_ID, STATION_NAME, TESTDATE
ORDER BY MAX(COUNTTRAN) DESC
This image shows the results I currently get vs the ones I want:
What your query does is this:
Subquery: Get one record per station_id, station_name and date. Count the transactions for each such combination.
Main query: Get one record per station_id, station_name and date. (We already did that, so it doesn't change anything.)
Order the records by transaction count.
This is not what you want. What you want is one result row per station_id, station_name, so in your main query you should have grouped by these only, excluding the date:
select
station_id,
station_name,
max(counttran) as maxcount,
max(testdate) keep (dense_rank last order by counttran) as maxcountdate
from
(
select
count(transaction_id) as counttran,
station_id,
station_name,
to_char(transaction_date, 'hh24') as testdate
from station_transactions
where country = 'GB'
group by station_id, station_name, to_char(transaction_date, 'hh24')
)
group by station_id, station_name;
An alternative would be not to group again in the main query, for you actually already have the desired records and only want to remove the others. You can do this by ranking the records in the subquery, i.e. giving them row numbers, with #1 for the best record per station (the one with the highest count). Then dismiss all others and you are done:
select station_id, station_name, counttran, testdate
from
(
select
count(transaction_id) as counttran,
row_number() over(partition by station_id order by count(transaction_id) desc) as rn,
station_id,
station_name,
to_char(transaction_date, 'hh24') as testdate
from station_transactions
where country = 'GB'
group by station_id, station_name, to_char(transaction_date, 'hh24')
)
where rn = 1;
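The ranking approach can be sketched on toy data in SQLite from Python (hypothetical station rows; the counting is split into its own subquery for portability, which is equivalent to ranking over the grouped result):

```python
import sqlite3

# Toy data: two transactions in hour 08, one in hour 17, for one GB station.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE station_transactions
               (transaction_id INTEGER, station_id INTEGER, station_name TEXT,
                transaction_date TEXT, country TEXT)""")
con.executemany("INSERT INTO station_transactions VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "Kings Cross", "2023-01-01 08:05:00", "GB"),
    (2, 10, "Kings Cross", "2023-01-01 08:40:00", "GB"),
    (3, 10, "Kings Cross", "2023-01-01 17:15:00", "GB"),
])

# Count per station and hour, rank hours within each station by that count,
# then keep only the busiest hour (rn = 1) per station.
rows = con.execute("""
    SELECT station_id, station_name, counttran, testdate
    FROM (SELECT counttran, station_id, station_name, testdate,
                 ROW_NUMBER() OVER (PARTITION BY station_id
                                    ORDER BY counttran DESC) AS rn
          FROM (SELECT COUNT(transaction_id) AS counttran,
                       station_id, station_name,
                       strftime('%H', transaction_date) AS testdate
                FROM station_transactions
                WHERE country = 'GB'
                GROUP BY station_id, station_name,
                         strftime('%H', transaction_date)))
    WHERE rn = 1""").fetchall()

print(rows)  # [(10, 'Kings Cross', 2, '08')]
```

With ROW_NUMBER() a tie between two equally busy hours is broken arbitrarily; use RANK() instead if you want both rows in that case.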