Aggregate to 'plain' query - sql

I have a query which uses aggregate functions to assign the maximum absolute value of a column to another column in the table. The problem is that it takes a long time (it adds roughly 10-15 seconds to the query completion time). This is what the query looks like:
UPDATE calculated_table c
SET tp = (SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
GROUP BY s.id);
Here id is not unique, hence the grouping, and tp is an integer column. Here is what the tables look like:
TABLE ts
PID (primary) | id (FKEY) | tp (integer)
--------------+-----------+-------------
1             | 2         | -100
2             | 2         | -500
3             | 2         | -1000
TABLE tc
PID (primary) | id (FKEY)
--------------+----------
1             | 2
I want the output to look like:
TABLE c
PID (primary) | tp (integer)
--------------+-------------
1             | 1000
I tried to make it work like this:
UPDATE calculated_table c
SET tp = (SELECT s.tp
FROM ts s INNER JOIN tc t ON s.id = t.id
ORDER BY s.tp DESC
LIMIT 1);
Though it improved the performance, the results are incorrect. Any help would be appreciated.

I did manage to modify the query; it turns out that nesting aggregate functions is not a good option. In case it helps anyone, here is what I ended up doing:
UPDATE calculated_table c
SET tp = (SELECT ABS(s.tp)
FROM ts s INNER JOIN tc t ON s.id = t.id
WHERE c.id = s.id
ORDER BY ABS(s.tp) DESC
LIMIT 1);

Though it improved the performance, however the results are incorrect.
The operation was a success, but the patient died.
The problem with your query is that
SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
GROUP BY s.id);
doesn't produce a scalar value; it produces a column of values, one for each s.id. Your DBMS really should raise an error, because a scalar subquery may return at most one row. In terms of performance, I think you're sequentially applying each row produced by the subquery to each row in the target table. It's probably both slow and wrong.
What you want is to correlate your select output with the table you're updating, and limit the rows updated to those correlated. Here's ANSI syntax to update one table from another:
UPDATE calculated_table
SET tp = (SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
where s.id = calculated_table.id)
where exists ( select 1 from ts join tc
on ts.id = tc.id
where ts.id = calculated_table.id )
That should be close to what you want.
BTW, it's tempting to interpret correlated subqueries literally, to think that the subquery is run N times, once for each row in the target table. And that's the right way to picture it, logically. The DBMS won't implement it that way, though, in all likelihood, and performance should be much better than that picture would suggest.
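To make the correlated UPDATE above concrete, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module purely as a harness, with SQLite standing in for the real DBMS. The id column on calculated_table and the extra rows (id 3, id 9) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ts (pid INTEGER PRIMARY KEY, id INTEGER, tp INTEGER);
    CREATE TABLE tc (pid INTEGER PRIMARY KEY, id INTEGER);
    -- assumption: calculated_table carries the id it is correlated on
    CREATE TABLE calculated_table (id INTEGER PRIMARY KEY, tp INTEGER);

    INSERT INTO ts (id, tp) VALUES (2, -100), (2, -500), (2, -1000), (3, 40);
    INSERT INTO tc (id) VALUES (2);
    -- id 9 has no match in ts/tc, so its tp should stay untouched
    INSERT INTO calculated_table (id, tp) VALUES (2, NULL), (9, 7);
""")

# The subquery is correlated on id; the EXISTS guard limits the update
# to rows that actually have matching source rows.
conn.execute("""
    UPDATE calculated_table
    SET tp = (SELECT MAX(ABS(s.tp))
              FROM ts s INNER JOIN tc t ON s.id = t.id
              WHERE s.id = calculated_table.id)
    WHERE EXISTS (SELECT 1
                  FROM ts INNER JOIN tc ON ts.id = tc.id
                  WHERE ts.id = calculated_table.id)
""")

rows = conn.execute("SELECT id, tp FROM calculated_table ORDER BY id").fetchall()
print(rows)  # [(2, 1000), (9, 7)]
```

Without the EXISTS guard, the row with id 9 would be overwritten with NULL, because its subquery finds no rows.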

Try:
UPDATE calculated_table c
SET tp = (SELECT greatest( MAX( s.tp ) , - MIN( s.tp ))
FROM ts s INNER JOIN tc t ON s.id = t.id
WHERE c.id = s.id
);
Also try to create a multicolumn index on ts( id, tp )
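The rewrite relies on MAX(ABS(x)) being equal to the greater of MAX(x) and -MIN(x). A small sketch, again using Python's sqlite3 as a stand-in, checks this on sample data. SQLite has no GREATEST, so the two-argument scalar max() plays that role here; the index name and the rows for id 3 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ts (pid INTEGER PRIMARY KEY, id INTEGER, tp INTEGER);
    INSERT INTO ts (id, tp) VALUES (2, -100), (2, -500), (2, -1000),
                                   (3, 40), (3, -15);
    -- the multicolumn index suggested above (the name is hypothetical)
    CREATE INDEX ts_id_tp ON ts (id, tp);
""")

# In SQLite, max() with two arguments is a scalar function, i.e. GREATEST.
rows = conn.execute("""
    SELECT id,
           MAX(ABS(tp))           AS via_abs,
           max(MAX(tp), -MIN(tp)) AS via_min_max
    FROM ts
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(2, 1000, 1000), (3, 40, 40)]
```

The point of the MIN/MAX form is that plain MIN and MAX on tp can potentially be served from an index on (id, tp), whereas ABS(tp) has to be computed for every row. Whether the planner takes advantage of that depends on the DBMS.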

I hope the SQL below is helpful. I tested it in Netezza, but not in PostgreSQL, and I didn't put the UPDATE on top of it.
SELECT ABS(COM.TP)
FROM TC C LEFT OUTER JOIN
(SELECT ID,TP
FROM TS A
WHERE NOT EXISTS (SELECT 1
FROM TS B
WHERE A.ID = B.ID
AND ABS(B.TP)>ABS(A.TP))) COM
ON C.ID = COM.ID

SQL: Joining 3 table in SQL, Return the earliest date and the date is not null

I'm new to SQL and would appreciate some guidance.
I have joined 2 tables to get the container information and would like to join another table in order to get the date. Here's the code for the first join.
Select a.ConsolNumber, a.ConsolType,a.ConsolTransport,b.Container_20F,b.Container_20R,b.Container_20H, b.Container_40F,b.DeliveryMode
FROM ConsolHeader a
LEFT Join Containers b on a.Consolnumber = b.Consolnumber
The second join is the tricky part, because some of the ConsolNumber values have several transit legs.
For example
|ConsolNumber| ETD |
|------------|---------|
|C00713392 | null |
|C00713392 | 1/1/2021|
|C00713392 | 2/1/2021|
I would like to get the earliest date that is not null (1/1/2021). Here is the code I tried. In the result, no null ETD is returned, but some of the ConsolNumber values come back with the latest date (2/1/2021).
Select a.ConsolNumber, a.ConsolType,a.ConsolTransport,b.Container_20F,b.Container_20R,b.Container_20H, b.Container_40F,b.DeliveryMode,c.Min(c.ETD)
FROM ConsolHeader a
LEFT Join Containers b on a.Consolnumber = b.Consolnumber
INNER Join ConsolLegs c on a.Consolnumber = c.ConsolNumber
WHERE c.ETD is not null
GROUP BY a.ConsolNumber, a.ConsolType,a.ConsolTransport,b.Container_20F,b.Container_20R,b.Container_20H, b.Container_40F,b.DeliveryMode
On top of that, I have more than 100k rows, so please suggest a query that runs efficiently.
Thanks in advance for any help!
A correlated subquery is a simple method:
SELECT ch.ConsolNumber, ch.ConsolType, ch.ConsolTransport, c.Container_20F,
c.Container_20R, c.Container_20H, c.Container_40F, c.DeliveryMode,
(SELECT MIN(cl.ETD)
FROM ConsolLegs cl
WHERE cl.Consolnumber = ch.Consolnumber
) as min_ETD
FROM ConsolHeader ch LEFT JOIN
Containers c
ON c.Consolnumber = ch.Consolnumber;
Notes:
MIN() automatically ignores NULLs.
Meaningful table aliases make the query easier to write and to read.
This avoids the outer GROUP BY, which is usually a performance win.
In most databases you want an index on ConsolLegs(Consolnumber, ETD) for performance.
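As a quick check of the first note, the sketch below runs the correlated MIN subquery on the sample legs from the question, using Python's sqlite3 as a harness. The dates are stored as ISO text, 1/1/2021 and 2/1/2021 are read as 1 January and 1 February (an assumption), and the second consol with only a NULL leg is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConsolHeader (ConsolNumber TEXT PRIMARY KEY);
    CREATE TABLE ConsolLegs (ConsolNumber TEXT, ETD TEXT);  -- ISO dates as text
    CREATE INDEX legs_consol_etd ON ConsolLegs (ConsolNumber, ETD);

    INSERT INTO ConsolHeader VALUES ('C00713392'), ('C00000001');
    -- 'C00000001' is a hypothetical consol whose only leg has no ETD yet
    INSERT INTO ConsolLegs VALUES
        ('C00713392', NULL),
        ('C00713392', '2021-01-01'),
        ('C00713392', '2021-02-01'),
        ('C00000001', NULL);
""")

rows = conn.execute("""
    SELECT ch.ConsolNumber,
           (SELECT MIN(cl.ETD)
            FROM ConsolLegs cl
            WHERE cl.ConsolNumber = ch.ConsolNumber) AS min_ETD
    FROM ConsolHeader ch
    ORDER BY ch.ConsolNumber
""").fetchall()
print(rows)  # [('C00000001', None), ('C00713392', '2021-01-01')]
```

The NULL leg never wins the MIN, and a consol whose legs are all NULL still appears, with a NULL min_ETD. An inner join with WHERE ETD IS NOT NULL would drop that consol instead.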
You can use the NOT EXISTS as follows:
Select a.ConsolNumber, a.ConsolType,
a.ConsolTransport, b.Container_20F,
b.Container_20R, b.Container_20H,
b.Container_40F, b.DeliveryMode,
c.ETD
FROM ConsolHeader a
LEFT Join Containers b on a.Consolnumber = b.Consolnumber
INNER Join ConsolLegs c on a.Consolnumber = c.ConsolNumber
WHERE c.ETD is not null
AND not exists
(select 1 from ConsolLegs cc where c.Consolnumber = cc.Consolnumber
and cc.etd < c.etd)
You can get the minimum ETD for each ConsolNumber first:
SELECT CL.ConsolNumber, MIN(CL.ETD) FROM ConsolLegs CL GROUP BY CL.ConsolNumber
Then use it to get the result:
Select a.ConsolNumber, a.ConsolType,
a.ConsolTransport, b.Container_20F,
b.Container_20R, b.Container_20H,
b.Container_40F, b.DeliveryMode,
c.ETD
FROM ConsolHeader a
LEFT Join Containers b on a.Consolnumber = b.Consolnumber
INNER Join ConsolLegs c on a.Consolnumber = c.ConsolNumber
AND c.ETD = (SELECT MIN(CL.ETD) FROM ConsolLegs CL WHERE CL.ConsolNumber = a.ConsolNumber)
If the query is slow, try adding an index on ConsolLegs(ConsolNumber, ETD).

SQL - Simple Pivot Table while using Join statements?

I'm trying to transpose the results of the following code, which joins multiple tables together. I know I need to use a PIVOT for this and it may be a simple fix, but I'm having huge difficulty getting the code to work. My code is as follows:
SELECT F.SetValue, D.Name FROM Device D
INNER JOIN Location L ON D.LocationId = L.LocationId
INNER JOIN Fitting F ON L.LocationId = F.LocationId
INNER JOIN LocationTypeFitting LTF ON F.LocationTypeFittingId = LTF.LocationTypeFittingId
WHERE D.DeviceName = 'Device 1' AND LTF.Name LIKE '%Television%';
which prints the following results:
SetValue | Name
===========================
1 | TV_Power
1 | TV_Volume
1 | TV_Source
I need to return the values as below:
TV_Power | TV_Volume | TV_Source
================================
1 | 1 | 1
I know I'll also need a GROUP BY clause, but the joining of additional tables is making this particular query increasingly difficult. Any help would be very much appreciated.
I would do the following two things:
Wrap the whole query in a sub-query and apply the pivot syntax.
Add another column, such as DeviceName from the Device table (or some other table), so that you can differentiate the rows once the pivot has been executed (I assume there will be more than one row).
The query below also shows, in a comment, where the GROUP BY would go.
select post_pivot.*
from (
SELECT F.SetValue, D.Name, D.DeviceName FROM Device D
INNER JOIN Location L ON D.LocationId = L.LocationId
INNER JOIN Fitting F ON L.LocationId = F.LocationId
INNER JOIN LocationTypeFitting LTF ON F.LocationTypeFittingId = LTF.LocationTypeFittingId
WHERE D.DeviceName = 'Device 1' AND LTF.Name LIKE '%Television%'
--group by (if needed)
) as pre_pivot
pivot (max(SetValue) for Name in ([TV_Power], [TV_Volume], [TV_Source])) as post_pivot
Hopefully this will be sufficient or will give you enough to go on.
SELECT 'DeviceType' as DeviceTYpe,* FROM
(
SELECT D.Name, F.SetValue FROM Device D
INNER JOIN Location L ON D.LocationId = L.LocationId
INNER JOIN Fitting F ON L.LocationId = F.LocationId
INNER JOIN LocationTypeFitting LTF ON F.LocationTypeFittingId = LTF.LocationTypeFittingId
WHERE D.DeviceName = 'Device 1' AND LTF.Name LIKE '%Television%'
) AS SourceTable
PIVOT
(
MAX(SetValue)
FOR Name in ([TV_Power], [TV_Volume], [TV_Source])
) As PivotTable
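For readers without access to PIVOT, the same transposition can be sketched with conditional aggregation, which is a different but equivalent technique. The example runs in Python's sqlite3 (SQLite has no PIVOT operator). The pre_pivot table is a hypothetical stand-in for the joined Device/Location/Fitting result, and the 'Device 2' rows are invented to show how several output rows are kept apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- stand-in for the joined result set (DeviceName, Name, SetValue)
    CREATE TABLE pre_pivot (DeviceName TEXT, Name TEXT, SetValue INTEGER);
    INSERT INTO pre_pivot VALUES
        ('Device 1', 'TV_Power', 1),
        ('Device 1', 'TV_Volume', 1),
        ('Device 1', 'TV_Source', 1),
        ('Device 2', 'TV_Power', 0),
        ('Device 2', 'TV_Volume', 7);
""")

# One MAX(CASE ...) per output column; GROUP BY decides what one row means.
rows = conn.execute("""
    SELECT DeviceName,
           MAX(CASE WHEN Name = 'TV_Power'  THEN SetValue END) AS TV_Power,
           MAX(CASE WHEN Name = 'TV_Volume' THEN SetValue END) AS TV_Volume,
           MAX(CASE WHEN Name = 'TV_Source' THEN SetValue END) AS TV_Source
    FROM pre_pivot
    GROUP BY DeviceName
    ORDER BY DeviceName
""").fetchall()
print(rows)  # [('Device 1', 1, 1, 1), ('Device 2', 0, 7, None)]
```

A setting that is missing for a device comes back as NULL, which is also what PIVOT produces for an absent value.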

SQL not efficient enough, tuning assistance required

We have some SQL that is OK on smaller data volumes but performs poorly once we scale up to selecting from larger volumes. Is there a faster alternative that achieves the same output as below? The idea is to pull back a single unique row holding the latest version of the data. The SQL does reference another view, but that view runs very fast, so we expect the issue is in the query below and want to try a different approach.
SELECT *
FROM
(SELECT (select CustomerId from PremiseProviderVersionsToday
where PremiseProviderId = b.PremiseProviderId) as CustomerId,
c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
ORDER BY a.effectiveDate DESC) AS rowNumber
FROM PremiseMeterProviderVersions a, PremiseProviders b,
PremiseMeterProviders c
WHERE (a.TransactionDateTimeEnd IS NULL
AND a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND b.PremiseProviderId = c.PremiseProviderId)
) data
WHERE data.rowNumber = 1
As Bilal Ayub stated in another answer, the correlated subquery can result in performance issues. Below are my suggestions:
Change all joins to explicit joins (ANSI standard).
Use aliases that are more descriptive than single characters (this is mostly to help readers understand what each table does).
Convert the data subquery to a temp table or CTE (this is often easier to read and tune than a nested subquery; measure whether it is actually faster on your data).
Note: normally, you should explicitly create and insert into your temp table, but I chose not to do that here as I do not know the data types of your columns.
SELECT d.CustomerId
, c.D3001_MeterId
, b.CoreSPID
, a.EnteredBy
, rowNumber = ROW_NUMBER() OVER(PARTITION BY b.PremiseProviderId ORDER BY a.effectiveDate DESC)
INTO #tmp_RowNum
FROM PremiseMeterProviderVersions a
JOIN PremiseMeterProviders c ON c.PremiseMeterProviderId = a.PremiseMeterProviderId
JOIN PremiseProviders b ON b.PremiseProviderId = c.PremiseProviderId
JOIN PremiseProviderVersionsToday d ON d.PremiseProviderId = b.PremiseProviderId
WHERE a.TransactionDateTimeEnd IS NULL
SELECT *
FROM #tmp_RowNum
WHERE rowNumber = 1
You are running a correlated subquery, which is logically evaluated once per row. If the table is small that is fast enough, but I would suggest replacing it with a join to the table in order to get CustomerId.
(select CustomerId from PremiseProviderVersionsToday where PremiseProviderId = b.PremiseProviderId) as CustomerId
Consider derived tables: an aggregate query that calculates the maximum EffectiveDate by PremiseProviderId, joined to a unit-level query, each using explicit joins (the current ANSI SQL standard) rather than the implicit joins you currently use:
SELECT data.*
FROM
(SELECT t.CustomerId, c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
b.PremiseProviderId, a.EffectiveDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
INNER JOIN PremiseProviderVersionsToday t
ON t.PremiseProviderId = b.PremiseProviderId
) data
INNER JOIN
(SELECT b.PremiseProviderId, MAX(a.EffectiveDate) As MaxEffDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
GROUP BY b.PremiseProviderId
) agg
ON data.PremiseProviderId = agg.PremiseProviderId
AND data.EffectiveDate = agg.MaxEffDate
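The two "latest row per group" styles discussed in this thread, ROW_NUMBER and a join back to the per-group maximum, can be compared on a toy table. This is only a sketch in Python's sqlite3 (window functions need SQLite 3.25 or later). The versions table is a hypothetical, flattened stand-in for the joined tables, and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for the joined version tables
    CREATE TABLE versions (PremiseProviderId INTEGER,
                           EffectiveDate TEXT, EnteredBy TEXT);
    INSERT INTO versions VALUES
        (1, '2020-01-01', 'alice'),
        (1, '2021-06-01', 'bob'),
        (2, '2019-03-15', 'carol');
""")

# Approach 1: number the rows per provider, newest first, and keep row 1.
by_row_number = conn.execute("""
    WITH ranked AS (
        SELECT PremiseProviderId, EffectiveDate, EnteredBy,
               ROW_NUMBER() OVER (PARTITION BY PremiseProviderId
                                  ORDER BY EffectiveDate DESC) AS rowNumber
        FROM versions
    )
    SELECT PremiseProviderId, EffectiveDate, EnteredBy
    FROM ranked
    WHERE rowNumber = 1
    ORDER BY PremiseProviderId
""").fetchall()

# Approach 2: join each row back to the maximum date of its provider.
by_max_join = conn.execute("""
    SELECT v.PremiseProviderId, v.EffectiveDate, v.EnteredBy
    FROM versions v
    INNER JOIN (SELECT PremiseProviderId, MAX(EffectiveDate) AS MaxEffDate
                FROM versions
                GROUP BY PremiseProviderId) agg
            ON agg.PremiseProviderId = v.PremiseProviderId
           AND agg.MaxEffDate = v.EffectiveDate
    ORDER BY v.PremiseProviderId
""").fetchall()

print(by_row_number)  # [(1, '2021-06-01', 'bob'), (2, '2019-03-15', 'carol')]
print(by_row_number == by_max_join)  # True
```

The two forms agree here because no provider has two versions with the same EffectiveDate. With such a tie, the join form returns every tied row, while ROW_NUMBER keeps exactly one of them, so they are not interchangeable in general.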

Receiving 1 row from joined (1 to many) postgresql

I have this problem:
I have 2 major tables (apartments, tenants) with a 1-to-many relationship (1 apartment, many tenants).
I'm trying to pull all the apartments in my building, each with one of its tenants.
The preferred tenant is the one who has ot=2 (there are 2 possible values: 1 or 2).
I tried to use subqueries, but PostgreSQL doesn't let such a subquery return more than 1 column.
I don't know how to solve it. Here is my latest code:
SELECT a.apartment_id, a.apartment_num, a.floor, at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM
public.apartments a INNER JOIN public.apartment_types at ON
at.app_type_id = a.apartment_type INNER JOIN
(select t.apartment_id, t.building_id, ot.otype_id, ot.otype_desc_he, e.e_name
from public.tenants t INNER JOIN public.ownership_types ot ON
ot.otype_id = t.ownership_type INNER JOIN entities e ON
t.entity_id = e.entity_id
) tn ON
a.apartment_id = tn.apartment_id AND tn.building_id = a.building_id
WHERE
a.building_id = 4 AND tn.building_id=4
ORDER BY
a.apartment_num ASC,
tn.otype_id DESC
Thanx in advance
SELECT a.apartment_id, a.apartment_num, a.floor
,at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM public.apartments a
JOIN public.apartment_types at ON at.app_type_id = a.apartment_type
LEFT JOIN LATERAL ( -- LATERAL requires PostgreSQL 9.3 or later
SELECT ot.otype_id, ot.otype_desc_he, e.e_name
FROM public.tenants t
JOIN public.ownership_types ot ON ot.otype_id = t.ownership_type
JOIN entities e ON t.entity_id = e.entity_id
WHERE (t.apartment_id, t.building_id) = (a.apartment_id, a.building_id)
ORDER BY (ot.otype_id = 2) DESC
LIMIT 1 -- one tenant per apartment, because the subquery is correlated
) tn ON true
WHERE a.building_id = 4
ORDER BY a.apartment_num; -- , tn.otype_id DESC -- pointless
The crucial part is the ORDER BY ... LIMIT 1 in the subquery.
This works in either case.
If there are tenants for an apartment, exactly 1 will be returned.
If there is one (or more) tenant of ot.otype_id = 2, it will be one of that type.
If there are no tenants, the apartment is still returned.
If, for ot.otype_id ...
there are 2 possible values: 1 or 2
... you can simplify to:
ORDER BY ot.otype_id DESC
Debug query
Try removing the WHERE clauses from the base query and change
JOIN public.apartment_types
to
LEFT JOIN public.apartment_types
and add them back one by one to see which condition excludes all rows.
Do at.app_type_id and a.apartment_type really match?
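The "one preferred tenant per apartment" selection can be tried out on toy data. The sketch below uses Python's sqlite3, which lacks some PostgreSQL features, so the per-apartment pick is written as a correlated subquery in the join condition. The tenant_id key, the trimmed-down tables, and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apartments (apartment_id INTEGER PRIMARY KEY,
                             apartment_num INTEGER);
    -- tenant_id is a hypothetical surrogate key
    CREATE TABLE tenants (tenant_id INTEGER PRIMARY KEY, apartment_id INTEGER,
                          ownership_type INTEGER, e_name TEXT);
    INSERT INTO apartments VALUES (1, 101), (2, 102), (3, 103);
    -- apartment 1 has both types, apartment 2 only type 1, apartment 3 nobody
    INSERT INTO tenants (apartment_id, ownership_type, e_name) VALUES
        (1, 1, 'Avi'), (1, 2, 'Dana'), (2, 1, 'Noa');
""")

# For each apartment, pick one tenant, preferring ownership_type = 2.
# The comparison yields 1 or 0, so sorting it DESC puts type 2 first.
rows = conn.execute("""
    SELECT a.apartment_num, t.ownership_type, t.e_name
    FROM apartments a
    LEFT JOIN tenants t
           ON t.tenant_id = (SELECT t2.tenant_id
                             FROM tenants t2
                             WHERE t2.apartment_id = a.apartment_id
                             ORDER BY (t2.ownership_type = 2) DESC
                             LIMIT 1)
    ORDER BY a.apartment_num
""").fetchall()
print(rows)  # [(101, 2, 'Dana'), (102, 1, 'Noa'), (103, None, None)]
```

Each apartment appears exactly once: with its type 2 tenant when there is one, with another tenant otherwise, and with NULLs when it has no tenants at all.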

Replace IN with EXISTS or COUNT. How to do it. What is missing here?

I am using the IN keyword with a nested query in the middle of a larger query, and I want to replace IN with EXISTS because of performance issues that my seniors have told me might arise.
Am I missing some column that is needed for this? The query uses some aliases for readability.
How can I remove the IN?
SELECT TX.PK_MAP_ID AS MAP_ID
, MG.PK_GUEST_ID AS Guest_Id
, MG.FIRST_NAME
, H.PK_CATEGORY_ID AS Preference_Id
, H.DESCRIPTION AS Preference_Name
, H.FK_CATEGORY_ID AS Parent_Id
, H.IMMEDIATE_PARENT AS Parent_Name
, H.Department_ID
, H.Department_Name
, H.ID_PATH, H.DESC_PATH
FROM
dbo.M_GUEST AS MG
LEFT OUTER JOIN
dbo.TX_MAP_GUEST_PREFERENCE AS TX
ON
(MG.PK_GUEST_ID = TX.FK_GUEST_ID)
LEFT OUTER JOIN
dbo.GetHierarchy_Table AS H
ON
(TX.FK_CATEGORY_ID = H.PK_CATEGORY_ID)
WHERE
(MG.IS_ACTIVE = 1)
AND
(TX.IS_ACTIVE = 1)
AND
(H.Department_ID IN -----How to remove this IN operator with EXISTS or Count()
(
SELECT C.PK_CATEGORY_ID AS DepartmentId
FROM
dbo.TX_MAP_DEPARTMENT_OPERATOR AS D
INNER JOIN
dbo.M_OPERATOR AS M
ON
(D.FK_OPERATOR_ID = M.PK_OPERATOR_ID)
AND
(D.IS_ACTIVE = M.IS_ACTIVE)
INNER JOIN
dbo.L_USER_ROLE AS R
ON
(M.FK_ROLE_ID = R.PK_ROLE_ID)
AND
(M.IS_ACTIVE = R.IS_ACTIVE)
INNER JOIN
dbo.L_CATEGORY_TYPE AS C
ON
(D.FK_DEPARTMENT_ID = C.PK_CATEGORY_ID)
AND
(D.IS_ACTIVE = C.IS_ACTIVE)
WHERE
(D.IS_ACTIVE = 1)
AND
(M.IS_ACTIVE = 1)
AND
(R.IS_ACTIVE = 1)
AND
(C.IS_ACTIVE = 1)
)--END INNER QUERY
)--END Condition
What new problems might I get if I replace IN with EXISTS or COUNT ?
Basically, as I understand your question, you are asking how can I replace this:
where H.department_id in (select departmentid from...)
with this:
where exists (select...)
or this:
where (select count(*) from ...) > 0
It is fairly straightforward. One method might be this:
WHERE...
AND EXISTS (select c.pk_category_id
from tx_map_department_operator d
inner join m_operator as m
on d.fk_operator_id = m.pk_operator_id
inner join l_user_role r
on m.fk_role_id = r.pk_role_id
inner join l_category_type c
on d.fk_department_id = c.pk_category_id
where h.department_id = c.pk_category_id
and d.is_active = 1
and m.is_active = 1
and r.is_active = 1
and c.is_active = 1
)
I removed the extra join conditions on is_active because they were redundant. You should test how it runs with your indexes, because the original form might have been faster, though I doubt it. It is worth comparing whether it is faster to add the condition to the join clause (join on ... and x.is_active=y.is_active) or to check it in the where clause (x.is_active=1 and y.is_active=1 and z.is_active=1...)
And I'd recommend you just use exists instead of count(*), because exists can stop after finding 1 row, whereas count probably continues until every matching row is counted and only then compares the total to your reference value (count > 0).
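To see the three forms side by side, here is a small sketch in Python's sqlite3. The tables h and dept are hypothetical, heavily simplified stand-ins for the hierarchy table and the department subquery, and the rows are invented. Department 1 is listed twice on purpose, to show that duplicates do not change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical, simplified stand-ins for the real tables
    CREATE TABLE h (pk_category_id INTEGER, department_id INTEGER);
    CREATE TABLE dept (pk_category_id INTEGER, is_active INTEGER);
    INSERT INTO h VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO dept VALUES (1, 1), (1, 1), (2, 0), (3, 1);
""")

queries = {
    "in": """SELECT pk_category_id FROM h
             WHERE department_id IN (SELECT d.pk_category_id FROM dept d
                                     WHERE d.is_active = 1)
             ORDER BY pk_category_id""",
    "exists": """SELECT pk_category_id FROM h
                 WHERE EXISTS (SELECT 1 FROM dept d
                               WHERE d.pk_category_id = h.department_id
                                 AND d.is_active = 1)
                 ORDER BY pk_category_id""",
    "count": """SELECT pk_category_id FROM h
                WHERE (SELECT COUNT(*) FROM dept d
                       WHERE d.pk_category_id = h.department_id
                         AND d.is_active = 1) > 0
                ORDER BY pk_category_id""",
}
results = {name: [r[0] for r in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)  # {'in': [10, 12], 'exists': [10, 12], 'count': [10, 12]}
```

Note that the COUNT form has to compare against 0 to match IN and EXISTS. Comparing against 1 would drop category 12 here, because its department has exactly one active row.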
As an aside, that is a strange column naming standard you have. Do you really have PK prefixes for the primary keys, and FK prefixes for the foreign keys? I have never seen that.