Cross verifying sums in 12 different views - sql

I want to create a control that verifies that the sum of a value is the same across 12 different views, for each of the three companies in each view.
From time to time, something goes wrong and there's a small difference in the sums in two or more of the views in one or two of the companies.
Right now, I'm working on a SQL query that keeps getting longer: I calculate the sum from table A, join the sum from table B, and select the companies where there's a difference. Then I UNION ALL a second query that compares the sum in table A with the sum in table C. Since I have 12 different views that all have to be compared with each other, the SQL gets very long and the risk of an error increases.
Is there a smarter and nicer way to go about this, so I don't have to manually cross-compare all 12 views with each other in every possible combination?
To clarify: when I talk about a difference, I don't mean small differences in the third decimal. It could be thousands or even a couple of million, because someone made a transaction error in the main program that feeds into the tables.
In the code below I'm comparing the sums in TableA to the sums in TableB and TableC. If I have to add all the possible comparisons between the 12 different views manually, the code will end up VERY long, and I'd like to avoid that if possible.
SELECT a.Companyid ,
       ( a.[View] + ' vs. ' + b.[View] ) AS Views ,
       ( a.[Mv.Facts.TableA] - b.[Mv.Facts.TableB] ) AS Difference
FROM   ( SELECT Companyid ,
                'Facts.TableA' AS [View] ,
                SUM(Mv) AS [Mv.Facts.TableA]
         FROM   Facts.TableA
         WHERE  Companyid IN ( 3, 5, 36 )
         GROUP BY Companyid
       ) a
       LEFT OUTER JOIN ( SELECT Companyid ,
                                'Facts.TableB' AS [View] ,
                                SUM(Mv) AS [Mv.Facts.TableB]
                         FROM   Facts.TableB
                         WHERE  Companyid IN ( 3, 5, 36 )
                         GROUP BY Companyid
                       ) b ON a.Companyid = b.Companyid
WHERE  ROUND(( a.[Mv.Facts.TableA] - b.[Mv.Facts.TableB] ), 0) <> 0
UNION ALL
SELECT a.Companyid ,
       ( a.[View] + ' vs. ' + c.[View] ) AS Views ,
       ( a.[Mv.Facts.TableA] - c.[Mv.Facts.TableC] ) AS Difference
FROM   ( SELECT Companyid ,
                'Facts.TableA' AS [View] ,
                SUM(Mv) AS [Mv.Facts.TableA]
         FROM   Facts.TableA
         WHERE  Companyid IN ( 3, 5, 36 )
         GROUP BY Companyid
       ) a
       LEFT OUTER JOIN ( SELECT Companyid ,
                                'Facts.TableC' AS [View] ,
                                SUM(Mv) AS [Mv.Facts.TableC]
                         FROM   Facts.TableC
                         WHERE  Companyid IN ( 3, 5, 36 )
                         GROUP BY Companyid
                       ) c ON a.Companyid = c.Companyid
WHERE  ROUND(( a.[Mv.Facts.TableA] - c.[Mv.Facts.TableC] ), 0) <> 0
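One common way to avoid writing every pairwise comparison is to stack one aggregate per view with UNION ALL and then compare MIN and MAX per company: if they agree, every view agrees. The sketch below uses Python's built-in sqlite3 module so it runs standalone. The table names TableA to TableC, the Mv column, the 0.5 tolerance (roughly mirroring the ROUND(..., 0) <> 0 test) and the sample rows are assumptions; against SQL Server the same inner query would read from the real Facts.* views.

```python
import sqlite3

# Hypothetical stand-ins for the 12 views; only three are used in this demo.
VIEWS = ["TableA", "TableB", "TableC"]


def find_mismatches(conn, views, companies, tolerance=0.5):
    """Return (Companyid, min_total, max_total, views_found) for every company whose
    per-view sums differ by more than `tolerance` or that is missing from a view."""
    # One aggregate per view, stacked with UNION ALL. View names are interpolated,
    # so they must come from a trusted, hard-coded list like VIEWS.
    per_view = " UNION ALL ".join(
        f"SELECT '{v}' AS view_name, Companyid, SUM(Mv) AS total FROM {v} GROUP BY Companyid"
        for v in views
    )
    marks = ", ".join("?" for _ in companies)
    sql = f"""
        SELECT Companyid, MIN(total), MAX(total), COUNT(DISTINCT view_name)
        FROM ({per_view}) AS sums
        WHERE Companyid IN ({marks})
        GROUP BY Companyid
        HAVING MAX(total) - MIN(total) > ?      -- sums disagree somewhere
            OR COUNT(DISTINCT view_name) < ?    -- company absent from some view
        ORDER BY Companyid
    """
    return conn.execute(sql, [*companies, tolerance, len(views)]).fetchall()


def build_demo_db():
    """In-memory demo data: company 5 is off by 1000 in TableC, company 36 is missing there."""
    conn = sqlite3.connect(":memory:")
    for v in VIEWS:
        conn.execute(f"CREATE TABLE {v} (Companyid INTEGER, Mv INTEGER)")
    conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(3, 100), (3, 50), (5, 200), (36, 10)])
    conn.executemany("INSERT INTO TableB VALUES (?, ?)", [(3, 150), (5, 200), (36, 10)])
    conn.executemany("INSERT INTO TableC VALUES (?, ?)", [(3, 150), (5, 1200)])
    return conn


if __name__ == "__main__":
    print(find_mismatches(build_demo_db(), VIEWS, [3, 5, 36]))
```

With 12 views the stacked list has 12 branches instead of 12 * 11 / 2 = 66 pairwise comparisons; once a company is flagged, listing each view's total for that company shows which view is off.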

Related

Bizarre Join with comma

I'm looking at someone else's code and find this bizarre join:
SELECT
SUM(
(
intUnitOverheadCost + intUnitLaborCost + intUnitMaterialCost + intUnitSubcontractCost
+ intUnitDutyCost + intUnitFreightCost + intUnitMiscCost
)
*
(
(
CASE
WHEN imtSource = 3
THEN - 1
ELSE 1
END
) * intQuantity
)
)
FROM PartTransactions --imt
INNER JOIN PartTransactionCosts --int
ON imtPartTransactionID = intPartTransactionID
LEFT JOIN Warehouses --imw
ON imtPartWarehouseLocationID = imwWarehouseID
, ProductionProperties --xap <-- weird join
WHERE imtJobID = jmpJobID
AND imtSource IN (2,3)
AND imtReceiptID = ''
AND Upper(imtTableName) <> 'RECEIPTLINES'
AND imtNonInventoryTransaction <= {?CHECKBOXGROUP_4_ShowNonInventory}
AND imtJobType IN (1, 3)
AND imtTransactionDate < DATEADD(d, 1, {?PROMPT_1_TODATE})
AND (
imtNonNettable = 0
OR (
imtNonNettable <> 0
AND ISNULL(imwDoNotIncludeInJobCosts, 0) = 0
)
)
AND intCostType = (
CASE -- Always 1
WHEN xapIMCostingMethod = 1
THEN 1
WHEN xapIMCostingMethod = 2
THEN 2
WHEN xapIMCostingMethod = 3
THEN 3
ELSE 4
END
)
There is only one record in table ProductionProperties and the result of select xapIMCostingMethod from ProductionProperties is always 1.
There are always 4 enumerated results in PartTransactionCosts, but only 1 result is allowed.
ProductionProperties.xapIMCostingMethod is implicitly joining to PartTransactionCosts.intCostType
My specific question is what is really going on with this comma join? It looks like it has to be a cross-join, later filtered in the WHERE clause with one possible result.
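To see that the comma is just a cross join whose filter happens to live in WHERE, a small sqlite3 sketch compares three spellings on cut-down, hypothetical versions of the two tables. For costing methods 1 to 3 the question's CASE reduces to intCostType = xapIMCostingMethod, so the sketch uses that equality directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the two tables from the question.
conn.executescript("""
    CREATE TABLE PartTransactionCosts (intPartTransactionID INTEGER, intCostType INTEGER);
    CREATE TABLE ProductionProperties (xapIMCostingMethod INTEGER);
    -- Four cost-type rows per transaction.
    INSERT INTO PartTransactionCosts VALUES
        (10, 1), (10, 2), (10, 3), (10, 4),
        (11, 1), (11, 2), (11, 3), (11, 4);
    -- A single settings row, as described in the question.
    INSERT INTO ProductionProperties VALUES (1);
""")

# Comma join: every cost row paired with every settings row, then filtered in WHERE.
comma = conn.execute("""
    SELECT intPartTransactionID, intCostType
    FROM PartTransactionCosts, ProductionProperties
    WHERE intCostType = xapIMCostingMethod
    ORDER BY intPartTransactionID
""").fetchall()

# The same thing spelled as an explicit CROSS JOIN.
cross = conn.execute("""
    SELECT intPartTransactionID, intCostType
    FROM PartTransactionCosts CROSS JOIN ProductionProperties
    WHERE intCostType = xapIMCostingMethod
    ORDER BY intPartTransactionID
""").fetchall()

# And as an INNER JOIN with the predicate moved into ON.
inner = conn.execute("""
    SELECT intPartTransactionID, intCostType
    FROM PartTransactionCosts
    INNER JOIN ProductionProperties ON intCostType = xapIMCostingMethod
    ORDER BY intPartTransactionID
""").fetchall()

# Without any filter the comma join is a Cartesian product: 8 cost rows x 1 settings row.
pairs = conn.execute(
    "SELECT COUNT(*) FROM PartTransactionCosts, ProductionProperties"
).fetchone()[0]

print(comma, comma == cross == inner, pairs)  # [(10, 1), (11, 1)] True 8
```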
Agree with the previous answer. It is a Cartesian join, but since the table has only one row it doesn't cause an issue.
I think that if you added rows to ProductionProperties, they would act as a multiplier on your sum. I did a little experiment to show the issue:
declare @tableMoney table (
unit int,
Product char(5),
xapIMPCostingMethod int,
Cost money
)
declare @tableProdProperties table (
xapIMPCostingMethod int
)
insert @tableMoney (unit, Product, xapIMPCostingMethod, Cost)
values
(1,'bike',1, 2.00),
(1,'car',1, 2.25),
(2,'boat',2, 4.50)
insert @tableProdProperties (xapIMPCostingMethod)
values (1),
(2)
select sum(Cost)
from @tableMoney, @tableProdProperties
I also don't like to use joins where it isn't clear what is joining to what, so I always use an explicit join with aliases:
select sum(Cost)
from @tableMoney tbm join @tableProdProperties tpp
on tbm.xapIMPCostingMethod = tpp.xapIMPCostingMethod
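The experiment can be reproduced outside SQL Server with Python's sqlite3 module, with the table variables recreated as ordinary tables and REAL standing in for money. It shows the multiplier: with two rows in the properties table the comma join doubles the total, while the explicit join on xapIMPCostingMethod does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The answer's table variables as plain SQLite tables (REAL instead of money).
conn.executescript("""
    CREATE TABLE tableMoney (unit INTEGER, Product TEXT, xapIMPCostingMethod INTEGER, Cost REAL);
    CREATE TABLE tableProdProperties (xapIMPCostingMethod INTEGER);
    INSERT INTO tableMoney VALUES (1, 'bike', 1, 2.00), (1, 'car', 1, 2.25), (2, 'boat', 2, 4.50);
    INSERT INTO tableProdProperties VALUES (1), (2);
""")

# Comma join with no filter: each cost row is repeated once per properties row.
comma_total = conn.execute(
    "SELECT SUM(Cost) FROM tableMoney, tableProdProperties"
).fetchone()[0]

# Explicit join on the shared column: each cost row matches exactly one properties row.
joined_total = conn.execute("""
    SELECT SUM(tbm.Cost)
    FROM tableMoney tbm
    JOIN tableProdProperties tpp ON tbm.xapIMPCostingMethod = tpp.xapIMPCostingMethod
""").fetchone()[0]

print(comma_total, joined_total)  # 17.5 8.75
```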

Count Joins from Multiple Tables

For reference, I am using Postgres 9.2.23.
I have several tables where one table (user_group) is related to some other tables (e.g. posts, group_invites, and a few more). There is also a groups table, but it doesn't hold any data that I need for these queries.
Table user_group:
fk_user_group_id, fk_user_id, fk_group_id, fk_invite_id, user_status, ...
Table message:
pk_message_id, fk_user_id, fk_group_id, child_message_id, ...
Table group_prospective_user:
pk_prospective_user_id, fk_group_id, ...
I want to get some statistics for each of the related tables for a list of specified group ids if the user is a member of the group.
Right now I do this with one query for each related table, eg:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "numDiscussions",
count("public"."message"."pk_message_id") as "numDiscussionPosts"
from "public"."user_group"
left outer join "public"."message"
on "public"."message"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 1070
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
And for invites:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when "public"."prospective_user"."status" = 1 then "public"."prospective_user"."pk_prospective_user_id"
end) as "numInviteesExternal"
from "public"."user_group"
left outer join "public"."prospective_user"
on "public"."prospective_user"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 6176
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
The queries for the other related tables are very similar to the ones above; only the COUNT(CASE WHEN ...) expression and the joined table change.
Each of these queries repeats the same logic for checking which groups the current user is an active member of. Is there an efficient way to merge multiple similar queries like this into a single query?
I tried using multiple LEFT JOINs with select count distinct, but that ran into performance issues on groups with both lots of messages, and lots of invites. Is there a way to easily/efficiently do this with, say, a subquery?
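The performance problem with several LEFT JOINs comes from fan-out: each join pairs every row from one child table with every row from the other, so a group with m messages and n invites produces m * n rows before grouping. A small sqlite3 sketch, using a hypothetical cut-down schema and invented data, makes the effect visible: plain COUNT returns the inflated product, while COUNT(DISTINCT ...) gives the right numbers but still has to build the large intermediate result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal, hypothetical subset of the schema: one group with 3 messages and 2 invites.
conn.executescript("""
    CREATE TABLE user_group (fk_group_id INTEGER, fk_user_id INTEGER, role TEXT);
    CREATE TABLE message (pk_message_id INTEGER PRIMARY KEY, fk_group_id INTEGER, child_message_id INTEGER);
    CREATE TABLE prospective_user (pk_prospective_user_id INTEGER PRIMARY KEY, fk_group_id INTEGER, status INTEGER);
    INSERT INTO user_group VALUES (1, 17517, 'MEMBER');
    INSERT INTO message VALUES (1, 1, NULL), (2, 1, NULL), (3, 1, 1);
    INSERT INTO prospective_user VALUES (1, 1, 1), (2, 1, 1);
""")

JOINS = """
    FROM user_group ug
    LEFT JOIN message m ON m.fk_group_id = ug.fk_group_id
    LEFT JOIN prospective_user p ON p.fk_group_id = ug.fk_group_id
    WHERE ug.fk_user_id = 17517
    GROUP BY ug.fk_group_id
"""

# Plain COUNT: every message row is paired with every invite row (3 x 2 = 6 rows).
naive = conn.execute(
    "SELECT ug.fk_group_id, COUNT(m.pk_message_id), COUNT(p.pk_prospective_user_id)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) hides the duplication, but the 6-row intermediate result is still built.
distinct = conn.execute(
    "SELECT ug.fk_group_id, COUNT(DISTINCT m.pk_message_id), COUNT(DISTINCT p.pk_prospective_user_id)" + JOINS
).fetchall()

print(naive, distinct)  # [(1, 6, 6)] [(1, 3, 2)]
```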
The answer from user @Parfait was the most scalable solution I could find. I based my queries on this tutorial: https://www.sqlteam.com/articles/using-derived-tables-to-calculate-aggregate-values.
While this isn't perfect, and results in a bunch of subqueries running, it does get me all the data at once, and with a single trip to the DB.
It ended up like this:
select
"groups"."groupId",
coalesce(
"members"."member_count",
0
) as "numActiveMembers",
coalesce(
"members"."invitee_count",
0
) as "numInviteesInternal",
coalesce(
"discussions"."discussions_count",
0
) as "numDiscussions",
coalesce(
"discussions"."posts_count",
0
) as "numDiscussionPosts"
from (
select "public"."user_group"."fk_group_id" as "groupId"
from "public"."user_group"
where (
"public"."user_group"."fk_group_id" in (
1, 2, 3, 4, 5
)
and "public"."user_group"."role" = 'ADMINISTRATOR'
and "public"."user_group"."fk_user_id" = 123
)
group by "public"."user_group"."fk_group_id"
) as "groups"
left outer join (
select
"public"."user_group"."fk_group_id" as "members_group_id",
count(distinct case
when "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
) then "public"."user_group"."pk_user_group_id"
end) as "member_count",
count(distinct case
when "public"."user_group"."role" = 'INVITEE' then "public"."user_group"."pk_user_group_id"
end) as "invitee_count"
from "public"."user_group"
group by "public"."user_group"."fk_group_id"
) as "members"
on "members_group_id" = "groupId"
left outer join (
select
"public"."message"."fk_group_id" as "discussions_group_id",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "discussions_count",
count("public"."message"."pk_message_id") as "posts_count"
from "public"."message"
group by "public"."message"."fk_group_id"
) as "discussions"
on "discussions_group_id" = "groupId"
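The derived-table approach can be sketched as follows, assuming the question's column names and invented sample rows: each child table is aggregated to one row per group first, so each LEFT JOIN matches at most one row and cannot multiply the counts. This is a SQLite sketch for illustration only; in Postgres it may also pay to repeat the group-id filter inside each derived table so they do not aggregate the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: the user belongs to groups 1 and 2; only group 1 has activity.
conn.executescript("""
    CREATE TABLE user_group (fk_group_id INTEGER, fk_user_id INTEGER, role TEXT);
    CREATE TABLE message (pk_message_id INTEGER PRIMARY KEY, fk_group_id INTEGER, child_message_id INTEGER);
    CREATE TABLE prospective_user (pk_prospective_user_id INTEGER PRIMARY KEY, fk_group_id INTEGER, status INTEGER);
    INSERT INTO user_group VALUES (1, 17517, 'MEMBER'), (2, 17517, 'ADMINISTRATOR');
    INSERT INTO message VALUES (1, 1, NULL), (2, 1, NULL), (3, 1, 1);
    INSERT INTO prospective_user VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0);
""")

STATS = """
    SELECT g.fk_group_id AS groupId,
           COALESCE(d.posts_count, 0)       AS numDiscussionPosts,
           COALESCE(d.discussions_count, 0) AS numDiscussions,
           COALESCE(i.invitee_count, 0)     AS numInviteesExternal
    FROM (SELECT DISTINCT fk_group_id
          FROM user_group
          WHERE fk_user_id = ? AND role IN ('ADMINISTRATOR', 'MODERATOR', 'MEMBER')) AS g
    -- Each child table is collapsed to one row per group before joining,
    -- so the joins cannot multiply rows.
    LEFT JOIN (SELECT fk_group_id,
                      COUNT(CASE WHEN child_message_id IS NULL THEN pk_message_id END) AS discussions_count,
                      COUNT(pk_message_id) AS posts_count
               FROM message
               GROUP BY fk_group_id) AS d ON d.fk_group_id = g.fk_group_id
    LEFT JOIN (SELECT fk_group_id,
                      COUNT(CASE WHEN status = 1 THEN pk_prospective_user_id END) AS invitee_count
               FROM prospective_user
               GROUP BY fk_group_id) AS i ON i.fk_group_id = g.fk_group_id
    ORDER BY groupId
"""

rows = conn.execute(STATS, (17517,)).fetchall()
print(rows)  # [(1, 3, 2, 2), (2, 0, 0, 0)]
```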

How to implement a SQL Server query which has several join conditions

I am trying to implement this query, but I can't figure out why I am not getting the expected result.
Here are the descriptions:
Let's say I have a table called TableAct:
Acct# Date WithdrawAmt DepositAmt
124455 2012-11-19-00.00.00 1245.77 200.50
125577 2011-02-12-00.00.00 100.98 578.00
Another table TableCustomerOrd:
ID# COrder# CustID Ord_Description VendorType
124455 7712AS 123 1AAA Permanent
125577 9914DL 346 1BBB Partial
... UK1234 111 2HJ5 Permanent
... FR0912 567 5LGY Partial
Then TableCustomerDtls:
CustID Descriptions Delivery Address ZipCode
123 1AAA_BLUESHARE SUCCESSFUL 222 Main St 97002
346 1BBB_CHASE DECLINE 40 West Side 97122
111 2HJ5_CITIBANK SUCCESSFUL ……. …….
567 5LGY_VANGURD DECLINED ---- -----
And table DelivaryFlight:
FlightOrder# FlightCustID FlightDt
7712AS 123 2011-9-29-00.00.00
9914DL 346 2010-11-2-00.00.00
UK1234 111 2012-4-1-00.00.00
FR0912 567 2012-9-11-00.00.00
I want to update TableAct when all of the following conditions hold:
1. TableAct.Acct# = TableCustomerOrd.ID#, AND
2. TableCustomerOrd.CustID = TableCustomerDtls.CustID, and at the same time TableCustomerOrd.Ord_Description matches the part of TableCustomerDtls.Descriptions before the "_" (i.e. '1AAA', '2HJ5', etc.), AND
3. DelivaryFlight.FlightOrder# = TableCustomerOrd.COrder# AND DelivaryFlight.FlightCustID = TableCustomerOrd.CustID, with TableCustomerDtls.Delivery = 'SUCCESSFUL', AND
4. DelivaryFlight.FlightOrder# = TableCustomerOrd.COrder# AND DelivaryFlight.FlightCustID = TableCustomerOrd.CustID, with TableCustomerDtls.Delivery = 'DECLINED'.
Then I want to compare the two flight dates: DelivaryFlight.FlightDt (SUCCESSFUL) > DelivaryFlight.FlightDt (DECLINED).
Basically, I need to match the DelivaryFlight columns FlightOrder# and FlightCustID with TableCustomerOrd, use the TableCustomerDtls column Delivery to check the delivery status ('SUCCESSFUL' or 'DECLINED'), and compare the 'SUCCESSFUL' FlightDt with the 'DECLINED' FlightDt.
Here's my query but please help me to understand, I am sure this could be done in a better way.
The query is not working:
Update
Set …
FROM TableAct AC
Join TableCustomerOrd CustOd
ON AC.Acct# = CustOd.ID#
Join TableCustomerDtls CDtls
ON CDtls.CustID = CustOd.CustID
AND (CustOd.Ord_Descriptions =
Left(CDtls.Descriptions, LEN(rtrim(CDtls.Descriptions))))
JOIN DelivaryFlight DF
ON DF.FlightOrder# = CustOd.COrder#
AND DF.FlightCustID = CustOd.CustID
AND CDtls.Delivery = 'SUCCESSFUL'
JOIN DelivaryFlight DF2
ON DF2.FlightOrder# = DF.COrder#
AND DF2.FlightCustID = DF.CustID
AND CDtls.Delivery = 'DECLINED'
WHERE DelivaryFlight.FlightDt > DelivaryFlight.FlightDt
AND DepositAmt > 100
Your help will be monumental, 'cause my project is due at the end of this week.
Thank you
If I have a complex query like this, I start by creating a "simple" select which produces only the rows to be updated.
It should also return both the update values and the PK of the table being updated.
It is then (relatively) straightforward to (inner) join this with the table to be updated and do the update, remembering to only update matching rows by including
WHERE tblTobeUpdated.pk = SimpleSelect.pk
Hope this helps
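A minimal sketch of the "simple select first" approach, assuming a cut-down schema: the # is dropped from Acct# and ID# for portability, and Status is a hypothetical column to update. It uses sqlite3 so it runs standalone. In SQL Server the second step would typically be UPDATE AC SET ... FROM TableAct AC INNER JOIN (...) s ON AC.Acct# = s.pk; here a correlated subquery is used instead because SQLite versions before 3.33 lack UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical schema: '#' dropped from Acct#/ID#, Status is an invented target column.
conn.executescript("""
    CREATE TABLE TableAct (Acct INTEGER PRIMARY KEY, DepositAmt REAL, Status TEXT);
    CREATE TABLE TableCustomerOrd (ID INTEGER, CustID INTEGER);
    CREATE TABLE TableCustomerDtls (CustID INTEGER, Delivery TEXT);
    INSERT INTO TableAct VALUES (124455, 200.50, NULL), (125577, 578.00, NULL), (130000, 50.00, NULL);
    INSERT INTO TableCustomerOrd VALUES (124455, 123), (125577, 346), (130000, 111);
    INSERT INTO TableCustomerDtls VALUES (123, 'SUCCESSFUL'), (346, 'DECLINED'), (111, 'SUCCESSFUL');
""")

# Step 1: a plain SELECT returning only the rows to change, as (primary key, new value).
SIMPLE_SELECT = """
    SELECT AC.Acct AS pk, CDtls.Delivery AS new_status
    FROM TableAct AC
    JOIN TableCustomerOrd CustOd ON AC.Acct = CustOd.ID
    JOIN TableCustomerDtls CDtls ON CDtls.CustID = CustOd.CustID
    WHERE AC.DepositAmt > 100
"""
# Run and inspect this before touching any data.
preview = conn.execute(SIMPLE_SELECT + " ORDER BY pk").fetchall()

# Step 2: update only the rows whose primary key appears in that SELECT.
conn.execute(f"""
    UPDATE TableAct
    SET Status = (SELECT s.new_status FROM ({SIMPLE_SELECT}) AS s WHERE s.pk = TableAct.Acct)
    WHERE Acct IN (SELECT pk FROM ({SIMPLE_SELECT}) AS s)
""")
conn.commit()

print(preview)
print(conn.execute("SELECT Acct, Status FROM TableAct ORDER BY Acct").fetchall())
```

Account 130000 never appears in the preview (its deposit is below 100), so its Status stays NULL.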
I don't have the time to look at this in depth but I suspect you at least want to fix:
WHERE DelivaryFlight.FlightDt > DelivaryFlight.FlightDt
This is a condition that can never be met.
You probably want:
WHERE DF.FlightDt > DF2.FlightDt
It is also useful with these complex queries for an update to be able to see the records that would be updated, so I usually do something like this:
Update
Set …
--Select *
FROM TableAct AC
Then, instead of running the update, I highlight and run just the part that starts with Select to see the results, and I don't test the update until I am sure I am selecting the records I want and that the values I will be replacing are correct.
Try breaking your query down and testing each part separately. Here's a query I wrote today:
SELECT
Employee
, Reference
, Payroll
, [Hours] / 60
[Hours]
, [Days]
FROM
(
SELECT
Employee
, Reference
, Payroll
, SUM( Duration ) AS [Hours]
, AvailableID
FROM
(
SELECT
RequirerID
, Duration
, RTRIM( COALESCE(MA.MemberLastName, '')
+ ' ' + COALESCE(MA.MemberFirstName, '')
+ ' ' + COALESCE(MA.MemberInitial, '')) Employee
, COALESCE(MA.Detailref1, '') Reference
, COALESCE(MA.PayrollRef, '') Payroll
, Available.AvailableId
FROM
(
SELECT DISTINCT
RequirerID
, ShiftDate
, CAST(ShiftStart - ShiftEnd - ShiftBreak AS DECIMAL(19,2)) ShiftDuration
, Id RequirementRecordID
FROM
Requirements
WHERE
Requirements.ShiftDate BETWEEN @ParamStartDate
AND @ParamEndDate
AND RequirerID IN (SELECT ID FROM MemberDetails WHERE CompanyID = @ParamCompanyID)
)
R
INNER JOIN
ShiftConfirmed
INNER JOIN
Available
INNER JOIN
MemberDetails MA
ON Available.AvailableID = MA.ID
ON ShiftConfirmed.AvailableRecordID = Available.ID
ON R.RequirementRecordID = ShiftConfirmed.RequirementRecordID
WHERE
R.ShiftDate BETWEEN @ParamStartDate
AND @ParamEndDate
AND COALESCE(ShiftChecked, 0) BETWEEN 0 AND 1
)
ShiftDay
Group By
Employee
, Reference
, Payroll
, AvailableId
) Shifts
INNER JOIN
(
SELECT
COUNT( * ) AS [Days]
, AvailableID
FROM
(
SELECT DISTINCT
R.ShiftDate
, Available.AvailableId
FROM
(
SELECT DISTINCT
ShiftDate
, Id RequirementRecordID
FROM
Requirements
WHERE
Requirements.ShiftDate BETWEEN @ParamStartDate
AND @ParamEndDate
AND RequirerID IN (SELECT ID FROM MemberDetails WHERE CompanyID = @ParamCompanyID)
)
R
INNER JOIN
ShiftConfirmed
INNER JOIN
Available
INNER JOIN
MemberDetails MA
ON Available.AvailableID = MA.ID
ON ShiftConfirmed.AvailableRecordID = Available.ID
ON R.RequirementRecordID = ShiftConfirmed.RequirementRecordID
WHERE
R.ShiftDate BETWEEN @ParamStartDate
AND @ParamEndDate
AND COALESCE(ShiftChecked, 0) BETWEEN 0 AND 1
)
ShiftDay
Group By
AvailableId
) D
ON Shifts.AvailableID = D.AvailableID
WHERE [Hours] > 0
ORDER BY
Employee

How to get the deepest levels of a hierarchical sql query

I'm using SQL Server 2008.
Say I have a recursive hierarchy table, SalesRegion, with SalesRegionId and ParentSalesRegionId. What I need is, given a specific SalesRegion (anywhere in the hierarchy), to retrieve ALL the records at the BOTTOM level.
For example:
SalesRegion, ParentSalesRegionId
1, null
1-1, 1
1-2, 1
1-1-1, 1-1
1-1-2, 1-1
1-2-1, 1-2
1-2-2, 1-2
1-1-1-1, 1-1-1
1-1-1-2, 1-1-1
1-1-2-1, 1-1-2
1-2-1-1, 1-2-1
(In my table I have sequential numbers; these dashed numbers are only for clarity.)
So, if the user enters 1-1, I need to retrieve all records with SalesRegion 1-1-1-1, 1-1-1-2 or 1-1-2-1 (and NOT 1-2-2). Similarly, if the user enters 1-1-2-1, I need to retrieve just 1-1-2-1.
I have a CTE query that retrieves everything below 1-1, but that includes rows that I don't want:
WITH SaleLocale_CTE AS (
SELECT SL.SaleLocaleId, SL.SaleLocaleName, SL.AccountingLocationID, SL.LocaleTypeId, SL.ParentSaleLocaleId, 1 AS Level /*Added as a workaround*/
FROM SaleLocale SL
WHERE SL.Deleted = 0
AND (@SaleLocaleId IS NULL OR SaleLocaleId = @SaleLocaleId)
UNION ALL
SELECT SL.SaleLocaleId, SL.SaleLocaleName, SL.AccountingLocationID, SL.LocaleTypeId, SL.ParentSaleLocaleId, Level + 1 AS Level
FROM SaleLocale SL
INNER JOIN SaleLocale_CTE SLCTE ON SLCTE.SaleLocaleId = SL.ParentSaleLocaleId
WHERE SL.Deleted = 0
)
SELECT *
FROM SaleLocale_CTE
Thanks in advance!
Alejandro.
I found a quick way to do this, but I'd rather have the answer in a single query. So if you can think of one, please share! If I like it better, I'll vote for it as the best answer.
I added a "Level" column in my previous query (I'll edit the question so this answer is clear), and used it to get the last level and then delete the ones I don't need.
INSERT INTO #SaleLocales
SELECT *
FROM SaleLocale_GetChilds(@SaleLocaleId)
SELECT @LowestLevel = MAX(Level)
FROM #SaleLocales
DELETE #SaleLocales
WHERE Level <> @LowestLevel
Building off your post:
; WITH CTE AS
(
SELECT *
FROM SaleLocale_GetChilds(@SaleLocaleId)
)
SELECT a.*
FROM CTE a
JOIN
(
SELECT MAX(level) AS level
FROM CTE
) b
ON a.level = b.level
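The MAX(level) idea can also be written as a single statement with a recursive CTE. This sqlite3 sketch uses the question's example hierarchy with the dashed ids stored as text (an assumption; the real table uses sequential numbers) and inlines the recursion instead of calling SaleLocale_GetChilds, whose definition is not shown.

```python
import sqlite3

# The question's example hierarchy, with the dashed ids stored as text keys.
ROWS = [("1", None), ("1-1", "1"), ("1-2", "1"),
        ("1-1-1", "1-1"), ("1-1-2", "1-1"), ("1-2-1", "1-2"), ("1-2-2", "1-2"),
        ("1-1-1-1", "1-1-1"), ("1-1-1-2", "1-1-1"), ("1-1-2-1", "1-1-2"), ("1-2-1-1", "1-2-1")]

DEEPEST = """
    WITH RECURSIVE sub(SalesRegionId, Level) AS (
        SELECT SalesRegionId, 1 FROM SalesRegion WHERE SalesRegionId = ?
        UNION ALL
        SELECT c.SalesRegionId, sub.Level + 1
        FROM SalesRegion c JOIN sub ON c.ParentSalesRegionId = sub.SalesRegionId
    )
    -- Keep only the rows whose Level equals the deepest Level reached.
    SELECT SalesRegionId FROM sub
    WHERE Level = (SELECT MAX(Level) FROM sub)
    ORDER BY SalesRegionId
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesRegion (SalesRegionId TEXT PRIMARY KEY, ParentSalesRegionId TEXT)")
    conn.executemany("INSERT INTO SalesRegion VALUES (?, ?)", ROWS)
    return conn


def deepest(conn, region_id):
    """Return the nodes on the deepest level of the subtree rooted at region_id."""
    return [r[0] for r in conn.execute(DEEPEST, (region_id,))]


if __name__ == "__main__":
    conn = build_db()
    print(deepest(conn, "1-1"))  # ['1-1-1-1', '1-1-1-2', '1-1-2-1']
```

For input 1 this keeps only the four level-4 nodes and drops 1-2-2, a leaf one level higher; whether that is wanted depends on what "bottom level" means for your data.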
Are you looking for something like this:
declare @SalesRegion as table ( SalesRegion int, ParentSalesRegionId int )
insert into @SalesRegion ( SalesRegion, ParentSalesRegionId ) values
( 1, NULL ), ( 2, 1 ), ( 3, 1 ),
( 4, 3 ), ( 5, 3 ),
( 6, 5 )
; with CTE as (
-- Get the root(s).
select SalesRegion, CAST( SalesRegion as varchar(1024) ) as Path
from @SalesRegion
where ParentSalesRegionId is NULL
union all
-- Add the children one level at a time.
select SR.SalesRegion, CAST( CTE.Path + '-' + cast( SR.SalesRegion as varchar(10) ) as varchar(1024) )
from CTE inner join
@SalesRegion as SR on SR.ParentSalesRegionId = CTE.SalesRegion
)
select *
from CTE
where Path like '1-3%'
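One caveat with filtering on the materialized path: LIKE '1-3%' also matches a region whose id merely starts with 3, such as '1-30'. This sqlite3 sketch of the answer's CTE adds one hypothetical extra row (30 under 1) to show the problem and compares that pattern with a stricter one that matches the node itself or '1-3-' followed by anything. Note that the path filter on its own returns the whole subtree, not only the bottom level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesRegion (SalesRegion INTEGER, ParentSalesRegionId INTEGER)")
# The answer's sample rows plus one hypothetical extra child, 30, under region 1.
conn.executemany("INSERT INTO SalesRegion VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 1), (4, 3), (5, 3), (6, 5), (30, 1)])

PATHS = """
    WITH RECURSIVE cte(SalesRegion, Path) AS (
        SELECT SalesRegion, CAST(SalesRegion AS TEXT)
        FROM SalesRegion WHERE ParentSalesRegionId IS NULL
        UNION ALL
        SELECT sr.SalesRegion, cte.Path || '-' || CAST(sr.SalesRegion AS TEXT)
        FROM cte JOIN SalesRegion sr ON sr.ParentSalesRegionId = cte.SalesRegion
    )
    SELECT SalesRegion FROM cte WHERE {condition} ORDER BY SalesRegion
"""


def subtree(path, strict=True):
    if strict:
        # Match the node itself or anything below it, but never a sibling such as '1-30'.
        cond, params = "Path = ? OR Path LIKE ? || '-%'", (path, path)
    else:
        cond, params = "Path LIKE ? || '%'", (path,)  # the answer's prefix pattern
    return [r[0] for r in conn.execute(PATHS.format(condition=cond), params)]


print(subtree("1-3", strict=False))  # [3, 4, 5, 6, 30]
print(subtree("1-3"))                # [3, 4, 5, 6]
```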
I haven't tried this on a serious dataset, so I'm not sure how it'll perform, but I believe it solves your problem:
WITH SaleLocale_CTE AS (
SELECT SL.SaleLocaleId, SL.SaleLocaleName, SL.AccountingLocationID, SL.LocaleTypeId, SL.ParentSaleLocaleId, CASE WHEN EXISTS (SELECT 1 FROM SaleLocale SL2 WHERE SL2.ParentSaleLocaleId = SL.SaleLocaleID) THEN 1 ELSE 0 END as HasChildren
FROM SaleLocale SL
WHERE SL.Deleted = 0
AND (@SaleLocaleId IS NULL OR SaleLocaleId = @SaleLocaleId)
UNION ALL
SELECT SL.SaleLocaleId, SL.SaleLocaleName, SL.AccountingLocationID, SL.LocaleTypeId, SL.ParentSaleLocaleId, CASE WHEN EXISTS (SELECT 1 FROM SaleLocale SL2 WHERE SL2.ParentSaleLocaleId = SL.SaleLocaleID) THEN 1 ELSE 0 END as HasChildren
FROM SaleLocale SL
INNER JOIN SaleLocale_CTE SLCTE ON SLCTE.SaleLocaleId = SL.ParentSaleLocaleId
WHERE SL.Deleted = 0
)
SELECT *
FROM SaleLocale_CTE
WHERE HasChildren = 0
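The HasChildren idea can be expressed as a NOT EXISTS test applied after the recursion. A sqlite3 sketch using the question's dashed ids as text keys (column names follow the question's SalesRegion example; the Deleted filter is omitted) shows the result. Unlike a MAX(level) filter, this returns every leaf, so for region 1 it also includes 1-2-2.

```python
import sqlite3

# The question's example hierarchy, with the dashed ids stored as text (an assumption).
ROWS = [("1", None), ("1-1", "1"), ("1-2", "1"),
        ("1-1-1", "1-1"), ("1-1-2", "1-1"), ("1-2-1", "1-2"), ("1-2-2", "1-2"),
        ("1-1-1-1", "1-1-1"), ("1-1-1-2", "1-1-1"), ("1-1-2-1", "1-1-2"), ("1-2-1-1", "1-2-1")]

LEAVES = """
    WITH RECURSIVE sub(SalesRegionId) AS (
        SELECT SalesRegionId FROM SalesRegion WHERE SalesRegionId = ?
        UNION ALL
        SELECT c.SalesRegionId
        FROM SalesRegion c JOIN sub ON c.ParentSalesRegionId = sub.SalesRegionId
    )
    -- A leaf is a node that no row names as its parent.
    SELECT s.SalesRegionId FROM sub s
    WHERE NOT EXISTS (SELECT 1 FROM SalesRegion k WHERE k.ParentSalesRegionId = s.SalesRegionId)
    ORDER BY s.SalesRegionId
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesRegion (SalesRegionId TEXT PRIMARY KEY, ParentSalesRegionId TEXT)")
    conn.executemany("INSERT INTO SalesRegion VALUES (?, ?)", ROWS)
    return conn


def leaves(conn, region_id):
    """Return every leaf in the subtree rooted at region_id (region_id itself if it is a leaf)."""
    return [r[0] for r in conn.execute(LEAVES, (region_id,))]


if __name__ == "__main__":
    conn = build_db()
    print(leaves(conn, "1-1"))  # ['1-1-1-1', '1-1-1-2', '1-1-2-1']
```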

coverage percentage using a complex sql query...?

OK, I've been trying to solve this for about 2 hours now... Please advise:
Tables:
PROFILE [id (int), name (varchar), ...]
SKILL [id (int), id_profile (int), id_app (int), lvl (int), ...]
APP [id (int), ...]
lvl ranges from 0 to 3.
I'm trying to get this particular stat:
"What is the percentage of apps that is covered by at least two people having a skill of 2 or higher?"
Thanks a lot
SELECT AVG(covered)
FROM (
SELECT CASE WHEN COUNT(s.id) >= 2 THEN 1.0 ELSE 0.0 END AS covered
FROM app a
LEFT JOIN skill s ON (s.id_app = a.id AND s.lvl >= 2)
GROUP BY a.id
) AS t
More efficient way for MySQL:
SELECT AVG
(
IFNULL
(
(
SELECT 1
FROM skill s
WHERE s.id_app = a.id
AND s.lvl >= 2
LIMIT 1, 1
), 0
)
)
FROM app a
This will stop counting as soon as it finds the second skilled person for each app.
Efficient if you have a few apps but lots of people.
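The LIMIT trick can be checked with a small sqlite3 sketch. SQLite spells MySQL's LIMIT 1, 1 as LIMIT 1 OFFSET 1. The schema follows the question, the sample rows are invented, and because the subquery counts skill rows rather than distinct people it assumes one skill row per person per app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample: apps 1 and 3 have two or more skill rows at lvl >= 2, apps 2 and 4 do not.
conn.executescript("""
    CREATE TABLE app (id INTEGER PRIMARY KEY);
    CREATE TABLE skill (id INTEGER PRIMARY KEY, id_profile INTEGER, id_app INTEGER, lvl INTEGER);
    INSERT INTO app VALUES (1), (2), (3), (4);
    INSERT INTO skill VALUES (1, 1, 1, 2), (2, 2, 1, 3),
                             (3, 1, 2, 3), (4, 2, 2, 1),
                             (5, 1, 3, 2), (6, 2, 3, 2), (7, 3, 3, 2);
""")

# LIMIT 1 OFFSET 1 yields a row only if a second qualifying row exists;
# otherwise the subquery is NULL and IFNULL turns it into 0.
coverage = conn.execute("""
    SELECT AVG(IFNULL((SELECT 1 FROM skill s
                       WHERE s.id_app = a.id AND s.lvl >= 2
                       LIMIT 1 OFFSET 1), 0))
    FROM app a
""").fetchone()[0]

print(coverage)  # 0.5
```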
Untested
select convert(float,count(*)) / (select count(*) from app) as percentage
from (
select count(*) as number
from skill
where lvl >= 2
group by id_app ) t
where t.number >= 2
The logic is: percentage = 100 * ( number of apps of interest ) / ( total number of apps )
select 'percentage' =
-- 100 times
( cast( 100 as float ) *
-- number of apps of interest
( select count(id_app)
from ( select id_app, count(*) as skilled_count
from skill
where lvl >= 2
group by id_app
having count(*) >= 2 ) app_counts )
-- divided by total number of apps
/ ( select count(*) from app ) )
The convert to float is needed so sql doesn't just do integer arithmetic.
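A sqlite3 sketch of the percentage = 100 * (apps of interest) / (total apps) calculation, on the same kind of invented sample data: the 100.0 literal plays the role of cast(100 as float), and COUNT(DISTINCT id_profile) is an assumption that guards against one person having several skill rows for the same app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample: apps 1 and 3 have at least two people at lvl >= 2, apps 2 and 4 do not.
conn.executescript("""
    CREATE TABLE app (id INTEGER PRIMARY KEY);
    CREATE TABLE skill (id INTEGER PRIMARY KEY, id_profile INTEGER, id_app INTEGER, lvl INTEGER);
    INSERT INTO app VALUES (1), (2), (3), (4);
    INSERT INTO skill VALUES (1, 1, 1, 2), (2, 2, 1, 3),
                             (3, 1, 2, 3), (4, 2, 2, 1),
                             (5, 1, 3, 2), (6, 2, 3, 2), (7, 3, 3, 2);
""")

PERCENTAGE = """
    SELECT 100.0 *
           -- number of apps of interest
           (SELECT COUNT(*)
            FROM (SELECT id_app
                  FROM skill
                  WHERE lvl >= 2
                  GROUP BY id_app
                  HAVING COUNT(DISTINCT id_profile) >= 2) AS app_counts)
           -- divided by the total number of apps
           / (SELECT COUNT(*) FROM app)
"""

percentage = conn.execute(PERCENTAGE).fetchone()[0]
print(percentage)  # 50.0
```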
SELECT SUM( CASE lvl WHEN 3 THEN 1 WHEN 2 THEN 1 ELSE 0 END ) / SUM(1) FROM SKILL
If your database has an if/then function instead of CASE, use that. For example, in MySQL:
SELECT SUM( IF( lvl >= 2, 1, 0 ) ) / SUM(1) FROM SKILL
I'm not sure if this is any better or worse than tvanfosson's answer, but here it is anyway:
SELECT convert(float, count(*)) / (SELECT COUNT(id) FROM APP) AS percentage
FROM APP
WHERE (
SELECT COUNT(id)
FROM SKILL AS Skill2 WHERE Skill2.id_app = APP.id AND Skill2.lvl >= 2
) >= 2