Calculating AVG between two tables (JOIN) - sql

The table PROBE looks like this:
'ProbeID'----- 'TranscriptID' ---- 'Start'---- 'End'
'1056'-----------'7981326'----------'1013'---'1010'
'1057'-----------'7878826'----------'1011'---'1015'
etc..
The table EXPRESSION2 looks like this:
'ProbeID'----- 'SampleID' ---- 'Value'
'10425'---------'7981326'-----'16.55''
'11123'---------'7878826'----- '3.55'
etc..
I need to find the top 100 largest differences in transcripts (i.e. take average of probes).
Essentially, I need to link the ProbeID of the EXPRESSION2 table with the TranscriptID in the PROBE table and calculate the average for the top 100.
I tried the code below, but keep getting "null" return. Any alternative scripts will be much appreciated. I think I am missing something.
The EXPRESSION2 table has no null values, fyi
`select avg(value)
from expression2
where probeID in
(
select P.ProbeId
from Probe P
join Transcript T on P.TranscriptID = T.TranscriptID
)`
limit 100;`

If you whant to eliminate the null values
select e.probeID, avg(value) as avarage
from expression2 e
inner join
(
select P.ProbeId
from Probe P
join Transcript T on P.TranscriptID = T.TranscriptID
) p on e.probeID = p.probeID
where e.value is not null
group by e.probeID
if you whant to treat them as 0
select e.probeID, avg(COALESCE(value,0)) as avarage
from expression2 e
inner join
(
select P.ProbeId
from Probe P
join Transcript T on P.TranscriptID = T.TranscriptID
) p on e.probeID = p.probeID
group by e.probeID

Which DBMS are you using? Is it significant that there are no matching ProbeIDs in the two tables in your example, but there are matching TranscriptID/SampleID pairs?
Select Top 100
p.TranscriptID,
Avg(e.Value)
From
Probe p
Inner Join
Expression2 e
on p.ProbeID = e.ProbeID
Group By
p.TranscriptID
Order By
2 Desc

Related

JOIN AND CASE MORE AN TABLE

I have 2 tables; the first one ORG contains the following columns:
ORG_REF, ARB_REF, NAME, LEVEL, START_DATE
and the second one WORK contains these columns:
ARB_REF, WORK_STREET - WORK_NUM, WORK_ZIP
I want to do the following: write a select query that search in work and see if the WORK_STREET, WORK_ZIP are duplicate together, then you should look at WORK_NUM. If it is the same then output value ' ok ', but if WORK_NUM is not the same, output 'not ok'
I wrote this SQL query:
select
A.ARB_REF, A.WORK_STREET, A.WORK_NUM, A.WORK_ZIP
case when B.B = 1 then 'OK' else 'not ok' end
from
work A
join
(select
WORK_STREET, WORK_ZIP count(distinct , A.WORK_NUM) B
from
WORK
group by
WORK_STREET, WORK_ZIP) B on B.WORK_STREET = A.WORK_STREET
and B.WORK_ZIP = A.WORK_ZIP
Now I want to join the table ORG with this result I want to check if every address belong to org if it belong I should create a new column result and set it to yes in it (RESULT) AND show the "name" column otherwise set no in 'RESULT'.
Can anyone help me please?
While you can accomplish your result by adding a left outer join to the query you've already started, it might be easiest to just use count() over....
with org_data as (
-- do the inner join before the left join later
select * from org1 o1 inner join org2 o2 on o2.orgid = o1.orgid
)
select
*,
count(*) over (partition by WORK_STREET, WORKZIP) as cnt,
case when o.ARB_REF is not null then 'Yes' else 'No' end as result
from
WORK w left outer join org_data o on o.ARB_REF = w.ARB_REF

How to use multiple count and where condition sql server 2008?

I have this two query
1.
select CL_Clients.cl_id,CL_Clients].cl_name,COUNT(*) AS number_of_orders
from CL_Clients,CLOI_ClientOrderItems
where CL_Clients.cl_id=CLOI_ClientOrderItems.cl_id
group by CL_Clients.cl_name,CL_Clients.cl_id
2.
select CL_Clients.cl_id,count(cloi_current_status) as dis
from CLOI_ClientOrderItems,CL_Clients
where cloi_current_status]='12'
and CL_Clients.cl_id=CLOI_ClientOrderItems.cl_id
group by CL_Clients.cl_name,CL_Clients.cl_id,CLOI_ClientOrderItems.cloi_current_status
i have this column i need to put count function and where condition
[cloi_current_status]
166
30
30
30
150
150
150
150
150
150
150
Quite simple, you just encapsulate the queries and give their result sets an alias and then do a JOIN between their aliases on the column that is common. (In the query below I assume you'll be joining by client id)
SELECT *
FROM (
SELECT CL_Clients.cl_id,
CL_Clients].cl_name,
COUNT(*) AS number_of_orders
FROM CL_Clients,
CLOI_ClientOrderItems
WHERE CL_Clients.cl_id = CLOI_ClientOrderItems.cl_id
GROUP BY CL_Clients.cl_name,
CL_Clients.cl_id
) A
INNER JOIN (
SELECT CL_Clients.cl_id,
count(cloi_current_status) AS dis
FROM CLOI_ClientOrderItems,
CL_Clients
WHERE cloi_current_status] = '12'
AND CL_Clients.cl_id = CLOI_ClientOrderItems.cl_id
GROUP BY CL_Clients.cl_name,
CL_Clients.cl_id,
CLOI_ClientOrderItems.cloi_current_status
) B
ON A.cl_id = B.cl_id
WHERE ...
GROUP BY ...
This will be treated as a separate result set, so you can also filter results with a WHERE or just a GROUP BY, just like in a normal SELECT.
UPDATE:
To answer the question in your comments, when you join two tables that have a column with the same value and use
SELECT * FROM A INNER JOIN B the * will show all columns returned by the join, meaning all columns from A and all columns from B, this is why you have duplicate columns.
If you want to filter the columns returned you can specifiy which columns you want returned. So, in your case, the top SELECT * can be replaced with
SELECT A.cl_id, A.cl_name, A.number_of_orders, B.dis so, your query becomes:
SELECT A.cl_id, A.cl_name, A.number_of_orders, B.dis
FROM (
SELECT CL_Clients.cl_id,
CL_Clients].cl_name,
COUNT(*) AS number_of_orders
FROM CL_Clients,
CLOI_ClientOrderItems
WHERE CL_Clients.cl_id = CLOI_ClientOrderItems.cl_id
GROUP BY CL_Clients.cl_name,
CL_Clients.cl_id
) A
INNER JOIN (
SELECT CL_Clients.cl_id,
count(cloi_current_status) AS dis
FROM CLOI_ClientOrderItems,
CL_Clients
WHERE cloi_current_status] = '12'
AND CL_Clients.cl_id = CLOI_ClientOrderItems.cl_id
GROUP BY CL_Clients.cl_name,
CL_Clients.cl_id,
CLOI_ClientOrderItems.cloi_current_status
) B
ON A.cl_id = B.cl_id
UPDATE #2:
For your last question, you need to GROUP BY at the end of the big query and use a HAVING condtion, like this:
GROUP BY A.cl_id, A.cl_name, A.number_of_orders, B.dis
HAVING COUNT(cloi_current_status) > 100
All depends on what data you are trying to get, but you can go about it like this.
SELECT Column_x, Column_y, etc..
FROM ClL_Clients a
JOIN (select CL_Clients.cl_id,CL_Clients].cl_name,COUNT(*) AS number_of_orders
from CL_Clients,CLOI_ClientOrderItems
where CL_Clients.cl_id=CLOI_ClientOrderItems.cl_id
group by CL_Clients.cl_name,CL_Clients.cl_id) b
on a.cl_id = b.cl_id
JOIN (select CL_Clients.cl_id,count(cloi_current_status) as dis
from CLOI_ClientOrderItems,CL_Clients
where cloi_current_status]='12'
and CL_Clients.cl_id=CLOI_ClientOrderItems.cl_id
group by CL_Clients.cl_name,CL_Clients.cl_id,CLOI_ClientOrderItems.cloi_current_status) c
on a.cl_id = c.cl_id
Group by BLAH BLAH
Hope this gets you in the right direction.

Why left join is not giving distinct result?

I have following sql query and my left join is not giving me distinct result please help me to trace out.
SELECT DISTINCT
Position.Date,
Position.SecurityId,
Position.PurchaseLotId,
Position.InPosition,
ISNULL(ClosingPrice.Bid, Position.Mark) AS Mark
FROM
Fireball_Reporting.dbo.Reporting_DailyNAV_Pricing POSITION WITH (NOLOCK, READUNCOMMITTED)
LEFT JOIN Fireball.dbo.AdditionalSecurityPrice ClosingPrice WITH (NOLOCK, READUNCOMMITTED) ON
ClosingPrice.SecurityID = Position.PricingSecurityID AND
ClosingPrice.Date = Position.Date AND
ClosingPrice.SecurityPriceSourceID = #SourceID AND
ClosingPrice.PortfolioID IN (5,6)
WHERE
DatePurchased > #NewPositionDate AND
Position.Date = #CurrentPositionDate AND
InPosition = 1 AND
Position.PortfolioId IN (
SELECT
PARAM
FROM
Fireball_Reporting.dbo.ParseMultiValuedParameter(#PortfolioId, ',')
) AND
(
Position > 1 OR
Position < - 1
)
Now here in above my when I use LEFT JOIN ISNULL(ClosingPrice.Bid, Position.Mark) AS Mark and LEFT JOIN it is giving me more no of records with mutiple portfolio ids
for e.g . (5,6)
If i put portfolioID =5 giving result as 120 records
If i put portfolioID =6 giving result as 20 records
When I put portfolioID = (5,6) it should give me 140 records
but it is giving result as 350 records which is wrong . :(
It is happening because when I use LEFT JOIN there is no condition of PurchaseLotID in that as table Fireball.dbo.AdditionalSecurityPrice ClosingPrice not having column PurchaseLotID so it is giving me other records also whoes having same purchaseLotID's with diferent prices .
But I dont want that records
How can I eliminate those records ?
You get one Entry per DailyLoanAndCashPosition.PurchaseLotId = NAVImpact.PurchaseLotId
which would mean you must have more entrys in with the same PurchaseLotId
The most likely cause is that the left join produces duplicated PurchaseLotIds. The best way to know if if you perform a select distinct(PurchaseLotId) on your left side of the inner join.

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...

Why is my SQL 'NOT IN' clause producing different results from 'NOT EXISTS'

I have two SQL queries producing different results when I would expect them to produce the same result. I am trying to find the number of events that do not have a corresponding location. All locations have an event but events can also link to non-location records.
The following query produces a count of 16244, the correct value.
SELECT COUNT(DISTINCT e.event_id)
FROM events AS e
WHERE NOT EXISTS
(SELECT * FROM locations AS l WHERE l.event_id = e.event_id)
The following query produces a count of 0.
SELECT COUNT(DISTINCT e.event_id)
FROM events AS e
WHERE e.event_id NOT IN (SELECT l.event_id FROM locations AS l)
The following SQL does some summaries of the data set
SELECT 'Event Count',
COUNT(DISTINCT event_id)
FROM events
UNION ALL
SELECT 'Locations Count',
COUNT(DISTINCT event_id)
FROM locations
UNION ALL
SELECT 'Event+Location Count',
COUNT(DISTINCT l.event_id)
FROM locations AS l JOIN events AS e ON l.event_Id = e.event_id
And returns the following results
Event Count 139599
Locations Count 123355
Event+Location Count 123355
Can anyone shed any light on why the 2 initial queries do not produce the same figure.
You have a NULL in the subquery SELECT l.event_id FROM locations AS l so NOT IN will always evaluate to unknown and return 0 results
SELECT COUNT(DISTINCT e.event_id)
FROM events AS e
WHERE e.event_id NOT IN (SELECT l.event_id FROM locations AS l)
The reason for this behaviour can be seen from the below example.
'x' NOT IN (NULL,'a','b')
≡ 'x' <> NULL and 'x' <> 'a' and 'x'
<> 'b'
≡ Unknown and True and True
≡ Unknown
The NOT IN form works differently for NULLs. The presence of a single NULL will cause the entire statement to fail, thus returning no results.
So you have at least one event_id in locations that is NULL.
Also, your query might be better written as a join:
SELECT
COUNT(DISTINCT e.event_id)
FROM
events AS e
LEFT JOIN locations AS l ON e.event_id = l.event_id
WHERE
l.event_id IS NULL
[UPDATE: apparently, the NOT EXISTS version is faster.]
In and Exists are processed very very differently.
Select * from T1 where x in ( select y from T2 )
is typically processed as:
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to
the original table -- typically.
As opposed to
select * from t1 where exists ( select null from t2 where y = x )
That is processed more like:
for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop
It always results in a full scan of T1 whereas the first query can make use of an index
on T1(x).