altering query in db2 to fix count from a join

I'm getting an aggregated count of records for orders and I'm getting the expected count on this basic query:
SELECT
count(*) as sales_180,
180/count(*) as velocity
FROM custgroup g
WHERE g.cstnoc = 10617
AND g.framec = 4847
AND g.covr1c = 1763
AND g.colr1c = 29
AND date(substr(g.extd1d,1,4)||'-'||substr(g.EXTD1d,5,2)||'-'||substr(g.EXTD1d,7,2) ) between current_Date - 180 DAY AND current_Date
But as soon as I add back my joins and joined values, my count goes from 1 (which it should be) to over 200. All I need from these joins is the customer ID and the manager number, so even if my count is high, I'm basically just trying to say "for this cstnoc, give me the slsupr and xslsno".
How can I run the query below without affecting the count? I only want my count (sales_180 and velocity) to come from the custgroup table based on my WHERE clause, and then I just want one value of the xcstno and xslsno based on the cstnoc.
SELECT
count(*) as sales_180,
180/count(*) as velocity,
c.xslsno as CustID,
cr.slsupr as Manager
FROM custgroup g
inner join customers c
on g.cstnoc = c.xcstno
inner join managers cr
on c.xslsno = cr.xslsno
WHERE g.cstnoc = 10617
AND g.framec = 4847
AND g.covr1c = 1763
AND g.colr1c = 29
AND date(substr(g.extd1d,1,4)||'-'||substr(g.EXTD1d,5,2)||'-'||substr(g.EXTD1d,7,2) ) between current_Date - 180 DAY AND current_Date
GROUP BY c.xslsno, cr.slsupr

You are producing multiple rows when joining, so your count is now counting all the resulting rows with all that [unintended] multiplicity.
The solution? Use a table expression to pre-compute your count, and then you can join it to the other tables, as in:
select
g2.sales_180,
g2.velocity,
c.xslsno as CustID,
cr.slsupr as Manager
from customers c
join managers cr on c.xslsno = cr.xslsno
join ( -- here the Table Expression starts
SELECT
g.cstnoc, -- exposed so the outer query can join on it
count(*) as sales_180,
180/count(*) as velocity
FROM custgroup g
WHERE g.cstnoc = 10617
AND g.framec = 4847
AND g.covr1c = 1763
AND g.colr1c = 29
AND date(substr(g.extd1d,1,4)||'-'||substr(g.EXTD1d,5,2)
||'-'||substr(g.EXTD1d,7,2) )
between current_Date - 180 DAY AND current_Date
GROUP BY g.cstnoc
) g2 on g2.cstnoc = c.xcstno
You can also use a Common Table Expression (CTE) that will produce the same result:
with g2 as (
SELECT
g.cstnoc, -- exposed so the outer query can join on it
count(*) as sales_180,
180/count(*) as velocity
FROM custgroup g
WHERE g.cstnoc = 10617
AND g.framec = 4847
AND g.covr1c = 1763
AND g.colr1c = 29
AND date(substr(g.extd1d,1,4)||'-'||substr(g.EXTD1d,5,2)
||'-'||substr(g.EXTD1d,7,2) )
between current_Date - 180 DAY AND current_Date
GROUP BY g.cstnoc
)
select
g2.sales_180,
g2.velocity,
c.xslsno as CustID,
cr.slsupr as Manager
from customers c
join managers cr on c.xslsno = cr.xslsno
join g2 on g2.cstnoc = c.xcstno
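To see the row multiplication and the pre-aggregation fix side by side, here is a minimal sketch. It uses Python's sqlite3 module rather than DB2, and the table contents are made up for illustration (three duplicate customers rows are an assumption, chosen to force the fan-out). The DISTINCT in the second query is also my addition: without it, the pre-aggregated count would be correct but repeated once per matching customers row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE custgroup (cstnoc INTEGER, framec INTEGER);
    CREATE TABLE customers (xcstno INTEGER, xslsno INTEGER);
    CREATE TABLE managers  (xslsno INTEGER, slsupr INTEGER);
    -- one qualifying order row (hypothetical data)
    INSERT INTO custgroup VALUES (10617, 4847);
    -- three rows for the same customer number cause the fan-out
    INSERT INTO customers VALUES (10617, 7), (10617, 7), (10617, 7);
    INSERT INTO managers  VALUES (7, 99);
""")

# Counting after the joins counts the multiplied rows.
naive = conn.execute("""
    SELECT COUNT(*) AS sales_180, c.xslsno, cr.slsupr
    FROM custgroup g
    JOIN customers c ON g.cstnoc = c.xcstno
    JOIN managers cr ON c.xslsno = cr.xslsno
    WHERE g.cstnoc = 10617
    GROUP BY c.xslsno, cr.slsupr
""").fetchall()

# Counting first, then joining, keeps the count tied to custgroup only.
fixed = conn.execute("""
    SELECT DISTINCT g2.sales_180, c.xslsno, cr.slsupr
    FROM (SELECT cstnoc, COUNT(*) AS sales_180
          FROM custgroup
          WHERE cstnoc = 10617
          GROUP BY cstnoc) g2
    JOIN customers c ON g2.cstnoc = c.xcstno
    JOIN managers cr ON c.xslsno = cr.xslsno
""").fetchall()

print(naive)  # [(3, 7, 99)]
print(fixed)  # [(1, 7, 99)]
```

If the real customers or managers tables hold several different values per customer number, DISTINCT will not collapse them, and you would need to decide which one to keep.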

Related

Query on large table

I have the following query -
SELECT n.fname
,ii.render AS practitioner_npi
,n.address AS address1
,substring(n.postal,0,6) AS zip
,substring(n.postal,6,4) AS zip4
,ii.count AS count
FROM
(
SELECT render, count(*) AS count
FROM dx sl
JOIN annual caq
ON DATE_TRUNC('quarter', date_of_service::date) >= caq.start
JOIN entities n
ON sl.render = n.npi
WHERE dx_cd IN (
SELECT DISTINCT dx_cd
FROM dx_per_code pc
JOIN bucket bac
ON pc.code = bac.hcpccode
WHERE
bucketname = 'something'
AND dx_rank BETWEEN 1 AND 5
)
AND n.npi_type = '1'
GROUP BY render
)
ii
JOIN npi n ON n.npi = ii.render
LEFT JOIN taxonomy t ON t.code = n.taxonomy
ORDER BY ii.count DESC;
The dx table does not have any indexes and contains approximately 8 billion records. This query currently takes 20 minutes to run. What indexes or optimizations can I add to make it run faster?

How to force postgres to return 0 even if there are no rows matching query, using coalesce, group by and join

I've been trying hopelessly to get the following SQL statement to return the query results and default to 0 if there are no rows matching the query.
This is the intended result:
vol | year
-------+------
0 | 2018
Instead I get:
vol | year
-----+------
(0 rows)
Here is the sql statement:
select coalesce(vol,0) as vol, year
from (select sum(vol) as vol, year
from schema.fact_data
join schema.period_data
on schema.fact_data.period_tag = schema.period_data.tag
join schema.product_data
on schema.fact_data.product_tag =
schema.product_data.tag
join schema.market_data
on schema.fact_data.market_tag = schema.market_data.tag
where "retailer"='MadeUpRetailer'
and "product_tag"='FakeProductTag'
and "year"='2018' group by year
) as DerivedTable;
I know the query works because it returns data when there is data. Just doesn't default to 0 as intended...
Any help in finding why this is the case would be much appreciated!

Using your subquery DerivedTable, you could write:
SELECT coalesce(DerivedTable.vol, 0) AS vol,
y.year
FROM (VALUES ('2018'::text)) AS y(year)
LEFT JOIN (SELECT ...) AS DerivedTable
ON DerivedTable.year = y.year;
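A minimal sketch of this LEFT JOIN idea, run through Python's sqlite3 module instead of Postgres. The one-column fact_data table and its single 2017 row are assumptions for the demo, and because SQLite has no `AS y(year)` column-alias syntax, the one-row year list is written as `SELECT '2018' AS year`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_data (year TEXT, vol INTEGER)")
conn.execute("INSERT INTO fact_data VALUES ('2017', 40)")  # no 2018 rows

left_join_rows = conn.execute("""
    SELECT COALESCE(d.vol, 0) AS vol, y.year
    FROM (SELECT '2018' AS year) AS y          -- always exactly one row
    LEFT JOIN (SELECT SUM(vol) AS vol, year    -- may be empty
               FROM fact_data
               WHERE year = '2018'
               GROUP BY year) AS d
      ON d.year = y.year
""").fetchall()

print(left_join_rows)  # [(0, '2018')]
```

The year list guarantees a row on the left side, so COALESCE has a NULL to turn into 0 when the derived table is empty.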

Remove the GROUP BY (and the outer query):
select 2018 as year, coalesce(sum(vol), 0) as vol
from schema.fact_data f join
schema.period_data p
on f.period_tag = p.tag join
schema.product_data pr
on f.product_tag = pr.tag join
schema.market_data m
on f.market_tag = m.tag
where "retailer" = 'MadeUpRetailer' and
"product_tag" = 'FakeProductTag' and
"year" = '2018';
An aggregation query with no GROUP BY always returns exactly one row, so this should do what you want.
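The difference is easy to check on a toy table. This is a sketch using Python's sqlite3 module rather than Postgres, with a made-up fact_data table that has no 2018 rows; the behaviour shown is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_data (year TEXT, vol INTEGER)")
conn.execute("INSERT INTO fact_data VALUES ('2017', 40)")  # no 2018 rows

# With GROUP BY there are no groups, so there are no result rows.
grouped = conn.execute(
    "SELECT SUM(vol) FROM fact_data WHERE year = '2018' GROUP BY year"
).fetchall()

# Without GROUP BY the aggregate still yields one row; SUM is NULL,
# which COALESCE turns into 0.
ungrouped = conn.execute(
    "SELECT COALESCE(SUM(vol), 0) FROM fact_data WHERE year = '2018'"
).fetchall()

print(grouped)    # []
print(ungrouped)  # [(0,)]
```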
EDIT:
The query would look something like this:
select v.yyyy as year, coalesce(sum(vol), 0) as vol
from (values (2018), (2019)) v(yyyy) left join
schema.fact_data f
on f.year = v.yyyy left join -- this is just an example. I have no idea where year is coming from
schema.period_data p
on f.period_tag = p.tag left join
schema.product_data pr
on f.product_tag = pr.tag left join
schema.market_data m
on f.market_tag = m.tag
group by v.yyyy
However, you have to move the where conditions to the appropriate on clauses. I have no idea where the columns are coming from.

From the code you posted it is not clear in which table you have the year column.
You can use UNION to fetch just 1 row in case there are no rows in that table for the year 2018 like this:
select sum(vol) as vol, year
from schema.fact_data inner join schema.period_data
on schema.fact_data.period_tag = schema.period_data.tag
inner join schema.product_data
on schema.fact_data.product_tag = schema.product_data.tag
inner join schema.market_data
on schema.fact_data.market_tag = schema.market_data.tag
where
"retailer"='MadeUpRetailer' and
"product_tag"='FakeProductTag' and
"year"='2018'
group by "year"
union
select 0 as vol, '2018' as year
where not exists (
select 1 from tablename where "year" = '2018'
)
If there are rows for the year 2018, the second query returns nothing.

How to use alias of a subquery to get the running total?

I have a UNION of 3 tables for calculating some balance and I need to get the running SUM of that balance but I can't use PARTITION OVER, because I must do it with a sql query that can work in Access.
My problem is that I cannot use JOIN on an alias subquery, it won't work.
How can I use alias in a JOIN to get the running total?
Or any other way to get the SUM that is not with PARTITION OVER, because it does not exist in Access.
This is my code so far:
SELECT korisnik_id, imePrezime, datum, Dug, Pot, (Dug - Pot) AS Balance
FROM (
SELECT korisnik_id, k.imePrezime, r.datum, SUM(IIF(u.jedinstven = 1, r.cena, k.kvadratura * r.cena)) AS Dug, '0' AS Pot
FROM Racun r
INNER JOIN Usluge u ON r.usluga_id = u.ID
INNER JOIN Korisnik k ON r.korisnik_id = k.ID
WHERE korisnik_id = 1
AND r.zgrada_id = 1
AND r.mesec = 1
AND r.godina = 2017
GROUP BY korisnik_id, k.imePrezime, r.datum
UNION ALL
SELECT korisnik_id, k.imePrezime, rp.datum, SUM(IIF(u.jedinstven = 1, rp.cena, k.kvadratura * rp.cena)) AS Dug, '0' AS Pot
FROM RacunP rp
INNER JOIN Usluge u ON rp.usluga_id = u.ID
INNER JOIN Korisnik k ON rp.korisnik_id = k.ID
WHERE korisnik_id = 1
AND rp.zgrada_id = 1
AND rp.mesec = 1
AND rp.godina = 2017
GROUP BY korisnik_id, k.imePrezime, rp.datum
UNION ALL
SELECT uu.korisnik_id, k.imePrezime, uu.datum, '0' AS Dug, SUM(uu.iznos) AS Pot
FROM UnosUplata uu
INNER JOIN Korisnik k ON uu.korisnik_id = k.ID
WHERE korisnik_id = 1
GROUP BY uu.korisnik_id, k.imePrezime, uu.datum
) AS a
ORDER BY korisnik_id

You can save a query (let's name it Query1) for the UNION of the 3 tables and then create another query that returns each row in the first query and calculates the sum of the rows that are before it (optionally checking that they are in the same group).
It should be something like this:
SELECT *, (
SELECT SUM(Value) FROM Query1 AS b
WHERE b.GroupNumber=a.GroupNumber
AND b.Position<=a.Position
) AS RunningSum
FROM Query1 AS a
However, it's more efficient to do that in the report.
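Here is a small runnable sketch of that correlated subquery. It uses Python's sqlite3 module as a stand-in for Access, and Query1 is built as a plain table with made-up rows; GroupNumber, Position and Value are the placeholder names from the query above, not columns from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- stands in for the saved UNION query (hypothetical data)
    CREATE TABLE Query1 (GroupNumber INTEGER, Position INTEGER, Value INTEGER);
    INSERT INTO Query1 VALUES (1, 1, 100), (1, 2, -30), (1, 3, 50), (2, 1, 10);
""")

running = conn.execute("""
    SELECT a.GroupNumber, a.Position, a.Value,
           (SELECT SUM(b.Value)
            FROM Query1 AS b
            WHERE b.GroupNumber = a.GroupNumber
              AND b.Position <= a.Position) AS RunningSum
    FROM Query1 AS a
    ORDER BY a.GroupNumber, a.Position
""").fetchall()

for row in running:
    print(row)
# (1, 1, 100, 100)
# (1, 2, -30, 70)
# (1, 3, 50, 120)
# (2, 1, 10, 10)
```

In the original question, Position would most likely be the datum column and Value the (Dug - Pot) balance, but that mapping is my assumption.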

DAX SUM outside current context

This is my current data model:
[image: part of data model]
I need help in creating a measure.
The measure I want to create should show the overall capacity of the ships that had trips in a time interval, as well as be able to see the total capacity based on the room type.
I am interested in obtaining the DAX equivalent of :
SELECT SUM(sc.capacity)
FROM Reporting.Trips t
JOIN Reporting.Dates d ON d.DateId = t.DateId
JOIN Reporting.ShipCapacities sc on sc.ShipId = t.ShipId
WHERE d.FiscalYear= 2017 AND d.FiscalWeek=15
OR per type
SELECT SUM(sc.capacity)
FROM Reporting.Trips t
JOIN Reporting.Dates d ON d.DateId = t.DateId
JOIN Reporting.ShipCapacities sc on sc.ShipId = t.ShipId
WHERE d.FiscalYear= 2017 AND d.FiscalWeek=15 and sc.CabinTypeId = 11
I tried to define a measure in this manner:
Total Rooms:=CALCULATE ( SUM ( 'Ship Capacities'[Ship Capacity] ) )
I have seen that this outputs the T-SQL equivalent of:
SELECT SUM(Capacity)
FROM
(
SELECT DISTINCT sc.*
FROM Reporting.Trips t
JOIN Reporting.Dates d ON d.DateId = t.DateId
JOIN Reporting.ShipCapacities sc on sc.ShipId = t.ShipId
WHERE d.FiscalYear= 2017 AND d.FiscalWeek=15
) x
OR per type
SELECT SUM(Capacity)
FROM
(
SELECT DISTINCT sc.*
FROM Reporting.Trips t
JOIN Reporting.Dates d ON d.DateId = t.DateId
JOIN Reporting.ShipCapacities sc on sc.ShipId = t.ShipId
WHERE d.FiscalYear= 2017 AND d.FiscalWeek=15 and sc.CabinTypeId = 11
) x
These are the query results:
[image: query results]
Thanks.

This is the final measure that worked for me:
Total Rooms:=CALCULATE (
SUMX (
'Ship Capacities',
'Ship Capacities'[Ship Capacity]
* COUNTROWS ( FILTER ( Trips, AND( Trips[Sailed] = TRUE (),Trips[ShipId] = 'Ship Capacities'[ShipId] ) ))
)
)
Notice the COUNTROWS.
Across a many-to-one relationship followed by a one-to-many relationship, the measure did not iterate over the trips on its own (although previously I used SUM, not SUMX).
I found it unusual that I had to state the relationship between the two tables on either side of the many-to-many explicitly.
Otherwise it does not work.
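For readers who do not use DAX, the arithmetic this measure performs can be sketched in plain Python. This is only an illustration of the row-by-row logic (capacity times the number of sailed trips of the same ship), not of how the DAX engine evaluates it; the sample rows and the total_rooms helper are hypothetical.

```python
# Hypothetical sample rows; column names mirror the measure above.
ship_capacities = [
    {"ShipId": 1, "CabinTypeId": 11, "Capacity": 100},
    {"ShipId": 1, "CabinTypeId": 12, "Capacity": 50},
    {"ShipId": 2, "CabinTypeId": 11, "Capacity": 80},
]
trips = [
    {"ShipId": 1, "Sailed": True},
    {"ShipId": 1, "Sailed": True},
    {"ShipId": 2, "Sailed": False},
]


def total_rooms(capacities, trips, cabin_type=None):
    """Sum capacity * number of sailed trips, one capacity row at a time."""
    total = 0
    for row in capacities:
        if cabin_type is not None and row["CabinTypeId"] != cabin_type:
            continue
        # Plays the role of COUNTROWS(FILTER(Trips, ...)) for this row.
        sailed = sum(
            1 for t in trips if t["Sailed"] and t["ShipId"] == row["ShipId"]
        )
        total += row["Capacity"] * sailed
    return total


# Summing each capacity row once ignores how many trips were made.
once_per_row = sum(row["Capacity"] for row in ship_capacities)

print(once_per_row)                                    # 230
print(total_rooms(ship_capacities, trips))             # 300
print(total_rooms(ship_capacities, trips, cabin_type=11))  # 200
```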

Rollup / recursive addition SQL Server 2008

I have a query with rollup that outputs data like (the query is a little busy, but I can post if necessary)
range   subCounts   Counts   percent
1-9     3           100      3.0
10-19   13          100      13.0
20-29   30          100      33.0
30-39   74          100      74.0
NULL    100         100      100.0
How can I keep a running total of percent? Say I need to find the bottom 15th percentile: in this case 3 + 13 = 16, so I would like the last row that was added to be returned, like this:
range   subCounts   counts   percent
10-19   13          100      13.0
EDIT1: here is the query
select '$'+cast(+bin*10000 + ' ' as varchar(10)) + '-' + cast(bin*10000+9999 as varchar(10)) as bins,
count(*) as numbers,
(select count(distinct patient.patientid) from patient
inner join tblclaims on patient.patientid = tblclaims.patientid
and patient.admissiondate = tblclaims.admissiondate
and patient.dischargedate = tblclaims.dischargedate
inner join tblhospitals on tblhospitals.hospitalnpi = patient.hospitalnpi
where (tblhospitals.hospitalname = 'X')
) as Totals
, round(100*count(*)/cast((select count(distinct patient.patientid) from patient
inner join tblclaims on patient.patientid = tblclaims.patientid
and patient.admissiondate = tblclaims.admissiondate
and patient.dischargedate = tblclaims.dischargedate
inner join tblhospitals on tblhospitals.hospitalnpi = patient.hospitalnpi
where (tblhospitals.hospitalname = 'X')) as float),2) as binsPercent
from
(
select tblclaims.patientid, sum(claimsmedicarepaid) as TotalCosts,
cast(sum(claimsmedicarePaid)/10000 as int) as bin
from tblclaims inner join patient on patient.patientid = tblclaims.patientid
and patient.admissiondate = tblclaims.admissiondate
and patient.dischargedate = tblclaims.dischargedate
inner join tblhospitals on patient.hospitalnpi = tblhospitals.hospitalnpi
where tblhospitals.hospitalname = 'X'
group by tblclaims.patientid
) as t
group by bin with rollup

OK, so for whoever might use this for reference, I figured out what I needed to do.
I added row_number() over (order by bin) as rownum to the query and saved all of this as a view.
Then I used
SELECT t1.rownum, t1.bins, t1.numbers, t1.uktotal, t1.binspercent,
SUM(t2.binspercent) AS SUM
FROM t t1
INNER JOIN t t2 ON t1.rownum >= t2.rownum
GROUP BY t1.rownum,
t1.bins, t1.numbers, t1.uktotal, t1.binspercent
ORDER BY t1.rownum
Joining on t1.rownum >= t2.rownum pairs each row with itself and every row before it, so SUM(t2.binspercent) gives the running total.
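A minimal sketch of this self-join, followed by picking the first bin whose running total reaches 15 percent. It runs on Python's sqlite3 module rather than SQL Server 2008, and the table t holds a trimmed, made-up version of the view (only rownum, bins and binspercent); the final selection step in Python is my addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- stands in for the saved view (hypothetical, trimmed columns)
    CREATE TABLE t (rownum INTEGER, bins TEXT, binspercent REAL);
    INSERT INTO t VALUES (1, '1-9', 3.0), (2, '10-19', 13.0), (3, '20-29', 30.0);
""")

cumulative = conn.execute("""
    SELECT t1.rownum, t1.bins, t1.binspercent,
           SUM(t2.binspercent) AS running_percent
    FROM t t1
    INNER JOIN t t2 ON t1.rownum >= t2.rownum
    GROUP BY t1.rownum, t1.bins, t1.binspercent
    ORDER BY t1.rownum
""").fetchall()

# First bin at which the running total reaches the 15 percent mark.
threshold_row = next(row for row in cumulative if row[3] >= 15)

print(cumulative)
# [(1, '1-9', 3.0, 3.0), (2, '10-19', 13.0, 16.0), (3, '20-29', 30.0, 46.0)]
print(threshold_row)  # (2, '10-19', 13.0, 16.0)
```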

This isn't exactly what I was looking for, but it's on the same track:
http://blog.tallan.com/2011/12/08/sql-server-2012-windowing-functions-part-1-of-2-running-and-sliding-aggregates/ and http://blog.tallan.com/2011/12/19/sql-server-2012-windowing-functions-part-2-of-2-new-analytic-functions/ - check out PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, and PERCENTILE_DISC.
Sorry for the lame answer
