How to refactor complicated SQL query which is broken


Here is the simplified model of the domain
In a nutshell, a unit grants documents to a customer. There are two types of units: main units and their child units. Both belong to the same province, and multiple cities may belong to one province. A document has numerous events (its processing history). A customer belongs to one city and province.
I have to write query, which returns random set of documents, given a target main unit code. Here is the criteria:
Return 10 documents where the newest event_code = 10
Each document must belong to a different customer living in any city of the unit's region (prefer different cities)
Return the Customers newest Document which meets the criteria
There must be both document types present in the result
Result (customers chosen) should be random with each query
But...
If there's not enough customers, try to use multiple documents of the same customer as a last resort
If there aren't enough documents either, return as much as possible
If there's not a single instance of another document type, then return all the same
There may be millions of rows, and the query must be as fast as possible because it is executed frequently.
I'm not sure how to structure this kind of complex query in a sane manner. I'm using Oracle and PL/SQL. Here is something I tried, but it isn't working as expected (returns wrong data). How should I refactor this query and get the random result, and also honor all those borderline rules? I'm also worried about the performance regarding the joins and wheres.
CURSOR c_documents IS
WITH documents_cte AS (
SELECT d.document_id AS document_id, d.create_dt AS create_dt,
c.customer_id
FROM documents d
JOIN customers c ON (c.customer_id = d.customer_id AND
c.province_id = (SELECT region_id FROM unit WHERE unit_code = 1234))
WHERE exists (
SELECT 1
FROM event
where document_id = d.document_id AND
event_code = 10
AND create_dt = (
SELECT MAX(create_dt)
FROM event
WHERE document_id = d.document_id))
)
SELECT * FROM documents_cte d
WHERE create_dt = (SELECT MAX(create_dt)
from documents_cte
WHERE customer_id = d.customer_id)
How to correctly make this query with efficiency, randomness in mind? I'm not asking for exact solution, but guidelines at least.

I'd avoid hierarchic tables whenever possible. In your case you are using a hierarchic table to allow for an unlimited depth, but in the end you only store two levels: provinces and their cities. That would better be just two tables: one for provinces and one for cities. Not a big deal, but it would make your data model simpler and easier to query.
Below I am starting with a WITH clause to get a city table, as no such table exists. Then I go step by step: get the customers belonging to the unit, then get their documents and rank them. Finally, I select the ranked documents and randomly take 10 of the best ranked ones.
with cities as
(
select
c.region_id as city_id,
p.region_id as province_id
from region c
join region p on p.region_id = c.parent_region_id
)
, unit_customers as
(
select customer_id
from customer
where city_id in
(
select city_id
from cities
where
(
select region_id
from unit
where unit_code = 1234
) in (city_id, province_id)
)
)
, ranked_documents as
(
select
document.*,
row_number() over (partition by customer_id order by create_dt desc) as rn
from document
where customer_id in -- customers belonging to the unit
(
select customer_id
from unit_customers
)
and document_id in -- documents with latest event code = 10
(
select document_id
from event
group by document_id
having max(event_code) keep (dense_rank last order by create_dt) = 10
)
)
select *
from ranked_documents
order by rn, dbms_random.value
fetch first 10 rows only;
This doesn't take into account to get both document types, as this contradicts the rule to get the latest documents per customer.
FETCH FIRST is available as of Oracle 12c. In earlier versions you would use one more subquery and another ROW_NUMBER instead.
As to speed, I'd recommend these indexes for the query:
create index idx_r1 on region(region_id); -- already exists for region_id = primary key
create index idx_r2 on region(parent_region_id, region_id);
create index idx_u1 on unit(unit_code, region_id);
create index idx_c1 on customer(city_id, customer_id);
create index idx_e1 on event(document_id, create_dt, event_code);
create index idx_d1 on document(document_id, customer_id, create_dt);
create index idx_d2 on document(customer_id, document_id, create_dt);
One of the last two will be used, the other not. Check which with EXPLAIN PLAN and drop the unused one.
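The rank-then-shuffle step at the end of the query can be tried on toy data. The following is only a sketch, run through Python's sqlite3 module rather than Oracle: the table contents are made up, random() stands in for dbms_random.value, LIMIT stands in for FETCH FIRST, and the unit and event filters are left out. It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

def pick_documents(rows, limit):
    """rows: (document_id, customer_id, create_dt) tuples from a made-up toy schema."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table document (document_id integer, customer_id integer, create_dt text)"
    )
    con.executemany("insert into document values (?, ?, ?)", rows)
    picked = con.execute(
        """
        with ranked_documents as (
            select document_id, customer_id,
                   row_number() over (partition by customer_id
                                      order by create_dt desc) as rn
            from document
        )
        select document_id, customer_id, rn
        from ranked_documents
        order by rn, random()   -- newest document per customer first, random within a rank
        limit ?
        """,
        (limit,),
    ).fetchall()
    con.close()
    return picked

rows = [
    (1, 100, "2020-01-01"), (2, 100, "2020-02-01"),
    (3, 200, "2020-01-15"),
    (4, 300, "2020-03-01"), (5, 300, "2020-01-01"),
]
picked = pick_documents(rows, 4)
```

With three customers and a limit of four, the first three rows are each customer's newest document (in random order), and the fourth is an older document of some customer, which is the "last resort" behaviour asked for.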

Related

Rank order ST_DWithin results by the number of radii a result appears in

I have a table of existing customers and another table of potential customers. I want to return a list of potential customers rank ordered by the number of radii of existing purchasers that they appear in.
There are many rows in the potential customers table per each existing customer, and the radius around a given existing customer could encompass multiple potential customers. I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
SELECT pur.contact_id AS purchaser, count(pot.*) AS nearby_potential_customers
FROM purchasers_geocoded pur, potential_customers_geocoded pot
WHERE ST_DWithin(pur.geom,pot.geom,1000)
GROUP BY purchaser;
Does anyone have advice on how to proceed?
EDIT:
With some help, I wrote this query, which seems to do the job, but I'm verifying now.
WITH prequalified_leads_table AS (
SELECT *
FROM nearby_potential_customers
WHERE market_val > 80000
AND market_val < 120000
)
, proximate_to_existing AS (
SELECT pot.prop_id AS prequalified_leads
FROM purchasers_geocoded pur, prequalified_leads_table pot
WHERE ST_DWithin(pot.geom,pur.geom,100)
)
SELECT prequalified_leads, count(prequalified_leads)
FROM proximate_to_existing
GROUP BY prequalified_leads
ORDER BY count(*) DESC;

I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
Your query tried the opposite of your statement, counting potential customers around existing ones.
Inverting that, and after adding some tweaks:
SELECT pot.contact_id AS potential_customer
, rank() OVER (ORDER BY pur.nearby_customers DESC
, pot.contact_id) AS rnk
, pur.nearby_customers
FROM potential_customers_geocoded pot
LEFT JOIN LATERAL (
SELECT count(*) AS nearby_customers
FROM purchasers_geocoded pur
WHERE ST_DWithin(pur.geom, pot.geom, 1000)
) pur ON true
ORDER BY 2;
I suggest a subquery with LEFT JOIN LATERAL ... ON true to get counts. Should make use of the spatial index that you undoubtedly have:
CREATE INDEX ON purchasers_geocoded USING gist (geom);
LEFT JOIN LATERAL ... ON true also retains rows with 0 nearby customers in the result; your original join style would exclude those. Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
Then ORDER BY the resulting nearby_customers in the outer query (not: nearby_potential_customers).
It's not clear whether you want to add an actual rank. Use the window function rank() if so. I made the rank deterministic while being at it, breaking ties with an additional ORDER BY expression: pot.contact_id. Else, peers are returned in arbitrary order which can change for every execution.
ORDER BY 2 is short syntax for "order by the 2nd output column". See:
Select first row in each GROUP BY group?
Related:
How do I query all rows within a 5-mile radius of my coordinates?
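To see why counting per potential customer keeps the rows with zero matches, here is a small sketch in Python's sqlite3. It is not PostGIS: the tables, the planar x/y coordinates and the squared-distance test are made-up stand-ins for the geom columns and ST_DWithin, and a correlated scalar subquery plays the role of LEFT JOIN LATERAL ... ON true.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table purchasers (contact_id integer, x real, y real);
create table potentials (contact_id integer, x real, y real);
insert into purchasers values (1, 0, 0), (2, 3, 0), (3, 100, 100);
insert into potentials values (10, 1, 0), (11, 0, 1), (12, 500, 500);
""")

radius = 2.5  # hypothetical radius in the same units as x and y
ranked = con.execute(
    """
    select pot.contact_id,
           (select count(*)
            from purchasers pur
            where (pur.x - pot.x) * (pur.x - pot.x)
                + (pur.y - pot.y) * (pur.y - pot.y) <= :r * :r) as nearby_customers
    from potentials pot
    order by nearby_customers desc, pot.contact_id   -- tie-break keeps the order stable
    """,
    {"r": radius},
).fetchall()
con.close()
```

Potential customer 12 is far from every purchaser and still appears, with a count of 0; a plain inner join on the distance condition would have dropped it.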

Easier way to limit rows in SELECT subquery?

I perform queries on an Oracle database. Let's say I have a table, PEOPLE. Each person can have multiple reference numbers. The reference numbers are stored in a different table, REFERENCENUMBERS.
REFERENCENUMBERS contains a column, PERSON_ID, which is identical to the ID column of the PEOPLE table. It is through this ID that the tables are joined.
Let's say I want to perform a query on the PEOPLE table. However I only want a single reference number returned per person record: i.e if a person has multiple reference numbers, I don't want multiple rows returned per person per reference number.
I choose a criterion for how to select only one reference number: the one which was created earliest. The date of reference number creation is stored in the REFERENCENUMBERS table as DATECREATED.
The following code does this job:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
-- Subquery to return the earliest-created reference number for this person
(
SELECT
REFERENCENUMBERS.NUMBER
FROM
REFERENCENUMBERS
WHERE
REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
AND REFERENCENUMBERS.DATECREATED =
-- Sub-sub query simply to match the earliest date
(
SELECT
MIN(R.DATECREATED) -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
WHERE
R.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
)
)
FROM
PEOPLE
WHERE
PEOPLE.AGE > 18 -- Or whatever
However, my question to you knowledgeable SQL people, is.. is there an easier way of doing this? It just appears cumbersome to have to include a sub-sub-query solely for the purpose of finding the earliest date, and limiting the WHERE clause of the sub-query.
There must be an easier, or cleaner way of doing this. Any suggestions?
(By the way - the sample code is greatly simplified from what I'm actually working on. Please don't provide answers which substantively modify my primary query with different-style JOINs etc - thanks).

The simplest would be a top-n filter:
select people.id
, people.name
, people.age
, people.address
, ( select referencenumbers.number
from referencenumbers
where referencenumbers.person_id = people.id
order by referencenumbers.datecreated
fetch first row only )
from people
where people.age > 18;
More details here (requires Oracle 12.1 or later.)
Or this (works in earlier versions):
select people.id
, people.name
, people.age
, people.address
, ( select min(rn.number) keep (dense_rank first order by rn.datecreated)
from referencenumbers rn
where rn.person_id = people.id )
from people
where people.age > 18;
(I gave referencenumbers a shorter alias for readability.)
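The top-n idea can be checked on a few rows. This is a sketch in Python's sqlite3, not Oracle: LIMIT 1 stands in for FETCH FIRST ROW ONLY, the data is made up, and the column is called ref_number here only to avoid the reserved word.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table people (id integer, name text, age integer);
create table referencenumbers (person_id integer, ref_number text, datecreated text);
insert into people values (1, 'Ann', 30), (2, 'Bob', 17), (3, 'Cy', 40);
insert into referencenumbers values
    (1, 'A-2', '2021-05-01'),
    (1, 'A-1', '2020-01-01'),
    (3, 'C-1', '2019-07-07');
""")

rows = con.execute(
    """
    select p.id, p.name,
           (select r.ref_number
            from referencenumbers r
            where r.person_id = p.id
            order by r.datecreated   -- earliest first
            limit 1) as first_ref    -- keep only that one row
    from people p
    where p.age > 18
    order by p.id
    """
).fetchall()
con.close()
```

Each adult appears once, with the reference number that was created first, and no second subquery is needed to find the minimum date.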

Try this
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
REFERENCENUMBERS.NUMBER
FROM PEOPLE
JOIN REFERENCENUMBERS ON REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
JOIN
(
SELECT
R.PERSON_ID,
MIN(R.DATECREATED) minc -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
GROUP BY R.PERSON_ID
) t ON t.minc = REFERENCENUMBERS.DATECREATED and
t.PERSON_ID = REFERENCENUMBERS.PERSON_ID
WHERE
PEOPLE.AGE > 18 -- Or whatever

How to modify query to walk entire table rather than a single

I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
Table below shows customers (cid) from 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
Image below shows all the table elements for the first 3 users (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with 'qnty' of 5. And so on until cid 45 is reached.
To give you a background on what I did here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 – combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4) make sure there is only one record for cid=1
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I would simply change the text cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is, how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? How do I get one table with all the cid values, along with the club which sold that cid the most books and how many books were sold? Please keep in mind I am hoping you can modify my query as opposed to providing a better query.
If you decide that my query is way too ugly (I agree with you) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand your query. You should be aware that I already asked this question: For common elements, how to find the value based on two columns? SQL but I was not able to make the answer work (due to my SQL limitations - not because the answer wasn't good); and in the absence of a working answer I could not reverse engineer it to understand how it works.
Thanks in advance

You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just grabs a total quantity for each cid/club combination. The second innermost statement (sub2) assigns a row number to each cid/club combination within its cid, ordered by quantity from highest to lowest. Then the outermost query keeps only the records where that row number is 1.
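The three layers can be run end to end on a handful of rows. This sketch uses Python's sqlite3 instead of DB2 (window functions need SQLite 3.25 or later), and the purchase rows are made up so that the totals for cid 1 and cid 2 match the ones mentioned in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table yrb_purchase (cid integer, club text, qnty integer)")
con.executemany(
    "insert into yrb_purchase values (?, ?, ?)",
    [
        (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "Basic", 1),
        (2, "YRB Gold", 5), (2, "Basic", 2),
        (3, "Basic", 4),
    ],
)

best = con.execute(
    """
    select cid, club, qnty
    from (
        select cid, club, qnty,
               row_number() over (partition by cid order by qnty desc) as cid_club_rank
        from (
            select cid, club, sum(qnty) as qnty   -- sub1: total per cid/club
            from yrb_purchase
            group by cid, club
        ) as sub1
    ) as sub2                                     -- sub2: rank clubs within each cid
    where cid_club_rank = 1                       -- keep the top club per cid
    order by cid
    """
).fetchall()
con.close()
```

No cid is hard-coded anywhere: PARTITION BY cid restarts the numbering for every customer, which is what replaces editing cid=1 by hand.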

How to get the most frequent value SQL

I have a table Orders(id_trip, id_order), table Trip(id_hotel, id_bus, id_type_of_trip) and table Hotel(id_hotel, name).
I would like to get name of the most frequent hotel in table Orders.
SELECT hotel.name from Orders
JOIN Trip
on Orders.id_trip = Trip.id_hotel
JOIN hotel
on trip.id_hotel = hotel.id_hotel
FROM (SELECT hotel.name, rank() over (order by cnt desc) rnk
FROM (SELECT hotel.name, count(*) cnt
FROM Orders
GROUP BY hotel.name))
WHERE rnk = 1;

The "most frequently occurring value" in a distribution is a distinct concept in statistics, with a technical name. It's called the MODE of the distribution. And Oracle has the STATS_MODE() function for it. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm
For example, using the EMP table in the standard SCOTT schema, select stats_mode(deptno) from scott.emp will return 30 - the number of the department with the most employees. (30 is the department "name" or number, it is NOT the number of employees in that department!)
In your case:
select stats_mode(h.name) from (the rest of your query)
Note: if two or more hotels are tied for "most frequent", then STATS_MODE() will return one of them (non-deterministic). If you need all the tied values, you will need a different solution - a good example is in the documentation (linked above). This is a documented flaw in Oracle's understanding and implementation of the statistical concept.

Use FIRST for a single result:
SELECT MAX(t.name) KEEP (DENSE_RANK FIRST ORDER BY t.cnt DESC)
FROM (
SELECT hotel.name, COUNT(*) cnt
FROM orders
JOIN trip USING (id_trip)
JOIN hotel USING (id_hotel)
GROUP BY hotel.name
) t

Here is one method:
select name
from (select h.name,
row_number() over (order by count(*) desc) as seqnum -- use `rank()` if you want duplicates
from orders o join
trip t
on o.id_trip = t.id_trip join -- this seems like the right join condition
hotel h
on t.id_hotel = h.id_hotel
group by h.name -- needed so that count(*) is computed per hotel
) oth
where seqnum = 1;

** Getting the most recent statistical mode out of a data sample **
I know it's more than a year, but here's my answer. I came across this question hoping to find a simpler solution than what I know, but alas, nope.
I had a similar situation where I needed to get the mode from a data sample, with the requirement to get the mode of the most recently inserted value if there were multiple modes.
In such a case neither the STATS_MODE nor the LAST aggregate functions would do (as they would tend to return the first mode found, not necessarily the mode with the most recent entries.)
In my case it was easy to use the ROWNUM pseudo-column because the tables in question were performance metric tables that only experienced inserts (not updates)
In this oversimplified example, I'm using ROWNUM - it could easily be changed to a timestamp or sequence field if you have one.
SELECT VALUE
FROM
(SELECT VALUE ,
COUNT( * ) CNT,
MAX( R ) R
FROM
( SELECT VALUE, ROWNUM R FROM FOO
)
GROUP BY VALUE
ORDER BY CNT DESC,
R DESC
)
WHERE
(
ROWNUM < 2
);
That is, get the total count and max ROWNUM for each value (I'm assuming the values are discrete. If they aren't, this ain't gonna work.)
Then sort so that the ones with largest counts come first, and for those with the same count, the one with the largest ROWNUM (indicating most recent insertion in my case).
Then skim off the top row.
Your specific data model should have a way to discern the most recent (or the oldest or whatever) rows inserted in your table, and if there are collisions, then there's not much of a way other than using ROWNUM or getting a random sample of size 1.
If this doesn't work for your specific case, you'll have to create your own custom aggregator.
Now, if you don't care which mode Oracle is going to pick (your business case just requires a mode and that's it), then STATS_MODE will do fine.
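The "most frequent value, newest wins on a tie" logic can be checked on a tiny sample. This is a sketch in Python's sqlite3, not Oracle: an explicit seq column (a made-up insertion counter) takes the place of ROWNUM, and LIMIT 1 takes the place of the outer ROWNUM < 2 filter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table foo (seq integer, value text)")
# 'a' and 'b' both occur twice; 'b' was inserted most recently (seq 4 > seq 3).
con.executemany(
    "insert into foo values (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "c")],
)

mode_row = con.execute(
    """
    select value
    from (
        select value, count(*) as cnt, max(seq) as r
        from foo
        group by value
    )
    order by cnt desc,   -- most frequent first
             r desc      -- among ties, the most recently inserted value
    limit 1
    """
).fetchone()
con.close()
most_recent_mode = mode_row[0]
```

Here most_recent_mode is 'b': it ties with 'a' on count, and its latest occurrence is newer.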

Complex select query question for hardcore SQL designers

This is a very complex query; I have been trying to construct it for a few days with no real success.
I'm using SQL-SERVER 2005 Standard
What i need is :
5 CampaignVariants from Campaigns whereas 2 are with the largest PPU number set and 3 are random.
Next condition is that CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)
Next condition CampaignLanguage, CampaignCategory, CampaignRegion and CampaignCountry must be the ones i send to this select with (languageID,categoryID,regionID and countryID).
Next condition is that IP address i send to this select statement won't be in IPs list for current Campaign ( i delete inactive for 24 hours IPs ).
In other words it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region
view diagram
more details
I get countryID, regionID, ipID, PublisherRegionUID and languageID from the Visitor. These are the filter parameters. First I need to get what the Publisher is about to show on his site by its categories, language and so on, and then I filter all remaining Campaigns by the Visitor's params, using all parameters besides PublisherRegionUID.
So it has two actual filters: one for what the Publisher wants to publish and another for what the Visitor can view.
campaignDailyBudget and campaignTotalBudget are values set by Users who creates a Campaign. Those two compared to (number of clicks per campaign)*(campaignPPU) while date filters obviously used to filter for campaignDailyBudget with from 12:00AM to 11:59PM of today. campaignTotalBudget is not filtered by date for obvious reasons
Demo of Stored Procedure
ALTER PROCEDURE dbo.CampaignsGetCampaignVariants4Visitor
@publisherSiteRegionUID uniqueidentifier,
@visitorIP varchar(15),
@browserID tinyint,
@countryID tinyint,
@osID tinyint,
@languageID tinyint,
@acceptsCookies bit
AS
BEGIN
SET NOCOUNT ON;
-- check if such @publisherRegionUID exists
if exists(select publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID)
begin
declare @publisherSiteRegionID int
select @publisherSiteRegionID = publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID
-- get CampaignVariants
-- ** choose 2 highest PPU and 3 random CampaignVariants from Campaigns list
-- where regionID,countryID,categoryID,languageID meets Publisher and Visitor requirements
-- and Campaign.campaignDailyBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU during this day
-- and Campaign.campaignTotalBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU
-- and @visitorID does not appear in Campaigns2IPs with this Campaign
-- insert visitor
insert into Visitors (ipAddress,browserID,countryID,languageID,OSID,acceptsCookies)
values (@visitorIP,@browserID,@countryID,@languageID,@osID,@acceptsCookies)
declare @visitorID int
select @visitorID = IDENT_CURRENT('Visitors')
-- add IP to pool Campaigns ** adding ip to all Campaigns whose CampaignVariants were chosen
-- add PublisherRegion2Visitor relationship
insert into PublisherSiteRegions2Visitors values (@visitorID,@publisherSiteRegionID)
-- add CampaignVariant2Visitor relationship
end
END
GO

I also make a number of assumptions about your oblique requirements. I’ll spell them out as I go along, along with explaining the code. Please note that I of course have no reasonable way of testing this code for typos or minor logic errors.
It might be possible to write this as a single ginormous query, but that would be awkward, ugly, and prone to performance issues, as the SQL optimizer can have problems building plans for overly large queries. An option would be to write it as a series of queries, populating temp tables for use in subsequent queries (which allows for much simpler debugging). I chose to write this as a large common table expression statement with a series of CTE tables, largely because it kind of “flows” better that way, and it'd probably perform better than the many-temp-tables version.
First assumption: there are several circular references in there. Campaign has links to both Countries and Regions, so both of these parameter values must be checked, even though based on the table link from Countries to Region, this filter could possibly be simplified to just a check on Country (assuming that the country parameter value is always “in” the region parameter). The same applies to Language and Category, and perhaps to IPs and Visitors. This appears to be sloppy design; if it can be cleared up, or if assumptions on the validity of the data can be made, the query could be simplified.
Second assumption: Parameters are passed in as variables in the form of @Region, @Country, etc. Also, there is only one IP address being passed in; if not, then you’ll need to pass in multiple values, set up a temp table containing those values, and add that as a filter where I use the @IP parameter.
So, step 1 is a first pass identifying “eligible” campaigns, by pulling out all those that share the desired country, region, language, and category, and that do not have the one IP address associated with them:
WITH cteEligibleCampaigns (CampaignId)
as (select CampaignId
from Campaigns2Regions
where RegionId = @RegionId
intersect select CampaignId
from Campaign2Countries
where CountryId = @CountryId
intersect select CampaignId
from Campaign2Languages
where LanguageId = @LanguageId
intersect select CampaignId
from Campaign2Categories
where CategoryId = @CategoryId
except select CampaignId
from Campaigns2IPs
where IPID = @IPId)
Next up, from these filter out those items where “CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)”. This requirement is not entirely clear to me. I have chosen to interpret it as “only include those campaigns where, if you count the number of visitors for those campaign’s CampaignVariants, the total count is less than both CampaignDailyBudget and CampaignTotalBudget”. Note that here I introduce a random value, used later on in selecting random rows.
,cteTargetCampaigns (CampaignId, RandomNumber)
as (select ec.CampaignId, checksum(newid()) RandomNumber
from cteEligibleCampaigns ec
inner join Campaigns ca
on ca.CampaignId = ec.CampaignId
inner join CampaignVariants cv
on cv.CampaignId = ec.CampaignId
inner join CampaignVariants2Visitors cvv
on cvv.CampaignVariantId = cv.CampaignVariantId
group by ec.CampaignId, ca.CampaignDailyBudget, ca.CampaignTotalBudget
having count(*) < ca.CampaignDailyBudget
and count(*) < ca.CampaignTotalBudget)
Next up, identify the two “best” items.
,cteTopTwo (CampaignId, Ranking)
as (select tc.CampaignId, row_number() over (order by ca.CampaignPPU desc)
from cteTargetCampaigns tc
inner join Campaigns ca
on ca.CampaignId = tc.CampaignId)
Next, line up all other campaigns by the randomly assigned number:
,cteRandom (CampaignId, Ranking)
as (select CampaignId, row_number() over (order by RandomNumber)
from cteTargetCampaigns
where CampaignId not in (select CampaignId
from cteTopTwo
where Ranking < 3))
And, at last, pull the data sets together:
select CampaignId
from cteTopTwo
where Ranking <= 2
union all select CampaignId
from cteRandom
where Ranking <= 3
Lump the above sections of code together, debug typos, invalid assumptions, and missed requirements (such as ordering, or flags distinguishing the top two items from the random ones), and you should be good.
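The "two best plus three random, without overlap" part can be tried in isolation. The sketch below uses Python's sqlite3 rather than SQL Server: the campaigns table and its ppu values are made up, random() stands in for checksum(newid()), and the eligibility and budget filters are omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table campaigns (campaign_id integer, ppu real)")
ppus = [0.5, 2.0, 1.5, 0.1, 0.9, 3.0, 0.7, 1.1]   # campaign 6 and 2 have the highest PPU
con.executemany(
    "insert into campaigns values (?, ?)",
    list(enumerate(ppus, start=1)),
)

picked = con.execute(
    """
    with top_two as (
        select campaign_id,
               row_number() over (order by ppu desc) as ranking
        from campaigns
    ),
    random_rest as (
        select campaign_id,
               row_number() over (order by random()) as ranking
        from campaigns
        where campaign_id not in (select campaign_id
                                  from top_two
                                  where ranking <= 2)   -- exclude the two best
    )
    select campaign_id, 'top' as source from top_two where ranking <= 2
    union all
    select campaign_id, 'random' as source from random_rest where ranking <= 3
    """
).fetchall()
con.close()

top_ids = {cid for cid, source in picked if source == "top"}
random_ids = {cid for cid, source in picked if source == "random"}
```

The source column is the kind of flag mentioned above for telling the top two apart from the random three; the random three change from run to run, the top two do not.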

I'm not sure I understand this portion of your post:
"it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region"
I'm assuming "it" is the query. Is the user, given your second "Next condition", the IP? What does "when I take from user" mean? Does that mean that this is the information you have at the time you execute your query, or is it information you return from your query? If the latter, then there are a host of questions that would need to be answered, since many of those columns are part of a Many:Many relationship.
Regardless, below is a means to get the 5 campaigns where, according to your second "Next condition", you have an IP address that you want to filter out. I'm also assuming that you want five campaigns in total, which means that the three random ones cannot include the two "highest PPU" ones.
With
ValidCampaigns As
(
Select C.campaignId
From Campaigns As C
Left Join (Campaigns2IPs As CIP
Join IPs
On IPs.ipID = CIP.ipID
And IPs.ipAddress = @IPAddress)
On CIP.campaignId = C.campaignId
Where CIP.campaignID Is Null
)
, CampaignPPURanks As
(
Select C.campaignId
, Row_Number() Over ( Order By C.campaignPPU desc ) As ItemRank
From ValidCampaigns As C
)
, RandomRanks As
(
Select C.campaignId
, Row_Number() Over ( Order By newid() desc ) As ItemRank
From ValidCampaigns As C
Left Join CampaignPPURanks As CR
On CR.campaignId = C.campaignId
And CR.ItemRank <= 2
Where CR.campaignId Is Null
)
Select ...
From CampaignPPURanks As CPR
Join CampaignVariants As CV
On CV.campaignId = CPR.campaignId
And CPR.ItemRank <= 2
Union All
Select ...
From RandomRanks As RR
Join CampaignVariants As CV
On CV.campaignId = RR.campaignId
And RR.ItemRank <= 3
