I'm writing a SQL Server stored procedure to optimize the cutting of bars. I haven't found the best method yet. A recursive CTE seems to be the way to go, but I'm stuck.
For my test, I have to cut 18 pieces (3 of 1000 mm, 3 of 1500 mm, 3 of 2500 mm, 3 of 3500 mm, 3 of 4500 mm and 3 of 6000 mm), and I have 3 sizes of bars (5500 mm, 7000 mm and 8500 mm).
As a first step, I generate every possible combination of cuts within a single bar.
I tried that with a WHILE loop and a temporary table; it took an hour and a half. But I think I can do better with a recursive CTE...
Now I must generate every combination of several bars that covers my 18 cuts. I wrote another recursive CTE, but I haven't found a way to stop the recursion as soon as at least one combination contains all the cuts. So my query finds over 150 million combinations, with 8, 9, 10, 11... bars, and it keeps iterating all the way up to 18 bars. I want it to stop at 8 bars (I know that is the smallest number of bars my cuts need). And it takes more than two days!
I have 2 temporary tables. The first holds my per-bar cut combinations (#COMBI_BARRE), with this structure:
- ID_ART: identity of the article
- COLOR
- CUT_COMBI: a varchar concatenating the cut IDs of the bar combination (1-2-3-4...)
- NB_CUTS: an integer with the count of cuts in the bar
- FIRST_CUT: the smallest cut ID in the bar
The second temporary table, #DET_BAR, holds the detail of my cuts, with 2 columns: ID_COMBI_BAR, the bar combination ID, and ID_CUT_STR, the cut ID as a varchar (to avoid CAST or CONVERT in the CTE, for better performance).
I store the result in a CTE called Combi, with:
- ID_ART and COLOR
- COMBI: a varchar concatenating the bar combination IDs (1-2-3-4...)
- COMBI_CUT: a varchar concatenating the cut IDs (1-2-3-4-5...)
- NB_BAR: the count of bars in the combination
- NB_CUTS: the count of cuts in the combination
- MAX_CUTS: the total number of cuts I must make for the article and color
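For reference, the shapes look roughly like this (hypothetical DDL; the types are approximations, not my actual definitions):
CREATE TABLE #COMBI_BARRE (
    ID_COMBI_BAR     INT IDENTITY(1,1),
    ID_COMBI_BAR_STR VARCHAR(10),   -- the same ID as varchar, used in the CTE below
    ID_ART           INT,
    COLOR            VARCHAR(20),
    CUT_COMBI        VARCHAR(400),  -- '1-2-3-4...'
    NB_CUTS          INT,
    FIRST_CUT        INT
);
CREATE TABLE #DET_BAR (
    ID_COMBI_BAR INT,
    ID_CUT_STR   VARCHAR(10)
);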
Since the CTE does one iteration per bar added, I tried an EXISTS clause to stop the recursion once an iteration produces at least one combination containing all my cuts: I must not use 10 bars if I can do it with 8. But I get the error "the recursive table has multiple references".
How can I write this query so it avoids all those unnecessary iterations?
;WITH Combi (ID_ART, COLOR, COMBI, COMBI_CUT, NB_BAR, NB_CUTS, MAX_CUTS)
AS
(
    SELECT C.ID_ART,
           C.COLOR,
           '-' + ID_COMBI_BAR_STR + '-',
           '-' + C.CUT_COMBI + '-',
           1,
           C.NB_CUTS,
           ISNULL(MAXI.CUT_NUM, 0)
    FROM #COMBI_BARRE C WITH (NOLOCK)
    OUTER APPLY (SELECT TOP 1 D.CUT_NUM
                 FROM #DEBITS D
                 WHERE D.ID_ART = C.ID_ART
                   AND D.COLOR = C.COLOR
                 ORDER BY D.NUM_OCC_DEB DESC) MAXI
    WHERE C.FIRST_CUT = 1
    UNION ALL
    SELECT C.ID_ART,
           C.COLOR,
           Combi.COMBI + ID_COMBI_BAR_STR + '-',
           Combi.COMBI_CUT + C.CUT_COMBI + '-',
           Combi.NB_BAR + 1,
           Combi.NB_CUTS + C.NB_CUTS,
           Combi.MAX_CUTS
    FROM #COMBI_BARRE C WITH (NOLOCK)
    INNER JOIN Combi ON C.ID_ART = Combi.ID_ART
                    AND C.COLOR = Combi.COLOR
    WHERE C.FIRST_CUT > Combi.NB_BAR
      AND Combi.NB_CUTS + C.NB_CUTS <= Combi.MAX_CUTS
      AND NOT EXISTS (SELECT *
                      FROM #DET_BAR D WITH (NOLOCK)
                      WHERE D.ID_COMBI_BAR = C.ID_COMBI_BAR
                        AND PATINDEX(D.ID_CUT_STR, Combi.COMBI_CUT) > 0)
      -- this second reference to Combi in the recursive member is what
      -- raises the "multiple recursive references" error
      AND NOT EXISTS (SELECT TOP 1 *
                      FROM Combi Combi2
                      WHERE Combi2.ID_ART = C.ID_ART
                        AND Combi2.COLOR = C.COLOR
                        AND Combi2.NB_CUTS = Combi2.MAX_CUTS)
)
SELECT * FROM Combi
This is a variation of the bin packing problem. That search term might point you in the right direction.
Also, you can go to my Bin Packing page, which gives several approaches to a more simplified version of your problem.
A small warning: the linked article(s) don't use any (recursive) CTE, so they won't answer your specific CTE question.
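To make the bin-packing connection concrete, here is a small first-fit-decreasing sketch in T-SQL - one classic heuristic, not the exact approach from my page, and the #CUTS/#BARS tables are hypothetical rather than your schema. Cuts are walked longest-first; each goes into the first open bar with room, otherwise a new bar of the smallest stock size that fits is opened:

CREATE TABLE #CUTS (CUT_ID INT IDENTITY(1,1), CUT_LEN INT);
CREATE TABLE #BARS (BAR_ID INT IDENTITY(1,1), CAPACITY INT, REMAINING INT);

INSERT INTO #CUTS (CUT_LEN)
VALUES (1000),(1000),(1000),(1500),(1500),(1500),(2500),(2500),(2500),
       (3500),(3500),(3500),(4500),(4500),(4500),(6000),(6000),(6000);

DECLARE @cut INT, @len INT, @bar INT;

DECLARE cut_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CUT_ID, CUT_LEN FROM #CUTS ORDER BY CUT_LEN DESC;  -- decreasing
OPEN cut_cursor;
FETCH NEXT FROM cut_cursor INTO @cut, @len;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @bar = NULL;
    SELECT TOP 1 @bar = BAR_ID            -- first open bar with enough room
    FROM #BARS
    WHERE REMAINING >= @len
    ORDER BY BAR_ID;

    IF @bar IS NULL                       -- none fits: open the smallest stock bar that can take the cut
        INSERT INTO #BARS (CAPACITY, REMAINING)
        SELECT TOP 1 v.size, v.size - @len
        FROM (VALUES (5500), (7000), (8500)) v(size)
        WHERE v.size >= @len
        ORDER BY v.size;
    ELSE
        UPDATE #BARS SET REMAINING = REMAINING - @len WHERE BAR_ID = @bar;

    FETCH NEXT FROM cut_cursor INTO @cut, @len;
END
CLOSE cut_cursor;
DEALLOCATE cut_cursor;

SELECT BAR_ID, CAPACITY, REMAINING FROM #BARS;  -- the resulting bar-level plan

A greedy heuristic like this won't always find the provable minimum of 8 bars, but it finishes in seconds instead of days and usually lands close.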
Background
I have a front-end with a list of items with infinite scrolling, and I fetch pages of items by specifying the page limit and offset.
Problem
Apart from simply ordering the results by some of the columns, I would like to add a "random" option. The thing is, I don't want repetitions, so I need the entire dataset permuted before applying the limit and offset, and I need to get the same permutation as long as I supply the same seed.
What I tried
A naive approach was to write a table-valued function that takes an int seed and uses it in the ORDER BY clause like so:
SELECT *
FROM dbo.Entities e
ORDER BY HASHBYTES('MD2', e.Title) ^ @seed
OFFSET 0 ROWS
FETCH NEXT (SELECT COUNT(*) FROM dbo.Entities) ROWS ONLY
This seemed to work well at first glance, but it turned out not to be very "volatile", for lack of a better word - it becomes more visible with sparse result sets, where most seeds (chosen randomly between 0 and 2147483647) yield the same order.
I thought I would get better results by hashing the seed as well, but SQL Server doesn't allow me to XOR two varbinary variables. Am I even looking in the right direction? Are there any performance considerations that I should be making but might not be aware of?
The best way is to create a tally table with two columns: first a sequential integer (between 0 and 999,999), second a random integer. Then generate a random number as your seed and join your table on a computed ROW_NUMBER().
CREATE TABLE T_NUM (SEQUENTIAL INT, RANDOM INT);
GO
WITH N AS (
    SELECT 0 AS I
    UNION ALL
    SELECT I + 1
    FROM N
    WHERE I < 9
)
INSERT INTO T_NUM (SEQUENTIAL)
SELECT N1.I + N2.I * 10 + N3.I * 100 + N4.I * 1000 + N5.I * 10000 + N6.I * 100000
FROM N AS N1
CROSS JOIN N AS N2
CROSS JOIN N AS N3
CROSS JOIN N AS N4
CROSS JOIN N AS N5
CROSS JOIN N AS N6;
GO
WITH T AS (
    SELECT SEQUENTIAL,
           ROW_NUMBER() OVER (ORDER BY CHECKSUM(NEWID())) AS ALEA
    FROM T_NUM
)
UPDATE N
SET RANDOM = ALEA
FROM T_NUM AS N
JOIN T ON T.SEQUENTIAL = N.SEQUENTIAL;
GO
DECLARE @SEED INT = FLOOR(1 + RAND() * 1000000);
Now you have a seed as an entry point into the random sequence; join your table to the tally table in sequential order.
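A minimal sketch of that lookup, assuming dbo.Entities from the question and at most 1,000,000 rows (SEQUENTIAL runs 0..999999 as generated above):

DECLARE @SEED INT = FLOOR(1 + RAND() * 1000000);  -- or a stored seed, for repeatability

WITH E AS (
    SELECT e.*,
           ROW_NUMBER() OVER (ORDER BY e.Title) AS RN  -- any deterministic order works
    FROM dbo.Entities e
)
SELECT E.*
FROM E
JOIN T_NUM N
  ON N.SEQUENTIAL = (E.RN + @SEED) % 1000000          -- the seed shifts the mapping, wrapping around
ORDER BY N.RANDOM                                     -- the fixed random permutation
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;                -- page with limit/offset as usual

The same seed always lands on the same permutation, and no per-row hashing is needed.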
ORDER BY HASHBYTES('MD2', e.Title + convert(nvarchar(max), @seed))
should work, but performance-wise it would be a disaster: you would calculate MD2 for all records every time. I would not do this on the server side at all. You can generate the random sequence on the client and then just pick rows number 158, 7, 1027 and 9 from the server. But this still has two problems:
- if an item is deleted, the row numbers of all subsequent records shift; that breaks the whole sequence and you get duplicates and missing records
- computing row numbers over millions of records is not that fast either
I see two options. You can query all ids from the table and use them to generate the random order, but that is a lot of numbers. Or you can ensure the id space is dense enough, query 20 random ids and hope at least 10 of them exist. If you are unlucky, you have to query again.
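A sketch of that second option (I'm assuming an Id key column; the ids below are made up - the client draws them from its seeded generator and restores its own order after the rows come back):

SELECT e.Id, e.Title
FROM dbo.Entities e
WHERE e.Id IN (158, 7, 1027, 9, 2048, 77, 5, 903, 441, 12,
               66, 318, 25, 840, 1999, 3, 512, 730, 89, 1200);  -- 20 candidates, keep the first 10 that exist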
I know there are some posts on pivoting, which I have used to get where I am today (thanks to the BQ community!). But this post seeks advice on optimising this when a large number of pivot columns is needed, distributed table joins are needed... as well as deduping. Not asking much, right!
Objective:
We have 2 large BQ tables with a full 10 years of history that need joining:
sales_order_header (13 GB - 1.35 million rows)
sales_order_line (50 GB - 5 million rows)
This is a typical 'header/line' one-to-many relationship. Unfortunately, the data arrives as 2 separate streams rather than 1 document style with the line nested inside the header, which would be ideal - but it's not, so distributed joins become necessary for some of the views our BI tool (Tableau) wants to call periodically (every 60 mins) to ingest 'cleansed' data that is:
- deduped (both tables, that is)
- joined header to line (on salesOrderId)
- each has its own array of 'sourceData' name/value pairs that needs unpacking/'pivoting' so it's no longer an array
Point 3 presents an issue in its own right. We have a column called 'sourceData', which is basically where the core data is - an array of string name/value pairs (a row in BQ is a replication of a single row from a DB, so the key is a column name and the value is the value for that single row).
Now, I think here lies the issue: as there are 250 array entries (we know the exact number up front), this equates to 250 UNNEST statements each, using the best approach I can think of, sub-selects:
(SELECT val FROM UNNEST(sourceData) WHERE name = 'a') AS a,
250 times
And this pattern is repeated in each of the header and line tables' respective views.
So the SQL of the view that just retrieves a deduped, flattened/pivoted array for the sales_order_header table is as follows. The sales_order_line view follows the same pattern:
#standardSQL
WITH latest_snapshot_dups AS (
  SELECT
    salesOrderId,
    PARSE_TIMESTAMP("%Y-%m-%dT%H:%M:%E*S%Ez", lastUpdated) AS lastUpdatedTimestampUTC,
    sourceData,
    _PARTITIONTIME AS bqPartitionTime
  FROM
    `project.ds.sales_order_header_refdata`
),
latest_snapshot_nodups AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY salesOrderId ORDER BY lastUpdatedTimestampUTC DESC) AS rowNum
  FROM latest_snapshot_dups
)
SELECT
  salesOrderId,
  lastUpdatedTimestampUTC,
  (SELECT val FROM UNNEST(sourceData) WHERE name = 'a') AS a,
  (SELECT val FROM UNNEST(sourceData) WHERE name = 'b') AS b,
  ....250 of these
FROM
  latest_snapshot_nodups
WHERE
  rowNum = 1
Although I'm only showing one here, we have two similar views (with a total of 250 + 300 = 550 unique subqueries that unnest/pivot), and when I try to join the header view to the line view I run into an issue straight away, exceeding a limit on the number of subqueries.
Is there a better way to do this, assuming this is the data there is to work with? A better way to 'pivot', perhaps? Or a more efficient way of building a single view that optimises the order of things, rather than using 2 discrete views?
Thanks for your help BQ Community!
I run into an issue straight away exceeding a limit of subqueries
You are currently using the pattern below (I removed the less significant parts of the code for simplicity):
#standardSQL
SELECT
salesOrderId,
(SELECT val FROM UNNEST(sourceData) WHERE name = 'a') AS a,
(SELECT val FROM UNNEST(sourceData) WHERE name = 'b') AS b,
....250 of these
FROM latest_snapshot_nodups
Try the pattern below:
#standardSQL
SELECT
salesOrderId,
MAX(IF(name = 'a', val, NULL)) AS a,
MAX(IF(name = 'b', val, NULL)) AS b,
....250 of these
FROM latest_snapshot_nodups, UNNEST(sourceData) kv
GROUP BY salesOrderId
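Note that any other column you still need (for example lastUpdatedTimestampUTC) must travel through an aggregate once you GROUP BY. A sketch, assuming one row per salesOrderId after your dedupe step:

#standardSQL
SELECT
  salesOrderId,
  MAX(lastUpdatedTimestampUTC) AS lastUpdatedTimestampUTC,  -- single value per order, so MAX is safe
  MAX(IF(name = 'a', val, NULL)) AS a
FROM latest_snapshot_nodups, UNNEST(sourceData) kv
GROUP BY salesOrderId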
I have a database table with document names stored as a VARCHAR and I need a way to figure out what the lowest available sequence number is. There are many gaps.
name        partial  seq
A-B-C-0001  A-B-C-   0001
A-B-C-0017  A-B-C-   0017
In the above example, it would be 0002.
The distinct name values total 227,705. The number of "partial" combinations is quite large: A=150, B=218, C=52, so 1,700,400 potential combinations.
I found a way to iterate from min to max per distinct value and list all the "missing" (aka available) values, but this seems inefficient given we use nowhere near the maximum potential partial combinations (10,536 out of 1,700,400).
I'd rather have a table based on existing data with each partial value and its next available sequence value, where a non-existent partial means 0001.
Thanks
Hmmmm, you can try this:
select coalesce(min(to_number(seq)), 0) + 1
from t
where partial = 'A-B-C-' and
      not exists (select 1
                  from t t2
                  where t2.partial = t.partial and
                        to_number(t2.seq) = to_number(t.seq) + 1
                 );
EDIT:
For all partials, you need a GROUP BY:
select partial, coalesce(min(to_number(seq)), 0) + 1
from t
where not exists (select 1
                  from t t2
                  where t2.partial = t.partial and
                        to_number(t2.seq) = to_number(t.seq) + 1
                 )
group by partial;
You can use to_char() to convert the result back to a character, if necessary.
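For instance, if you need the 4-digit text form back, a sketch (assuming an Oracle-style to_char, which the to_number above implies):

select partial,
       to_char(coalesce(min(to_number(seq)), 0) + 1, 'FM0000') as next_seq  -- 2 -> '0002'
from t
where not exists (select 1
                  from t t2
                  where t2.partial = t.partial and
                        to_number(t2.seq) = to_number(t.seq) + 1)
group by partial;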
I have a DB with two tables:
tblVideos: about 8 million rows; contains Id (identity 1,1), videoId, Name, Tags, (FK) VideoProviderId
tblVideoProviders: about 6 providers at the moment, with 3 columns: Id (identity 1,1, tinyint), Name, Url (used to build the link from the provider URL + video Id)
Unlike YouTube, the smaller providers don't have an API that returns an array I can pick something random from.
Retrieving a totally random row takes under a second with both approaches I have now:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
       tblVideos.Name, tblVideos.tags
from tblVideos
inner join tblVideoProvider
        on tblVideos.VideoProviderId = tblVideoProvider.id
where (abs(cast(binary_checksum(tblVideos.id, newid()) as int)) % 6800000) < 10
or this, which is slightly slower:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
       tblVideos.Name, tblVideos.tags
from tblVideos
inner join tblVideoProvider
        on tblVideos.VideoProviderId = tblVideoProvider.id
order by newid()
but once I start looking for something more specific:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
       tblVideos.Name, tblVideos.tags
from tblVideos
inner join tblVideoProvider
        on tblVideos.VideoProviderId = tblVideoProvider.id
where (tblVideos.tags like '%' + @tag + '%')
   or (tblVideos.Name like '%' + @tag + '%')
order by newid()
The query hits 8 seconds; removing the second LIKE (the OR on tblVideos.Name) takes it down to 4~5 seconds, but that's still way too high.
Running the query without the ORDER BY NEWID() takes a lot less time, but then the application has to consume about 0.2~2 MB of data per user, and with over 200~400 simultaneous requests that ends up being a lot of data.
In general the "like" operator is very expensive, and when the pattern starts with a "%" even an index on the respective column (assuming you have one) cannot be used. I think there is no easy way to increase the performance of your query.
I have a table named A. It has only one record with one field, an integer named number.
I want to create a view that has A.number records, each being one of the numbers less than A.number.
For example:
select A.number -----> 5
the view should show 5 records: 0 1 2 3 4
P.S.: This is a real problem that I have simplified a lot. The real problem is like dividing a budget over a fixed period, day by day.
This sounds a bit like it might be homework, so I'm wary of providing the code outright.
I can give a pointer for how to solve the question, though. You use a recursive CTE where each iteration adds one to the previous iteration's value. Just be sure to set the MAXRECURSION option if you'll be handling numbers > 101. You can use a scalar subquery to key the view to the original table:
WITH numbers ( n ) AS (
    SELECT 0
    UNION ALL
    SELECT 1 + n FROM numbers WHERE n < (SELECT number FROM a) - 1
)
SELECT n FROM numbers
OPTION ( MAXRECURSION 500 ) --example
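One caveat, since you want a view: OPTION clauses are not allowed inside a view definition, so the MAXRECURSION hint has to be supplied by the query that reads from the view (the view name below is just an example):

SELECT n
FROM dbo.MyNumbersView
OPTION ( MAXRECURSION 500 );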
If the number in your table will be < 2048 and you are on SQL Server, this will work for you:
CREATE VIEW MyView AS
SELECT number
FROM master..spt_values
WHERE type = 'p'
AND number < (SELECT value FROM yourTable)
Alternatively, you could consider creating your own Numbers table with a size appropriate to your application if you require a higher limit, or are not on SQL Server, which provides this one for you. Here is a link to a blog post on the idea of having a "Numbers table" handy.
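For completeness, one common way to materialize such a Numbers table (a sketch; the size and names are placeholders):

SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS number
INTO dbo.Numbers
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;   -- cross join just to get enough rows

CREATE UNIQUE CLUSTERED INDEX IX_Numbers ON dbo.Numbers (number);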