Speed up MATLAB filter command - optimization

I am relatively new to using MATLAB filters. I am trying to filter a fairly large data set (about 2 million data points) using the following commands:
rrc = rcosdesign(0.25, 10, floor(Fs/symRate), 'sqrt');
filtered = filter(rrc, 1, samples);
filtered = filtered / sqrt(floor(Fs/symRate));
When I run the MATLAB Profiler, it says the line
filtered = filter(rrc, 1, samples);
takes over 500 seconds to run. Any ideas on how to speed this up? I have tried the FilterM function I found online ( http://www.mathworks.com/matlabcentral/fileexchange/32261-filterm ), but it takes the same amount of time. Does anyone else have any ideas?
Thanks in advance

A few ideas:
If you have an FIR filter (as the code suggests), you may gain performance by using conv2, which uses Intel IPP and might speed things up. Use the 'valid' shape flag to get the filter results.
If both the filter and the data are long, try xcorr, as it uses the FFT to speed up correlations. Since you're after filtering, remember to flip your filter coefficients.
Compile filterX using Visual Studio 2013, or even better Intel C Compiler 2013, with optimization flags (/O3). When using it, call filterX directly (skip the FilterM wrapper).
Use the FFT manually to perform the convolution.
Create a MEX version of an Intel MKL / Intel IPP filter function.
Any of these should help considerably; a sketch of the FFT-based approach follows.
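For example, here is a minimal sketch of the FFT route using fftfilt from the Signal Processing Toolbox, which performs FIR filtering via overlap-add FFTs and matches filter(rrc, 1, samples) up to round-off (rrc, samples, Fs and symRate are assumed to be defined as in the question):
sps = floor(Fs/symRate);                  % samples per symbol
rrc = rcosdesign(0.25, 10, sps, 'sqrt');  % same pulse as in the question
filtered = fftfilt(rrc, samples(:));      % overlap-add FFT filtering instead of filter(); samples(:) forces a column vector
filtered = filtered / sqrt(sps);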

CPLEX won't display decision variables after successful solution

I set up a code for a course scheduling optimization problem using IBM CPLEX.
The decision variable is dvar boolean x[course][roomtype][timeslot];, where x is 1 if the course takes place in a room type r during timeslot t.
The model has worked perfectly fine and is feasible for all instances and scenarios I tried it on. Now, for a new scenario, I increased the number of timeslots from 46 to 240, which increased the overall number of decision variables to over 2 million instead of around 300,000.
Now, I can still run the model and after slightly longer run time I get an optimal solution. Yet, the process I had for analysis before was displaying the decision variables, sorting for the ones with the value of 1 and copying and pasting them into Excel for further analysis.
This is no longer possible, as CPLEX stops responding for a very long time and then does not let me do anything from that point onwards (only a limited extent of the decision variables is displayed). I have to close the program and start again.
I assumed the problem was the RAM or overall memory, so I opted for my university's cloud services. But even having 128 GB of RAM, 12 cores and 500 GB of storage at hand was not sufficient, and the performance is exactly the same as on my own private laptop.
Any suggestions on what could be the problem or how to export the solution anyway?
Are there variable limits with CPLEX that would make this impossible to solve?
Thanks a lot in advance!
Indeed, displaying huge matrices can freeze the IDE.
You wrote:
Now, I can still run the model and after slightly longer run time I
get an optimal solution. Yet, the process I had for analysis before
was displaying the decision variables, sorting for the ones with the
value of 1 and copying and pasting them into Excel for further
analysis.
You should do that with SheetWrite.
First you build a set of the ones with value 1 and then you export with SheetWrite.
See the example below (from "Excel, Rocket science and optimization"):
.mod
range A=1..2;
range B=1..3;
range C=1..4;
dvar int X[A][B][C];
subject to
{
  forall(a in A, b in B, c in C) X[a][b][c] == a*b*c;
}
tuple someTuple {
  int a;
  int b;
  int c;
  int value;
};
{someTuple} someSet = {<i,j,k,X[i][j][k]> | i in A, j in B, k in C : X[i][j][k] == 1};
.dat
SheetConnection sheet("write3Darray.xlsx");
someSet to SheetWrite(sheet,"A1:D24");

Yosys ASIC synth flow QoR/PPA metrics

I'm relatively new to Yosys. I've been tinkering with it with some proprietary standard cell libraries and am trying to extract some QoR/PPA metrics, similar to those you can get from DC.
Minimum slack (including worst-case negative slack/WNS)
Max logic depth [0]
Cell area [1]
For [0], I know there's the ltp command, but it only reports topological paths per module. I tried flattening the design using flatten, but there still seems to be a hierarchy in the netlist. Where should I insert the flatten command to actually flatten the netlist?
For [1], I know you can get the number of cells in the netlist using the stat command, but this doesn't tell me the equivalent of DC's CellArea metric (since each cell has a different area). I could just build a library of cell areas for each cell type based on the cell library datasheet, but that's rather laborious.
Also, is it possible to specify a target clock rate for synthesis? I think abc has a -D flag for delay, but that sounds to me more like an input delay than a clock period.
Thanks!
-D passed to abc is indeed the clock period, not an input delay. When specified, this should also cause abc to print slack information.
Have you tried stat -liberty file.lib to use a liberty file for cell areas? If this isn't calculating areas as expected (I didn't quite understand your issue), then please create a feature request on GitHub describing the difference.
flatten should be run after hierarchy -top top_module_name, which does the hierarchical elaboration and sets the top module.
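Putting those pieces together, a minimal sketch of a script might look like this (design.v, the top module name top, the library mylib.lib and the 5000 ps target are placeholders):
read_verilog design.v
hierarchy -top top               # elaborate the hierarchy and set the top module
proc                             # convert processes so later passes can run
flatten                          # flatten the netlist, after hierarchy -top as noted above
synth -top top                   # generic synthesis
dfflibmap -liberty mylib.lib     # map registers to the standard-cell library
abc -liberty mylib.lib -D 5000   # technology mapping; -D is the target clock period in ps, so abc reports slack
stat -liberty mylib.lib          # cell counts with areas taken from the liberty file
ltp                              # longest topological path, now over the flattened netlist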

bartMachine (Bayesian Additive Regression Tree) in >1 million cases in R

I want to use BART via the bartMachine package for a data frame of just over 1 million cases. With a lot of tuning of the Java memory settings, I can get R on my MacBook to complete the BART model for about 5,000 cases. Anything above that is aborted as the system runs out of memory.
Is there any chance I can use bartMachine() with an input matrix of about 1 million rows (ca. 15 predictors)?
Otherwise, are there any alternative packages that would allow me to at least run predictor selection using BART?
Thanks for your help!
Have you tried increasing the Java RAM option?
options(java.parameters="-Xmx12g") # must be set before the package is loaded
I'm the maintainer of this package. Have you tried "mem_cache_for_speed = FALSE" as an option?
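Putting both suggestions together, a minimal sketch (the 12 GB heap, the core count and the X/y objects are placeholders):
options(java.parameters = "-Xmx12g")  # must be set before bartMachine/rJava is loaded
library(bartMachine)
set_bart_machine_num_cores(4)
# X: data.frame of predictors, y: numeric response.
# mem_cache_for_speed = FALSE trades some speed for a much smaller memory footprint.
bm <- bartMachine(X = X, y = y, mem_cache_for_speed = FALSE)
# If predictor selection is the main goal:
vars <- var_selection_by_permute(bm)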

Advice for bit level manipulation

I'm currently working on a project that involves a lot of bit-level manipulation of data, such as comparison, masking and shifting. Essentially, I need to search chunks of bitstreams between 8 kB and 32 kB long for bit patterns between 20 and 40 bytes long.
Does anyone know of general resources for optimizing for such operations in CUDA?
There have been at least a couple of questions on SO about how to do text searches with CUDA, that is, finding instances of short byte-strings in long byte-strings. That is similar to what you want to do: a byte-string search is much like a bit-string search in which the number of bits in the pattern can only be a multiple of 8 and the algorithm only checks for matches every 8 bits. Search SO for CUDA string searching or matching, and see if you can find them.
I don't know of any general resources for this, but I would try something like this:
Start by preparing 8 versions of each search bit-string, each shifted by a different number of bits. Also prepare start and end masks:
start masks:
01111111
00111111
...
00000001
end masks:
10000000
11000000
...
11111110
Then, essentially, perform byte-string searches with the different bit-strings and masks.
If you're using a device with compute capability >= 2.0, store the shifted bit-strings in global memory. The start and end masks can probably just be constants in your program.
Then, for each byte position, launch 8 threads, each of which checks a different one of the 8 shifted bit-strings against the long bit-string (which you now treat like a byte-string). In each block, launch enough threads to check, for instance, 32 byte positions, so that the total number of threads per block becomes 32 * 8 = 256. The L1 cache should be able to hold the shifted bit-strings for each block, so you should get good performance.
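As a hedged illustration of that layout (untested; the kernel name, buffer names and the hit bitmap are placeholders), one thread per (byte position, shift) pair could look roughly like this, launched with 256 threads per block = 32 byte positions x 8 shifts:
// patterns holds the 8 pre-shifted copies of the search pattern, pat_bytes bytes each;
// start_mask[s] and end_mask[s] are the masks listed above for shift s;
// hit_bitmap has one bit per byte position (ceil(n_bytes/32) unsigned ints, zero-initialised).
__global__ void bit_search_kernel(const unsigned char *data, int n_bytes,
                                  const unsigned char *patterns,
                                  const unsigned char *start_mask,
                                  const unsigned char *end_mask,
                                  int pat_bytes,
                                  unsigned int *hit_bitmap)
{
    int shift = threadIdx.x & 7;                                     // which shifted copy (0..7)
    int pos   = blockIdx.x * (blockDim.x >> 3) + (threadIdx.x >> 3); // byte position in the stream
    if (pos + pat_bytes > n_bytes) return;

    const unsigned char *pat = patterns + shift * pat_bytes;

    // First byte: compare only the bits kept by the start mask.
    bool match = ((data[pos] ^ pat[0]) & start_mask[shift]) == 0;

    // Middle bytes: must match exactly.
    for (int i = 1; match && i < pat_bytes - 1; ++i)
        match = (data[pos + i] == pat[i]);

    // Last byte: compare only the bits kept by the end mask.
    if (match)
        match = ((data[pos + pat_bytes - 1] ^ pat[pat_bytes - 1]) & end_mask[shift]) == 0;

    if (match)
        atomicOr(&hit_bitmap[pos >> 5], 1u << (pos & 31)); // record that some shift matched at this byte position
}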

Circumventing R's `Error in if (nbins > .Machine$integer.max)`

This is a saga which began with the problem of how to do survey weighting. Now that I appear to be doing that correctly, I have hit a bit of a wall (see previous post for details on the import process and where the strata variable came from):
> require(foreign)
> ipums <- read.dta('/path/to/data.dta')
> require(survey)
> ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=~perwt)
Error in if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
> traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) > 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata,
data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
This error seems to come from the tabulate function, which I hoped would be straightforward enough to circumvent, first by changing .Machine$integer.max
> .Machine$integer.max <- 2^40
and, when that didn't work, by replacing the whole source code of tabulate:
> tabulate <- function(bin, nbins = max(1L, bin, na.rm = TRUE))
  {
      if (!is.numeric(bin) && !is.factor(bin))
          stop("'bin' must be numeric or a factor")
      # if (nbins > .Machine$integer.max)
      if (nbins > 2^40)  # replacement line
          stop("attempt to make a table with >= 2^31 elements")
      .C("R_tabulate",
         as.integer(bin),
         as.integer(length(bin)),
         as.integer(nbins),
         ans = integer(nbins),
         NAOK = TRUE,
         PACKAGE = "base")$ans
  }
Neither circumvented the problem. Apparently this is one reason the ff package was created, but what worries me is the extent to which this is a problem I cannot avoid in R. This post seems to indicate that even if I were to use a package that avoids this limit, I would only be able to access 2^31 elements at a time. My hope was to use SQL (either SQLite or PostgreSQL) to get around the memory problems, but I'm afraid I'll spend a while getting that to work only to run into the same fundamental limit.
Switching back to Stata doesn't solve the problem either. Again, see the previous post for how I use svyset, but the calculation I would like to run causes Stata to hang:
svy: mean age, over(strata)
Whether throwing more memory at it will solve the problem I don't know. I run R on my desktop which has 16 gigs, and I use Stata through a Windows server, currently setting memory allocation to 2000MB, but I could theoretically experiment with increasing that.
So in sum:
Is this a hard limit in R?
Would sql solve my R problems?
If I split it up into many separate files would that fix it (a lot of work...)?
Would throwing a lot of memory at Stata do it?
Am I seriously barking up the wrong tree somehow?
Yes, R uses 32-bit indexes for vectors so they can contain no more than 2^31-1 entries and you are trying to create something with 2^40. There is talk of introducing 64-bit indexes but that will be some way off before appearing in R. Vectors have the stated hard limit and that is it as far as base R is concerned.
I am too unfamiliar with the details of what you are doing to offer any further advice on the other parts of your question.
Why do you want to work with the full data set? Wouldn't a smaller sample that fits into the restrictions R places on you be just as useful? You could use SQL to store all the data and query it from R to return a random subset of a more appropriate size.
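For instance, a minimal sketch with DBI and RSQLite (the file name ipums.sqlite, the table name ipums and the sample size are placeholders; any SQL backend would do):
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "ipums.sqlite")
# Pull a random subset of manageable size instead of the full table.
ipums_sample <- dbGetQuery(con, "SELECT * FROM ipums ORDER BY RANDOM() LIMIT 100000")
dbDisconnect(con)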
Since this question was asked some time ago, I'd like to point out that my answer here uses version 3.3 of the survey package.
If you check the code of svydesign, you can see that the function causing all the trouble sits inside a check step that decides whether you should set the nest parameter to TRUE or not. This step can be disabled by setting the option check.strata=FALSE.
Of course, you shouldn't disable a check step unless you know what you are doing. In this case, you should be able to decide yourself whether you need to set the nest option to TRUE or FALSE. nest should be set to TRUE when the same PSU (cluster) id is recycled in different strata.
Concretely for the IPUMS dataset, since you are using the serial variable for cluster identification and serial is unique for each household in a given sample, you may want to set nest to FALSE.
So, your survey design line would be:
ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt, check.strata=FALSE, nest=FALSE)
Extra advice: even after circumventing this problem you will find that the code is pretty slow unless you remap strata to a range from 1 to length(unique(ipums$strata)):
ipums$strata <- match(ipums$strata,unique(ipums$strata))
Both @Gavin and @Martin deserve credit for this answer, or at least for leading me in the right direction. I'm mostly answering it separately to make it easier to read.
In the order I asked:
Yes, 2^31 is a hard limit in R, though it seems to matter what type the object is (which is a bit strange, given that the stated problem is the length of the vector rather than the amount of memory, of which I have plenty). Do not convert the strata or id variables to factors: that will just fix their length and nullify the effects of subsetting (which is the way to get around this problem).
SQL could probably help, provided I learn how to use it correctly. I did the following test:
library(multicore) # make svy fast!
ri.ny <- subset(ipums, statefips_num %in% c(36, 44))
ri.ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri.ny)
svyby(~incwage, ~strata, ri.ny.design, svymean, data=ri.ny, na.rm=TRUE, multicore=TRUE)
ri <- subset(ri.ny, statefips_num==44)
ri.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri)
ri.mean <- svymean(~incwage, ri.design, data=ri, na.rm=TRUE)
ny <- subset(ri.ny, statefips_num==36)
ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ny)
ny.mean <- svymean(~incwage, ny.design, data=ny, na.rm=TRUE, multicore=TRUE)
And found the means to be the same, which seems like a reasonable test.
So: in theory, provided I can split up the calculation using either plyr or SQL, the results should still be fine.
See 2.
Throwing a lot of memory at Stata definitely helps, but now I'm running into annoying formatting issues. I seem to be able to perform most of the calculation I want (much quicker and with more stability as well) but I can't figure out how to get it into the form I want. Will probably ask a separate question on this. I think the short version here is that for big survey data, Stata is much better out of the box.
In many ways, yes. Trying to analyse data this big is not something I should have taken on lightly, and I'm far from figuring it out even now. I was using the svydesign function correctly, but I didn't really know what was going on. I have a (very slightly) better grasp now, and it's heartening to know I was generally correct about how to solve the problem. @Gavin's general suggestion of trying out a small data set with external results to compare against is invaluable, something I should have started ages ago. Many thanks to both @Gavin and @Martin.