EnglishAnalyzer with better stop word filtering? - lucene

I'm creating TFIDF vectors using Apache Mahout. I specify EnglishAnalyzer as part of document tokenizing like so:
DocumentProcessor.tokenizeDocuments(documentsSequencePath, EnglishAnalyzer.class, tokenizedDocumentsPath, configuration);
which gives me the vector listed below for a document I've called business.txt. I was surprised to see useless terms in there such as have, on, i, and e.g. One of my other documents has many more.
What is the simplest way for me to improve the quality of the terms it's finding? I know EnglishAnalyzer can be passed a stop word list but the constructor gets invoked via reflection so it seems like I can't do that.
Should I write my own Analyzer? I'm a bit confused about how to compose tokenizers, filters etc. Can I reuse EnglishAnalyzer along with my own filters? Subclassing EnglishAnalyzer doesn't seem to be possible this way.
# document: tfidf-score term
business.txt: 109 comput
business.txt: 110 us
business.txt: 111 innov
business.txt: 111 profit
business.txt: 112 market
business.txt: 114 technolog
business.txt: 117 revolut
business.txt: 119 on
business.txt: 119 platform
business.txt: 119 strategi
business.txt: 120 logo
business.txt: 121 i
business.txt: 121 pirat
business.txt: 123 econom
business.txt: 127 creation
business.txt: 127 have
business.txt: 128 peopl
business.txt: 128 compani
business.txt: 134 idea
business.txt: 139 luxuri
business.txt: 139 synergi
business.txt: 140 disrupt
business.txt: 140 your
business.txt: 141 piraci
business.txt: 145 product
business.txt: 147 busi
business.txt: 168 funnel
business.txt: 176 you
business.txt: 186 custom
business.txt: 197 e.g
business.txt: 301 brand

You can pass a custom stop word set to the EnglishAnalyzer ctor. It is typical for this stop word list to be loaded from a file, which is plain text with one stop word per line. That would look something like this:
String stopFileLocation = "\\path\\to\\my\\stopwords.txt";
CharArraySet stopwords = StopwordAnalyzerBase.loadStopwordSet(
        Paths.get(stopFileLocation));
EnglishAnalyzer analyzer = new EnglishAnalyzer(stopwords);
I don't immediately see how you are supposed to pass constructor arguments through the Mahout method you've shown, and I don't really know Mahout. If you can't, then yes, you could create a custom analyzer by copying EnglishAnalyzer and loading your own stop words there. Here's an example that loads a custom stop word list from a file and leaves out the stem exclusion handling for brevity's sake.
public final class EnglishAnalyzerCustomStops extends StopwordAnalyzerBase {
    private static String StopFileLocation = "\\path\\to\\my\\stopwords.txt";

    public EnglishAnalyzerCustomStops() throws IOException {
        super(StopwordAnalyzerBase.loadStopwordSet(Paths.get(StopFileLocation)));
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        final Tokenizer source = new StandardTokenizer();
        TokenStream result = new StandardFilter(source);
        result = new EnglishPossessiveFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopwords);
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }

    @Override
    protected TokenStream normalize(String fieldName, TokenStream in) {
        TokenStream result = new StandardFilter(in);
        result = new LowerCaseFilter(result);
        return result;
    }
}
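To make the effect of the stop word file concrete, here is a rough, library-free Python sketch of what the lowercase and stop filter steps do to a token list. It is only an illustration, not Lucene or Mahout code: the function names load_stopwords and remove_stopwords and the sample word list are made up. Note that in the analyzer above the StopFilter runs before the PorterStemFilter, so the file should list unstemmed, lowercase words.

```python
import io


def load_stopwords(lines):
    """Read one stop word per line, skipping blank lines."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word:
            words.add(word)
    return words


def remove_stopwords(tokens, stopwords):
    """Drop every token whose lowercase form is in the stop word set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Stand-in for stopwords.txt (hypothetical contents, one word per line).
stop_file = io.StringIO("have\non\ni\nyou\nyour\ne.g\n")
stopwords = load_stopwords(stop_file)

tokens = ["brand", "you", "funnel", "have", "synergi", "on"]
print(remove_stopwords(tokens, stopwords))
```

With a list like this, terms such as have, on and you would no longer reach the TF-IDF stage.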

How to convert PDF with images which I don't care about to text?

I'm trying to convert PDFs to text files. The problem is that these PDFs contain images, which I don't care about. This is the type of file I want to extract: https://www.sia.aviation-civile.gouv.fr/pub/media/store/documents/file/l/f/lf_sup_2020_213_fr.pdf. Note that if I copy and paste with my mouse, it works quite well (except for the line breaks), so I'd guess that it's possible. Most of the answers I found online work well on simple text-only PDFs, but give especially bad results on the map.
For instance, something like this
from tika import parser # pip install tika
raw = parser.from_file('test2.pdf')
print(raw['content'])
works well for retrieving the text, but I get a lot of junk like this:
ERY
CTR
3
CH
A
which appears because of the map.
Something like this, which works by converting the PDF to images and then reading the images, faces the same problem (I found it in a very similar thread on Stack Overflow, but there is no answer):
import sys

import pytesseract as pt
from pdf2image import convert_from_path  # pip install pdf2image
from PIL import Image

def convert(name):
    # Render each PDF page to an image, then OCR it with the French language data.
    pages = convert_from_path(name, dpi=200)
    for idx, page in enumerate(pages):
        page.save('page' + str(idx) + '.jpg', 'JPEG')
        quote = Image.open('page' + str(idx) + '.jpg')
        text = pt.image_to_string(quote, lang="fra")
        with open('page' + str(idx) + '.text', "w") as file_ex:
            file_ex.write(text)

if __name__ == '__main__':
    convert(sys.argv[1])
Finally, I tried removing the images first and then using one of the solutions above, but it didn't work any better:
from tika import parser # pip install tika
from PyPDF2 import PdfFileWriter, PdfFileReader
# Remove the images
inputStream = open("lf_sup_2020_213_fr.pdf", "rb")
outputStream = open("test3.pdf", "wb")
src = PdfFileReader(inputStream)
output = PdfFileWriter()
[output.addPage(src.getPage(i)) for i in range(src.getNumPages())]
output.removeImages()
output.write(outputStream)
outputStream.close()
# Read from pdf without images
raw = parser.from_file('test2.pdf')
print(raw['content'])
Do you know how to solve this ? It can be in any language.
Thanks
One approach you could try is to use a toolkit capable of parsing the text characters in the PDF, then use the object properties to remove the unwanted map labels while keeping the text you need.
For example, the ParsePages method from LEADTOOLS PDF toolkit (which is what I am familiar with since I work for the vendor of this toolkit) can be used to obtain the text from the PDF:
using (PDFDocument document = new PDFDocument(pdfFileName))
{
    PDFParsePagesOptions options = PDFParsePagesOptions.All;
    document.ParsePages(options, 1, -1);
    using (StreamWriter writer = File.CreateText(txtFileName))
    {
        IList<PDFObject> objects = document.Pages[0].Objects;
        writer.WriteLine("Objects: {0}", objects.Count);
        foreach (PDFObject obj in objects)
        {
            if (obj.TextProperties.IsEndOfLine)
                writer.WriteLine(obj.Code);
            else
                writer.Write(obj.Code);
        }
        writer.WriteLine("---------------------");
    }
}
This will obtain all the text on the first page of the PDF, including the unwanted results you mentioned. Here is an excerpt:
Objects: 3918
5
91L
F5
4
1 LF
N
OY
L2
1AM
TService
8
26
1de l’Information
0
B09SUP AIP 213/20
7
Aéronautique
Date de publication : 05 NOV
e-mail : sia.qualite@aviation-civile.gouv.fr
Internet : www.sia.aviation-civile.gouv.fr
141
17˚
82
N20
9Objet : Création de 4 zones réglementées temporaires (ZRT) pour l’exercice VOLOPS en région de Chambéry
En vigueur : Du mercredi 25 Novembre 2020 au vendredi 04 décembre 2020
More code can be used to examine the properties for each parsed character:
writer.WriteLine(" ObjectType: {0}", obj.ObjectType.ToString());
writer.WriteLine(" Bounds: {0}, {1}, {2}, {3}", obj.Bounds.Left, obj.Bounds.Top, obj.Bounds.Right, obj.Bounds.Bottom);
writer.WriteLine(" TextProperties.FontHeight: {0}", obj.TextProperties.FontHeight.ToString());
writer.WriteLine(" TextProperties.FontIndex: {0}", obj.TextProperties.FontIndex.ToString());
writer.WriteLine(" Code: {0}", obj.Code);
writer.WriteLine("------");
This will give the properties for each character:
Objects: 3918
ObjectType: Text
Bounds: -60.952693939209, 1017.25231933594, -51.8431816101074, 1023.71826171875
TextProperties.FontHeight: 7.10454273223877
TextProperties.FontIndex: 48
Code: 5
------
The unwanted text might be filtered out using these properties. For example, I noticed that the FontHeight for a good portion of the unwanted text is around 7 PDF units, so the first code sample might be altered to skip any text whose height is 7.25 PDF units or less:
foreach (PDFObject obj in objects)
{
    if (obj.TextProperties.FontHeight > 7.25)
    {
        if (obj.TextProperties.IsEndOfLine)
            writer.WriteLine(obj.Code);
        else
            writer.Write(obj.Code);
    }
}
The extracted output would give a better result, an excerpt follows:
Objects: 3918
Service
de l’Information
SUP AIP 213/20
Aéronautique
Date de publication : 05 NOV
e-mail : sia.qualite@aviation-civile.gouv.fr
Internet : www.sia.aviation-civile.gouv.fr
Objet : Création de 4 zones réglementées temporaires (ZRT) pour l’exercice VOLOPS en région de Chambéry
En vigueur : Du mercredi 25 Novembre 2020 au vendredi 04 décembre 2020
Lieu : FIR : Marseille LFMM - AD : Chambéry Aix-Les-Bains LFLB, Chambéry Challes les Eaux LFLE
ZRT LE SIRE, MOTTE CASTRALE, ALLEVARD
*
C
D
E
In the end, with this approach you will have to come up with good criteria for filtering out the unwanted text without removing the text you need to keep.
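Since the question allows any language, the same filtering idea can be sketched in Python without depending on a particular PDF toolkit. This is only a rough illustration: the Glyph class and its fields are hypothetical stand-ins for whatever per-character objects your parser returns, and the 7.25 threshold is the one observed above for this particular document.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical per-character record, mirroring the properties used above."""
    code: str
    font_height: float
    end_of_line: bool = False


def extract_text(glyphs, min_height=7.25):
    """Keep only characters taller than min_height, preserving line ends."""
    parts = []
    for g in glyphs:
        if g.font_height <= min_height:
            continue  # likely a small map label, skip it
        parts.append(g.code)
        if g.end_of_line:
            parts.append("\n")
    return "".join(parts)


# Made-up sample: small map characters mixed with larger body text.
sample = [
    Glyph("5", 7.1),
    Glyph("S", 10.0),
    Glyph("U", 10.0),
    Glyph("P", 10.0, end_of_line=True),
    Glyph("N", 7.1),
]
print(repr(extract_text(sample)))
```

Other properties, such as the bounds or the font index, could be tested in the same place if font height alone removes too much or too little.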

RavenDB C# client throws error as Invalid node tag character: n

I am trying to remove a document in RavenDB which has 3 nodes running in a single machine (development setup).
Below is the code for removing the document.
public bool Remove<T>(string id) where T : new()
{
    bool bResult = false;
    using (var session = _session.OpenSession())
    {
        session.Delete(id);
        session.SaveChanges();
        bResult = true;
    }
    return bResult;
}
but it throws an error on the line session.SaveChanges():
Invalid node tag character: n ...
Stack trace:
System.ArgumentException: Invalid node tag character: n
at Raven.Server.Documents.Replication.ChangeVectorParser.ThrowInvalidNodeTag(Char ch) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\Replication\ChangeVectorParser.cs:line 71
at Raven.Server.Documents.Replication.ChangeVectorParser.ParseNodeTag(String changeVector, Int32 start, Int32 end) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\Replication\ChangeVectorParser.cs:line 52
at Raven.Server.Documents.Replication.ChangeVectorParser.MergeChangeVector(String changeVector, List`1 entries) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\Replication\ChangeVectorParser.cs:line 186
at Raven.Server.Utils.ChangeVectorUtils.MergeVectors(String vectorAstring, String vectorBstring) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Utils\ChangeVectorUtils.cs:line 213
at Raven.Server.Documents.DocumentsStorage.CreateTombstone(DocumentsOperationContext context, Slice lowerId, Int64 documentEtag, CollectionName collectionName, String docChangeVector, Int64 lastModifiedTicks, String changeVector, DocumentFlags flags, NonPersistentDocumentFlags nonPersistentFlags) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\DocumentsStorage.cs:line 1378
at Raven.Server.Documents.DocumentsStorage.Delete(DocumentsOperationContext context, Slice lowerId, String id, LazyStringValue expectedChangeVector, Nullable`1 lastModifiedTicks, String changeVector, CollectionName collectionName, NonPersistentDocumentFlags nonPersistentFlags, DocumentFlags documentFlags) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\DocumentsStorage.cs:line 1195
at Raven.Server.Documents.DocumentsStorage.Delete(DocumentsOperationContext context, String id, String expectedChangeVector) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\DocumentsStorage.cs:line 1091
at Raven.Server.Documents.Handlers.BatchHandler.MergedBatchCommand.ExecuteCmd(DocumentsOperationContext context) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\Handlers\BatchHandler.cs:line 706
at Raven.Server.Documents.TransactionOperationsMerger.ExecutePendingOperationsInTransaction(List`1 pendingOps, DocumentsOperationContext context, Task previousOperation, DurationMeasurement& meter) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\TransactionOperationsMerger.cs:line 825
at Raven.Server.Documents.TransactionOperationsMerger.MergeTransactionsOnce() in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\TransactionOperationsMerger.cs:line 500
--- End of stack trace from previous location where exception was thrown ---
at Raven.Server.Documents.TransactionOperationsMerger.Enqueue(MergedTransactionCommand cmd) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\TransactionOperationsMerger.cs:line 124
at Raven.Server.Documents.Handlers.BatchHandler.BulkDocs() in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Documents\Handlers\BatchHandler.cs:line 96
at Raven.Server.Routing.RequestRouter.HandlePath(RequestHandlerContext reqCtx) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\Routing\RequestRouter.cs:line 124
at Raven.Server.RavenServerStartup.RequestHandler(HttpContext context) in C:\Builds\RavenDB-Stable-4.1\src\Raven.Server\RavenServerStartup.cs:line 172
I ran into the same error yesterday. I restored the database on a different machine where I had installed a brand new RavenDB and (being lazy) named the new instance's node "A". It appears that RavenDB currently cannot remove documents when the node tag in the change vector and the instance's node tag don't match.
It looks like an honest mistake in the code rather than intentional behavior, but that's only my guess, because I didn't find anything about this behavior in the 4.1 documentation.
Solution (if you confirm there is a mismatch):
You could try to add a new node to your cluster with a name matching the change vector of the locked documents.
In my case I am not able to configure standalone RavenDB to have a node tag of more than 4 characters (that was no problem on Docker)... It might be harder to get the database to a consistent state then.
Alternatively, try exporting and importing the data. It fixes the problem because the change vectors are updated to match the new node tag.
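To confirm the mismatch, it may help to compare the node tags embedded in a document's change vector with the tags of the nodes in your cluster. Below is a rough Python sketch of that check. It assumes a change vector is a comma-separated list of entries shaped like TAG:etag-databaseId, which is my understanding of the 4.x format; the sample values and the helper names are made up.

```python
def node_tags(change_vector):
    """Return the node tag of each entry in a change vector string."""
    tags = []
    for entry in change_vector.split(","):
        entry = entry.strip()
        if entry:
            # Everything before the first ':' is treated as the node tag.
            tags.append(entry.split(":", 1)[0])
    return tags


def mismatched_tags(change_vector, cluster_tags):
    """List tags found in the change vector that no cluster node uses."""
    return [t for t in node_tags(change_vector) if t not in cluster_tags]


# Made-up change vector: one entry written by a node tagged "node1".
vector = "node1:12-abcdef, A:3-ghijkl"
print(mismatched_tags(vector, {"A"}))
```

A tag starting with a lowercase letter, as in this sample, would be consistent with the "Invalid node tag character: n" message, but that is a guess on my part.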

Sensenet: Export Contents

I'm trying to export content from sensenet using (http://wiki.sensenet.com/Export#Configuration)
"export" command call:
Export.exe -SOURCE /Root/Sites/Test -TARGET C:\ExportSensenet -ASM ..\bin
I also tried without the "ASM" parameter.
Export did not complete successfully.
Export ends with error:
System.TypeInitializationException: The type initializer for 'SenseNet.ContentRepository.Storage.SR' threw an exception. ---> System.Reflection.ReflectionTypeLoadException: Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.
at System.Reflection.RuntimeModule.GetTypes(RuntimeModule module)
at System.Reflection.Assembly.GetTypes()
at SenseNet.ContentRepository.Storage.TypeHandler.GetTypesByInterface(Type interfaceType) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\TypeHandler.cs:line 209
at SenseNet.ContentRepository.Storage.SR..cctor() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\SR.cs:line 22
--- End of inner exception stack trace ---
at SenseNet.ContentRepository.Storage.SR.get_ResourceManager()
at SenseNet.ContentRepository.Storage.Caching.Dependency.CacheDependencyFactory.CreateNodeDataDependency(NodeData nodeData) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\Caching\CacheDependencyFactory.cs:line 75
at SenseNet.ContentRepository.Storage.DataBackingStore.CacheNodeData(NodeData nodeData, String cacheKey) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\DataBackingStore.cs:line 325
at SenseNet.ContentRepository.Storage.DataBackingStore.GetNodeData(NodeHead head, Int32 versionId) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\DataBackingStore.cs:line 212
at SenseNet.ContentRepository.Storage.Node.LoadNode(NodeHead head, VersionNumber version) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\Node.cs:line 1644
at SenseNet.ContentRepository.User.get_Administrator() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\User.cs:line 38
at SenseNet.ContentRepository.Security.DesktopAccessProvider.get_CurrentUser() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\Security\DesktopAccessProvider.cs:line 36
at SenseNet.ContentRepository.Storage.Security.AccessProvider.ChangeToSystemAccount() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\Security\AccessProvider.cs:line 72
at SenseNet.ContentRepository.Security.DesktopAccessProvider.GetCurrentUser() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\Security\DesktopAccessProvider.cs:line 52
at SenseNet.ContentRepository.Storage.Security.AccessProvider.ChangeToSystemAccount() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Storage\Security\AccessProvider.cs:line 72
at SenseNet.ContentRepository.RepositoryInstance.DoStart() in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\RepositoryInstance.cs:line 144
at SenseNet.ContentRepository.RepositoryInstance.Start(RepositoryStartSettings settings) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\RepositoryInstance.cs:line 108
at SenseNet.ContentRepository.Repository.Start(RepositoryStartSettings settings) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\ContentRepository\Repository.cs:line 58
at SenseNet.Tools.ContentExporter.Exporter.Main(String[] args) in c:\Builds\8\SenseNet\PACKAGECommunity\Sources\Source\SenseNet\Tools\Export\Exporter.cs:line 139
I did not change the export.exe.config file and it has the right database configuration.
Usually this is a sign of a missing library. First I'd try simply copying all the libraries from the web\bin folder to the web\Tools folder (where you execute the tool). If that does not help, please make sure that the runtime bindings in the export config are the same as in the web.config.
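If you'd rather script the copy than do it by hand, something along these lines could work. It is a minimal Python sketch, not part of sensenet: the function name is made up, the demo uses temporary folders in place of your real web\bin and web\Tools paths, and it deliberately copies only the DLLs that are missing from the target so nothing already next to the tool gets overwritten.

```python
import shutil
import tempfile
from pathlib import Path


def copy_missing_libraries(bin_dir, tools_dir):
    """Copy every .dll from bin_dir that tools_dir does not have yet."""
    copied = []
    for dll in sorted(Path(bin_dir).glob("*.dll")):
        target = Path(tools_dir) / dll.name
        if not target.exists():
            shutil.copy2(dll, target)
            copied.append(dll.name)
    return copied


# Demo with throwaway folders standing in for web\bin and web\Tools.
with tempfile.TemporaryDirectory() as bin_dir, tempfile.TemporaryDirectory() as tools_dir:
    (Path(bin_dir) / "First.dll").write_text("stub")
    (Path(bin_dir) / "Second.dll").write_text("stub")
    (Path(tools_dir) / "First.dll").write_text("already there")
    print(copy_missing_libraries(bin_dir, tools_dir))
```

The returned list also tells you which libraries were missing, which may point to the type that failed to load.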

Why does the Zebra QL 220 printer shut off in the middle of my talking to it?

I've got C# CE CF code that runs on a handheld device (Motorola MC3100) which should cause the Zebra QL220 belt printer to which it is attached to print something (code appended to this post).
I turn on the QL 220 (via the big green button at its base or top, depending on your perspective) as I start my app, but the printer shuts itself off in the middle of my code executing, and so nothing is printed (I’m assuming that’s the reason nothing is printed, anyway).
If I'm right about the cause for the silence of the printer, what must I do to make its “On” button “sticky”?
I tried mashing the blue button on the QL 220, also (icon of a roller and sheet of paper being ejected from it), but all that did was spit out some of the tape/printer paper in "real time."
. . .
using (SerialPort serialPort = new SerialPort())
{
    serialPort.BaudRate = 19200;
    serialPort.Handshake = Handshake.XOnXOff; // Handshake AKA Flowcontrol?
    serialPort.DataBits = 8;
    serialPort.Parity = Parity.None;
    serialPort.StopBits = StopBits.One;
    serialPort.PortName = "COM1:";
    serialPort.ReadTimeout = 500;
    serialPort.WriteTimeout = 500;
    serialPort.Open();
    Thread.Sleep(2500); // I don't know why this is needed, or if it really is...
    // Try this first:
    serialPort.WriteLine("! 0 200 200 210 1");
    serialPort.WriteLine("TEXT 4 0 30 40 Bonjour la Monde"); //Hola el Mundo --- Hallo die Welt
    serialPort.WriteLine("FORM");
    serialPort.WriteLine("PRINT");
    // or (if WriteLine does not include a carriage return and line feed):
    // serialPort.Write("! 0 200 200 210 1\r\n");
    // serialPort.Write("TEXT 4 0 30 40 Bonjour la Monde\r\n"); //Hola el Mundo --- Hallo die Welt
    // serialPort.Write("FORM\r\n");
    // serialPort.Write("PRINT\r\n");
    serialPort.Close();
}
Besides appending the colon to "COM1" as ctacke revealed was necessary on another SO post, I also needed to swap the WriteLine lines for Write lines with the "\r\n" appended to each line, so that they are now:
serialPort.Write("! 0 200 200 210 1\r\n");
serialPort.Write("TEXT 4 0 30 40 Bonjour la Monde\r\n"); //Hola el Mundo --- Hallo die Welt
serialPort.Write("FORM\r\n");
serialPort.Write("PRINT\r\n");
That successfully printed out "Bonjour la Monde" although with too much wasted paper (about a mile above and below the line was printed).
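The part that mattered was that every command line ends with a carriage return plus line feed. As a language-neutral illustration, here is a small Python sketch that builds the same byte sequence; the function name and its parameters are made up, and it only constructs the bytes, it does not talk to a serial port.

```python
def build_cpcl_label(text, height=210, quantity=1):
    """Build the printer commands used above, each terminated by CR LF."""
    lines = [
        f"! 0 200 200 {height} {quantity}",
        f"TEXT 4 0 30 40 {text}",
        "FORM",
        "PRINT",
    ]
    return "".join(line + "\r\n" for line in lines).encode("ascii")


payload = build_cpcl_label("Bonjour la Monde")
print(payload)
```

As far as I understand the command language, the fourth number on the first line is the label height in dots, so lowering the height argument may be one thing to try for the wasted paper. I have not verified that on a QL 220.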

Yii framework error - "failed to open stream: permission denied"

I have just started to use the Yii framework on a Windows 7 machine. It's giving me this annoying error, which goes away when I restart the computer.
Can anyone shed some light on what's happening and how to fix it? Thanks a bunch.
Here is the error I get:
PHP warning
copy(C:\www\corp\assets\96296f5a\js\ckeditor\plugins\imagepaste2.3.zip): failed to open stream: Permission denied
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(131)
119
120 $folder=opendir($src);
121 while(($file=readdir($folder))!==false)
122 {
123 if($file==='.' || $file==='..')
124 continue;
125 $path=$src.DIRECTORY_SEPARATOR.$file;
126 $isFile=is_file($path);
127 if(self::validatePath($base,$file,$isFile,$fileTypes,$exclude))
128 {
129 if($isFile)
130 {
131 copy($path,$dst.DIRECTORY_SEPARATOR.$file);
132 if(isset($options['newFileMode']))
133 chmod($dst.DIRECTORY_SEPARATOR.$file,$options['newFileMode']);
134 }
135 elseif($level)
136 self::copyDirectoryRecursive($path,$dst.DIRECTORY_SEPARATOR.$file,$base.'/'.$file,$fileTypes,$exclude,$level-1,$options);
137 }
138 }
139 closedir($folder);
140 }
141
142 /**
143 * Returns the files found under the specified directory and subdirectories.
Stack Trace
#0
+
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(131): copy("C:\www\corp\protected\extensions\bootstrap\assets\js\ckeditor\pl...", "C:\www\corp\assets\96296f5a\js\ckeditor\plugins\imagepaste2.3.zi...")
#1
+
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(136): CFileHelper::copyDirectoryRecursive("C:\www\corp\protected\extensions\bootstrap\assets\js\ckeditor\pl...", "C:\www\corp\assets\96296f5a\js\ckeditor\plugins", "/js/ckeditor/plugins", array(), ...)
#2
+
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(136): CFileHelper::copyDirectoryRecursive("C:\www\corp\protected\extensions\bootstrap\assets\js\ckeditor", "C:\www\corp\assets\96296f5a\js\ckeditor", "/js/ckeditor", array(), ...)
#3
+
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(136): CFileHelper::copyDirectoryRecursive("C:\www\corp\protected\extensions\bootstrap\assets\js", "C:\www\corp\assets\96296f5a\js", "/js", array(), ...)
#4
+
C:\www\yii-1.1.13\framework\utils\CFileHelper.php(63): CFileHelper::copyDirectoryRecursive("C:\www\corp\protected\extensions\bootstrap\assets", "C:\www\corp\assets\96296f5a", "", array(), ...)
#5
+
C:\www\yii-1.1.13\framework\web\CAssetManager.php(251): CFileHelper::copyDirectory("C:\www\corp\protected\extensions\bootstrap\assets", "C:\www\corp\assets\96296f5a", array("exclude" => array(".svn", ".gitignore"), "level" => -1, "newDirMode" => 511, "newFileMode" => 438))
#6
–
C:\www\corp\protected\extensions\bootstrap\components\Bootstrap.php(458): CAssetManager->publish("C:\www\corp\protected\extensions\bootstrap\assets", false, -1, true)
453 if (isset($this->_assetsUrl))
454 return $this->_assetsUrl;
455 else
456 {
457 $assetsPath = Yii::getPathOfAlias('bootstrap.assets');
458 $assetsUrl = Yii::app()->assetManager->publish($assetsPath, false, -1, YII_DEBUG);
459 return $this->_assetsUrl = $assetsUrl;
460 }
461 }
462
463 /**
#7
–
C:\www\corp\protected\extensions\bootstrap\components\Bootstrap.php(163): Bootstrap->getAssetsUrl()
158 * @param string $cssFile the css file name to register
159 * @param string $media the media that the CSS file should be applied to. If empty, it means all media types.
160 */
161 public function registerAssetCss($cssFile, $media = '')
162 {
163 Yii::app()->getClientScript()->registerCssFile($this->getAssetsUrl() . "/css/{$cssFile}", $media);
164 }
165
166 /**
167 * Registers the core JavaScript.
168 * @since 0.9.8
#8
–
C:\www\corp\protected\extensions\bootstrap\components\Bootstrap.php(124): Bootstrap->registerAssetCss("bootstrap.css")
119 /**
120 * Registers the Bootstrap CSS.
121 */
122 public function registerCoreCss()
123 {
124 $this->registerAssetCss('bootstrap' . (!YII_DEBUG ? '.min' : '') . '.css');
125 }
126
127 /**
128 * Registers the Bootstrap responsive CSS.
129 * @since 0.9.8
#9
+
C:\www\corp\protected\extensions\bootstrap\components\Bootstrap.php(102): Bootstrap->registerCoreCss()
#10
+
C:\www\yii-1.1.13\framework\base\CModule.php(387): Bootstrap->init()
#11
+
C:\www\yii-1.1.13\framework\base\CModule.php(523): CModule->getComponent("bootstrap")
#12
+
C:\www\yii-1.1.13\framework\base\CApplication.php(152): CModule->preloadComponents()
#13
+
C:\www\yii-1.1.13\framework\YiiBase.php(125): CApplication->__construct("C:\www\corp/protected/config/main.php")
#14
+
C:\www\yii-1.1.13\framework\YiiBase.php(98): YiiBase::createApplication("CWebApplication", "C:\www\corp/protected/config/main.php")
#15
+
C:\www\corp\index.php(13): YiiBase::createWebApplication("C:\www\corp/protected/config/main.php")
2013-02-25 11:29:18 Apache/2.2.22 (Win32) PHP/5.3.13 Yii Framework/1.1.13
The error basically says that Yii is not able to copy the required assets from the extension into the assets directory at runtime.
The directory C:\www\corp, where your Yii project lives, should be writable by the web server process.
I would check whether a firewall or anti-virus program is blocking the web server from creating files. Try reading the web server log.
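One quick way to test the writability theory is to try creating a file in the target folder under the same account that runs the web server. Here is a minimal Python sketch of that check; it is not Yii code, the function name is made up, and the demo uses a temporary folder where you would substitute something like C:\www\corp\assets.

```python
import os
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created inside directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


# Demo: an existing temp folder versus a folder that does not exist.
demo_dir = tempfile.mkdtemp()
print(can_write(demo_dir))
print(can_write(os.path.join(demo_dir, "missing")))
```

If the check passes when run by hand but the copy still fails intermittently, that would point more towards another process holding the file than towards folder permissions.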