I have run into an annoying problem when working with the blueimp jQuery File Upload plugin. The problem is:
I have a carousel on my website. Each image has its own title, description, and other details stored in the database, but I have not stored the image's URL in the database table. The images are stored in a separate directory, like this:
carousels <root dir>
|
+---- {carousel_id} <dir>
| |___ image 1 <image file 1>
|
+---- {carousel_id} <dir>
|___ image 1 < image file 1>
|___ image 2 <image file 2> # Which I do not want
I want only one file (an image, for now) in each directory.
What I am doing now: first I fetch all the carousels from the database table and find the related image using the carousel_id (an auto-increment primary key in the database table). But if there are multiple files in some directory (say in dir "1"), I currently select one image file at random, even though each carousel can have only one image.
In short:
How can I delete the other files from a directory when I successfully upload a single file to that directory using blueimp jQuery File Upload?
I am using PHP on the server side.
Please help me.
Thank you.
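In your upload handler, after the new file has been saved, you can simply remove every other file in that carousel's directory. Here is a minimal sketch of the logic in Python (the function name and layout are my own; the same few lines translate directly to PHP with scandir()/unlink()):

```python
import os

def keep_only(dir_path, keep_name):
    """Delete every file in dir_path except the freshly uploaded one."""
    for name in os.listdir(dir_path):
        if name != keep_name:
            os.remove(os.path.join(dir_path, name))
```

Call it right after the upload succeeds, passing the carousel's directory and the new file's name, so the directory always ends up holding exactly one image.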
I have a folder structure like this as a source:
Source/2021/01/01/*.xlsx files
Source/2021/03/02/*.xlsx files
Source/2021/04/03/*.xlsx files
Source/2021/05/04/*.xlsx files
I want to drop all these Excel files into a different folder called Output.
Method 1:
When I tried this with a Copy activity, I got the files into the Output folder, but with the folder structure preserved (which is not a requirement). I used the Binary file format.
Method 2:
I am also able to get the files into my Output folder, but named with some random ID (.xlsx). I used Flatten Hierarchy.
My requirement is to get files with the same name as source.
This is what I suggest; I have implemented something similar in the past and am pretty confident it will work.
Steps
Use a Get Metadata activity to list everything inside Source/2021/.
Use a ForEach loop over the child items, filtering on ItemType = Folder (so that you get folders only and no files; I know that at this point you don't have files there).
Inside the ForEach loop, add an Execute Pipeline activity pointing to a new pipeline that takes a folder path parameter like:
Source/2021/01/01/
Source/2021/03/02/
The new pipeline should have a Get Metadata activity and a ForEach loop; this time we look for files only.
Inside the ForEach loop, add a Copy activity, and use the full file name from the loop item as the sink file name so the original source names are preserved.
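Outside ADF, the flatten-and-keep-names logic those steps implement can be sketched in a few lines of Python (a hypothetical equivalent of the Get Metadata + ForEach + Copy chain, for clarity only):

```python
import shutil
from pathlib import Path

def flatten_copy(source_root, output_dir):
    """Copy every .xlsx under source_root into output_dir,
    keeping the original file names and dropping the folder
    structure."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for f in Path(source_root).rglob("*.xlsx"):
        shutil.copy2(f, out / f.name)
```

Note this assumes file names are unique across the date folders; if they are not, you would need to prefix the name with the folder path, in ADF just as in this sketch.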
I am using pdfLaTeX with the pax package. I need to combine the uploaded PDF files and generate a composite PDF with clickable links.
It works when I upload PDF files with no spaces in their names (e.g., test1.pdf, test2.pdf), but the links do not work when the file names contain spaces (e.g., test 1.pdf, test 2.pdf).
Why?
I found a solution for this issue.
If you are using the pax package and uploading files with spaces in their names (i.e., test 1.pdf, test 2.pdf, etc.), then before you pass the file to \includepdf you need to wrap the base name in double quotes (i.e., "test 1".pdf, "test 2".pdf, etc.). That way, test 1.pax is created successfully.
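For illustration, a minimal sketch of the quoting (the file names and the package load order here are assumptions from a typical setup; adjust to yours):

```latex
\documentclass{article}
\usepackage{pdfpages}
\usepackage{pax}  % restores links from the pre-generated .pax files
\begin{document}
% quote the base name (not the extension) when it contains spaces
\includepdf[pages=-]{"test 1".pdf}
\includepdf[pages=-]{"test 2".pdf}
\end{document}
```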
I have a flow where I extract data from a database, convert the Avro output to CSV, and push the CSV into an S3 bucket that has subfolders in it. My S3 structure is like the following:
As you can see in the above screenshot, my files are going into a blank folder (highlighted in red) instead of into the subfolder called 'Thermal'. Please see my PutS3Object settings:
The final s3 path I want my files to go into is: export-csv-vehicle-telemetry/vin11/Thermal
What settings should I change in my processor so the file goes directly inside the 'Thermal' folder?
Use the bucket name export-csv-vehicle-telemetry/vin15/Thermal instead of export-csv-vehicle-telemetry/vin15/Thermal/.
The extra slash at the end is not required when specifying bucket names.
BTW, your image shows a vin11 directory instead of vin15. Check whether that is correct.
I have an app that writes data to S3 daily, hourly, or just at random times, and another app that reads data from S3 into local HBase. Is there any way to tell which file was uploaded last, and then read only the files added after that? In other words, can I copy the files incrementally?
For example:
Day 1: App1 writes files 1, 2, and 3 to folder 1; App2 reads those 3 files into HBase.
Day 4: App1 writes files 4 and 5 to folder 1, and files 6, 7, and 8 to folder 2; App2 needs to read 4 and 5 from folder 1, then 6, 7, and 8 from folder 2.
Thanks.
The LastModified header field can be used to process data based on the upload date. This requires logic on the client side that keeps track of which items have already been processed. You can simply store the timestamp of the last item you processed; everything that comes after it is considered new.
Example:
s3cmd ls s3://test
2012-07-24 18:29 36303234 s3://test/dl.pdf
Note the date at the front of each file entry.
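A minimal sketch of that client-side checkpoint logic (the names here are mine; the (key, last_modified) pairs would come from an S3 listing call such as boto3's list_objects_v2):

```python
from datetime import datetime

def new_objects(listing, last_processed):
    """Given (key, last_modified) pairs from an S3 listing, return
    only the objects modified after the stored checkpoint, oldest
    first, so they can be loaded into HBase incrementally."""
    fresh = [(k, t) for k, t in listing if t > last_processed]
    return sorted(fresh, key=lambda kt: kt[1])
```

After each successful run, persist the last_modified timestamp of the newest object you copied and pass it in as last_processed on the next run.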
I have over 30,000 PDF files. Some have already been OCRed and some have not. Is there a way to find out which files have already been OCRed and which PDFs are image-only?
It would take forever to run every single file through an OCR processor.
I would write a small script to extract the text from the PDF files and check whether it is "empty". If there is text, the PDF has already been OCRed. You could use either Ghostscript or Xpdf to extract the text.
EDIT:
This should get you started:
foreach ($pdffile in Get-ChildItem -Filter *.pdf) {
    # pipe each PDF through pdftotext; the trailing "-" sends the text to stdout
    $pdftext = & "\path\to\xpdf\pdftotext.exe" $pdffile.FullName -
    Write-Host $pdffile.FullName
    Write-Host $pdftext.Length
    Write-Host $pdftext
    Write-Host "-------------------------------"
}
Unfortunately, even when your PDF contains only images, pdftotext will extract some stray text, so you will have to do some more work to decide whether the PDF still needs OCR.
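One way to absorb that noise is a simple length threshold: treat a PDF as needing OCR when its extracted text is nearly empty. A rough sketch (the 20-character cutoff is a guess you should tune against your own files):

```python
import subprocess

def extracted_text(pdf_path):
    """Run pdftotext (must be on PATH) and return the text from stdout."""
    return subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=False,
    ).stdout

def needs_ocr(text, min_chars=20):
    """Treat the PDF as image-only when its extracted text is
    (nearly) empty; the threshold absorbs pdftotext's stray output
    for scanned pages."""
    return len(text.strip()) < min_chars
```

With 30,000 files, checking only the first page or two of each PDF keeps the scan fast while still catching mixed documents in most cases.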
Xpdf worked for me in a different way, though I am not sure it is the right way.
My image-only PDFs also yielded some text content, so I used pdffonts.exe to check whether fonts are embedded in the document. In my case, all image-only files showed 'no' in the embedded ('emb') column:
> Config Error: No display font for 'Symbol'
> Config Error: No display font for 'ZapfDingbats'
> name type emb sub uni object ID
> ------------------------------------ ----------------- --- --- --- ---------
> Helvetica Type 1 no no no 7 0
Whereas all searchable PDFs gave 'yes':
> Config Error: No display font for 'Symbol'
> Config Error: No display font for 'ZapfDingbats'
> name type emb sub uni object ID
> ------------------------------------ ----------------- --- --- --- ---------
> ABCDEE+Calibri TrueType yes yes no 7 0
> ABCDEE+Calibri,Bold TrueType yes yes no 9 0
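To automate that check over 30,000 files, you can parse the pdffonts output and inspect the 'emb' column. A sketch (it counts fields from the right, since font names and types such as 'Type 1' may contain spaces):

```python
def has_embedded_fonts(pdffonts_output):
    """Return True if any font row of `pdffonts file.pdf` output
    shows 'yes' in the 'emb' column; image-only scans usually
    list no embedded fonts at all."""
    rows_started = False
    for line in pdffonts_output.splitlines():
        if line.lstrip().startswith("---"):
            rows_started = True  # header separator reached
            continue
        if rows_started:
            cols = line.split()
            # last five fields: emb sub uni <object num> <gen num>
            if len(cols) >= 6 and cols[-5] == "yes":
                return True
    return False
```

Run pdffonts per file, feed its stdout to this function, and flag the files that return False as OCR candidates. Note the caveat above: embedded fonts are only a proxy, so spot-check a sample of both groups.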
I found that TotalCmd has a plugin that handles this:
https://totalcmd.net/plugring/pdfOCR.html
pdfOCR is wdx plugin that discovers how many pages of PDF file in
current directory needs character recognition (OCR), i.e. how many
pages in PDF file have no searchable text in their layout. This is
mostly needed when one is preparing PDF files for one’s documentation
or archiving system. Generally in one’s work with PDF files they need
to be transformed from scanned version to text searchable form before
they are included in any documentation to allow for manual or
automatic text search. The pdfOCR plugin for Total Commander fulfils a
librarian’s need by presenting the number of pages that are images
only with no text contained. The number of scanned pages are presented
in the column “needOCR”. By comparing the needOCR number of pages with
the number of total pages one can decide if a PDF file needs
additional OCR processing.
You can scan a folder or an entire drive using the desktop search tool dtSearch. At the end of the scan it will show a list of all image-only PDFs. In addition, it will also show a list of encrypted PDFs, if there are any.