Downloading selected PDFs from CAG - pdf

I am trying to download some PDFs from the CAG website, https://cag.gov.in/en/state-accounts-report?defuat_state_id=64. I need only the PDFs under Monthly Key Indicators, so I am using this code:
tabID = "#tab-360"
for link in soup.select(f"{tabID} a[href$='.pdf']"):
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)
This downloads every Monthly Key Indicators file, but I need only the March PDFs from March 2018 to March 2022. How can I download just the March PDFs for 2018 to 2022?

The following code helped me get all the March files:
import requests
from bs4 import BeautifulSoup

tabID = "#tab-360"  # same tab selector as in the question
url = 'https://cag.gov.in/en/state-accounts-report?defuat_state_id=79'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every PDF link inside the tab
urllist = list(soup.select(f"{tabID} a[href$='.pdf']"))

# Keep only the links whose visible text matches one of these labels
list_year = ['March, 2022', 'March(Pre), 2022', 'March(Pre), 2021', 'March, 2021', 'March(Pre), 2020', 'March(Pre), 2019', 'April, 2019']
final_listMah = []
for j in list_year:
    for link in urllist:
        if link.text == j:
            print(link)
            final_listMah.append(link)
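The hard-coded label list above could also be replaced by a pattern match on the link text. The following is only a sketch under assumptions: it assumes the labels always look like "March, 2022" or "March(Pre), 2021" (the two forms that appear in the list above), and the helper name `select_march_links` and the sample hrefs are hypothetical. Note that it deliberately does not match the 'April, 2019' entry from the list.

```python
import re

# Matches "March, 2022" and "March(Pre), 2021"; the year is captured.
MARCH_LABEL = re.compile(r"^March(?:\(Pre\))?,\s*(\d{4})$")


def select_march_links(links, first_year=2018, last_year=2022):
    """Keep (label, href) pairs whose label is a March entry in the year range."""
    selected = []
    for label, href in links:
        label = label.strip()
        match = MARCH_LABEL.match(label)
        if match and first_year <= int(match.group(1)) <= last_year:
            selected.append((label, href))
    return selected


# Hypothetical sample data standing in for the scraped links
sample = [
    ("March, 2022", "a.pdf"),
    ("April, 2019", "b.pdf"),
    ("March(Pre), 2017", "c.pdf"),
    (" March(Pre), 2020 ", "d.pdf"),
]
print(select_march_links(sample))
```

With BeautifulSoup, the input pairs could be built as `[(link.text, link['href']) for link in urllist]`, and each selected href could then be downloaded with the loop from the question.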

Related

Date parsing in Pandas from a pdf

I am totally new to pandas and am struggling. I have a PDF (in German) with my work schedule, and I would like to read it into pandas, format the dates, and save it as a CSV so I can import it into a calendar (Google Calendar or similar). I am using pd.to_datetime, and my problem is that I cannot parse the Start Date column into a standard date format.
This is the format that I have:
Start Date         Start Time  End Time  Location  Subject
Do., 10. Mai 2018  10:00       11:40     Spain     Klettern
Any suggestions would be very much appreciated.
Check out the dateparser module as it should do a good job with these:
In [1]:
import dateparser
s = "Do., 10. Mai 2018"
dateparser.parse(s).date()
Out[1]: datetime.date(2018, 5, 10)
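If adding the dateparser dependency is not an option, a hand-written month mapping can handle this particular format. This is a sketch of a different technique, not what dateparser does internally. It assumes full German month names as in the sample row, and the helper name `parse_german_date` is hypothetical.

```python
import re
from datetime import date

# Full German month names; abbreviations are not handled in this sketch.
GERMAN_MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4,
    "Mai": 5, "Juni": 6, "Juli": 7, "August": 8,
    "September": 9, "Oktober": 10, "November": 11, "Dezember": 12,
}

# Day number, a dot, the month name, then a four-digit year.
DATE_PATTERN = re.compile(r"(\d{1,2})\.\s*(\w+)\s+(\d{4})")


def parse_german_date(text):
    """Parse strings such as 'Do., 10. Mai 2018' into a datetime.date."""
    match = DATE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no date found in {text!r}")
    day, month_name, year = match.groups()
    # An unknown month name raises KeyError.
    return date(int(year), GERMAN_MONTHS[month_name], int(day))


print(parse_german_date("Do., 10. Mai 2018"))
```

In pandas, such a function could be applied to the column with `df["Start Date"].apply(parse_german_date)` before writing the CSV.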

Split and merge pdf files using PDFBOX produces large file

I have a large PDF print file that contains 5544 pages and is about 36 MB in size. The file was created by MS Word 2010 and contains only text and a logo on each letter/document.
I split it into 5544 files and merge them back into 2770 letters, based on keywords. Each letter is approx. 140-145 KB.
When I merge all the letters into a new PDF print file, still containing 5544 pages, the file size grows to 396 MB.
All text extraction, splitting and merging is performed with calls to the Apache PDFBox command-line tools from PHP, but the result is the same when run from a console.
Any idea how to reduce the file size of the letters and the final print file?
It seems that PDFBox has just appended each letter to the final print file instead of creating a new PDF document.
Only in the testing phase are all the documents merged into the final print file; some of the documents will be sent by email.
I have also tried SAMBox (a fork of PDFBox), with nearly the same result:
pdfinfo Original.pdf
Title: Printfile
Author: Claus Hjort Bube
Creator: Microsoft® Word 2010
Producer: Microsoft® Word 2010
CreationDate: Fri May 19 12:16:34 2017 CEST
ModDate: Fri May 19 12:16:34 2017 CEST
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 36092281 bytes
Optimized: no
PDF version: 1.5
pdfinfo PDFBox.pdf
Title: Printfile
Author: Claus Hjort Bube
Creator: Microsoft® Word 2010
Producer: Microsoft® Word 2010
CreationDate: Fri May 19 12:16:34 2017 CEST
ModDate: Fri May 19 12:16:34 2017 CEST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 396622354 bytes
Optimized: no
PDF version: 1.4
pdfinfo SAMBox.pdf
Creator: Sejda Console 3.2.17
Producer: SAMBox 1.1.8 (www.sejda.org)
ModDate: Tue Jul 11 23:34:33 2017 CEST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 5544
Encrypted: no
Page size: 595.32 x 841.92 pts (A4)
Page rot: 0
File size: 378779436 bytes
Optimized: no
PDF version: 1.7
That may sound sad, but it is correct. When splitting, each file gets the resources (e.g. fonts and the company logo graphic) it needs. When the files are merged back, PDFBox does not know that these resources may be identical across the whole document, so they are duplicated many times.
The only solution I see for you would be to use the PDFBox Java API to create the mailing files and the final print file in one step, i.e. without creating single files that are merged back.
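A quick sanity check of this explanation, using only the figures from the pdfinfo output in the question: if every letter carries its own copy of the fonts and logo, the merged file should be roughly the number of letters times the size of one letter.

```python
# Figures taken from the question's pdfinfo output
original_bytes = 36_092_281   # Original.pdf
merged_bytes = 396_622_354    # PDFBox.pdf
letters = 2770                # letters merged into the final print file

# Average size each letter contributes to the merged file
avg_letter_bytes = merged_bytes / letters

# How much larger the merged file is than the original
growth = merged_bytes / original_bytes

print(f"average per letter: {avg_letter_bytes:,.0f} bytes")
print(f"growth factor: {growth:.1f}x")
```

The average per letter lands in the 140-145 KB range reported in the question, which is consistent with the merged file simply being the sum of the letters, duplicated resources included.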

Combine multiple csv files time based visual basic

I have multiple CSV files with the same headers in the same folder. My goal is to combine the CSV files using Visual Basic (VB.NET), based on their timestamps.
I am using a data logger that logs every 15 minutes, so there is 1 CSV file for 15 minutes of logging, 2 CSV files for 30 minutes, and so on.
I want to automatically combine all the CSV files from one day (from 14.00 on day 1 to 13.59 on day 2), with the output file name based on the time.
For example:
Flowdata20170427220000.csv (27-04-2017 22:00:00)
Name,Attribute1,Attribute2,Attribute3
name1,111,abc,zzz
name2,222,def,yyy
Flowdata20170427221500.csv (27-04-2017 22:15:00)
Name,Attribute1,Attribute2,Attribute3
name3,333,ghi,xxx
name4,444,jkl,www
and so on until
Flowdata20170428214500.csv (28-04-2017 21:45:00)
Name,Attribute1,Attribute2,Attribute3
name5,555,mno,vvv
name6,666,pqr,uuu
and the final file is :
Flowdata20170427-20170428.csv
Name,Attribute1,Attribute2,Attribute3
name1,111,abc,zzz
name2,222,def,yyy
name3,333,ghi,xxx
name4,444,jkl,www
...,...,...,...
...,...,...,...
name5,555,mno,vvv
name6,666,pqr,uuu
Can you help me out, please? I have searched on Google but found nothing that helps.
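The grouping logic described above can be sketched as follows. This is in Python rather than VB.NET, so it only illustrates the approach: parse the timestamp from each file name, shift it back by 14 hours so that everything from 14:00 to 13:59 the next day falls on one calendar date, then write one merged file per date with a single header. The function names and the `DAY_START_HOUR` constant are hypothetical, and the 14:00 boundary is taken from the text of the question.

```python
import csv
import re
from datetime import datetime, timedelta
from pathlib import Path

# Input files look like Flowdata20170427220000.csv (14-digit timestamp)
NAME_PATTERN = re.compile(r"^Flowdata(\d{14})\.csv$")
DAY_START_HOUR = 14  # assumption: a logging day runs 14:00 to 13:59


def window_start(stamp):
    """Date on which the logging day containing `stamp` began."""
    return (stamp - timedelta(hours=DAY_START_HOUR)).date()


def combine_logs(folder):
    """Merge the logger files in `folder` into one CSV per logging day."""
    folder = Path(folder)
    groups = {}
    # Sorted file names are also in chronological order.
    for path in sorted(folder.iterdir()):
        match = NAME_PATTERN.match(path.name)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
            groups.setdefault(window_start(stamp), []).append(path)

    written = []
    for start, paths in sorted(groups.items()):
        end = start + timedelta(days=1)
        target = folder / f"Flowdata{start:%Y%m%d}-{end:%Y%m%d}.csv"
        with target.open("w", newline="") as out:
            writer = csv.writer(out)
            for index, path in enumerate(paths):
                with path.open(newline="") as src:
                    rows = list(csv.reader(src))
                # Keep the header row from the first file only.
                writer.writerows(rows if index == 0 else rows[1:])
        written.append(target)
    return written
```

Calling `combine_logs` on the logger folder would write files such as Flowdata20170427-20170428.csv next to the inputs. The output names do not match the input pattern, so a second run does not pick them up as inputs.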

How to Reset Date/Time M500 Sport DV Camera?

I recently bought an M500 Sport DV cam. I am unable to reset or change the date and time. According to the manual, the cam creates a SportDV.txt file on the SD card, and the date and time can be changed by editing that file.
But my cam does not create any SportDV.txt file. It only creates two folders: Data (which contains an empty base.dat file) and DCIM (which contains videos and images).
I tried creating the file manually, but that did not change the date/time. I also tried other file names such as times.txt, time.txt, timeset.txt, tag.txt and settime.txt, but nothing works.
I am unable to change the date and time. It always shows the year 2158 instead of 2015.
Sample date: 2158/8/14 22:10:22
I tried everything and failed, but then I found the solution.
Open Notepad and copy and paste the following:
SPORTS DV
UPDATE:N
FORMAT
EV:6
CTST:100
SAT:100
AWB:0
SHARPNESS:100
AudioVol:1
QUALITY:0
LIGHTFREQ:0
AE:0
RTCDisplay:1
year:2014
month:7
date:7
hour:16
minute:11
second:0
-------------------------------
Exposure(EV)
0 ~ 12, def:6
Contrast(CTST)
1 ~ 200, def:100
Saturation(SAT)
1 ~ 200, def:100
White Balance(AWB)
0 ~ 3, def:0, 0(auto), 1(Daylight), 2(Cloudy), 3(Fluorescent)
Sharpness
1 ~ 200, def:100
AudioVol
0 ~ 2, def:1, 0:Max 1:Mid 2:Min
QUALITY
0 ~ 2, def:0, 0:High 1:Middle 2:Low
LIGHTFREQ
0 ~ 1, def:0, 0:60Hz 1:50Hz
AUTO EXPOSURE(AE)
0 ~ 2, def:0, 0:Average 1:Center 2:Spot
RTCDisplay
0 ~ 1, def:1, 0:Off 1:On
year
2012 - 2038, def:2013
month
01 - 12, def:1
date
01 - 31, def:1
hour
00 - 23, def:0
minute
01 - 59, def:0
second
01 - 59, def:0
Set UPDATE:N to UPDATE:Y, change year, month and date, and save the file with the name SportDV and the encoding set to UTF-8.
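The manual edit described above could also be scripted. The sketch below takes the template text from this answer and fills in the date and time fields, leaving every other line (including the legend below the dashed line) untouched. The function name `stamp_settings` is hypothetical, and whether the camera cares about line endings is not known from this thread, so treat that as an open assumption.

```python
from datetime import datetime


def stamp_settings(template, when):
    """Return the SportDV template with UPDATE:Y and the clock fields set."""
    values = {
        "year": when.year, "month": when.month, "date": when.day,
        "hour": when.hour, "minute": when.minute, "second": when.second,
    }
    lines = []
    for line in template.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key in values:
            # A settings line such as "year:2014"; legend lines have no colon
            # directly after the field name, so they are left alone.
            line = f"{key}:{values[key]}"
        elif line.upper().startswith("UPDATE:"):
            line = "UPDATE:Y"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

The result could then be written to the card with `open("SportDV.txt", "w", encoding="utf-8")`, matching the UTF-8 encoding mentioned in the answer, with `datetime.now()` passed as `when`.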
For versions that have a time.bat file, putting an N at the end of the timestamp in the time.txt file removes the timestamp from the video, i.e. time.txt:
2015.11.13 20:13:31 N
I have the more recent version of the M500 mini camera, which doesn't use the SportDV.txt file.
It looks physically the same as the earlier one (same LEDs, same decals), but after being reset it instead has a time.bat file in the root of the card. Executing this on a Windows machine produced a file called time.txt, except that the format this batch file writes doesn't work.
I edited the time.txt file and restarted the camera, and it worked after following Andy's format from his posting on the dx.com site.
Choose edit, then make sure you replace the (probably nonsense format) contents with 2015.11.13 20:13:31, which in this case is YYYY.MM.DD HH:MM:SS, and click save. Turn off/eject the camera. Power it up while not connected to the PC and make a short capture. When you check the content, the date/time should hopefully be right.
AFAIK there is no updated firmware for this version of the camera to change from 3 min files or hide the time/date text :-(
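The time.txt line described in these answers (YYYY.MM.DD HH:MM:SS, optionally followed by " N") can be produced with a standard strftime pattern. This is a small sketch; the function name and the `show_stamp` parameter are hypothetical, and the effect of the trailing N is taken from the answer above rather than verified.

```python
from datetime import datetime


def format_time_txt(when, show_stamp=True):
    """Build the single line for time.txt in YYYY.MM.DD HH:MM:SS form."""
    line = when.strftime("%Y.%m.%d %H:%M:%S")
    if not show_stamp:
        # Per the answer above, a trailing N removes the overlay.
        line += " N"
    return line


print(format_time_txt(datetime(2015, 11, 13, 20, 13, 31)))
print(format_time_txt(datetime(2015, 11, 13, 20, 13, 31), show_stamp=False))
```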

Cannot use moviestim2 on Mac OSX 10.9.5

I program my experiments on a MacBook Pro with OS X 10.9.5, an Intel HD Graphics 4000 1024 MB graphics card, and VLC version 2.0.10 Twoflower (Intel 32bit). I used to present videos (avi and mp4 files, 60 frames per second) successfully with moviestim up to version 1.80. After upgrading to version 1.81 by installing the standalone version, I tried to use moviestim2, adapting the code in Moviestim2.py. When I run the code below:
from psychopy import visual, core
import time, os, pylab

os.chdir('/Users/till/work/edv/psychopy/test/')
win = visual.Window([1440, 900])
win.setRecordFrameIntervals(True)
mov = visual.MovieStim2(win, 'jwpIntro.mov',
                        size=[800, 800],
                        pos=[0, 100],
                        flipVert=False,
                        flipHoriz=False,
                        loop=False)

shouldflip = mov.play()
while mov.status != visual.FINISHED:
    if shouldflip:
        win.flip()
    else:
        time.sleep(0.001)
    shouldflip = mov.draw()

intervalsMS = pylab.array(win.frameIntervals[1:])*1000
m = pylab.mean(intervalsMS)
nTotal = len(intervalsMS)
nDropped = sum(intervalsMS > (1.5*m))
print "nTotal", nTotal
print "nDropped", nDropped
core.quit()
the video is shown at full length and the output is
nTotal 142
nDropped 2
(Warnings deleted.) When I run the code with one of my own videos (mov format, size adjusted to 800x800, no audio data), which I generated with ffmpeg as H.264 from 852 png files at 60 frames per second to show moving objects for a tracking task, the window closes immediately, probably after showing the first frame. The output is
nTotal 0
nDropped 0
/Applications/PsychoPy2.app/Contents/Resources/lib/python2.7/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
/Applications/PsychoPy2.app/Contents/Resources/lib/python2.7/numpy/core/_methods.py:67: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
(Other warnings deleted.) Tests with the avi and mp4 file formats gave nTotal values of 1 to 2, and accordingly no RuntimeWarnings, but otherwise the same result.
Any help would be appreciated, because so far I have not been able to return to PsychoPy 1.80 and use moviestim with avbin 10 as before as a workaround (the window freezes, but PsychoPy does not crash).
Best,
Till
The issue likely has to do with your videos not having any audio track. Try setting the 'noAudio' kwarg to True when you create the MovieStim2.
visual.MovieStim2(win, 'jwpIntro.mov',
                  size=[800, 800],
                  pos=[0, 100],
                  noAudio=True,
                  flipVert=False,
                  flipHoriz=False,
                  loop=False)
MovieStim2 should really be able to auto-detect when there is no audio stream at all, so that should be changed when there is time. ;)
If the above does not work, can you post a link to one of your sample videos so I can download and debug?
Update: I tested my suggested workaround, only to discover that it uncovered some other issues. (Arrrg..) These issues are now fixed; however, this means that for this suggestion to work you will need to update your psychopy package source from the psychopy GitHub master branch as of October 23rd, 2014, or use an official package update released after this date, if one is available.
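As a side note on the RuntimeWarnings in the question: they come from taking the mean of an empty list of frame intervals when the movie ends almost immediately. A sketch of the same dropped-frame summary that guards against that case is shown below. It uses the standard library instead of pylab, and the function name `summarize_frame_intervals` is hypothetical; it does not address the underlying playback problem.

```python
from statistics import mean


def summarize_frame_intervals(frame_intervals_s, threshold=1.5):
    """Return (n_total, n_dropped) for a list of frame intervals in seconds.

    Mirrors the question's calculation: the first interval is skipped, and a
    frame counts as dropped when its interval exceeds threshold * mean.
    """
    intervals_ms = [interval * 1000 for interval in frame_intervals_s[1:]]
    if not intervals_ms:
        # Nothing recorded after the first flip: avoid a mean of no data.
        return 0, 0
    cutoff = threshold * mean(intervals_ms)
    n_dropped = sum(1 for interval in intervals_ms if interval > cutoff)
    return len(intervals_ms), n_dropped
```

In the script from the question this would be called as `summarize_frame_intervals(win.frameIntervals)` after the playback loop.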