How to make an SRT file into a dataset? - vba

Is it possible to turn an SRT file, which is used for subtitles in videos into a dataset?
When imported into Excel, the SRT file format looks like this:
1
00:00:03,000 --> 00:00:04,000
OVERLAPS PURE COINCIDENCE THAT
...
This pattern continues as time in the "video"/transcript goes on. I'd like to format the SRT file this way:
number ; start ; end ; text
1 ; 00:00:03,000 ; 00:00:04,000 ; OVERLAPS PURE COINCIDENCE THAT

The VBA procedure below loads a standard .srt (SubRip Movie Subtitle File) from a local file and splits it into rows/columns on the active Excel worksheet.
Import SRT subtitles from Local File:
Sub importSRTfromFile(fName As String)
'Loads SRT from local file and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, x As Long
'load file
Open fName For Input As #1
While Not EOF(1)
Line Input #1, sIn
sOut = sOut & sIn & vbLf
Wend
Close #1
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For x = 1 To UBound(sArr)
Range("A" & x) = sArr(x)
Next x
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & fName
End Sub
Example Usage:
Sub test_FileImport()
importSRTfromFile "c:\yourPath\yourFilename.srt"
End Sub
Import SRT subtitles from Website URL:
Alternatively, you can import an .srt (or other similar text files) from a Website URL such as https://subtitle-index.org/ with this:
Sub importSRTfromWeb(url As String)
'Loads SRT from URL and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, rw As Long
Dim httpData() As Byte, XMLHTTP As Object
'load file from URL
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
httpData = XMLHTTP.responseBody
Set XMLHTTP = Nothing
sOut = StrConv(httpData, vbUnicode)
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For rw = 1 To UBound(sArr)
Range("A" & rw) = sArr(rw)
Next rw
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & url
End Sub
Example Usage:
Sub testImport()
importSRTfromWeb _
"https://subtitle-index.org/download/4670541854528212663953859964/SRT/Pulp+Fiction"
End Sub
Many sites host free .srt files; you may have to right-click the download button to copy the link (which may end in .srt or may be a pointer, like the example above). The procedure won't work on zipped files.

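In the web-import code above, this loop:
'breakout into rows
For rw = 1 To UBound(sArr)
Range("A" & rw) = sArr(rw)
Next rw
should be replaced with:
'breakout into rows
For rw = 0 To UBound(sArr)
Range("A" & rw + 1) = sArr(rw)
Next rw
Split returns a zero-based array, so starting the loop at 1 skips the first subtitle and the output begins with subtitle 2. The same fix applies to the file-import procedure, which uses x as its loop variable.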
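As a rough sketch of the parsing step on its own, the function below turns SRT text into one "number ; start ; end ; text" line per subtitle, the layout asked for in the question. The names SrtToLines and DemoSrtToLines are hypothetical and not part of the answers above. It assumes subtitle blocks are separated by blank lines, and it normalises CRLF and CR line endings to LF first, which the Replace on vbLf & vbLf in the procedures above does not do for CRLF text downloaded from the web.

```vba
Option Explicit

' Hypothetical helper: converts SRT text into "number ; start ; end ; text" lines.
Function SrtToLines(ByVal srtText As String) As String()
    Dim blocks() As String, parts() As String, times() As String
    Dim result() As String
    Dim i As Long, j As Long, n As Long
    Dim block As String, body As String

    ' Normalise CRLF and lone CR line endings to LF
    srtText = Replace(Replace(srtText, vbCrLf, vbLf), vbCr, vbLf)

    ' Subtitle blocks are separated by a blank line
    blocks = Split(srtText, vbLf & vbLf)
    If UBound(blocks) < 0 Then          ' empty input gives an empty array
        SrtToLines = blocks
        Exit Function
    End If
    ReDim result(0 To UBound(blocks))

    For i = 0 To UBound(blocks)         ' Split returns a zero-based array
        block = blocks(i)
        Do While Left$(block, 1) = vbLf ' drop extra blank lines before a block
            block = Mid$(block, 2)
        Loop
        parts = Split(block, vbLf)
        If UBound(parts) >= 1 Then
            times = Split(parts(1), " --> ")
            If UBound(times) = 1 Then
                ' Join multi-line subtitle text with single spaces
                body = ""
                For j = 2 To UBound(parts)
                    If Len(parts(j)) > 0 Then body = Trim$(body & " " & parts(j))
                Next j
                result(n) = Trim$(parts(0)) & " ; " & Trim$(times(0)) & _
                            " ; " & Trim$(times(1)) & " ; " & body
                n = n + 1
            End If
        End If
    Next i

    If n = 0 Then
        SrtToLines = Split(vbNullString)
    Else
        ReDim Preserve result(0 To n - 1)
        SrtToLines = result
    End If
End Function

Sub DemoSrtToLines()
    Dim outLines() As String, i As Long
    outLines = SrtToLines("1" & vbLf & "00:00:03,000 --> 00:00:04,000" & vbLf & _
                          "OVERLAPS PURE COINCIDENCE THAT" & vbLf & vbLf)
    For i = 0 To UBound(outLines)
        Debug.Print outLines(i)
    Next i
End Sub
```

Blocks that do not contain a timing line with " --> " are skipped rather than reported, which is a simplification.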

I used Vim and wrote a quick regex to convert a .srt into a .csv file for a translator friend who needed a similar conversion. The csv file can then be opened in Excel / LibreOffice and saved as .xls, .ods or whatever.
My friend didn't need the subtitle numbers to appear in the first column so the regex code looks like this :
set fileencoding=utf-8
%s/"/""/g
g/^\d\+$/d
%s#^\(.*\) --> \(.*\)\n#"\1","\2","#g
%s/\n^$/"/g
Variant to keep the sub numbering :
set fileencoding=utf-8
%s/"/""/g
%s#\(^\d\+\)$\n^\(.*\) --> \(.*\)\n#"\1","\2","\3","#g
%s/\n^$/"/g
Save this code into a text file with the .vim extension, then source this file when editing your .srt in Vim / Gvim. Save the result as a .csv. Enjoy the magic of regexes!
NB: my code uses commas as field separators. Change the commas into semicolons in the above code to use semicolons. I've also added double quotes as string delimiters in case double quotes and commas occur in the subtitle text. Much more error-proof!

Related

VBA - Comparing Layouts of Two Files

I am trying to figure out how to check that the layout (not the full content) of a CSV file is the same as that of the preceding month's file (or, if that file doesn't exist, the last available CSV file).
Often companies change the format/layout of their CSV extracts, so I want my code to automatically detect any changes (new columns added, changing order of columns, etc).
Please let me know if you have an idea of how this could be achieved!
Thanks in advance!
Please try the following code. It assumes that the csv file to be compared is comma-separated and that its line endings are vbCrLf:
Private Sub CheckCSVfile()
    Dim ws As Worksheet, strFile As String, ans As VbMsgBoxResult, sep As String
    Dim arrRef, arrCSV, cols, i As Long, strProbl As String
    Dim lastCol As Long, nRef As Long, nCsv As Long

    ans = MsgBox("Is the active sheet the one you want to use as reference to compare the CSV file structure?" & vbCrLf & _
                 "If so, please press ""Yes"".", vbYesNo, "Confirm the active sheet as reference")
    If ans <> vbYes Then Exit Sub
    Set ws = ActiveSheet

    'Put the first sheet row values in an array (2D array, 1-based):
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
    arrRef = ws.Range(ws.Cells(1, 1), ws.Cells(1, lastCol)).Value

    'Browse for the .csv file to be checked:
    strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", , "Please select the csv file to be checked.")
    If strFile = "False" Then Exit Sub

    'Put the content of the csv file in an array (split by the line ending separator). If not vbCrLf, use the appropriate one:
    arrCSV = Split(CreateObject("Scripting.FileSystemObject").OpenTextFile(strFile, 1).ReadAll, vbCrLf)
    sep = "," 'the csv file separator. Use the correct one here if not comma
    cols = Split(arrCSV(0), sep) 'header fields of the first csv row (zero-based array)
    nRef = UBound(arrRef, 2)     'number of reference columns (1-based array)
    nCsv = UBound(cols) + 1      'number of csv columns (+ 1 because the array is 0-based)

    If nCsv <> nRef Then
        strProbl = strProbl & "The number of columns in the new csv file is different (" & nCsv & " against " & nRef & ")." & vbCrLf
    End If

    'Compare each header that exists in both files (avoids running past the shorter array):
    For i = 1 To IIf(nCsv < nRef, nCsv, nRef)
        If UCase(arrRef(1, i)) <> UCase(cols(i - 1)) Then
            strProbl = strProbl & "The value in column " & i & " is different (" & cols(i - 1) & " against " & arrRef(1, i) & ")" & vbCrLf
        End If
    Next i

    If strProbl <> "" Then
        MsgBox "The new csv file has a different structure: " & vbCrLf & vbCrLf & strProbl, vbCritical, "Structure problems..."
    Else
        MsgBox "Both files have the same structure!", vbInformation, "No structure problems"
    End If
End Sub
You must first open and activate the sheet of the previous csv file (to be used as reference) and then run the above code. The reference header row needs at least two columns, because a single-cell range returns a plain value rather than an array.
Please send some feedback after testing it...
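The question also asks about added columns and changed column order, which a position-by-position comparison reports only as mismatches. A minimal sketch of a name-based comparison follows; CompareHeaders and FindHeader are hypothetical helpers, and both arguments are assumed to be zero-based string arrays such as those returned by Split on a header line.

```vba
Option Explicit

' Hypothetical helper: reports missing, moved and new columns by header name.
' Both arrays are assumed to be zero-based (for example, results of Split).
Function CompareHeaders(refCols() As String, newCols() As String) As String
    Dim i As Long, pos As Long, report As String

    For i = LBound(refCols) To UBound(refCols)
        pos = FindHeader(newCols, refCols(i))
        If pos < 0 Then
            report = report & "Missing column: " & refCols(i) & vbCrLf
        ElseIf pos <> i Then
            report = report & "Moved column: " & refCols(i) & _
                     " (position " & (i + 1) & " -> " & (pos + 1) & ")" & vbCrLf
        End If
    Next i

    For i = LBound(newCols) To UBound(newCols)
        If FindHeader(refCols, newCols(i)) < 0 Then
            report = report & "New column: " & newCols(i) & vbCrLf
        End If
    Next i

    CompareHeaders = report   ' empty string means the layouts match
End Function

' Returns the index of header in cols (case-insensitive), or -1 if absent.
Private Function FindHeader(cols() As String, ByVal header As String) As Long
    Dim i As Long
    FindHeader = -1
    For i = LBound(cols) To UBound(cols)
        If StrComp(Trim$(cols(i)), Trim$(header), vbTextCompare) = 0 Then
            FindHeader = i
            Exit Function
        End If
    Next i
End Function

Sub DemoCompareHeaders()
    Dim oldCols() As String, newCols() As String
    oldCols = Split("Date,Account,Amount", ",")
    newCols = Split("Date,Amount,Account,Currency", ",")
    Debug.Print CompareHeaders(oldCols, newCols)
End Sub
```

Duplicate header names are not handled; the first match wins.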

How to get the sheet name using GetOpenFilename in VLOOKUP

I am using this code down below to use a VLOOKUP in another file that you select using the GetOpenFilename. I want shtName to be the name of the sheet in the file that you select, but whenever I step through it, it is always the name of the sheet that I am working in and putting the VLOOKUP in.
I have shtName in my VLOOKUP and it doesn't show anything when I step through it. X shows the filename and path, but shtName right after shows nothing. But my VLOOKUP ends up working anyway and it puts the sheet in the formula.
Why is that? I want to be able to do it myself and so I know I get the sheet name from the file you are selecting.
Dim iRet As Integer
Dim strPrompt As String
Dim strTitle As String
' Prompt
strPrompt = "Please select the last Kronos Full File before the dates of this HCM Report." & vbCrLf & _
"This will be used to find the Old Position, Org Unit, and Old Cost Center." & vbCrLf & _
"For example, if the date of this report is 7-28-17 thru 8-25-17, the closest Kronos Full File you would want to use is 7-27-17."
' Dialog's Title
strTitle = "Last Kronos Full File for Old Positions"
'Display MessageBox
iRet = MsgBox(strPrompt, vbOK, strTitle)
Dim LR As Long
Dim X As String
Dim lNewBracketLocation As Long
X = Application.GetOpenFilename( _
FileFilter:="Excel Files (*.xls*),*.xls*", _
Title:="Choose the Kronos Full File.", MultiSelect:=False)
MsgBox "You selected " & X
'Find the last instance in the string of the path separator "\"
lNewBracketLocation = InStrRev(X, Application.PathSeparator)
'Edit the string to suit the VLOOKUP formula - insert "["
X = Left$(X, lNewBracketLocation) & "[" & Right$(X, Len(X) - lNewBracketLocation)
shtName = ActiveWorkbook.Worksheets(1).name
LR = Range("E" & Rows.Count).End(xlUp).Row
Range("T2").Formula = "=VLOOKUP($E2,'" & X & "]shtName'!$B$1:$AP$99999,15,0)"
Stop
Range("T2").AutoFill Destination:=Range("T2:T" & Range("E" & Rows.Count).End(xlUp).Row)
Stop
Range("T2:T" & Range("E" & Rows.Count).End(xlUp).Row).Select
Stop
Range("U2").Formula = "=VLOOKUP($E2,'" & X & "]shtName'!$B$1:$AP$99999,41,0)"
Range("U2").AutoFill Destination:=Range("U2:U" & Range("E" & Rows.Count).End(xlUp).Row)
Range("U2:U" & Range("E" & Rows.Count).End(xlUp).Row).Select
Range("V2").Formula = "=VLOOKUP($E2,'" & X & "]shtName'!$B$1:$AP$99999,18,0)"
Range("V2").AutoFill Destination:=Range("V2:V" & Range("E" & Rows.Count).End(xlUp).Row)
Range("V2:V" & Range("E" & Rows.Count).End(xlUp).Row).Select
Cells.Select
Cells.EntireColumn.AutoFit
Something like the following should give you the worksheet's name from the selected file:
Dim wbk As Workbook
Set wbk = Workbooks.Open(Filename:="YOUR_FILE_PATH", ReadOnly:=True)
Dim shtName As String
shtName = wbk.Worksheets(1).Name
wbk.Close
Note: We can open the workbook in read only mode if we don't plan to change anything.
Additionally, I recommend the following good practices:
Always specify a worksheet.
E.g. write every Range("") as Worksheets("YourSheetName").Range("")
Or use With statements:
With Worksheets("YourSheetName")
.Range("A1").Value = 5 'note the leading full stop, which refers to the With object
End With
The same applies to every Rows, Columns, Cells, etc.
Avoid using .Select, .Activate and Selection altogether (there are many tutorials on the Internet on how to avoid them).
Use Option Explicit and declare all your variables before use (this avoids many issues, especially typos).
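In the question's code the text shtName sits inside the string literal, so VBA never substitutes the variable's value into the formula. A minimal sketch of concatenating it instead is shown below. ExternalVlookup and DemoExternalVlookup are hypothetical names; the lookup range and column numbers are copied from the question, and the sheet is assumed to be the first one in the chosen workbook.

```vba
Option Explicit

' Hypothetical helper: builds a VLOOKUP formula pointing at a sheet in another workbook.
' fullPath is the path returned by GetOpenFilename.
Function ExternalVlookup(ByVal fullPath As String, ByVal shtName As String, _
                         ByVal colIndex As Long) As String
    Dim cut As Long, ref As String

    ' Position of the last path separator, as in the question's code
    cut = InStrRev(fullPath, Application.PathSeparator)

    ' folder + [file name] + sheet name
    ref = Left$(fullPath, cut) & "[" & Mid$(fullPath, cut + 1) & "]" & shtName

    ' Apostrophes inside a quoted reference must be doubled
    ref = Replace(ref, "'", "''")

    ExternalVlookup = "=VLOOKUP($E2,'" & ref & "'!$B$1:$AP$99999," & colIndex & ",0)"
End Function

Sub DemoExternalVlookup()
    Dim ws As Worksheet, wbk As Workbook
    Dim X As String, shtName As String

    Set ws = ActiveSheet                      ' the sheet that receives the formulas
    X = Application.GetOpenFilename( _
        FileFilter:="Excel Files (*.xls*),*.xls*", _
        Title:="Choose the Kronos Full File.", MultiSelect:=False)
    If X = "False" Then Exit Sub              ' user cancelled

    ' Read the sheet name from the selected file, not from the active workbook
    Set wbk = Workbooks.Open(Filename:=X, ReadOnly:=True)
    shtName = wbk.Worksheets(1).Name
    wbk.Close SaveChanges:=False

    ws.Range("T2").Formula = ExternalVlookup(X, shtName, 15)
End Sub
```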

Removing White Space

I need help creating a separate text file from each row of an Excel worksheet called "worksheet". Each text file should be named with the content of column A + "Result" + column H (column H should not appear in the .txt file), with columns B-G as the content. I have written the code below, but the .txt output files contain white space that I am unable to TRIM.
How would I proceed further to solve the problem?
Thanks in advance.
VBA Code:
Sub WriteTotxt()
Const forReading = 1, forAppending = 3, fsoForWriting = 2
Dim fs, objTextStream, sText As String
Dim lLastRow As Long, lRowLoop As Long, lLastCol As Long, lColLoop As Long
lLastRow = Cells(Rows.Count, 1).End(xlUp).Row
For lRowLoop = 1 To lLastRow
Set fs = CreateObject("Scripting.FileSystemObject")
Set objTextStream = fs.opentextfile("D:\EXCEL_TXT_TEST\New folder\" & Cells(lRowLoop, 8) & "-" & "RESULT" & "-" & Cells(lRowLoop, 1) & ".txt", fsoForWriting, True)
sText = ""
sText1 = ""
For lColLoop = 1 To 7
If lColLoop <> 7 Then
sText = sText & "<" & Cells(lColLoop) & ">" & "," & Chr(0)
sText1 = sText1 & Cells(lRowLoop, lColLoop) & "," & Chr(0)
Else
sText = sText & "<" & Cells(lColLoop) & ">" & Chr(0)
sText1 = sText1 & Cells(lRowLoop, lColLoop) & Chr(0)
End If
Next lColLoop
objTextStream.writeline (Left(sText, Len(Trim(sText)) - 1))
objTextStream.writeline (Left(sText1, Len(Trim(sText1)) - 1))
objTextStream.Close
Set objTextStream = Nothing
Set fs = Nothing
Next lRowLoop
End Sub
You can remove all the blank lines from a string like this
mystr = replace(mystr, vblf & vbcr, "")
This will remove empty lines, but not lines that contain spaces or other characters you can't see.
I can help with your code as well, but have you tried a simple approach first?
Why don't you just use File --> Save As .csv, and replace your header? Your data output will be very similar, apart from the space preceding your listed entries. Lazy but easy.
Adapt this to your module, and you can remove almost everything you have:
ActiveWorkbook.SaveAs Filename:= _
"c:\MyFile.csv", FileFormat:=xlCSV _
, CreateBackup:=False
Then, just read your data back in, and string operations will be easy.
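One likely source of the white space is the Chr(0) appended after every field in the question's code: Chr(0) is a NUL character, Trim only removes leading and trailing spaces, and many editors display NUL characters as blanks. A minimal sketch of building the line without them is shown below; CleanField and JoinRow are hypothetical helper names, and the column range 1 To 7 in the usage note is taken from the question's loop.

```vba
Option Explicit

' Hypothetical helper: strips NUL characters and surrounding spaces from one field.
Function CleanField(ByVal s As String) As String
    CleanField = Trim$(Replace(s, Chr(0), ""))
End Function

' Hypothetical helper: joins the cells of one row into a comma-separated line.
Function JoinRow(ws As Worksheet, ByVal rowNum As Long, _
                 ByVal firstCol As Long, ByVal lastCol As Long) As String
    Dim parts() As String, c As Long

    ReDim parts(0 To lastCol - firstCol)
    For c = firstCol To lastCol
        parts(c - firstCol) = CleanField(CStr(ws.Cells(rowNum, c).Value))
    Next c

    ' Join puts the separator only between fields, so nothing needs trimming afterwards
    JoinRow = Join(parts, ",")
End Function
```

Inside the question's row loop, the data line could then be written with objTextStream.WriteLine JoinRow(ActiveSheet, lRowLoop, 1, 7).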

Correct Excel Macro to Save A Copy Excel File as TXT or CSV

So I have this home-made Excel Macro Template.
The macro in my .xlsm file is supposed to save a copy in the same folder in a different format, namely .txt.
The expected result (after saving) should look the same as the Excel file, but in .txt format.
Unfortunately, that didn't happen. It generates a different .txt file that contains unreadable characters. Here's an example of the generated .txt file:
¬TËNÃ0 ¼#ñ ‘¯(vဠjÚ # °µ· ©c[^SÚ¿g“–
P ö '±wfvìq 8o\1ÃD6øJœËž(Ðë`¬ŸTâõå¾¼ eð \ðX‰ ’ NOú/‹ˆTpµ§JÔ9Çk¥H×Ø É ÑóÌ8¤ 2 ¦‰Š §0AuÑë]* |FŸËÜbˆAÿ Çðîrq7çßK%#ëEq³\×RU btVCf¡jæ l¨ã±Õ(g#xJá
u j#XBG{Ð~J.Wr%WvŒTÛHgÜÓ †vf»ÜUÝ#ûœ¬Áâ R~€†›Rs§>BšŽB˜ÊÝ «žq®ÑIª ³l#§pçaä ý ë¿ î`ê*IuÃù ( ³´Ü ýÞð JŠ Át` “m'Ýû ™ ªîy¸„ f !å…C:r·KÐ}Ì5$4Ï9q Ž.à;ö. ¼] H ¼„ÿwá+mu S¶¸ŽÃ¦Ã¶fäÔ l;¶×‚A³ [u×Ðà ÿÿ PK ! µU0#ô L _rels/.rels ¢ (
Here's my macro code:
Sub SaveMe()
Dim FName As Range
Dim firstDate As String
Dim firstTime As String
Dim answer As Integer
firstDate = Format(Date, "mmddyyyy")
firstTime = Format(Now, "hhmmssAM/PM")
Set FName = Range("H5")
ActiveWorkbook.SaveCopyAs FileName:=ActiveWorkbook.Path & "\" & "QB JE " & FName & " " & firstDate & " " & firstTime & ".txt", FileFormat:=xlText, CreateBackup:=False
End Sub
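I was wondering if anyone could take a look at my code and help point out what's wrong.
It looks like you want SaveAs, not SaveCopyAs. SaveCopyAs takes only a file name and always writes the workbook in its current format, so the .txt file ends up containing the binary workbook.
With SaveAs, use FileFormat xlText or xlTextMSDOS.
You can do it in two steps: save a copy, open it, and then save it as a text file.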
ActiveWorkbook.SaveCopyAs FileName:=ActiveWorkbook.Path & "\" & "QB JE " & FName & " " & firstDate & " " & firstTime & ".xlsx"
Workbooks.Open (ActiveWorkbook.Path & "\" & "QB JE " & FName & " " & firstDate & " " & firstTime & ".xlsx")
ActiveWorkbook.SaveAs FileName:=ActiveWorkbook.Path & "\" & "QB JE " & FName & " " & firstDate & " " & firstTime & ".txt", FileFormat:=xlText, CreateBackup:=False
https://msdn.microsoft.com/en-us/library/office/ff841185.aspx
https://msdn.microsoft.com/en-us/library/office/ff198017.aspx
See my post here: Excel VBA Export To Text File with Fixed Column Width + Specified Row and Columns Only + Transpose
Loop through all rows and all cells, and send each value to a PadSpace function. Build each output line from the cell values, padding each value with trailing spaces.
You will have to add a reference to your workbook. In the VBA IDE, open the Tools menu and select References. Then scroll down, select "Microsoft Scripting Runtime" and click OK.
Adjust the first argument of the PadSpace call to a width that fits the data in your spreadsheet, i.e. change the 20 in PadSpace(20, Len(cellValue)).
This will do all rows and columns.
Public Sub MyMacro()
Dim lRow As Long
Dim lCol As Long
Dim strRow As String
Dim ws As Excel.Worksheet
Dim ts As TextStream
Dim fs As FileSystemObject
'Create the text file to write to
Set fs = New FileSystemObject
Set ts = fs.CreateTextFile("C:\Temp\test.txt", True, False)
Set ws = Application.ActiveSheet
'Loop through all the rows.
lRow = 1
Do While lRow <= ws.UsedRange.Rows.count
'Clear the string we are building
strRow = ""
'Loop through all the columns for the current row.
lCol = 1
Do While lCol <= ws.UsedRange.Columns.count
'Build a string to write out.
strRow = strRow & ws.Cells(lRow, lCol) & PadSpace(20, Len(ws.Cells(lRow, lCol)))
lCol = lCol + 1
Loop
'Write the line to the text file
ts.WriteLine strRow
lRow = lRow + 1
ws.Range("A" & lRow).Activate
Loop
ts.Close: Set ts = Nothing
Set fs = Nothing
End Sub
'This function will take the max number of spaces you want and the length of the string in the cell and return you the string of spaces to pad.
Public Function PadSpace(nMaxSpace As Integer, nNumSpace As Integer) As String
If nMaxSpace < nNumSpace Then
PadSpace = ""
Else
PadSpace = Space(nMaxSpace - nNumSpace)
End If
End Function

How to create a separate CSV file from VBA?

I need to output some results as a .csv file, that gets parsed later on by another process. In order to produce these results, I have a huge workbook containing all the macros and functions that I need.
Is it possible to "create" a separate .csv file from VBA?
Is it possible to use VBA features to write into it instead of just writing in a "raw textual" approach?
Is something like this what you want?
Option Explicit
Sub WriteFile()
Dim ColNum As Integer
Dim Line As String
Dim LineValues() As Variant
Dim OutputFileNum As Integer
Dim PathName As String
Dim RowNum As Integer
Dim SheetValues() As Variant
PathName = Application.ActiveWorkbook.Path
OutputFileNum = FreeFile
Open PathName & "\Test.csv" For Output Lock Write As #OutputFileNum
Print #OutputFileNum, "Field1" & "," & "Field2"
SheetValues = Sheets("Sheet1").Range("A1:H9").Value
ReDim LineValues(1 To 8)
For RowNum = 1 To 9
For ColNum = 1 To 8
LineValues(ColNum) = SheetValues(RowNum, ColNum)
Next
Line = Join(LineValues, ",")
Print #OutputFileNum, Line
Next
Close OutputFileNum
End Sub
Don't forget you will need to put quotes around any field containing a comma.
Tony's answer generally works but doesn't handle the case where your text contains commas or quotes. You may prefer to use Workbook.SaveAs method.
Here is an example that saves the content of Sheet1 as a separate CSV file.
Sub create_csv()
Dim FileName As String
Dim PathName As String
Dim ws As Worksheet
Set ws = ActiveWorkbook.Sheets("Sheet1")
FileName = "filename.csv"
PathName = Application.ActiveWorkbook.Path
ws.Copy
ActiveWorkbook.SaveAs FileName:=PathName & "\" & FileName, _
FileFormat:=xlCSV, CreateBackup:=False
End Sub
Imagine that your Sheet1 contains :
lorem ipsum
lore,m ips"um"
The output csv file will be :
lorem,ipsum
"lore,m","ips""um"""
You can write a macro like this to save the current workbook (the open Excel file) as CSV from VBA:
ActiveWorkbook.SaveAs Filename:="C:\Book1.csv", _
FileFormat:=xlCSVMSDOS, CreateBackup:=False
For those writing the CSV manually, you need to handle commas, double quotes and new lines.
e.g.
Sub WriteToCsv(Items() as String)
OutFile = FreeFile
Open "Outfile.csv" For Output As #OutFile
Print #OutFile, "Header"
For Each Item In Items
If InStr(1, Item, Chr(34)) > 0 Then Item = Chr(34) & Replace(Item, Chr(34), Chr(34) & Chr(34)) & Chr(34)
If InStr(1, Item, ",") > 0 And Left(Item, 1) <> Chr(34) Then Item = Chr(34) & Item & Chr(34)
If InStr(1, Item, vbLf) > 0 And Left(Item, 1) <> Chr(34) Then Item = Chr(34) & Item & Chr(34)
Print #OutFile, Item
Next
Close OutFile
End Sub
Took your code as a basis (THANKS!!!) but had to modify it to make it work.
It didn't handle multiple rows, all cells were put after each other.
2 loops: one to go through the rows and one to go through the cells of each row.
Each time the row loop starts the temporary string is emptied. Before starting a new row, the temp string is added to the Outfile.
Sub ToCsv()
    Dim rng As Range
    Dim row As Range
    Dim cell As Range
    Dim Item As String
    Dim ItemNew As String
    Dim OutFile As Integer
    Set rng = Range("A1:E2") 'Adjust the range accordingly
    OutFile = FreeFile
    Open "Outfile.csv" For Output As #OutFile
    'Print #OutFile, "Header"
    For Each row In rng.Rows
        ItemNew = ""
        For Each cell In row.Cells
            'Work on a string copy of the cell value, never on the cell itself
            Item = CStr(cell.Value)
            If InStr(1, Item, Chr(34)) > 0 Then Item = Chr(34) & Replace(Item, Chr(34), Chr(34) & Chr(34)) & Chr(34)
            If InStr(1, Item, ",") > 0 And Left(Item, 1) <> Chr(34) Then Item = Chr(34) & Item & Chr(34)
            If InStr(1, Item, vbLf) > 0 And Left(Item, 1) <> Chr(34) Then Item = Chr(34) & Item & Chr(34)
            'Test the column, not the string, so an empty first cell still gets its comma
            If cell.Column = row.Column Then
                ItemNew = Item
            Else
                ItemNew = ItemNew & "," & Item
            End If
        Next cell
        Print #OutFile, ItemNew
    Next row
    Close #OutFile
End Sub
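The quoting rules mentioned in these answers (wrap a field in double quotes if it contains a comma, a double quote or a line break, and double any embedded quotes) can be isolated in one small function. The sketch below uses the hypothetical names CsvField and CsvLine; it is one way to structure the manual approach, not the code from the answers above.

```vba
Option Explicit

' Hypothetical helper: quotes a single CSV field when needed.
Function CsvField(ByVal value As String) As String
    If InStr(value, """") > 0 Or InStr(value, ",") > 0 _
       Or InStr(value, vbLf) > 0 Or InStr(value, vbCr) > 0 Then
        ' Double embedded quotes, then wrap the whole field in quotes
        CsvField = """" & Replace(value, """", """""") & """"
    Else
        CsvField = value
    End If
End Function

' Hypothetical helper: builds one CSV line from the cells of a single-row range.
Function CsvLine(rowRange As Range) As String
    Dim cell As Range, fields() As String, i As Long

    ReDim fields(0 To rowRange.Cells.Count - 1)
    For Each cell In rowRange.Cells
        fields(i) = CsvField(CStr(cell.Value))
        i = i + 1
    Next cell
    CsvLine = Join(fields, ",")
End Function
```

With the sample data from the SaveAs answer, CsvField turns lore,m into "lore,m" and ips"um" into "ips""um""", matching the output shown there. Cells holding error values would need extra handling, because CStr raises a type mismatch on them.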