Convert a two-table exists query from SQL to Linq using dynamic fields in the subquery - sql

I'm trying to query old Access database tables and compare them with SQL Server tables.
They often don't have primary keys, or they have extra fields that had some purpose in the nineties, etc., or the new tables have new fields, etc.
I need to find records - based on a set of fields specified at runtime - that are in one table but not another.
So, I do this kind of query all the time in SQL, when I'm comparing data in different tables:
dim fields_i_care_about as string = "field1, field2, field3"
'This kind of thing gets set by a caller, can be any number of fields, depends on the
'table
dim s as string= ""
dim flds = fields_i_care_about.split(",")
for i as integer = 0 to ubound(flds)
if s > "" then s += " AND "
s += " dysfunctional_database_table." & flds(i) & "=current_database_table." & flds(i)
next
s = "SELECT * from dysfunctional_database_table where not exists (SELECT * from current_database_table WHERE " & s & ")"
====
I'm trying to do this using Linq because it seems like some of the datatype problems with two different database types become less of a headache,
but I'm new to Linq and totally stuck.
I got as far as this:
Put old and new tables into datatables as dt1 and dt2
Dim new_records = _
From new_recs In dt2.AsEnumerable
Where Not ( _
From old_recs In dt1.AsEnumerable Where old_recs(field1) = new_recs(field1) AndAlso old_recs(field2) = new_recs(field2)).Any
Select new_recs
But I can't figure out how to put this part in on the fly -
old_recs(field1) = new_recs(field1) AndAlso old_recs(field2) = new_recs(field2)
So far I've tried:
building the comparison of the fields as a string and passing that string in as a variable (I suspected that was cheating, and I guess it was):
dim str = old_recs(field1) = new_recs(field1) AndAlso old_recs(field2) = new_recs(field2)
From new_recs In dt2.AsEnumerable
Where Not ( _
From old_recs In dt1.AsEnumerable Where str).Any
Select new_recs
It tells me it can't convert a Boolean -
Is there any way to do this without Linq expressions? They seem far more complex than what I'm trying to do here, and they take a lot of code, and also I can't seem to find examples of Expressions where we're comparing two fields in a subquery.
Is there a simpler way? I know I could do the usual EXISTS query using JOIN or IN - in this case I don't need the query to be super fast or anything. And I don't need to use a DataTable or DataSet - I can put the data in some other kind of object.

So I found a lot of sample code that used MethodInfo and reflection and things like that, but I couldn't get any of it to work - these Datarows have a Field method but it requires that you add an (of object) argument before the field name argument and that's tricky to do.
So I'm not sure if this solution is the most efficient way, but at least it works. I'd be interested in finding out whether this way of doing it is efficient and why or why not. It seemed like most people used reflection to do this kind of thing, but I couldn't get that working properly and anyway what I'm trying to do is pretty simple while those methods were pretty complex. I suppose I'm doing Linq with a SQL mindset, but anyway it works.
Dim f As Func(Of DataRow, DataRow, String, Boolean) = Function(d1 As DataRow, d2 As DataRow, s As String)
Dim fields = Split(s, ",")
Dim results As Boolean = True
For k As Integer = 0 To UBound(fields)
Dim obj = DataRowExtensions.Field(Of Object)(d1, fields(k))
Dim obj2 = DataRowExtensions.Field(Of Object)(d2, fields(k))
If obj <> obj2 Then results = False : Exit For
Next
Return results
End Function
Dim new_records = _
From new_recs In dt2.AsEnumerable.AsQueryable()
Where Not ( _
From old_recs In dt1.AsEnumerable.AsQueryable Where f(old_recs, new_recs, id_key)).Any
Select new_recs
Try
Return new_records.CopyToDataTable
Catch ex As Exception
Stop
End Try
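On the efficiency question: the nested Where ... Any form compares every new row against every old row. A possible alternative, sketched below, builds one composite text key per row from the runtime field list and looks the keys up in a HashSet, so each table is walked only once. The names RowsMissingFrom and keyOf are made up for this illustration, and it assumes that comparing the values as text is acceptable (so 1 and 1.0 would count as different, and NULL would match an empty string).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module AntiJoinSketch
    ' Hypothetical helper: returns the rows of newTable that have no
    ' counterpart in oldTable when compared on the given columns.
    Function RowsMissingFrom(newTable As DataTable, oldTable As DataTable, fieldNames As String()) As List(Of DataRow)
        ' Build one composite text key per row from the chosen columns.
        ' ChrW(31) (unit separator) is assumed not to occur in the data.
        Dim keyOf As Func(Of DataRow, String) =
            Function(r As DataRow)
                Dim parts = fieldNames.Select(Function(f) Convert.ToString(r(f.Trim())))
                Return String.Join(ChrW(31).ToString(), parts)
            End Function

        ' Index the old table once, then probe it for every new row.
        Dim oldKeys As New HashSet(Of String)(oldTable.AsEnumerable().Select(keyOf))
        Dim missing = newTable.AsEnumerable().Where(Function(r) Not oldKeys.Contains(keyOf(r)))
        Return missing.ToList()
    End Function
End Module
```

With the variables from the code above it could be called as RowsMissingFrom(dt2, dt1, id_key.Split(","c)). Check that the returned list is not empty before calling CopyToDataTable on it, because CopyToDataTable throws when there are no rows.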

Related

How to add a column in a SQL expression using a user-defined function through VB.NET (ODBC connection)

I am new to VB.NET and I would like to convert Unix timestamp data and display it in a new column.
Variant 1: a DataGridView whose data source is a DataSet.
I can create empty columns within the SQL query, but unfortunately I can't convert the data directly into the new columns using my own function; the query fails with an SQL error.
The function works on its own.
I got the result of 56,000 records in 7 seconds, without getting date and time values from the Unix timestamp.
Is there a way to use a UDF in the SQL statement?
Variant 2: a DataGridView filled row by row from an OdbcDataReader (ExecuteReader).
With the given code, displaying e.g. 10 records is not a problem, but if there are more than about 100 records (there are around 50 thousand per month), an error like this is displayed (Managed Debugging Assistant ContextSwitchDeadlock: The CLR has been unable to transition from COM context 0xb45680 to COM context 0xb455c8 for 60 seconds.).
After unchecking the ContextSwitchDeadlock option under
Debug > Windows > Exception Settings in VS 2019,
I got the result of 56,000 records in an awful 228 seconds.
Is it possible to optimize the code, or is it possible to use another solution?
Code:
Imports System.Data.Odbc
Public Class Form1
Public strDateTime, strDate, strTime As String
Public x As Integer =0
Public y As Integer =0
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Try
Dim dgvDT As New DataGridView
Dim odbcConn As OdbcConnection
Dim odbcComm As OdbcCommand
Dim odbcAdapter As New Odbc.OdbcDataAdapter
Dim odbcDataset As New DataSet
Dim odbcDataTable As DataTable
Dim strConn, strSQL, strSQL1 As String
dgvDT.Location = New Point(382, 2)
dgvDT.Width = 360
dgvDT.Height = 600
Me.Controls.Add(dgvDT)
strConn = "Driver=Firebird/InterBase(r) driver;User=;Password=;DataSource=...." '
odbcConn = New OdbcConnection(strConn)
odbcConn.Open()
strSQL = "SELECT TEST.UNIXTIMESTAMP, " & "'dd.mm.yyyy" & "'" & "AS Date_ , " & "'hh:mm:ss" & "'" & "AS Time_ " _
& "From TEST " _
& "Where TEST.UNIXTIMESTAMP > 1646092800 " _ '1.3.2022
& "Order By TEST.ID "
strSQL1 = "SELECT TEST.UNIXTIMESTAMP, UnixTimestampToDateOrTime(TEST.UNIXTIMESTAMP,1) As Date_, " & "'hh:mm:ss" & "'" & "AS Time_ " _
& "From TEST " _
& "Where TEST.UNIXTIMESTAMP > 1646092800 " _ '1.3.2022
& "Order By TEST.ID "
odbcComm = New OdbcCommand(strSQL, odbcConn)
'odbcComm = New OdbcCommand(strSQL1, odbcConn)
odbcAdapter.SelectCommand() = odbcComm
odbcAdapter.Fill(odbcDataset, "TEST")
odbcDataTable = odbcDataset.Tables("TEST")
dgvDT.DataSource = odbcDataTable
dgvDT.Columns(0).HeaderText = "UnixTimeStamp"
dgvDT.Columns(1).HeaderText = "Date"
dgvDT.Columns(2).HeaderText = "Time"
dgvDT.Visible = True
Catch ex As Exception
MessageBox.Show("Error: " & ex.Message, "Error")
End Try
End Sub
Private Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click
Try
Dim dgvDT1 As New DataGridView
Dim odbcConn1 As OdbcConnection
Dim odbcComm1 As OdbcCommand
Dim odbcDR As OdbcDataReader
Dim x As Integer = 0
Dim y As Integer = 0
Dim strConn1, strSQL, strSQL2 As String
dgvDT1.Location = New Point(382, 2)
dgvDT1.Width = 360
dgvDT1.Height = 600
For i As Integer = 0 To 2
Dim dgvNC As New DataGridViewTextBoxColumn
dgvNC.Name = "Column" & i.ToString
dgvDT1.Columns.Add(dgvNC)
Next
dgvDT1.Columns(0).HeaderText = "UnixTimeStamp"
dgvDT1.Columns(1).HeaderText = "Date"
dgvDT1.Columns(2).HeaderText = "Time"
dgvDT1.ReadOnly = True
dgvDT1.AllowUserToAddRows = False
dgvDT1.AllowUserToDeleteRows = False
strSQL2 = "SELECT TEST.UNIXTIMESTAMP " _
& "From TEST " _
& "Where TEST.UNIXTIMESTAMP > 1646092800 " _
& "Order By TEST.ID "
strConn1 = "Driver=Firebird/InterBase(r) driver;User=;Password=;DataSource="
odbcConn1 = New OdbcConnection(strConn1)
odbcConn1.Open()
odbcComm1 = New OdbcCommand(strSQL2, odbcConn1)
odbcDR = odbcComm1.ExecuteReader()
While (odbcDR.Read()) 'And y <= 10
dgvDT1.Rows.Add()
dgvDT1.Rows(y).Cells("Column0").Value = (odbcDR.GetValue(0).ToString())
dgvDT1.Rows(y).Cells("Column1").Value = (UnixTimestampToDateOrTime(odbcDR.GetValue(0), 1))
dgvDT1.Rows(y).Cells("Column2").Value = (UnixTimestampToDateOrTime(odbcDR.GetValue(0), 2))
y = y + 1
End While
Me.Controls.Add(dgvDT1)
dgvDT1.Visible = True
Catch ex As Exception
MessageBox.Show("Error: " & ex.Message, "Error")
End Try
End Sub
Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
MsgBox(UnixTimestampToDateOrTime(1646092800, 1) & vbNewLine & UnixTimestampToDateOrTime(1646092800, 2))
End Sub
Public Function UnixTimestampToDateOrTime(ByVal _UnixTimeStamp As Long, ByRef _Parameter As Integer) As String
strDateTime = New DateTime(1970, 1, 1, 0, 0, 0).AddSeconds(_UnixTimeStamp).ToString()
strDate = strDateTime.Substring(0, strDateTime.IndexOf(" "))
strTime = strDateTime.Substring(strDateTime.IndexOf(" "), strDateTime.Length - strDateTime.IndexOf(" "))
If _Parameter = 1 Then
Return (strDate)
Else
Return (strTime)
End If
End Function
End Class
You write:
I would like to convert and display UnixStamp data in a new column.
Yes, you can. But I do not believe that would make your application any faster.
However, I will outline the strategy I personally would use to get the data conversion without a UDF.
Removing the UDF, I believe, would improve your database in general, but it would not fix the performance problems. Because of the amount of information, I could not format this as a mere comment.
So, about removing the UDF from the query.
Link to Firebird 2.5 documentation: https://www.firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-psql-triggers
Link to Firebird 3 documentation:
https://www.firebirdsql.org/file/documentation/html/en/refdocs/fblangref30/firebird-30-language-reference.html
The key difference is that FB3+ has "stored functions", even DETERMINISTIC ones, while FB2 has only "stored procedures".
I would avoid using UDFs here: they are declared obsolete, they make deployment more complex, and they reduce your flexibility (for example, you cannot use Firebird for Linux/ARM if you only have a UDF DLL for Win64).
So, here is the approach I think you can use.
You would have to find an algorithm to "parse" a UNIX date/time into separate values for day, month, etc.
You would have to implement that algorithm in a Procedural SQL stored function (in FB3, make it DETERMINISTIC) or a stored procedure (in FB2, make it selectable by using the SUSPEND command after you set the value of the output parameter and before exit); see Ch. 7 of the manual.
To assemble values of the DATE, TIME or TIMESTAMP types you would have to use integer to string (VARCHAR) to resulting type coercion; see Ch. 3.8 of the manual.
Example:
select
cast( 2010 || '-' || 2 || '-' || 11 AS DATE ),
cast( 23 || ':' || 2 || ':' || 11 AS TIME )
from rdb$database
CAST | CAST
:--------- | :-------
2010-02-11 | 23:02:11
You would have to add the converted columns to your table. Those columns would be read-only, but they would convert the "unix time" to a Firebird native date and/or time type.
Read Ch. 5.4.2 of the manual, about the ALTER TABLE <name> ADD <column> command.
Read Ch. 5.4.1 (CREATE TABLE) of the manual, about calculated fields (columns declared using COMPUTED BY ( <expression> ) instead of an actual datatype). Those are the columns you would have to create, and here is where the difference between FB3 and FB2 kicks in.
In FB3, I believe you would be able to use your PSQL function directly as the expression.
In FB2, I believe you would have to use a rather specific trampoline to coerce a stored procedure into an expression:
ALTER TABLE {tablename}
ADD {columnname} COMPUTED BY
(
(
SELECT {sp_output_param_name} FROM {stored_proc_name}( {table_unixtime_column} )
)
)
In your Visual Basic application you would read those read-only converted columns instead of the original value columns.
Now, this addresses the performance problem. Again, I am writing this as an answer because it is too large to fit in a comment.
No. While UDFs have a number of problems, those are not about performance but about being "old-school" and prone to low-level problems such as memory leaks. UDFs can cause problems if used in other places in a query, by preventing the SQL optimizer from using fast, indexed code paths, but that is not your case. You only use your UDF in the result column list of the SELECT, and there should be no performance trouble there.
Your application/database must therefore have other bottlenecks.
First of all, you use Where TEST.UNIXTIMESTAMP > 1646092800 and Order By TEST.ID in your query. The obvious question is whether your table has indexes on those columns. If not, you force the database server to do a full natural scan to apply the WHERE condition and then to sort using an external temporary file. THAT can be really slow and can scale badly as the table grows.
Use your database design tool of choice to check the query plan of your select.
Does Firebird use indexed or non-indexed access paths?
There are articles online on how to read Firebird's query plans. I don't know of any English-language ones offhand, though. Also, how to get the plan is specific to your database design tool.
Sorry, there is no silver bullet. This is where learning about databases is required.
Read Data Definition Language / Create Index chapter of the documentation.
Link to Firebird 2.5 documentation: https://www.firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-psql-triggers
Link to Firebird 3 documentation: https://www.firebirdsql.org/file/documentation/html/en/refdocs/fblangref30/firebird-30-language-reference.html
Check your database structure to see whether UNIXTIMESTAMP and ID have indexes on them. In general, indexes speed up some (not all) reading queries and slow down (slightly) all writing queries.
You may decide you want to add those indexes if they do not exist yet.
Again, how to check for existing indexes depends on your database design tool. Which kinds of indexes are needed also depends on your data and your applications. That is not something someone else can decide for you.
I am also very suspicious of the odbcAdapter.Fill(odbcDataset, "TEST") command. Basically, you try to read all the data in one go. And you do it via an ODBC connection, which is not native to .NET.
Usually a desktop application only reads the first 100 or so rows. People rarely read anything after the first page or two. Humans are not machines.
Try to connect your visual grid to the select query without reading ALL of the table. There should be a way.
Additionally, there is the free Firebird .NET Provider. It should work natively with VB.NET and should be your first choice, not ODBC.
There is also the commercial IBProvider. Being based on native OLE DB technology, it should be a worse choice than the .NET Provider, but it can work too and has some code examples for VB.NET, and I suppose it is still better maintained than the ODBC driver.
You may also change the Firebird configuration to let it use more RAM for cache. This may somewhat ease the problems of index-less selecting and sorting, but it only eases and offsets them; it does not solve them. You can find articles about it on www.ib-aid.com
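If the conversion has to stay on the client side for now, one option is to fill the DataTable with only the UNIXTIMESTAMP column and add the date and time columns to the table in memory before binding it to the grid once, instead of adding grid rows one at a time. The sketch below is only an illustration of that idea: the helper name AddDateAndTimeColumns is made up, the column names Date_ and Time_ are taken from the question, and the timestamps are assumed to be seconds since 1970-01-01 UTC. Whether it is actually faster on the real data has not been measured here.

```vb
Imports System
Imports System.Data
Imports System.Globalization

Module UnixTimeColumnsSketch
    Private ReadOnly UnixEpoch As New DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)

    ' Hypothetical helper: adds two text columns derived from a column
    ' holding Unix timestamps (seconds, assumed UTC).
    Sub AddDateAndTimeColumns(table As DataTable, unixColumn As String)
        table.Columns.Add("Date_", GetType(String))
        table.Columns.Add("Time_", GetType(String))
        For Each row As DataRow In table.Rows
            If row.IsNull(unixColumn) Then Continue For
            Dim stamp = UnixEpoch.AddSeconds(Convert.ToDouble(row(unixColumn)))
            ' Fixed formats, so the result does not depend on regional settings.
            row("Date_") = stamp.ToString("dd.MM.yyyy", CultureInfo.InvariantCulture)
            row("Time_") = stamp.ToString("HH:mm:ss", CultureInfo.InvariantCulture)
        Next
        ' Mark the rows as unchanged again after filling the extra columns.
        table.AcceptChanges()
    End Sub
End Module
```

It would be called after odbcAdapter.Fill(...) and before setting dgvDT.DataSource. For the value 1646092800 used in the question it yields 01.03.2022 and 00:00:00.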

Best way to extract rows from DataTable (based on date field) and copy to another DataTable

I have a DataTable containing about 30 rows and I need to extract all rows having a date field bigger than a date stored into a variable.
(This code will be executed a lot of times)
I found three ways to do this but I would like to know how to choose because I don't know the differences between various codes.
Here is what I was able to write (and my worries):
1st way (DataTable.Select)
Dim SelectedRows() As DataRow = DT_DBData.Select("In_Date=#" & _
LastDate.ToString("yyyy-MM-dd") & "#")
Using New_Dt As DataTable = SelectedRows.CopyToDataTable
'Some code
End Using
I'm worried about the string format: I'm afraid that some rows may be not extracted because of a date formatting error.
2nd way (query Linq)
Using New_Dt As DataTable = (From DBData In DT_DBData.AsEnumerable() _
Where DBData.Field(Of Date)("In_Date") >= LastDate).CopyToDataTable
'Some code
End Using
I have never used Linq, so I don't know what kind of issues it could give me.
3rd way (For Each Loop + If Then)
Using New_Dt As DataTable = DT_DBData.Clone
For Each dr As DataRow In DT_DBData.Rows
If dr("In_Date") >= LastDate Then
New_Dt.Rows.Add(dr.ItemArray)
End If
Next
'Some code
End Using
I'm not really worried about this code. I only think that the others could be better or faster (but I can't judge that myself).
Faster is kind of irrelevant when dealing with 30 rows.
The first one is kind of wasteful. You start with a DataTable, Select to get a subset, then convert the result into a new DataTable. Time to extract matching Rows: 8 ms.
You can work with the SelectedRows array without putting it into a new DataTable. If it goes back to the DB after "some code", I would not extract it from the DT.
By the way, there is no reason to worry about matching date formats as long as the DB column is a date type (and therefore, the DataTable column will be also). Dates do not have a format; formats are just how computers (and by extension, us) display them to users.
Dim drs = dt.Select(String.Format("StartDate > '{0}'", dtTgt.Date), "")
The date type I pass will compare/filter just fine with the DateTime data for that column. Formats only come into play when you convert them to string, which is mostly only needed for those pesky users.
One option you missed might be especially useful if this will be done over and over: A DataView:
dt.Load(cmd.ExecuteReader)
' create dataview
Dim dv As New DataView(dt)
dv.RowFilter = String.Format("StartDate > '{0}'", dtTgt.Date)
dv.Sort = "StartDate asc"
' show/iterate/whatever
dgv.DataSource = dv
If the data goes back to the DB, using this method, the rows will retain all the rowstate values.
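As for issues with the Linq variant from the question: CopyToDataTable throws an InvalidOperationException when the query returns no rows, and Field(Of Date) throws on rows where the column is DBNull. A minimal sketch that guards against both is shown below; the function name RowsOnOrAfter is invented for the example and the column is assumed to be a DateTime column.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DateFilterSketch
    ' Hypothetical helper: copies the rows whose date column is on or
    ' after lastDate into a new table with the same schema.
    Function RowsOnOrAfter(source As DataTable, columnName As String, lastDate As Date) As DataTable
        Dim isRecent As Func(Of DataRow, Boolean) =
            Function(r) Not r.IsNull(columnName) AndAlso r.Field(Of Date)(columnName) >= lastDate
        Dim matches As List(Of DataRow) = source.AsEnumerable().Where(isRecent).ToList()

        ' CopyToDataTable fails on an empty sequence, so return an
        ' empty table with the same columns instead.
        If matches.Count = 0 Then Return source.Clone()
        Return matches.CopyToDataTable()
    End Function
End Module
```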

Can't generate a StockID above 10

This is the code that tries to grab the largest StockID from the database (Access database) , but my problem is that it generates StockID's up to "S10", after this it simply doesn't increment any further. This is the subroutine that generates the StockID:
Sub generate_Stock_ID()
Dim Stock_start As String = "S"
Dim Stock_Gen As String = "SELECT MAX(StockID) FROM tblStock WHERE StockID LIKE '" & Stock_start & "%%%' "
Dim da As OleDbDataAdapter = New OleDbDataAdapter(Stock_Gen, conn)
Dim ds As DataSet = New DataSet
da.Fill(ds, "StockID")
Dim dt As DataTable = ds.Tables("StockID")
Dim count As Integer = ds.Tables("StockID").Rows.Count
If ds.Tables("StockID").rows.count = 0 Then
StockID = "S1"
Else
StockID = ds.Tables("StockID").Rows(0).Item(0)
StockID = StockID.Substring(1, (StockID.Length - 1))
StockID = Stock_start & (StockID + 1)
End If
End Sub
Note: there are multiple IDs for various other subroutines which all share the same increment issue, so if I fix this I fix the other ones too. At the moment I think my problem lies in the syntax of my SQL statement, but I'm open to suggestions.
Thanks!
Don't treat an Integer as a String. Otherwise MAX or ORDER BY will use lexicographical instead of numerical order, which means that S11 is "lower" than S2.
So you should make this column an int-column and prepend S only where you display it. Then MAX(StockID) returns an Integer, you just have to cast it and add 1:
Using conn As New OleDbConnection("Connection-String")
Using cmd As New OleDbCommand(Stock_Gen, conn)
conn.Open()
Dim stockIDObj As Object = cmd.ExecuteScalar()
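' MAX returns NULL (DBNull) when the table is empty, so check for that too.
If stockIDObj IsNot Nothing AndAlso stockIDObj IsNot DBNull.Value Then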
Dim maxStockId As Int32 = DirectCast(stockIDObj, Int32)
maxStockId += 1
' ...... '
End If
End Using
End Using
You should also change OPTION STRICT to ON. Then this would never compile since the same variable cannot be used for an Object, String and Integer which is very good since it prevents errors.
If you want to keep it as string you have to cast the substring always in the database which is less readable and less efficient. I also don't know how to do it in access.
If you want to change the type of column in an already populated table you should first add a new column with a similar name which is of type int. If all have S at the beginning you could first remove that, then you can update the new column with the casted int value. Finally you can delete the old column and rename the new to the old.
The root of this issue is that StockID is a STRING, and as strings 'S9' > 'S10', so once the table contains 'S10' or higher, MAX keeps returning 'S9' and the next generated ID is always 'S10'.
As a fast fix try to change MAX(StockID) to:
SELECT 'S'+CAST(MAX(CAST(SUBSTRING(StockID,2,100) as int)) as varchar(100))
For ACCESS DB try to use:
SELECT "S" & cstr(MAX(CINT(MID(StockID,2,100))))
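The difference between the two orderings can also be seen on the VB.NET side. The following is a small sketch, not a replacement for the fixes above; NextStockId is a made-up helper and the sample IDs are invented for the illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module StockIdSketch
    ' Hypothetical helper: finds the highest numeric suffix among the IDs
    ' with the given prefix and returns the next ID.
    Function NextStockId(existingIds As IEnumerable(Of String), prefix As String) As String
        Dim highest As Integer = 0
        For Each id As String In existingIds
            If Not id.StartsWith(prefix, StringComparison.Ordinal) Then Continue For
            Dim number As Integer
            If Integer.TryParse(id.Substring(prefix.Length), number) AndAlso number > highest Then
                highest = number
            End If
        Next
        Return prefix & (highest + 1).ToString()
    End Function

    Sub Main()
        Dim ids As String() = {"S1", "S2", "S9", "S10"}
        Console.WriteLine(ids.Max())              ' string comparison picks S9
        Console.WriteLine(NextStockId(ids, "S"))  ' numeric comparison gives S11
    End Sub
End Module
```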

VB.Net - Efficient way of de-duplicating data

I am dealing with a legacy application which is written in VB.Net 2.0 against a SQL 2000 database.
There is a single table which has ~125,000 rows and 2 pairs of fields with similar data.
i.e. FieldA1, FieldB1, FieldA2, FieldB2
I need to process a combined, distinct list of FieldA, FieldB.
Using SQL I have confirmed that there are ~140,000 distinct rows.
Due to a very restrictive framework in the application I can only retrieve the data as either 2 XML objects, 2 DataTable objects or 2 DataTableReader objects. I am unable to execute custom SQL using the framework.
Due to a very restrictive DB access policy I am unable to add a View or Stored Proc to retrieve as a single list.
What is the most efficient way to combine the 2 XML / DataTable / DataTableReader objects into a single, distinct, IEnumerable object for later processing?
I may have missed something here but could you not combine both DataTables using Merge?
DataTableA.Merge(DataTableB)
You can then use DataTableA.AsEnumerable()
Then remove the duplicates.
You can do this with a DataView as follows: dt.DefaultView.ToTable(True,[Column names])
This is the solution I came up with.
Combine the 2 DataTables using .Merge (thanks to Matt's answer)
Using this as a base I came up with the following code to get distinct rows from the DataTable based on 2 columns:
Private Shared Function GetDistinctRows(sourceTable As DataTable, ParamArray columnNames As String()) As DataTable
Dim dt As New DataTable
Dim sort = String.Empty
For Each columnName As String In columnNames
dt.Columns.Add(columnName, sourceTable.Columns(columnName).DataType)
If sort.Length > 0 Then
sort = sort & ","
End If
sort = sort & columnName
Next
Dim lastValue As DataRow = Nothing
For Each dr As DataRow In sourceTable.Select("", sort)
Dim add As Boolean = False
If IsNothing(lastValue) Then
add = True
Else
For Each columnName As String In columnNames
If Not (lastValue(columnName).Equals(dr(columnName))) Then
add = True
Exit For
End If
Next
End If
If add Then
lastValue = dr
dt.ImportRow(dr)
End If
Next
Return dt
End Function
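A different approach, which skips both the Merge and the sort, is to walk the tables once and remember which pairs have already been seen. The sketch below is only an assumption about how that could look: the name DistinctPairs is made up, it assumes the two values of interest are the first two columns of every table, and it compares them as text. It uses only types that have been available since .NET 2.0.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data

Module DistinctPairsSketch
    ' Hypothetical helper: collects the distinct (column 0, column 1)
    ' pairs found in any of the given tables.
    Function DistinctPairs(ParamArray tables As DataTable()) As List(Of KeyValuePair(Of String, String))
        Dim seen As New Dictionary(Of String, Boolean)
        Dim result As New List(Of KeyValuePair(Of String, String))
        For Each table As DataTable In tables
            For Each row As DataRow In table.Rows
                Dim a As String = Convert.ToString(row(0))
                Dim b As String = Convert.ToString(row(1))
                ' ChrW(31) (unit separator) is assumed not to occur in the data.
                Dim key As String = a & ChrW(31) & b
                If Not seen.ContainsKey(key) Then
                    seen.Add(key, True)
                    result.Add(New KeyValuePair(Of String, String)(a, b))
                End If
            Next
        Next
        Return result
    End Function
End Module
```

A call such as DistinctPairs(DataTableA, DataTableB) returns a List, which can be enumerated directly for the later processing.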

DataTable Select(String) Function Help VB .NET

I made a datatable with 2 columns a transactionTime column and a numberOfTransactions column. I made the table with the pre-defined transaction times and want to add the number of transactions from an XML file. I have gotten through the XML file and want to add the data to the correct row. Here is the function:
Function AddRow(ByVal timeOfTransaction As String, ByVal numberOfTransactions As String, ByRef dataTableOfTransactions As DataTable) As String
Dim row() As DataRow = dataTableOfTransactions.Select("transactionTime = timeOfTransaction")
If row(0) IsNot Nothing Then
row(0)("numberOfTransactions") = numberOfTransactions
End If
Return Nothing
End Function
When I run this it overwrites the first element in the table's numberOfTransactions column. I know it has to do with the "transactionTime = timeOfTransaction" part but I can't seem to get it to read timeOfTransaction as a reference to a string instead of a literal. Any help would be much appreciated. Thanks!
You need to write something like this :
Dim row() As DataRow = dataTableOfTransactions.Select("transactionTime=#" & timeOfTransaction & "#")
But be careful with your date/month or month/date format; it depends on your regional settings.
row(0)("numberOfTransactions") = numberOfTransactions
Right there you are telling the program to overwrite that value with number of transactions.
If you want that value you need to set it to something, not set something to it.
Also, if you want your select to work properly try doing it like this
dataTableOfTransactions.Select("transactionTime = " + timeOfTransaction)
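If transactionTime is a text column (an assumption, since the question passes the time as a String), the value has to be wrapped in single quotes inside the filter expression, and it is safer to check the length of the returned array than to test row(0) for Nothing. A minimal sketch, with the invented name SetTransactionCount:

```vb
Imports System
Imports System.Data

Module SelectSketch
    ' Hypothetical helper: writes numberOfTransactions into the first row
    ' whose transactionTime equals timeOfTransaction, if there is one.
    Sub SetTransactionCount(table As DataTable, timeOfTransaction As String, numberOfTransactions As String)
        ' Quote the value and double any embedded single quotes.
        Dim filter As String = String.Format("transactionTime = '{0}'", timeOfTransaction.Replace("'", "''"))
        Dim rows() As DataRow = table.Select(filter)
        ' Select returns an empty array when nothing matches.
        If rows.Length > 0 Then
            rows(0)("numberOfTransactions") = numberOfTransactions
        End If
    End Sub
End Module
```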