VB.NET Access Database 255 Columns Limit - vb.net

I'm currently developing an application for a client using Visual Basic .NET. It's a rewrite of an application that accessed an Oracle database, filtered the columns, and performed some actions on the data. Now, for reasons beyond my control, the client wants to use an Access (.mdb) database for the new application. The problem with this is that the tables have more than the 255 columns Access supports, so the client suggested splitting the data into multiple databases/tables.
Well, even when the tables are split, at some point I have to query all columns simultaneously (I did an INNER JOIN on both tables), which, of course, yields an error. The limit apparently applies to the number of columns a query can touch at once, not just to the total number of columns in a table.
Is there a possibility to circumvent the 255-column limit somehow? I was thinking in the direction of using LINQ to combine queries of both tables, i.e. have an adapter that emulates a single table I can run queries against. A drawback of this is that .mdb is not a first-class citizen of LINQ-to-SQL (i.e. no insert/update support etc.).
As a workaround, I might be able to rewrite my code so that I only need all columns at a single point (I dynamically create control elements depending on the column names in the table). There I would then query, say, the first 250 columns and after that the following 150.
Is there an Access SQL query that can achieve something like this? I thought of something like SELECT TOP 255 * FROM dbname or SELECT * FROM dbname LIMIT 1,250, but these are not valid.
Do I have other options?
Thanks a lot for your suggestions.

The ADO.NET DataTable object has no real limit on the number of columns it can contain.
So, once you have split the big table into two tables with fewer columns each and given both sub-tables the same primary key, you can use the DataTable.Merge method on the VB.NET side.
The example on MSDN shows two tables with the same schema being merged, but it also works if the two tables have totally different schemas and just the primary key in common:
' Load the two halves of the split table; both DataTables must have the
' same primary key set so that Merge matches rows instead of appending them.
Dim firstPart As DataTable = LoadFirstTable()
Dim secondPart As DataTable = LoadSecondTable()
' Adds secondPart's extra columns to firstPart and merges its rows by key.
firstPart.Merge(secondPart)
I have only tested this with a single column of difference, so I am not sure how viable this solution is in terms of performance.
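For reference, LoadFirstTable and LoadSecondTable above are assumed to each run an ordinary query against one of the split .mdb tables, each staying under the column limit and returning the shared key. For example (table and column names here are made up):
SELECT ID, CustomerName, Address FROM BigTable_Part1
SELECT ID, OrderTotal, Notes FROM BigTable_Part2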

As far as I know, there is no way to directly bypass this limit in Access.
If you cannot change the database, the only way I can think of is to write a wrapper that knows where each field lives, automatically splits a query into several queries, and then regroups the results into a custom class containing all the columns for every row.
For example, you can split every table into several tables, duplicating the fields you filter on.
TABLEA
Id | ConditionFieldOne | ConditionFieldTwo | Data1 | Data2 | ... | DataN |
into
TABLEA_1
Id | ConditionFieldOne | ConditionFieldTwo | Data1 | Data2 | ... | DataN/2 |
TABLEA_2
Id | ConditionFieldOne | ConditionFieldTwo | Data(N/2)+1 | Data(N/2)+2 | ... | DataN |
so that a query like
SELECT * FROM TABLEA WHERE ConditionFieldOne = 'condition'
becomes, with the wrapper,
SELECT * FROM TABLEA_1 WHERE ConditionFieldOne = 'condition'
SELECT * FROM TABLEA_2 WHERE ConditionFieldOne = 'condition'
and the wrapper then joins the two result sets (e.g. on Id) back together.

Related

Redshift IN condition on thousands of values

What's the best way to get data that matches any one of ~100k values?
For this question, I'm using an Amazon Redshift database and have a table something like this with hundreds of millions of rows:
--------------------
| userID | c1 | c2 |
--------------------
| 101000 | 12 | 'a'|
| 101002 | 25 | 'b'|
--------------------
There are also millions of unique userIDs. I have a CSV list of 98,000 userIDs that I care about, and I want to do math on the columns for those specific users.
select c1, c2 from table where userID in (10101, 10102, ...)
What's the best solution to match against a giant list like this?
My first approach was to write a Python script that read in the results for all users and then filtered them against the CSV in Python. It was dead slow and wouldn't work in all scenarios, though.
A coworker suggested uploading the 98k users into a temporary table and then joining against it in the query. This seems like the smartest way, but I wanted to ask if you all had other ideas.
I also wondered whether generating an insanely long SQL query containing all 98k users to match against and running it would work. Out of curiosity, would that even have run?
As your coworker suggests, put your IDs into a temporary table by uploading a CSV to S3 and then using COPY to import the file into a table. You can then use an INNER JOIN condition to filter your main data table on the list of IDs you're interested in.
An alternative option, if uploading a file to S3 isn't possible for you, could be to use CREATE TEMP TABLE to set up a table for your list of IDs and then use a spreadsheet to generate a long list of INSERT statements to populate it. 100k individual inserts could be quite slow, though.
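A rough sketch of the temp-table approach (the bucket path, IAM role, and table names are placeholders):
-- Stage the 98k IDs in a temporary table.
CREATE TEMP TABLE wanted_users (userID BIGINT);
COPY wanted_users
FROM 's3://your-bucket/wanted_users.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
CSV;
-- Filter the big table by joining against the staged IDs.
SELECT t.c1, t.c2
FROM your_table t
INNER JOIN wanted_users w ON w.userID = t.userID;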

Modifying column in Access

I have 2 tables in MS Access, TableA and TableB. TableA has only one field, myFieldId, and TableB has only one field, myFieldName (in reality I have more fields, but these are the ones that matter for the sake of my problem).
Both tables have records that mean the same thing but are written in different, though similar, ways.
For instance, TableA has:
|TableA.myFieldId |
|-----------------|
|MM0001P |
|HR0003P |
|MH0567P |
So as you can see, all of the records are formatted this way (with a P at the end):
([A-Z][A-Z][0-9][0-9][0-9][0-9]P)
Then TableB has:
|TableB.myFieldName |
|--------------------------------------------|
|MH-0567 Materials Handling important Role |
|MM-0001 Materials Management Minor Role |
|HR-0003 Human Resources Super Important Role|
So this one has the format (without 'P' at the end):
([A-Z][A-Z]-[0-9][0-9][0-9][0-9] ([A-Z]|[a-z]*))
First, I would like to run join queries on TableA and TableB using these fields, but as you can see, the results will be NULL every time since the two fields store values in completely different formats.
So I would like to replace every ID in TableA.myFieldId with its corresponding name from TableB.myFieldName.
The problem is that both tables have around 1 million records, the values are repeated multiple times in both tables, and I don't know how to do this (MS Access doesn't even let me use regular expressions).
I would make a table (or query, if it changes often enough) of all unique entries in the second table and the corresponding key for the first table, then use that table or query to help join the two tables.
Something like:
Select myFieldName as FName, left(myFieldName,2) & mid(myFieldName,4,4) & "P" as FID
from TableB
group by FName, FID
Important note - are all IDs found in both tables, or do you have records in either table that are not in the other? If they don't always match, you may need additional logic or steps to build a master table from both TableA and TableB.
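If the SELECT above is saved as an Access query (say qryFieldMap, a made-up name), the join could then look something like:
SELECT a.myFieldId, m.FName
FROM TableA AS a
INNER JOIN qryFieldMap AS m
    ON m.FID = a.myFieldId;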

PostgreSQL query using multiple WHERE conditions

I am wondering if there is a simple/smart way to run a query against a PostgreSQL database. I have a table whose columns look something like this:
measurementPointID | parameterA | parameterB | measurement | measurementTIME
There are some dozens of records within the database.
I would like to run a query that retrieves data only for a given set of measurementPointIDs. There are several tens of thousands of measurementPointID values that I need to retrieve, and I have all of these available in, for example, a CSV file.
The query should do a GROUP BY measurementTIME and an ORDER BY measurementTIME as well. One detail is that if the measurement is zero (measurement = 0), there is no row for that measurementPointID at all.
Am I trying to do something too complicated or in a stupid way?
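For what it's worth, the staged-ID pattern from the Redshift answer above applies here too. A rough, untested sketch (the table name, CSV path, and choice of aggregate are assumptions):
-- Stage the measurement point IDs from the CSV file.
CREATE TEMP TABLE wanted_points (measurementPointID bigint);
COPY wanted_points FROM '/path/to/points.csv' WITH (FORMAT csv);
-- Aggregate per time stamp for just those points.
SELECT m.measurementTIME, sum(m.measurement) AS total_measurement
FROM measurements m
JOIN wanted_points w ON w.measurementPointID = m.measurementPointID
GROUP BY m.measurementTIME
ORDER BY m.measurementTIME;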

Using a list with a selected item as a value

Let us consider the following table structures:
Table1
| Table1_ID | A     |
|-----------|-------|
| 1         | A1    |
| 2         | A1;B1 |
and
Table2
| Table2_ID | Table1_ID | B      | C      |
|-----------|-----------|--------|--------|
| 1         | 1         | foobar | barfoo |
| 2         | 2         | foofoo | barbar |
The view I'm using is defined by the following query:
SELECT Table1.A, B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;
95% of A's data consists of a two-character string. In that case, the view works fine. However, the other 5% is actually a list (using a semicolon as a separator) of possible values for this field.
This means my users would like to choose between these values when it is appropriate, and keep using the single value automatically the rest of the time. Of course, this is not possible with a single INNER JOIN, since there cannot be a constant selected value.
Table2 is very large, while Table1 is quite small. Manually filling a local A field in each row within Table2 would be a huge waste of time.
Is there an efficient way for SQL (or, more specifically, SQL Server 2008) to handle this, such as a list with a selected item within a field?
I was planning to add a "A_ChosenValue" field that would store the chosen value when there's a list in A, and remain empty when A only stores a single value. It would only require users to fill it 5% of the time, which is okay. But I was thinking there could be a better way than using two columns to store a single value.
Ideally you would just alter your schema and add a new entity to support the many-to-many relationship between Table1 and Table2, such as the following table with a compound key across all three columns.
Table3
| Table1_ID | Table2_ID | A  |
|-----------|-----------|----|
| 1         | 1         | A1 |
| 2         | 2         | A1 |
| 2         | 2         | B1 |
You could then select from and join on this table, and because it is indexed you won't lose any performance.
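For example, the view's query might become something along these lines (a sketch; the column list is kept from the original view):
SELECT t3.A, t2.B, t2.C
FROM Table2 AS t2
INNER JOIN Table3 AS t3
    ON t3.Table2_ID = t2.Table2_ID;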
Without altering the table structure or normalizing the data, it is possible to do this with a conditional select statement like the one shown in this SO post, but the query wouldn't perform as well because you would have to use a function to split the values containing a semicolon.
Answering my own question:
I added a LocalA column to Table1 so that my view now selects ISNULL(LocalA, Table1.A). The displayed value therefore equals A by default, and users can manually override it to select a specific value when A stores a list.
I am not sure whether this is the most efficient solution or not, but at least it works without requiring two columns in the view.
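In other words, the view's SELECT changes to something along these lines (a sketch based on the original view definition above):
SELECT ISNULL(Table1.LocalA, Table1.A) AS A, B, C
FROM Table2
INNER JOIN Table1 ON Table1.Table1_ID = Table2.Table1_ID;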

PostgreSQL query - finding presence of id in any field of a record

I have two tables which look like the following:
tools:
id | part name
---------------
0 | hammer
1 | sickle
2 | axe
people:
personID | ownedTool1 | ownedTool2 | ownedTool3 ..... ownedTool20
------------------------------------------------------------------
0 | 2 | 1 | 3 ... ... 0
I'm trying to find out how many people own a particular tool. A person cannot own multiple copies of the same tool.
The only way I can think of doing this is something like
SELECT COUNT(*)
FROM tools JOIN people ON tools.id = people.ownedTool1 OR tools.id = people.ownedTool2 ... and so on
WHERE tools.id = 0
to get the number of people who own hammers. I believe this will work; however, it involves having 20 OR conditions in the query. Surely there is a more appropriate way to form such a query, and I'm interested to learn how to do this.
You shouldn't have 20 columns each possibly containing an ID in the first place. You should establish a properly normalized schema. If a tool can belong to only one user, but a user can have multiple tools, you should establish a one-to-many relationship: each tool row carries the id of the user it belongs to. If a tool can belong to one or more users, you need a many-to-many relationship, which requires an intermediate table containing rows of user_id-to-tool_id mappings. Having the schema set up appropriately like that will make the query you're looking to perform trivial.
In your particular case it seems like a user can have many tools and a tool can be "shared" by many users. For this many-to-many relationship, all you have to do is count the number of rows in the intermediate table that have the desired tool_id.
Something like this:
SELECT COUNT(ID) FROM UserTools WHERE ToolID = #desired_tool_id
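For context, the UserTools table used above might be declared along these lines (a sketch; everything apart from the ID and ToolID columns is an assumption):
CREATE TABLE UserTools (
    ID     serial PRIMARY KEY,  -- surrogate key counted in the query above
    UserID integer NOT NULL,    -- the owning person
    ToolID integer NOT NULL,    -- the owned tool (tools.id)
    UNIQUE (UserID, ToolID)     -- a person cannot own the same tool twice
);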
Googling the relationship terms above should get you pointed in the right direction. If you're stuck with the current schema, then the way you pointed out is the only way to do it.
If you cannot change the model (and I'm sure you will tell us that), then the only sensible way to work around this broken data model is to create a view that gives you a normalized view (pun intended) of the data:
create view normalized_people
as
select personid,
       ownedTool1 as toolid
from people
union all
select personid,
       ownedTool2 as toolid
from people
union all
select personid,
       ownedTool3 as toolid
from people
... you get the picture ...
Then your query is as simple as
select count(personid)
from normalized_people
where toolid = 0;
You received your (warranted) lectures about the database design.
As to your question, there is a simple way:
SELECT count(*) AS person_ct
FROM tbl t
WHERE translate((t)::text, '()', ',,')
~~ ('%,' || #desired_tool_id::text || ',%')
Or, if the first column is person_id and you want to exclude that one from the search:
SELECT count(*) AS person_ct
FROM tbl t
WHERE replace((t)::text, ')', ',')
~~ ('%,' || #desired_tool_id::text || ',%')
Explanation
Every table is accompanied by a matching composite type in PostgreSQL. So you can query any table this way:
SELECT (tbl) FROM tbl;
Yields one column per row, holding the whole row.
PostgreSQL can cast such a row type to text in one fell swoop: (tbl)::text
I replace both parens () with a comma , so every value of the row is delimited by commas ,.
My second query does not translate the opening parenthesis, so the first column (person_id) is excluded from the search.
Now I can search all columns with a simple LIKE (~~) expression using the desired number delimited by commas: ~~ '%,17,%'
Voilà: all done with one simple command. This is reliable as long as you don't have columns like text or int[] in your table that could also contain ,17, within their values, or additional numeric columns, which could lead to false positives.
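To illustrate with the sample row from the question (only the first few of the 20 tool columns shown):
-- Person 0 owns tools 2, 1, 3, ...: (t)::text gives '(0,2,1,3,...)' and
-- translate() turns it into ',0,2,1,3,...,' so ~~ '%,2,%' matches tool id 2.
SELECT translate((t)::text, '()', ',,') FROM people t;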
It won't deliver performance wonders as it cannot use standard indexes. (You could create a GiST or GIN index on an expression using the pg_trgm module in pg 9.1, but that's another story.)
Anyway, if you want to optimize, you'd better start by normalizing your table layout as has been suggested.