SqlDelight/SQLite not executing joins properly? - sqldelight

I have added SqlDelight to my Multiplatform project and have created a few basic tables and queries.
When I try to use more complex queries, though, like JOINs, the query still executes and the generated interfaces contain all the correct fields, but the actual objects do not contain the joined data.
GithubRepository.sq:
CREATE TABLE GithubRepository (
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL
);
Foobar.sq:
CREATE TABLE foobar (
id INTEGER NOT NULL PRIMARY KEY,
foobar TEXT NOT NULL,
repoId INTEGER NOT NULL,
FOREIGN KEY(repoId) REFERENCES GithubRepository(id)
);
findFoobars:
SELECT *
FROM foobar
LEFT JOIN GithubRepository ON GithubRepository.id = foobar.repoId;
insert:
INSERT INTO foobar VALUES ?;
Now, I verify that this is working with a couple of tests, like so:
@Test
fun `Joins working?`() {
    assertEquals(0, queries.findAll().executeAsList().size)
    queries.insert(GithubRepository.Impl(2, "bar"))
    foobarQueries.insert(Foobar.Impl(1, "foo", 2))
    assertEquals("bar", foobarQueries.findFoobars().executeAsList()[0].name)
}
All individual queries succeed and I can write to and read from all the tables as expected.
It just happens that when accessing the query with a JOIN the joined property stays empty.
In a more complex setup I have also tested this on an Android emulator, where I can read from all the individual tables, but not from the joined fields.
Can anyone spot where I missed something?

After some more experimenting, I realized that I was using an in-memory database on Android. After switching that over to a proper file-based SQLite database by passing a database name to the AndroidSqliteDriver constructor, the JOINs are executed correctly and the expected fields are filled correctly.
This seems to be a limitation of the in-memory database driver. I expect that switching the tests over to a file-based one will solve the test failures as well.
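For what it's worth, the schema and JOIN from the question are valid SQLite on their own. A minimal sketch against Python's stdlib driver (same tables, same query; the insert values mirror the test above) shows the join populating the repository fields even in an in-memory database, which suggests the problem was specific to the driver setup rather than the SQL:

```python
import sqlite3

# Recreate the question's schema in plain SQLite and run findFoobars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GithubRepository (
        id INTEGER NOT NULL PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE foobar (
        id INTEGER NOT NULL PRIMARY KEY,
        foobar TEXT NOT NULL,
        repoId INTEGER NOT NULL,
        FOREIGN KEY(repoId) REFERENCES GithubRepository(id)
    );
""")
conn.execute("INSERT INTO GithubRepository VALUES (2, 'bar')")
conn.execute("INSERT INTO foobar VALUES (1, 'foo', 2)")
rows = conn.execute("""
    SELECT foobar.foobar, GithubRepository.name
    FROM foobar
    LEFT JOIN GithubRepository ON GithubRepository.id = foobar.repoId
""").fetchall()
print(rows)  # [('foo', 'bar')] -- the joined name comes back filled
```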

Related

Inserting test data which references another table without hard coding the foreign key values

I'm trying to write a SQL query that will insert test data into two tables, one of which references the other.
Tables are created from something like the following:
CREATE TABLE address (
address_id INTEGER IDENTITY PRIMARY KEY,
...[irrelevant columns]
);
CREATE TABLE member (
...[irrelevant columns],
address INTEGER,
FOREIGN KEY(address) REFERENCES address(address_id)
);
I want ids in both tables to auto increment, so that I can easily insert new rows later without having to look into the table for ids.
I need to insert some test data into both tables, about 25 rows in each. Hardcoding ids for the insert causes issues with inserting new rows later, because the automatic values for the id columns try to start at 1, which is already in the database. So I need to let the ids be generated automatically, but I also need to know which ids were generated so I can insert test data into the member table. I don't believe the autogenerated ids are guaranteed to be consecutive, so I can't assume I can safely hardcode those either.
This is test data - I don't care which record I link each member row I am inserting to, only that there is an address record in the address table with that id.
My thoughts for how to do this so far include:
Insert addresses individually, returning the id, then use that to insert an individual member (cons: potentially messy, not sure of the syntax, harder to see expected sets of addresses/members in the test data)
Do the member insert with a SELECT address_id FROM address WHERE [some condition that will only give one row] for the address column (cons: also a bit messy, involves a quite long statement for something I don't care about)
Is there a neater way around this problem?
I particularly wonder if there is a way to either:
Let the auto increment controlling functions be aware of manually inserted id values, or
Get the list of inserted ids from the address table into a variable which I can use values from in turn to insert members.
Ideally, I'd like this to work with as many (irritatingly slightly different) database engines as possible, but I need to support at least postgresql and sqlite - ideally in a single query, although I could have two separate ones. (I have separate ones for creating the tables, the sole difference being INTEGER GENERATED BY DEFAULT AS IDENTITY instead of just IDENTITY.)
https://www.postgresql.org/docs/8.1/static/functions-sequence.html
Sounds like LASTVAL() is what you're looking for. It will also work in the real world and maintain transactional consistency between multiple sessions, as it's scoped to your session's last insert.
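The same pattern works in SQLite via the driver's last-inserted-rowid. A hedged sketch in Python (the `street` and `name` columns stand in for the question's "irrelevant columns"; only `address_id` and the `address` FK follow the question's schema): insert each address, read back the id the engine generated, and use it for the dependent member row, with no hardcoded ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,  -- auto-generated via rowid
        street TEXT
    );
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        name TEXT,
        address INTEGER,
        FOREIGN KEY(address) REFERENCES address(address_id)
    );
""")
cur = conn.cursor()
for street, name in [("1 Main St", "alice"), ("2 High St", "bob")]:
    cur.execute("INSERT INTO address(street) VALUES (?)", (street,))
    addr_id = cur.lastrowid  # the id the engine just generated (like LASTVAL())
    cur.execute("INSERT INTO member(name, address) VALUES (?, ?)",
                (name, addr_id))

pairs = cur.execute("""
    SELECT member.name, address.street
    FROM member JOIN address ON address.address_id = member.address
    ORDER BY member.member_id
""").fetchall()
print(pairs)  # [('alice', '1 Main St'), ('bob', '2 High St')]
```

In PostgreSQL the equivalent single-statement form would use `INSERT ... RETURNING address_id`.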

How to handle select-or-insert operation with jdbcTemplate in shared DB

I've got some basic case and I'd appreciate some suggestions how to deal with that...
There is some_table in existing DB with two columns:
auto-incremented key
and a name describing the entry
I've got application using Spring 4 and jdbcTemplate to handle DB operations.
Application instance A:
searches for entry with given name SELECT * FROM some_table WHERE name='name1'
if it finds one it gets its key
if not, it creates new entry INSERT INTO some_table (name) VALUES ('name1')
and then it gets its key
use retrieved key for further processing...
Instance B may do the same at the same time.
So it may happen that both applications do SELECT at the same time for the same name that doesn't exist at the moment and then these both will INSERT new entries with same names and different keys.
How to synchronize them - what is the best practice here? Can application somehow lock some_table until it creates new entry or decide that it is not necessary and then release the lock? Is it possible to do this using Spring jdbcTemplate? How?
It seems like some basic case for me for which probably there are some patterns and I'd not want to reinvent the wheel... Any hints?
Or maybe the whole idea of using one DB instance directly by 2 application is a total design flaw?
INSERT IF NOT EXIST done with mutex table seems to do the trick in this case (found here):
INSERT INTO some_table(name) SELECT (?) FROM mutex LEFT OUTER JOIN some_table ON some_table.name = ? WHERE mutex.i = 1 AND some_table.name is null
Where mutex is created as follows:
create table mutex(
i int not null primary key
);
insert into mutex(i) values (0), (1);
Each application can now execute:
INSERT IF NOT EXIST with name = 'name1'
SELECT where name = 'name1', get its key and use it for further processing...
And only one of them will create new entry, so no duplicates should arise.
Now I need some solution for the case where name should be generated as an unique identifier, so the same key retrieved should not be used by 2 application instances...
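To make the mutex trick concrete, here is a minimal sketch in Python against SQLite (the jdbcTemplate version would issue the same two statements). The table and column names follow the question; `get_or_create` is an invented helper name. The INSERT..SELECT inserts nothing when the name already exists, so a second caller just reads back the existing key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (
        id INTEGER PRIMARY KEY,   -- auto-incremented key
        name TEXT
    );
    CREATE TABLE mutex (i INT NOT NULL PRIMARY KEY);
    INSERT INTO mutex(i) VALUES (0), (1);
""")

def get_or_create(name):
    # Inserts only if no row with this name exists yet.
    conn.execute("""
        INSERT INTO some_table(name)
        SELECT ? FROM mutex
        LEFT OUTER JOIN some_table ON some_table.name = ?
        WHERE mutex.i = 1 AND some_table.name IS NULL
    """, (name, name))
    row = conn.execute("SELECT id FROM some_table WHERE name = ?",
                       (name,)).fetchone()
    return row[0]

k1 = get_or_create("name1")
k2 = get_or_create("name1")  # second caller gets the same key, no duplicate
print(k1, k2)  # 1 1
```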

How to prevent adding identical records to SQL database

I am writing a program that recovers structured data as individual records from a (damaged) file and collects the results into a sqlite database.
The program is invoked several times with slightly different recovery parameters. That leads to recovering often the same, but sometimes different data from the file.
Now, every time I run my program with different parameters, it's supposed to add just the newly (different) found items to the same database.
That means that I need a fast way to tell if each recovered record is already present in the DB or not, in order to add them only if they're not existing in the DB yet.
I understand that for each record I want to add, I could first do a SELECT for all columns to see if there is already a matching record in the DB, and only add the new one if no same is found.
But since I'm adding 10000s of records, doing a SELECT for each of these records feels pretty inefficient (slow) to me.
I wonder if there's a smarter way to handle this? I.e, is there a way I can tell sqlite that I do not want duplicate entries, and so it automatically detects and rejects them? I know about the UNIQUE modifier, but that's not it because it applies to single columns only, doesn't it? I'd need to be able to say that the combination of COL1+COL2+COL3 must be unique. Is there a way to do that?
Note: I never want to update any existing records. I only want to collect a set of different records.
Bonus part - performance
In a classic programming language, I'd use a key-value dictionary where the key is the sum of all a record's values. Similarly, I could calculate a Hash code for each added record and look that hash code up first. If there's no match, then the record is surely not in the DB yet; If there is a match I'd still have to search the DB for any duplicates. That'd surely be faster already, but I still wonder if sqlite can make this more efficient.
Try:
sqlite> create table foo (
...> a int,
...> b int,
...> unique(a, b)
...> );
sqlite>
sqlite> insert into foo values(1, 2);
sqlite> insert into foo values(2, 1);
sqlite> insert into foo values(1, 2);
Error: columns a, b are not unique
sqlite>
You could use a UNIQUE column constraint, or to declare a multi-column unique constraint you can use UNIQUE (...) ON CONFLICT:
CREATE TABLE name ( id int, col_name1 type, col_name2 type, UNIQUE (col_name1, col_name2) ON CONFLICT IGNORE )
SQLite has two ways of expressing uniqueness constraints: PRIMARY KEY and UNIQUE. Both of them create an index and so the lookup happens through the created index.
If you do not want to use an SQL approach (as mentioned in other answers) you can do a select for all your data when the program starts, store the data in a dictionary, and work with the dictionary to decide which records to insert into your DB.
The benefit of this approach is the single select is much faster than many small selects.
The disadvantage is that it won't work well if you don't have enough memory to store your data in.
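Putting the accepted approach together for the asker's exact case (a three-column uniqueness requirement): a composite UNIQUE constraint plus INSERT OR IGNORE lets SQLite silently drop exact duplicates, with no per-record SELECT. A minimal sketch, with invented column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        col1 TEXT, col2 TEXT, col3 TEXT,
        UNIQUE (col1, col2, col3)   -- the combination must be unique
    )
""")
rows = [("a", "b", "c"), ("a", "b", "d"), ("a", "b", "c")]  # last is a dup
# OR IGNORE skips rows that would violate the constraint instead of erroring.
conn.executemany("INSERT OR IGNORE INTO records VALUES (?, ?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 2
```

The index backing the UNIQUE constraint is also what makes the duplicate check fast, which addresses the bonus performance question: the hash-dictionary scheme is essentially what the index already does for you.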

sql boolean datatype alternatives

What are the situations when you would use a foreign key to a separate table rather than use a boolean (i.e. BIT in SQL Server)
For example would you replace the following two booleans in this table:
engine_sensors
--------------
id (int, primary key)
name (varchar(50))
fault_code (int)
display_warning (boolean) /* if fault show driver a warning */
is_test_sensor (boolean) /* ignore this sensor in diagnostic checks */
e.g. display_warning here might not display the warning to a driver but display one to a mechanic who is testing the engine. So a separate table would be appropriate.
is_test_sensor could be replaced by sensor_type (FK to sensor_types table) which has types of test,live.
If the fields model a boolean value, I would leave them as booleans. That's why they exist.
I would not attempt to future proof my database design (YAGNI principle), you can always change it at a later date.
This depends why you'd want to avoid it. You could just have number fields with 0 for false and 1 for true. Not sure if there are any benefits though.
Np. Now I see what you are getting at (I think).
If you are asking what I think you are asking than yes, you might want to use a FK to a sensor table and list the sensors. This is typically what I would do...
CREATE TABLE [SensorType](
[Id] [int] NOT NULL,
[Type] [nvarchar](50) NOT NULL,
[DisplayWarningTo] [int] NOT NULL,
[Description] [nvarchar](100) NULL,
CONSTRAINT [PK_SensorType_Id] PRIMARY KEY (Id),
CONSTRAINT [FK_SensorType_WarningReceiver] FOREIGN KEY (DisplayWarningTo) REFERENCES WarningReceiver(Id)
);
CREATE TABLE [WarningReceiver](
[Id] [int] NOT NULL,
[Receiver] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_WarningReceiver_Id] PRIMARY KEY (Id)
);
------
INSERT INTO WarningReceiver(Id, Receiver) VALUES (1, 'Mechanic');
INSERT INTO WarningReceiver(Id, Receiver) VALUES (2, 'Driver');
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (1, 'Rear sensor', 2);
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (2, 'Test sensor', 1);
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (3, 'Production sensor', 2);
I tend not to use identity columns on 'type' things like this and specify my own id which I map directly to a C# enumerated constant like
public enum SensorType
{
RearSensor = 1,
TestSensor = 2,
ProductionSensor = 3
}
Then in your code when you pull out your engine sensor from the database you can just compare against your enum. e.g.
var engine_sensor = // get engine sensor from db.
if (engine_sensor == (int)SensorType.RearSensor)
{
// do something
}
else if (engine_sensor == (int)SensorType.TestSensor)
{
// display something to mechanic or whatever
}
I don't really know what your application domain is, so sorry if this doesn't make sense.
So to wrap up a couple of points and try and answer your question;
Yes I do think you are better off with FK's
You could just have them as int columns and define the sensors in code as I did with the enum
I tend to do both
--- define the enum for nice strong typing in code and
--- create a foreign key table to complete my schema in the database. It's still worth having this for two reasons; 1) When writing sql queries in management studio or something and you're looking at your engine_sensors table and see numbers for the sensor type, you can join on your FK table to see what the sensors are. Makes things a bit easier
Lastly, if you have a FK table it enforces referential integrity and restricts the values you can put in as sensor types to what you have defined in the sensor type table.
Hope this helps somewhat.
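The referential-integrity point can be sketched quickly in plain SQLite (a hypothetical, simplified version of the schema above; note SQLite needs foreign key enforcement switched on explicitly): a sensor row whose type id isn't in the type table is rejected at insert time, which is exactly what a bare int or bit column cannot give you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE SensorType (Id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE EngineSensor (
        Id INTEGER PRIMARY KEY,
        Name TEXT,
        SensorTypeId INTEGER REFERENCES SensorType(Id)
    );
    INSERT INTO SensorType VALUES (1, 'Rear sensor'), (2, 'Test sensor');
""")
conn.execute("INSERT INTO EngineSensor VALUES (1, 'left rear', 1)")  # valid type
try:
    conn.execute("INSERT INTO EngineSensor VALUES (2, 'bogus', 99)".replace("INTO", "INTO"))
    inserted_bad_row = True
except sqlite3.IntegrityError:
    inserted_bad_row = False  # 99 is not a known sensor type
print(inserted_bad_row)  # False
```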
You have added one piece of information
'Its is a SQL Server database and I understand that you'd use a bit field. I was wondering on how bad an idea it is to replace them with foreign keys to a separate table'
You should edit your original question and put this in if this is important to the solution you seek so that more people will be able to easily find it.
But you still haven't said why you were wondering about replacing them with a FK. If you tell people what your end goal is or what you are trying to achieve, they are more likely to provide a range of solutions.
I'm sorry I can't suggest a solution. Is a foreign key (to what?) better than a boolean value? compared to ?
I think you need to clarify / re-structure your question a bit. Good luck.
engine_sensors
--------------
id (primary key)
name
fault_code

display_warning_engine_sensors /* if fault show driver a warning */
------------------------------
id (primary key, FK to engine_sensors)

test_sensors /* ignore this sensor in diagnostic checks */
------------
id (primary key, FK to engine_sensors)
Remember: codes are poor, tables are rich. Regardless of how contradictory this seems: never use booleans to represent truth-valued information. The relational model already has a way of representing truth-valued information, and that is as the presence of some tuple in some relation that is the value of some relvar (in SQL terms: as the presence of some row in a table).
You cannot easily "extend" booleans to add extra functionality such as "display a warning only if the time is within the range [x hrs - y hrs]".
Relational theory can help to answer the questions for you. WHERE the booleans go (in the table as shown or in a separate table as described in your example) should be determined by what the boolean data element is dependent (and fully dependent) on.
For example, if the data element "display_warning" depends only on the sensor (the entity your table is describing), then that column belongs in that table. However if it depends on other entities (the person - owner or mechanic - interacting with the sensor), then it more properly belongs in a different table where it can be fully and only dependent on the primary key of that table.
Ensuring that the data elements are dependent on the key (and nothing other than the key) is arrived at through "normalization" of the data. It's much too involved to include in an answer here, but there are many references on the web that will help you understand normalization more fully. Wikipedia is as good a place to start as any:
http://en.wikipedia.org/wiki/Database_normalization#Normal_forms
Indexing is one reason to avoid boolean types. You can't usefully index boolean fields, so if you have many records and often search on fields that are boolean, having a separate table might be helpful. A separate sensor type table is also more extensible if you add more types. It is also helpful if a one-to-many relationship emerges for a particular sensor type - suppose an engine needed to have three of a particular type of sensor.
Using 'bit' in SQL server (0 or 1) will automatically map back to a boolean in linq to SQL if you still wanted a boolean in your code.
But you haven't stated what database or why you want an alternative to bool.
First, I'd like to confirm Oded. If you need boolean values you should use the responsible column types.
However, if you have to store very much different boolean values it is sometimes more useful to use old-school bit-masks, stored in an INT, BIGINT or even BINARY column.
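The bit-mask idea mentioned above packs many boolean flags into one integer column and tests them with bitwise AND. A short sketch (the flag names are invented examples; the single `flags` int is what would be stored in the INT/BIGINT column):

```python
# Each flag occupies one bit of the stored integer.
DISPLAY_WARNING = 1 << 0
IS_TEST_SENSOR  = 1 << 1
NEEDS_SERVICE   = 1 << 2

flags = DISPLAY_WARNING | NEEDS_SERVICE  # the single value stored in the DB

def has(flags, bit):
    """True if the given flag bit is set in the stored value."""
    return flags & bit != 0

print(has(flags, DISPLAY_WARNING))  # True
print(has(flags, IS_TEST_SENSOR))   # False
```

The trade-off is the same one the other answers warn about: the individual flags are invisible to SQL, so you can't easily filter or constrain on them server-side.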

Arrays in database tables and normalization

Is it smart to keep arrays in table columns? More precisely I am thinking of the following schema which to my understanding violates normalization:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
Permissions varchar(max) not null default(''),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
and this:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
PermissionID int not null default(-1),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
UPD3: I envision Permissions as a comma-delimited string since MSSQL is our primary deployment target.
UPD: Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only
UPD2: I envision the typical usage scenario as following:
int category_id=42;
int[] array_of_groups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
throw new StarveToDeathException();
}
Thoughts?
Thanks in advance!
I'd suggest to take the normalized road for the following reasons:
By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and delete the record only when you remove the last permission.
Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
Performance-wise, I think it will actually be faster to get a resultset of id's than having to parse a string to integers. To be measured with actual data and implementation...
Your second example should probably be:
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)
Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.
Is it smart
Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.
If you can see no way in which a table with one row for each item would ever be useful then I'd suggest that the encapsulate-in-a-string might be considered.
In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.
In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.
If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.
If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).
The problem with the first implementation is that it doesn't actually use an array but a concatenated string.
This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.
If you were using a database that natively supports arrays as an atomic value, such as PostgreSQL, then the argument would be different.
Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2 (assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.
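A sketch of that check against the normalized (second) schema, in SQLite for brevity rather than the asker's MSSQL target; the Permissions.Check call from UPD2 collapses to one COUNT query, with EatAndDrink assumed to have PermissionID 2 as in the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Permissions (
        GroupID INTEGER NOT NULL,
        CategoryID INTEGER NOT NULL,
        PermissionID INTEGER NOT NULL,
        PRIMARY KEY (GroupID, CategoryID, PermissionID)
    );
    INSERT INTO Permissions VALUES (42, 42, 2);  -- group 42 may EatAndDrink
""")

def check(category_id, group_ids, permission_id):
    # One placeholder per group id in the IN (...) list.
    placeholders = ",".join("?" * len(group_ids))
    sql = ("SELECT COUNT(*) FROM Permissions "
           "WHERE CategoryID = ? AND PermissionID = ? "
           f"AND GroupID IN ({placeholders})")
    (n,) = conn.execute(sql, (category_id, permission_id, *group_ids)).fetchone()
    return n > 0

allowed = check(42, [40, 2, 42], 2)
print(allowed)  # True: group 42 holds the permission in category 42
```

No string parsing is involved, and the composite primary key's index serves the lookup.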