Express a ternary SQL data type

Context
I just came across a table in a PostgreSQL database that does nothing but define a triplet of coded values, which are used across the whole database as a ternary data type. At first glance this astonished me and felt odd: shouldn't there be a ternary data type?
I've searched the web, especially the PostgreSQL documentation, without any apparent success (perhaps I used the wrong search keywords), but maybe there is no other solution.
Question
I would like to know whether there is a ternary (as opposed to binary or boolean) data type in PostgreSQL, or more generally in SQL, that can express a "ternary state" (or "ternary boolean", which is clearly an abuse of language). As a general idea, I would represent it as:
+----+---------+-------------------+
| id | type    | also expressed as |
+----+---------+-------------------+
| 0  | false   | 0                 |
| 1  | true    | 1                 |
| 2  | unknown | 2                 |
+----+---------+-------------------+
where unknown can be whatever third state you are actually dealing with.

I would like to know whether there is a ternary (as opposed to binary or boolean) data type
Actually, the boolean data type is ternary because it can have the values true, false and null.
Consider this table:
create table data (some_number int, some_flag boolean);
And the following data:
insert into data (some_number, some_flag)
values (1, true), (2, false), (3, null);
Then the following:
select *
from data
where some_flag = false;
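will return only one row (the one with some_number = 2). The row where some_flag is null is not returned, because null = false evaluates to null rather than true.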
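A minimal sketch of the same behaviour, runnable from Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for PostgreSQL; SQLite has no real boolean type and stores the flags as the integers 1 and 0, so the comparisons are written against 0. The helper numbers_where is a hypothetical name used only for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; booleans are stored as 1 and 0.
conn = sqlite3.connect(":memory:")
conn.execute("create table data (some_number int, some_flag boolean)")
conn.executemany(
    "insert into data (some_number, some_flag) values (?, ?)",
    [(1, 1), (2, 0), (3, None)],
)


def numbers_where(condition):
    """Return some_number for every row that satisfies the condition."""
    sql = "select some_number from data where " + condition + " order by some_number"
    return [row[0] for row in conn.execute(sql)]


is_false = numbers_where("some_flag = 0")        # rows where the flag is false
is_not_false = numbers_where("some_flag <> 0")   # rows where the flag is true
is_unknown = numbers_where("some_flag is null")  # rows where the flag is null

print(is_false, is_not_false, is_unknown)
```

The row with the null flag shows up in neither the "= 0" nor the "<> 0" result; it has to be asked for explicitly with is null, which is what makes the column behave as a three-state value.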

There is no specific ternary operator, but you could use CASE:
select case when operator = 0 then 'false'
            when operator = 1 then 'true'
            when operator = 2 then 'unknown'
            else 'not managed'
       end
from your_table

I second a_horse_with_no_name's solution for your specific example, but the more general approach is to use an enum data type:
CREATE TYPE ternary AS ENUM (
'never',
'sometimes',
'always'
);
Constants of such a data type are written as string constants, e.g. 'never', but the internal storage uses 4 bytes per value, regardless of the length of the label.

Related

Store int, float and boolean in same database column

Is there a sane way of storing int, float and boolean values in the same column in Postgres?
I have something like this:
rid | time | value
2d9c5bdc-dfc5-4ce5-888f-59d06b5065d0 | 2021-01-01 00:00:10.000000 +00:00 | true
039264ad-af42-43a0-806b-294c878827fe | 2020-01-03 10:00:00.000000 +00:00 | 2
b3b1f808-d3c3-4b6a-8fe6-c9f5af61d517 | 2021-01-01 00:00:10.000000 +00:00 | 43.2
Currently I'm using jsonb to store it. The problem is that I can't filter the table with, for instance, the greater-than operator.
The query
SELECT *
FROM points
WHERE value > 0;
gives back the error:
ERROR: operator does not exist: jsonb > integer: No operator matches the given name and argument types. You might need to add explicit type casts.
For me it's okay to handle boolean as 1 or 0 in case of true or false. Is there any possibility to achieve that with jsonb or is there maybe another super type which lets me use a column that is able to use all three types?
Performance is not so much of a concern here, as I'm going to have very few records inside of that table, max 5k I guess.
If you were just storing integers and floats, normally you'd use a float or numeric column.
But there's that pesky true.
You could cast the JSON...
select *
from test
where value::float > 1;
...but there's that pesky true.
You have to convert the boolean to a number to make it work.
select *
from test
where
(case when value = 'true' then 1.0 when value = 'false' then 0.0 else value::float end) >= 1;
Or ignore it.
This having to work around the type system suggests that value is actually two or even three different fields crammed into one. Consider separating them into multiple columns.
You should skip the rows where value is not a number and cast the value to numeric, e.g.:
with points(id, value) as (
values
(1, 'true'::jsonb),
(2, '2'),
(3, '43.2')
)
select *
from points
where jsonb_typeof(value) = 'number'
and value::text::numeric > 0;
id | value
----+-------
2 | 2
3 | 43.2
(2 rows)
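If the filtering ever has to happen on the client instead, the same "keep only numbers, then compare" idea can be sketched in Python. This is only an illustration under assumptions: the points list mirrors the sample rows above, and is_json_number is a hypothetical helper, not part of any library.

```python
import json

# Sample rows mirroring the query above: (id, raw JSON text).
points = [(1, "true"), (2, "2"), (3, "43.2")]


def is_json_number(value):
    """True for JSON numbers only; bool is a subclass of int in Python, so exclude it."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


positive = []
for point_id, raw in points:
    value = json.loads(raw)
    # Skip anything that is not a number before comparing, like jsonb_typeof does.
    if is_json_number(value) and value > 0:
        positive.append((point_id, raw))

print(positive)
```

The explicit bool check matters: without it, true would be treated as the integer 1 and pass the "> 0" test.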
I actually found out that, regardless of what a jsonb field contains, you can compare it to another jsonb value in Postgres. That means I can, for instance, do the following:
SELECT *
FROM points
WHERE val > '5'
This correctly gives me back only the third row. It just ignores the bool value. To filter for a certain bool, I can use the following query:
SELECT *
FROM points
WHERE val = 'true'
This is good enough for me. I could even hold timestamps in the json column and compare them using this approach.
After all your comments, another way of solving the problem seems to be to make the column numeric. This would work as well, but requires more client-side conversion, as I would need a second type column that remembers what the actual type is. That type would then be used on the client side to convert the value back into its original form. For integers it's trivial; for booleans, as @schwern suggested, one can use 1 and 0; for dates, one could use the Unix timestamp representation.
When I then want to search for a certain value, the type has to be included in the where clause as well.

How to do a count of fields in SQL with wrong datatype

I am trying to import legacy data from another system into our system. The problem I am having is that the legacy data is dirty - very dirty! We have a field which should be an integer, but sometimes is a varchar, and the field is defined as a varchar...
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
Thanks
If you want to find rows¹ where a column contains any non-digit characters or is longer than 9 characters (either condition means that we cannot assume it would fit in an int), use something like:
SELECT * FROM Table WHERE LEN(ColumnName) > 9 or ColumnName LIKE '%[^0-9]%'
Note that there's a negation in the LIKE condition - we're trying to find a string that contains at least one non-digit character.
A more modern approach would be to use TRY_CAST or TRY_CONVERT. But note that a failed conversion returns NULL and NULL is perfectly valid for an int!
SELECT * FROM Table WHERE ColumnName is not null and try_cast(ColumnName as int) is null
ISNUMERIC isn't appropriate. It answers a question nobody has ever wanted to ask (IMO) - "Can this string be converted to any of the numeric data types (I don't care which ones and I don't want you to tell me which ones either)?"
ISNUMERIC('$,,,,,,,.') is 1. That should tell you all you need to know about this function.
¹ If you just want a count, as per the title of the question, then substitute COUNT(*) for *.
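The length-or-non-digit check can be tried out with a rough sketch in Python. It assumes SQLite (standard sqlite3 module) in place of SQL Server: SQLite's LIKE has no character classes, so GLOB '*[^0-9]*' stands in for LIKE '%[^0-9]%', and length() for LEN(). The table T and its values are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; GLOB replaces the bracketed LIKE pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Data VARCHAR(50))")
conn.executemany(
    "INSERT INTO T VALUES (?)",
    [("102",), (None,), ("11Blah",), ("5",), ("Unknown",), ("1ThinkPad123",), ("-11",)],
)

# Rows that are too long for an int or contain at least one non-digit character.
suspect = [
    row[0]
    for row in conn.execute(
        "SELECT Data FROM T "
        "WHERE length(Data) > 9 OR Data GLOB '*[^0-9]*' "
        "ORDER BY rowid"
    )
]

print(suspect, len(suspect))
```

Two things are worth noticing in the result: the NULL row is not reported, and '-11' is reported even though it is a valid integer, because the minus sign counts as a non-digit character.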
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
I would do it like this:
CREATE TABLE T
(
Data VARCHAR(50)
);
INSERT INTO T VALUES
('102'),
(NULL),
('11Blah'),
('5'),
('Unknown'),
('1ThinkPad123'),
('-11');
SELECT Data -- Per the title COUNT(Data)
FROM
(
SELECT Data,
cast('' as xml).value('sql:column("Data") cast as xs:int ?','int') Result
FROM T --You can add WHERE Data IS NOT NULL to exclude NULLs
) TT
WHERE Result IS NULL;
Returns:
+----+--------------+
| | Data |
+----+--------------+
| 1 | NULL |
| 2 | 11Blah |
| 3 | Unknown |
| 4 | 1ThinkPad123 |
+----+--------------+
That is for when you can't use the TRY_CAST() function. If you are working on SQL Server 2012 or later, I recommend using TRY_CAST() instead:
SELECT Data
FROM T
WHERE Data IS NOT NULL
AND
TRY_CAST(Data AS INT) IS NULL;
Finally, I would say do not use the ISNUMERIC() function, because (from the docs):
Note
ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($). For a complete list of currency symbols, see money and smallmoney (Transact-SQL).

Oracle - Map column data to a value

Let me first point out that my question is going to be very very close to this question: map-column-data-to-a-value-oracle
Please quickly read that one first.
Now in my case I need the exact same thing but not as the primary query. Instead I need the information as one part of my query.
I have this table:
someId | someValue | dataType
1 | 500 | 1
2 | someString | 2
And I know that dataType "1" means "Integer". I also know the meaning of the other values in the dataType column.
So I want to select all entries in the table but have their dataTypes as their human readable values instead of their numbers:
Results:
1, 500, Integer
2, someString, String
Trying to apply the solution of the question I linked, I created a subquery like
SELECT
    someId,
    someValue,
    (
        SELECT CASE
            WHEN dataType = 1 THEN 'INTEGER'
            WHEN dataType = 2 THEN 'FLOAT'
            WHEN dataType = 3 THEN 'TEXT'
            ELSE 'DATE'
        END
        FROM myTable
    ) as myDataType
FROM myTable
This subquery returns more than one row, so Oracle complains.
Since I access the DB through SQL directly, I need a "pure SQL" solution. Otherwise I could just parse the value through a mapping, in say PHP. But that's not possible here. I am shooting some queries at a DB to try and gather information about the data and structure, which we don't know about. So only SQL is available.
Get rid of the subquery:
SELECT someId,
someValue,
CASE
WHEN dataType = 1 THEN 'INTEGER'
WHEN dataType = 2 THEN 'FLOAT'
WHEN dataType = 3 THEN 'TEXT'
ELSE 'DATE'
END as Datatype
from myTable
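To see the inline CASE at work without an Oracle instance, here is a small sketch in Python. It assumes SQLite (standard sqlite3 module) as a stand-in, and the two sample rows are hypothetical; CASE itself is standard SQL and is evaluated once per row, so no subquery is needed.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someId INTEGER, someValue TEXT, dataType INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, "500", 1), (2, "someString", 3)],
)

# The CASE expression is part of the select list and maps each row's code to a label.
rows = conn.execute(
    """
    SELECT someId,
           someValue,
           CASE
               WHEN dataType = 1 THEN 'INTEGER'
               WHEN dataType = 2 THEN 'FLOAT'
               WHEN dataType = 3 THEN 'TEXT'
               ELSE 'DATE'
           END AS Datatype
    FROM myTable
    ORDER BY someId
    """
).fetchall()

print(rows)
```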

Why does this SQL statement work the way it does?

I've been looking to find a good way to do a SQL query where I can have a where statement which, if left blank, will act like it wasn't there at all. I've found this, which seems to work quite well:
WHERE (Column = @value OR @value is null)
If I specify a value for @value, the search is filtered like I want. But if I pass in null, it's like saying @value can be anything. I like this. This is good. What I don't understand though is, why does this work like this?
If @value is null, your WHERE clause:
WHERE (Column = @value OR @value is null)
reduces to
WHERE (Column = @value OR 1=1)
(This is similar to if (Column == value || true) in other common languages)
An OR expression is true if either of its operands (sides) is true. SQL uses three-valued logic:
+---------+------+---------+---------+
| A OR B | TRUE | Unknown | FALSE |
+---------+------+---------+---------+
| TRUE | TRUE | TRUE | TRUE |
| Unknown | TRUE | Unknown | Unknown |
| FALSE | TRUE | Unknown | FALSE |
+---------+------+---------+---------+
And so:
If @value is null, the right side of your WHERE clause is true, so the entire condition is true, and your WHERE clause will always be satisfied.
If @value isn't null, then the right side of your WHERE clause is false, so whether the WHERE clause is satisfied depends on whether Column = @value.
Well, there are two cases:
1) @value is "something".
In this case, the second clause is always false, because "something" is never null. So all that effectively remains is WHERE Column = @value.
2) @value is null
In this case, the first clause is never true, because null never equals anything. So all that effectively remains is WHERE @value is null, and @value is known to be null, so this is like WHERE 1 = 1 and the whole WHERE is ignored. The database should be clever enough to figure this out before touching any data, so this should perform just as if no condition had been specified at all.
So what you have here is a single SQL statement that can act like two, with an "optional WHERE". The advantage over two separate SQL statements for the two cases is that you don't need conditional logic in your application when building the SQL statement (which can get really hairy if there is more than one of these "toggles").
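The optional filter is easy to try out. The following is a minimal sketch in Python, assuming SQLite (standard sqlite3 module) with a named parameter :value in place of the SQL Server variable; the items table, its rows, and the find function are hypothetical and exist only for this illustration.

```python
import sqlite3

# Assumption: SQLite and a made-up items table stand in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("apple", "red"), ("pear", "green"), ("cherry", "red")],
)


def find(colour=None):
    """Filter by colour, or return every row when colour is None (SQL NULL)."""
    sql = (
        "SELECT name FROM items "
        "WHERE (colour = :value OR :value IS NULL) "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, {"value": colour})]


filtered = find("red")  # the first clause decides; the second is false
everything = find()     # the first clause is unknown; the second is true

print(filtered, everything)
```

One statement serves both cases, so the calling code never has to assemble the WHERE clause conditionally.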

Boolean vs tinyint(1) for boolean values in MySQL

What column type is best to use in a MySQL database for boolean values? I use boolean but my colleague uses tinyint(1).
These data types are synonyms.
I am going to take a different approach here and suggest that it is just as important for your fellow developers to understand your code as it is for the compiler/database to. Using boolean may do the same thing as using tinyint, however it has the advantage of semantically conveying what your intention is, and that's worth something.
If you use a tinyint, it's not obvious that the only values you should see are 0 and 1.
A boolean is ALWAYS true or false.
boolean isn't a distinct datatype in MySQL; it's just a synonym for tinyint. See this page in the MySQL manual.
See the quotes and examples below from dev.mysql.com/doc/:
BOOL, BOOLEAN These types are synonyms for TINYINT(1). A value of zero
is considered false. Nonzero values are considered true:
mysql> SELECT IF(0, 'true', 'false');
+------------------------+
| IF(0, 'true', 'false') |
+------------------------+
| false |
+------------------------+
mysql> SELECT IF(1, 'true', 'false');
+------------------------+
| IF(1, 'true', 'false') |
+------------------------+
| true |
+------------------------+
mysql> SELECT IF(2, 'true', 'false');
+------------------------+
| IF(2, 'true', 'false') |
+------------------------+
| true |
+------------------------+
However, the values TRUE and FALSE are merely aliases for 1 and 0, respectively, as shown here:
mysql> SELECT IF(0 = FALSE, 'true', 'false');
+--------------------------------+
| IF(0 = FALSE, 'true', 'false') |
+--------------------------------+
| true |
+--------------------------------+
mysql> SELECT IF(1 = TRUE, 'true', 'false');
+-------------------------------+
| IF(1 = TRUE, 'true', 'false') |
+-------------------------------+
| true |
+-------------------------------+
mysql> SELECT IF(2 = TRUE, 'true', 'false');
+-------------------------------+
| IF(2 = TRUE, 'true', 'false') |
+-------------------------------+
| false |
+-------------------------------+
mysql> SELECT IF(2 = FALSE, 'true', 'false');
+--------------------------------+
| IF(2 = FALSE, 'true', 'false') |
+--------------------------------+
| false |
+--------------------------------+
The last two statements display the results shown because 2 is equal
to neither 1 nor 0.
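The difference between "counts as true" and "equals TRUE" can also be reproduced outside MySQL. This is a rough sketch in Python, assuming SQLite (standard sqlite3 module) as a stand-in; since TRUE and FALSE are only aliases for 1 and 0, the comparisons are written with the integers directly. The helper scalar is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; 1 and 0 are used in place of TRUE and FALSE.
conn = sqlite3.connect(":memory:")


def scalar(expression):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute("SELECT " + expression).fetchone()[0]


# Any nonzero value is treated as true in a condition...
two_is_truthy = scalar("CASE WHEN 2 THEN 'true' ELSE 'false' END")
# ...but 2 is equal to neither 1 nor 0.
two_equals_one = scalar("2 = 1")
two_equals_zero = scalar("2 = 0")

print(two_is_truthy, two_equals_one, two_equals_zero)
```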
Personally I would suggest using tinyint as a preference, because boolean doesn't do what you think it does from the name, so it makes for potentially misleading code. But at a practical level, it really doesn't matter -- they both do the same thing, so you're not gaining or losing anything by using either.
Use enum; it's the easiest and fastest.
I would not recommend enum or tinyint(1), as bit(1) needs only 1 bit for storing a boolean value while tinyint(1) needs 8 bits.
Reference: TINYINT vs ENUM(0, 1) for boolean values in MySQL
While it's true that bool and tinyint(1) are functionally identical, bool should be the preferred option because it carries the semantic meaning of what you're trying to do. Also, many ORMs will convert bool into your programming language's native boolean type.
My experience when using Dapper to connect to MySQL is that it does matter. I changed a non nullable bit(1) to a nullable tinyint(1) by using the following script:
ALTER TABLE TableName MODIFY Setting BOOLEAN null;
Then Dapper started throwing Exceptions. I tried to look at the difference before and after the script. And noticed the bit(1) had changed to tinyint(1).
I then ran:
ALTER TABLE TableName CHANGE COLUMN Setting Setting BIT(1) NULL DEFAULT NULL;
Which solved the problem.
Whether you choose int or bool matters, especially when a nullable column comes into play.
Imagine a product with multiple photos. How do you know which photo serves as a product cover? Well, we would use a column that indicates it.
So far our product_image table has two columns: product_id and is_cover
Cool? Not yet. Since the product can have only one cover, we need to add a unique index on these two columns.
But wait, if these two columns get a unique index, how would you store many non-cover images for the same product? The unique index would throw an error here.
So you may think "Okay, but you can use a NULL value, since these are omitted by unique index checks", and yes, this is true, but we are losing the linguistic meaning here.
What is the purpose of a NULL value in a boolean type column? Is it "all", "any", or "no"? The null value in a boolean column allows us to use the unique index, but it also messes up how we interpret the records.
I would say that in some cases an integer can serve the purpose better, since it's not bound to a strict true or false meaning.
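The unique-index behaviour described above can be sketched as follows. This assumes SQLite (standard sqlite3 module) as a stand-in for MySQL; like MySQL, it treats NULLs as distinct in a unique index. The index name one_cover is made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_image (product_id INTEGER, is_cover INTEGER)")
conn.execute("CREATE UNIQUE INDEX one_cover ON product_image (product_id, is_cover)")

# One cover (1) and two non-cover images stored as NULL for the same product.
conn.executemany(
    "INSERT INTO product_image VALUES (?, ?)",
    [(1, 1), (1, None), (1, None)],
)

# A second cover for the same product violates the unique index.
try:
    conn.execute("INSERT INTO product_image VALUES (1, 1)")
    second_cover_rejected = False
except sqlite3.IntegrityError:
    second_cover_rejected = True

image_count = conn.execute("SELECT COUNT(*) FROM product_image").fetchone()[0]
print(second_cover_rejected, image_count)
```

The index does enforce "at most one cover per product", but only because "not a cover" is stored as NULL rather than 0, which is exactly the ambiguity in meaning discussed above.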