Storing multiple values in one column vs. Multiple columns for multiple values

Storing multiple values in one column vs. Multiple columns for multiple values - sql

With regards to Performance & 'The proper way of doing things' and trying to figure out which way is better to store configuration data in a SQL database. Assume you have website configuration data for setting a minimum and maximum age that a person must be to access the site.
CREATE TABLE SiteConfig
(
featureName varchar(100),
value varchar(100),
)
Which is better:
To store it all in a single row and process it in PHP with explode();.
featureName: "ageRequirement"
value: "13|60"
To store it in separate rows for each feature and just SELECT the feature you need when needed.
featureName: "minAge"
value: "13"
featureName: "maxAge"
value: "60"
Due to the amount of features of the website, this is a difference of having 60 ROWS of data vs. about 25.

Store it has two separate features.
You may want to access the feature in the database, rather than in the application code.
There is no reason to parse a comma delimited string to get two features.
Admittedly, in the case of just two numbers separated by a comma, not much can go wrong. This is, however, a slippery slope and you might soon find yourself trying to stuff multiple fields into a single string, and then devoting a lot of resources to parsing the string.

If those are values that you never will use by themselves in the database, either would work.
If you suspect that you would ever want to use the value in the database, you would not want them in the same value. That would make querying the data complicated and inefficient.

Related

"Numeric value '' is not recognized" - what column?

I am trying to insert data from a staging table into the master table. The table has nearly 300 columns, and is a mix of data-typed Varchars, Integers, Decimals, Dates, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized"
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?

Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use VALIDATE() function to get better information to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap all of your fields destined for numeric fields with try_to_number() functions. A bit tedious, but might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try and use the NULL_IF options to get rid of bad characters and/or try to load them into stage using the actual datatypes in your stage table, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.

Query your staging using try_to_number() and/or try_to_decimal() for number and decimal fields of the table and the use the minus to get the difference
Select $1,$2,...$300 from #stage
minus
Select $1,try_to_number($2)...$300 from#stage
If any number field has a string that cannot be converted then it will be null and then minus should return those rows which have a problem..Once you get the rows then try to analyze the columns in the result set for errors.

data sanitization/clean-up

Just wondering…
We have a table where the data in certain fields is alphanumeric, comprising a 1-2 digit alpha followed by a 1-2 digit number e.g. x2, x53, yz1, yz95
The number of letters added before the number can be determined by the field so that certain fields will always have the same 1 letter added before the number while others will always have the same 2 letters.
For each field, the actual letters and number of letters added (1 or 2) are always the same, thus, we can always tell which letters appear before the numbers just via the field names.
For the purposes of all downstream data analysis, it is only ever the numeric value from the string which is important.
Sql queries are constructed dynamically behind a user form where the final sql can take many forms depending on which selections and switches the user has chosen. With this, the VBA generating the sql constructs is fairly involved, containing many conditions/variable pathways to the final sql construct.
With this, it would make the VBA and sql much easier to write, read, debug, and perhaps increase the sql execution speed, etc. – if we were only dealing with a numeric datatype e.g. I wouldn’t need to accommodate the many apostrophes within the numerous lines of “strSQL = strSQL & …”
Given that the data itself being analysed is a copy that’s imported via regular .csv extracts from a live source, would it be acceptable to pre sanitize/clean-up these fields around the import stage by converting the data within to numeric values and field datatypes?
- perhaps either by modifying the sql used to generate the extract or by modifying the schema/vba process used to import the extract into the analysis table e.g. using something like a Replace function such as “ = Replace(OriginalField,”yz”,””) “ to strip out the yz characters.

Yes, link the csv "as is", and for each linked table create a straight select query that does the sanitization, like:
Select
Val(Mid([Field1], 2)) As NumField1,
Val(Mid([Field2], 1)) As NumField2,
etc.
Val(Mid([FieldN], 2)) As NumFieldN
From
YourLinkedCsvTable
then use this query throughout your application when you need the data.

Finding Non-Numeric Characters in a string

I found a couple threads on this topic but the answers seemed a little too complex for what I'm trying to do.
I am using SQL Developer and have a table, lets call it HR_Employees. This table I have to use to populate a fact table, we will call it Fact_Key. This fact table has 3 columns, including a description as well as the Employee_Key column (which is the primary key).
The task I have is to figure out a way to create the employee_key off of a Department_ID field. For half of the company this is automatically generated based on some payroll information and pushed into the Fact_Key table however for the other half (which is merging into the first half) they do not use the same payroll system so this wouldn't work for us. I have to take the Department_ID and use that to create an employee_key by concatenating it with other fields (employee_number for instance). See below:
INSERT INTO FACT_KEY (Employee_Key, Description)
SELECT RPAD(Employee_Number||Deptartment_ID, 15,'0'), EmployeeJob_Code
FROM HR_EMPLOYEES
I am padding this to keep all keys at a consistent 15 character length. The problem I am experiencing is the Employee_Key field is a numeric data type but the Department_ID has non-numeric characters in it at random places. I need a way to remove those characters or preferably change them into their ASCII counterparts. Is there a simple way to do this?
Everything I've seen appears too complex and I was looking for a simpler way to do this (involving just one or two functions) as this is going into a stored procedure and I was directed to keep it simple. Any help would be appreciated, Thanks!

Just use regexp_replace, example:
select REGEXP_REPLACE('123A45B67CCC89', '[^[:digit:]]','') from dual
returns: 123456789

MySQL command to search CSV (or similar array)

I'm trying to write an SQL query that would search within a CSV (or similar) array in a column. Here's an example:
insert into properties set
bedrooms = 1,2,3 (or 1-3)
title = nice property
price = 500
I'd like to then search where bedrooms = 2+. Is this even possible?

The correct way to handle this in SQL is to add another table for a multi-valued property. It's against the relational model to store multiple discrete values in a single column. Since it's intended to be a no-no, there's little support for it in the SQL language.
The only workaround for finding a given value in a comma-separated list is to use regular expressions, which are in general ugly and slow. You have to deal with edge cases like when a value may or may not be at the start or end of the string, as well as next to a comma.
SELECT * FROM properties WHERE bedrooms RLIKE '[[:<:]]2[[:>:]]';
There are other types of queries that are easy when you have a normalized table, but hard with the comma-separated list. The example you give, of searching for a value that is equal to or greater than the search criteria, is one such case. Also consider:
How do I delete one element from a comma-separated list?
How do I ensure the list is in sorted order?
What is the average number of rooms?
How do I ensure the values in the list are even valid entries? E.g. what's to prevent me from entering "1,2,banana"?
If you don't want to create a second table, then come up with a way to represent your data with a single value.
More accurately, I should say I recommend that you represent your data with a single value per column, and Mike Atlas' solution accomplishes that.

Generally, this isn't how you should be storing data in a relational database.
Perhaps you should have a MinBedroom and MaxBedroom column. Eg:
SELECT * FROM properties WHERE MinBedroom > 1 AND MaxBedroom < 3;

Theory of storing a number and text in same SQL field

I have a three tables
Results:
TestID
TestCode
Value
Tests:
TestID
TestType
SysCodeID
SystemCodes
SysCodeID
ParentSysCodeID
Description
The question I have is for when the user is entering data into the results table.
The formatting code when the row gets the focus changes the value field to a dropdown combobox if the testCode is of type SystemList. The drop down has a list of all the system codes that have a parentsyscodeID of the test.SysCodeID. When the user chooses a value in the list it translates into a number which goes into the value field.
The datatype of the Results.Value field is integer. I made it an integer instead of a string because when reporting it is easier to do calculations and sorting if it is a number. There are issues if you are putting integer/decimal value into a string field. As well, when the system was being designed they only wanted numbers in there.
The users now want to put strings into the value field as well as numbers/values from a list and I'm wondering what the best way of doing that would be.
Would it be bad practice to convert the field over to a string and then store both strings and integers in the same field? There are different issues related to this one but i'm not sure if any are a really big deal.
Should I add another column into the table of string datatype and if the test is a string type then put the data the user enters into the different field.
Another option would be to create a 1-1 relationship to another table and if the user types in a string into the value field it adds it into the new table with a key of a number.
Anyone have any interesting ideas?

What about treating Results.Value as if it were a numeric ValueCode that becomes an foreign key referencing another table that contains a ValueCode and a string that matches it.
CREATE TABLE ValueCodes
(
Value INTEGER NOT NULL PRIMARY KEY,
Meaning VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE Results
(
TestID ...,
TestCode ...,
Value INTEGER NOT NULL FOREIGN KEY REFERENCES ValueCodes
);
You continue storing integers as now, but they are references to a limited set of values in the ValueCodes table. Most of the existing values appear as an integer such as 100 with a string representing the same value "100". New codes can be added as needed.

Are you saying that they want to do free-form text entry? If that's the case, they will ruin the ability to do meaningful reporting on the field, because I can guarantee that they will not consistently enter the strings.
If they are going to be entering one of several preset strings (for example, grades of A, B, C, etc.) then make a lookup table for those strings which maps to numeric values for sorting, evaluating, averaging, etc.
If they really want to be able to start entering in free-form text and you can't dissuade them from it, add another column along the lines of other_entry. Have a predefined value that means "other" to put in your value column. That way, when you're doing reporting you can either roll up all of those random "other" values or you can simply ignore them. Make sure that you add the "other" into your SystemCodes table so that you can keep a foreign key between that and the Results table. If you don't already have one, then you should definitely consider adding one.
Good luck!

The users now want to put strings into
the value field as well as
numbers/values from a list and I'm
wondering what the best way of doing
that would be.
It sounds like the users want to add new 'testCodes'. If that is the case why not just add them to your existing testcode table and keep your existing format.
Would it be bad practice to convert
the field over to a string and then
store both strings and integers in
the same field? There are different
issues related to this one but i'm not
sure if any are a really big deal.
No it's not a big deal. Often PO numbers or Invoice numbers have numbers or a combination of letters and numbers. You are right however about the performance of the database on a number field as opposed to a string, but if you index the string field you end up with the database doing it's scans on numeric indexes anyway.
The problems you may have had with your decimals as strings probably have to do with the floating point data types in which the server essentially estimates the value of the field and only retains accuracy to a certain number of digits. This can lead to a whole host of rounding errors if you are concerned about the digits. You can avoid that issue by using currency fields or the like that have static accuracy of the decimals. lol I learned this the hard way.
Tom H. did a great job addressing everything else.

I think the easiest way to do it would be to convert Results.Value to a "string" (char, varchar, whatever). Yes, this ruins the ability to do numeric sorting (and you won't be able to do a cast or convert on the column any longer since text will be intermingled with integer values), but I think any other method would be too complex to maintain properly. (For example, in the 1-1 case you mentioned, is that integer value the actual value or a foreign key to the string table? Now we need another column to determine that.)

I would create the extra column for string values. It's not true normalization but it's the easiest to implement and to work with.
Using the same field for both numbers and strings would work to as long as you don't plan on doing anything with the numbers like summing or sorting.
The extra table approach while good from a normalization standpoint is probably overly complex.

I'd convert the value field to string and add a column indicating what the datatype should be treated as for post processing and reporting.

Sql Server at least has an IsNumeric function you can use:
ORDER BY IsNumeric(Results.Value) DESC,
CASE WHEN IsNumeric(Results.Value) = 1 THEN Len(Results.Value) ELSE 99 END,
Results.Value

One of two solutions comes to mind. It kind of depends on what you're doing with the numbers. If they just represent a choice of some kind, then pick one. If you need to do math on it (sorting, conversion, etc..) then pick another.
Change the column to be a varchar, and then either put numbers or text in it. Sorting numerically will suck, but hey, it's one column.
Have both a varchar column for the text, and an int column for the number. Use a view to hide the differences, and to control the sorting if necessary. You can coalesce the two columns together if you don't care about whether you're looking at numbers or text.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas