Multidimensional SSAS cube: using local (Danish) characters

I have a dimension called dim_Person. I have values in the Name column (or attribute) that contain Danish characters. I have found that if I have two rows in my data warehouse table with the same name, one spelled in Danish and one in English, I get an error. For example:
Surrogate_Key  FirstName
1              Ægir
2              Aegir
I get an error saying that my FirstName attribute with the value 'Aegir' fails because the cube cannot insert a duplicate key row:
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dim_Person', Column: 'FirstName', Value: 'Aegir'. The attribute is 'First Name'.
I have figured out that if I change all 'Ægir' to 'Aegir' (or vice versa) in my data warehouse source table, I have no problems processing the dimension. But if the two names coexist, it will not process.
I assume that behind the scenes, all values are stored in a table unknown to the developer. It is as if it looks up "does the value 'Aegir' exist?" and gets back "no, it does not". Then it tries to insert the value, but the 'Ae' is converted to an 'Æ' (or vice versa) and the insert fails.
For the moment I have converted all special characters in the source table to English characters, but I would like to know: is there a way I can set up my project so the two names can coexist in the same dimension?

I have the exact same problem with Æ and get the same error message when trying to process the SSAS dimension. In my case, the problem is the Danish town of Vedbæk.
And I solved it: I changed the collation property (Properties -> KeyColumns -> [open up the attribute] -> Collation) from Accent Sensitive to Binary 2
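For anyone who wants to verify the behaviour outside SSAS, here is a minimal T-SQL sketch (the collation names are illustrative; substitute whatever your server actually uses). On many accent-sensitive Windows collations the ligature 'Æ' still compares equal to 'AE', which is exactly why SSAS sees a duplicate key, while a binary-2 collation compares code points and keeps the two names distinct:

SELECT
    CASE WHEN N'Ægir' = N'Aegir' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS accent_sensitive,  -- 'equal'
    CASE WHEN N'Ægir' = N'Aegir' COLLATE Latin1_General_BIN2
         THEN 'equal' ELSE 'different' END AS binary_2;          -- 'different'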

Related

Default Value for NOT NULL VARCHAR in DB2

I have an RDBMS table in DB2 where a column is defined as "ARMTST" VARCHAR(10) NOT NULL DEFAULT
I was checking the data in the table and saw rows with no value in this column (via Toad viewer), but when I run the query SELECT COUNT(*) FROM ACTIVITY WHERE ARMTST IS NULL; I get zero rows back.
Attached is a screenshot of SELECT ARMTST FROM ACTIVITY; which shows empty values in this column for certain rows.
Are these columns really not empty, even though they are shown as empty in the UI? The default value is not specified in the CREATE TABLE script.
I don't think that code will be able to insert empty values.
An empty string is not null (well, except on Oracle, but people hate them for that). Often in databases, null is used to represent "we don't know (yet)", while empty means "not present".
Consider middle names. Many people in America (and other countries) have middle names (not your given name, not your family name).
If you (casually) ask me for my name and I respond with only a first and last name (likely, in my case, or you might only get my first name!), what do you know about my middle name? Nothing. You don't know if I have one. This is null - you don't know if I have one, and if so, what it is.
But if you ask me "officially" (like for legal reasons), I'm obligated to use my full name - if I have one (or more), I have to include my middle name. So there are two outcomes here:
I have no middle name. That result is blank - we know the answer, and it was nothing.
I have a middle name. The result is whatever my name is.
Or consider signing up for a website. Before signing up, you are not in their system. Your entire record is null - it doesn't exist. After signing up, though, you get a 0 post count.
So now you should have enough information for your question.
Obviously, because the column was defined as NOT NULL, you can't put a null there. So the system has to default to something else. Since defaulting to any actual data (even a single space character - yes, that is data) would be a poor choice, the system chooses the empty string. And since the empty string makes an acceptable default, it is also an acceptable value to insert.
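To make the distinction concrete, a query along these lines against the ACTIVITY table from the question separates truly NULL values from empty and blank-padded ones (a sketch; on older DB2 versions TRIM may need to be spelled RTRIM(LTRIM(...))):

SELECT COUNT(*)                                            AS total_rows,
       SUM(CASE WHEN ARMTST IS NULL     THEN 1 ELSE 0 END) AS null_rows,
       SUM(CASE WHEN ARMTST = ''        THEN 1 ELSE 0 END) AS empty_rows,
       SUM(CASE WHEN ARMTST <> '' AND TRIM(ARMTST) = ''
                THEN 1 ELSE 0 END)                         AS blank_padded_rows
FROM ACTIVITY;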

Pentaho Database join error when matching input data

I have an input.csv file in which I have a field "id".
I need to do a database lookup with the logic below:
I need to search whether the "id" is present in the field "supp_text" and, if so, extract the field "loc_id".
E.g.:
id = 12345
and, in my supp_text, I have the value "the value present is 12345".
I am using "Database join" function to do this.
viz.
select loc_id from SGTABLE where supp_text like '%?%';
and I am passing "id" as a parameter.
I get the error below when I run it:
"Couldn't get field info from [select LOC_ID from SGTABLE WHERE SUPP_TEXT like '%?%']"
offending row : [ID String(5)]
All inputs are strings, and the table fields are "VARCHAR".
I tried the "Database lookup" step too, but it does not have an option to match a substring within a string.
Please help.
The JDBC driver is not replacing the parameter inside the quoted string. You must build the wildcard string first and pass the whole thing as a parameter. Here is a quick transform I threw together that does just that:
Note that in the Database Join step the SQL does not have '' quotes around the parameter. Note also that, unless used properly, the Database Join step can be a performance killer. This, however, looks to be a reasonable use of it if there are going to be a lot of different wildcard values (unlike in my transform).
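If you would rather avoid the extra step that builds the wildcard string, an alternative sketch (assuming your database supports the standard || concatenation operator; MySQL would need CONCAT instead) is to keep ? as a real bind variable and add the wildcards inside the SQL:

-- '?' stays a proper JDBC parameter; the '%' wildcards are
-- concatenated around it by the database at execution time.
SELECT loc_id
FROM SGTABLE
WHERE supp_text LIKE '%' || ? || '%'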

Table or column name cannot start with numeric?

I tried to create a table named 15909434_user with syntax like the below:
CREATE TABLE 15909434_user ( ... )
It produced an error, of course. Then, after a bit of research with Google, I found a good article here that describes it:
When you create an object in PostgreSQL, you give that object a name. Every table has a name, every column has a name, and so on. PostgreSQL uses a single data type to define all object names: the name type.
A value of type name is a string of 63 or fewer characters. A name must start with a letter or an underscore; the rest of the string can contain letters, digits, and underscores.
...
If you find that you need to create an object that does not meet these rules, you can enclose the name in double quotes. Wrapping a name in quotes creates a quoted identifier. For example, you could create a table whose name is "3.14159"—the double quotes are required, but are not actually a part of the name (that is, they are not stored and do not count against the 63-character limit). ...
Okay, now I know how to solve this by using this syntax (putting double quotes around the table name):
CREATE TABLE "15909434_user" ( ... )
You can create a table or column name such as "15909434_user" and also user_15909434, but you cannot create a table or column name beginning with a digit without using double quotes.
So I am curious about the reason behind that (other than it being a convention). Why is this convention applied? Is it to avoid some syntax limitation, or is there another reason?
Thanks in advance for your attention!
It comes from the original SQL standards, which through several layers of indirection eventually get to an "identifier start" block, which is one of several things, but primarily "a simple Latin letter". There are other things that can be used too, but if you want to see all the details, go to http://en.wikipedia.org/wiki/SQL-92 and follow the links to the actual standard (page 85).
Having non-numeric identifier introducers makes writing a parser to decode SQL for execution easier and quicker, but a quoted form is fine too.
Edit: Why is it easier for the parser?
The problem for a parser is more in the select list than in the FROM clause. The select list is the list of expressions selected from the tables, and it is very flexible, allowing simple column names as well as numeric expressions. Consider the following:
SELECT 2e2 + 3.4 FROM ...
If table and column names could start with digits, is 2e2 a column name or a valid number (e-notation is typically permitted in numeric literals)? And is 3.4 the column "4" of table "3", or the numeric value 3.4?
Having the rule that identifiers start with simple Latin letters (and certain other specific characters) means that a parser that sees 2e2 can quickly discern that this will be a numeric expression; same deal with 3.4.
While it would be possible to devise a scheme that allows leading digits, it might lead to even more obscure rules (in my opinion), so this rule is a nice solution. If you want a digit first, the name always needs quoting, which is arguably not as 'clean'.
Disclaimer: I've simplified the above slightly, ignoring correlation names to keep it short. I'm not totally familiar with Postgres, but I have double-checked the above answer against Oracle Rdb documentation and the SQL spec.
I'd imagine it's to do with the grammar.
SELECT 24*DAY_NUMBER as X from MY_TABLE
is fine, but would be ambiguous if 24 were allowed as a column name.
Adding quotes means you're explicitly referring to an identifier, not a constant. So in order to use such a name, you'd always have to quote it anyway.
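A short PostgreSQL illustration of the quoting rule discussed above (the column name is made up for the example) - once an identifier needs quoting, every later reference needs the quotes too:

CREATE TABLE "15909434_user" (user_id integer);

-- The quotes are required on every reference:
SELECT user_id FROM "15909434_user";

-- Without them the parser reads 15909434 as a numeric literal
-- and the statement fails:
-- SELECT user_id FROM 15909434_user;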

Why am I getting a "[SQL0802] Data conversion of data mapping error" exception?

I am not very familiar with iSeries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via the AS/400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric field with 7 digits before the decimal point and 2 digits after it.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion of data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible, though? Shouldn't the database throw an exception when an attempt is made to add invalid data to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs / operations can introduce the decimal data error. For example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
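On DB2 for i, one way such a script can find the bad rows without tripping the error itself is to compare the raw bytes with HEX() - a sketch, assuming MYCOL is the 9-digit zoned field described above, so all-blank storage is nine 0x40 bytes:

-- Count rows whose zoned-decimal column is actually all EBCDIC blanks
SELECT COUNT(*)
FROM MYTABLE
WHERE HEX(MYCOL) = '404040404040404040';

-- Replace the blanks with a valid zero so DISTINCT/GROUP BY/ORDER BY work again
UPDATE MYTABLE
SET MYCOL = 0
WHERE HEX(MYCOL) = '404040404040404040';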
If the file has record format level checking turned off [i.e. LVLCHK(*NO)], or is overridden to that, then an HLL program (e.g. RPG, COBOL) that was not recompiled with the new record format might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.

Understanding user-defined data types in SQL

I am trying to understand the architecture of the PUBS sample database by Microsoft.
In it, I am looking at the au_id column, which has the user-defined data type id: varchar(11).
So, if I understand correctly, varchar(11) means the cell allows up to 11 characters. But if I enter:
11 alphanumeric characters, it gives an error;
11 numeric characters, it gives an error;
but if I enter the characters in a US telephone number format, i.e. 123-54-2345, it works.
Again, if I enter the dashes (hyphens) in some other order, i.e. 1234-5-4544, it again shows an error.
Why does this happen? Do they have some method to validate this entry? I can only find a user-defined data type called id in the User-Defined Data Types folder.
Thank you in advance.
Okay, I just found the script that creates the pubs database.
The au_id column on authors is defined as:
CREATE TABLE authors
(
au_id id
CHECK (au_id like '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
CONSTRAINT UPKCL_auidind PRIMARY KEY CLUSTERED,
/* More columns */
It's the CHECK constraint that's rejecting your invalid values, rather than anything connected with the user-defined type. If you examine the error message, it probably mentions that it's a CHECK constraint that's failing.
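You can test the pattern on its own to see why the question's two examples behave differently; in T-SQL, LIKE with [0-9] character ranges works outside a constraint as well:

SELECT CASE WHEN '123-54-2345'
            LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
            THEN 'passes' ELSE 'fails' END;  -- passes

SELECT CASE WHEN '1234-5-4544'
            LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
            THEN 'passes' ELSE 'fails' END;  -- fails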
(BTW - I'd assumed that this was SSN format, not telephone numbers - anyone confirm?)
User-defined types in SQL Server (other than table types) don't offer much value - all they really do is associate a shorthand name with a built-in type whose scale/precision/length options are fixed.
They would be tremendously useful if the system let you set up strict types - such that two values of the same underlying type but with different type names are not comparable/assignable - you'd get far better warnings/errors, rather than queries proceeding with misaligned joins, for example.
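For reference, an alias type like pubs' id is declared with nothing more than a base type and fixed options - a sketch of the two T-SQL forms (the pubs install script uses the legacy sp_addtype form, if memory serves). Note that neither form carries any validation, which is why the table still needs the CHECK constraint:

-- Legacy form:
EXEC sp_addtype 'id', 'varchar(11)', 'NOT NULL';

-- Modern equivalent:
-- CREATE TYPE id FROM varchar(11) NOT NULL;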