As is commonly known, SAP does not recommend using fields longer than 255 characters in transparent tables. One should use several 255-character fields instead, wrap the text in LCHR, LRAW or STRING, or use SO10 texts, etc.
However, while maintaining legacy (and ugly) developments, such a problem often arises: how do you view what is stored in a char500 or char1000 field in the database?
The real-life scenario:
we have a development where some structure is written to and read from a char1000 field in a transparent table
we know the field structure, and parsing the field through CL_ABAP_CONTAINER_UTILITIES=>FILL_CONTAINER_C or SO_STRUCT_TO_CHAR works fine, all fields are mapped nicely
displaying the field via SE11/SE16/SE16N gives nothing, as the field is truncated to 255 characters, and to 132 in the debugger, AFAIR.
Is there any standard tool, transaction or FM we can use to display such a long field?
In the DBA Cockpit (transaction ST04) there is an SQL command line where you can enter "native" SQL commands directly and display the result as an ALV grid. With a substring function you can split a field into several sections, for example:
select substr(sql_text,1,100)   s1,
       substr(sql_text,101,100) s2,
       substr(sql_text,201,100) s3,
       substr(sql_text,301,100) s4
from dba_hist_sqltext
where sql_id = '0cuyjatkcmjf0'
PS: every ALV cell holds 128 characters at most.
I am not sure whether this tool is available for all supported database platforms.
There is also an equivalent program named RSDU_EXEC_SQL (in all ABAP-based systems?)
Unfortunately, this won't work for SAP's table substitutes (cluster tables and so on), as those can be queried only with ABAP Open SQL.
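For the original char1000 case (a transparent table, so native SQL can reach it), a minimal sketch of the same trick might look like this; ZLEGACY_TAB, KEYFIELD and LONG_FIELD are hypothetical names, and SUBSTR is the Oracle syntax, so adjust for your database platform. Chunks of 125 characters stay below the 128-character ALV cell limit:
select keyfield,
       substr(long_field,1,125)   p1,
       substr(long_field,126,125) p2,
       substr(long_field,251,125) p3,
       substr(long_field,376,125) p4,
       substr(long_field,501,125) p5,
       substr(long_field,626,125) p6,
       substr(long_field,751,125) p7,
       substr(long_field,876,125) p8
from zlegacy_tab
where keyfield = '4711'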
If you have an ERP system at hand, check out transaction PP01 with infotype 1002. Basically, the text is stored in tables HRP1002 and HRT1002 and displayed in a special view with a text editor. It looks like this: http://www.sapfunctional.com/HCM/Positions/Page1.13.jpg
In the debugger you can switch the view to e.g. HTML and you should see the whole string, but editing is limited, as far as I know, to a certain number of characters.
Related
I have a text editor on a web page. It has functions like bold, italics and highlight, so a text may contain any of these. It may even contain numbered or unnumbered lists.
The text editor generates HTML for the formatted text.
Due to this, the formatted text data (HTML) is at least 60% larger than the unformatted text would have been.
This consumes a lot of space (in terms of characters), which leads to a space-hungry database.
Is there a way to compress it, or some other way to store this efficiently?
There is no built-in compression function in Db2. But you may write your own external functions (using Java or C/C++) to implement such functionality. I can provide a Java example (using the java.util.zip package) of such an implementation if you are interested.
Another way is to use Db2 row compression. Db2 can compress any non-LOB columns and so-called "inlined" LOBs.
Storing LOBs inline in table rows
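A minimal sketch of those two options, assuming a Db2 LUW table named DOCS with a CLOB column BODY (both names are hypothetical; REORG is run from the CLP or via ADMIN_CMD):
ALTER TABLE DOCS COMPRESS YES ADAPTIVE;
ALTER TABLE DOCS ALTER COLUMN BODY SET INLINE LENGTH 2000;
REORG TABLE DOCS;
The first statement enables adaptive row compression, the second keeps LOB values up to roughly 2000 bytes inline in the data row so they become eligible for compression, and the REORG rebuilds existing rows so they actually pick up both settings.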
If you store your data as XML in a Db2 XML data-type column, it will be stored in a more efficient form than raw text.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.xml.doc/doc/c0022770.html
With Firebird 2.5.8 and a table with a dozen BLOB fields, I see this weird behavior when querying this way:
SELECT *
FROM TABLE
WHERE BLOBFIELD4 LIKE '%SOMETEXT%'
and I get results even though SOMETEXT is actually in a different column and not in BLOBFIELD4 (this happens with every BLOB column).
What am I missing?
Thanks for the data. I made a few quick tests using the latest IB Expert with Firebird 2.5.5 (what I had at hand).
It seems that you actually have much more data than you might think you have.
First of all, it is a bad, dangerous practice to keep text data in columns marked as CHARSET NONE! Make sure that your columns are marked with some reasonable charset, like Windows 1250 or UTF8. Also make sure that the very CONNECTION of all your applications (including development tools) to the database server has an explicitly defined character set that suits your textual data.
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Or, if you want those BLOBs to be seen as binary, then explicitly create them as SUB_TYPE BINARY, not SUB_TYPE TEXT.
However, here is a simple script to run on your database.
alter table comm
add NF_VC VARCHAR(4000) CHARACTER SET UTF8,
add NF_BL BLOB SUB_TYPE 1 SEGMENT SIZE 4096 CHARACTER SET UTF8
then
update comm
set nf_vc = '**' || com1 || '**'
then
update comm
set nf_bl = '##' || nf_vc || '##'
Notice that I intentionally force Firebird to do the conversion BLOB -> VARCHAR -> BLOB.
Just to be on the safe side.
Now check some data.
select id_comm, nf_vc
from comm where
nf_vc containing 'f4le dans 2 ans'
and
select id_comm, nf_bl
from comm where
nf_bl containing 'f4le dans 2 ans'
What do you see now?
In the first picture we see that very mystery: the line is selected, but we cannot see your search pattern, "f4le dans 2 ans", in it.
BUT !!!
Can you see the marks, the double asterisks, the ** ?
Yes, you can, at the beginning! But you cannot see them at the end!!!
That means you DO NOT see the whole text, but only its first part!
In the second picture you see the very same row, ID=854392, but re-converted back to BLOB and additionally marked with ## at both ends.
Can you see the marks on both start and end?
Can you see your search pattern?
Yes and yes - if you look at the grid row (white).
No and no - if you look at the tooltip (yellow).
So, again, the data you search for - it DOES exist. But you just fail to see it for some reason.
Now, what may be a typical reason why a string is not displayed completely?
It can be a zero-value byte (or several bytes, for a Unicode codepoint), the way the C language marks the end of a string, a convention that is widely used in Windows and in many libraries and programs. Or maybe some other unusual value (EOF, EOT, -1, etc.) that makes the programs you use falsely detect the end of the text where it has not actually ended yet.
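If you want to check whether that is the case in your data, here is a small sketch; it assumes your Firebird 2.5 build allows the string functions ASCII_CHAR and POSITION on text BLOBs (COM1 is the column from your own query):
select id_comm
from comm
where position(ascii_char(0) in com1) > 0
Any rows it returns contain at least one zero byte inside the BLOB.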
Look at the two screenshots again: where is it that the lines start to differ? It is after \viewkind4 ... \par} and before pard. Notice the weird anomaly! That pard should start with a backslash - \ - to be a valid RTF command. But it is instead preceded by something invisible, something blank. What can it be?...
Let us go back to your original query in your comments.
Also, it is bad practice to put important details into comments! They are hard to find there for anyone who was not tracking the story from the very start. And the more comments are added, the harder it gets. The proper avenue for you would have been to EDIT the question, adding the new data into the question body, and then to add a comment (for notification's sake) saying the question was edited. Please add new data that way in the future.
select id_comm, COM1
from comm where
COM1 containing 'f4le dans 2 ans'
At first glance our fishing ended with nothing: we see text that does not contain your pattern, ending at that very \par}.
But is it so? Switch to the binary view, and...
Voila! What is there before the found-lost-found-again pard? There is that very ZERO BYTE I talked about earlier.
So, to wrap it up, here is what happened.
Firebird is correct, the data is found because the data is really there, in the BLOBs.
Your applications reading the data are not correct. Being confused by the zero byte, they show you only part of the data, not all of it.
Your application writing the data might not be correct. Or the data itself might not be.
How did that zero byte end up there? Why was the RTF structure corrupt, lacking the backslash before pard? Was the data size you passed to the server when inserting that data larger than it should have been, passing some garbage after the meaningful data? Was the data size correct, but the data contents corrupt before inserting?
Something is fishy there. I do not think the RTF specification explicitly prohibits a zero byte, but having one is very untypical, because it triggers bugs like this in way too many applications and libraries.
P.S. The design of the table, having MANY columns of BLOB type, seems poor.
"Wide" tables often lead to problems in future development and maintenance.
While it is not the essence of your question, please do think about remaking this table into a narrow one and saving your data as a number of one-BLOB rows.
It will cost you some fixed extra work now, but would probably save you from snowballing problems in the future.
I am populating data from a server into Google BigQuery. One of the attributes in the table is a string that has close to 150+ characters in it.
For example, "Had reseller test devices in a vehicle with known working device
Set to power cycle, never got green light Checked with cell provider and all SIMs were active all cases the modem appears to be dead,light in all but not green light".
The table in GBQ gets populated until it hits this specific attribute. When this attribute is about to load, it does not get loaded into a single cell. It gets split into different cells and corrupts the table.
Is there any restriction on each field in GBQ? Any information regarding this would be appreciated.
My guess is that quote and comma characters in the CSV data are confusing the CSV parser. For example, if one of your fields is hello, world, this will look like two separate fields. The way around this is to quote the field, so you'd need "hello, world". This, of course, has problems if you have embedded quotes in the field. For instance if you wanted to have a field that said She said, "Hello, world", you would either need to escape the quotes by doubling the internal quotes, as in "She said, ""Hello, world""", or by using a different field separator (for instance, |) and dropping the quote separator (using \0).
One final complication is if you have embedded newlines in your field. If you have Hello\nworld, this means you need to set the allow_quoted_newlines option on the load job configuration. The downside is that large files will be slower to import with this option, since they can't be processed in parallel.
These configuration options are all described here, and can be used via either the web UI or the bq command line shell.
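For instance, a sketch of a load that keeps the default quoting but allows newlines inside quoted fields might look roughly like this (the dataset, table, bucket and schema file names are made up):
bq load --source_format=CSV \
    --allow_quoted_newlines \
    mydataset.mytable gs://mybucket/data.csv ./schema.json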
I'm not sure there is a limit imposed, and certainly I have seen string fields with over 8,000 characters.
Can you please clarify: "When this attribute is about to load, it does not get loaded into a single cell. It gets split into different cells and corrupts the table"? Does this happen every time? Could it be associated with certain punctuation?
When I was going through all the tables in my database, I saw a table called Measbinary, and an attribute that caught my attention was RawData, which is of type image and allows NULL. I have attached a screenshot of the table. Could someone help me understand what that is, and how I could find out how it has been processed?
Update: I checked the stored procedures and found that the image parameter is passed to it like
SP_StoreBinary #rawspectra image
and then the value is inserted into the table mentioned above.
This is the raw data of a binary field. It has "no meaning" except being a way for SSMS (Management Studio) to show SOMETHING for a binary field. Remember - SSMS (and the database) have no clue what is in that field (image, word document, whatever) and how to show it. A hex coded string is "as good as it gets" as a generic approach, as it allows a programmer to compare the first bytes.
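If you want to peek at those first bytes yourself, a rough sketch (Measbinary and RawData are taken from your screenshot; the magic numbers are just common examples, not something specific to your data):
SELECT TOP (10)
       CONVERT(varchar(40), SUBSTRING(RawData, 1, 16), 1) AS first_bytes
FROM dbo.Measbinary;
Style 1 of CONVERT renders the bytes as a 0x... hex string; for example 0xFFD8FF suggests a JPEG and 0x504B0304 a ZIP-based format such as .docx or .xlsx.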
I'm using MS Access, so file size is a real constraint (2 gigs I think). Am I saving space in the most efficient way?
tbl1: tbl_NamesDescs
pid_NamesDescs <-autonumber
ColName <-text field, Indexed: Yes (No Duplicates)
Descs <- text field
tbl2: tbl_HistStatsSettings
pid_HistStatsSettings <-autonumber
Factor <-text field
etc... (other fields)
So, using the two tables above: tbl2 has ~800k records, and all of Factor's unique possibilities are listed in ColName (i.e. there is a one-to-many relationship between ColName and Factor, respectively). When I look at the tables in Datasheet View, I see all of the names listed (full text) in both Factor and ColName.
Question:
Is that the best way to save space? I would think that Factor should be a list of indices (numbers, not text) corresponding to ColName.
In other words, wouldn't it be more file-space efficient to populate Factor with the pid_NamesDescs autonumbers, since numbers are smaller than text? If that is true, what is the best way to make this happen (either steps in MS Access or VBA is what I am after here)?
EDIT: added table names and pid names as they really exist
Yes, putting the FactorID as a number instead of text will save space. I can't really answer whether it's the "best" way, but it will definitely save space.
The easiest way to do this is to run the following query:
Update tbl2 LEFT JOIN tbl1 ON tbl2.Factor = tbl1.ColName
SET tbl2.Factor = CStr(tbl1.PID_tbl1)
WHERE Not IsNull(tbl1.ColName)
Then, in Design View, change the data type of "Factor" to Long. I'd also change the name of the field to "FactorID" and change the name of "ColName" to "Factor". I'd make some other changes to the column/table names for clarity (although you may be using fake names).
OR make a helper column (as a Long Integer, as you suggested in the comments), update the helper field, and then delete the original field; a sketch of this approach is shown below.
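A rough sketch of that helper-column route, using the real table and field names from the question (FactorID is a hypothetical new field name; Access runs one SQL statement at a time, so execute these separately):
ALTER TABLE tbl_HistStatsSettings ADD COLUMN FactorID LONG;
UPDATE tbl_HistStatsSettings INNER JOIN tbl_NamesDescs
    ON tbl_HistStatsSettings.Factor = tbl_NamesDescs.ColName
SET tbl_HistStatsSettings.FactorID = tbl_NamesDescs.pid_NamesDescs;
ALTER TABLE tbl_HistStatsSettings DROP COLUMN Factor;
Check that FactorID is filled for every row before dropping the old Factor column.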
Then go into the Relationships window and add a relationship between tbl1.PID_tbl1 and tbl2.FactorID.
After this, Compact and Repair the database to reduce the size.
*EDIT to add portion about adding the relationship between the tables.
In addition to normalization, also check all your text fields. The default is 255 characters for Short Text. When you are storing fewer than 255 characters in a text field, make sure the field size is set to no more than what you typically store. After changing it, perform a Compact and Repair to reduce the file size. Where possible, use Short Text over Long Text.
Also consider a split-database approach, where the data is in the back end and your UI and VBA are in the front end.