Records visible but not accessible in MySQL - why? - sql

This is a weird issue. I'm accessing my online database using PremiumSoft's Navicat for MySQL, and some of the records are behaving very strangely. For example, my table has the columns id, name, address, abbreviation, contact. When I run a SQL query for, say, any entry with the abbreviation 'ab', it returns zero rows, even though such an entry exists in the database.
What's even weirder is that when I view the table in Navicat, the abbreviation field appears empty for the tuple that has the required value, but when I hover over it or highlight it, I can see the value. It's there but it's inaccessible, and the same problem affects many other tuples in the table.
What could the problem be here? I even tried deleting and recreating the table by executing a dump file, but that didn't help. Help please :(

Check that there aren't any invisible characters at the beginning of the string (like a carriage return or something).

As the following example shows, the value can contain a junk extra byte such as A0 (a non-breaking space in Latin-1), which should be removed with an UPDATE.
mysql> select add_code, unhex(replace(hex(add_code), 'A0', '')) from old_new limit 1\G
*************************** 1. row ***************************
add_code: 000242�
unhex(replace(hex(add_code), 'A0', '')): 000242
1 row in set (1.32 sec)
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_unhex
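To see why such a value is visible but never matches, here is a minimal Python sketch. It assumes the stored value is the text 000242 followed by a single Latin-1 non-breaking space (byte A0), as the output above suggests; the names stored, wanted and strip_byte are made up for illustration.

```python
# Assumed value: the visible text "000242" followed by byte 0xA0.
stored = b"000242\xa0"
wanted = b"000242"


def strip_byte(value: bytes, junk: int = 0xA0) -> bytes:
    """Remove every occurrence of one byte value, working byte by byte."""
    return bytes(b for b in value if b != junk)


# The two values look alike on screen but are not equal.
print(stored == wanted)
# The hex dump exposes the extra trailing byte.
print(stored.hex().upper())

cleaned = strip_byte(stored)
print(cleaned == wanted)
```

Working byte by byte avoids one pitfall of a textual replace on the hex string: the two bytes 0A 0B print as 0A0B, which contains A0 across a byte boundary, so a plain string replace would corrupt that value.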

Related

Why does the number of returned samples where name='keyword' not match the number of observed samples with 'keyword' in the table?

I have a Postgres table whose header is [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] with a total of 237K entries.
When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name' and thus map to the same sample.
In my peculiar case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when visually inspecting the table through PgAdmin. That is to say all entries which have arg_name='Time' also have the name='TimeM12nS' and vice-versa.
It's obvious there's a problem because the number of returned samples is not the same. I first noticed the problem using the Django ORM, but it is also present when I query the DB using pgAdmin.
SELECT *
FROM TableA
WHERE name='TimeM12nS'
returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)
BUT the query:
SELECT *
FROM TableA
WHERE arg_name='Time'
returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)
I am completely stumped. Can anyone shed some light on what's happening here?
EDIT:
I should add that the query by 'arg_name' returns the 301 entries that are returned when querying by 'name'
First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).
Indeed, the problem was an indexing issue. After re-indexing, the queries return the same number of entries, 3945, as expected.
In PostgreSQL, re-indexing a table can be done through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables, right-clicking the table name, selecting Maintenance, and pressing the REINDEX button,
or more simply by running the following command:
REINDEX TABLE table_name
PostgreSQL re-indexing docs
Without access to the database, it's not possible to give a definitive answer. All I can provide is the next query that I would run in this case.
SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;
This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.
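As a rough illustration of what that query can reveal, the sketch below runs it against an in-memory SQLite database rather than Postgres. The three rows are invented, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (name TEXT, arg_name TEXT)")
# Made-up rows: the third name carries a trailing space that is easy to miss.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [("TimeM12nS", "Time"), ("TimeM12nS", "Time"), ("TimeM12nS ", "Time")],
)

rows = conn.execute(
    """
    SELECT COUNT(*), LENGTH(name), name, arg_name
    FROM TableA
    WHERE arg_name = 'Time'
    GROUP BY name, arg_name
    ORDER BY LENGTH(name)
    """
).fetchall()

for count, length, name, arg_name in rows:
    # repr() makes trailing whitespace visible in the printed value.
    print(count, length, repr(name), arg_name)
```

Two groups with different lengths for what looks like the same name would point to hidden characters; a single group would suggest looking elsewhere, such as at the index.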

Retrieving or deleting a row with a blob in Informix 10

I'm using Informix 10. Using this command as suggested by the documentation (well - the closest doc I could find):
select lotofile(ctbufdata, "foo!", "client") from trg_send_stxn where ctstamp1=60004300
(to database syscdr), it gives the error:
7420: Argument (1: lo_id) is invalid.
The same error occurs if I try to unload to filename select * from...
If I try to delete the row with delete from trg_send_stxn where... , the error is:
(U00001) - blob_destroy: error during processing or invalid LO argument
How can I view this blob? (I want to view its contents to figure out where it came from).
Or, how can I delete it or otherwise recover from this apparent corruption?
As suggested in answers -- the command
select ("0x" || substr(ctbufdata::lvarchar,17,8))::INT sbspace
from trg_send_stxn
where ctstamp1=60004300
produces result 0. And dropping the where clause produces 418 rows all 0.
Since trg_send_stxn is an ER queue table, can we assume this is related to your other, ER related question (Informix 10 replication queues not moving)? If so, is this on server A or B - I'd suspect B since this table is send queue related and you're reporting a problem from B to A?
In any case, this sounds like some sort of data corruption, either within the trg_send_stxn table's row (my guess), or in the actual sblob metadata pointed to by this row (ctbufdata column), i.e. in the sbspace.
Try selecting that row and displaying the LO pointer using the technique described here (share the output here).

Extracting person names from a free-form text field

I have a large table of 30 million records that contains a free-form text field which may contain names in any position and with any salutation, or no salutation at all.
My job is to mask out the names with Xxxxx Xxxxx to preserve privacy.
I have access to a large surnames database that defines for me what constitutes a name.
Using SQL Server 2012, what is the most efficient technique I can use for this task?
EDIT
Okay, I've got something working pretty decently that involves a Full-Text index/search, the names database, and a stored procedure.
However, I've run into a rather peculiar problem. I'm using a CONTAINS predicate, CONTAINS([textvaluefield], @namestring), where SET @namestring = 'NEAR((Dr.,'+@name+'), 1, TRUE)'.
This works perfectly except when the salutation in the [textvaluefield] is "DR." instead of "Dr.", i.e. "DR. Johnson" is not getting picked up, yet "Dr. Johnson" is. I've verified this because if I change the value in the [textvaluefield] of a record from "DR." to "Dr.", yet leave everything else the same, that record will suddenly get picked up. If I revert the record to use "DR.", it will not get picked up again.
What makes this bizarre is that I'm definitely using a case-insensitive collation (Latin1_General_CI_AS). Anyone have any ideas?
First, verify that you don't have any matching records in your "stopwords" tables:
SELECT * FROM sys.[fulltext_system_stopwords] AS FSS WHERE [stopword] LIKE 'Dr_'
SELECT * FROM sys.[fulltext_stopwords] AS FS
I have also encountered a similar issue and resolved it by creating a schema-bound view on the tables and columns needed, explicitly creating a column using the LOWER function.
CREATE VIEW [User].[UserValues]
WITH
SCHEMABINDING
AS
SELECT
[UserId]
, [UserName]
, LOWER([UserName]) AS [LoweredUsername]
FROM
[User].[Values]
Don't forget to add a unique clustered index for the full text to use.

Explanation of particular sql injection

Browsing through the more dubious parts of the web, I happened to come across this particular SQL injection:
http://server/path/page.php?id=1+union+select+0,1,concat_ws(user(),0x3a,database(),0x3a,version()),3,4,5,6--
My knowledge of SQL - which I thought was half decent - seems very limited as I read this.
Since I develop extensively for the web, I was curious to see what this code actually does and more importantly how it works.
It turns an improperly written, non-parameterized query like this:
$sql = '
SELECT *
FROM products
WHERE id = ' . $_GET['id'];
with this query:
SELECT *
FROM products
WHERE id = 1
UNION
select 0,1,concat_ws(user(),0x3A,database(),0x3A,version()),3,4,5,6
, which gives you information about the database name, version and username connected.
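The difference between pasting input into the SQL text and binding it as a parameter can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the two-column products table is invented, and sqlite_version() stands in for MySQL's version().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table; the real one is unknown.
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Doodad Foo')")

# Attacker-controlled value, shaped like the id parameter in the URL above.
user_input = "1 UNION SELECT 0, sqlite_version() --"

# Vulnerable: the input is pasted into the SQL text and parsed as SQL.
leaky = conn.execute(
    "SELECT * FROM products WHERE id = " + user_input
).fetchall()

# Safer: the input is bound as a value, so it is only compared with id.
safe = conn.execute(
    "SELECT * FROM products WHERE id = ?", (user_input,)
).fetchall()

print(leaky)  # the product row plus an extra row exposing the version
print(safe)   # no rows: the whole string is treated as one value
```

With the bound parameter, the UNION never reaches the SQL parser, which is why parameterized queries are the usual defence here.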
The injection result relies on some assumptions about the underlying query syntax.
What is being assumed here is that there is a query somewhere in the code which will take the "id" parameter and substitute it directly into the query, without bothering to sanitize it.
It's assuming a naive query syntax of something like:
select * from records where id = {id param}
What this does is result in a substituted query (in your above example) of:
select * from records where id = 1 union select 0, 1 , concat_ws(user(),0x3a,database(),0x3a,version()), 3, 4, 5, 6 --
Now, what this does that is useful is that it manages to grab not only the record that the program was interested in, but also it UNIONs it with a bogus dataset that tells the attacker (these values appear separated by colons in the third column):
the username with which we are
connected to the database
the name of the database
the version of the db software
You could get the same information by simply running:
select concat_ws(user(),0x3a,database(),0x3a,version())
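directly at a SQL prompt. Note that CONCAT_WS takes the separator as its first argument, so as written user() acts as the separator; with the colon first, concat_ws(0x3a, user(), database(), version()), you'll get something like: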
joe:production_db:mysql v. whatever
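One detail worth checking is the argument order: MySQL's CONCAT_WS treats its first argument as the separator and skips NULL values. The Python function below is a rough stand-in for that behaviour, not MySQL itself, and the user, database and version strings are invented.

```python
def concat_ws(separator, *values):
    """Rough stand-in for MySQL's CONCAT_WS: the first argument is the
    separator, NULL (None) values are skipped, and a NULL separator
    gives NULL."""
    if separator is None:
        return None
    return separator.join(v for v in values if v is not None)


# Invented stand-ins for user(), database() and version().
user, database, version = "joe", "production_db", "5.0.51"

# Argument order as written in the injection: user() is the separator.
as_written = concat_ws(user, ":", database, ":", version)

# Colon passed first, which yields the colon-separated form.
colon_first = concat_ws(":", user, database, version)

print(as_written)
print(colon_first)
```

Either order still leaks all three values; only the layout of the resulting string differs.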
Additionally, since UNION typically removes duplicates by sorting (though the result order is not guaranteed), and the first column in the bogus data set starts with a 0, chances are pretty good that your bogus result will be at the top of the list. This is important because the program is probably only using the first result, or there is an additional little bit of SQL in the basic expression I gave you above that limits the result set to one record.
The reason for the above noise (e.g. the select 0,1,...etc) is that in order for this to work, the statement you are calling the UNION with must have the same number of columns as the first result set. As a consequence, the above injection attack only works if the corresponding record table has 7 columns. Otherwise you'll get a column-count error and this attack won't really give you what you want. The double dashes (--) are just to make sure anything that might happen afterwards in the substitution is ignored, and I get the results I want. The 0x3a garbage is just saying "separate my values by colons".
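The column-count requirement can be reproduced with a small sketch using Python's sqlite3 module. The three-column records table and the try_union helper are hypothetical, and SQLite's error text differs from MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column table standing in for the page's real one.
conn.execute("CREATE TABLE records (id INTEGER, name TEXT, type TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'Doodad Foo', 'Shiny')")


def try_union(padding):
    """Run the base query UNIONed with a SELECT of literal filler values."""
    sql = "SELECT * FROM records WHERE id = 1 UNION SELECT " + padding
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return "error: " + str(exc)


print(try_union("0, 1"))     # two columns against three: rejected
print(try_union("0, 1, 2"))  # three columns against three: accepted
```

This is why the filler values are adjusted by trial and error until the query stops failing.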
Now, what makes this query useful as an attack vector is that it is easily re-written by hand if the table has more or fewer than 7 columns.
For example if the above query didn't work, and the table in question has 5 columns, after some experimentation I would hit upon the following query url to use as an injection vector:
http://server/path/page.php?id=1+union+select+0,1,concat_ws(user(),0x3a,database(),0x3a,version()),3,4--
The number of columns the attacker is guessing is probably based on an educated look at the page. For example if you're looking at a page listing all the Doodads in a store, and it looks like:
Name | Type | Manufacturer
Doodad Foo | Shiny | Shiny Co.
Doodad Bar | Flat | Simple Doodads, Inc.
It's a pretty good guess that the table you're looking at has 4 columns (remember there's most likely a primary key hiding somewhere if we're searching by an 'id' parameter).
Sorry for the wall of text, but hopefully that answers your question.
This code adds an additional UNION query to the SELECT statement that is being executed on page.php. The injector has determined that the original query has 7 fields, hence the selection of numeric filler values (column counts must match in a UNION). The concat_ws just builds one field from the database user, the database name, and the version, separated by colons.
It seems to retrieve the user used to connect to the database, the name of the current database, and the server version. The values would then presumably be displayed by the page, for example in an error message.

String or binary data would be truncated -- Heisenberg problem

When you get this error, the first thing you ask is, which column? Unfortunately, SQL Server is no help here. So you start doing trial and error. Well, right now I have a statement like:
INSERT tbl (A, B, C, D, E, F, G)
SELECT A, B * 2, C, D, E, q.F, G
FROM tbl
,othertable q
WHERE etc etc
Note that
Some values are modified or linked in from another table, but most values are coming from the original table, so they can't really cause truncation going back to the same field (that I know of).
Eliminating fields one at a time eventually makes the error go away, if I do it cumulatively, but — and here's the kicker — it doesn't matter which fields I eliminate. It's as if SQL Server is objecting to the total length of the row, which I doubt, since there are only about 40 fields in all, and nothing large.
Anyone ever seen this before?
Thanks.
UPDATE: I have also done "horizontal" testing, by filtering out the SELECT, with much the same result. In other words, if I say
WHERE id BETWEEN 1 AND 100: Error
WHERE id BETWEEN 1 AND 50: No error
WHERE id BETWEEN 50 AND 100: No error
I tried many combinations, and it cannot be limited to a single row.
Although the table had no keys, constraints, indexes, or triggers, it did have statistics, and therein lay the problem. I killed all the table's stats using this script
http://sqlqueryarchive.blogspot.com/2007/04/drop-all-statistics-2005.html
And voila, the INSERT was back to running fine. Why are the statistics causing this error? I don't know, but that's another problem...
UPDATE: This error came back even with the stats deleted. Because I was convinced that the message itself was inaccurate (there is no evidence of truncation), I went with this solution instead:
SET ANSI_WARNINGS OFF
INSERT ...
SET ANSI_WARNINGS ON
Okay, it's more of a hack than a solution, but it allows me — and hopefully someone else — to move on to other things.
Is there a reason you can't simply cast the fields as the structural equivalent of their destination column like so:
Select Cast(A as varchar(42))
, Cast(B * 2 as Decimal(18,4))
, Cast(C As varchar(10))
...
From Table
The downside to this approach is that it will truncate the text values at their character limit. However, if you are "sure" that this shouldn't happen, then no harm will come.
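If the rows can be pulled into a script first, the "which column?" question can also be answered before the INSERT runs. The sketch below is one possible approach, assuming the destination limits are known; the limits dict mirrors the varchar sizes in the casts above, and find_overflows and the sample rows are made up.

```python
def find_overflows(rows, max_lengths):
    """Return (row_index, column, actual_length, limit) for every value
    that is longer than the limit given for its column."""
    problems = []
    for index, row in enumerate(rows):
        for column, limit in max_lengths.items():
            value = row.get(column)
            if value is not None and len(str(value)) > limit:
                problems.append((index, column, len(str(value)), limit))
    return problems


# Assumed limits: A is varchar(42) and C is varchar(10), as in the casts.
limits = {"A": 42, "C": 10}
rows = [
    {"A": "short", "C": "ok"},
    {"A": "short", "C": "eleven chars"},
]

print(find_overflows(rows, limits))
```

An empty result would support the suspicion that the error message is misleading and that the cause lies elsewhere, such as a default value or a trigger.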
In some cases the culprit is a different column that has a default value.
For example, you might have added a column to track the user who created the row, such as USER_ENTERED with a default value of suser_sname(), where the column length is less than the length of the current username.
There is a maximum row size limit in SQL Server 2005. See here.
Most of the time you'll run into this w/lots of nvarchar columns.
Yes, when I ran into this, I had to create another table/tables which mimic the current structure. I then did not change the code, but changed my data type sizes to all nvarchar (MAX) for each field till it stopped, then eliminated them one by one. Yes long and dragged out but I had major issues trying anything else. Once I tried a bunch of stuff that was causing too much of a headache I just decided to take the "Cave Man" Approach as we laughed about it later.
Also I have seen a similar issue with FKs, where you must ask:
What are the foreign key constraints? Are there any?
Since there aren't any, try this guy's DataMgr component:
http://www.bryantwebconsulting.com/blog/index.cfm/2005/11/21/truncated
Also check this out:
http://forums.databasejournal.com/showthread.php?t=41969
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=138456
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97349
You could also dump the table you are selecting from to a temp table and find out what line gives the error if it errors out to a temp table.
It should tell you the line in the error, if you put each column on another line, it should tell you exactly where it is bombing.
If it looks like "total length" then do you have audit trigger concatenating the columns for logging?
Edit: after your update, I really would consider the fact you have a trigger causing this...
Edit 2, after seeing your statistics answer...
Because the total stats attribute length was probably greater than 900 bytes?
Not sure if this applies to statistics though and I'm not convinced.
Do you have a reference, please? I'd like to know why stats would truncate when they are simply binary histograms (IIRC).