How to write this kind of SQL in MySQL?

I want to update a column col in table tab, whose data looks like the following (comma-separated, with a leading comma):
,test,oh,whatever,....,
The value can be too long to display. How can I update the column so that only the first 10 words are left?

You are looking for SUBSTRING_INDEX:
UPDATE tab
SET col = SUBSTRING_INDEX(col, ',', 11)
Because of the leading comma the first field is empty, so the count has to be 11 to keep 10 words.
(Do check your UPDATEs with a SELECT before running them.)
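To see why the count is 11 and not 10, here is a minimal Python sketch that imitates what SUBSTRING_INDEX does for a positive count. The helper name substring_index and the sample row are made up for illustration; this is not MySQL code, only a model of its behaviour.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Imitate MySQL SUBSTRING_INDEX for a positive count:
    keep everything to the left of the count-th delimiter."""
    return delim.join(text.split(delim)[:count])

# Hypothetical row: leading comma, 15 words, trailing comma.
row = "," + ",".join(f"w{i}" for i in range(1, 16)) + ","

# The leading comma produces an empty first field, so 11 fields = 10 words.
trimmed = substring_index(row, ",", 11)
print(trimmed)
```

Note that the result keeps the leading comma but loses the trailing one, which matches what the UPDATE would store.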

Not an answer to your question, but I would recommend doing this kind of thing at the application level, not in the database.
You don't say what language you are using. In PHP, this would be a job for the wordwrap function. It is able to intelligently chop off strings at the right position.
Alternatively, is storing the full string in the database and doing the cutting at output time not an option as well?

Related

Alternative for regexp_replace for BIGINT

I'm quite new at programming with Oracle and DB2 and have a question. I need to mask a field that has BIGINT as its datatype. But when I tried to execute a query with regexp_replace, I got this error: SQLCODE=-420, SQLSTATE=22018.
Is there an alternative to regexp_replace for BIGINT?
Thanks a lot!
You can 'mask' integers by replacing all digits except the first and last with zeroes, using the following code (Oracle):
select
N, -- source number
FLOOR(N/POWER(10, FLOOR(LOG(10, N)))) * POWER(10, FLOOR(LOG(10, N))) + MOD(N, 10) MASKED
from a;
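The arithmetic in that query can be checked outside the database. Below is a rough Python equivalent; the function name mask_middle_digits is invented for this sketch, and it uses the digit count instead of LOG to avoid floating-point rounding. It assumes a positive integer with at least two digits, since for a single digit the formula would add the digit to itself.

```python
def mask_middle_digits(n: int) -> int:
    """Keep the first and last digit of n, zero out everything between."""
    if n < 10:
        # Assumption of this sketch: single-digit and negative values are rejected.
        raise ValueError("expects an integer with at least two digits")
    scale = 10 ** (len(str(n)) - 1)   # same role as POWER(10, FLOOR(LOG(10, N)))
    first_digit_part = (n // scale) * scale
    last_digit = n % 10               # same role as MOD(N, 10)
    return first_digit_part + last_digit

print(mask_middle_digits(117888))
```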
Depending on the platform and version of Db2, you might consider using CREATE MASK if available. That would ensure the data is always masked without needing to do it in every application.
A quick search seems to indicate that Oracle also has similar support, but they call it redaction. Masking in Oracle seems to be tied into subsetting and exporting data from production to DEV/TEST.
Do you really need a solution for both RDBMSs?
And if you really want to roll your own, you need to provide some examples of the masked value you want returned.
EDIT
Here is a part of the code. PK_PERSON has BIGINT as datatype.
update Person.T_PERSON set PK_PERSON = REGEXP_REPLACE(PK_PERSON, '[0-9]', '*') where PK_PERSON in ('117888')
That's not going to work, you can't set a BIGINT column to a string. That's also not how masking works. Masking generally refers to a process that happens when the data is read out of the DB.

Convert all selected columns to_char

I am using oracle SQL queries in an external Program (Pentaho Data Integration (PDI)).
I need to convert all columns to string values before I can proceed with using them.
What I am looking for is something that automatically applies
select to_char(col1), to_char(col2),..., to_char(colN) from example_table;
to all columns, so that ideally you could just wrap this statement:
select * from example_table;
and all columns are automatically converted.
For explanation: I need this because PDI doesn't seem to work fine when getting uncasted DATE columns. Since I have dynamic queries, I do not know if a DATE column exists and simply want to convert all columns to strings.
EDIT
Since the queries vary and since I have a long list of them as input, I am looking for a more generic method than manually writing to_char() in front of every column.
If you are looking for a solution in PDI, you need to create a job (.kjb) containing two transformations. The first .ktr rebuilds the query and the second .ktr executes the new query.
1. First Transformation: Rebuild the query
Read the columns with a source table step (the Table Input step in your case). Write the query select * from example_table; and limit the rows to either 0 or 1. The idea here is not to fetch all the rows but to recreate the query.
Use the Meta Structure step to get the meta-structure of the table. It will fetch the list of columns coming in from the previous step.
In the Modified JavaScript step, use a small snippet of code to check whether the data type of a column is Date and, if so, wrap the column as to_char(column).
Finally, group the rows and set the result into a variable.
This is the point where the field list is recreated for you automatically. The next step is to execute the new query with this field list.
2. Second Transformation: Use the variable set above in the next step to get the result. ${NWFIELDNAME} is the variable you set with the modified columns in the above transformation.
Hope this helps :)
I have placed the code for the first ktr in gist here.
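Outside of PDI, the rebuild step of the first transformation boils down to something like the following Python sketch. It is not the code from the gist; the function name rebuild_query, the (name, type) metadata format and the default date format are all assumptions made for illustration.

```python
def rebuild_query(table, columns, date_format="YYYY-MM-DD HH24:MI:SS"):
    """Build a SELECT that wraps DATE columns in to_char and leaves
    the other columns untouched.

    columns: list of (column_name, data_type) pairs, e.g. taken from
    the metadata of a zero-row 'select *' (hypothetical input shape).
    """
    select_list = []
    for name, data_type in columns:
        if data_type.upper() == "DATE":
            # Keep the original column name via an alias.
            select_list.append(f"to_char({name}, '{date_format}') as {name}")
        else:
            select_list.append(name)
    return f"select {', '.join(select_list)} from {table}"

print(rebuild_query("example_table", [("col1", "NUMBER"), ("col2", "DATE")]))
```

The returned string plays the role of the value stored in the variable that the second transformation executes.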
select TO_CHAR(*) from example_table;
You should not use * in your production code, it is a bad coding practice. You should explicitly mention the column names which you want to fetch.
Also, TO_CHAR(*) makes no sense. How would you convert date to string? You must use proper format model.
In conclusion, it would take a minute or two at max to list down the column names using a good text editor.
I really cannot imagine an application that does not know about the actual data types, but if you really want to automa(gi)cally convert all columns to strings, I see two possibilities in Oracle:
If your application language allows you to specify the binding type, you simply bind all your output variables to string variables. The Oracle driver then takes care of converting all types to strings; this is possible, for example, with JDBC (Java).
If (as it seems) your application language does not allow the first solution, the best way I can think of is to define a view for each select you want to use, with the appropriate TO_CHAR conversions already in place, and then select from the view. Those views could also be generated automatically from the data dictionary (user_tables) with some PL/SQL.
Please also note that TO_CHAR will convert your columns according to the NLS settings of your session, and this might lead to unwanted results, so you might also want to always specify how to convert:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD') FROM DUAL;
Using these 2 tables, you could write a procedure which looks at the columns of each table and then performs the appropriate TO_CHAR depending on the current datatype:
select * from user_tab_columns
select * from user_tables
pseudocode
begin
  loop on table -- user_tables
    loop on column -- user_tab_columns
      if current data_type = DATE then
        lnewColumn = TO_CHAR(oldColumn...)
      elsif current data_type = NUMBER then
        ...
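A rough Python rendering of that loop, generating one view per table, might look like this. It is only a sketch: the input rows stand in for what a query on user_tab_columns would return, the _STR view suffix and the function name are invented, and the NUMBER branch simply uses TO_CHAR without a format because the pseudocode above leaves that part open.

```python
from collections import defaultdict

def build_view_statements(dictionary_rows, date_format="YYYY-MM-DD"):
    """dictionary_rows: (table_name, column_name, data_type) tuples,
    standing in for rows read from user_tab_columns."""
    by_table = defaultdict(list)
    for table_name, column_name, data_type in dictionary_rows:
        if data_type == "DATE":
            expr = f"TO_CHAR({column_name}, '{date_format}')"
        elif data_type == "NUMBER":
            # Assumption: no format model for numbers in this sketch.
            expr = f"TO_CHAR({column_name})"
        else:
            expr = column_name
        by_table[table_name].append(f"{expr} AS {column_name}")
    return {
        table: f"CREATE OR REPLACE VIEW {table}_STR AS "
               f"SELECT {', '.join(cols)} FROM {table}"
        for table, cols in by_table.items()
    }

rows = [("ORDERS", "ID", "NUMBER"), ("ORDERS", "CREATED", "DATE"), ("ORDERS", "NOTE", "VARCHAR2")]
statements = build_view_statements(rows)
print(statements["ORDERS"])
```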

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back. However, the problem is that the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need to find a way to get the SELECT statement to return exactly what is to be inserted. In this case, that means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT. I'm hoping there is a better way but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
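For anyone checking the offsets, the same cut can be modelled in Python. The + 11 in the query is the length of '!markerend!'. The function name remove_marked_segment is made up for this sketch, and unlike the query it returns the text unchanged when a marker is missing.

```python
START = "!markerstart!"
END = "!markerend!"   # 11 characters, hence the "+ 11" in the STUFF call

def remove_marked_segment(text: str) -> str:
    """Drop everything from START through END, markers included."""
    start = text.find(START)
    end = text.find(END)
    if start == -1 or end == -1 or end < start:
        return text  # assumption: leave rows without both markers alone
    return text[:start] + text[end + len(END):]

sample = "Keep this. !markerstart! Drop this. !markerend! Keep this too."
print(remove_marked_segment(sample))
```

As with the STUFF version, the spaces on both sides of the removed segment survive, so the result contains a double space where the segment used to be.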
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use the STUFF function or the REPLACE function and replace the unwanted text with ''.
STUFF(EXP, START_POS, NUMBER_OF_CHARS, REPLACE_EXP)

Fastest way to find string by substring in SQL?

I have a huge table with 2 columns: Id and Title. Id is bigint and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings like "abcdefg", "q", "allyourbasebelongtous" with a maximum of 255 chars.
My task is to get strings by a given substring. Substrings also have random length and can be at the start, middle or end of strings. The most obvious way to do it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT, I need only to do fast selects. What can I do to perform search as fast as possible?
I use MS SQL Server 2008 R2, full text search will be useless, as far as I see.
If you don't care about storage, then you can create another table with partial Title entries, one for each trailing substring of the title (up to 255 entries per normal title).
In this way, you can index these substrings and match only against the beginning of each entry, which should greatly improve performance.
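The idea of that extra table can be tried out in a few lines of Python: store every trailing substring next to the row id, keep the list sorted (which is what the index does), and turn the substring search into a prefix search. The names build_suffix_rows and find_ids and the sample titles are invented for this sketch.

```python
import bisect

def build_suffix_rows(titles):
    """titles: dict of id -> title. Returns sorted (suffix, id) pairs,
    i.e. the content of the hypothetical extra table, in index order."""
    rows = [(title[i:], row_id)
            for row_id, title in titles.items()
            for i in range(len(title))]
    rows.sort()
    return rows

def find_ids(rows, needle):
    """Prefix search on the suffixes = substring search on the titles."""
    i = bisect.bisect_left(rows, (needle,))   # first suffix >= needle
    ids = set()
    while i < len(rows) and rows[i][0].startswith(needle):
        ids.add(rows[i][1])
        i += 1
    return ids

titles = {1: "abcdefg", 2: "q", 3: "xxabcyy"}
rows = build_suffix_rows(titles)
print(find_ids(rows, "abc"))
```

The trade-off is the one stated above: one row per character of every title, in exchange for searches that only touch a contiguous range of the index.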
If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-Ary tree data structure where each edge is the next character and hang each string and trailing substring in your data on it.
You number the nodes in depth first order. Then you can create a table with up to 255 rows for each of your records, with the Id of your record, and the node id in your tree that matches the string or trailing substring. Then when you do a search, you find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.
Sounds like you've ruled out all good alternatives.
You already know that your query
SELECT * FROM t WHERE TITLE LIKE '%abc%'
won't use an index, it will do a full table scan every time.
If you were sure that the string was at the beginning of the field, you could do
SELECT * FROM t WHERE TITLE LIKE 'abc%'
which would use an index on Title.
Are you sure full text search wouldn't help you here?
Depending on your business requirements, I've sometimes used the following logic:
Do a "begins with" query (LIKE 'abc%') first, which will use an index.
Depending on if any rows are returned (or how many), conditionally move on to the "harder" search that will do the full scan (LIKE '%abc%')
Depends on what you need, of course, but I've used this in situations where I can show the easiest and most common results first, and only move on to the more difficult query when necessary.
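A small runnable sketch of that two-stage logic, using Python's built-in sqlite3 module only because it runs anywhere; it is not SQL Server, and the function name staged_search and the sample data are made up. The needle is also assumed not to contain LIKE wildcards.

```python
import sqlite3

def staged_search(conn, needle):
    """Try the cheap 'begins with' query first; fall back to the
    full 'contains' scan only when it returns nothing."""
    rows = conn.execute(
        "SELECT Id, Title FROM t WHERE Title LIKE ? ORDER BY Id",
        (needle + "%",),
    ).fetchall()
    if rows:
        return rows
    return conn.execute(
        "SELECT Id, Title FROM t WHERE Title LIKE ? ORDER BY Id",
        ("%" + needle + "%",),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "abcdefg"), (2, "q"), (3, "xxabcyy")],
)
print(staged_search(conn, "abc"))
```

Note the consequence for "abc": row 3 contains it too, but is never returned because the first stage already found a match. Whether that is acceptable is exactly the business-requirements question mentioned above.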
You can add another computed column on the table: titleLength as len(title) PERSISTED. This would store the length of the "title" column. Create an index on this.
Also, add another computed column called ReverseTitle as Reverse(title) PERSISTED.
Now when someone searches for a keyword, check whether the length of the keyword is the same as titleLength. If so, do an "=" search. If the length of the keyword is less than titleLength, then do a LIKE. But first do a title LIKE 'abc%', then do a reverseTitle LIKE 'cba%'. This is similar to Brad's approach, i.e. you do the next, more difficult query only if required.
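The decision order described here can be written down as a short Python sketch for a single title. The function name match_kind is hypothetical; in the database each branch would be a separate indexed query, not a per-row check.

```python
def match_kind(title: str, keyword: str):
    """Return which of the cheap checks matches, or None if only a
    full '%keyword%' scan could still find the keyword."""
    if len(keyword) > len(title):
        return None
    if len(keyword) == len(title):
        return "exact" if title == keyword else None        # "=" search
    if title.startswith(keyword):
        return "prefix"                                     # title LIKE 'abc%'
    if title[::-1].startswith(keyword[::-1]):
        return "suffix"                                     # reverseTitle LIKE 'cba%'
    return None

print(match_kind("abcdefg", "efg"))
```

Keywords that sit in the middle of a title fall through all three checks, so this ordering speeds up the common cases without removing the need for the slow query.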
Also, if the 80-20 rule applies to your keywords/substrings (i.e. if most of the searches are on a minority of the keywords), then you can also consider doing some sort of caching. For example, say you find that many users search for the keyword "abc" and this keyword search returns records with ids 20, 22, 24, 25. You can store this in a separate table and have it indexed.
And now when someone searches for a new keyword, first look in this "cache" table to see if the search was already performed by an earlier user. If so, no need to look again in main table. Simply return results from "cache" table.
You can also combine the above with SQL Server full-text search (assuming you have a valid reason not to use it on its own). You could use full-text search first to shortlist the result set, and then run a SQL query against your table to get exact results, using the Ids returned by the full-text search as a parameter along with your keyword.
All this is obviously assuming you have to use SQL. If not, you can explore something like Apache Solr.
Create an indexed view: create an index on the column that you need to search and then use that view in your search, which should give you faster results.
Use an ASCII charset with a clustered index on the char column.
The charset influences the search performance because of the data size in both RAM and on disk. The bottleneck is often I/O.
Your column is at most 255 characters long, so you can use a normal index on your char field rather than full text, which is faster. Do not select unnecessary columns in your select statement.
Lastly, add more RAM to the server and increase the cache size.
Do one thing: put the primary key on the specific column and index it as a clustered index.
Then search using any method (wildcard, = or anything else). It will search optimally because the table is already clustered, so it knows where to look (because the column is already sorted).

Removing extraneous characters in column using T-SQL

I am attempting to remove extraneous characters from data in a primary key column. The data in this column serves as a control number, and the extra characters are preventing a Web application from effectively interacting with the data.
As an example, one row may look like this:
ocm03204415 820302
I want to remove everything after the space, i.e. the characters '820302'. I could do it manually, but there are around 2,000 records that have these extra values in the column. It would be great if I could remove them programmatically. I can't do a simple Replace because the characters have no pattern, so I couldn't define a rule to discover them. The only thing uniform is the space, although, now that I look at the data set, they do all start with 8.
Is there a way I could remove these characters programmatically? I am familiar with PL/SQL in the Oracle environment, and was wondering if Transact-SQL would offer some possibilities in the MS-SQL environment?
Thanks so much.
You may want to look into the CHARINDEX function to find the space. Then you can use SUBSTRING to grab everything up to the space in a single UPDATE statement.
Try this:
UPDATE YourTable
SET YourColumn = LEFT(YourColumn,CHARINDEX(' ',YourColumn)-1)
WHERE CHARINDEX(' ',YourColumn) > 1
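To preview what that UPDATE does to individual values before running it, the same rule can be mimicked in Python. The function name strip_after_space is made up; the pos > 0 test mirrors the WHERE CHARINDEX(...) > 1 condition (CHARINDEX is 1-based, str.find is 0-based).

```python
def strip_after_space(value: str) -> str:
    """Keep everything before the first space; leave the value alone
    if there is no space or if it starts with one."""
    pos = value.find(" ")
    return value[:pos] if pos > 0 else value

print(strip_after_space("ocm03204415 820302"))
```

Since the column is a primary key, it is worth checking first that the shortened values do not collide with each other or with existing keys.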