Full text search with special characters - sql

I have a table with values such as "F-10" or "Jim-beam". Is there a way for me to get these results if a user had searched say "F10" or "Jimbeam"? Basically, the user may not know there is a dash in the entries but I want the search to be forgiving enough to find it.
Right now I'm trying to use:
SELECT *
FROM [table]
WHERE CONTAINS([table].*, '"F10*" OR "Jimbeam*"')

You can collect the value(s) entered by the user into an array, then use LIKE '%value1%' OR LIKE '%value2%' for each value in the array. It could be a solution.
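For instance, a rough sketch of that idea, assuming SQL Server; the Products table, the Name column, and the two variables are hypothetical stand-ins for the real table and the user's values:

-- A rough sketch, assuming SQL Server; Products/Name and the two variables
-- are placeholders for the real table and the values taken from the user.
DECLARE @value1 nvarchar(50) = N'F10';
DECLARE @value2 nvarchar(50) = N'Jimbeam';

SELECT *
FROM Products
WHERE Name LIKE '%' + @value1 + '%'
   OR Name LIKE '%' + @value2 + '%';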

You could try to replace the values in the database with values that do not contain any special characters. I used the function described in this answer before: https://stackoverflow.com/a/1008566/894974
So with that function installed, your where-clause would become:
where dbo.RemoveNonAlphaCharacters([columnToSearch]) like '%' + @searchString + '%'

You can create a separate column where dashes are removed from these words, then perform your full text searches against that column.
For example, your table could look like this (or the new column could be part of a new table):
Id   Text        TextForFullTextSearch
---------------------------------------
1    Jim-beam    Jimbeam
2    F-10        F10
3    blah blah   blah blah
If you need to support searches on both "Jimbeam*" and "Jim-beam*" then you could perform the full text search against both the old and new columns.
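For example, a minimal sketch assuming SQL Server with a full-text index covering both columns (MyTable is a placeholder name; the columns follow the example table above):

-- A minimal sketch, assuming SQL Server; MyTable is a placeholder and the
-- full-text index is assumed to cover both columns.
SELECT *
FROM MyTable
WHERE CONTAINS(([Text], TextForFullTextSearch), '"Jim-beam*" OR "Jimbeam*"');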
This does require you to store the text twice, so there will be more moving parts in your process. The benefits will be in search accuracy and performance (LIKE will be much slower), so you'll have to weigh those benefits against the increased complexity.
Some ideas for populating the new column as data is inserted and updated:
Handle this in your data layer, i.e. all insert and update statements should include both Text and TextForFullTextSearch.
Add an insert/update trigger to the table that, whenever Text is inserted or updated, simultaneously updates TextForFullTextSearch (see the sketch after this list).
Create an automated job that continually polls the table for inserts/updates, then updates TextForFullTextSearch accordingly.
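A rough sketch of the trigger option, assuming SQL Server and the example table above (MyTable is a placeholder name, and only dashes are stripped here):

-- A rough sketch of the trigger idea, assuming SQL Server; MyTable is a
-- placeholder, and only the dash is removed when populating the new column.
CREATE TRIGGER trg_MyTable_SyncFullTextColumn
ON MyTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET t.TextForFullTextSearch = REPLACE(i.[Text], '-', '')
    FROM MyTable AS t
    INNER JOIN inserted AS i ON i.Id = t.Id;
END;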


PowerApps filter returning incomplete data record...?

I have an Azure SQL database, and the records inside the table Spiderfood_RITMData in that database include 13 different fields. Lots of stuff. I have confirmed in SQL-SMS that the records have data in each field.
There are way more items in the database than PowerApps can see using LOOKUP (1600-9000 records or more). However, I know FOR A FACT that there is only ONE record that has any given value in the NUMBER column. It's not a primary key, but it is unique in the table.
In PowerApps, I am trying to pull that field so that I can eventually parse out the individual items.
So, the commands I'm trying are:
ClearCollect(MLE_test1, Filter('Spiderfood_RITMData', "RITM2170467" in Number));
ClearCollect(MLE_test2, Search('Spiderfood_RITMData',"RITM2170467", "Number"));
However, the Collection results for MLE_test1 and MLE_test2 both are empty EXCEPT for the value of NUMBER. Say what?!
I'm trying to use the examples posted on https://learn.microsoft.com/en-us/powerapps/maker/canvas-apps/functions/function-filter-lookup but I am honestly getting baffled by this.
How should I be formatting this call such that I can pull the whole record?
Big picture explanation: I need to do a lot of data LOOKUPS into my Spiderfood_RITMData table, but it has way more than 2000 rows, and PowerApps will not perform the Lookup correctly. So my presumably smart idea is to create a MUCH SMALLER "version" of Spiderfood_RITMData as a local collection, using a more delegable function (such as FILTER or IN). If I filter for all records containing the values of NUMBER, then I go from, say, a 10,000-record SQL table to a 10-record Collection. And I can do LOOKUPS against that collection for the rest of the function (uh, I think -- I'm still trying to experiment accordingly). Please let me know if this is crazy or not.
LookUp is only used to get one record; instead, try this:
ClearCollect(MLE_test1, Filter('Spiderfood_RITMData', "RITM2170467" = Number));
This gets a collection with all the items where Number equals "RITM2170467".
Collections are limited to 2000 records each.
I had the same issue. Go to App settings. Under Upcoming Features, make sure Explicit column selection is turned off. Hope this does it for you.

Separating columns (array of arrays) - Advanced SQL looping

I tried using a name that more accurately describes my question, but the message said I am limited to 150 characters.
Looking for assistance from someone who has advanced SQL skills. Ideally I want to do it in SQL to let the computer do the work. Too much manual manipulation is ripe with the possibility of mistakes.
I've already searched for user groups within Google. All emails are being returned saying the address does not exist anymore.
What I am using appears to be a proprietary version of Dremel SQL / Google SQL; however, someone experienced in Dremel SQL will probably be able to guide me in the right direction.
BACKGROUND INFO:
Pulling a column that is an array column which holds another array (a notes column). I think maybe an array of arrays?
I have not figured a way to do what I am trying to do with Google or Dremel SQL yet.
So for now, I am doing it the hard way.
As originally pulled, the data looks like this [{Array of arrays}, {Array of arrays}, {Array of arrays}, etc., repeat... :
More specifically: [{4 or more text fields which could also hold numbers and separated by commas}, {another set of fields}, {another set of fields}...]
I.e. (this is all in just one column of data, across hundreds of rows):
[
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":534, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":1224, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":1664, "User_email":"someone#emailaddress.com","user_shortname":"someone"}
...
]
The number of these entries is different for each row pulled, and each row has a specific ID #.
A typical row of data is:
ID #, start_date, end_date, some other fields, notes_(the array field)
WHAT I AM DOING NOW is:
SQL data pull,
exporting to google sheets,
make separate tabs for the different array columns.
copying the notes column (the array column holding arrays) to a separate tab on Google Sheets, then
Split Text To Columns using the first curly brace "{" as the separator.
Here is where my dilemma is.
Once pulled, I need to split all of those columns again to separate each of the individual elements in each array. I am unable to Split Text to Columns again with all of them highlighted. I can Split Text to Columns again one at a time, but that will really be a pain if I have to do it individually for each column and every row (hundreds of rows). I need to find a way to automate this.
I will also need to change each of the Unix dates to calendar dates within each array, PLUS add rows to the spreadsheet depending on the number of columns from the first split. The columns are different for each row depending on how many notes have been added.
OR... do it with SQL (which appears to be a proprietary type of SQL, similar to NoSQL but not the same). I have tried using syntaxes for IBM SQL, Oracle SQL, SQL Server, and others found online, but none work.
OR... do it with a looping function within Google Sheets.
Possibly re-add it to the database as a new table once both sets of arrays are completely split up.
END RESULT
ID#, date1, date 2, first created date (right now a unix date), first note, first other field, etc...
Then add a new row with
Same ID# from above, date1 from row above, date 2 from row above, next (2nd) created date (right now a unix date), 2nd note, 2nd other field, etc...
Add a new row...
3rd set of notes etc.
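In case it helps with the SQL route, here is a rough sketch of that flattening in BigQuery Standard SQL (the closest public dialect to Dremel), assuming the notes column is a repeated record (an ARRAY of STRUCTs) and using placeholder table and column names; if the column is actually stored as a JSON string, it would need to be parsed with JSON functions first:

-- A rough sketch, assuming BigQuery Standard SQL; my_dataset.my_table and the
-- column names are placeholders, and `created` is treated as Unix milliseconds.
SELECT
  t.id,
  t.start_date,
  t.end_date,
  TIMESTAMP_MILLIS(CAST(n.created AS INT64)) AS created_date,  -- Unix ms -> timestamp
  n.notes,
  n.original_text_length,
  n.User_email,
  n.user_shortname
FROM my_dataset.my_table AS t,
     UNNEST(t.notes) AS n
ORDER BY t.id, created_date;

This produces one output row per note per ID, which matches the END RESULT layout described above.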

Best way to handle multi-valued fields as a view/grid

In several notes applications, instead of handling related data as separate documents, if the size of the data is small (less than the 32k limit), I'll make several multi-value fields and display them in what I call a "List Panel". It's a table where each column displays one multi-value field. Since fielda(1) goes with fieldb(1), which goes with fieldc(1), there is a concept of rows. (I did a similar thing in my auditing routine discussed here)
It is always assumed that each field has exactly the same number of elements.
All the multi-value fields are then stored on the single document. This avoids several coding conventions that made my eyes bleed, like having date-changed, who-changed-it, and new-value fields for each field we wanted to audit. Another thing this kept to a minimum was having to provide multiple fields for the same thing, which locked you into a limit: Taxrate1, Taxrate2, Taxrate3, etc.
In my "Listpanel" the first column is a vertical checkbox. (One for each element in my lists) This is so I can select one item to bring up and edit, or select multiple values to delete "rows" or apply some kind of mass change to them.
What would be the best way to handle this under XPages to get this functionality? I tried making a table but am having the devil of a time getting the checkboxes to line up with their corresponding data items.
Views and Dojo grids seem to assume we're using a document for each row...
This TableWalker may provide what you want: http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Tutorial-Introduction-to-XPages-Exercise-23
It was created when XPages was all very new, so it's SSJS rather than Java. But if you're comfortable with Java, converting it probably won't be a challenge.
You could use a repeat control to display the values and build a table using the table row tags in the repeat. You would want to calculate the id of the checkbox to be able to take an action on that selected row. The repeat var would be just one of your multi-value fields, and you would use the index of the repeat to get the value for that row from the other multi-value fields.

Turn string variables into numeric representatives and store the strings elsewhere?

I don't know the best way to describe my problem and I'm just looking for a push in the right direction, or where to start. I'd be perfectly happy with an answer that's a very useful link or pseudo code.
My problem: I have a database that's about to hit the MS Access hard-coded 2 GB database limit, and I don't want to split the database.
What I think is a possible solution - make the database more efficient in its data storage. I think, but don't know if this is true, that I could do this by turning some string fields into numeric fields. Stay with me...
For instance:
My database has several million records of a field we'll call TooLongString
Each value is about 50 characters
Every record has a value for this field
There are only 9 possible values for TooLongString
Would it decrease my database size to instead store a number that represents one of the 9 possible values and store the text value in a small table? (So go from 50 characters to 1 character, several million times.)
Did I explain my issue correctly? Is my potential solution actually a solution? How would I go about doing this?
Thanks!
The short answer is yes, that would reduce the size of your database. You could have a second table that holds the nine possible values for "TooLongString" and just store the ID of the appropriate answer in the main table, as you suggested. You would then need to join these tables when pulling the data out in order to retrieve the actual text instead of the ID.
I would set up your new table first, then add a new column for the ID into your existing one. As there are only nine possible values, I'd be tempted to just manually run an UPDATE query nine times, e.g. if the first string in your new table is "MyFirstString" with ID 1, you could run "UPDATE existingTableName SET newColumn = 1 WHERE oldColumn = 'MyFirstString'". Do this for each of the nine values then you can remove the old string column from your table at the end.
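A rough sketch of those steps in Access (Jet) SQL, reusing this answer's example names (existingTableName, newColumn, oldColumn, 'MyFirstString'); StringLookup is a hypothetical name for the new lookup table:

-- A rough sketch, assuming Access (Jet) SQL; StringLookup is a hypothetical
-- name, the other identifiers follow the example in the answer above.
CREATE TABLE StringLookup (
    ID         BYTE CONSTRAINT PK_StringLookup PRIMARY KEY,
    StringText TEXT(50)
);

ALTER TABLE existingTableName ADD COLUMN newColumn BYTE;

-- One UPDATE per value (nine in total):
UPDATE existingTableName SET newColumn = 1 WHERE oldColumn = 'MyFirstString';
-- ...repeat for values 2 through 9, then drop the old string column:
ALTER TABLE existingTableName DROP COLUMN oldColumn;

-- Join the tables when you need the actual text back:
SELECT e.*, s.StringText
FROM existingTableName AS e INNER JOIN StringLookup AS s ON e.newColumn = s.ID;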

Find or Strip Invalid characters from Database

We are using a database where the front end software has allowed the input of invalid characters. (I have no control or re-writing of the software.)
The problem characters are carriage returns, line breaks, �, ¶, and basically anything that is not 0-9, a-z, or standard punctuation; they cause us issues with the database and how we use the data.
I'm looking for a way to scan the entire database to identify these invalid codes and either display them as results or strip them out?
I had been looking at This site, wondering if there was a way of searching for a certain range, but I might be barking up the wrong tree.
I'm fairly new to SQL so be gentle with me, thanks.
The only way I could think to do this would be to write a stored procedure which uses the system tables to get a list of all fields in the database/schema in question. Have it exclude system tables (or only include those that are user defined), then dynamically write out SQL update statements based on the columns/tables found in those queries, using regular expressions or character removal like in this article.
The system tables in question are:
SELECT table_name, column_name
FROM information_schema.columns
Pseudocode would be:
Get list of tables we want to do this for
For each table in list
    Get list of columns for that table that have string data
    For each column in table
        Generate update statement to strip unwanted characters
        -- Consider writing out table, column key, and before/after values to a history table, in case this has to be undone
        -- Consider a counter so I have an idea of what was updated
        Execute update statement
    Next column
Next table
Write out counter
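Fleshed out, that pseudocode might look roughly like this in T-SQL (assuming SQL Server). The nested REPLACE only strips carriage returns and line feeds; swap in a fuller cleanup (such as the character-removal approach linked above) for other characters, and review the PRINT output before executing anything:

-- A rough sketch, assuming SQL Server; only CR/LF are stripped here.
DECLARE @schema sysname, @tbl sysname, @col sysname, @sql nvarchar(max);

DECLARE col_cursor CURSOR FOR
    SELECT c.table_schema, c.table_name, c.column_name
    FROM information_schema.columns AS c
    JOIN information_schema.tables AS t
      ON t.table_schema = c.table_schema AND t.table_name = c.table_name
    WHERE t.table_type = 'BASE TABLE'
      AND c.data_type IN ('char', 'varchar', 'nchar', 'nvarchar');

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @schema, @tbl, @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'UPDATE ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@tbl)
             + N' SET ' + QUOTENAME(@col) + N' = REPLACE(REPLACE('
             + QUOTENAME(@col) + N', CHAR(13), ''''), CHAR(10), '''');';
    PRINT @sql;                    -- review the generated statements first
    -- EXEC sp_executesql @sql;    -- then uncomment to run the updates
    FETCH NEXT FROM col_cursor INTO @schema, @tbl, @col;
END
CLOSE col_cursor;
DEALLOCATE col_cursor;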
Since you say "the data then moves to a second program that cannot handle these characters and this causes the process to fail":
I'm wondering if you can leave the unreadable data where it is and create a new column for changed data that's only populated if/when the 2nd process fails. You'll still have to test every character of the data in the failed cell, but you wouldn't have to test every character of every row. After you determine the updated text to process, you can call the 2nd process again with the updated value.