Search for string in pandas data frame and get column index - pandas

I am trying to get a column index by searching for a string. I have tried code like `Units_Conv.columns.get_loc(Current_Unit)`, where Current_Unit is a string variable, but it gives me an error:
Error Message
My dataframe is as below:
Data Frame Screen Shot
Any help would be appreciated.
I want to clarify my question again with following:
I want to search for a text and get the column index. In the example, I am searching for 'kg / hr' (note the lower case), expecting it to match the column named 'KG / HR' and return that column's index, i.e. 6. Ultimately I am looking for an index. I also found out that I need to search for the text in all columns except the first one (index 0). I hope we can find a solution.
Thanks

df.columns will give you a list of column names; it has nothing to do with filtering and searching.
Use `df[df['KG / HR'] == 'string']` to make a filter, and make sure to write the column name in UPPER case, because Python distinguishes between upper and lower case.
If you want to just get the index of a matching string, then use:
`df.index[df['KG / HR'] == current_unit].tolist()`
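The answer above filters rows, but the question actually asks for a case-insensitive *column* lookup. Here is a minimal sketch of that lookup; the dataframe below is a made-up stand-in for Units_Conv, since the original screenshot is not available, so the returned index is 4 here rather than the 6 mentioned in the question:

```python
import pandas as pd

# Hypothetical dataframe standing in for Units_Conv from the question;
# the real column layout comes from a screenshot that is not available.
df = pd.DataFrame(columns=["Unit", "G / MIN", "KG / MIN", "G / HR", "KG / HR"])

def find_column_index(frame, text):
    """Return the positional index of the first column (skipping column 0)
    whose name equals `text` case-insensitively, or None if absent."""
    for i, name in enumerate(frame.columns):
        if i == 0:
            continue  # the question excludes the first column (index 0)
        if name.strip().lower() == text.strip().lower():
            return i
    return None

print(find_column_index(df, "kg / hr"))  # 4 in this toy frame
```

Unlike `get_loc`, which raises a KeyError on a case mismatch, this compares normalized names, so 'kg / hr' matches 'KG / HR'.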

Related

How to keep the column name in upper case after using sort_array_by in hive?

Hi all,
I encountered a problem after using the function sort_array_by() in hive. After using that function, the original column names became lower case, while the originals contained some capital letters.
My data is like:
[{"tagName":"a","valueDegree":100},{"tagName":"b","valueDegree":200}] , and the column name is cc.
What I want to do is reorder the data by the column 'valueDegree' in descending order. Namely, I want the data to look like the following:
[{"tagName":"b","valueDegree":200},{"tagName":"a","valueDegree":100}].
I used the function sort_array_by().
However, after I used sort_array_by(cc, 'valueDegree', 'DESC'), the order was correct, but the column names (tagName, valueDegree) in the struct were accidentally changed to lower case. I got a result like:
[{"tagname":"b","valuedegree":200},{"tagname":"a","valuedegree":100}]
Anyone knows how to keep the original column names?

Knime converting dataframes

I'm trying to convert this dataframe:
into this dataframe in Knime
Any suggestions?
(Assuming you do not know in advance the name of the last column.)
I believe with my HiTS extension's Unpivot node it should work with a pattern like this (you will probably need a Column Renamer/String Manipulator to adjust it):
(q\d)(.*)
In case this is really just this single input, just use the Constant Value Column node to create the quarter and timing columns, and the Column Rename/Column Resorter nodes to arrive at Dataframe2.
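The pattern above splits a column header into a quarter part and a remainder. A quick illustration of what the two capture groups extract (the header names are made up, since the original screenshots are not available):

```python
import re

# Hypothetical column headers; the original Knime screenshots are not available.
headers = ["q1 planned", "q2 actual", "q3 planned"]

# Group 1 captures the quarter (q plus one digit); group 2 captures the rest.
pattern = re.compile(r"(q\d)(.*)")
for h in headers:
    m = pattern.fullmatch(h)
    if m:
        print(m.group(1), m.group(2).strip())
```

The Unpivot node would use the same two groups to produce the quarter and timing columns.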

Searching text using CONTAINS

Using CONTAINS, I am searching for the word 'text\' followed by any string:
select * from table1
where CONTAINS (availableText, 'TEXT\%')
However, this query returns hits where there is text before the 'TEXT' string; for example, this is one false hit: 'there is no text available'.
Looking for a way to just get the hits like 'TEXT\path\...' and not 'dir\TEXT\path\..'.
I know how to do this using LIKE, but would prefer CONTAINS instead.
I doubt this is possible using CONTAINS.
CONTAINS uses a full-text index internally (a structure that can be imagined as a big index of all the words in a set of texts). This is the reason why CONTAINS is significantly faster than LIKE in most cases: it can look up your queried word(s) in the full-text index and retrieve the corresponding key(s) of the row(s) containing the text with those word(s).
LIKE scans all rows (if '%' is at the beginning of your query) or at least an index on the queried column (if your query doesn't start with a wildcard character).
In your case I would advise creating an index on the (n)varchar column and using LIKE, since you are doing prefix-based searches.
If you definitely want to use CONTAINS, you might want to apply that operator first and then use LIKE on the restricted result set.

Microsoft Access 2010 SQL Split String at "X" and Multiply

I have a table with a package-size column (data type text) that I need to convert to an integer for mathematical reasons. The values in this column typically look something like "100ML", "20GM", "UD 20", or "13OZ". Here is where it gets tricky: there are occasionally values like "6X12ML" and "UD 5X6ML". For the ones with an "X" in them I need to remove the "ML", which I'm currently doing with
Replace([TABLE_NAME].[COLUMN_NAME],"ML","")
in an expression column in a query. I can use nested Replace functions to remove the "ML", "GM", "OZ", and "UD ". All of my attempts to handle the "X" cases have failed; I figured the end solution would be something like
IIf([TABLE_NAME].[COLUMN_NAME] Like "X", (CInt(Left([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")-1))*CInt(Right([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")+1))),[TABLE_NAME].[COLUMN_NAME])
I have tried using variations of the code above to no avail. All suggestions are appreciated. I would prefer to get this knocked out in one query, but I realize I could use an expression to split the text before and after the "X" into two different expression columns, then use another query to multiply the values.
QTY_ORDERED: IIf(InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")>1,[CRX_HISTORIC_PO].[QUANTITY]/Left(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")-1)*Right(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),Len(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""))-InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X"))*-1,[CRX_HISTORIC_PO].[QUANTITY]/Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ","")*-1)
The code above is what I used to complete the task at hand.
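The strip-the-units, split-at-"X", then multiply logic buried in that Access expression is easier to see in isolation. A sketch of the same steps (the unit list ML/GM/OZ/UD comes from the question; anything outside that list is an assumption):

```python
def package_size(value):
    """Parse a package-size string such as '100ML', 'UD 20', or '6X12ML'
    into an integer, multiplying the two halves when an 'X' is present.
    Mirrors the nested Replace calls in the Access expression above."""
    cleaned = value.upper()
    for unit in ("ML", "GM", "OZ", "UD"):
        cleaned = cleaned.replace(unit, "")   # strip unit suffixes/prefixes
    cleaned = cleaned.replace(" ", "")        # drop leftover spaces
    if "X" in cleaned:
        left, right = cleaned.split("X", 1)   # e.g. '6X12' -> '6', '12'
        return int(left) * int(right)
    return int(cleaned)

print(package_size("6X12ML"))    # 72
print(package_size("UD 5X6ML"))  # 30
print(package_size("100ML"))     # 100
```

Each Replace in the Access expression corresponds to one iteration of the unit-stripping loop; InStr/Left/Right correspond to the split at "X".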

Fastest way to find string by substring in SQL?

I have a huge table with 2 columns: Id and Title. Id is bigint and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings like "abcdefg", "q", "allyourbasebelongtous", with a maximum of 255 chars.
My task is to get strings by a given substring. Substrings also have random length and can occur at the start, middle, or end of the strings. The most obvious way to do it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT, I need only to do fast selects. What can I do to perform search as fast as possible?
I use MS SQL Server 2008 R2, full text search will be useless, as far as I see.
If you don't care about storage, then you can create another table with partial Title entries, one beginning at each character of the original title (up to 255 entries per title).
This way you can index those substrings and match only against the beginning of a string, which should greatly improve performance.
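The rows that auxiliary table would hold are just the trailing substrings (suffixes) of each title. A small sketch of the idea, showing how a '%cde%' search turns into an indexable prefix search:

```python
def trailing_substrings(title):
    """All suffixes of `title`: the rows you would insert into the
    auxiliary table (each pointing back at the original row's Id)."""
    return [title[i:] for i in range(len(title))]

rows = trailing_substrings("abcdefg")
# A LIKE '%cde%' query becomes: find suffixes that START with 'cde',
# which an ordinary index on the auxiliary table can answer.
hits = [s for s in rows if s.startswith("cde")]
print(hits)  # ['cdefg']
```

The trade-off is storage: a 255-character title contributes up to 255 rows, which is exactly the "if you don't care about storage" caveat above.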
If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-Ary tree data structure where each edge is the next character and hang each string and trailing substring in your data on it.
You number the nodes in depth-first order. Then you can create a table with up to 255 rows for each of your records, containing the Id of your record and the id of the tree node that matches the string or trailing substring. When you do a search, you find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.
Sounds like you've ruled out all good alternatives.
You already know that your query
SELECT * FROM t WHERE TITLE LIKE '%abc%'
won't use an index, it will do a full table scan every time.
If you were sure that the string was at the beginning of the field, you could do
SELECT * FROM t WHERE TITLE LIKE 'abc%'
which would use an index on Title.
Are you sure full text search wouldn't help you here?
Depending on your business requirements, I've sometimes used the following logic:
Do a "begins with" query (LIKE 'abc%') first, which will use an index.
Depending on if any rows are returned (or how many), conditionally move on to the "harder" search that will do the full scan (LIKE '%abc%')
Depends on what you need, of course, but I've used this in situations where I can show the easiest and most common results first, and only move on to the more difficult query when necessary.
You can add a computed column to the table: TitleLength AS len(Title) PERSISTED. This stores the length of the Title column; create an index on it.
Also add another computed column: ReverseTitle AS Reverse(Title) PERSISTED.
Now when someone searches for a keyword, check whether the length of the keyword equals TitleLength; if so, do an "=" search. If the keyword is shorter than TitleLength, do a LIKE: first Title LIKE 'abc%', then ReverseTitle LIKE 'cba%'. Similar to Brad's approach, i.e. you run the more difficult query only if required.
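The ReverseTitle trick works because a suffix search on the title is a prefix search on the reversed title, and prefix searches can use a normal index. The equivalence in miniature:

```python
def ends_with_via_reverse(title, keyword):
    """The idea behind the ReverseTitle column: the suffix search
    Title LIKE '%abc' is equivalent to the prefix search
    ReverseTitle LIKE 'cba%', which an ordinary index can serve."""
    return title[::-1].startswith(keyword[::-1])

print(ends_with_via_reverse("mytext", "text"))  # True
print("mytext".endswith("text"))                # True: same answer
```

Between them, the 'abc%' search on Title and the 'cba%' search on ReverseTitle cover keywords anchored at either end of the title; only keywords strictly in the middle still need the full '%abc%' scan.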
Also, if the 80-20 rule applies to your keywords/substrings (i.e. most searches hit a minority of the keywords), you can consider some sort of caching. For example, say many users search for the keyword "abc" and this search returns records with ids 20, 22, 24, 25: you can store this in a separate, indexed table.
Then when someone searches for a keyword, first look in this "cache" table to see whether the search was already performed by an earlier user. If so, there is no need to look in the main table again; simply return the results from the "cache" table.
You can also combine the above with SQL Server full-text search (assuming you have a valid reason not to use it on its own): use full-text search first to shortlist the result set, then run a SQL query against your table to get exact results, using the Ids returned by the full-text search as a parameter along with your keyword.
All this obviously assumes you have to use SQL. If not, you can explore something like Apache Solr.
Create an indexed view (a newer SQL Server feature): put an index on the column that you need to search and query that view in your search; that will give you faster results.
Use an ASCII charset with a clustered index on the char column. The charset influences search performance because of the data size in both RAM and on disk; the bottleneck is often I/O.
Your column is at most 255 characters long, so you can use a normal index on your char field rather than full-text search, which is faster. Do not select unnecessary columns in your SELECT statement.
Lastly, add more RAM to the server and increase the cache size.
Do one thing: put a primary key on the specific column and index it in clustered form.
Then search using any method (wildcard, '=', or anything else); it will search optimally because the table is already clustered, so it knows where to find the value (the column is already sorted).