A column has many misspelled or similar values. How to set the correct values using some rules? - pandas

I have combined multiple dataframes, and the resulting dataframe df has three columns: 'Date', 'Product', 'Price'.
There are many rows where:
the 'Product' value is either 'Kiwi' or 'Kiwi '.
the 'Product' value is either 'apricot' or 'Apricot'.
the 'Product' value is either 'Apple / imported' or 'Apple / local'
and so on.
I am trying to apply some rules to rename the values such as:
if value contains 'Kiwi' then set the value as 'Kiwi'
if value contains 'apricot' then set the value as 'Apricot'
if value contains 'Apple' then set the value as 'Apple'
Using df.loc[:,'Product'].sort_values().unique() and examining the results, I have created a dictionary 'rename_product' containing key:value pairs, where the keys are the texts to search for, and the values are the new values that should be assigned, such as:
rename_product = {
'Kiwi' : 'Kiwi',
'apricot' : 'Apricot',
'Apple' : 'Apple'
}
How do I proceed with substituting the values?

I think pandas replace is what you are looking for.
Your implementation in the comment doesn't seem correct to me. Try implementing one of the solutions in the article I included and see if it works. You can also look at the re package. If I remember right, though, you'll need to use apply with a lambda to run it on each row in your column.
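As a minimal sketch of the rule-based renaming described in the question (not necessarily what the linked article proposes), the dictionary can be applied with a case-insensitive substring test per key. The helper name normalize_products and the sample rows are assumptions made for illustration; only rename_product and the 'Product' column come from the question.

```python
import pandas as pd

# Sample data (hypothetical) covering the variants listed in the question.
df = pd.DataFrame({
    "Product": ["Kiwi", "Kiwi ", "apricot", "Apricot",
                "Apple / imported", "Apple / local", "Pear"],
})

# Keys are the texts to search for, values are the names to assign.
rename_product = {
    "Kiwi": "Kiwi",
    "apricot": "Apricot",
    "Apple": "Apple",
}

def normalize_products(products, rules):
    """Return a copy of `products` where any value containing a key
    of `rules` (ignoring case) is replaced by the mapped value."""
    # Trim stray whitespace so 'Kiwi ' and 'Kiwi' compare equal.
    result = products.str.strip()
    # Lowercase once; every rule is matched against this original text.
    lowered = result.str.lower()
    for needle, new_value in rules.items():
        # Plain substring test (regex=False); missing values never match.
        mask = lowered.str.contains(needle.lower(), regex=False, na=False)
        result = result.mask(mask, new_value)
    return result

df["Product"] = normalize_products(df["Product"], rename_product)
```

Values that match no rule are only stripped of surrounding whitespace. Because the test is a substring match, a value such as 'Pineapple' would also be renamed to 'Apple', so the keys may need to be more specific in real data.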

Related

How to check if any item within a list is contained in a column's name? - Presto SQL

I'm wondering how I would go about checking if any item within a list is contained within the name of a column that I'm trying to select.
Suppose I have a list like: list = ['apple', 'banana', 'cat']. How can I check if a column's name contains any items in that list?
I was originally thinking something similar to:
SELECT column_name,
CASE WHEN column ('apple', 'banana', 'cat') IN column_name THEN true ELSE NULL END AS flag
FROM information_schema.columns;
But obviously this doesn't work. I was looking into the SUBSTR function as well, but I'm not sure how it would work with a list of values.
Also, what I've provided is more of a minimal reproducible example. The actual list will contain around 40 elements. I would appreciate any help, thanks!
You can use regular expressions:
SELECT column_name,
(CASE WHEN REGEXP_LIKE(column_name, 'apple|banana|cat') THEN true ELSE NULL
END) AS flag
FROM information_schema.columns;
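With around 40 elements, the alternation pattern passed to REGEXP_LIKE is easier to generate than to type. The following is a hedged sketch in Python: build_alternation is a hypothetical helper, and it assumes that the escaping produced by Python's re.escape is also acceptable to the regex engine used by the database. The local re.search check only mimics the "contains" behaviour of the query.

```python
import re

def build_alternation(items):
    """Join the items into one 'a|b|c' pattern, escaping any
    regex metacharacters so each item is matched literally."""
    return "|".join(re.escape(item) for item in items)

names = ["apple", "banana", "cat"]
pattern = build_alternation(names)

def contains_any(column_name, pattern):
    # True when any listed item occurs anywhere in the column name.
    return re.search(pattern, column_name) is not None

flags = {name: contains_any(name, pattern)
         for name in ["apple_count", "category", "price"]}
```

The resulting pattern string can then be supplied as the second argument of REGEXP_LIKE.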

Return the right content when searching a substring in a dictionary - PostgreSQL

I need the value of a substring depending on the string filtered, using PostgreSQL.
For example:
Table Users
{'Name': 'Eric', 'Age':'29', 'Weight': '80kg'}
{'Age':'41','Name': 'Alex', 'Weight': '100kg'}
{'Weight': '90kg','Age':'18', 'Name': 'Jason'}
The order of the fields is not fixed, and neither are their positions, because the strings vary in length.
Each row is a single JSON field.
So I need the value depending on the string I search for, like:
Searched string (Dummy example):
SELECT "Age" FROM Users WHERE Name = 'Jason'
Results: '18'
OR
SELECT "Age" FROM Users WHERE Name = 'Alex'
Results: '41'
I will probably use the function right (https://w3resource.com/PostgreSQL/right-function.php) together with some other function. I also tried substring (https://w3resource.com/PostgreSQL/substring-function.php), but it does not fit this case.
Don't use string methods!
Since you are dealing with a JSON datatype, you can just use the Postgres JSON accessor operator ->> to access the value of a given key:
select user_return ->> 'Age' from Users where user_return ->> 'Name' = 'Alex'
Note: keys in a JSON object have no special ordering whatsoever.
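The same point, that lookup by key does not depend on where the key sits in the text, can be sketched outside the database. This is only an illustration in Python, assuming the rows are stored as valid JSON with double quotes; age_of and raw_rows are hypothetical names.

```python
import json

# The three records from the question, with keys in different orders.
raw_rows = [
    '{"Name": "Eric", "Age": "29", "Weight": "80kg"}',
    '{"Age": "41", "Name": "Alex", "Weight": "100kg"}',
    '{"Weight": "90kg", "Age": "18", "Name": "Jason"}',
]

def age_of(name, rows):
    """Return the 'Age' of the first record whose 'Name' equals `name`,
    or None when no record matches."""
    for raw in rows:
        record = json.loads(raw)  # parse instead of slicing the string
        if record.get("Name") == name:
            return record.get("Age")
    return None

jason_age = age_of("Jason", raw_rows)
alex_age = age_of("Alex", raw_rows)
```

Parsing first and then reading by key avoids any reliance on string positions, which is what the ->> operator does inside the query.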

Search JSON column for JSON that contains a specific value

I have a PostgreSQL database with a table called choices. In the choices table I have a column called json that contains JSON entries, for example: [1,2,3]
I need a query that returns all entries that contain a specific value.
For example I have the following entries:
[1,2,3] [6,7,1] [4,5,2]
I want to get all entries that contain the value 1 so it would return:
[1,2,3]
[6,7,1]
Thanks,
demo: db<>fiddle
The json_array_elements_text function expands each JSON array into one row per element (as text). With that you can filter by any value you like.
SELECT
json_data
FROM choices, json_array_elements_text(json_data) elem
WHERE value = '1'
Documentation: JSON functions
Please note that "json" is the name of the JSON type in PostgreSQL. It is better to rename your column to avoid conflicts. (I called mine json_data.)

JCR SQL2 Multivalue properties search

I want to search the content repository using one or more values as input parameters for a multivalue property.
Something like: find all nodes with the primary type 'nt:unstructured' whose property 'multiprop' (a multivalue property) contains both values "one" and "two".
What should the queryString passed to queryManager.createQuery look like?
Thank you.
You can treat the criteria on multi-valued properties just like other criteria. For example, the following query will find all nodes that have a value of 'white dog' on the 'someProp' property:
SELECT * FROM [nt:unstructured] WHERE someProp = 'white dog'
If the 'someProp' property has multiple values, then a node with at least one value that satisfies the criteria will be included in the results.
To find nodes that have multiple values of a multi-valued property, simply AND together multiple criteria. For example, the following query will return all nodes that have both of the specified values:
SELECT * FROM [nt:unstructured] WHERE someProp = 'white dog'
AND someProp = 'black dog'
Any of the operators will work, including 'LIKE':
SELECT * FROM [nt:unstructured] WHERE someProp LIKE '%white%'
AND someProp LIKE '%black%'
Other combinations are possible, of course.

Can you compare text alphabetically in a WHERE clause?

I'm using SQLite, and I need to do the following:
SELECT * FROM fruit WHERE name<='banana'
This statement should return all entries whose "name" column contains text that comes alphabetically before (or is equal to) the word "banana". So it should return the row with "apple", but not the row with "pear" or "orange".
It appears that simply using the <= operator doesn't work, so is there another way?
It should work. But beware: the comparison is binary, and 'A' < 'a' in ASCII. To compare alphabetically, without sensitivity to case, you should use WHERE LOWER(my_column) < 'value', with the comparison value written in lowercase.
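The difference between the binary and the case-folded comparison can be checked with a small in-memory database. This is a sketch using Python's sqlite3 module; the table contents are sample data chosen for illustration, and the <= operator from the question is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (name TEXT)")
conn.executemany(
    "INSERT INTO fruit (name) VALUES (?)",
    [("apple",), ("Banana",), ("Cherry",), ("pear",)],
)

# Default (binary) comparison: uppercase letters sort before lowercase,
# so 'Cherry' counts as <= 'banana'.
binary = [row[0] for row in conn.execute(
    "SELECT name FROM fruit WHERE name <= 'banana' ORDER BY name"
)]

# Case-folded comparison: both sides are compared in lowercase.
folded = [row[0] for row in conn.execute(
    "SELECT name FROM fruit WHERE LOWER(name) <= 'banana' "
    "ORDER BY LOWER(name)"
)]

conn.close()
```

Here binary contains 'Cherry' even though it comes after 'banana' alphabetically, while folded contains only 'apple' and 'Banana'.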