Is there a way to alter all columns in a SQL table to have 'utf-8' format - sql

I'm using Spark and I found that my data is not being correctly interpreted. I've tried using decode and encode built-in functions but they can be applied only to one column at a time.
Update:
An example of the behaviour I am having:
+-----------+
| Pa�s |
+-----------+
| Espa�a |
+-----------+
And the one I'm expecting:
+-----------+
| País |
+-----------+
| España |
+-----------+
The query is just a simple
SELECT * FROM table

Related

Cleaning Varchar data in SQL Snowflake

I'm trying to clean some varchar data in Snowflake and am having some issues with this column: I'd like all rows where information is missing to display as null rather than, e.g., 'Unknown'. The data looks like this:
+---------------------------+
| Entity |
+---------------------------+
| Walgreens |
+---------------------------+
| Apple |
+---------------------------+
| Microsoft |
+---------------------------+
| 2018 Quora hack |
+---------------------------+
| Unknown government agency |
+---------------------------+
| Unknown |
+---------------------------+
And I'd like to standardise it, either by changing the original column or by adding a revised one, so it looks like this:
+-----------+
| Entity |
+-----------+
| Walgreens |
+-----------+
| Apple |
+-----------+
| Microsoft |
+-----------+
| Quora |
+-----------+
| null |
+-----------+
| null |
+-----------+
Here's what I've tried so far. The plan was to find something that would work for the 'Unknown' bits of data and then apply it to more specific cases like simplifying the '2018 Quora hack.'
1
select *
from data_breaches
order by case when "Entity" like '%nknown%'
then NULL else "Entity" end
This returned the data, but put entities which said 'Unknown' in them at the end of the table and didn't change them to null
2
select "Sector", "Records Number", "Method"
if "Entity" IN('Unknown'), NULL, "Entity") as Enclean
from data_breaches
Returned this error: Syntax error: unexpected '"Entity"'. (line 2)
I think maybe Snowflake doesn't support this syntax?
3
select "Year", "Records", "Organization type","Method"
iff("Entity" like '%nknown', NULL,"Entity")
from data_breaches
Returned this error: Syntax error: unexpected '('. (line 2)
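Use ILIKE in a CASE expression to turn values containing 'Unknown' into NULL (a CASE with no ELSE branch returns NULL). The column is quoted here because the question refers to it as "Entity":
SELECT CASE WHEN "Entity" NOT ILIKE '%Unknown%' THEN "Entity" END AS "Entity"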
FROM data_breaches;
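Attempts 2 and 3 in the question seem to fail for a simpler reason than unsupported syntax: there is no comma after "Method", so the parser reads the next word as a column alias and then stops at the token that follows. A minimal sketch of attempt 3 with the comma added, assuming the column really is named "Entity" (quoted, mixed case) as in the question; the alias entity_clean is only illustrative:

```sql
-- Attempt 3 with the missing comma restored after "Method".
-- IFF(condition, value_if_true, value_if_false) is Snowflake's inline conditional.
-- ILIKE is case-insensitive, so 'Unknown' and 'unknown' are both caught.
SELECT "Year",
       "Records",
       "Organization type",
       "Method",
       IFF("Entity" ILIKE '%unknown%', NULL, "Entity") AS entity_clean  -- illustrative alias
FROM data_breaches;
```

This only handles the 'Unknown' rows; trimming values such as '2018 Quora hack' down to 'Quora' would need separate rules.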

Remove/delete values in a column SQL

I am very new to using SQL and require help.
I have a table whose values contain commas:
+-------------------+
| Sample |
+-------------------+
| sdferewr,yyuyuy |
| q45345,ty67rt |
| wererert,rtyrtytr |
| werr,ytuytu |
+-------------------+
I want to remove everything after the comma (,) and keep only the part before it.
Required output:
+----------+
| Sample |
+----------+
| sdferewr |
| q45345 |
| wererert |
| werr |
+----------+
How can I do this in SQL? Please help.
Assuming the table is named "TABLE_NAME" and the column is named "sample", then in MySQL:
update TABLE_NAME set sample=SUBSTRING_INDEX(`sample`, ',', 1);
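The simplest way to do that is
UPDATE table_name
SET column = substring(column for position(',' in column) - 1)
WHERE position(',' in column) > 0;
position(',' in column) returns the position of the first comma, and substring(column for n) returns the first n characters, so subtracting 1 drops the comma itself. The WHERE clause skips rows that contain no comma, for which position() returns 0.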
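Before running either UPDATE, it can help to preview the result with a plain SELECT. A small sketch in MySQL syntax, reusing the TABLE_NAME and sample names assumed in the first answer; the alias trimmed is only illustrative:

```sql
-- Dry run: show the current value next to what the UPDATE would store.
-- SUBSTRING_INDEX(str, ',', 1) returns everything before the first comma.
SELECT sample,
       SUBSTRING_INDEX(sample, ',', 1) AS trimmed  -- illustrative alias
FROM TABLE_NAME
WHERE sample LIKE '%,%';  -- only rows that actually contain a comma
```

For the sample data above, the trimmed column should show sdferewr, q45345, wererert and werr.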

Query to show all column, table and schema names together in IMPALA

I want to get the metadata of an Impala database in one query. It would probably look like
SELECT columnname,tablename,schemaname from SYSTEM.INFO
Is there a way to do that? I don't want to fetch the columns of only one table, as in, for example:
SHOW COLUMN STATS db.table_name
That query does not answer my question. I want to select all the metadata in one query.
From impala-shell you have commands like:
describe table_name
describe formatted table_name
describe database_name
EXPLAIN { select_query | ctas_stmt | insert_stmt }
and the SHOW statement, which is a flexible way to get information about different types of Impala objects; see the SHOW statement page in the Impala documentation.
On the other hand, information about the schema objects is held in the metastore database. This database is shared between Impala and Hive.
In particular, Impala keeps its table definitions in a traditional MySQL or PostgreSQL database known as the metastore, the same database where Hive keeps this type of data. Thus, Impala can access tables defined or loaded by Hive, as long as all columns use Impala-supported data types, file formats, and compression codecs.
If you want to query this information in one shot, you have to query the metastore database directly (MySQL, PostgreSQL, Oracle, etc., depending on your setup).
For example, in my case Impala keeps metadata in MySQL.
use metastore;
-- Database changed
SHOW tables;
+---------------------------+
| Tables_in_metastore |
+---------------------------+
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_TXN_COMPONENTS |
| DATABASE_PARAMS |
| DBS |
.......
........
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TYPES |
| TYPE_FIELDS |
| VERSION |
+---------------------------+
54 rows in set (0.00 sec)
SELECT * FROM VERSION;
+--------+----------------+----------------------------+-------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT | SCHEMA_VERSION_V2 |
+--------+----------------+----------------------------+-------------------+
| 1 | 1.1.0 | Hive release version 1.1.0 | 1.1.0-cdh5.12.0 |
+--------+----------------+----------------------------+-------------------+
1 row in set (0.00 sec)
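Building on the tables listed above, the schema, table and column names can be pulled together with a join. This is only a sketch: it assumes the usual Hive metastore layout, in which TBLS links to DBS through DB_ID and to COLUMNS_V2 through the SDS table (SD_ID, then CD_ID). SDS does not appear in the truncated listing above, so check these table and column names against your metastore version before relying on them:

```sql
-- Run against the metastore database (MySQL in this case), not in impala-shell.
-- Assumed join path: DBS -> TBLS -> SDS -> COLUMNS_V2
SELECT d.NAME        AS schemaname,
       t.TBL_NAME    AS tablename,
       c.COLUMN_NAME AS columnname,
       c.TYPE_NAME   AS columntype
FROM DBS d
JOIN TBLS t       ON t.DB_ID = d.DB_ID
JOIN SDS s        ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
ORDER BY d.NAME, t.TBL_NAME, c.INTEGER_IDX;  -- INTEGER_IDX: assumed column-order field
```

As far as I know, partition columns are stored separately (in PARTITION_KEYS), so they would not show up in this result.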
Hope this helps.

Why am I getting empty result by using regex in sql?

The products table has a column named keyword.
+---------+
| keyword |
+---------+
| dump |
| dump2 |
| dump4 |
| dump5 |
| pro |
+---------+
I want to fetch the rows of the products table whose keyword contains the string du anywhere.
I used select * from products where keyword LIKE '%[du]%';
but it returns an empty set.
What am I doing wrong here?
If you must use regex, you can just use du as the regex; that will match the string du anywhere in the keyword:
SELECT *
FROM products
WHERE keyword REGEXP 'du'
Output:
keyword
dump
dump2
dump4
dump5
Demo on dbfiddle
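As for why the original query returns nothing: assuming this is MySQL (which the REGEXP operator above suggests), LIKE only treats % and _ as wildcards, so '%[du]%' looks for the literal five-character text [du], brackets included. If a regex is not required, a plain LIKE is enough:

```sql
-- LIKE with % on both sides matches 'du' anywhere in the keyword.
SELECT *
FROM products
WHERE keyword LIKE '%du%';
```

On the sample data this should return the same four rows as the REGEXP version.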

How to get part of the String before last delimiter in AWS Athena

Suppose I have the following table in AWS Athena
+----------------+
| Thread |
+----------------+
| poll-23 |
| poll-34 |
| pool-thread-24 |
| spartan.error |
+----------------+
I need to extract the part of each value before the last delimiter (here '-' is the delimiter).
Basically, I need a query that gives me this output:
+----------------+
| Thread |
+----------------+
| poll |
| poll |
| pool-thread |
| spartan.error |
+----------------+
I also need a GROUP BY query that can generate this:
+---------------+-------+
| Thread | Count |
+---------------+-------+
| poll | 2 |
| pool-thread | 1 |
| spartan.error | 1 |
+---------------+-------+
I tried various forms of MySQL queries using the LEFT(), RIGHT(), LOCATE(), and SUBSTRING_INDEX() functions, but it seems that Athena does not support all of these functions.
You could use regexp_replace() to remove the part of the string that follows the last '-':
select regexp_replace(thread, '-[^-]*$', ''), count(*)
from mytable
group by regexp_replace(thread, '-[^-]*$', '')
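To try the pattern without touching a real table, the sample rows can be supplied inline. A sketch in Athena (Presto) syntax; the inline relation t and the alias cnt are only illustrative:

```sql
-- '-[^-]*$' matches the last '-' and everything after it up to the end of the string;
-- values without a '-' (such as 'spartan.error') are left unchanged.
SELECT regexp_replace(thread, '-[^-]*$', '') AS thread_prefix,
       count(*) AS cnt
FROM (
    VALUES ('poll-23'), ('poll-34'), ('pool-thread-24'), ('spartan.error')
) AS t (thread)
GROUP BY 1   -- group by the first select expression
ORDER BY 1;
```

This should give poll with a count of 2, and pool-thread and spartan.error with a count of 1 each, matching the expected table in the question.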