I had a client upload a malformed table with a name like foo.bar into an Athena instance. What syntax can I use to drop the table? If I try
drop table if exists `foo.bar`
The command silently fails, presumably because the parser interprets foo as the database name. If I try adding the database name explicitly as
drop table if exists dbname."foo.bar"
or
drop table if exists dbname.`foo.bar`
I get a parse error from Athena.
Unfortunately, I don't have access to the Glue console to remove the table from there so I was wondering if it's possible to drop such a table via Athena SQL. Thanks!
Even if you don't have access to the Glue console, you can use the AWS CLI to delete the table directly through the Glue API:
aws glue delete-table --database-name dbname --name foo.bar
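If you prefer to script it, here is a minimal boto3 equivalent of the same call (assuming your credentials and region are already configured):

import boto3

# Delete the malformed table directly through the Glue API,
# bypassing the Athena SQL parser entirely.
glue = boto3.client("glue")
glue.delete_table(DatabaseName="dbname", Name="foo.bar")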
I have a Delta table schema that needs new columns/changed data types (usually I do this on non-Delta tables and those work fine).
I have already dropped the existing Delta table and tried dropping the schema, but I get a 'v1 session catalog' error.
I am currently using SQL on a 10.4 LTS cluster, Spark 3.2.1, Scala 2.12 (I can't change these computes); driver and workers are Standard E_v4.
What I already did, and it worked as usual:
drop table if exists dbname.tablename;
What I wanted to do next:
drop schema if exists dbname.tablename;
The error I got instead:
Error in SQL statement: AnalysisException: Nested databases are not supported by v1 session catalog: dbname.tablename
When I try recreating the schema in the same location I get the error:
AnalysisException: The specified schema does not match the existing schema at dbfs:locationOfMy/table
... Differences
-Specified schema has additional fields newColNameIAdded, anotherNewColIAdded
-Specified type for myOldCol is different from existing schema ...
If your intention is to keep the existing schema, you can omit the
schema from the create table command. Otherwise please ensure that
the schema matches.
How can I drop the schema and re-register it at the same location, with the same name, using the new definitions?
Answering a month later since I didn't get replies and found the right solution:
Delta tables leave behind partition files and transaction logs that cannot be removed using the DROP commands. I had to manually delete them, depending on where my location was.
Try this:
dbutils.fs.rm(path, True)
Use the path of your schema.
Then create your table again.
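For reference, a minimal sketch of the full sequence in a notebook cell; the column names come from the error message above, but the types and the exact dbfs path are assumptions:

# Drop the table's catalog entry (this does not clean up the files).
spark.sql("DROP TABLE IF EXISTS dbname.tablename")

# Manually remove the leftover data files and the _delta_log directory.
dbutils.fs.rm("dbfs:/locationOfMy/table", True)

# Re-create the table at the same location with the new definition.
spark.sql("""
    CREATE TABLE dbname.tablename (
        myOldCol BIGINT,            -- changed type, assumed for illustration
        newColNameIAdded STRING,    -- new column, type assumed
        anotherNewColIAdded STRING  -- new column, type assumed
    )
    USING DELTA
    LOCATION 'dbfs:/locationOfMy/table'
""")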
I'm running the following in my Synapse pyspark notebook to create a database and table:
%%sql
CREATE DATABASE IF NOT EXISTS Database1 LOCATION '/Database1';
CREATE TABLE IF NOT EXISTS Database1.Table1(Column1 int) USING CSV OPTIONS (header=true);
Then I see 'database1' and 'table1' in the Studio UI, although I see 'Database1/table1' as the folder names in ADLS.
Is there a way to preserve the case for both of these?
Running this seems to do the trick:
spark.conf.set('spark.sql.caseSensitive', True)
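Putting it together, a minimal sketch of the whole flow in a pyspark cell, assuming the conf is set before the DDL runs:

# Preserve identifier case by enabling case sensitivity first.
spark.conf.set("spark.sql.caseSensitive", True)

spark.sql("CREATE DATABASE IF NOT EXISTS Database1 LOCATION '/Database1'")
spark.sql(
    "CREATE TABLE IF NOT EXISTS Database1.Table1 (Column1 int) "
    "USING CSV OPTIONS (header=true)"
)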
I have a table in Athena created from S3. I wanted to update the column values using the UPDATE command. Is UPDATE not supported in Athena?
Is there any other way to update the table ?
Thanks
Athena only supports external tables, which are tables created on top of data in S3. Since S3 objects are immutable, there is no concept of UPDATE in Athena. What you can do instead is create a new table using CTAS, or a view, with the operation applied there; alternatively, read the data from S3 with something like Python, manipulate it, and overwrite it.
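For example, a hedged boto3 sketch of the CTAS route; the table, column, and bucket names are placeholders, and the CASE expression stands in for whatever "update" you need:

import boto3

athena = boto3.client("athena")

# CTAS: write a corrected copy of the table instead of updating in place.
athena.start_query_execution(
    QueryString="""
        CREATE TABLE mydb.mytable_fixed
        WITH (external_location = 's3://my-bucket/mytable_fixed/')
        AS SELECT id,
                  CASE WHEN status = 'old' THEN 'new' ELSE status END AS status
        FROM mydb.mytable
    """,
    QueryExecutionContext={"Database": "mydb"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)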
I loaded a JSON file into an S3 location in which a key starts with a digit (3party_count). I created a table in AWS Athena on top of this location using a crawler in AWS Glue, so a column named 3party_count was created.
But I can't run a SELECT query using this column.
Error: InvalidRequestException
Can anyone help me with this?
Use double quotes:
CREATE OR REPLACE VIEW "123view" AS
SELECT column_name1, column_name2
FROM "234table"
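Applied to the column from the question, a minimal boto3 sketch (the database, table, and output location are placeholders):

import boto3

athena = boto3.client("athena")

# Double-quoting the identifier lets Athena accept the leading digit.
athena.start_query_execution(
    QueryString='SELECT "3party_count" FROM my_json_table',
    QueryExecutionContext={"Database": "mydb"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)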
Reference: Names for Tables, Databases, and Columns (AWS Athena documentation)
We have streaming applications storing data on S3. The S3 partitions might have duplicated records. We query the data in S3 through Athena.
Is there a way to remove duplicates from S3 files so that we don't get them while querying from Athena?
You can write a small bash script that executes a Hive/Spark/Presto query to read the data, remove the duplicates, and write the result back to S3.
I don't use Athena, but since it is built on Presto, I will assume you can do whatever can be done in Presto.
The bash script does the following:
1. Read the data, apply a DISTINCT filter (or whatever logic you want to apply), and insert the result into another location. For example:
CREATE TABLE mydb.newTable AS
SELECT DISTINCT *
FROM hive.schema.myTable
If it is a recurring task, INSERT OVERWRITE would be better. Don't forget to set the location of the Hive db so you can easily identify the data destination.
Syntax reference: https://prestodb.io/docs/current/sql/create-table.html
2. Remove the old data directory using the aws s3 CLI (a Python sketch of steps 2 and 3 follows below).
3. Move the new data to the old directory.
Now you can safely read the same table, and the records will be distinct.
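A hedged boto3 sketch of steps 2 and 3, assuming both directories live in the same bucket; the bucket name and prefixes are placeholders:

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("my-bucket")

# Step 2: remove the old data directory.
bucket.objects.filter(Prefix="warehouse/myTable/").delete()

# Step 3: move the deduplicated output into the old directory.
for obj in bucket.objects.filter(Prefix="warehouse/newTable/"):
    new_key = obj.key.replace("warehouse/newTable/", "warehouse/myTable/", 1)
    s3.Object("my-bucket", new_key).copy_from(
        CopySource={"Bucket": "my-bucket", "Key": obj.key}
    )
    obj.delete()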
Please use CTAS:
CREATE TABLE new_table
WITH (
format = 'Parquet',
parquet_compression = 'SNAPPY')
AS SELECT DISTINCT *
FROM old_table;
Reference: https://docs.aws.amazon.com/athena/latest/ug/ctas-examples.html
We cannot remove duplicates in Athena itself, since it works directly on files, but there are workarounds.
Either delete the duplicate records from the files in S3, for which the easiest way would be a shell script,
or
write a SELECT query with DISTINCT.
Note: both are costly operations.
With Athena you can create an EXTERNAL TABLE on data stored in S3, but if you want to modify the existing data, use Hive.
Create a table in Hive, then run:
INSERT OVERWRITE TABLE new_table_name SELECT DISTINCT * FROM old_table;
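A minimal pyspark sketch of that Hive step; the table names are placeholders, and the Spark session needs Hive support:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Overwrite the target table with the deduplicated rows.
spark.sql("INSERT OVERWRITE TABLE new_table_name SELECT DISTINCT * FROM old_table")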