About sum using AWS Athena - sql

I have a table in AWS Athena:
CREATE EXTERNAL TABLE s3inventory(
bucket string,
key string,
version_id string,
is_latest boolean,
is_delete_marker boolean,
size string,
last_modified_date string,
e_tag string,
storage_class string,
is_multipart_uploaded boolean,
replication_status string,
encryption_status string,
object_lock_retain_until_date string,
object_lock_mode string,
object_lock_legal_hold_status string,
intelligent_tiering_access_tier string,
bucket_key_status string,
checksum_algorithm string
) PARTITIONED BY (
dt string
)
I need to sum the size field, but whatever I change I still get an error saying the value cannot be converted and cannot be summed:
SYNTAX_ERROR: line 1:8: Unexpected parameters (varchar) for function sum. Expected: sum(double) , sum(real) , sum(bigint) , sum(interval day to second) , sum(interval year to month) , sum(decimal(p,s))
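Since size is declared as string, SUM cannot be applied to it directly; one way around this is to cast it to a numeric type first. A minimal sketch, assuming the column holds plain integer text (the partition value below is hypothetical); TRY_CAST can be used instead of CAST to turn non-numeric rows into NULL rather than failing the query:
SELECT SUM(CAST(size AS bigint)) AS total_size
FROM s3inventory
WHERE dt = '2023-01-01-00-00';  -- hypothetical partition value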

Related

BigQuery: how to create a table that accepts bytes and string in the same column?

I have an issue in BigQuery.
Problem statement:
We have a JSON file that we have to upload to a BigQuery table.
When it is loaded, a table will be created.
TABLE DDL:
CREATE TABLE vb_reports
(
records ARRAY<STRUCT<
  rightData BYTES, mismatchEnd INT64, mismatchStart INT64, fields ARRAY,
  rightRecLength INT64, leftRecLength INT64,
  segments ARRAY<STRUCT<
    leftData BYTES, diffDesc STRING, endIndex INT64, startIndex INT64,
    rightData BYTES, segmentName STRING, segmentID STRING
  >>,
  leftData BYTES, recordNumber INT64
>>,
iterationID INT64,
jobSummary STRING,
jobName STRING
)
The rightData and leftData columns are created as BYTES columns.
But when we load data from the JSON file, these columns are sometimes interpreted as STRING and sometimes as BYTES.
When BigQuery interprets these fields as STRING, we get the error below:
Provided Schema does not match Table vb_reports Field records.segments.leftData has changed type from BYTES to STRING
We are loading the JSON file from the BigQuery console.
The problem is that even though we have byte data, BigQuery interprets it as STRING.
Please help us create the leftData and rightData columns with the BYTES type so that they accept both bytes and string data.
Thanks in advance.
hayyal
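A note on the underlying constraint: a BigQuery column has exactly one type, so no column can accept both BYTES and STRING, and BYTES values in a JSON load must be supplied as base64-encoded strings. One common workaround, sketched here under the assumption that the values really are base64-encoded, is to declare leftData and rightData as STRING in the load schema and decode them at query time:
-- Hypothetical query assuming leftData/rightData were loaded as STRING.
-- SAFE.FROM_BASE64 returns NULL for values that are not valid base64
-- instead of failing the whole query.
SELECT
  s.segmentName,
  SAFE.FROM_BASE64(s.leftData)  AS leftData_bytes,
  SAFE.FROM_BASE64(s.rightData) AS rightData_bytes
FROM vb_reports, UNNEST(records) AS r, UNNEST(r.segments) AS s;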

Data insert issue

So I had this problem when adding a CSV file to my HQL code and running it on HDFS.
I found that when inserting data, the partition parts get NULLs and some columns get dropped. I tried many different ways to insert the data, but I still get weird symbols and lost columns; it is as if it cannot read the CSV file.
Here is a screenshot of the file contents, and here is the code:
CREATE database covid_db;
use covid_db;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=500;
set hive.exec.max.dynamic.partitions.pernode=500;
CREATE TABLE IF NOT EXISTS covid_db.covid_staging
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_LZ'
tblproperties ("skip.header.line.count"="1", "serialization.null.format" = "''");
CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';
FROM
covid_db.covid_staging
INSERT INTO TABLE covid_db.covid_ds_partitioned PARTITION(COUNTRY_NAME)
SELECT *,Country WHERE Country is not null;
CREATE EXTERNAL TABLE covid_db.covid_final_output
(
TOP_DEATH STRING,
TOP_TEST STRING
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_FINAL_OUTPUT';
1st: You are checking the file contents, but the partition column is not stored in the file; it is stored in the metadata. Also, dynamically created partitions are directories in the format key=value. So the last column you see in the file is not the partition column, it is Rank_by_Death_of_Closed_Cases.
2nd: You did not specify a delimiter in the second table DDL, nor the NULL format. The default delimiter is '\001' (Ctrl-A). You can specify a delimiter, for example TAB (\t), and the desired NULL representation:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;
But it is better not to redefine the NULL format if you want to be able to distinguish NULLs from empty strings.
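Applied to the partitioned table from the question, the DDL would look roughly like the sketch below; the TAB delimiter is only an example and must match how the data files are actually written, and the directory in the comment uses a hypothetical country value:
-- Dynamic partitions appear on HDFS as key=value directories, e.g.
--   /user/cloudera/ds/COVID_HDFS_PARTITIONED/country_name=egypt/000000_0
-- and the partition column itself is never written into the data files.
CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned
(
Country STRING,
Total_Cases DOUBLE,
New_Cases DOUBLE,
Total_Deaths DOUBLE,
New_Deaths DOUBLE,
Total_Recovered DOUBLE,
Active_Cases DOUBLE,
Serious DOUBLE,
Tot_Cases DOUBLE,
Deaths DOUBLE,
Total_Tests DOUBLE,
Tests DOUBLE,
CASES_per_Test DOUBLE,
Death_in_Closed_Cases STRING,
Rank_by_Testing_rate DOUBLE,
Rank_by_Death_rate DOUBLE,
Rank_by_Cases_rate DOUBLE,
Rank_by_Death_of_Closed_Cases DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';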

incompatible table partitioning specification when using bigquery CLI for an array table

I have the following SQL:
#standardSql
CREATE OR REPLACE TABLE batch_report
(
date DATE,
memberId STRING OPTIONS(description="xxxx member ID"),
variables ARRAY<STRUCT<
id STRING,
datatype STRING,
effectiveDate TIMESTAMP,
values ARRAY<STRUCT<
id STRING,
value STRING
>
>,
isSensitive BOOLEAN,
name STRING
>
>
)
PARTITION BY date
OPTIONS (
partition_expiration_days=62, -- two months
description="Stores the raw response from the xxxx batch endpoint"
)
When running this via the CLI using bq query --dataset=dev < create_batch_report.sql, it gives me the following error message:
Incompatible table partitioning specification. Expected partitioning specification none, but input partitioning specification is
interval(type:day,field:date)
However, when running it in the BigQuery console and supplying the dataset name in the CREATE OR REPLACE TABLE statement, it executes correctly. Is this a bug, and if so, how do I get around it?
To get it to run via the CLI, I modified the first line to include the dataset rather than passing it in with the dataset flag. This made it execute correctly. The modified SQL is:
#standardSql
CREATE OR REPLACE TABLE {ENVIRONMENT}.batch_report
(
date DATE,
memberId STRING OPTIONS(description="xxxx member ID"),
variables ARRAY<STRUCT<
id STRING,
datatype STRING,
effectiveDate TIMESTAMP,
values ARRAY<STRUCT<
id STRING,
value STRING
>
>,
isSensitive BOOLEAN,
name STRING
>
>
)
PARTITION BY date
OPTIONS (
partition_expiration_days=62, -- two months
description="Stores the raw response from the xxxx batch endpoint"
)
and I execute it via the CLI with:
sed s/"{ENVIRONMENT}"/${ENVIRONMENT}/g create_batch_report.sql | \
bq query

Error: No matching signature for operator = for argument types: STRUCT<id STRING, name STRING>, STRING. Supported signatures: ANY = ANY at [4:7]

I added a public dataset that uses standard SQL. I added #standardsql to the query and I also changed the dialect in the settings. The query looks like:
#standardsql
SELECT field1,field2
FROM `censys-io.domain_public.current`
WHERE filed3 = "some_string_here";
I get this error:
Error: No matching signature for operator = for argument types: STRUCT<id STRING, name STRING>, STRING. Supported signatures: ANY = ANY at [4:7]
Can you please tell me the reason and how to fix the issue?
Inspecting the error
Error: No matching signature for operator = for argument types: STRUCT<id STRING, name STRING>, STRING. Supported signatures: ANY = ANY at [4:7]
tells us that your line
WHERE filed3 = "some_string_here";
has an incorrect comparison. The left side has STRUCT<id STRING, name STRING> which makes it seem like filed3 is either a struct field or a table on its own. Comparing this with the string "some_string_here" is therefore not valid.
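One way to rewrite the comparison, assuming the intent is to match on the struct's name field (the field names come from the error message, the table from the question):
#standardsql
SELECT field1, field2
FROM `censys-io.domain_public.current`
WHERE filed3.name = "some_string_here";
-- If filed3 were instead an ARRAY of such structs, it would need to be
-- unnested first, e.g.
--   WHERE EXISTS (SELECT 1 FROM UNNEST(filed3) f WHERE f.name = "some_string_here")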

Parse error while creating External Table

I am new to Hadoop. I am trying to create an EXTERNAL table in Hive.
The following is the query I am using:
CREATE EXTERNAL TABLE stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks'
I am getting an error:
ParseException cannot recognize input near 'exchange' 'STRING' ',' in column specification
What am I missing? I tried reading the command - I don't think I am missing anything.
Because exchange is a reserved keyword in Hive, you can't use exchange as your column name. If you want to keep the name, just add backticks around exchange.
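For example, the DDL from the question with only the column name quoted:
CREATE EXTERNAL TABLE stocks (
`exchange` STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks';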
Exchange is a reserved keyword in Hive, so use a different column name in its place:
CREATE TABLE stocks (
exchange1 STRING,
stock_symbol STRING,
stock_date STRING,
stock_price_open DOUBLE,
stock_price_high DOUBLE,
stock_price_low DOUBLE,
stock_price_close DOUBLE,
stock_volume DOUBLE,
stock_price_adj_close DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';