Can HDFS blocks be split into further blocks? - hive

Suppose that I have a hive table ride_w_partition with columns as:
ride_id string
rideable_type string
started_at string
ended_at string
start_station_name string
start_station_id string
end_station_name string
end_station_id string
start_lat string
start_lng string
end_lat string
end_lng string
member_casual string
rideable_type can be 'casual', 'member', or 'registered'. So, if I partition on this column, I will get three partitions.
Now I partition this table on rideable_type and load data into it. The data size is 134 MB.
So, two blocks are needed to store this data in a non-partitioned table if the block size is 128 MB.
But if the table is partitioned, is each block going to be split into 3 blocks, so that the total number of blocks after partitioning is 6?
If the blocks are split further, is there also data shuffling, so that a block only contains data of a particular partition?
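For reference, a minimal HiveQL sketch of the table being described (storage format details are omitted and assumed):

-- Minimal sketch: in Hive, the partition key is declared in PARTITIONED BY,
-- not in the regular column list; each value of rideable_type becomes its
-- own directory under the table location (e.g. .../rideable_type=casual/).
CREATE TABLE ride_w_partition (
  ride_id            string,
  started_at         string,
  ended_at           string,
  start_station_name string,
  start_station_id   string,
  end_station_name   string,
  end_station_id     string,
  start_lat          string,
  start_lng          string,
  end_lat            string,
  end_lng            string,
  member_casual      string
)
PARTITIONED BY (rideable_type string);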

Related

CSV upload into BigQuery partitioned table using a string field as partition

I need to upload a csv file into a BigQuery table. My question is: if "datestamp_column" is STRING, how can I use it as the partition field?
An example value from "datestamp_column": 2022-11-25T12:56:48.926500Z
from google.cloud import bigquery

def upload_to_bq():
    client = bigquery.Client()
    # Configure a load job that day-partitions the table on datestamp_column.
    job_config = bigquery.LoadJobConfig(
        schema=client.schema_from_json("schemaa.json"),
        skip_leading_rows=1,
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field="datestamp_column",
            expiration_ms=7776000000,  # 90 days.
        ),
    )
This is failing, as it complains that datestamp_column is STRING and should be TIMESTAMP, DATE or DATETIME.
To be able to partition on the datestamp_column field, you have to use TIMESTAMP, DATE or DATETIME.
The format you indicated for this field corresponds to a timestamp: datestamp_column: 2022-11-25T12:56:48.926500Z
In your BigQuery schema, you have to change the column type from STRING to TIMESTAMP for the datestamp_column field.
Then your ingestion should work correctly, because a value in the format 2022-11-25T12:56:48.926500Z should be ingested as TIMESTAMP in BigQuery.
input -> STRING with value 2022-11-25T12:56:48.926500Z
result column in BigQuery is TIMESTAMP
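For illustration, a sketch of the equivalent table definition in BigQuery DDL (the my_dataset.rides name and the extra ride_id column are assumptions, not from the question):

-- Sketch only: declare the column as TIMESTAMP so it can drive partitioning.
CREATE TABLE my_dataset.rides (
  ride_id          STRING,
  datestamp_column TIMESTAMP
)
PARTITION BY DATE(datestamp_column)
OPTIONS (partition_expiration_days = 90);  -- same 90-day expiry as expiration_ms above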

Date and time string to Datetime Timestamp Type in Spark Scala dataframe

I have a date field and a time field in a Spark dataframe. Both are String type columns.
I want to combine the date and time, and later store the result in a Hive column with timestamp type.
e.g. date as 2018-09-10 and time as 140554 (HHMMSS), or sometimes 95200 (HMMSS).
Now when I use concat(col("datefield"), lpad(col("timefield"), 6, "0")), I get the result as the string
2018-09-10140554, but when I later convert it to timestamp type it does not work properly.
I tried the steps below but did not get the correct result. I just want to combine the date and time string columns and make the result a timestamp type.
to_timestamp with format YYYY-MM-ddHHMMSS
from_unixtime(unix_timestamp(...))
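For what it's worth, a sketch of the combine-and-parse step in Spark SQL (the table name rides is an assumption): in date patterns, mm means minutes and MM means months, so a pattern like YYYY-MM-ddHHMMSS cannot parse the time part.

-- Left-pad the time to six digits, concatenate with the date, and parse
-- with lower-case mm for minutes (MM would be interpreted as months).
SELECT to_timestamp(
         concat(datefield, lpad(timefield, 6, '0')),
         'yyyy-MM-ddHHmmss'
       ) AS event_ts
FROM rides;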

How to convert unix/epoch timestamp into date string in Apache Phoenix SQL

We have created Phoenix views on top of HBase tables and are querying the data. One of the columns holds epoch timestamp data, and we need to convert it into a valid date format; I couldn't find any appropriate functions. Any help much appreciated.
If the type of the column holding the epoch timestamp data is INTEGER or BIGINT, you can use:
CAST("epoch_time" AS TIMESTAMP)
If its type is VARCHAR, you should first convert the value to a number through the TO_NUMBER() built-in function, i.e.
CAST(TO_NUMBER("epoch_time") AS TIMESTAMP)
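Putting the two together, a hypothetical Phoenix query (the view name my_view is an assumption) that also renders the result as a date string via the TO_CHAR built-in:

-- Assumes "epoch_time" is a VARCHAR holding an epoch value in milliseconds.
SELECT TO_CHAR(CAST(TO_NUMBER("epoch_time") AS TIMESTAMP), 'yyyy-MM-dd HH:mm:ss')
FROM "my_view";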

Extract date from String column in SQL

I have a column current_data, which is of type string, with data like {"settlement_date":"2018-07-21"}.
My question is that the settlement date will be different for each trade, and I want to extract the date, i.e. 2018-07-21, from the current_data column for each trade. I tried using select to_char(date_trunc(d.current_data,'YYYY-MM-DD')) as "Current_date" and I have also tried the trim function, but it does not work.
It looks like JSON data. Since you're saying it's a text column internally, you could use the substring function to cut out only the data you're looking for.
select substring(current_data from 21 for 10) from yourtable
You start taking the substring at the 21st character and specify that its length will be the next 10 characters.
With your sample data the result would be
db=# select substring('{"settlement_date":"2018-07-21"}' from 21 for 10);
substring
------------
2018-07-21
Beware, though, that this solution relies on the length of the string and is designed for static input where the extracted substring is always in the same position.
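If the column lives in PostgreSQL (an assumption suggested by the db=# prompt above), a position-independent alternative is to treat the text as JSON and extract the key directly:

-- Cast the text column to json and pull the key, regardless of where
-- it sits in the string.
select current_data::json ->> 'settlement_date' as settlement_date
from yourtable;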

Is it possible to convert a column from string to datetime type in a SQL statement?

I need to query an SQLite database table using the following SQL statement.
SELECT *
FROM Alarms
WHERE ALARMSTATE IN (0,1,2)
AND ALARMPRIORITY IN (0,1,2,3,4)
AND ALARMGROUP IN (0,1,2,3,4,5,6,7)
AND DateTime(ALARMTIME) BETWEEN datetime("2012-08-02 00:00:00")
AND datetime("2012-08-03 00:00:00")
ORDER BY ALARMTIME DESC
ALARMTIME is of TEXT datatype.
ALARMTIME is displayed in the datagridview as follows: "08/03/2012 11:52 AM". Can you use that format for checking, like DateTime(ALARMTIME)?
The only problem I have with this SQL statement is that it always returns zero records. However, SQLite doesn't complain about the syntax.
Strictly speaking, SQLite doesn't have datetime types for its columns:
SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values.
The problem here is that the string you're using in the condition isn't a valid date/time, so it's treated as null:
sqlite> SELECT datetime("2012-08-3 00:00:00");
sqlite> SELECT datetime("2012-08-03 00:00:00");
2012-08-03 00:00:00
(Note 2012-08-03 instead of 2012-08-3.)
In addition, make sure that the values in your ALARMTIME column are correctly formatted too.
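As a quick sketch of that last point, the grid format from the question ("08/03/2012 11:52 AM") is not one of the formats datetime() understands, so such values also come back as null, while an ISO-8601 string parses fine:
sqlite> SELECT datetime('08/03/2012 11:52 AM');
sqlite> SELECT datetime('2012-08-03 11:52:00');
2012-08-03 11:52:00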